About the position
Mission:
As a Data Scientist specializing in LLMs, you will be responsible for developing, implementing and optimizing advanced algorithms, models and capabilities that help our teams (e.g. Field Operations) automate their workloads in areas such as decision support, mission planning and situational awareness. You will work on a variety of projects that involve understanding, processing and generating human language to solve complex problems and create innovative solutions. The ideal candidate will have a strong background in LLMs, machine learning and data science, with a proven track record of successful projects in these domains.
What you will do
- Algorithm Development: Design, develop and implement state-of-the-art algorithms and models in the context of language models.
- Capability Development: Realize new AI-based capabilities in areas such as decision support, mission planning and workflow automation.
- Model Training and Optimization: Train and optimize large language models using vast amounts of textual data, ensuring high performance and accuracy.
- Data Preprocessing: Perform data preprocessing tasks such as tokenization, stemming, lemmatization and normalization to prepare datasets for training and evaluation.
- Research and Innovation: Stay current with the latest advancements in LLMs and Natural Language Processing (NLP) and apply new techniques to improve existing models and develop new solutions.
- Collaboration: Work closely with data engineers, software developers, product managers and other stakeholders to understand project requirements and deliver effective solutions.
- Performance Evaluation: Evaluate the performance of models using appropriate metrics and techniques and iteratively improve their accuracy and efficiency.
- Deployment: Collaborate with engineering teams to deploy models into production environments and ensure their robustness and scalability.
- Documentation: Maintain comprehensive documentation of models, algorithms and processes for future reference and reproducibility.
Who we are looking for
- Education: Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field. A Ph.D. is a plus.
- Experience: 3+ years of experience in data science, with a focus on large language models and NLP.
- Strong programming skills in Python, with experience using NLP and LLM libraries such as spaCy, Hugging Face (Transformers, Datasets, PEFT, TRL) and the major model families (e.g. GPT, Claude, Gemini, Llama, Mistral, Qwen, Gemma) via both API and open weights.
- Proficiency in deep learning frameworks, primarily PyTorch (plus Keras/TensorFlow as needed), and familiarity with inference optimisation (quantisation, TensorRT-LLM).
- Experience with data preprocessing, curation and tokenisation for LLM workloads, including building and cleaning datasets for fine-tuning and retrieval (chunking, embeddings, deduplication, synthetic data generation).
- Solid understanding of transformer architectures and attention, with working knowledge of fine-tuning and alignment techniques (full fine-tuning, LoRA/QLoRA, instruction tuning, RLHF/DPO). Exposure to RNNs and CNNs is a plus rather than a core requirement.
- Experience training and fine-tuning LLMs and building RAG and agentic systems, including orchestration frameworks (LangChain, LlamaIndex, LangGraph), vector databases (e.g. Qdrant, Weaviate, pgvector) and tool/function calling.
- Experience with experimentation and tracking tooling: Jupyter notebooks plus experiment and prompt tracking (MLflow, Weights & Biases) and LLM evaluation (e.g. Ragas, LangSmith/Langfuse, custom eval harnesses).
- Familiarity with cloud platforms (AWS, Azure, Google Cloud) and their AI services, with a focus on Google Cloud (Vertex AI, Model Garden, managed endpoints).
- Experience deploying self-hosted and open-weight LLMs in production, using serving frameworks such as vLLM, TGI, Ollama or llama.cpp, with awareness of GPU sizing, quantisation formats (GGUF, AWQ, GPTQ) and on-prem or air-gapped constraints.
- Working knowledge of MLOps/LLMOps and DevOps practices: Git, CI/CD, containerisation (Docker, Kubernetes), plus telemetry, monitoring and observability for model and inference performance.
- Analytical Skills: Excellent analytical and problem-solving skills with the ability to design innovative solutions to complex problems.
- AI Ethics and Bias Mitigation: Experience with, or awareness of, AI ethics, fairness and bias mitigation strategies, in th
Benefits
- An excellent work environment and an opportunity to create a real impact in the world;
- A truly high-tech, state-of-the-art engineering company with flat structure and no politics;
- Working with the very latest technologies in Data & AI, including Edge AI and swarming, both within our software platforms and within our embedded on-board systems;
- Flexible work arrangements;
- Professional development opportunities;
- Collaborative and inclusive work environment;
- Salary commensurate with the level of proven experience.