About the role
3Pillar is an AI transformation partner on a mission to help enterprises build the AI-native products and intelligent agents that will define the next era of business. With teams across North America, Europe, Latin America, and Asia, we work with the most ambitious companies in financial services, healthcare, media, and technology, helping them move faster, modernize boldly, and compete on their own terms. Our HelixAI platform and Helix Pods delivery model put our engineers at the center of real agentic transformation, doing work that is open, portable, and built to last. We are building the future of enterprise AI.
What you'll do
AI-Ready Data Platform: The Single Source of Truth
- Architect and own the enterprise AI data platform — the unified, governed layer that ingests, transforms, stores, and serves all data consumed by AI systems across the organisation.
- Design multi-domain data models (lakehouse, data mesh, event-driven) that are structured from day one to serve AI workloads: clean lineage, versioned schemas, well-documented contracts, and low-latency serving APIs.
- Own the full data stack: real-time streaming (Kafka, Spark Structured Streaming), batch processing (Databricks, PySpark, Delta Lake), cloud storage and compute (AWS, Azure), and data quality and metadata management.
- Ensure this platform is the single, authoritative data source for all downstream consumers (conversational AI, dashboard assistants, autonomous agents, ML models, and reporting), eliminating data silos and conflicting truths.
- Drive modernisation of legacy pipelines (on-prem ETL, batch DWH) to cloud-native, AI-ready architectures with measurable improvements in cost, latency, and delivery velocity.
Semantic Models & Knowledge Layer
- Design the semantic layer that sits above raw data — business-aligned ontologies, entity relationships, domain taxonomies, and knowledge graphs — so AI systems understand context, not just tokens.
- Build and maintain knowledge graphs (Neo4j or equivalent) that capture relationships between business entities, policies, KPIs, hierarchies, and domain rules — enabling structured reasoning alongside unstructured retrieval.
- Define and govern a feature store and semantic data contracts that serve both classical ML models and LLM-based applications from a single, well-versioned, trusted source.
- Own metadata management, data lineage, and audit trails across the semantic layer — ensuring every AI system can trace its outputs back to source data with full accountability.
RAG, Vector & Retrieval Infrastructure
- Design the retrieval infrastructure that powers RAG-based AI applications: embedding pipelines, vector stores (Pinecone, FAISS, ChromaDB, OpenSearch), chunking strategies, and hybrid retrieval layers combining semantic search with structured queries.
- Define the data contracts between the AI data platform and retrieval consumers — ensuring consistent, freshness-guaranteed, well-indexed data surfaces to RAG pipelines, conversational AI, and agent tools.
- Architect retrieval systems that balance precision, recall, latency, and cost — with clear evaluation benchmarks, not just infrastructure defaults.
ML/LLMOps Infrastructure
- Own the ML and LLMOps data infrastructure: training data curation pipelines, feature engineering, model registry, experiment tracking (MLflow), automated evaluation, and production monitoring.
- Build CI/CD pipelines for AI systems: automated data validation, model quality gates, deployment automation, rollback mechanisms, and production health dashboards.
- Design data infrastructure for LLM fine-tuning workflows — training corpus curation, data quality filtering, RLHF pipelines, and adapter management — ensuring models trained on this platform reflect accurate, governed, domain-specif
Who we're looking for
- Strong data engineering and architecture experience, with 3–5+ years building production AI/ML and LLM-era data infrastructure.
- Proven experience designing enterprise-scale AI data platforms that serve multiple AI consumers, not just one application or pipeline.
- Deep expertise in lakehouse and data mesh architectures: Databricks, Delta Lake, PySpark, Kafka, Spark Structured Streaming, cloud-native data services (AWS, Azure).
- Hands-on experience with vector stores, semantic models, knowledge graphs, and retrieval infrastructure in production environments.
- Working knowledge of LLMOps: model serving pipelines, MLflow, CI/CD for AI, automated evaluation, and production monitoring.
- Strong background in data governance, security, and compliance in regulated industries (financial services, payments, cybersecurity, healthcare).
- Experience defining data access controls for AI agents and automated systems — not just human users.