About the role
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What you'll do
We're building a dataset to evaluate AI coding agents: how well a model handles real-world developer tasks. You'll create challenging tasks and evaluation criteria within realistic simulated environments:
- Build realistic developer environments: a virtual company with a codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
- Design tasks from intermediate states of these environments: craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent
- Write tests that verify agent solutions: accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
- Iterate on tasks and tests based on QA feedback: review agent solutions, analyze failures, and refine until the evaluation is fair and robust
Who we're looking for
- 5+ years of experience in software development
- Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
- Experience writing tests (functional, integration)
- English proficiency - B2+
Benefits
- Compensation: Up to $40/hr equivalent, depending on level and pace. Tasks are estimated at ~20 hours each; you set your own schedule.