About the role
At Noxtua, we're building Europe's sovereign Legal AI — and that runs on infrastructure we own and operate ourselves, including our own GPU servers. We're looking for a Staff/Lead DevOps Engineer to take ownership of that infrastructure, lead a small team, and keep our platform stable, secure, and cost-efficient as we scale.
This is a hands-on leadership role: you'll set technical direction and grow the team, while staying close enough to the systems to dig into a server, a Kubernetes config, or an Ansible playbook yourself. Unlike most setups that run entirely on public cloud, a core part of this role is operating our own GPU hardware alongside our cloud environment.
You'll join Noxtua's Engineering organisation as part of a small DevOps team of around 4–5 engineers, reporting to Simon, our CTO. The team owns the platform our Legal AI runs on: a managed Kubernetes environment across multiple stages (dev, test, prod) on sovereign EU cloud infrastructure (Open Telecom Cloud, OTC), alongside our own GPU servers. Day to day, you'll work closely with the backend, frontend, and AI teams whose services you deploy and operate, as well as with our cloud and hardware providers. As Lead, you'll set the team's technical direction while staying hands-on with the systems themselves.
What you'll do
- Own and optimize Noxtua's infrastructure across OTC and our self-hosted GPU servers — ensuring efficient architecture, reliable operation, and cost control.
- Lead and grow a team of 4–5 DevOps engineers, setting technical direction, supporting their development, and bringing a strong ownership mindset.
- Operate our self-managed GPU server fleet — provisioning, driver installation, hardening, and connectivity via Ansible — and manage provider SLAs to keep heavy AI workloads running reliably.
- Build and maintain infrastructure automation using Infrastructure as Code (Terraform & Ansible).
- Run our container platform on Kubernetes, support teams with Docker, and keep our services (APIs) stable, accessible, and secure.
- Set up and maintain monitoring and alerting (e.g., Prometheus, Grafana) to ensure system reliability and performance.
- Develop and maintain CI/CD pipelines and collaborate with the development and AI teams to automate deployments and support AI-driven workloads.
Who we're looking for
- Leadership: Experience leading or mentoring a team, setting technical direction, and balancing hands-on operations with people responsibility.
- Managing server fleets: You've managed a fleet of real servers, not just rented cloud instances, and understand the methodology behind it. That includes OS-level operations on real hardware (e.g., installing drivers, hardening, verifying connectivity) and working with provider SLAs (availability, failures, support escalation). Experience with GPU servers is a strong plus, but not required.
- Linux & scripting: Strong proficiency in Linux and Bash, plus a scripting language such as Python.
- Cloud architecture & networking: Proven track record designing, operating, and cost-managing cloud-based architectures — ideally OTC (Open Telecom Cloud), or transferable experience from AWS, Azure, or Google Cloud — with solid networking fundamentals (DNS, OSI model).
- Infrastructure as Code: Strong focus on automating provisioning and configuration with Terraform and Ansible.
- Containerization & orchestration: Expertise in containerizing applications with Docker and running them at scale on Kubernetes.
- Monitoring & observability: Able to set up and maintain monitoring/alerting tools (e.g., Prometheus, Grafana), aggregate data, visualize insights, and derive actions.
Benefits
- Remote: 100% remote work possible if you reside in Germany; other countries upon request
- Working hours: Flexible working hours
- Vacation: 26 days, plus December 24th and 31st off, plus 1 additional vacation day per year of employment (up to 30 days)
- Discounts: e.g., Urban Sports Club Membership, depending on location
- Equipment: Laptop (Lenovo or Mac), plus €1,000 net home office setup budget (paid with your first salary)