About the Position
Who We Are
While Xebia is a global tech company, our journey in CEE started with two Polish companies – PGS Software, known for world-class cloud and software solutions, and GetInData, a pioneer in Big Data. Today, we’re a team of 1,000+ experts delivering top-notch work across cloud, data, and software. And we’re just getting started.
What We Do
We work on projects that matter – and that make a difference. From fintech and e-commerce to aviation, logistics, media, and fashion, we help our clients build scalable platforms, data and AI solutions, and cutting-edge applications to shape the future of tech. Our clients include McLaren, Aviva, Deloitte, Spotify, Disney, ING, UPS, Tesco, Truecaller, AllSaints, Volotea, Schmitz Cargobull, Allegro, InPost, and many, many more.
We value smart tech, real ownership, and continuous growth. We use modern, open-source stacks, and we’re proud to be trusted partners of Databricks, dbt, Snowflake, Azure, GCP, and AWS. Fun fact: we were the first AWS Premier Partner in Poland!
Beyond Projects
What makes Xebia special? Our community. We support tech communities, organize meetups (Software Talks, Data Tech Talks), and have a culture that actively supports your growth via Guilds, Labs, and personal development budgets – for both tech and soft skills. It’s not just a job. It’s a place to grow.
What sets us apart?
Our mindset. Our vibe. Our people. That’s hard to capture in text, so come visit us and see for yourself.
What You’ll Do
- designing and implementing SRE practices, including SLI/SLO frameworks, error budgets, toil budgets, and reliability reviews,
- leading the SRE maturity progression from Level 1 (Reactive) through Level 5 (Autonomous),
- driving toil elimination by identifying, measuring, and automating repetitive operational work,
- designing and executing chaos engineering experiments to proactively identify reliability weaknesses,
- establishing production readiness review processes for new application onboarding,
- collaborating with engineering teams on joint RCA backlogs and incident reduction initiatives,
- defining and tracking SRE KPIs, including MTTD, MTTR, error budget consumption, toil ratio, and automation coverage,
- mentoring L2 engineers in SRE practices and engineering-led problem solving,
- contributing to capacity planning, performance engineering, and reliability architecture reviews,
- championing a blameless post-incident culture and continuous improvement.
Who We’re Looking For
- 5–8 years of experience in SRE, DevOps, or platform engineering,
- practical experience using AI-powered assistants (e.g. Claude Code, GitHub Copilot, Cursor) to improve productivity, quality, or decision-making in software delivery,
- deep understanding of SRE principles (Google SRE book concepts), including SLIs, SLOs, error budgets, and toil elimination,
- strong programming skills in Python, Go, or similar languages,
- extensive experience with cloud platforms such as AWS, Azure, or GCP, as well as Kubernetes,
- proficiency with observability tools, including Datadog, Splunk, Prometheus, and Grafana,
- experience with Infrastructure as Code (Terraform, Ansible) and CI/CD pipelines,
- proven track record of driving reliability improvements in production environments,
- experience with chaos engineering tools such as Gremlin, Chaos Monkey, or Litmus,
- strong analytical, problem-solving, and English communication skills (at least B2 level),
- residence in the European Union and a valid work permit (both required).
Benefits
- We support tech communities, organize meetups (Software Talks, Data Tech Talks), and have a culture that actively supports your growth via Guilds, Labs, and personal development budgets – for both tech and soft skills.
- It’s not just a job. It’s a place to grow.