Senior ML Ops Engineer
kayak · Berlin Office
About the role
KAYAK, part of Booking Holdings (NASDAQ: BKNG), is a leading travel search engine. With billions of queries across our platforms, we help people find their perfect flight, stay, rental car and vacation package. We're also transforming business travel with a new corporate travel solution, KAYAK for Business.
As an employee of KAYAK, you will be part of a travel company that operates a portfolio of global metasearch brands including momondo, Cheapflights and HotelsCombined, among others. From start-up to industry leader, innovation is in our DNA and every employee has an opportunity to make their mark. Our focus is on building the best travel search engine to make it easier for everyone to experience the world.
Every machine learning model KAYAK ships depends on reliable, scalable infrastructure to move from experiment to production — and that's exactly what this role makes possible. KAYAK is seeking a Senior MLOps Engineer who will focus on the design and implementation of our machine learning infrastructure and production lifecycle. This is a senior, hands-on role where you will bridge the gap between data science and production engineering.
You will join the Machine Learning Platform team and be responsible for building and maintaining scalable infrastructure and automated pipelines for model training, deployment, and monitoring, ensuring our ML models are reliable, reproducible, and performant. You will work closely with Data Scientists, ML Engineering and Operations teams to transform experimental code into robust, production-ready services at scale.
This role requires commuting to the Berlin office 3 times a week.
In this role, you will:
Build and maintain ML infrastructure end-to-end: Extend and operate the infrastructure that powers every model we ship — including CI/CD pipelines, model orchestration, and automated training pipelines designed to scale reliably without manual intervention.
Own model deployment and serving: Help define and evolve the standards and tooling for model serving, ensuring low latency and high availability across our ML services.
Develop core MLOps capabilities: Establish and maintain essential infrastructure that functions as reliable, self-service systems for the entire machine learning organization — with a focus on feature stores, model registries, and automated monitoring for performance and data drift.
Operationalize infrastructure for the ML team: Collaborate with Operations to enable Kubernetes (k8s) autoscaling and GPU provisioning, turning these into accessible, self-service tools for ML practitioners — including standing up and operating a Kubernetes-based development cluster and taking models from experimentation to GPU-backed production.
Improve platform reliability and performance: Partner with Operations to design resilient monitoring using advanced observability tooling. Define service-level objectives and implement automation to reduce manual interventions and improve system reliability.
Empower Data Scientists through standardized, optimized workflows: Amplify the impact of the ML team by building clear, well-supported "golden paths" — standardized workflows that streamline the model development lifecycle and let Data Scientists focus on modeling while you handle the infrastructure.
Please apply if you have:
Experience building and operating ML platforms in production environments.
Solid working knowledge of containerization and orchestration (Docker, Kubernetes), Linux internals, and model serving at scale.
Familiarity with ML lifecycle tooling, including orchestration frameworks, feature stores, model registries, and drift or performance monitoring.
Experience owning production systems: defining service-level objectives (SLOs), building observability (for example, using tools such as Prometheus, Grafana, or Datadog), participating in incident response, and diagnosing large-scale failures systematically. You look for opportunities to automate repetitive work rather than absorb it.
Comfort writing production-quality code in Python or a comparable language.
Experience modernizing production infrastructure with attention to reliability, risk, and cost — including thoughtful sequencing of work to maintain availability and continuity for live systems.
The ability to take ownership of technical outcomes, advocate for decisions using data, and communicate clearly in writing and in person — to both technical and non-technical audiences.
Benefits and Perks
Work from (almost) anywhere for up to 20 days per year
Focus on mental health and well-being:
Company-paid therapy sessions through SpringHealth
Company-paid subscription to HeadSpace
A company-wide week off each year, so the whole team fully recharges (and returns without a pile-up of work!)
No meeting Fridays
Paid parental leave
Paid volunteer time
Focus on your career growth:
Development Dollars
Leadership development
Access to thousands of on-demand e-learnings
Travel Discounts
Employee Resource Groups
6 weeks paid vacation + a day off for your birthday
Free lunch 2 days per week
Pension plan contributions
Public transportation subsidies
Bike leasing program
Monthly social events, Thursday happy hours, sports teams
An awesome office in Friedrichshain, Berlin
Inclusion
At KAYAK, we want everyone to have the space to grow, share ideas and do great work. That's why we're focused on hiring the best talent from all walks of life and experiences, supporting them well and making sure no one feels like they have to fit a mold to belong here.
Need any adjustments for the interview, application or on the job? No problem – just give us a heads-up. We've got you.
Skills
- Python
- SQL
- Docker
- Kubernetes
- Terraform
- Jenkins
- GitHub Actions
- React