Machine Learning Engineering Manager - LLM Serving & Infrastructure
New York, Boston, or elsewhere in the United States (Home Mix)
Full Time
Over $120K USD per year

Job Description

Machine Learning Engineering Manager

LLM Serving & Infrastructure

The Personalization team makes deciding what to play next on Spotify easier and more enjoyable for every listener. We seek to understand the world of music and podcasts better than anyone else so that we can make great recommendations to every individual and keep the world listening. Every day, hundreds of millions of people all over the world use the products we build, which include destinations like Home and Search and original playlists like Discover Weekly and Daylist, and which are at the forefront of new innovations like AI DJ and AI Playlists.

Generative AI is transforming Spotify’s product capabilities and technical architecture. Generative recommender systems, agent frameworks, and LLMs present huge opportunities for our products to serve more user needs and use cases and to unlock a richer understanding of our content and users.

This ML Manager will focus on serving a Unified Recommender model built on open-weight LLM and transformer technology. You will collaborate with a diverse team to establish and implement the machine learning plan for the product domain, developing innovative recommendations and agent interactions. You will act as a technology leader, managing a team and influencing peers. You will collaborate with internal customers and platform teams, with the opportunity to profoundly shape the direction of the entire Spotify experience. Join us and you’ll keep millions of users listening and engaging with our platform every day!

What You’ll Do

  • Lead a high-performing engineering team to develop, build, and deploy a high-scale, low-latency LLM Serving Infrastructure.
  • Drive the implementation of a unified serving layer that supports multiple LLM models and inference types (batch, offline evaluation flows, and real-time/streaming).
  • Lead all aspects of the development of the Model Registry for deploying, versioning, and running LLMs across production environments.
  • Ensure successful integration with the core Personalization and Recommendation systems to deliver LLM-powered features.
  • Define and champion standardized technical interfaces and protocols for efficient model deployment and scaling.
  • Establish and monitor the serving infrastructure's performance, cost, and reliability, including load balancing, autoscaling, and failure recovery.
  • Collaborate closely with data science, machine learning research, and feature teams (Autoplay, Home, Search, etc.) to drive the active adoption of the serving infrastructure.
  • Scale up the serving architecture to handle hundreds of millions of users and high-volume inference requests for internal domain-specific LLMs.
  • Drive Latency and Cost Optimization: partner with SRE and ML teams to implement techniques like quantization, pruning, and efficient batching to minimize serving latency and cloud compute costs.
  • Develop Observability and Monitoring: build dashboards and alerting for service health, tracing, A/B test traffic, and latency trends to ensure adherence to defined SLAs.

Experience & Background

  • Extensive experience managing engineering teams focused on machine learning infrastructure or related fields.
  • Strong background in distributed systems design especially for ML serving platforms.
  • Proven track record in building scalable low-latency services in production environments.
  • Deep understanding of large language models (LLMs) including transformer architectures.
  • Experience collaborating cross-functionally with research scientists, machine learning engineers, and product teams.

Technical Skills

  • Proficiency in Python, Java, and/or Scala.
  • Expertise with Kubernetes, Docker containers, and cloud platforms such as AWS or GCP.
  • Familiarity with ML model deployment tools such as TensorFlow Serving or TorchServe.
  • Knowledge of monitoring tools such as Prometheus and Grafana, and of distributed tracing systems.

Preferred Qualifications

  • Experience working in recommendation systems or personalization domains is a plus.
  • Knowledge of quantization techniques, batching strategies, and cost optimization methods for ML inference workloads.

Benefits

  • Competitive salary package aligned with senior engineering management roles in the tech industry.
  • Comprehensive health insurance plans including dental & vision coverage.
  • Generous paid time off policy plus flexible working hours & remote options.
  • Global parental leave: six months off for all new parents.
  • Employee assistance program, "All The Feels".
  • Flexible public holidays, allowing swaps according to personal values and beliefs.
About Spotify

Spotify is a global music streaming and podcast platform offering millions of songs and episodes accessible on multiple devices with various subscription plans.
