Director, Engineering - Serverless Inference
Bengaluru
Full Time
2 hours ago
Mid Level
Engineering
Job Description

Dive in and do the best work of your career at DigitalOcean. Journey alongside a strong community of top talent who are relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big and bold, and are energized by the fast-paced environment of a true industry disruptor, you’ll find your place here. We value winning together while learning, having fun, and making a profound difference for the dreamers and builders in the world.

We are seeking a player-coach Engineering Director to lead the team that implements and contributes to the design and optimization of our Serverless Inference infrastructure and APIs. In this role, you will tackle the challenges of large-scale AI workloads, focusing on throughput, GPU utilization, and fault tolerance to support the next-generation inference needs of AI-native enterprises.

What You'll Do:

  • Engineering Excellence: Raise the engineering bar through strong software design, operational discipline, incident management, and continuous improvement practices.
  • Culture: Foster a culture of ownership, accountability, and continuous learning across the team.
  • Team Leadership & Development: Recruit, mentor, and coach engineers on the team, fostering a culture of ownership, technical excellence, and continuous improvement.
  • Execution & Delivery: Partner closely with platform, GPU infrastructure, and product engineering teams to deliver production-grade systems and highly available APIs.
  • Cross-Functional Partnership: Collaborate with Product Management, other engineering teams, and key stakeholders to align priorities, manage dependencies, and communicate progress and risks.
  • Operational Health: Lead by example in incident response, operational readiness, and production-quality engineering practices.
  • Strategic Architecture & Planning: Define the technical roadmap and oversee the architecture of the high-throughput API gateway for the inference engine.

What You'll Add to DigitalOcean

  • 4+ years of experience directly managing engineering teams.
  • Experience managing teams that build and operate multi-tenant platforms or distributed backend systems.
  • Deep understanding of SRE principles, including observability, incident management, reliability engineering, capacity planning, and operational automation.
  • Strong understanding of cloud-native, multi-region architectures, microservices, and distributed systems fundamentals.
  • Experience with Kubernetes and/or operating high-scale distributed services in production environments.

Bonus

  • Experience working with GPUs, GPU utilization tracking, or accelerated computing infrastructure.
  • Familiarity with Large Language Models (LLMs) and modern LLM serving architectures...
About DigitalOcean

DigitalOcean provides simple tools and predictable pricing for infrastructure management, enabling digital native enterprises to develop, manage, and scale applications using compute, storage, and networking solutions. They offer scalable cloud compute products including Droplets (virtual machines), Kubernetes managed service, serverless Functions, Gradient AI Agentic Cloud for AI apps, managed hosting with App Platform, backups & snapshots, networking solutions (firewalls, load balancers, VPC), managed databases (MongoDB, Kafka, PostgreSQL, MySQL), storage options (Spaces object storage and Volumes block storage), developer tools (API, CLI), and management tools (monitoring, projects, IAM).
