Applied AI Engineer at Infer

About Infer

Infer is building the operating system for insurance agencies. We make AI agents (including voice agents) that handle the work agencies have always done by hand: qualifying inbound leads, helping producers during live calls, auditing calls afterward, running renewals, and bringing churned customers back. Our long bet is that AI eventually sells insurance directly. Agencies are the wedge because that is where the work, the data, and the customer relationships actually live. Get good there, and the rest follows. We are a YC company and have raised from Stellaris Venture Partners and others.

Founders

Vaibhav Saxena: Architect and AI researcher (at Purdue), now a licensed insurance agent.
Urvin Soneta: Formerly at BCG; surfer with six-pack abs.
Suneel Matham: IITian and a philomath.

About the Role

Role Summary

We're hiring an Applied AI Engineer to own the system that tells us whether our voice agents are getting better, and to keep them improving autonomously. Voice quality is the product. If an agent stutters, hallucinates a quote, or misses a disclosure, we lose trust, deals, and sometimes our compliance footing. The system that catches those failures before customers do is the most important infrastructure we will build this year.

Thousands of conversations with real prospects already run through our agents every day. You will build a harness that scores every change end to end, a benchmark suite we can run against new models on release day, a red-team pipeline that probes failure modes, and self-improvement loops that feed production failures back into the evaluation sets.

This is an evals and infrastructure role with deep LLM work. You will touch audio, but your main focus is the harnesses and loops around it. Think of it as CI for voice conversations: it scores agent behavior at every layer (STT, LLM, tools, TTS, and full-call outcomes) so regressions are caught before they ship.

What You'll Do

Build and maintain the eval framework that scores voice agent quality across transcription, LLM reasoning, tool use, TTS, and full-conversation outcomes.
Design voice agent behavior: system prompts, tool use, conversation flow, error recovery, and guardrails for real-time interactions.
Drive STT (speech-to-text) and TTS (text-to-speech) accuracy improvements by comparing providers, tuning configurations, and running rigorous A/B experiments.
Improve TTS quality, focusing on voice selection, latency vs. fidelity tradeoffs, prosody, and edge cases.
Curate and grow evaluation datasets, including hard cases mined from production traffic.
Build benchmarks that can run against any new model within days of its release, and run red-team pipelines that probe for jailbreaks, hallucinated quotes, and compliance failures.
Partner with backend engineers to integrate eval signals into CI so regressions block merges.
Build self-improvement loops in which hard cases from production automatically feed the eval sets, and optimize prompts over time.

What Success Looks Like

Day 30

Understand how the agents work across prompts, tools, evals, telephony, and customer systems.
Ship v1 of the eval system with at least one trusted end-to-end metric.
Join customer call reviews and tag failure modes by hand.
Benchmark one new model (open or closed) against the production stack with defensible numbers.

Day 60

The eval system runs on every change and blocks merges that regress on known cases.
The first red-team suite covers at least three failure-mode classes (jailbreaks, hallucinations, compliance) and runs on a schedule.
Automated hard-case mining from production calls grows the eval set without manual triage.
Benchmark at least one open-source model (Qwen, DeepSeek, or similar) and recommend whether to switch.

Day 90

Swap in any new LLM and reach a numbers-backed shipping decision within a week.
DSPy- or GEPA-style prompt optimization runs over at least one production voice flow and shows measurable lift.
Self-improvement v1 is live for at least one failure pattern, with fixes feeding back into the platform to prevent repeat issues.
Spot failure patterns across accounts and turn them into product fixes that the team builds.

Must-Haves

ML engineering experience shipping production systems.
Strong Python skills and working knowledge of the ML stack (PyTorch, Hugging Face, pandas, scikit-learn).
Hands-on experience designing LLM-based agents: prompting, tool/function calling, multi-turn state, and structured outputs.
Experience building eval frameworks for ML, LLM, or voice systems, including LLM-as-judge pipelines, and an understanding of their failure modes.
Practical experience with ASR/STT: comparing providers, fine-tuning, and working with open models such as Whisper.
Practical experience with TTS systems (ElevenLabs or open models).
Comfortable working with audio data: sample rates, codecs, noise, and alignment.

Nice-to-Haves

Experience designing voice agents that handle barge-in, interruption recovery, disfluencies, and natural turn-taking at the prompt and behavior layer.
Experience with diarization, VAD, or endpointing models.
Experience with audio dataset curation, labeling, and annotation pipelines.
Experience training ASR or TTS models from scratch or fine-tuning them on domain audio.
Experience with active learning or data-flywheel patterns over production traffic.
Open-source contributions to AI/ML frameworks.
Familiarity with cost/latency tradeoffs across model providers for real-time voice.

Company Info

Founded: 2021
Batch: S21
Team Size: 9
Status: Active

Infer builds AI-powered operating systems for insurance agencies, focusing on automating lead qualification through voice agents and other AI tools to improve agency workflows. The company is backed by Y Combinator and Stellaris Venture Partners, among others.

Job Details

Title: Applied AI Engineer
Location: Bengaluru, Karnataka, India
Salary: ₹2M - ₹5M INR
Equity: 0.01% - 0.20%
Job Type: Full-time
Role: Engineering / Machine Learning
Experience Required: 3+ years
Visa: US citizenship/visa not required