Deepgram
Building foundational AI for speech transcription and understanding.
Research Scientist, Voice
$150K - $250K
US / Remote (US)
Job type: Full-time
Role: Science, Research
Experience: 3+ years
Visa: US citizen/visa only
Skills: Python
About the role
Company Overview
Deepgram is a foundational AI company building state-of-the-art, production-ready AI models that streamline human-computer interaction and amplify productivity. By enabling seamless communication between humans and machines, we believe we can harness the untapped potential of AI and help pave the way for a more productive future. We passionately believe in the potential of audio data to transform lives, businesses, and interactions across the globe, which is why Deepgram is trusted by well-respected companies like NASA, Twilio, Auth0, and Spotify to push the boundaries of what is possible in voice technology.
The Opportunity
At Deepgram, we spend every day tackling big, real-world challenges in voice. Our customers hire us to solve their hardest problems, taking real, complex audio and transforming it into novel insights. And to raise the bar, everything we build needs scale in its DNA. We aren't content with simple horizontal scaling: we intend to replace entire data centers dedicated to speech analytics with a single rack of servers. These challenges demand creativity and innovative problem-solving every day. As a Research Scientist at Deepgram, you'll have the freedom to explore and uncover breakthroughs. You'll also have a mandate to build, applying the latest advancements in deep learning to develop accurate and performant voice AI models. You will collaborate with product and engineering to help deploy these models in the most scalable speech API on the planet. We look forward to you bringing your whole self to work, sharing learnings from your latest experiments, and collaborating with us to advance the state of AI and voice technology.
The Role
Deepgram is currently looking for an experienced Research Scientist who has worked extensively on building models to solve hard problems in voice AI domains including automatic speech recognition (ASR), text-to-speech (TTS), diarization and speaker identification, language detection, or code switching.
Voice AI is a challenging problem space that involves dealing with raw audio waveforms generated by the human voice. The complexity of audio data poses unique infrastructure, engineering, and modeling challenges which are orders of magnitude more difficult than working with text. You should have extensive experience with the hard technical aspects of deep learning for audio, such as speech data curation and characterization, development of expressive and efficient neural network architectures for speech, distributed training at large scale, and optimization of speech models for inference at scale.
What You'll Do
- Stay up to date with the latest advances in deep learning with a particular eye towards their implications and applications within our products.
- Design and carry out experimental programs to build new voice AI models that solve critical problems for our customers.
- Drive large-scale training jobs successfully on distributed computing infrastructure.
- Optimize model architectures to make them as fast and memory-efficient as possible; deploy new models into production for use at massive scale.
- Document and present results and complex technical concepts clearly for internal and external audiences.
You'll Love This Role If You
- Are passionate about AI and excited about working on state-of-the-art speech research.
- Enjoy building from the ground up and love to create new systems from scratch.
- Are obsessed with building and shipping practical solutions to real-world problems.
- Are data-driven and prefer to solve problems using iterative experimentation.
- Have strong communication skills and are able to translate complex concepts into simple terms,...
Deepgram provides the most accurate and cost-effective real-time APIs for speech-to-text, text-to-speech, and voice agents. They unify speech-to-text, text-to-speech, and LLM orchestration into a single API to reduce complexity, latency, and cost. Deepgram offers Voice AI infrastructure for builders, platforms & partners embedding enterprise-grade Voice AI, and custom models for enterprises with unique workflows and compliance needs. They deliver intelligent voice experiences safely, securely, and at scale.