The Role

Output has built a biological reasoning model that understands biology at the scale and complexity at which life actually operates. Our model independently learned the principles of molecular interactions, opening up drug treatments that were previously impossible. We're already generating therapies that traditional approaches cannot reach. The hardest problems in both AI and biology are being solved here, and there is room for you to own one.

Output is currently in stealth, operated by a team of repeat founders and biotech veterans with multiple exits in AI x Bio, and backed by top-tier VCs including Y Combinator.

You will own the data that our models learn from. This role requires a deep understanding of molecular biology: what a biological data source contains, what it implies, and what is missing. The quality and coverage of training data determine what our models can learn, and the biological insight behind how that data is constructed is the difference between a model that memorizes and one that reasons.

You will construct training datasets that capture how proteins and molecules interact, drawing from diverse biological data sources and extending them with your understanding of molecular principles.
You will develop methods to expand training data beyond what exists in public databases, using biological and chemical reasoning to create new training signal where current data is sparse or absent.
You will design benchmarks grounded in real molecular phenomena, measuring whether our models have learned biologically meaningful capabilities rather than statistical shortcuts.
You will develop data strategies in collaboration with model researchers, determining what the model should learn from, what biological signal to prioritize, and how to sequence learning across modalities.
You will design approaches for integrating data across biological scales and modalities, building coherent training data from heterogeneous experimental and computational sources.
You will design rigorous splitting and evaluation strategies that prevent leakage and ensure model capabilities generalize to real biological problems.
You will stay current with biological data sources, experimental methods, and molecular databases, continuously identifying new sources of training signal.

About You

You have a PhD in computational biology, biophysics, structural biology, chemistry, biochemistry, or a related biological field with 2+ years of post-doctoral or industry research experience, or equivalent depth through a combined biology and computational background.
You have a deep understanding of molecular interactions, protein structure, and biological data at the molecular level, grounded in first principles rather than surface familiarity.
You have experience working with large-scale biological or molecular datasets, including sourcing, cleaning, integrating, and analyzing heterogeneous data.

Company Values:

Heart: Culture of ownership and passion.
Excellence: Commitment to the highest standards.
Practicality: Results-oriented impact on patients and community.
Honesty: Addressing issues openly and transparently.
Fun: Creating a fun, engaging, rewarding workplace.

What We Offer:

Encouragement of new ideas, creativity, and contrarian thinking.
Healthy feedback environment with leadership support and high expectations.
Ownership of day-to-day management focused on milestone achievement.
Competitive salary and equity in a growing startup.
Excellent medical, dental, and vision coverage.