Applied Research Engineer

Location: San Francisco, United States / Remote (US)
Employment: Full-time
Department: Engineering - Backend
Experience: 6+ years
Salary: $180K - $250K
Equity: 1.00% - 1.50%

Technologies and Requirements

Amazon Web Services (AWS), C++, Go, Python, Rust, Torch/PyTorch, LLMs, US citizen/visa only

Overview

Zep is the memory and context layer for AI agents. As a Senior Applied Research Engineer, you will explore novel approaches to memory, context, and context generation, then own those ideas all the way to production. This is a research role with a strong applied focus. The company is looking for engineers who can run rigorous experiments, train and evaluate models, and ship production code that customers depend on.

How We Work

Small, distributed team working closely together.
Pair programming on hard problems.
Design reviews.
Learning is part of the job.
Questioning customers, teammates, and assumptions is encouraged.
Pain points get fixed when they are found.
You are expected to ask questions early, push back when you disagree, and care about the people who use the API.

What You'll Do

Explore novel approaches to memory, context, and context generation; define problems; run experiments; ship results.
Own research to production end-to-end, including dataset creation and curation, experiment design, evaluation, training and fine-tuning, and production deployment.
Train, fine-tune, and evaluate models on Zep's domain.
Build evaluation harnesses that catch regressions before shipping.
Work with the model serving stack to operate inference at low latency and reasonable cost on AWS.

What We're Looking For

6+ years of production engineering with strong backend systems background; experience shipping services with real throughput and latency requirements.
Master's degree in Computer Science or equivalent.
Strong research skills including methodology, dataset creation/curation, experiment design and evaluation; ability to frame open problems and design experiments that answer questions.
Hands-on experience with model fine-tuning, including training and evaluation workflows; familiarity with transformer architectures; PyTorch and OpenAI Triton for experimentation.
Experience with model serving technologies such as vLLM, SGLang, or Triton Inference Server, including operating inference in production.
Proficiency in Python plus one of Rust, C++, or Go for critical-path code and performance (Python-only not sufficient).
Hands-on AWS experience in production including deployments, monitoring, scaling, cost/reliability tradeoffs.

Nice to Have

Published or open-source work in retrieval, memory systems or LLM evaluation.

Tech Stack

Python, Rust/C++/Go, PyTorch, vLLM/SGLang, AWS.

This Role Is Probably NOT a Fit If

You are an ML researcher or model trainer who hasn't shipped research to production.
Your background is primarily Python application work without lower-level systems experience.
You haven't operated production backend systems with real latency or throughput requirements.

Interview Process

Screening Call with Daniel (Founder)
Team Calls (2–3 hours back-to-back; may include a presentation)
Decision Call with Daniel again