About the role
The Opportunity
We’re hiring a Sr. AI Systems Engineer to help support our emerging product, Night Shift, an AI research assistant that amplifies the impact of investigators by automating the tedious, repetitive steps involved in working a case. This role sits within the Machine Learning team and will work closely with partners in Engineering (Backend, Frontend, and Design) in a fast-paced environment. You will be one of the earliest technical contributors to our system architecture for agentic AI, and will own our AI evaluation framework. The outcome we’re after is clear and ambitious: measurably faster, more accurate leads for every officer and every shift.
The Skillset
Familiarity with Agentic Systems: Hands-on experience with LLM agents including:
- LLM API use (e.g. LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs)
- Agent design: tool use (e.g. via MCP), retrieval, memory, grounding/attribution for claims, and guardrails
- Architectural patterns: planning and hand-off for multi-agent systems, context management
- RAG: vector/hybrid search (e.g. pgvector, turbopuffer, rerankers)
ML Platform expertise: 5+ years building and shipping ML systems to production; experience in the following areas:
- Backend Python and JS familiarity required; TypeScript/Golang familiarity welcome
- Web services (e.g. Express/FastAPI, REST, SSE, JWTs)
- Cloud Infrastructure (e.g. AWS, Terraform, VPC, Networking)
- Backend databases/stores (e.g. Postgres, Redis)
- Observability (e.g. Prometheus, Grafana, OpenTelemetry, LangSmith/Langfuse)
- [Preferred] Durable execution (e.g. Temporal, Hatchet)
- [Preferred] OLAP (e.g. ClickHouse, BigQuery)
- [Preferred] ML Inference (e.g. PyTorch, TensorRT, NVIDIA Triton), ideally in multimodal domains (text/image/video)
- [Preferred] Compute orchestration (e.g. Kubernetes, Prefect, Ray)
Experience with LLM Evaluations at scale: You’ve built offline/online eval harnesses and are familiar with the methodologies and metrics to measure:
- Search, retrieval, and recommendation performance
- Safety & robustness (security, compliance, red-teaming, regression testing)
- Cost, performance, and latency trade-offs
- [Preferred] Agentic task success, trajectory quality,...