Job Description

Handshake is seeking motivated Software Engineers to evaluate Large Language Models (LLMs) in collaboration with top AI labs. As a Python Software Engineer, you will contribute to high-impact research collaborations focused on improving AI systems by identifying gaps in modern LLMs and explaining how models fail. You will use Python to work with coding benchmarks that reflect real-world development across diverse languages and domains.

Handshake AI projects are remote, part-time opportunities. Candidates must be located in the US and have proper work authorization (OPT and H-1B are not supported).

Key responsibilities include:

Developing and validating coding benchmarks by curating issues, solutions, and test suites from real-world repositories
Ensuring comprehensive unit and integration tests for solution verification
Maintaining the consistency and scalability of benchmark task distribution
Providing structured feedback on solution quality
Debugging and optimizing benchmark code
Documenting processes for reproducibility

Shape the future of AI while building the skills and confidence to navigate an AI-driven job market. This role draws on your engineering expertise in a flexible environment, but you must be able to commit at least 15 hours per week. Pay starts at $65/hour, plus bonuses for task completion.