Job Description
Handshake is seeking motivated Software Engineers to evaluate Large Language Models (LLMs) in collaboration with top AI labs. As a Python Software Engineer, you will contribute to high-impact research collaborations focused on enhancing AI systems by identifying gaps in modern LLMs and explaining how models fail. You will use Python to work with coding benchmarks that reflect real-world development across diverse languages and domains. Handshake AI projects are remote, part-time opportunities. You must be located in the US and have proper work authorization (OPT and H-1B are not supported).
Key responsibilities include:
- Developing and validating coding benchmarks by curating issues, solutions, and test suites from real-world repositories
- Ensuring comprehensive unit and integration tests for solution verification
- Maintaining the consistency and scalability of benchmark task distribution
- Providing structured feedback on solution quality
- Debugging and optimizing benchmark code
- Documenting processes for reproducibility
Shape the future of AI while building skills and confidence to navigate an AI-driven job market. This role will leverage your engineering expertise in a flexible environment, but you must be able to commit 15+ hours/week. Pay starts at $65/hour, plus bonuses for task completion.
Taro is a software engineering career platform that helps individuals build skills and confidence to navigate an AI-driven job market through flexible part-time roles evaluating large language models.