DevOps Engineer - AI Model Evaluator
Obsidian
Full-time
Software Development, Software Architecture & Engineering
Location
Helsinki, Uusimaa, Finland
Posted
July 01, 2026
Job Description
About the Role
- Mercor is partnering with a leading AI research lab to support a Frontier Code Agents project.
- Contributors help evaluate and improve frontier AI coding models through structured technical assessments.
- The work focuses on realistic infrastructure engineering workflows and model evaluation.
- Spots are limited and are filled on a first-come, first-served basis.
What You'll Do
- Use frontier AI coding agents to complete and evaluate complex infrastructure engineering tasks.
- Review model-generated implementations involving cloud platforms, Kubernetes, CI/CD systems, observability, and infrastructure automation.
- Identify bugs, edge cases, reliability issues, and failure modes.
- Compare outputs from multiple frontier models and assess their strengths and weaknesses.
- Apply professional engineering judgment to realistic infrastructure engineering scenarios.