Principal Coding Annotator / LLM Evaluation Engineer

Braintrust

Full-time Engineering

Location

London, England, United Kingdom

Posted

May 28, 2026

Job Description

About the Role

We are building and evaluating state-of-the-art large language models (LLMs) and are looking for experienced software engineers to join our evaluation and annotation team. This role sits at the intersection of real-world software engineering, model evaluation, and applied AI, and is critical to improving model reliability, reasoning, and code quality.

This is a contract role, initially for 6 months, with potential for long-term engagement.

Location: Paris or London-based preferred; alternatively Europe remote for strong candidates.

What You'll Do

Create high-quality coding prompts and reference answers (benchmark-style, e.g. SWE-Bench-like problems).
Evaluate LLM outputs for code generation, refactoring, debugging, and implementation tasks.
Identify and document model failures, edge cases, and reasoning gaps.
Perform head-to-head evaluations between private LLMs (Mistral-based) a...
                    


Job Details

Job Type

Full-time
Category

Engineering
Date Posted

May 28, 2026
Application Deadline

July 07, 2026