Location
Manila, Metro Manila, Philippines
Posted
May 29, 2026
Job Description
We are looking for a highly skilled AI / LLM Engineer to lead the training, alignment, and optimization of large language models. This role focuses on Reinforcement Learning from Human Feedback (RLHF) and end‑to‑end post‑training pipelines, while ensuring models are efficient and production‑ready.
Key Responsibilities
- Lead and manage the end-to-end RLHF pipeline (data collection, reward modeling, and RL fine‑tuning with PPO, DPO, GRPO, and RLAIF)
- Design and implement Supervised Fine‑Tuning (SFT) pipelines using models like LLaMA, Mistral, and Qwen
- Build and train reward models based on human feedback
- Develop annotation pipelines (guidelines, calibration, dataset curation)
- Apply Constitutional AI & RLAIF to reduce manual labeling
- Perform model evaluation and red teaming for safety and quality
- Create benchmarks for performance, alignment, and r...