Location
Yucatán, Mérida, Mexico
Posted
June 27, 2026
Job Description
We're building something that doesn't exist yet in Latin America: a domain-specific large language model trained on a large Spanish-language corpus, deployed on proprietary on-premise hardware, solving a real problem for clients who are already waiting. We can't tell you exactly what it is yet.

What we can tell you:
- The corpus is real and large
- The clients are real and paying
- The hardware is ready
- The team is small and the decisions matter
- The person who joins now shapes the architecture

This is a fully on-site role in Mérida, Yucatán. We want someone in the room, not because we don't trust remote work, but because the knowledge needs to live in the team, not in one person's laptop.

──────────────────────────────
WHAT YOU'LL BUILD
──────────────────────────────
- Large-scale Spanish text dataset pipeline: cleaning, deduplication, tokenization
- Continual Pre-Training (CPT) on open-source base models (Llama/Qwen family) on dedicated GPU workstation, in our office
- Supervis...