Site Reliability Engineer

RCS TECH

Full-time Software Development, IT Services and IT Consulting

Location

Mexico, Mexico, Mexico

Posted

June 04, 2026

Job Description

What You’ll Do  
 Reliability & Operations 
 - Own availability, latency, and scalability across SaaS and AI systems  
 - Define and enforce SLOs, SLIs, and error budgets 
 - Participate in a global on-call rotation (~1 week every 4 weeks) 
 - Lead incident response and drive blameless postmortems with systemic fixes 
 Platform & Infrastructure  
 - Architect and operate on-premises and multi-region, multi-cloud environments 
 - Manage large-scale Kubernetes workloads 
 - Build and evolve infrastructure using Terraform and Ansible 
 - Improve system resilience, fault isolation, and capacity planning 
 AI/ML & Automation  
 - Build and scale agentic AI systems for triage, anomaly detection, and self-healing 
 - Ensure reliability of model serving infrastructure 
 - Operate, optimize, and scale distributed systems 
 What You Bring ...
                    

Job Details

Application Deadline

July 14, 2026