Location
Mexico City, CDMX, Mexico
Posted
June 01, 2026
Job Description
We are looking for a Tech Ops - Production Support & Reliability Lead.
Front-line production support for Braviant's AWS multi-account stack. Monitor systems, triage alerts, execute runbooks, and escalate cleanly to developers. This is a defensive ownership role, not a developer role, despite "Lead" in the title.
Stack:
AWS - VPC, ECS, Lambda (SAM/CloudFormation), IAM, NAT, security groups
PostgreSQL on Amazon RDS (~15 instances)
Datadog and CloudWatch (APM, logs, alerting)
Java microservices / API-heavy app stacks
Jira (ITSM), Slack (ops channels)
Nice-to-have: AWS data services (Glue, S3, Athena, EventBridge), Metaplane
Must-have:
3 years production support / SRE / NOC / ops engineering
Hands-on AWS - EC2/ECS, VPC networking, IAM
Operational PostgreSQL / RDS - slow query reading, basic tuning, vacuum awareness
Incident triage across infra and app layers
Structured incident response (ITIL, NIST, or equivalent)
SLA management in a ticketed environment (Jira o...