Location
Remote, CA, United States
Posted
June 11, 2026
Job Description
NVIDIA DGX Cloud is building the operating model for reliable, scalable GPU infrastructure across internal, partner, and on-prem environments. We are looking for an Engineering Manager to lead a team of software and production engineers focused on Kubernetes-based operations, automation, reliability, and cluster lifecycle tooling. This leader will help run today's production systems while building the automation and engineering practices needed for the next generation of DGX Cloud infrastructure.
What you'll be doing:
+ Lead a team of software and production engineers building and operating DGX Cloud infrastructure across NVIDIA Cloud Partner (NCP) and on-prem environments.
+ Drive execution across cluster operations, Kubernetes operability, automation, GitOps, observability, and incident response.
+ Help define team priorities, roadmap, staffing, and operational ownership.
+ Partner with platform, workload, storage, networking, security, and TPM teams to i...