Machine Learning Systems Engineer (MLSys)

HPC AI TECHNOLOGY PTE. LTD.

Full-time, Electrical & Energy Engineering
Location
Singapore
Posted
June 05, 2026

Job Description

Responsibilities

  • System Development & Maintenance
    Contribute to the development, optimization, and maintenance of core components of the machine learning platform, including feature stores, experiment tracking systems, model registries, workflow orchestration, and serving frameworks
  • Training Efficiency Optimization
    Assist in optimizing the performance of distributed training frameworks (e.g., PyTorch DDP, DeepSpeed, FSDP) on large-scale clusters, addressing challenges such as resource scheduling and communication bottlenecks
  • Inference Performance Optimization
    Participate in model deployment and serving, including performance profiling and acceleration through model compilation (e.g., TVM, TensorRT), operator optimization, computation graph optimization, and batching strategies
  • Infrastructure Support
    Leverage technologies such as containerization (Docker), orchestration (Kubernete...