Machine Learning Systems Engineer (MLSys)
HPC AI TECHNOLOGY PTE. LTD.
Full-time
Electrical & Energy Engineering
Location
Singapore, Singapore, Singapore
Posted
June 05, 2026
Job Description
Responsibilities
- System Development & Maintenance
Contribute to the development, optimization, and maintenance of core components of the machine learning platform, including feature stores, experiment tracking systems, model registries, workflow orchestration, and serving frameworks
- Training Efficiency Optimization
Assist in optimizing the performance of distributed training frameworks (e.g., PyTorch DDP, DeepSpeed, FSDP) on large-scale clusters, addressing challenges such as resource scheduling and communication bottlenecks
- Inference Performance Optimization
Participate in model deployment and serving, including performance profiling and acceleration through model compilation (e.g., TVM, TensorRT), operator optimization, computation graph optimization, and batching strategies
- Infrastructure Support
Leverage technologies such as containerization (Docker), orchestration (Kubernete...