Senior LLM Inference Engineer — Performance & GPU Optimization

Confidential

Full-time Other-General

Location

Singapore

Posted

June 05, 2026

Job Description

Own the performance of large language models in production: the latency, the throughput, the cost-per-token. This is deep inference-optimization work: profiling and tuning at the GPU and serving-engine level to make models run faster and cheaper at scale. You'll join a small, senior team at an established enterprise software company building LLM-powered capabilities into its products.
What you'll do:
- Optimize LLM inference for latency, throughput, and cost at the kernel and serving-engine level
- Profile and tune GPU performance (CUDA, TensorRT-LLM); apply quantization, speculative decoding, and batching strategies
- Get the most out of serving frameworks like vLLM, SGLang, and Triton, and extend them where they fall short
- Optimize across hardware targets where relevant (NVIDIA and other accelerators)
- Partner with model and platform teams to take new architectures from "works" to "fast"

What you'll bring:
- Deep experience optimizing deep-learning inference in production
- Hands-on GP...

Job Details

Job Type

Full-time
Category

Other-General
Date Posted

June 05, 2026
Application Deadline

July 15, 2026