Location
London, England, United Kingdom
Posted
June 29, 2026

Job Description

Requirements

  • Proven experience improving performance in production systems with tight constraints (latency, memory, bandwidth, power/thermal, or cost)
  • Strong proficiency with at least one relevant stack/toolchain (e.g. TensorRT, CUDA, Qualcomm QNN, Triton, OpenCL) and the ability to learn adjacent frameworks quickly
  • Comfort operating at multiple levels of abstraction — from high‑level model behaviour down to low‑level kernel/runtime execution
  • Strong software engineering fundamentals (debugging, profiling, testing, and maintainable code)
  • Clear communicator and collaborative teammate; able to align multiple stakeholders on performance trade‑offs and priorities
  • (Desirable) Exposure to embedded or edge deployment of ML models, including benchmarking on real devices and handling system‑level constraints
  • (Desirable) Experience with NVIDIA and/or Qualcomm SoCs and performance tooling
  • (Desirable) Python ...