Senior Machine Learning Systems Engineer (Training Optimization)

Work from home | Full-time role | Hiring

Company Description:

About the Group/Team

We're the CORE team within the Generative AI supergroup. Our mission is to invent foundational technologies that will power the future of AI-assisted design. From large-scale models to groundbreaking research, our team builds the technical core of Canva’s creative intelligence engine. We collaborate globally to ship research that makes a real impact at massive scale, from smart editing to AI video tools.

Job Description:

About the Role/Specialty

As a Senior Machine Learning Systems Engineer, you’ll lead efforts to scale and optimize the training system for our large-scale multimodal and foundation models. You’ll design distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton, pushing the limits of performance across compute, memory, and communication layers. You'll sit at the intersection of systems and AI research, directly shaping how we train the models that will power Canva’s next generation of products.

What you’ll do (responsibilities)

  • You’ll design, implement, and optimize large-scale machine learning systems for training.
  • You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
  • You’ll partner with research and modeling teams to align systems with algorithmic needs.
  • You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.
  • You’ll dive deep into low-level optimization, including custom CUDA or Triton kernels.
  • You’ll debug, profile, and fine-tune training workflows to unlock new levels of scalability.

Qualifications:

What we're looking for

We’re looking for a systems-first engineer who thrives in fast-paced, high-impact environments. You’re deeply familiar with distributed model training at scale and understand the nuances of optimizing compute at every level of the stack. You're excited by challenges that stretch current boundaries, and you’re a strong collaborator who communicates clearly across domains.

  • Strong background in LLMs, multimodal AI, or diffusion models.
  • Proficiency in Python. Familiarity with a systems programming language (e.g., C++ or Rust) is a plus.
  • Deep knowledge of PyTorch or JAX as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
  • Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
  • Hands-on experience writing custom GPU kernels in CUDA or Triton.
  • Excellent communication and problem-solving skills, including full proficiency in English.