Senior Machine Learning Engineer - Training Platform (AU …, Gold Coast
Gold Coast, Australia
Posted: less than a week ago
Description
Full-time
- Recruitment type: Permanent
Job Description
About the Group/Team
We're part of the Training Platform team within Canva's AI Platform group, which sits in the Generative AI supergroup. Our team is responsible for the systems that power model training at scale, building the foundations that enable teams across Canva to create, train, and scale AI-powered experiences. Our focus is on building reliable, efficient, and developer-friendly training infrastructure, from orchestration and distributed training systems to experimentation and platform capabilities that support large-scale AI workloads. We enable teams across Canva to push the boundaries of what's possible with AI.
About the Role/Specialty
As a Senior Machine Learning Engineer, you'll focus on designing, scaling, and maturing the systems and infrastructure that support training workloads across Canva. You'll work on a Kubernetes‐based training platform that enables distributed AI workloads across a wide range of teams, frameworks, and use cases, while also contributing to the surrounding platform capabilities that support the end‐to‐end training lifecycle, such as experiment management, artifact management, and other core systems needed to run AI workloads reliably and at scale.
You'll collaborate closely with research scientists, AI engineers, product teams, and cloud/infrastructure teams to ensure workloads can run efficiently, reproducibly, and reliably at scale. You'll also help shape the roadmap for the platform by understanding user pain points, improving platform capabilities, and contributing to the long‐term direction of Canva's training infrastructure.
This role is ideal for someone who enjoys working on the systems behind AI, not just the models themselves, and wants to have broad impact across multiple teams.
Responsibilities
- Contribute to the evolution of Canva's unified training platform for AI training workloads.
- Improve reliability, observability, debugging, and operational support for training systems.
- Design and build platform capabilities that enable better scheduling at scale, including resource allocation, priority management, and quota management for training workloads.
- Collaborate with research scientists, ML engineers, product teams, and cloud/infrastructure teams to improve training platform workflows and outcomes.
- Shape platform roadmap and priorities based on user pain points, adoption needs, and long‐term platform maturity.
- Mentor engineers and share best practices in AI systems and infrastructure.
Qualifications
- Strong experience in training pipelines, distributed systems, or large‐scale AI infrastructure.
- Proficiency with Kubernetes and containerized workloads; experience with training infrastructure or distributed frameworks such as Ray, PyTorch distributed training, or similar technologies is highly valuable.
- Familiarity with modern cloud and infrastructure services that underpin high‐performance AI workloads (e.g., high‐performance storage, HPC environments, fast interconnects and networking, FSx, EFA).
- Strong sense of ownership and ability to work on complex, cross‐cutting problems that impact multiple teams.
- Team-oriented mindset when working with engineers, applied scientists, and infrastructure partners; deep focus on scalability, reliability, usability, and developer experience.
Learning & Development
- Deep expertise in large‐scale AI training systems, Kubernetes‐based workload orchestration, and distributed infrastructure.
- Hands‐on experience with modern AI training workloads at scale.
- Exposure to cloud, storage, and networking capabilities required for high‐performance distributed training.
- Opportunities to influence platform‐wide architecture, roadmap, and AI Platform best practices.
- Growth through collaboration with world‐class ML engineers, applied scientists, and infrastructure specialists.
- Ability to shape how AI is built and scaled across a global product.
Benefits
- Equity packages.
- Inclusive parental leave policy.
- Annual Vibe & Thrive allowance for wellbeing, social connection, and office setup.
- Flexible leave options.
Apply on Kit Job: kitjobau.com/job/3qsnfk
Highlights
Company name: Canva
Job position: Senior Machine Learning Engineer - Training Platform (AU remote) (Gold Coast)
More info about this ad
Senior Machine Learning Engineer - Training Platform (AU … has been posted in the Benowa Engineering category on Locanto.