Staff Software Engineer, ML Platform

Posted 15 Days Ago
Sunnyvale, CA
Expert/Leader
Artificial Intelligence • Transportation
The Role
The Staff Software Engineer for the ML Platform at Wayve will take ownership of the machine learning training infrastructure, ensuring high availability, reliability, and scalability. Responsibilities include collaborating with machine learning engineers to optimize models, mentoring mid-level engineers, and managing projects involving scheduling and orchestration of training jobs across cloud services. This role focuses on building stable, scalable infrastructure for large model training.
Summary Generated by Built In

At Wayve we're committed to creating a diverse, fair and respectful culture that is inclusive of everyone based on their unique skills and perspectives, and regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, veteran status, pregnancy or related condition (including breastfeeding) or any other basis as protected by applicable law.

About us   

Founded in 2017, Wayve is the leading developer of Embodied AI technology.  Our advanced AI software and foundation models enable vehicles to perceive, understand, and navigate any complex environment, enhancing the usability and safety of automated driving systems.

Our vision is to create autonomy that propels the world forward.  Our intelligent, mapless, and hardware-agnostic AI products are designed for automakers, accelerating the transition from assisted to automated driving.  

At Wayve, big problems ignite us—we embrace uncertainty, leaning into complex challenges to unlock groundbreaking solutions. We aim high and stay humble in our pursuit of excellence, constantly learning and evolving as we pave the way for a smarter, safer future.

At Wayve, your contributions matter.  We value diversity, embrace new perspectives, and foster an inclusive work environment; we back each other to deliver impact.  

Make Wayve the experience that defines your career!  

The role 

We are looking for a Staff Software Engineer to drive the direction of the Wayve Machine Learning platform. The ML Platform team owns the machine learning training infrastructure and works with users to ensure that this infrastructure is reliable and efficiently utilised.

Key responsibilities:

  • You will take ownership of the training infrastructure, which is used for distributed training of large jobs. Your technical decisions will drive high-quality projects that ensure the availability, reliability and scalability of the system.
  • You will work across functions with machine learning research engineers to optimise models so that they train efficiently, maximising hardware utilisation and improving reliability and observability.
  • You will collaborate with technical and non-technical stakeholders to understand current user needs and identify future bottlenecks.
  • You will guide and mentor mid-level engineers and promote high software engineering standards.

Example projects:

  • Training job scheduling and orchestration, e.g. tooling to schedule jobs across multiple cloud providers depending on model needs and hardware availability.
  • Tooling which provides thousands of GPUs simultaneously to our driving simulator, which we use to test the driving performance of our models off the road.
  • Profiling training jobs with tools such as NVIDIA Nsight, identifying bottlenecks and optimising the models to increase efficiency.

About you  

In order to set you up for success in this role at Wayve, we’re looking for the following skills and experience.  

Essential

  • Minimum of 10 years' experience in platform engineering or a similar field, with a proven track record of designing and scaling resilient systems.
  • Proficiency in Python, with the ability to mentor engineers on best practices and scalable design.
  • Extensive experience with concurrent, parallel and distributed computing, including performance tuning and optimisation for large-scale applications.
  • Comprehensive knowledge of cloud platforms, preferably Azure, including architecture design, cost optimisation, security best practices and declarative configuration (Terraform).
  • Proven experience with containerisation and orchestration technologies, including advanced knowledge of Docker and Kubernetes.
  • Leadership and mentorship experience, guiding mid-level engineers, driving technical decision-making and collaborating with cross-functional teams to align engineering initiatives with business goals.
  • Passion for building stable and scalable infrastructure that empowers users to train large models seamlessly, efficiently and at scale.

Desirable

  • Experience with ML frameworks, preferably PyTorch, with a strong understanding of their internal workings and optimisation strategies.
  • Proven ability to profile, optimise and scale ML training jobs using advanced tools such as NVIDIA Nsight or TensorRT.


We understand that everyone has a unique set of skills and experiences and that not everyone will meet all of the requirements listed above. If you’re passionate about self-driving cars and think you have what it takes to make a positive impact on the world, we encourage you to apply.

For more information visit Careers at Wayve. 

To learn more about what drives us, visit Values at Wayve.

DISCLAIMER: We will not ask about marriage or pregnancy, care responsibilities or disabilities in any of our job adverts or interviews. However, we do look to capture information about care responsibilities and disabilities, among other diversity information, as part of an optional DEI Monitoring form. This helps us identify areas of improvement in our hiring process and ensure that the process is inclusive and non-discriminatory.



Top Skills

Python

The Company
HQ: London
200 Employees
On-site Workplace
Year Founded: 2017

What We Do

We're Wayve, a leading developer of embodied intelligence for autonomous vehicles. We use AI to pioneer a next-generation approach to self-driving: AV2.0, which enables fleet operators to unlock the benefits of AV technology at scale.

Founded in 2017, Wayve is made up of a diverse team of experts in machine learning and robotics. We were the first to deploy AVs on public roads with end-to-end deep learning. Today, our teams are based in London and California, and we're testing AVs in cities across the UK.

Inspired by our vision for a smarter, safer, more sustainable world, we're looking for people who are passionate about building breakthrough solutions to some of the world’s most important challenges. If you're looking for an exciting opportunity with a dynamic team, get in touch!
