Senior Distributed Systems Engineer

Reposted 15 Days Ago
Palo Alto, CA
180K-250K Annually
Senior level
Digital Media
The Role
The role involves collaborating with researchers to enhance distributed systems for training foundation models, focusing on optimization, hardware efficiency, and resilience against failures.
Summary Generated by Built In

We are looking for people with strong ML and distributed systems backgrounds. This role sits within our Research team and collaborates closely with researchers to build the platforms for training our next generation of foundation models.

Responsibilities

  • Work with researchers to scale up the systems required for our next generation of models, trained on multi-thousand-GPU clusters.
  • Profile and optimize our model training codebase to achieve best-in-class hardware efficiency.
  • Build systems to distribute work across massive GPU clusters efficiently.
  • Design and implement methods to robustly train models in the presence of hardware failures.
  • Build tooling to help us better understand problems in our largest training jobs.

Experience

  • 5+ years of work experience.
  • Experience working with multi-modal ML pipelines, high-performance computing, and/or low-level systems.
  • Passion for diving deep into systems implementations and understanding their fundamentals in order to improve their performance and maintainability.
  • Experience building stable and highly efficient distributed systems.
  • Strong generalist Python and software engineering skills, including significant experience with PyTorch.
  • Nice to have: experience working with high-performance C++ or CUDA.

Compensation

  • The pay range for this position in California is $180,000 - $250,000 per year; however, base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.

Your application is reviewed by real people.

Top Skills

C++
CUDA
Python
PyTorch

The Company
Minneapolis, MN
0 Employees
On-site Workplace

What We Do

Luma is a multimedia platform that delivers personalized movie and TV program selections from a range of sources to its viewers.


