Staff AI Infrastructure Engineer: Inference Platform

Posted 23 Days Ago
Santa Clara, CA
Senior level
Automotive
The Role
Design, implement, and operate components of a model inference platform for autonomous driving, including quota management, job scheduling, and queuing systems for GPU resources; identify performance bottlenecks; and work with Machine Learning Engineers to evolve the platform.

XPeng Motors is one of China’s leading smart electric vehicle (“EV”) companies. We design, develop, manufacture, and market smart EVs that are seamlessly integrated with advanced Internet, AI, and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers. We strive to transform smart electric vehicles with technology and data, shaping the mobility experience of the future.

We’re looking for people who are as excited as we are to solve the complex technical challenges in autonomous driving, see the results of their work in mass-production EVs, and make a tremendous impact on our future.

 

Job Responsibilities:

  • Design, implement, and operate components of our novel model inference platform (e.g., quota management, job scheduling, and queuing systems). You will play a critical role in scheduling GPU resources.

  • Identify performance bottlenecks and optimization opportunities

  • Work closely with Machine Learning Engineers to evolve the inference platform to fit their use cases

  • Monitor system health, diagnose and troubleshoot issues, and perform routine maintenance tasks to ensure the reliability of the distributed inference infrastructure

  • Build and maintain documentation for infrastructure components and systems

Minimum Skill Requirements:

  • Advanced degree (MS or PhD) in Computer Science or related field

  • 5+ years of industry or research experience in ML Infra, model inference

  • Expertise in programming languages such as Python, Java, or C++ and experience with distributed computing frameworks

  • Experience with high-throughput, fault-tolerant system design

  • Proficient in Docker and Kubernetes

  • Experience with Jenkins, GitHub CI/CD, or similar tools

  • Experience with Prometheus, Grafana, or similar monitoring solutions

  • Excellent problem-solving skills and attention to detail

  • Strong communication skills and ability to work in a collaborative environment

 

Preferred Skill Requirements:

  • Strong background in building and maintaining large-scale distributed systems

  • Strong background in performance optimization and system scaling

  • Experience in scheduling jobs on heterogeneous computation resources

  • Deep understanding of cloud computing platforms

  • Deep knowledge of monitoring and observability practices

  • Experience with CUDA packages

  • Experience with PyTorch, TensorFlow, or similar frameworks

 

What we provide:

  • A dynamic, supportive, and engaging work environment where creativity thrives.

  • The opportunity to make a significant impact on the transportation revolution through advancements in autonomous driving.

  • Exposure to cutting-edge technologies alongside top industry talent.

  • Competitive compensation package.

  • Perks include snacks, lunches, and organized fun activities.

 

The base salary range for this full-time position is $180,000-$300,000, in addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.

 

We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status or marital status or any other prescribed category set forth in federal or state regulations.

 

Top Skills

Java
Python
Scala
The Company
Palo Alto, CA
993 Employees
On-site Workplace
Year Founded: 2014

What We Do

XPeng Motors is a leading Chinese electric vehicle and technology company that designs and manufactures intelligent automobiles that are seamlessly integrated with the Internet and utilize the latest advances in artificial intelligence. Focusing on China’s young and tech-savvy consumer base, XPeng Motors strives to offer smart mobility solutions with technology innovation and cutting-edge R&D. The company’s initial backers include its CEO & Chairman He Xiaopeng, the founder of UCWeb Inc. and a former Alibaba executive. It was co-founded in 2014 by Henry Xia and He Tao, former senior executives at Guangzhou Auto with expertise in innovative automotive technology and R&D. It has received funding from prominent Chinese and international investors including Alibaba Group, Foxconn Group, and IDG Capital. Currently with 3,000 employees, the company is headquartered in Guangzhou and has design, R&D, manufacturing, and sales & marketing divisions in Silicon Valley, San Diego, Beijing, Shanghai, Zhaoqing (Guangdong Province), and Zhengzhou (Henan Province).
