Software Engineer, AI Infrastructure (Training + Inference)

Posted 2 Days Ago
San Francisco, CA
Mid level
Artificial Intelligence • Software
The Role
The Software Engineer will design and optimize infrastructure for AI training and inference, focusing on distributed systems and performance enhancements.

Job title: Software Engineer, AI Infrastructure (Training + Inference) / Member of Technical Staff

Who We Are
WaveForms AI is an audio large language model (LLM) company building the future of audio intelligence through advanced research and products. Our models will transform human-AI interactions, making them more natural, engaging, and immersive.

Role overview: The Software Engineer, AI Infrastructure (Training + Inference) will design, build, and optimize the infrastructure that powers our large-scale training and real-time inference pipelines. This role combines expertise in distributed computing, system reliability, and performance optimization. The candidate will collaborate with researchers to build scalable systems that support novel multimodal training, and will maintain uptime so that real-time applications deliver consistent results.

Key Responsibilities

  • Infrastructure Development: Design and implement infrastructure to support large-scale AI training and real-time inference with a focus on multimodal inputs.

  • Distributed Computing: Build and maintain distributed systems to ensure scalability, efficient resource allocation, and high throughput.

  • Training Stability: Monitor and enhance the stability of training workflows by addressing bottlenecks, failures, and inefficiencies in large-scale AI pipelines.

  • Real-time Inference Optimization: Develop and optimize real-time inference systems to deliver low-latency, high-throughput results across diverse applications.

  • Uptime & Reliability: Implement tools and processes to maintain high uptime and ensure infrastructure reliability during both training and inference phases.

  • Performance Tuning: Identify and resolve performance bottlenecks, improving overall system throughput and response times.

  • Collaboration: Work closely with research and engineering teams to integrate infrastructure with AI workflows, ensuring seamless deployment and operation.

Required Skills & Qualifications

  • Distributed Systems Expertise: Proven experience in designing and managing distributed systems for large-scale AI training and inference.

  • Infrastructure for AI: Strong background in building and optimizing infrastructure for real-time AI systems, with a focus on multimodal data (audio + text).

  • Performance Optimization: Expertise in optimizing resource utilization, improving system throughput, and reducing latency in both training and inference.

  • Training Stability: Experience in troubleshooting and stabilizing AI training pipelines for high reliability and efficiency.

  • Technical Proficiency: Strong programming skills (Python preferred), proficiency with PyTorch, and familiarity with cloud platforms (AWS, GCP, Azure).

Minimum Experience

  • 4-5 years of relevant professional experience is required

Top Skills

AWS
Azure
GCP
Python
PyTorch

The Company
HQ: San Francisco, CA
9 Employees
On-site Workplace
Year Founded: 2024

What We Do

WaveForms AI is an Audio LLM research and product company aiming to solve the Speech Turing Test and create Emotional General Intelligence. Learn more at waveforms.ai/about.

