Senior AI Infrastructure Engineer

Posted 13 Hours Ago
San Francisco, CA
Senior level
Artificial Intelligence • Information Technology
The Role
Senior Cloud Engineer responsible for building a highly available, global, multi-cloud PaaS platform for AI workloads. Must have experience with infrastructure-as-code, cloud microservices architectures, Kubernetes, and software development fundamentals.
Summary Generated by Built In


As a Senior AI Infrastructure Engineer, you will be responsible for building the next-generation, highly available, global, multi-cloud PaaS platform with open-source technologies to enable and accelerate Together AI’s rapid growth.

This system spans many diverse environments (Kubernetes, VMs, bare metal compute, and edge deployments) and provides a cohesive and reliable abstraction for running AI workloads in them. You will get to be a technology thought leader, evangelize new, cutting-edge technologies, and solve complex problems.

To be successful, you’ll need to be deeply technical and possess excellent communication, collaboration, and diplomacy skills. You have experience practicing infrastructure-as-code, including using tools like Terraform and Ansible. You have strong software development fundamentals and skills. In addition, you have strong systems knowledge and troubleshooting abilities.


Requirements

  • 5+ years of professional software development experience and proficiency in at least one backend programming language (Golang desired)
  • Demonstrated experience with high-performance or distributed cloud microservices architectures, ideally including building and operating them at global scale across multiple cloud providers such as AWS, Azure, or GCP
  • Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking and storage, performance, and scale
  • Pragmatic, methodical, well-organized, detail-oriented, and self-starting
  • Experience with Kubernetes and containerization, VPNs, AI workloads, and blockchain-based protocols a plus
  • GPU programming, NCCL, CUDA knowledge a plus
  • Experience with PyTorch or TensorFlow a plus
  • 5+ years of experience writing high-performance, well-tested, production-quality code


Responsibilities

  • Perform architecture and research work for decentralized AI workloads
  • Work on the core, open-source Together AI platform
  • Create services, tools, and developer documentation
  • Create testing frameworks for robustness and fault-tolerance


About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.


Compensation

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is: $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy  

Top Skills

Go
The Company
San Francisco, California
84 Employees
On-site Workplace
Year Founded: 2022

What We Do

Together AI is a research-driven artificial intelligence company. We contribute leading open-source research, models, and datasets to advance the frontier of AI. Our decentralized cloud services empower developers and researchers at organizations of all sizes to train, fine-tune, and deploy generative AI models. We believe open and transparent AI systems will drive innovation and create the best outcomes for society.
