Staff Cloud Software Engineer, Cloud Infrastructure

Posted 15 Days Ago
Hiring Remotely in United States
Remote
Senior level
Hardware • Manufacturing
The Role
Design and implement distributed systems for AI computing, collaborating across the application life cycle while ensuring effective deployment and operations in cloud environments.
Summary Generated by Built In

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors of all seniorities.

With this Staff Cloud Software Engineer position, we are looking to bring new specialized expertise to the team in distributed high-performance and AI computing, especially in Kubernetes-based cloud-native environments. You will drive the design, implementation, and integration of systems that scale compute capabilities seamlessly from single-host systems to exaflop-scale clusters.

This role is hybrid, based out of Santa Clara, CA or Austin, TX.

We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.


Responsibilities:

  • Design and drive implementation of distributed systems for AI computing applications in Cloud and novel supercomputing cluster environments
  • Carry out hands-on software development, testing, integration, operations, and support
  • Collaborate closely with the team across the full stack and life cycle of AI data center applications, from data center design and rollout to MLOps
  • Operate within on-premises data centers and public cloud environments
  • Drive projects through their entire software development lifecycle, on both the technical and non-technical sides
  • Collaborate with both highly technical and non-technical stakeholders from differing backgrounds, communicating highly complex topics to diverse audiences
  • Continuously improve engineering practices through code reviews and the adoption of relevant techniques and technologies


Experience & Qualifications:

  • 10+ years of hands-on software engineering experience working with distributed systems in Cloud and/or HPC environments
  • 5+ years of experience working with clustered (multi-host) AI hardware and applications for training and inference
  • 5+ years of experience with Kubernetes clusters, including cluster and application deployment (e.g., CNI, CSI, Helm), operations, and development of extensions (e.g., device plugins, Operators)
  • Strong working knowledge of Python and Go
  • Experience treating Infrastructure as Code as a first-class citizen (e.g., Ansible)
  • Strong Git, GitOps, and CI/CD experience
  • Familiarity with the performance implications of AI/ML workloads, for both inference and training
  • Familiarity with virtualization technologies and platforms
  • Hands-on experience with MLOps concepts and frameworks for end-to-end model training pipelines
  • Strong understanding of networking concepts – experience with network hardware configuration and management is a plus
  • Familiarity with the security implications of multi-tenant environments at the hardware, software, and networking levels
  • Familiarity with observability, monitoring, and alerting tools (e.g., Grafana, Prometheus, Loki)
  • Agile / lean software project management experience
  • Strong programming skills, with years of experience in a variety of programming languages; familiarity with both object-oriented and functional programming
  • REST API development and integration experience – full-stack web development experience is a plus


Compensation for all engineers at Tenstorrent ranges from $100k - $500k including base and variable compensation targets. Experience, skills, education, background and location all impact the actual offer made.

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.

Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that are subject to licensing conditions set by the U.S. government.

Our engineering positions and certain engineering support positions require access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations. Please note that citizenship/permanent residency, asylee, and refugee information and/or documentation will be required and considered as Tenstorrent moves through the employment process.

If a U.S. export license is required, employment will not begin until a license with acceptable conditions is granted by the U.S. government. If a U.S. export license with acceptable conditions is not granted by the U.S. government, then the offer of employment will be rescinded.

Top Skills

Ansible
CI/CD
Git
GitOps
Go
Grafana
Kubernetes
MLOps
Prometheus
Python

The Company
HQ: Toronto, ON
389 Employees
On-site Workplace
Year Founded: 2016

What We Do

Tenstorrent is a next-generation computing company that builds computers for AI.

Headquartered in Toronto, Canada, with U.S. offices in Austin, Texas, and Silicon Valley, and global offices in Belgrade and Bangalore, Tenstorrent brings together experts in the field of computer architecture, ASIC design, advanced systems, and neural network compilers.

Join us: www.tenstorrent.com/careers
