AI Infrastructure Engineer

Posted 19 Days Ago
Redwood City, CA
Mid level
Software • Generative AI
The Role
This role involves designing and managing AI infrastructure, optimizing systems for efficiency, and collaborating with teams for AI projects. Requires experience in ML infrastructure and software design.
Summary Generated by Built In


Job Duties:
Design core, backend software components.
Interface with other teams to incorporate their innovations and vice versa.
Conduct design and code reviews.
Analyze and improve efficiency, scalability, and stability of various system resources.
Design and implement the hardware and software infrastructure required for AI projects.
Procure, configure, and manage servers, GPUs, TPUs, and other hardware resources.
Set up cloud-based environments (e.g., AWS, Azure, GCP) for AI workloads.
Deploy and manage distributed computing clusters (e.g., Kubernetes) for AI model training and inference.
Optimize cluster performance and resource allocation for AI workloads.
Monitor cluster health and troubleshoot issues as they arise.
Architect and maintain data storage solutions (e.g., data lakes, databases) for AI datasets.
Ensure data security, access controls, and data versioning.
Implement data pipelines for efficient data ingestion and preprocessing.
Develop and maintain automation scripts and tools for infrastructure provisioning and scaling.
Implement continuous integration and continuous deployment (CI/CD) pipelines for AI models.
Orchestrate workflows for training, evaluation, and deployment of AI models.
Optimize infrastructure to handle large-scale AI workloads efficiently.
Monitor and analyze system performance, making adjustments as needed.
Implement load balancing and scaling strategies to meet demand.
Implement security best practices to protect AI infrastructure and data.
Stay up to date with security vulnerabilities and apply patches and updates.
Ensure compliance with relevant data privacy and regulatory requirements.
Collaborate with data scientists and AI engineers to understand their infrastructure needs.
Provide technical support and troubleshooting assistance for AI infrastructure issues.
Train and educate team members on best practices for using AI infrastructure.
Minimum Education & Experience Required:
Must have a Bachelor’s degree or the equivalent in Computer Science, Computer Engineering, or a related field, plus three (3) years of experience with ML infrastructure (PyTorch, Vertex AI, and SageMaker) or related experience.


Minimum Skills Required:
Must have experience with:
One or more of search engines, recommendations, natural language processing, personalization, or a similar applied ML domain.
Building, scaling, and optimizing distributed enterprise-grade machine learning systems.
Architectural patterns of large-scale software applications.
Publishing papers in machine learning and/or computer vision conferences and journals.
Large-scale machine learning techniques such as semi-supervised learning, weakly-supervised learning, and online adaptation of ML models.
Publishing in machine learning domains such as computer vision and natural language processing.

How to Apply:
Submit your resume and apply online at http://www.fireworks.ai/careers, searching for the job by title.

Top Skills

AWS
Azure
GCP
Kubernetes
PyTorch
SageMaker
Vertex AI

The Company
HQ: Redwood City, CA
63 Employees
On-site Workplace
Year Founded: 2022

What We Do

Fireworks.ai offers a generative AI platform as a service. We optimize for rapid product iteration when building on top of generative AI, and for minimizing the cost to serve.

https://fireworks.ai/careers
