Senior AI Infrastructure Engineer

Posted 16 Days Ago
3 Locations
114K-205K Annually
Senior level
Other • Utilities
The Role
The Senior AI Infrastructure Engineer will design, deploy, and maintain AI-optimized environments, focusing on performance and reliability while ensuring efficient workload management and supporting AI-driven applications.
Summary Generated by Built In

At T-Mobile, we invest in YOU! Our Total Rewards Package ensures that employees get the same big love we give our customers. All team members receive a competitive base salary and compensation package - this is Total Rewards. Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches. That’s how we’re UNSTOPPABLE for our employees!

Do you have a desire to help drive the future direction of T-Mobile’s On-Premises Infrastructure? Is there a passion within you for helping IT customers and developers achieve their technology goals? If so, this could be the position for you! T-Mobile’s Platform Delivery & Automation team is looking for the ideal candidate to join our team of passionate change agents, helping to ensure our developers are data-informed and AI-enabled through innovative technology infrastructure solutions. If developing private cloud solutions, hands-on engineering, and customer focus are your thing, then apply!
Our mission is to accelerate the delivery of on-premises and private cloud solutions that empower our customers to self-serve on scalable, secure, cost-effective infrastructure faster and on-demand. We support and deliver complex virtualized and bare-metal workloads. We challenge the status quo and everything we do is focused on value-add to our customer base.
This role will be responsible for designing, deploying, and maintaining high-performance computing environments optimized for AI and machine learning workloads. The role involves building scalable infrastructure, ensuring efficient workload management, providing self-service and on-demand tooling, and collaborating with teams to support AI-driven applications. This role will drive operational excellence and work with diverse hardware and software solutions to enhance the performance and reliability of our on-premises AI/ML infrastructure.

Job Responsibilities

  • Technical System Expertise: Understands system protocols, how systems operate and data flows. Aware of the benefits of current technology. Understands the building blocks, interactions, dependencies, and tools required to complete software and automation work. Independent study of the latest technology is expected.
  • Technical Engineering Services: Drives engineering projects by actively contributing to the application of engineering techniques, conducting tests and inspections, and preparing reports and calculations. Expected to supervise and mentor associate and base-level engineers as needed. Develops procedures and processes to validate, enhance, and optimize the network. Creates appropriate validation tests and inspection techniques and effectively documents results. Able to prepare executive summaries of activities and clearly communicate areas of opportunity. Works closely with developers, data scientists, and other software engineers to understand their requirements and usage patterns.
  • Innovation: Contributes to designs that implement new ideas to improve existing and new systems, processes, and services. Understands and can apply new industry perspectives to our existing business model. Reviews existing designs and processes to identify more efficient ways to complete existing workloads, drawing on industry perspectives.
  • Technical Writing: Writes basic documentation on how technology works. Creates clear documentation for new code and systems used. Documents system designs, presentations, and business requirements for consumption and consideration at the manager level.
  • Technical Leadership: Collaborates with technical teams and utilizes system expertise to deliver technical solutions. Continuously learns and teaches others existing & new technologies. Contributes to the development of others through mentoring or in-house workshops and learning sessions.
  • Technology Strategy: Contributes to new and existing technology options that support business goals.

Minimum Required

  • 5+ years technical engineering experience, preferably in multiple technology focus areas
  • Expert understanding of AI/ML infrastructure components or GPU-based systems, preferably in a high-availability, large-scale environment.
  • Hands-on experience with NVIDIA DGX servers, BasePOD architectures, and advanced GPU technologies.
  • Proficient in Linux/UNIX environments, including scripting/automation tools (Bash, Python, Ansible, Terraform)
  • Understanding of AI infrastructure security best practices
  • Experience with container orchestration (Kubernetes, Docker) and GPU workload management tools.
  • Strong knowledge of networking (InfiniBand/Ethernet) and storage solutions in AI/ML contexts.


Nice to Have

  • Understanding of CI/CD pipelines using tools such as Git, Artifactory, Jenkins, etc.
  • Experience with AI/ML pipelines (PyTorch, TensorFlow, RAPIDS AI, or other deep learning frameworks)
  • Experience with configuring and using monitoring tools (e.g., Prometheus, Grafana, NVIDIA DCGM)

Education

  • Bachelor’s degree in computer science, computer information systems, computer applications, engineering, or a related field.
  • In lieu of a degree, experience within a technology background may be considered.

• At least 18 years of age
• Legally authorized to work in the United States
Travel:
Travel Required (Yes/No): Yes
DOT Regulated:
DOT Regulated Position (Yes/No): No
Safety Sensitive Position (Yes/No): No

Base Pay Range: $113,600 - $205,000

Corporate Bonus Target: 15%

The pay range above is the general base pay range for a successful candidate in the role. The successful candidate’s actual pay will be based on various factors, such as work location, qualifications, and experience, so the actual starting pay will vary within this range.

At T-Mobile, employees in regular, non-temporary roles are eligible for an annual bonus or periodic sales incentive or bonus, based on their role. Most Corporate employees are eligible for a year-end bonus based on company and/or individual performance and which is set at a percentage of the employee’s eligible earnings in the prior year. Certain positions in Customer Care are eligible for monthly bonuses based on individual and/or team performance. To find the pay range for this role based on hiring location, visit https://paylookup.t-mobile.com/paylookup?reqID=REQ306486&paradox=1

At T-Mobile, our benefits exemplify the spirit of One Team, Together! A big part of how we care for one another is working to ensure our benefits evolve to meet the needs of our team members. Full and part-time employees have access to the same benefits when eligible. We cover all of the bases, offering medical, dental and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance. We don't stop there - eligible employees can also receive mobile service & home internet discounts, pet insurance, and access to commuter and transit programs! To learn about T-Mobile’s amazing benefits, check out www.t-mobilebenefits.com.

Never stop growing!
As part of the T-Mobile team, you know the Un-carrier doesn’t have a corporate ladder–it’s more like a jungle gym of possibilities! We love helping our employees grow in their careers, because it’s that shared drive to aim high that drives our business and our culture forward. By applying for this career opportunity, you’re living our values while investing in your career growth–and we applaud it. You’re unstoppable!
T-Mobile USA, Inc. is an Equal Opportunity Employer. All decisions concerning the employment relationship will be made without regard to age, race, ethnicity, color, religion, creed, sex, sexual orientation, gender identity or expression, national origin, religious affiliation, marital status, citizenship status, veteran status, the presence of any physical or mental disability, or any other status or characteristic protected by federal, state, or local law. Discrimination, retaliation or harassment based upon any of these factors is wholly inconsistent with how we do business and will not be tolerated.
Talent comes in all forms at the Un-carrier. If you are an individual with a disability and need reasonable accommodation at any point in the application or interview process, please let us know by emailing [email protected] or calling 1-844-873-9500. Please note, this contact channel is not a means to apply for or inquire about a position and we are unable to respond to non-accommodation related requests.

Top Skills

AI Infrastructure Security
Ansible
BasePOD Architectures
Bash
Docker
Grafana
Kubernetes
Linux
NVIDIA DGX Servers
Prometheus
Python
PyTorch
RAPIDS AI
TensorFlow
Terraform
Unix

The Company
HQ: Bellevue, WA
89,016 Employees
On-site Workplace

What We Do

T-Mobile U.S. Inc. (NASDAQ: TMUS) is America’s supercharged Un-carrier, delivering an advanced 4G LTE and transformative nationwide 5G network that will offer reliable connectivity for all. T-Mobile’s customers benefit from its unmatched combination of value and quality, unwavering obsession with offering them the best possible service experience and undisputable drive for disruption that creates competition and innovation in wireless and beyond. Based in Bellevue, Wash., T-Mobile provides services through its subsidiaries and operates its flagship brands, T-Mobile, Metro by T-Mobile and Sprint.
