Sr Staff Engineer, Cloud Infrastructure

Posted 10 Days Ago
San Francisco, CA
Senior level
Software
The Role
The Sr. Staff Cloud Infrastructure Engineer will design, automate, and optimize a Kubernetes-based platform to ensure scalability and reliability. Responsibilities include leading platform management, architecting automation, integrating observability tools, and collaborating with development teams. The role emphasizes mentoring and improving developer productivity by simplifying infrastructure deployment.
Summary Generated by Built In

Amplitude is a leading digital analytics platform that helps companies unlock the power of their products. More than 3,500 customers, including Atlassian, Jersey Mike’s, NBCUniversal, Shopify, and Under Armour, rely on Amplitude to gain self-service visibility into the entire customer journey. Amplitude guides companies every step of the way as they capture data they can trust, uncover clear insights about customer behavior, and take faster action. When teams understand how people are using their products, they can deliver better product experiences that drive growth. 

As an organization, we approach challenges with humility, take ownership of our contributions, and embrace a growth mindset that pushes us to constantly improve ourselves, each other, and the value we bring to customers and partners.

Amplitude’s Commitment to Diversity, Equity & Inclusion (DEI): Amplitude believes that diversity enables the creation of better products, improves the ability to solve complex problems, and drives more powerful solutions. We strive to create an environment of inclusion, one focused on psychological safety, empathy, and human connection, that will allow employees of all backgrounds to thrive.

About the Role:

We are looking for a highly experienced and collaborative Sr. Staff Cloud Infrastructure Engineer to join our team. You will be responsible for the design, automation, and optimization of our Kubernetes-based platform, ensuring that it is scalable, easy to use, and reliable. This is a critical role for our company, and we are seeking someone who not only has deep technical expertise in cloud infrastructure and Kubernetes but also values mentoring, collaboration, and open communication. Your work will directly impact our developer productivity by building systems and abstractions that simplify the deployment of new workloads, making it easy for developers to focus on building features, not infrastructure.

In this role, you will help drive a cultural shift in how our Platform team operates, working to create positive relationships across the company, building trust, and making our platform easier to use. We value someone who listens, learns, and communicates effectively while still ensuring high technical standards and reliability.


Key Responsibilities:

  • Lead the design, implementation, and management of our Kubernetes-based platform, focusing on scalability, developer experience, and system reliability.
  • Architect and maintain automation around Kubernetes, ensuring that the platform is easy for developers to use and requires minimal toil to deploy or modify workloads in a self-service model.
  • Collaborate with cross-functional teams (developers, leaders, and other infrastructure teams) to gather requirements, build consensus, and deliver impactful solutions.
  • Integrate observability into the platform, using tools like Datadog, Prometheus, Grafana, New Relic, and Splunk to monitor system health and performance.
  • Drive infrastructure-as-code initiatives using tools like Kubernetes Operators, Helm, Kustomize, and Terraform, promoting automation, repeatability, and reliability.
  • Ensure that the platform integrates seamlessly with CI/CD pipelines (using Argo CD / Workflows / Rollouts, GitHub Actions, Jenkins, or similar) and continuously improve developer workflows.
  • Contribute to the operational excellence of the platform, including on-call responsibilities and incident management, while building self-healing capabilities where possible.
  • Act as a mentor to other engineers on the team, promoting growth and knowledge sharing, ensuring that the team thrives even in the absence of specific individuals.
  • Foster a culture of collaboration, empathy, and trust within the team and across departments, helping to bridge gaps between engineering and other business functions.
  • Take a hands-on approach to problem-solving, sometimes submitting PRs to resolve issues in codebases or providing detailed solutions when teams need assistance.


What We’re Looking For:

  • 8+ years of experience in some combination of cloud-native software development, platform engineering, site reliability engineering, and/or cloud infrastructure, with a more recent focus on Kubernetes and the cloud-native ecosystem.
  • Strong expertise in Kubernetes and related CNCF projects (e.g., Argo CD/Workflows, Backstage, Envoy, CoreDNS, and more) and in simplifying complex cloud infrastructure for broader teams.
  • Operational experience at scale with technologies like Kafka and Airflow.
  • Proficient in common infrastructure languages like Golang, Python, and Terraform, with experience developing and operating production systems.
  • Extensive experience with AWS cloud infrastructure, networking, and security.
  • Proven experience with monitoring and observability tools (Datadog, Splunk, Prometheus, Grafana Cloud, etc.) and a strong understanding of system performance tuning.
  • Expertise in building abstractions over Kubernetes to simplify developer interaction with the platform.
  • Excellent communication skills, with the ability to collaborate across teams, build consensus, and drive initiatives in a high-pressure environment.
  • High level of empathy and patience, with a commitment to mentoring and helping others succeed, and the ability to incorporate feedback and turn it into actionable improvements.
  • Experience with infrastructure-as-code and automation (Terraform, Helm, Kustomize, etc.), with a focus on reducing toil and operational overhead.
  • A mindset focused on improving the developer experience and business alignment, with the flexibility to make decisions that may go against ideal technical preferences when necessary.

By applying for this job, you acknowledge that Amplitude processes your personal data in accordance with the Amplitude Applicant Privacy Notice.

Staying Safe - Protect Yourself From Recruitment Fraud
We are aware of individuals and entities fraudulently representing themselves as Amplitude recruiters and/or hiring managers. Amplitude will never ask for financial information or payment, or for personal information such as bank account number or social security number during the job application or interview process. Any emails from the Amplitude recruiting team will come from an @amplitude.com email address. You can learn more about how to protect yourself from these types of fraud by referring to this article. Please exercise caution and cease communications if something feels suspicious about your interactions.

Top Skills

Kubernetes
The Company
New York, NY
505 Employees
On-site Workplace
Year Founded: 2012

What We Do

Amplitude is the Digital Optimization System. Powered by the proprietary Amplitude Behavioral Graph, the Digital Optimization System enables organizations to see and predict which combinations of features and actions translate to business outcomes – from loyalty to lifetime value – and intelligently adapt each experience in real time based on these insights. Amplitude is the brain behind more than 45,000 digital products at over 1,000 enterprise customers and 23 of the Fortune 100, helping them innovate faster and smarter by answering the strategic question: "How do our digital products drive our business?"
