Staff Engineer - Site Reliability

Posted 7 Hours Ago
Hiring Remotely in United States
Remote
Senior level
Software
The Role
The SRE - Staff Engineer will focus on maintaining and improving the reliability, availability, and performance of critical systems at Aviatrix. Responsibilities include system architecture, automation, monitoring, incident response, and ensuring compliance with business SLOs. The role involves extensive collaboration within a global team to enhance system efficiency and customer satisfaction.
Summary Generated by Built In

The Aviatrix SRE team is a small but highly skilled global group of Systems Engineers/SREs dedicated to ensuring the reliability, availability, and performance of Aviatrix’s critical systems and services. Our mission is to build and maintain a robust, resilient infrastructure that enables Aviatrix to deliver high-quality services with agility through automation, best practices, and a culture of operational excellence.

About the Role

As an SRE – Staff Engineer, you’ll play a key role in designing, implementing, and maintaining highly available, fault-tolerant, and scalable systems. You’ll focus on automation, proactive monitoring, and Infrastructure-as-Code (IaC) to drive efficiency and reliability across our services.

Tech Stack & Responsibilities

  • Kubernetes – Manage application lifecycles, automate operational tasks, troubleshoot issues, integrate monitoring and alerting, optimize infrastructure, and ensure reliable operations using custom-built operators and cdk8s.
  • Terraform – Implement Infrastructure-as-Code (IaC) to enable rapid provisioning, seamless configuration changes, and efficient scaling.
  • Automation & Development – Build and enhance automation tools and frameworks in Golang and Python to streamline operations.

On-Call Rotation

We maintain a structured on-call rotation to ensure 24/7 coverage:

During Business Hours (rotates every 2 days)

  • EST: 9 AM – 6 PM
  • CST: 8 AM – 5 PM
  • PST: 6 AM – 3 PM

Outside Business Hours (6 PM – 9 AM PT, rotates weekly: Monday to Monday)

Location & Eligibility

This is a remote role open to candidates located in the US or Canada. You must be eligible to work in either country and currently reside there.

If you're passionate about building resilient infrastructure, automating operations, and ensuring system reliability at scale, we'd love to hear from you! 🚀

RESPONSIBILITIES:  

  • Ensure Reliability and Availability: You will ensure uptime for crucial services and systems based on business-required SLOs, and minimize service disruptions through proactive monitoring, capacity planning, and fault-tolerant design.
  • Architecture and System Design: You will design and architect complex, scalable, and reliable systems.
  • Automation and Efficiency: You will develop and implement automation tools and frameworks that take over routine tasks, reduce human error, and streamline operational processes.
  • Build Observability and Monitoring Tools: You will define, build, deploy, maintain, and extend our observability and monitoring tools to enhance system reliability and availability.
  • Incident Management and Response: You will maintain an effective on-call rotation to ensure 24/7 coverage, and follow incident response procedures to swiftly address and mitigate service disruptions.
  • Performance Monitoring and SLIs/SLOs: You will help define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to set clear expectations for system performance.
  • Collaboration: You will work closely with product engineering to ensure service-level objectives and reliability targets are met.
  • Problem-Solving & Troubleshooting: You will respond to escalations by troubleshooting complex system and application incidents, performing root cause analysis, and implementing the necessary corrective actions.
  • Thought Leadership and Innovation: You will stay up to date with the latest industry trends and emerging technologies, and iterate on best practices to increase the quality and velocity of development and deliverables.

QUALIFICATIONS:   

  • 8+ years of experience maintaining and deploying highly available, fault-tolerant systems at scale. 
  • Proficiency in Golang or Python is required.
  • Infrastructure-as-Code (IaC): Deep understanding of Terraform core components, with real-world experience using Terraform for infrastructure provisioning and management (Terragrunt is a bonus).
  • Experience with at least one cloud service provider (e.g., AWS, GCP, Azure, OCI).
  • Good knowledge of Kubernetes (cdk8s and operators are a bonus).
  • Solid experience developing automation tools and frameworks.
  • Experience with logging solutions (e.g., Loki, Syslog, Elasticsearch, Logstash, Kibana, Filebeat, Fluent Bit).
  • Experience with monitoring and metrics solutions (e.g., Prometheus, Grafana, VictoriaMetrics).
  • Practical experience with Linux system administration.
  • Experience with version control systems (e.g., Git, GitHub) and code review.
  • Excellent communication skills are required.

US Pay Range

The US annual base salary range for this full-time position is $177,000-$190,000 + benefits + 401(k) match + equity. The pay range is determined by the role, work location, job-related skills, level, experience, and relevant education. [Certain roles are eligible to earn sales commission, depending on the terms of the applicable plan.] The range displayed is the minimum and maximum target base salary and is applicable only for new hires for the listed position located in the US. Your Talent Advisor can share more details regarding salary ranges, benefits, and equity for your location during the hiring process.


BENEFITS

US: We cover 100% of employee premiums and 88% of dependent(s) premiums for medical, dental and vision coverage, 401(k) match, short and long-term disability, life/AD&D insurance, $1,000/year education reimbursement, and a flexible vacation policy. 

Outside the US: We offer a comprehensive benefits package which, subject to regional variations, could include pension, private medical for you and dependents, generous holiday allowance, life assurance, long-term disability, and an annual wellbeing stipend.

Your total compensation package will be based on job-related knowledge, education, certifications and location, per our aligned ranges.

About Aviatrix
Aviatrix is the cloud networking expert. We’re on a mission to make cloud networking simple so companies stay agile. Trusted by more than 500 of the world’s leading enterprises, our cloud networking platform creates the visibility, security, and control needed to adapt with ease and move ahead at speed. Combined with the Aviatrix Certified Engineer (ACE) Program, the industry's leading multicloud networking and security certification, Aviatrix empowers the cloud networking community to stay at the forefront of digital transformation.

WE WANT TO INCLUDE YOU

We embrace the fact that not everyone’s journey took the same route or started at the same place. If your experience doesn’t quite meet the requirements but the opportunity excites you and you believe you could be great, don’t let that hold you back from applying. Tell us what you CAN bring and what makes you special.

Aviatrix is a community where everyone's career can grow and we want to help you achieve your goals and be “your best YOU,” however that looks. If you're seeking an opportunity where you can be excited to start work every morning with enthusiastic people, make a real difference and be part of something amazing then let’s talk. We want to get to know you and how we could grow together.

Aviatrix, Inc. is an equal opportunity employer and does not make hiring decisions based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.

CPRA - California Applicant Privacy Notice


Top Skills

Go
Kubernetes
Python
The Company
HQ: Santa Clara, CA
223 Employees
On-site Workplace
Year Founded: 2014

What We Do

The Aviatrix cloud network platform delivers the advanced networking, security, and operational visibility required by enterprises, with the simplicity and automation of the cloud. More than 400 customers worldwide leverage Aviatrix and its proven multi-cloud network reference architecture to design, deploy, and operate a repeatable network and security architecture that is consistent across any public cloud. Combined with the industry’s first and only multi-cloud networking certification (ACE), Aviatrix is empowering IT to lead and accelerate the transformation to the cloud. Learn more at Aviatrix.com.

