Site Reliability Engineer

Posted 9 Days Ago
Sydney, New South Wales
Senior level
Cloud • Marketing Tech
The Role
The Site Reliability Engineer at Avetta will manage and monitor high-availability replicated cloud systems, ensuring a minimum 99.9% uptime. Responsibilities include overseeing NOC operations, defining golden signals, automating monitors and dashboards, and collaborating with development teams. Candidates need at least 5 years of experience in site reliability engineering and proficiency with AWS and observability tools.
Summary Generated by Built In

Join Avetta as a Site Reliability Engineer in Australia!

Site Reliability Engineers are pioneers of our production systems. We believe in proactive discovery and analysis of our entire stack, continually optimizing, tuning, and scaling the system to deliver the best possible end-user experience on a globally distributed, cloud-based SaaS platform. Downtime is not in the SRE’s vocabulary. Key skills for the SRE include maintaining highly resilient distributed systems, integrating uptime monitors through programmatic APIs, and developing intelligent scaling algorithms. The SRE must also communicate effectively with development and product teams to drive technical discovery and help prioritize features that meet or exceed uptime and end-user experience goals.

Essential Duties and Responsibilities:

  • Lead the management and monitoring of highly available replicated cloud systems.
  • Oversee 24/7 Network Operations Center (NOC) operations, maintaining a minimum 99.9% annual uptime.
  • Define golden signals for all services in our core SaaS application.
  • Manage NOC engineer teams, including scheduling and responsibilities.
  • Design PagerDuty escalation policies across various teams.
  • Apply expertise in AWS technologies and build dashboards with leading observability platforms.
  • Automate monitors and dashboards using modern programmatic methods.
  • Provide regular reports to Engineering leadership and executive teams for continuous improvement.
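The posting does not say which golden signals Avetta tracks. As an illustration only, the sketch below assumes the four golden signals described in Google's Site Reliability Engineering book (latency, traffic, errors, saturation) and shows how they might be summarized for one service over a time window. The `Request` record, the `golden_signals` and `percentile` helpers, counting only 5xx responses as errors, and using CPU utilization as the saturation proxy are all hypothetical choices, not Avetta's definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape, for illustration only)."""
    latency_ms: float
    status: int  # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, for 0 < pct <= 100."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests, window_s, cpu_utilization):
    """Summarize the four golden signals for one service over one time window.

    latency    -> p99 latency of successful (non-5xx) requests, in ms
    traffic    -> requests per second over the window
    errors     -> fraction of requests that returned a 5xx status (assumed definition)
    saturation -> CPU utilization (0.0-1.0), passed in as an assumed proxy
    """
    ok_latencies = [r.latency_ms for r in requests if r.status < 500]
    error_count = sum(1 for r in requests if r.status >= 500)
    total = len(requests)
    return {
        "latency_p99_ms": percentile(ok_latencies, 99) if ok_latencies else None,
        "traffic_rps": total / window_s,
        "error_rate": error_count / total if total else 0.0,
        "saturation": cpu_utilization,
    }


# Usage: ten successful requests (10-100 ms) plus one 503, over an 11-second window.
reqs = [Request(float(ms), 200) for ms in range(10, 101, 10)] + [Request(500.0, 503)]
summary = golden_signals(reqs, window_s=11, cpu_utilization=0.42)
print(summary)  # latency_p99_ms is 100.0 and traffic_rps is 1.0 for this input
```

Measuring latency only over successful requests keeps fast failures from masking a slowdown; a real deployment would more likely derive these values from an observability platform than from in-process records.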

Minimum Qualifications:

  • At minimum, a B.S. or B.A. in Computer Science.
  • Minimum of 5 years of experience as a Site Reliability Engineer, including some experience in managing teams and leading projects.
  • Stellar communication and interpersonal skills for effective collaboration with Development & Product teams.
  • Proficiency in monitoring the networking stack using distributed tracing and profiling tools.
  • Proficient in building dashboards with New Relic, Kibana, Grafana, Prometheus, and other observability platforms.
  • Proficient with AWS technologies.
  • Working knowledge of monitoring RESTful microservices and of HTTP fundamentals.
  • Able to automate monitors and dashboards using REST APIs, GraphQL, and other modern programmatic methods.
  • Working knowledge of profiling tools for measuring CPU, memory, I/O, and disk usage, and for analyzing process thread dumps.
  • Experience in managing, integrating, and automating alerting and escalation tools.
  • Must live in Australia with unlimited rights to work. Preference will be given to those living in Sydney or Newcastle areas.

Nice to Haves:

  • Troubleshooting experience with modern container and networking technologies (Kubernetes, HAProxy, ALB).
  • Familiarity with scripting and programming languages such as Bash, Python, and Go.
  • Ability to administer and tune load balancer technologies.
  • Experience in managing, monitoring, and benchmarking distributed file systems.
  • Proficiency in configuration management and infrastructure-as-code tools (SaltStack, Ansible, Terraform).

Metrics That Matter:

  • System Monitoring: Create and automate system monitors and escalation policies.
  • System Management: Respond to and resolve internal requests within business hours.
  • High Availability & Resilience: Maintain 99.95% uptime and be the first responder in emergency situations.
  • Full-Stack Observability: Build dashboards for end-to-end detection of system anomalies.
  • Innovation: Propose new ideas and improvements to the team regularly.
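To put the uptime targets in this posting (99.9% annual uptime and 99.95% uptime) in concrete terms, here is a minimal sketch that converts an uptime percentage into a downtime budget. It assumes a 365-day year and a 30-day month and ignores any scheduled-maintenance exclusions an actual SLA might define; the `downtime_budget_minutes` helper is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(uptime_pct, period_days=365):
    """Maximum downtime, in minutes, permitted by an uptime target over a period.

    Assumes every minute of the period counts (no maintenance-window exclusions).
    """
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_days * MINUTES_PER_DAY * (100 - uptime_pct) / 100


for target in (99.9, 99.95):
    yearly = downtime_budget_minutes(target)
    monthly = downtime_budget_minutes(target, period_days=30)
    print(f"{target}%: {yearly:.1f} min/year ({yearly / 60:.2f} h), "
          f"{monthly:.1f} min per 30 days")
# 99.9%: 525.6 min/year (8.76 h), 43.2 min per 30 days
# 99.95%: 262.8 min/year (4.38 h), 21.6 min per 30 days
```

Moving from 99.9% to 99.95% halves the allowed downtime, from roughly 8.8 hours to roughly 4.4 hours per year.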

Join us at Avetta and be at the forefront of driving technical excellence and ensuring a seamless experience for our users across the globe.



Top Skills

AWS
Go
Python
The Company
HQ: Orem, UT
833 Employees
On-site Workplace
Year Founded: 2003

What We Do

Avetta is building the connections that build the world.

Avetta provides a cloud-based supply chain risk management and commercial marketplace platform. Our global solution is uniquely designed to connect the world’s leading organizations with qualified suppliers, driving sustainable growth. We build trustworthy bonds through responsive technology and human insight. Our process is collaborative. Our global reach is complemented by our local expertise. Hundreds of global organizations depend on Avetta to align their supply chains with sustainable business practices worldwide. Discover more at avetta.com.

Similar Jobs

Citadel Securities

Site Reliability Engineer

Information Technology • Software • Financial Services
Sydney, New South Wales, AUS
1900 Employees
125K-350K Annually

Dynatrace

Site Reliability Engineer

Artificial Intelligence • Big Data • Cloud • Information Technology • Software • Big Data Analytics • Automation
Hybrid
Sydney, New South Wales, AUS
4700 Employees

Atlassian

Site Reliability Engineer

Cloud • Information Technology • Productivity • Security • Software • App development • Automation
Remote
Sydney, New South Wales, AUS
11000 Employees

Similar Companies Hiring

Jobba Trade Technologies, Inc.
Software • Professional Services • Productivity • Information Technology • Cloud
Chicago, IL
45 Employees
RunPod
Software • Infrastructure as a Service (IaaS) • Cloud • Artificial Intelligence
Charlotte, North Carolina
53 Employees
Hedra
Software • News + Entertainment • Marketing Tech • Generative AI • Enterprise Web • Digital Media • Consumer Web
San Francisco, CA
14 Employees
