Senior Site Reliability Engineer

Posted 19 Days Ago
Hiring Remotely in Bengaluru, Karnataka
Remote
Hybrid
Mid level
Information Technology • Software • Consulting
The Role
The Senior Site Reliability Engineer will manage the Kubernetes infrastructure within AWS, ensuring high service quality and availability. Responsibilities include designing, deploying, and maintaining systems, implementing SRE best practices, and collaborating with cross-functional teams to enhance system reliability and efficiency. This role involves automation, managing cloud resources, and responding to production escalations while participating in an on-call rotation.
Summary Generated by Built In

About the company

 

Everbridge (NASDAQ: EVBG) empowers enterprises and government organizations to anticipate, mitigate, respond to, and recover stronger from critical events. In today’s unpredictable world, resilient organizations minimize impact to people and operations, absorb stress, and return to productivity faster when deploying critical event management (CEM) technology. Everbridge digitizes organizational resilience by combining intelligent automation with the industry’s most comprehensive risk data to Keep People Safe and Organizations Running™. For more information, visit www.everbridge.com, read the company blog, and follow on Twitter. Everbridge… Empowering Resilience

What you'll do

Are you motivated by an incredible sense of purpose in doing work that helps keep people safe? Are you passionate about innovating on cutting-edge technology to develop robust architecture principles, operability guidelines, and progressive scaling methodologies, and to implement other sophisticated techniques for reliably operating infrastructure at scale? Do you have an appetite for streamlining efficiency, automating away toil, and proactively eliminating problems before they occur? If so, this position is a perfect opportunity for you to join the Everbridge Kubernetes Platform team.

As part of the Everbridge Kubernetes Platform team, you will play a critical role in ensuring the overall service quality and availability of Everbridge's solutions. This includes designing, deploying, and managing Kubernetes at scale, evangelizing both Kubernetes and SRE best practices, and helping to push the boundaries of the latest technology. The platforms that you will support are critical to the delivery of time-sensitive information to help keep people safe and businesses running.

  • Own and maintain the Kubernetes infrastructure from conception to completion within AWS, including services such as VPCs, EC2, Transit Gateways, IAM roles and policies, Route 53, S3, security groups (SGs), and network ACLs (NACLs).
  • Build upon the operational availability, security, scalability, efficiency, monitoring, instrumentation, and overall service reliability of Everbridge's Kubernetes solutions.
  • Collaborate across Agile teams with Architects, Developers, Quality, Data, Security, and other engineers on designing and implementing highly reliable solutions.
  • Research and implement SRE and Kubernetes best practices, using automation, cross-functional collaboration, and data-driven decisions to reinforce the integrity and reliability of our systems.
  • Participate in an on-call rotation to resolve production escalations.

What you'll bring:

  • 3+ years of technical AWS experience, managing and owning systems in a production environment
  • 2+ years of Kubernetes experience (EKS, AKS, GKE, or self-managed)
  • 3+ years of Terraform or similar IaC experience
  • Experience with the following tooling: GitLab CI/CD, Packer, Docker, EKS, Kubernetes, Spinnaker, Helm, Argo, Jenkins
  • Experience with telemetry tools such as Datadog, SumoLogic, Grafana, Prometheus
  • Experience writing automation in languages such as Python, Go, Bash, Java
  • Experience with configuration management tools such as Salt, Ansible, AWS user_data
  • Experience with a DevOps/SRE production environment
  • Experience with Agile practices
  • Large-scale production UNIX/Linux experience



Everbridge is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, religion, or sex including sexual orientation and gender identity, national origin, disability, protected veteran status, or any other characteristic protected by applicable federal, state, or local law.

Top Skills

AWS
Bash
Go
Java
Kubernetes
Python
Terraform
The Company
Belfast
1,437 Employees
On-site Workplace

What We Do

Keeping People Safe and Businesses Running. Faster.

Everbridge, Inc. (NASDAQ: EVBG) is a global software company that provides enterprise software applications that automate and accelerate organizations’ operational response to critical events in order to Keep People Safe and Businesses Running™. During public safety threats such as active shooter situations, terrorist attacks or severe weather conditions, as well as critical business events including IT outages, cyber-attacks or other incidents such as product recalls or supply-chain interruptions, over 5,300 global customers rely on the company’s Critical Event Management Platform to quickly and reliably aggregate and assess threat data, locate people at risk and responders able to assist, automate the execution of pre-defined communications processes through the secure delivery to over 100 different communication devices, and track progress on executing response plans.
