Principal Site Reliability Engineer

Posted 2 Days Ago
Hiring Remotely in Charlotte, NC
Remote
Expert/Leader
Internet of Things
The Role
Implement and maintain monitoring systems, respond to system outages, automate repetitive tasks, improve network performance, define service level objectives, conduct postmortems, lead site reliability engineers, and communicate with multiple stakeholders.

Company Description

At Brightspeed, we are reimagining how people live, work, play and connect by providing fast, reliable internet connections and an awesome customer experience in twenty states throughout the Midwest and South.

Backed by funds managed by Apollo Global Management, our vision is to accelerate the upgrade of copper to fiber optic technologies, bringing faster and more reliable internet service to many rural markets traditionally underserved by broadband providers, while delivering a best-in-class customer experience.

Be a part of the team that will make this vision a reality: designing and building a world-class fiber network and creating a customer experience second to none.

Job Description

We are currently looking for a Principal Site Reliability Engineer to join our growing team. In this role, you will implement and maintain monitoring systems that track the performance and availability of business-critical systems and infrastructure, using metrics to identify trends and potential issues. You will also work closely with development teams, operations, and other stakeholders to ensure that new services and features are reliable and scalable.

As a Principal Site Reliability Engineer, your duties and responsibilities will include:

  • Implement and maintain monitoring systems to track the performance and availability of business-critical systems and infrastructure, using metrics to identify trends and potential issues
  • Respond to system outages and performance issues, performing root cause analysis to prevent recurrence
  • Develop scripts and tools to automate repetitive tasks, such as deployment, scaling, and monitoring
  • Work closely with development teams, operations, and other stakeholders to ensure that new services and features are reliable and scalable
  • Work on reducing latency and improving the speed of data transmission across the network
  • Define and measure Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to ensure services meet required performance and availability targets
  • Conduct postmortems after incidents to identify what went wrong and what can be improved
  • Work with lead application owners and internal Change Management to review code changes and support deployments
  • Lead the onshore/offshore team of site reliability engineers and mentor them in the support activities required for system reliability
  • Communicate with multiple target audiences, including senior business and IT leadership, technology teams, and business teams, tailoring the message to each

Qualifications

WHAT IT TAKES TO CATCH OUR EYE:

  • Master’s degree in computer science, telecommunications, or a similar field, with a minimum of 10 years of software engineering experience, including a minimum of 5 years as a site reliability engineer
  • Proven track record of managing mission-critical, customer-facing applications for reliability
  • 5+ years of experience supporting operations and maintenance for cloud-native applications in production that are fault-tolerant, self-healing, scalable, and highly available
  • Excellent troubleshooting and problem-solving skills, with a keen attention to detail to identify and resolve complex production issues
  • Deep understanding of cloud computing platforms (GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • Solid experience with core Kubernetes concepts such as Pods, Workloads, Services, Ingress/Egress, Deployments, ConfigMaps, HPA, liveness probes, and Secrets
  • Strong knowledge of infrastructure as code tools (e.g., Terraform, Ansible, ArgoCD) and CI/CD pipelines
  • Strong experience integrating code quality tools (SonarQube or Checkmarx) into CI/CD pipelines
  • Strong experience with monitoring, logging, and observability tools such as Splunk, GCP Cloud Logging, and Dynatrace
  • Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders
  • Must have proven written and verbal communication skills, including presentations using tools like PowerPoint
  • Ability to tailor messaging to multiple target audiences, including senior business and IT leadership, technology teams, and business teams

BONUS POINTS FOR:

  • Certifications such as Google Professional Cloud DevOps Engineer or AWS Certified DevOps Engineer 


Additional Information

WHY JOIN US?

We aspire to contemporary ways of working.

Recognized as a Top Workplace by the Charlotte Observer, Brightspeed HQ is located on the 7th floor of the new Vantage South End - East Tower in Charlotte, NC. We prioritize hiring talent in the Charlotte area, whenever possible, to make it a truly vibrant destination for our hybrid workforce. At Brightspeed, we have roles that are designated as remote, hybrid, office or field-based, depending on the position, business needs and individual circumstances. We also invest in technology that enables our entire team to stay connected. Why? Because Brightspeed recognizes the value of finding the best talent for the job, wherever they may be.

We offer competitive compensation and comprehensive benefits.

Our benefits and paid time off programs reflect our underlying belief in promoting overall wellness through physical, emotional, and financial health. Brightspeed offers a comprehensive benefit program, including competitive medical, dental, vision, and life insurance; an employee assistance program; a 401(k) plan with company match; and a host of voluntary benefits.

Diversity, equity and inclusion are at the center of our grounding belief in Being Real. 

When we bring our authentic selves to work, everyone is better as a result. A diverse team helps us be fierce advocates for more accessible, inclusive and high-quality internet, because we believe doing so promotes equity in the communities we serve.

Brightspeed is an Equal Opportunity Employer/Veterans/Disabled

For all applicants, please take a moment to review our Privacy Notices:

  • Brightspeed’s Privacy Notice for California Residents
  • Brightspeed’s Privacy Notice

Top Skills

Docker
Kubernetes
The Company
Charlotte, NC
65 Employees
On-site Workplace

What We Do

On August 3, 2021, Apollo Global Management and Lumen Technologies, Inc. entered an agreement for Apollo to acquire Lumen’s Incumbent Local Exchange Carrier (ILEC) assets and associated operations across 20 states for $7.5 billion. Brightspeed was formed to create a new high-speed internet company to serve the customers in those states.
