Sr. Staff, Site Reliability Engineering - Observability

Posted 7 Days Ago
Hiring Remotely in United States
Remote
198K-270K Annually
Senior level
Information Technology • Security • Cybersecurity
Defeating every attack, every second of every day.
The Role
As a Senior Staff SRE, you will architect and implement observability and automated incident management in a microservices-based SaaS environment. Responsibilities include defining SLOs, guiding auto-remediation frameworks, and promoting best practices while mentoring engineers.
Summary Generated by Built In

About Us:

SentinelOne is defining the future of cybersecurity through our XDR platform that automatically prevents, detects, and responds to threats in real-time. Singularity XDR ingests data and leverages our patented AI models to deliver autonomous protection. With SentinelOne, organizations gain full transparency into everything happening across the network at machine speed – to defeat every attack, at every stage of the threat lifecycle. 

We are a values-driven team where names are known, results are rewarded, and friendships are formed. Trust, accountability, relentlessness, ingenuity, and OneSentinel define the pillars of our collaborative and unified global culture. We're looking for people who will drive team success and collaboration across SentinelOne. If you’re enthusiastic about innovative approaches to problem-solving, we would love to speak with you about joining our team!

Due to Federal Government contract requirements, U.S. citizenship is required for this position. FedRAMP staff may be subject to customer or third-party background checks, up to and including Secret clearance, if required by their role at SentinelOne.

What are we looking for?

We are seeking to hire a Senior Staff Engineer to join our Site Reliability Engineering (SRE) Team at SentinelOne. This role can be 100% remote for individuals based in the US, or hybrid if local to a corporate office location.

As a Senior Staff SRE, you will architect and lead the implementation of advanced observability, automated triage, and self-healing capabilities within our microservices-based SaaS environment. You will be instrumental in driving our organization’s evolution towards proactive, scalable incident management by enabling smart alert correlation, automated root cause analysis, and autonomous remediation systems. Additionally, you will define and implement Service Level Objectives (SLOs) that align with business goals, ensuring our systems meet reliability standards and exceed customer expectations.

What will you do? 

  • Design and guide the implementation of end-to-end alert correlation, auto-triage, and auto-remediation frameworks that meet the needs of a microservices-based SaaS architecture.
  • Ensure solutions align with business priorities and customer impact goals.
  • Define, implement, and monitor SLOs in collaboration with product and engineering teams. 
  • Establish reliability standards that meet business and customer expectations, driving accountability and transparency around service performance.
  • Partner with software engineers, SREs, and data scientists to implement and refine monitoring, alerting, alert correlation, auto-remediation, and SLO solutions.
  • Lead initiatives to promote best practices and knowledge sharing across all of SentinelOne engineering.
  • Mentor engineers and contribute to a culture of reliability engineering excellence through thought leadership and guidance on advanced SRE principles and practices.

What skills and knowledge should you bring?

  • Extensive SRE Experience: Proven experience in architecting and implementing SRE solutions at scale within a microservices or distributed systems environment.
    • 10+ years of progressive professional experience, with 5+ years of recent experience supporting enterprise SaaS environments (or equivalent combination of education, experience, and certifications).
  • Technical Expertise: Deep knowledge of incident management, alert correlation, automated triage, self-healing strategies, and SLO frameworks. Strong understanding of observability platforms, including monitoring, logging, and tracing solutions.
  • Programming & Scripting: Proficient in one or more programming languages (e.g., Python, Go, Java) with experience in automation and scripting for incident management workflows.
  • Machine Learning & Data Analysis: Experience with machine learning, anomaly detection, or data analytics techniques for real-time alert correlation and triage systems.
  • Cloud Infrastructure: Expertise in cloud platforms (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes), with experience in infrastructure-as-code (e.g., Terraform).
  • Problem-Solving & Decision-Making: Ability to make critical architectural decisions with a focus on business impact, reliability, and system performance.

Why us?

You will be joining a cutting-edge company, where you will tackle extraordinary challenges and work with the very best in the industry.

  • Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
  • Unlimited PTO
  • Industry-leading gender-neutral parental leave
  • Paid Company Holidays
  • Paid Sick Time
  • Employee stock purchase program
  • Disability and life insurance
  • Employee assistance program
  • Gym membership reimbursement
  • Cell phone reimbursement

This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply. If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.

Base Salary Range

$198,000 - $270,000 USD

SentinelOne is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

SentinelOne participates in the E-Verify Program for all U.S.-based roles.

Top Skills

SRE
The Company
HQ: Mountain View, CA
1,050 Employees
Remote Workplace
Year Founded: 2013

Similar Jobs

Cisco Meraki

Site Reliability Engineer, FedRamp, Remote in the U.S.

Hardware • Information Technology • Security • Software • Cybersecurity • Conversational AI
Remote
United States
3000 Employees
95K-153K Annually

AMP

Site Reliability Engineer - Embedded

Artificial Intelligence • Computer Vision • Greentech • Machine Learning • Robotics • Industrial • Automation
Remote
United States
130 Employees

Comcast Advertising

Site Reliability Engineer 3

AdTech • Digital Media • Marketing Tech
Remote
Pennsylvania, USA
5000 Employees
82K-192K Annually

Atlassian

Senior Site Reliability Engineer

Cloud • Information Technology • Productivity • Security • Software • App development • Automation
Remote
San Francisco, CA, USA
11000 Employees

Similar Companies Hiring

Silverfort
Security • Sales • Information Technology • Cybersecurity • Automation
GB
357 Employees
Jobba Trade Technologies, Inc.
Software • Professional Services • Productivity • Information Technology • Cloud
Chicago, IL
45 Employees
InCommodities
Renewable Energy • Machine Learning • Information Technology • Energy • Automation • Analytics
Austin, TX
234 Employees