Principal Site Reliability Engineer

Posted 2 Days Ago
Hiring Remotely in USA
Remote
170K-200K Annually
Senior level
Security • Cybersecurity
The Role
As a Principal Site Reliability Engineer, you will enhance cloud infrastructure reliability and scalability at UltraViolet Cyber by managing EKS clusters, automating infrastructure with IaC tools, building CI/CD pipelines, and implementing monitoring solutions while collaborating with engineering teams and leading incident management.
Summary Generated by Built In

Make a difference here.


UltraViolet Cyber is a leading platform-enabled unified security operations company providing a comprehensive suite of security operations solutions. Founded and operated by security practitioners with decades of experience, the UltraViolet Cyber security-as-code platform combines technology innovation and human expertise to make advanced real-time cybersecurity accessible for all organizations by eliminating risks of separate red and blue teams.


By creating continuously optimized identification, detection, and resilience against today’s dynamic threat landscape, UltraViolet Cyber provides both managed and custom-tailored unified security operations solutions to the Fortune 500, Federal Government, and Commercial clients. UltraViolet Cyber is headquartered in McLean, Virginia, with offices across the U.S. and in India.


UltraViolet is seeking a highly skilled Principal Site Reliability Engineer (SRE) with expert-level experience in Amazon Elastic Kubernetes Service (EKS), DevOps, and AWS to enhance the scalability, reliability, and security of our cloud infrastructure. As a key member of our engineering team, you will work across multiple disciplines to ensure the resilience and efficiency of our systems, employing automation and modern DevOps practices to drive operational excellence. This is a highly dynamic role that requires a combination of hands-on expertise, leadership skills, and continuous learning to help mature our infrastructure and reliability processes. 

Work You'll Do:

  • System Reliability & Performance: Ensure the availability, performance, scalability, and security of our cloud-based services using best practices in SRE and DevOps. 
  • Kubernetes & EKS Management: Architect, deploy, and maintain Kubernetes clusters, primarily using Amazon Elastic Kubernetes Service (EKS).
  • Infrastructure as Code (IaC): Automate infrastructure provisioning, configuration, and management using Terraform, Pulumi, or similar tools. 
  • CI/CD Pipelines: Build, maintain, and enhance continuous integration and continuous deployment (CI/CD) pipelines, optimizing deployment workflows for speed and reliability. 
  • Monitoring & Incident Response: Design and implement comprehensive monitoring, alerting, and logging solutions using tools such as Prometheus, Grafana, and CloudWatch to proactively identify and address system issues.
  • Security & Compliance: Enforce security best practices, implement access controls, and ensure compliance with industry standards.
  • Capacity Planning & Scaling: Analyze system performance and scalability, implementing proactive strategies to accommodate growth and prevent downtime. 
  • Collaboration & Cross-Functional Leadership: Work closely with Engineering and Product teams to integrate reliability principles into the software development lifecycle. 
  • Incident Management & Root Cause Analysis: Lead post-mortem investigations for critical incidents, identifying actionable improvements to enhance system resilience. 
  • Cost Optimization: Assess and optimize cloud costs while maintaining performance and reliability, leveraging AWS savings plans, right-sizing resources, and improving infrastructure efficiency.

What You Have:

  • Extensive experience in AWS, with deep expertise in managing EKS clusters, networking, IAM, security groups, and other core AWS services. 
  • Strong proficiency in Kubernetes (EKS, Helm, kubectl, Operators) with a proven track record of deploying, maintaining, and scaling containerized applications.
  • Hands-on experience in DevOps tools & methodologies, including Terraform, Ansible or SaltStack, Helm, GitOps, ArgoCD, and CI/CD platforms such as GitHub Actions or Jenkins.
  • Proficiency in scripting and automation using Python, Bash, or Golang to enhance system reliability and efficiency. 
  • Experience with observability and monitoring tools, including Prometheus, Grafana, Loki, or AWS CloudWatch. 
  • Deep understanding of networking principles, including DNS, VPC, Load Balancers, VPNs, and Service Mesh architectures.
  • Strong background in security best practices, including IAM policies, encryption, secrets management, and vulnerability scanning (AWS KMS, HashiCorp Vault, etc.). 
  • Experience working with highly available, distributed systems, including microservices architecture and cloud-native applications. 
  • Previous experience in an Agile or DevOps culture, promoting collaboration, automation, and iterative improvements. 
  • Excellent troubleshooting skills, with the ability to analyze complex system failures and drive solutions. 
  • Strong communication and leadership skills, with the ability to mentor junior engineers and collaborate effectively with cross-functional teams. 
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience. 

What We Offer:

  • 401(k), including an employer match of 100% of the first 3% contributed and 50% of the next 2% contributed
  • Medical, Dental, and Vision Insurance (available on the 1st day of the month following your first day of employment)
  • Group Term Life, Short-Term Disability, Long-Term Disability
  • Voluntary Life, Hospital Indemnity, Accident, and/or Critical Illness
  • Participation in the Discretionary Time Off (DTO) Program
  • 11 Paid Holidays Annually 

UltraViolet Cyber maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect our company's differing products, services, industries and lines of business. Candidates are typically placed into the range based on the preceding factors.


We sincerely thank all applicants in advance for submitting their interest in this position. We know your time is valuable.


UltraViolet Cyber welcomes and encourages diversity in the workplace regardless of race, gender, religion, age, sexual orientation, gender identity, disability, or veteran status. 


If you want to make an impact, UltraViolet Cyber is the place for you! 

Top Skills

Bash
Go
Python

The Company
HQ: McLean, Virginia
205 Employees
On-site Workplace

What We Do

Unified Security Operations, Delivered. We tear down the walls between red and blue teams & address risk exposure when it’s discovered—not weeks later. UltraViolet Cyber is a leading platform-enabled unified security operations company providing a comprehensive suite of security operations solutions.

