Senior Site Reliability Engineer

Boston, MA
Senior level
Software
The Role
The Senior Site Reliability Engineer will design and manage scalable systems, automate infrastructure, establish monitoring systems, optimize performance, and ensure security compliance. Additionally, the role involves mentoring junior engineers and collaborating with product teams to enhance service reliability and efficiency.
Summary Generated by Built In
We are BrainGu

BrainGu is a technology company that builds developer platforms. We believe the future has to be innovated; it has to be created; it has to be secured. Through platforms that create order-of-magnitude improvements to quality in the form of resilience, scalability, reliability, and security – (rs)² – we enable our customers to deliver the future.

Our mission is to dream of, incubate, and scale dual-use technology platforms that unlock innovation.

Our vision is to unlock innovation by enabling more organizations to build high-quality software faster and at lower cost.

Overview

This role sits within the Engineering Operations Value Stream (EngOps) supporting our flagship Developer Experience Platform, SmoothGlue.  As a member of the EngOps team, you will be responsible for working towards our SRE strategy and operating model and helping to mature our SRE discipline. 

Building iteratively, with a strong understanding of the trade-offs required to implement SRE frameworks and capabilities, is a must-have, as is a strong willingness to collaborate. Automating yourself out of a job is not viewed as a risk but as a worldview that is required in this role.

You will work closely with our EngOps CTO and team as well as our Platform Product team to help inform and drive roadmaps, metrics, and overall organizational maturity. 

Responsibilities 

  • System Architecture and Design:
    • Design, implement, and manage highly available, scalable, and fault-tolerant systems.
    • Collaborate with software engineering teams to optimize application performance and reliability.
    • Evaluate and recommend appropriate technologies, tools, and infrastructure solutions.
  • Infrastructure Automation:
    • Develop and maintain infrastructure as code (IaC) using tools like Terraform, Ansible, or similar.
    • Automate deployment, configuration, and scaling of applications and services.
    • Implement continuous integration and continuous deployment (CI/CD) pipelines.
  • Monitoring and Incident Management:
    • Establish and maintain comprehensive monitoring, alerting, and logging systems.
    • Respond to incidents, troubleshoot issues, and ensure timely resolution to minimize downtime.
    • Participate in on-call rotations and post-incident analysis to drive continuous improvement.
  • Performance Optimization:
    • Analyze system performance and identify bottlenecks; implement optimizations.
    • Conduct capacity planning to anticipate future resource needs and scalability requirements.
    • Implement strategies to improve system response times and overall efficiency.
  • Security and Compliance:
    • Collaborate with security teams to implement best practices for system and data protection.
    • Ensure compliance with industry standards and regulations relevant to the company's operations.
  • Mentorship and Collaboration:
    • Provide guidance, mentorship, and technical leadership to junior SREs and engineering teams.
    • Foster a collaborative environment by sharing knowledge and promoting best practices.

Requirements 

  • Bachelor’s degree or equivalent work experience.
  • 6+ years of relevant work experience.
  • Highly motivated self-starter with excellent interpersonal and communication skills. Able to communicate efficiently at multiple levels of seniority.
  • Highly developed documentation skills.
  • Experience working in a customer-facing role; customers may be end users, developers, or organizational leadership.
  • Certification or formal training in site reliability engineering concepts and practices.
  • Prior experience working towards SLIs, SLOs and observability capabilities at a large scale.
  • Experience working on observability, logging and metrics toolsets.
  • Experience with Kubernetes (k8s) and container technologies such as Docker, OpenShift, RKE, and EKS.
  • Experience troubleshooting routing and networking in a cloud environment (AWS, GCP, or Azure).
  • Experience with Secrets products such as HashiCorp Vault or CyberArk.
  • Highly effective at navigating large and complex organizations.
  • Ability to work under pressure and manage tight deadlines or unexpected changes in expectations or requirements.
  • Experience working in CISO-led or security-led organizations is desirable but not essential.
  • AWS Solutions Architect - Associate certification is preferred.

Tech Stack 

  • Kubernetes, Docker, CRI-O, containerd, or other container technologies
  • Major programming or scripting languages
  • Istio, Linkerd, Consul, or other service mesh
  • Ansible, Terraform, Helm, Kustomize or other Infrastructure as Code (IaC) and Configuration as Code (CaC)
  • AWS, Azure, GCP, or other cloud technologies

Specific Job Needs

  • Located in Boston, Massachusetts
  • Willing to obtain and maintain a Top Secret Clearance 
  • Willing to travel up to 50%
  • Expected base salary of $150,000 - $170,000

Employee Perks

  • 12 weeks of fully paid parental leave for birth or adoption
  • 31 days of PTO, which includes federal holidays
  • 100% employer-paid insurance plans (employee-only)
  • 401(k) matching up to 5%
  • $10k “BrainBudget” to facilitate your personal and professional growth
  • $1,500 “Battle Station Budget” to outfit your home office with maximum RGB
  • 85% paid healthcare premiums for you, your spouse, and dependents
  • A monthly cell phone and internet stipend
  • Supplemental Tricare plan for Veterans
  • Monthly stipend for Veterans

Top Skills

Go
Java
Python
The Company
Grand Rapids, MI
68 Employees
Hybrid Workplace
Year Founded: 2012

What We Do

BrainGu develops custom DevSecOps software that enables mission success and boasts exceptional user and developer experience by working directly with end-users to solve their real-life problems and continuously improve capabilities. By automating pinch points, BrainGu innovates new ways to solve mission problems.

Our vision is to solve complex national security challenges for the United States and its allies by incubating and scaling technology solutions that emphasize fielded, meaningful military capability in the hands of operators and mission owners.

BrainGu is setting the standard for rapid deployment and scalability of mission applications. As part of our Mission App as a Service solution offering, BrainGu offers subscription and packaged app timeline products that are aligned to BrainGu’s overall mission to provide the best, cutting-edge technology to the warfighter at the tactical edge.
