Site Reliability Engineer

Posted 3 Days Ago
San Diego, CA
Mid level
Information Technology
The Role
The Site Reliability Engineer will design and implement cloud infrastructure and services with a focus on DevSecOps principles. Responsibilities include managing observability and monitoring solutions, optimizing Kubernetes clusters, responding to incidents, and collaborating with DevOps and security teams to improve system reliability.

Company Description

We are a leading-edge technology consulting firm committed to empowering organizations through the implementation of cloud-native and enterprise DevSecOps transformations. Our team of dedicated experts is driven by a passion for harnessing cutting-edge technologies to deliver unparalleled value to our clients. We specialize in crafting innovative technical solutions grounded in cloud-native principles, containerization, and the implementation of advanced automation-driven DevSecOps practices.

At the heart of our ethos lies a relentless pursuit of progress and the establishment of new industry benchmarks. Our unwavering commitment to excellence sets us apart and makes us the preferred choice for our clients. We recognize that delivering exceptional technical solutions necessitates the expertise of renowned professionals.

If you share our zeal for constructing cloud-native systems, developing cloud-based applications, and designing automation solutions, and you want to join a company that is a dominant force in Enterprise DevSecOps and Cloud Native, then you've discovered the ideal destination.

We cultivate a vibrant, inclusive, and collaborative environment that champions innovation and continuous learning. As a member of our team, you will have the opportunity to engage in exciting projects, tackle intricate challenges, and make a substantial contribution to the advancement of digital transformation for our clients. Come and be a part of a team that thrives on pushing the boundaries of what technology can achieve.

Job Description

This position primarily provides design and implementation expertise in infrastructure provisioning, the management and lifecycle of cloud components and services, containers, and other core DevSecOps concepts.

Key Responsibilities: 

  • Observability & Monitoring: Design and manage monitoring solutions using Prometheus, Thanos, Grafana, and Mimir to ensure the health and performance of Kubernetes clusters and applications. 
  • Logging & Tracing: Implement Loki, Promtail, and OpenTelemetry to collect, process, and analyze logs and traces for debugging and forensic analysis. 

  • Kubernetes Operations: Deploy, maintain, and optimize Kubernetes clusters, ensuring observability tools are properly integrated and configured. 

  • Incident Response & SLOs: Define SLIs, SLOs, and error budgets, develop alerting strategies using Alertmanager, and automate incident response processes. 

  • High Availability & Scalability: Optimize the observability stack for high availability in limited-connectivity environments, leveraging solutions like Thanos for long-term storage and MinIO for object storage.

  • Security & Compliance: Implement observability best practices in compliance with security frameworks, and integrate with Kubernetes security tools such as NeuVector.

  • Automation & Infrastructure as Code (IaC): Automate observability deployments using Terraform, Helm, and Kubernetes Operators. 

  • Collaboration & Documentation: Work closely with DevOps, security, and platform teams to enhance system reliability and maintain comprehensive documentation. 

Qualifications

  • Active Secret or Top Secret Clearance.

  • Strong Kubernetes expertise in managing and monitoring clusters at scale. 

  • Experience with observability stacks including Prometheus, Loki, Thanos, Grafana, OpenTelemetry, and Mimir. 

  • Proficiency in logging and tracing frameworks, including Promtail, Fluent Bit, and OpenTelemetry. 

  • Hands-on experience with incident management and alerting using Alertmanager, Grafana Alerts, and PagerDuty/Slack integrations. 

  • Deep understanding of Kubernetes networking, service meshes (Istio/Linkerd), and security monitoring. 

  • Scripting & Automation: Proficiency in Python, Go, or Bash for automating observability tasks. 

  • Infrastructure as Code (IaC): Experience with Terraform, Helm, and Kubernetes Operators. 

  • Strong troubleshooting and root cause analysis skills in large-scale distributed systems. 

  • Experience working in air-gapped or limited connectivity environments is a plus. 


Preferred Skills: 

  • Experience with NeuVector, Falco, or other Kubernetes security monitoring tools. 
  • Knowledge of eBPF-based observability tools such as Cilium Hubble. 
  • Experience optimizing observability stacks for performance and cost efficiency. 
  • Familiarity with DevSecOps practices and compliance frameworks. 


Additional Information

We Value:

  • Drive: Passion and energy to implement quality technical solutions. Self-motivation and intellectual curiosity.
  • Commitment to Quality: Passion to conceive and produce world-class solutions that drive real-world value for the customer.
  • Customer Focus: Consultative approach to solving problems for customers. Expectations management.
  • Communication: Superior written and verbal communication skills. Ability to clearly articulate problems, solutions, risks, rewards, etc.
  • Technical Skills: Love for technology. You have to be inherently passionate about technology.
  • Business Acumen: Technology is ultimately used to enable the business. We look for people who understand how businesses can be enabled through their technical solutions.

What we offer:

  • Ability to make a noticeable difference for the organization and our customers
  • Tremendous growth opportunity by becoming part of a rapidly growing organization. It’s not your tenure but what you can bring to the table that defines how your career will be shaped. You control your growth.
  • Complex but interesting challenges to improve the depth and breadth of your technical and business skills. Our consultants are business technologists and understand how technology drives business. 
  • Competitive pay and benefits

Oteemo is an equal employment and affirmative action employer. We evaluate qualified applicants on merit and business needs and not on race, color, religion, creed, gender, sexual orientation, national origin, ancestry, age, disability, genetic information, marital status, veteran status or any other factor protected by law. Oteemo complies with the law regarding reasonable accommodations for handicapped and disabled employees.

Top Skills

Bash
Go
Kubernetes
Python

The Company
HQ: Reston, VA
58 Employees
On-site Workplace
Year Founded: 2014

What We Do

We help enterprises unlock the power of modern technology to transform business through acceleration, enablement, and adoption. Let us help your enterprise connect its People, Process, Technology, and Culture to enable reliable, secure innovation, agility, and resiliency.

- Decrease DevSecOps cycle times from months to weeks
- Accelerate software release rates
- Ensure digital adoption across the enterprise
- Increase ROI and reduce TCO

Creds:
CNCF (Cloud Native Computing Foundation) Member
Kubernetes Certified Service Provider (KCSP)
CNCF Kubernetes Training Partner (KTP)
AWS Advanced Consulting Partner

Let’s get started. Contact us to learn how we can help transform your digital supply chain at www.oteemo.com.
