Site Reliability Engineer (5667)

Posted 22 Days Ago
Hiring Remotely in USA
Remote
Mid level
Information Technology • Consulting
The Role
As a Site Reliability Engineer, you will design and manage highly available systems, optimize performance, automate deployment and configuration, develop documentation, and handle incident responses. You will collaborate with teams to troubleshoot issues and improve service delivery.

As a Site Reliability Engineer, you’ll lead the design, implementation, and management of highly available and scalable systems, applying industry best practices and reliability engineering principles.

We know that you can’t have great technology services without amazing people. At MetroStar, we are obsessed with our people and have led a two-decade legacy of building the best and brightest teams. Because we know our future relies on our deep understanding and relentless focus on our people, we live by our mission: A passion for our people. Value for our customers.

If you think you can see yourself delivering our mission and pursuing our goals with us, then check out the job description below!

What you’ll do:

  • Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance to meet defined service level objectives.
  • Design and implement monitoring, alerting, and incident response strategies to proactively identify and mitigate potential issues, ensuring uninterrupted service availability.
  • Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes.
  • Develop and maintain comprehensive documentation for system configurations, processes, and procedures.
  • Participate in on-call rotations and respond to incidents, working diligently to resolve issues and prevent recurrence.

What you’ll need to succeed:

  • Active Secret U.S. Government security clearance or higher.
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • Minimum of 3 years of professional experience in a Site Reliability Engineering role or similar capacity.
  • Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible).
  • Proficiency in managing, leading, and engineering incident and outage response.
  • Strong engineering experience with network protocols and load balancing (e.g., TCP/IP, DNS, HTTP/HTTPS).
  • Proficiency in programming and scripting languages (e.g., Python, Go, Bash) and RPA tools (e.g., Blue Prism, UiPath) to automate tasks and develop tools.
  • Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker).
  • Expertise in implementing and managing monitoring and logging solutions (e.g., Splunk, Prometheus, Grafana, ELK stack).
  • Familiarity with CI/CD pipeline development and management (e.g., GitLab CI, Azure DevOps, AWS Lambda, Jenkins).
  • Proven track record of designing, building, and maintaining highly available and scalable systems.
  • Expert proficiency in developing automated functional, regression, and performance tests and in defining automated testing standards for development teams.
  • Experience facilitating change and configuration management processes to drive reliability.
  • Strong problem-solving skills, with the ability to diagnose complex issues and implement effective solutions.
  • Excellent communication skills, with the ability to collaborate effectively across diverse teams.

Like we said, we are big fans of our people. That’s why we offer a generous benefits package, professional growth, and valuable time to recharge. Learn more about our company culture code and benefits. Plus, check out our accolades.

Commitment to Non-Discrimination
All qualified applicants will receive consideration for employment based on merit and without regard to sex, race, ethnicity, age, national origin, citizenship, religion, physical or mental disability, medical condition, genetic information, pregnancy, family structure, marital status, ancestry, domestic partner status, sexual orientation, gender identity or expression, veteran or military status, status as a protected veteran, or any other status protected by applicable federal, state, local, or international law.

What we want you to know:

In compliance with federal law, all persons hired will be required to verify identity and eligibility to work in the United States and to complete the required employment eligibility verification form upon hire.


Top Skills

Bash
Go
Python

The Company
HQ: Reston, VA
250 Employees
On-site Workplace
Year Founded: 1999

What We Do

MetroStar is a digital services and management consulting company specializing in emerging technologies within the public sector. MetroStar is a mission accelerator: we embrace disruptions in tech to propel progress. Through our user-centric capabilities, we create new paths to government innovation and shape thoughtful outcomes for the people.
