Site Reliability Engineer

Posted Yesterday
Hiring Remotely in Naperville, IL
Remote
Hybrid
Mid level
Artificial Intelligence • Big Data • Cloud • Information Technology • Machine Learning
The Role
The Site Reliability Engineer will ensure system reliability and support infrastructure through performance optimization, incident management, and analysis. Responsibilities include monitoring system metrics, leading incident management, collaborating with DevOps teams, and optimizing resource usage to enhance system scalability and reliability.
Summary Generated by Built In

Egen is a fast-growing and entrepreneurial company with a data-first mindset. We bring together the best engineering talent working with the most advanced technology platforms, including Google Cloud and Salesforce, to help clients drive action and impact through data and insights. We are committed to being a place where the best people choose to work so they can apply their engineering and technology expertise to envision what is next for how data and platforms can change the world for the better. We are dedicated to learning, thrive on solving tough problems, and continually innovate to achieve fast, effective results.


We are seeking a Site Reliability Engineer to ensure system reliability and infrastructure support. You will be responsible for delivering scalability, performance optimization, incident management, and analysis.

Responsibilities:

  • Ensure system reliability and application uptime in line with the applicable SLAs
  • Monitor system performance metrics and determine the approaches to optimize the system
  • Lead incident management efforts using the available methodology, and document the RCA (Root Cause Analysis), lessons learned, and any SOPs for resolving the issue in the future
  • Work closely with DevOps and Application teams to align priorities, share knowledge, and drive continuous improvement initiatives
  • Prioritize response efforts based on issue severity, potential impact on users, and business priorities
  • Evaluate and approve changes to production systems, balancing the need for innovation against the requirements for stability and reliability
  • Optimize resource usage and manage costs by identifying inefficiencies, rightsizing infrastructure resources, and implementing cost-saving measures

What we're looking for:

  • 3+ years of SRE experience with Azure and/or AWS
  • Bachelor’s degree preferred; relevant experience will be considered as an equivalent
  • Programming: Java, Spring Boot, SQL, Bash
  • Monitoring: Datadog, Splunk, Grafana
  • Docker, Kubernetes, Linux
  • Incident/Alerts Management: VictorOps, PagerDuty
  • Git, Bitbucket
  • Troubleshooting complex, intertwined distributed services
  • Attention to detail
  • Testing, Monitoring, Logging, Alerting
  • Documentation
  • Incident Management

Top Skills

Bash
Java
SQL
The Company
HQ: Naperville, IL
240 Employees
Hybrid Workplace
Year Founded: 2000

What We Do

Egen is a data engineering and cloud modernization firm partnering with leading Chicagoland companies to launch, scale, and modernize industry-changing technologies. We are catalysts for change who create digital breakthroughs at warp speed. Our team of cloud and data engineering experts is trusted by top clients in pursuit of the extraordinary.

Our mission is to be an enabler of amazing possibilities for companies looking to use the power of cloud and data. We want to stand shoulder to shoulder with clients, as true technology partners, and make sure they succeed at what they have set out to do. We want to be disruptors, game-changers, and innovators who have played an important part in moving the world forward.
