Senior Engineer (Site Reliability)

Posted 9 Days Ago
India
5-7 Years of Experience
Information Technology
The Role
The Senior Engineer (Site Reliability) is responsible for designing and maintaining scalable systems, leading incident management, building monitoring and automation, tuning performance, planning capacity, mentoring junior staff, and ensuring effective documentation. This role focuses on maintaining system reliability and optimizing operational efficiency in AWS environments.
Summary Generated by Built In

Responsibilities:

  • System Design and Architecture: Design, implement, and maintain scalable and reliable systems, ensuring they can handle both current and future demands.
  • Incident Management: Lead incident response efforts, diagnose root causes, and implement long-term solutions to prevent recurrence. Ensure effective communication during outages.
  • Monitoring and Observability: Develop and maintain comprehensive monitoring and alerting systems to proactively identify and address issues before they impact users.
  • Automation and Efficiency: Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention.
  • Performance Tuning: Continuously optimize system performance, including fine-tuning applications, databases, and infrastructure to meet service level objectives (SLOs).
  • Capacity Planning: Forecast future system requirements based on growth trends and current usage, and plan capacity upgrades to ensure system reliability.
  • Collaboration and Mentoring: Work closely with development teams to integrate reliability into the software development lifecycle. Mentor junior SREs and share best practices.
  • Documentation and Knowledge Sharing: Create and maintain detailed documentation on system design, incident response procedures, and operational practices to ensure knowledge is preserved and accessible.

Requirements:

  • 5+ years of experience as an SRE within AWS environments at medium- to large-scale organizations.
  • 5+ years of hands-on experience implementing and managing observability tools, such as Prometheus, New Relic, Grafana, or similar.
  • Advanced programming skills in Python, Groovy, and Bash.
  • Strong understanding of database technologies, including both SQL and NoSQL systems.
  • 3+ years of experience developing and managing infrastructure deployment pipelines using Git, Terraform, Helm, Jenkins/Jenkins X/ArgoCD, or similar tools.
  • Proven expertise in designing, evaluating, and supporting production environments in AWS, including VPCs, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALBs, Autoscaling, and more.
  • Hands-on experience with Linux systems, as well as protocols and technologies such as HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, load balancing, SQL/NoSQL, message brokers, Nginx, and Vault.
  • Extensive experience with Kubernetes and various container and cloud-native technologies.
  • Significant experience in managing 24/7 on-call rotations, creating runbooks, establishing support procedures, and proactively monitoring systems across multiple geographic locations.
  • Ability to thrive under pressure and excel in a technically challenging environment.

Innovation Lives Here

You go all in no matter what you do, and so do we. At Lytx, we’re powered by cutting-edge technology and Happy People. You want your work to make a positive impact in the world, and that’s what we do. Join our diverse team of hungry, humble and capable people united to make a difference.

Together, we help save lives on our roadways.

Find out how good it feels to be a part of an inclusive, collaborative team. We’re committed to delivering an environment where everyone feels valued, included and supported to do their best work and share their voices.

Lytx, Inc. is proud to be an equal opportunity/affirmative action employer and maintains a drug-free workplace. We’re committed to attracting, retaining and maximizing the performance of a diverse and inclusive workforce. EOE/M/F/Disabled/Vet.

Top Skills

Bash
Groovy
Python
The Company
Framingham, MA
790 Employees
On-site Workplace
Year Founded: 1998

What We Do

Learn how Lytx video telematics can help you improve safety, efficiency, and DOT compliance in your fleet. Start improving your fleet operations today.
