Site Reliability Engineer (SRE/DevOps) - Engineering Productivity

Hiring Remotely in Bengaluru, Karnataka
Remote
Senior level
Cloud • Security • Software • Analytics
The Role
As a Site Reliability Engineer in Engineering Productivity at Arista Networks, you will design and operate the systems that support the development teams' work. Responsibilities include monitoring production systems, automating responses to alerts, improving infrastructure stability, debugging issues, and collaborating with software engineers on process optimization.
Summary Generated by Built In

Company Description

Arista Networks was founded to pioneer and deliver software-driven cloud networking solutions for large data center storage and computing environments. Arista’s award-winning platforms, ranging in Ethernet speeds from 10 to 400 gigabits per second, redefine scalability, agility, and resilience. Arista has shipped more than 20 million cloud networking ports worldwide with CloudVision and EOS, an advanced network operating system. Committed to open standards, Arista is a founding member of the 25/50GbE consortium. Arista Networks products are available worldwide directly and through partners.
Additional information and resources can be found at:
www.arista.com
www.twitter.com/aristanetworks
www.facebook.com/AristaNW
www.youtube.com/user/AristaNetworks

Job Description

Working in Engineering Productivity (EngProd), you will collaborate with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use. The EngProd team uses industry-standard systems such as Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, Elasticsearch, Google Cloud, and Varnish, as well as internal systems that we’ve built from the ground up to automate CI/CD, testing, analysis, and visualization.

Responsibilities:

  • Keep production status green at all times

  • Proactively monitor, respond to, and enhance alerts

  • Build automated responses to the most common alerts, or work with the rest of the EngProd team to build them

  • Create and maintain incident response runbooks in collaboration with the service development teams

  • Debug and resolve issues impacting developer user experience and infrastructure stability

  • Develop patterns to support system reliability and socialize them within the EngProd team

  • Review and contribute to the specifications and implementations written by other team members

  • Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure and provide fixes for those problems

  • Provide support for our tools and infrastructure to Arista’s development team

Qualifications

  • At least a BS in Computer Science or Engineering + 5 years’ experience, an MS in Computer Science or Engineering + 4 years’ experience, a Ph.D. in Computer Science, or equivalent work experience.

  • Knowledge of one or more of Go, Python, JavaScript, and shell scripting.

  • Knowledge of Linux (or UNIX).

  • Experience operating software systems at scale.

  • Strong understanding of the fundamentals of storage and networking.

  • Comfortable with Ansible and GitOps.

  • Applied understanding of software engineering principles.

  • Strong problem solving and software troubleshooting skills.

  • Ability to design a solution and implement features independently. Ability to work in small teams.

Additional Information

All your information will be kept confidential according to EEO guidelines.

Top Skills

Go
JavaScript
Python
Shell Scripting
The Company
HQ: Santa Clara, CA
29 Employees
On-site Workplace
Year Founded: 2004

What We Do

Arista Networks is a leader in data-driven, client to cloud networking for data center, campus, and routing environments. Arista’s award-winning platforms deliver availability, agility, automation, analytics, and security.

We've created this space to keep you updated on Arista channel and partner news.
