Staff Engineer - DevOps Site Reliability

Posted 15 Hours Ago
Hiring Remotely in USA
Remote
Senior level
Artificial Intelligence • Information Technology • Machine Learning • Software • Virtual Reality • Analytics
The Role
This role involves providing L3 support for a business-critical SaaS application, working across the full technology stack, and automating SRE tools. Responsibilities include managing incidents, analyzing performance through monitoring tools, and communicating effectively with various stakeholders. A strong understanding of AWS services, Kubernetes, and networking is essential.
Summary Generated by Built In

Company Description

We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale — across all devices and digital mediums, and our people exist everywhere in the world (19000+ experts across 33 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!

Job Description

  • Experienced L3 SRE engineer supporting a business-critical SaaS application.
  • Capacity to provide L3 support across the full stack, including infrastructure, backend, and front-end, before escalating to the engineering business unit.
  • Capacity to automate SRE tools to provide proactive L3 support, close to our tech monitoring strategy.
  • Capacity to work under business pressure for business-critical applications.
  • Capacity to communicate appropriately with L1, L2, Engineering, Product Managers, leadership, and end users during troubleshooting.
  • Experience with incident and problem management.
  • Experience with multitenant applications.
  • Solid understanding of networking concepts (TCP/IP, DNS, routing, etc.), including VPCs, subnets, firewalls, load balancing, and TLS/SSL.
  • Experience with CI/CD pipelines (e.g., Jenkins, GitHub Actions) and version control.
  • Python and React/Next.js.
  • Monitoring and logging to analyze and track resource utilization and application performance and to identify potential issues, using Grafana, Prometheus, Loki, or ELK.
  • Experience with AWS, particularly EKS, serverless, queues, and various databases.
  • Solid knowledge of Kubernetes.

Qualifications

Must-Have Skills: EKS, GitHub Actions, Python (Strong), Kubernetes (Expert), Prometheus.

Good to Have Skills: 

  • Previous experience building a user-facing GenAI/LLM software application.
  • Security best practices in cloud environments.
  • AWS Managed Services (RDS, Batch, Lambda, Fargate, Step Functions, SQS/SNS, etc.).
  • FastAPI and Next.js experience (if we're still using the latter).
  • WebSockets, Server-Sent Events, Pub/Sub (RabbitMQ, Kafka, etc.).
  • Cloud security concepts (IAM, access control).
  • Terraform experience. 

Top Skills

Python
The Company
19,994 Employees
On-site Workplace
Year Founded: 1996

What We Do

Nagarro helps future-proof your business through a forward-thinking, fluidic, and CARING mindset. We excel at digital engineering and help our clients become human-centric, digital-first organizations, augmenting their ability to be responsive, efficient, intimate, creative, and sustainable. Today, we are 19,000 experts across 36 countries, forming a Nation of Nagarrians, ready to help our customers succeed.
