Important Information
- Experience: More than 4 years
- Job Mode: Full-time
- Work Mode: Hybrid
Job Summary
- Site Reliability Engineering (SRE) is a discipline that blends software engineering with infrastructure and operations, aimed at building scalable and highly reliable software systems.
- Focus on application monitoring, emergency response, and change management to ensure reliability and efficiency.
- Collaborate with development teams throughout the software lifecycle to solve system-related issues and automate routine tasks.
- Enhance system reliability, scalability, and performance by leveraging modern tools and processes.
Responsibilities and Duties
- Application Monitoring: Utilize tools and automation for continuous application monitoring and reliability.
- Emergency Response: Respond promptly to emergency incidents, perform root cause analysis, and resolve ongoing production issues.
- Change Management: Manage and streamline release and change management processes to improve system performance.
- Collaboration: Partner with development teams to solve system issues, automate routine tasks, and eliminate toil.
- Reliability and Scalability: Ensure systems are highly reliable, scalable, and efficient to meet performance standards.
Qualifications and Skills
- Strong understanding of monitoring tools such as Azure Monitor, Application Insights, Prometheus, and Grafana.
- Experience with Infrastructure as Code tools like Terraform, ARM/Bicep, or Pulumi.
- Proficiency in release management tooling such as ArgoCD, Harness, and Octopus Deploy.
- Familiarity with incident alert tools like PagerDuty or Opsgenie.
- Expertise in container orchestration tools like Kubernetes and AKS.
- Proficiency in scripting with C#, Python, Bash, or PowerShell (at least one is mandatory).
- Strong collaboration and problem-solving abilities to resolve system issues effectively.
- Knowledge of project tracking and version control tools like Jira, SVN, and GitHub.
Role-specific Requirements
- Proven experience in application monitoring and automated reliability processes.
- Strong background in managing system reliability and performing root cause analysis during emergency responses.
- Hands-on experience in change management processes and production environment releases.
- Advanced knowledge of tools and practices for infrastructure automation and incident handling.
- Familiarity with scalable system architecture principles and best practices.
Technologies
- Monitoring Tools: Azure Monitor, Application Insights, Prometheus, Grafana
- Infrastructure as Code: Terraform, ARM/Bicep, Pulumi
- Release Management Tools: ArgoCD, Harness, Octopus
- Incident Alert Tools: PagerDuty, Opsgenie
- Container Orchestration: Kubernetes, AKS
- Project Tracking and Version Control Tools: Jira, SVN, GitHub
- Scripting: C#, Python, Bash, PowerShell
Skillset Competencies
- Advanced monitoring and incident management techniques.
- Infrastructure as Code and automation of routine workflows.
- Expertise in release and change management processes.
- Strong knowledge of container orchestration and scalable system design.
- Excellent communication, collaboration, and problem-solving skills.
- Ability to work effectively in cross-functional and virtual teams.
About Encora
Encora is a trusted partner for digital engineering and modernization, working with some of the world’s leading enterprises and digital-native companies. With over 9,000 experts in 47+ offices worldwide, Encora offers expertise in areas such as Product Engineering, Cloud Services, Data & Analytics, AI & LLM Engineering, and more. At Encora, hiring is based on skills and qualifications, embracing diversity and inclusion regardless of age, gender, nationality, or background.
What We Do
Headquartered in Santa Clara, California, and backed by renowned private equity firms Advent International and Warburg Pincus, Encora is the preferred technology modernization and innovation partner to some of the world’s leading enterprise companies. It provides award-winning digital engineering services including Product Engineering & Development, Cloud Services, Quality Engineering, DevSecOps, Data & Analytics, Digital Experience, Cybersecurity, and AI & LLM Engineering. Encora’s deep cluster vertical capabilities extend across diverse industries, including HiTech, Healthcare & Life Sciences, Retail & CPG, Energy & Utilities, Banking, Financial Services & Insurance, Travel, Hospitality & Logistics, Telecom & Media, Automotive, and other specialized industries.
With over 9,000 associates in 47+ offices and delivery centers across the U.S., Canada, Latin America, Europe, India, and Southeast Asia, Encora delivers nearshore agility to clients anywhere in the world, coupled with expertise at scale in India. Encora’s Cloud-first, Data-first, AI-first approach enables clients to create differentiated enterprise value through technology.