Site Reliability Engineer

Posted 25 Days Ago
Pozuelo de Alarcón, Madrid, Comunidad de Madrid
Mid level
Consumer Web
The Role
The Site Reliability Engineer enhances system reliability, manages incidents, optimizes performance, implements automation, and collaborates on architectural decisions.
Summary Generated by Built In

Job Description. What is it?

Role Description (concept):

The purpose of the Site Reliability Engineer (SRE) role is to enhance and maintain the high availability and reliability of systems and applications, ensuring they effectively support business operations and contribute to a positive user experience. This role sits at the crossroads of software engineering and operations, adopting practices from both disciplines to create robust and efficient systems. The SRE's responsibilities include:

  • Enhancing and maintaining the availability and reliability of systems and applications.
  • Proactively managing incidents to minimize downtime.
  • Optimizing system performance and ensuring scalability.
  • Implementing automation to increase operational efficiency.
  • Collaborating with security teams to strengthen system protection.
  • Developing disaster recovery strategies.
  • Maintaining detailed documentation to facilitate knowledge sharing.
  • Working with development teams to integrate reliability from the design phase.
  • Continuously evaluating and optimizing system performance and operational processes.
  • Ensuring the technological infrastructure supports business growth and objectives.

What does the SRE do? (tasks):

Architecture:

- Take part in architecture decisions to ensure system resiliency from the outset of software development.

Automation and Orchestration:

- Develop scripts and use tools to automate deployment, infrastructure provisioning, configuration management, and scaling, following CI/CD practices.

- Orchestrate complex workflows across various environments to ensure consistency and reliability.

Continuous Integration and Continuous Deployment (CI/CD):

- Design, implement, and manage CI/CD pipelines to facilitate rapid and reliable code deployments with minimal manual intervention. This may include integrating automated testing to ensure code quality.

Infrastructure as Code (IaC):

- Foster the use of IaC tools and practices to manage infrastructure provisioning and configuration, ensuring environments are reproducible, scalable, and maintainable.

Monitoring, Logging, and Alerting:

- Implement comprehensive monitoring and logging solutions to collect, analyze, and act on performance data and alerts.

- Use observability data to proactively identify and address issues, ensuring high availability and performance.

Performance Optimization:

- Regularly assess system performance to identify bottlenecks and inefficiencies.

- Implement optimizations to improve system response times, resource utilization, and user satisfaction.

Incident Management and Reliability Engineering:

- Participate in on-call rotations, swiftly address and resolve incidents, and lead post-mortem analyses to identify root causes and prevent recurrence.

- Develop resilience and recovery strategies to meet defined Service Level Objectives (SLOs).

Security and Compliance:

- Ensure that all aspects of software development, deployment, and operations adhere to security best practices and compliance requirements.

- Implement security controls, conduct regular audits, and address vulnerabilities promptly.

Quality Assurance (QA):

- Facilitate QA Teams: Provide support to QA teams by setting up environments and deploying necessary tools for quality-related activities.

- Automation Support: Collaborate with QA to automate testing processes and manage risks effectively.

- Non-Functional Testing: Work closely with QA to develop and execute non-functional tests and evaluate their outcomes.

Responsibilities

  • Develop, Scale, and Automate: Design, build, and scale systems using advanced automation techniques. Develop and maintain automation scripts for system deployment and management.
  • Incident Management: Lead on-call rotations for specific systems. Conduct detailed post-mortem analyses and develop preventative strategies.
  • Performance Metrics: Define and monitor critical reliability metrics independently. Analyze performance data to identify trends and areas for improvement.
  • Cross-functional Collaboration: Work closely with development teams to ensure system reliability and performance from the design phase. Advocate for SRE principles across teams.
  • Capacity Planning and Management: Lead capacity planning and management efforts, aligning with business needs and objectives. Develop strategies for scalability and performance under varying loads.
  • Continuous Improvement: Identify and address inefficiencies in current systems and processes. Champion new technologies for operational excellence.
  • Security: Lead initiatives to strengthen system security postures. Conduct vulnerability assessments and remediation efforts.

Mandatory Skills:

  • Monitoring, Logging, and Observability: Advanced knowledge of comprehensive monitoring, logging, and observability strategies is desired.
  • Automation: Advanced knowledge of Python and Bash for complex automation is recommended.
  • Configuration as Code: Advanced skills in Ansible for sophisticated configuration management are recommended.
  • Containerization and Orchestration: Intermediate knowledge of Docker and basic knowledge of Kubernetes.
  • Databases: Advanced knowledge of database management is recommended, covering both relational and non-relational databases.
  • Version Control Systems: Advanced proficiency with Git is desired.

Recommended Skills:

  • Infrastructure as Code: Advanced skills in Terraform for sophisticated infrastructure provisioning and management.
  • Programming: Proficiency in Java, with practical experience in Spring Boot.
  • Cloud Platform: Advanced knowledge of cloud platforms.
  • Networking and Security: Advanced understanding of networking and security concepts and practices.
  • Databases: Advanced knowledge of database management, covering both relational and non-relational databases.
  • CI/CD: Understanding of and experience with continuous integration and continuous deployment concepts.

Soft Skills

  • Communication: Effective verbal and written communication, focusing on clarity and understanding.
  • Collaboration: Teamwork, learning from others, and supporting team members.
  • Problem-solving: Ability to address problems with supervision and thorough investigation.
  • Emotional Intelligence: Self-awareness, regulation, and constructive handling of feedback.
  • Adaptability: Willingness to learn new technologies and methodologies.
  • Resilience: Learning from mistakes and not being discouraged by challenges.
  • Customer-focused Mindset: Basic understanding of user experience.
  • Leadership and Time Management: Self-leadership, task management, and productivity

Top Skills

Ansible
Bash
Cloud Platforms
Docker
Git
Java
Kubernetes
Python
Spring Boot
Terraform

The Company
Dublin, Dublin
21,728 Employees
On-site Workplace

What We Do

Verisure is the leading provider of peace of mind and protection to residential and small business customers across Europe and Latin America. We deliver professionally-monitored security services to over 5.5 million customers in 17 countries across Europe and Latin America, with a team of more than 28,000 colleagues.

Verisure’s brand family includes Securitas Direct in Spain and Portugal, AlertAlarm, Dansikring Direct, Falck Alarms, Mediaveil, TeleAtlantic and NorAlarm, to name a few!

Our alarms are the most widely installed home security systems in Europe. A strong focus on quality and service means our customers are among the most satisfied in the industry!

GROWTH

Verisure has enjoyed consistent growth over the past 35 years as a result of its highly entrepreneurial and innovative approach to business. We also continue to expand internationally.

- Strong and visionary Management Team and a robust business plan for value creation.
- We are a big company with a start-up mindset: fast, agile and lean, merit-based, high-performance and value-driven.

INNOVATION

- We continuously invest in innovation to provide effective, intelligent and reliable security solutions.
- Offer a breakthrough product & service proposition: identify, research, develop, test & refine advanced security solutions.
- Develop exclusive hardware and software features.
- Research & Development centers in Madrid and Malmö.
- 600+ R&D and IT experts… and growing!

PEOPLE

Our successful growth is dependent on our talent pipeline. Our People are our business! We are:
- Passionate in everything we do
- Committed to making a difference
- Always Innovating
- Winning as a Team
- With Trust & Responsibility
