Job Description. What is it?
Role Description (concept):
The purpose of the Site Reliability Engineer (SRE) role is to enhance and maintain the high availability and reliability of systems and applications, ensuring they effectively support business operations and contribute to a positive user experience. This role sits at the crossroads of software engineering and operations, adopting practices from both disciplines to create robust and efficient systems. The role's responsibilities include:
- Enhancing and maintaining the availability and reliability of systems and applications.
- Proactively managing incidents to minimize downtime.
- Optimizing system performance and ensuring scalability.
- Implementing automation to increase operational efficiency.
- Collaborating with security teams to strengthen system protection.
- Developing disaster recovery strategies.
- Maintaining detailed documentation to facilitate knowledge sharing.
- Working with development teams to integrate reliability from the design phase.
- Continuously evaluating and optimizing system performance and operational processes.
- Ensuring the technological infrastructure supports business growth and objectives.
What does he/she do? (tasks):
Architecture:
- Take part in architecture decisions to ensure system resiliency from the outset of software development.
Automation and Orchestration:
- Develop scripts and use tools to automate deployment, infrastructure provisioning, configuration management, and scaling, following CI/CD practices.
- Orchestrate complex workflows across various environments to ensure consistency and reliability.
Continuous Integration and Continuous Deployment (CI/CD):
- Design, implement, and manage CI/CD pipelines to facilitate rapid and reliable code deployments with minimal manual intervention. This may include integrating automated testing to ensure code quality.
Infrastructure as Code (IaC):
- Promote the use of IaC tools and practices to manage infrastructure provisioning and configuration, ensuring environments are reproducible, scalable, and maintainable.
Monitoring, Logging, and Alerting:
- Implement comprehensive monitoring and logging solutions to collect, analyze, and act on performance data and alerts.
- Use observability data to proactively identify and address issues, ensuring high availability and performance.
Performance Optimization:
- Regularly assess system performance to identify bottlenecks and inefficiencies.
- Implement optimizations to improve system response times, resource utilization, and user satisfaction.
Incident Management and Reliability Engineering:
- Participate in on-call rotations, swiftly address and resolve incidents, and lead post-mortem analyses to identify root causes and prevent recurrence.
- Develop resilience and recovery strategies to meet defined Service Level Objectives (SLOs).
Security and Compliance:
- Ensure that all aspects of software development, deployment, and operations adhere to security best practices and compliance requirements.
- Implement security controls, conduct regular audits, and address vulnerabilities promptly.
Quality Assurance (QA):
- Facilitate QA Teams: Provide support to QA teams by setting up environments and deploying necessary tools for quality-related activities.
- Automation Support: Collaborate with QA to automate testing processes and manage risks effectively.
- Non-Functional Testing: Work closely with QA to develop and execute non-functional tests and evaluate their outcomes.
Responsibilities
- Develop, Scale, and Automate: Design, build, and scale systems using advanced automation techniques. Develop and maintain automation scripts for system deployment and management.
- Incident Management: Lead on-call rotations for specific systems. Conduct detailed post-mortem analyses and develop preventative strategies.
- Performance Metrics: Define and monitor critical reliability metrics independently. Analyze performance data to identify trends and areas for improvement.
- Cross-functional Collaboration: Work closely with development teams to ensure system reliability and performance from the design phase. Advocate for SRE principles across teams.
- Capacity Planning and Management: Lead capacity planning and management efforts, aligning with business needs and objectives. Develop strategies for scalability and performance under varying loads.
- Continuous Improvement: Identify and address inefficiencies in current systems and processes. Champion new technologies for operational excellence.
- Security: Lead initiatives to strengthen system security postures. Conduct vulnerability assessments and remediation efforts.
Mandatory Skills:
- Monitoring, Logging, and Observability: Advanced knowledge of comprehensive monitoring, logging, and observability strategies is desired.
- Automation: Advanced knowledge of Python and Bash for complex automation is recommended.
- Configuration as Code: Advanced Ansible skills for sophisticated configuration management are recommended.
- Containerization and Orchestration: Intermediate knowledge of Docker and basic knowledge of Kubernetes.
- Databases: Advanced knowledge of database management is recommended, with a focus on relational and non-relational databases.
- Version Control Systems: Advanced proficiency with Git is desired.
Recommended Skills:
- Infrastructure as Code: Advanced Terraform skills for sophisticated infrastructure provisioning and management are recommended.
- Programming: Proficiency in Java is recommended, with practical experience in Spring Boot.
- Cloud Platform: Advanced knowledge of cloud platforms is recommended.
- Networking and Security: Advanced knowledge of networking and security concepts and practices.
- Databases: Advanced knowledge of database management is recommended, with a focus on relational and non-relational databases.
- CI/CD: Understanding of and experience with continuous integration and deployment concepts.
Soft Skills
- Communication: Effective verbal and written communication, focusing on clarity and understanding.
- Collaboration: Teamwork, learning from others, and supporting team members.
- Problem-solving: Ability to address problems with supervision and thorough investigation.
- Emotional Intelligence: Self-awareness, regulation, and constructive handling of feedback.
- Adaptability: Willingness to learn new technologies and methodologies.
- Resilience: Learning from mistakes and not being discouraged by challenges.
- Customer-focused Mindset: Basic understanding of user experience.
- Leadership and Time Management: Self-leadership, task management, and productivity.
What We Do
Verisure is the leading provider of peace of mind and protection to residential and small business customers across Europe and Latin America. We deliver professionally monitored security services to over 5.5 million customers in 17 countries, with a team of more than 28,000 colleagues.
Verisure’s brand family includes: Securitas Direct in Spain and Portugal (*), AlertAlarm, Dansikring Direct, Falck Alarms, Mediaveil, TeleAtlantic and NorAlarm to name a few!
Our alarms are the most widely installed home security systems in Europe. A strong focus on quality and service means our customers are among the most satisfied in the industry!
GROWTH
Verisure has enjoyed consistent growth over the past 35 years as a result of its highly entrepreneurial and innovative approach to business. We also continue to expand internationally.
- Strong and visionary Management Team and a robust business plan for value creation.
- We are a big company with a start-up mindset: fast, agile and lean, merit-based, high-performance and value-driven.
INNOVATION
- We continuously invest in innovation to provide effective, intelligent and reliable security solutions.
- Offer a breakthrough product & service proposition: identify, research, develop, test & refine advanced security solutions.
- Develop exclusive hardware and software features.
- Research & Development centers in Madrid and Malmö.
- More than 600 R&D and IT experts… and growing!
PEOPLE
Our successful growth is dependent on our talent pipeline. Our People are our business! We are:
- Passionate in everything we do
- Committed to making a difference
- Always Innovating
- Winning as a Team
- With Trust & Responsibility