Senior Site Reliability Engineer (SRE)

3 Locations
Senior level
Fintech • Payments • Financial Services
The Role
As a Senior Site Reliability Engineer, you will ensure the reliability and performance of systems by designing scalable architectures, optimizing CI/CD pipelines, automating tasks, leading incident management, and mentoring team members. Your expertise will enhance development velocity and customer satisfaction.
Summary Generated by Built In

Description

About Zeal Group

Zeal Group is an award-winning FinTech organisation offering a variety of products. Founded in 2017, we have grown to a team of 700+ employees across the globe 🌎

Our offices and presence are spread across Europe, Asia, North & South Africa, the Middle East and South America, with our Technology hub located in Cyprus 🚀

We are a product- and people-focused company that is passionate about growth, innovative technology, and collaboration 🙌🏼
About the Role

We are looking for a Senior Site Reliability Engineer (SRE) to join our engineering team and help drive the reliability, scalability, and performance of our infrastructure. As a Senior SRE, you will play a key role in architecting and maintaining highly available systems, optimizing our CI/CD pipelines, automating repetitive tasks, and ensuring seamless deployment and observability for our services. Your contributions will have a direct impact on our development velocity, service uptime, and overall customer satisfaction. Our SRE team is fully responsible for our cloud infrastructure, including its fault tolerance and performance. A separate DevOps team supports the development teams and their pipelines.
Responsibilities:

  • System Design & Architecture: Collaborate with software engineers and DevOps to design and implement resilient and scalable systems, focusing on high availability, fault tolerance, and disaster recovery.
  • Automation & Infrastructure as Code: Develop and maintain infrastructure automation scripts and tools using Terraform, Ansible, or similar technologies, ensuring reproducibility and consistency across environments.
  • CI/CD Pipeline Optimization: Build and enhance CI/CD pipelines to accelerate deployment speed and reduce time to market, including implementing blue-green or canary deployments where applicable.
  • Monitoring, Alerting & Logging: Create, manage, and refine monitoring dashboards and alerting systems using tools like Prometheus, Grafana, and Elasticsearch to proactively detect and address potential issues before they impact customers.
  • Incident Management & Troubleshooting: Lead incident response efforts, perform root cause analysis, and implement long-term fixes to prevent recurrence, ensuring a fast, reliable response to production issues.
  • Performance Tuning: Conduct regular performance testing and tuning, identifying bottlenecks in infrastructure performance and system resources.
  • Mentorship & Leadership: Guide and mentor other team members, sharing best practices and helping to build a culture of reliability and performance within the engineering organization.
Requirements:
  • 5+ years of experience in SRE, DevOps, or a similar role, with a proven track record of managing large-scale, distributed systems.
  • Strong knowledge of Linux/Unix systems and networking fundamentals.
  • Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Hands-on experience with infrastructure as code (IaC) tools such as Terraform and Ansible.
  • Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, ELK).
  • Knowledge of technology for storing and delivering secrets to microservices (HashiCorp Vault).
  • Cloud Expertise: Experience with a cloud platform (GCP) and an understanding of cloud-native architectures.
  • Problem-Solving Skills: Strong analytical and problem-solving skills, with a focus on building automated, scalable solutions to complex challenges.
  • Collaboration: Ability to work cross-functionally with engineering, product, and support teams, with excellent communication and collaboration skills.

Technology Stack:

  • CDN providers: Akamai, EdgeNext
  • Cloud Platform: GCP
  • Orchestration: Kubernetes
  • CI/CD: GitLab, ArgoCD
  • IaC: Terraform, Ansible
  • Event streams: Kafka, RabbitMQ
  • Logging: Elasticsearch, Kibana, Filebeat, Logstash
  • Monitoring: Prometheus/VictoriaMetrics, Grafana, Alertmanager, PagerDuty
  • Secret Management: HashiCorp Vault, External Secrets Operator
  • Artifact repository: Sonatype Nexus
  • Object storage: GCS, MinIO

Top Skills

Bash
Go
Python
The Company
Amsterdam
348 Employees
On-site Workplace
Year Founded: 2017

What We Do

Zeal Group is an award-winning FinTech organisation offering a variety of products. Founded in 2017, we have grown to a team of 700+ employees across the globe.
Our offices and presence are spread across Europe, Asia, North & South Africa, the Middle East and South America, with our Technology hubs located in Cyprus and the Netherlands.
We are a product- and people-focused company that is passionate about growth, innovative technology, and collaboration.
