Empowering Africa’s tomorrow, together…one story at a time.
We have over 100 years of rich history and are strongly positioned as a local bank with regional and international expertise. A career with our family offers the opportunity to be part of this exciting growth journey, to reset our future and shape our destiny as a proudly African group.
Job Summary
Work as part of an integrated (run & build) tribe in lower-complexity environments to provide enterprise-wide application support across multiple stakeholder groups by maintaining and optimizing enterprise-grade applications (tech products and services).
Job Description
We are seeking a skilled and motivated Specialist Site Reliability Engineer (SRE) to join the CIB TxB Technology Foreign Exchange team. This role requires a professional with expertise in DevOps pipelines, AWS, and automation, alongside strong troubleshooting and monitoring capabilities. The successful candidate will play a crucial role in maintaining and improving the reliability, scalability, and performance of our systems, ensuring seamless operations and minimal downtime.
Key Responsibilities:
DevOps and Pipeline Management:
- Design, implement, and maintain CI/CD pipelines to ensure efficient and reliable software delivery.
- Automate deployment processes to minimize manual interventions and optimize deployment speed.
- Collaborate with development teams to integrate DevOps best practices into the development lifecycle.
Cloud Infrastructure (AWS):
- Manage and optimize AWS cloud infrastructure to ensure scalability, cost-effectiveness, and reliability.
- Implement robust infrastructure-as-code solutions using tools such as Terraform or CloudFormation.
- Monitor and manage cloud resources, ensuring high availability and adherence to security best practices.
Monitoring and Alerting:
- Configure and maintain application performance monitoring (APM) tools, including New Relic and Grafana.
- Establish and maintain effective alerting systems, integrating with tools like OpsGenie.
- Analyze and resolve system issues proactively using logs and metrics.
Database Management:
- Monitor and maintain PostgreSQL databases, optimizing performance and ensuring data integrity.
Troubleshooting and Incident Response:
- Perform root cause analysis for system failures and implement preventive measures.
- Respond to incidents and outages, minimizing impact and ensuring prompt resolution.
Automation and Health Tasks:
- Develop scripts and tools (e.g., PowerShell) to automate routine operational tasks and health checks.
- Implement and maintain certificate upgrading processes to ensure secure communications.
- Identify repetitive tasks and design automation to improve operational efficiency.
User Support and Incident Logging:
- Provide technical support to users, troubleshooting and resolving issues as needed.
- Log, manage, and track support incidents using ServiceNow (SNOW).
- Collaborate with relevant teams to ensure timely resolution of user-reported problems.
- Provide updates and feedback to the business on incidents and MIM calls.
Collaboration and Communication:
- Partner with cross-functional teams to ensure alignment on reliability and performance objectives.
- Share knowledge and mentor team members on SRE best practices and tools.
Required Skills and Qualifications:
Technical Expertise:
- 5+ years of Information Technology experience, with a minimum of 3 years of relevant application support experience.
- Proficiency in DevOps pipelines and CI/CD tools such as Azure DevOps.
- Strong hands-on experience with AWS services, including EC2, RDS, S3, Lambda, and CloudWatch.
- Proficiency in scripting languages such as PowerShell, Bash, or Python.
- Solid understanding of monitoring tools like New Relic, Grafana, and Prometheus.
- Experience with PostgreSQL database management and optimization.
Problem-Solving:
- Proven troubleshooting skills with a focus on diagnosing and resolving complex system issues.
- Experience with APM tools to track and resolve performance bottlenecks.
Automation:
- Demonstrated experience in automating operational tasks and processes, including health monitoring and certificate management.
Alerting and Incident Management:
- Hands-on experience with OpsGenie or similar alerting platforms.
- Familiarity with incident response processes and tools.
Communication and Collaboration:
- Strong written and verbal communication skills, with the ability to collaborate effectively across teams.
- Ability to document processes and procedures clearly and concisely.
Preferred Qualifications (or working toward these):
- Certifications such as AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or PostgreSQL certifications.
- Experience with containerization technologies like Docker and Kubernetes.
- Familiarity with ITIL or SRE principles and practices.
Why Join Us?
- Opportunity to work on cutting-edge technologies and influence system architecture.
- Be part of a collaborative, innovative, and growth-oriented team.
Education
Bachelor's Degree: Information Technology
Absa Bank Limited is an equal opportunity, affirmative action employer. In compliance with the Employment Equity Act 55 of 1998, preference will be given to suitable candidates from designated groups whose appointments will contribute towards achievement of equitable demographic representation of our workforce profile and add to the diversity of the Bank.
Absa Bank Limited reserves the right not to make an appointment to the post as advertised.
What We Do
Absa Group Limited (Absa) has forged a new way of getting things done, driven by bravery and passion, with the readiness to realise growth on the African continent and beyond.
We’re a truly African brand, inspired by the people we serve in Botswana, Ghana, Kenya, Mauritius, Mozambique, Seychelles, South Africa, Tanzania, Uganda, and Zambia. We also have representative offices in China, Namibia, Nigeria and the United States, as well as securities entities in the United Kingdom and the United States, along with technology support colleagues in the Czech Republic.