Inetum

Site Reliability Engineer

Bucharest, București

Senior level

Information Technology • Consulting

The Role

As a Site Reliability Engineer, you will enhance system reliability and performance by automating operations, monitoring system performance, managing incident protocols, and collaborating with development teams on risk analysis and mitigation strategies. Your role also includes maintaining CI/CD pipelines and contributing to disaster recovery planning.

Company Description

Our Mission Statement

Digital and human resources at the center of the sustainable development of our society.

In a world of continuous transformation, accelerated by technological developments and societal challenges, it is necessary to adapt in an ongoing, agile way to meet the challenges of the future.

About Inetum

Present in 19 countries with a dense network of sites, Inetum partners with major software publishers to meet the challenges of digital transformation with proximity and flexibility. Driven by its ambition for growth and scale, Inetum generated sales of 2.5 billion euros in 2023.

For more information, visit: www.inetum.com

Job Description

Approach operations challenges with a software engineering perspective. Responsibilities include:

Monitor and appropriately address system issues.
Create strategies to detect issues.
Design systems to troubleshoot automatically.
Write and review post-mortems.
Collaborate with development teams and other stakeholders to identify potential risks.
Analyze and evaluate the potential impact and likelihood of each identified risk.
Based on the risk assessment, implement mitigation strategies to reduce operational risk.
Continuously monitor and review the effectiveness of these risk strategies.
Study historical performance trends using metrics, charts, and graphs.
Trace problems using system monitoring tools.
Monitor log files to manage infrastructure at scale.
Minimize MTTR by resolving incidents quickly, reducing downtime and keeping systems reliable.
Maintain internal tooling.
Monitor system performance, identify bottlenecks, and optimize pipelines.
Implement comprehensive service metrics to track and report on system reliability, performance, and efficiency.
Develop and maintain CI/CD pipelines, enhancing the consistency and speed of software deployment.
Automate routine tasks and create tools to improve team efficiency and system robustness.
Collaborate with development teams to integrate operational considerations into the software development life cycle.
Manage incident response protocols, including on-call rotations for junior engineers and strategic planning for senior personnel.
Conduct post-incident reviews to prevent recurrence and refine the system reliability framework.
Contribute to disaster recovery plans and ensure robust backup systems are in place.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability with well-defined service-level objectives.
Work on-call shifts to prevent incidents before they happen.
Run our infrastructure with Ansible, Terraform, GitLab CI/CD, and Kubernetes.

Qualifications

Experience using Linux, UNIX, and Windows.
Database administration and maintenance: Oracle, Cassandra, PostgreSQL, AWS database setups, caching databases.
Familiar with Git, Jira, Jenkins, and Ansible.
Strong knowledge of DevOps and CI/CD pipelines (GitHub, Terraform).
Knowledge of monitoring solutions: Grafana, Prometheus, Dynatrace.
Hands-on AWS implementation experience across a broad range of AWS services.
Must have AWS development experience (containerization with Docker, Amazon EKS, Lambda, EC2, S3, Amazon DocumentDB, PostgreSQL).
Experience with core AWS platform architecture, including areas such as: Organizations, Account Design, VPC, Subnet, segmentation strategies.
Comfortable working with cloud-native infrastructure, such as AWS Lambda, Google App Engine, and Azure Cloud Services.
Backup and Disaster Recovery approach and design.
Environment and application automation.
Proficiency in programming languages such as Python, Go, or Java.
Familiar with encryption, logging, and privacy/security protocols and tooling (e.g., TLS 1.2, ELK stack).
Good knowledge of REST/SOAP/JSON web service API implementation.
Bachelor's degree in Computer Science, Information Technology, or a related field.
Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation certification.
Strong understanding of cloud-based applications and infrastructure, including AWS, Azure, or Google Cloud.
Experience with IT operations best practices such as ITIL, COBIT, or DevOps.
Experience with IT service management tools such as ServiceNow or Remedy.
Familiarity with banking customer acquisition applications is preferred.

Additional Information

Benefits

Full access to a foreign language learning platform
Personalized access to tech learning platforms
Tailored workshops and training to sustain your growth
Medical subscription
Meal tickets
Monthly budget to allocate on a flexible benefits platform
Access to 7 Card services
Wellbeing activities and gatherings

Hybrid: 1-2 days/week from office (Bucharest)

Top Skills

Java

Python

The Company

20,111 Employees

On-site Workplace

What We Do

Inetum is a European leader in digital services. Inetum’s team of 28,000 consultants and specialists strives every day to make a digital impact for businesses, public sector entities and society. Inetum’s solutions aim to contribute to its clients’ performance and innovation as well as the common good.

Top Employer Europe 2024