Site Reliability Engineering Leader

Posted 15 Days Ago
Hiring Remotely in Argentina
Remote
Mid level
Fintech • Payments • Financial Services
The Role
As a Site Reliability Engineering Leader at dLocal, you will design and implement resilient systems, develop quality gates based on service level objectives, automate validation processes, influence architectural decisions, and collaborate across teams to meet monitoring and alerting requirements.
Summary Generated by Built In

Why should you join dLocal?


dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world’s fastest-growing, emerging markets. 


By joining us you will be a part of an amazing global team that makes it all happen, in a flexible, remote-first dynamic culture with travel, health, and learning benefits, among others. Being a part of dLocal means working with 900+ teammates from 25+ different nationalities and developing an international career that impacts millions of people’s daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.


What's the opportunity?


We are looking for a Site Reliability Engineer (SRE) to join our team! As our SRE, you will focus on designing and implementing systems that are highly resilient, scalable and reliable. You will be part of a talented team that works on mission-critical applications with big customers like Netflix, Amazon, Nike, Facebook & more!


An SRE asks the necessary questions:

What data do we need in order to understand how our systems are performing?

How do we collect this data?

What patterns are we looking for in the data and what do they mean?

Who should be notified when a certain system is not working properly?

Do we have any systems that we need more data for?


An SRE designs systems and processes to answer the questions above and to provide automated support and response where possible.

What will you do?

  • Develop quality gates based on production-level service level objectives (SLOs) to detect issues earlier in the development cycle.
  • Automate build testing and validation using service-level indicators (SLIs) and SLOs.
  • Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
  • Design processes, playbooks and checklists for other engineers to follow during and after incidents.
  • Write postmortems and perform technical after-action reviews to understand root causes and propose system improvements that reduce overall fault rates.
  • Interact with members from almost all teams across the business to understand their monitoring, alerting and SLO / SLA requirements and design systems and processes that ensure we meet or exceed these requirements.
  • Automate the provisioning of monitoring tools and rules with tools like Terraform and Ansible / Chef.
  • Design base level requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately at a basic level.
  • Monitor both the technical health and the security health of dLocal infrastructure and systems.
  • Optimize signal-to-noise ratio for alerting to ensure we receive only the alerts that are actionable and make sense.
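
To make the first two responsibilities more concrete, here is a minimal sketch of an SLO-based quality gate in Python. It is only an illustration and not a description of dLocal's tooling: the `Slo` class, the `availability_sli` and `gate_passes` functions, the "checkout-availability" name, the 99.9% target and the request counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability objective, e.g. 99.9% successful requests."""
    name: str
    target: float  # required fraction of good events, between 0 and 1


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI as the fraction of good events; an empty window counts as healthy."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def gate_passes(slo: Slo, good_events: int, total_events: int) -> bool:
    """Return True when the measured SLI meets or exceeds the SLO target."""
    return availability_sli(good_events, total_events) >= slo.target


if __name__ == "__main__":
    checkout = Slo(name="checkout-availability", target=0.999)
    # Illustrative numbers from a hypothetical pre-release load test.
    ok = gate_passes(checkout, good_events=99_950, total_events=100_000)
    print("gate passed" if ok else "gate failed")
```

In a CI pipeline, a check like this could run after a load or canary test and fail the build when the measured SLI falls below the target, so that regressions surface before they reach production.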

Which skills do you need?

  • Over 3 years of experience as an SRE or in a very similar role
  • Experience with monitoring tools such as New Relic, Datadog, Nagios
  • Experience working with tools such as Jira, PagerDuty and Confluence and integrating these tools with automated processing techniques (API integrations)
  • Experience with CI/CD tools such as GitHub Actions, Jenkins, Spinnaker, ArgoCD or similar
  • Knowledge of security best practices and infosec tooling. (You will be writing systems to monitor for breaches and vulnerabilities.)
  • Strong communication skills
  • Problem-solving skills
  • Detail-oriented person
  • Highly analytical person
  • Ability to collaborate across multi-functional teams
  • Cloud experience (AWS) is highly advantageous (as most systems will integrate with AWS at some level).
  • IaC experience with a tool like Terraform is highly advantageous 
  • CaC experience with a tool like Ansible, Chef or Salt is highly advantageous
  • Database knowledge is highly advantageous (both in terms of how they perform and SQL syntax).

What happens after you apply?


Our Talent Acquisition team is invested in creating the best candidate experience possible, so don’t worry, you will definitely hear from us. We will review your CV and keep you posted by email at every step of the process!


Also, you can check out our webpage, LinkedIn, Instagram, and YouTube for more about dLocal!

Top Skills

Ansible
ArgoCD
AWS
Chef
CI/CD
Datadog
GitHub Actions
Jenkins
Nagios
New Relic
Spinnaker
Terraform
The Company
932 Employees
On-site Workplace
Year Founded: 2016

What We Do

dLocal started with one goal – to close the payments innovation gap between global enterprise companies, and customers in emerging economies. We have over 900 payment methods, in more than 40 countries.

With the ability to accept local payment methods and facilitate cross-border fund settlement worldwide, our merchants reach billions of underserved consumers in the high-growth markets of Africa, Asia, and Latin America. dLocal offers the ideal payment solutions for global commerce:

Payins: Accept local payment methods
Payouts: Compliantly send funds cross-border
Defense Suite: Manage fraud effectively
dLocal for Platforms: Unify your platform’s payment solution
Local Issuing: Localize payments for your gig-economy workers, suppliers, and partners
