Octus

Site Reliability Engineer

Posted 8 Days Ago

Bogotá, Bogotá, D.C.

Hybrid

Senior level

Fintech • News + Entertainment • Software • Database • Financial Services

The Role

The Site Reliability Engineer will build and maintain scalable services, automate processes, improve system reliability, and collaborate with engineering teams on production issues.

Summary Generated by Built In

Octus

Octus is a leading global provider of credit intelligence, data, and analytics. Since 2013, tens of thousands of professionals across hedge fund, investment banking, management consulting, and law firm verticals have come to rely on Octus to make better, faster, and more confident decisions in pace with the fast-moving credit markets.
For more information, visit: https://octus.com/

Working at Octus

Octus hires growth-minded innovators and trailblazers across the globe to drive our business and culture. Our core values – Action Oriented, Customer First Mindset, Effective Team Players, and Driven to Excel – define an organizational ethos that’s as high-performing as it is human. Among other perks, Octus employees enjoy competitive health benefits, matched 401k and pension plans, PTO, generous parental leave, gym subsidies, educational reimbursements for career development, recognition programs, pet-friendly offices (US only), and much more.

Role

We’re looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure across our cloud environment. Site Reliability Engineers combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes wherever possible, using whichever tools are best for the job. You’ll be the expert on the environments in which you operate infrastructure, helping partner teams build and configure their software to run reliably within them.

We strongly believe in engineering teams being responsible for the operations of their services in production. In this role, you’ll work closely with engineers to advocate for and participate in sensible, scalable systems design, and share responsibility with them for diagnosing, resolving, and preventing production issues.

What you'll do:

Identify, assess, and mitigate risks associated with our systems, applications, and infrastructure.
Proactively recognize sources of instability in distributed systems and analyze how complex systems fail from a reliability and resilience perspective.
Improve our applications' availability, reliability, and observability, and reduce outages to a minimum.
Implement DR strategies, including backups and recovery techniques with minimal downtime for different applications.
Automate and codify our tooling, processes, and infrastructure to speed up development and make our workflows repeatable and error-proof.
Deep dive into issues and outages to establish root causes and communicate them to your business partners.
Write and maintain thorough documentation to share with your teammates around the world, allowing them all to function as a cohesive unit.
Participate in a 24/7 weekly on-call rotation with members of your team to troubleshoot incidents in a complex distributed systems environment.
Create meaningful metrics and alerting for service health monitoring.

Skills and knowledge you should possess:

Bachelor's degree in Computer Science or a related field, or equivalent experience
5+ years of experience in SRE, DevOps, or systems engineering
Proficient in command-line interface (CLI) operations, scripting (Python or Bash), and Linux system administration
Extensive experience working with Infrastructure as code technologies, preferably Terraform
Extensive experience working with major cloud providers, preferably AWS
Significant experience working with observability and telemetry tools (Datadog, AWS CloudWatch, New Relic, Prometheus, Grafana, etc.)
Professional experience working with at least one general-purpose programming language (Python, PHP, Go, C#, etc.)
Experience building CI/CD workflows with tools like Jenkins, CircleCI, GitHub Actions, or AWS CodePipeline
Fundamental understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP

Bonus points (nice skills to have):

Database systems fundamentals (MySQL/Postgres) and experience administering databases at scale, including schema and query optimization
Familiarity with event-driven systems and messaging infrastructure (Kafka, RabbitMQ, AWS Kinesis, etc.)
Experience working with container and serverless platforms such as Docker, AWS ECS, Kubernetes, and AWS Lambda
Experience working with web servers such as Nginx, Apache, Tomcat, etc.
Application security, infrastructure security, and SOC 2 compliance experience

Equal Employment Opportunity

Octus is committed to providing equal employment opportunities to all employees and applicants for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, pregnancy, veteran status, or any other legally protected status. We strive to create an inclusive and diverse work environment where all individuals are valued, respected, and treated fairly. We believe that diversity enriches our workplace and enhances our ability to innovate and succeed.

Top Skills

AWS

AWS CloudWatch

AWS CodePipeline

Bash

CircleCI

Datadog

DNS

GitHub Actions

Grafana

HTTP

Jenkins

Linux

New Relic

Prometheus

Python

SMTP

TCP/IP

Terraform

TLS

The Company

HQ: New York, NY

708 Employees

Hybrid Workplace

Year Founded: 2013

What We Do

Founded in 2013, Octus (formerly Reorg) has fundamentally changed the way financial and legal professionals access complex and opaque business information.

Our unique editorial team combines reporting with financial and legal analysis to provide a holistic view of topical situations and delivers that view in real time through our proprietary platform, which is powered by machine learning and natural language processing applications.

Today, with offices on three continents, Reorg serves 26,000 professionals across the world’s leading hedge funds, asset managers, investment banks, law firms and financial advisors so they can make better business, investment and advisory decisions. Our vision is to be the best-in-class provider of complex and opaque credit information delivered in a clear, actionable way.

Why Work With Us

Reorg hires innovators and trailblazers across the globe to drive our business and our incredible corporate culture alike. Our core values define an organizational ethos that’s as high-performing as it is human. Reorg employees enjoy competitive health benefits, matched 401k and pension plans, and educational reimbursements for career development.

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Reorg has adopted a hybrid working policy. For non-remote employees located within a reasonable commuting distance to one of our offices, the requirement is to work from the office at least 2 days per week.

Typical time on-site: 2 days a week

NYC Office (HQ)

Bucharest Office

El Segundo Office

London Office

Pune Office

Vilnius Office

Washington DC Office
