Site Reliability Engineer

Posted 2 Days Ago
Bogotá, Bogotá, D.C.
Hybrid
Senior level
Fintech • News + Entertainment • Software • Database • Financial Services
The Role
As a Site Reliability Engineer at Octus, you'll build, operate, and maintain high-performance services in a cloud environment. Responsibilities include risk mitigation, improving application reliability, automating tooling, diagnosing outages, and creating metrics for service monitoring, while participating in a 24/7 on-call rotation.
Summary Generated by Built In

Octus

Octus is a leading global provider of credit intelligence, data, and analytics. Since 2013, tens of thousands of professionals across hedge fund, investment banking, management consulting, and law firm verticals have come to rely on Octus to make better, faster, and more confident decisions in pace with the fast-moving credit markets.
For more information, visit: https://octus.com/

Working at Octus

Octus hires growth-minded innovators and trailblazers across the globe to drive our business and culture. Our core values – Action Oriented, Customer First Mindset, Effective Team Players, and Driven to Excel – define an organizational ethos that’s as high-performing as it is human. Among other perks, Octus employees enjoy competitive health benefits, matched 401k and pension plans, PTO, generous parental leave, gym subsidies, educational reimbursements for career development, recognition programs, pet-friendly offices (US only), and much more. 
Role

We’re looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure across our cloud environment. Site Reliability Engineers combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes wherever possible, using whichever tools are best for the job. You’ll be the experts for the environments in which you operate infrastructure, helping partner teams build and configure their software to run reliably within them.

We strongly believe in engineering teams being responsible for the operations of their services in production. In this role, you’ll work closely with engineers to advocate for and participate in sensible, scalable systems design, and share responsibility with them in diagnosing, resolving, and preventing production issues.

What you'll do:

  • Identify, assess, and mitigate risks associated with our systems, applications, and infrastructure.
  • Proactively recognize sources of instability in distributed systems and analyze how complex systems fail from a reliability and resilience perspective.
  • Improve our applications' availability, reliability, and observability, and reduce outages to a minimum.
  • Implement DR strategies, including backups and recovery techniques with minimal downtime for different applications.
  • Automate and codify our tooling, processes, and infrastructure to speed up development and make them repeatable and error-proof.
  • Deep dive into issues and outages to establish root causes and communicate them to your business partners.
  • Write and maintain thorough documentation to share with your teammates around the world, allowing them all to function as a cohesive unit.
  • Participate in a 24/7 weekly on-call rotation with members of your team to troubleshoot incidents in a complex distributed systems environment.
  • Create meaningful metrics and alerting for service health monitoring.

Skills and knowledge you should possess:

  • Bachelor's degree in Computer Science or a related field, or equivalent experience
  • 5+ years of experience in SRE, DevOps, or systems engineering
  • Proficient in command-line interface (CLI) operations, scripting (Python or Bash), and Linux system administration
  • Extensive experience working with infrastructure-as-code technologies, preferably Terraform
  • Extensive experience working with major cloud providers, preferably AWS
  • Significant experience working with observability and telemetry tools (Datadog, AWS CloudWatch, New Relic, Prometheus, Grafana, etc.)
  • Professional experience working with at least one general-purpose programming language (Python, PHP, Go, C#, etc.)
  • Experience building CI/CD workflows with tools like Jenkins, CircleCI, GitHub Actions, or AWS CodePipeline
  • Fundamental understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP, SMTP

Bonus points (nice skills to have):

  • Database systems fundamentals (MySQL/Postgres) and experience administering them at scale, including schema and query optimization
  • Familiarity with event-driven systems and messaging infrastructure (Kafka, RabbitMQ, AWS Kinesis, etc.)
  • Experience working with container and serverless technologies such as Docker, AWS ECS, Kubernetes, and AWS Lambda
  • Experience working with web servers such as Nginx, Apache, and Tomcat
  • Application security, infrastructure security, and SOC 2 compliance experience

Equal Employment Opportunity

Octus is committed to providing equal employment opportunities to all employees and applicants for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, pregnancy, veteran status, or any other legally protected status. We strive to create an inclusive and diverse work environment where all individuals are valued, respected, and treated fairly. We believe that diversity enriches our workplace and enhances our ability to innovate and succeed.

Top Skills

Bash
C#
Go
PHP
Python

The Company
HQ: New York, NY
708 Employees
Hybrid Workplace
Year Founded: 2013

What We Do

Founded in 2013, Reorg has fundamentally changed the way financial and legal professionals access complex and opaque business information.

Our unique editorial team combines reporting with financial and legal analysis to provide a holistic view of topical situations and delivers that view in real time through our proprietary platform, which is powered by machine learning and natural language processing applications.

Today, with offices on three continents, Reorg serves 26,000 professionals across the world’s leading hedge funds, asset managers, investment banks, law firms and financial advisors so they can make better business, investment and advisory decisions. Our vision is to be the best-in-class provider of complex and opaque credit information delivered in a clear, actionable way.

Why Work With Us

Reorg hires innovators and trailblazers across the globe to drive our business and our incredible corporate culture alike. Our core values define an organizational ethos that’s as high-performing as it is human. Reorg employees enjoy competitive health benefits, matched 401k and pension plans, and educational reimbursements for career development.

Octus Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

Reorg has adopted a hybrid working policy. For non-remote employees located within a reasonable commuting distance to one of our offices, the requirement is to work from the office at least 2 days per week.

Typical time on-site: 2 days a week
NYC Office (HQ)
Bucharest Office
El Segundo Office
London Office
Pune Office
Vilnius Office
Washington DC Office