Senior Site Reliability Engineer - US/Canada

Posted 24 Days Ago
Hiring Remotely in United States
Remote
Senior level
Artificial Intelligence • Cybersecurity
The Role
The Senior Site Reliability Engineer will enhance the reliability and scalability of DataVisor's infrastructure. Responsibilities include automating deployment pipelines, monitoring production systems, optimizing big data platforms, and maintaining cloud reliability. The role also involves collaboration with engineering to improve system performance and manage capacity.
Summary Generated by Built In

Description

DataVisor is a next-generation security company that uses industry-leading unsupervised machine learning to detect fraudulent activity in financial transactions, mobile user acquisition, social networks, commerce and money laundering. Our solution is used by some of the largest internet properties in the world, including Pinterest, FedEx, AirAsia, Synchrony Financial, Zomato and Ping An, to protect them from the ever-increasing risk of fraud. Our award-winning software is powered by a team of world-class experts in big data, security, and scalable infrastructure. Our culture is open, positive, collaborative, and results-driven. Come join us!

We are seeking a Senior Site Reliability Engineer (SRE) to join our growing team. The ideal candidate will have a passion for building reliable systems, experience with automation, and a solid understanding of large-scale distributed systems. You will work closely with the engineering team to improve reliability, scalability, and performance across our infrastructure.

You will report directly to the CTO and work with a team of seasoned engineers to automate our production environment, increase its reliability, and enhance its security. Projects include scaling our global, multi-cloud footprint, optimizing our large real-time decision platform, and improving the reliability of our global cloud footprint.

Requirements

  • 5+ years of experience with production environments running Linux
  • 3+ years of experience working with cloud solutions such as AWS, Azure or Aliyun
  • Familiarity with big data technologies such as Spark and/or Flink
  • A love of automating tasks through coding and scripting
  • Experience with algorithms, data structures, complexity analysis and software design
  • Strong coding skills in Python, Java and Bash

Key Responsibilities:

  • Design, implement, and maintain release automation pipelines to streamline the deployment process.
  • Develop systems for proactive monitoring, auto-diagnosis, and incident resolution in production environments.
  • Work with big data platforms such as Apache Spark or Apache Flink, optimizing and scaling our data processing pipelines.
  • Perform maintenance and troubleshooting for databases, with preference for experience in Yugabyte, ClickHouse, and MySQL.
  • Ensure the reliability of cloud infrastructure using Kubernetes on AWS or GCP.
  • Participate in on-call rotation to ensure system reliability, with a focus on automation to minimize manual intervention.
  • Collaborate with engineering teams to improve system performance and manage capacity planning.

Preferred Experience

  • Familiarity with container technologies such as Docker and Kubernetes
  • Experience with database best practices for Yugabyte, ClickHouse, and MySQL
  • Strong understanding of security best practices
  • Past completion of a SOC 2/PCI certification is a big plus
Benefits
  • Health insurance
  • PTO and sick days
  • 401K Plan

Top Skills

Bash
Java
Python
The Company
Mountain View, CA
112 Employees
On-site Workplace
Year Founded: 2013

What We Do

DataVisor is a leading AI-powered fraud and risk management platform that enables organizations to respond to fast-evolving cyber attacks and mitigate risks as they happen in real time. Our mission is to help large consumer-facing enterprises protect their businesses and their customers from digital threats and restore trust and safety online. DataVisor is venture-backed by New View Capital and Sequoia and is Series C funded. It is recognized as an industry leader and has been adopted by many Fortune 500 companies across the globe.
