Site Reliability Engineer

Posted 12 Days Ago
Hiring Remotely in USA
Remote
Mid level
Gaming • News + Entertainment • eSports
The Role
As a Site Reliability Engineer at Pavilion Payments, you will build resilient infrastructure and improve system reliability through monitoring, incident management, and automation, and you will collaborate with IT leadership on service level objectives. You'll also work with cloud, network, and security teams to optimize and maintain operations and ensure compliance with best practices.

Pavilion Payments enables the world’s gaming entertainment leaders to create amazing consumer experiences and maximize spend across all of their physical and digital properties. Our complete suite of payment solutions enables safe, secure, and trusted cash access at the cage, on the casino floor, or online. Our compliance and security solutions offer additional layers of automation and risk protection. And our analytics solutions enable clients to view performance across all of their gaming properties.

About the Role

As Pavilion Pay’s inaugural Site Reliability Engineer (SRE), you will play a foundational role in building a resilient infrastructure and ensuring high availability across our systems. This position, part of IT Operations, will work closely with Network, Cloud Infrastructure, DevOps, and Cloud Architects to implement best practices in system reliability, observability, and automated response. This role emphasizes reliability, platform management, and network security.

Key Responsibilities:

Reliability and Incident Management

  • Establish and track reliability metrics such as Latency, Traffic, Errors, and Capacity, focusing on uptime across applications and products, with plans to expand monitoring to kiosk and edge networks.
  • Develop and refine monitoring systems using Grafana to ensure comprehensive visibility, focusing on continuous improvements in reliability.
  • Establish robust processes for incident response and root cause analysis, leveraging Opsgenie to ensure timely and structured responses.
  • Work with Tailscale, SUSE, and F5 to support secure, resilient network connectivity and load balancing.

Platform Management and Service Objectives

  • Collaborate with IT leadership to define and maintain service level objectives (SLOs) and monitor performance against these standards.
  • Structure and optimize platform management with a focus on supporting uptime in our production environment.

Automation, IaC, and CI/CD Pipelines

  • Develop and maintain Terraform configurations for scalable, repeatable infrastructure deployment, focusing on minimizing manual tasks and ensuring resource consistency.
  • Work with DevOps to optimize CI/CD workflows using Azure DevOps, focusing on pipeline automation and deployment efficiency.
  • Automate repetitive tasks and enhance deployment processes within AKS and Azure environments, aiming to reduce potential deployment bottlenecks.

Network and Security Collaboration

  • Partner with network engineers to optimize and maintain F5 load balancers and Palo Alto Networks/Panorama for secure, resilient network operations.
  • Collaborate with security teams to ensure network traffic and access patterns align with security best practices, integrating observability into network operations.

Requirements:

  • Technical Skills Desired: Proficiency with SUSE, AKS, Linux, Azure Cloud, Grafana, Rancher, Terraform, and Azure DevOps pipelines.
  • Monitoring Tools: Strong experience with Grafana for observability and Opsgenie for incident response, with a focus on maintaining uptime and proactive alerts.
  • Automation and Scripting: Proficiency in scripting (e.g., Bash, Python) and experience with Tailscale for secure networking solutions.
  • Problem-Solving Mindset: Experience in identifying and remediating performance and security issues, focusing on proactive, long-term solutions.

First 90 Days:

  1. Understand the Product: Deepen familiarity with Pavilion Pay's products and their interdependencies.
  2. Develop Monitoring Structures: Work with current Grafana structures, defining future enhancements.
  3. Network Architecture and Monitoring: Learn our network architecture and support basic monitoring/alerting systems currently in place.
  4. Platform Familiarization: Gain familiarity with platform elements, especially Terraform, CI/CD, and SLO definitions, to support a highly reliable production environment.


Pavilion Payments provides equal employment opportunities to all employees and applicants for employment without regard to race, color, religion, sex (including pregnancy), national origin, ancestry, age, marital status, sexual orientation, gender identity or expression, disability, veteran status, genetic information, or any other basis protected by law. Applicants requiring reasonable accommodation in the application and/or interview process should notify a representative of the Human Resources Department.

Top Skills

Azure
Bash
Linux
Python
The Company
HQ: Las Vegas, Nevada
104 Employees
On-site Workplace
Year Founded: 1995

What We Do

Visit our website at pavilionpayments.com to learn more about our solutions for casino debit and credit card cash advance, e-check, ATM, full-service TITO and payment kiosks, Anti-Money Laundering (AML) compliance assistance, layered security, and analytics.
