Lead Observability Engineer

Posted 14 Days Ago
Dallas, TX
Senior level
Healthtech • Travel
The Role
The Lead Observability Engineer leads a team to implement observability solutions, focusing on monitoring systems and troubleshooting incidents to maintain software availability. This role involves developing infrastructure as code scripts, designing monitoring solutions across various domains, and ensuring compliance with policies while also serving as a thought leader in observability and site reliability engineering.
Summary Generated by Built In

Job Description:

Position Overview

The primary responsibility of the Lead Observability Engineer is leading the technical direction and implementation of the pipelines and infrastructure focused on monitoring and observability within Sands.

This platform is essential, providing the tooling, practices, and visibility that our infrastructure and development engineering teams leverage to observe and maintain the environments and platforms in our cloud and on-premises systems.

The Lead Observability Engineer will be responsible for a team of Observability Engineers, ensuring that technical infrastructure build and operational teams have effective tools to monitor, observe, and operate systems and platforms within the framework of large enterprise compliance and governance needs.

The team will develop, maintain, and execute infrastructure-as-code scripts and playbooks to automate deployment and maintenance tasks, ensuring the availability, reliability, and efficient operation of the enterprise systems.

The position demands someone who is highly technically competent, detail oriented, and driven to stay current with evolving technologies.

All duties are to be performed in accordance with departmental and Las Vegas Sands Corp.’s policies, practices, and procedures. All Las Vegas Sands Corp. Team Members are expected to conduct and carry themselves in a professional manner at all times. Team Members are required to observe the Company’s standards, work requirements and rules of conduct.   

Essential Duties & Responsibilities

  • Lead a team of observability engineers in designing and implementing observability solutions, monitoring system health, and troubleshooting incidents to ensure high availability and performance of software applications and infrastructure.

  • Work with the Central Head of Operations to decide and execute on priorities for the monitoring, alerting, and observability KPIs that are required.

  • Develop solutions to observability demands.

  • Deliver broad services that cover the following domains:

    • Log Collection and Analysis

    • Operational Metrics

    • Distributed Tracing

    • Build, Test, and Deployment Automation

    • Platform reliability engineering monitoring

  • Act as an evangelist for the observability domain across the enterprise and influence IT stakeholders to apply observability best practices.

  • Design, develop, and maintain automation solutions to support observability and operations, focusing on improving system monitoring, alerting, and reporting capabilities.

  • Provide technology and/or process solutions to high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards.

  • Own, develop, and be accountable for observability policies, processes, and architectural decisions.

  • Responsible for ensuring operational methods, procedures, facilities, and tools are established, reviewed, and maintained.

  • Monitor and research emerging observability trends and technologies with the potential to improve efficiency, security, and business capabilities.

  • Develop and execute proof-of-concept projects to evaluate new solutions for potential adoption.

  • Develop documentation (e.g., data flow diagrams, logical diagrams, and physical diagrams) and training in compliance with standards.

  • Apply enterprise design principles and best practices for implementing and supporting observability services.

  • Operate with a limited level of direct supervision and exercise independence of judgment and autonomy.

  • Serve as advisor and coach to less senior team members, allocating work as necessary.

  • Be a strong thought leader in observability and Site Reliability Engineering principles.

  • Consistently share standard methodologies and improve processes within and across teams.

  • Perform job duties in a safe manner.

  • Attend work as scheduled on a consistent and regular basis.

  • Perform other related duties as assigned.

Minimum Qualifications

  • At least 21 years of age.

  • Proof of authorization to work in the United States.

  • Bachelor’s Degree in Computer Science, Engineering or related discipline required.

  • Advanced degree in technology or engineering is a plus.

  • Must be able to obtain and maintain any certification or license, as required by law or policy. 

  • 5-10+ years of demonstrated experience leading distributed monitoring, observability, IT operations, DevOps, or SRE groups, with expertise in on-premises IT infrastructure, applications, and private and public cloud monitoring.

  • Experience with ITRS Geneos and Opsview is a plus.

  • Strong expertise with scripting in Python, Java, and RESTful services, with a focus on building high-throughput, high-volume distributed systems.

  • Strong expertise in Linux/Unix, Container orchestration (e.g., Kubernetes), container runtimes and optimization.

  • Strong understanding of Site Reliability Engineering and DevOps principles.

  • Strong technical acumen in Cloud Architecture, Performance Benchmarking, and Capacity planning.

  • Demonstrated experience leading and growing engineers and teams.

  • Strong Cloud (AWS, GCP, Azure etc.) platform knowledge.

  • Proficiency in Project Management and work item management tools such as Azure DevOps and Portfolio.

  • Strong knowledge of logging systems, experience with ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or similar platforms.

  • Experience with tools like Harness, GitLab, Terraform, Ansible, or CloudFormation for managing and monitoring infrastructure.

  • Demonstrated experience diagnosing performance bottlenecks and other system issues using observability data.

  • Demonstrated understanding of and respect for IT service management practices (e.g., change, release, incident, problem management).

  • Able to multi-task and handle various types of requests from different people/areas.

  • Strong analytical and problem-solving skills.

  • Effective written and verbal communication skills in English.

Physical Requirements

Must be able to:

  • Physically access assigned workspace areas with or without reasonable accommodation.

  • Work indoors and be exposed to various environmental factors such as, but not limited to, CRT, noise, and dust.

  • Utilize laptop and standard keyboard to perform essential functions of the job.

Top Skills

Java
Python

The Company
HQ: Las Vegas, Nevada
947 Employees
On-site Workplace

What We Do

Founded in 1990, Las Vegas Sands is the preeminent developer and operator of world-class integrated resorts that drive valuable business and leisure tourism in the regions where we operate. Featuring an array of richly diverse and compelling offerings under one roof, our integrated resorts blend luxury hotels and state-of-the-art meeting and convention facilities with a variety of amenities such as gaming, celebrity chef restaurants, high-end shopping and an action-packed schedule of concerts, shows, exhibits and other attractions.

Sands has a 30-year track record of successfully developing and operating some of the largest and most complex business and leisure properties in the world, generating significant economic benefits for our host regions and enhancing their stature as global tourism and business capitals. Our integrated resorts propel continuous positive impact through tourism, jobs and community investments that make our regions great places to live, work and visit.

Sands is dedicated to being a good corporate citizen, anchored by the core tenets of serving people, planet and communities. We deliver a great working environment for our team members worldwide, drive social impact through the Sands Cares community engagement and charitable giving program and lead in environmental performance through the award-winning Sands ECO360 global sustainability program.

Sands is not just a developer. We are developers of positive impact.
