Job Description:
Position Overview
The primary responsibility of the Lead Observability Engineer is leading the technical direction and implementation of the pipelines and infrastructure focused on monitoring and observability within Sands.
This platform is essential, providing the tooling, practices, and visibility that our infrastructure and development engineering teams leverage to observe and maintain the environments and platforms in our cloud and on-premises systems.
The Lead Observability Engineer will be responsible for a team of Observability Engineers ensuring that technical infrastructure build and operational teams have effective tools to monitor, observe and operate systems and platforms within the framework of large enterprise compliance and governance needs.
The team will develop, maintain, and execute infrastructure-as-code scripts and playbooks to automate deployment and maintenance tasks, ensuring the availability, reliability, and efficient operation of the enterprise systems.
The position demands someone who is highly technically competent, detail oriented, and driven to stay current with evolving technologies.
All duties are to be performed in accordance with departmental and Las Vegas Sands Corp.’s policies, practices, and procedures. All Las Vegas Sands Corp. Team Members are expected to conduct and carry themselves in a professional manner at all times. Team Members are required to observe the Company’s standards, work requirements and rules of conduct.
Essential Duties & Responsibilities
-
Lead a team of observability engineers in designing and implementing observability solutions, monitoring system health, and troubleshooting incidents to ensure high availability and performance of software applications and infrastructure.
-
Work with the Central Head of Operations to decide and execute on priorities for the required monitoring, alerting, and observability KPIs.
-
Develop solutions to observability demands.
-
Deliver broad services that cover the following domains:
-
Log Collection and Analysis
-
Operational Metrics
-
Distributed Tracing
-
Build, Test, and Deployment Automation
-
Platform reliability engineering monitoring
-
Act as an evangelist for the observability domain across the enterprise and influence IT stakeholders to apply observability best practices.
-
Design, develop, and maintain automation solutions to support observability and operations, focusing on improving system monitoring, alerting, and reporting capabilities.
-
Provide technology and/or process solutions to high-impact problems/projects through in-depth evaluation of complex business processes, system processes, and industry standards.
-
Own, develop, and be accountable for observability policies, processes, and architectural decisions.
-
Ensure operational methods, procedures, facilities, and tools are established, reviewed, and maintained.
-
Monitor and research emerging observability trends and technologies with the potential to improve efficiency, security, and business capabilities.
-
Develop and execute proof-of-concept projects to evaluate new solutions for potential adoption.
-
Develop documentation (including data flow diagrams, logical diagrams, and physical diagrams) and training in compliance with standards.
-
Apply enterprise design principles and best practices for implementing and supporting observability services.
-
Operate with a limited level of direct supervision and exercise independence of judgment and autonomy.
-
Serve as advisor and coach to less senior team members, allocating work as necessary.
-
Be a strong thought leader in observability and Site Reliability Engineering principles.
-
Consistently share standard methodologies and improve processes within and across teams.
-
Perform job duties in a safe manner.
-
Attend work as scheduled on a consistent and regular basis.
-
Perform other related duties as assigned.
Minimum Qualifications
-
At least 21 years of age.
-
Proof of authorization to work in the United States.
-
Bachelor’s Degree in Computer Science, Engineering or related discipline required.
-
Advanced degree in technology or engineering is a plus.
-
Must be able to obtain and maintain any certification or license, as required by law or policy.
-
5-10+ years of demonstrated experience leading distributed monitoring, observability, IT operations, DevOps, or SRE groups, with expertise in on-premises IT infrastructure, applications, and private and public cloud monitoring.
-
Experience with ITRS Geneos and Opsview is a plus.
-
Strong expertise with scripting in Python, Java, and RESTful services, with a focus on building high-throughput, high-volume distributed systems.
-
Strong expertise in Linux/Unix, Container orchestration (e.g., Kubernetes), container runtimes and optimization.
-
Strong understanding of Site Reliability Engineering and DevOps principles.
-
Strong technical acumen in Cloud Architecture, Performance Benchmarking, and Capacity planning.
-
Demonstrated experience leading and growing engineers and teams.
-
Strong Cloud (AWS, GCP, Azure etc.) platform knowledge.
-
Proficiency in Project Management and work item management tools such as Azure DevOps and Portfolio.
-
Strong knowledge of logging systems, experience with ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or similar platforms.
-
Experience with tools like Harness, GitLab, Terraform, Ansible, or CloudFormation for managing and monitoring infrastructure.
-
Demonstrated experience diagnosing performance bottlenecks and other system issues using observability data.
-
Demonstrated understanding of and respect for IT service management practices (e.g., change, release, incident, problem management).
-
Able to multi-task and handle various types of requests from different people/areas.
-
Strong analytical and problem-solving skills.
-
Effective written and verbal communication skills in English.
Physical Requirements
Must be able to:
-
Physically access assigned workspace areas with or without reasonable accommodation.
-
Work indoors and be exposed to various environmental factors such as, but not limited to, CRT, noise, and dust.
-
Utilize laptop and standard keyboard to perform essential functions of the job.
What We Do
Founded in 1990, Las Vegas Sands is the preeminent developer and operator of world-class integrated resorts that drive valuable business and leisure tourism in the regions where we operate. Featuring an array of richly diverse and compelling offerings under one roof, our integrated resorts blend luxury hotels and state-of-the-art meeting and convention facilities with a variety of amenities such as gaming, celebrity chef restaurants, high-end shopping and an action-packed schedule of concerts, shows, exhibits and other attractions.
Sands has a 30-year track record of successfully developing and operating some of the largest and most complex business and leisure properties in the world, generating significant economic benefits for our host regions and enhancing their stature as global tourism and business capitals. Our integrated resorts propel continuous positive impact through tourism, jobs and community investments that make our regions great places to live, work and visit.
Sands is dedicated to being a good corporate citizen, anchored by the core tenets of serving people, planet and communities. We deliver a great working environment for our team members worldwide, drive social impact through the Sands Cares community engagement and charitable giving program and lead in environmental performance through the award-winning Sands ECO360 global sustainability program.
Sands is not just a developer. We are developers of positive impact.