Senior Staff Site Reliability Engineer, Monitoring & Observability

Posted 12 Days Ago
Dublin
Hybrid
Mid level
Artificial Intelligence • Consumer Web • Edtech • Enterprise Web • HR Tech • Social Impact • Generative AI
Udemy is a learning company that empowers organizations and individuals with flexible and effective skill development.
The Role
As a Senior Staff Site Reliability Engineer, you'll lead monitoring and observability strategies, optimize systems, enhance visibility, and improve incident management across engineering teams.
Summary Generated by Built In
About us

At Udemy, we’re on a mission to transform lives through learning. Through our intelligent skills platform and a global community of instructors, we’ve helped over 70 million learners and 16,000 organizations achieve their goals. Come join us in ensuring everyone, everywhere has access to the skills they need to unlock their potential and create possibilities for themselves and others.

Hybrid work

Udemy is headquartered in San Francisco with global offices in Australia, India, Ireland, Türkiye, and other US locations. Our robust hybrid work model spans San Francisco, Denver, Ankara, Dublin, and Melbourne. This hybrid position requires two days per week in the office at the nearest hub. Learn more about us on our company page.

About you 

You are a motivated, meticulous engineer with a team-oriented approach and exceptional problem-solving skills. You are organized and proactive, and you take the initiative to prioritize your own work and projects effectively.

You thrive in a collaborative environment and are eager to work with and learn alongside the best in Product, Design, and Engineering.

At Udemy, we value individuals who thrive in the face of complexity and love to turn challenges into solutions. As a Monitoring & Observability Engineer, you'll be a key player in building and evolving our systems. You know that complex systems are hard to measure and monitor, but you're driven to tackle these challenges head-on.

You have deep expertise in microservices and are passionate about optimizing the way we monitor, measure, and instrument them. User experience is at the heart of your work, and you're always thinking about how our metrics impact the way people interact with our systems. Linux is your natural environment, and you aren't afraid to dive deep into troubleshooting application, system, and network issues. You've worked with industry-leading monitoring tools like Datadog, New Relic, and Honeycomb, and you're always eager to refine your skills and learn new ones.

Above all, you're a strong communicator in English and excel at collaborating with engineers and teams across the organization.

We care less about your formal education or mathematical expertise and more about your hands-on experience and your passion for monitoring. If you're obsessed with building observability systems, automating repetitive tasks, and driving improvements across the board, we want you on our team.

Here’s what you will be doing:

  • Leading the evolution of our monitoring and observability strategy, making it a core pillar of how we work
  • Partnering with engineering teams to enhance the visibility and reliability of our systems, ensuring that we build for long-term success
  • Driving the standardization of SLIs and SLOs across all engineering teams, aligning on best practices
  • Owning and optimizing our current monitoring systems, including Datadog, Sentry, and other key tools
  • Collaborating with teams to proactively improve site availability, ensuring a seamless user experience
  • Leading incident analysis while fostering a blameless culture, ensuring that we learn from challenges and improve
  • Promoting best practices for on-call and incident management, ensuring teams are always prepared and resilient
  • Continuously improving developer happiness and productivity by automating manual tasks and creating processes that prevent surprises

About your skills:

  • 3+ years of experience managing complex monitoring systems like Datadog, Honeycomb, or New Relic
  • Proficiency in programming languages such as Go (preferred), Python, Bash, or Java
  • Experience with incident management tools and processes, with at least 3 years of on-call experience
  • Hands-on experience with paging tools and incident response frameworks
  • Solid understanding of Terraform, Kubernetes (K8s), and AWS for deployment and management
  • A knack for problem-solving, with the ability to think creatively and work collaboratively with peers
  • Excellent communication skills and a desire to continuously learn and grow within a fast-paced environment

We understand that not everyone will match each of the above qualifications. However, we also realize that everyone has unique experiences that can add value to our company. Even if you think your background might not perfectly align, we'd love to hear from you!

Life at Udemy 

We aspire to be as vibrant and dynamic as the communities we serve, as inquisitive as those who use our platform, and as revolutionary as the future we strive to open for everyone. Here are some of the things we love about life at Udemy:

  • We’re invested in creating an inclusive environment that welcomes a diverse range of backgrounds and experiences. From creating employee resource groups and ensuring we’re a Fair Pay Workplace to building a flexible work culture, our belonging, equity, diversity, and inclusion (BEDI) initiatives always put our people first. We want you to be able to bring your authentic self to work because when we all do, we’re better for it.

  • Learning is what we do – inside and out. Our Learning & Development team is second to none, helping ensure your journey is one of continuous progression. You’ll also have unlimited access to Udemy courses, monthly UDays (meeting-free professional development days), and a generous annual professional development stipend.

  • Our reason to exist is to revolutionize learning – that calls for taking risks and learning from failures. Whether it’s our hackathons (a company-wide effort to envision new possibilities for our product) or sharing our prototypes, we see experimentation as a crucial step on the path to success.

  • We’re committed to creating world-class employee experiences and are proud of the recognition of this by Great Place to Work. 

Of course, the best thing about being part of Udemy is knowing your work makes a difference for people and organizations around the world. You’ve got the skills; why not use them to help others develop theirs?

At Udemy, we value diversity and inclusion and consider qualified applicants without regard to race, color, religion, sex, national origin, ancestry, age, genetic information, sexual orientation, gender identity, marital or family status, veteran status, medical condition, or disability. 

Our Benefits Start with U

Our benefits start with you and were built to provide you and your family with the protection and care you need, making it easy to access the right coverage when you need it most. Benefits vary by region, and we encourage applicants to review our US Benefits and Ireland Benefits pages to get an understanding of some of the benefits we offer. For details on region-specific benefits, please refer to the information provided during the hiring process.

Information regarding data privacy is available within the Udemy Careers Privacy Notice.

Top Skills

AWS
Bash
Datadog
Go
Honeycomb
Java
Kubernetes
New Relic
Python
Terraform

The Company
HQ: San Francisco, CA
1,500 Employees
Hybrid Workplace
Year Founded: 2010

What We Do

At Udemy, we’re on a mission to transform lives through learning. Through our intelligent skills platform and a global community of instructors, we’ve helped 77 million learners and 17,000 organizations achieve their goals. Come join us in ensuring everyone, everywhere has access to the skills they need to unlock their potential and create possibilities for themselves and others.

Why Work With Us

As a learning company, we have a rich culture of curiosity. We offer employees free access to every course on the platform, as well as a $1,500 yearly stipend that can be used for educational opportunities, conferences, books, and more. We also host guest speakers and have a comprehensive internal training curriculum. Become a lifelong learner!

Udemy Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

We offer hybrid work schedules so our people can make work fit their unique needs.

Typical time on-site: 2 days a week
HQ: San Francisco, CA
Mexico
Ankara, TR
Austin, TX
Chennai, IN
Denver, CO
Dublin, IE
Gurugram, IN
İstanbul, TR
Melbourne, AU
Mumbai, IN
