Team Lead, Site Reliability Engineering

Toronto, ON
Senior level
Marketing Tech
The Role
As the Team Lead for Site Reliability Engineering, you will oversee the SRE team to enhance the reliability and performance of a SaaS product, ensuring automated monitoring and efficient process optimization while collaborating across departments to manage incidents and architect resilient systems.
Summary Generated by Built In

Overview: 

Guidepoint’s Engineering team thrives on problem-solving and creating happier users. As Guidepoint works to achieve its mission of making individuals, businesses, and the world smarter through personalized knowledge-sharing solutions, the engineering team is taking on challenges to improve our internal application architecture and create new products to optimize the seamless delivery of our services. 

The Site Reliability Engineering Team Lead is responsible for ensuring the reliability, scalability, and performance of a SaaS product running on Azure. The role involves leading a team of SREs to proactively monitor, automate, and optimize system performance while fostering a culture of collaboration with development teams, innovation, and continuous improvement. As the SRE lead, this person will act as the bridge between development and operations, driving best practices in reliability engineering and proactive management of environments through observability. Key areas of focus include maintaining uptime, monitoring performance, resolving incidents, optimizing capacity, managing error budgets, and collaborating with development teams to build resilient and maintainable systems.


This is a hybrid position based in Toronto. 

What You’ll Do:

  • Guide, mentor, and upskill the SRE team, ensuring alignment with organizational priorities
  • Design and implement monitoring strategies to ensure uptime and minimize failures
  • Automate manual processes to improve efficiency and reduce human error
  • Define, manage, and maintain SLOs and SLIs to ensure high availability of systems
  • Manage error budgets and trigger breach actions as per established policies
  • Enhance Datadog automated monitoring and alerting, ensuring critical events are managed through the Status Page
  • Lead incident response alongside engineering leads, support RCA efforts, and drive auto-remediation initiatives
  • Collaborate with Product, Support, Engineering, and Cloud Operations teams to deliver scalable and reliable solutions
  • Actively participate in cost optimization initiatives with Cloud Operations and Engineering
  • Handle escalated customer issues and ensure satisfactory resolution
  • Conduct regular team meetings and training sessions
  • Identify areas for process improvement and implement best practices
  • Provide insights and recommendations to enhance reliability and customer satisfaction

What You Have:

  • 8+ years of experience in software development and Site Reliability Engineering or Production Engineering
  • 3+ years of experience leading an SRE team with expertise in Infrastructure as Code (IaC) using Terraform and Ansible, managing and operating Kubernetes clusters, and implementing monitoring and observability solutions with Datadog
  • Comprehensive understanding of web application security
  • Strong system engineering background with Linux/Windows
  • Proficient in development with Python or Golang
  • Strong understanding of Azure libraries (Client, Management, Asset)
  • In-depth knowledge of web application SaaS platforms and architecture
  • Proficient in SQL; experience with other database operations is a plus
  • Strong communication skills
  • Expertise in technical writing and documentation
  • Ability to rapidly analyze issues, anticipate consequences, make decisions, and take action
  • Ability to work independently and as part of a team
  • Experience in presenting monthly reports and metrics to managers and stakeholders

What We Offer:

  • Paid Time Off
  • Comprehensive benefits plan
  • Company RRSP Match
  • Development opportunities through the LinkedIn Learning platform

About Guidepoint: 

Guidepoint is a leading research enablement platform designed to advance understanding and empower our clients’ decision-making process. Powered by innovative technology, real-time data, and hard-to-source expertise, we help our clients to turn answers into action.

Backed by a network of nearly 1.5 million experts and Guidepoint’s 1,300 employees worldwide, we inform leading organizations’ research by delivering on-demand intelligence and research on request. With Guidepoint, companies and investors can better navigate the abundance of information available today, making it both more useful and more powerful.

At Guidepoint, our success relies on the diversity of our employees, advisors, and client base, which allows us to create connections that offer a wealth of perspectives. We are committed to upholding policies that contribute to an equitable and welcoming environment for our community, regardless of background, identity, or experience.


Top Skills

Go
Python
The Company
HQ: New York, NY
2,882 Employees
On-site Workplace
Year Founded: 2003

What We Do

Guidepoint connects clients with vetted subject matter experts—Advisors—from our global professional network. Our clients leverage the insights and perspectives shared by our Advisors to stay informed and make better business decisions.

Our multinational client list includes nine of the top 10 global consulting firms, hundreds of hedge funds (including five of the largest firms), and many of the largest private equity firms and Fortune-ranked companies. Guidepoint’s fourteen offices on three continents provide 24/7, quick and agile service.
