Overview:
Guidepoint’s Engineering team thrives on problem-solving and creating happier users. As Guidepoint works to achieve its mission of making individuals, businesses, and the world smarter through personalized knowledge-sharing solutions, the engineering team is taking on challenges to improve our internal application architecture and create new products to optimize the seamless delivery of our services.
The Site Reliability Engineering (SRE) Team Lead is responsible for ensuring the reliability, scalability, and performance of a SaaS product running on Azure. The role involves leading a team of SREs to proactively monitor, automate, and optimize system performance while fostering a culture of collaboration with development teams, innovation, and continuous improvement. As the SRE Lead, this person will act as the bridge between development and operations, driving best practices in reliability engineering and proactive management of environments through observability. Key areas of focus include maintaining uptime, monitoring performance, resolving incidents, optimizing capacity, managing error budgets, and collaborating with development teams to build resilient and maintainable systems.
This is a hybrid position based in Toronto.
What You’ll Do:
- Guide, mentor, and upskill the SRE team, ensuring alignment with organizational priorities
- Design and implement monitoring strategies to ensure uptime and minimize failures
- Automate manual processes to improve efficiency and reduce human error
- Define, manage, and maintain SLOs and SLIs to ensure high availability of systems
- Manage error budgets and trigger breach actions as per established policies
- Enhance Datadog automated monitoring and alerting, ensuring critical events are managed through the Status Page
- Lead incident response alongside engineering leads, support RCA efforts, and drive auto-remediation initiatives
- Collaborate with Product, Support, Engineering, and Cloud Operations teams to deliver scalable and reliable solutions
- Actively participate in cost optimization initiatives with Cloud Operations and Engineering
- Handle escalated customer issues and ensure satisfactory resolution
- Conduct regular team meetings and training sessions
- Identify areas for process improvement and implement best practices
- Provide insights and recommendations to enhance reliability and customer satisfaction
What You Have:
- 8+ years of experience in software development and Site Reliability Engineering or Production Engineering
- 3+ years of experience leading an SRE team with expertise in Infrastructure as Code (IaC) using Terraform and Ansible, managing and operating Kubernetes clusters, and implementing monitoring and observability solutions with Datadog
- Comprehensive understanding of web application security
- Strong system engineering background with Linux/Windows
- Proficient in development with Python or Golang
- Strong understanding of Azure libraries (Client, Management, Asset)
- In-depth knowledge of web application SaaS platforms and architecture
- Proficient in SQL and, ideally, other database operations
- Strong communication skills
- Expertise in technical writing and documentation
- Ability to rapidly analyze issues, anticipate consequences, make decisions, and take action
- Ability to work independently and as part of a team
- Experience in presenting monthly reports and metrics to managers and stakeholders
What We Offer:
- Paid Time Off
- Comprehensive benefits plan
- Company RRSP Match
- Development opportunities through the LinkedIn Learning platform
About Guidepoint:
Guidepoint is a leading research enablement platform designed to advance understanding and empower our clients’ decision-making process. Powered by innovative technology, real-time data, and hard-to-source expertise, we help our clients to turn answers into action.
Backed by a network of nearly 1.5 million experts and Guidepoint’s 1,300 employees worldwide, we inform leading organizations’ research by delivering on-demand intelligence and research on request. With Guidepoint, companies and investors can better navigate the abundance of information available today, making it both more useful and more powerful.
At Guidepoint, our success relies on the diversity of our employees, advisors, and client base, which allows us to create connections that offer a wealth of perspectives. We are committed to upholding policies that contribute to an equitable and welcoming environment for our community, regardless of background, identity, or experience.
What We Do
Guidepoint connects clients with vetted subject matter experts—Advisors—from our global professional network. Our clients leverage the insights and perspectives shared by our Advisors to stay informed and make better business decisions.
Our multinational client list includes nine of the top 10 global consulting firms, hundreds of hedge funds (including five of the largest firms), and many of the largest private equity firms and Fortune-ranked companies. Guidepoint’s fourteen offices on three continents provide 24/7, quick and agile service.