Site Reliability Engineer

Reposted 4 Days Ago
Palo Alto, CA
Mid level
Artificial Intelligence • Software • Generative AI
The Role
As a Site Reliability Engineer, you will ensure the reliability, availability, and performance of the company's cloud services. Responsibilities include incident management, automation, performance optimization, and collaborating on software development, all within a hybrid cloud environment.
Summary Generated by Built In

About Glean

We’re on a mission to make knowledge work faster and more humane. We believe that AI will fundamentally transform how people work. In the future, everyone will work in tandem with expert AI assistants who find knowledge, create and synthesize information, and execute work. These assistants will free people up to focus on the higher-level, creative aspects of their work.

We’re building a system of intelligence for every company in the world. On the surface, you can think of it as Google + ChatGPT for the enterprise. Under the hood, our platform is the connective tissue between AI and knowledge. It brings all of a company’s knowledge together, understands it at a deep level, provides industry-leading search relevance over it, and connects it to generative AI agents and applications.

Glean was founded by a seasoned team of former Google search and Facebook engineers who saw a need in the enterprise space for their technical depth and passion for AI. We’re a diverse team of curious and creative people who want to help each other get big things done—so we can help other teams do the same. 

We're backed by some of the Valley's leading venture capitalists—including Sequoia, Kleiner Perkins, Lightspeed, and General Catalyst—and have assembled a world-class team with senior leadership experience at Google, Slack, Facebook, Dropbox, Rubrik, Uber, Intercom, Pinterest, Palantir, and others.

Role

We are seeking a skilled and motivated Site Reliability Engineer (SRE) to join our dynamic and innovative team. As an SRE, you will play a critical role in ensuring the reliability, availability, and performance of our cloud-based services and applications. You will work closely with our engineering teams to design, build, and maintain robust, scalable, and highly available cloud infrastructure.

Much of our software development focuses on building infrastructure to scale our operations in a hybrid cloud environment and on eliminating manual work through automation. On the SRE team, you’ll have the opportunity to tackle the complex challenges of scale and rapid growth that are unique to Glean, while using your expertise in coding, algorithms, problem-solving, and SRE practices. We keep Glean applications up and running so that our customers have the best and most reliable experience possible.

What you will do and achieve

  • Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure. 
  • Incident Management: Take an active role in the production on-call rotation, responding swiftly to troubleshoot and resolve production issues. Minimize service disruptions and downtime by thoroughly triaging and debugging product and system issues. Continuously optimize the on-call process for sustainability and efficiency.
  • Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
  • Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
  • Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
  • Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
  • Software Development Consultation: Engage actively in the entire software development lifecycle. Participate in system design reviews and provide valuable SRE insights during launch reviews, influencing and enhancing system architecture.

Who you are

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 3+ years of experience as a Site Reliability Engineer or similar role, with a primary focus on managing cloud-based services and infrastructure.
  • 5+ years of experience with software development in one or more programming languages.
  • Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure.
  • Practical experience with containerization technologies, including Docker and Kubernetes. Familiarity with infrastructure-as-code tools such as Terraform is essential.
  • Solid understanding of networking, security principles, and best practices.
  • Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.

Benefits

  • Competitive compensation
  • Medical, vision, and dental coverage
  • Flexible work environment and time-off policy
  • 401(k)
  • Company events
  • A home office improvement stipend when you first join
  • Annual education stipend
  • Wellness stipend
  • Healthy lunches and dinners provided daily

For California-based applicants:

The standard base salary range for this position is $155,000 - $250,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

We are a diverse bunch of people, and we want to continue to attract and retain a diverse range of people in our organization. We're committed to building an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.

Top Skills

Alerting Tools
AWS
Azure
Cloud Platforms
Docker
Google Cloud Platform
Kubernetes
Monitoring Tools
Software Development
Terraform

The Company
HQ: Palo Alto, CA
224 Employees
On-site Workplace
Year Founded: 2019

What We Do

Glean searches across all your company’s apps to help you find exactly what you need and discover the things you should know.

🔍 AI-powered workplace search.
💡 Personalized results and knowledge discovery.
⚡ Easy to use and ready to go, right out of the box.
