Site Reliability Engineer (Pinot)

Posted 6 Days Ago
Hiring Remotely in US
Remote
Senior level
Software
The Role
As a Site Reliability Engineer at StarTree, you will manage and tune large-scale distributed systems, specifically Apache Pinot. Your responsibilities include monitoring system performance, collaborating with customers to resolve incidents, executing disaster recovery strategies, and influencing system roadmaps based on troubleshooting experiences.
Summary Generated by Built In

At StarTree, we're a group of passionate individuals who want to improve the lives of many by developing tools and technologies that support availability and speed in the world of real-time analytics.

Our aim is to make it simple for every company to delight their users - external and internal - and create new revenue streams from their data, by building the world’s most comprehensive and accessible cloud analytics system.

About the role:

StarTree is seeking exceptional Site Reliability Engineers for Pinot (SRE-Pinot) to manage, tune, and debug large-scale, highly available distributed systems. You will work with a team of passionate and talented engineers on the automation, tuning, and troubleshooting of Apache Pinot. We are looking for motivated, hardworking, and focused individuals who have a real passion for operational excellence, data systems, and automation.

Responsibilities:

  • Leverage various monitoring and alerting services to solve intricate programming problems at scale
  • Manage and tune multiple critical customer-facing Apache Pinot clusters
  • Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
  • Build rapport and work closely with customers to mitigate and resolve incidents
  • Execute disaster recovery strategies with minimal downtime
  • Collaborate with other engineers to understand and troubleshoot systems and use the experience gained to influence the roadmap of other teams
  • Debug Pinot queries and ingestion when incidents occur

Requirements:

  • 5+ years of experience as an engineer (SRE, SDET, or development)
  • Experience managing highly available production-facing distributed systems and in-depth knowledge of Java are a plus
  • Exposure to cloud platforms such as AWS, GCP, or Azure is a plus
  • Experience with Kubernetes and container orchestration is a plus
  • Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
  • Strong troubleshooting and critical thinking skills
  • Experience managing Apache Pinot is preferred
  • Experience building Java apps is required
  • Experience with ZooKeeper is a huge plus

About StarTree:  

StarTree is a cloud-based software company that enables business customers to derive advanced insights from real-time and historical data. StarTree was founded by the core software engineering team and inventors of Apache Pinot, which currently powers hundreds of user-facing applications at companies across industries, including LinkedIn, Uber, Target, 7Eleven, Etsy, Walmart, WePay, Factual, Weibo, and more. StarTree Cloud has enabled even more companies to deploy and operate real-time analytics at scale, including Stripe, Sovrn, Roadie, Just Eat Takeaway.com, Dialpad, Guitar Center, Blinkit, and more.

StarTree recently announced its Series B funding, with investment from GGV Capital, Sapphire Ventures, Bain Capital Ventures, and CRV. The company has been named one of The Information's 50 Most Promising Startups and one of CRN's 10 Coolest Cloud Computing Startup Companies of 2022!

Top Skills

Java
The Company
HQ: Mountain View, CA
60 Employees
On-site Workplace
Year Founded: 2019

What We Do

When you hear “decision maker,” it's natural to think, “C-suite,” or “executive.” But these days, we’re all decision-makers. So, yes, the CEO is a crucial decider, but the franchise owner, the student, or the big box shopper all have important choices to make, too. The difference is, those traditional decision-makers have access to relevant data to guide their thinking. All the others? Flying blind.

Our vision is to change that. We believe every decision-maker should have access to fast, fresh, actionable insights. We’re motivated to unlock what’s possible when you put the right information in the right hands at the right time. Like the restaurant owner who’s able to call in an extra cook because she can see the surge in orders coming. Or the holiday shopper who has plenty of time to pick a different gift for his loved one because he sees his original order is delayed at the moment the supplier registers a procurement delay. Or the use-case we can’t even envision yet, because it is sitting in your data, waiting to be set free.

Exposing timely information to real decision-makers in intuitive apps that not only inform but also allow you to act on the information being shared: that's how we're unleashing the power of user-facing analytics.
