Staff Site Reliability Engineer

Posted 14 Days Ago
Alameda, CA
$149K-198K Annually
Senior level
Security • Software • Database

About Us

Saildrone is an oceanographic survey and maritime defense company creating a paradigm shift in how navies, civil governments, and commercial organizations obtain the real-time, accurate data required to monitor and protect our oceans. Saildrone’s fleet of uncrewed surface vehicles (USVs) carries purpose-built payloads supporting border protection, critical infrastructure security, hydrographic survey, offshore energy, and metocean monitoring. Powered by renewable wind and solar energy, Saildrone USVs provide long-duration operations measured in months, not days. Proprietary software applications and machine learning technology transform collected data into actionable insights and intelligence.

We are based in Alameda, CA, with offices in Washington DC and St. Petersburg, FL, and operate missions worldwide. We are backed by top-tier investors in the frontier tech and sustainability sectors, including Social Capital, Capricorn, Lux Capital, BOND Capital, and Emerson Collective.

This is an exciting opportunity with a fast-growing team at the cutting-edge intersection of big data services and autonomous hardware. You will be part of a high-performing, multidisciplinary team that delivers high impact for humanity and future generations.


The Role

We are seeking a talented Staff Site Reliability Engineer with a strong focus on observability and mentorship to join our dynamic team. In this role, you will act as a team tech lead, guiding engineering efforts to ensure the reliability, scalability, and performance of our systems while fostering a culture of continuous learning and improvement across the Software group. Your expertise in observability tools and practices will play a crucial role in scaling up Saildrone’s Site Reliability Engineering team, helping to ensure the quality of service that our customers have come to expect.


Responsibilities

  • Monitoring Architecture: Design and implement robust monitoring frameworks to track the health and performance of applications and infrastructure.
  • Observability Practices: Establish observability best practices, leveraging tools such as Datadog, Prometheus, Grafana, or similar to provide actionable insights.
  • Alerting Strategies: Develop and maintain effective alerting strategies to ensure prompt incident response while minimizing noise.
  • Incident Management: Lead incident response efforts, conducting thorough postmortems and root cause analyses to prevent future occurrences.
  • Performance Optimization: Analyze system performance metrics and logs to identify bottlenecks and implement optimizations.
  • Collaboration: Work closely with development, operations, and product teams to integrate observability into the development lifecycle and improve system reliability.
  • Documentation: Create and maintain comprehensive documentation of monitoring setups, incident responses, and SRE best practices.
  • Capacity Planning: Collaborate on capacity planning efforts to ensure the infrastructure can scale to meet growing demands.
  • Tooling and Automation: Identify opportunities for automation in monitoring and alerting processes to improve efficiency and reliability.
  • Mentorship: Provide guidance and mentorship to our new SRE team and to the Software group as a whole, sharing expertise in monitoring, observability, and incident management.

Minimum Experience

  • 8+ years of SRE experience. BA/BS in a related field or equivalent experience.

Required Skills

  • Strong knowledge of AWS services and managing cloud-based infrastructure at scale.
  • Strong experience with monitoring and observability tools (e.g., Datadog, Grafana, Prometheus).
  • Strong proficiency with log management and analysis tools (e.g., Datadog Logs, ELK Stack, Splunk).
  • Proficiency in scripting languages (e.g., Python, Bash) for automation and custom monitoring solutions.
  • Strong experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
  • Strong proficiency with Kubernetes, Helm Charts, and Helm deployment patterns.
  • Understanding of key performance metrics and monitoring aspects (e.g., CPU usage, memory consumption, latency, error rates).
  • Expertise in setting up alerts, handling incidents, and performing root cause analysis.
  • High attention to detail for accurate monitoring, alert configuration, and performance tuning.
  • Experience with monitoring databases (e.g., MySQL, PostgreSQL, MongoDB) and understanding related performance metrics.
  • Effective communication skills to collaborate with cross-functional teams and report on system health and incidents.
  • Excellent problem-solving skills and a proactive mindset.

Desired Skills and Experience

  • AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer).
  • Experience with other cloud platforms (Azure, Google Cloud Platform).
  • Knowledge of networking fundamentals, including DNS, load balancing, and content delivery networks (CDNs).
  • Ability to anticipate potential issues and implement proactive monitoring strategies.

Physical Requirements

  • Work is performed on a computer and requires the ability to operate a keyboard and other peripheral devices.

Location: This is a hybrid position in Alameda, CA. Our waterfront office offers beautiful views of San Francisco Bay in always sunny Alameda. Even our walls have good karma: our offices mix software development with a hardware production line in the former airplane hangar used to film 'The Matrix'.

Benefits:

  • Paid time off, including vacation, bereavement, jury duty, sick time and parental leave
  • Comprehensive and competitive medical, dental and vision plans, and HSA with employer matching
  • Company sponsored life insurance
  • Stock options
  • Annual stipend for continued learning and development
  • Quarterly company BBQs at our Alameda HQ (bring your friends and family!)
  • Free Bay Area Public Transportation via AlamedaTMA with the BayPass Clipper Card
  • Plenty of snacks in our 3 office locations
  • Dog-friendly work environment


A reasonable estimate of the current range is $149,400 to $198,000 annually.


Catch up on the latest news about us:

  • TIME 100 Most Influential Companies 2024: Saildrone
  • The Tiny Craft Mapping Superstorms at Sea – The New York Times
  • An Underwater Mountain was Newly Discovered off California Coast – San Francisco Chronicle
  • The Navy Is Using Robot Ships to Deter Human Smuggling out of Haiti – Defense One
  • How US Navy Experiments Could Get Drones Beyond Spying and Into Battle – Defense News
  • USVs Could Deter IUU Fishing – USNI Proceedings
  • Mullen, Former Joint Chiefs Chairman, to Lead Board for Unmanned Tech Firm Saildrone – Breaking Defense
  • Saildrone’s First Aluminum Surveyor Autonomous Vessel Splashes Down for Navy Testing – TechCrunch



We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
At Saildrone, we value diversity and are committed to creating an inclusive workplace that welcomes people from all backgrounds, experiences, and perspectives. We believe that a diverse and inclusive team leads to innovation and better problem-solving. We encourage applications from candidates of all genders, ethnicities, races, sexual orientations, disabilities, and backgrounds.
Individual compensation packages are based on geographic location, scope of the role, relevant experience, and the ability to deal with complexity and solve problems within our organization, among other factors.
All employees are required to provide proof of authorization to work in the U.S. within their first 3 days of work. Please note that the Company does not sponsor employees for work visas or permanent resident cards to work in the U.S. If you need sponsorship for a work visa or green card, you will not be qualified for employment with Saildrone.

Any unsolicited resumes/candidate profiles submitted through our website or to personal email accounts of employees of Saildrone are considered property of Saildrone and are not subject to payment of agency fees.


Top Skills

Python
The Company
HQ: Alameda, CA
145 Employees
On-site Workplace
Year Founded: 2014

What We Do

Saildrone provides comprehensive turnkey data solutions for maritime security, ocean mapping, and ocean data. The company provides real-time access to critical data from any ocean on Earth, 24/7/365, and uses proprietary software applications to transform that data into actionable insights and intelligence. Saildrone’s fleet of uncrewed surface vehicles (USVs), powered by renewable wind and solar energy, has a minimal carbon footprint and is designed to make ocean intelligence cost-effective at scale. Saildrones operate 24/7/365 without the need for a crewed support vehicle. They have sailed over 750,000 nautical miles from the Arctic to the Antarctic and spent more than 17,000 days at sea in the harshest ocean conditions on the planet.
