Staff Site Reliability Engineer

Posted 16 Days Ago
28 Locations
Remote
Senior level
Machine Learning • Software
The Role
As a Staff Site Reliability Engineer, you will optimize infrastructure, enhance scalability, manage automation workflows, ensure compliance with security standards, and collaborate cross-functionally. You will also participate in on-call rotations to resolve production incidents and maintain service reliability.
Summary Generated by Built In

We are seeking an experienced Staff Site Reliability Engineer to join our fully remote team. As a key player in our Engineering team, you will contribute to infrastructure design and optimization and have an impact on the scalability, resilience, and performance of Neptune solutions. This role demands a deep understanding of distributed systems, performance optimization, and the ability to drive significant business value through technical solutions.


Our tech stack (the bigger the overlap, the better):

  • Languages: Rust, JVM (Java, Spring, Scala, Kotlin), Python.
  • Data: ClickHouse, Kafka, Elasticsearch, Redis, MySQL.
  • Cloud platforms: Microsoft Azure, Google Cloud Platform (GCP).
  • DevOps tools: Kubernetes, Terraform, Helm.
  • Others: Protobufs, gRPC, Swagger.


Responsibilities:

  • Ownership of Site Reliability Process: Own the site reliability process and systems through all stages, from design and implementation to deployment and continuous maintenance.
  • Infrastructure Optimization: Ensure the scalability, resilience, and performance of Neptune solutions across global SaaS and client-hosted environments, including platforms such as GCP, Azure, AWS, and on-premise systems.
  • Automation Strategy Development: Design and implement automation workflows to streamline deployments, upgrades, and incident response, reducing manual tasks and enhancing operational efficiency and consistency.
  • Security and Compliance: Ensure infrastructure and processes meet security and industry standards, protecting sensitive data.
  • Cross-Functional Collaboration: Partner with development, product, customer success, and client teams to align on requirements and deliver robust, scalable, and reliable solutions.
  • Documentation and Knowledge Sharing: Document architecture, operational procedures, and troubleshooting guides to enable knowledge sharing, repeatability, and continuous improvement.
  • Incident Management: Participate in on-call rotations, effectively addressing and resolving production incidents to maintain system uptime and performance.


You might be a fit if you have:

  • 6+ years in SRE, DevOps, or related roles.
  • Strong experience managing and optimizing Kubernetes clusters for robust, scalable, and efficient infrastructure.
  • Proven expertise in designing and implementing automation solutions for infrastructure and application deployment, with experience in Terraform, Helm, and GitLab CI/CD.
  • Strong programming skills in Shell and Python.
  • Extensive experience with Linux system administration and network management.
  • Expertise in managing distributed computing systems and near real-time data streaming platforms.
  • Fluency in English, with solid communication skills for interacting with global customers.

Nice to have:

  • Experience with security best practices, compliance standards (e.g., SOC 2), and infrastructure hardening.
  • Experience with multi-cloud architecture and cloud-native technologies.
  • Experience in high-traffic, petabyte-scale data environments.
  • Experience with ClickHouse and Kafka deployments.


We offer:

  • Flexibility: 100% remote work, flexible working hours, and access to offices (co-working spaces) in Warsaw, Wrocław, Poznań, and Kraków;
  • Share in our success: Participate in the Employee Stock Option Plan and be part of our growth journey;
  • Time off: 20 paid days off (service-free days) per year;
  • Ownership and impact: Space to take action, bring your ideas to life, and make a real impact.


Any questions?

Check out our ultimate guide for candidates to the neptune.ai Engineering team.

Don’t hesitate to contact our Talent Acquisition team, and check out our About us page to get to know the story and faces behind Neptune.



By applying, you consent to neptune.ai processing your personal data to assess your suitability for the role you have applied for, in accordance with the General Data Protection Regulation (GDPR). Your personal data will remain confidential and will be shared only with authorized personnel involved in the recruitment process. You have the right to access, rectify, or delete your personal data at any time.
With your optional consent, we can retain your data for up to 12 months after the application to consider you for future suitable roles if you’re not a match for the current position.

Top Skills

Java
Kotlin
Python
Rust
Scala
The Company
Palo Alto, California
73 Employees
On-site Workplace
Year Founded: 2017

What We Do

neptune.ai makes it easy to log, store, organize, compare, register, and share all your ML model metadata in a single place.

  • Automate and standardize as your modeling team grows.
  • Collaborate on models and results with your team and across the org.
  • Use the hosted version, or deploy on-premises or in a private cloud. Integrate with any MLOps stack.

Take an interactive tour of a public neptune.ai project: https://bit.ly/3pSS1dZ

Similar Jobs

Sanity.io

Senior Site Reliability Engineer

Artificial Intelligence • Enterprise Web • Software
Remote
28 Locations
190 Employees

Fivetran

Senior Staff Site Reliability Engineer

Big Data • Cloud • Software • Database
Remote
27 Locations
1200 Employees

GitLab

Intermediate Site Reliability Engineer, Database Operations

Cloud • Security • Software • Cybersecurity • Automation
Remote
28 Locations
2050 Employees

GitLab

Intermediate Site Reliability Engineer, Durability

Cloud • Security • Software • Cybersecurity • Automation
Remote
28 Locations
2050 Employees

Similar Companies Hiring

InCommodities
Renewable Energy • Machine Learning • Information Technology • Energy • Automation • Analytics
Austin, TX
234 Employees
RunPod
Software • Infrastructure as a Service (IaaS) • Cloud • Artificial Intelligence
Charlotte, North Carolina
53 Employees
Hedra
Software • News + Entertainment • Marketing Tech • Generative AI • Enterprise Web • Digital Media • Consumer Web
San Francisco, CA
14 Employees