Senior Site Reliability Engineer

Reposted 8 Days Ago
Bengaluru, Karnataka
Senior level
Big Data • Software
The Role
Design and optimize scalable Aerospike deployments, automate infrastructure, monitor system performance, ensure reliability, and manage incident responses in a 24/7 on-call rotation.

About Aerospike
At Aerospike, we dream big. Our focus is helping companies tackle seemingly insurmountable problems and doing what’s never been done before. That is why we developed the world's leading real-time data platform that powers mission-critical applications at the world's most innovative, category-disrupting companies. Aerospike companies have deployed extreme-scale real-time applications to fight fraud, dramatically increase shopping cart size, enable global digital payments, and deliver hyper-personalized user experiences to tens of millions of customers.
Customers like Airtel, Experian, Nielsen, PayPal, Snap, Verizon Media, and Wayfair rely on Aerospike as the data foundation for the future to help them act in the microsecond moments that matter.
Headquartered in Mountain View, California, Aerospike has a global presence with offices in London, Bangalore, and Tel Aviv.

Senior Site Reliability Engineer

As a Senior Site Reliability Engineer (SRE) for Aerospike Cloud, you will play a key role in designing, building, and optimizing scalable and resilient cloud-based Aerospike deployments. You will focus on enhancing reliability, performance, and automation, ensuring our platform efficiently supports multiple cloud product offerings. Your work will involve developing robust infrastructure, implementing intelligent monitoring, and driving continuous improvements to enhance system efficiency and scalability.

Key Responsibilities

  • Designing, implementing, and managing large-scale Aerospike deployments across multiple cloud environments, ensuring high availability and performance.
  • Developing deep expertise in Aerospike and its cloud deployment patterns, understanding failure scenarios, and designing resilient remediation strategies.
  • Automating infrastructure and service configurations to improve system efficiency, reliability, and scalability.
  • Building and maintaining monitoring, alerting, and observability solutions to proactively detect and resolve issues, ensuring system health.
  • Implementing and enforcing security best practices for cloud infrastructure, access control, and data protection to safeguard deployments.
  • Participating in incident response, post-mortems, and continuous improvement initiatives, driving long-term stability and reliability.
  • Collaborating with development teams to ensure new deployments and updates align with SRE best practices for reliability, performance, and scalability.
  • Being part of a 24/7 on-call rotation, responding to critical incidents and minimizing downtime through proactive mitigation strategies.

Required Experience

  • 6+ years of experience in Site Reliability Engineering (SRE), DevOps, or related fields, with a focus on building scalable, resilient, and automated cloud-based systems.
  • Hands-on experience designing, deploying, and optimizing production-grade, business-critical systems in cloud environments.
  • Expertise with at least one major public cloud provider (AWS, Google Cloud, or Azure), including cloud-native services and architectures.
  • Strong proficiency in infrastructure-as-code (IaC) tools such as Terraform to enable automated and reproducible infrastructure.
  • Experience in CI/CD pipeline design and implementation, enabling seamless, automated software delivery and infrastructure updates.
  • Deep understanding of Linux/Unix systems, networking fundamentals, and distributed system architectures.
  • Proficiency in scripting and software development using Python, Bash, or Go to build automation, tooling, and infrastructure enhancements.
  • Experience with containerization and orchestration technologies such as Docker and Kubernetes for efficient service deployment and scaling.
  • Hands-on experience with monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Elasticsearch, Kibana) to drive data-driven system improvements.
  • Strong problem-solving skills with an engineering-first mindset for improving system reliability, scalability, and performance.
  • Experience implementing security best practices for cloud infrastructure, access control, and data protection.
  • Excellent English communication skills (verbal and written) to collaborate effectively across teams and document key processes.

Preferred Skills and Qualifications

  • Hands-on experience managing and optimizing database deployments and services in production environments, ensuring high availability and performance.
  • Familiarity with Aerospike or other distributed NoSQL databases.
  • Relevant industry certifications, such as AWS Certified DevOps Engineer, AWS Certified Solutions Architect, Google Professional Cloud DevOps Engineer, or equivalent.
  • Kubernetes certifications such as Certified Kubernetes Administrator (CKA), Certified Kubernetes Application Developer (CKAD), or Certified Kubernetes Security Specialist (CKS).

Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.

Top Skills

Aerospike
AWS
Azure
Bash
CI/CD
Datadog
Docker
Elasticsearch
Go
GCP
Grafana
Kibana
Kubernetes
Linux
Prometheus
Python
Terraform
Unix

The Company
HQ: Mountain View, CA
191 Employees
On-site Workplace
Year Founded: 2009

What We Do

The Aerospike Real-time Data Platform enables organizations to act instantly across billions of transactions while reducing server footprint by up to 80%. The Aerospike multi-cloud platform powers real-time applications with predictable sub-millisecond performance at up to petabyte scale, with five-nines uptime and globally distributed, strongly consistent data. Applications built on the Aerospike Real-time Data Platform fight fraud, provide recommendations that dramatically increase shopping cart size, enable global digital payments, and deliver hyper-personalized user experiences to tens of millions of customers. Customers such as Airtel, Experian, European Central Bank, Nielsen, PayPal, Snap, Verizon Media, and Wayfair rely on Aerospike as their data foundation for the future.
