Today’s complex, fast-paced systems have become a minefield of reliability risks—any of which could cause an outage that costs millions and destroys customer confidence. That’s why high-availability teams use Gremlin to find and fix reliability risks before they become incidents. The Gremlin Reliability Platform helps software teams proactively monitor and test their systems for common reliability risks, build and enforce reliability standards, and automate their reliability practices organization-wide. As the industry leader in Chaos Engineering and reliability testing, we work with hundreds of the world’s largest organizations where high availability is non-negotiable.
About the Role of Solutions Architect
Gremlin’s team is growing, and we’re seeking a passionate Solutions Architect to help prove the value of Reliability Management to customers. In this pre- and post-sales role, you will demonstrate Gremlin Reliability Management and offer guidance on best practices for building reliable architectures. As customers convert to a paid subscription, you will advise them on designing and implementing experiments that launch their reliability journey.
In this role, you'll get to:
- Demonstrate Gremlin in customer calls and webinars
- Partner with the sales team to drive technical wins and grow Gremlin’s customer base
- Participate in and lead proofs of concept with potential customers
- Educate potential customers on Reliability and Chaos Engineering
- Work with existing customers on technical projects and assist in troubleshooting
- Consult with customers on the resiliency of their applications and architecture, diagnose gaps and recommend solutions
- Participate in technical workshops and conferences
- Collaborate with other functions across the company, including Product Marketing, Support, and Engineering
We'll expect you to have:
- 5+ years of experience as a Solutions Architect at a tech company
- Excellent verbal and written communication skills
- Strong problem-solving skills
- Hands-on experience with:
  - Kubernetes platforms, managed and unmanaged
    - AKS, EKS, GKE
    - OpenShift, Rancher
    - Certified Kubernetes Administrator (CKA) certification is a plus
  - Linux and shell scripting; Certified Linux Administrator certification is a plus
  - Containers and container runtimes
  - Operating system concepts (CPU, memory, and networking)
- Working knowledge of:
  - Observability solutions, including Application Performance Management (APM)
  - Load testing solutions (e.g., JMeter, LoadRunner, Grafana k6)
  - CI/CD and automation tools (e.g., Jenkins, Ansible)
  - Service mesh (e.g., Istio), REST APIs, and related tools
- Familiarity with one or more programming languages (Python, Java, Go)
- Certification and experience with one or more public cloud providers including Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
Bonus experience:
- Experience in an SRE or DevOps role resolving production outages
- Knowledge of modern DevOps and SRE tools
- Experience integrating with ITSM tools
*This role does not offer employment visa sponsorship.
**If you don't think you meet all of the criteria above but are still interested in the job, please apply. Nobody checks every box. We’re looking for candidates who are particularly strong in a few areas and have some interest and capability in others.
Gremlin offers a competitive total rewards package, which includes:
- Base salary
- Equity
- Healthcare, dental, and vision benefits
- 401(k) with employer match
- Variable compensation for specific roles
Compensation is based on the candidate’s skills and qualifications.
About Gremlin:
Gremlin is a team of industry veterans and people eager to learn from one another. We set the standard for reliability and equip leading organizations with the mindset and expertise needed to drive reliability improvements that move the world forward. We’re backed by top-tier investors Index Ventures, Amplify Partners, and Redpoint Ventures. Our customers love us, and we’re thrilled to be a partner in their success.
What We Care About:
- We Care about our People
Our people are our critical differentiator. We strive to treat our people with respect, empathy, and dignity, and we expect them to treat each other the same way. In both cases, we assume good intent. All are welcome at Gremlin. We know our differences make us stronger and that our best ideas and contributions can come from anyone at any level.
- We Care about Collaboration
Gremlin is strongest when we come together as one team with shared goals. Be the glue, not the glitter. As a remote company, we know teamwork and collaboration won’t happen by accident. We approach every challenge as a shared challenge. We rely on each other for diverse perspectives and creative ideas. We celebrate our wins as a team.
- We Care about Results
Be high productivity, low drama. Results matter. To keep our pace, everyone owns the outcomes of their actions and takes action when needed. We reward speed over perfection. We empower each other to iterate and experiment.
You are welcome at Gremlin for who you are. The more voices and ideas we have represented in our business, the more we will all flourish, contribute, and build a more reliable internet. Gremlin is a place where everyone is encouraged to grow. However you identify and whatever background you bring with you, please apply if this sounds like a role that would make you excited to come to work every day. It’s in our differences that we will find the power to keep building a more reliable internet by designing tools used by the best companies in the world.
Visit our website to learn more - https://www.gremlin.com/about
What We Do
Gremlin’s Reliability Management Platform enables high-velocity engineering teams to standardize and automate reliability across their organizations without slowing down software delivery. Gremlin's Reliability Score sets the standard for reliability so there's no guesswork, and an automated suite of Reliability Management tools makes it easy to integrate reliability throughout the software lifecycle so there's no slowdown.