Staff Site Reliability Engineer

Posted 2 Days Ago
Hiring Remotely in United States
Remote
Senior level
Cloud • Legal Tech • Software
The global standard in no-code contract lifecycle management (CLM) software.
The Role
As a Staff Site Reliability Engineer, you will develop and implement reliable and scalable systems, lead post-incident reviews, collaborate with engineering teams on reliability targets, manage incidents, and contribute to disaster recovery strategies. You will also provide mentorship and improve operational metrics while participating in on-call rotations.
Summary Generated by Built In

As the most trusted global leader in data-first contract lifecycle management (CLM) software, Agiloft helps organizations manage the end-to-end process of proposing, negotiating, signing, and leveraging contracts using our flexible Data-first Agreement Platform (DAP). With contract data as the foundation, customers quickly and collaboratively reach agreement and leverage contract visibility to thrive with competitive advantage. Employing powerful, pragmatic artificial intelligence as a legal force multiplier, and robust integration capabilities as a data liberator, organizations around the world trust Agiloft’s certified implementers to deliver connected, intelligent, and autonomous solutions across the entire contract lifecycle.


Top analysts like Gartner, Forrester, and IDC agree, all showing Agiloft as a leader in the CLM space. Our no-code platform is easily managed and administered by business users, which is why Agiloft is the contract you keep: nearly 100% of new customers are satisfied with their initial implementations, and some 97% of customers renew every year. Ours is a growing, vibrant, successful company at the forefront of a market that is becoming a must-have for all organizations.


We believe that the way to build the strongest, most vibrant place to work is to bring in individuals from all walks of life, and to support them in bringing their authentic selves to their day, every day. Our working philosophy is that “EX = CX”: when employee experience is excellent, so is customer experience. We support multiple Employee Resource Groups (ERGs), and offer a working environment that supports healthy work/life balance, including floating holidays and a quarterly, no-questions-asked wellness day.


Position Overview


As a Staff Site Reliability Engineer (SRE), you will be responsible for developing and implementing highly reliable and scalable systems. You will work closely with different functional teams to create a stable, efficient, and scalable environment, leading complex projects that require collaboration with multiple stakeholders.

Job Responsibilities

  • Define and enforce SRE best practices and standards.
  • Architect and implement highly reliable and scalable systems.
  • Lead complex post-incident reviews and implement systemic improvements.
  • Collaborate with product and engineering teams to set reliability targets.
  • Manage high-impact incidents and coordinate incident response.
  • Contribute to budget planning and resource allocation.
  • Lead efforts to establish disaster recovery strategies.
  • Provide technical leadership and mentorship to the SRE team.
  • Continuously track and improve metrics (for example, DORA) to optimize software delivery and operational performance.
  • Participate in on-call rotation.
  • Other duties as assigned

Required Qualifications

  • 8-10 years of experience in a similar or related role
  • Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience)
  • In-depth knowledge of Cloud Ops technologies including Amazon Web Services (AWS) and Terraform or other Infrastructure as Code (IaC)
  • Advanced knowledge of Linux operating systems and troubleshooting OS issues
  • Expertise in setting up and managing monitoring tools (such as Prometheus, Grafana, Datadog, Nagios, OpenTelemetry, ELK, or similar tools)
  • In-depth understanding of monitoring and alerting systems and of networking principles (such as load balancing, CDN, and disaster recovery)
  • Strong understanding of:
      • Incident management
      • Capacity planning
      • Disaster recovery
      • Observability practices (in tools such as OpenTelemetry and Jaeger)
  • Advanced experience with or knowledge of security measures and practices (for example, threat modeling, compliance, and secure coding practices)
  • Strong analytical and problem-solving skills
  • Knowledge of Linux systems and common system administration tasks
  • Strong understanding of programming/scripting languages (such as Python), including scripting skills in multiple languages to automate SRE operations
  • Excellent communication and teamwork skills
  • A willingness to learn and adapt in a fast-paced, dynamic environment

Preferred Qualifications

  • Familiarity with DevOps practices, Infrastructure as Code (IaC) tools, and Agile methodologies is a plus

Ensuring a diverse and inclusive workplace is our priority. We are committed to an environment of acceptance where you are free to bring your full self to work. All employment decisions at Agiloft are based on business needs, job requirements, and individual qualifications without regard to race, color, religion or belief, national or social ethnic origin, sex, age, sexual orientation, gender identity and/or expression, parental status, marital status, Veteran status, or any other status protected by the laws or regulations in the locations where we operate. If you have a need that requires accommodation during the recruiting process, please let us know by contacting Director, Talent Acquisition, Brad Toothman at [email protected].

 

Applicants from underrepresented groups such as minorities, veterans, or individuals with disabilities are encouraged to apply.


Applications will be reviewed as submitted. There will be no application deadline for this opportunity.

Top Skills

Python
The Company
HQ: Redwood City, CA
350 Employees
On-site Workplace
Year Founded: 1991

What We Do

As the global leader in contract lifecycle management (CLM) software, Agiloft is trusted to provide significant savings in purchasing, enable more efficient legal operations, and accelerate sales cycles, all while drastically lowering compliance risk. Agiloft’s adaptable no-code platform ensures rapid deployment and a fully extensible system. Using contracts as the core system of commercial record, Agiloft’s CLM software leverages AI to improve contract management for legal departments, procurement, and sales operations. Visit www.agiloft.com for more. 

We're hiring! To view our current job openings, please visit https://www.agiloft.com/jobs.htm.

Why Work With Us

We are a passionate group of humans dedicated to helping other humans thrive. We may work with contracts, but with careers at Agiloft, the most important contract we keep is the human contract, the commitment we have to each other.
