Staff Software Engineer, Site Reliability (SRE)

Posted 17 Days Ago
Be an Early Applicant
Menlo Park, CA
Senior level
Artificial Intelligence • Software • Conversational AI • Generative AI
The Role
As a Staff Software Engineer in Site Reliability, you'll enhance systems reliability, maintain production services and collaborate on scalable solutions for a growing user base.
Summary Generated by Built In

About the role

As one of the founding members of our Site Reliability Engineering function here at Character, you’ll have the opportunity to support our infrastructure with thousands of nodes, terabytes of data and millions of daily active users on our site.  You’ll be responsible for ensuring our product's reliability, scalability, and performance as we aggressively grow our user base, with a goal of growing to 3 billion users. Work closely with our development team to design and implement processes and systems that ensure the stability and availability of our service.

What you’ll do

  • Maintain production services and keep them operational.

  • Develop tools, Instrumentation and automation to monitor and optimize the performance and reliability of our service.

  • Develop, implement and maintain automation tools and processes to prevent and mitigate service disruptions.

  • Collaborate with development teams to design and implement scalable, reliable systems, CI/CD processes for deployment.

  • Establish and support SLAs and SLOs for our site

  • Provide system monitoring and incident alerts

  • Participate in on-call rotations to provide support for critical incidents and outages.

  • Develop plans for site reliability and disaster recovery

Who you are

Competitive candidates will have:

  • 5+ years of experience in a development focused DevOps/SRE role within a technology organization that has significant scale

  • Deep experience with and proven success in developing software tools and automation wherever needed using Python and Golang

  • Expertise with SQL, Linux, CI/CD, Kubernetes, Terraform to support a site/application within a large multi node infrastructure and a growing user base. 

  • Experience working with multiple cloud computing platforms such as GCP is also a must

  • Demonstrated experience to successfully and reliably troubleshoot technical issues and challenges across a range of platforms and systems

  • Experience with incident management and event postmortems

Outstanding candidates will have one or more of the following:

  • Familiarity with GPU clusters and/or HPC environments is preferred

  • Experience with monitoring and logging tools such as Prometheus and Grafana

  • Hands-on experience scaling a consumer product from early days into hypergrowth

About Character.AI

Character.AI empowers people to connect, learn and tell stories through interactive entertainment. Over 20 million people visit Character.AI every month, using our technology to supercharge their creativity and imagination. Our platform lets users engage with tens of millions of characters, enjoy unlimited conversations, and embark on infinite adventures.


In just two years, we achieved unicorn status and were honored as Google Play's AI App of the Year—a testament to our innovative technology and visionary approach.


Join us and be a part of establishing this new entertainment paradigm while shaping the future of Consumer AI!

At Character, we value diversity and welcome applicants from all backgrounds. As an equal opportunity employer, we firmly uphold a non-discrimination policy based on race, religion, national origin, gender, sexual orientation, age, veteran status, or disability. Your unique perspectives are vital to our success.

Top Skills

Ci/Cd
GCP
Go
Grafana
Kubernetes
Linux
Prometheus
Python
SQL
Terraform
Am I A Good Fit?
beta
Get Personalized Job Insights.
Our AI-powered fit analysis compares your resume with a job listing so you know if your skills & experience align.

The Company
Menlo Park, California
30 Employees
Remote Workplace
Year Founded: 2021

What We Do

Creating revolutionary open-ended conversational applications through breakthrough research.

Similar Jobs

BuildOps Logo BuildOps

Senior Backend Engineer (Java)

Cloud • Mobile • Software
Easy Apply
Hybrid
Los Angeles, CA, USA
300 Employees

Finch Logo Finch

Senior Software Engineer, Fullstack

Fintech • HR Tech • Software
Hybrid
2 Locations
68 Employees

Liftoff Logo Liftoff

Staff Software Engineer, Data Platform

AdTech • Big Data • Machine Learning • Marketing Tech • Mobile • Software
Easy Apply
Remote
3 Locations
645 Employees

BlackLine Logo BlackLine

Senior Software Engineer Test

Cloud • Fintech • Information Technology • Machine Learning • Software • App development • Generative AI
Remote
Hybrid
Pleasanton, CA, USA
1810 Employees
133K-167K Annually

Similar Companies Hiring

True Anomaly Thumbnail
Software • Machine Learning • Hardware • Defense • Artificial Intelligence • Aerospace
Colorado Springs, CO
131 Employees
Caliola Engineering Thumbnail
Software • Machine Learning • Hardware • Defense • Data Privacy • App development • Aerospace
Colorado Springs, CO
53 Employees
Red 6 Thumbnail
Virtual Reality • Software • Hardware • Defense • Aerospace
Orlando, Florida
113 Employees

Sign up now Access later

Create Free Account

Please log in or sign up to report this job.

Create Free Account