Description
We are looking for a DevOps Team Lead to head our geographically distributed team and take ownership of our Cloud Infrastructure and Platform Engineering strategy, enabling high-scale GenAI products running across 40+ Kubernetes clusters on GCP and AWS.
This role combines technical leadership, team management, and hands-on engineering, requiring solid expertise in cloud-native technologies, Kubernetes at scale, and modern DevOps principles. You will collaborate closely with engineering teams to design scalable infrastructure solutions, optimize developer workflows, and ensure platform reliability and efficiency.
Role and Responsibilities
- Team Leadership & Mentorship: Lead and manage a geographically distributed team, fostering growth, engagement, and professional development. Mentor engineers, conduct performance reviews, plan career growth, and encourage knowledge-sharing across R&D teams.
- Cloud & Kubernetes Management: Guide the design and implementation of scalable multi-cluster Kubernetes environments across GCP & AWS.
- Developer Experience & Enablement: Oversee the development of self-service tools and automation to improve efficiency for R&D teams.
- Incident & Reliability Engineering: Collaborate with engineering teams to optimize cost, performance, and reliability of production infrastructure through monitoring, capacity planning, and scaling strategies.
- Security & Governance: Drive best practices for RBAC, IAM, cloud security, and compliance, ensuring robust infrastructure security.
- Automation & Infrastructure as Code: Promote adoption of GitOps workflows and Infrastructure as Code (Terraform, Helm, Crossplane) for improved automation and consistency.
- Cross-Team Collaboration: Align cloud infrastructure goals with business needs by working closely with engineering, security, and product teams.
- Technology Assessment: Evaluate and advocate for new technologies to enhance platform reliability, efficiency, and scalability.
Requirements
Technical Expertise:
- 7+ years of DevOps, SRE, or Platform Engineering experience.
- 5+ years working with public cloud platforms (AWS/GCP) at scale.
- Senior-level Kubernetes expertise, including experience managing enterprise-grade, multi-cluster environments.
- Experience with Infrastructure as Code (Terraform, Helm) and familiarity with GitOps principles (ArgoCD, FluxCD, etc.).
- Familiarity with observability and monitoring tools (Prometheus, Grafana, Datadog, OpenTelemetry, etc.).
- Proficiency in scripting and automation (Python, Go, Bash) for infrastructure management.
- Knowledge of cloud networking (VPC, load balancers, service meshes) and security best practices (RBAC, IAM, security groups, network policies).
- Experience with CI/CD pipelines, optimizing for performance, security, and developer velocity.
Leadership & Execution:
- 2+ years of experience managing or leading DevOps, SRE, or Platform Engineering teams.
- Proven experience leading geographically distributed teams with a strong focus on team engagement, mentoring, performance reviews, and career growth planning.
- Effective communication and collaboration skills, capable of aligning multiple stakeholders and teams.
- Strong incident management capabilities, including on-call escalation experience, root cause analysis, and postmortems.
- Passion for automation, self-service, and internal tool development to streamline workflows.
- Ability to influence and drive adoption of DevOps best practices, fostering a culture of automation, collaboration, and continuous improvement.
Preferred Qualifications (Nice-to-Have):
- Experience managing teams working with self-hosted on-prem deployments and managed private VPC deployments (Bring Your Own Cloud models).
- Advanced expertise in Helm and Crossplane for Kubernetes resource management.
- Experience in GenAI or large-scale SaaS platforms.
- Knowledge of SQL/NoSQL databases and distributed systems.
- DevSecOps experience, including security automation and compliance frameworks.
About Us
AI21 Labs is pioneering the development of Foundation Models and AI Systems for enterprises, accelerating the adoption of Generative AI in production.
AI21 Labs was established in 2017 by AI visionaries Prof. Amnon Shashua, Prof. Yoav Shoham, and Ori Goshen. Our mission is to equip businesses with cutting-edge LLMs and AI capabilities. We are backed by leading investors such as Pitango, Google, Nvidia, Intel Capital, and Comcast Ventures.
Join us on this exciting journey and advance your career with AI21 Labs!
What We Do
AI21 is pioneering the development of enterprise AI Systems and foundation models. Our mission is to transform cutting-edge deep tech research into enterprise-ready AI systems. We offer privately deployed models with unmatched security, privacy, and reliability, along with tailored solutions for every organization. Founded in 2017, AI21 has raised $336 million from leading investors including NVIDIA, Google, and Intel.