Overview:
We are seeking an experienced Staff Systems Engineer to join our growing infrastructure team. As we advance our AI stack and scale our infrastructure, you will play a pivotal role in designing, maintaining, and optimizing our systems. Your expertise will help ensure high availability, security, and performance while enabling seamless deployment for our customers and internal teams.
As a key technical leader, you will guide architectural decisions, mentor engineers, and drive improvements across our infrastructure. Your contributions will be instrumental in shaping the future of our AI-powered solutions.
Your Responsibilities:
- Lead the design, development, and optimization of the Pharia AI stack and the supporting infrastructure.
- Define best practices and guide teams in writing Helm charts and deploying their artifacts efficiently.
- Architect, set up, and maintain highly available Kubernetes (K8s) clusters on StackIT or similar cloud platforms.
- Design, build, and maintain Kubernetes Operators.
- Provide strategic guidance and hands-on assistance to customers for deploying and maintaining our products on their infrastructure.
- Ensure compliance with security and reliability best practices; represent the team in audits and respond to security questionnaires.
- Act as a technical leader, mentoring engineers through pair programming, code reviews, and technical discussions.
- Drive automation efforts and improve CI/CD pipelines to enhance deployment efficiency and system resilience.
- Collaborate with cross-functional teams to align infrastructure with business and product goals.
Your Profile:
- Extensive experience in designing, deploying, and maintaining Kubernetes clusters in production environments.
- Automation & CI/CD Expertise: Proficiency in tools such as Helm, Ansible, Terraform, ArgoCD, GitLab CI, and JFrog.
- Experience designing and implementing Kubernetes Operators.
- Strong programming skills in at least one language from our stack: Rust or Go.
- Deep understanding of security, reliability, and scalability best practices for infrastructure.
- Proven experience mentoring engineers, leading technical projects, and setting best practices.
- Excellent communication and collaboration skills, with a track record of contributing to a culture of learning and innovation.
- Experience working in fast-paced startup environments is a plus.
What You Can Expect From Us:
- Become part of an AI revolution!
- 30 days of paid vacation
- Access to a variety of fitness & wellness offerings via Wellhub
- Mental health support through nilo.health
- Substantially subsidized company pension plan for your future security
- Subsidized Germany-wide transportation ticket
- Budget for additional technical equipment
- Flexible working hours and a hybrid working model for better work-life balance
- Virtual Stock Option Plan
What We Do
We are an AI research and application company that researches, develops, and operationalises large-scale AI models for language, image data, and strategy, thereby helping to secure Europe's digital sovereignty.