Infrastructure Engineer (GPU Cluster)

Posted 11 Days Ago
2 Locations
Remote
Senior level
Artificial Intelligence • Energy

💧 About Pallon

At Pallon, a spin-off from ETH Zurich, we’re creating AI that automatically detects defects in sewer inspection videos and advises cities on when and how to fix them. By providing more precise, objective data, we aim to fix wastewater leaks, reduce CO2 emissions, and prevent urban flooding. Our mission is to make cities more sustainable and resilient.

The Role

We're looking for a seasoned infrastructure engineer to take full ownership of our infrastructure — from our high-performance GPU cluster to our cloud systems. You’ll be joining a small, deeply technical team building cutting-edge computer vision and deep learning systems.

This is a hands-on, high-impact role. You’ll lead critical decisions around architecture, performance, and scale, while also jumping in to solve real-world issues — whether that’s designing GPU scheduling strategies, tuning networking performance, or swapping out hardware.

You’ll collaborate closely with our platform and computer vision teams to make sure their tools run fast, reliably, and securely — and you'll have the autonomy to shape how that all comes together.

In this role, you might find yourself:

  • Designing and building a custom GPU cluster for deep learning workloads.

  • Deciding how we manage and scale our infrastructure — both on-prem and in the cloud.

  • Keeping systems running smoothly and securely — from data pipelines to distributed training jobs.

  • Troubleshooting weird kernel errors, configuring systemd units, or debugging Kubernetes evictions.

  • Making calls on when to script, when to automate, and when to just fix the thing.

You’ll be great in this role if:

  • You’ve spent 5+ years owning infrastructure end-to-end, ideally in startup environments.

  • You’re comfortable at every layer — from bare-metal servers and NVMe drives to container orchestration and cloud-native tools.

  • You have strong Linux fundamentals, and you know your way around networking, storage, and distributed systems.

  • You can code well enough to automate, debug, and build tooling across a variety of languages.

  • You communicate clearly and collaborate well — especially with engineers who aren’t infra specialists.

  • You thrive with autonomy and can manage your own priorities effectively.

  • You’re curious and fast-learning, especially when tackling new tools or challenges.

  • You have a university degree in Computer Science or a related field.

Bonus points for:

  • Experience with machine learning infrastructure or HPC clusters.

  • Familiarity with data engineering workflows and ETL pipelines.

Our Tech Stack

You don’t need to have experience with all of this — but here’s what we use today:

  • HPC Cluster (our hardware, colocated in a datacenter): Linux, NVIDIA GPUs, Slurm, InfiniBand

  • Cloud: Google Cloud Platform, Kubernetes, Docker, GitLab CI/CD

  • Data Analytics: dbt, BigQuery, Metabase

Read more about our Engineering team here.

😎 Benefits & Team Culture

As a part of Pallon, you will:

  • Contribute to a positive impact on society and the environment.

  • Develop a novel product that changes a whole industry.

  • Be part of a motivated, smart, fun, and supportive team of software engineers and AI researchers.

  • Own a part of Pallon and share in our success through our Employee Stock Option Plan (ESOP).

  • Work for the Underworld, not the Devil: exploring sewers virtually and in real life during our Pallon offsites.

  • Work from home or enjoy access to our beautiful office space located in Zürich.

Inclusion statement

At Pallon, we highly value equality of opportunity and inclusivity. We particularly encourage women and candidates from under-represented backgrounds to apply, even if you don’t meet 100% of the requirements.

Top Skills

BigQuery
dbt
Docker
GitLab CI/CD
Google Cloud Platform
InfiniBand
Kubernetes
Linux
Metabase
NVIDIA GPUs
Slurm

The Company
Zurich
38 Employees
On-site Workplace
Year Founded: 2019

What We Do

Pallon is a service that uses artificial intelligence to quickly and objectively report defects in your sewer and manhole inspection footage.
