About Us:
Modal is building the serverless compute platform to support the next generation of AI companies. In order to deliver the developer experience we wanted, we went deep and built our own infrastructure—including our own custom file system, container runtime, scheduler, container image builder, and much more.
We're a small team based out of New York, Stockholm and San Francisco. In just one year, we've reached 8-figure revenue, tripled our headcount, scaled to support thousands of GPUs, and raised over $32M in funding.
Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
The Role:
Modal is looking for a software engineer to help us with some of the most challenging AI/ML problems our customers are facing. As a Forward Deployed ML Engineer, you will:
- Help our customers architect and build complex AI applications
- Optimize performance for open-source models and frameworks
- Write examples and build demos that showcase Modal
- Contribute to the core Modal stack
- Help our community build cool stuff on top of Modal
Requirements:
We are looking for someone with these skills:
- Experience working with AI applications (ideally some cool examples you want to demo!)
- At least a few years of professional software engineering experience (or solutions engineering or similar)
- Willingness to work in person in New York City or Stockholm (SF is also an option for very strong candidates)
What We Do:
Deploy generative AI models, large-scale batch jobs, job queues, and more on Modal's platform. We help data science and machine learning teams accelerate development, reduce costs, and effortlessly scale workloads across thousands of CPUs and GPUs.
Our pay-per-use model ensures you're billed only for actual compute time, down to the CPU cycle. No more wasted resources or idle costs—just efficient, scalable computing power when you need it.