Cartesia

Researcher: Inference

San Francisco, CA

Mid level

Artificial Intelligence • Software

About Cartesia

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text (1B text tokens, 10B audio tokens and 1T video tokens), let alone do this on-device.

We're pioneering the model architectures that will make this possible. Our founding team met as PhD students at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

The Role

• Conduct cutting-edge research to improve the efficiency, scalability, and robustness of inference for state-of-the-art AI models across various modalities, including audio, text, and vision.

• Design and optimize inference pipelines to balance performance, latency, and resource utilization in diverse deployment environments, from edge devices to cloud systems.

• Develop and implement novel techniques for efficient model execution, including quantization, pruning, sparsity, distillation, and hardware-aware optimizations.

• Explore speculative decoding methods, caching strategies, and other advanced techniques to reduce latency and computational overhead during inference.

• Investigate trade-offs between model quality and inference efficiency, designing architectures and workflows that meet real-world application requirements.

• Prototype and refine methods for stateful inference, streaming inference, and task-specific conditioning to enable new capabilities and use cases.

• Collaborate closely with cross-functional teams to ensure inference research seamlessly integrates into production systems and applications.

What We’re Looking For

• Deep expertise in optimizing inference for machine learning models, with a strong understanding of techniques such as speculative decoding, model compression, low-precision computation, and hardware-specific tuning.

• Strong programming skills in Python, with experience in frameworks like PyTorch, TensorFlow, or ONNX, and familiarity with inference deployment tools such as TensorRT or TVM.

• Knowledge of hardware architectures and accelerators, including GPUs, TPUs, and edge devices, and their impact on inference performance.

• Experience in designing and evaluating scalable, low-latency inference pipelines for production systems.

• A solid understanding of the trade-offs between model accuracy, latency, and computational efficiency in deployment scenarios.

• Strong problem-solving skills and a passion for exploring innovative techniques to push the boundaries of real-time and resource-constrained inference.

Nice-to-Haves

• Experience with speculative decoding and other emerging techniques for improving inference performance.

• Familiarity with stateful or streaming inference techniques.

• Background in designing hybrid architectures or task-specific models optimized for inference.

• Early-stage startup experience or a track record of developing and deploying efficient inference systems in fast-paced R&D environments.

Our culture

🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.

🚢 We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.

🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.

Our perks

🍽 Lunch, dinner and snacks at the office.

🏥 Fully covered medical, dental, and vision insurance for employees.

🏦 401(k).

✈️ Relocation and immigration support.

🦖 Your own personal Yoshi.

Top Skills

ONNX

Python

PyTorch

TensorFlow

TensorRT

TVM

The Company

33 Employees

Remote Workplace

Year Founded: 2023

What We Do

Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Try Sonic at https://play.cartesia.ai and join our Discord at https://discord.com/invite/gAbbHgdyQM.