Haize Labs

Research Engineer

Posted 9 Days Ago

New York, NY

Mid level

Artificial Intelligence

The Role

Develop and implement advanced methods for optimizing AI applications, focusing on dynamic testing, reliable evaluation models, and data generation. Collaborate with customers to adapt tooling for various domains.

Summary Generated by Built In

Haize Labs gets LLM apps out of POCs and into production. We eliminate the risk and improve the reliability of LLM apps by haizing them -- i.e. rigorously, proactively, and continuously fuzz-testing them.

We are looking for Research Engineers to help develop our reliability platform, with a focus on:

Data-efficient alignment of evaluation models
Dynamic testing of AI applications
Observability and anomaly detection
Discrete optimization (with applications in architecture search and automated prompting)

Our work is both intellectually stimulating and in high practical demand. Your work will result in net-new primitives, frameworks, and algorithms for developing robust LLM applications. It will directly influence how LLM apps are tested, verified, and deployed everywhere, and how the world responsibly uses LLMs.

Responsibilities

Develop optimization, synthetic data generation, and fuzzing methods for breaking LLM systems.
Implement complex automated evaluation models and systems.
Go from research idea to code within hours, and iterate quickly on experiments and data.
Work directly with customers to adapt our tools for different domains.

Qualifications

First-author publications in top-tier ML venues (NeurIPS, ICML, ICLR, and others).
Bias for action & experimentation over philosophizing (though philosophizing is sometimes good).
Not interested in printing papers for papers' sake.
Some production engineering experience (e.g. ML in an applied setting). No spaghetti research code!
Some familiarity with ideas from active learning, weak supervision, synthetic data, functional verification, reinforcement learning, reward modeling, automated evaluation. A subset of this is fine.

Annual Salary

$150,000 – $600,000 USD

Logistics

Location policy: In NYC.
US visa sponsorship: If you are exceptional, we will sponsor.
Compensation and Benefits: We provide generous salary, equity, and benefits.

We're Not Here to Play Games.

We're not here to write GPT wrappers or get rich quick off the AI bubble. We're here to solve the hardest problem in AI: making it safe, reliable, and production-ready.

Since our company's inception in 2024, we've amassed amazing customers like OpenAI, Anthropic, AI21, and several others. We've developed best-in-class tooling for evaluation, dynamic testing, red-teaming, observability, and continuous robustification. And we’re backed + advised by the founders of Cognition, Hugging Face, Weights and Biases, Nous, Etched, Okta, and Replit, and by C-suite execs from Google, Stripe, Databricks, Robinhood, and more.

Our core team is exceptionally fit for this mission. We turned down Stanford PhDs, got into & rejected Y Combinator, wrote ML-guided matchmaking for 50,000+ students, built an educational nonprofit supporting 60 countries, and did some other cool things along the way. Our early hires include an MIT PhD with 21,000+ Physics/ML/Stats citations, a Datadog engineering manager who led their GenAI observability team, a Citadel quant with a huge open-source presence, and more.

We can only serve our mission with an incredibly high talent-density team. Come here to push yourself, learn fast, experience excellence, grow with each other, and pursue your life's work.

Top Skills

Active Learning

Automated Evaluation

Functional Verification

Fuzz Testing

Machine Learning

Optimization

Reinforcement Learning

Synthetic Data

Weak Supervision

The Company

HQ: New York City, New York

10 Employees

On-site Workplace

What We Do

haize labs is the robustness and reliability layer underlying any AI model, in any industry, for any use case.

to solve the robustness problem, we haize (i.e. stress-test and red-team) AI models to preemptively discover all their failure modes before production.