Technical Product Manager, AI Evals

This job was removed at 06:06 p.m. (CST) on Thursday, Apr 03, 2025
14 Locations
Remote
140K-190K Annually
Software
The Role

Hi, we're The Browser Company 👋 and we're building a better way to use the internet.
Browsers are unique in that they are one of the only pieces of software that you share with your parents as well as your kids. That makes sense: they're our doorway to the most important things. Through them we socialize with loved ones, work on our passion projects, and explore our curiosities. But on their own, they don’t actually do a whole lot; they’re kind of just there. They don’t help us organize our messy lives or make it easier to compose our ideas. We believe that the browser could do so much more, empowering and supporting the amazing things we do on the internet. That’s why we’re building one: a browser that can help us grow, create, and stay curious.
To accomplish this lofty task, we’re building a diverse team of people from different backgrounds and experiences. This isn’t optional; it’s crucial to our mission, as we need a wide range of perspectives to challenge our assumptions and shape our browser through a bold, creative lens. With that in mind, we especially encourage women, people of color, and others from historically marginalized groups to apply.

About The Role

The Browser Company is hiring a Technical Product Manager, AI Evals, to help build the foundation for Dia, our browser-native AI assistant. This role is perfect for someone who loves turning complex user needs into structured training data and enjoys working at the intersection of AI and user experience.

You'll combine systematic thinking with meticulous attention to detail to create the datasets that help our models understand and assist users more effectively. Your work will be crucial to Dia's success, creating the training data that enables our AI to understand user intent and deliver helpful responses.

From evaluation sets to large-scale training data, you'll build the datasets that help us measure, improve, and scale our AI capabilities. This isn't just about data collection – it's about understanding our users deeply and translating their needs into examples that help our models learn and improve.

Overall you will...

  • Build high-quality datasets for model evaluation and training, from targeted eval sets to large-scale training data

  • Partner with engineers to ensure datasets are comprehensive, properly formatted, and easy to use

  • Work with product owners to understand product goals and translate them into effective training data

  • Collaborate with User Research and Membership teams to understand user needs deeply

  • Use support tickets and user feedback to inform and inspire dataset creation

  • Establish and maintain quality standards for our datasets

  • Navigate technical tools to manage and update training data

After 1 month you will...

  • Get onboarded onto the team with an onboarding buddy

  • Learn about our AI strategy and how datasets drive model improvements

  • Get familiar with our technical tools, including Braintrust and GitHub

  • Begin contributing to evaluation datasets for specific features, including labeling/annotation of existing sets

  • Start understanding patterns in user interactions with Dia

  • Regularly share feedback about Dia in our #dogfooding channel

After 3 months you will...

  • Independently create evaluation datasets for new features

  • Run evals yourself and in tandem with engineers, tracking scores over time

  • Regularly analyze user feedback to inform dataset creation

  • Develop systematic approaches to dataset organization

  • Learn to identify what makes a good training example

  • Start contributing to larger-scale dataset projects

After 6 months you will...

  • Own dataset creation for both evaluation and training purposes

  • Help define quality standards for dataset development

  • Work with product owners to align datasets with product vision

  • Create documentation and processes for dataset management

  • Build relationships across research and membership teams

  • Contribute insights about user patterns and model capabilities

Qualifications

  • You have 3+ years of hands-on experience with AI evaluation (“evals”), data labeling, or model fine-tuning. You have a strong understanding of best practices in these areas.

  • You have 5+ years of experience working with large datasets, from spreadsheets to user feedback, in a technical, product, or QA role.

  • You understand how users think about product capabilities, can distinguish between current features and future potential, and collaborate across teams to turn insights into action.

  • You're comfortable with technical tools like GitHub and can navigate engineering-adjacent systems (experience with SQLite, Python, Braintrust, or Xcode is a plus!)

  • You're excited about AI, language models, and taking creative approaches to dataset creation to ensure diverse, high-quality examples for AI training and evaluation.

  • We're primarily focused on hiring in North American time zones and require that folks have 4+ hours of overlap with team members in the Eastern Time Zone.

Compensation and Benefits

💰 With our flexible compensation model, employees have the ability to choose the cash-to-equity ratio that best suits their individual needs. Every offer we extend includes three options: a salary-optimized offer, an equity-optimized offer, and a balanced offer.

The annual salary range for this role is $140,000-$190,000 USD. The actual salary offered will vary based on experience level and interview performance.
🧘🏻‍♀️ In addition to a competitive salary and equity package, we provide every employee with the following benefits:

  • comprehensive benefits package with employee medical, dental, and vision - we cover 100% of premiums for employees, and up to 95% for dependents

  • 401k plan

  • flexible vacation policy - on average, our team members take between 15 and 20 vacation days plus federal holidays (holidays vary by location)

  • remote-friendly working environment - our core working hours are 11 AM-2 PM Eastern Time, Monday-Friday

  • 12 weeks of paid parental leave

  • $1,500 USD home office stipend

  • Employees based in the US also receive additional services like free annual memberships to One Medical (where available), Talkspace, Teladoc, and HealthAdvocate

The Browser Company is a well-funded, ambitious startup of close to 100 people (and growing!) who are passionate about building great products. We are a remote-first, distributed team, with the option to work from our office in Brooklyn, New York. We strongly support diversity and encourage people from all backgrounds to apply.
🚙 To read more about what we value as a company, check out Notes on Roadtrips on our blog.


The Company
New York, New York
127 Employees
On-site Workplace
Year Founded: 2019

What We Do

The Browser Company of New York is a group of friendly humans working to make the internet feel more like home. But how?

The web browser is one of the most important tools we use — not just on our computers, but in our lives. The world has changed in the past 15 years, but our web browsers look and behave pretty much the same. We think it’s time to push the web browser forward again, which is why we built Arc — a browser that’s not just faster, but also more personal, focused, creative… and maybe even more fun.

If this is as exciting for you as it is for us, don't hesitate to say hello! We're always looking for great people to join our mission.

