ML Research Engineer Internship, FineWeb - US Remote

Posted 20 Hours Ago
Hiring Remotely in United States
Remote
Internship
Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
Hugging Face is the leading open platform for AI builders.
The Role
As an ML Research Engineer Intern at Hugging Face, you will work with the FineWeb team to create high-quality web data through distributed data processing and model training. This role emphasizes open-source contributions and aims to make cutting-edge machine learning technologies more accessible.
Summary Generated by Built In

Description

At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users and 100k organizations who have collectively shared over 1M models, 300k datasets, and 300k apps. Our open-source libraries have more than 400k stars on GitHub.

About the Role

High-quality datasets are the foundation of strong LLMs, yet most labs releasing state-of-the-art models are vague about their pretraining data. At Hugging Face, we want to enable the whole community to build the best models by building and open-sourcing the finest datasets. FineWeb and FineWeb-Edu are examples of very strong web-scale datasets we released this year, alongside our open-source distributed processing library, datatrove.

During this internship, you will work alongside the FineWeb team to build the next generation of high-quality web data by running distributed data processing and assessing data quality through ablations that train small models. Check out hf.co/science for more information about the science team at Hugging Face, and the team's blog posts for more on its work specifically.

About You

If you love open source but also have an eye for art and creativity, are passionate about making complex technology more accessible to engineers and artists, and want to contribute to one of the fastest-growing ML ecosystems, we can't wait to see your application!

If you're interested in joining us, but don't tick every box above, we still encourage you to apply! We're building a diverse team whose skills, experiences, and background complement one another. We're happy to consider where you might be able to make the biggest impact.

More about Hugging Face

We are actively working to build a culture that values diversity, equity, and inclusivity. We are intentionally building a workplace where people feel respected and supported—regardless of who you are or where you come from. We believe this is foundational to building a great company and community. Hugging Face is an equal opportunity employer and we do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We value development. You will work with some of the smartest people in our industry. We are an organization with a bias for impact, and we are always challenging ourselves to grow. We provide all employees with reimbursement for relevant conferences, training, and education.

We care about your well-being. We offer flexible working hours and remote options, and we support our employees wherever they are. While we have offices around the world, especially in the US, Canada, and Europe, we're very distributed, and all remote employees have the opportunity to visit our offices. If needed, we'll also outfit your workstation to ensure you succeed.

We support the community. We believe significant scientific advancements are the result of collaboration across the field. Join us in supporting the ML/AI community.

Requirements

Please provide a cover letter mentioning why you would like to work in open-source at Hugging Face. We encourage you to mention your skills, potential expertise, and topics on which you would like to work.

Top Skills

Machine Learning
The Company
HQ: Brooklyn, NY
175 Employees
Hybrid Workplace
Year Founded: 2017

What We Do

For researchers, Hugging Face is the place to publish models and collaborate with the community. For data scientists, it’s where you can explore over 300k off-the-shelf models for any machine learning task and create your own. For software developers, it’s where you can turn data and models into applications and features.

Today, we host over 1 million models, datasets, and applications (including the latest Large Language Models and creative Generative AI experiences) that millions of people use every month. Our mission is to democratize good machine learning. We do this through open science (with projects such as BigScience and BigCode), open source (with libraries such as transformers and diffusers), and our commercial products and services, which accelerate the adoption of good machine learning at companies.

Why Work With Us

We are a decentralized, global, hybrid company that wants you to spend your day focused on the work that excites you. We have an asynchronous communication style and an experimentation-heavy culture. We have a strong bias for impact, a generalist & diverse mindset, and a kind ambition.

Similar Jobs

Hugging Face

Machine Learning Engineer Internship, TRL - US Remote

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
Remote
United States
175 Employees

Hugging Face

Machine Learning Engineer Internship, Gradio - US Remote

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
Remote
United States
175 Employees

Hugging Face

Open-Source Machine Learning Engineer - International Remote

Artificial Intelligence • Machine Learning • Natural Language Processing • Software • Generative AI
Remote
United States
175 Employees

Similar Companies Hiring

Hedra
Software • News + Entertainment • Marketing Tech • Generative AI • Enterprise Web • Digital Media • Consumer Web
San Francisco, CA
14 Employees
HERE
Software • Logistics • Internet of Things • Information Technology • Computer Vision • Automotive • Artificial Intelligence
Amsterdam, NL
6000 Employees
True Anomaly
Software • Machine Learning • Hardware • Defense • Artificial Intelligence • Aerospace
Colorado Springs, CO
131 Employees