Lenfest Internship- Data Scientist

Posted 14 Days Ago
Philadelphia, PA
Internship
Digital Media • News + Entertainment
The Role
The Lenfest Internship for Data Scientists at the Philadelphia Inquirer involves developing machine learning applications for local newsrooms, focusing on NLP techniques for location extraction, building data pipelines, and optimizing the BERT model while collaborating with a research team.
Summary Generated by Built In

The Philadelphia Inquirer is a public benefit corporation owned by the nonprofit Lenfest Institute for Journalism. Together, we're at the center of a critical mission to create a lasting future for ambitious, engaging, and useful local journalism. We're doing this, in part, by deepening our connection with the communities we serve. Our integrated digital and print platforms are the Philadelphia region's largest media network. We're passionate about building a sustainable model for indispensable local journalism, and we take pride in finding diverse, dynamic, and talented individuals to help push our team forward.

Description

The Philadelphia Inquirer, in partnership with the Brown Institute at Columbia Journalism School, is hiring students to assist on a project titled 'Developing machine learning applications that operationalize DEI best practices in local newsrooms'. The project builds upon existing work by the partnership and includes producing open-source methods and automated tools for news organizations to extract geographic data from news coverage for analysis and future product development.

The student will work under the supervision of members from the Philadelphia Inquirer and the Brown Institute at Columbia Journalism School and alongside a team of other researchers. The goal of the project is to produce a well-documented, open-source toolset that any local newsroom with an API or access to machine-readable content can use.

The first phase of the project is focused on the data pipeline and is two-fold: (1) experiment with NLP techniques to successfully extract locations and (2) build a pipeline for training data preparation that can be audited and reviewed by non-technical members of the project.

The second phase of the project is focused on the machine learning portion of the pipeline. The goal is to iterate on fine-tuning BERT and to explore other potential ML opportunities.

The third phase of the project is focused on location extraction and building a pipeline for others to use the tool.

The final phase of the project is focused on geocoding: taking the entities identified in the second and third phases and assigning them geographic data provided by geocoding services, gazetteers, and other third-party data providers.
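As a rough illustration of the heuristic side of the first phase, the sketch below pulls candidate place names out of a sentence and looks them up in a gazetteer. It is only a sketch under stated assumptions: the gazetteer entries, the approximate coordinates, the capitalized-phrase heuristic, and the function names are all hypothetical, and the posting does not say which libraries or heuristics the project actually uses.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Philadelphia": (39.9526, -75.1652),
    "West Philadelphia": (39.9580, -75.2200),
    "Camden": (39.9259, -75.1196),
}

# Heuristic: a run of consecutive capitalized words is a candidate place name.
CANDIDATE = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")


def extract_locations(text):
    """Return gazetteer places mentioned in text, preferring the longest match."""
    found = []
    for match in CANDIDATE.finditer(text):
        words = match.group().split()
        i = 0
        while i < len(words):
            # Try the longest phrase starting at word i first, so that
            # "West Philadelphia" wins over plain "Philadelphia".
            for j in range(len(words), i, -1):
                phrase = " ".join(words[i:j])
                if phrase in GAZETTEER:
                    found.append(phrase)
                    i = j
                    break
            else:
                i += 1  # no gazetteer entry starts at this word
    return found


def geocode(places):
    """Attach gazetteer coordinates to each extracted place name."""
    return {place: GAZETTEER[place] for place in places}


sentence = "The council met in West Philadelphia before touring Camden."
print(geocode(extract_locations(sentence)))
```

A trained model, as in the second phase, would replace or supplement the capitalized-phrase heuristic; the gazetteer lookup stands in for the geocoding step of the final phase.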

Deliverables

The student will work among a team of researchers to deliver on these project phases. The student is expected to attend weekly meetings with the team and supervisors to review progress throughout each phase.

Key deliverables include the following:

Phase one: Building a pipeline for NLP location extraction using open-source libraries with applied heuristics. This includes documenting the performance of any approach taken and recommendations for final application. The pipeline will be delivered in a Colab notebook or set of notebooks. The student (and team) will also be responsible for building a pipeline for assembling a training dataset. Actual data assembly will be performed by another team of researchers.

Phase two: The student (and team) will iterate on adjusting the parameters used to fine-tune BERT and will document the accuracy of the model(s). The student (and team) will also experiment with other ML-based approaches and provide recommendations alongside the NLP recommendations developed in phase one. Output from this phase will be delivered in notes as well as a Colab notebook or set of notebooks.

Phase three: The student (and team) will work on producing a pipeline that applies the ML model, as well as NLP-based approaches, to extract location entities from any sentence or news story. Deliverables from this phase of the project will be provided in the form of a Colab notebook or set of notebooks, with documentation of ingesting data from a variety of inputs, such as file uploads and connections to an API/database.

Phase four: The student (and team) will work on enhancing the geocoding aspect of the pipeline. This includes providing additional context to the entities being passed to geocoding APIs to improve the results they return. It also includes providing inputs that let new users of the tool set the location of their paper (and the relevance of locations), building a database of returned locations to prevent future geocoding costs, and constructing a pipeline to connect to third-party data providers for better results (e.g., Placekey). Deliverables from this phase of the project will be provided in the form of a Colab notebook or set of notebooks.
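Two of the phase-four ideas, adding context to an entity before geocoding it and storing returned locations to avoid repeat costs, can be sketched as follows. This is an assumption-laden illustration, not the project's design: `fake_geocoder` is a hypothetical stand-in for a paid geocoding API, and the `GeocodeCache` class, its table layout, and the "home region" parameter are invented for the example.

```python
import sqlite3


def fake_geocoder(query):
    """Hypothetical stand-in for a paid geocoding API call; returns (lat, lon)."""
    fake_geocoder.calls += 1
    return (39.9526, -75.1652)


fake_geocoder.calls = 0  # counts how many "billable" requests were made


class GeocodeCache:
    """Stores geocoding results so each distinct query is only paid for once."""

    def __init__(self, geocoder, home_region, path=":memory:"):
        self.geocoder = geocoder
        self.home_region = home_region  # set by each newsroom using the tool
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS geocodes "
            "(query TEXT PRIMARY KEY, lat REAL, lon REAL)"
        )

    def lookup(self, entity):
        # Add the paper's home region as context to disambiguate the entity.
        query = f"{entity}, {self.home_region}"
        row = self.db.execute(
            "SELECT lat, lon FROM geocodes WHERE query = ?", (query,)
        ).fetchone()
        if row is not None:
            return row  # cache hit: no API call needed
        lat, lon = self.geocoder(query)
        self.db.execute(
            "INSERT INTO geocodes VALUES (?, ?, ?)", (query, lat, lon)
        )
        self.db.commit()
        return (lat, lon)


cache = GeocodeCache(fake_geocoder, home_region="Philadelphia, PA")
first = cache.lookup("City Hall")
second = cache.lookup("City Hall")  # served from the database
```

Passing a file path instead of `":memory:"` would keep the cached locations between notebook sessions.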

Resources required

All work on the project will be conducted off-site on personal equipment. Data will be provided by The Philadelphia Inquirer, as well as by another team of researchers responsible for constructing the training dataset. Computational resources, including servers, storage, and video meetings, will be provided by The Philadelphia Inquirer and the Brown Institute at Columbia Journalism School.

Final product

The student (and team) will deliver a folder of Colab notebooks as well as extensive documentation for each phase of the project. The student (and team) will be supervised should they be interested in producing a research paper documenting an aspect (or aspects) of the project and its output.

Pay Rate - $25.00/Hour

* We know not everyone reading this will fit exactly what we've described. We encourage everyone to apply who shares our passion for indispensable journalism and our drive to create a sustainable business model to support it. As an equal opportunity employer, The Inquirer is committed to fostering a diverse and inclusive culture, and we especially encourage members of underrepresented communities to submit an application, including women, people of color, LGBTQ people, and people with special needs*
We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.

Other details

  • Pay Type: Hourly
  • Job Start Date: Monday, August 16, 2021



Top Skills

Python
The Company
HQ: Philadelphia, PA
1,001 Employees
On-site Workplace

What We Do

Since 1829, The Philadelphia Inquirer has been “asking on behalf of the people” by providing essential journalism for the diverse communities of the Philadelphia region. The Inquirer, a for-profit public benefit corporation owned by the non-profit Lenfest Institute, produces Pulitzer Prize-winning journalism that changes lives and leads to lasting reforms. Its multiple brand platforms — including newspapers, Inquirer.com, e-Editions, apps, newsletters, and live events — reach a growing audience of more than 10 million people a month.

“In a free state, there should always be an inquirer asking on behalf of the people: Why? Why? Why?” — John Norvell, Inquirer co-founder
