Data Engineer

Reposted 21 Days Ago
Bethesda, MD
Mid level
Biotech
The Role
As a Data Engineer at NCBI, you will design, develop, and maintain biomedical data resources, implementing efficient algorithms and managing large data sets.

Overview

Black Canyon Consulting (BCC) is searching for Data Engineers to support our work for the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), an institute of the National Institutes of Health. This opportunity is full time at the NCBI in Bethesda, MD.

The NCBI is part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). NCBI is the world's premier biomedical center, hosting over six million daily users seeking research, clinical, genetic, and other information impacting biomedical research and public health. At NCBI, you can literally help accelerate cures for diseases! NCBI's wide range of applications, platforms (Node, Python, Django, C++, you name it), and environments (big data [petabytes], machine learning, multiple clouds) serve more users than almost any other US Government agency, according to https://analytics.usa.gov/.

We attract the best people in the business with our competitive benefits package, which includes medical, dental, and vision coverage, a 401(k) plan with employer contribution, paid holidays, vacation, and tuition reimbursement. If you enjoy being part of a high-performing, professional-service and technology-focused organization, please apply today!

Job Description 
  • There are multiple openings. You will work with a talented group of scientists and software developers to design, develop, test, and maintain programs for NCBI's premier biomedical data resources, including:
    • PubMed - with 33+ million biomedical literature citations and 5+ million daily users
    • GenBank - with over 12 trillion nucleotide bases. It is part of the International Nucleotide Sequence Database Collaboration (INSDC) and exchanges data daily with the DNA DataBank of Japan (DDBJ) and the European Nucleotide Archive (ENA).
    • SRA - the Sequence Read Archive, the largest publicly available repository of high-throughput sequencing data, available through multiple cloud providers and NCBI servers. It is also part of the INSDC.
    • ClinicalTrials.gov - providing access to both privately and publicly funded clinical trial studies around the world
  • Specific tasks may include implementing efficient bioinformatic algorithms and facilitating the development of cloud-ready tools and pipelines to improve the performance and scalability of searching in and submitting to the more than ten terabytes of genetic sequence data at NCBI.

Required Skills

  • Proficiency in Python
  • Experience with MS SQL Server and relational database design and optimization
  • Programming experience in a Linux environment, including shell scripting (e.g., Bash)
  • Experience in handling large amounts of data
  • Ability to work with common structured documents (at least one of XML, JSON)
  • Experience with CI/CD pipelines, unit tests, integration, and regression testing

Desired Skills

  • Experience with Cloud technologies:
    • AWS: EC2, S3, Lambda
    • GCP: GKE, Cloud Storage, Cloud Functions
  • 5+ years of working with genetic and biological data
  • Familiarity with NGS computational tools and formats (BWA, GATK, Galaxy, etc.)
  • Demonstrated active involvement in open source communities (GitHub, etc.)
  • Experience managing production workflows of online public databases
  • Experience with RESTful API design

Top Skills

AWS
Bash
GCP
MS SQL
Python

The Company
HQ: Bethesda, MD
271 Employees
On-site Workplace
Year Founded: 1988

What We Do


Official account of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine. NCBI serves as an international resource for the scientific research community - providing access to public databases and software tools for analyzing biological data, as well as performing research in computational biology.

The NCBI was established in 1988 by an act of the United States Congress as a division of the National Library of Medicine at the National Institutes of Health, with a mission to find new approaches to deal with the increasing volume and complexity of biological data in order to facilitate the understanding of genes and their role in health and disease.

The NCBI is made up of multidisciplinary research and development teams composed of molecular biologists, biochemists, structural biologists, clinicians, mathematicians, and computer scientists who:

Archive: Gather scientific and medical research data from around the globe
• Serve as the largest repository of the world’s primary biological research data
• Produce curated datasets to enhance the value and usability of the primary data

Access: Develop systems for discovering and integrating scientific and medical data
• Create search tools and data cross-referencing mechanisms
• Display and enable download of information from the world's largest collection of biological data

Advance: Promote understanding of processes that affect health and disease
• Perform cutting-edge research in computational biology
• Design and build algorithms, programs and systems for analysis of biological data
• Provide support and training through a varied and vigorous outreach program
