Computercraft is looking for a Genetic Sequence Database Product Owner and Data Wrangler to support our work for the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH).
NCBI, one of the 400 most-visited sites in the world, is the premier biomedical center, hosting over four million daily users in search of clinical, genetic, and other information. NCBI’s wide range of applications (e.g., PubMed, ClinicalTrials.gov), platforms, and environments (e.g., big data [petabytes], machine learning, multiple clouds) serve more users with more data than any other U.S. Government agency. Working on NCBI products, you can help to accelerate the development of cures for diseases like cancer.
The Sequence Archives and Submissions (SeqArch) program needs a Product Owner and Data Wrangler for the GenBank sequence database, a unique scientific resource of human health and genetic data at NCBI. This person will be responsible for coordinating data exchange with the International Nucleotide Sequence Database Collaboration, generating downloadable data for external users, and coordinating targeted updates to the database based on systematic changes in taxonomic information.
In this position you will help manage GenBank’s data-access-related products, tools, and protocols. You will make decisions about the direction of the product and prioritize tasks. You will also work to define development tasks, establish delivery schedules, and ensure compliance with the organization’s policies and procedures.
Job Responsibilities
- Develop product vision, goals, and strategic roadmaps
- Lead data-gathering efforts through market research, data analysis, and user research to make balanced, objective decisions and provide clear guidance to delivery teams to create incremental value in an Agile environment
- Synthesize data-gathering efforts into a logical organization of epics and user stories for the development team
- Collaborate with users and lead cross-functional teams to define and optimize user workflows to improve user experience
- Understand customer segments and identify targeted solutions that meet or exceed their needs
- Lead teams through a complete product lifecycle of discovery to delivery
- Nurture partnerships with various stakeholders who wish to participate in the sharing of genomic data for research in cloud and conventional environments, using secure cross-agency protocols
- Participate in external collaborations and work with senior stakeholders
- Analyze incoming genetic sequence data for trends
- Prioritize the actions of the product team
- Critically evaluate datasets and functional annotations to assess quality
- Monitor automated dataflows for loading data to production databases
- Provide critical expertise to NCBI in biological data curation of genetic sequences
- Analyze log files, error files, or test-case “diffs” that can total hundreds of megabytes using tools such as sed, grep, awk, and Perl to confirm known/expected outcomes and identify outlier/problematic outcomes
Required Skills/Experience
- B.S. in bioinformatics, molecular biology, data science, computer science, information technology, or a similar field
- Excellent verbal and written communication skills
- Genomics/bioinformatics experience
- Strong understanding of molecular biology concepts
- Experience with scientific ETL data models
- The ability to troubleshoot technical and staffing roadblocks and mitigate resource risks
- Experience managing large and cross-functional projects in a complex, policy-driven environment
- Strong customer engagement, networking, presentation, and collaboration skills
- Ability to incorporate and diplomatically resolve conflicting priorities from multiple user groups and technical stakeholders
- Data processing experience in a Linux environment (5+ years)
- Experience coaching team members and eliminating knowledge silos
Desired Skills/Experience
- Experience working with GenBank or other sequence databases at NIH or other organizations
- Experience with data interoperability and sharing standards and policies
- Experience working with cloud data storage and processing platforms (e.g., AWS, GCP)
- Proficiency in at least one scripting language (e.g., Bash, Python)
- Experience working with large SQL databases involving many tables and billions of data rows
- Experience with CI/CD pipelines and with unit, integration, and regression testing
- Expertise in bioinformatic sequence analysis and its tools, including BLAST and multiple sequence aligners
- Solid understanding of key molecular biology concepts, such as the central dogma that describes the flow of genetic information from gene (DNA) to mRNA to protein
- Experience working in Product Owner or Product Manager positions in an Agile environment (e.g., developing vision, strategic plan, roadmap, requirements; applying user testing methodologies; prioritizing features based on value and effort)
The compensation for this position will be based on the experience of the successful candidate. The expected pay range for this position is $110,000 to $150,000.
What We Do
Computercraft, an American Indian– and Woman-owned small business, provides the public with user-friendly access to reliable and current genetic sequence, genomic, chemical, and scientific information.
Our technical and scientific staff work with customers to build and refine high-profile information resources that get accurate health and biomedical research data into the hands of researchers and other stakeholders all over the globe, helping them solve our world’s greatest health challenges.
In some of our other work, we provide program management support and health communication and outreach that directly and indirectly facilitate and sustain our nation’s public health efforts.