Data Engineer - Bioinformatics

Posted 2 Days Ago
London, Greater London, England
Senior level
Healthtech
The Role
As a Data Engineer, you'll develop and maintain data pipelines for genomic data, ensuring high-quality data transformation and curation. You will collaborate with scientists and researchers to meet their data requirements while adhering to best practices in agile software development.
Summary Generated by Built In

Description

We are looking for a Data Engineer to help solve some of the key challenges in an industrial-scale programme of work with global significance. The successful Data Engineer will know how to communicate with, and between, technical and non-technical stakeholders, and how to facilitate discussions within a multidisciplinary team including scientists, software engineers, product managers and other data engineers.

You will have experience with genetic data and will contribute to the delivery of data releases that will be used worldwide.

Our Future Health will be the UK’s largest ever health research programme, bringing people together to develop new ways to detect, prevent and treat diseases. We are a charity, supported by the UK Government, in partnership with charities and industry. We work closely with the NHS and with public authorities across all nations and regions of the UK.

Our plan is to bring together 5 million volunteers from right across the UK who will be asked to contribute information to help build one of the most detailed pictures we have ever had of people’s health. Researchers will be able to use this information to make new discoveries about human health and diseases, so that future generations can live in good health for longer.

What you’ll be doing

You’ll be part of a multidisciplinary team that’s creating pipelines that didn’t exist before, owning them in production and improving them over time. Your key responsibilities will include but not be limited to:

  • Supporting the build of data pipelines from data providers to our primary data store and trusted research environment. 
  • Producing logic for data transformation steps as code that meets the requirements of our end users and builds well-curated, accessible and quality-controlled data for analysis.
  • Developing prototypes for pipelines for complex transformations drawing on existing workflows developed in industry and academia. 
  • Keeping abreast of best practice in data engineering across industry, research and Government and facilitating the adoption of standards. 
  • Providing technical input into the upstream parts of the data pipeline, including the specification and transfer of data from data providers. 
  • Carrying out routine ad hoc data curation activities that require hands-on development of bespoke ETL cleaning scripts in languages such as Python.
  • Working with researchers to understand their data requirements and deliver the data needed for their projects.
Requirements

You will have a solid understanding and experience of bioinformatics, in particular tools and methods associated with genomic data. To succeed in this role, you will have some of the following skills: 

  • Experience of working in an agile development team following best practices like code review and pairing. 
  • Familiarity with version control, including Git/GitHub.
  • Able to design, build and test pipelines using a range of different technologies. You will know how to create repeatable and reusable products.
  • Experience with storing, searching and filtering large scale genomic data. 
  • Good understanding of cloud environments (ideally Azure), distributed computing and scaling workflows and pipelines. 
  • Proficient in Python. 
  • Experience of workflow management tools (e.g. Nextflow, WDL/Cromwell, Airflow, Prefect, Dagster).
  • Understanding of common data transformation and storage formats, e.g. Apache Parquet. 
  • Experience of working with data lakes, and with Spark and Databricks.
  • Understanding of containerisation, e.g. Docker. 
  • Awareness of data standards such as GA4GH and FAIR.
  • Understanding and working knowledge of information governance and data security approaches appropriate for sensitive health data.
Benefits
  • Up to £60,000 per annum basic salary.
  • Generous company pension package with employer contributions of up to 12%.
  • 30 days annual leave (plus bank holidays).
  • Continuous career development with regular appraisals and learning and development opportunities.
  • A lovely new office in Holborn, Central London – we offer flexible and remote working arrangements.

Join us - let’s prevent disease together.

Top Skills

Python
The Company
HQ: London
278 Employees
Hybrid Workplace
Year Founded: 2020

What We Do

Our Future Health is the UK’s largest ever health research programme, bringing people together to develop new ways to prevent, detect and treat diseases.

Our mission is to create an incredibly detailed picture of the UK population’s health, by recruiting up to five million adult volunteers from across the UK. Each volunteer will be asked to fill out a questionnaire and provide a blood sample that can be linked to their health records. Taken together, the data will present health researchers with a powerful tool to identify new ways of tackling diseases such as cancer, diabetes, and dementia.

It's an unprecedented challenge that involves answering questions that have never been asked before – ethical, practical, and technological. And by getting these answers right, we believe Our Future Health will allow future generations to live in good health for longer.

We are currently expanding our team and looking for specialists across various fields – people who are motivated by the opportunity of creating something new that will make a difference to society.

Our Future Health is a registered charity in England, Wales and Scotland.
