Release 12

Information about the data released on XXXX

What data is included in Release 12?

Data from 1,900,557 participants are included in this release. Of those, 1,900,495 participants have completed and submitted the baseline questionnaire, and 1,433,275 have completed their in-person Clinic Measurements appointment. 102,103 participants also have country and region data as part of the first participant geographies data release. For 650,979 of these individuals we have generated genotype array data. 1,703,311 participants were successfully linked to an NHS number, of whom 1,648,707 have at least one secondary care or death registration record.

Participant data

The Participant table includes information from 1,900,557 participants who have registered and consented to join the Our Future Health programme, and submitted a complete questionnaire on or before 15 July 2025.

Participant geographies data

The Country and Region table contains data for 102,103 participants, representing our earliest cohort, including those who registered to join the programme between 2021 and 2022. This cohort will be expanded in upcoming releases.

Questionnaire data

Release 12 of the Questionnaire table includes 1,900,495 participants who have completed either v1, v2, v2.1 or v2.2 of the Our Future Health baseline questionnaire. This includes participants who joined during the initial pilots from 2021 and after the main recruitment period began in October 2022.

  • participants who started the questionnaire on or after 24 May 2021 will have completed v1 of the questionnaire (N = 52,745 participants)

  • participants who started the questionnaire on or after 20 November 2022 will have completed v2 of the questionnaire (N = 737,333 participants)

  • participants who started the questionnaire on or after 21 December 2023 will have completed v2.1 of the questionnaire (N = 369,559 participants)

  • participants who started the questionnaire on or after 13 June 2024 will have completed v2.2 of the questionnaire (N = 740,858 participants)
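
As a reading aid for the date windows above, the sketch below maps a questionnaire start date to a version. It assumes each listed date is the first day of a window that ends when the next version begins, which the list implies but does not state; the function and variable names are hypothetical. It also checks that the four stated counts add up to the Questionnaire table total.

```python
from datetime import date

# Start dates copied from the list above. Treating each one as the lower bound
# of a window that closes when the next version starts is an assumption.
VERSION_START_DATES = [
    ("v1", date(2021, 5, 24)),
    ("v2", date(2022, 11, 20)),
    ("v2.1", date(2023, 12, 21)),
    ("v2.2", date(2024, 6, 13)),
]

# Participant counts per version, as stated in the list above.
VERSION_COUNTS = {"v1": 52_745, "v2": 737_333, "v2.1": 369_559, "v2.2": 740_858}


def questionnaire_version(start_date):
    """Return the latest version whose start date is on or before start_date."""
    version = None
    for name, first_day in VERSION_START_DATES:
        if start_date >= first_day:
            version = name
    return version


# The per-version counts should sum to the 1,900,495 participants in the table.
total = sum(VERSION_COUNTS.values())
```

A start date before 24 May 2021 returns `None`, since no version is listed for it.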

Clinic measurements data

As of July 2025, over 1.5 million participants have attended an Our Future Health Clinic appointment. The current release includes a subset of 1,433,275 participants who have both completed and submitted a questionnaire and attended an appointment (both on or before 1 April 2025).

Linked health records data

In total, we attempted linkage to health records data for 1,781,135 participants who completed their questionnaire before 9 April 2025. Of these, 1,703,311 (95.6%) were successfully linked to an NHS number. 1,648,707 participants (96.8% of all linked participants) have at least one secondary care or death registration record in one or more of the linked health records data tables.

Linked Health Records data from this release includes participants who completed their questionnaire before 9 April 2025 and, therefore, contains fewer participants than the current Questionnaire data release. This is due to the lag between the submission of participant details for linkage and the data being received, quality assured and processed.


Participant and Questionnaire data

What information does the Participant and Questionnaire data contain?

For details on what information is included in the Participant and Questionnaire data, see our Participant data and Questionnaire data pages. These pages cover how we:

  • de-identify data

  • manage re-identification risk

  • version control

  • tailor questionnaire journeys

  • store the data in the TRE

Due to variations in the timing of data exports, a small number of participants in the Participant data table may not have corresponding records in the Questionnaire table. Additionally, to ensure the most robust handling of withdrawals, these are now processed independently for each dataset, resulting in some variation in participant counts across assets.

What changes have been made as part of this release?

There are no changes in this release. Participants who have withdrawn from the programme have been removed from Release 12. Version v2.2 of the questionnaire remains the active live version.

What should I be aware of when working with the participant and questionnaire data in this release?

Technical data loss

A suspected system issue that occurred prior to October 2022 resulted in a small number of questionnaires submitted around that time having missing data for some questions. The missing data cannot be explained by errors in dynamic logic. We are analysing the impact and will provide further information in future releases.

Implausible age and year combinations

Responses to questions about age or year of birth are initially validated against the participant’s recorded date of birth at the time of response. However, if a participant later updates their date of birth, these earlier responses are not re-validated. The Participant data reflects the most recent date of birth, which may lead to inconsistencies between updated birth information and previously recorded responses. This issue affects only a small number of cases, and we plan to resolve it in a future data release.

Updating responses to parent questions

Due to the current data capture process, there are cases where a participant updates their response to a parent question, which correctly overwrites the original answer. However, responses to dependent (dynamic) questions linked to the previous parent response may persist, resulting in logical inconsistencies. This issue affects a very small proportion of submissions (less than 0.1% across all versions). We are actively working on a solution.

One example involves sex-specific questions. In a small number of records, there are inconsistencies between the participant’s self-reported sex and their responses to sex-specific items. This can occur when a participant changes their response to "What sex were you registered with at birth?" - recorded in fields DEMOG_SEX_1_1 or DEMOG_SEX_2_1 - after having completed questions tailored to their previous response. As a result, responses to questions intended for the opposite sex may be retained in error, rather than being removed or excluded based on the updated logic path.

Errors in questionnaire configuration

For comprehensive documentation on all historical bugs related to errors in the implementation of dynamic logic, please refer to Change log for Questionnaire versions. Please note that errors in logic may persist across releases, even after they have been fixed for the affected version.

Participants who have registered more than once (participant and questionnaire data)

As described on the Participant data page, we are aware that some individuals may have registered multiple times. This may mean that in a small number of cases, the same person may have submitted multiple questionnaires under different registrations.

Currently, it is not possible to identify these duplicate records from the participant or questionnaire data with high confidence. Comparing questionnaire responses across registrations is unreliable: a participant who submitted multiple questionnaires under different registrations in good faith might be expected to provide similar answers, but the responses are unlikely to be identical, and this approach would not detect multiple registrations where the questionnaire responses are very different. However, approved study applications which include linked data could use the linked health records to identify a large proportion of the duplicate records whenever submitted personal information is the same or similar (see What should I be aware of when working with the linked health records data in this release?).

Updating records between releases

In exceptional cases, a participant’s record may appear to be modified between releases. For example, if a participant mistakenly completes a questionnaire intended for their partner, the incorrect record is deleted to allow the correct individual to submit their responses. Such cases are extremely rare, affecting fewer than 0.001% of records. See our documentation for Release 9 for more details.


Participant geographies data

What information does the Participant geographies data contain?

Currently, the Participant geographies data consists of a single table that contains the country and region linked to each participant’s self-reported address collected during their registration for the Our Future Health programme.

For the current release, all participants must have submitted a complete questionnaire on or before 15 July 2025.

For details on how we process Participant geographies data and how we create the Country and Region table, see our Participant geographies data page.

What changes have been made as part of this release?

This is the first release of Participant geographies data. It contains only country and region for the subset of participants who joined the programme earliest, specifically those who registered between 2021 and 2022.


Clinic measurements data

What information does the Clinic measurements data contain?

For details on what information is included in the Clinic measurements data see our Clinic measurements data page. This page covers how we:

  • de-identify data

  • manage re-identification risk

  • version control

  • store the data in the TRE

For the current release, all participants must have attended a clinic appointment and have submitted a complete questionnaire on or before 15 July 2025.

What changes have been made as part of this release?

There are no changes in this release. Participants who withdrew from the programme have been removed from Release 12. For information on how appointments are conducted, see Procedure for Clinic measurements.

What should I be aware of when working with the Clinic measurements data in this release?

Un-versioned updates to the appointments process

The current versioning approach applied to the Clinic measurements data table includes only two major versions, which can be used to identify whether or not a participant had an appointment that included heart rhythm or third heart readings. Smaller updates to the appointment process are not captured by these versions. These un-versioned updates include:

  • introducing XS and XL blood pressure cuffs

  • changes to the order of measurements collected

  • addition of specific instructions for obtaining readings from pregnant individuals

For more details on versioning, please refer to the section on Change log for Clinic measurements appointment processes.

Participants who have registered more than once (clinic measurements data)

As described on the Participant data page, we are aware that some individuals may have registered multiple times. This may mean that in a very small number of cases, the same person may have attended multiple in-person appointments under different registrations.

Currently, it is not possible to identify these duplicate records from the clinic measurements data directly. Even where a participant may have attended multiple in-person appointments and had physical measurements taken, natural variation and measurement error will mean that it is unlikely that the measurements would be identical. However, approved study applications which include linked data could use the linked health records to identify a large proportion of the duplicate records whenever submitted personal information is the same or similar (see What should I be aware of when working with the linked health records data in this release?).

Multiple measurements obtained for heart readings

During the original appointment process (version 1), the protocol for heart readings was to obtain only two measurements. However, in version 1, it was reported that clinicians occasionally took multiple readings and re-entered values for the first two measurements, attempting to achieve more typical results. To mitigate this, version 2 introduced the option for a third reading if abnormal measurements were recorded for the first two readings.

Missing data for third heart readings

Due to technical issues, software updates, or rare system failures, there may be isolated cases of data capture inconsistencies. As of appointment version 2, participants who have abnormal readings recorded for their first and second set of heart measurements are offered the opportunity to provide a third set of measurements, as described in the section Do all participants provide every measurement?

However, we note two exceptions:

  1. criteria met but data missing (false negative data): participants who meet the criteria for a third reading, but have no data for a third reading

  2. criteria not met but data provided (false positive data): participants who do not meet the criteria but do have data for a third reading

This discrepancy affects fewer than 0.01% of records. The vast majority of participants who meet the criteria for third readings in version 2 have data recorded as expected.

Data capture for height, weight and waist measurements

During appointments, the following ranges are allowed for height, weight, and waist measurements:

  • height: Between 90 and 299 centimetres

  • weight: Between 20 and 400 kilograms

  • waist circumference: Between 30 and 200 centimetres

These ranges are intentionally broad and may not always reflect biologically plausible measurements. The same boundaries are applied to both height and weight in the Our Future Health Baseline Questionnaire.

We have identified infrequent outliers in the clinic measurements data that suggest occasional human error during data capture, affecting less than 1% of observations. These errors are likely to include:

  • waist circumference may have been entered in inches instead of centimetres

  • height and weight measurements may have been reversed, with height entered in the weight field and vice versa

  • the same values may have been erroneously entered for multiple fields (e.g., height and weight, or height, weight, and waist)

No mitigation has been applied in the current release, meaning these issues will persist in the data.
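
Because no mitigation has been applied, researchers may want to flag candidate data-entry errors themselves. The sketch below is one possible way to do this and is not the programme's cleansing rule. The function name, the plausible adult height band (120 to 220 cm) and the minimum plausible waist (50 cm) are all illustrative assumptions that should be tuned to the analysis.

```python
def flag_measurements(height_cm, weight_kg, waist_cm,
                      plausible_height_cm=(120, 220),
                      min_plausible_waist_cm=50):
    """Return flags marking candidate entry errors; a flag is not a confirmed error."""
    flags = []

    # The same value entered in more than one field.
    if height_cm == weight_kg or height_cm == waist_cm or weight_kg == waist_cm:
        flags.append("identical_values")

    # Height falls outside the assumed plausible band while the weight value
    # falls inside it: the two fields may have been swapped.
    low, high = plausible_height_cm
    if not (low <= height_cm <= high) and low <= weight_kg <= high:
        flags.append("possible_height_weight_swap")

    # A very small waist value may have been entered in inches (assumed threshold).
    if waist_cm < min_plausible_waist_cm:
        flags.append("possible_waist_in_inches")

    return flags
```

Flagged records are best reviewed or handled in a sensitivity analysis rather than corrected automatically, since a flagged value can still be genuine.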

To ensure accurate measurements are recorded, our data capture application and associated Standard Operating Procedures (SOPs) are continually updated with guidelines and prompts to assist in precise data collection. We are committed to addressing these data issues and may update our data cleansing rules in future releases.


Linked health records data

What information does the linked health records data contain?

This release contains linked health records from Hospital Episode Statistics (HES), the National Disease Registration Service (NDRS), and Office for National Statistics (ONS) Death Registration.

The HES and NDRS data sets provide a wide range of information on patient admissions to NHS facilities, including clinical, administrative, and geographic information. The HES data sets do not contain electronic patient health records or information on medicines and dosages. For more information on how these data sets are collected and processed, please refer to the HES Data Collection page (external link) and NDRS Access Page (external link). The HES and NDRS data sets only include records collected by NHS England (NHSE), meaning these data contain only care records from NHS providers in England.

The data contains linked health records from selected Hospital Episode Statistics (HES) data sets: Admitted Patient Care (APC), Accident & Emergency (A&E), Outpatient, and Emergency Care Dataset (ECDS). It also contains selected cancer data sets from the National Disease Registration Service (NDRS): Cancer Registry and Cancer Pathways. For more details on each data set see the section on the linked health records data page entitled Linked data set descriptions.

We used the HES data dictionary v2.04 and NDRS data dictionary v5.2 for validation on variable format and codes.

All linked health records data have been provided by NHS England.

What changes have been made as part of this release?

We have updated the pseudonymised provider code list to incorporate any new providers. In total, we mapped 957 providers. 26 providers (2.7% of all providers in data) were mapped to unknown because they did not appear in either the NHS Organisation Data Service API or the Archived Closed Organisation data set.

How are the data sets structured?

The data release includes 9 entities, organised as follows:

  • Hospital Episode Statistics

    • nhse_eng_inpat (Admitted Patient Care)

    • nhse_eng_ed (Accident and Emergency)

    • nhse_eng_outpat (Outpatients)

    • nhse_eng_ecds (Emergency Care Dataset)

  • Civil Registrations of Death 

    • nhse_engwal_deaths

  • National Disease Registration Service Cancer Data (NDRS)

    • nhse_eng_canpat (Cancer Pathways)

    • nhse_eng_canreg_pattumour (Cancer Registry Patient Tumour)

    • nhse_eng_canreg_treat (Cancer Registry Cancer Treatment)

  • Linked Participants

    • participant_nhs_linked

The table below summarises the available data, including number of available fields and dates for each data set. These dates include provisional data for the HES datasets. Please refer to the section on provisional data for more information on the dates for finalised vs provisional data. Further descriptions can be found on the Linked data set descriptions page.

The entity names indicate the data source, geographic data coverage, and name of the data set. For example, nhse_engwal_deaths indicates the data source is NHS England, the entity includes data from England and Wales, and the data set is for deaths.

Entity | Description | Dates Available (including provisional data) | Number of Fields
--- | --- | --- | ---
HES Admitted Patient Care | Episodes of in-patient care | 1 April 1997 to 31 March 2025 | 108
HES Accident & Emergency | Attendance of major A&E department | 1 April 2007 to 31 March 2020 | 91
HES Outpatient | Outpatient appointments | 1 April 2003 to 31 March 2025 | 55
HES Emergency Care Dataset | Attendances of major A&E department | 1 April 2020 to 31 March 2025 | 162
ONS Death Registration | Death registration and mortality data | 9 June 2022 to 11 June 2025 | 20
NDRS Cancer Pathways | Cancer pathways data | 1 January 2013 to 21 July 2024 | 12
NDRS Cancer Registry Patient Tumour | Cancer treatment data at tumour-level | 1 January 1995 to 31 December 2022 | 49
NDRS Cancer Registry Cancer Treatment | Cancer data by treatment event at given tumour | 1 January 1995 to 20 June 2024 | 22
Linked Participants | Participants successfully linked to an NHS number | All participants who submitted questionnaire before 9 April 2025 | 2

How did we de-identify the linked health records data to minimise risks of identifying participants?

For categorical fields with a higher risk of re-identification, we suppressed categories which included fewer than 10 participants as well as codes which indicate admissions from or discharge to mental or penal facilities.

To avoid the suppressed category being deduced by elimination, the next smallest category was also suppressed. Categories were suppressed by replacing the coded entries for corresponding participants with the suppression code, -999.
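
The small-count rule described above can be sketched as follows. This is an illustration of the stated logic only, not the production de-identification code: the record layout (a list of dictionaries), the `pid` field name and the function name are assumptions, and the separate suppression of specific penal and mental health codes is not shown.

```python
SUPPRESSION_CODE = "-999"  # stored as a string here; the real field type is an assumption
THRESHOLD = 10             # categories with fewer participants than this are suppressed


def suppress_small_categories(rows, field, pid_field="pid"):
    """Return copies of rows with small categories of `field` replaced by -999."""
    # Count distinct participants per category.
    participants = {}
    for row in rows:
        participants.setdefault(row[field], set()).add(row[pid_field])
    counts = {category: len(pids) for category, pids in participants.items()}

    suppressed = {category for category, n in counts.items() if n < THRESHOLD}

    # Also suppress the next smallest category so that the suppressed one
    # cannot be deduced by elimination.
    if suppressed:
        remaining = {c: n for c, n in counts.items() if c not in suppressed}
        if remaining:
            suppressed.add(min(remaining, key=remaining.get))

    return [
        {**row, field: SUPPRESSION_CODE} if row[field] in suppressed else row
        for row in rows
    ]


# Toy example: category "C" has 3 participants, so "C" and the next
# smallest category "A" (12 participants) are both replaced.
example_rows = (
    [{"pid": i, "ADMISORC": "A"} for i in range(12)]
    + [{"pid": 100 + i, "ADMISORC": "B"} for i in range(15)]
    + [{"pid": 200 + i, "ADMISORC": "C"} for i in range(3)]
)
example_out = suppress_small_categories(example_rows, "ADMISORC")
```

For analysis, this means a value of -999 in these fields marks a suppressed category rather than missing data.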

The following fields had suppression applied: admission source (ADMISORC), admission method (ADMIMETH), and discharge destination (DISDEST) in HES Admitted Patient Care. The table below shows which codes are suppressed in each column.

We also suppress SNOMED codes in HES Emergency Care Dataset (ECDS) related to penal or detention centres, psychiatric admissions, homelessness, and rehabilitation. We also propose to suppress admissions requiring speciality resources, including mountain rescue, air ambulance and coastguard rescue service to further mitigate re-identification risk through spontaneous recognition and further mask the small number of participants with penal, mental health, and homelessness codes.

In the NDRS Cancer Registry Treatment data set, we are releasing a field which lists chemotherapy drugs received during treatment (CHEMO_ALL_DRUGS). This field also contains the name of any clinical trials a participant was enrolled in during treatment. To mitigate re-identification risk, we replaced the name of the clinical trial with ANONYMISED CLINICAL TRIAL.

Column | Suppressed Categories
--- | ---
ADMIMETH | 2C = Baby born at home as intended
ADMIMETH | 25 = Admission via Mental Health Crisis Resolution Team
ADMIMETH | 83 = Baby born outside the Health Care Provider except when born at home as intended
ADMIMETH | 84 = Admission by Admissions Panel of a High Security Psychiatric Hospital patient not entered on the HSPH Admissions Waiting List (available between 1999 and 2006)
ADMISORC | 37 = Court (1999-00 to 2006-07 and from 2022-23)
ADMISORC | 38 = Penal establishment: police station
ADMISORC | 39 = Penal establishment, court or police station / police custody suite
ADMISORC | 40 = Penal establishment
ADMISORC | 41 = Court
ADMISORC | 42 = Police Station / Police Custody Suite
ADMISORC | 48 = High security psychiatric hospital, Scotland
ADMISORC | 49 = High security psychiatric accommodation in an NHS hospital provider
ADMISORC | 50 = NHS other hospital provider: medium secure unit
DISDEST | 38 = Penal establishment: police station
DISDEST | 39 = Penal establishment, court or police station / police custody suite
DISDEST | 40 = Penal establishment
DISDEST | 42 = Police Station / Police Custody Suite
DISDEST | 48 = High security psychiatric hospital, Scotland
DISDEST | 49 = High security psychiatric accommodation in an NHS hospital provider
DISDEST | 50 = NHS other hospital provider: medium secure unit
SNOMED codes | 1047991000000102 = Arrival by prison transport
SNOMED codes | 1066011000000104 = Referred by Her Majesty's prison service
SNOMED codes | 1079611000000109 = Place of occurrence of injury is prison
SNOMED codes | 1066001000000101 = Custodial services: detention centre
SNOMED codes | 61801003 = Patient referral for psychiatric aftercare
SNOMED codes | 4266003 = Referral to drug addiction rehabilitation service
SNOMED codes | 38670004 = Alcoholism rehabilitation
SNOMED codes | 183584001 = Referral to community psychiatric nurse
SNOMED codes | 61801003 = Referral to community rehabilitation
SNOMED codes | 231467000 = Absinthe addiction
SNOMED codes | 1077211000000104 = Homeless persons drop in centre
SNOMED codes | 32911000 = Homeless
SNOMED codes | 105526001 = Homeless family
SNOMED codes | 1079661000000106 = Place of occurrence of injury is hostel for the homeless
SNOMED codes | 1077211000000104 = Referred by homeless drop-in centre
SNOMED codes | 1066051000000100 = Referred by mountain rescue service
SNOMED codes | 1048081000000101 = Fixed wing / medical repatriation by air

What should I be aware of when working with the linked health records data in this release?

Further information on known data quality issues in the NHSE data sets can be found in the NHSE HES Data Quality Reports (external link).

The cohort represented in the cancer data is different to the other linked health records data.

The NDRS Cancer Pathways, Cancer Registry Patient Tumour and Cancer Registry Cancer Treatment data sets are a re-release of the cancer data available in Release 10. They include cancer records for participants who submitted a questionnaire before 15 October 2024. All other linked health records are for participants who submitted a questionnaire before 9 April 2025. This means it is likely that there are some participants in the Linked Participants data with a cancer diagnosis that we do not have cancer records for.

To mitigate this issue, we recommend using the SUBMISSION_DATE field in the questionnaire to filter the participants to those who submitted a questionnaire prior to 15 October 2024 and comparing those participants with the successfully linked participants in the Linked Participants entity. This will provide the list of participants who were eligible for linkage and could appear in the NDRS data sets.
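
The recommended filter can be sketched as below. SUBMISSION_DATE is the field named above; the `pid` key, the list-of-dictionaries layout and the function name are assumptions made for illustration, and dates are assumed to have been parsed into `datetime.date` values already.

```python
from datetime import date

# Cut-off for the cancer data re-released from Release 10, as stated above.
CANCER_CUTOFF = date(2024, 10, 15)


def cancer_eligible_pids(questionnaire_rows, linked_pids):
    """Return participants who could appear in the NDRS data sets.

    A participant qualifies if they submitted their questionnaire before the
    cancer cut-off and appear in the Linked Participants entity.
    """
    submitted_early = {
        row["pid"]
        for row in questionnaire_rows
        if row["SUBMISSION_DATE"] < CANCER_CUTOFF
    }
    return submitted_early & set(linked_pids)
```

Participants outside this set who have no cancer records should be treated as having unknown cancer status, not as cancer-free.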

Discrepancies in the number of participants in the Linked Participants entity

Please note that there are 13 participants with a linked health record who do not have their PIDs listed in the Linked Participants entity. We are working with NHSE to solve this discrepancy.

The Linked Participants entity lists all the successfully linked participants who submitted a questionnaire prior to 9 April 2025.
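
To see which participants in a given study extract are affected, one option is to compare the participant identifiers found in the health record tables with those in the Linked Participants entity. The sketch below assumes each table has been loaded as a list of dictionaries with a `pid` key; both the key and the function name are hypothetical.

```python
def unlisted_pids(record_tables, linked_pids):
    """Return PIDs that appear in any health record table but not in Linked Participants.

    record_tables maps an entity name (for example "nhse_eng_inpat") to its rows.
    """
    seen = set()
    for rows in record_tables.values():
        seen.update(row["pid"] for row in rows)
    return seen - set(linked_pids)
```

The number returned for a study extract may be smaller than 13, depending on which participants and tables the extract includes.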

Using medical ontologies (ICD-10, OPCS-4, and SNOMED) in the cohort browser

The cohort browser in the Trusted Research Environment can only filter numeric or categorical data. In the NDRS Cancer data sets, some fields with diagnosis and procedures information like ICD-10 and OPCS-4 codes are entered as strings. Therefore, it is not possible to use the cohort browser to filter by specific diagnosis and procedure codes in those data sets. To filter for specific ICD-10 or OPCS-4 codes, we recommend loading and filtering the data using a Jupyter Notebook. It is possible to access ICD-10 and OPCS-4 information in the cohort browser for the HES and ONS Death Registration data sets. It is also possible to access SNOMED information in the cohort browser for HES Emergency Care Dataset.
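
As a minimal sketch of filtering string-coded diagnoses in a notebook, the example below matches ICD-10 codes by prefix. It assumes the rows have already been loaded as dictionaries and that each field holds a single code, with or without a dot; the field name `icd10_code` and both function names are hypothetical, so check the data dictionary for the actual field names and formats.

```python
def normalise_icd10(code):
    """Upper-case a code and drop the dot so that 'c50.9' and 'C509' compare equal."""
    return code.replace(".", "").strip().upper()


def filter_by_icd10_prefix(rows, field, prefixes):
    """Keep rows whose code in `field` starts with any of the given prefixes."""
    wanted = tuple(normalise_icd10(prefix) for prefix in prefixes)
    return [
        row
        for row in rows
        if row.get(field) and normalise_icd10(row[field]).startswith(wanted)
    ]


# Toy example: select breast cancer codes (C50) from four records.
example_rows = [
    {"icd10_code": "C50.9"},
    {"icd10_code": "c501"},
    {"icd10_code": "C18"},
    {"icd10_code": None},
]
breast_cancer_rows = filter_by_icd10_prefix(example_rows, "icd10_code", ["C50"])
```

The same prefix approach would apply to OPCS-4 codes, again subject to the format actually used in the field.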

Participants who have registered more than once (linked health records data)

As described on the Participant data page, we are aware that some individuals may have registered multiple times. Participants with multiple registrations in which they have provided identical or nearly identical personal information (name, address and date of birth) may be linked to the same NHS number, and thus may have duplicate health records.

Unlike other data types, health records in the HES and NDRS data sets have additional administrative record identifiers. A large proportion of participants with multiple registrations can therefore be detected by locating duplicate administrative identifiers in these datasets. For example, duplicate EPIKEY entries in Admitted Patient Care (APC) would indicate that records linked to multiple participants have been linked to the same NHS number. We acknowledge that this is not a complete solution, for example, because it will not detect participants who have not been successfully linked or who have no health records despite linkage to an NHS number. In addition, not every approved study application will include linked health records. We are investigating ways to make use of linked health records and other data in order to ensure that detection and removal of duplicate participants is as complete as possible.
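
The EPIKEY check described above could be sketched as follows. EPIKEY is the identifier named in the text; the `pid` key, the list-of-dictionaries layout and the function name are assumptions for illustration.

```python
from collections import defaultdict


def pids_sharing_record_ids(rows, id_field="EPIKEY", pid_field="pid"):
    """Return groups of PIDs that share at least one administrative record identifier.

    Each group is a frozenset of PIDs; a group suggests that several
    registrations were linked to the same NHS number.
    """
    pids_by_record = defaultdict(set)
    for row in rows:
        pids_by_record[row[id_field]].add(row[pid_field])
    return {frozenset(pids) for pids in pids_by_record.values() if len(pids) > 1}


# Toy example: participants "A" and "B" share EPIKEY 1.
example_rows = [
    {"pid": "A", "EPIKEY": 1},
    {"pid": "B", "EPIKEY": 1},
    {"pid": "C", "EPIKEY": 2},
    {"pid": "A", "EPIKEY": 3},
]
duplicate_groups = pids_sharing_record_ids(example_rows)
```

How to handle a detected group (for example, keeping one registration per group) is a study-level decision that this sketch does not make.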

HES provisional data may change between releases

The HES Admitted Patient Care, Outpatient, and Emergency Care data include some provisional records. These are the most recent admissions and appointments that were available for the cohort at the time the data was supplied by NHS England, but the records have not been finalised. Therefore, the data entered in these records could change slightly in future releases. Once a year, the latest full financial year of provisional data is finalised and made available to Our Future Health by NHS England.

In the current release:

  • any appointments in the Outpatient data that occurred from 1 April 2024 onwards are likely provisional data and subject to change in future releases

  • any hospital episodes in the Admitted Patient Care data that finished from 1 April 2024 onwards are likely provisional data and subject to change in future releases

  • any appointments that occurred before 1 April 2024 or hospital episodes which finished prior to 1 April 2024 are likely finalised data and are not subject to change
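
The date rule in the list above can be applied as a simple flag. In this sketch the function name is hypothetical, and the caller is assumed to pass the appointment date for Outpatient records or the episode end date for Admitted Patient Care records. No cut-off is stated above for the Emergency Care data, so it is not covered here.

```python
from datetime import date

# Records dated on or after this day are likely provisional in Release 12.
PROVISIONAL_FROM = date(2024, 4, 1)


def is_likely_provisional(event_date):
    """Return True if a record's date falls in the provisional period."""
    return event_date >= PROVISIONAL_FROM
```

Analyses that need to be reproducible across releases could exclude flagged records, or repeat the analysis with and without them.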
