Participant geographies data
Information on the national and regional geographic data associated with participants who have joined the Our Future Health programme, including the scope, structure and processing of these data.
The Participant geographies data provides geographic information derived from participants’ self-reported address at the time of registration to the Our Future Health programme. Currently, Participant geographies data consists of a single dataset containing both country and region data. This page outlines the structure, scope, and methodology used to generate and process the data for release.
What information is included in the participant geographies data?
The participant geographies data currently consists of a single dataset called the Country and Region table. This table includes the country and region linked to each participant’s self-reported address collected during their registration for the Our Future Health programme.
Country and region
The United Kingdom (UK) is made up of four countries:
England
Scotland
Wales
Northern Ireland
England itself is divided into nine official regions:
North East
North West
Yorkshire and the Humber
East Midlands
West Midlands
East of England
London
South East
South West
These nine English regions, along with the three devolved nations (Scotland, Wales and Northern Ireland), form the political and geographical makeup of the UK.
Why country and region matters for health research
Understanding the UK’s countries and regions is important in health research because it helps identify demographic differences that affect health needs, outcomes, and service provision.
Future releases
Additional geographic levels will be introduced in future releases. Each level will be released as a distinct dataset.
Country and region data processing and release
We are releasing the geographic data in stages. The initial releases include a subset of participants who joined the programme earliest, specifically those who registered between 2021 and 2022. Future releases will expand to include all participants who:
Registered before the cut-off date for that release, and
Have fully completed and submitted their questionnaire.
Data for participants who have fully withdrawn from Our Future Health are not included, because their data are routinely deleted after they request to withdraw. Participants who have fully withdrawn from the programme since the last data release will not be included in the current data release.
Only registration address is used. No subsequent address changes or participant relocations are reflected.
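The inclusion rules for future releases can be sketched as a simple filter. This is a minimal illustration, not the programme's actual pipeline: the `Participant` class, its field names, the sample records, and the cut-off date are all hypothetical, and treating "before the cut-off date" as a strict comparison is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Participant:
    # All field names here are hypothetical, chosen for illustration only.
    pid: str
    registered_on: date
    questionnaire_submitted: bool
    fully_withdrawn: bool


def eligible_for_release(p: Participant, cutoff: date) -> bool:
    """Apply the inclusion rules described above to one participant."""
    return (
        p.registered_on < cutoff        # registered before the cut-off date
        and p.questionnaire_submitted   # questionnaire fully completed and submitted
        and not p.fully_withdrawn       # fully withdrawn participants are excluded
    )


# Made-up records, one per rule.
participants = [
    Participant("A1B2C3D", date(2021, 11, 3), True, False),
    Participant("M4N5K6L", date(2022, 6, 18), False, False),  # questionnaire incomplete
    Participant("R7T9LQ2", date(2022, 2, 1), True, True),     # fully withdrawn
    Participant("Z9Y8X7W", date(2023, 5, 9), True, False),    # registered after cut-off
]

cutoff = date(2023, 1, 1)  # hypothetical cut-off date
included = [p.pid for p in participants if eligible_for_release(p, cutoff)]
print(included)
```

Only the first record satisfies all three conditions in this example.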
How do we map participant addresses to geographic areas?
During registration, participants provide their postcode and address. Our registration system integrates the Ideal Postcodes lookup service, which returns additional location data associated with the address, including geocoded latitude and longitude coordinates. These coordinates are used to generate point geometries that represent each participant's location.
We use these coordinates to assign each participant to standard UK geographic boundaries, such as country and region. This is accomplished using a point-in-polygon spatial mapping technique, whereby each participant’s point geometry is overlaid onto official boundary shapefiles that divide the UK into discrete polygonal areas representing defined geographic units. For details of the exact shapefiles, see the section Geographic boundary file sources and versioning below.
Each point is evaluated for spatial containment within a given polygon. Once a match is identified, the spatial information is transformed into a structured tabular format. In this output, each participant is linked to the relevant geographic unit, including both the unit’s official code (e.g. country code as E92000001) and its corresponding label (e.g. country label as “England”).
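The point-in-polygon step described above can be illustrated with a small, dependency-free sketch. It uses the standard ray-casting test and two toy rectangles in place of the real boundary shapefiles. The source does not state which software performs this step, so the function names and the rectangle coordinates below are assumptions for illustration only; a production workflow would more likely use a GIS library and the official polygons.

```python
from typing import Optional

Point = tuple[float, float]   # (longitude, latitude)
Polygon = list[Point]         # ring of vertices, implicitly closed

# Toy rectangles standing in for boundary polygons. These are NOT real boundaries.
TOY_UNITS: dict[str, tuple[str, Polygon]] = {
    "E92000001": ("England", [(-3.0, 50.0), (2.0, 50.0), (2.0, 55.0), (-3.0, 55.0)]),
    "W92000004": ("Wales", [(-5.5, 51.3), (-3.0, 51.3), (-3.0, 53.5), (-5.5, 53.5)]),
}


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: toggle 'inside' each time a ray cast east from the point crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_unit(point: Point) -> Optional[tuple[str, str]]:
    """Return (code, label) of the first unit containing the point, or None if unassigned."""
    for code, (label, polygon) in TOY_UNITS.items():
        if point_in_polygon(point, polygon):
            return code, label
    return None


# Hypothetical participant coordinates, turned into tabular rows of (PID, code, label).
points = {"A1B2C3D": (-0.1, 51.5), "R7T9LQ2": (-4.0, 52.0), "Z9Y8X7W": (10.0, 60.0)}
for pid, pt in points.items():
    print(pid, assign_unit(pt))
```

A point that falls in no polygon returns `None`, which corresponds to a participant who could not be assigned to a geographic area.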
Geographic boundary file sources and versioning
Country-level boundaries
Source: Office for National Statistics (ONS)
Dataset: Countries (December 2024) Boundaries UK BFC (shapefile)
Published: 11 February 2025
Region-level boundaries
Source: Office for National Statistics (ONS)
Dataset: International Territorial Level 1 (January 2025) Boundaries UK BFC (shapefile)
Published: 4 February 2025
We chose International Territorial Levels (ITLs) because they provide a harmonised structure for regional geography across the UK, explicitly recognising the devolved nations as distinct regions. This framework is increasingly used in health, policy, and economic research and supports greater comparability across studies by aligning with national and international geographic standards. The Office for National Statistics (ONS), the UK’s official statistical agency, adopts ITLs as the standard for regional data collection and reporting, ensuring consistency across official statistics.
The shapefiles used are fully clipped, meaning that they include only the precise geographic extents of official administrative areas, excluding any extraneous or overlapping spatial features. This ensures accurate and unambiguous assignment of participants to geographic units, particularly for those located near administrative boundaries.
All geographic assignments are based on fixed versions of official boundary shapefiles. These files are used consistently across all data releases to ensure temporal stability and prevent inconsistencies that might arise from administrative boundary changes over time. Even when newer versions of boundary datasets become available, we do not adopt dynamic updates.
This decision is made to preserve longitudinal comparability and to reduce the risk of participant re-identification through geographic triangulation or shifts in classification over successive data releases.
Licensing
Geographic boundary data are © Crown copyright and database rights 2024. They contain public sector information licensed under the Open Government Licence v3.0.
Why use latitude and longitude coordinates?
Our Future Health uses latitude and longitude coordinates derived from individual participant residential addresses to map to administrative geographies.
The main alternative approach would be to use the locations of postcode centroids. Postcode centroids represent postcode areas using a single, central point and are more commonly used for aggregated spatial analysis. While they offer simplicity, postcode centroids are inherently less precise and are subject to change due to updates in postal geography, such as the creation of new postcodes, boundary shifts, or service reorganisation by postal authorities. These changes can introduce inconsistencies across time and reduce the reliability of geographic classification for individual records.
In contrast, latitude and longitude coordinates of individual addresses offer superior accuracy for participant-level spatial assignment, particularly in cases where participants are located near the edges of administrative boundaries. This precision allows us to assign participants to defined geographic units through point-in-polygon mapping techniques. This approach enhances spatial accuracy and consistency, supporting the scientific objectives of the Our Future Health programme.
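The difference between the two approaches near a boundary can be shown with a toy example. The sketch below assumes a straight boundary line, three made-up addresses sharing one postcode, and a centroid computed as the simple mean of those addresses; real postcode centroids and boundaries are more complex, so every name and number here is hypothetical.

```python
from statistics import mean

BOUNDARY_X = 0.0  # toy boundary: "Region A" lies west of it, "Region B" east


def region_for(x: float) -> str:
    """Classify a longitude against the toy boundary."""
    return "Region A" if x < BOUNDARY_X else "Region B"


# Hypothetical addresses (longitude, latitude) in one postcode that straddles the boundary.
addresses = {
    "addr_1": (-0.004, 51.0),
    "addr_2": (-0.002, 51.0),
    "addr_3": (0.003, 51.0),
}

# Centroid approach: every address inherits the region of a single central point.
centroid_x = mean(x for x, _ in addresses.values())
by_centroid = {name: region_for(centroid_x) for name in addresses}

# Address-level approach: each address is classified from its own coordinates.
by_address = {name: region_for(x) for name, (x, _) in addresses.items()}

# Addresses whose region differs between the two approaches.
mismatched = [name for name in addresses if by_centroid[name] != by_address[name]]
print(mismatched)
```

In this example the centroid falls west of the boundary, so the one address on the eastern side is classified differently by the two approaches.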
Limitations and caveats
A small proportion of participants could not be assigned to a geographic area due to incomplete, invalid, or unresolvable address data.
Additionally, all assignments are based solely on the residential address provided at the time of registration. Subsequent changes of address are not reflected in these data.
The quality and precision of geographic assignment are contingent upon the accuracy of the self-reported registration address and the reliability of the Ideal Postcodes service.
Data access and de-identification
Data access
Datasets are stored and maintained independently from other participant datasets within the Trusted Research Environment (TRE).
Access to any participant geographies dataset is restricted and requires a dedicated request. Requests are reviewed by an expert panel to ensure that the geographic information is necessary and appropriate for the intended research purpose.
How do we de-identify the data to minimise risks of identifying participants?
Participant geography datasets exclude all participant-level address and postcode data and contain only non-identifiable geographic classifications derived from coordinate-based mapping.
How is the data organised in the Trusted Research Environment (TRE)?
In the TRE, all participant geographies datasets will be maintained as separate entities.
For the current release, the Country and Region table is organised as a single entity, containing one row per participant with three variables: Participant ID (PID), country at registration, and region at registration.
Each entity can be linked to other entities (for example, the questionnaire dataset) using the PID, which is a unique participant identifier. Aside from the PID, variable names are unique both within and across all entities.
For the geographies data, the release datasets always store participant information using the relevant codes. The codings file includes these codes along with their full textual labels (referred to as "meanings" in the codings file), which correspond to the codes and labels used in the shapefiles from which the data are derived, as described in the section Geographic boundary file sources and versioning.
Below is an example of the release candidate:
PID        country_at_reg   region_at_reg
A1B2C3D    E92000001        TLF
M4N5K6L    E92000001        TLI
R7T9LQ2    W92000004        TLL
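Decoding rows like these and linking them to another entity by PID could look like the sketch below. It assumes the data have been loaded into plain Python structures. The `pid` key, the questionnaire rows, and the `example_answer` field are hypothetical, and the labels are an illustrative excerpt; the released codings file remains the authoritative source of meanings.

```python
# Rows as they might appear in the Country and Region table (codes only).
country_region = [
    {"pid": "A1B2C3D", "country_at_reg": "E92000001", "region_at_reg": "TLF"},
    {"pid": "M4N5K6L", "country_at_reg": "E92000001", "region_at_reg": "TLI"},
    {"pid": "R7T9LQ2", "country_at_reg": "W92000004", "region_at_reg": "TLL"},
]

# Illustrative excerpt of code-to-meaning lookups; consult the codings file for the full set.
codings = {
    "country_at_reg": {"E92000001": "England", "W92000004": "Wales"},
    "region_at_reg": {"TLF": "East Midlands (England)", "TLI": "London", "TLL": "Wales"},
}

# Hypothetical rows from another entity, keyed by PID.
questionnaire = {
    "A1B2C3D": {"example_answer": "yes"},
    "R7T9LQ2": {"example_answer": "no"},
}


def decode(row: dict[str, str]) -> dict[str, str]:
    """Replace each code with its meaning, leaving unknown codes unchanged."""
    decoded = {"pid": row["pid"]}
    for field, lookup in codings.items():
        decoded[field] = lookup.get(row[field], row[field])
    return decoded


# Inner join on PID: keep only participants present in both entities.
linked = [
    {**decode(row), **questionnaire[row["pid"]]}
    for row in country_region
    if row["pid"] in questionnaire
]

for row in linked:
    print(row)
```

Keeping codes in the release data and applying meanings only at analysis time means the lookup can be swapped or extended without touching the participant rows.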
How do I interpret the structured field names?
Field names are short, descriptive, and often abbreviated labels used to indicate the contents of each column in the dataset.
These data are not versioned and contain only one value per column. The field names reflect the geographic level followed by the context or time point from which they were obtained, for example:
country_at_reg - the derived country at the time of registration.
region_at_reg - the derived region at the time of registration.
What metadata is available to help document the participant geographies releases?
We provide the following data files on our Data and cohort page (external link):
data dictionary - which defines the raw data fields and metadata information, such as labels, descriptions and units of measurement
codings file - which contains the granular details of categorical or raw coded values
If using Microsoft Excel to browse these files, for an optimal viewing experience, ensure the encoding settings are set to UTF-8.