Participant geographies data
Information on the national and regional geographic data associated with participants who have joined the Our Future Health programme, including the scope, structure and processing of these data.
The Participant geographies data release provides geographic information derived from participants’ self-reported address at the time of registration to the Our Future Health programme. Currently, Participant geographies data consists of four datasets: country and region, plus middle- and lower-layer statistical geographies (MSOA, LSOA, and Intermediate Zones). This page outlines the structure, scope, and methodology used to generate and process the data for this release.
What information is included in the participant geographies data?
This documentation describes the datasets available for representing participant geographies within the Our Future Health programme. These datasets are derived from participants’ self-reported addresses collected during registration and are linked to official UK geographic boundaries, enabling analyses at multiple spatial scales.
The participant geographies data currently consists of four separate datasets:
Country and region for England, Wales and Scotland
Middle Layer Super Output Areas (MSOA) for England and Wales
Lower Layer Super Output Areas (LSOA) for England and Wales
Intermediate Zones (IZ) for Scotland
The data are organised by geographic level and by boundary source.
geographic level refers to the hierarchy of areas (e.g. MSOAs vs LSOAs)
boundary source refers to the authority that defines these areas: the ONS for England and Wales, and the Scottish Government for Scotland
Researchers can select the level and source most relevant to their analyses, for example, using broader geographies for national comparisons or smaller areas for local or community-level investigations.
More details on file types, versions and sources are available below in Boundary files and versions.
Why geographies matter for health research
Geographic data provide essential context for understanding patterns in health, wellbeing, and access to care. Linking participant data to official geographic boundaries allows researchers to explore how health outcomes vary across different parts of the UK, taking into account social, environmental, and policy differences between regions and local areas.
Different geographic scales are suited to different research questions. Using an area that is too large can mask local inequalities, while using an area that is too small can breach confidentiality, produce unstable estimates due to small population sizes, or reduce the statistical power needed to detect meaningful differences when combined with other data.
Population and household size estimates
Understanding expected population and household sizes is critical for choosing the appropriate geographic level for analysis and ensuring statistical reliability. These values are estimates; for more details on population distributions in the Our Future Health cohort, see Characteristics of Our Future Health participants.
Country and region
The UK comprises four nations with widely varying populations:
| Country | Population estimate |
| --- | --- |
| England | 58,620,100 |
| Wales | 3,186,600 |
| Scotland | 5,546,900 |
| Northern Ireland | 1,927,900 |
These statistics are sourced from the November 2025 ONS population estimates.
England is further divided into nine separate regions with a wide range in population estimates (e.g. North East has a population of ~2.68 million and the South East of ~9.38 million). National and regional geographies are useful for comparing large-scale health trends or assessing impacts of national or devolved policies.
Mid-level geographies
Mid-level geographies include units such as MSOAs in England and Wales and Intermediate Zones in Scotland. These are stable, local areas that cover multiple neighbourhoods.
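| Geography | Number of areas | Typical population | Typical households |
| --- | --- | --- | --- |
| MSOA (England) | 6,856 | 5,000 - 15,000 | 2,000 - 6,000 |
| MSOA (Wales) | 408 | 5,000 - 15,000 | 2,000 - 6,000 |
| Intermediate Zones | 1,334 | 2,500 - 6,000 | 1,000 - 2,500 |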
Lower-level geographies
Lower-level geographies represent small-area units such as LSOAs in England and Wales, providing fine-grained, neighbourhood-level detail; note that Scottish Data Zones are not included in the current release but are expected in future updates.
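| Geography | Number of areas | Typical population | Typical households |
| --- | --- | --- | --- |
| LSOA (England) | 33,557 | 1,000 - 3,000 | 400 - 1,200 |
| LSOA (Wales) | 1,917 | 1,000 - 3,000 | 400 - 1,200 |
| Data Zones | 7,392 | 500 - 1,000 | 200 - 500 |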
Population and household sizes can vary substantially, particularly in dense urban areas or very rural regions. For example, although typical ranges for LSOAs are 1,000 to 3,000 residents and 400 to 1,200 households, the absolute minimum and maximum values can be much wider: observed LSOA populations range from approximately 800 to 9,000 residents, with household counts between 350 and 3,500. These extreme values represent outliers rather than the norm.
When to use mid-level geographies
Mid-level geographies are most appropriate when stable estimates are needed, for example, when outcomes are rare, sample sizes are small, or to reduce statistical noise while examining urban areas or neighbourhood patterns.
Typical applications include:
mapping cancer incidence
analysing hospital admission rates
analysing access to green space or exposure to air pollution between local areas
estimating smoking or alcohol use between local areas to identify spatial patterns related to targeted public health interventions
When to use lower-level geographies
LSOAs are ideal for detailed analyses of local social or environmental exposures, particularly at street or block level. They are the standard unit for the Index of Multiple Deprivation (IMD), making them well suited for studies of deprivation or localised health inequalities. Lower-level geographies are most effective when there is sufficient data to avoid small-number instability, ensuring reliable statistics.
Typical applications include:
examining variation in obesity or mental health within a city
analysing access to green space or exposure to air pollution at neighbourhood scale, within a local area
identifying local inequalities in emergency admissions
mapping infectious disease risk within urban areas
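The trade-off between mid-level and lower-level geographies can be illustrated with a small calculation. The sketch below is not part of the Our Future Health methodology: it uses made-up participant counts and a Wilson score interval (one common choice among several) to show how the uncertainty around the same estimated prevalence widens as the number of participants in an area shrinks.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: the same 10% prevalence observed in a small and a larger area.
scenarios = [
    ("lower-level area", 4, 40),
    ("mid-level area", 40, 400),
]

for label, events, n in scenarios:
    low, high = wilson_interval(events, n)
    print(f"{label}: n={n}, estimate={events / n:.0%}, "
          f"interval=({low:.3f}, {high:.3f}), width={high - low:.3f}")
```

With these illustrative numbers the interval for the smaller area is several times wider, which is the small-number instability that motivates moving up to a mid-level geography when outcomes are rare.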
Future releases
Country data for Northern Ireland were included in the initial release but have since been removed due to small numbers. These data will be reinstated in a future release.
Similarly, Data Zones for Scotland are not included in the current release due to a high proportion of small participant counts. They will be added in a future release as our resource grows.
Additional geographic levels may be introduced in future releases. Each level will be released as a distinct dataset.
Geographies data processing and release
Each release includes all eligible participants who registered before the cut-off date and have fully completed and submitted their baseline questionnaire.
Data for participants who have fully withdrawn from Our Future Health are not included, as their data are routinely deleted following a withdrawal request. Accordingly, participants who have withdrawn since the previous data release are also excluded from the current release.
Only the participant's address at registration is used for geographic assignment. Subsequent address changes or participant relocations after registration are not reflected in this dataset.
How do we map participant addresses to geographic areas?
Address geocoding
During registration, participants provide their postcode and address. Our registration system integrates the Ideal Postcodes (external link) lookup service, which returns additional location data associated with the address, including geocoded latitude and longitude coordinates. These coordinates are used to generate point geometries that represent each participant's location.
Spatial mapping
Each participant’s coordinate is assigned to official UK statistical geographies using a point-in-polygon (PIP) process. The coordinate is checked against Output Area (OA) boundary polygons - the smallest statistical units in the UK. Once the containing OA is identified, the result is recorded in a table linking the participant to the corresponding OA code and label.
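As a rough illustration of what a point-in-polygon check involves, the sketch below implements the classic ray-casting test in plain Python. It is a minimal example under stated assumptions, not the production pipeline: the OA codes and square boundaries are toy values, real boundaries are far more complex (multi-part polygons, holes), and an actual implementation would typically rely on a geospatial library and a spatial index.

```python
from typing import Optional

Point = tuple[float, float]   # (x, y), e.g. (longitude, latitude)
Polygon = list[Point]         # ring of vertices; the first vertex is not repeated


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count the polygon edges crossed by a ray running right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means "inside"
    return inside


def assign_output_area(point: Point, boundaries: dict[str, Polygon]) -> Optional[str]:
    """Return the code of the first polygon containing the point, or None if unassigned."""
    for oa_code, polygon in boundaries.items():
        if point_in_polygon(point, polygon):
            return oa_code
    return None


# Hypothetical OA codes with toy square boundaries (not real geometries).
toy_boundaries = {
    "OA_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "OA_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(assign_output_area((0.25, 0.5), toy_boundaries))  # falls in the first square
print(assign_output_area((1.5, 0.5), toy_boundaries))   # falls in the second square
print(assign_output_area((5.0, 5.0), toy_boundaries))   # outside both squares
```

A point that falls in no polygon returns None, which corresponds to the case where a participant cannot be assigned to a geographic area.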
Linking to higher-level geographies
A lookup table defines the permitted hierarchical paths between UK geographic units, such as:
OA → LSOA → MSOA → Local Authority District (LAD) → Region → Country
OA → Data Zone (DZ) → Intermediate Zone (IZ) → Local Authority District (LAD) → Region → Country
Mappings at the OA and LAD levels are used internally to move from coordinates to broader geographies, but these intermediate mappings are not part of the released datasets.
The lookup table is built from official geographic boundary shapefiles published by national statistical agencies, including the Office for National Statistics (ONS) and the Scottish Government. The process involves:
cleaning shapefiles to remove gaps, overlaps, or boundary inconsistencies within and between geographic levels
sequentially joining shapefiles through spatial joins to extract the relevant codes and labels for each level
validating that each lower-level geography (e.g. an OA) maps to only one higher-level geography (e.g. an LSOA), ensuring a consistent many-to-one hierarchy
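The validation step in the list above can be sketched as follows. This is an illustrative example only: the codes are placeholders, the pairs stand in for the output of a spatial join, and the function names `build_lookup` and `resolve_path` are hypothetical rather than part of any Our Future Health tooling.

```python
from collections import defaultdict

# Hypothetical (child_code, parent_code) pairs, as might come out of a spatial join.
oa_to_lsoa_pairs = [
    ("OA_1", "LSOA_A"),
    ("OA_2", "LSOA_A"),
    ("OA_3", "LSOA_B"),
]
lsoa_to_msoa_pairs = [
    ("LSOA_A", "MSOA_X"),
    ("LSOA_B", "MSOA_X"),
]


def build_lookup(pairs):
    """Build a child -> parent dict, raising if any child maps to more than one parent."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    ambiguous = {c: sorted(p) for c, p in parents.items() if len(p) > 1}
    if ambiguous:
        raise ValueError(f"children with multiple parents: {ambiguous}")
    return {child: next(iter(p)) for child, p in parents.items()}


def resolve_path(code, *lookups):
    """Walk a code up through successive child -> parent lookups."""
    path = [code]
    for lookup in lookups:
        code = lookup[code]
        path.append(code)
    return path


oa_to_lsoa = build_lookup(oa_to_lsoa_pairs)
lsoa_to_msoa = build_lookup(lsoa_to_msoa_pairs)
print(resolve_path("OA_3", oa_to_lsoa, lsoa_to_msoa))
```

Because every child has exactly one parent, each OA resolves to a single, unambiguous path up the hierarchy; an OA that straddled two LSOAs would be rejected when the lookup is built.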
The lookup table will be used until at least the next Census cycle and will remain consistent across releases. It has been checked against authoritative sources to ensure compatibility with official UK geography standards.
Boundary files and versions
We used the following boundary files and versions:
Country-level boundaries
Source: Office for National Statistics (ONS)
Published: 11 February 2025
Region-level boundaries
Source: Office for National Statistics (ONS)
Published: 4 February 2025
Local Authority District boundaries
Source: Office for National Statistics (ONS)
Published: 31 July 2025
Middle Layer Super Output Area boundaries
Source: Office for National Statistics (ONS)
Published: 10 July 2024
Lower Layer Super Output Area boundaries
Source: Office for National Statistics (ONS)
Published: 9 July 2024
Intermediate Zone boundaries
Source: Spatial Data Scotland
Published: 16 December 2024
Data Zone boundaries
Source: Spatial Data Scotland
Published: 16 December 2024
Output Area boundaries (England and Wales)
Source: Office for National Statistics (ONS)
Published: 12 July 2023
Output Area boundaries (Scotland)
Source: National Records of Scotland (NRS)
Published: 4 November 2024
For the region level, we chose International Territorial Level 1 (ITL1) boundaries because they provide a consistent regional framework across the UK and recognise the devolved nations as distinct regions. ITLs are widely used in health, policy, and economic research, supporting comparability across studies, and are the ONS standard for regional data collection.
The files are fully clipped, including only the exact boundaries of official areas. This ensures accurate assignment of participants, especially near administrative edges. For more details on these files, see the ONS digital boundaries documentation (external link).
All geographic assignments use fixed versions of official boundary files applied consistently across releases. Newer versions of the boundary files are not implemented. This preserves longitudinal comparability, ensures stability over time, and reduces the risk of participant re-identification due to changes in boundaries or classification.
Licensing
Geographic boundary data are © Crown copyright and database rights 2024. They contain public sector information licensed under the Open Government Licence v3.0 (external link).
Why use latitude and longitude coordinates?
Our Future Health maps participants to administrative geographies using latitude and longitude coordinates derived from residential addresses. While postcode centroids are a simpler alternative, they are less precise and can change over time due to updates in postal geography, reducing reliability for individual-level analysis. Using exact coordinates allows accurate assignment to geographic units via point-in-polygon mapping, ensuring spatial precision and consistency for research purposes.
Limitations and caveats
A small proportion of participants could not be assigned to a geographic area due to incomplete, invalid, or unresolvable address data.
Additionally, all assignments are based solely on the residential address provided at the time of registration. Subsequent changes of address are not reflected in these data.
The quality and precision of geographic assignment are contingent upon the accuracy of the self-reported registration address and the reliability of the Ideal Postcodes service.
Data access and de-identification
Data access
Access to participant geography datasets is restricted and must be specifically requested in a study application or amendment. Each request is reviewed by an expert panel to ensure the geographic detail is necessary and appropriate for the proposed research.
To support data minimisation, each geographic hierarchy - from small to mid-level areas - is provided as a separate dataset. Researchers must request access to each level individually and will be granted only the level(s) required for their approved analyses. This approach maintains an appropriate balance between research value and data sensitivity.
How do we de-identify the data to minimise risks of identifying participants?
Participant geography datasets include only non-identifiable geographic classifications derived from coordinate-based mapping. As described previously, data for Northern Ireland and Scottish Data Zones are excluded from the current release due to potential re-identification risks; these will be provided in a future release once appropriate disclosure controls are in place.
Participants living in LSOAs or Data Zones with 10 or fewer individuals are also excluded. These exclusions are carried through all higher-level geographies to prevent the inadvertent disclosure of information about small groups.
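The exclusion rule described above can be sketched in a few lines. This is a simplified illustration with invented participant IDs and area codes, not the actual disclosure-control implementation; the threshold of 10 comes from the text, while the helper name `apply_small_number_exclusion` is hypothetical.

```python
from collections import Counter

THRESHOLD = 10  # areas with this many participants or fewer are excluded

# Hypothetical participant -> LSOA assignments: 12 people in one area, 4 in another.
participant_lsoa = {f"P{i:03d}": "LSOA_A" for i in range(12)}
participant_lsoa.update({f"P{i:03d}": "LSOA_B" for i in range(12, 16)})

# Hypothetical higher-level assignments for the same participants.
participant_msoa = {pid: "MSOA_X" for pid in participant_lsoa}


def apply_small_number_exclusion(assignments, threshold=THRESHOLD):
    """Keep only participants whose area contains more than `threshold` participants."""
    counts = Counter(assignments.values())
    return {pid: area for pid, area in assignments.items() if counts[area] > threshold}


# Step 1: apply the rule at the smallest released level.
lsoa_released = apply_small_number_exclusion(participant_lsoa)

# Step 2: carry the same exclusions through to the higher-level dataset.
msoa_released = {
    pid: area for pid, area in participant_msoa.items() if pid in lsoa_released
}

print(len(participant_lsoa), len(lsoa_released), len(msoa_released))
```

Filtering the higher-level dataset by the participants retained at the lower level keeps the two releases consistent, so a small group cannot be recovered by comparing counts across levels.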
How is the data organised in the Trusted Research Environment (TRE)?
In the TRE, participant geography datasets are stored as separate entities. The country and region dataset includes three variables per participant: Participant ID (PID), country at registration, and region at registration. The other geography datasets contain the PID and the relevant geographic area.
All entities can be linked using the PID, which uniquely identifies each participant. Other variable names are unique across all datasets.
Geographic areas are recorded using standard codes. A separate codings file provides these codes and their corresponding labels, consistent with those used in the source shapefiles.
Below is an example of the country and region release:
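| Participant ID (PID) | COUNTRY_AT_REG | REGION_AT_REG |
| --- | --- | --- |
| A1B2C3D | E92000001 | TLF |
| M4N5K6L | E92000001 | TLI |
| R7T9LQ2 | W92000004 | TLL |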
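To show how entities might be linked on the PID and how coded values can be decoded, here is a minimal sketch in plain Python. The country and region rows mirror the example above; everything else is an assumption made for illustration: the second entity, its field name `MSOA_AT_REG`, the placeholder MSOA codes, and the label text, which may differ from the wording in the actual codings file.

```python
# Rows mirroring the example country and region release above.
country_region = [
    {"PID": "A1B2C3D", "COUNTRY_AT_REG": "E92000001", "REGION_AT_REG": "TLF"},
    {"PID": "M4N5K6L", "COUNTRY_AT_REG": "E92000001", "REGION_AT_REG": "TLI"},
    {"PID": "R7T9LQ2", "COUNTRY_AT_REG": "W92000004", "REGION_AT_REG": "TLL"},
]

# Hypothetical second entity: field name and codes are placeholders.
msoa = [
    {"PID": "A1B2C3D", "MSOA_AT_REG": "MSOA_X"},
    {"PID": "R7T9LQ2", "MSOA_AT_REG": "MSOA_Y"},
]

# Illustrative code -> label mapping standing in for the codings file.
codings = {
    "E92000001": "England",
    "W92000004": "Wales",
    "TLF": "East Midlands",
    "TLI": "London",
    "TLL": "Wales",
}


def link_on_pid(left, right):
    """Left join: add the fields from `right` to each `left` row with a matching PID."""
    right_by_pid = {row["PID"]: row for row in right}
    return [{**row, **right_by_pid.get(row["PID"], {})} for row in left]


def decode(row, fields, code_labels):
    """Replace coded values with labels in the named fields, leaving unknown codes as-is."""
    return {
        key: code_labels.get(value, value) if key in fields else value
        for key, value in row.items()
    }


linked = link_on_pid(country_region, msoa)
for row in linked:
    print(decode(row, {"COUNTRY_AT_REG", "REGION_AT_REG"}, codings))
```

A left join keeps every participant from the first entity even when they have no row in the second, which matters because researchers may hold access to some geographic levels but not others.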
How do I interpret the structured field names?
Field names are short, descriptive, and often abbreviated labels used to indicate the contents of each column in the dataset.
These data are not versioned and contain only one value per column. The field names reflect the geographic level followed by the context or time point from which they were obtained, for example:
COUNTRY_AT_REG - the derived country at the time of registration.
REGION_AT_REG - the derived region at the time of registration.
What metadata is available to help document the participant geographies releases?
We provide the following data files on our Data and cohort page (external link):
Data dictionary - which defines the raw data fields and metadata information, such as labels, descriptions and units of measurement
Coding file - which contains the granular details of categorical or raw coded values
For Participant geographies, these files are available per dataset.
If using Microsoft Excel to browse these files, for an optimal viewing experience, ensure the encoding settings are set to UTF-8.