# Participant geographies data

The Participant geographies data provides geographic information derived from participants’ self-reported address at the time of registration to the Our Future Health programme. Currently, Participant geographies data consists of a single dataset containing both country and region data. This page outlines the structure, scope, and methodology used to generate and process the data for release.

***

### What information is included in the participant geographies data? <a href="#what-information-do-the-participant-geographies-datasets-contain" id="what-information-do-the-participant-geographies-datasets-contain"></a>

The participant geographies data currently consists of a single dataset called the Country and Region table. This table includes the country and region linked to each participant’s self-reported address collected during their registration for the Our Future Health programme.

### Country and region <a href="#what-information-do-the-participant-geographies-datasets-contain" id="what-information-do-the-participant-geographies-datasets-contain"></a>

The United Kingdom (UK) is made up of four countries:

* England
* Scotland
* Wales
* Northern Ireland

England itself is divided into nine official regions:

* North East
* North West
* Yorkshire and the Humber
* East Midlands
* West Midlands
* East of England
* London
* South East
* South West

These regions, along with the three devolved nations (Scotland, Wales, and Northern Ireland), form the political and geographical makeup of the UK.

#### Why country and region matters for health research

Understanding the UK’s countries and regions is important in health research because it helps identify demographic differences that affect health needs, outcomes, and service provision.

#### Future releases

Additional geographic levels will be introduced in future releases. Each level will be released as a distinct dataset.

### Country and region data processing and release <a href="#participant-geographies-data-processing-and-release" id="participant-geographies-data-processing-and-release"></a>

We are releasing the geographic data in stages. The initial releases include a subset of participants who joined the programme earliest, specifically those who registered between 2021 and 2022. Future releases will expand to include all participants who:

1. Registered before the cut-off date for that release, and
2. Have fully completed and submitted their questionnaire.

Data for participants who have fully withdrawn from Our Future Health is not included, as those data are deleted routinely after they request to withdraw. Participants who have fully withdrawn from the programme since the last data release will not be included in the current data release.
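The inclusion rules above can be sketched as a simple filter. This is an illustrative sketch only, not the programme's actual pipeline: the `Participant` record, its field names, the cut-off date, and the PID `Z8Y7X6W` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Participant:
    # Hypothetical record layout, used only for this illustration.
    pid: str
    registered_on: date
    questionnaire_submitted: bool
    fully_withdrawn: bool


def eligible_for_release(p: Participant, cutoff: date) -> bool:
    """Apply the three conditions described above to one participant."""
    return (
        p.registered_on < cutoff          # registered before the cut-off date
        and p.questionnaire_submitted     # questionnaire completed and submitted
        and not p.fully_withdrawn         # fully withdrawn participants are excluded
    )


participants = [
    Participant("A1B2C3D", date(2022, 3, 1), True, False),
    Participant("M4N5K6L", date(2022, 6, 15), False, False),  # questionnaire incomplete
    Participant("R7T9LQ2", date(2021, 11, 20), True, True),   # fully withdrawn
    Participant("Z8Y7X6W", date(2023, 2, 10), True, False),   # registered after cut-off
]

cutoff = date(2023, 1, 1)  # hypothetical cut-off date for a release
release_pids = [p.pid for p in participants if eligible_for_release(p, cutoff)]
print(release_pids)
```

In this toy example only the first participant satisfies all three conditions.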

Only registration address is used. No subsequent address changes or participant relocations are reflected.

#### **How do we map participant addresses to geographic areas?**

During registration, participants provide their postcode and address. Our registration system integrates the [Ideal Postcodes](https://ideal-postcodes.co.uk/) lookup service, which returns additional location data associated with the address, including geocoded latitude and longitude coordinates. These coordinates are used to generate point geometries that represent each participant's location.

We use these coordinates to assign each participant to standard UK geographic boundaries, such as country and region. This is accomplished using a point-in-polygon spatial mapping technique, whereby each participant’s point geometry is overlaid onto official boundary shapefiles that divide the UK into discrete polygonal areas representing defined geographic units. For details of the exact shapefiles, see the section below: [#geographic-boundary-file-sources-and-versioning](#geographic-boundary-file-sources-and-versioning "mention").

Each point is evaluated for spatial containment within a given polygon. Once a match is identified, the spatial information is transformed into a structured tabular format. In this output, each participant is linked to the relevant geographic unit, including both the unit’s official code (e.g. country code as `E92000001`) and its corresponding label (e.g. country label as “England”).
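The point-in-polygon step and the resulting tabular output can be sketched as follows. This is a minimal illustration under stated assumptions, not the production method: it uses a plain ray-casting test rather than a GIS library, and the two rectangles in `BOUNDARIES` are toy shapes that stand in for the official boundary polygons. The function names and the sample coordinates are hypothetical.

```python
from typing import Optional

Point = tuple[float, float]  # (x, y), e.g. (longitude, latitude)
Polygon = list[Point]        # vertices in order; the last connects back to the first


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: a point is inside if a horizontal ray from it
    crosses the polygon's edges an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Toy rectangles, NOT real boundaries; keyed by (code, label).
BOUNDARIES: dict[tuple[str, str], Polygon] = {
    ("W92000004", "Wales"): [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    ("E92000001", "England"): [(2.0, 0.0), (5.0, 0.0), (5.0, 2.0), (2.0, 2.0)],
}


def assign_area(point: Point) -> Optional[tuple[str, str]]:
    """Return the (code, label) of the first polygon containing the point."""
    for (code, label), polygon in BOUNDARIES.items():
        if point_in_polygon(point, polygon):
            return code, label
    return None  # unassigned, e.g. the point falls outside every boundary


# Hypothetical participant points, turned into one tabular row each.
points = {"A1B2C3D": (3.5, 1.0), "R7T9LQ2": (1.0, 1.0), "Z8Y7X6W": (9.0, 9.0)}
rows = []
for pid, pt in points.items():
    match = assign_area(pt)
    code, label = match if match else (None, None)
    rows.append({"PID": pid, "country_code": code, "country_label": label})

for row in rows:
    print(row)
```

The third point falls outside both toy polygons and is left unassigned, which mirrors how a record with unresolvable coordinates would end up without a geographic unit.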

#### **Geographic boundary file sources and versioning**

**Country-level boundaries**

* Source: Office for National Statistics (ONS)
* Dataset: [Countries (December 2024) Boundaries UK BFC](https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2024-boundaries-uk-bfc-2/about) (shapefile)
* Published: 11 February 2025

**Region-level boundaries**

* Source: Office for National Statistics (ONS)
* Dataset: [International Territorial Level 1 (January 2025) Boundaries UK BFC](https://geoportal.statistics.gov.uk/datasets/ons::international-territorial-level-1-january-2025-boundaries-uk-bfc/about) (shapefile)
* Published: 4 February 2025

We chose ITLs because they provide a harmonised structure for regional geography across the UK, explicitly recognising the devolved nations as distinct regions. This framework is increasingly used in health, policy, and economic research and supports greater comparability across studies by aligning with national and international geographic standards. The Office for National Statistics (ONS), the UK’s official statistical agency, adopts ITLs as the standard for regional data collection and reporting, ensuring consistency across official statistics.

The shapefiles used are fully clipped, meaning that they include only the precise geographic extents of official administrative areas, excluding any extraneous or overlapping spatial features. This ensures accurate and unambiguous assignment of participants to geographic units, particularly for those located near administrative boundaries.

All geographic assignments are based on fixed versions of official boundary shapefiles. These files are used consistently across all data releases to ensure temporal stability and prevent inconsistencies that might arise from administrative boundary changes over time. Even when newer versions of boundary datasets become available, we do not adopt dynamic updates.

This decision is made to preserve longitudinal comparability and to reduce the risk of participant re-identification through geographic triangulation or shifts in classification over successive data releases.

#### **Licensing**

Geographic boundary data are © Crown copyright and database rights 2024. They contain public sector information licensed under the [Open Government Licence v3.0](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/).

#### **Why use latitude and longitude coordinates?**

Our Future Health uses latitude and longitude coordinates derived from individual participant residential addresses to map to administrative geographies.

The main alternative approach would be to use the locations of postcode centroids. Postcode centroids represent postcode areas using a single, central point and are more commonly used for aggregated spatial analysis. While they offer simplicity, postcode centroids are inherently less precise and are subject to change due to updates in postal geography, such as the creation of new postcodes, boundary shifts, or service reorganisation by postal authorities. These changes can introduce inconsistencies across time and reduce the reliability of geographic classification for individual records.

In contrast, latitude and longitude coordinates of individual addresses offer superior accuracy for participant-level spatial assignment, particularly in cases where participants are located near the edges of administrative boundaries. This precision allows us to assign participants to defined geographic units through point-in-polygon mapping techniques. This approach enhances spatial accuracy and consistency, supporting the scientific objectives of the Our Future Health programme.

#### **Limitations and caveats**

A small proportion of participants could not be assigned to a geographic area due to incomplete, invalid, or unresolvable address data.

Additionally, all assignments are based solely on the residential address provided at the time of registration. Subsequent changes of address are not reflected in these data.

The quality and precision of geographic assignment are contingent upon the accuracy of the self-reported registration address and the reliability of the Ideal Postcodes service.

### Data access and de-identification

**Data access**

Datasets are stored and maintained independently from other participant datasets within the Trusted Research Environment (TRE).

Access to any participant geographies dataset is restricted and requires a dedicated request. Requests are reviewed by an expert panel to ensure that the geographic information is necessary and appropriate for the intended research purpose.

**How do we de-identify the data to minimise risks of identifying participants?**

Participant geography datasets exclude all participant-level address and postcode data and contain only non-identifiable geographic classifications derived from coordinate-based mapping.

### How is the data organised in the Trusted Research Environment (TRE)?

In the TRE, all participant geographies datasets will be maintained as separate entities.

For the current release, the Country and Region table is organised as a single entity, containing one row per participant with three variables: Participant ID (PID), country at registration, and region at registration.

Each entity can be linked to other entities (for example, the questionnaire dataset) using the PID, which is a unique participant identifier. Aside from the PID, variable names are unique both within and across all entities.

For the geographies data, the release datasets always store participant information using the relevant codes. The codings file includes these codes along with their full textual labels (referred to as "meanings" in the codings file), which correspond to the codes and labels used in the shapefiles from which the data are derived, as described in [#geographic-boundary-file-sources-and-versioning](#geographic-boundary-file-sources-and-versioning "mention").

Below is an example of the release candidate:

| PID     | COUNTRY\_AT\_REG | REGION\_AT\_REG |
| ------- | ---------------- | --------------- |
| A1B2C3D | E92000001        | TLF             |
| M4N5K6L | E92000001        | TLI             |
| R7T9LQ2 | W92000004        | TLL             |
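As a rough sketch of how these codes might be used, the example below decodes the rows above with a small lookup and links them to another entity by PID. The `CODINGS` dictionary is an illustrative subset written by hand; the codings file remains the authoritative source for codes and labels. The `questionnaire` rows and the field name `EXAMPLE_FIELD` are hypothetical.

```python
import csv
import io

# The example rows shown above, as they might appear in a CSV extract.
geographies_csv = """PID,COUNTRY_AT_REG,REGION_AT_REG
A1B2C3D,E92000001,TLF
M4N5K6L,E92000001,TLI
R7T9LQ2,W92000004,TLL
"""

# Illustrative subset of code -> label ("meaning") pairs; check the codings file.
CODINGS = {
    "COUNTRY_AT_REG": {"E92000001": "England", "W92000004": "Wales"},
    "REGION_AT_REG": {
        "TLF": "East Midlands (England)",
        "TLI": "London",
        "TLL": "Wales",
    },
}

# Hypothetical rows from another entity, keyed by PID.
questionnaire = {
    "A1B2C3D": {"EXAMPLE_FIELD": "yes"},
    "R7T9LQ2": {"EXAMPLE_FIELD": "no"},
}


def decode(row: dict) -> dict:
    """Add a *_LABEL column next to each coded column; unknown codes map to None."""
    out = dict(row)
    for field, lookup in CODINGS.items():
        out[field + "_LABEL"] = lookup.get(row[field])
    return out


geographies = [decode(r) for r in csv.DictReader(io.StringIO(geographies_csv))]

# Link on PID, keeping only participants present in both entities.
linked = [
    {**g, **questionnaire[g["PID"]]}
    for g in geographies
    if g["PID"] in questionnaire
]

for row in linked:
    print(row)
```

The link here keeps only PIDs present in both entities; whether to keep unmatched participants instead depends on the analysis.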

#### How do I interpret the structured field names?

Field names are short, descriptive, and often abbreviated labels used to indicate the contents of each column in the dataset.

These data are not versioned and contain only one value per participant in each column. The field names reflect the geographic level followed by the context or time point from which they were obtained, for example:

* `country_at_reg` - the derived country at the time of registration.
* `region_at_reg` - the derived region at the time of registration.

#### What metadata is available to help document the participant geographies releases?

We provide the following data files on our [Data and cohort page (external link)](https://research.ourfuturehealth.org.uk/data-and-cohort):

* data dictionary - which defines the raw data fields and metadata information, such as labels, descriptions and units of measurement
* codings file - which contains the granular details of categorical or raw coded values

If using Microsoft Excel to browse these files, for an optimal viewing experience, ensure the encoding settings are set to UTF-8.

