# Overview of data in the TRE

### Where is Our Future Health data stored?

Our Future Health stores de-identified data from our participants on a Trusted Research Environment (TRE) hosted by [DNAnexus](https://dnanexus.gitbook.io/ofh). A TRE is a secure platform that supports data storage and analysis without any datasets needing to be downloaded to individual devices.

All data from Our Future Health is analysed on the TRE using different [analytical tools](https://dnanexus.gitbook.io/ofh/running-analyses/tool-library).

All file transfers in and out of the TRE must be performed via the [Airlock](https://dnanexus.gitbook.io/ofh/airlock/importing-and-exporting-files-through-the-airlock).

For more information on navigating and analysing data in the TRE, please refer to the [TRE documentation](https://dnanexus.gitbook.io/ofh).

Visit our public [GitHub](https://github.com/ourfuturehealth/tre-example-notebooks) repository for example code, including Python, R and Bash notebooks, to get started with the resource.

Researchers interested in using Our Future Health data need to [apply for access](https://research.ourfuturehealth.org.uk/apply-to-access-the-data/) by creating an account and submitting a research proposal. Once access to the data has been granted, de-identified data can be accessed through the [DNAnexus environment](https://dnanexus.gitbook.io/ofh/about_ofh/accessing_tre).

***

### What data types does Our Future Health have and how are they structured in the TRE?

Currently, Our Future Health has the following data types:

* **Participant data**, which contains information on participant sex, gender, ethnicity, month and year of birth, consent version, month and year of consent, month and year of registration, blood sample
* Self-reported baseline **questionnaire data**, which contains information on socio-economic factors, lifestyle, and individual and family health
* **Clinic measurements data**, which includes blood pressure, height, weight, BMI, heart rate and POCT lipid profile
* **Genetic data**, which includes both genotyping array and imputed genetic data in two file formats (pVCF and BGEN), and files with sample QC, kinship, ancestry and PCA loadings information
* **Participant geographies data** for all devolved nations including small area statistical zones such as LSOA, MSOA and Intermediate Zones
* **Linked health records data** for participants receiving care in England, including HES, cancer registry and deaths

The data types are separated into entities in the TRE. Each entity contains one or several tables. Below is a description of the entities as they appear in the TRE. For in-depth descriptions of entity variable names and codes, please refer to our data dictionary and coding file on the [Data and cohort page](https://research.ourfuturehealth.org.uk/data-and-cohort/).

<table><thead><tr><th width="260">Data Type</th><th width="477">Entity name(s) in TRE</th></tr></thead><tbody><tr><td><a href="/pages/lRssf4ukqbvypH0DAPF6">Participant data</a></td><td><ul><li><code>participant</code></li></ul></td></tr><tr><td><a href="/pages/jUWMOdf7ofqCy5nTGAyj">Questionnaire data </a></td><td><ul><li><code>questionnaire</code></li></ul></td></tr><tr><td><a href="/pages/PLp4CajNv7s7v4ywAsb9">Participant geographies</a></td><td><ul><li><code>country_region</code></li><li><code>msoa</code></li><li><code>lsoa</code></li><li><code>intermediate_zones</code></li></ul></td></tr><tr><td><a href="/pages/5W0vlT6ny9lQB2g2rJJg">Clinic measurements data </a></td><td><ul><li><code>clinic_measurements</code></li><li><code>poct_lipid_profile</code></li></ul></td></tr><tr><td><a href="/pages/reuk3OSrnC1bcZz86eh9">Genetic data</a></td><td><ul><li>inventory file (snv / imputed)</li><li><p>snv_pvcf / imputed_pvcf</p><ul><li>160 / 809 <code>VCF</code> files</li><li>160 / 809 <code>VCF</code> index files</li></ul></li><li><p>snv_bgen / imputed_bgen</p><ul><li>160 / 809 <code>bgen</code> files</li><li>160 / 809 <code>bgen</code> index files</li><li>160 / 809 <code>bgen.sample</code> files</li></ul></li><li><p>snv_resources / imputed_resources</p><ul><li><code>tsv</code> sample QC metrics file</li><li><code>txt</code> file containing byte ranges for VCF files</li><li><code>bed</code> file containing region coordinates</li><li>kinship file / ancestry file</li><li>snv: PCA loadings <code>VCF</code> and <code>VCF</code> index files</li><li>imputed: variant metrics <code>VCF</code> and <code>VCF</code> index files</li></ul></li></ul></td></tr><tr><td><a href="/pages/4hAnA2YlHf8axK4BBZ6B">Linked health records data</a></td><td><ul><li><code>nhse_eng_inpat</code> </li><li><code>nhse_eng_ed</code></li><li><code>nhse_eng_outpat</code></li><li><code>nhse_eng_ecds</code> 
</li><li><code>nhse_eng_primcare_meds</code></li><li><code>nhse_engwal_deaths</code></li><li><code>nhse_eng_canpat</code></li><li><code>nhse_eng_canreg_pattumour</code> </li><li><code>nhse_eng_canreg_treat</code> </li><li><code>nhse_eng_canreg_pre1995</code></li><li><code>participant_nhs_linked</code></li></ul></td></tr></tbody></table>

### How do I link entities together?

Participant, questionnaire, geography, clinic measurements, and some linked health records datasets in the TRE have one row per participant. The majority of linked health records datasets are structured with one row per health episode, which means they contain multiple rows per participant. Data from different entities can be linked for analysis using the key `PID`, a unique participant identifier. Because no two participants share a `PID`, entities with one row per participant can be linked one-to-one, while episode-level entities link to them one-to-many. Linkage can be done using one of the available [tools](https://dnanexus.gitbook.io/ofh/running-analyses/tool-library) in the DNAnexus library.
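As a minimal sketch of linkage on `PID`, the example below uses pandas with small, made-up tables. Only the `PID` key comes from this page; the other column names (`birth_year`, `smoker`, `episode_code`), the values, and the choice of pandas are assumptions for illustration and do not reflect the real data dictionary or a prescribed workflow.

```python
import pandas as pd

# Hypothetical extracts: only the `PID` key is taken from the documentation.
participant = pd.DataFrame(
    {"PID": ["P1", "P2", "P3"], "birth_year": [1980, 1975, 1990]}
)
questionnaire = pd.DataFrame(
    {"PID": ["P1", "P2", "P3"], "smoker": ["no", "yes", "no"]}
)
# Episode-level table: a participant may appear on several rows.
inpatient = pd.DataFrame(
    {"PID": ["P1", "P1", "P3"], "episode_code": ["A", "B", "C"]}
)

# One row per participant on both sides: validate raises a MergeError
# if `PID` is duplicated in either table.
one_row_per_participant = participant.merge(
    questionnaire, on="PID", how="inner", validate="one_to_one"
)

# Participant rows repeat once per episode; participants with no
# episodes are kept (how="left") with missing episode values.
episodes_with_demographics = participant.merge(
    inpatient, on="PID", how="left", validate="one_to_many"
)
```

The `validate` argument is a guard rather than a requirement: it makes the merge fail early if a table assumed to have one row per participant turns out to contain repeated `PID` values.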

### How can I access the cohort browser in the TRE?

The data in the entities can be visualised in the Cohort Browser in your TRE project by clicking on the dataset file. The file is labelled `study_ofs_XXX.dataset`, with the type/class label `dataset record`, where `XXX` is a study ID number unique to each project. For more information about working with the Cohort Browser, please visit the [DNAnexus page](https://dnanexus.gitbook.io/ofh/cohort-browser/introduction-to-cohort-browser).

