Questionnaire data
Information about the questionnaire data in the Our Future Health resource, including the scope and structure of these data and how the data were generated and processed
Participants can complete the questionnaire online immediately after consenting to join the programme or they can access it later from the participant dashboard. Participants answer questions about their health history, work and education, lifestyle and family health history. This section provides a brief overview of the features of the questionnaire, including its objectives, design, contents, and administration, as well as some details on how each data set is prepared for release.
Content and administration
What types of questions do we ask?
The current baseline health questionnaire has a total of 288 questions (v2). There are 68 core questions, which all our participants see, and 220 dynamic questions, which are shown to participants selectively based on previous responses.
There are 5 sections within the questionnaire:
About you and your household (for example, age, sex, height, weight and ethnicity)
Work and education (for example, income, employment history and education)
Your lifestyle (for example, socialising, screen use and alcohol intake)
Family health history (for example, sibling and parent health)
Your health history (for example, screenings, medications and any current symptoms)
For more detailed information on the question text, answer options, help information and validation see the questions we asked in:
For information on our versioning approach, see the section "How do we use major and minor versioning for the questionnaire?"
Do all participants respond to every question?
The current (v2) questionnaire has 68 core questions (69 in v1), which all our participants see, and 220 dynamic questions (133 in v1), which are shown to participants selectively based on previous responses. This ensures that the questionnaire journey is tailored to each individual and that participants are only asked questions that are relevant to them. Therefore, not all participants respond to every question.
Examples of dynamic questions include:
only participants who report that they smoke will be subsequently asked about their smoking behaviours
only participants who respond that their sex was registered as female (or intersex, or prefer not to answer) at birth will be asked about their gynaecological health
Answers to questions that were not presented to a participant are recorded as NULL.
For comprehensive details on the question-answer pairings for each dynamic question in version 2.2, please download the questionnaire logic file provided below. This file outlines the dynamic logic using pseudocode, including the relevant field names and answer values (last updated 4 July 2024).
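The dynamic display logic described above can be sketched in a few lines of Python. This is an illustration only, not the logic file itself: the field names `SMOKE_STATUS` and `SMOKE_FREQUENCY` and the answer codes are hypothetical, and the real conditions are those given in the questionnaire logic file.

```python
from typing import Callable, Dict, Optional

Answers = Dict[str, Optional[int]]

# Hypothetical display logic: each dynamic question maps to a condition on
# earlier answers. Field names and answer codes are invented for illustration.
DISPLAY_LOGIC: Dict[str, Callable[[Answers], bool]] = {
    # Shown only to participants who report that they smoke (code 1 assumed).
    "SMOKE_FREQUENCY": lambda answers: answers.get("SMOKE_STATUS") == 1,
}


def record_responses(core: Answers, dynamic: Answers) -> Answers:
    """Keep core answers and store None (NULL) for dynamic questions not shown."""
    row = dict(core)
    for field, is_shown in DISPLAY_LOGIC.items():
        row[field] = dynamic.get(field) if is_shown(core) else None
    return row


smoker = record_responses({"SMOKE_STATUS": 1}, {"SMOKE_FREQUENCY": 3})
non_smoker = record_responses({"SMOKE_STATUS": 0}, {})
print(smoker)
print(non_smoker)
```

In this sketch the non-smoker's `SMOKE_FREQUENCY` is `None`, which mirrors how a question that was never presented appears as NULL in the data.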
How do participants access the questionnaire?
The questionnaire is available online via our participant website. Participants can start the questionnaire immediately after consenting or they can access it later from the participant dashboard. Before starting, we tell participants:
approximately how long the questionnaire will take to complete
that we will save their answers as they go
that they can stop at any time and continue later
The questionnaire is an important part of the study, so we provide contextual support to help them complete it. For some questions, participants can get additional guidance by selecting "How to answer this question". We also always provide a contact number for our support team in case participants need further help.
We send reminders to participants to prompt them to either start or complete the questionnaire, depending on their progress. If participants log into their account again and have not finished the questionnaire, the prompt to complete it will still be on their dashboard.
At any time before submission, participants can go back and change their answers. Once submitted, the answers are no longer editable.
What were the response rates for the questionnaire?
For participants who joined Our Future Health on or before 13 February 2025:
72% of participants who register with us go on to consent to take part in the programme
80% of those who consent start the questionnaire
92% of participants who start the questionnaire complete it
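Because each rate above is conditional on the previous step, the stages can be multiplied to estimate overall completion. The short calculation below is only a rough sketch: the inputs are the rounded percentages quoted above, so the results (roughly 74% of consenting participants and roughly 53% of registrants completing the questionnaire) are approximate and are not published figures.

```python
# Stage-to-stage rates quoted above, as whole percentages.
consent_given_registered = 72
started_given_consented = 80
completed_given_started = 92

# Share of consenting participants who complete the questionnaire.
completed_given_consented = started_given_consented * completed_given_started / 100

# Share of registrants who go on to complete the questionnaire.
completed_given_registered = consent_given_registered * completed_given_consented / 100

print(f"{completed_given_consented:.1f}% of consenting participants complete it")
print(f"{completed_given_registered:.1f}% of registrants complete it")
```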
Version changes and developments
How has the baseline health questionnaire changed over time?
The current data release includes responses from participants who joined the programme on or before April 2024. These participants will have completed either v1, v2, v2.1 or v2.2 of our baseline health questionnaire.
When did each version of the questionnaire go live?
Date | Version | Changes
24 May 2021 | v1 | First version of the questionnaire.
20 November 2022 | v2 | Expanded from 202 to 288 total questions (27 v1 questions were removed, 47 v1 questions were updated and 113 new questions were introduced).
21 December 2023 | v2.1 | Logic changes only. No changes to the questionnaire content.
7 June 2024 | v2.2 | Logic changes and one data type update.
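The question counts quoted for v1 and v2 can be reconciled with a quick check. The sketch below assumes that the 47 updated questions are counted among the questions carried over from v1, so that only removals and additions change the total.

```python
# Counts quoted in this document.
V1_CORE, V1_DYNAMIC = 69, 133
V2_CORE, V2_DYNAMIC = 68, 220
REMOVED, ADDED = 27, 113

v1_total = V1_CORE + V1_DYNAMIC  # 202
v2_total = V2_CORE + V2_DYNAMIC  # 288

# Assumption: updated questions remain in the questionnaire, so they do not
# alter the total; only removed and newly added questions do.
carried_over = v1_total - REMOVED  # 175
assert carried_over + ADDED == v2_total

print(v1_total, v2_total, carried_over)
```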
How is version control applied to the questionnaire?
Changes to the underlying questionnaire templates are tracked using the Git version control system. For each new version, we batched changes and released them in a single new version that went live at the same time.
Each version of the questionnaire consists of 2 components that provide an accurate record of exactly what participants experienced when completing it.
These 2 components are:
a human-readable copy of the questionnaire that describes the full participant experience. Questions are presented in a template that includes display logic and response validation, where applicable.
a JSON template file that contains and encodes all questionnaire content. Our online questionnaire platform interprets these files and presents them to participants via a web application. This is an exact representation of the questionnaire and contains all the questionnaire content as "data".
How do we use major and minor versioning for the questionnaire?
So far, there have been two major versions, v1 and v2, and two minor versions, v2.1 and v2.2.
A new major version of the questionnaire is introduced when substantial changes are made, such as adding new questions, modifying answer options, or altering question phrasing.
Minor versions are used for updates to a major version and currently include two main types of changes:
changes to the logical conditions that determine which dynamic questions a participant sees
changes to question information, such as switching from single-select to multi-select answer types
Minor updates do not change the structure of the tables, but there may be changes to the content. To identify which questions have been updated across minor versions, you can refer to the QUESTIONNAIRE_VERSION indicator in the questionnaire table, in conjunction with the release documentation.
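As an illustration of using the version indicator, the sketch below parses version labels into (major, minor) pairs and picks out responses collected on or after a given minor version. It assumes the QUESTIONNAIRE_VERSION column holds labels in the form "v1", "v2", "v2.1" or "v2.2"; the `PID` values and the example rows are invented.

```python
from typing import NamedTuple


class QuestionnaireVersion(NamedTuple):
    major: int
    minor: int


def parse_version(label: str) -> QuestionnaireVersion:
    """Parse labels such as 'v1' or 'v2.2' into (major, minor)."""
    major, _, minor = label.strip().lower().lstrip("v").partition(".")
    return QuestionnaireVersion(int(major), int(minor or 0))


# Invented example rows; only the version labels follow the documented values.
rows = [
    {"PID": "P001", "QUESTIONNAIRE_VERSION": "v1"},
    {"PID": "P002", "QUESTIONNAIRE_VERSION": "v2"},
    {"PID": "P003", "QUESTIONNAIRE_VERSION": "v2.1"},
    {"PID": "P004", "QUESTIONNAIRE_VERSION": "v2.2"},
]

# Responses collected with v2.1 or later, i.e. after the first logic changes.
from_v2_1 = [
    row["PID"]
    for row in rows
    if parse_version(row["QUESTIONNAIRE_VERSION"]) >= (2, 1)
]
print(from_v2_1)
```

Comparing tuples rather than raw strings keeps the ordering correct if a later minor version reaches two digits.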
See the Change log for Questionnaire versions for more information on bug fixes and impacts on data quality.
Spelling errors, grammatical mistakes, clarifications, and updated encodings may be corrected without changing the version. Our versioning approach may evolve over time. We will adjust our documentation as appropriate.
What changed in questionnaire version v2?
Version v2 of the questionnaire includes changes to some of the original questions and their structure. We made these changes based on a review of question content and user testing that focused on how well participants understood what we asked.
These changes were further informed by the experiences of other relevant programmes, including UK Censuses and other research studies.
We have made these changes to ensure that we are capturing important information about health and lifestyle in a way that is sensitive and aligned with current best practices and research priorities.
The v2 questionnaire includes new questions and allows participants to respond to new or updated questions including:
a greatly expanded range of health conditions, family health history, and medication types
smoking status aligned to questions from the Connect for Cancer Prevention Study (external link)
alcohol and smoking cessation aligned to questions from the University College London Alcohol toolkit study and smoking toolkit study (external link)
both opposite-sex and same-sex marriage and civil partnership status
more detail on sexual orientation, sex and gender identity and any transgender history
responses to modernised screen-time questions
To test changes between v1 and v2 of the questionnaire, we used a survey-based test to assess comprehension and acceptability of the modified or newly added questions and sections. We recruited 330 members of the public to take part in the test and measured their ability to accurately answer the questions.
The new questions and sections we tested and subsequently included in v2 asked about participants':
family health history
medication
sexuality
gender
For each of these questions or sections, 98% to 100% of the people who took part in the survey-based test said that they understood what was being asked. Also, 88% to 100% of the respondents felt that they could accurately answer the new questions and sections. The main difficulty participants reported was a lack of confidence in recalling all details of, for example, past medical diagnoses.
Questionnaire data processing and release
How do we process the data for each release?
We process the raw data from all participants who were in the programme on or before the cut-off date for each release. Data for participants who have fully withdrawn from Our Future Health is deleted after they request to withdraw. Any participants who have fully withdrawn from the programme since the last data release will not be included in the current data release.
We performed minimal additional data processing for this release, including:
migrating data from the web-based questionnaire platform to match the required TRE table format
validating variables to ensure they meet specifications such as data type, length, measurement, and value ranges
converting all height and weight entries to metric units, stored in DEMOG_HEIGHT_1_1 and DEMOG_WEIGHT_1_1, whether participants originally provided them in imperial or metric units, as indicated by DEMOG_HEIGHT_ENTER_UNIT_1_1 and DEMOG_WEIGHT_ENTER_UNIT_1_1
setting responses in the GAD7 and PHQ9 subsections to "Prefer not to say" (-3) for participants who opted to skip these sections by selecting "Yes" (1) to SKIP_PHQ9_GAD7_1_1
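The last two processing steps above can be sketched as follows. This is a hedged illustration, not the production pipeline: the way imperial entries are captured (feet and inches, stones and pounds), the rounding to one decimal place, and the PHQ9 and GAD7 item field names are all assumptions. Only the conversion factors, the -3 code and the SKIP_PHQ9_GAD7_1_1 field come from established definitions or from this document.

```python
CM_PER_INCH = 2.54          # exact by definition
KG_PER_POUND = 0.45359237   # exact by definition
PREFER_NOT_TO_SAY = -3


def imperial_height_to_cm(feet: int, inches: float) -> float:
    """Convert a feet-and-inches height to centimetres (1 d.p. assumed)."""
    return round((feet * 12 + inches) * CM_PER_INCH, 1)


def imperial_weight_to_kg(stones: int, pounds: float) -> float:
    """Convert a stones-and-pounds weight to kilograms (1 d.p. assumed)."""
    return round((stones * 14 + pounds) * KG_PER_POUND, 1)


def apply_skip_rule(row: dict, section_fields: list) -> dict:
    """Set every PHQ9/GAD7 item to -3 when the participant chose to skip."""
    out = dict(row)
    if out.get("SKIP_PHQ9_GAD7_1_1") == 1:
        for field in section_fields:
            out[field] = PREFER_NOT_TO_SAY
    return out


# Hypothetical item field names, used only to show the rule.
items = ["PHQ9_EXAMPLE_ITEM_1_1", "GAD7_EXAMPLE_ITEM_1_1"]
skipped = apply_skip_rule({"SKIP_PHQ9_GAD7_1_1": 1}, items)

print(imperial_height_to_cm(5, 10))
print(imperial_weight_to_kg(11, 0))
print(skipped)
```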
How do we de-identify the questionnaire data to minimise risks of identifying participants?
In the Questionnaire table we:
Suppressed the number of children for participants who reported that they had more than 20 children in fields:
CHILDREN_BIO_NUM_1_1
CHILDREN_BIO_NUM_2_1
CHILDREN_BIRTHED_NUM_1_1
We also suppressed the age at first child and the age at last child for participants who reported age at birth of under 12 or over 48 years. This includes both self-reported male and female participants in fields:
CHILDREN_BIRTHED_FIRST_AGE_1_1
CHILDREN_BIRTHED_LAST_AGE_1_1
CHILDREN_BIO_FIRST_AGE_1_1
CHILDREN_BIO_LAST_AGE_1_1
Suppressed values are represented by the unique code “-999”, allowing researchers to distinguish them from non-responses, which are coded as NULL. Where applicable, the corresponding numeric codes and textual labels for suppressed values are detailed in the codings file.
We carefully review all questionnaire age and year fields to prevent unintended disclosure of a participant's age where it has been suppressed because the participant is over 95. In such cases, responses are suppressed and replaced with "-999", as outlined above.
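The suppression thresholds for the children fields can be expressed as a small rule set. The sketch below is an assumption-laden illustration rather than the actual de-identification code: it treats negative values as special codes (such as -3) that are left untouched, and it does not cover the separate review of age and year fields for participants over 95.

```python
SUPPRESSED = -999

COUNT_FIELDS = (
    "CHILDREN_BIO_NUM_1_1",
    "CHILDREN_BIO_NUM_2_1",
    "CHILDREN_BIRTHED_NUM_1_1",
)
AGE_FIELDS = (
    "CHILDREN_BIRTHED_FIRST_AGE_1_1",
    "CHILDREN_BIRTHED_LAST_AGE_1_1",
    "CHILDREN_BIO_FIRST_AGE_1_1",
    "CHILDREN_BIO_LAST_AGE_1_1",
)


def suppress(row: dict) -> dict:
    """Replace out-of-range child counts and ages at birth with -999."""
    out = dict(row)
    for field in COUNT_FIELDS:
        value = out.get(field)
        if value is not None and value > 20:
            out[field] = SUPPRESSED
    for field in AGE_FIELDS:
        value = out.get(field)
        # Assumption: negative values are special codes, not real ages.
        if value is not None and value >= 0 and (value < 12 or value > 48):
            out[field] = SUPPRESSED
    return out


def describe(value) -> str:
    """Distinguish suppressed values from NULLs and reported values."""
    if value is None:
        return "NULL (not answered or not presented)"
    if value == SUPPRESSED:
        return "suppressed"
    return "reported"


example = suppress({
    "CHILDREN_BIO_NUM_1_1": 22,
    "CHILDREN_BIRTHED_FIRST_AGE_1_1": 30,
    "CHILDREN_BIRTHED_LAST_AGE_1_1": 50,
    "CHILDREN_BIO_FIRST_AGE_1_1": None,
})
for field, value in example.items():
    print(field, describe(value))
```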
How is the Questionnaire table organised in the Trusted Research Environment (TRE)?
The data release includes 358 variables for the Questionnaire data across all versions of the questionnaire. In the TRE, the data is organised into a single entity, which we refer to as the Questionnaire table. Each entity can be linked to another entity (for example, the Participant table) using a unique identifier called the Participant Identifier (PID). Other than the PID, variable names are unique within and between all entities.
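Linking on the PID can be illustrated with a minimal in-memory join. In this sketch the column name `PID`, the participant field `EXAMPLE_PARTICIPANT_FIELD` and the row values are assumptions made for the example; in practice the join would be done with whatever tooling the TRE provides.

```python
def link_on_pid(questionnaire_rows, participant_rows, key="PID"):
    """Inner-join questionnaire rows to participant rows on the shared key."""
    participants_by_pid = {row[key]: row for row in participant_rows}
    linked = []
    for q_row in questionnaire_rows:
        p_row = participants_by_pid.get(q_row[key])
        if p_row is not None:
            # Variable names other than the PID are unique across entities,
            # so merging the two rows does not overwrite any values.
            linked.append({**p_row, **q_row})
    return linked


# Invented rows for illustration.
questionnaire = [
    {"PID": "P001", "DEMOG_SEX_1_1": 1},
    {"PID": "P002", "DEMOG_SEX_1_1": 2},
]
participant = [
    {"PID": "P001", "EXAMPLE_PARTICIPANT_FIELD": "a"},
]

linked_rows = link_on_pid(questionnaire, participant)
print(linked_rows)
```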
How do I interpret the structured field names?
Field names are short, descriptive and often abbreviated names used to describe the contents of a particular column. In the questionnaire table, each field name consists of four main components in the following format:
[Primary_topic]_[Unique_descriptor]_[Version]_[Item]
These components are:
Primary topic: A single element (a word, phrase, or abbreviation) that represents a broad category describing the overall family of fields. This is often shared among multiple fields.
Unique descriptor (optional): Single or multiple elements, connected by underscores, which describe the specific content of the field. Unique descriptors are systematically constructed with elements in sequence, often shared between a set of related fields, to give insights into the distinctions between these fields and their progression within the questionnaire.
Version (of the question): If a question has been amended to the extent that it may significantly impact the way in which participants respond, the version will be updated. This is separate from the questionnaire version as some questions are identical. In the current data release, these question version numbers are either 1 or 2.
Item: This indicates the maximum number of responses allowed per field. For single-answer questions, the item value is always 1. For multiple-answer questions, it is either the maximum number of possible answers or "M" (indicating multiple).
Example field names:
The variable ACTIVITY_TYPE_1_M corresponds to the question: “In the last 4 weeks, did you spend any time doing the following?”, which allows for multiple responses.
If a participant selects option 1 - "Walking for pleasure (not as a means of transport)", they are subsequently asked two follow-up questions, regardless of any other options selected:
ACTIVITY_TYPE_WALK_1_1: "How many times in the last 4 weeks did you go walking for pleasure?"
ACTIVITY_TYPE_WALK_DUR_1_1: "Each time you went walking for pleasure, about how long did you spend doing it?"
Abbreviations used here are chosen based on a presumed ease of recognition or common acceptance in universal and medical contexts. For example:
"DEMOG" refers to DEMOGRAPHICS, e.g. DEMOG_SEX_1_1
"CHG" refers to CHANGE, e.g. HEALTH_WEIGHT_CHG_1_1
"GYN" refers to GYNECOLOGY, e.g. GYN_MENSTR_CYCLE_DAYS_1_1
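The four-part field name format described above lends itself to simple parsing. The helper below is a sketch under one assumption: that the primary topic is always the first underscore-separated token, with everything between it and the final two tokens forming the optional unique descriptor. Names that do not end in a version and an item (such as QUESTIONNAIRE_VERSION) return `None`.

```python
from typing import NamedTuple, Optional


class FieldName(NamedTuple):
    primary_topic: str
    descriptor: Optional[str]
    version: int
    item: str  # a number as text, or "M" for multiple


def parse_field_name(name: str) -> Optional[FieldName]:
    """Split a field name into topic, descriptor, question version and item."""
    parts = name.split("_")
    if len(parts) < 3:
        return None
    *head, version, item = parts
    if not version.isdigit() or not (item.isdigit() or item == "M"):
        return None
    descriptor = "_".join(head[1:]) or None
    return FieldName(head[0], descriptor, int(version), item)


for name in ("DEMOG_SEX_1_1", "ACTIVITY_TYPE_WALK_DUR_1_1", "WORK_STATUS_2_M"):
    print(name, parse_field_name(name))
```

Grouping fields by `primary_topic` is one way to browse related variables, and filtering on `item == "M"` picks out multiple-answer questions.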
How do the field names relate to the questionnaire identifiers in the human-readable questionnaire?
The field names used as column headings in the datasets do not directly correspond to the abbreviated codes or identifiers used in the human-readable questionnaire templates. As a result, the field names seen in the data files may differ from those listed in the template.
If you are using the human-readable template to identify variables of interest, we recommend using the question text itself as a consistent reference point. This text is also included in the description column of the data dictionary, allowing you to reliably map from the questionnaire to the correct dataset column.
For version 2.2, a logic file is provided (linked in the relevant section), which includes a 1:1 mapping between the dataset field names and the corresponding identifiers in the human-readable questionnaire template. This file serves as a definitive cross-reference to support accurate variable selection and interpretation.
How are different versions of the same questions stored?
Each questionnaire response is accompanied by a questionnaire version identifier (v1, v2, v2.1 or v2.2), recorded in the QUESTIONNAIRE_VERSION column. Each major version of a question is stored in a distinct field. Minor versions of a question are stored in the same field as the major version.
For example, work status questions changed between v1 and v2 of the questionnaire, and further fixes were applied to these questions in both v2.1 and v2.2. However, all responses from v2 and subsequent versions of the questionnaire are stored in the field WORK_STATUS_2_M:
QUESTIONNAIRE_VERSION | v1 work status field | WORK_STATUS_2_M
v1 | [1] | NULL
v2 | NULL | [1, 2]
v2.1 | NULL | [2, 6]
v2.2 | NULL | [3, 5, 6]
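When analysing a question that exists in more than one major version, a common first step is to pick the right field for each response. The sketch below does this for the work status example. The v1 field name `WORK_STATUS_1_M` is an assumption inferred from the naming convention and is not confirmed in this document, and the rows are the illustrative values from the example above.

```python
# Field holding the work status answer for each major questionnaire version.
# WORK_STATUS_1_M is an assumed name; WORK_STATUS_2_M is documented above.
WORK_STATUS_FIELDS = {1: "WORK_STATUS_1_M", 2: "WORK_STATUS_2_M"}


def major_version(label: str) -> int:
    """Return the major version from labels such as 'v1' or 'v2.1'."""
    return int(label.strip().lower().lstrip("v").split(".")[0])


def work_status(row: dict):
    """Read the work status answer from the field matching the row's version."""
    field = WORK_STATUS_FIELDS[major_version(row["QUESTIONNAIRE_VERSION"])]
    return row.get(field)


rows = [
    {"QUESTIONNAIRE_VERSION": "v1", "WORK_STATUS_1_M": [1], "WORK_STATUS_2_M": None},
    {"QUESTIONNAIRE_VERSION": "v2", "WORK_STATUS_1_M": None, "WORK_STATUS_2_M": [1, 2]},
    {"QUESTIONNAIRE_VERSION": "v2.1", "WORK_STATUS_1_M": None, "WORK_STATUS_2_M": [2, 6]},
    {"QUESTIONNAIRE_VERSION": "v2.2", "WORK_STATUS_1_M": None, "WORK_STATUS_2_M": [3, 5, 6]},
]

selected = [work_status(row) for row in rows]
print(selected)
```

Answer codes may not mean the same thing across major versions, so check the codings file before combining values from different fields.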
What are the main limitations of the questionnaire?
Detail and depth
The questionnaire asks participants about a wide range of topics relating to their health and lifestyles. However, it’s not possible to ask every question a researcher may need. We aim to add more specialised questions and other data collections in the future, to broaden the range of available data. Participants have provided consent to be re-contacted and invited to take part in further studies.
Accessibility
The questionnaire is currently only available online on our participant website. Participants will need to have access to a suitable device and the internet to complete it. Also, the questionnaire is only available in English. Later this year, we plan to move our questionnaire to a new questionnaire platform. This platform will enable us to overcome some of the barriers that some participants face when accessing and completing the questionnaire.
What metadata is available to help document the Questionnaire release?
We provide the following data files on our Data and cohort page (external link):
Data dictionary – which defines the raw data fields and metadata information, such as labels, descriptions and units of measurements
Participant and Questionnaire coding file – which contains the granular details of categorical or raw coded values for fields contained within the participant and questionnaire data
If using Microsoft Excel to browse these files, for an optimal viewing experience, ensure the encoding settings are set to UTF-8.
In the Questionnaire data section above we also provide:
Human-readable versions of both version 1 and version 2 of the questionnaire - which are text copies of the baseline health questionnaire
Questionnaire logic codebook – which represents dynamic logic implemented for v2.2 of the baseline health questionnaire and can be used in conjunction with v2 of the human-readable questionnaire