Marieke P Hoevenaar-Blom, Juliette Guillemont, Tiia Ngandu, Cathrien R L Beishuizen, Nicola Coley, Eric P Moll van Charante, Sandrine Andrieu, Miia Kivipelto, Hilkka Soininen, Carol Brayne, Yannick Meiller, Edo Richard.
Abstract
Lack of attention to missing data in research may result in biased results, loss of power and reduced generalizability. Registering reasons for missing values at the time of data collection or, in the case of sharing existing data, before making data available to other teams can save time and effort, improve scientific value and help to prevent erroneous assumptions and biased results. To ensure that the encoding of missing data is sufficient to understand why data are missing, it should ideally be context-free. Therefore, 11 context-free codes for missing data were carefully designed based on three completed randomized controlled clinical trials and tested in a new randomized controlled clinical trial by an international team consisting of clinical researchers and epidemiologists with extensive experience in designing and conducting trials, together with an information systems expert. These codes can be divided into missing due to participant and/or participation characteristics (n = 6), missing by design (n = 4), and missing due to a procedural error (n = 1). Broad implementation of context-free missing data encoding may enhance the possibilities of data sharing and pooling, thus allowing more powerful analyses using existing data.
Year: 2017 PMID: 28898245 PMCID: PMC5595279 DOI: 10.1371/journal.pone.0182362
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Categories of missing data.
| Source | Category of missing data | Examples | Abbreviation | Type |
|---|---|---|---|---|
| Participant and/or participation characteristic | Assessed but the participant does not know | Family history could not be answered, because of broken contact | ASSU | MCAR |
| Participant and/or participation characteristic | Assessed but the participant was not able to provide the information | Disability preventing a physical test | ASSD | MNAR |
| Participant and/or participation characteristic | Refusal | Participant does not want to tell his weight because he is embarrassed | ASSR | MNAR |
| Participant and/or participation characteristic | Not applicable | “Do you feel disabled in doing volunteer work” in case the participant is not engaged in any volunteering | NA | MNAR |
| Participant and/or participation characteristic | The visit has been missed | In case of a missed visit, all variables for this visit are missing | MISS | MAR/MNAR |
| Participant and/or participation characteristic | Dropout | In case of dropout, variables subsequent to the date of dropout are missing | DROP | MAR/MNAR |
| By design | Not assessed, variable not in the study | Only applicable for data pooling | NASS | MCAR |
| By design | Not applicable because of conditional variable | Date of birth of siblings if participant does not have siblings | NAC | MNAR |
| By design | Due to random subsampling | Expensive measurement only performed in random subsample. For others these values are missing. | RS | MCAR |
| By design | Answer/value not available yet | Blood sample was collected though not analyzed yet | NAV | MCAR |
| Procedural error | Not assessed/registered, by mistake | Box of questionnaires got lost | ERR | MCAR |
a MCAR: missing completely at random; MAR: missing at random; MNAR: missing not at random. The types indicate the most common scenario for each category (see Discussion section).
b Often, the question of whether a variable is applicable (for instance ‘do you take medication’) is not included. In this case NA has to be filled in manually. However, if this question is asked, so that the conditional variables can be skipped, a digital questionnaire can fill in the NAC category automatically.
c In the ‘ASSD’ and ‘NA’ categories, the fact that the value is missing depends on the reason why it is missing, so this fits the definition of MNAR. However, how to handle this in the analyses should be decided on a case-by-case basis.
d We advise creating study-specific subcategories for these categories (see Discussion section).
e In a digital questionnaire, conditional questions can be skipped automatically so that the participant does not have to deal with questions that do not apply to their situation. To inform the data analyst that the variable was deliberately skipped, the NAC value is filled in automatically.
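To illustrate how the codes in the table and the automatic NAC fill described in the footnotes might be represented in software, the following is a minimal sketch in Python. It is not the authors' implementation: the names `MissingCode`, `CODEBOOK`, `count_by_source` and `apply_skip_logic`, as well as the questionnaire field names in the usage note, are hypothetical and chosen only for illustration. The abbreviations, sources and types are copied from the table.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

# Source labels as used in the table.
PARTICIPANT = "Participant and/or participation characteristic"
BY_DESIGN = "By design"
PROCEDURAL = "Procedural error"


@dataclass(frozen=True)
class MissingCode:
    """One context-free missing-data code (hypothetical container)."""
    abbreviation: str
    source: str
    typical_type: str  # most common scenario: MCAR, MAR/MNAR or MNAR


# The 11 codes from the table, keyed by abbreviation.
CODEBOOK: Dict[str, MissingCode] = {
    code.abbreviation: code
    for code in [
        MissingCode("ASSU", PARTICIPANT, "MCAR"),
        MissingCode("ASSD", PARTICIPANT, "MNAR"),
        MissingCode("ASSR", PARTICIPANT, "MNAR"),
        MissingCode("NA", PARTICIPANT, "MNAR"),
        MissingCode("MISS", PARTICIPANT, "MAR/MNAR"),
        MissingCode("DROP", PARTICIPANT, "MAR/MNAR"),
        MissingCode("NASS", BY_DESIGN, "MCAR"),
        MissingCode("NAC", BY_DESIGN, "MNAR"),
        MissingCode("RS", BY_DESIGN, "MCAR"),
        MissingCode("NAV", BY_DESIGN, "MCAR"),
        MissingCode("ERR", PROCEDURAL, "MCAR"),
    ]
}


def count_by_source(codebook: Dict[str, MissingCode]) -> Dict[str, int]:
    """Count how many codes belong to each source."""
    return dict(Counter(code.source for code in codebook.values()))


def apply_skip_logic(
    record: Dict[str, str],
    gate: str,
    not_applicable_answer: str,
    dependents: Iterable[str],
) -> Dict[str, str]:
    """Return a copy of `record` with NAC filled in for skipped questions.

    If the gate question was answered with `not_applicable_answer`,
    every dependent (conditional) variable is set to "NAC".
    Assumption: the gate question was actually asked; otherwise NA
    would have to be entered manually, as footnote b describes.
    """
    filled = dict(record)
    if filled.get(gate) == not_applicable_answer:
        for name in dependents:
            filled[name] = "NAC"
    return filled
```

For example, `apply_skip_logic({"has_siblings": "no"}, "has_siblings", "no", ["sibling_birth_date"])` adds a `sibling_birth_date` entry set to `"NAC"`, which mirrors the siblings example in the table. Keeping the abbreviation as a plain string rather than a numeric sentinel is one possible design choice; it keeps the reason for missingness readable when data sets are pooled.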