Evangelos Kontopantelis, Rosa Parisi, David A Springate, David Reeves.
Abstract
BACKGROUND: In modern health care systems, the computerization of all aspects of clinical care has led to the development of large data repositories. For example, in the UK, large primary care databases hold millions of electronic medical records, with detailed information on diagnoses, treatments, outcomes and consultations. Careful analyses of these observational datasets of routinely collected data can complement evidence from clinical trials or even answer research questions that cannot be addressed in an experimental setting. However, 'missingness' is a common problem for routinely collected data, especially for biological parameters measured over time. The absence of complete data for the whole of an individual's study period is a potential source of bias, and standard complete-case approaches may lead to biased estimates. In addition, the structure of the data values makes standard cross-sectional multiple-imputation approaches unsuitable. In this paper we propose and evaluate mibmi, a new command for cleaning and imputing longitudinal body mass index data.
Keywords: Body mass index; Cleaning; Longitudinal data; Multiple imputation
Year: 2017 PMID: 28086961 PMCID: PMC5234260 DOI: 10.1186/s13104-016-2365-z
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Algorithm workflow
Mean errors between observed and imputed BMI values, one missing value per individual
| Cases | Method (a) | Obs. | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|---|
| All (b) | Simple ×1 | 10,000 | 1.113 | 1.353 | 0.000 | 26.900 |
| All (b) | mibmi ×100 | 10,000 | 1.120 | 1.351 | 0.000 | 26.888 |
| All (b) | Twofold ×100 | 10,000 | 0.949 | 1.026 | 0.000 | 15.481 |
| Interpolation | Simple ×1 | 6,132 | 0.801 | 0.819 | 0.000 | 11.651 |
| Interpolation | mibmi ×100 | 6,132 | 0.808 | 0.819 | 0.001 | 11.429 |
| Interpolation | Twofold ×100 | 6,132 | 0.804 | 0.810 | 0.000 | 11.180 |
| Extrapolation | Simple ×1 | 3,868 | 1.606 | 1.808 | 0.000 | 26.900 |
| Extrapolation | mibmi ×100 | 3,868 | 1.614 | 1.805 | 0.000 | 26.888 |
| Extrapolation | Twofold ×100 | 3,868 | 1.179 | 1.260 | 0.001 | 15.481 |

(a) Simple refers to a single imputation that ignores variability in the observations (option xsimp); mibmi refers to the command's default multiple imputation approach, with 100 imputations; Twofold refers to the twofold algorithm described in the paper, with 100 imputations.
(b) All refers to both interpolations (imputations between observations) and extrapolations (imputations not between observations).
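The distinction between interpolated and extrapolated values in the footnote above can be illustrated with a minimal Python sketch. It is not part of mibmi (a separate statistical command), and the function name `classify_gaps`, the use of `None` for a missing measurement and the toy BMI values are all assumptions made for illustration only.

```python
from typing import List, Optional


def classify_gaps(series: List[Optional[float]]) -> List[Optional[str]]:
    """Label each missing entry of one individual's series (hypothetical helper).

    A missing value is an 'interpolation' case when observed values exist
    both before and after it, and an 'extrapolation' case otherwise.
    Observed entries are labelled None.
    """
    observed = [i for i, value in enumerate(series) if value is not None]
    labels: List[Optional[str]] = []
    for i, value in enumerate(series):
        if value is not None:
            labels.append(None)  # nothing to impute here
        elif observed and observed[0] < i < observed[-1]:
            labels.append("interpolation")  # lies between two observations
        else:
            labels.append("extrapolation")  # before the first or after the last
    return labels


# Toy series (made-up values): five time points, three of them missing.
bmi = [None, 27.1, None, 28.0, None]
print(classify_gaps(bmi))
```

Under this definition, a series with no observed values at all has only extrapolation cases.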
Mean errors between observed and imputed BMI values, three sequential missing values per individual (interpolation only)
| Cases | Method (a) | Obs. | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|---|
| All (b) | Simple ×1 | 30,000 | 0.980 | 1.002 | 0.000 | 16.017 |
| All (b) | mibmi ×100 | 30,000 | 0.989 | 1.004 | 0.000 | 16.034 |
| All (b) | Twofold ×100 | 30,000 | 1.137 | 1.155 | 0.000 | 18.318 |
| Time point 1 | Simple ×1 | 10,000 | 0.935 | 0.945 | 0.000 | 9.829 |
| Time point 1 | mibmi ×100 | 10,000 | 0.943 | 0.947 | 0.000 | 9.779 |
| Time point 1 | Twofold ×100 | 10,000 | 1.094 | 1.114 | 0.000 | 18.318 |
| Time point 2 | Simple ×1 | 10,000 | 1.059 | 1.068 | 0.000 | 16.017 |
| Time point 2 | mibmi ×100 | 10,000 | 1.069 | 1.071 | 0.000 | 16.034 |
| Time point 2 | Twofold ×100 | 10,000 | 1.234 | 1.231 | 0.000 | 17.126 |
| Time point 3 | Simple ×1 | 10,000 | 0.947 | 0.984 | 0.000 | 10.645 |
| Time point 3 | mibmi ×100 | 10,000 | 0.955 | 0.985 | 0.000 | 10.538 |
| Time point 3 | Twofold ×100 | 10,000 | 1.084 | 1.111 | 0.000 | 13.473 |

(a) Simple refers to a single imputation that ignores variability in the observations (option xsimp); mibmi refers to the command's default multiple imputation approach, with 100 imputations; Twofold refers to the twofold algorithm described in the paper, with 100 imputations.
(b) All refers to aggregates across all three time points.
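The summary columns in the tables above (Obs., Mean, Std. Dev., Min, Max) could be produced along the following lines. This is a sketch under stated assumptions, not the authors' procedure: it assumes each error is the absolute difference between an observed BMI value that was set to missing and its imputed replacement (consistent with the reported minimum of 0.000), and the function name `summarise_errors` and the toy values are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def summarise_errors(pairs: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarise absolute errors for (observed, imputed) pairs.

    Assumption: error = |observed - imputed|. How the source aggregates
    errors over multiple imputations is not stated in this section.
    """
    errors = [abs(observed - imputed) for observed, imputed in pairs]
    return {
        "obs": len(errors),
        "mean": statistics.mean(errors),
        # The sample standard deviation needs at least two errors.
        "sd": statistics.stdev(errors) if len(errors) > 1 else 0.0,
        "min": min(errors),
        "max": max(errors),
    }


# Toy (observed, imputed) pairs with made-up values, not study data.
toy_pairs = [(25.0, 26.0), (30.0, 29.5), (22.0, 22.0)]
print(summarise_errors(toy_pairs))
```

Applying the same function separately to the pairs from each time point, or to interpolation and extrapolation cases, would give per-group rows like those shown in the tables.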
Fig. 2 Predictions example for a single patient