Katya L Masconi, Tandi E Matsha, Justin B Echouffo-Tcheugui, Rajiv T Erasmus, Andre P Kengne.
Abstract
Missing values are common in health research, and omitting participants with missing data often leads to loss of statistical power, biased estimates and, consequently, inaccurate inferences. We critically reviewed the challenges posed by missing data in medical research and the approaches to address them. To illustrate these issues, we conducted a systematic review of the reporting of missing data and of imputation methods (prediction of missing values through relationships within and between variables) in risk prediction studies of undiagnosed diabetes. Prevalent diabetes risk models were selected based on a recent comprehensive systematic review, supplemented by an updated search of English-language studies published between 1997 and 2014. Reporting of missing data has been limited in studies of prevalent diabetes prediction. Of the 48 articles identified, 62.5% (n = 30) did not report any information on missing data or handling techniques. In 21 (43.8%) studies, researchers opted out of imputation, performing case-wise deletion of participants missing any predictor values. Although imputation methods are encouraged to handle missing data and ensure the accuracy of inferences, this has seldom been the case in studies of diabetes risk prediction. Hence, we elaborated on the various types and patterns of missing data, the limitations of case-wise deletion, and state-of-the-art methods of imputation and their challenges. This review highlights investigators’ inexperience with, or disregard for, the effect of missing data in risk prediction research. Formal guidelines may enhance the reporting and appropriate handling of missing data in scientific journals.
Keywords: Diabetes mellitus; Guidelines; Modeling; Patient Stratification; Patterns; Predictive; Preventive and Personalized Medicine; Risk; Screening
Year: 2015 PMID: 25829972 PMCID: PMC4380106 DOI: 10.1186/s13167-015-0028-0
Source DB: PubMed Journal: EPMA J ISSN: 1878-5077 Impact factor: 6.543
Figure 1. Workflow summarizing the selection of papers. Keywords: prevalent, diabetes, risk, prediction.
Characteristics of 48 included studies of undiagnosed diabetes risk prediction models
| Study | Year | Purpose | Country (income level) | Ethnicity | Data collection | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adhikari et al. [ | 2010 | Validate | India (L/M) | / | Current | 551 | >20 | X | X | ||||
| Akyil et al. [ | 2014 | Validate | Turkey (L/M) | / | Current | 702 | / | X | X | ||||
| Al Khalaf et al. [ | 2010 | Develop | Kuwait (L/M) | Caucasian | Current | X | 562 | >20 | X | X | |||
| Al-Lawati et al. [ | 2007 | Develop | Oman (H) | Caucasian | Existing | 4,881 | >20 | X | X | ||||
| Baan et al. [ | 1999 | Develop | Netherlands (H) | / | Existing | X | 1,016 | 55–75 | X | X | |||
| Bang et al. [ | 2009 | Develop | USA (H) | / | Existing | 5,258 | >20 | X | X | X | |||
| Bergmann et al. [ | 2007 | Validate | Germany (H) | / | Current | 526 | 41–79 | X | X | ||||
| Bindraban et al. [ | 2008 | Develop | Netherlands (H) | Asian, Black, Caucasian | Existing | 1,434 | 35–60 | X | X | ||||
| Chaturvedi et al. [ | 2008 | Develop | India (L/M) | / | Existing | 4,044 | 35–64 | X | X | ||||
| de Leon et al. [ | 2008 | Develop | Canary Islands (H) | Caucasian | Current | 6,237 | 18–75 | X | X | ||||
| de Sousa et al. [ | 2009 | Develop | Brazil (L/M) | Multi-ethnic | Existing | X | 1,224 | >35 | X | X | |||
| Franciosi et al. [ | 2005 | Validate | Italy (H) | / | Existing | X | 1,377 | 55–75 | X | X | |||
| Gao et al. [ | 2010 | Validate | China (L/M) | Asian | Current | 1,986 | 20–74 | X | X | ||||
| Ginde et al. [ | 2007 | Validate | USA (H) | Caucasian, African-American, Hispanic | Current | 604 | / | X | X | ||||
| Glümer et al. [ | 2004 | Develop | Denmark (H) | / | Existing | 6,784 | 30–60 | X | X | ||||
| Glümer et al. [ | 2005 | Validate | Australia/Denmark (H) | / | Existing | 7,079/6,270 | 30–60 | X | X | ||||
| Glümer et al. [ | 2006 | Validate | Global | Multi-ethnic | Existing | 29,758 | / | X | X | ||||
| Gray et al. [ | 2010 | Develop | UK (H) | Caucasian, Asian | Existing | 6,186 | 40–75 | X | X | ||||
| Gray et al. [ | 2013 | Develop | Portugal (H) | / | Existing | 3,435 | 18–94 | X | X | ||||
| Griffin et al. [ | 2000 | Develop | UK (H) | Caucasian | Existing | 1,077 | 40–64 | X | X | ||||
| Hanif et al. [ | 2008 | Develop | UK (H) | Asian | Current | 435 | 20–75 | X | X | ||||
| Heianza et al. [ | 2013 | Develop | Japan (H) | Asian | Existing | 7,477 | 18–88 | X | X | ||||
| Heikes et al. [ | 2008 | Develop | USA (H) | Representative of USA population | Existing | 7,029 | >20 | X | X | ||||
| Heldgaard & Griffin [ | 2006 | Develop | Denmark (H) | / | Current | X | 1,355 | 20–69 | X | X | |||
| Keesukphan et al. [ | 2007 | Develop | Thailand (L/M) | / | Existing | 429 | 18–81 | X | X | ||||
| Ko et al. [ | 2010 | Develop | China (L/M) | Asian | Existing | 7,695 | X | X | |||||
| Ku & Kegels [ | 2013 | Validate | Philippines (L/M) | / | Current | 1,789 | X | X | |||||
| Lee et al. [ | 2012 | Develop | Korea (L/M) | / | Existing | 9,602 | >20 | X | X | ||||
| Li et al. [ | 2009 | Develop | Germany (H) | / | Current | 921 | 14–93 | X | X | ||||
| Lin et al. [ | 2009 | Validate | Taiwan (H) | Asian | Current | 2,759 | >18 | X | X | ||||
| Lindstrom et al. [ | 2003 | Develop | Finland (H) | / | Existing | X | 4,435 | 35–64 | X | X | |||
| Liu et al. [ | 2011 | Develop | China (L/M) | / | Existing | 1,851 | 40–90 | X | X | ||||
| Mohan et al. [ | 2005 | Validate | India (L/M) | Asian | Existing | 2,350 | >35 | X | X | ||||
| Park et al. [ | 2002 | Validate | UK (H) | Caucasian | Existing | X | 6,567 | 39–78 | X | X | |||
| Rahman et al. [ | 2008 | Validate | UK (H) | / | Existing | 25,639 | 40–79 | X | X | ||||
| Ramachandran et al. [ | 2005 | Develop | India (L/M) | Asian | Existing | 10,003 | >20 | X | X | ||||
| Rathmann et al. [ | 2005 | Validate | Germany (H) | Caucasian | Existing | 1,353 | 55–74 | X | X | ||||
| Robinson et al. [ | 2011 | Develop | Canada (H) | Caucasian, Aboriginal, Asian, Black, Hispanic | Current | 6,475 | 40–74 | X | X | X | |||
| Ruige et al. [ | 2001 | Validate | USA (H) | Hispanics, Caucasian, Black, Native American | Current | 1,471 | >20 | X | X | ||||
| Spijkerman et al. [ | 1997 | Develop | Netherlands (H) | Caucasian | Existing | X | 2,364 | 50–74 | X | X | |||
| Ta et al. [ | 2005 | Validate | Finland (H) | / | Current supplemented with existing | X | 2,966 | 45–74 | X | X | |||
| Tankove et al. [ | 2004 | Validate | UK (H) | Black, Asian | Existing | 803 | 40–75 | X | X | ||||
| Winkler et al. [ | 2010 | Validate | Vietnam (L/M) | / | Current | 721 | 30–70 | X | X | ||||
| Witte et al. [ | 2011 | Validate | Bulgaria (L/M) | / | Current | 2,169 | X | X | |||||
| Zhang et al. [ | 2012 | Validate | Hungary (L/M) | / | Current | 68,476 | >18 | X | X | ||||
| Zhou et al. [ | 2010 | Validate | UK (H) | Caucasian | Existing | 6,990 | 35–55 | X | X | ||||
| Zhang et al. [ | 2014 | Validate | USA (H) | Caucasian, Black | Existing | X | 20,633 | >20 | X | X | |||
| Zhou et al. [ | 2013 | Develop | China (L/M) | / | Existing | 41,809 | 20–74 | X | X | ||||
Figure 2. Graphical representation of handling of missing data from the 48 selected studies. MI, multiple imputation; SI, single imputation.
Details of imputation options
| Method | Description | Implementation in R |
|---|---|---|
| Single imputation methods | ||
| Simple imputation | For a predictor (X) that is unrelated to all other X’s, substitution replaces each missing continuous value with the mean (or median) of the participants who have a valid value, or with the mode for categorical predictors [ | Mean substitution is easily implemented with the package ‘ |
| | Simple imputation reduces variability and attenuates correlation estimates because it ignores relationships between variables, and it assumes that data are missing completely at random (MCAR). Regression coefficients are biased towards zero since the outcome (Y) is not considered [ | |
| Conditional mean imputation | Regression imputation assumes strong relationships between the X to be imputed and the independent X’s used in the univariable or multivariable regression formula [ | Conditional mean imputation can be implemented in R through the creation of a regression model and the subsequent inbuilt ‘ |
| Stochastic regression imputation | An alternative to conditional mean imputation, stochastic regression imputation adds a random element to the predicted values, reflecting the uncertainty of imputed values [ | This can be implemented with the ‘ |
| Hotdecking | Hotdecking replaces an individual’s missing value with a value drawn at random from the ‘deck’, a pool of individuals who are matched to that individual on the predictors [ | The command ‘ |
| Multiple imputation methods | ||
| Markov chain Monte Carlo (MCMC) | Multivariate normal imputation assumes a multivariate normal distribution, and the MCMC algorithm is used to obtain imputed values and allow for uncertainty in the estimated model predictors [ | The command ‘ |
| Maximum likelihood | The expectation-maximization (EM) algorithm, also called joint modeling, assumes a multivariate distribution. A set of parameter values that produces the maximum likelihood is first identified from the conditional distribution, that is, the values that would most likely have resulted in the observed data [ | The package ‘ |
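
The single imputation rows of the table above can be made concrete with a small sketch. It is written in Python using only the standard library, not with the R packages the table refers to, and it is an illustration under stated assumptions rather than the authors' procedure. The function names (`mean_impute`, `fit_line`, `regression_impute`) and the `age` and `bmi` values are hypothetical. A single, fully observed predictor is assumed for the regression, and missing values are coded as `None`.

```python
import random
import statistics


def mean_impute(values):
    """Simple imputation: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    centre = statistics.fmean(observed)
    return [centre if v is None else v for v in values]


def fit_line(x, y):
    """Least-squares fit of y = a + b * x on the pairs where y is observed.

    Returns the intercept, the slope and the residual standard deviation
    (n - 2 degrees of freedom), so at least three complete pairs are needed.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    b = sum((xi - mx) * (yi - my) for xi, yi in pairs) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in pairs)
    sd = (rss / (len(pairs) - 2)) ** 0.5
    return a, b, sd


def regression_impute(x, y, stochastic=False, seed=0):
    """Conditional mean imputation of y from x.

    With stochastic=True, a normal draw scaled by the residual standard
    deviation is added to each prediction (stochastic regression imputation).
    """
    a, b, sd = fit_line(x, y)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = a + b * xi
            if stochastic:
                yi += rng.gauss(0.0, sd)
        filled.append(yi)
    return filled


# Hypothetical data: age is fully observed, bmi has two missing values.
age = [30, 40, 50, 60, 70, 45, 55]
bmi = [22.0, 24.5, None, 28.0, 30.5, None, 27.5]

bmi_mean = mean_impute(bmi)
bmi_conditional = regression_impute(age, bmi)
bmi_stochastic = regression_impute(age, bmi, stochastic=True, seed=1)

observed_bmi = [v for v in bmi if v is not None]
print("variance, observed only:", statistics.variance(observed_bmi))
print("variance, mean imputed :", statistics.variance(bmi_mean))
print("conditional mean fill  :", bmi_conditional)
print("stochastic fill        :", bmi_stochastic)
```

Comparing the two printed variances shows the loss of variability that the table attributes to simple imputation. The stochastic variant adds noise back to the predictions, but it still yields one completed data set, so it does not by itself carry the between-imputation uncertainty that the multiple imputation methods are meant to capture.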