Yiran Dong, Chao-Ying Joanne Peng.
Abstract
The impact of missing data on quantitative research can be serious, leading to biased parameter estimates, loss of information, decreased statistical power, increased standard errors, and weakened generalizability of findings. In this paper, we discussed and demonstrated three principled missing data methods, multiple imputation (MI), full information maximum likelihood (FIML), and the expectation-maximization (EM) algorithm, by applying them to a real-world data set. Results were contrasted with those obtained from the complete data set and from listwise deletion (LD). The relative merits of each method are noted, along with the features they share. The paper concludes by emphasizing the importance of statistical assumptions and by offering recommendations for researchers. The quality of research will be enhanced if (a) researchers explicitly acknowledge missing data problems and the conditions under which they occurred, (b) principled methods are employed to handle missing data, and (c) the appropriate treatment of missing data is incorporated into the review standards for manuscripts submitted for publication.
Keywords: EM; FIML; Listwise deletion; MAR; MCAR; MI; MNAR; Missing data
Year: 2013 PMID: 23853744 PMCID: PMC3701793 DOI: 10.1186/2193-1801-2-222
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
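The abstract lists the EM algorithm among the three principled methods. As a rough illustration of what EM does with incomplete data, here is a minimal NumPy sketch that estimates the mean vector and covariance matrix of a multivariate normal sample whose missing values are coded as `np.nan`. It assumes multivariate normality and missingness at random (MAR); the function name `em_mvn`, its defaults, and its starting values are illustrative choices, and this is not the software the authors used.

```python
import numpy as np


def em_mvn(X, max_iter=500, tol=1e-8):
    """Hypothetical helper: EM estimates of the mean and covariance of a
    multivariate normal sample with missing entries coded as np.nan.

    Assumes multivariate normal data that are missing at random (MAR).
    Returns maximum likelihood estimates (the covariance divides by n).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Starting values: observed-data means and variances, zero covariances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(max_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m = miss[i]
            o = ~m
            x_hat = X[i].copy()
            cond_cov = np.zeros((p, p))
            if m.any():
                if o.any():
                    # E-step: predict the missing block from the observed block.
                    s_oo = sigma[np.ix_(o, o)]
                    s_mo = sigma[np.ix_(m, o)]
                    coef = np.linalg.solve(s_oo, s_mo.T).T  # S_mo @ inv(S_oo)
                    x_hat[m] = mu[m] + coef @ (X[i, o] - mu[o])
                    # Conditional covariance of the missing block.
                    cond_cov[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ s_mo.T
                else:
                    # Nothing observed in this row: use the current moments.
                    x_hat[:] = mu
                    cond_cov = sigma.copy()
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cond_cov
        # M-step: update the moments from the expected sufficient statistics.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma
```

For a linear regression, slopes can then be computed from the estimated covariance matrix (Σ_xx⁻¹ Σ_xy), which is one common way EM-based moments feed into a regression analysis.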
Probability of missingness for ESTEEM and LBEHRISK at three overall missing rates
| Overall missing rate | Missing variable | FAMSTR: single family | FAMSTR: intact/step family | Missing variable | EMORISK ≤ Q1 | EMORISK between Q1 and Q3 | EMORISK ≥ Q3 |
|---|---|---|---|---|---|---|---|
| 20% | ESTEEM | .20 | .02 | LBEHRISK | .00 | .10 | .30 |
| 40% | ESTEEM | .40 | .05 | LBEHRISK | .10 | .20 | .60 |
| 60% | ESTEEM | .80 | .10 | LBEHRISK | .20 | .40 | .80 |
Note. Q1 = first quartile, Q3 = third quartile.
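The table specifies a missing at random (MAR) mechanism: the chance that ESTEEM is missing depends on FAMSTR, and the chance that LBEHRISK is missing depends on the quartile band of EMORISK. A minimal sketch of imposing that mechanism on complete data follows. It assumes NumPy arrays as input, a boolean `single_family` indicator (a hypothetical coding of FAMSTR), and NumPy's default quantile rule for Q1 and Q3; the function `impose_mar` is hypothetical and is not the authors' code.

```python
import numpy as np

# Probabilities from the table, keyed by nominal overall missing rate (%).
# ESTEEM: (single family, intact/step family)
ESTEEM_P = {20: (0.20, 0.02), 40: (0.40, 0.05), 60: (0.80, 0.10)}
# LBEHRISK: (EMORISK <= Q1, between Q1 and Q3, EMORISK >= Q3)
LBEHRISK_P = {20: (0.00, 0.10, 0.30), 40: (0.10, 0.20, 0.60), 60: (0.20, 0.40, 0.80)}


def impose_mar(esteem, lbehrisk, single_family, emorisk, rate_pct, rng):
    """Hypothetical helper: return copies of ESTEEM and LBEHRISK with values
    set to np.nan according to the MAR probabilities in the table."""
    single_family = np.asarray(single_family, dtype=bool)
    emorisk = np.asarray(emorisk, dtype=float)

    # ESTEEM: missingness probability depends on family structure.
    p_single, p_intact = ESTEEM_P[rate_pct]
    p_esteem = np.where(single_family, p_single, p_intact)

    # LBEHRISK: missingness probability depends on the EMORISK quartile band.
    q1, q3 = np.quantile(emorisk, [0.25, 0.75])
    p_low, p_mid, p_high = LBEHRISK_P[rate_pct]
    p_lbeh = np.where(emorisk <= q1, p_low, np.where(emorisk >= q3, p_high, p_mid))

    esteem = np.asarray(esteem, dtype=float).copy()
    lbehrisk = np.asarray(lbehrisk, dtype=float).copy()
    # A uniform draw below p marks the value as missing with probability p.
    esteem[rng.random(esteem.shape) < p_esteem] = np.nan
    lbehrisk[rng.random(lbehrisk.shape) < p_lbeh] = np.nan
    return esteem, lbehrisk
```

Because the probabilities are conditional, the realized overall missing rate depends on how many cases fall into each FAMSTR group and EMORISK band, so it will generally differ somewhat from the nominal 20%, 40%, or 60%.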
Regression Coefficients from Four Missing Data Methods
| Predictor | Complete data | LD | MI | FIML | EM |
|---|---|---|---|---|---|
| 20% missing rateᵃ | | | | | |
| GENDER | -0.434*** | -0.412*** | -0.414*** | -0.421*** | -0.421*** |
| | (0.082) | (0.091) | (0.086) | (0.087) | (0.083) |
| DROPOUT | 1.172*** | 1.237*** | 1.266*** | 1.263*** | 1.263*** |
| | (0.125) | (0.142) | (0.132) | (0.132) | (0.126) |
| ESTEEM | -0.191*** | -0.213*** | -0.215*** | -0.212*** | -0.212*** |
| | (0.041) | (0.046) | (0.044) | (0.044) | (0.041) |
| FAMSTR | 0.367*** | 0.377*** | 0.365*** | 0.366*** | 0.366*** |
| | (0.087) | (0.101) | (0.096) | (0.092) | (0.088) |
| Actual N | 432 | 349 | 432 | N/A | 414 |
| 60% missing rateᵇ | | | | | |
| GENDER | -0.434*** | -0.390** | -0.414*** | -0.413*** | -0.413*** |
| | (0.082) | (0.131) | (0.100) | (0.104) | (0.086) |
| DROPOUT | 1.172*** | 1.557*** | 1.559*** | 1.532*** | 1.562*** |
| | (0.125) | (0.209) | (0.170) | (0.158) | (0.131) |
| ESTEEM | -0.191*** | -0.193** | -0.217*** | -0.214** | -0.215*** |
| | (0.041) | (0.065) | (0.063) | (0.060) | (0.043) |
| FAMSTR | 0.367*** | 0.479* | 0.302* | 0.300** | 0.300** |
| | (0.087) | (0.192) | (0.116) | (0.111) | (0.091) |
| Actual N | 432 | 171 | 432 | N/A | 367 |
Note. Standard error estimates in parentheses. MI results were based on 60 imputations. FIML results were obtained with EMORISK as an auxiliary variable in the model.
ᵃ The actual overall missing rate was 19.21%. ᵇ The actual overall missing rate was 60.42%.
Significance: * p < .05. ** p < .01. *** p < .001.
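The table note states that the MI results were based on 60 imputations. Estimates from multiply imputed data sets are conventionally combined with Rubin's rules: average the point estimates, then add the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal sketch for one coefficient at a time follows; `pool_rubin` is a hypothetical helper, and it omits the degrees-of-freedom adjustments that statistical packages typically report.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Hypothetical helper: pool m completed-data analyses with Rubin's rules.

    estimates: the m point estimates of one coefficient, one per imputation.
    variances: the m squared standard errors from the same analyses.
    Returns the pooled estimate and its total standard error.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()            # pooled point estimate
    w = u.mean()                # within-imputation variance
    b = q.var(ddof=1)           # between-imputation variance
    t = w + (1 + 1 / m) * b     # total variance
    return q_bar, np.sqrt(t)
```

With m = 60, the inflation factor 1 + 1/m is about 1.017, so the between-imputation variance enters the total variance at nearly full weight.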
Percentage of Bias in Estimates
| Predictor | LD | MI | FIML | EM |
|---|---|---|---|---|
| 20% missing rateᵃ | | | | |
| GENDER | 5.07 | 4.61 | 3.00 | 3.00 |
| DROPOUT | 5.55 | 8.02 | 7.76 | 7.76 |
| ESTEEM | -11.52 | -12.57 | -10.99 | -10.99 |
| FAMSTR | 2.72 | -0.54 | -0.27 | -0.27 |
| 60% missing rateᵇ | | | | |
| GENDER | 10.14 | 4.61 | 4.84 | 4.84 |
| DROPOUT | 32.85 | 33.02 | 30.72 | 33.28 |
| ESTEEM | -1.05 | -13.61 | -12.04 | -12.57 |
| FAMSTR | 30.52 | -17.71 | -18.26 | -18.26 |
Note. Percentage of bias was calculated as the difference between the incomplete-data estimate and the complete-data estimate, divided by the absolute value of the complete-data estimate and multiplied by 100.
ᵃ The actual overall missing rate was 19.21%. ᵇ The actual overall missing rate was 60.42%.
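The bias percentages can be checked against the regression coefficients table. The short Python sketch below transcribes those coefficients and reproduces every published value to two decimals when the denominator is the absolute value of the complete-data estimate. It also shows that the actual missing rates in the footnotes match the share of cases dropped by listwise deletion (1 − 349/432 and 1 − 171/432); that correspondence is an inference from the numbers, not something the tables state. The helper names are hypothetical.

```python
# Coefficients transcribed from the regression coefficients table.
# Order within each list: GENDER, DROPOUT, ESTEEM, FAMSTR.
PREDICTORS = ["GENDER", "DROPOUT", "ESTEEM", "FAMSTR"]
COMPLETE = [-0.434, 1.172, -0.191, 0.367]
ESTIMATES = {
    20: {
        "LD":   [-0.412, 1.237, -0.213, 0.377],
        "MI":   [-0.414, 1.266, -0.215, 0.365],
        "FIML": [-0.421, 1.263, -0.212, 0.366],
        "EM":   [-0.421, 1.263, -0.212, 0.366],
    },
    60: {
        "LD":   [-0.390, 1.557, -0.193, 0.479],
        "MI":   [-0.414, 1.559, -0.217, 0.302],
        "FIML": [-0.413, 1.532, -0.214, 0.300],
        "EM":   [-0.413, 1.562, -0.215, 0.300],
    },
}


def percent_bias(estimate, complete):
    """Hypothetical helper: percentage of bias relative to the complete-data
    estimate. Dividing by |complete| means a positive value always indicates
    that the incomplete-data estimate is larger than the complete-data one."""
    return 100 * (estimate - complete) / abs(complete)


def actual_missing_rate(n_listwise, n_total):
    """Hypothetical helper: percentage of cases dropped by listwise deletion."""
    return 100 * (1 - n_listwise / n_total)


for rate, methods in ESTIMATES.items():
    for method, values in methods.items():
        biases = [round(percent_bias(e, c), 2) for e, c in zip(values, COMPLETE)]
        print(rate, method, dict(zip(PREDICTORS, biases)))

print(round(actual_missing_rate(349, 432), 2))  # 19.21
print(round(actual_missing_rate(171, 432), 2))  # 60.42
```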