Yang Liu, Anindya De.
Abstract
Missing data commonly occur in large epidemiologic studies. Ignoring incompleteness or handling the data inappropriately may bias study results, reduce power and efficiency, and alter important risk/benefit relationships. Standard ways of dealing with missing values, such as complete case analysis (CCA), are generally inappropriate due to the loss of precision and risk of bias. Multiple imputation by fully conditional specification (FCS MI) is a powerful and statistically valid method for creating imputations in large data sets which include both categorical and continuous variables. It specifies the multivariate imputation model on a variable-by-variable basis and offers a principled yet flexible method of addressing missing data, which is particularly useful for large data sets with complex data structures. However, FCS MI is still rarely used in epidemiology, and few practical resources exist to guide researchers in the implementation of this technique. We demonstrate the application of FCS MI in support of a large epidemiologic study evaluating national blood utilization patterns in a sub-Saharan African country. A number of practical tips and guidelines for implementing FCS MI based on this experience are described.
Keywords: Missing data; blood utilization; complete case analysis; fully conditional specification; multiple imputation
Year: 2015 PMID: 27429686 PMCID: PMC4945131 DOI: 10.6000/1929-6029.2015.04.03.7
Source DB: PubMed Journal: Int J Stat Med Res ISSN: 1929-6029
Frequency Analysis of Missing Variables in Original NAMBTS Data During the Study Period: X Observed; • Missing
| Diagnosis | Age | Gender | 2007/2008 n (events) | 2007/2008 % | 2008/2009 n (events) | 2008/2009 % | 2009/2010 n (events) | 2009/2010 % | 2010/2011 n (events) | 2010/2011 % | Grand Total n (events) | Grand Total % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X | X | X | 5825 | 66.19 | 7060 | 70.86 | 7418 | 70.53 | 6286 | 62.66 | 26589 | 67.63 |
| X | X | • | 67 | 0.76 | 37 | 0.37 | 6 | 0.06 | 17 | 0.17 | 127 | 0.32 |
| X | • | X | 818 | 9.30 | 1031 | 10.35 | 875 | 8.32 | 586 | 5.84 | 3310 | 8.42 |
| X | • | • | 61 | 0.69 | 42 | 0.42 | 16 | 0.15 | 38 | 0.38 | 157 | 0.40 |
| • | X | X | 1080 | 12.27 | 819 | 8.22 | 1048 | 9.96 | 1875 | 18.69 | 4822 | 12.27 |
| • | X | • | 19 | 0.22 | 7 | 0.07 | 9 | 0.09 | 15 | 0.15 | 50 | 0.13 |
| • | • | X | 226 | 2.57 | 169 | 1.70 | 255 | 2.42 | 437 | 4.36 | 1087 | 2.76 |
| • | • | • | 704 | 8.00 | 798 | 8.01 | 891 | 8.47 | 778 | 7.76 | 3171 | 8.07 |
| Grand Total Events | | | 8800 | 100% | 9963 | 100% | 10518 | 100% | 10032 | 100% | 39313 | 100% |
Note: A transfusion event is defined as any patient record in which at least 1 type of blood component is ordered for an individual patient. Total numbers of each type of blood component unit associated with each transfusion event are established and stratified by component type and by year [23].
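A frequency table of this kind can be produced by tallying, for each record, which variables are observed and which are missing. The following is a minimal sketch using only the Python standard library; the function names and the toy records are hypothetical and are not taken from the study.

```python
from collections import Counter

def missing_pattern_counts(records, variables):
    """Count records by observed/missing pattern.

    Each record is a dict; a value of None (or an absent key) marks a
    missing entry. Keys of the result are tuples of "X" (observed) or
    "•" (missing), in the order given by `variables`.
    """
    patterns = Counter()
    for rec in records:
        key = tuple("X" if rec.get(v) is not None else "•" for v in variables)
        patterns[key] += 1
    return patterns

def pattern_percentages(patterns):
    """Convert pattern counts to percentages of all records (2 decimals)."""
    total = sum(patterns.values())
    return {key: round(100.0 * n / total, 2) for key, n in patterns.items()}

# Hypothetical toy records, for illustration only.
records = [
    {"diagnosis": "D50", "age": 34, "gender": "F"},
    {"diagnosis": None, "age": 61, "gender": "M"},
    {"diagnosis": "O72", "age": None, "gender": "F"},
    {"diagnosis": None, "age": None, "gender": None},
]
counts = missing_pattern_counts(records, ["diagnosis", "age", "gender"])
shares = pattern_percentages(counts)
```

Applied to the grand-total counts in the table, `pattern_percentages` gives the same percentages as the last column, for example 26589 of 39313 events with all three variables observed.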
Figure 1. Imputation diagnostics (graphic and numeric).
Figure 2. Comparison of imputation models. The blue line represents the Age distribution of data imputed by an appropriate REGPMM model (Imputed_Good); the red line represents the Age distribution of data imputed by an inappropriate REG model (Imputation_Bad); the green line represents the Age distribution of the observed data set.
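The contrast in Figure 2 reflects a general property of predictive mean matching: each imputed value is borrowed from an observed donor, so imputations cannot fall outside the observed range, whereas a plain regression imputation can. The sketch below is a deliberately simplified, single-predictor illustration of that idea in standard-library Python. It is an assumption-laden toy, not the REGPMM procedure used in the study: it omits the random draw of regression parameters that a full implementation includes, and all names and data are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def pmm_impute(x, y, k=3, rng=None):
    """Fill None entries of y by (simplified) predictive mean matching.

    For each missing y, find the k observed cases whose predicted values
    are closest to the prediction for the missing case, then copy the
    observed y of one randomly chosen donor.
    """
    rng = rng or random.Random(0)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([p[0] for p in observed], [p[1] for p in observed])
    donors = [(a + b * xi, yi) for xi, yi in observed]  # (predicted, observed)
    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        pred = a + b * xi
        nearest = sorted(donors, key=lambda d: abs(d[0] - pred))[:k]
        completed.append(rng.choice(nearest)[1])
    return completed

# Hypothetical data: x is a fully observed predictor, y has gaps.
x = [1, 2, 3, 4, 5, 6]
y = [10, None, 30, 41, None, 62]
y_completed = pmm_impute(x, y, k=2)
```

Because every filled-in value is one of the observed values, a variable such as Age can never be imputed as negative or implausibly large under this scheme.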
Comparison of Total Blood Component Unit Counts by FCS MI and CCA
| Component Type | FCS MI n (units) | FCS MI % | FCS MI 95% CI | CCA n (units) | CCA % |
|---|---|---|---|---|---|
| RBC | 78,660 | 86.1 | (85.8, 86.3) | 52,284 | 86.2 |
| FFP | 9,751 | 10.7 | (10.5, 10.9) | 6,082 | 10.0 |
| Platelets | 2,978 | 3.3 | (3.1, 3.4) | 2,266 | 3.7 |
| Total Units | 91,389 | 100.0 | – | 60,632 | 100.0 |
Mean value from 20 imputed data sets.
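Averaging an estimate over the imputed data sets is the first step of the standard pooling procedure for multiple imputation (Rubin's rules), which also combines within-imputation and between-imputation variance. The record does not state exactly how the intervals above were computed, so the following standard-library sketch is an assumption about the general approach rather than a description of the authors' code; the function name and input values are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, total

# Hypothetical estimates from m = 3 imputed data sets.
q_bar, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The `(1 + 1 / m)` factor inflates the between-imputation component to account for using a finite number of imputations.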
RBC Utilization by Diagnosis (ICD Category), Age and Gender: A. FCS MI; B. CCA
A. FCS MI

| ICD category | Male 0–14 years, % | Male 0–14 years, 95% CI | Male 15–49 years, % | Male 15–49 years, 95% CI | Male 50+ years, % | Male 50+ years, 95% CI | Female 0–14 years, % | Female 0–14 years, 95% CI | Female 15–49 years, % | Female 15–49 years, 95% CI | Female 50+ years, % | Female 50+ years, 95% CI | Total n (units) | Total % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D. Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (D50-D89) | 2.3 | (2.1,2.4) | 9.2 | (8.9,9.4) | 4.5 | (4.3,4.7) | 1.9 | (1.7,2) | 16.3 | (15.9,16.7) | 4.8 | (4.6,5.1) | 30,616 | 38.9 |
| A/B. Infectious disease (A00-B99) | 0.6 | (0.5,0.6) | 4.2 | (4,4.4) | 1.6 | (1.4,1.7) | 0.5 | (0.5,0.6) | 6.4 | (6.1,6.7) | 1.5 | (1.3,1.7) | 11,648 | 14.8 |
| O. Pregnancy (O00-O99) | 0 | (0,0) | 0 | (0,0) | 0 | (0,0) | 0.3 | (0.2,0.4) | 9.9 | (9.6,10.2) | 0.8 | (0.5,1.2) | 8,702 | 11.1 |
| K. Gastrointestinal (K20-K93) | 0.1 | (0,0.1) | 1.8 | (1.7,1.9) | 1.6 | (1.5,1.8) | 0.1 | (0,0.1) | 1.2 | (1.1,1.4) | 1.3 | (1.1,1.4) | 4,796 | 6.1 |
| All others | 1.8 | (1.7,1.9) | 6.9 | (6.6,7.2) | 4.8 | (4.6,5.1) | 1.3 | (1.2,1.4) | 9.2 | (8.9,9.5) | 5.1 | (4.7,5.6) | 22,898 | 29.1 |
| Total | | | | | | | | | | | | | 78,660 | 100.0 |

B. CCA

| ICD category | Male 0–14 years, % | Male 15–49 years, % | Male 50+ years, % | Female 0–14 years, % | Female 15–49 years, % | Female 50+ years, % | Total n (units) | Total % |
|---|---|---|---|---|---|---|---|---|
| D. Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (D50-D89) | 2.3 | 9.0 | 4.3 | 1.7 | 16.5 | 4.4 | 19,984 | 38.2 |
| A/B. Infectious disease (A00-B99) | 0.5 | 4.6 | 1.5 | 0.4 | 6.7 | 1.2 | 7,790 | 14.8 |
| O. Pregnancy (O00-O99) | 0 | 0 | 0 | 0.0 | 12.0 | 0.1 | 6,348 | 12.1 |
| K. Gastrointestinal (K20-K93) | 0.1 | 2.0 | 1.7 | 0.1 | 1.1 | 1.2 | 3,205 | 6.1 |
| All others | 2.0 | 7.2 | 5.1 | 1.3 | 9.0 | 4.0 | 14,957 | 28.6 |
| Total | | | | | | | 52,284 | 100.0 |