Nawar Shara, Sayf A Yassin, Eduardas Valaitis, Hong Wang, Barbara V Howard, Wenyu Wang, Elisa T Lee, Jason G Umans.
Abstract
Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989-1991), 2 (1993-1995), and 3 (1998-1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one not missing at random (NMAR) model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.
Year: 2015 PMID: 26414328 PMCID: PMC4587557 DOI: 10.1371/journal.pone.0138923
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
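Three of the simpler methods named in the abstract (listwise deletion, mean of serial measures, and adjacent value) can be illustrated with a minimal sketch. The abstract does not spell out the exact rules, so the definitions below are assumptions: each participant has one Scr value per exam, the "mean of serial measures" is taken over that participant's own observed exams, and the "adjacent value" is the nearest observed exam, with ties going to the earlier one. The function names and sample values are hypothetical.

```python
from statistics import mean
from typing import List, Optional

# One participant's Scr at Exams 1, 2, 3; None marks a missing value.
Row = List[Optional[float]]


def listwise_deletion(rows: List[Row]) -> List[Row]:
    """Keep only participants with no missing exam value."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_of_serial_measures(row: Row) -> Row:
    """Fill each gap with the mean of the participant's own observed values (assumed rule)."""
    observed = [v for v in row if v is not None]
    if not observed:
        return list(row)  # nothing to borrow from; leave as missing
    fill = mean(observed)
    return [fill if v is None else v for v in row]


def adjacent_value(row: Row) -> Row:
    """Fill each gap with the nearest observed exam, earlier exam first on ties (assumed rule)."""
    filled = list(row)
    for i, v in enumerate(row):
        if v is not None:
            continue
        # Search outward from exam i: one exam away, then two, and so on.
        for d in range(1, len(row)):
            for j in (i - d, i + d):
                if 0 <= j < len(row) and row[j] is not None:
                    filled[i] = row[j]
                    break
            if filled[i] is not None:
                break
    return filled


# Hypothetical participants, not data from the study.
rows = [[0.8, 0.9, 1.0], [0.8, None, 1.2], [None, 0.9, 1.1]]
print(listwise_deletion(rows))
print([mean_of_serial_measures(r) for r in rows])
print([adjacent_value(r) for r in rows])
```

Listwise deletion shrinks the sample, while the two single-imputation rules keep every participant but reuse that participant's own values, which tends to understate variability. Multiple imputation and pattern-mixture models need a fitted statistical model and are not sketched here.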
Baseline Characteristics of Strong Heart Study Participants with Complete Scr Data at All Three Exams (N = 2,264).
| Variable | N | Mean ± SD or % |
|---|---|---|
| Scr | 2,264 | 0.88±0.25 |
| Age | 2,264 | 54.9±7.4 |
| Female | 1,451 | 64.1% |
| Diabetes | 845 | 37.3% |
| Prevalent CVD | 104 | 4.6% |
| Incident CVD by 2003 | 447 | 19.7% |
Abbreviations: CVD = cardiovascular disease; Scr = serum creatinine.
Mean and SD of Scr Values Stratified by Imputation Method and Model.
Exam 1, by missing data generation method. Complete Data Scr mean (sd):

| Imputation method | Data with Randomly Missing Values (Model 1) | Autoregressive Missing (Model 2) | Autoregressive w/ Gender and Age (Model 3) | Troxel Algorithm (NMAR Data; Model 4) |
|---|---|---|---|---|
| LD | 0.87 (0.26) | 0.87 (0.18) | 0.88 (0.27) | 0.88 (0.27) |
| Mean | 0.87 (0.24) | 0.87 (0.16) | 0.88 (0.24) | 0.88 (0.24) |
| AV | 0.87 (0.26) | 0.87 (0.18) | 0.88 (0.27) | 0.88 (0.27) |
| MI | 0.87 (0.25) | 0.87 (0.17) | 0.87 (0.25) | 0.88 (0.25) |
| PM (10th percentile) | 0.88 (0.26) | 0.87 (0.17) | 0.89 (0.25) | 0.88 (0.25) |
| PM (25th percentile) | 0.88 (0.25) | 0.88 (0.17) | 0.89 (0.25) | 0.89 (0.26) |
| PM (50th percentile) | 0.88 (0.26) | 0.88 (0.17) | 0.89 (0.25) | 0.89 (0.26) |

Exam 2, by missing data generation method. Complete Data Scr mean (sd):

| Imputation method | Data with Randomly Missing Values (Model 1) | Autoregressive Missing (Model 2) | Autoregressive w/ Gender and Age (Model 3) | Troxel Algorithm (NMAR Data; Model 4) |
|---|---|---|---|---|
| LD | 0.90 (0.47) | 0.89 (0.35) | 0.90 (0.45) | 0.83 (0.19) |
| Mean | 0.90 (0.39) | 0.89 (0.29) | 0.90 (0.38) | 0.83 (0.16) |
| AV | 0.89 (0.43) | 0.89 (0.32) | 0.89 (0.41) | 0.86 (0.28) |
| MI | 0.91 (0.45) | 0.89 (0.31) | 0.91 (0.39) | 0.84 (0.19) |
| PM (10th percentile) | 0.95 (0.45) | 0.93 (0.32) | 0.95 (0.41) | 0.86 (0.19) |
| PM (25th percentile) | 0.97 (0.46) | 0.95 (0.32) | 0.97 (0.41) | 0.88 (0.19) |
| PM (50th percentile) | 0.98 (0.46) | 0.96 (0.33) | 0.99 (0.42) | 0.90 (0.21) |

Exam 3, by missing data generation method. Complete Data Scr mean (sd):

| Imputation method | Data with Randomly Missing Values (Model 1) | Autoregressive Missing (Model 2) | Autoregressive w/ Gender and Age (Model 3) | Troxel Algorithm (NMAR Data; Model 4) |
|---|---|---|---|---|
| LD | 0.95 (0.94) | 0.94 (0.85) | 0.93 (0.85) | 0.76 (0.17) |
| Mean | 0.95 (0.73) | 0.94 (0.67) | 0.93 (0.67) | 0.76 (0.13) |
| AV | 0.93 (0.81) | 0.93 (0.72) | 0.92 (0.74) | 0.82 (0.26) |
| MI | 1.03 (0.86) | 1.00 (0.76) | 1.00 (0.77) | 0.79 (0.17) |
| PM (10th percentile) | 1.14 (0.89) | 1.09 (0.79) | 1.12 (0.83) | 0.81 (0.17) |
| PM (25th percentile) | 1.16 (0.89) | 1.12 (0.80) | 1.13 (0.82) | 0.83 (0.17) |
| PM (50th percentile) | 1.19 (0.91) | 1.13 (0.82) | 1.16 (0.84) | 0.85 (0.19) |
Abbreviations: AV = imputation using adjacent value; LD = listwise deletion; Mean = imputation using the mean; MI = multiple imputation; NMAR = not missing at random; PM = pattern mixture.
Model 1 = data with randomly missing values
Model 2 = autoregressive missing
Model 3 = autoregressive + gender + age
Model 4 = NMAR data.
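The difference between randomly missing data (Model 1) and NMAR data (Model 4) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not reproduce the Troxel algorithm or the autoregressive models, whose details are not given here. The NMAR mechanism below is a generic logistic rule in which the chance of a value being missing rises with the value itself; the function names, the intercept and slope, and the simulated Scr values are all hypothetical.

```python
import math
import random
from statistics import mean
from typing import List, Optional


def make_mcar(values: List[float], p_missing: float,
              rng: random.Random) -> List[Optional[float]]:
    """Randomly missing: every value is dropped with the same probability."""
    return [None if rng.random() < p_missing else v for v in values]


def make_nmar(values: List[float], intercept: float, slope: float,
              rng: random.Random) -> List[Optional[float]]:
    """NMAR (illustrative rule): P(missing) = logistic(intercept + slope * value)."""
    out: List[Optional[float]] = []
    for v in values:
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * v)))
        out.append(None if rng.random() < p else v)
    return out


def observed_mean(values: List[Optional[float]]) -> float:
    """Mean of the values that remain observed (what listwise deletion would report)."""
    return mean(v for v in values if v is not None)


rng = random.Random(0)
scr = [rng.gauss(0.9, 0.25) for _ in range(5000)]  # hypothetical Scr values
mcar = make_mcar(scr, 0.2, rng)
nmar = make_nmar(scr, -6.0, 5.0, rng)  # higher values are more likely to go missing

print(round(observed_mean(scr), 3), round(observed_mean(mcar), 3),
      round(observed_mean(nmar), 3))
```

Under the random mechanism the observed mean stays close to the full-data mean, while under the value-dependent mechanism the observed mean is pulled downward because high values are preferentially lost.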
Adjusted Hazard Ratios With 95% Confidence Intervals for Cardiovascular Disease Risk.
Exam 1, by missing data generation method. Complete Data Scr HR (95% CI):

| Imputation method | Data with Randomly Missing Values (Model 1) | Autoregressive Missing (Model 2) | Autoregressive w/ Gender and Age (Model 3) | Troxel Algorithm (NMAR Data; Model 4) |
|---|---|---|---|---|
| LD | 1.05 (0.73–1.51) | 2.21 (1.31–3.67) | 1.16 (0.86–1.55) | 1.17 (0.88–1.55) |
| Mean | 1.06 (0.75–1.51) | 2.19 (1.31–3.67) | 1.13 (0.83–1.54) | 1.16 (0.87–1.54) |
| AV | 1.05 (0.73–1.51) | 2.21 (1.30–3.76) | 1.16 (0.86–1.55) | 1.17 (0.88–1.55) |
| MI | 1.11 (0.82–1.51) | 1.98 (1.18–3.33) | 1.16 (0.87–1.54) | 1.15 (0.87–1.54) |
| PM (10th percentile) | 1.08 (0.78–1.49) | 2.05 (1.23–3.41) | 1.13 (0.84–1.53) | 1.16 (0.86–1.56) |
| PM (25th percentile) | 1.09 (0.80–1.49) | 2.11 (1.26–3.52) | 1.15 (0.85–1.56) | 1.14 (0.85–1.54) |
| PM (50th percentile) | 1.10 (0.80–1.51) | 2.21 (1.30–3.75) | 1.17 (0.88–1.55) | 1.19 (0.91–1.55) |

Exam 2, by missing data generation method. Complete Data Scr HR (95% CI):

| Imputation method | Data with Randomly Missing Values (Model 1) | Autoregressive Missing (Model 2) | Autoregressive w/ Gender and Age (Model 3) | Troxel Algorithm (NMAR Data; Model 4) |
|---|---|---|---|---|
| LD | 1.14 (0.96–1.35) | 1.37 (1.11–1.68) | 1.23 (1.08–1.41) | 0.66 (0.32–1.47) |
| Mean | 1.14 (0.97–1.35) | 1.34 (1.09–1.65) | 1.23 (1.08–1.40) | 0.69 (0.32–1.35) |
| AV | 1.12 (0.94–1.33) | 1.40 (1.16–1.69) | 1.23 (1.09–1.40) | 1.09 (0.76–1.55) |
| MI | 1.10 (0.92–1.31) | 1.34 (1.08–1.66) | 1.22 (1.06–1.40) | 1.01 (0.53–1.95) |
| PM (10th percentile) | 1.07 (0.9–1.30) | 1.38 (1.14–1.67) | 1.24 (1.09–1.40) | 1.19 (0.59–2.42) |
| PM (25th percentile) | 1.07 (0.88–1.29) | 1.40 (1.16–1.68) | 1.23 (1.08–1.40) | 1.05 (0.55–1.98) |
| PM (50th percentile) | 1.09 (0.91–1.31) | 1.40 (1.15–1.70) | 1.23 (1.08–1.40) | 1.16 (0.66–2.02) |

Exam 3, by missing data generation method. Complete Data Scr HR (95% CI): 1.

| Imputation method | Data with Randomly Missing Values (Model 1) | Autoregressive Missing (Model 2) | Autoregressive w/ Gender and Age (Model 3) | Troxel Algorithm (NMAR Data; Model 4) |
|---|---|---|---|---|
| LD | 1.09 (0.94–1.26) | 1.12 (0.95–1.31) | 1.07 (0.89–1.29) | 1.79 (0.51–6.23) |
| Mean | 1.10 (0.94–1.27) | 1.10 (0.94–1.28) | 1.07 (0.90–1.28) | 1.76 (0.54–5.78) |
| AV | 1.08 (0.93–1.25) | 1.13 (0.98–1.30) | 1.12 (0.96–1.30) | 1.39 (0.93–2.06) |
| MI | 1.04 (0.89–1.21) | 1.13 (0.99–1.30) | 1.11 (0.97–1.27) | 1.87 (0.70–4.97) |
| PM (10th percentile) | 1.00 (0.8–1.20) | 1.11 (0.95–1.31) | 1.11 (0.96–1.28) | 1.52 (0.62–3.76) |
| PM (25th percentile) | 1.02 (0.86–1.20) | 1.13 (0.98–1.31) | 1.12 (0.98–1.27) | 1.38 (0.54–3.54) |
| PM (50th percentile) | 1.01 (0.86–1.19) | 1.13 (0.98–1.30) | 1.11 (0.98–1.27) | 1.37 (0.58–3.26) |
Cox proportional hazards regression models adjusted for age, gender, and diabetes.
*Significant at the 5% level.
Abbreviations: LD = listwise deletion; Mean = imputation using the mean; AV = imputation using adjacent value; MI = multiple imputation; NMAR = not missing at random; PM = pattern mixture.
Complete Data: data with no missing values
Model 1: data with randomly missing values
Model 2: autoregressive missing
Model 3: autoregressive + gender + age
Model 4: NMAR data.
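When reading the hazard ratios above, it can help to check whether a 95% CI excludes 1 and how wide it is on the log scale. The sketch below assumes the intervals are Wald intervals, symmetric around log(HR) with a 1.96 multiplier; that is a common convention for Cox models but is an assumption here, and the function names are hypothetical. The two example intervals are taken from the Exam 1 LD row.

```python
import math


def log_scale_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Approximate SE of log(HR) recovered from a CI, assuming a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def excludes_one(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below HR = 1."""
    return lower > 1.0 or upper < 1.0


# LD, Exam 1: Model 2 reports 2.21 (1.31-3.67); Model 1 reports 1.05 (0.73-1.51).
for label, lo, hi in [("Model 2", 1.31, 3.67), ("Model 1", 0.73, 1.51)]:
    print(label, round(log_scale_se(lo, hi), 3), excludes_one(lo, hi))
```

A wider interval on the log scale means a less precise estimate, which is one way to compare how much information each imputation method retains under a given missingness model.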
Adjusted Hazard Ratios With 95% CI for CVD Risk: Time-Dependent Cox Model.
| Model | Complete Data | LD | Mean | AV | MI | PM (10th percentile) | PM (25th percentile) | PM (50th percentile) |
|---|---|---|---|---|---|---|---|---|
| 1 | | 1.03 (0.92–1.15) | 1.02 (0.90–1.15) | 1.03 (0.92–1.15) | 0.92 (0.73–1.16) | 0.83 (0.71–0.97) | 0.82 (0.68–0.98) | 0.81 (0.68–0.96) |
| 2 | | 1.05 (0.92–1.18) | 1.02 (0.89–1.16) | 1.07 (0.96–1.20) | 1.00 (0.88–1.13) | 0.92 (0.79–1.08) | 0.90 (0.78–1.04) | 0.91 (0.77–1.06) |
| 3 | | 1.04 (0.92–1.18) | 1.04 (0.91–1.18) | 1.07 (0.96–1.20) | 1.01 (0.90–1.14) | 0.93 (0.82–1.06) | 0.91 (0.79–1.04) | 0.91 (0.79–1.05) |
| 4 | | 6.98 (3.57–13.66)* | 13.35 (7.36–24.21)* | 1.45 (1.18–1.79)* | 3.26 (1.86–5.71)* | 2.76 (1.69–4.50)* | 2.86 (1.50–5.48)* | 2.50 (1.52–4.12)* |
Abbreviations: LD = listwise deletion; Mean = imputation using the mean; AV = imputation using adjacent value; MI = multiple imputation; PM = pattern mixture.
Complete Data: data with no missing values.
Model 1: data with randomly missing values
Model 2: autoregressive missing
Model 3: autoregressive + gender + age
Model 4: NMAR data.