Marjan Javanbakht, Johnny Lin, Amy Ragsdale, Soyeon Kim, Suzanne Siminski, Pamina Gorbach.
Abstract
BACKGROUND: Although standardized measures to assess substance use are available, most studies use variations of these measures, making it challenging to harmonize data across studies. The aim of this study was to evaluate the performance of different strategies for imputing missing substance use data that may arise during data harmonization.
Year: 2022 PMID: 35369872 PMCID: PMC8978400 DOI: 10.1186/s12874-022-01554-4
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Sociodemographic and substance use characteristics of mSTUDY participants (8/2014—06/2019)
| Characteristic | Total, n | Total, % | n | % | n | % | P value |
|---|---|---|---|---|---|---|---|
| Age, years (mean, SD) | 31.2 (6.8) | | 33.5 (6.5) | | 29.0 (6.5) | | < 0.01 |
| Race/ethnicity | | | | | | | 0.61 |
| African American | 224 | 42.4 | 106 | 40.2 | 118 | 44.7 | |
| Hispanic/Latino | 200 | 37.9 | 101 | 38.3 | 99 | 37.5 | |
| Other | 33 | 6.3 | 13 | 4.9 | 20 | 7.6 | |
| White | 71 | 13.4 | 44 | 16.7 | 27 | 10.2 | |
| Education | | | | | | | 0.04 |
| Less than High School | 64 | 12.2 | 40 | 15.4 | 24 | 9.1 | |
| High School Graduate | 189 | 36.1 | 94 | 36.2 | 95 | 36.0 | |
| More than High School | 271 | 51.7 | 126 | 48.5 | 145 | 54.9 | |
| Unemployed | 242 | 45.6 | 146 | 55.3 | 94 | 35.6 | < 0.01 |
| Unstable Housing, past 6 monthsᵃ | 190 | 35.5 | 91 | 35.4 | 92 | 35.7 | 0.95 |
| Substance use, past 6 months | | | | | | | |
| Heroin | 75 | 3.0 | 33 | 2.6 | 42 | 3.7 | 0.15 |
| Methamphetamine | 897 | 37.5 | 609 | 48.8 | 288 | 25.3 | < 0.01 |
| Cannabis | 1,238 | 51.8 | 593 | 47.5 | 645 | 56.6 | < 0.01 |
Abbreviations, SD Standard deviation
ᵃ Defined as not having a regular place to stay in the past 6 months
Relative bias, RMSE and coverage probability for heroin use comparing validation data to imputed data
| Method | % Missing | Estimate | Mean Bias | % Relative Bias | RMSE | Coverage |
|---|---|---|---|---|---|---|
| **MCAR** | | | | | | |
| LD | 10% | 3.0% | 0.00003 | 0.11% | 0.0036 | 94.6% |
| LR | 10% | 3.0% | 0.00006 | 0.19% | 0.0036 | 94.6% |
| HD | 10% | 3.0% | 0.00001 | 0.04% | 0.0036 | 94.0% |
| MI (M = 5) | 10% | 3.0% | 0.00034 | 1.14% | 0.0036 | 94.8% |
| MI (M = 20) | 10% | 3.0% | 0.00033 | 1.09% | 0.0036 | 95.0% |
| LD | 30% | 3.0% | 0.00009 | 0.29% | 0.0042 | 94.6% |
| LR | 30% | 3.0% | 0.00010 | 0.34% | 0.0038 | 92.2% |
| HD | 30% | 3.0% | 0.00006 | 0.19% | 0.0042 | 89.6% |
| MI (M = 5) | 30% | 3.1% | 0.00108 | 3.60% | 0.0040 | 94.2% |
| MI (M = 20) | 30% | 3.1% | 0.00105 | 3.50% | 0.0039 | 94.2% |
| LD | 50% | 3.0% | -0.00007 | -0.23% | 0.0049 | 94.2% |
| LR | 50% | 3.0% | -0.00006 | -0.20% | 0.0039 | 90.0% |
| HD | 50% | 3.0% | 0.00006 | 0.20% | 0.0044 | 88.6% |
| MI (M = 5) | 50% | 3.2% | 0.00183 | 6.09% | 0.0045 | 93.8% |
| MI (M = 20) | 50% | 3.2% | 0.00181 | 6.03% | 0.0043 | 93.6% |
| **MAR** | | | | | | |
| LD | 10% | 2.0% | -0.00999 | -33.33% | 0.0105 | 14.0% |
| LR | 10% | 2.9% | -0.00110 | -3.66% | 0.0037 | 91.8% |
| HD | 10% | 2.9% | -0.00049 | -1.65% | 0.0037 | 92.8% |
| MI (M = 5) | 10% | 3.0% | 0.00031 | 1.03% | 0.0037 | 95.0% |
| MI (M = 20) | 10% | 3.0% | 0.00034 | 1.12% | 0.0037 | 94.2% |
| LD | 30% | 1.4% | -0.01554 | -51.84% | 0.0158 | 1.0% |
| LR | 30% | 2.7% | -0.00256 | -8.55% | 0.0044 | 83.4% |
| HD | 30% | 2.9% | -0.00132 | -4.40% | 0.0043 | 87.2% |
| MI (M = 5) | 30% | 3.1% | 0.00118 | 3.95% | 0.0042 | 95.4% |
| MI (M = 20) | 30% | 3.1% | 0.00121 | 4.02% | 0.0041 | 95.2% |
| LD | 50% | 1.1% | -0.01871 | -62.41% | 0.0189 | 0.0% |
| LR | 50% | 2.7% | -0.00337 | -11.25% | 0.0051 | 75.6% |
| HD | 50% | 2.8% | -0.00151 | -5.03% | 0.0048 | 82.8% |
| MI (M = 5) | 50% | 3.2% | 0.00201 | 6.69% | 0.0048 | 93.6% |
| MI (M = 20) | 50% | 3.2% | 0.00205 | 6.83% | 0.0046 | 93.8% |
| **MNAR** | | | | | | |
| LD | 10% | 1.6% | -0.01361 | -45.41% | 0.0139 | 0.8% |
| LR | 10% | 1.8% | -0.01181 | -39.41% | 0.0122 | 3.4% |
| HD | 10% | 1.9% | -0.01131 | -37.73% | 0.0117 | 7.0% |
| MI (M = 5) | 10% | 1.9% | -0.01101 | -36.74% | 0.0114 | 11.4% |
| MI (M = 20) | 10% | 1.9% | -0.01099 | -36.67% | 0.0114 | 10.8% |
| LD | 30% | 1.2% | -0.01799 | -60.03% | 0.0182 | 0.0% |
| LR | 30% | 1.8% | -0.01192 | -39.77% | 0.0123 | 3.2% |
| HD | 30% | 1.9% | -0.01127 | -37.59% | 0.0117 | 6.8% |
| MI (M = 5) | 30% | 2.0% | -0.01016 | -33.91% | 0.0106 | 20.6% |
| MI (M = 20) | 30% | 2.0% | -0.01023 | -34.13% | 0.0106 | 18.6% |
| LD | 50% | 1.0% | -0.02032 | -67.82% | 0.0205 | 0.0% |
| LR | 50% | 2.0% | -0.00994 | -33.15% | 0.0105 | 13.8% |
| HD | 50% | 2.1% | -0.00934 | -31.16% | 0.0101 | 22.8% |
| MI (M = 5) | 50% | 2.3% | -0.00687 | -22.91% | 0.0077 | 69.8% |
| MI (M = 20) | 50% | 2.3% | -0.00688 | -22.97% | 0.0077 | 65.8% |
Abbreviations, LD Listwise deletion, LR Logistic regression, HD Hot-deck, MI Multiple imputation, MCAR Missing completely at random, MAR Missing at random, MNAR Missing not at random, RMSE Root mean square error
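The performance measures reported in the table above can be illustrated with a short sketch. This section does not give the exact formulas used, so the code below assumes the standard definitions: mean bias is the average estimate minus the reference value, relative bias is that difference as a percentage of the reference value, RMSE is the root mean squared error across replicates, and coverage is the share of replicates whose 95% Wald interval contains the reference value. The function name `performance` and the toy inputs are hypothetical.

```python
import math


def performance(estimates, std_errors, truth, z=1.96):
    """Summarize replicate estimates against a known reference value.

    estimates, std_errors: one entry per simulation replicate (assumed inputs).
    truth: reference prevalence as a proportion, e.g. 0.030 for 3.0%.
    z: normal quantile for the interval (1.96 assumes a 95% Wald interval).
    """
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    mean_bias = mean_estimate - truth
    relative_bias_pct = 100.0 * mean_bias / truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    # Count replicates whose interval contains the reference value.
    covered = sum(
        1
        for e, se in zip(estimates, std_errors)
        if e - z * se <= truth <= e + z * se
    )
    return {
        "estimate": mean_estimate,
        "mean_bias": mean_bias,
        "relative_bias_pct": relative_bias_pct,
        "rmse": rmse,
        "coverage_pct": 100.0 * covered / n,
    }


# Hypothetical toy input: three replicate estimates of a 3.0% prevalence.
result = performance([0.028, 0.031, 0.033], [0.004] * 3, truth=0.030)
print(result)
```

Under these definitions, a mean bias of 0.00003 against a reference prevalence of about 3.0% corresponds to a relative bias of roughly 0.1%, which matches the scale of the values in the table.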
Relative bias percentage, RMSE and coverage probability for methamphetamine use comparing validation data to imputed data
| Method | % Missing | Estimate | Mean Bias | % Relative Bias | RMSE | Coverage |
|---|---|---|---|---|---|---|
| **MCAR** | | | | | | |
| LD | 10% | 37.4% | -0.00055 | -0.15% | 0.0105 | 95.6% |
| LR | 10% | 37.5% | -0.00031 | -0.08% | 0.0105 | 94.2% |
| HD | 10% | 37.4% | -0.00035 | -0.09% | 0.0108 | 94.0% |
| MI (M = 5) | 10% | 37.5% | -0.00013 | -0.03% | 0.0115 | 95.0% |
| MI (M = 20) | 10% | 37.5% | -0.00013 | -0.04% | 0.0128 | 95.2% |
| LD | 30% | 37.5% | -0.00003 | -0.01% | 0.0117 | 95.2% |
| LR | 30% | 37.4% | -0.00035 | -0.09% | 0.0110 | 93.2% |
| HD | 30% | 37.5% | -0.00017 | -0.04% | 0.0115 | 92.2% |
| MI (M = 5) | 30% | 37.5% | 0.00040 | 0.11% | 0.0115 | 95.4% |
| MI (M = 20) | 30% | 37.5% | 0.00021 | 0.06% | 0.0128 | 95.4% |
| LD | 50% | 37.4% | -0.00064 | -0.17% | 0.0148 | 94.6% |
| LR | 50% | 37.5% | -0.00005 | -0.01% | 0.0117 | 89.4% |
| HD | 50% | 37.5% | 0.00002 | 0.01% | 0.0128 | 87.8% |
| MI (M = 5) | 50% | 37.6% | 0.00071 | 0.19% | 0.0115 | 95.2% |
| MI (M = 20) | 50% | 37.6% | 0.00071 | 0.19% | 0.0128 | 94.2% |
| **MAR** | | | | | | |
| LD | 10% | 34.9% | -0.02540 | -6.78% | 0.0276 | 30.8% |
| LR | 10% | 36.7% | -0.00818 | -2.18% | 0.0133 | 83.8% |
| HD | 10% | 37.3% | -0.00216 | -0.58% | 0.0108 | 93.0% |
| MI (M = 5) | 10% | 37.5% | 0.00034 | 0.09% | 0.0104 | 94.6% |
| MI (M = 20) | 10% | 37.5% | 0.00036 | 0.10% | 0.0103 | 94.4% |
| LD | 30% | 30.8% | -0.06707 | -17.89% | 0.0680 | 0.0% |
| LR | 30% | 35.7% | -0.01803 | -4.81% | 0.0210 | 56.0% |
| HD | 30% | 37.1% | -0.00338 | -0.90% | 0.0121 | 88.6% |
| MI (M = 5) | 30% | 37.5% | 0.00044 | 0.12% | 0.0110 | 95.6% |
| MI (M = 20) | 30% | 37.5% | 0.00054 | 0.14% | 0.0109 | 96.0% |
| LD | 50% | 26.9% | -0.10551 | -28.15% | 0.1063 | 0.0% |
| LR | 50% | 35.2% | -0.02289 | -6.11% | 0.0257 | 37.4% |
| HD | 50% | 37.1% | -0.00364 | -0.97% | 0.0134 | 86.6% |
| MI (M = 5) | 50% | 37.6% | 0.00082 | 0.22% | 0.0120 | 94.4% |
| MI (M = 20) | 50% | 37.6% | 0.00076 | 0.20% | 0.0118 | 93.0% |
| **MNAR** | | | | | | |
| LD | 10% | 35.0% | -0.02520 | -6.72% | 0.0274 | 32.6% |
| LR | 10% | 35.5% | -0.01959 | -5.22% | 0.0223 | 50.2% |
| HD | 10% | 35.8% | -0.01650 | -4.40% | 0.0197 | 61.6% |
| MI (M = 5) | 10% | 35.8% | -0.01651 | -4.41% | 0.0196 | 61.4% |
| MI (M = 20) | 10% | 35.8% | -0.01651 | -4.40% | 0.0196 | 62.2% |
| LD | 30% | 35.0% | -0.02520 | -6.72% | 0.0274 | 32.6% |
| LR | 30% | 35.5% | -0.01959 | -5.22% | 0.0223 | 50.2% |
| HD | 30% | 33.5% | -0.04014 | -10.71% | 0.0417 | 3.0% |
| MI (M = 5) | 30% | 35.8% | -0.01651 | -4.41% | 0.0196 | 61.4% |
| MI (M = 20) | 30% | 33.5% | -0.03958 | -10.56% | 0.0410 | 3.6% |
| LD | 50% | 27.9% | -0.09619 | -25.66% | 0.0971 | 0.0% |
| LR | 50% | 31.6% | -0.05870 | -15.66% | 0.0598 | 0.0% |
| HD | 50% | 32.4% | -0.05061 | -13.50% | 0.0521 | 0.8% |
| MI (M = 5) | 50% | 32.6% | -0.04917 | -13.12% | 0.0505 | 2.2% |
| MI (M = 20) | 50% | 32.6% | -0.04914 | -13.11% | 0.0504 | 1.6% |
Abbreviations, LD Listwise deletion, LR Logistic regression, HD Hot-deck, MI Multiple imputation, MCAR Missing completely at random, MAR Missing at random, MNAR Missing not at random, RMSE Root mean square error
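The abbreviations above name three missingness mechanisms. As a rough illustration of why listwise deletion behaves differently across them, the sketch below deletes values from a simulated binary "use" indicator in two ways: completely at random (MCAR), and with a higher deletion probability for people who report use (MNAR). The sample size, prevalence, and deletion probabilities are assumptions chosen for illustration and are not taken from the study.

```python
import random


def listwise_estimate(values, drop_prob):
    """Prevalence among values that survive deletion.

    drop_prob is a function returning the deletion probability for a value.
    """
    kept = [y for y in values if random.random() >= drop_prob(y)]
    return sum(kept) / len(kept)


random.seed(2022)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 records, about 37.5% reporting use.
n = 20000
use = [1 if random.random() < 0.375 else 0 for _ in range(n)]
truth = sum(use) / n

# MCAR: every record has the same 30% chance of being missing.
mcar_estimate = listwise_estimate(use, lambda y: 0.30)

# MNAR: users are more likely to be missing (50%) than non-users (18%),
# which gives roughly 30% missing overall.
mnar_estimate = listwise_estimate(use, lambda y: 0.50 if y == 1 else 0.18)

print(f"truth={truth:.3f} MCAR={mcar_estimate:.3f} MNAR={mnar_estimate:.3f}")
```

With these assumed settings, the MCAR estimate stays close to the full-data prevalence, while the MNAR estimate falls well below it because the records most likely to be deleted are those reporting use.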
Relative bias percentage, RMSE and coverage probability for cannabis use comparing validation data to imputed data
| Method | % Missing | Estimate | Mean Bias | % Relative Bias | RMSE | Coverage |
|---|---|---|---|---|---|---|
| **MCAR** | | | | | | |
| LD | 10% | 52.0% | 0.00046 | 0.09% | 0.0299 | 95.6% |
| LR | 10% | 52.0% | 0.00042 | 0.08% | 0.0285 | 95.8% |
| HD | 10% | 52.0% | 0.00033 | 0.06% | 0.0253 | 94.2% |
| MI (M = 5) | 10% | 52.0% | 0.00056 | 0.11% | 0.0328 | 95.6% |
| MI (M = 20) | 10% | 52.0% | 0.00050 | 0.10% | 0.0311 | 96.2% |
| LD | 30% | 52.0% | 0.00021 | 0.04% | 0.0201 | 94.8% |
| LR | 30% | 52.0% | 0.00029 | 0.06% | 0.0238 | 93.0% |
| HD | 30% | 52.0% | 0.00034 | 0.07% | 0.0256 | 92.0% |
| MI (M = 5) | 30% | 52.0% | 0.00032 | 0.06% | 0.0247 | 95.8% |
| MI (M = 20) | 30% | 52.0% | 0.00036 | 0.07% | 0.0262 | 95.6% |
| LD | 50% | 52.0% | 0.00070 | 0.14% | 0.0368 | 94.2% |
| LR | 50% | 52.0% | 0.00072 | 0.14% | 0.0373 | 91.0% |
| HD | 50% | 52.0% | 0.00057 | 0.11% | 0.0332 | 87.4% |
| MI (M = 5) | 50% | 52.0% | 0.00064 | 0.12% | 0.0351 | 95.4% |
| MI (M = 20) | 50% | 52.0% | 0.00079 | 0.15% | 0.0389 | 95.0% |
| **MAR** | | | | | | |
| LD | 10% | 51.0% | -0.00991 | -1.91% | 0.0145 | 85.4% |
| LR | 10% | 52.0% | 0.00032 | 0.06% | 0.0102 | 95.8% |
| HD | 10% | 51.8% | -0.00134 | -0.26% | 0.0107 | 93.6% |
| MI (M = 5) | 10% | 52.0% | 0.00027 | 0.05% | 0.0101 | 96.2% |
| MI (M = 20) | 10% | 52.0% | 0.00034 | 0.07% | 0.0102 | 96.2% |
| LD | 30% | 49.5% | -0.02443 | -4.70% | 0.0273 | 48.8% |
| LR | 30% | 52.2% | 0.00207 | 0.40% | 0.0112 | 93.2% |
| HD | 30% | 51.7% | -0.00241 | -0.46% | 0.0118 | 91.2% |
| MI (M = 5) | 30% | 52.0% | 0.00054 | 0.10% | 0.0110 | 95.8% |
| MI (M = 20) | 30% | 52.0% | 0.00038 | 0.07% | 0.0111 | 95.0% |
| LD | 50% | 48.0% | -0.03969 | -7.64% | 0.0421 | 21.0% |
| LR | 50% | 52.3% | 0.00340 | 0.65% | 0.0124 | 90.4% |
| HD | 50% | 51.7% | -0.00280 | -0.54% | 0.0137 | 86.0% |
| MI (M = 5) | 50% | 52.0% | 0.00029 | 0.06% | 0.0122 | 95.0% |
| MI (M = 20) | 50% | 52.0% | 0.00028 | 0.05% | 0.0122 | 94.4% |
| **MNAR** | | | | | | |
| LD | 10% | 49.9% | -0.02029 | -3.90% | 0.0230 | 53.8% |
| LR | 10% | 50.4% | -0.01557 | -3.00% | 0.0188 | 69.0% |
| HD | 10% | 50.5% | -0.01480 | -2.85% | 0.0183 | 70.4% |
| MI (M = 5) | 10% | 50.5% | -0.01495 | -2.88% | 0.0183 | 72.6% |
| MI (M = 20) | 10% | 50.5% | 0.47888 | 92.16% | 0.0183 | 72.2% |
| LD | 30% | 46.1% | -0.05821 | -11.20% | 0.0596 | 0.0% |
| LR | 30% | 47.7% | -0.04286 | -8.25% | 0.0444 | 1.8% |
| HD | 30% | 47.9% | -0.04112 | -7.91% | 0.0429 | 3.6% |
| MI (M = 5) | 30% | 47.9% | -0.04078 | -7.85% | 0.0423 | 4.0% |
| MI (M = 20) | 30% | 47.9% | -0.04075 | -7.84% | 0.0423 | 4.0% |
| LD | 50% | 42.2% | -0.09781 | -18.82% | 0.0988 | 0.0% |
| LR | 50% | 45.9% | -0.06050 | -11.64% | 0.0616 | 0.0% |
| HD | 50% | 46.1% | -0.05828 | -11.22% | 0.0597 | 0.0% |
| MI (M = 5) | 50% | 46.3% | -0.05702 | -10.97% | 0.0582 | 0.0% |
| MI (M = 20) | 50% | 0.0% | 0.00013 | 0.02% | 0.0582 | 0.0% |
Abbreviations, LD Listwise deletion, LR Logistic regression, HD Hot-deck, MI Multiple imputation, MCAR Missing completely at random, MAR Missing at random, MNAR Missing not at random, RMSE Root mean square error
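The MI rows use M = 5 or M = 20 imputed datasets. This section does not state how the per-dataset results were combined, so the sketch below assumes the usual approach, Rubin's rules: the pooled estimate is the mean of the M estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/M). The function name `pool_rubin` and the input values are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    # Average sampling variance within each imputed dataset.
    within = sum(variances) / m
    # Variance of the estimates across imputed datasets.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from M = 5 imputed datasets.
pooled, total_var = pool_rubin(
    [0.30, 0.32, 0.31, 0.29, 0.33],
    [0.0004] * 5,
)
print(pooled, total_var)
```

For these toy inputs the pooled estimate is 0.31 and the total variance is 0.0007, of which 0.0003 comes from variation between imputations.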