Jiwei Zhao, Chi Chen.
Abstract
We study how to conduct statistical inference in a regression model where the outcome variable is prone to missing values and the missingness mechanism is unknown. The model may be a traditional setting or a modern high-dimensional setting, where a sparsity assumption is usually imposed and regularization is widely used. Motivated by the fact that the missingness mechanism, although usually treated as a nuisance, is difficult to specify correctly, we adopt a conditional likelihood approach so that the nuisance can be completely ignored throughout our procedure. We establish the asymptotic theory of the proposed estimator and develop an easy-to-implement algorithm via a data manipulation strategy. In particular, under the high-dimensional setting where regularization is needed, we propose a data perturbation method for post-selection inference. The proposed methodology is especially appealing when the true missingness mechanism tends to be missing not at random, e.g., for patient-reported outcomes or real-world data such as electronic health records. The performance of the proposed method is evaluated by comprehensive simulation experiments as well as a study of the albumin level in the MIMIC-III database.
Keywords: asymptotic theory; missingness mechanism; nuisance; post-selection inference; regularization; unconventional likelihood
Year: 2020 PMID: 33286923 PMCID: PMC7597318 DOI: 10.3390/e22101154
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
In Section 5.1, sample bias (Bias), sample standard deviation (SD), estimated standard error (SE), and coverage probability (CP) of 95% confidence interval of the estimator of FullData (using all simulated data), CC (using only completely observed subjects), and of the proposed estimator studied in Section 3.
| Sample size | Parameter | Method | Bias | SD | SE | CP |
|---|---|---|---|---|---|---|
| 500 | | FullData | 0.0026 | 0.0444 | 0.0450 | 0.9540 |
| | | CC | −0.0329 | 0.0564 | 0.0560 | 0.9100 |
| | | Proposed | 0.0174 | 0.0829 | 0.0789 | 0.9450 |
| | | FullData | 0.0022 | 0.0489 | 0.0503 | 0.9510 |
| | | CC | 0.0376 | 0.0670 | 0.0699 | 0.9300 |
| | | Proposed | 0.0164 | 0.1644 | 0.1607 | 0.9400 |
| | | FullData | −0.0017 | 0.0657 | 0.0635 | 0.9310 |
| | | CC | −0.0649 | 0.0851 | 0.0835 | 0.8680 |
| | | Proposed | −0.0399 | 0.2305 | 0.2239 | 0.9360 |
| | | FullData | 0.0022 | 0.0616 | 0.0635 | 0.9540 |
| | | CC | 0.0778 | 0.0871 | 0.0867 | 0.8430 |
| | | Proposed | 0.0462 | 0.2323 | 0.2298 | 0.9410 |
| | | FullData | −0.0045 | 0.0792 | 0.0810 | 0.9530 |
| | | CC | −0.0988 | 0.1007 | 0.1043 | 0.8550 |
| | | Proposed | −0.0672 | 0.3081 | 0.3047 | 0.9380 |
| 1000 | | FullData | −0.0012 | 0.0317 | 0.0317 | 0.9540 |
| | | CC | −0.0348 | 0.0396 | 0.0393 | 0.8510 |
| | | Proposed | 0.0068 | 0.0573 | 0.0555 | 0.9350 |
| | | FullData | 0.0011 | 0.0367 | 0.0355 | 0.9370 |
| | | CC | 0.0399 | 0.0490 | 0.0494 | 0.8840 |
| | | Proposed | 0.0154 | 0.1154 | 0.1138 | 0.9460 |
| | | FullData | 0.0020 | 0.0448 | 0.0448 | 0.9500 |
| | | CC | −0.0649 | 0.0577 | 0.0588 | 0.8110 |
| | | Proposed | −0.0153 | 0.1531 | 0.1591 | 0.9590 |
| | | FullData | −0.0015 | 0.0458 | 0.0449 | 0.9460 |
| | | CC | 0.0779 | 0.0605 | 0.0611 | 0.7490 |
| | | Proposed | 0.0135 | 0.1598 | 0.1634 | 0.9480 |
| | | FullData | 0.0009 | 0.0564 | 0.0571 | 0.9540 |
| | | CC | −0.0949 | 0.0720 | 0.0734 | 0.7550 |
| | | Proposed | −0.0242 | 0.2091 | 0.2167 | 0.9430 |
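
The summaries reported in these tables (Bias, SD, SE, CP, and, in the later tables, Length) can be computed from the per-replication output of a simulation study. The following is a minimal sketch under stated assumptions: the function name `summarize`, the Wald-type interval (estimate ± 1.96 × SE), and the use of the mean of the estimated standard errors are illustrative choices and are not taken from the paper, whose intervals may be constructed differently.

```python
import statistics


def summarize(estimates, std_errors, truth, z=1.959964):
    """Monte Carlo summaries for one parameter across replications.

    ASSUMPTION: intervals are Wald-type (estimate +/- z * SE); this is an
    illustrative choice, not necessarily the construction used in the paper.
    """
    n_rep = len(estimates)
    # Bias: average point estimate minus the true parameter value.
    bias = statistics.fmean(estimates) - truth
    # SD: sample standard deviation of the point estimates.
    sd = statistics.stdev(estimates)
    # SE: average of the estimated standard errors.
    se = statistics.fmean(std_errors)
    # CP: fraction of replications whose interval contains the truth.
    covered = sum(
        1
        for est, s in zip(estimates, std_errors)
        if est - z * s <= truth <= est + z * s
    )
    # Length: average width of the Wald-type interval.
    length = statistics.fmean([2 * z * s for s in std_errors])
    return {"Bias": bias, "SD": sd, "SE": se, "CP": covered / n_rep, "Length": length}


if __name__ == "__main__":
    # Toy input: four replications with a common estimated standard error.
    print(summarize([0.9, 1.0, 1.1, 1.2], [0.1, 0.1, 0.1, 0.1], truth=1.0))
```

A well-calibrated estimator should show SE close to SD and CP close to the nominal 0.95, which is the pattern to look for when reading the rows above.
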
In Section 5.1, sample bias (Bias), sample standard deviation (SD), estimated standard error (SE), and coverage probability (CP) of 95% confidence interval of the estimator of FullData (using all simulated data), CC (using only completely observed subjects), and of the proposed estimator studied in Section 3, with a logistic missingness mechanism model.
| Sample size | Parameter | Method | Bias | SD | SE | CP |
|---|---|---|---|---|---|---|
| 500 | | FullData | −0.0011 | 0.0464 | 0.0451 | 0.9410 |
| | | CC | −0.0306 | 0.0567 | 0.0567 | 0.9200 |
| | | Proposed | 0.0100 | 0.0822 | 0.0787 | 0.9380 |
| | | FullData | −0.0004 | 0.0509 | 0.0503 | 0.9520 |
| | | CC | 0.0440 | 0.0636 | 0.0637 | 0.8930 |
| | | Proposed | 0.0146 | 0.1308 | 0.1236 | 0.9420 |
| | | FullData | 0.0013 | 0.0639 | 0.0637 | 0.9520 |
| | | CC | −0.0871 | 0.0828 | 0.0821 | 0.8190 |
| | | Proposed | −0.0173 | 0.1824 | 0.1753 | 0.9430 |
| | | FullData | −0.0030 | 0.0655 | 0.0636 | 0.9400 |
| | | CC | 0.0876 | 0.0847 | 0.0821 | 0.8030 |
| | | Proposed | 0.0214 | 0.1840 | 0.1756 | 0.9440 |
| | | FullData | 0.0023 | 0.0845 | 0.0812 | 0.9390 |
| | | CC | −0.1307 | 0.1083 | 0.1061 | 0.7560 |
| | | Proposed | −0.0331 | 0.2533 | 0.2384 | 0.9360 |
| 1000 | | FullData | 0.0004 | 0.0315 | 0.0317 | 0.9490 |
| | | CC | −0.0286 | 0.0396 | 0.0398 | 0.8950 |
| | | Proposed | 0.0060 | 0.0568 | 0.0555 | 0.9390 |
| | | FullData | 0.0007 | 0.0362 | 0.0354 | 0.9420 |
| | | CC | 0.0442 | 0.0451 | 0.0447 | 0.8410 |
| | | Proposed | 0.0079 | 0.0910 | 0.0859 | 0.9290 |
| | | FullData | −0.0004 | 0.0450 | 0.0448 | 0.9390 |
| | | CC | −0.0879 | 0.0571 | 0.0576 | 0.6640 |
| | | Proposed | −0.0044 | 0.1277 | 0.1220 | 0.9420 |
| | | FullData | −0.0009 | 0.0450 | 0.0448 | 0.9450 |
| | | CC | 0.0880 | 0.0588 | 0.0577 | 0.6660 |
| | | Proposed | 0.0114 | 0.1309 | 0.1222 | 0.9380 |
| | | FullData | −0.0005 | 0.0576 | 0.0572 | 0.9510 |
| | | CC | −0.1342 | 0.0755 | 0.0745 | 0.5740 |
| | | Proposed | −0.0191 | 0.1757 | 0.1661 | 0.9370 |
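
The bias of the CC rows under a logistic missingness mechanism can be reproduced qualitatively with a toy simulation. The sketch below is an illustration under assumptions and is not the paper's Section 5.1 design: the single-covariate linear model, the slope `beta = 1`, the mechanism P(observed | y) = expit(`gamma` · y) with `gamma = 2`, and the function names are all hypothetical choices. It compares the ordinary least squares slope fitted on all simulated data with the slope fitted on completely observed subjects only. The proposed conditional likelihood estimator is not implemented here.

```python
import math
import random


def ols_slope(xs, ys):
    """Least squares slope of y on x (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_cc_bias(n=20000, beta=1.0, gamma=2.0, seed=0):
    """Compare full-data and complete-case slopes under outcome-dependent missingness.

    ASSUMPTION: y = beta * x + N(0, 1) noise, and y is observed with
    probability expit(gamma * y); these settings are illustrative only.
    """
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta * x + rng.gauss(0.0, 1.0)
        # Missing not at random: the observation probability depends on y itself.
        p_obs = 1.0 / (1.0 + math.exp(-gamma * y))
        xs.append(x)
        ys.append(y)
        observed.append(rng.random() < p_obs)
    cc_x = [x for x, r in zip(xs, observed) if r]
    cc_y = [y for y, r in zip(ys, observed) if r]
    return {
        "FullData": ols_slope(xs, ys) - beta,   # bias using all simulated data
        "CC": ols_slope(cc_x, cc_y) - beta,     # bias using complete cases only
        "observed_fraction": len(cc_x) / n,
    }


if __name__ == "__main__":
    print(simulate_cc_bias())
```

In this toy setting the full-data slope is essentially unbiased, while the complete-case slope is attenuated because selection on the outcome distorts the conditional mean of y given x among observed subjects. This is the same qualitative pattern as the CC rows with low coverage above.
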
Figure 1. In Section 5.2, (1st column), (2nd column), and (3rd column) norms of the estimation bias of the estimator of FullData (using all simulated data), CC (using only completely observed subjects), and of the proposed estimator studied in Section 4.
In Section 5.2, with sample size , sample bias (Bias), sample standard deviation (SD), estimated standard error (SE), coverage probability (CP), and length (Length) of 95% confidence interval of the estimator of FullData (using all simulated data), CC (using only completely observed subjects) and of the proposed estimator studied in Section 4.
| Parameter | | Method | Bias | SD | SE | CP | Length |
|---|---|---|---|---|---|---|---|
| | | FullData | 0.0001 | 0.0120 | 0.0132 | 0.9480 | 0.0515 |
| | | CC | −0.0729 | 0.0180 | 0.0183 | 0.0370 | 0.0716 |
| | | Proposed | −0.0423 | 0.0500 | 0.0498 | 0.8200 | 0.1926 |
| True Nonzero | | FullData | 0.0021 | 0.1686 | 0.1649 | 0.9400 | 0.6415 |
| | | CC | −0.6547 | 0.2207 | 0.2114 | 0.1460 | 0.8233 |
| | | Proposed | 0.0354 | 0.4698 | 0.4746 | 0.9320 | 1.8513 |
| | | FullData | −0.0275 | 0.1692 | 0.1791 | 0.9440 | 0.6952 |
| | | CC | −0.3501 | 0.2227 | 0.2174 | 0.6180 | 0.8471 |
| | | Proposed | −0.2654 | 0.5843 | 0.5609 | 0.8940 | 1.9237 |
| | | FullData | −0.0172 | 0.1576 | 0.1756 | 0.9650 | 0.6826 |
| | | CC | −0.4478 | 0.2172 | 0.2161 | 0.4370 | 0.8418 |
| | | Proposed | −0.1251 | 0.4037 | 0.4611 | 0.9330 | 1.8063 |
| True Zero | | FullData | 0.0085 | 0.1567 | 0.1890 | 0.9960 | 0.7184 |
| | | CC | 0.0063 | 0.2067 | 0.2304 | 0.9890 | 0.8890 |
| | | Proposed | 0.0109 | 0.0988 | 0.1690 | 1.0000 | 0.4398 |
| | | FullData | −0.0019 | 0.1581 | 0.1900 | 0.9940 | 0.7206 |
| | | CC | −0.0017 | 0.2097 | 0.2307 | 0.9900 | 0.8914 |
| | | Proposed | 0.0126 | 0.1112 | 0.1447 | 1.0000 | 0.3668 |
| | | FullData | 0.0045 | 0.1212 | 0.1606 | 0.9980 | 0.6146 |
| | | CC | −0.0053 | 0.1749 | 0.1953 | 0.9900 | 0.7560 |
| | | Proposed | 0.0034 | 0.0664 | 0.1160 | 1.0000 | 0.2555 |
| | | FullData | 0.0014 | 0.1351 | 0.1839 | 0.9980 | 0.7063 |
| | | CC | −0.0055 | 0.1870 | 0.2245 | 0.9950 | 0.8717 |
| | | Proposed | 0.0024 | 0.0386 | 0.1115 | 1.0000 | 0.2538 |
| | | FullData | −0.0072 | 0.1295 | 0.1748 | 0.9990 | 0.6653 |
| | | CC | −0.0062 | 0.1795 | 0.2125 | 0.9940 | 0.8251 |
| | | Proposed | 0.0016 | 0.0741 | 0.1066 | 1.0000 | 0.2284 |
In Section 5.2, with sample size , sample bias (Bias), sample standard deviation (SD), estimated standard error (SE), coverage probability (CP), and length (Length) of 95% confidence interval of the estimator of FullData (using all simulated data), CC (using only completely observed subjects) and of the proposed estimator studied in Section 4.
| Parameter | | Method | Bias | SD | SE | CP | Length |
|---|---|---|---|---|---|---|---|
| | | FullData | −0.0005 | 0.0073 | 0.0088 | 0.9690 | 0.0344 |
| | | CC | −0.0730 | 0.0126 | 0.0130 | 0.0000 | 0.0507 |
| | | Proposed | −0.0213 | 0.0311 | 0.0334 | 0.8700 | 0.1293 |
| True Nonzero | | FullData | −0.0005 | 0.1186 | 0.1170 | 0.9300 | 0.4547 |
| | | CC | −0.6655 | 0.1568 | 0.1507 | 0.0090 | 0.5864 |
| | | Proposed | 0.0211 | 0.2911 | 0.2969 | 0.9300 | 1.1631 |
| | | FullData | −0.0321 | 0.1175 | 0.1249 | 0.9550 | 0.4861 |
| | | CC | −0.3387 | 0.1477 | 0.1534 | 0.3960 | 0.5972 |
| | | Proposed | −0.0979 | 0.2907 | 0.3383 | 0.9230 | 1.3115 |
| | | FullData | −0.0225 | 0.1051 | 0.1206 | 0.9590 | 0.4698 |
| | | CC | −0.4485 | 0.1478 | 0.1534 | 0.1770 | 0.5964 |
| | | Proposed | −0.0621 | 0.2351 | 0.2526 | 0.9290 | 0.9871 |
| True Zero | | FullData | −0.0007 | 0.0621 | 0.1162 | 1.0000 | 0.4253 |
| | | CC | 0.0023 | 0.1414 | 0.1614 | 0.9920 | 0.6180 |
| | | Proposed | 0.0044 | 0.0581 | 0.0910 | 1.0000 | 0.2091 |
| | | FullData | 0.0020 | 0.0632 | 0.1170 | 1.0000 | 0.4271 |
| | | CC | −0.0005 | 0.1333 | 0.1608 | 0.9930 | 0.6207 |
| | | Proposed | 0.0063 | 0.0584 | 0.0887 | 1.0000 | 0.2107 |
| | | FullData | 0.0013 | 0.0571 | 0.1010 | 1.0000 | 0.3670 |
| | | CC | −0.0034 | 0.1159 | 0.1378 | 0.9950 | 0.5313 |
| | | Proposed | 0.0012 | 0.0281 | 0.0688 | 1.0000 | 0.1430 |
| | | FullData | −0.0028 | 0.0599 | 0.1144 | 1.0000 | 0.4231 |
| | | CC | −0.0033 | 0.1243 | 0.1584 | 0.9970 | 0.6131 |
| | | Proposed | 0.0016 | 0.0288 | 0.0698 | 1.0000 | 0.1421 |
| | | FullData | 0.0039 | 0.0589 | 0.1080 | 1.0000 | 0.3970 |
| | | CC | 0.0028 | 0.1256 | 0.1497 | 0.9940 | 0.5752 |
| | | Proposed | 0.0000 | 0.0333 | 0.0644 | 1.0000 | 0.1314 |
In Section 6, the parameter estimate (Estimate), standard error (SE), and confidence interval (CI) of the estimator of CC (using only completely observed subjects) and of the proposed estimator studied in Section 4 in the MIMIC-III study.
| Effect | CC Estimate | CC SE | CC CI | Proposed Estimate | Proposed SE | Proposed CI |
|---|---|---|---|---|---|---|
| Calcium (shadow) | 0.7707 | 0.0691 | [0.6532, 0.9153] | 1.5271 | 0.1796 | [1.1815, 1.8835] |
| Red Blood Cell | 0.6491 | 0.0514 | [0.5337, 0.7257] | 0.7545 | 0.1631 | [0.3594, 1.0109] |
| Magnesium | 0.0000 | 0.0686 | [−0.2073, 0.0000] | 0.2731 | 0.2452 | [0.0000, 0.6609] |
| SOFA | −0.2720 | 0.0268 | [−0.3135, −0.2099] | −0.1852 | 0.1040 | [−0.3467, 0.0000] |
| Temperature | −0.0360 | 0.0351 | [−0.0883, 0.0659] | 0.0000 | 0.0964 | [0.0000, 0.3132] |
| White Blood Cell | −0.0245 | 0.0123 | [−0.0416, 0.0000] | 0.0000 | 0.0025 | [0.0000, 0.0000] |
| Age | 0.0000 | 0.0008 | [0.0000, 0.0000] | 0.0000 | 0.0017 | [0.0000, 0.0000] |
| Gender | 0.0000 | 0.0240 | [−0.0477, 0.0662] | 0.0000 | 0.1320 | [−0.4025, 0.0000] |
| Respiratory Rate | 0.0000 | 0.0034 | [−0.0141, 0.0000] | 0.0000 | 0.0008 | [0.0000, 0.0000] |
| Glucose | 0.0000 | 0.0000 | [0.0000, 0.0000] | 0.0000 | 0.0005 | [0.0000, 0.0000] |
| Heart Rate | 0.0000 | 0.0025 | [−0.0091, 0.0000] | 0.0000 | 0.0004 | [0.0000, 0.0000] |
| Systolic BP | 0.0000 | 0.0045 | [−0.0139, 0.0000] | 0.0000 | 0.0000 | [0.0000, 0.0000] |
| Diastolic BP | 0.0000 | 0.0072 | [0.0000, 0.0223] | 0.0000 | 0.0000 | [0.0000, 0.0000] |
| Urea Nitrogen | 0.0000 | 0.0004 | [0.0000, 0.0000] | 0.0000 | 0.0000 | [0.0000, 0.0000] |
| Platelets | 0.0000 | 0.0000 | [0.0000, 0.0000] | 0.0000 | 0.0000 | [0.0000, 0.0000] |
| Hematocrit | 0.0000 | 0.0027 | [0.0000, 0.0000] | 0.0000 | 0.0000 | [0.0000, 0.0000] |
| SpO2 | 0.0000 | 0.0145 | [−0.0479, 0.0000] | 0.0000 | 0.0162 | [0.0000, 0.0000] |
| SAPS-II | 0.0000 | 0.0106 | [−0.0051, 0.0269] | 0.0000 | 0.0000 | [0.0000, 0.0000] |
Figure 2. In Section 6, the solution path of the proposed estimator in the MIMIC-III study as the tuning parameter varies. The optimal , , equals and .