Subhadip Pal, Riten Mitra, Arinjita Bhattacharyya, Shesh Rai.
Abstract
BACKGROUND: Prediction and classification algorithms are commonly used in clinical research to identify patients susceptible to clinical conditions such as diabetes, colon cancer, and Alzheimer's disease. Developing accurate prediction and classification methods benefits personalized medicine. Building an excellent predictive model involves selecting the features most strongly associated with the outcome. These features can include biological and demographic characteristics, such as genomic biomarkers and health history. Such variable selection becomes challenging when the number of potential predictors is large. Bayesian shrinkage models have emerged as popular and flexible methods of variable selection in regression settings. This work discusses variable selection with three shrinkage priors and illustrates their application to real-world clinical data: the Pima Indians Diabetes, Colon cancer, ADNI, and OASIS Alzheimer's data sets.
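The tables in this entry compare three shrinkage priors: horseshoe, Dirichlet Laplace, and double Pareto. All three can be written as normal scale mixtures, which is what keeps the coefficient update in a Gibbs sampler Gaussian. The sketch below draws coefficients from each prior. It assumes the standard hierarchies from the original prior papers: Carvalho, Polson and Scott for the horseshoe; Bhattacharya et al. for Dirichlet Laplace; Armagan, Dunson and Lee for the generalized double Pareto. The function names and the hyperparameter defaults (`tau = 1`, `a = 0.5`, `alpha = eta = 1`) are illustrative assumptions and are not the settings used in this paper.

```python
import numpy as np

# Hypothetical helpers: normal scale-mixture forms of three shrinkage priors.
# Hyperparameter defaults are illustrative, not taken from the paper.

def horseshoe_draws(p, rng, tau=1.0):
    """beta_j ~ N(0, lambda_j^2 * tau^2), lambda_j ~ half-Cauchy(0, 1)."""
    lam = np.abs(rng.standard_cauchy(p))
    return rng.normal(0.0, lam * tau)

def dirichlet_laplace_draws(p, rng, a=0.5):
    """beta_j ~ N(0, psi_j * phi_j^2 * tau^2), i.e. Laplace with scale phi_j * tau."""
    phi = rng.dirichlet(np.full(p, a))      # local scales, sum to one
    tau = rng.gamma(p * a, 2.0)             # Gamma(shape=p*a, rate=1/2); numpy uses scale=1/rate
    psi = rng.exponential(2.0, size=p)      # Exp(rate=1/2), again given as scale=2
    return rng.normal(0.0, np.sqrt(psi) * phi * tau)

def double_pareto_draws(p, rng, alpha=1.0, eta=1.0):
    """Generalized double Pareto: beta_j ~ N(0, s_j),
    s_j ~ Exp(rate=lambda_j^2 / 2), lambda_j ~ Gamma(alpha, rate=eta)."""
    lam = rng.gamma(alpha, 1.0 / eta, size=p)
    s = rng.exponential(2.0 / lam ** 2)     # scale = 1 / rate
    return rng.normal(0.0, np.sqrt(s))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for name, draw in [("Horseshoe", horseshoe_draws),
                       ("Dirichlet Laplace", dirichlet_laplace_draws),
                       ("Double Pareto", double_pareto_draws)]:
        b = draw(10_000, rng)
        # Share of draws near zero versus share of large draws.
        print(f"{name:18s} |b|<0.1: {np.mean(np.abs(b) < 0.1):.3f}  "
              f"|b|>4: {np.mean(np.abs(b) > 4):.3f}")
```

All three priors place substantial mass near zero while keeping heavy tails. As a result, small coefficients are shrunk strongly and large ones comparatively little, and the two printed fractions show that contrast.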
Keywords: ADNI; Data augmentation; Dirichlet Laplace; Horseshoe; Logistic regression; MCMC; Multinomial; Pima; Polya-Gamma; Shrinkage priors
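The keywords name Pólya-Gamma data augmentation within MCMC for logistic regression. Below is a minimal Gibbs sampler for logistic regression with a horseshoe prior. It is a sketch under stated assumptions, not the authors' implementation, and all function names are hypothetical:

- PG(1, c) draws are approximated by truncating the infinite gamma-sum representation of Polson, Scott and Windle (2013) at `K` terms.
- The horseshoe scales are updated with the inverse-gamma auxiliary-variable scheme of Makalic and Schmidt (2016).

Given the latent ω, the coefficients have a Gaussian full conditional. Its precision is Q = XᵀΩX + D⁻¹ and its mean is Q⁻¹Xᵀκ, where κ = y - 1/2 and D holds the prior variances λⱼ²τ².

```python
import numpy as np

def sample_pg1(c, rng, K=200):
    """Approximate PG(1, c) draws, one per entry of c, by truncating
    omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    g_k ~ Gamma(1, 1), at K terms (slightly underestimates omega)."""
    k = np.arange(1, K + 1)
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4 * np.pi ** 2)
    g = rng.standard_exponential(size=(c.size, K))  # Gamma(1, 1) is Exp(1)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

def inv_gamma(shape, rate, rng):
    """Inverse-gamma(shape, rate) draw as the reciprocal of a gamma draw."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def horseshoe_logistic_gibbs(X, y, n_iter=1000, burn=500, seed=0):
    """Hypothetical sampler: logistic regression, horseshoe prior,
    Polya-Gamma augmentation. Returns post-burn-in draws of beta."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = np.asarray(y) - 0.5           # y coded 0/1
    beta = np.zeros(p)
    lam2, nu = np.ones(p), np.ones(p)     # local scales and their auxiliaries
    tau2, xi = 1.0, 1.0                   # global scale and its auxiliary
    draws = []
    for it in range(n_iter):
        # 1. Augmentation: omega_i | beta ~ PG(1, x_i' beta)
        omega = sample_pg1(X @ beta, rng)
        # 2. beta | omega, scales ~ N(Q^{-1} X' kappa, Q^{-1})
        Q = X.T @ (omega[:, None] * X) + np.diag(1.0 / (lam2 * tau2))
        L = np.linalg.cholesky(Q)         # Q = L L'
        mean = np.linalg.solve(Q, X.T @ kappa)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))
        # 3. Horseshoe scales via inverse-gamma auxiliary variables
        lam2 = inv_gamma(1.0, 1.0 / nu + beta ** 2 / (2.0 * tau2), rng)
        nu = inv_gamma(1.0, 1.0 + 1.0 / lam2, rng)
        tau2 = inv_gamma((p + 1) / 2.0, 1.0 / xi + np.sum(beta ** 2 / lam2) / 2.0, rng)
        xi = inv_gamma(1.0, 1.0 + 1.0 / tau2, rng)
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((300, 5))
    beta_true = np.array([2.0, -2.0, 0.0, 0.0, 0.0])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
    draws = horseshoe_logistic_gibbs(X, y, n_iter=600, burn=200, seed=1)
    print("posterior means:", draws.mean(axis=0).round(2))
```

To use another normal scale-mixture prior, you would change only step 3. The augmentation step and the Gaussian update for β stay the same.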
Year: 2022 PMID: 35484507 PMCID: PMC9046716 DOI: 10.1186/s12874-022-01560-6
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.612
Prediction & Variable Selection Performance for Logistic Regression (LR) with Shrinkage Priors
| Priors | BS1 | BS2 | BS3 | BS4 | BS5 | BS6 | BS7 | BS8 | BS9 |
|---|---|---|---|---|---|---|---|---|---|
| Prediction | |||||||||
| N,P, | 1000,10,0.5 | 200,10,0.5 | 400,20,0.5 | 500,50,0.3 | 300,10,0.5 | 100,10,0.5 | 100,130,0.5 | 1000,10,0 | 1000,10,0 |
| (10,10,10,10,5,5,0.1,0.1,0.1,0.1)’ | (10,10,10,10,5,5,0.1,0.1,0.1,0.1)’ | 40% non-zero | (1,1.5,-2,2.5,0,0,0,0,0,0)’ | (5,5,3,0.74,-0.9,0,0,0,0,0)’ | |||||
| Accuracy | |||||||||
| Horseshoe | 0.968(0.012) | 0.958(0.031) | 0.965(0.021) | 0.856(0.032) | 0.843(0.048) | 0.894(0.074) | 0.954(0.053) | 0.909(0.018) | 0.899(0.024) |
| Dirichlet Laplace | 0.968(0.012) | 0.958(0.030) | 0.965(0.020) | 0.839(0.039) | 0.842(0.050) | 0.894(0.075) | 0.938(0.062) | 0.911(0.018) | 0.898(0.024) |
| Double Pareto | 0.968(0.012) | 0.958(0.031) | 0.964(0.021) | 0.834(0.040) | 0.839(0.050) | 0.894(0.073) | 0.940(0.050) | 0.912(0.018) | 0.898(0.024) |
| Sensitivity | |||||||||
| Horseshoe | 0.967(0.016) | 0.965(0.043) | 0.964(0.031) | 0.853(0.047) | 0.840(0.067) | 0.897(0.107) | 0.940(0.076) | 0.909(0.027) | 0.914(0.030) |
| Dirichlet Laplace | 0.967(0.016) | 0.964(0.043) | 0.964(0.031) | 0.840(0.053) | 0.837(0.068) | 0.898(0.106) | 0.944(0.073) | 0.910(0.028) | 0.914(0.030) |
| Double Pareto | 0.967(0.016) | 0.964(0.044) | 0.964(0.030) | 0.835(0.055) | 0.835(0.067) | 0.898(0.105) | 0.946(0.073) | 0.912(0.028) | 0.922(0.029) |
| Specificity | |||||||||
| Horseshoe | 0.969(0.017) | 0.953(0.046) | 0.966(0.031) | 0.859(0.052) | 0.847(0.070) | 0.893(0.094) | 0.943(0.084) | 0.910(0.027) | 0.877(0.038) |
| Dirichlet Laplace | 0.969(0.017) | 0.955(0.046) | 0.967(0.028) | 0.838(0.058) | 0.847(0.070) | 0.890(0.097) | 0.935(0.079) | 0.911(0.026) | 0.875(0.039) |
| Double Pareto | 0.969(0.017) | 0.954(0.045) | 0.964(0.030) | 0.833(0.058) | 0.847(0.070) | 0.893(0.094) | 0.934(0.079) | 0.912(0.027) | 0.863(0.041) |
| Area Under Curve | |||||||||
| Horseshoe | 0.968(0.012) | 0.958(0.031) | 0.965(0.021) | 0.856(0.032) | 0.844(0.049) | 0.898(0.074) | 0.944(0.052) | 0.910(0.018) | 0.896(0.025) |
| Dirichlet Laplace | 0.968(0.012) | 0.958(0.029) | 0.966(0.020) | 0.839(0.039) | 0.842(0.050) | 0.897(0.075) | 0.939(0.063) | 0.911(0.018) | 0.895(0.025) |
| Double Pareto | 0.968(0.012) | 0.958(0.030) | 0.964(0.021) | 0.834(0.040) | 0.849(0.050) | 0.897(0.073) | 0.941(0.051) | 0.912(0.018) | 0.897(0.024) |
| Brier Score | |||||||||
| Horseshoe | 0.023(0.007) | 0.030(0.018) | 0.025(0.013) | 0.103(0.019) | 0.110(0.026) | 0.073(0.042) | 0.046(0.026) | 0.067(0.009) | 0.072(0.012) |
| Dirichlet Laplace | 0.023(0.007) | 0.029(0.016) | 0.025(0.011) | 0.114(0.022) | 0.111(0.026) | 0.073(0.041) | 0.044(0.026) | 0.066(0.009) | 0.072(0.012) |
| Double Pareto | 0.023(0.007) | 0.030(0.017) | 0.025(0.012) | 0.117(0.023) | 0.112(0.026) | 0.073(0.043) | 0.045(0.024) | 0.063(0.010) | 0.072(0.012) |
| Variable Selection | |||||||||
| Accuracy | |||||||||
| Horseshoe | 0.989(0.035) | 0.923(0.072) | 0.922(0.053) | 0.999(0.004) | 0.980(0.043) | 0.827(0.066) | 0.422(0.262) | 0.747(0.056) | 0.868(0.085) |
| Dirichlet Laplace | 0.994(0.024) | 0.920(0.070) | 0.914(0.049) | 0.972(0.024) | 0.981(0.044) | 0.829(0.064) | 0.504(0.275) | 0.758(0.062) | 0.856(0.107) |
| Double Pareto | 0.985(0.039) | 0.927(0.071) | 0.926(0.052) | 0.947(0.034) | 0.977(0.047) | 0.832(0.071) | 0.527(0.274) | 0.820(0.065) | 0.940(0.078) |
| Sensitivity | |||||||||
| Horseshoe | 1.000(0.000) | 0.885(0.113) | 0.860(0.096) | 0.998(0.025) | 0.962(0.090) | 0.662(0.123) | 0.163(0.094) | 0.689(0.072) | 0.820(0.100) |
| Dirichlet Laplace | 1.000(0.000) | 0.868(0.114) | 0.838(0.093) | 0.998(0.025) | 0.978(0.072) | 0.668(0.114) | 0.136(0.096) | 0.704(0.079) | 0.848(0.106) |
| Double Pareto | 1.000(0.000) | 0.882(0.112) | 0.864(0.096) | 1.000(0.000) | 0.978(0.072) | 0.682(0.121) | 0.129(0.096) | 0.785(0.076) | 0.978(0.056) |
| Specificity | |||||||||
| Horseshoe | 0.972(0.086) | 0.980(0.068) | 0.985(0.039) | 0.999(0.004) | 0.992(0.037) | 0.992(0.039) | 0.999(0.004) | 0.980(0.098) | 0.940(0.113) |
| Dirichlet Laplace | 0.985(0.060) | 0.998(0.025) | 0.989(0.031) | 0.969(0.026) | 0.983(0.058) | 0.990(0.044) | 0.999(0.003) | 0.975(0.110) | 0.868(0.165) |
| Double Pareto | 0.962(0.096) | 0.995(0.035) | 0.988(0.033) | 0.942(0.037) | 0.977(0.050) | 0.982(0.058) | 0.999(0.002) | 0.960(0.136) | 0.882(0.165) |
| L1 error | |||||||||
| Horseshoe | 2.358(0.417) | 3.444(3.895) | 2.152(1.194) | 0.057(0.017) | 0.197(0.064) | 1.311(1.776) | 1.977(0.229) | 1.998(0.109) | 1.696(0.053) |
| Dirichlet Laplace | 2.474(0.326) | 2.953(0.388) | 2.148(0.329) | 0.187(0.045) | 0.213(0.073) | 0.644(0.289) | 1.955(0.220) | 1.970(0.068) | 1.689(0.055) |
| Double Pareto | 2.421(0.387) | 2.669(0.601) | 1.997(0.438) | 0.230(0.053) | 0.231(0.076) | 0.938(0.720) | 1.960(0.237) | 1.669(0.077) | 1.479(0.089) |
| L2 error | |||||||||
| Horseshoe | 3.063(0.572) | 4.435(4.931) | 2.971(1.722) | 0.102(0.036) | 0.259(0.091) | 1.942(2.797) | 2.562(0.299) | 2.384(0.128) | 2.128(0.060) |
| Dirichlet Laplace | 3.231(0.446) | 3.849(0.507) | 3.015(0.585) | 0.277(0.081) | 0.276(0.103) | 0.896(0.456) | 2.582(0.293) | 2.349(0.075) | 2.118(0.063) |
| Double Pareto | 3.146(0.529) | 3.469(0.772) | 2.781(0.459) | 0.330(0.092) | 0.293(0.106) | 1.359(1.176) | 2.594(0.333) | 1.980(0.088) | 1.855(0.102) |
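The variable-selection rows compare the predictors a method selects with the truly non-zero coefficients of each simulation scenario. This table does not state the paper's selection rule, so the sketch below assumes a common one: a predictor is selected when its 95% posterior credible interval excludes zero.

It also assumes that "L1 error" and "L2 error" are the per-coefficient mean absolute error and root-mean-squared error of the posterior mean. That reading is consistent with the reported L2 values exceeding the L1 values. For unnormalized ℓ1 and ℓ2 norms of the same error vector this could not happen, since the ℓ2 norm never exceeds the ℓ1 norm.

The helper name is hypothetical.

```python
import numpy as np

def selection_metrics(draws, beta_true, level=0.95):
    """Hypothetical helper. draws: (n_draws, p) posterior samples;
    beta_true: (p,) true coefficients used to simulate the data."""
    beta_true = np.asarray(beta_true, dtype=float)
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    selected = (lo > 0) | (hi < 0)        # interval excludes zero (assumed rule)
    active = beta_true != 0               # truly non-zero coefficients
    tp = np.sum(selected & active)
    tn = np.sum(~selected & ~active)
    fp = np.sum(selected & ~active)
    fn = np.sum(~selected & active)
    err = draws.mean(axis=0) - beta_true  # posterior mean minus truth
    return {
        "accuracy": (tp + tn) / beta_true.size,
        "sensitivity": tp / (tp + fn) if (tp + fn) > 0 else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) > 0 else float("nan"),
        "l1_error": float(np.mean(np.abs(err))),        # mean absolute error (assumed)
        "l2_error": float(np.sqrt(np.mean(err ** 2))),  # root mean squared error (assumed)
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta_true = np.array([3.0, 0.0, -2.0, 0.0])
    # Fake posterior that misses the third coefficient:
    # expected sensitivity 0.5, specificity 1.0, accuracy 0.75.
    draws = np.array([3.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal((4000, 4))
    print(selection_metrics(draws, beta_true))
```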
Prediction Performance & Variable Selection for Multinomial Logistic Regression (MLR) with Shrinkage Priors
| Priors | MS1 | MS2 | MS3 | MS4 | MS5 | MS6 | MS7 |
|---|---|---|---|---|---|---|---|
| Prediction | |||||||
| N,P,J,m | 400,4, 3, (-2,-1,0,1,2)’ | 250,4,3,(-2,-1,0,1,2)’ | 400,10, 3,0 | 1000,30, 3,0 | 500,50, 3,0 | 600,20, 5,0 | 300,400, 3,0 |
| ( | |||||||
| Accuracy | |||||||
| Horseshoe | 0.812(0.048) | 0.812(0.047) | 0.712(0.078) | 0.873(0.022) | 0.816(0.044) | 0.760(0.034) | 0.576(0.071) |
| Dirichlet Laplace | 0.811(0.048) | 0.811(0.050) | 0.722(0.050) | 0.875(0.023) | 0.818(0.043) | 0.759(0.034) | 0.602(0.068) |
| Double Pareto | 0.813(0.048) | 0.812(0.046) | 0.713(0.078) | 0.873(0.021) | 0.816(0.044) | 0.760(0.034) | 0.583(0.069) |
| Misclassification Error | |||||||
| Horseshoe | 0.188(0.048) | 0.188(0.047) | 0.282(0.052) | 0.127(0.022) | 0.184(0.044) | 0.240(0.034) | 0.424(0.071) |
| Dirichlet Laplace | 0.189(0.048) | 0.189(0.050) | 0.278(0.050) | 0.125(0.023) | 0.182(0.043) | 0.241(0.034) | 0.398(0.068) |
| Double Pareto | 0.187(0.048) | 0.188(0.046) | 0.281(0.052) | 0.127(0.021) | 0.184(0.044) | 0.240(0.034) | 0.417(0.069) |
| Cross-Entropy | |||||||
| Horseshoe | 0.487(0.071) | 0.491(0.069) | 0.670(0.073) | 0.327(0.031) | 0.432(0.072) | 0.632(0.060) | 1.557(0.308) |
| Dirichlet Laplace | 0.480(0.089) | 0.488(0.097) | 0.665(0.100) | 0.313(0.056) | 0.620(0.177) | 0.677(0.101) | 4.113(0.969) |
| Double Pareto | 0.488(0.070) | 0.492(0.069) | 0.671(0.072) | 0.330(0.030) | 0.430(0.073) | 0.634(0.059) | 1.963(0.424) |
| AUC | |||||||
| Horseshoe | 0.719(0.060) | 0.731(0.065) | 0.710(0.047) | 0.877(0.029) | 0.824(0.050) | 0.771(0.030) | 0.632(0.066) |
| Dirichlet Laplace | 0.708(0.059) | 0.713(0.072) | 0.716(0.047) | 0.884(0.026) | 0.827(0.048) | 0.767(0.039) | 0.654(0.062) |
| Double Pareto | 0.720(0.059) | 0.730(0.064) | 0.713(0.048) | 0.877(0.028) | 0.825(0.048) | 0.773(0.031) | 0.636(0.065) |
| Variable Selection | |||||||
| Accuracy | |||||||
| Horseshoe | 0.996(0.021) | 0.970(0.057) | 0.901(0.040) | 0.986(0.015) | 0.776(0.033) | 0.754(0.039) | 0.850(0.071) |
| Dirichlet Laplace | 0.969(0.060) | 0.934(0.091) | 0.929(0.055) | 0.954(0.026) | 0.812(0.039) | 0.789(0.048) | 0.880(0.006) |
| Double Pareto | 0.992(0.030) | 0.976(0.052) | 0.903(0.036) | 0.986(0.014) | 0.776(0.034) | 0.749(0.039) | 0.868(0.004) |
| Sensitivity | |||||||
| Horseshoe | 0.998(0.025) | 0.950(0.101) | 0.838(0.056) | 1.000(0.000) | 0.675(0.045) | 0.659(0.057) | 0.052(0.021) |
| Dirichlet Laplace | 0.970(0.082) | 0.920(0.132) | 0.936(0.069) | 1.000(0.000) | 0.849(0.046) | 0.779(0.062) | 0.237(0.033) |
| Double Pareto | 0.998(0.025) | 0.965(0.087) | 0.838(0.053) | 1.000(0.000) | 0.677(0.047) | 0.651(0.055) | 0.122(0.026) |
| Specificity | |||||||
| Horseshoe | 0.995(0.035) | 0.990(0.049) | 0.964(0.056) | 0.978(0.024) | 0.920(0.036) | 0.875(0.051) | 0.000(0.000) |
| Dirichlet Laplace | 0.968(0.084) | 0.948(0.125) | 0.922(0.089) | 0.929(0.040) | 0.759(0.063) | 0.803(0.075) | 0.993(0.004) |
| Double Pareto | 0.988(0.055) | 0.988(0.055) | 0.968(0.053) | 0.979(0.022) | 0.919(0.040) | 0.875(0.049) | 0.999(0.002) |
| L1 error | |||||||
| Horseshoe | 0.255(0.035) | 0.294(0.038) | 0.275(0.024) | 0.404(0.012) | 0.335(0.013) | 0.323(0.019) | 0.006(0.006) |
| Dirichlet Laplace | 0.215(0.077) | 0.293(0.113) | 0.206(0.060) | 0.246(0.047) | 0.550(0.035) | 0.366(0.055) | 0.024(0.024) |
| Double Pareto | 0.260(0.035) | 0.295(0.037) | 0.281(0.024) | 0.414(0.012) | 0.328(0.014) | 0.332(0.018) | 0.011(0.011) |
| L2 error | |||||||
| Horseshoe | 0.313(0.035) | 0.358(0.037) | 0.354(0.023) | 0.582(0.009) | 0.430(0.013) | 0.409(0.018) | 0.007(0.007) |
| Dirichlet Laplace | 0.270(0.105) | 0.366(0.149) | 0.256(0.072) | 0.313(0.064) | 0.687(0.044) | 0.464(0.067) | 0.025(0.025) |
| Double Pareto | 0.322(0.035) | 0.363(0.037) | 0.364(0.025) | 0.601(0.010) | 0.419(0.015) | 0.420(0.017) | 0.013(0.013) |
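The multinomial prediction measures can be sketched from a matrix of predicted class probabilities. The sketch makes three assumptions:

- The predicted class is the most probable one.
- Misclassification error is one minus accuracy.
- Cross-entropy is the average negative natural-log probability assigned to the observed class.

The paper's exact definitions (for example the log base, or how its multiclass AUC is averaged) are not given in this table. Treat this as illustrative; the AUC row is omitted here for that reason. The function name and the `eps` clipping are hypothetical choices.

```python
import numpy as np

def multiclass_metrics(probs, y, eps=1e-15):
    """probs: (n, J) predicted class probabilities with rows summing to 1;
    y: (n,) observed labels coded 0..J-1."""
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y)
    pred = probs.argmax(axis=1)                        # most probable class
    accuracy = float(np.mean(pred == y))
    # Probability assigned to each observed class, clipped to avoid log(0).
    p_true = np.clip(probs[np.arange(y.size), y], eps, 1.0)
    return {
        "accuracy": accuracy,
        "misclassification_error": 1.0 - accuracy,
        "cross_entropy": float(-np.mean(np.log(p_true))),  # natural log (assumed)
    }

if __name__ == "__main__":
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
    print(multiclass_metrics(probs, [0, 1, 2, 1]))
```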
Fig. 1(a) Variable Selection and (b) Prediction Performance in LR with Shrinkage Priors across Simulation Scenarios
Fig. 2(a) Variable Selection and (b) Prediction Performance in MLR with Shrinkage Priors across Simulation Scenarios
Data Description
| Data | Availability | N, P | Variables | Outcome |
|---|---|---|---|---|
| Pima Indians Diabetes | R package “mlbench” [ | 768,8 | no. of times pregnant ( | tested positive for diabetes ( |
| Colon | R package ‘HiDimDA’ [ | 62, 2000 | human genes | 40 tumor ( |
| ADNI | Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), R package ADNIMERGE [ | 14712, 113; after pre-processing: 911, 22 | Age; CDRSB_bl: Clinical Dementia Rating Sum of Boxes (score); ADAS11_bl: 11-item Alzheimer’s Disease Assessment Scale-Cognitive Subscale (score); ADAS13_bl: 13-item Alzheimer’s Disease Assessment Scale-Cognitive Subscale (score); MMSE_bl: Mini-Mental State Examination (score); RAVLT_immediate_bl, RAVLT_learning_bl, RAVLT_forgetting_bl, RAVLT_perc_forgetting_bl: Rey’s Auditory Verbal Learning Test (scores for immediate response, learning, forgetting and percentage forgetting); FAQ_bl: Functional Activities Questionnaire (score); APOE4: presence of the APOE4 gene; Hippocampus_bl: volume of the hippocampus; Ventricles_bl: volume of the ventricles; WholeBrain_bl: whole-brain volume; Fusiform_bl: volume of the fusiform gyrus; Entorhinal_bl: volume of the entorhinal cortex; MidTemp_bl: volume of the middle temporal gyrus; ICV: intracranial volume; PTGENDER: participant’s gender; PTETHCAT: participant’s ethnicity; PTRACCAT: participant’s race; PTMARRY: participant’s marital status | LR model: AD ( |
| OASIS | Open Access Series of Imaging Studies (OASIS) [ | 373, 15; after pre-processing: 373, 8 | Visit: number of visits; gender; age; EDUC: education; SES: socioeconomic status as assessed by the Hollingshead Index of Social Position; MMSE: Mini-Mental State Examination score; nWBV: normalized whole-brain volume; ASF: Atlas Scaling Factor | CDR: Clinical Dementia Rating (0 = no dementia, 0.5 = very mild AD, 1 = mild AD, 2 = moderate AD); for the LR model: mild AD or AD ( |
Prediction Performance in Real Life Data for LR
| Measures | Horseshoe | Dirichlet Laplace (DL) | Double Pareto (DP) | Bayesian Lasso (BLasso) | Bayesian Elastic Net (BElastic) | Lasso | Elastic Net (EN) | Ridge | Gradient Boosting | Random Forest | BART |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ADNI | |||||||||||
| Accuracy | 0.842 (0.736) | 0.831 (0.747) | 0.842 (0.747) | 0.907 | 0.852 | 0.907 | 0.907 | 0.902 | 0.918 | 0.923 | 0.907 |
| Sensitivity | 0.810 | 0.796 | 0.810 | 0.964 | 0.810 | 0.964 | 0.964 | 0.993 | 0.971 | 0.985 | 0.964 |
| Specificity | 0.935 | 0.935 | 0.935 | 0.739 | 0.978 | 0.739 | 0.739 | 0.630 | 0.761 | 0.739 | 0.739 |
| AUC | 0.798 (0.880) | 0.789 (0.884) | 0.798 (0.883) | 0.851 | 0.894 | 0.894 | 0.894 | 0.928 | 0.866 | 0.862 | 0.851 |
| Brier Score | 0.107 | 0.109 | 0.110 | 0.093 | 0.113 | 0.093 | 0.093 | 0.098 | 0.082 | 0.077 | 0.093 |
| OASIS | |||||||||||
| Accuracy | 0.733 (0.741) | 0.733 (0.688) | 0.733 (0.741) | 0.827 | 0.800 | 0.827 | 0.840 | 0.787 | 0.760 | 0.760 | 0.773 |
| Sensitivity | 0.818 | 0.818 | 0.818 | 1.000 | 0.977 | 1.000 | 1.000 | 1.000 | 0.841 | 0.841 | 0.864 |
| Specificity | 0.613 | 0.613 | 0.613 | 0.581 | 0.548 | 0.581 | 0.613 | 0.484 | 0.645 | 0.645 | 0.645 |
| AUC | 0.727 (0.764) | 0.727 (0.701) | 0.727 (0.764) | 0.790 | 0.763 | 0.886 | 0.893 | 0.867 | 0.756 | 0.756 | 0.772 |
| Brier Score | 0.129 | 0.130 | 0.129 | 0.173 | 0.126 | 0.173 | 0.160 | 0.213 | 0.240 | 0.240 | 0.227 |
| Pima Indian Diabetes | |||||||||||
| Accuracy | 0.727 | 0.727 | 0.727 | 0.727 | 0.708 | 0.727 | 0.734 | 0.734 | 0.786 | 0.792 | 0.786 |
| Sensitivity | 0.705 | 0.705 | 0.705 | 0.867 | 0.667 | 0.867 | 0.876 | 0.895 | 0.864 | 0.893 | 0.874 |
| Specificity | 0.776 | 0.776 | 0.776 | 0.429 | 0.796 | 0.429 | 0.429 | 0.388 | 0.627 | 0.588 | 0.608 |
| AUC | 0.711 | 0.711 | 0.711 | 0.648 | 0.731 | 0.682 | 0.692 | 0.696 | 0.746 | 0.741 | 0.741 |
| Brier Score | 0.197 | 0.197 | 0.197 | 0.273 | 0.199 | 0.273 | 0.266 | 0.266 | 0.214 | 0.208 | 0.214 |
| Colon | |||||||||||
| Accuracy | 0.846 | 0.769 | 0.769 | 0.923 | 0.769 | 0.923 | 0.923 | 0.923 | 0.846 | 0.846 | 0.692 |
| Sensitivity | 1.000 | 0.750 | 0.750 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.667 | 0.667 | 0.333 |
| Specificity | 0.778 | 0.778 | 0.778 | 0.889 | 0.667 | 0.889 | 0.889 | 0.889 | 0.900 | 0.900 | 0.900 |
| AUC | 0.889 | 0.764 | 0.764 | 0.944 | 0.833 | 0.944 | 0.944 | 0.944 | 0.783 | 0.783 | 0.567 |
| Brier Score | 0.121 | 0.224 | 0.240 | 0.077 | 0.276 | 0.077 | 0.077 | 0.077 | 0.154 | 0.154 | 0.308 |
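The binary prediction measures are reported both for the logistic-regression (LR) simulations and for the real data sets. They can be computed from predicted probabilities as below. Two choices are assumptions: the 0.5 classification threshold, and coding class 1 as the positive class for sensitivity.

- AUC is the Mann-Whitney probability that a randomly chosen positive case scores above a randomly chosen negative one, with ties counted as one half.
- The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome.

The function name is hypothetical.

```python
import numpy as np

def binary_metrics(prob, y, threshold=0.5):
    """prob: predicted P(y = 1); y: observed 0/1 outcomes.
    Class 1 is treated as the positive class (assumption)."""
    prob = np.asarray(prob, dtype=float)
    y = np.asarray(y, dtype=int)
    pred = (prob >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0))
    fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1))
    # AUC as the Mann-Whitney statistic over all positive/negative pairs.
    diff = prob[y == 1][:, None] - prob[y == 0][None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {
        "accuracy": float((tp + tn) / y.size),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "auc": auc,
        "brier": float(np.mean((prob - y) ** 2)),
    }

if __name__ == "__main__":
    print(binary_metrics([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The pairwise AUC is quadratic in the sample size. That is fine for data sets of the size listed here, but a rank-based formula scales better.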
Fig. 3 Circular bar chart comparing prediction metrics among data sets
Fig. 4 ROC surface plot for shrinkage priors in ADNI data