Tjeerd van der Ploeg, Ewout W Steyerberg.
Abstract
BACKGROUND: Genetic comparisons of clinical and environmental Legionella strains form an essential part of outbreak investigations. DNA microarrays often comprise many DNA markers (features). Feature selection and the development of prediction models are particularly challenging in this domain with many variables and comparatively few subjects or data points. We aimed to compare modeling strategies to develop prediction models for classifying infections as clinical or environmental.
Year: 2016 PMID: 26951763 PMCID: PMC4782323 DOI: 10.1186/s13104-016-1945-2
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Fig. 1 Feature selection and model development strategy
Fig. 2 Evaluation of optimism for each strategy
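Top 5 features selected by VARSELRF (random forest variable selection) and SVMRFE (SVM recursive feature elimination), with the number of the 200 bootstrap resamples in which each was selected (in brackets)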
| Technique | Rank 1 [n] | Rank 2 [n] | Rank 3 [n] | Rank 4 [n] | Rank 5 [n] |
|---|---|---|---|---|---|
| VARSELRF | LePn.007B8 [196] | LePn.032E12 [93] | LePn.004E8 [71] | LePn.015B2 [40] | LePn.035C6 [40] |
| SVMRFE | LePn.007B8 [88] | LePn.016E4 [80] | LePn.033H2 [77] | LePn.005H6 [60] | LePn.033D7 [54] |
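The bracketed counts record in how many of the 200 bootstrap resamples each marker was selected. A minimal Python sketch of that bookkeeping follows, assuming a selection routine that receives the row indices of one resample and returns the names of the markers it keeps; `selection_frequencies` and `toy_selector` are hypothetical names, and `toy_selector` merely stands in for VARSELRF, SVMRFE or any other selection method.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def selection_frequencies(
    n_subjects: int,
    select_features: Callable[[List[int]], Sequence[str]],
    n_boot: int = 200,
    seed: int = 0,
) -> Counter:
    """Count how often each feature is selected across bootstrap resamples.

    `select_features` is a hypothetical stand-in for a selection method:
    it gets the row indices of one resample and returns the kept features.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_boot):
        # One bootstrap resample: n_subjects row indices drawn with replacement.
        rows = [rng.randrange(n_subjects) for _ in range(n_subjects)]
        # A feature is counted at most once per resample.
        counts.update(set(select_features(rows)))
    return counts


# Hypothetical toy selector: always keeps one marker, sometimes a second one.
def toy_selector(rows: List[int]) -> List[str]:
    return ["LePn.007B8"] + (["LePn.032E12"] if rows[0] % 2 == 0 else [])


freqs = selection_frequencies(n_subjects=50, select_features=toy_selector)
print(freqs.most_common(5))  # (feature, number of resamples out of 200)
```

Ranking by `Counter.most_common` reproduces the layout of the table: features ordered by how consistently they survive selection across resamples.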
Mean AUC and mean optimism for VARSELRF and SVMRFE
| Technique | Apparent AUC | Bootstrap AUC, mean | Bootstrap AUC, 95 % CI | Validated AUC, mean | Validated AUC, 95 % CI | Optimism, mean | Optimism, 95 % CI |
|---|---|---|---|---|---|---|---|
| VARSELRF | 0.904 | 0.966 | [0.963; 0.969] | 0.966 | [0.963; 0.969] | 0.000 | [−0.004; 0.004] |
| SVMRFE | 0.964 | 0.991 | [0.990; 0.992] | 0.915 | [0.911; 0.919] | 0.076 | [0.072; 0.080] |
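In these tables the reported optimism equals the mean bootstrap AUC minus the mean validated AUC (for SVMRFE, 0.991 − 0.915 = 0.076). That pattern matches a standard bootstrap optimism estimate: refit the model on each resample, score it on that same resample ("bootstrap AUC") and on the original sample ("validated AUC"), and average the difference. The Python sketch below illustrates the procedure under stated assumptions: `fit_centroid` is a hypothetical toy classifier standing in for the paper's models, the apparent AUC is taken as the full-data model scored on the full data (how the paper computed it is not stated in this record), and the 95 % CIs use a normal approximation to the mean, which is also an assumption.

```python
import math
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]  # maps one subject's markers to a score


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive (label 1) outscores a
    random negative (label 0); ties count as one half (Mann-Whitney form)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def summarize(values: List[float]) -> Tuple[float, Tuple[float, float]]:
    """Mean and a normal-approximation 95% CI of the mean (assumed CI method)."""
    m = mean(values)
    half = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, (m - half, m + half)


def bootstrap_optimism(X, y, fit: Callable[[list, list], Model], n_boot=200, seed=0):
    """Apparent AUC plus bootstrap, validated and optimism summaries."""
    full_model = fit(X, y)
    apparent = auc([full_model(x) for x in X], y)  # fitted and tested on all data
    rng = random.Random(seed)
    n = len(y)
    boot_aucs: List[float] = []
    val_aucs: List[float] = []
    while len(boot_aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # AUC is undefined with a single class; draw again
            continue
        model = fit(Xb, yb)
        boot_aucs.append(auc([model(x) for x in Xb], yb))  # tested on its own resample
        val_aucs.append(auc([model(x) for x in X], y))  # tested on the original sample
    optimism = [b - v for b, v in zip(boot_aucs, val_aucs)]
    return {
        "apparent": apparent,
        "bootstrap": summarize(boot_aucs),
        "validated": summarize(val_aucs),
        "optimism": summarize(optimism),
    }


def fit_centroid(Xb, yb) -> Model:
    """Hypothetical toy model: project onto the difference of class means."""
    dims = range(len(Xb[0]))
    mu1 = [mean(x[d] for x, t in zip(Xb, yb) if t == 1) for d in dims]
    mu0 = [mean(x[d] for x, t in zip(Xb, yb) if t == 0) for d in dims]
    w = [a - b for a, b in zip(mu1, mu0)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical toy data: 60 subjects, 20 binary markers, label tracks marker 0.
rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(20)] for _ in range(60)]
y = [x[0] if rng.random() < 0.85 else 1 - x[0] for x in X]
result = bootstrap_optimism(X, y, fit_centroid)
print(result["apparent"], result["optimism"])
```

Because each model is scored on the very resample it was fitted to, the bootstrap AUC tends to flatter it; the gap to the validated AUC is the optimism the tables report.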
Top 5 features selected by CART, RF, SVM and LASSO, with the number of the 200 bootstrap resamples in which each was selected (in brackets)
| Technique | Rank 1 [n] | Rank 2 [n] | Rank 3 [n] | Rank 4 [n] | Rank 5 [n] |
|---|---|---|---|---|---|
| CART | LePn.007B8 [200] | LePn.026A7 [93] | LePn.027A12 [76] | LePn.028A11 [71] | LePn.016E4 [66] |
| RF | LePn.007B8 [200] | LePn.032E12 [168] | LePn.004E8 [151] | LePn.035C6 [141] | LePn.016E4 [100] |
| SVM | LePn.007B8 [144] | LePn.035G3 [111] | LePn.009C5 [105] | LePn.012C5 [97] | LePn.024C3 [89] |
| LASSO | LePn.007B8 [187] | LePn.033H2 [146] | LePn.016E4 [131] | LePn.010B12 [83] | LePn.011B3 [77] |
Mean AUC and mean optimism for CART, RF, SVM and LASSO
| Technique | Apparent AUC | Bootstrap AUC, mean | Bootstrap AUC, 95 % CI | Validated AUC, mean | Validated AUC, 95 % CI | Optimism, mean | Optimism, 95 % CI |
|---|---|---|---|---|---|---|---|
| CART | 0.929 | 0.937 | [0.933; 0.942] | 0.873 | [0.868; 0.878] | 0.064 | [0.060; 0.068] |
| RF | 0.938 | 0.980 | [0.978; 0.981] | 0.975 | [0.973; 0.976] | 0.005 | [0.003; 0.008] |
| SVM | 0.887 | 0.924 | [0.918; 0.930] | 0.859 | [0.852; 0.866] | 0.066 | [0.061; 0.071] |
| LASSO | 0.965 | 0.981 | [0.980; 0.983] | 0.925 | [0.922; 0.928] | 0.056 | [0.053; 0.060] |
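As an arithmetic check on both AUC tables, the mean optimism can be recomputed as mean bootstrap AUC minus mean validated AUC. This relation is inferred from the published numbers rather than stated in this record. Because every input is rounded to three decimals, agreement within about 0.0015 is the most one can expect; SVM, for instance, gives 0.924 − 0.859 = 0.065 against a reported 0.066. The dictionary and function names in this short Python sketch are illustrative only.

```python
# Mean values transcribed from the two AUC tables:
# technique: (bootstrap AUC, validated AUC, reported optimism)
ROWS = {
    "VARSELRF": (0.966, 0.966, 0.000),
    "SVMRFE": (0.991, 0.915, 0.076),
    "CART": (0.937, 0.873, 0.064),
    "RF": (0.980, 0.975, 0.005),
    "SVM": (0.924, 0.859, 0.066),
    "LASSO": (0.981, 0.925, 0.056),
}
# Each value is rounded to 3 decimals, so a recomputed difference
# can deviate from the reported optimism by up to about 0.0015.
TOL = 0.0015


def check_optimism(rows=ROWS, tol=TOL):
    """Return technique -> (recomputed optimism, consistent with reported value)."""
    out = {}
    for name, (boot, val, reported) in rows.items():
        recomputed = boot - val
        out[name] = (round(recomputed, 3), abs(recomputed - reported) <= tol)
    return out


for name, (opt, ok) in check_optimism().items():
    print(f"{name:9s} bootstrap - validated = {opt:.3f}  consistent: {ok}")
```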