Hye-Seung Lee, Jeffrey P Krischer.
Abstract
When prediction is a goal, validation using data outside of the prediction effort is desirable. Typically, the data are split into two parts: one for development and one for validation. This approach becomes less attractive when predicting uncommon events, however, because it substantially reduces power. For predicting uncommon events within a large prospective cohort study, we propose using a nested case-control design as an alternative to the full cohort analysis. Because it includes all cases but only a subset of the non-cases, this design is expected to produce results similar to those of the full cohort analysis. In our framework, variable selection is conducted and a prediction model is fit on the selected variables in the case-control cohort. Then, for validation, the specificity of the fitted model (the proportion of non-cases correctly predicted as negative) in the case-control cohort is compared with its specificity in the rest of the cohort, that is, the non-cases not sampled as controls. In addition, we propose an iterative variable selection procedure that uses random forest imputation for missing data, as well as a strategy for valid classification. Our framework is illustrated with an application featuring high-dimensional variable selection in a large prospective cohort study.
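The validation step described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' code. The helper names `nested_case_control_split` and `specificity` are invented for the example. Controls are drawn at random from all non-cases, which is a simplification: a nested case-control design usually samples controls from each case's risk set at its event time. A fixed threshold rule on one synthetic marker stands in for the penalized-regression model fit after variable selection, and random forest imputation is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical helpers for illustration only; the paper does not publish code.


def nested_case_control_split(
    labels: Sequence[int], controls_per_case: int, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Return (cases, sampled controls, held-out non-cases) as index lists.

    All cases (label 1) are kept. Controls are a random subset of non-cases
    (label 0), controls_per_case per case. Simplification (assumption):
    controls are not matched to each case's risk set at its event time.
    """
    rng = random.Random(seed)
    cases = [i for i, y in enumerate(labels) if y == 1]
    noncases = [i for i, y in enumerate(labels) if y == 0]
    n_controls = min(len(noncases), controls_per_case * len(cases))
    controls = sorted(rng.sample(noncases, n_controls))
    sampled = set(controls)
    # Every non-case not sampled as a control is held out for validation.
    held_out = [i for i in noncases if i not in sampled]
    return cases, controls, held_out


def specificity(
    predict: Callable[[float], int], x: Sequence[float], noncase_idx: Sequence[int]
) -> float:
    """Fraction of the given non-cases that the model predicts as negative (0)."""
    if not noncase_idx:
        raise ValueError("need at least one non-case")
    true_negatives = sum(1 for i in noncase_idx if predict(x[i]) == 0)
    return true_negatives / len(noncase_idx)


def threshold_rule(value: float) -> int:
    """Hypothetical stand-in for a fitted prediction model."""
    return 1 if value > 1.0 else 0


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic cohort (fabricated for illustration): rare outcome (~2%),
    # with cases shifted upward on a single marker.
    n = 10_000
    labels = [1 if rng.random() < 0.02 else 0 for _ in range(n)]
    x = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]

    cases, controls, held_out = nested_case_control_split(labels, controls_per_case=4)
    spec_cc = specificity(threshold_rule, x, controls)
    spec_rest = specificity(threshold_rule, x, held_out)
    print(f"cases={len(cases)} controls={len(controls)} held-out={len(held_out)}")
    print(f"specificity, case-control controls: {spec_cc:.3f}")
    print(f"specificity, remaining non-cases:   {spec_rest:.3f}")
```

If the case-control sample behaves like the full cohort, the two specificity estimates should be close; a large gap would suggest the fitted model does not carry over from the sampled controls to the rest of the cohort.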
Keywords: High dimensional variable selection; Nested case-control; Penalized regression; Random forest imputation; Validation
Year: 2017 PMID: 29075164 PMCID: PMC5654558 DOI: 10.3233/MAS-170397
Source DB: PubMed Journal: Model Assist Stat Appl ISSN: 1574-1699