D M Hawkins, J J Kraker, S C Basak, D Mills.
Abstract
Traditionally, QSAR and QSPR models have been fitted by splitting the available compounds into separate learning and validation sets. The model is then fitted to the learning set and assessed using the validation set. Cross-validation (CV) uses all available compounds for both purposes, so that the full body of available information is brought to bear on both the learning and the validation portions of the study. The price paid for this additional information is a substantially greater computational load. A common mistake in using CV is to omit some of the repetitive computations. This mistake leads to substantial bias in the assessment. A hydroxyl radical reaction rate dataset is used to illustrate the superiority of CV and the pitfalls from its improper execution when modeling using nearest neighbors, paralleling behavior in the well-studied linear model setting.
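To make the pitfall concrete, the following is a minimal sketch, not the authors' code. It assumes that the omitted "repetitive computation" is descriptor selection, which the abstract does not specify, and it uses synthetic pure-noise data rather than the hydroxyl radical dataset. All function names (`select_features`, `knn_predict`, `loo_q2`) and parameter values are hypothetical. The sketch compares leave-one-out CV for a nearest-neighbor model when descriptors are chosen once from all compounds (the shortcut) against reselecting them inside every fold (the full procedure).

```python
import random
import statistics


def abs_corr(xs, ys):
    """Absolute Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_features(X, y, rows, n_keep):
    """Pick the n_keep columns most correlated with y, using only `rows`."""
    ys = [y[i] for i in rows]
    scores = [(abs_corr([X[i][j] for i in rows], ys), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def knn_predict(X, y, train_rows, query, cols, k):
    """Mean response of the k training rows nearest to `query` on `cols`."""
    def sq_dist(i):
        return sum((X[i][j] - X[query][j]) ** 2 for j in cols)

    nearest = sorted(train_rows, key=sq_dist)[:k]
    return statistics.fmean(y[i] for i in nearest)


def loo_q2(X, y, n_keep, k, reselect):
    """Leave-one-out q^2; reselect=True repeats selection in every fold."""
    rows = list(range(len(y)))
    # Shortcut: descriptors chosen once, with the held-out compound included.
    fixed_cols = select_features(X, y, rows, n_keep)
    press = 0.0
    for i in rows:
        train = [r for r in rows if r != i]
        cols = select_features(X, y, train, n_keep) if reselect else fixed_cols
        press += (y[i] - knn_predict(X, y, train, i, cols, k)) ** 2
    ybar = statistics.fmean(y)
    return 1.0 - press / sum((v - ybar) ** 2 for v in y)


# Synthetic data (an assumption): the response is unrelated to every descriptor.
rng = random.Random(0)
n_compounds, n_descriptors = 30, 500
X = [[rng.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n_compounds)]
y = [rng.gauss(0, 1) for _ in range(n_compounds)]

q2_shortcut = loo_q2(X, y, n_keep=5, k=3, reselect=False)
q2_full = loo_q2(X, y, n_keep=5, k=3, reselect=True)
print(f"q2 with selection outside the CV loop: {q2_shortcut:.2f}")
print(f"q2 with selection inside the CV loop:  {q2_full:.2f}")
```

Because the response here is noise, no honest assessment should report predictive ability. The shortcut version would be expected to report the higher q², since the held-out compound has already influenced which descriptors are used. Reselecting inside each fold costs roughly one extra selection pass per compound, which illustrates the heavier computational load the abstract mentions.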
Year: 2008 PMID: 18853300 DOI: 10.1080/10629360802349058
Source DB: PubMed Journal: SAR QSAR Environ Res ISSN: 1026-776X Impact factor: 3.000