Helene Schulerud, Fritz Albregtsen.
Abstract
We address the problems of feature selection and error estimation when the number of candidate features is large and the number of training samples is limited. A Monte Carlo study was performed to illustrate the problems that arise when stepwise feature selection is combined with discriminant analysis. The simulations demonstrate that the ratio of training samples to feature candidates needed to find the correct features is not a constant: it depends on the number of feature candidates, the number of training samples, and the Mahalanobis distance between the classes. Moreover, the leave-one-out error estimate may be highly biased when feature selection is performed on the same data as the error estimation. It may even indicate complete separation of the classes when no real difference between them exists. However, if feature selection and leave-one-out error estimation are performed in one process, an unbiased error estimate is achieved, but with high variance. The holdout error estimate gives a reliable, low-variance estimate, with the variance depending on the size of the test set.
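The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental setup. It makes several simplifying assumptions: univariate ranking by class-mean difference stands in for stepwise selection, a nearest-centroid rule stands in for discriminant analysis, and all function names and parameter values (10 samples per class, 200 candidate features, 5 selected) are hypothetical. Both classes are drawn from the same distribution, so any apparent separation is an artifact of the estimation procedure.

```python
import random


def make_null_data(n_per_class, n_features, rng):
    """Two classes drawn from the same N(0, 1) distribution: no real difference."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


def class_means(X, y, features):
    """Per-class mean of each listed feature, in the order given."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    return means


def select_features(X, y, k):
    """Assumed stand-in for stepwise selection: keep the k features
    with the largest absolute difference between class means."""
    all_features = list(range(len(X[0])))
    m = class_means(X, y, all_features)
    scores = [abs(m[0][f] - m[1][f]) for f in all_features]
    return sorted(all_features, key=lambda f: scores[f], reverse=True)[:k]


def predict(x, X_train, y_train, features):
    """Assumed stand-in for discriminant analysis: nearest class centroid."""
    m = class_means(X_train, y_train, features)
    dist = {label: sum((x[f] - c) ** 2 for f, c in zip(features, m[label]))
            for label in (0, 1)}
    return 0 if dist[0] <= dist[1] else 1


def loo_error(X, y, k, select_inside):
    """Leave-one-out error. If select_inside is False, features are chosen
    once on all samples (including the one later left out); if True,
    selection is repeated on each training fold."""
    fixed = None if select_inside else select_features(X, y, k)
    errors = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_features(X_tr, y_tr, k) if select_inside else fixed
        errors += predict(X[i], X_tr, y_tr, feats) != y[i]
    return errors / len(X)


def run_simulation(n_reps=5, n_per_class=10, n_features=200, k=5, seed=0):
    """Average both leave-one-out estimates over a few simulated data sets."""
    rng = random.Random(seed)
    outside, inside = [], []
    for _ in range(n_reps):
        X, y = make_null_data(n_per_class, n_features, rng)
        outside.append(loo_error(X, y, k, select_inside=False))
        inside.append(loo_error(X, y, k, select_inside=True))
    return sum(outside) / n_reps, sum(inside) / n_reps


if __name__ == "__main__":
    outside, inside = run_simulation()
    print(f"selection outside LOO loop: {outside:.2f}")
    print(f"selection inside LOO loop:  {inside:.2f}")
```

Because the classes are identical by construction, the true error rate of any classifier is 50%. Selecting features before the leave-one-out loop should give an optimistically low estimate, while repeating the selection inside each fold should give an estimate near chance, though with noticeable run-to-run variation.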
Year: 2004 PMID: 14757253 DOI: 10.1016/s0169-2607(03)00018-x
Source DB: PubMed Journal: Comput Methods Programs Biomed ISSN: 0169-2607 Impact factor: 5.428