Chaeryon Kang, Hao Zhu, Fred A Wright, Fei Zou, Michael R Kosorok.
Abstract
We introduce the Interactive Decision Committee method for classification when high-dimensional feature variables are grouped into feature categories. The proposed method uses the interactive relationships among feature categories to build base classifiers, which are then combined using decision committees. Either a two-stage or a single-stage 5-fold cross-validation procedure is used to choose the total number of base classifiers to combine. The procedure is useful for classifying biochemicals on the basis of toxicity activity, where the feature space consists of chemical descriptors and the responses are binary indicators of toxicity activity. Each descriptor belongs to at least one descriptor category. Support vector machines, random forests, and tree-based AdaBoost are used as classifier inducers. Given the number of base classifiers, forward selection is used to choose the best combination of base classifiers. Simulation studies demonstrate that the proposed method outperforms a single large, unaggregated classifier when interactive feature category information is present. We applied the proposed method to two toxicity data sets associated with chemical compounds. For these data sets, the proposed method improved classification performance for the majority of outcomes compared with a single large, unaggregated classifier.
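The forward-selection step described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the committee combines members by unweighted majority vote (ties go to class 1), the selection criterion is plain accuracy, and the committee size is passed in rather than chosen by 5-fold cross-validation. The function names and the toy predictions are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: majority vote and accuracy are assumed here,
# not confirmed as the paper's combination rule or selection criterion.


def committee_predict(base_preds, members):
    """Majority vote over the selected base classifiers; ties go to class 1."""
    n_samples = len(base_preds[members[0]])
    votes = []
    for i in range(n_samples):
        ones = sum(base_preds[m][i] for m in members)
        votes.append(1 if 2 * ones >= len(members) else 0)
    return votes


def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def forward_select(base_preds, truth, size):
    """Greedily add the base classifier that gives the best committee accuracy."""
    chosen = []
    remaining = sorted(base_preds)  # sorted so that ties break deterministically
    while len(chosen) < size and remaining:
        best = max(
            remaining,
            key=lambda name: accuracy(
                committee_predict(base_preds, chosen + [name]), truth
            ),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data, invented for illustration: binary labels and the held-out
# predictions of four hypothetical base classifiers (one per category
# or category combination).
truth = [1, 0, 1, 1, 0, 0]
base_preds = {
    "A": [1, 0, 1, 0, 0, 1],
    "B": [1, 1, 1, 1, 0, 0],
    "C": [0, 0, 1, 1, 0, 0],
    "AxB": [1, 0, 0, 1, 1, 0],
}

committee = forward_select(base_preds, truth, size=3)
committee_accuracy = accuracy(committee_predict(base_preds, committee), truth)
```

In practice the predictions passed to `forward_select` would come from held-out folds, and the `size` argument would be set by comparing cross-validated performance across candidate committee sizes.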
Keywords: Chemical toxicity; Decision committee method; Ensemble; Ensemble feature selection; QSAR modeling; Statistical learning
Year: 2012; PMID: 24415822; PMCID: PMC3887560
Source DB: PubMed; Journal: J Stat Res; ISSN: 0256-422X