MOTIVATION: Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for creating powerful and reliable cancer diagnostic models from microarray data. To keep a realistic perspective on clinical applications, we focus on multicategory diagnosis. To equip the system with the optimal combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs, using 11 datasets spanning 74 diagnostic categories, 41 cancer types and 12 normal tissue types.
RESULTS: Multicategory support vector machines (MC-SVMs) are the most effective classifiers for accurate cancer diagnosis from gene expression data. The MC-SVM methods of Crammer and Singer and of Weston and Watkins, together with one-versus-rest, were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation neural networks and probabilistic neural networks, often by a wide margin. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and non-SVM learning algorithms. Ensemble classifiers generally do not improve on the performance of the best non-ensemble models. These results guided the construction of GEMS (Gene Expression Model Selector), a software system that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets.
AVAILABILITY: GEMS is available for download from http://www.gems-system.org for non-commercial use.
CONTACT: alexander.statnikov@vanderbilt.edu.
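The abstract names one-versus-rest as one of the best MC-SVM schemes and states that GEMS enforces sound optimization and performance estimation, but gives no implementation details. The Python sketch below illustrates both ideas under explicit assumptions: each binary SVM is linear and trained with Pegasos-style stochastic subgradient descent (a solver chosen here for brevity, not necessarily the one used in the study), the regularization parameter `lam` is chosen by nested cross-validation (an inner loop tunes, an outer loop estimates accuracy), folds are not stratified, and all function names and the toy data are hypothetical. It is not the GEMS implementation.

```python
# Hypothetical sketch, not GEMS code: one-versus-rest linear SVMs (Pegasos-style
# solver, an assumption) with lam chosen by nested cross-validation.
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def take(seq, idx):
    """Elements of seq at the positions listed in idx."""
    return [seq[i] for i in idx]


def train_binary_svm(X, y, lam, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2 penalty lam) fitted by Pegasos-style
    stochastic subgradient descent; labels in y must be +1 or -1.
    A constant 1.0 feature is appended so the bias is learned (and penalized)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)                  # Pegasos step size
            margin = t * dot(w, x)                    # margin before the update
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 penalty term
            if margin < 1.0:                          # hinge active: move toward x
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def train_ovr(X, y, lam):
    """One-versus-rest: one binary SVM per class (that class = +1, rest = -1)."""
    return {c: train_binary_svm(X, [1 if t == c else -1 for t in y], lam)
            for c in sorted(set(y))}


def predict_ovr(models, x):
    """Assign x to the class whose binary SVM outputs the largest score."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))


def accuracy(models, X, y):
    """Fraction of samples in (X, y) that predict_ovr labels correctly."""
    return sum(predict_ovr(models, x) == t for x, t in zip(X, y)) / len(y)


def k_folds(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(X, y, lams, outer_k=5, inner_k=3):
    """Outer loop estimates accuracy; the inner loop, run only on each outer
    training split, chooses lam, so no outer test fold influences tuning."""
    scores = []
    for test_idx in k_folds(len(X), outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        Xtr, ytr = take(X, train_idx), take(y, train_idx)

        def inner_accuracy(lam):
            # Mean validation accuracy of lam over inner folds of this training split.
            total = 0.0
            for val_idx in k_folds(len(Xtr), inner_k, seed=1):
                val = set(val_idx)
                fit_idx = [i for i in range(len(Xtr)) if i not in val]
                m = train_ovr(take(Xtr, fit_idx), take(ytr, fit_idx), lam)
                total += accuracy(m, take(Xtr, val_idx), take(ytr, val_idx))
            return total / inner_k

        best_lam = max(lams, key=inner_accuracy)
        final = train_ovr(Xtr, ytr, best_lam)  # refit on the full training split
        scores.append(accuracy(final, take(X, test_idx), take(y, test_idx)))
    return sum(scores) / outer_k


if __name__ == "__main__":
    # Synthetic toy data: three well-separated 2-D clusters (not microarray data).
    rng = random.Random(42)
    X, y = [], []
    for label, (cx, cy) in {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}.items():
        for _ in range(20):
            X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            y.append(label)
    print("nested CV accuracy:", nested_cv(X, y, lams=[0.01, 0.1, 1.0]))
```

Keeping the tuning loop inside each outer training split is the point of this design: choosing `lam` on the same folds used to report accuracy would give an optimistically biased performance estimate.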