Adam B Olshen (1), Ajay N Jain. (1) Comprehensive Cancer Center, Cancer Research Institute, and Department of Laboratory Medicine, University of California, San Francisco, CA 94143-0128, USA.
Abstract
MOTIVATION: The last few years have seen the development of DNA microarray technology that allows simultaneous measurement of the expression levels of thousands of genes. While many methods have been developed to analyze such data, most have been visualization-based. Methods that yield quantitative conclusions have been diverse and complex.

RESULTS: We present two straightforward methods for identifying specific genes whose expression is linked with a phenotype or outcome variable as well as for systematically predicting sample class membership: (1) a conservative, permutation-based approach to identifying differentially expressed genes; (2) an augmentation of K-nearest-neighbor pattern classification. Our analyses replicate the quantitative conclusions of Golub et al. (1999; Science, 286, 531-537) on leukemia data, with better classification results, using far simpler methods. With the breast tumor data of Perou et al. (2000; Nature, 406, 747-752), the methods lend rigorous quantitative support to the conclusions of the original paper. In the case of the lymphoma data in Alizadeh et al. (2000; Nature, 403, 503-511), our analyses only partially support the conclusions of the original authors.

AVAILABILITY: The software and supplementary information are available freely to researchers at academic and non-profit institutions at http://cc.ucsf.edu/jain/public
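The abstract names the two techniques but does not give their details, so the following is only a rough illustration of the general ideas, not the authors' software. It assumes a max-statistic label-permutation test as one common way to make gene selection conservative, and it uses plain majority-vote k-nearest-neighbor classification without the augmentation mentioned in the abstract. The function names, the mean-difference statistic, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def mean_diff(values, labels):
    """Absolute difference in mean expression between class 1 and class 0."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(a) - mean(b))


def max_stat_permutation_pvalues(expr, labels, n_perm=1000, seed=0):
    """Assumed scheme (not taken from the paper): compare each gene's observed
    statistic with the largest statistic seen over ALL genes under shuffled labels.

    expr maps a gene name to a list of expression values, one per sample.
    Returns a dict mapping each gene to an adjusted p-value.
    """
    rng = random.Random(seed)
    observed = {g: mean_diff(v, labels) for g, v in expr.items()}
    shuffled = list(labels)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between expression and class
        max_null.append(max(mean_diff(v, shuffled) for v in expr.values()))
    # The +1 terms count the observed labeling itself as one permutation.
    return {
        g: (1 + sum(m >= s for m in max_null)) / (n_perm + 1)
        for g, s in observed.items()
    }


def knn_predict(train, train_labels, query, k=3):
    """Plain majority-vote k-NN with squared Euclidean distance."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train, train_labels)
    )
    votes = [y for _, y in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy data (made up): geneA separates the classes, geneB does not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "geneA": [5.0, 5.2, 4.9, 5.1, 1.0, 1.1, 0.9, 1.2],
    "geneB": [2.0, 2.1, 1.9, 2.2, 2.05, 1.95, 2.1, 2.0],
}
pvals = max_stat_permutation_pvalues(expr, labels)

# Samples as (geneA, geneB) vectors for classification.
train = [[5.0, 2.0], [5.2, 2.1], [1.0, 2.05], [1.1, 1.95]]
train_labels = [1, 1, 0, 0]
predicted = knn_predict(train, train_labels, [4.8, 2.0], k=3)
```

Taking the maximum over all genes in each permutation controls the chance of any false positive across the whole gene set, which is why this variant is conservative. Whether the paper uses this exact adjustment is not stated in the abstract.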