Jennifer M Franks¹, Guoshuai Cai², Michael L Whitfield¹,³
¹ Department of Molecular and Systems Biology.
² Department of Environmental Health Sciences, Arnold School of Public Health, University of South Carolina, Columbia, SC, 29208, USA.
³ Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, 03756, USA.
Abstract
Motivation: Molecular subtypes of cancers and autoimmune diseases, defined by transcriptomic profiling, have provided insight into disease pathogenesis, molecular heterogeneity and therapeutic responses. However, technical biases inherent to different gene expression profiling platforms present a particular challenge when analyzing data generated in different studies. There is currently a lack of effective methods designed to eliminate platform-based bias. We present a method to normalize RNA-seq data and classify it with machine learning classifiers trained on DNA microarray data, applied to molecular subtypes in two datasets: breast invasive carcinoma (BRCA) and colorectal cancer (CRC).

Results: Multiple analyses show that feature specific quantile normalization (FSQN) successfully removes platform-based bias from RNA-seq data, regardless of feature scaling or machine learning algorithm. Using RNA-seq data normalized with FSQN and a support vector machine trained exclusively on DNA microarray data, we achieve up to 98% accuracy for BRCA and 97% accuracy for CRC in assigning molecular subtypes. Maximum accuracy was achieved when normalizing RNA-seq datasets containing at least 25 samples. FSQN allows RNA-seq data to be compared with existing DNA microarray datasets. Using these techniques, we can leverage information from existing gene expression data in new analyses despite differences in the platforms used for gene expression profiling.

Availability and implementation: FSQN has been submitted as an R package to CRAN. All code used for this study is available on GitHub (https://github.com/jenniferfranks/FSQN).

Contact: michael.l.whitfield@dartmouth.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
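The abstract names feature specific quantile normalization but does not spell out the mechanics. The following is a minimal sketch under an assumed reading of the name: each gene (feature) in the RNA-seq matrix is quantile-normalized against the distribution of that same gene in the microarray training data. As a result, a classifier trained on microarray values receives RNA-seq inputs on the same per-gene scale. The names `fsqn` and `_target_quantile` are hypothetical and are not the API of the FSQN R package. Samples are assumed to be rows and features columns. Ties are broken by row order, and target quantiles are linearly interpolated when the two datasets have different sample counts. Both of these are choices made for this sketch only.

```python
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = features (genes)


def _target_quantile(sorted_target: List[float], rank: int, n: int) -> float:
    """Hypothetical helper: value of the sorted target distribution at the same
    relative position that `rank` occupies among `n` query values, linearly
    interpolating between neighbouring target values."""
    m = len(sorted_target)
    # Map rank 0..n-1 onto target index 0..m-1 (a single query value maps to the middle).
    pos = rank * (m - 1) / (n - 1) if n > 1 else (m - 1) / 2
    lo = int(pos)
    hi = min(lo + 1, m - 1)
    frac = pos - lo
    return sorted_target[lo] * (1 - frac) + sorted_target[hi] * frac


def fsqn(query: Matrix, target: Matrix) -> Matrix:
    """Sketch of feature-wise quantile normalization: for each feature column,
    replace the query (e.g. RNA-seq) values by target (e.g. microarray) quantiles
    of that same feature, preserving the query samples' rank order."""
    n_features = len(target[0])
    if any(len(row) != n_features for row in query + target):
        raise ValueError("query and target must share the same feature columns")
    n = len(query)
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        # Reference distribution for feature j, taken from the target samples.
        sorted_target = sorted(row[j] for row in target)
        # Query samples ordered by their value for feature j (ties keep row order).
        order = sorted(range(n), key=lambda i: query[i][j])
        for rank, i in enumerate(order):
            out[i][j] = _target_quantile(sorted_target, rank, n)
    return out


if __name__ == "__main__":
    microarray = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    rnaseq = [[500.0, 7.0], [100.0, 9.0], [300.0, 8.0]]
    print(fsqn(rnaseq, microarray))  # [[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]]
```

Only the within-feature ranks of the RNA-seq samples are used, so a small RNA-seq dataset is mapped onto just a few target quantiles. This may help explain why the abstract reports the highest accuracy with at least 25 samples.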