S Ballouz, W Verleyen, J Gillis. Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, 500 Sunnyside Boulevard, Woodbury, NY 11797, USA.
Abstract
MOTIVATION: RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. We assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks.

RESULTS: We examine RNA-seq co-expression data generated from 1970 RNA-seq samples using a Guilt-By-Association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. Minimal experimental criteria to obtain performance on par with microarrays were >20 samples with read depth >10 M per sample. While the aggregate network constructed shows good performance (area under the receiver operator characteristic curve ∼0.71), the dependency on number of experiments used is nearly identical to that present in microarrays, suggesting thousands of samples are required to obtain 'gold-standard' co-expression. We find a major topological difference between RNA-seq and microarray co-expression in the form of low overlaps between hub-like genes from each network due to changes in the correlation of expression noise within each technology.

CONTACT: jgillis@cshl.edu or sballouz@cshl.edu

SUPPLEMENTARY INFORMATION: Networks are available at: http://gillislab.labsites.cshl.edu/supplements/rna-seq-networks/ and supplementary data are available at Bioinformatics online.
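To make the Guilt-By-Association idea concrete, the following is a minimal sketch of one common way such an assessment can be set up: neighbor voting on a correlation network, scored by the area under the ROC curve. The abstract does not specify the algorithm, so everything here is an assumption for illustration: the use of absolute Pearson correlation as the network weight, the three-fold hold-out, the function names (`coexpression_network`, `neighbor_voting_auroc`, `rank_auroc`), and the synthetic data.

```python
import numpy as np


def rank_auroc(scores, labels):
    """AUROC from the Mann-Whitney U statistic, with average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def coexpression_network(expr):
    """Gene-by-gene weights from a genes x samples matrix.

    Assumption: absolute Pearson correlation, self-connections removed.
    """
    net = np.abs(np.corrcoef(expr))
    np.fill_diagonal(net, 0.0)
    return net


def neighbor_voting_auroc(net, in_set, n_folds=3, seed=0):
    """Mean hold-out AUROC for recovering members of one gene set."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    members = rng.permutation(np.flatnonzero(in_set))
    aurocs = []
    for held_out in np.array_split(members, n_folds):
        train = in_set.copy()
        train[held_out] = False  # hide the labels of the held-out genes
        # Score: share of each gene's total connection weight that goes
        # to genes still labeled as set members.
        scores = net[:, train].sum(axis=1) / net.sum(axis=1)
        candidates = ~train  # evaluate only genes not used for voting
        aurocs.append(rank_auroc(scores[candidates], in_set[candidates]))
    return float(np.mean(aurocs))


# Synthetic example (hypothetical data): 40 genes, 30 samples, where the
# first 10 genes share a latent signal and form the "functional" set.
rng = np.random.default_rng(1)
signal = rng.normal(size=30)
expr = rng.normal(size=(40, 30))
expr[:10] = signal + 0.3 * rng.normal(size=(10, 30))
in_set = np.zeros(40, dtype=bool)
in_set[:10] = True

auroc = neighbor_voting_auroc(coexpression_network(expr), in_set)
print(f"hold-out AUROC: {auroc:.2f}")
```

In this toy setting the gene set is strongly co-expressed by construction, so the score is far higher than would be expected from real annotations; the value of ∼0.71 reported above comes from the authors' data and cannot be reproduced by this sketch.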