Princy Parsana, Claire Ruberman, Andrew E Jaffe, Michael C Schatz, Alexis Battle, Jeffrey T Leek.
Abstract
Gene co-expression networks capture biological relationships between genes and are important tools in predicting gene function and understanding disease mechanisms. We show that technical and biological artifacts in gene expression data confound commonly used network reconstruction algorithms. We demonstrate theoretically, in simulation, and empirically, that principal component correction of gene expression measurements prior to network inference can reduce false discoveries. Using data from the GTEx project in multiple tissues, we show that this approach reduces false discoveries beyond correcting only for known confounders.
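As a rough illustration of the principal component correction described in the abstract, the following Python sketch regresses the top principal components out of a samples-by-genes expression matrix before any network inference. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `remove_top_pcs` and `mean_abs_offdiag_corr`, the simulated data, and the choice of `n_pcs=1` are all hypothetical, and the abstract does not say how the number of components is chosen.

```python
import numpy as np


def remove_top_pcs(expr, n_pcs):
    """Regress the top n_pcs principal components out of expr (samples x genes).

    Hypothetical helper: n_pcs is assumed to be supplied by the caller.
    """
    # Center each gene so that the SVD corresponds to PCA.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Columns of u are orthonormal sample-level PC directions.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    top = u[:, :n_pcs]
    # Keep only the part of each gene that is orthogonal to the top PCs.
    return centered - top @ (top.T @ centered)


def mean_abs_offdiag_corr(x):
    """Mean absolute gene-gene correlation, excluding the diagonal."""
    corr = np.corrcoef(x, rowvar=False)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[off_diagonal]).mean()


# Toy data (assumed): independent genes plus one shared confounder.
rng = np.random.default_rng(0)
n_samples, n_genes = 200, 30
confounder = rng.normal(size=(n_samples, 1))
loadings = rng.normal(size=(1, n_genes))
noise = rng.normal(size=(n_samples, n_genes))
expr = 3.0 * confounder @ loadings + noise

corrected = remove_top_pcs(expr, n_pcs=1)

print("mean |corr| before correction:", mean_abs_offdiag_corr(expr))
print("mean |corr| after correction: ", mean_abs_offdiag_corr(corrected))
```

In this toy setting the genes are independent apart from the shared confounder, so any strong gene-gene correlation in `expr` is spurious. The corrected matrix would then be passed to a network inference method in place of the raw data.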
Year: 2019 PMID: 31097038 PMCID: PMC6521369 DOI: 10.1186/s13059-019-1700-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Toy simulation example. (a–f) This toy simulation shows that the reconstruction of gene co-expression networks is affected by confounders. (g–i) The true underlying network structure can be reconstructed after principal component correction of the gene expression data, as described in the paper
Fig. 2 False discovery rate (FDR) of WGCNA modules and graphical lasso networks based on canonical pathways (a–f). Networks inferred from PC-corrected data are sparser (g–l). a–c FDR of WGCNA networks obtained at varying cut heights. Each point corresponds to the FDR of the network obtained at a specific cut height. Each color represents networks reconstructed with a specific correction approach. d–f Each point corresponds to the FDR of the network obtained at a specific L1 penalty parameter value (lambda) in the graphical lasso. Each color represents networks reconstructed with a specific correction approach: uncorrected, multi-covariate, RIN, and PC corrected. g–i Each point corresponds to the number of edges in the network inferred by WGCNA at a specific cut height. j–l Each point corresponds to the number of edges in the network inferred by graphical lasso at a specific L1 penalty parameter value. Networks inferred from PC-corrected data have fewer edges than those inferred from uncorrected or RIN-corrected data
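To make the FDR in Fig. 2 concrete, the sketch below computes the fraction of inferred edges that are absent from a reference edge set. This is only an assumed, simplified definition: the function name `edge_fdr` and the gene labels are hypothetical, and the paper's exact procedure for deriving reference edges from canonical pathways may differ.

```python
def edge_fdr(inferred_edges, reference_edges):
    """Fraction of inferred edges that do not appear in the reference set.

    Edges are treated as undirected, so ("A", "B") equals ("B", "A").
    """
    inferred = {frozenset(edge) for edge in inferred_edges}
    reference = {frozenset(edge) for edge in reference_edges}
    if not inferred:
        # No discoveries, so no false discoveries (a convention assumed here).
        return 0.0
    false_discoveries = inferred - reference
    return len(false_discoveries) / len(inferred)


# Hypothetical gene labels: two of the four inferred edges are in the reference.
inferred = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
reference = [("B", "A"), ("C", "D")]
print(edge_fdr(inferred, reference))  # 0.5
```

Evaluating such a function across a range of cut heights or L1 penalty values would give one FDR value per inferred network, which is the kind of curve the figure panels describe.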