Identifying correlations driven by influential observations in large datasets.

Kevin Bu¹, David S Wallach¹, Zach Wilson¹, Nan Shen¹, Leopoldo N Segal², Emilia Bagiella³, Jose C Clemente¹,⁴.

Abstract

Although high-throughput data allow researchers to interrogate thousands of variables simultaneously, they can also introduce a large number of spurious results. Here we demonstrate that correlation analysis of large datasets can yield numerous false positives driven by outliers that canonical methods fail to identify. We present Correlations Under The InfluencE (CUTIE), an open-source, jackknifing-based method that detects such cases with both parametric and non-parametric correlation measures and that can also uniquely rescue correlations initially deemed non-significant or assigned the incorrect sign. Our approach can additionally identify variables or samples that induce these false correlations in high proportion. A meta-analysis of various omics datasets using CUTIE reveals that this issue is pervasive across different domains, with microbiome data being particularly susceptible. While the significance of a correlation ultimately depends on the thresholds used, our approach provides an efficient way to automatically flag the correlations that warrant closer examination in very large datasets.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
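The jackknifing idea described in the abstract can be illustrated with a minimal leave-one-out sketch: recompute a correlation with each observation removed in turn and flag the observations whose removal changes the significance call. This is not the CUTIE implementation. It assumes Pearson correlation, a Fisher z normal approximation for the p-value (in place of an exact t test), a single threshold `alpha = 0.05`, and deletion of one sample at a time; the function names `pearson_r`, `p_value_fisher` and `leave_one_out_flags` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def p_value_fisher(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z-transform.

    Assumption: normal approximation, valid only for n > 3.
    """
    r = max(min(r, 1 - 1e-12), -(1 - 1e-12))  # keep atanh finite at |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def leave_one_out_flags(
    x: Sequence[float], y: Sequence[float], alpha: float = 0.05
) -> Tuple[bool, List[int]]:
    """Return (significant on all samples, indices whose removal changes the call).

    A removal changes the call if it flips significance at `alpha`
    (a possible false positive, or a possible rescue of a missed correlation),
    or if it reverses the sign of an initially significant correlation.
    """
    n = len(x)
    r_full = pearson_r(x, y)
    significant = p_value_fisher(r_full, n) < alpha
    flagged = []
    for i in range(n):
        # Drop observation i from both variables (single-sample jackknife).
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        r_loo = pearson_r(xs, ys)
        sig_loo = p_value_fisher(r_loo, n - 1) < alpha
        if sig_loo != significant or (significant and r_full * r_loo < 0):
            flagged.append(i)
    return significant, flagged


if __name__ == "__main__":
    # Nine uncorrelated grid points plus one extreme observation at index 9.
    x = [1, 2, 3, 1, 2, 3, 1, 2, 3, 20]
    y = [1, 1, 1, 2, 2, 2, 3, 3, 3, 20]
    print(leave_one_out_flags(x, y))  # (True, [9])
```

Here the full-data correlation is close to 1 and significant, but removing index 9 leaves a grid with a correlation of exactly 0, so that single observation is flagged as driving the result. Counting how often each sample or variable appears in such flags across many variable pairs is one way to find the high-proportion offenders the abstract mentions.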

Keywords:  correlation analysis; microbiome; multiomic analysis; statistics

Year: 2022; PMID: 34864851; PMCID: PMC8769929; DOI: 10.1093/bib/bbab482

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622

  2 in total

Review 1.  Lung microbial-host interface through the lens of multi-omics.

Authors:  Shivani Singh; Jake G Natalini; Leopoldo N Segal
Journal: Mucosal Immunol; Date: 2022-07-06; Impact factor: 8.701

Review 2.  Novel technologies to characterize and engineer the microbiome in inflammatory bowel disease.

Authors:  Alba Boix-Amorós; Hilary Monaco; Elisa Sambataro; Jose C Clemente
Journal: Gut Microbes; Date: 2022 Jan-Dec
