Huaying Fang (1), Chengcheng Huang (2), Hongyu Zhao (3), Minghua Deng (4)
1. LMAN, School of Mathematical Sciences, Beijing International Center for Mathematical Research, Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China.
2. College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China.
3. Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA.
4. LMAN, School of Mathematical Sciences, Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Center for Statistical Science, Peking University, Beijing 100871, China.
Abstract
MOTIVATION: Direct analysis of microbial communities in the environment and the human body has become more convenient and reliable owing to advances in high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlations among members of microbial communities is of fundamental importance in genomic survey studies. Traditional Pearson correlation analysis, which treats the observed data as absolute abundances of the microbes, may lead to spurious results because the data represent only relative abundances. Such compositional data require special care and appropriate methods before correlation analysis.
RESULTS: In this article, we first discuss how correlation is defined for the latent variables underlying compositional data. We then propose a novel method, CCLasso, based on least squares with an ℓ1 penalty, to infer the correlation network among the latent variables of compositional data from metagenomic data. An efficient alternating direction algorithm derived from the augmented Lagrangian method is used to solve the optimization problem. Simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbial species from the Human Microbiome Project.
AVAILABILITY AND IMPLEMENTATION: CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3.
CONTACT: dengmh@pku.edu.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
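The spurious-correlation problem described in the motivation can be illustrated with a minimal sketch. It is not part of CCLasso and does not reproduce the paper's simulations; the log-normal abundance model, the sample size, the number of taxa, and the helper names `closure` and `demo` are all assumptions chosen for illustration. Independent absolute abundances are simulated, converted to relative abundances, and the Pearson correlations of the two versions are compared.

```python
import numpy as np


def closure(absolute):
    """Convert absolute abundances (rows = samples) to relative abundances."""
    return absolute / absolute.sum(axis=1, keepdims=True)


def demo(n_samples=5000, n_taxa=3, seed=0):
    """Compare Pearson correlations before and after normalization.

    Assumption for illustration only: taxa are independent and log-normally
    distributed, so the true correlation between any two taxa is zero.
    """
    rng = np.random.default_rng(seed)
    absolute = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_taxa))
    relative = closure(absolute)
    corr_abs = np.corrcoef(absolute, rowvar=False)  # taxa are columns
    corr_rel = np.corrcoef(relative, rowvar=False)
    return corr_abs, corr_rel


corr_abs, corr_rel = demo()
print("absolute abundances, taxa 0 vs 1:", round(float(corr_abs[0, 1]), 3))
print("relative abundances, taxa 0 vs 1:", round(float(corr_rel[0, 1]), 3))
```

Because each row of relative abundances sums to one, the components are forced to be negatively related even though the underlying abundances are independent. With only three taxa the effect is pronounced; it weakens as the number of taxa grows, but it does not disappear.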