Li C Xia¹, Dongmei Ai, Jacob Cram, Jed A Fuhrman, Fengzhu Sun
¹ Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089-2910, USA.
Abstract
MOTIVATION: Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its application to large-scale high-throughput data is limited by the slow permutation procedures used to evaluate statistical significance.

RESULTS: We developed a theoretical approach to approximate the statistical significance of local similarity analysis, based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well for series with more than 10 time points when no delay is allowed and more than 20 time points when delay is allowed, and that it provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making feasible all-to-all local association studies that would otherwise be prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and that some of the associations are body-site specific across samples.

AVAILABILITY: The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity analysis of time series data. The tool is freely available from eLSA's website: http://meta.usc.edu/softs/lsa.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

CONTACT: fsun@usc.edu.
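To make concrete what a permutation-based P-value for local similarity analysis costs, here is a minimal sketch of a local similarity (LS) score with a maximum delay and a permutation test around it. It is an illustration under stated assumptions, not the eLSA implementation: each series is assumed to be z-score normalized (eLSA may use a different transform), the recursion is the commonly used dynamic program that tracks the best positive and negative running sums of x[i]·y[j] along diagonals within the delay band, and the names `zscore`, `local_similarity`, `permutation_pvalue` and `max_delay` are hypothetical.

```python
import random
from typing import List, Sequence


def zscore(x: Sequence[float]) -> List[float]:
    """Center to mean 0 and scale to population SD 1 (assumed normalization)."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    if sd == 0:
        raise ValueError("constant series cannot be normalized")
    return [(v - mean) / sd for v in x]


def local_similarity(x: Sequence[float], y: Sequence[float], max_delay: int = 0) -> float:
    """Signed local similarity score of two equal-length, normalized series.

    pos[i][j] / neg[i][j] hold the largest positive / negative running sum of
    x[k] * y[l] along a diagonal segment ending at (i, j); only cells with
    |i - j| <= max_delay are filled. The score is divided by the length n.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have equal length")
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        # Full grid for clarity; a faster version would visit only the delay band.
        for j in range(1, n + 1):
            if abs(i - j) > max_delay:
                continue
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    # Positive score for co-occurrence, negative for anti-correlation.
    return best_pos / n if best_pos >= best_neg else -best_neg / n


def permutation_pvalue(x: Sequence[float], y: Sequence[float], max_delay: int = 0,
                       n_perm: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffles of x whose |LS| reaches the observed |LS| (add-one rule)."""
    rng = random.Random(seed)
    xs, ys = zscore(x), zscore(y)
    observed = abs(local_similarity(xs, ys, max_delay))
    shuffled = list(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # z-scores are unchanged by shuffling, so no re-normalization
        if abs(local_similarity(shuffled, ys, max_delay)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 12.0, 10.0, 11.0]
    # b is a lagged by one step (its first value is filler).
    b = [2.0, 1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 12.0, 10.0]
    print(local_similarity(zscore(a), zscore(b), max_delay=1))
    print(permutation_pvalue(a, b, max_delay=1, n_perm=500))
```

Each of the `n_perm` shuffles reruns the whole dynamic program, so the cost grows with the number of permutations and, in an all-to-all study, with the number of series pairs. That repeated work is what a closed-form approximation of the P-value avoids. The add-one rule in `permutation_pvalue` is a common convention that keeps the estimate above zero; it is an assumption here, not something the abstract specifies.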
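The significance approximation is built on the tail distribution of the maximum partial sum of i.i.d. random variables. As a hedged illustration of that idea only, the sketch below uses the classical limit for the one-sided maximum of partial sums S_k of mean-0, variance-σ² steps, P(max_{k≤n} S_k ≥ xσ√n) → 2(1 − Φ(x)) (Donsker's theorem plus the reflection principle), and checks it by simulation. This is not the formula derived in the paper: the LS score is a maximum over sub-intervals, of either sign and possibly with delay, so its tail differs. The names `max_partial_sum_tail` and `simulate_tail` are hypothetical.

```python
import math
import random


def max_partial_sum_tail(x: float) -> float:
    """Limiting P(max_{k<=n} S_k >= x * sigma * sqrt(n)) for i.i.d. mean-0 steps.

    By Donsker's theorem and the reflection principle this tends to
    P(sup_{0<=t<=1} W(t) >= x) = 2 * (1 - Phi(x)) = erfc(x / sqrt(2)).
    """
    if x <= 0:
        return 1.0  # the supremum of Brownian motion on [0, 1] is always >= 0
    return math.erfc(x / math.sqrt(2.0))


def simulate_tail(x: float, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same tail using standard normal steps (sigma = 1)."""
    rng = random.Random(seed)
    threshold = x * math.sqrt(n)
    hits = 0
    for _ in range(trials):
        s = 0.0
        for _ in range(n):
            s += rng.gauss(0.0, 1.0)
            if s >= threshold:  # first crossing is enough to count this walk
                hits += 1
                break
    return hits / trials


if __name__ == "__main__":
    for x in (1.0, 2.0, 3.0):
        print(x, max_partial_sum_tail(x), simulate_tail(x, n=100))
```

For finite n the simulated values sit slightly below the limit, because a discrete walk can only cross the threshold at integer steps; the gap shrinks as n grows. This is one reason asymptotic approximations of this kind need a minimum series length before they become reliable.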