
Alignment-free sequence comparison based on next-generation sequencing reads.

Kai Song, Jie Ren, Zhiyuan Zhai, Xuemei Liu, Minghua Deng, Fengzhu Sun.

Abstract

Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembling the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, $D_2$, $D_2^*$ and $D_2^s$, both theoretically and by simulation. Theoretical formulas are derived for the power of detecting the relationship between two sequences related through a common motif model. Both $D_2^*$ and $D_2^s$ are shown to outperform $D_2$ for detecting the relationship between two sequences based on NGS data. We then study the effects of tuple length, read length, coverage, and sequencing error on the power of $D_2^*$ and $D_2^s$. Finally, the corresponding variations of these statistics, $d_2$, $d_2^*$ and $d_2^s$, are used first to cluster five mammalian species with known phylogenetic relationships, and then to cluster 13 tree species whose complete genome sequences are not available, using NGS shotgun reads. The clustering results using $d_2^s$ are consistent with biological knowledge for both the 5 mammalian and the 13 tree species. Thus, the statistic $d_2^s$ provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
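
The abstract names the three statistics but does not define them. As a rough illustration only, the sketch below computes k-tuple-count versions of the three statistics directly from two sets of reads, assuming the definitions commonly used in the alignment-free literature: a plain inner product of tuple counts, and two variants built from counts centered by their expected value. The i.i.d. background model estimated from each read set, the function names, and the omission of reverse-complement handling are all assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def count_ktuples(reads, k):
    """Count every k-tuple over ACGT across all reads (no assembly step)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set(ALPHABET):  # skip tuples with N or other symbols
                counts[word] += 1
    return counts


def base_frequencies(reads):
    """Assumed background model: i.i.d. bases with frequencies taken from the reads."""
    counts = Counter(ch for read in reads for ch in read.upper() if ch in ALPHABET)
    total = sum(counts.values())
    return {base: counts[base] / total for base in ALPHABET}


def word_probability(word, freqs):
    """Probability of a k-tuple under the i.i.d. background model."""
    prob = 1.0
    for ch in word:
        prob *= freqs[ch]
    return prob


def d2_statistics(reads_x, reads_y, k):
    """Return (D2, D2*, D2s) for two read sets, under the assumptions stated above."""
    x, y = count_ktuples(reads_x, k), count_ktuples(reads_y, k)
    n_x, n_y = sum(x.values()), sum(y.values())
    f_x, f_y = base_frequencies(reads_x), base_frequencies(reads_y)

    d2 = d2_star = d2_s = 0.0
    for letters in product(ALPHABET, repeat=k):
        w = "".join(letters)
        # Expected counts under the background model, then centered counts.
        e_x = n_x * word_probability(w, f_x)
        e_y = n_y * word_probability(w, f_y)
        xc, yc = x[w] - e_x, y[w] - e_y

        d2 += x[w] * y[w]
        if e_x > 0 and e_y > 0:
            d2_star += xc * yc / sqrt(e_x * e_y)
        norm = sqrt(xc * xc + yc * yc)
        if norm > 0:
            d2_s += xc * yc / norm
    return d2, d2_star, d2_s


if __name__ == "__main__":
    reads_a = ["ACGTACGTAC", "GTACGTACGT"]
    reads_b = ["ACGTTCGTAC", "CTACGTAGGT"]
    print(d2_statistics(reads_a, reads_b, k=3))
```

The loop enumerates all 4^k tuples, which is only practical for small k; a real pipeline would likely iterate over observed tuples and treat both strands of each read.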

Year: 2013; PMID: 23383994; PMCID: PMC3581251; DOI: 10.1089/cmb.2012.0228

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479


