Assembly-free genome comparison based on next-generation sequencing reads and variable length patterns.

Matteo Comin, Michele Schimd.   

Abstract

BACKGROUND: With the advent of Next-Generation Sequencing (NGS) technologies, a large amount of short-read data has been generated. If a reference genome is not available, assembling a template sequence is usually challenging because of repeats and the short length of the reads. When NGS reads cannot be mapped onto a reference genome, alignment-based methods are not applicable. However, it is still possible to study the evolutionary relationships of unassembled genomes based on NGS data.
RESULTS: We present a parameter-free, alignment-free method, called Under2, based on variable-length patterns, for the direct comparison of sets of NGS reads. We define a similarity measure using variable-length patterns, as well as their reverses and reverse-complements, along with their statistical and syntactical properties. We evaluate several alignment-free statistics on the comparison of NGS reads coming from simulated and real genomes. In almost all simulations, our method Under2 outperforms all other statistics. The performance gain becomes more evident when real genomes are used.
CONCLUSION: The new alignment-free statistic is highly successful in discriminating related genomes based on NGS read data. In almost all experiments, it outperforms traditional alignment-free statistics that are based on fixed-length patterns.
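
For readers unfamiliar with the fixed-length-pattern statistics that the abstract uses as a baseline, the following is a minimal sketch of one classical example, the k-word match count usually called D2, computed directly on two sets of reads. It is not Under2 and is not taken from the paper: the function names (`reverse_complement`, `kmer_counts`, `d2`), the choice to count both strands of every read, and the assumption that reads contain only A, C, G and T are illustrative.

```python
from collections import Counter

# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(reads, k):
    """Count every k-mer in a set of reads, on both strands.

    Counting the reverse complement as well reflects the fact that a read
    may come from either strand of the genome (an illustrative choice).
    """
    counts = Counter()
    for read in reads:
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k + 1):
                counts[strand[i:i + k]] += 1
    return counts


def d2(reads_a, reads_b, k):
    """D2 statistic: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(reads_a, k)
    counts_b = kmer_counts(reads_b, k)
    # Counter returns 0 for k-mers absent from counts_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    sample_a = ["ACGT", "AAC"]
    sample_b = ["ACGT"]
    print(d2(sample_a, sample_b, 2))
```

Unlike the parameter-free approach described in the abstract, this statistic requires the word length `k` to be fixed in advance.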

Year:  2014        PMID: 25252700      PMCID: PMC4168702          DOI: 10.1186/1471-2105-15-S9-S1

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  22 in total

1.  Alignment-free sequence comparison-a review. (Review)

Authors:  Susana Vinga; Jonas Almeida
Journal:  Bioinformatics       Date:  2003-03-01       Impact factor: 6.937

2.  Distributional regimes for the number of k-word matches between two random sequences.

Authors:  Ross A Lippert; Haiyan Huang; Michael S Waterman
Journal:  Proc Natl Acad Sci U S A       Date:  2002-10-08       Impact factor: 11.205

3.  The average common substring approach to phylogenomic reconstruction.

Authors:  Igor Ulitsky; David Burstein; Tamir Tuller; Benny Chor
Journal:  J Comput Biol       Date:  2006-03       Impact factor: 1.479

4.  Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions.

Authors:  Gregory E Sims; Se-Ran Jun; Guohong A Wu; Sung-Hou Kim
Journal:  Proc Natl Acad Sci U S A       Date:  2009-02-02       Impact factor: 11.205

5.  Alignment-free sequence comparison (I): statistics and power.

Authors:  Gesine Reinert; David Chew; Fengzhu Sun; Michael S Waterman
Journal:  J Comput Biol       Date:  2009-12       Impact factor: 1.479

6.  Alignment-free sequence comparison based on next-generation sequencing reads.

Authors:  Kai Song; Jie Ren; Zhiyuan Zhai; Xuemei Liu; Minghua Deng; Fengzhu Sun
Journal:  J Comput Biol       Date:  2013-02       Impact factor: 1.479

7.  A measure of the similarity of sets of sequences not requiring sequence alignment.

Authors:  B E Blaisdell
Journal:  Proc Natl Acad Sci U S A       Date:  1986-07       Impact factor: 11.205

8.  Whole genome molecular phylogeny of large dsDNA viruses using composition vector method.

Authors:  Lei Gao; Ji Qi
Journal:  BMC Evol Biol       Date:  2007-03-15       Impact factor: 3.260

9.  Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

Authors:  Jonathan Göke; Marcel H Schulz; Julia Lasserre; Martin Vingron
Journal:  Bioinformatics       Date:  2012-01-12       Impact factor: 6.937

10.  Alignment-free phylogeny of whole genomes using underlying subwords.

Authors:  Matteo Comin; Davide Verzotto
Journal:  Algorithms Mol Biol       Date:  2012-12-06       Impact factor: 1.405

  4 in total

1.  Clustering of reads with alignment-free measures and quality values.

Authors:  Matteo Comin; Andrea Leoni; Michele Schimd
Journal:  Algorithms Mol Biol       Date:  2015-01-28       Impact factor: 1.405

2.  On the comparison of regulatory sequences with multiple resolution Entropic Profiles.

Authors:  Matteo Comin; Morris Antonello
Journal:  BMC Bioinformatics       Date:  2016-03-18       Impact factor: 3.169

3.  Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values.

Authors:  Matteo Comin; Michele Schimd
Journal:  BMC Med Genomics       Date:  2016-08-12       Impact factor: 3.063

4.  Better quality score compression through sequence-based quality smoothing.

Authors:  Yoshihiro Shibuya; Matteo Comin
Journal:  BMC Bioinformatics       Date:  2019-11-22       Impact factor: 3.169
