New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing.

Kai Song, Jie Ren, Gesine Reinert, Minghua Deng, Michael S Waterman, Fengzhu Sun.

Abstract

With the development of next-generation sequencing (NGS) technologies, a large amount of short-read data has been generated. Assembling these short reads can be challenging for genomes and metagenomes that lack template sequences, which makes alignment-based genome sequence comparison difficult. In addition, NGS reads can come from different regions of various genomes and may not be alignable. Sequence signature-based methods, which compare genomes and metagenomes through the frequencies of word patterns, can potentially be useful for analyzing short-read data from NGS. Here we review recent developments in alignment-free genome and metagenome comparison based on word-pattern frequencies, with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related, and the applications of these measures to NGS data.
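The abstract refers to dissimilarity measures computed from word (k-mer) frequencies. As a minimal sketch, assuming DNA over the alphabet ACGT and an i.i.d. (order-0 Markov) background model whose letter frequencies are estimated separately from each sequence, the Python below computes d2, a cosine-based dissimilarity on raw k-word counts, and a d2*-style dissimilarity on counts that are centered by their expected values and standardized before the cosine is taken. The function names (`kmer_counts`, `iid_word_probs`, `d2`, `d2_star`) are hypothetical, the background model is a simplifying assumption (the keywords indicate the review also considers higher-order Markov models), and this is not presented as the authors' implementation.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> dict[str, int]:
    """Count overlapping k-words; words containing non-ACGT characters are dropped."""
    raw = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(w): raw.get("".join(w), 0) for w in product(ALPHABET, repeat=k)}


def iid_word_probs(seq: str, k: int) -> dict[str, float]:
    """Word probabilities under an assumed i.i.d. (order-0 Markov) background
    whose letter frequencies are estimated from `seq` itself."""
    letters = Counter(c for c in seq if c in ALPHABET)
    total = sum(letters.values())
    freq = {a: letters[a] / total for a in ALPHABET}
    probs = {}
    for w in product(ALPHABET, repeat=k):
        p = 1.0
        for a in w:
            p *= freq[a]
        probs["".join(w)] = p
    return probs


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den > 0 else 0.0


def d2(x_seq: str, y_seq: str, k: int) -> float:
    """d2 = (1 - cos(X, Y)) / 2 on raw k-word count vectors X and Y."""
    X, Y = kmer_counts(x_seq, k), kmer_counts(y_seq, k)
    words = sorted(X)
    return 0.5 * (1.0 - cosine([X[w] for w in words], [Y[w] for w in words]))


def d2_star(x_seq: str, y_seq: str, k: int) -> float:
    """d2*-style dissimilarity: each count is centered by its expected value
    n * p_w and divided by sqrt(n * p_w) before taking the cosine."""
    X, Y = kmer_counts(x_seq, k), kmer_counts(y_seq, k)
    px, py = iid_word_probs(x_seq, k), iid_word_probs(y_seq, k)
    nx, ny = sum(X.values()), sum(Y.values())  # counted k-word positions
    u, v = [], []
    for w in sorted(X):
        ex, ey = nx * px[w], ny * py[w]
        if ex == 0 or ey == 0:
            continue  # word impossible under one background model; skip it
        u.append((X[w] - ex) / sqrt(ex))
        v.append((Y[w] - ey) / sqrt(ey))
    return 0.5 * (1.0 - cosine(u, v))
```

Both functions return values in [0, 1], with 0 meaning the (raw or standardized) word-count vectors point in the same direction. Centering is the key design difference: for any long sequence the raw count of every word is large and positive, so the cosine in d2 is driven largely by shared background composition, whereas the centered counts used in `d2_star` retain mainly the deviations from the assumed background model.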

Keywords:  Markov model; NGS data; alignment-free; genome comparison; statistical power; word patterns

Year: 2013 | PMID: 24064230 | PMCID: PMC4017329 | DOI: 10.1093/bib/bbt067

Source DB: PubMed | Journal: Brief Bioinform | ISSN: 1467-5463 | Impact factor: 11.622

Cited by: 47 in total

1.  CAFE: aCcelerated Alignment-FrEe sequence analysis.

Authors:  Yang Young Lu; Kujin Tang; Jie Ren; Jed A Fuhrman; Michael S Waterman; Fengzhu Sun
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

2.  Phenetic Comparison of Prokaryotic Genomes Using k-mers.

Authors:  Maxime Déraspe; Frédéric Raymond; Sébastien Boisvert; Alexander Culley; Paul H Roy; François Laviolette; Jacques Corbeil
Journal:  Mol Biol Evol       Date:  2017-10-01       Impact factor: 16.240

3.  Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics.

Authors:  Jie Ren; Kai Song; Minghua Deng; Gesine Reinert; Charles H Cannon; Fengzhu Sun
Journal:  Bioinformatics       Date:  2015-06-30       Impact factor: 6.937

4.  Inferring Phylogenomic Relationship of Microbes Using Scalable Alignment-Free Methods.

Authors:  Guillaume Bernard; Timothy G Stephens; Raúl A González-Pech; Cheong Xin Chan
Journal:  Methods Mol Biol       Date:  2021

5.  Sequence Comparison Without Alignment: The SpaM Approaches.

Authors:  Burkhard Morgenstern
Journal:  Methods Mol Biol       Date:  2021

6.  The Amordad database engine for metagenomics.

Authors:  Ehsan Behnam; Andrew D Smith
Journal:  Bioinformatics       Date:  2014-06-27       Impact factor: 6.937

7.  Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences.

Authors:  Nathan A Ahlgren; Jie Ren; Yang Young Lu; Jed A Fuhrman; Fengzhu Sun
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

8.  CYP101J2, CYP101J3, and CYP101J4, 1,8-Cineole-Hydroxylating Cytochrome P450 Monooxygenases from Sphingobium yanoikuyae Strain B2.

Authors:  Birgit Unterweger; Dieter M Bulach; Judith Scoble; David J Midgley; Paul Greenfield; Dena Lyras; Priscilla Johanesen; Geoffrey J Dumsday
Journal:  Appl Environ Microbiol       Date:  2016-10-27       Impact factor: 4.792

9.  Identity: rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models.

Authors:  Hani Z Girgis; Benjamin T James; Brian B Luczak
Journal:  NAR Genom Bioinform       Date:  2021-02-01

10.  VSEARCH: a versatile open source tool for metagenomics.

Authors:  Torbjørn Rognes; Tomáš Flouri; Ben Nichols; Christopher Quince; Frédéric Mahé
Journal:  PeerJ       Date:  2016-10-18       Impact factor: 2.984
