Multiple alignment-free sequence comparison.

Jie Ren, Kai Song, Fengzhu Sun, Minghua Deng, Gesine Reinert.

Abstract

MOTIVATION: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, C(*)1 and C(S)1, extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, C(*)2, C(S)2 and C(geo)2, averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences.
RESULTS: Our investigation uses both simulated data and cis-regulatory module data, where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although all of our statistics show similar performance on real data, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics.
AVAILABILITY: Our implementation of the five statistics is available as an R package named 'multiAlignFree' at http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/multiAlignFreemain.html.
CONTACT: reinert@stats.ox.ac.uk.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
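
ILLUSTRATION: The idea of comparing sequences by k-tuple word content, and of averaging a pairwise statistic over all pairs in a set, can be sketched roughly as below. This is a minimal sketch and not the authors' implementation: the function names are hypothetical, and the classic D2 statistic (the sum over words of the product of word counts) is used as a stand-in, because the abstract does not define the pairwise statistics underlying C(*)2, C(S)2 and C(geo)2.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-tuple (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(counts_a, counts_b):
    """Classic D2 statistic: sum over shared words of the product of counts."""
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)


def average_pairwise_d2(seqs, k):
    """Arithmetic mean of D2 over all unordered pairs of sequences.

    Assumption: raw D2 is a stand-in for the pairwise statistics the
    abstract refers to, whose definitions are not given there.
    """
    counts = [kmer_counts(s, k) for s in seqs]
    pairs = list(combinations(counts, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(d2(a, b) for a, b in pairs) / len(pairs)
```

For example, average_pairwise_d2(["ACGT", "ACGA", "TTTT"], 2) averages the three pairwise D2 values of the set; a larger value indicates more shared 2-tuple content across the sequences.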

Year:  2013        PMID: 23990418      PMCID: PMC3799466          DOI: 10.1093/bioinformatics/btt462

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


