Multiple alignment-free sequence comparison.

Jie Ren, Kai Song, Fengzhu Sun, Minghua Deng, Gesine Reinert.

Abstract

MOTIVATION: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, $C^{*}_{1}$ and $C^{S}_{1}$, extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, $C^{*}_{2}$, $C^{S}_{2}$ and $C^{geo}_{2}$, averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences.
RESULTS: Our investigation uses both simulated data and cis-regulatory module data, where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although, for real data, all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics.
AVAILABILITY: Our implementation of the five statistics is available as an R package named 'multiAlignFree' at http://www-rcf.usc.edu/~fsun/Programs/multiAlignFree/multiAlignFreemain.html.
CONTACT: reinert@stats.ox.ac.uk.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
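
As a rough illustration of the k-tuple word content that these statistics build on, the following Python sketch counts overlapping k-tuples and averages an uncentred pairwise D2 score (the inner product of two word-count vectors) over all pairs in a set of sequences. This is not the multiAlignFree implementation and it does not compute any of the five C statistics, which rely on centred and normalised counts; the function names are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def ktuple_counts(seq, k):
    """Count every overlapping k-tuple (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(counts_a, counts_b):
    """Uncentred D2: sum over shared words of the product of their counts."""
    return sum(n * counts_b[w] for w, n in counts_a.items() if w in counts_b)


def average_pairwise_d2(seqs, k):
    """Average the D2 score over all unordered pairs of sequences.

    Illustrative only: the statistics in the abstract average normalised
    pairwise statistics, not raw D2.
    """
    counts = [ktuple_counts(s, k) for s in seqs]
    pairs = list(combinations(counts, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(d2(a, b) for a, b in pairs) / len(pairs)
```

Calling `average_pairwise_d2(["ACGTAC", "ACAC", "GTGT"], 2)` scores each pair by its shared 2-tuples and returns the mean over the three pairs. Raw D2 grows with sequence length and is dominated by background word frequencies, which is the reason the pairwise literature moved to centred and normalised variants.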

Entities: Gene

Substances: Transcription Factors

Year: 2013 PMID: 23990418 PMCID: PMC3799466 DOI: 10.1093/bioinformatics/btt462

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
