VARUN: discovering extensible motifs under saturation constraints.

Alberto Apostolico, Matteo Comin, Laxmi Parida.

Abstract

The discovery of motifs in biosequences is frequently torn between the rigidity of the model on the one hand and the abundance of candidates on the other. In particular, the number of candidate motifs that include wild cards or "don't cares" grows exponentially with the number of don't cares, and this only gets worse if a don't care is allowed to stretch up to some prescribed maximum length. In this paper, a notion of extensible motif in a sequence is introduced and studied, which tightly combines the structure of the motif pattern, as described by its syntactic specification, with the statistical measure of its occurrence count. It is shown that a combination of appropriate saturation conditions and the monotonicity of probabilistic scores over regions of constant frequency affords significant parsimony in the generation and testing of candidate overrepresented motifs. A suite of software programs called Varun is described, implementing the discovery of extensible motifs of the type considered. The merits of the method are then documented by results obtained in a variety of experiments primarily targeting protein sequence families. Equally important, the sets of all surprising motifs returned in each experiment are extracted faster and come in much more manageable sizes than would be obtained in the absence of saturation constraints.
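To make the notion of a don't care that can "stretch up to some prescribed maximum length" concrete, the following is a minimal sketch that counts the occurrences of one extensible motif in a sequence. It is not Varun's algorithm and does not implement saturation or scoring. The token notation (`.` for a single don't care, `-` for an extensible gap), the function names `motif_to_regex` and `occurrence_starts`, and the parameter `max_gap` are all assumptions made for illustration.

```python
import re


def motif_to_regex(tokens, max_gap):
    """Translate a motif, given as a list of tokens, into a compiled regex.

    Illustrative notation (an assumption, not taken from the paper):
      "."   exactly one don't care
      "-"   an extensible gap of 1..max_gap don't cares
      other a solid character that must match itself
    """
    parts = []
    for tok in tokens:
        if tok == ".":
            parts.append(".")
        elif tok == "-":
            parts.append(".{1,%d}" % max_gap)
        else:
            parts.append(re.escape(tok))
    return re.compile("".join(parts))


def occurrence_starts(tokens, max_gap, sequence):
    """Return every start position at which the motif matches the sequence."""
    pattern = motif_to_regex(tokens, max_gap)
    # pattern.match(sequence, i) anchors the match attempt at position i.
    return [i for i in range(len(sequence)) if pattern.match(sequence, i)]


if __name__ == "__main__":
    seq = "ACGTACCGTAGGGT"          # made-up toy sequence
    motif = ["A", "-", "G", "T"]    # "A", an extensible gap, then "GT"
    for d in (1, 2):
        print(d, occurrence_starts(motif, d, seq))
```

In this toy input the same motif matches more positions once the gap is allowed to stretch from one to two characters, which hints at why the candidate space grows so quickly when extensible don't cares are permitted.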

Year:  2010        PMID: 21030741     DOI: 10.1109/TCBB.2008.123

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


  3 in total

1.  Parallel continuous flow: a parallel suffix tree construction tool for whole genomes.

Authors:  Matteo Comin; Montse Farreras
Journal:  J Comput Biol       Date:  2014-03-05       Impact factor: 1.479

2.  On the comparison of regulatory sequences with multiple resolution Entropic Profiles.

Authors:  Matteo Comin; Morris Antonello
Journal:  BMC Bioinformatics       Date:  2016-03-18       Impact factor: 3.169

3.  Alignment-free phylogeny of whole genomes using underlying subwords.

Authors:  Matteo Comin; Davide Verzotto
Journal:  Algorithms Mol Biol       Date:  2012-12-06       Impact factor: 1.405
