Monotony of surprise and large-scale quest for unusual words.

Alberto Apostolico, Mary Ellen Bock, Stefano Lonardi.

Abstract

The problem of characterizing and detecting recurrent sequence patterns such as substrings or motifs and related associations or rules is variously pursued in order to compress data, unveil structure, infer succinct descriptions, extract and classify features, etc. In molecular biology, exceptionally frequent or rare words in bio-sequences have been implicated in various facets of biological function and structure. The discovery, particularly on a massive scale, of such patterns poses interesting methodological and algorithmic problems and often exposes scenarios in which tables and synopses grow faster and bigger than the raw sequences they are meant to encapsulate. In a previous study, the ability to succinctly compute, store, and display unusual substrings has been linked to a subtle interplay between the combinatorics of the subwords of a word and local monotonicities of some scores used to measure the departure from expectation. In this paper, we carry out an extensive analysis of such monotonicities for a broader variety of scores. This supports the construction of data structures and algorithms capable of performing global detection of unusual substrings in time and space linear in the subject sequences, under various probabilistic models.
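
The kind of monotonicity the abstract refers to can be illustrated with a small sketch. This is not the linear-time method of the paper: it enumerates substrings naively, and everything in it is an assumption made for illustration, namely the i.i.d. symbol model, the score (f - E) / sqrt(E), and the function names `count_words`, `expected_count` and `score`. It shows only that when a word and its extension occur equally often, the extension's expected count cannot be larger under this model, so its score cannot be smaller.

```python
from collections import Counter
from math import prod, sqrt


def count_words(seq, max_len):
    """Count every substring of length 1..max_len (overlapping occurrences)."""
    counts = Counter()
    for m in range(1, max_len + 1):
        for i in range(len(seq) - m + 1):
            counts[seq[i:i + m]] += 1
    return counts


def expected_count(word, seq_len, probs):
    """Expected occurrences of `word` in an i.i.d. text of length seq_len."""
    return (seq_len - len(word) + 1) * prod(probs[c] for c in word)


def score(word, counts, seq_len, probs):
    """Illustrative departure-from-expectation score: (f - E) / sqrt(E)."""
    e = expected_count(word, seq_len, probs)
    return (counts[word] - e) / sqrt(e)


# Toy sequence (hypothetical); symbol probabilities are estimated from it.
seq = "ATATATGCATAT"
probs = {c: k / len(seq) for c, k in Counter(seq).items()}
counts = count_words(seq, 4)

# "TA" and its extension "TAT" occur equally often in this toy sequence,
# so the longer word scores at least as high and the shorter one need not
# be reported separately.
for w in ("TA", "TAT"):
    print(w, counts[w], round(score(w, counts, len(seq), probs), 3))
```

Under these assumptions, only the longest word in each group of equally frequent extensions needs to be scored, which is one way tables of unusual words can stay small relative to the number of distinct substrings.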

Year:  2003        PMID: 12935329     DOI: 10.1089/10665270360688020

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


  6 in total

1.  On avoided words, absent words, and their application to biological sequence analysis.

Authors:  Yannis Almirantis; Panagiotis Charalampopoulos; Jia Gao; Costas S Iliopoulos; Manal Mohamed; Solon P Pissis; Dimitris Polychronopoulos
Journal:  Algorithms Mol Biol       Date:  2017-03-14       Impact factor: 1.405

2.  Efficient computation of absent words in genomic sequences.

Authors:  Julia Herold; Stefan Kurtz; Robert Giegerich
Journal:  BMC Bioinformatics       Date:  2008-03-26       Impact factor: 3.169

3.  Efficient algorithms for the discovery of gapped factors.

Authors:  Alberto Apostolico; Cinzia Pizzi; Esko Ukkonen
Journal:  Algorithms Mol Biol       Date:  2011-03-23       Impact factor: 1.405

4.  Detecting seeded motifs in DNA sequences.

Authors:  Cinzia Pizzi; Stefania Bortoluzzi; Andrea Bisognin; Alessandro Coppe; Gian Antonio Danieli
Journal:  Nucleic Acids Res       Date:  2005-09-01       Impact factor: 16.971

5.  A multistep bioinformatic approach detects putative regulatory elements in gene promoters.

Authors:  Stefania Bortoluzzi; Alessandro Coppe; Andrea Bisognin; Cinzia Pizzi; Gian Antonio Danieli
Journal:  BMC Bioinformatics       Date:  2005-05-18       Impact factor: 3.169

6.  Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences.

Authors:  Derek Gatherer
Journal:  Bioinform Biol Insights       Date:  2009-11-24