
Closest string with outliers.

Christina Boucher, Bin Ma.

Abstract

BACKGROUND: Given n strings s_1, …, s_n, each of length ℓ, and a nonnegative integer d, the CLOSEST STRING problem asks for a center string s such that no input string has Hamming distance greater than d from s. Finding a common pattern in many, but not necessarily all, input strings is an important task in many bioinformatics applications.
RESULTS: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the closest string with outliers (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d of at least n - k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed-parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded, the problem is W[1]-hard with respect to n - k, ℓ, and d.
CONCLUSIONS: Our refined model is an abstraction of the task of finding common patterns in several, but not all, input strings. We initiate the study of the computability of this model and show that it is sensitive to different parameterizations. We conclude by suggesting several open problems that warrant further investigation.
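
To make the CSWO definition concrete, the following is a minimal sketch of a checker that tests whether a given candidate center satisfies the definition. It is not one of the fixed-parameter algorithms described in the abstract; it only verifies a proposed solution. The function names (`hamming`, `find_outliers`, `is_cswo_solution`) and the sample strings are hypothetical and chosen for illustration.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_outliers(center: str, strings: List[str], d: int) -> List[int]:
    """Indices of input strings farther than d from the candidate center."""
    return [i for i, s in enumerate(strings) if hamming(center, s) > d]


def is_cswo_solution(center: str, strings: List[str], d: int, k: int) -> bool:
    """True if the center is within distance d of at least n - k strings,
    i.e. if at most k input strings are outliers."""
    return len(find_outliers(center, strings, d)) <= k


# Illustrative input (assumed data): three similar strings and one outlier.
strings = ["ACGT", "ACGA", "ACCT", "TTTT"]
outliers = find_outliers("ACGT", strings, d=1)
```

With d = 1, the candidate center "ACGT" is within the distance bound of the first three strings, so it qualifies as a CSWO solution for k = 1 but not as a CLOSEST STRING solution (k = 0). The returned index list also identifies which input string is the outlier.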

Year:  2011        PMID: 21342588      PMCID: PMC3044313          DOI: 10.1186/1471-2105-12-S1-S55

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


