Efficient computation of absent words in genomic sequences.

Julia Herold, Stefan Kurtz, Robert Giegerich.

Abstract

BACKGROUND: Analysis of sequence composition is a routine task in genome research. Organisms are characterized by their base composition, dinucleotide relative abundance, codon usage, and so on. Unique subsequences are markers of special interest in genome comparison, expression profiling, and genetic engineering. Relative to a random sequence of the same length, unique subsequences are overrepresented in real genomes. Shortest words absent from a genome have been addressed in two recent studies.
RESULTS: We describe a new algorithm and software for the computation of absent words. It is more efficient than previous algorithms and easier to use. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays. Our implementation is available as an open source package. We compute unwords of human and mouse as well as some other organisms, covering a genome size range from 10^9 down to 10^5 bp.
CONCLUSION: The new algorithm computes absent words for the human genome in 10 minutes on standard hardware, using only 2.5 Mb of space. This enables us to perform this type of analysis not only for the largest genomes available so far, but also for the emerging pan- and meta-genome data.
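
To make the notion of a shortest absent word concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm described in the abstract, whose details are not given here. It simply enumerates all words of length k = 1, 2, ... over the alphabet and reports the first length at which some word does not occur in the sequence. The function name `shortest_absent_words` and the `alphabet` parameter are illustrative assumptions. The sketch considers only the given strand and ignores windows containing characters outside the alphabet (such as N).

```python
from itertools import product


def shortest_absent_words(sequence, alphabet="ACGT"):
    """Return (k, words): the smallest word length k at which at least one
    word over `alphabet` is absent from `sequence`, together with the sorted
    list of all absent words of that length.

    Illustrative brute force only: time and memory grow with len(alphabet)**k.
    """
    seq = sequence.upper()
    letters = set(alphabet)
    k = 1
    while True:
        # Collect every length-k window made up solely of alphabet characters.
        present = set()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= letters:
                present.add(word)
        # If fewer than |alphabet|**k distinct words occur, some word is absent.
        if len(present) < len(alphabet) ** k:
            absent = sorted(
                "".join(p)
                for p in product(alphabet, repeat=k)
                if "".join(p) not in present
            )
            return k, absent
        k += 1


if __name__ == "__main__":
    k, words = shortest_absent_words("AACC", alphabet="AC")
    print(k, words)
```

The loop always terminates, because a sequence of length n has at most n - k + 1 windows of length k, and this count eventually falls below the number of possible words. For genome-scale inputs this approach is impractical, which is the kind of cost the abstract's algorithm is reported to avoid.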

Year:  2008        PMID: 18366790      PMCID: PMC2375138          DOI: 10.1186/1471-2105-9-167

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169

