
Adaptive seeds tame genomic sequence comparison.

Szymon M Kiełbasa, Raymond Wan, Kengo Sato, Paul Horton, Martin C Frith.

Abstract

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of being fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open-source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
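To make the idea of choosing seeds by rareness concrete, here is a minimal Python sketch under stated assumptions. It assumes that the seed starting at each query position is lengthened one base at a time until it occurs at most `max_freq` times in the reference, and it counts occurrences with a naively built suffix array. The names `build_suffix_array`, `match_range`, `adaptive_seeds`, and `max_freq` are hypothetical, and the sketch uses only contiguous exact matches; it is not LAST's implementation.

```python
from typing import List, Tuple


def build_suffix_array(ref: str) -> List[int]:
    """Start positions of all suffixes of ref, in lexicographic order."""
    # Naive construction: fine for a toy example, far too slow for genomes.
    return sorted(range(len(ref)), key=lambda p: ref[p:])


def match_range(ref: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Half-open range [lo, hi) of suffix-array entries starting with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # Upper bound: first suffix whose m-character prefix is > pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def adaptive_seeds(query: str, ref: str, sa: List[int],
                   max_freq: int) -> List[Tuple[int, int, List[int]]]:
    """For each query position, grow the seed until it occurs <= max_freq times.

    Returns (query_pos, seed_length, sorted reference positions) tuples.
    """
    seeds = []
    for i in range(len(query)):
        k = 1
        while i + k <= len(query):
            lo, hi = match_range(ref, sa, query[i:i + k])
            if hi - lo <= max_freq:
                break  # seed is now rare enough
            k += 1
        else:
            continue  # ran off the end of the query while still too frequent
        if hi > lo:  # skip seeds that have no occurrence at all
            seeds.append((i, k, sorted(sa[lo:hi])))
    return seeds


if __name__ == "__main__":
    reference = "ACGTACGTTT"
    index = build_suffix_array(reference)
    print(adaptive_seeds("ACGTT", reference, index, max_freq=1))
    # [(0, 5, [4]), (1, 4, [5]), (2, 3, [6])]
```

Because each query position contributes at most `max_freq` reference positions, the total number of seed hits in this sketch is bounded by `max_freq × len(query)`, which is the linear growth the abstract describes. A fixed-length seed, by contrast, hits every copy of a repeated k-mer, so its hit count can grow with the product of the two sequence lengths. In this sketch, a position whose count drops from above `max_freq` straight to zero, or that reaches the end of the query while still too frequent, yields no seed.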

Year:  2011        PMID: 21209072      PMCID: PMC3044862          DOI: 10.1101/gr.113985.110

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043

  465 in total

1.  ALP & FALP: C++ libraries for pairwise local alignment E-values.

Authors:  Sergey Sheetlin; Yonil Park; Martin C Frith; John L Spouge
Journal:  Bioinformatics       Date:  2015-10-01       Impact factor: 6.937

2.  CentroidHomfold-LAST: accurate prediction of RNA secondary structure using automatically collected homologous sequences.

Authors:  Michiaki Hamada; Koichiro Yamada; Kengo Sato; Martin C Frith; Kiyoshi Asai
Journal:  Nucleic Acids Res       Date:  2011-05-11       Impact factor: 16.971

3.  Genome structures and transcriptomes signify niche adaptation for the multiple-ion-tolerant extremophyte Schrenkiella parvula.

Authors:  Dong-Ha Oh; Hyewon Hong; Sang Yeol Lee; Dae-Jin Yun; Hans J Bohnert; Maheshi Dassanayake
Journal:  Plant Physiol       Date:  2014-02-21       Impact factor: 8.340

4.  Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics.

Authors:  Rendy Ruvindy; Richard Allen White; Brett Anthony Neilan; Brendan Paul Burns
Journal:  ISME J       Date:  2015-05-29       Impact factor: 10.302

5.  A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances.

Authors:  Laurent Noé; Donald E K Martin
Journal:  J Comput Biol       Date:  2014-12       Impact factor: 1.479

6.  The large genome size variation in the Hesperis clade was shaped by the prevalent proliferation of DNA repeats and rarer genome downsizing.

Authors:  Petra Hloušková; Terezie Mandáková; Milan Pouch; Pavel Trávníček; Martin A Lysak
Journal:  Ann Bot       Date:  2019-08-02       Impact factor: 4.357

7.  Mutation load at a mimicry supergene sheds new light on the evolution of inversion polymorphisms.

Authors:  Paul Jay; Mathieu Chouteau; Annabel Whibley; Héloïse Bastide; Hugues Parrinello; Violaine Llaurens; Mathieu Joron
Journal:  Nat Genet       Date:  2021-01-25       Impact factor: 38.330

8.  A chromosome-scale genome assembly of Isatis indigotica, an important medicinal plant used in traditional Chinese medicine : An Isatis genome.

Authors:  Minghui Kang; Haolin Wu; Qiao Yang; Li Huang; Quanjun Hu; Tao Ma; Zaiyun Li; Jianquan Liu
Journal:  Hortic Res       Date:  2020-02-01       Impact factor: 6.793

9.  Yet Another Quick Assembly, Analysis and Trimming Tool (YAQAAT): A Server for the Automated Assembly and Analysis of Sanger Sequencing Data.

Authors:  Darius Wen-Shuo Koh; Kwok-Fong Chan; Weiling Wu; Samuel Ken-En Gan
Journal:  J Biomol Tech       Date:  2021-01-15

10.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240
