
Mining the NCBI Influenza Sequence Database: adaptive grouping of BLAST results using precalculated neighbor indexing.

Leonid Zaslavsky, Tatiana Tatusova

Abstract

The Influenza Virus Resource and other Virus Variation Resources at NCBI provide enhanced web visualization tools for exploratory analysis of influenza sequence data. Despite the improvements in data analysis, the initial data retrieval remains unsophisticated, frequently producing huge and imbalanced datasets because the database contains a large number of identical and nearly identical sequences.

We propose a data mining algorithm that organizes reported sequences into groups based on their relatedness to the query sequence and to each other. The algorithm uses BLAST to find database sequences related to the query. Neighbor lists precalculated from pairwise BLAST alignments between database sequences are then used to organize the results into groups of nearly identical and strongly related sequences. We propose a non-symmetric dissimilarity measure designed to handle sequences of different lengths (fragments).

The balanced and representative data set produced by this tool can be used for further analysis, such as multiple sequence alignment and phylogenetic tree construction. The algorithm is implemented for protein coding sequences and is being integrated with the NCBI Influenza Virus Resource.
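The grouping idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names `directed_dissimilarity` and `group_hits`, the greedy seeding strategy, and the specific form of the non-symmetric measure (one minus the number of identical positions divided by the length of the first sequence) are hypothetical, since the abstract does not define them. The BLAST search and the neighbor lists are taken as precomputed inputs.

```python
from typing import Dict, List, Set


def directed_dissimilarity(identities: int, length_a: int) -> float:
    """Hypothetical non-symmetric measure (an assumption, not from the paper):
    the fraction of sequence a that is NOT matched identically in sequence b.
    Normalizing by the length of a alone makes d(a->b) differ from d(b->a)
    when the two sequences have different lengths."""
    if length_a <= 0:
        raise ValueError("length_a must be positive")
    return 1.0 - identities / length_a


def group_hits(ranked_hits: List[str],
               neighbors: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedy grouping sketch: each still-unassigned hit, taken in rank order,
    seeds a new group and absorbs the unassigned hits that appear in its
    precalculated neighbor list."""
    assigned: Set[str] = set()
    groups: List[List[str]] = []
    for hit in ranked_hits:
        if hit in assigned:
            continue
        members = [hit]
        assigned.add(hit)
        near = neighbors.get(hit, set())
        for other in ranked_hits:  # keep rank order inside the group
            if other not in assigned and other in near:
                members.append(other)
                assigned.add(other)
        groups.append(members)
    return groups


if __name__ == "__main__":
    # Hypothetical sequence identifiers, ordered by relevance to the query.
    ranked = ["A", "B", "C", "D"]
    # Hypothetical precalculated neighbor lists.
    neighbor_lists = {"A": {"B"}, "B": {"A"}, "C": set(), "D": {"C"}}
    print(group_hits(ranked, neighbor_lists))
    # A 100-residue fragment fully matching a 500-residue sequence:
    print(directed_dissimilarity(100, 100))  # fragment -> full sequence
    print(directed_dissimilarity(100, 500))  # full sequence -> fragment
```

Under this assumed measure, a fragment that is fully contained in a longer sequence has zero dissimilarity toward it, while the longer sequence remains far from the fragment, which is one way a non-symmetric measure can treat fragments sensibly. One representative per group (here, the first member) would then give a smaller, more balanced data set.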


Year: 2009    PMID: 20029662    PMCID: PMC2771650    DOI: 10.1371/currents.RRN1124

Source DB: PubMed    Journal: PLoS Curr    ISSN: 2157-3999


