
Including biological literature improves homology search.

J T Chang, S Raychaudhuri, R B Altman.

Abstract

Annotating the tremendous amount of sequence information being generated requires accurate automated methods for recognizing homology. Although sequence similarity is only one of many indicators of evolutionary homology, it is often the only one used. Here we find that supplementing sequence similarity with information from the biomedical literature increases the accuracy of homology search results. We modified the PSI-BLAST algorithm to use literature similarity in each iteration of its database search. The modified algorithm was evaluated and compared to standard PSI-BLAST in searching for homologous proteins. The modified algorithm achieved 32% recall with 95% precision, while the original achieved 33% recall with 84% precision; the literature similarity requirement preserved the sensitivity of PSI-BLAST while improving its precision.
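
The abstract does not say how literature similarity is computed or how it enters each PSI-BLAST iteration, so the following is only a minimal sketch under stated assumptions: literature similarity is approximated here by cosine similarity over word counts, candidate hits from one iteration are kept only if their associated text resembles the query's, and precision and recall are computed against a known set of homologs. All function names, the threshold value, and the toy data layout are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_hits_by_literature(query_text, hits, literature, min_similarity=0.2):
    """Keep the hits from one search iteration whose literature resembles the query's.

    hits: protein identifiers returned by the sequence search (assumed input).
    literature: dict mapping identifier -> associated text (assumed input).
    min_similarity: arbitrary illustrative cutoff, not a value from the paper.
    """
    return [
        h for h in hits
        if cosine_similarity(query_text, literature.get(h, "")) >= min_similarity
    ]


def precision_recall(predicted, true_homologs):
    """Precision and recall of a predicted hit list against known homologs."""
    predicted, true_homologs = set(predicted), set(true_homologs)
    true_positives = len(predicted & true_homologs)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_homologs) if true_homologs else 0.0
    return precision, recall
```

In a sketch like this, the filter would run on the hit list of every iteration before those hits contribute to the next search profile, which is one plausible way a literature requirement could raise precision without much loss of recall.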


Year:  2001        PMID: 11262956      PMCID: PMC2671075          DOI: 10.1142/9789814447362_0037

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  6 in total

1.  A literature-based method for assessing the functional coherence of a gene group.

Authors:  Soumya Raychaudhuri; Russ B Altman
Journal:  Bioinformatics       Date:  2003-02-12       Impact factor: 6.937

2.  Discovering protein similarity using natural language processing.

Authors:  Indra N Sarkar; Thomas C Rindflesch
Journal:  Proc AMIA Symp       Date:  2002

3.  Using text analysis to identify functionally coherent gene groups.

Authors:  Soumya Raychaudhuri; Hinrich Schütze; Russ B Altman
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

4.  A sentence sliding window approach to extract protein annotations from biomedical articles.

Authors:  Martin Krallinger; Maria Padron; Alfonso Valencia
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

5.  Homology induction: the use of machine learning to improve sequence similarity searches.

Authors:  Andreas Karwath; Ross D King
Journal:  BMC Bioinformatics       Date:  2002-04-23       Impact factor: 3.169

6.  A linear-RBF multikernel SVM to classify big text corpora.

Authors:  R Romero; E L Iglesias; L Borrajo
Journal:  Biomed Res Int       Date:  2015-03-23       Impact factor: 3.411

