
Automatic annotation for biological sequences by extraction of keywords from MEDLINE abstracts. Development of a prototype system.

M A Andrade, A Valencia.

Abstract

We have developed a prototype for the automatic annotation of functional characteristics in protein families. The system extracts biological information directly from the scientific literature in the form of MEDLINE abstracts. The criterion for selecting relevant keywords is the difference between their frequency in the abstracts associated with the protein family under study and their frequency in other, unrelated protein families. The concept of functional information associated with protein families is the key feature of our system, and it brings evolutionary information into the problem of functional annotation of biological sequences. The system has been tested in two different scenarios: first, a large set of protein families with a small number of abstracts per family; and second, selected protein families with a large number of abstracts attached to each one. In both cases the performance is compared with annotations provided by human experts, showing a clear relation between the amount of information provided to the system and the quality of the annotations. In many cases the automatic annotations are of similar quality to those contained in current databases. We discuss the possibilities and difficulties to be encountered in the development of a full system for automatic annotation.
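The keyword-selection criterion described in the abstract (a word's frequency in one family's abstracts compared with its frequency in unrelated families) can be illustrated with a small sketch. This is not the authors' published scoring function: the z-score form, the simple lowercase tokenizer, the standard-deviation floor of 1e-3, and the names `word_frequencies` and `rank_keywords` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(abstracts):
    """Relative frequency of each word over a list of abstract strings."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(tokenize(abstract))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def rank_keywords(families, target, top_n=5, sd_floor=1e-3):
    """Rank words for `target` by how far their frequency exceeds the other families.

    Assumed score: (f_target - mean_background) / max(sd_background, sd_floor),
    where the background is the word's frequency in each other family.
    """
    if len(families) < 2:
        raise ValueError("need at least one family besides the target")
    freqs = {name: word_frequencies(abstracts) for name, abstracts in families.items()}
    others = [f for name, f in freqs.items() if name != target]
    scores = {}
    for word, f_target in freqs[target].items():
        background = [f.get(word, 0.0) for f in others]
        mean = sum(background) / len(background)
        sd = math.sqrt(sum((x - mean) ** 2 for x in background) / len(background))
        # The floor keeps words absent from every other family from dividing by zero.
        scores[word] = (f_target - mean) / max(sd, sd_floor)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


families = {
    "kinases": ["Protein kinase phosphorylates serine residues.", "The kinase domain binds ATP."],
    "globins": ["Hemoglobin binds oxygen in blood.", "Myoglobin stores oxygen."],
    "proteases": ["Serine protease cleaves peptide bonds.", "The protease active site."],
}
print(rank_keywords(families, "kinases", top_n=3))
```

Words shared with other families (here "serine", "the", "binds") score low, while words specific to the target family rise to the top. This mirrors the intuition in the abstract that informative keywords are those over-represented in one family relative to unrelated ones.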

Year:  1997        PMID: 9322011

Source DB:  PubMed          Journal:  Proc Int Conf Intell Syst Mol Biol        ISSN: 1553-0833


  10 in total

1.  Including biological literature improves homology search.

Authors:  J T Chang; S Raychaudhuri; R B Altman
Journal:  Pac Symp Biocomput       Date:  2001

2.  Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature.

Authors:  Soumya Raychaudhuri; Jeffrey T Chang; Patrick D Sutphin; Russ B Altman
Journal:  Genome Res       Date:  2002-01       Impact factor: 9.043

3.  Creating an online dictionary of abbreviations from MEDLINE.

Authors:  Jeffrey T Chang; Hinrich Schütze; Russ B Altman
Journal:  J Am Med Inform Assoc       Date:  2002 Nov-Dec       Impact factor: 4.497

4.  Using text analysis to identify functionally coherent gene groups.

Authors:  Soumya Raychaudhuri; Hinrich Schütze; Russ B Altman
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

5.  The computational analysis of scientific literature to define and recognize gene expression clusters.

Authors:  Soumya Raychaudhuri; Jeffrey T Chang; Farhad Imam; Russ B Altman
Journal:  Nucleic Acids Res       Date:  2003-08-01       Impact factor: 16.971

6.  Automatic extraction of mutations from Medline and cross-validation with OMIM.

Authors:  Dietrich Rebholz-Schuhmann; Stephane Marcel; Sylvie Albert; Ralf Tolle; Georg Casari; Harald Kirsch
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

7.  ChemicalTagger: A tool for semantic text-mining in chemistry.

Authors:  Lezan Hawizy; David M Jessop; Nico Adams; Peter Murray-Rust
Journal:  J Cheminform       Date:  2011-05-16       Impact factor: 5.514

8.  PubNet: a flexible system for visualizing literature derived networks.

Authors:  Shawn M Douglas; Gaetano T Montelione; Mark Gerstein
Journal:  Genome Biol       Date:  2005-08-16       Impact factor: 13.583

9.  Auto-CORPus: A Natural Language Processing Tool for Standardizing and Reusing Biomedical Literature.

Authors:  Tim Beck; Tom Shorter; Yan Hu; Zhuoyu Li; Shujian Sun; Casiana M Popovici; Nicholas A R McQuibban; Filip Makraduli; Cheng S Yeung; Thomas Rowlands; Joram M Posma
Journal:  Front Digit Health       Date:  2022-02-15

10.  PubMatrix: a tool for multiplex literature mining.

Authors:  Kevin G Becker; Douglas A Hosack; Glynn Dennis; Richard A Lempicki; Tiffani J Bright; Chris Cheadle; Jim Engel
Journal:  BMC Bioinformatics       Date:  2003-12-10       Impact factor: 3.169
