Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families.

M A Andrade, A Valencia.

Abstract

MOTIVATION: Annotation of the biological function of different protein sequences is a time-consuming process currently performed by human experts. Genome analysis tools encounter great difficulty in performing this task. Database curators, developers of genome analysis tools and biologists in general could benefit from access to tools able to suggest functional annotations and facilitate access to functional information.

APPROACH: We present here the first prototype of a system for the automatic annotation of protein function. The system is triggered by collections of abstracts related to a given protein, and it is able to extract biological information directly from scientific literature, i.e. MEDLINE abstracts. Relevant keywords are selected by their relative accumulation in comparison with a domain-specific background distribution. Simultaneously, the most representative sentences and MEDLINE abstracts are selected and presented to the end-user. Evolutionary information is considered as a predominant characteristic in the domain of protein function. Our system consequently extracts domain-specific information from the analysis of a set of protein families.

RESULTS: The system has been tested with different protein families, of which three examples are discussed in detail here: 'ataxia-telangiectasia associated protein', 'ran GTPase' and 'carbonic anhydrase'. We found generally good correlation between the amount of information provided to the system and the quality of the annotations. Finally, the current limitations and future developments of the system are discussed.

AVAILABILITY: The current system can be considered a prototype. As such, it can be accessed as a server at http://columba.ebi.ac.uk:8765/andrade/abx. The system accepts text related to the protein or proteins to be evaluated (optimally, the result of a MEDLINE search by keyword) and the results are returned in the form of Web pages for keywords, sentences and abstracts.

SUPPLEMENTARY INFORMATION: Web pages containing full information on the examples mentioned in the text are available at: http://www.cnb.uam.es/~cnbprot/keywords/

CONTACT: valencia@cnb.uam.es
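
The keyword-selection step described under APPROACH (keywords chosen by their relative accumulation against a domain-specific background) can be illustrated with a minimal sketch. It assumes one plausible reading of that sentence: a word's frequency is the fraction of a family's abstracts that mention it, and its score is the z-score of that frequency against the same word's frequencies in a set of background protein families. The tokenizer, the function names and the `min_std` floor are illustrative assumptions and are not taken from the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word tokens in one abstract (assumed tokenizer)."""
    return set(re.findall(r"[a-z][a-z0-9-]+", text.lower()))


def family_frequencies(abstracts):
    """Fraction of the abstracts in one family that mention each word."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(tokenize(abstract))
    total = len(abstracts)
    return {word: n / total for word, n in counts.items()}


def keyword_scores(target, background, min_std=0.05):
    """Z-score of each target-family word against the background families.

    target: list of abstracts for the family of interest.
    background: list of families, each a list of abstracts.
    min_std: hypothetical floor that avoids division by zero when a word
             has the same frequency in every background family.
    """
    target_freq = family_frequencies(target)
    background_freqs = [family_frequencies(family) for family in background]
    scores = {}
    for word, freq in target_freq.items():
        values = [f.get(word, 0.0) for f in background_freqs]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        scores[word] = (freq - mean) / max(math.sqrt(variance), min_std)
    return scores


def top_keywords(scores, n):
    """The n highest-scoring words, ties broken alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

Under this reading, a word that appears in most abstracts of the target family but rarely in the background families scores high, while generic words such as "protein" score near or below zero because they are common everywhere.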

Year:  1998        PMID: 9730925     DOI: 10.1093/bioinformatics/14.7.600

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  27 in total

1.  Mapping abbreviations to full forms in biomedical articles.

Authors:  Hong Yu; George Hripcsak; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2002 May-Jun       Impact factor: 4.497

2.  Quantitative assessment of dictionary-based protein named entity tagging.

Authors:  Hongfang Liu; Zhang-Zhi Hu; Manabu Torii; Cathy Wu; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2006-06-23       Impact factor: 4.497

3.  Dragon Plant Biology Explorer. A text-mining tool for integrating associations between genetic and biochemical entities with genome annotation and biochemical terms lists.

Authors:  Vladimir B Bajic; Merlin Veronika; Pardha Sarathi Veladandi; Archana Meka; Mok-Wei Heng; Kanagasabai Rajaraman; Hong Pan; Sanjay Swarup
Journal:  Plant Physiol       Date:  2005-08       Impact factor: 8.340

4.  eFIP: a tool for mining functional impact of phosphorylation from literature.

Authors:  Cecilia N Arighi; Amy Y Siu; Catalina O Tudor; Jules A Nchoutmboube; Cathy H Wu; Vijay K Shanker
Journal:  Methods Mol Biol       Date:  2011

5.  eGIFT: mining gene information from the literature.

Authors:  Catalina O Tudor; Carl J Schmidt; K Vijay-Shanker
Journal:  BMC Bioinformatics       Date:  2010-08-09       Impact factor: 3.169

6.  Information theory applied to the sparse gene ontology annotation network to predict novel gene function.

Authors:  Ying Tao; Lee Sam; Jianrong Li; Carol Friedman; Yves A Lussier
Journal:  Bioinformatics       Date:  2007-07-01       Impact factor: 6.937

7.  EDGAR: extraction of drugs, genes and relations from the biomedical literature.

Authors:  T C Rindflesch; L Tanabe; J N Weinstein; L Hunter
Journal:  Pac Symp Biocomput       Date:  2000

8.  Literature mining for the discovery of hidden connections between drugs, genes and diseases.

Authors:  Raoul Frijters; Marianne van Vugt; Ruben Smeets; René van Schaik; Jacob de Vlieg; Wynand Alkema
Journal:  PLoS Comput Biol       Date:  2010-09-23       Impact factor: 4.475

9.  Click-words: learning to predict document keywords from a user perspective.

Authors:  Rezarta Islamaj Doğan; Zhiyong Lu
Journal:  Bioinformatics       Date:  2010-09-01       Impact factor: 6.937

10.  Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge.

Authors:  Andrew Wong; Hagit Shatkay
Journal:  BMC Bioinformatics       Date:  2013-02-28       Impact factor: 3.169
