Gene functional annotation by statistical analysis of biomedical articles.

T Theodosiou, L Angelis, A Vakali, G N Thomopoulos.

Abstract

BACKGROUND: Functional annotation of genes is an important task in biology, since it facilitates the characterization of gene relationships and the understanding of biochemical pathways. Gene functions can be described by standardized, structured vocabularies called bio-ontologies. Bio-ontology terms are assigned to genes by applying classification methods to datasets extracted from biomedical articles. These methods originate from data mining and machine learning and include maximum entropy models and support vector machines (SVM).
PURPOSE: The aim of this paper is to propose an alternative to the existing methods for functionally annotating genes. The methodology involves building classification models, validating them, representing the results graphically, and reducing the dimensionality of the dataset.
METHODS: Classification models are constructed by linear discriminant analysis (LDA). The models are validated through statistical analysis and interpretation of the results, using techniques such as hold-out samples and test datasets, and metrics such as the confusion matrix, accuracy, recall, precision and F-measure. Graphical representations of the variables resulting from the classification models, such as boxplots, Andrews curves and scatterplots, are also used for validating and interpreting the results.
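The validation metrics named above can be illustrated with a minimal sketch. It assumes each ontology term is treated as a yes/no classification and that the F-measure is the balanced F1 variant; the abstract does not state which variant was used. All names (`BinaryConfusion`, `confusion`, `f_measure`) and the example labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    """Confusion-matrix counts for one term treated as a yes/no decision."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    fn: int  # predicted negative, actually positive
    tn: int  # predicted negative, actually negative


def confusion(actual, predicted):
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return BinaryConfusion(tp, fp, fn, tn)


def accuracy(c):
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c):
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c):
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_measure(c, beta=1.0):
    """F-beta score; beta=1.0 (an assumption here) weights precision and recall equally."""
    p, r = precision(c), recall(c)
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up hold-out labels for a single term (1 = term applies).
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    c = confusion(actual, predicted)
    print(c, accuracy(c), precision(c), recall(c), f_measure(c))
```

A mean F-measure over several terms would then be the average of `f_measure` across the per-term confusion matrices, assuming a macro-average.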
RESULTS: The proposed methodology was applied to a dataset extracted from biomedical articles for 12 Gene Ontology terms. Validation of the LDA models and comparison with the SVM show that LDA (mean F-measure 75.4%) outperforms the SVM (mean F-measure 68.7%) on this dataset.
CONCLUSION: The application of certain statistical methods can be beneficial for functional gene annotation from biomedical articles. Apart from the good performance, the results are interpretable and give insight into the structure of the bio-text data.

Year:  2006        PMID: 16781189     DOI: 10.1016/j.ijmedinf.2006.04.011

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


  3 in total

1.  Bioinformatics analysis of transcription profiling of solid pseudopapillary neoplasm of the pancreas.

Authors:  Yongping Zhang; Xu Han; Hao Wu; Yifeng Zhou
Journal:  Mol Med Rep       Date:  2017-06-19       Impact factor: 2.952

2.  Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge.

Authors:  Andrew Wong; Hagit Shatkay
Journal:  BMC Bioinformatics       Date:  2013-02-28       Impact factor: 3.169

3.  Automatic assignment of prokaryotic genes to functional categories using literature profiling.

Authors:  Raul Torrieri; Francislon S Oliveira; Guilherme Oliveira; Roney S Coimbra
Journal:  PLoS One       Date:  2012-10-15       Impact factor: 3.240
