Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining.

Shi Yu, Steven Van Vooren, Leon-Charles Tranchevent, Bart De Moor, Yves Moreau.

Abstract

MOTIVATION: Computational gene prioritization methods help identify susceptibility genes potentially involved in genetic disease. Recently, text mining techniques have been applied to extract prior knowledge from text-based genomic information sources, and this knowledge can be used to improve the prioritization process. However, the effect of different vocabularies, representations and ranking algorithms on text mining for gene prioritization still requires systematic comparative study. This article therefore presents a benchmark study of the vocabularies, representations and ranking algorithms used in gene prioritization by text mining.
RESULTS: We investigated 5 different domain vocabularies, 2 text representation schemes and 27 linear ranking algorithms for disease gene prioritization by text mining. We indexed 288 177 MEDLINE titles and abstracts with the TXTGate text profiling system and adapted the benchmark dataset of the Endeavour gene prioritization system, which consists of 618 disease-causing genes. Textual gene profiles were created, and their performance for prioritization was evaluated and discussed in a comparative manner. The results show that the inverse document frequency-based representation of gene term vectors performs better than the term-frequency inverse document-frequency representation. The eVOC and MeSH domain vocabularies perform better than Gene Ontology, Online Mendelian Inheritance in Man and the London Dysmorphology Database. The ranking algorithms based on 1-SVM, Standard Correlation and the Ward linkage method provide the best performance.
AVAILABILITY: The MATLAB code of the algorithm and the benchmark datasets are available on request.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
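
As an illustration only, the two text representations compared in the abstract (IDF-only versus TF-IDF gene term vectors) can be sketched in a few lines of Python. This is not the TXTGate or Endeavour implementation, and it is not the authors' MATLAB code: the gene names, the terms and the `rank` helper are hypothetical, and cosine similarity to the mean training profile is a simple stand-in for the 27 ranking algorithms benchmarked in the paper.

```python
import math
from collections import Counter

# Hypothetical toy data (an assumption, not from the paper):
# each gene maps to the vocabulary terms found in its linked abstracts.
GENE_TERMS = {
    "GENE_A": ["heart", "muscle", "heart", "defect"],
    "GENE_B": ["heart", "defect", "valve"],
    "GENE_C": ["brain", "neuron", "brain"],
    "GENE_D": ["heart", "valve", "valve"],
}


def idf_weights(gene_terms):
    """Inverse document frequency of each term, treating each gene as a document."""
    n_genes = len(gene_terms)
    doc_freq = Counter(t for terms in gene_terms.values() for t in set(terms))
    return {t: math.log(n_genes / count) for t, count in doc_freq.items()}


def profile(terms, idf, use_tf):
    """Sparse gene term vector: IDF-only (term presence) or TF-IDF (term counts)."""
    tf = Counter(terms)
    return {t: (tf[t] if use_tf else 1) * idf[t] for t in tf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(training, candidates, use_tf):
    """Rank candidates by similarity to the mean profile of the training genes."""
    idf = idf_weights(GENE_TERMS)
    profiles = {g: profile(t, idf, use_tf) for g, t in GENE_TERMS.items()}
    centroid = Counter()
    for gene in training:
        for term, weight in profiles[gene].items():
            centroid[term] += weight / len(training)
    scores = {g: cosine(profiles[g], centroid) for g in candidates}
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


if __name__ == "__main__":
    for use_tf in (False, True):
        label = "TF-IDF" if use_tf else "IDF"
        print(label, rank(["GENE_A", "GENE_B"], ["GENE_C", "GENE_D"], use_tf))
```

On this toy input both representations place GENE_D ahead of GENE_C, because GENE_C shares no terms with the training genes. The sketch says nothing about which representation performs better on real data; that comparison is what the benchmark in the abstract reports.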

Year:  2008        PMID: 18689812     DOI: 10.1093/bioinformatics/btn291

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  17 in total

Review 1.  Text mining applications in psychiatry: a systematic literature review.

Authors:  Adeline Abbe; Cyril Grouin; Pierre Zweigenbaum; Bruno Falissard
Journal:  Int J Methods Psychiatr Res       Date:  2015-07-17       Impact factor: 4.035

2.  A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases-schizophrenia as a case.

Authors:  Jingchun Sun; Peilin Jia; Ayman H Fanous; Bradley T Webb; Edwin J C G van den Oord; Xiangning Chen; Jozsef Bukszar; Kenneth S Kendler; Zhongming Zhao
Journal:  Bioinformatics       Date:  2009-07-14       Impact factor: 6.937

3.  Improving disease gene prioritization using the semantic similarity of Gene Ontology terms.

Authors:  Andreas Schlicker; Thomas Lengauer; Mario Albrecht
Journal:  Bioinformatics       Date:  2010-09-15       Impact factor: 6.937

4.  L2-norm multiple kernel learning and its application to biomedical data fusion.

Authors:  Shi Yu; Tillmann Falck; Anneleen Daemen; Leon-Charles Tranchevent; Johan Ak Suykens; Bart De Moor; Yves Moreau
Journal:  BMC Bioinformatics       Date:  2010-06-08       Impact factor: 3.169

5.  A PubMed-wide associational study of infectious diseases.

Authors:  Vitali Sintchenko; Stephen Anthony; Xuan-Hieu Phan; Frank Lin; Enrico W Coiera
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

6.  Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles.

Authors:  Warren A Cheung; Bf Francis Ouellette; Wyeth W Wasserman
Journal:  Genome Med       Date:  2012-09-28       Impact factor: 11.117

7.  Revealing and avoiding bias in semantic similarity scores for protein pairs.

Authors:  Jing Wang; Xianxiao Zhou; Jing Zhu; Chenggui Zhou; Zheng Guo
Journal:  BMC Bioinformatics       Date:  2010-05-28       Impact factor: 3.169

8.  Protein comparison at the domain architecture level.

Authors:  Byungwook Lee; Doheon Lee
Journal:  BMC Bioinformatics       Date:  2009-12-03       Impact factor: 3.169

9.  Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data.

Authors:  Yongjin Li; Jinyan Li
Journal:  BMC Genomics       Date:  2012-12-13       Impact factor: 3.969

10.  The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation.

Authors:  Armand Valsesia; Aurélien Macé; Sébastien Jacquemont; Jacques S Beckmann; Zoltán Kutalik
Journal:  Front Genet       Date:  2013-05-30       Impact factor: 4.599
