Comparison of vocabularies, representations and ranking algorithms for gene prioritization by text mining.

Shi Yu, Steven Van Vooren, Leon-Charles Tranchevent, Bart De Moor, Yves Moreau.

Abstract

MOTIVATION: Computational gene prioritization methods are useful to help identify susceptibility genes potentially being involved in genetic disease. Recently, text mining techniques have been applied to extract prior knowledge from text-based genomic information sources and this knowledge can be used to improve the prioritization process. However, the effect of various vocabularies, representations and ranking algorithms on text mining for gene prioritization is still an issue that requires systematic and comparative studies. Therefore, a benchmark study about the vocabularies, representations and ranking algorithms in gene prioritization by text mining is discussed in this article.
RESULTS: We investigated 5 different domain vocabularies, 2 text representation schemes and 27 linear ranking algorithms for disease gene prioritization by text mining. We indexed 288 177 MEDLINE titles and abstracts with the TXTGate text profiling system and adapted the benchmark dataset of the Endeavour gene prioritization system that consists of 618 disease-causing genes. Textual gene profiles were created and their performance for prioritization were evaluated and discussed in a comparative manner. The results show that inverse document frequency-based representation of gene term vectors performs better than the term-frequency inverse document-frequency representation. The eVOC and MESH domain vocabularies perform better than Gene Ontology, Online Mendelian Inheritance in Man's and London Dysmorphology Database. The ranking algorithms based on 1-SVM, Standard Correlation and Ward linkage method provide the best performance.
AVAILABILITY: The MATLAB code of the algorithm and benchmark datasets are available by request.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Substances: Genetic Markers

Year: 2008 PMID: 18689812 DOI: 10.1093/bioinformatics/btn291

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

17 in total

1. Text mining applications in psychiatry: a systematic literature review. (Review)

Authors: Adeline Abbe; Cyril Grouin; Pierre Zweigenbaum; Bruno Falissard
Journal: Int J Methods Psychiatr Res Date: 2015-07-17 Impact factor: 4.035

2. A multi-dimensional evidence-based candidate gene prioritization approach for complex diseases-schizophrenia as a case.

Authors: Jingchun Sun; Peilin Jia; Ayman H Fanous; Bradley T Webb; Edwin J C G van den Oord; Xiangning Chen; Jozsef Bukszar; Kenneth S Kendler; Zhongming Zhao
Journal: Bioinformatics Date: 2009-07-14 Impact factor: 6.937

3. Improving disease gene prioritization using the semantic similarity of Gene Ontology terms.

4. L2-norm multiple kernel learning and its application to biomedical data fusion.

5. A PubMed-wide associational study of infectious diseases.

6. Inferring novel gene-disease associations using Medical Subject Heading Over-representation Profiles.

7. Revealing and avoiding bias in semantic similarity scores for protein pairs.

8. Protein comparison at the domain architecture level.

9. Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data.

10. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation.