Fabio Rinaldi, Simon Clematide, Simon Hafner, Gerold Schneider, Gintare Grigonyte, Martin Romacker, Therese Vachon.
Abstract
In this article, we describe the architecture of the OntoGene Relation mining pipeline and its application in the triage task of BioCreative 2012. The aim of the task is to support the triage of abstracts relevant to the process of curation of the Comparative Toxicogenomics Database. We use a conventional information retrieval system (Lucene) to provide a baseline ranking, which we then combine with information provided by our relation mining system, in order to achieve an optimized ranking. Our approach additionally delivers domain entities mentioned in each input document as well as candidate relationships, both ranked according to a confidence score computed by the system. This information is presented to the user through an advanced interface aimed at supporting the process of interactive curation. Thanks, in particular, to the high-quality entity recognition, the OntoGene system achieved the best overall results in the task.
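The ranking idea described in the abstract (a baseline retrieval score combined with evidence from relation mining) can be illustrated with a minimal sketch. The abstract does not say how the two sources are combined, so the linear interpolation, the weight `alpha`, the min-max normalization and every name below are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_rankings(
    baseline: Dict[str, float],
    mining: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum of two normalized scores.

    baseline: hypothetical retrieval scores (e.g. from an IR engine).
    mining:   hypothetical scores derived from entity/relation confidence.
    alpha:    assumed interpolation weight given to the baseline.
    """
    base_n, mine_n = min_max(baseline), min_max(mining)
    docs = set(base_n) | set(mine_n)
    combined = {
        d: alpha * base_n.get(d, 0.0) + (1 - alpha) * mine_n.get(d, 0.0)
        for d in docs
    }
    # Sort by descending score; ties are broken by document id.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores for three placeholder document ids.
baseline_scores = {"doc_a": 2.0, "doc_b": 1.0, "doc_c": 0.0}
mining_scores = {"doc_a": 0.0, "doc_b": 0.9, "doc_c": 1.0}
ranking = combine_rankings(baseline_scores, mining_scores, alpha=0.5)
```

In this toy input, `doc_b` moves to the top because it scores reasonably well on both signals, which is the kind of effect a combined ranking is meant to capture.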
Year: 2013 PMID: 23396322 PMCID: PMC3568389 DOI: 10.1093/database/bas053
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 2. ODIN interface: entry page.
Figure 3. ODIN interface: entity annotations and candidate interactions on a sample PubMed abstract.
| Category | Total | Found (%) |
|---|---|---|
| Disease | 12 639 | 9502 (75.18) |
| Chemical | 38 523 | 30 129 (78.21) |
| Gene | 39 150 | 29 199 (74.58) |
| Total | 90 312 | 68 830 (76.21) |

| Relation | Total | Found (%) |
|---|---|---|
| dis-gen | 6956 | 5126 (73.69) |
| che-dis | 12 154 | 8356 (68.75) |
| che-gen | 52 746 | 34 883 (66.13) |
| Total | 71 856 | 48 365 (67.31) |
| Term | MAP | Genes | Chemicals | Diseases |
|---|---|---|---|---|
| Doxorubicin | 0.800 | 0.167 | 0.843 | 0.793 |
| Indomethacin | 0.936 | 0.331 | 0.834 | 0.725 |
| Raloxifene | 0.798 | 0.244 | 0.818 | 0.778 |
| Amsacrine | 0.655 | 0.603 | 0.689 | 0.500 |
| Aniline | 0.543 | 0.625 | 0.561 | 0.524 |
| 2-Acetylaminofluorene | 0.643 | 0.412 | 0.845 | 0.421 |
| Aspartame | 0.365 | 0.686 | 0.756 | 0.720 |
| Quercetin | 0.853 | 0.463 | 0.646 | 0.653 |
| Cyclophosphamide | 0.708 | 0.396 | 0.880 | 0.646 |
| Phenacetin | 0.809 | 0.716 | 0.467 | 0.667 |
| Urethane | 0.650 | 0.365 | 0.871 | 0.633 |
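For readers unfamiliar with the MAP column, the sketch below shows the textbook definition of average precision for one ranked list and its mean over several queries. It is not necessarily the exact scoring script used in the evaluation; the function names and the toy data are illustrative assumptions.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average the per-query average precision over all queries."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: relevant documents sit at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
# Second toy query: the single relevant document sits at rank 2.
map_score = mean_average_precision([
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
])
```

In the first toy query the precision is 1/1 at rank 1 and 2/3 at rank 3, so the average precision is their mean.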
Figure 4. Official results of the BioCreative 2012 competition (task 1: ‘triage for the CTD database’). OntoGene was identified as ‘Group 116’. Reproduced from (18).
Figure 5. Example of syntactic analysis of a sentence as performed by the OntoGene parser. Reprinted from Journal of Biomedical Informatics, Volume 45, Issue 5, Fabio Rinaldi, Gerold Schneider, Simon Clematide, ‘Relation Mining Experiments in the Pharmacogenomics Domain’, pages 851–861, 2012, with permission from Elsevier.
| Relation label | Term | F | A | F/A | Score |
|---|---|---|---|---|---|
| Play | 0 | 25 | 17 | 1.47 | 13.41 |
| Treat | 0 | 24 | 17 | 1.41 | 12.71 |
| Bind | 0 | 18 | 9 | 2.00 | 12.70 |
| Inhibit | 0 | 41 | 48 | 0.85 | 12.28 |
| Constitute | 0 | 13 | 3 | 4.33 | 12.21 |
| Demonstrate | 0 | 30 | 30 | 1.00 | 11.57 |
| Exhibit | 0 | 16 | 11 | 1.45 | 9.67 |
| Reveal | 0 | 20 | 19 | 1.05 | 9.29 |
| 2t | 0 | 11 | 4 | 2.75 | 9.14 |
| … | … | … | … | … | … |
| Quinine | 1 | 8 | 1 | 8.00 | 0.00 |
| Phytoestrogen | 1 | 7 | 6 | 1.17 | 0.00 |
| Thalidomide | 1 | 6 | 15 | 0.40 | 0.00 |
Relation labels (head words) are shown in the first column. The second column is a boolean value indicating whether the head word is itself a term. The third column (‘F’) shows the number of times the head word is seen in a relevant path (note that the same head word can occur in multiple relevant paths). The fourth column (‘A’) shows the number of times the word occurs in the document collection. The fifth column shows the ratio of the two preceding values (F/A). The final column gives a weighted score computed from the preceding factors.
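The ratio column can be recomputed directly from the F and A counts. The sketch below does so for a few rows copied from the table; the formula behind the final weighted score is not stated here, so it is deliberately not reproduced. The helper name `ratio` is illustrative.

```python
# (head word, F, A, ratio as printed) copied from the table.
rows = [
    ("Play", 25, 17, 1.47),
    ("Bind", 18, 9, 2.00),
    ("Inhibit", 41, 48, 0.85),
    ("Thalidomide", 6, 15, 0.40),
]


def ratio(f: int, a: int) -> float:
    """Ratio of relevant-path count F to count A, rounded to two decimals."""
    return round(f / a, 2)


# Compare each recomputed ratio with the printed value.
checks = {word: ratio(f, a) == printed for word, f, a, printed in rows}
```

Every entry in `checks` is true for these rows, which confirms that the fifth column is simply F divided by A.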