Makoto Miwa, Tomoko Ohta, Rafal Rak, Andrew Rowley, Douglas B Kell, Sampo Pyysalo, Sophia Ananiadou.
Abstract
MOTIVATION: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge.
Year: 2013 PMID: 23813008 PMCID: PMC3694679 DOI: 10.1093/bioinformatics/btt227
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Illustration of PathText 2 architecture
Fig. 2. Illustration of event representation
Top-level correspondences for reaction-event mapping
| Pathway reaction type | Pathway reaction participants | Event type | Event arguments |
|---|---|---|---|
| Truncation | Reactant, Product | Catabolism | Theme:Reactant |
| Transcription | Reactant, Product | Transcription, Gene_expression | Theme:Reactant |
| Translation | Reactant, Product | Translation, Gene_expression | Theme:Reactant |
| Heterodimer association | Reactant:Biomolecule, Product:Complex | Binding | Theme:Reactant |
| Dissociation | Reactant:Complex, Product:Biomolecule | Dissociation | — |
| Transport | Reactant:Biomolecule, from/to:Compartment | Localization | Theme:Reactant, atLoc/toLoc:from/to |
| Degradation/Truncation | Reactant:Biomolecule | Catabolism | Theme:Reactant |
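The correspondence table above can be read as a lookup from a pathway reaction type to the event types (and argument roles) that count as textual evidence for it. The sketch below encodes the rows as a Python dictionary; the names `EventMapping`, `REACTION_TO_EVENT` and `event_matches_reaction` are hypothetical and are not PathText 2's actual code. Reading "atLoc/toLoc:from/to" pairwise (atLoc with from, toLoc with to) is also an assumption.

```python
from typing import Dict, List, NamedTuple


class EventMapping(NamedTuple):
    """Candidate event types and argument correspondences for one reaction type."""
    event_types: List[str]
    # Maps an event argument role to the pathway participant role it should match.
    arguments: Dict[str, str]


# Hypothetical lookup table transcribed from the correspondence table;
# not PathText 2's actual data structure.
REACTION_TO_EVENT: Dict[str, EventMapping] = {
    "Truncation": EventMapping(["Catabolism"], {"Theme": "Reactant"}),
    "Transcription": EventMapping(["Transcription", "Gene_expression"], {"Theme": "Reactant"}),
    "Translation": EventMapping(["Translation", "Gene_expression"], {"Theme": "Reactant"}),
    "Heterodimer association": EventMapping(["Binding"], {"Theme": "Reactant"}),
    "Dissociation": EventMapping(["Dissociation"], {}),  # no argument constraint listed
    # "atLoc/toLoc:from/to" read pairwise: atLoc <- from, toLoc <- to (assumption).
    "Transport": EventMapping(["Localization"], {"Theme": "Reactant", "atLoc": "from", "toLoc": "to"}),
    "Degradation/Truncation": EventMapping(["Catabolism"], {"Theme": "Reactant"}),
}


def event_matches_reaction(reaction_type: str, event_type: str) -> bool:
    """Return True if an extracted event type is a candidate match for the reaction type."""
    mapping = REACTION_TO_EVENT.get(reaction_type)
    return mapping is not None and event_type in mapping.event_types


print(event_matches_reaction("Transcription", "Gene_expression"))  # True
print(event_matches_reaction("Transport", "Binding"))  # False
```

A type-level check like this only filters candidate events; matching the Theme and location arguments to specific reaction participants would be a separate step.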
Fig. 3. Screenshot of PathText 2 web interface
Summary of the annotated corpus statistics
| Pathway | p38 MAPK | p53 | p53 feedback | Wnt signalling | Total |
|---|---|---|---|---|---|
| Number of reactions | 16 | 12 | 6 | 11 | 45 |
| Number of documents | 160 | 120 | 60 | 110 | 450 |
| Highly relevant | 6 | 13 | 15 | 14 | 48 |
| Relevant | 0 | 17 | 16 | 0 | 33 |
| Partly relevant | 101 | 42 | 8 | 33 | 184 |
| Not relevant | 53 | 48 | 21 | 63 | 185 |
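The corpus statistics are internally consistent: in every pathway the four relevance grades sum to the document count, and every pathway has exactly 10 annotated documents per reaction. The short check below reproduces that arithmetic from the table's numbers; the `CORPUS` dictionary and `check_corpus` helper are illustrative only.

```python
# Counts copied from the corpus statistics table:
# (reactions, documents, highly relevant, relevant, partly relevant, not relevant)
CORPUS = {
    "p38 MAPK": (16, 160, 6, 0, 101, 53),
    "p53": (12, 120, 13, 17, 42, 48),
    "p53 feedback": (6, 60, 15, 16, 8, 21),
    "Wnt signalling": (11, 110, 14, 0, 33, 63),
}
TOTAL = (45, 450, 48, 33, 184, 185)


def check_corpus(corpus, total):
    """Verify the table's arithmetic and return documents per reaction for each pathway."""
    for name, (n_reactions, n_docs, *labels) in corpus.items():
        # The four relevance grades should partition each pathway's documents.
        assert sum(labels) == n_docs, name
    # Each column of the Total row should be the sum over pathways.
    column_sums = tuple(sum(col) for col in zip(*corpus.values()))
    assert column_sums == total
    return {name: row[1] / row[0] for name, row in corpus.items()}


docs_per_reaction = check_corpus(CORPUS, TOTAL)
print(docs_per_reaction)  # 10.0 documents per reaction for every pathway
```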
Statistics of the train/test data split
| Category | Train | Test | All |
|---|---|---|---|
| Number of reactions | 36 | 9 | 45 |
| Number of documents | 360 | 90 | 450 |
| Highly relevant | 38 | 10 | 48 |
| Relevant | 26 | 7 | 33 |
| Partly relevant | 148 | 36 | 184 |
| Not relevant | 148 | 37 | 185 |
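The split counts (36 and 9 reactions; 360 and 90 documents) are consistent with holding out whole reactions, each with its 10 documents. How the 9 test reactions were chosen is not stated here, so the random, seeded selection below is an assumption, and the reaction and document identifiers are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def split_by_reaction(
    docs_by_reaction: Dict[str, List[str]], n_test: int, seed: int = 0
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Hold out whole reactions so no reaction's documents appear in both splits."""
    reactions = sorted(docs_by_reaction)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(reactions)
    test_ids = set(reactions[:n_test])
    train = {r: d for r, d in docs_by_reaction.items() if r not in test_ids}
    test = {r: d for r, d in docs_by_reaction.items() if r in test_ids}
    return train, test


# Hypothetical identifiers: 45 reactions with 10 candidate documents each.
corpus = {f"reaction_{i}": [f"doc_{i}_{j}" for j in range(10)] for i in range(45)}
train, test = split_by_reaction(corpus, n_test=9)
n_train_docs = sum(len(d) for d in train.values())
n_test_docs = sum(len(d) for d in test.values())
print(len(train), len(test), n_train_docs, n_test_docs)  # 36 9 360 90
```

Splitting at the reaction level keeps the test reactions unseen during training, which matters when a ranker might otherwise learn reaction-specific cues.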
Contribution of each semantic search system, measured in nDCG on training data
| FACTA | KLEIO | MEDIE SVO | MEDIE EVENT |
|---|---|---|---|
| 0.829 | 0.847 | 0.850 | 0.859 |
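The scores in these tables are nDCG values over graded relevance judgements. The sketch below shows one common formulation, using a log2(rank + 1) discount; the gain assigned to each grade (3, 2, 1, 0) is an assumption, since the exact gains and nDCG variant used in the paper are not given here.

```python
import math
from typing import Dict, List, Sequence

# Assumed gain for each relevance grade; the paper's exact gain values may differ.
GAIN: Dict[str, int] = {
    "Highly relevant": 3,
    "Relevant": 2,
    "Partly relevant": 1,
    "Not relevant": 0,
}


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount (rank starts at 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_labels: Sequence[str]) -> float:
    """nDCG of one ranked list: its DCG divided by the DCG of the ideal ordering."""
    gains = [GAIN[label] for label in ranked_labels]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def mean_ndcg(rankings: List[Sequence[str]]) -> float:
    """Average per-reaction nDCG over a set of reactions."""
    return sum(ndcg(r) for r in rankings) / len(rankings)


print(ndcg(["Highly relevant", "Relevant", "Not relevant"]))  # 1.0 (ideal order)
print(ndcg(["Relevant", "Highly relevant", "Not relevant"]))  # below 1.0
```

Normalising by the ideal DCG makes scores comparable across reactions whose candidate documents have different mixes of relevance grades.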
Evaluation in nDCG on test data, and of rule-based scoring methods on training data
| Data | SVM + Pathway | Pathway | SVM | RankSVM | SVR | Priority ranking | Average hit ratio | BM25 | Random |
|---|---|---|---|---|---|---|---|---|---|
| Test | 0.788 | 0.719 | 0.777 | 0.719 | 0.672 | 0.775 | 0.696 | 0.747 | 0.542 |
| Train | – | – | – | – | – | 0.865 | 0.846 | 0.842 | 0.641 |
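BM25 appears in the table as a lexical ranking baseline. Below is a minimal Okapi BM25 sketch over pre-tokenised documents. The parameters k1 = 1.2 and b = 0.75 are common defaults rather than values reported for this evaluation, the non-negative idf variant (log of 1 plus the ratio) is one of several in use, and the toy query and documents are hypothetical.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenised document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation relative to the average document length.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores.append(s)
    return scores


docs = [["p53", "binds", "mdm2"], ["wnt", "signalling"], ["p53", "p53", "degradation"]]
scores = bm25_scores(["p53", "mdm2"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```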
Fig. 4. Learning curve on SVM-based ranking
Manual evaluation of PathText 2 and PubMed in Top 10 precision and nDCG
| Evaluation Metric | Priority ranking + Query expansion | Average hit ratio + Query expansion | Priority ranking | PubMed |
|---|---|---|---|---|
| Top 10 precision | 0.493 | 0.347 | 0.393 | 0.280 |
| nDCG | 0.419 | 0.373 | 0.376 | 0.215 |
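Top 10 precision is the fraction of the ten highest-ranked documents judged relevant. Which relevance grades counted as relevant in this manual evaluation is not stated here, so the sketch takes that set as an explicit parameter; the example labels are hypothetical.

```python
from typing import Iterable, Sequence


def precision_at_k(ranked_labels: Sequence[str], relevant: Iterable[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents whose label is in `relevant`."""
    relevant_set = set(relevant)
    hits = sum(1 for label in ranked_labels[:k] if label in relevant_set)
    return hits / k  # divide by k even if fewer than k documents were returned


labels = ["Highly relevant", "Not relevant", "Relevant", "Partly relevant"] + ["Not relevant"] * 6
strict = precision_at_k(labels, {"Highly relevant", "Relevant"})
lenient = precision_at_k(labels, {"Highly relevant", "Relevant", "Partly relevant"})
print(strict, lenient)  # 0.2 0.3
```

The choice of relevant grades changes the score noticeably, so comparisons between systems are only meaningful under the same definition.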