Fabio Rinaldi, Tilia Renate Ellendorff, Sumit Madan, Simon Clematide, Adrian van der Lek, Theo Mevissen, Juliane Fluck.
Abstract
Automatic extraction of biological network information is one of the most desired and most complex tasks in biological and medical text mining. Track 4 at BioCreative V approached this complexity using fragments of large-scale manually curated biological networks, represented in Biological Expression Language (BEL), as training and test data. BEL is an advanced knowledge representation format designed to be both human readable and machine processable. The specific goal of Track 4 was to evaluate text mining systems capable of automatically constructing BEL statements from given evidence text, and of retrieving evidence text for given BEL statements. Given the complexity of the task, we designed an evaluation methodology that gives credit to partially correct statements. We identified various levels of information expressed by BEL statements, such as entities, functions and relations, and introduced an evaluation framework that rewards systems capable of delivering useful BEL fragments at each of these levels. The aim of this evaluation method is to help identify the characteristics of the systems which, if combined, would be most useful for achieving the overall goal of automatically constructing causal biological networks from text.
Year: 2016 PMID: 27402677 PMCID: PMC4940434 DOI: 10.1093/database/baw067
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Example of a BEL statement. The ‘cat’ function, which represents catalytic activity, was considered equivalent to ‘act’ (activity) in our evaluation; see Table 3 for details.
Overview of selected functions
| Function | Function Type | Example |
|---|---|---|
| complex() | Abundances | (complex(p(MGI:Itga8),p(MGI:Itgb1))) -> bp(GOBP:‘cell adhesion’) |
| pmod() | Modifications | p(MGI:Cav1,pmod(P)) -> a(CHEBI:‘nitric oxide’) |
| deg() | Transformations | p(MGI:Lyve1) -> deg(a(CHEBI:‘hyaluronic acid’)) |
| tloc() | Transformations | a(CHEBI:‘brefeldin A’) -> tloc(p(MGI:Stk16)) |
| act() | Activities | complex(p(MGI:Cckbr),p(MGI:Gast)) -> act(p(MGI:Prkd1)) |
Figure 2. Training data example.
BEL Relationships evaluated in Track 4
| Relationship (long form) | Short form | Example |
|---|---|---|
| decreases | -\| | a(CHEBI:‘brefeldin A’) -\| p(HGNC:SCOC) |
| directlyDecreases | =\| | p(HGNC:TIMP1) =\| act(p(HGNC:MMP9)) |
| increases | -> | p(MGI:Bmp4) -> p(MGI:Acta2) |
| directlyIncreases | => | p(HGNC:VEGFA) => act(p(HGNC:KDR)) |
In the challenge, decreases was accepted in place of directlyDecreases.
In the challenge, increases was accepted in place of directlyIncreases.
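The short forms and the two leniency notes above can be sketched as a small normalization step. This is only an illustration: the names `SHORT_TO_LONG`, `RELAXED`, `normalize_relation` and `relations_match` are hypothetical, and collapsing the direct forms onto the non-direct ones is an assumption about how such leniency might be implemented, not a description of the official evaluation code.

```python
# Short-form to long-form relationship names for the four relations in Track 4.
SHORT_TO_LONG = {
    "-|": "decreases",
    "=|": "directlyDecreases",
    "->": "increases",
    "=>": "directlyIncreases",
}

# Assumption: to honour "decreases was accepted in place of directlyDecreases"
# (and likewise for increases), both variants are mapped to the non-direct form.
RELAXED = {
    "directlyDecreases": "decreases",
    "directlyIncreases": "increases",
}


def normalize_relation(rel: str) -> str:
    """Map a short or long relation name to its relaxed long form."""
    long_form = SHORT_TO_LONG.get(rel, rel)
    return RELAXED.get(long_form, long_form)


def relations_match(predicted: str, gold: str) -> bool:
    """Compare two relation names after normalization."""
    return normalize_relation(predicted) == normalize_relation(gold)
```

With this sketch, a predicted `->` would count as matching a gold `directlyIncreases`, while `-|` would not match `increases`.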
Overview of Track 4 namespaces and associated functions
| Namespace Identifier | Description | Associated Entities | BEL Functions | Function Longform | BEL Term Example |
|---|---|---|---|---|---|
| HGNC | Standard approved gene symbols and synonyms for human, used to specify genes, microRNA, RNA and proteins | Human genes, microRNA, RNA, proteins | p(), g(), r(), m() | proteinAbundance | p(HGNC:MAPK14) |
| MGI | Standard approved gene symbols and synonyms for mouse, used to specify genes, microRNA, RNA and proteins | Mouse genes, microRNA, RNA, proteins | p(), g(), r(), m() | Same as above | p(MGI:Mapk14) |
| EGID | Genes, microRNA, RNA and proteins of Homo sapiens, Mus musculus and Rattus norvegicus | Genes, microRNA, RNA, proteins | p(), g(), r() | Same as above | p(EGID:1432) |
| GOBP | Gene Ontology database for biological processes, referenced through the standard name | Biological processes | bp() | | bp(GOBP:‘cell proliferation’) |
| MESHD | Disease vocabulary provided by the U.S. National Library of Medicine. The namespace provides the Main Heading for each disease in the Diseases [C] tree. These identifiers can be used to specify pathologies. | Diseases, pathologies | path() | | path(MESHD:Hyperoxia) |
| CHEBI | Chemical entities, referenced through the standard name for each compound | Chemicals | a() | | a(CHEBI:lipopolysaccharide) |
Figure 3. Visualization of the BEL statement ‘cat(p(HGNC:FAS)) increases p(HGNC:RB1,pmod(P))’ derived from the sentence ‘Fas stimulation of Jurkat cells is known to induce p38 kinase and we find a pronounced increase in Rb phosphorylation within 30 min of Fas stimulation’.
Figure 4. A screenshot of the evaluation user interface of task 1.
Figure 5. An example output of the sentence-based evaluation. The screenshot shows the detected true positive (green), false positive (red) and false negative (yellow) entries at the term and relationship levels.
Overview of the different evaluation levels with examples
| BEL Statement | p(HGNC:BCL2A1) decreases bp(GOBP:‘apoptotic process’) | act(p(MGI:Hras)) increases p(MGI:Mmp9) |
|---|---|---|
| Term | p(HGNC:BCL2A1) | p(MGI:Hras) |
| Term | bp(GOBP:‘apoptotic process’) | p(MGI:Mmp9) |
| Function | – | act(p(MGI:Hras)) |
| Relation | p(HGNC:BCL2A1) decreases bp(GOBP:‘apoptotic process’) | p(MGI:Hras) increases p(MGI:Mmp9) |
| Full statement | p(HGNC:BCL2A1) decreases bp(GOBP:‘apoptotic process’) | act(p(MGI:Hras)) increases p(MGI:Mmp9) |
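The decomposition shown in the table above can be approximated with a few regular expressions. The following is a minimal sketch, not the evaluation code used in the track: `fragments`, `unwrap` and `WRAPPERS` are hypothetical names, the set of wrapper functions is an assumed subset, only simple `subject relation object` statements with long-form relations are handled, and multi-word names are written with straight double quotes.

```python
import re

RELATIONS = ("directlyIncreases", "directlyDecreases", "increases", "decreases")
# Assumption: functions that wrap a single term and are dropped at the relation level.
WRAPPERS = {"act", "cat", "deg", "tloc"}

# An entity function, a namespace, and a quoted or bare value.
TERM_RE = re.compile(r'\b(p|g|r|m|a|bp|path)\(([A-Za-z]+):("[^"]*"|[^,()]+)')
REL_RE = re.compile(r"\s+(" + "|".join(RELATIONS) + r")\s+")


def unwrap(expr: str) -> str:
    """Strip one outer wrapper function, e.g. act(p(X)) -> p(X)."""
    m = re.fullmatch(r"(\w+)\((.*)\)", expr)
    if m and m.group(1) in WRAPPERS:
        return m.group(2)
    return expr


def fragments(statement: str) -> dict:
    """Split a simple BEL statement into term, function, relation and statement fragments."""
    statement = statement.strip()
    # One capturing group plus maxsplit=1 yields [subject, relation, object].
    subject, relation, obj = REL_RE.split(statement, maxsplit=1)
    terms = [f"{fn}({ns}:{val})" for fn, ns, val in TERM_RE.findall(statement)]
    functions = [e for e in (subject, obj) if unwrap(e) != e]
    return {
        "terms": terms,
        "functions": functions,
        "relation": f"{unwrap(subject)} {relation} {unwrap(obj)}",
        "statement": statement,
    }
```

Applied to `act(p(MGI:Hras)) increases p(MGI:Mmp9)`, the sketch yields the two terms, the single function `act(p(MGI:Hras))`, and the relation-level fragment `p(MGI:Hras) increases p(MGI:Mmp9)`. Scoring a system at each level then amounts to comparing the predicted and gold fragment sets for that level.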
Figure 6. An example result page of a candidate evaluation. The example shows the candidate sentence with the gold standard and the predicted BEL statements. The evaluation scores are shown for all primary and secondary levels.
Evaluation of stage 1 of task 1 (prediction of BEL statements without gold standard entities)
Evaluation of stage 2 of task 1 (prediction of BEL statements with gold standard entities)
Evaluation results of task 2 including mean average precision (MAP)
| Criterion | TP | FP | Precision | MAP | Worst | Random | Best |
|---|---|---|---|---|---|---|---|
| Full | 316 | 490 | 39.2% | 49.0% | 31.7% | 46.5% | 74.2% |
| Relaxed | 429 | 377 | 53.2% | 62.1% | 45.9% | 58.4% | 80.4% |
| Context | 496 | 310 | 61.5% | 68.9% | 55.2% | 65.7% | 83.5% |
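The Precision column follows directly from the TP and FP counts, as the sketch below shows. The average-precision helpers illustrate one common way to compute MAP over ranked evidence lists; the function names are hypothetical, and normalizing by the number of relevant items retrieved is an assumption, since the exact MAP definition used in the track is not stated here.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of returned items that are correct."""
    return tp / (tp + fp)


def average_precision(relevance: list) -> float:
    """Average precision of one ranked list of True/False relevance judgements.

    Assumption: normalized by the number of relevant items in the list.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: list) -> float:
    """Mean of the average precision over all ranked lists (one per BEL statement)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Precision values for the three criteria, from the TP/FP counts in the table.
for name, tp, fp in [("Full", 316, 490), ("Relaxed", 429, 377), ("Context", 496, 310)]:
    print(f"{name}: {precision(tp, fp):.1%}")
```

Each criterion is scored over the same 806 returned items (TP + FP), so the differences in precision come entirely from how many items each criterion accepts as correct.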