Sumit Madan, Sven Hodapp, Philipp Senger, Sam Ansari, Justyna Szostak, Julia Hoeng, Manuel Peitsch, Juliane Fluck.
Abstract
Network-based approaches have become extremely important in systems biology to achieve a better understanding of biological mechanisms. For network representation, the Biological Expression Language (BEL) is well designed to collate findings from the scientific literature into biological network models. To facilitate encoding and biocuration of such findings in BEL, a BEL Information Extraction Workflow (BELIEF) was developed. BELIEF provides a web-based curation interface, the BELIEF Dashboard, that incorporates text mining techniques to support the biocurator in the generation of BEL networks. The underlying UIMA-based text mining pipeline (BELIEF Pipeline) uses several named entity recognition processes and relationship extraction methods to detect concepts and BEL relationships in literature. The BELIEF Dashboard allows easy curation of the automatically generated BEL statements and their context annotations. Resulting BEL statements and their context annotations can be syntactically and semantically verified to ensure consistency in the BEL network. In summary, the workflow supports experts in different stages of systems biology network building. Based on the BioCreative V BEL track evaluation, we show that the BELIEF Pipeline automatically extracts relationships with an F-score of 36.4% and that fully correct statements can be obtained with an F-score of 30.8%. Participation in the BioCreative V Interactive task (IAT) track with BELIEF revealed a system usability scale (SUS) score of 67. Considering the complexity of the task for new users (learning BEL, working with a completely new interface, and performing complex curation), a score so close to the overall SUS average highlights the usability of BELIEF.
Database URL: BELIEF is available at http://www.scaiview.com/belief/.
Year: 2016 PMID: 27694210 PMCID: PMC5045868 DOI: 10.1093/database/baw136
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Structure of a BEL statement. This example of a BEL statement describes that an abundance of the chemical corticosteroid reduces the biological process Oxidative Stress. BEL uses pre-defined namespaces to identify and disambiguate domain-specific terms and concepts; this example uses CHEBI and MESHPP. The namespace CHEBI contains chemical entities from the resource ChEBI (http://www.ebi.ac.uk/chebi/), and MESHPP contains various biological processes from the MeSH (http://www.nlm.nih.gov/mesh/) Phenomena and Processes [G] branch.
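As a minimal sketch of the subject, relationship, object structure described in this caption, the following Python parses the simplest form of a BEL statement. The statement string is reconstructed from the caption's wording rather than copied from the figure. The regular expression, the class names, and the `parse_statement` helper are illustrative assumptions that cover only un-nested terms, not the full BEL grammar.

```python
import re
from dataclasses import dataclass

# Simplified pattern for one BEL term such as a(CHEBI:corticosteroid) or
# bp(MESHPP:"Oxidative Stress"). Nested functions and modifiers are NOT handled.
TERM = r'(\w+)\((\w+):("[^"]+"|[\w-]+)\)'
# Subject term, one of three relation symbols (->, -|, --), object term.
STATEMENT = re.compile(rf'^\s*{TERM}\s+(->|-\||--)\s+{TERM}\s*$')


@dataclass(frozen=True)
class Term:
    function: str   # e.g. "a" (abundance) or "bp" (biological process)
    namespace: str  # e.g. "CHEBI" or "MESHPP"
    value: str      # the concept name inside the namespace


@dataclass(frozen=True)
class Statement:
    subject: Term
    relation: str
    obj: Term


def parse_statement(text: str) -> Statement:
    """Parse a simple 'term relation term' BEL statement (hypothetical helper)."""
    match = STATEMENT.match(text)
    if match is None:
        raise ValueError(f"not a simple BEL statement: {text!r}")
    f1, ns1, v1, relation, f2, ns2, v2 = match.groups()
    return Statement(
        Term(f1, ns1, v1.strip('"')),
        relation,
        Term(f2, ns2, v2.strip('"')),
    )


# Assumed rendering of the caption's example: corticosteroid decreases oxidative stress.
example = parse_statement('a(CHEBI:corticosteroid) -| bp(MESHPP:"Oxidative Stress")')
print(example)
```

A check of this kind only covers surface syntax; verifying that a value actually exists in its namespace would need the namespace files themselves.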
Figure 2. Architecture of the semiautomatic information extraction workflow BELIEF. The workflow consists of a text mining pipeline (BELIEF Pipeline) and a web-based biocuration tool (BELIEF Dashboard). (Note: UIMA: Unstructured Information Management Architecture. UIMA Reader: A reader component to parse and extract information from UIMA XCAS documents. JSVC Daemon: A Java library that allows applications to run as daemons.)
Figure 3. Architecture of the BELIEF text mining pipeline. (Note: POS Tagging: Part of Speech Tagging. NLP: Natural Language Processing. TEES: Turku Event Extraction System, a state-of-the-art relation extraction system. GE: Genia Event Extraction for NFkB knowledgebase, a BioNLP Shared Task. PC: Pathway Curation, a BioNLP Shared Task.)
Resources used, with the corresponding entity classes and the namespace symbols provided within OpenBEL
| Entity class | Resources | OpenBEL namespace |
|---|---|---|
| Human genes/proteins | EntrezGene/Uniprot | HGNC |
| Mouse genes/proteins | EntrezGene/Uniprot | MGI |
| Rat genes/proteins | EntrezGene/Uniprot | RGD |
| Protein family names | OpenBEL | SFAM |
| Protein complex names | OpenBEL | SCOMP |
| Protein complex names | Gene Ontology | GOCC |
| Biological processes | Gene Ontology | GOBP |
| Chemical names | OpenBEL | SCHEM |
| Chemical names | ChEBI | CHEBI |
| Chemical names | ChEMBL | CHEMBL |
| Disease names | MeSH | MESHD |
| Anatomical names | MeSH | MESHAnatomy |
| Cell lines | Cell Line Ontology | CellLine |
| Cell structures | MeSH | CellStructure |
These resources were converted into dictionaries and integrated into the workflow: EntrezGene (http://www.ncbi.nlm.nih.gov/gene), Uniprot (http://www.uniprot.org/), OpenBEL name spaces and annotations (http://resources.belframework.org/belframework/), Gene Ontology (http://www.geneontology.org/), ChEBI (http://www.ebi.ac.uk/chebi/), ChEMBL (https://www.ebi.ac.uk/chembl/), MeSH (http://www.nlm.nih.gov/mesh/), and the Cell Line Ontology (cellontology.org/).
Figure 4. Screenshot of the evidence-centric curation view. In the upper left, the evidence text is visualized. Detected concepts in the current evidence text are shown in the upper right. In the bottom left, the curation of BEL statements and their context annotations can be performed.
Examples of conversion of TEES output (BioNLP Shared Task event annotations) to BEL statements
| Example | Sentence snippet | BioNLP event (Name: | BioNLP arguments (Name: | BEL statement |
|---|---|---|---|---|
| E1 | Early growth response-1 gene expression | Gene_expression: | Theme: | p(HGNC:EGR1) |
| E2 | Transfection of Foxp3 | Gene_expression: | Theme: | r(HGNC:FOXP3) |
| For Gene_expression trigger words ≠ express | ||||
| E3 | IRF-4 transcription | Transcription: | Theme: | r(HGNC:IRF4) |
| E4 | Phosphorylation of Thr-426 in SREBP1a | Phosphorylation: | Theme: | p(HGNC:SREBF1, pmod(P, |
| E5 | Degradation of the EGF receptor | ProteinCatabolism: | Theme: | deg(p(HGNC:EGFR)) |
| E6 | Cells treated with LPS secreted similar levels of TNF-alpha | Localization: | Theme: | sec(p(HGNC:TNFA)) |
| E7 | Surface expression of the LPS receptor complex that comprises TLR4. | Localization: | Theme: | surf(p(HGNC:TLR4)) |
| E8 | Galangin induced AhR nuclear translocation. | Localization: | Theme: | tloc(p(HGNC:AHR), GOCC:Nucleus) |
| For Localization trigger words ≠ secret or express and AtLoc ≠ cell surface or surface | ||||
| E9 | NF-AT interacts with Foxp3. | Binding: | Theme: | complex(p(HGNC:NFAT1), p(HGNC:FOXP3)) |
| E10 | Galangin induced AhR nuclear translocation. | PositiveRegulation: | Theme: | p(CHEBI:galangin) -> tloc(p(HGNC:AHR), GOCC:Nucleus) |
| E11 | GRK2 decreases early growth response-1 expression. | NegativeRegulation: | Theme: | p(HGNC:ADRBP1) -| p(HGNC:EGR1) |
| E12 | STAT4 controls Interleukin 10. | Regulation: | Theme: | p(HGNC:STAT4) -- p(HGNC:IL10) |
More details are described in Fluck et al. (38).
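The conversion rules in this table can be sketched as a small lookup in Python. This is one reading of the table, not the pipeline's actual code: the function names, the argument names (`at_loc`, `to_loc`), and the substring tests on trigger words are assumptions, and only a subset of the event types is covered.

```python
from typing import Optional

# BEL relation symbols for the three regulation event types (rows E10 to E12).
RELATIONS = {
    "PositiveRegulation": "->",
    "NegativeRegulation": "-|",
    "Regulation": "--",
}

# Locations that, per the table note, select surf() instead of tloc().
SURFACE_LOCATIONS = {"cell surface", "surface"}


def term_for_event(event_type: str, trigger: str, theme: str,
                   at_loc: Optional[str] = None,
                   to_loc: Optional[str] = None) -> str:
    """Map one entity-level event to a BEL term (hypothetical helper).

    `theme` is an already normalized concept such as "HGNC:EGR1".
    """
    protein = f"p({theme})"
    trig = trigger.lower()
    if event_type == "Gene_expression":
        # Trigger "express..." gives protein abundance, anything else RNA abundance.
        return protein if "express" in trig else f"r({theme})"
    if event_type == "Transcription":
        return f"r({theme})"
    if event_type == "ProteinCatabolism":
        return f"deg({protein})"
    if event_type == "Localization":
        if "secret" in trig:
            return f"sec({protein})"
        if "express" in trig or (at_loc or "").lower() in SURFACE_LOCATIONS:
            return f"surf({protein})"
        if to_loc is None:
            raise ValueError("translocation needs a target location")
        return f"tloc({protein}, {to_loc})"
    raise ValueError(f"unsupported event type: {event_type}")


def relation_for_event(event_type: str, cause: str, theme: str) -> str:
    """Join two BEL terms with the relation symbol of a regulation event."""
    return f"{cause} {RELATIONS[event_type]} {theme}"


print(term_for_event("Localization", "translocation", "HGNC:AHR", to_loc="GOCC:Nucleus"))
print(relation_for_event("Regulation", "p(HGNC:STAT4)", "p(HGNC:IL10)"))
```

Keeping term construction separate from relation construction mirrors the table, where regulation events take other events or entities as their arguments.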
Figure 5. Screenshot of the document management system listing documents for project ‘Demoversion’.
Figure 6. Screenshot of the statement-centric curation view in the BELIEF Dashboard.
Test set prediction results for the classes of BioCreative V BEL track task 1
| Class | Precision | Recall |
|---|---|---|
| Term (T) | 72.67 | 76.76 |
| Function-Secondary (FS) | 39.29 | 49.44 |
| Function (F) | 33.33 | 40.37 |
| Relationship-Secondary (RS) | 56.65 | 64.09 |
| Relationship (R) | 31.68 | 43.10 |
| Statement (S) | 20.79 | 30.77 |
The classes represent the different structural levels of a BEL statement. The description of the classes and several examples can be found in Section 3.1 and in Rinaldi et al. (7). (Note. The secondary classes stand for partially correct information.)
Evaluation examples based on sentences from the BioCreative V BEL track sample set
Sentence: The nuclear protein pigpen has an affinity for carbohydrate structures a carbohydrate-binding domain resides in the C terminus of the molecule and can be preferentially inhibited by saccharides, most notably N-acetyl-d-galactosamine and chondroitin sulphate.
| Type | Result | BEL statement |
|---|---|---|
| Gold | – | a(CHEBI:"N-acetyl-D-galactosamine") -| p(HGNC:FUS) |
| Gold | – | a(CHEBI:"chondroitin sulfate") -| p(HGNC:FUS) |
| BELIEF | True positive | a(CHEBI:"N-acetyl-D-galactosamine") -| p(HGNC:FUS) |
| BELIEF | False positive | a(CHEBI:"N-acetyl-D-galactosamine") -| a(CHEBI:"chondroitin sulfate") |

Sentence: 60 or 90 μM galangin induced AhR nuclear translocation in both cell type (Figure 11A, lanes 7, 8; Figure 11B, lane 6).
| Type | Result | BEL statement |
|---|---|---|
| Gold | – | a(CHEBI:galangin) -> act(p(HGNC:AHR)) |
| BELIEF | True positive | a(CHEBI:galangin) -> tloc(p(HGNC:AHR)) |

Sentence: In the absence of CdCl2 pre-treatment, ionizing radiation increased both expression and phosphorylation of c-Jun in MRC5CV1 cells but not in AT5BIVA cells.
| Type | Result | BEL statement |
|---|---|---|
| Gold | – | bp(GOBP:"response to ionizing radiation") -> p(HGNC:JUN,pmod(P)) |
| Gold | – | bp(GOBP:"response to ionizing radiation") -> p(HGNC:JUN) |
| BELIEF | False positive | p(PH:placeholder) -| (a(CHEBI:"cadmium dichloride") -> p(HGNC:JUN,pmod(P))) |
| BELIEF | False positive | p(PH:placeholder) -| (a(CHEBI:"cadmium dichloride") -> p(HGNC:JUN)) |
| BELIEF | False positive | p(HGNC:JUN) -- a(CHEBI:"cadmium dichloride") |

Sentence: The sensitivity to Fas-induced cell death was reduced in HGF transfectants, which was reversed by the presence of anti-HGF antibody
| Type | Result | BEL statement |
|---|---|---|
| Gold | – | p(HGNC:HGF) -| (act(p(HGNC:FAS)) -> bp(GOBP:"cell death")) |
| BELIEF | Partly true positive | p(PH:placeholder) -| (p(HGNC:FAS) -> bp(GOBP:"cell death")) |
| BELIEF | Partly true positive | p(HGNC:FAS) -- bp(GOBP:"cell death") |
| BELIEF | False positive | p(HGNC:HGF) -- bp(GOBP:"cell death") |

Every example contains a sentence with the gold standard and predicted BEL statements. In addition, the results of the predicted BEL statements are provided.
Usage of the document sets by the two groups
| | Assisted curation | Manual curation |
|---|---|---|
| Group 1 ( | Set 1 | Set 2 |
| Group 2 ( | Set 2 | Set 1 |
Figure 7. Average number of documents curated by curators in 1 h through assisted curation using the BELIEF Dashboard and through manual curation. Documents with BEL syntax errors are shown as invalid. The curators are ordered according to their experience of BEL curation.
Documents and BEL statements resulting from assisted curation using the BELIEF Dashboard and from manual curation
| | Documents: assisted curation | Documents: manual curation | BEL statements: assisted curation | BEL statements: manual curation |
|---|---|---|---|---|
| Syntactically valid | 25 | 16 | 243 | 159 |
| Syntactically invalid | 0 | 8 | 0 | 27 |
| Total | 25 | 24 | 243 | 186 |
The invalid category represents the documents and BEL statements with syntax errors.
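To make the comparison in this table concrete, the short Python sketch below derives two ratios from its counts: the share of syntactically invalid items and the number of BEL statements per document. The dictionary layout and the helper names are illustrative assumptions; only the counts come from the table.

```python
# Counts copied from the table: (syntactically valid, syntactically invalid).
RESULTS = {
    ("documents", "assisted"): (25, 0),
    ("documents", "manual"): (16, 8),
    ("statements", "assisted"): (243, 0),
    ("statements", "manual"): (159, 27),
}


def invalid_share(valid: int, invalid: int) -> float:
    """Fraction of items that contain BEL syntax errors."""
    return invalid / (valid + invalid)


def statements_per_document(mode: str) -> float:
    """Total BEL statements divided by total documents for one curation mode."""
    documents = sum(RESULTS[("documents", mode)])
    statements = sum(RESULTS[("statements", mode)])
    return statements / documents


for (unit, mode), (valid, invalid) in RESULTS.items():
    print(f"{unit:<10} {mode:<8} invalid share: {invalid_share(valid, invalid):.1%}")
for mode in ("assisted", "manual"):
    print(f"{mode:<8} statements per document: {statements_per_document(mode):.2f}")
```

With these counts, a third of the manually curated documents (8 of 24) and about 14.5% of the manually curated statements (27 of 186) are syntactically invalid, against none under assisted curation. Statements per document come to 9.72 for assisted and 7.75 for manual curation. These ratios say nothing about semantic correctness or about differences in curator experience.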
Figure 8. Time usage by curators and curation type.