Halil Kilicoglu, Graciela Rosemblat, Marcelo Fiszman, Dongwook Shin.
Abstract
BACKGROUND: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships.
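SemRep's output is commonly described as subject-predicate-object predications whose arguments are UMLS Metathesaurus concepts. A minimal illustrative sketch of such a triple follows; the class layout is an assumption for exposition, not SemRep's actual output format, and the CUIs are placeholders:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predication:
    """One extracted semantic relation: UMLS concept -- predicate -- UMLS concept."""
    subject_cui: str   # UMLS Concept Unique Identifier of the subject (placeholder here)
    predicate: str     # UMLS Semantic Network relation, e.g. TREATS, CAUSES
    object_cui: str    # CUI of the object (placeholder here)

# Hypothetical predication along the lines of "calcium antagonists TREAT hypertension"
p = Predication("C0000001", "TREATS", "C0000002")
print(p.predicate)  # TREATS
```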
Keywords: Biomedical relation extraction; Natural language processing; Scientific publications; Semantic interpretation
Year: 2020 PMID: 32410573 PMCID: PMC7222583 DOI: 10.1186/s12859-020-3517-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1High-level overview of the SemRep pipeline. Processes marked with * are optional (domain processing and sortal anaphora resolution).
Lexical entries retrieved for reduced and calcium antagonists
| input=reduced | input=calcium antagonists |
|---|---|
| base=reduce | base=calcium antagonist |
| entry=E0052363 | entry=E0515276 |
| cat=verb | cat=noun |
| variants=reg | variants=reg |
| intran | variants=uncount |
| tran=np | spelling_variant=calcium-antagonist |
| tran=pphr(from,np,pphr(to,np)) | number=plural |
| tran=pphr(to,np) | |
| ditran=np,pphr(from,np,pphr(to,np)) | |
| ditran=np,pphr(to,np) | |
| cplxtran=np,pphr(to,ingcomp:objc) | |
| nominalization=reduction | |
| nominalization=reducement | |
| tense=past,pastpart | |
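The entries above follow the SPECIALIST Lexicon's "key=value" feature-line style, where features such as tran= can occur multiple times. One plausible in-memory representation (a sketch; the dict-of-lists layout and the parse_entry helper are my assumptions, not SemRep code) groups the lines by feature name:

```python
# Sketch: collect SPECIALIST-style "key=value" feature lines into a dict of lists.
# The input lines mirror part of the 'reduced' entry shown above.
from collections import defaultdict

def parse_entry(lines):
    entry = defaultdict(list)
    for line in lines:
        key, _, value = line.partition("=")
        entry[key].append(value)  # bare features like 'intran' get value ""
    return dict(entry)

reduced = parse_entry([
    "input=reduced", "base=reduce", "entry=E0052363", "cat=verb",
    "variants=reg", "intran", "tran=np", "tran=pphr(to,np)",
    "nominalization=reduction", "tense=past,pastpart",
])
print(reduced["tran"])  # ['np', 'pphr(to,np)']
```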
Results of prior intrinsic SemRep evaluations
| Relation type | Sentences | Predications | Precision | Recall |
|---|---|---|---|---|
| Gene-disease relations [ | 1000 | 1124 | 0.76 | - |
| Pharmacogenomic relations [ | 300 | 850 | 0.73 | 0.55 |
| Hypernymic relations [ | - | 830 | 0.83 | - |
| Comparative structures [ | 287 | 288 | 0.96 | 0.70 |
| Nominal predications [ | 300 | 300 | 0.75 | 0.57 |
| Substance interactions [ | 200 | 489 | 0.59 | 0.44 |
| Gene-function relations [ | 100 | 200 | 0.65 | 0.42 |
SemRep 1.8 evaluation against the test collection
| Evaluation | Precision | Recall | F1 |
|---|---|---|---|
| Strict evaluation | 0.55 | 0.34 | 0.42 |
| Relaxed evaluation | 0.69 | 0.42 | 0.52 |
Relaxed evaluation allows interchangeable concepts and ignores test collection annotation errors.
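The reported F1 scores follow from the harmonic mean of precision (P) and recall (R), F1 = 2PR/(P+R). A quick check against the precision/recall pairs reported in the evaluation tables:

```python
# F1 is the harmonic mean of precision and recall.
def f1(p, r):
    return 2 * p * r / (p + r)

# (label, precision, recall) pairs taken from the evaluation tables
for label, p, r in [
    ("strict", 0.55, 0.34),          # -> 0.42
    ("relaxed", 0.69, 0.42),         # -> 0.52
    ("SemRep-ALL", 0.90, 0.24),      # -> 0.38
    ("SemRep-SENTENCE", 0.90, 0.35), # -> 0.50
]:
    print(f"{label}: F1 = {f1(p, r):.2f}")
```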
Evaluation against the CDR corpus
| System | Precision | Recall | F1 |
|---|---|---|---|
| SemRep-ALL | 0.90 | 0.24 | 0.38 |
| SemRep-SENTENCE | 0.90 | 0.35 | 0.50 |
| Xu et al. [19] | 0.56 | 0.58 | 0.57 |
| Peng et al. [20] | 0.66 | 0.57 | 0.61 |
SemRep-ALL indicates the case in which all ground truth relations are taken into account; SemRep-SENTENCE indicates the scenario in which only the intra-sentence ground truth relations are considered. Xu et al. [19] was the top-ranking system in the BioCreative V CID task, and Peng et al. [20] reported the best post-challenge results. Both systems perform end-to-end relation extraction.