Majid Rastegar-Mojarad, Ravikumar Komandur Elayavilli, Dingcheng Li, Hongfang Liu.
Abstract
Relation extraction typically involves extracting relations between two or more entities that occur within a single sentence or across multiple sentences. In this study, we investigated the significance of extracting information from multiple sentences, specifically in the context of drug-disease relation discovery. Using multiple resources such as Semantic Medline, a literature-based resource, and Medline search (to filter spurious results), we inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur in a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Among those 537, 407 (about 76%) of the drug-disease pairs occur across multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis when extracting relations from biomedical literature.
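The distinction between sentence-level and multi-sentence co-occurrence can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the naive regex sentence splitter, the plain substring matching, and the function names `split_sentences` and `cooccurrence_level` are assumptions made for illustration only, and a real system would likely rely on concept normalization rather than string matching.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrence_level(text, drug, disease):
    """Classify how a drug-disease pair co-occurs in one text (e.g. an abstract).

    Returns "sentence" if both mentions share at least one sentence,
    "discourse" if both appear but never in the same sentence,
    and None if either mention is missing.
    """
    drug, disease = drug.lower(), disease.lower()
    sentences = [s.lower() for s in split_sentences(text)]
    has_drug = [drug in s for s in sentences]
    has_disease = [disease in s for s in sentences]
    if any(d and z for d, z in zip(has_drug, has_disease)):
        return "sentence"
    if any(has_drug) and any(has_disease):
        return "discourse"
    return None
```

Under this sketch, a pair that is labeled "discourse" would be missed by an extractor that only looks inside individual sentences, which is the case the abstract reports for most of the inferred pairs.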
Year: 2015 PMID: 26262109 PMCID: PMC5859928
Source DB: PubMed Journal: Stud Health Technol Inform ISSN: 0926-9630
Figure 1. Architecture of our LBD system. In this system, the starting concept is drug, the linking concept is gene, and the target concept is disease, which leads to drug-disease discoveries. Our system uses semantic predications as evidence of correlation between the concepts.
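The drug, gene, disease chain described in this caption can be sketched as a join on the shared linking concept. The following Python example is a hypothetical illustration, not the authors' implementation: the function name `infer_drug_disease`, the pair-based input format, and the placeholder identifiers are all assumptions, and the filtering step based on Medline search is not modeled.

```python
from collections import defaultdict


def infer_drug_disease(drug_gene, gene_disease):
    """Join drug-gene and gene-disease predications on the shared gene.

    drug_gene: iterable of (drug, gene) pairs (assumed input format).
    gene_disease: iterable of (gene, disease) pairs (assumed input format).
    Returns a dict mapping each inferred (drug, disease) pair to the set
    of linking genes that support it.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    candidates = defaultdict(set)
    for drug, gene in drug_gene:
        # Every disease linked to this gene yields a candidate pair.
        for disease in diseases_by_gene.get(gene, ()):
            candidates[(drug, disease)].add(gene)
    return dict(candidates)


# Toy data with placeholder identifiers (not real predications).
drug_gene = [("drug_A", "gene_X"), ("drug_B", "gene_Y")]
gene_disease = [
    ("gene_X", "disease_1"),
    ("gene_X", "disease_2"),
    ("gene_Z", "disease_3"),
]
pairs = infer_drug_disease(drug_gene, gene_disease)
```

Keeping the set of linking genes for each pair preserves the evidence trail, so a candidate drug-disease relation can be traced back to the predications that produced it.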
Figure 2. Results of our study
Figure 3. Comparison of frequencies of drug-disease relations’ co-occurrence in a single sentence versus multiple sentences
Figure 4. Comparing the time gap between the first co-occurrence of discovery and the causal pairs (sentence level versus discourse level)