Haixiu Yang1, Lingling Zhao2, Ying Zhang3, Hong Ju4, Dong Wang5, Yang Hu6, Jun Zhang3, Liang Cheng1.
Abstract
Co-occurrence relationships between terms in PubMed accelerate the recognition of term associations. The lack of manually curated relationships in vocabularies and the rapid growth of the biomedical literature highlight the importance of co-occurrence relationships. Here we propose a framework for exploring term associations based on a standard procedure that combines multiple text-mining tools with methods for calculating relationship degree. PubMed text was first segmented into sentences with Apache OpenNLP, and the terms in each sentence were then recognized with MGREP. Two terms occurring in the same sentence were then identified as a co-occurrence relationship. The relationship degree was calculated using either the Normalized MEDLINE Distance (NMD) or the relationship-scaled score (RSS) method. The framework was used to explore associations between Gene Ontology (GO) and Disease Ontology (DO) terms based on co-occurrence relationships. The results show that term pairs with more co-occurrence relationships tend to share more ontology-based semantic relationships and more genes. The associated terms identified from co-occurrence relationships were used to construct a disease association network (DAN). The small giant component is consistent with the observation that diseases in the same class have more linkages than diseases in different classes.
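The sentence-level procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a regular-expression splitter stands in for Apache OpenNLP, a case-insensitive substring lookup stands in for MGREP, and the distance function follows the general Normalized Google Distance form, which is assumed here to resemble NMD because the abstract does not give the exact formula. All function names are hypothetical, and RSS is not shown.

```python
import math
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Naive splitter on sentence-final punctuation (assumed stand-in for OpenNLP).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize_terms(sentence, vocabulary):
    # Case-insensitive substring lookup (assumed stand-in for MGREP).
    lowered = sentence.lower()
    return {term for term in vocabulary if term.lower() in lowered}


def count_cooccurrences(texts, vocabulary):
    # Count, per sentence, how often each term and each term pair appears.
    term_counts = Counter()
    pair_counts = Counter()
    n_sentences = 0
    for text in texts:
        for sentence in split_sentences(text):
            n_sentences += 1
            terms = recognize_terms(sentence, vocabulary)
            term_counts.update(terms)
            pair_counts.update(
                frozenset(pair) for pair in combinations(sorted(terms), 2)
            )
    return term_counts, pair_counts, n_sentences


def normalized_distance(fx, fy, fxy, n):
    # NGD-style distance (assumed form): smaller values mean closer terms.
    if fxy == 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(lx, ly)
    if denominator == 0:
        return 0.0
    return (max(lx, ly) - lxy) / denominator


texts = [
    "Apoptosis is altered in leukemia. Apoptosis regulates cell death. "
    "Leukemia is a cancer.",
    "Apoptosis and leukemia were studied.",
]
vocabulary = {"apoptosis", "leukemia"}
term_counts, pair_counts, n_sentences = count_cooccurrences(texts, vocabulary)
pair = frozenset({"apoptosis", "leukemia"})
distance = normalized_distance(
    term_counts["apoptosis"], term_counts["leukemia"], pair_counts[pair], n_sentences
)
```

In the framework itself the two terms of a pair come from different ontologies (GO and DO); the sketch uses a single toy vocabulary for brevity.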
Keywords: co-occurrence relationship; framework; term association; text mining
Year: 2017 PMID: 29262548 PMCID: PMC5732714 DOI: 10.18632/oncotarget.21532
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. The number of pair-wise terms of ontologies with co-occurrence relationships
Figure 2. The correlation between co-occurrence based term relativity and structure-based term similarity
(A) The distribution of the term similarity by Wang’s method. (B) Pearson correlation coefficient between relative degree score and similarity score.
Figure 3. The correlation between co-occurrence based term relativity and gene-based term similarity
(A) The distribution of the term similarity by ASR method. (B) Pearson correlation coefficient between relative degree score and ASR score.
Figure 4. Construction and characteristics of the disease association network
(A) Cumulative distribution of the edges between diseases when using various similarity thresholds. (B) Degree distribution for diseases in the DAN.
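The quantities named in the Figure 4 caption (edges retained at a similarity threshold, degree distribution) and the giant component mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a higher score means a stronger association and that an edge is kept when its score meets the threshold; the function names, disease labels, and scores are hypothetical toy values, not data from the study.

```python
from collections import defaultdict


def build_network(scored_pairs, threshold):
    # Keep an undirected edge for every pair scoring at or above the threshold.
    adjacency = defaultdict(set)
    for (a, b), score in scored_pairs.items():
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_distribution(adjacency):
    # Map each degree value to the number of nodes having that degree.
    distribution = defaultdict(int)
    for neighbours in adjacency.values():
        distribution[len(neighbours)] += 1
    return dict(distribution)


def largest_component_size(adjacency):
    # Depth-first search over each unvisited node; return the biggest component.
    seen = set()
    best = 0
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


# Toy scores for hypothetical disease pairs.
scored_pairs = {
    ("d1", "d2"): 0.9,
    ("d2", "d3"): 0.8,
    ("d4", "d5"): 0.7,
    ("d1", "d5"): 0.1,
}
network = build_network(scored_pairs, threshold=0.5)
degrees = degree_distribution(network)
giant = largest_component_size(network)
```

Raising the threshold removes edges, which is why the cumulative edge count in panel (A) depends on the chosen cutoff.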
Data sources
| Data source | Web site |
|---|---|
| GO | |
| GOA | |
| DO | |
| DOA | |
| PubMed | |
Figure 5. The framework for exploring term associations