Yongjun Zhu, Min Song, Erjia Yan.
Abstract
In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an increasingly important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a given disease based on existing information in the scientific literature. Disease-related entities, namely diseases, drugs, and genes, are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity-specific networks (meso-level). Important diseases, drugs, and genes, as well as salient entity relations (micro-level), are identified from these networks. Results obtained from the literature-based literature mining can assist clinical applications.
Year: 2016 PMID: 27195695 PMCID: PMC4873143 DOI: 10.1371/journal.pone.0156091
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
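The macro-level step in the abstract, going from a paper-entity network to an entity co-occurrence network, can be illustrated with a minimal sketch. It assumes that two entities co-occur when they appear in the same paper and that the edge weight is the number of such papers; the record does not state the exact counting unit, and the paper identifiers and entity assignments below are toy data, not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(paper_entities):
    """Count, for each unordered entity pair, the papers mentioning both.

    paper_entities maps a paper id to the entities extracted from it
    (a paper-entity network stored as an adjacency mapping).
    """
    counts = Counter()
    for entities in paper_entities.values():
        # De-duplicate within a paper and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


# Toy paper-entity mapping (hypothetical ids and assignments).
papers = {
    "paper-1": ["HCC", "AFP", "cisplatin"],
    "paper-2": ["HCC", "AFP"],
    "paper-3": ["cirrhosis", "hepatitis", "HCC"],
}

edge_weights = cooccurrence_counts(papers)
for pair, weight in edge_weights.most_common():
    print(pair, weight)
```

The resulting weighted edge list is the kind of input from which entity-specific subnetworks could then be filtered by entity type.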
Entity types and their percentage.
| Entity Type | Number | Percentage |
|---|---|---|
| Disease | 9,681 | 58.44% |
| Drug | 4,347 | 26.24% |
| Gene | 2,540 | 15.32% |
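As a quick arithmetic check, the shares can be recomputed from the counts in the table. This is only a sketch of the calculation, assuming each percentage is the entity count divided by the total across the three types.

```python
# Entity counts copied from the table above.
counts = {"Disease": 9681, "Drug": 4347, "Gene": 2540}

total = sum(counts.values())
# Share of each entity type, as a percentage rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

Recomputed this way, the Disease and Gene shares come out at 58.43% and 15.33%, which differ from the table in the last digit; the Drug share matches at 26.24%.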
Fig 1. A schematic diagram for the proposed methods.
Fig 2. A flow chart of six steps.
Top entities highly ranked by PageRank and Betweenness centrality.
| Rank | Disease (PageRank) | Disease (Betweenness) | Drug (PageRank) | Drug (Betweenness) | Gene (PageRank) | Gene (Betweenness) |
|---|---|---|---|---|---|---|
| 1 | Tumor | hepatoma | alcohol | gamma-glutamyl | HCC | recombinant human interleukin 2 |
| 2 | hepatocellular carcinoma | Tumor | cisplatin | Tyrosine | AFP | creatine kinase B |
| 3 | Cancer | Cancer | glucose | trastuzumab | p53 | G21 |
| 4 | HCC | autosomal recessive, inherited disorder | oxygen | metallocorrole | CEA | gamma-glutamyl transpeptidase |
| 5 | Hepatoma | intrahepatic and extrahepatic cholangiocarcinoma | tyrosine | glutamyl | albumin | ED2 |
| 6 | Cirrhosis | CRLM and extra hepatic disease | ethanol | [11C]CH3OTf | TACE | CDH2 |
| 7 | Hepatitis | Thyrotoxicosis | 5-FU | 3-methylcholanthrene | insulin | thyroid hormone receptor beta |
| 8 | colorectal cancer | mitochondrial dysfunction | bilirubin | calcium folinate | alpha-fetoprotein | vascular-endothelial growth factor and fibroblast growth factor receptors |
| 9 | liver metastasis | HPV | glutathione | CBD | VEGF | Histone |
| 10 | liver cirrhosis | absence of disease progression, fatty liver | amino acid | diethylnitrosamine | IL-6 | beta-galactosidase |
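The two ranking measures in this table can be sketched in plain Python. The version below assumes an unweighted, undirected graph, a damping factor of 0.85, and a fixed number of power iterations; the record does not say which settings or software the authors used, and the four-node graph is a toy example rather than data from the study.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected graph."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v in adj if not adj[v])
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(adj[u]) for u in adj[v]) + dangling / n)
            for v in adj
        }
    return rank


def betweenness(adj):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        # Accumulate dependencies from the farthest nodes back toward s.
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Toy star graph: one hub linked to three leaves (hypothetical labels).
graph = build_adjacency([("hub", "a"), ("hub", "b"), ("hub", "c")])
pr = pagerank(graph)
bc = betweenness(graph)
print(sorted(pr, key=pr.get, reverse=True))
print(bc)
```

On this toy graph both measures rank the hub first, but they need not agree in general: PageRank rewards well-connected neighbors, while betweenness rewards nodes that bridge otherwise distant parts of the network, which is consistent with the two columns per entity type listing different entities.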
Network characteristics for the six entity networks.
| Indicators | Disease (PageRank) | Disease (Betweenness) | Drug (PageRank) | Drug (Betweenness) | Gene (PageRank) | Gene (Betweenness) |
|---|---|---|---|---|---|---|
| No. of Nodes | 41 | 55 | 77 | 58 | 67 | 47 |
| No. of Edges | 79 | 64 | 93 | 63 | 86 | 40 |
| Avg. Degree | 3.85 | 2.33 | 2.42 | 2.17 | 2.57 | 1.7 |
| Avg. Weighted Degree | 2210 | 836 | 156 | 46 | 124 | 6 |
| Avg. Path Length | 2.79 | 3.34 | 3.43 | 3.2 | 3.39 | 3.06 |
| Graph Density | 0.096 | 0.043 | 0.032 | 0.038 | 0.039 | 0.037 |
| Modularity | 0.19 | 0.1 | 0.71 | 0.59 | 0.46 | 0.54 |
| No. of Communities | 2 | 7 | 7 | 8 | 6 | 10 |
| Avg. Clustering Coefficient | 0.11 | 0.44 | 0.49 | 0.51 | 0.48 | 0 |
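Two of the indicators follow directly from the node and edge counts. The sketch below assumes the networks are undirected and simple, so that average degree is 2E/N and density is 2E/(N(N-1)); under that assumption the computed values match the Avg. Degree and Graph Density rows of the table after rounding.

```python
def average_degree(num_nodes, num_edges):
    # Each undirected edge adds one to the degree of two nodes.
    return 2 * num_edges / num_nodes


def graph_density(num_nodes, num_edges):
    # Fraction of the N*(N-1)/2 possible undirected edges that are present.
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# (nodes, edges) copied from the table above.
networks = {
    "Disease (PageRank)": (41, 79),
    "Disease (Betweenness)": (55, 64),
    "Drug (PageRank)": (77, 93),
    "Drug (Betweenness)": (58, 63),
    "Gene (PageRank)": (67, 86),
    "Gene (Betweenness)": (47, 40),
}

for name, (n, m) in networks.items():
    print(f"{name}: avg degree {average_degree(n, m):.2f}, "
          f"density {graph_density(n, m):.3f}")
```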
Top 15 entity pairs.
| Range | Rank | Disease (PageRank) | Disease (Betweenness) | Drug (PageRank) | Drug (Betweenness) | Gene (PageRank) | Gene (Betweenness) |
|---|---|---|---|---|---|---|---|
| > = 1000 | 1 | tumor—hepatocellular carcinoma | tumor—hepatocellular carcinoma | N/A | HCC—AFP | N/A | |
| > = 1000 | 2 | tumor—HCC | tumor—HCC | bilirubin—aspartate | N/A | HCC—TACE | N/A |
| > = 1000 | 3 | tumor—liver metastasis | tumor—liver metastasis | N/A | HCC—p53 | N/A | |
| > = 1000 | 4 | cirrhosis—hepatitis | tumor—hepatoma | N/A | HCC—VEGF | N/A | |
| > = 1000 | 5 | hepatocellular carcinoma—liver cirrhosis | cancer—hepatocellular carcinoma | N/A | HCC—alpha-fetoprotein | N/A | |
| <1000 and > = 100 | 1 | hepatitis—chronic hepatitis | cancer–HCC | diethylnitrosamine—phenobarbital | AFP—DCP | N/A | |
| <1000 and > = 100 | 2 | cancer—HCC | tumor—metastasis | tyrosine—serine | tyrosine—serine | N/A | |
| <1000 and > = 100 | 3 | HCC—liver cirrhosis | cancer—liver metastasis | 3-methylcholanthrene—phenobarbital | N/A | ||
| <1000 and > = 100 | 4 | HCC—chronic hepatitis | tumor—liver cirrhosis | tyrosine—sorafenib | N/A | ||
| <1000 and > = 100 | 5 | tumor—metastasis | cancer—colorectal cancer | HCC—IFN | N/A | ||
| <100 | 1 | cirrhosis—viral hepatitis | hepatoma—hepatitis B | TACE—RFA | gamma-glutamyl transpeptidase—alkaline phosphatase | ||
| <100 | 2 | hepatoma—hepatitis B | hepatoma—liver cirrhosis | tyrosine—threonine | IL-6—IL-10 | gamma-glutamyl transpeptidase—HCC | |
| <100 | 3 | liver cirrhosis—liver cancer | hepatoma—liver tumor | diethylnitrosamine—glutathione | VEGF—CD34 | gamma-glutamyl transpeptidase—AST | |
| <100 | 4 | liver metastasis—liver tumor | hepatoma—breast cancer | diethylnitrosamine—2-acetylaminofluorene | histone—HCC | ||
| <100 | 5 | cirrhosis—hepatitis C | hepatoma—chronic hepatitis | diethylnitrosamine—glucose | IL-6—TNF-alpha | ||
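The layout of this table, five pairs in each of three ranges, can be reproduced with a small grouping routine. The sketch assumes that the Range column refers to an edge weight such as a co-occurrence count, which this record does not state explicitly; the pair labels and weights below are hypothetical.

```python
from collections import defaultdict


def weight_range(weight):
    """Map an edge weight to one of the three ranges used in the table."""
    if weight >= 1000:
        return ">= 1000"
    if weight >= 100:
        return "< 1000 and >= 100"
    return "< 100"


def top_pairs_by_range(pair_weights, k=5):
    """Group pairs by weight range and keep the k heaviest in each group."""
    buckets = defaultdict(list)
    for pair, weight in pair_weights.items():
        buckets[weight_range(weight)].append((weight, pair))
    return {
        label: [pair for _, pair in sorted(items, reverse=True)[:k]]
        for label, items in buckets.items()
    }


# Hypothetical pair weights, for illustration only.
toy_weights = {
    ("X", "Y"): 1500,
    ("X", "Z"): 250,
    ("W", "X"): 120,
    ("Y", "Z"): 40,
}

ranked = top_pairs_by_range(toy_weights)
for label, pairs in ranked.items():
    print(label, pairs)
```

A range with no pairs simply has no entry in the result, which corresponds to the N/A cells in the table.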
Fig 3. PageRank-based (a) and Betweenness centrality-based (b) disease networks.
Fig 4. PageRank-based (a) and Betweenness centrality-based (b) drug networks.
Fig 5. PageRank-based (a) and Betweenness centrality-based (b) gene networks.