Jisoo Park1, Benjamin J Hescott2, Donna K Slonim2,3.
Abstract
BACKGROUND: Disease taxonomies have been designed for many applications, but they tend not to fully incorporate the growing amount of molecular-level knowledge of disease processes, inhibiting research efforts. Understanding the degree to which we can infer disease relationships from molecular data alone may yield insights into how to ultimately construct more modern taxonomies that integrate both physiological and molecular information.
Keywords: Disease Ontology; Disease Ontology inference; Disease gene association; Disease tree inference; Hierarchical clustering; Medical Subject Headings tree; Pairwise disease similarity; Parent Promotion
Year: 2017 PMID: 28750648 PMCID: PMC5530939 DOI: 10.1186/s13326-017-0134-0
Source DB: PubMed Journal: J Biomed Semantics
Subnetworks of the Disease Ontology
| Root disease | #Diseases (nodes) | #Edges | #Nodes with 1 parent | #Nodes with 2 parents | #Nodes with 3 parents |
|---|---|---|---|---|---|
| Disease | 2,039 | 2,095 | 1,982 | 55 | 1 |
| Cardiovascular disease | 141 | 141 | 139 | 1 | 0 |
| Gastrointestinal disease | 115 | 118 | 110 | 4 | 0 |
| Musculoskeletal disease | 133 | 135 | 129 | 3 | 0 |
| Nervous System disease | 308 | 324 | 291 | 15 | 1 |
The entire Disease Ontology (root = “Disease”) and four subnetworks of various sizes extracted from it. The original DO and its subnetworks are tree-like: (1) the number of edges is close to n − 1, where n is the number of nodes, and (2) only a small fraction of nodes have two or more parents.
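The two tree-likeness criteria in this caption can be checked directly from the table. The following is a minimal sketch, not part of the original analysis: the function name `tree_likeness` and the row layout are illustrative assumptions, and the numbers are copied from the table above.

```python
# Rows copied from the table above:
# (root, nodes, edges, nodes with 1 parent, with 2 parents, with 3 parents)
DO_SUBNETWORKS = [
    ("Disease", 2039, 2095, 1982, 55, 1),
    ("Cardiovascular disease", 141, 141, 139, 1, 0),
    ("Gastrointestinal disease", 115, 118, 110, 4, 0),
    ("Musculoskeletal disease", 133, 135, 129, 3, 0),
    ("Nervous System disease", 308, 324, 291, 15, 1),
]


def tree_likeness(nodes, edges, one_parent, two_parents, three_parents):
    """Return (excess_edges, multi_parent_fraction, edges_from_parent_counts).

    A rooted tree on n nodes has exactly n - 1 edges, so excess_edges == 0
    for a true tree. edges_from_parent_counts recomputes the edge total from
    the parent-count columns as a consistency check on the table.
    """
    excess_edges = edges - (nodes - 1)
    multi_parent_fraction = (two_parents + three_parents) / nodes
    edges_from_parent_counts = one_parent + 2 * two_parents + 3 * three_parents
    return excess_edges, multi_parent_fraction, edges_from_parent_counts


for root, n, e, p1, p2, p3 in DO_SUBNETWORKS:
    excess, frac, recomputed = tree_likeness(n, e, p1, p2, p3)
    print(f"{root}: excess edges = {excess}, "
          f"multi-parent fraction = {frac:.3f}, "
          f"edge count consistent = {recomputed == e}")
```

Each excess edge corresponds to one extra parent somewhere in the network, which is why the excess equals the number of two-parent nodes plus twice the number of three-parent nodes.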
Four MeSH subtrees of various sizes used for method development
| Root disease | #Diseases (nodes) | #Edges |
|---|---|---|
| Infant, Premature, Diseases | 6 | 5 |
| Dementia | 13 | 12 |
| Respiration disorders | 23 | 22 |
| Eye diseases | 149 | 178 |
Fig. 1 Topological difference between MeSH and the corresponding inferred ontology using CliXO. a A MeSH subtree containing prematurity complications. b Corresponding Disease Ontology inferred using CliXO and ontology alignment. Drawn in Cytoscape v. 3.3.0 [30]
Fig. 2 How the Parent Promotion method transforms a dendrogram created by hierarchical clustering. a Dendrogram for diseases of infants born preterm. Hierarchical clustering builds a tree whose internal nodes are hard to interpret. b Parent Promotion finds the most general disease term from each cluster and promotes it as an internal node. An internal node becomes the parent of all other nodes in the same cluster. Disease term 3 has the most citations and keeps being selected for promotion until it becomes the root. Disease term 6 has more citations than 5 and is promoted as the parent of 5. However, it later becomes a child of 3 because it has fewer citations than 3. c Final tree built by Parent Promotion
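The promotion step described in this caption can be sketched as a bottom-up pass over the dendrogram. This is an illustrative reading of the caption only, not the authors' implementation: the nested-tuple dendrogram, the citation counts, and the function name `parent_promotion` are all hypothetical, and ties are broken arbitrarily in favor of the first cluster.

```python
def parent_promotion(dendrogram, citations):
    """Turn a dendrogram into a tree whose internal nodes are disease terms.

    dendrogram: nested tuples whose leaves are disease terms (hypothetical
                encoding of a hierarchical-clustering result).
    citations:  dict mapping each term to its citation count, used here as
                the proxy for how general a term is.
    Returns (root_term, edges) where edges is a list of (parent, child).
    """
    edges = []

    def promote(node):
        # A leaf is already a disease term and represents its own cluster.
        if not isinstance(node, tuple):
            return node
        # Resolve each sub-cluster to its promoted representative first.
        representatives = [promote(child) for child in node]
        # The most-cited representative is promoted to head the merged cluster.
        top = max(representatives, key=lambda term: citations[term])
        # The other representatives (with their subtrees) become its children.
        edges.extend((top, rep) for rep in representatives if rep != top)
        return top

    return promote(dendrogram), edges


# Hypothetical dendrogram and citation counts, chosen so that term 3 ends up
# as the root and term 6 is first promoted over 5, then placed under 3.
example_dendrogram = (((1, 2), 3), (4, (5, 6)))
example_citations = {1: 10, 2: 5, 3: 100, 4: 20, 5: 3, 6: 30}

root, tree_edges = parent_promotion(example_dendrogram, example_citations)
print("root:", root)
print("edges:", tree_edges)
```

Because every merge adds exactly one edge per non-promoted representative, the result always has one fewer edge than it has terms, so it is a tree by construction.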
Average performance of inference methods across the MeSH trees
| Method | EC (± stdev) | AC (± stdev) | AP (± stdev) | AR (± stdev) | F (± stdev) |
|---|---|---|---|---|---|
| Parent Promotion | | | | 0.47 (± 0.14) | |
| CliXO | 0.12 (± 0.10) | 0.22 (± 0.12) | 0.30 (± 0.14) | 0.38 (± 0.17) | 0.33 (± 0.15) |
| MWST | 0.07 (± 0.04) | 0.11 (± 0.07) | 0.13 (± 0.08) | | 0.21 (± 0.11) |
Average Edge Correctness (EC), Ancestor Correctness (AC), Ancestor Precision (AP), Ancestor Recall (AR) and F-score across the different trees in the MeSH forest. Standard deviation is shown in parentheses. Best performance across different inference techniques is highlighted in italic
Evaluation results for four DO subnetworks
| Root disease | EC, Parent Promotion | EC, CliXO | EC, MWST | AC, Parent Promotion | AC, CliXO | AC, MWST | F-score (AP, AR), Parent Promotion | F-score (AP, AR), CliXO | F-score (AP, AR), MWST |
|---|---|---|---|---|---|---|---|---|---|
| Cardiovascular disease | 0.06 | | 0.07 | | 0.18 | 0.11 | ( | 0.27 (0.24, 0.30) | 0.21 (0.13, |
| Gastrointestinal disease | | 0.13 | 0.03 | | 0.26 | 0.14 | ( | 0.39 (0.36, 0.42) | 0.26 (0.18, 0.48) |
| Musculoskeletal disease | | 0.08 | 0.10 | 0.15 | | 0.09 | 0.26 ( | (0.42, | 0.17 (0.16, 0.19) |
| Nervous System disease | | 0.07 | 0.09 | | 0.17 | 0.10 | ( | 0.30 (0.26, 0.34) | 0.19 (0.13, 0.34) |
Edge Correctness (EC), Ancestor Correctness (AC) and F-score for four DO subnetworks. Ancestor Precision (AP) and Ancestor Recall (AR) are shown in parentheses with each F-score. Best performance across different inference techniques is highlighted in italic
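The complete (AP, AR) pairs in this table are consistent with the F-score being the harmonic mean of ancestor precision and ancestor recall. The sketch below checks that, assuming the parenthesized pairs are (AP, AR) as the column header indicates; the names `f_score` and `REPORTED` are illustrative, and only pairs that survive intact in the table are included.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (root disease, method, AP, AR, reported F-score), copied from the table.
REPORTED = [
    ("Cardiovascular disease", "CliXO", 0.24, 0.30, 0.27),
    ("Gastrointestinal disease", "CliXO", 0.36, 0.42, 0.39),
    ("Gastrointestinal disease", "MWST", 0.18, 0.48, 0.26),
    ("Musculoskeletal disease", "MWST", 0.16, 0.19, 0.17),
    ("Nervous System disease", "CliXO", 0.26, 0.34, 0.30),
    ("Nervous System disease", "MWST", 0.13, 0.34, 0.19),
]

for root, method, ap, ar, reported in REPORTED:
    computed = f_score(ap, ar)
    # The inputs are themselves rounded to two decimals, so allow a
    # difference of up to 0.01 between computed and reported values.
    agrees = abs(computed - reported) <= 0.01
    print(f"{root} / {method}: computed {computed:.3f}, "
          f"reported {reported:.2f}, agrees within rounding: {agrees}")
```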
Fig. 3 Parent Promotion tree using DO data. Subtree of the disease tree built by Parent Promotion on DO “musculoskeletal system disease” data that is an exact match to nodes and edges in the DO
Fig. 4 A MeSH tree rooted at “Respiration Disorder” and corresponding inferred disease trees. a The MeSH tree containing “Respiration Disorder” and its descendants. b The disease tree inferred by Parent Promotion on data from the tree in a). c The disease tree inferred by MWST from the same data. MWST builds a taller and slimmer tree. As a result, most diseases have more ancestors in c) than in a) or b). This leads MWST to have good performance with respect to Ancestor Recall (AR)
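A toy example illustrates why a taller, slimmer tree tends to score well on ancestor recall. The definitions here are an assumption made for illustration: precision and recall are computed over (ancestor, descendant) pairs, which may differ in detail from the metrics used in the paper. The two trees and the helper names are hypothetical.

```python
def ancestor_pairs(parent):
    """Return the set of (ancestor, node) pairs for a child -> parent map."""
    pairs = set()
    for node in parent:
        current = parent[node]
        while current is not None:
            pairs.add((current, node))
            # The root has no entry in the map, so .get() ends the walk.
            current = parent.get(current)
    return pairs


def ancestor_precision_recall(reference, inferred):
    """Pair-based precision and recall (assumed definition, see lead-in)."""
    ref_pairs = ancestor_pairs(reference)
    inf_pairs = ancestor_pairs(inferred)
    shared = ref_pairs & inf_pairs
    return len(shared) / len(inf_pairs), len(shared) / len(ref_pairs)


# Hypothetical reference tree: R has children A, B, C; A has child D.
reference_tree = {"A": "R", "B": "R", "C": "R", "D": "A"}
# Hypothetical tall, slim inferred tree: the chain R -> A -> B -> C -> D.
chain_tree = {"A": "R", "B": "A", "C": "B", "D": "C"}

precision, recall = ancestor_precision_recall(reference_tree, chain_tree)
print(f"ancestor precision = {precision:.2f}, ancestor recall = {recall:.2f}")
```

In the chain, every node accumulates all earlier nodes as ancestors, so every true ancestor relationship is recovered while many spurious ones are added: recall is high and precision is low.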