Wytze J Vlietstra, Rein Vos, Marjan van den Akker, Erik M van Mulligen, Jan A Kors.
Abstract
BACKGROUND: Knowledge graphs can represent the contents of biomedical literature and databases as subject-predicate-object triples, enabling comprehensive analyses that identify, for example, relationships between diseases. Some diseases are often diagnosed in patients in a specific temporal order; such sequences are referred to as disease trajectories. Here, we determine whether a sequence of two diseases forms a trajectory by using the predicate information from paths between the diseases' proteins in a knowledge graph. We also determine the added value of the directional information of predicates for this task. To do so, we create four feature sets by combining two methods for representing indirect paths with two treatments of predicates: with and without directional information (i.e., which protein is considered the subject and which the object). The added value of the directional information is quantified by comparing the classification performance of the feature sets that include it with that of the feature sets that exclude it.
Keywords: Directionality of predicates; Disease trajectories; Knowledge graph; Predicates; Protein-protein interactions; Temporal relationships
Year: 2020; PMID: 32819419; PMCID: PMC7439632; DOI: 10.1186/s13326-020-00228-8
Source DB: PubMed; Journal: J Biomed Semantics
Fig. 1. Schematic overview of the overlap, direct, and indirect scenarios that were extracted from the knowledge graph. Disease A and disease B each have three disease proteins (DP) associated with them according to the manually curated subset of DisGeNet. According to DisGeNet, DP1 is associated with both diseases, while the knowledge graph states that it has a “binds with” relationship with itself. DP2 and DP4 have a direct “inhibits” relationship, and DP3 and DP5 are connected through an indirect path via an intermediate protein (IP). The arrows between the proteins indicate which protein is the subject of the “inhibits” predicate and which is its object. The experts considered the “binds with” predicate to be undirected, so it has no direction. From the paths in the knowledge graph, four feature sets are created by combining two methods for representing indirect paths with the inclusion or exclusion of the directional information of predicates.
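The overlap, direct, and indirect scenarios in Fig. 1 can be sketched as a lookup over subject-predicate-object triples. This is a minimal illustration, not the authors' implementation: it assumes the knowledge graph fits in memory as `(subject, predicate, object)` tuples, that each disease maps to a set of disease-protein identifiers, and that an indirect path has exactly one intermediate protein, as in the figure. The function name `extract_scenarios` and its return format are hypothetical.

```python
from collections import defaultdict


def extract_scenarios(triples, proteins_a, proteins_b):
    """Sort protein links between disease A and disease B into the
    overlap, direct, and indirect scenarios of Fig. 1 (illustrative sketch).

    triples: iterable of (subject, predicate, object) tuples.
    proteins_a, proteins_b: sets of disease-protein identifiers.
    """
    # Index every triple under both of its proteins, so paths can be walked
    # regardless of predicate direction; each triple keeps its own direction.
    neighbours = defaultdict(list)
    for s, p, o in triples:
        neighbours[s].append((s, p, o))
        if s != o:
            neighbours[o].append((s, p, o))

    # Overlap: proteins associated with both diseases.
    overlap = sorted(proteins_a & proteins_b)

    direct, indirect = [], []
    # Assumption: proteins shared by both diseases count only as overlap.
    for a in sorted(proteins_a - proteins_b):
        for s, p, o in neighbours[a]:
            other = o if s == a else s
            if other in proteins_b and other not in proteins_a:
                direct.append((s, p, o))
            elif other not in proteins_a and other not in proteins_b:
                # `other` is a candidate intermediate protein (IP).
                for s2, p2, o2 in neighbours[other]:
                    end = o2 if s2 == other else s2
                    if end in proteins_b and end not in proteins_a:
                        indirect.append(((s, p, o), (s2, p2, o2)))
    return {"overlap": overlap, "direct": direct, "indirect": indirect}
```

With hypothetical triples mirroring Fig. 1, such as `("DP2", "inhibits", "DP4")` and a two-hop path through `"IP"`, the function returns DP1 as overlap, the DP2-DP4 triple as direct, and the DP3-IP-DP5 hop pair as indirect.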
Fig. 2. The four feature sets derived from the paths between the disease proteins in Fig. 1. All features are binary: black fields indicate a “True” value and empty fields a “False” value. For the “Mixed” feature sets, the experts assessed the “Binds with” predicate as undirected and the “Inhibits” predicate as directed.
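A feature set like those in Fig. 2 can be sketched as a set of binary feature names per disease pair. The caption does not spell out the two path representations, so this sketch assumes that a metapath turns the whole predicate sequence of an indirect path into a single feature, while split paths turn each hop into its own feature. The predicate subset in `UNDIRECTED`, the `>`/`<` direction markers (read along the path from disease A to disease B), and all function names are hypothetical; overlapping proteins are left out for brevity.

```python
# Abbreviated, hypothetical subset of the predicates the experts judged undirected.
UNDIRECTED = {"binds with", "interacts with", "is associated with"}


def hop_labels(path, start, use_direction):
    """Label each hop of a path whose triples are ordered from disease A to B."""
    labels, node = [], start
    for subject, predicate, obj in path:
        forward = subject == node  # does the triple point away from disease A?
        node = obj if forward else subject
        if use_direction and predicate not in UNDIRECTED:
            labels.append(f"{predicate} >" if forward else f"{predicate} <")
        else:
            labels.append(predicate)
    return labels


def feature_set(paths, method, use_direction):
    """Names of the binary features that are True for one disease pair.

    paths: list of (start_protein, [triples]) tuples, one per path.
    method: "metapaths" or "split paths" (assumed semantics, see above).
    use_direction: False for the "Undirected" sets, True for the "Mixed" sets.
    """
    features = set()
    for start, path in paths:
        labels = hop_labels(path, start, use_direction)
        if len(labels) == 1:
            features.add("direct: " + labels[0])
        elif method == "metapaths":
            # The whole predicate sequence of the indirect path is one feature.
            features.add("indirect: " + " / ".join(labels))
        else:
            # Each hop of the indirect path becomes a separate feature.
            features.update("indirect: " + label for label in labels)
    return features


def to_binary_matrix(rows):
    """Turn per-pair feature sets into a 0/1 matrix over a shared vocabulary."""
    vocab = sorted(set().union(*rows))
    return vocab, [[int(f in row) for f in vocab] for row in rows]
```

Calling `feature_set` with each combination of `method` and `use_direction` yields four feature sets; `to_binary_matrix` then produces the True/False grid that Fig. 2 draws as black and empty fields.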
Predicates categorized as undirected as a result of the assessment process
| Undirected predicates |
|---|
| binds with |
| coexists with |
| does not coexist with |
| forms protein complex with |
| interacts with |
| does not interact with |
| is associated with |
| is compared with |
| is functionally related to |
| is spatially related to |
| is the same as |
| ortholog is associated with |
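The list above can drive a normalization step. For an undirected predicate, “P1 binds with P2” and “P2 binds with P1” state the same fact, so one option, sketched here under our own assumptions, is to sort the two proteins of such triples into a canonical order before deduplicating. Sorting by identifier and the names `canonical_triple` and `deduplicate` are illustrative choices, not taken from the paper.

```python
# The predicates listed in the table above as undirected.
UNDIRECTED_PREDICATES = frozenset({
    "binds with", "coexists with", "does not coexist with",
    "forms protein complex with", "interacts with", "does not interact with",
    "is associated with", "is compared with", "is functionally related to",
    "is spatially related to", "is the same as", "ortholog is associated with",
})


def canonical_triple(subject, predicate, obj, keep_direction=True):
    """Return a hashable key for a triple.

    Directed predicates keep their subject/object order when keep_direction
    is True (the "Mixed" setting). Undirected predicates, or every predicate
    when keep_direction is False (the "Undirected" setting), have their two
    proteins sorted, so both orientations map to the same key.
    """
    if keep_direction and predicate not in UNDIRECTED_PREDICATES:
        return (subject, predicate, obj)
    first, second = sorted((subject, obj))
    return (first, predicate, second)


def deduplicate(triples, keep_direction=True):
    """Collapse triples that are identical after canonicalization."""
    return sorted({canonical_triple(s, p, o, keep_direction) for s, p, o in triples})
```

With `keep_direction=True`, “A inhibits B” and “B inhibits A” stay distinct while the two “binds with” orientations merge; with `keep_direction=False`, both pairs merge.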
Classification results for the four feature sets for both reference sets
| Directionality | Jensen set: Metapaths | Jensen set: Split paths | Jensen set (undersampled): Metapaths | Jensen set (undersampled): Split paths | Van den Akker set: Metapaths | Van den Akker set: Split paths |
|---|---|---|---|---|---|---|
| Undirected | 83.3 (1.7) | 78.3 (1.7) | 64.2 (12.1) | 61.9 (12.3) | 72.5 (11.8) | 68.4 (13.0) |
| Mixed | 89.8 (0.9) | 82.8 (1.2) | 82.3 (8.4) | 69.6 (13.1) | 74.5 (10.5) | 70.3 (11.4) |
Values are the mean AUC (standard deviation) in %, over 10 cross-validation experiments.
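Each cell above is a mean AUC with its standard deviation over 10 cross-validation experiments. As an illustration only, the following sketch computes AUC with the rank-based Mann-Whitney formulation and formats per-fold results the same way. The paper does not state whether a sample or population standard deviation was used; the code assumes the sample version (`statistics.stdev`), and the function names are hypothetical.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all (positive, negative) pairs, how often the positive is
    scored higher; ties count as one half. Quadratic, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """Format per-fold AUCs as 'mean (SD)' in percent, like the table cells.

    folds: list of (labels, scores) pairs, one per cross-validation run.
    Uses the sample standard deviation (an assumption; needs >= 2 folds).
    """
    aucs = [100 * auc(labels, scores) for labels, scores in folds]
    return f"{mean(aucs):.1f} ({stdev(aucs):.1f})"
```

The added value of directional information can then be read as a difference of means; for the Jensen set with metapaths, for example, 89.8 − 83.3 = 6.5 percentage points.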
Fig. 3. ROC curves of the mixed metapaths classifiers for the Jensen set and the Van den Akker set.
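ROC curves such as those in Fig. 3 plot the true-positive rate against the false-positive rate as the decision threshold is lowered through the classifier scores. A minimal sketch of computing the curve's points, and the area under them with the trapezoidal rule, follows; it is not the plotting code used for the figure, and labels are assumed to be 0/1.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points of the ROC curve, from (0, 0) to (1, 1).

    The threshold is lowered one distinct score at a time; tied scores are
    processed together so they produce a single (possibly diagonal) step.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Count every example at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Because ties are grouped, the trapezoidal area of these points equals the rank-based AUC of the same scores.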
Assessment of the top 15 false-positive trajectories
| First disease | ICD-10 | Second disease | ICD-10 | Assessment |
|---|---|---|---|---|
| Mental and behavioural disorders due to use of alcohol | F10 | Alzheimer’s disease | G30 | Described in literature [ |
| Essential (primary) hypertension | I10 | Alzheimer’s disease | G30 | Described in literature [ |
| Osteoporosis without pathological fracture | M81 | Alzheimer’s disease | G30 | Described in literature [ |
| Non-insulin-dependent diabetes mellitus | E11 | Alzheimer’s disease | G30 | Described in literature [ |
| Other disorders of pancreatic internal secretion | E16 | Alzheimer’s disease | G30 | Described in literature [ |
| Schizophrenia | F20 | Other septicaemia | A41 | Described in literature, but commonly occurs via intermediate diseases such as agranulocytosis and pneumonia [ |
| Lupus erythematosus | L93 | Other disorders of urinary system | N39 | Described in literature [ |
| Disorders of vestibular function | H81 | Alzheimer’s disease | G30 | Described in literature [ |
| Lupus erythematosus | L93 | Respiratory failure, not elsewhere classified | J96 | Described in literature [ |
| Unspecified dementia | F03 | Dementia in Alzheimer’s disease | F00 | Further specification of diagnosis |
| Retinal vascular occlusions | H34 | Cystitis | N30 | No relationship found in literature |
| Chronic ischaemic heart disease | I25 | Other septicaemia | A41 | Cardiac troponins are suggested to be biomarkers for sepsis [ |
| Hyperplasia of prostate | N40 | Alzheimer’s disease | G30 | No relationship found in literature |
| Hyperparathyroidism and other disorders of parathyroid gland | E21 | Alzheimer’s disease | G30 | Suggested in literature (via calcium) [ |
| Asthma | J45 | Umbilical hernia | K42 | No relationship found in literature |
Assessment of the top 15 false-negative trajectories
| First disease | ICD-10 | Second disease | ICD-10 | Assessment |
|---|---|---|---|---|
| Thyrotoxicosis [hyperthyroidism] | E05 | Other disorders of eye and adnexa | H57 | Likely side effect of treatment [ |
| Irritable bowel syndrome | K58 | Spondylosis | M47 | No relationship found in literature |
| Vitamin B12 deficiency anaemia | D51 | Other septicaemia | A41 | Vitamin B12 has been hypothesized as a treatment for sepsis [ |
| Mental and behavioural disorders due to use of alcohol | F10 | Acute and transient psychotic disorders | F23 | Described in literature, but no clear role for protein interactions [ |
| Gonarthrosis [arthrosis of knee] | M17 | Erysipelas | A46 | No relationship found in literature |
| Senile cataract | H25 | Other disorders of lens | H27 | Likely side effect of treatment [ |
| Transient cerebral ischaemic attacks and related syndromes | G45 | Vitamin B12 deficiency anaemia | D51 | Only the reverse is described in literature: vitamin B12 protects against stroke [ |
| Malignant neoplasm of ovary | C56 | Deficiency of other nutrient elements | E61 | Likely mechanical cause [ |
| Malignant neoplasm of larynx | C32 | Candidiasis | B37 | Likely side effect of treatment [ |
| Other intervertebral disc disorders | M51 | Somatoform disorders | F45 | No relationship found in literature |
| Gonarthrosis [arthrosis of knee] | M17 | Other local infections of skin and subcutaneous tissue | L08 | No relationship found in literature |
| Benign neoplasm of brain and other parts of central nervous system | D33 | Other septicaemia | A41 | Likely mediated by an infection following surgery or immune weakening after (radiation) treatment |
| Insulin-dependent diabetes mellitus | E10 | Other disorders of eye and adnexa | H57 | Diabetes is a risk factor for many eye diseases [ |
| Noninflammatory disorders of ovary, fallopian tube and broad ligament | N83 | Ventral hernia | K43 | Likely side effect of treatment [ |
| Other intervertebral disc disorders | M51 | Other polyneuropathies | G62 | Likely mechanical cause [ |
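The two tables above list the 15 top false positives and 15 top false negatives. Assuming “top” means ranked by classifier score, a false positive is a non-trajectory pair with one of the highest scores and a false negative is a trajectory pair with one of the lowest. The following sketch selects both lists with `heapq`; the function name, the input layout, and the assumption that scores are pooled across cross-validation folds are ours, not the paper's.

```python
import heapq


def top_errors(pairs, labels, scores, k=15):
    """Return the k most confident false positives and false negatives.

    pairs: identifiers such as (first ICD-10 code, second ICD-10 code).
    labels: 1 for a trajectory in the reference set, 0 otherwise.
    scores: classifier scores, higher meaning "more likely a trajectory".
    """
    negatives = [(s, p) for p, y, s in zip(pairs, labels, scores) if y == 0]
    positives = [(s, p) for p, y, s in zip(pairs, labels, scores) if y == 1]
    # False positives: non-trajectories the classifier ranked highest.
    false_pos = heapq.nlargest(k, negatives, key=lambda item: item[0])
    # False negatives: trajectories the classifier ranked lowest.
    false_neg = heapq.nsmallest(k, positives, key=lambda item: item[0])
    return [p for _, p in false_pos], [p for _, p in false_neg]
```

Each returned pair would then be assessed manually against the literature, as in the Assessment column.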