Yi Liu, Benjamin Elsworth, Pau Erola, Valeriia Haberland, Gibran Hemani, Matt Lyon, Jie Zheng, Oliver Lloyd, Marina Vabistsevits, Tom R Gaunt.
Abstract
MOTIVATION: The wealth of data resources on human phenotypes, risk factors, molecular traits and therapeutic interventions presents new opportunities for population health sciences. These opportunities are paralleled by a growing need for data integration, curation and mining to increase research efficiency, reduce mis-inference and ensure reproducible research.
Year: 2021 PMID: 33165574 PMCID: PMC8189674 DOI: 10.1093/bioinformatics/btaa961
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Architecture of the EpiGraphDB platform. Source datasets are integrated into a Neo4j graph database. Standard HTTP queries are processed through a RESTful API service, which can be called from any REST API client, including our R package epigraphdb. The web UI showcases the main topics of epidemiological evidence in EpiGraphDB and demonstrates example API queries that retrieve the underlying data.
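Because the service is plain REST over HTTP, any client that can issue a GET request can query it. The Python sketch below builds such a request using only the standard library. The base URL `https://api.epigraphdb.org`, the `/mr` endpoint and its `exposure_trait`/`pval_threshold` parameters are assumptions for illustration and should be checked against the API documentation before use; the network call is kept in a separate function so the URL can be inspected offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the public API; verify against the official API docs.
API_BASE = "https://api.epigraphdb.org"


def build_query_url(endpoint: str, params: dict, base: str = API_BASE) -> str:
    """Compose a GET URL for a REST endpoint with URL-encoded query parameters."""
    return f"{base.rstrip('/')}/{endpoint.lstrip('/')}?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0):
    """Send the GET request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical query: MR results for one exposure trait below a P-value threshold.
# The "/mr" endpoint and these parameter names are assumptions, not confirmed here.
url = build_query_url("/mr", {"exposure_trait": "Sleep duration", "pval_threshold": 1e-10})
print(url)
```

Calling `fetch_json(url)` would then return the decoded response; keeping URL construction separate makes it easy to log or test queries without hitting the service.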
Summary of epidemiological evidence in EpiGraphDB
| Category | Description | Sources |
|---|---|---|
| Causal relationships | Pairwise MR between traits | MR-EvE |
| | pQTL/eQTL MR | xQTL |
| Association relationships | Genetic correlations | Neale Lab |
| | Observational correlations | EpiGraphDB in-house |
| | GWAS top hits | OpenGWAS |
| | Polygenic risk score associations | PRS atlas |
| | PPIs | IntAct |
| | Drug targets | Open Targets |
| Molecular pathways | Pathway ontologies and molecular events | Reactome |
| | Gene expression for tissues | GTEx (The GTEx Consortium) |
| Literature-mined/derived evidence | Literature evidence of biomedical entities and mechanisms | SemMedDB |
| | Mapping of biomedical entities to literature terms | |
| Ontology and semantic relationships | Mapping of biomedical entities to ontology terms | EFO |
| | Semantic similarities of biomedical entities | Vectology |

Entity metrics

| Meta nodes | Meta relationships | Nodes | Relationships |
|---|---|---|---|
| 14 | 42 | 32 969 103 | 84 181 124 |
Note: A detailed discussion of data integration and of how these biomedical entities and associations are represented in EpiGraphDB is available in Supplementary Appendices S1 and S2.
Further details on the in-house results produced by EpiGraphDB members are available in Supplementary Appendix S2.
Information and metrics are based on the latest version of the EpiGraphDB platform (version 0.3.0, April 21, 2020).
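The entity metrics mix two levels: meta nodes and meta relationships count node labels and relationship types (the schema), whereas nodes and relationships count individual instances. The Python sketch below computes all four counts for a toy graph; every label, type and ID in it is hypothetical and is not EpiGraphDB's actual schema.

```python
from collections import Counter

# Toy property graph with hypothetical entities (not EpiGraphDB data).
# Each node is (node_id, meta_node_label).
nodes = [
    ("trait:A", "Trait"), ("trait:B", "Trait"),
    ("gene:X", "Gene"), ("gene:Y", "Gene"),
    ("protein:P", "Protein"),
]
# Each relationship is (source_id, meta_relationship_type, target_id).
relationships = [
    ("trait:A", "MR", "trait:B"),
    ("gene:X", "ENCODES", "protein:P"),
    ("gene:Y", "ENCODES", "protein:P"),
]


def entity_metrics(nodes, relationships):
    """Count schema-level (meta) and instance-level entities, as in the metrics table."""
    return {
        "meta_nodes": len({label for _, label in nodes}),
        "meta_relationships": len({rel_type for _, rel_type, _ in relationships}),
        "nodes": len(nodes),
        "relationships": len(relationships),
    }


metrics = entity_metrics(nodes, relationships)
# Instance counts per meta node, e.g. how many Gene nodes exist.
per_label = Counter(label for _, label in nodes)
print(metrics, per_label)
```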
Fig. 2. Distinguishing vertical and horizontal pleiotropy using EpiGraphDB. (A) Concept of vertical and horizontal pleiotropy, using SNP–protein relationships as an example. A SNP is a valid MR instrument when it affects proteins along a single path; in contrast, if it is associated with proteins participating in different pathways, it violates the ‘exclusion restriction criterion’ and the instrument is invalid. (B) Integration of SNP–protein associations with pathway information and PPI data to distinguish vertical and horizontal pleiotropy using EpiGraphDB. All four proteins are associated with the same SNP. Proteins 1 and 2 share the same biological pathway. Proteins 2 and 3 interact (PPI). Protein 4 shares no links with the other proteins. Therefore, the SNP associations with proteins 1, 2 and 3 likely reflect vertical pleiotropy, whereas the association with protein 4, relative to the other three proteins, likely reflects horizontal pleiotropy.
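The reasoning in panel (B) amounts to grouping the SNP-associated proteins into connected components, where two proteins are linked if they share a pathway or interact. The sketch below does that grouping with a small union-find, under a simplifying rule of our own (not necessarily the authors'): a protein in a component of two or more proteins is labelled 'vertical', and an isolated protein 'horizontal'. The usage line encodes the four-protein example from the caption with hypothetical pathway IDs.

```python
from itertools import combinations


def classify_pleiotropy(proteins, pathways, ppi_pairs):
    """Label each SNP-associated protein 'vertical' or 'horizontal'.

    proteins:  proteins associated with one SNP.
    pathways:  dict protein -> set of pathway IDs.
    ppi_pairs: iterable of (protein_a, protein_b) interactions.
    Proteins connected (directly or via a chain) by a shared pathway or a PPI
    form one component; components of size >= 2 are read as vertical pleiotropy.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Walk to the component root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):
        parent[find(a)] = find(b)

    # Link proteins that share at least one pathway.
    for a, b in combinations(proteins, 2):
        if pathways.get(a, set()) & pathways.get(b, set()):
            union(a, b)
    # Link proteins that interact, ignoring partners not associated with the SNP.
    for a, b in ppi_pairs:
        if a in parent and b in parent:
            union(a, b)

    size = {}
    for p in proteins:
        root = find(p)
        size[root] = size.get(root, 0) + 1
    return {p: "vertical" if size[find(p)] > 1 else "horizontal" for p in proteins}


# Caption example: P1 and P2 share pathway R1, P2 and P3 interact, P4 is isolated.
fig2b = classify_pleiotropy(
    ["P1", "P2", "P3", "P4"],
    {"P1": {"R1"}, "P2": {"R1"}, "P3": {"R2"}, "P4": {"R3"}},
    [("P2", "P3")],
)
print(fig2b)
```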
Fig. 3. Network diagram with the evidence to assess the pleiotropy of genetic variant rs12720356. The network has one node for each protein regulated by the eQTL rs12720356, and node size is inversely proportional to the protein's P-value (see Supplementary Table S4 for details). Dashed pink edges indicate participation in a common biological pathway, and blue edges represent the number of shared PPIs (value indicated).
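The two edge types in this diagram can be derived from per-protein annotations. In the Python sketch below, a dashed (pathway) edge is drawn when two proteins share at least one pathway, and the weight of a PPI edge is read as the number of interaction partners the two proteins have in common. That reading of "shared PPIs", the input dictionaries and the protein names are all assumptions for illustration.

```python
from itertools import combinations


def build_pleiotropy_edges(proteins, pathways, ppi_partners):
    """Return (pathway_edges, ppi_edges) for a Fig. 3-style network.

    pathways:     dict protein -> set of pathway IDs.
    ppi_partners: dict protein -> set of interaction partners.
    pathway_edges: list of protein pairs sharing at least one pathway.
    ppi_edges:     dict protein pair -> number of shared PPI partners (> 0 only).
    """
    pathway_edges = []
    ppi_edges = {}
    for a, b in combinations(sorted(proteins), 2):
        if pathways.get(a, set()) & pathways.get(b, set()):
            pathway_edges.append((a, b))
        shared = len(ppi_partners.get(a, set()) & ppi_partners.get(b, set()))
        if shared:
            ppi_edges[(a, b)] = shared
    return pathway_edges, ppi_edges


# Hypothetical proteins, pathways and interaction partners.
pathway_edges, ppi_edges = build_pleiotropy_edges(
    ["A", "B", "C"],
    {"A": {"path1"}, "B": {"path1", "path2"}, "C": {"path3"}},
    {"A": {"X", "Y"}, "B": {"Y", "Z"}, "C": {"X", "Y", "Z"}},
)
print(pathway_edges, ppi_edges)
```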
Triangulation of MR and literature evidence on the effects of IL23R and associated genes on IBD
| Gene | Effect size (SE) | MR P-value | QTL | SemMed predicate (count) |
|---|---|---|---|---|
| IL23R | 1.50 (0.05) | 2.21 | pQTL | AFFECTS (1), ASSOCIATED_WITH (21), NEG_ASSOCIATED_WITH (2), PREDISPOSES (1) |
| | 0.89 (0.06) | 4.16 | eQTL | |
| IL12B | 0.42 (0.03) | 9.59 | pQTL | ASSOCIATED_WITH (5) |
| IL15 | −1.42 (0.20) | 5.53 | eQTL | ASSOCIATED_WITH (2) |
| IL4 | 0.46 (0.08) | 4.47 | eQTL | ASSOCIATED_WITH (3), DISRUPTS (1) |
| JAK2 | −1.90 (0.20) | 1.32 | eQTL | AFFECTS (1), ASSOCIATED_WITH (3) |
| NFKB1 | 0.97 (0.17) | 2.16 | eQTL | ASSOCIATED_WITH (2) |
| RORC | −1.00 (0.12) | 1.21 | eQTL | ASSOCIATED_WITH (1) |
| STAT3 | 0.60 (0.08) | 2.96 | eQTL | AFFECTS (2), AUGMENTS (1), ASSOCIATED_WITH (9), CAUSES (1) |
Note: The MR evidence comprises the QTL MR estimates of IL23R and its associated druggable genes (linked via direct PPI, with Tier 1 druggability) on IBD (OpenGWAS ID: ieu-a-249). The literature evidence comprises the SemMed predicates derived from SemMedDB and the number of PubMed articles supporting each predicate mechanism. Here, we report the subset of genes with both MR evidence (P-value < 1 × 10−5) and literature evidence.
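The selection rule in the note (MR evidence below the P-value threshold plus at least one supporting literature predicate) can be written as a simple filter. The Python sketch below treats a gene as having MR evidence if any of its pQTL or eQTL estimates passes the threshold, and takes literature counts per gene; the input shapes and gene names are hypothetical.

```python
def triangulate(mr_records, literature, p_threshold=1e-5):
    """Return genes with MR evidence below p_threshold AND literature support.

    mr_records: iterable of (gene, qtl_type, p_value) QTL MR results.
    literature: dict gene -> {SemMed predicate: supporting article count}.
    Genes are returned in the order they first pass the MR threshold.
    """
    mr_genes = []
    for gene, _qtl_type, p_value in mr_records:
        if p_value < p_threshold and gene not in mr_genes:
            mr_genes.append(gene)
    return [g for g in mr_genes if sum(literature.get(g, {}).values()) > 0]


# Hypothetical inputs: GENE_B lacks literature, GENE_C lacks MR evidence.
mr_records = [
    ("GENE_A", "pQTL", 1e-8),
    ("GENE_A", "eQTL", 1e-6),
    ("GENE_B", "eQTL", 1e-7),
    ("GENE_C", "eQTL", 0.01),
]
literature = {"GENE_A": {"ASSOCIATED_WITH": 3}, "GENE_C": {"AFFECTS": 1}}
selected = triangulate(mr_records, literature)
print(selected)
```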
Summary of disease traits identified as causally associated with ‘Sleep duration’
| Exposure | Outcome | MR beta | MR P-value | Disease |
|---|---|---|---|---|
| ieu-a-1088: sleep duration | ukb-a-107: non-cancer illness code self-reported: gout | −0.00257 | 3.8 | ‘gout’ |
| ieu-a-1088: sleep duration | ieu-a-6: coronary heart disease | −1.03933 | 2.3 | ‘coronary artery disease’ |
| ieu-a-1088: sleep duration | ukb-a-548: Diagnoses - main ICD10: K35 acute appendicitis | −0.00671 | 8.0 | ‘appendicitis’ |
| ieu-a-1088: sleep duration | ukb-a-54: cancer code self-reported: lung cancer | −0.00191 | 1.1 | ‘cancer’, ‘lung carcinoma’ |
| ukb-a-9: sleep duration | ukb-a-13: sleeplessness/insomnia | −0.32167 | 1.1 | ‘insomnia (disease)’ |
Note: We searched for MR evidence associated with the trait ‘Sleep duration’ with P-value < 1 × 10−10 and mapped each outcome trait to disease terms through EFO term mappings. The identifiers ‘ieu-a-’/‘ukb-a-’ are IEU OpenGWAS IDs.
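The search described in the note can be sketched as a two-step pipeline: filter MR results by P-value, then map each outcome trait to disease labels through its EFO terms. In the Python sketch below, the record fields, trait IDs and EFO IDs are placeholders rather than real OpenGWAS or EFO identifiers, and outcomes with no disease mapping are simply dropped.

```python
def map_mr_to_diseases(mr_results, trait_to_efo, efo_to_disease, p_threshold=1e-10):
    """Filter MR results by P-value and attach disease labels via EFO terms.

    mr_results:     list of dicts with 'exposure', 'outcome', 'beta', 'pval'.
    trait_to_efo:   dict outcome trait ID -> set of EFO term IDs.
    efo_to_disease: dict EFO term ID -> disease label.
    """
    rows = []
    for record in mr_results:
        if record["pval"] >= p_threshold:
            continue  # not strong enough MR evidence
        efo_terms = trait_to_efo.get(record["outcome"], set())
        diseases = sorted({efo_to_disease[t] for t in efo_terms if t in efo_to_disease})
        if diseases:
            rows.append({**record, "diseases": diseases})
    return rows


# Placeholder data: out2 fails the P-value filter, out3 has no EFO mapping.
mr_results = [
    {"exposure": "trait:exp", "outcome": "trait:out1", "beta": -0.5, "pval": 1e-12},
    {"exposure": "trait:exp", "outcome": "trait:out2", "beta": 0.2, "pval": 1e-6},
    {"exposure": "trait:exp", "outcome": "trait:out3", "beta": 0.1, "pval": 1e-15},
]
trait_to_efo = {"trait:out1": {"EFO:A", "EFO:B"}, "trait:out2": {"EFO:A"}}
efo_to_disease = {"EFO:A": "disease a", "EFO:B": "disease b"}
rows = map_mr_to_diseases(mr_results, trait_to_efo, efo_to_disease)
print(rows)
```

A single outcome can map to several diseases through different EFO terms, which is why the result keeps a list of labels per row.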
Fig. 4. Literature-mined/derived evidence on the intermediates between ‘Sleep duration’ and ‘Coronary heart disease’. Counts of overlapping SemMed terms, grouped by SemMed term type (aapp: amino acids, peptides, proteins; gngm: genes or genome; horm: hormones; orch: organic chemicals; full list available at https://mmtx.nlm.nih.gov/MMTx/semanticTypes.shtml).
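The counts in this figure can be read as the overlap between the terms linked to the exposure and those linked to the outcome, tallied by semantic type. A minimal Python sketch, assuming each term carries a single semantic type (SemMed terms can carry several) and using placeholder term names:

```python
from collections import Counter


def overlap_by_type(exposure_terms, outcome_terms, term_type):
    """Count intermediate terms shared by both literature spaces, by semantic type.

    exposure_terms / outcome_terms: sets of SemMed terms linked to each trait.
    term_type: dict term -> semantic type abbreviation (e.g. 'aapp', 'gngm').
    Shared terms without a known type are skipped.
    """
    shared = exposure_terms & outcome_terms
    return Counter(term_type[t] for t in shared if t in term_type)


# Placeholder terms; t2, t3 and t4 appear in both literature spaces.
exposure_terms = {"t1", "t2", "t3", "t4"}
outcome_terms = {"t2", "t3", "t4", "t5"}
term_type = {"t1": "aapp", "t2": "aapp", "t3": "gngm", "t4": "horm", "t5": "orch"}
counts = overlap_by_type(exposure_terms, outcome_terms, term_type)
print(counts)
```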
Fig. 5. Literature-derived mechanisms between ‘Sleep duration’, ‘Leptin’ and ‘Coronary Heart Disease’. Network diagram displaying the literature connections between ‘Sleep Duration’ and ‘Coronary Heart Disease’ through the intermediate term ‘Leptin’. The predicates connecting two semantic terms, their frequencies and the enrichment P-values are labelled on the edges. Enrichment is calculated via MELODI Presto (Elsworth and Gaunt, 2020) by comparing the query count with the background. Edge width represents the log-transformed enrichment P-value. Red nodes represent the exposure (SLEEP DURATION) and outcome (CORONARY HEART DISEASE) traits; blue nodes represent intermediate semantic literature terms.
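The caption does not state which test MELODI Presto uses to compare query counts with the background. Purely as an illustration, the sketch below uses a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2 × 2 table) and turns the resulting P-value into an edge width as −log10(P). Both choices, and the parameter names `k`, `n`, `K`, `N`, are assumptions rather than a description of MELODI Presto's implementation.

```python
import math


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws).

    Read here as: k = query articles with the term, n = query articles,
    K = background articles with the term, N = background articles.
    """
    total = math.comb(N, n)
    upper = min(n, K)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def edge_width(p_value, floor=1e-300):
    """Edge width as -log10(P), flooring P so that a zero P-value stays finite."""
    return -math.log10(max(p_value, floor))


# Toy counts: all 5 query articles carry the term, versus 5 of 10 in the background.
p_value = hypergeom_upper_tail(k=5, n=5, K=5, N=10)
width = edge_width(p_value)
print(p_value, width)
```

Log-transforming the P-value keeps edge widths on a usable scale, since enrichment P-values can span many orders of magnitude.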