Christiaan A de Leeuw, Sven Stringer, Ilona A Dekkers, Tom Heskes, Danielle Posthuma.
Abstract
Gene-set analysis provides insight into which functional and biological properties of genes are aetiologically relevant for a particular phenotype. But genes have multiple properties, and these properties are often correlated across genes. This can cause confounding in a gene-set analysis, because one property may be statistically associated even if biologically irrelevant to the phenotype, by being correlated with gene properties that are relevant. To address this issue we present a novel conditional and interaction gene-set analysis approach, which attains considerable functional refinement of its conclusions compared to traditional gene-set analysis. We applied our approach to blood pressure phenotypes in the UK Biobank data (N = 360,243), the results of which we report here. We confirm and further refine several associations with multiple processes involved in heart and blood vessel formation but also identify novel interactions, among others with cardiovascular tissues involved in regulatory pathways of blood pressure homoeostasis.
Year: 2018 PMID: 30218068 PMCID: PMC6138636 DOI: 10.1038/s41467-018-06022-6
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Illustration of different confounding scenarios. In each of the scenarios, a gene set with no relevance to the phenotype overlaps with a relevant gene set, resulting in a confounded association. The left column contains scatterplots of gene associations as a function of their position, with the lines below the plots indicating which gene sets they belong to. The right column contains bar plots showing the resulting gene-set associations, when analysing either the marginal associations of each gene set individually (as in traditional GSA) or when using a joint conditional (a–c) or interaction (d) analysis of the two gene sets. In a–c, one of the gene sets is relevant to the phenotype, with the other having no effect; in d the effect is assigned to the interaction between the two sets, with neither having a main effect. In all scenarios this is shown to be correctly reflected by the conditional/interaction analysis, but not the marginal analyses.
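The contrast between marginal, conditional and interaction analysis described in this caption can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it assumes independent genes, ordinary least squares on simulated gene association scores, and two hypothetical overlapping gene sets `A` and `B` whose sizes and effect size (1.0) are chosen only for illustration.

```python
import numpy as np

def ols_coefs(z, *predictors):
    """Least-squares coefficients (intercept first) of z on the given predictors."""
    design = np.column_stack([np.ones_like(z), *predictors])
    coefs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coefs

rng = np.random.default_rng(0)
n_genes = 20_000

# Hypothetical overlapping gene sets: A = genes 0..1999, B = genes 1000..2999.
in_a = np.zeros(n_genes)
in_a[:2000] = 1.0
in_b = np.zeros(n_genes)
in_b[1000:3000] = 1.0

# Confounding scenario: only set A raises the gene association scores.
z_conf = 1.0 * in_a + rng.standard_normal(n_genes)
marginal_b = ols_coefs(z_conf, in_b)[1]            # B analysed on its own
conditional_b = ols_coefs(z_conf, in_a, in_b)[2]   # B conditioned on A

# Interaction scenario: only genes that are in both sets carry an effect.
both = in_a * in_b
z_int = 1.0 * both + rng.standard_normal(n_genes)
marginal_a_int = ols_coefs(z_int, in_a)[1]         # A analysed on its own
main_a, main_b, interaction = ols_coefs(z_int, in_a, in_b, both)[1:]

print(f"confounding: marginal B = {marginal_b:.2f}, conditional B = {conditional_b:.2f}")
print(f"interaction: marginal A = {marginal_a_int:.2f}, main A = {main_a:.2f}, "
      f"main B = {main_b:.2f}, A x B = {interaction:.2f}")
```

In the first scenario the irrelevant set `B` looks associated on its own but its coefficient shrinks towards zero once `A` is in the model; in the second, both sets look associated marginally while the joint model attributes the effect to the `A x B` term.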
Fig. 2 Overview of the extended gene-set analysis workflow. The workflow is composed of five steps, starting with a standard gene-set analysis in step one. Results from this analysis are then successively refined in the subsequent steps, discarding initially significant gene properties if their associations are found to be insufficiently robust. In the optional sixth step an exploratory interaction analysis is used to detect additional interaction effects not uncovered in the main analysis workflow.
Table 1 Overview of the number of initially significant gene properties retained in each step
Columns Step 1 to Final give the number of significant and retained gene properties.
| Domain | Number of gene properties | Phenotype | Step 1 | Step 2 | Step 3ᵃ | Finalᵃ |
|---|---|---|---|---|---|---|
| Tissue-specific gene expression | 53 | SBP | 52 | 1 | 1 | 1 |
| | | DBP | 50 | 3 | 3 (2) | 3 (2) |
| | | PP | 43 | 14 | 7 (3) | 7 (3) |
| miRNA targets | 221 | SBP | 0 | 0 | 0 | 0 |
| | | DBP | 0 | 0 | 0 | 0 |
| | | PP | 3 | 0 | 0 | 0 |
| GO biological process | 4653 | SBP | 10 | 6 | 5 | 5 |
| | | DBP | 14 | 6 | 5 (4) | 5 (4) |
| | | PP | 31 | 16 | 9 (8) | 9 (8) |
| GO cellular component | 584 | SBP | 1 | 1 | 1 | 1 |
| | | DBP | 0 | 0 | 0 | 0 |
| | | PP | 6 | 2 | 2 (1) | 2 (1) |
| GO molecular function | 929 | SBP | 2 | 1 | 1 | 0 |
| | | DBP | 1 | 1 | 1 | 1 |
| | | PP | 6 | 2 | 2 | 2 |
| All domains | 6440 | SBP | 65 | 9 | 8 | 7 |
| | | DBP | 65 | 10 | 9 (7) | 9 (7) |
| | | PP | 89 | 34 | 20 (14) | 20 (14) |
| | | Combined | 219 | 53 | 37 (29) | 36 (28) |
Gene properties are initially included based on significance in step 1, then retained or discarded in subsequent steps
Multiple testing correction was performed separately for each phenotype, applying Bonferroni correction per domain with α = 0.05/5 = 0.01
SBP systolic blood pressure, DBP diastolic blood pressure, PP pulse pressure, GO Gene Ontology
ᵃ Numbers in parentheses reflect the likely number of distinct underlying signals
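The multiple-testing note can be turned into per-domain p-value thresholds. The sketch below takes the domain sizes from the table and assumes that, within each domain, the domain-level α of 0.01 is divided by the number of gene properties tested in that domain; the variable names are illustrative only.

```python
# Number of gene properties per domain, as listed in the table.
domain_sizes = {
    "Tissue-specific gene expression": 53,
    "miRNA targets": 221,
    "GO biological process": 4653,
    "GO cellular component": 584,
    "GO molecular function": 929,
}

alpha_total = 0.05
alpha_domain = alpha_total / len(domain_sizes)  # 0.05 / 5 = 0.01

# Assumption: Bonferroni within a domain divides alpha_domain by the
# number of gene properties tested in that domain.
thresholds = {name: alpha_domain / n for name, n in domain_sizes.items()}

for name, threshold in thresholds.items():
    print(f"{name}: p < {threshold:.3g}")
```

Because the correction is applied separately for each phenotype, the same thresholds would be reused for SBP, DBP and PP under this reading.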
Table 2 Marginally significant gene properties retained at end of extended analysis workflow
| Gene property | No. of genes | Phenotype | p-value | Shared | QQ check |
|---|---|---|---|---|---|
| Tissue-specific gene expression | | | | | |
| Artery (aorta) | – | PP | 1.16e-11 | (1) | – |
| Artery (coronary) | – | DBP | 0.000113 | (2) | – |
| Artery (coronary) | – | PP | 3.13e-10 | (1) | – |
| Artery (tibial) | – | DBP | 8.82e-5 | (2) | – |
| Artery (tibial) | – | PP | 1.15e-11 | (1) | – |
| Cervix (endocervix) | – | PP | 1.58e-6 | (3) | – |
| Heart (atrial appendage) | – | PP | 2.26e-8 | | – |
| Ovary | – | PP | 5.65e-6 | (3) | – |
| Uterus | – | SBP | 0.000123 | | – |
| Uterus | – | DBP | 4.78e-5 | | – |
| Uterus | – | PP | 4.03e-8 | (3) | – |
| GO biological process | | | | | |
| Blood vessel remodelling | 29 | DBP | 5.74e-7 | | |
| Cardiocyte differentiation | 95 | SBP | 6.28e-9 | | |
| Cardiocyte differentiation | 95 | PP | 9.49e-9 | | |
| Cardiovascular system development | 764 | PP | 1.79e-9 | (4) | |
| Cell proliferation | 645 | PP | 1.60e-8 | | |
| cGMP biosynthetic process | 13 | DBP | 4.06e-7 | | |
| Circulatory system development | 764 | PP | 1.79e-9 | (4) | |
| Embryonic eye morphogenesis | 32 | SBP | 4.17e-7 | | |
| Mesenchyme development | 180 | PP | 9.11e-9 | | |
| Negative regulation of cellular senescence | 36 | PP | 3.63e-8 | | |
| Negative regulation of smooth muscle cell proliferation | 11 | PP | 6.20e-7 | (5) | |
| Negative regulation of transcription from RNA polymerase II promoter | 701 | SBP | 9.22e-7 | | Flagged |
| Nitric oxide metabolic process | 15 | DBP | 6.22e-8 | (6) | |
| Positive regulation of developmental growth | 150 | SBP | 1.84e-6 | | |
| Positive regulation of urine volume | 14 | SBP | 6.43e-7 | | |
| Positive regulation of urine volume | 14 | DBP | 5.72e-7 | | |
| Reactive oxygen species biosynthetic process | 21 | DBP | 3.37e-8 | (6) | |
| Regulation of smooth muscle cell proliferation | 98 | PP | 1.00e-7 | (5) | |
| Regulation of transcription from RNA polymerase II promoter | 1682 | PP | 9.97e-7 | | |
| GO cellular component | | | | | |
| Actin cytoskeleton | 430 | PP | 1.08e-6 | (7) | Flagged |
| Cytoskeleton | 1882 | PP | 1.88e-7 | (7) | Flagged |
| T tubule | 45 | SBP | 1.35e-5 | | Flagged |
| GO molecular function | | | | | |
| Cell adhesion molecule binding | 180 | PP | 1.28e-6 | | Flagged |
| Peptide hormone binding | 35 | PP | 4.96e-6 | | Flagged |
| Sequence-specific DNA binding | 976 | DBP | 5.62e-6 | | |
p-Values are from step 2 of the analysis workflow, after correcting for general confounders. Gene properties that likely reflect a single shared association are marked by the same number in the ‘Shared’ column. Gene sets for which issues were noted during inspection of the QQ-plots are marked in the ‘QQ check’ column. These are still valid, but require more caution when interpreting their association
SBP systolic blood pressure, DBP diastolic blood pressure, PP pulse pressure, GO Gene Ontology
Fig. 3 Global QQ-plots of associations for pulse pressure. a Comparison of gene-set associations corrected for other effects. Shown are associations with no corrections (step 1), associations corrected for overall and tissue-specific gene expression (step 2), and associations additionally corrected for all significant and retained gene sets listed in Table 2. When correcting for gene expression, the associations for miRNA target sets were also corrected for general miRNA target status. b Comparison of the overall levels of marginal and interaction association. Marginal associations are corrected for general confounders (step 2); the interaction associations are from the exploratory interaction analysis (step 6). c Comparison of tissue by gene set interactions for different tissues. For all of the tissue-specific analyses, each interaction was also conditioned on the interaction between overall expression and the gene set. For all three plots, corresponding figures for all of the phenotypes can be found in Supplementary Figures 3–5.
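For readers unfamiliar with QQ-plots of p-values, the coordinates can be computed as in the minimal sketch below. It is a generic construction, not taken from the paper: the function name `qq_points` and the plotting positions `(i - 0.5) / n` are assumptions, and the p-values in the usage lines are simulated under the null rather than real results.

```python
import numpy as np

def qq_points(p_values):
    """Return (expected, observed) -log10 p-values for a QQ-plot.

    Expected values use the plotting positions (i - 0.5) / n, one common
    convention for the quantiles of a uniform distribution.
    """
    observed = np.sort(np.asarray(p_values, dtype=float))
    n = observed.size
    expected = (np.arange(1, n + 1) - 0.5) / n
    return -np.log10(expected), -np.log10(observed)

# Simulated null p-values: the points should follow the diagonal.
rng = np.random.default_rng(1)
x, y = qq_points(rng.uniform(size=1000))
print(f"largest expected -log10(p): {x[0]:.2f}, largest observed: {y[0]:.2f}")
```

Points rising above the diagonal indicate more small p-values than expected under the null, which is how an overall excess of association shows up when comparing curves for different correction steps.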
Table 3 Significant and retained interactions from post hoc tissue expression by gene set interaction analysis
| Tissue | Gene set | Marginal (set) | Interaction | Top 25% | Shared |
|---|---|---|---|---|---|
| | | | | | |
| Artery (coronary) | Nucleoside phosphate biosynthetic process (BP) | 0.0167 | 2.59e-8 | 3.86e-5 | (1) |
| Artery (coronary) | Purine-containing compound biosynthetic process (BP) | 0.0104 | 1.81e-7 | 6.38e-5 | (1) |
| Artery (tibial) | miRNA-145 targets | 0.442 | 1.60e-5 | 1.74e-7 | |
| Artery (tibial) | Nucleoside phosphate biosynthetic process (BP) | 0.0167 | 8.56e-7 | 2.28e-4 | (1) |
| Artery (tibial) | Purine-containing compound biosynthetic process (BP) | 0.0104 | 1.12e-6 | 1.79e-5 | (1) |
| Heart (atrial appendage) | Positive regulation of catalytic activity (BP) | 0.122 | 9.24e-7 | 8.53e-3 | (2) |
| Heart (atrial appendage) | Positive regulation of molecular function (BP) | 0.0238 | 3.25e-6 | 1.86e-3 | (2) |
| Heart (atrial appendage) | Receptor signalling protein activity (MF) | 0.123 | 2.39e-5 | 2.75e-5 | |
| Heart (atrial appendage) | Regulation of blood pressure (BP) | 0.0115 | 2.61e-8 | 1.57e-9 | |
| Heart (atrial appendage) | Vascular process in circulatory system (BP) | 0.00139 | 5.88e-9 | 2.25e-7 | |
| Uterus | Development of primary sexual characteristics (BP) | 0.00698 | 1.3e-5 | 3.19e-8 | (3) |
| Uterus | Reproductive system development (BP) | 0.000829 | 2.93e-5 | 4.71e-4 | (3) |
| Uterus | Sex differentiation (BP) | 0.00339 | 5.21e-6 | 2.21e-6 | (3) |
| | | | | | |
| Artery (coronary) | Nucleoside phosphate biosynthetic process (BP) | 0.550 | 2.01e-6 | 1.08e-5 | (4) |
| Artery (coronary) | Purine-containing compound biosynthetic process (BP) | 0.117 | 2.48e-6 | 7.52e-6 | (4) |
| Artery (tibial) | Microtubule-based movement (BP) | 0.582 | 2.45e-5 | 2.20e-3 | |
| Artery (tibial) | Microtubule binding (MF) | 0.525 | 2.34e-5 | 3.41e-3 | |
| Heart (atrial appendage) | Regulation of blood pressure (BP) | 0.000777 | 2.77e-9 | 2.04e-11 | (5) |
| Heart (atrial appendage) | Vascular process in circulatory system (BP) | 5.55e-6 | 4.91e-10 | 1.37e-9 | (5) |
| | | | | | |
| Artery (coronary) | Nucleoside phosphate biosynthetic process (BP) | 0.500 | 1.02e-5 | 2.89e-4 | (6) |
| Artery (coronary) | Purine-containing compound biosynthetic process (BP) | 0.205 | 2.44e-5 | 3.63e-4 | (6) |
| Artery (tibial) | miRNA-145 targets | 0.638 | 1.01e-5 | 6.20e-7 | |
| Heart (atrial appendage) | Cellular response to nitrogen compound (BP) | 0.0167 | 2.18e-5 | 8.10e-7 | |
| Uterus | Development of primary sexual characteristics (BP) | 0.0144 | 7.56e-7 | 2.84e-7 | (7) |
| Uterus | Sex differentiation (BP) | 0.00523 | 6.26e-7 | 4.28e-7 | (7) |
Marginal gene-set p-values are from step 2 of the analysis workflow, after correcting for general confounders. The ‘top 25%’ p-values are for the gene set of the 25% genes with the highest residual expression on the tissue, conditioned on the whole set and the tissue expression. Interactions that likely reflect a single shared association are marked by the same number in the ‘Shared’ column
SBP systolic blood pressure, DBP diastolic blood pressure, PP pulse pressure, BP biological process, MF molecular function
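The two derived predictors mentioned in the table notes, the tissue by gene set interaction and the 'top 25%' subset, can be sketched as follows. This is a simplified illustration under stated assumptions: `residual_expr` is taken to be a per-gene numeric score that has already been computed, the interaction is formed as a plain product, and the function name and example values are hypothetical.

```python
import numpy as np

def interaction_and_top_quarter(residual_expr, in_set):
    """Build an interaction predictor and a top-25% subset indicator."""
    residual_expr = np.asarray(residual_expr, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    # Interaction predictor: expression score for set members, zero elsewhere.
    interaction = residual_expr * in_set
    # Top-25% subset: set members at or above the within-set 75th percentile.
    cutoff = np.quantile(residual_expr[in_set], 0.75)
    top_quarter = in_set & (residual_expr >= cutoff)
    return interaction, top_quarter

# Hypothetical example: 10 genes, the first 8 belong to the gene set.
expr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
member = np.array([True] * 8 + [False] * 2)
interaction, top_quarter = interaction_and_top_quarter(expr, member)
print(interaction)
print(np.flatnonzero(top_quarter))
```

In a conditional analysis these predictors would enter the gene-level model alongside the whole set and the tissue expression, so that the subset or interaction is tested over and above both main effects.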
Table 4 Significant and retained interactions from post hoc gene set by gene set interaction analysis for pulse pressure
| Gene-set interaction pair | Size | Overlap | Marginal | Full model: main | Full model: interaction | Shared |
|---|---|---|---|---|---|---|
| Cardiovascular system development (BP)ᵃ | 764 | 100 | 1.79e-9 | 2.74e-5 | 1.44e-5 | (1) |
| × Chemical homoeostasis (BP) | 844 | | 0.000672 | 0.0836 | | |
| Cardiovascular system development (BP)ᵃ | 764 | 147 | 1.79e-9 | 0.000210 | 2.01e-5 | (1) |
| × Homoeostatic process (BP) | 1277 | | 0.000981 | 0.100 | | |
| Cell adhesion molecule binding (MF) | 180 | 21 | 1.28e-6 | 0.000574 | 0.000751 | |
| × Glycosaminoglycan binding (MF) | 201 | | 0.0904 | 0.602 | | |
| Cell proliferation (BP) | 645 | 204 | 1.60e-8 | 0.00311 | 0.000204 | |
| × Regulation of intracellular signal transduction (BP) | 1581 | | 0.0197 | 0.340 | | |
| Cell proliferation (BP) | 645 | 74 | 1.60e-8 | 5.51e-5 | 7.62e-6 | |
| × Regulation of intracellular signal transduction (BP) | 596 | | 0.454 | 0.958 | | |
| Regulation of transcription from RNA polymerase II promoter (BP) | 1682 | 203 | 9.97e-7 | 0.00127 | 1.73e-5 | |
| × Homoeostatic process (BP) | 1277 | | 0.000981 | 0.125 | | |
Marginal gene-set p-values are from step 2 of the analysis workflow, after correcting for general confounders. Interactions that likely reflect a single shared association are marked by the same number in the ‘Shared’ column
BP biological process, MF molecular function
ᵃ The same interaction exists for circulatory system development, which is identical to cardiovascular system development and therefore omitted from this table
Fig. 4 Illustration of results from post hoc tissue expression by gene set interaction analysis. Only interactions that were initially significant and retained after follow-up checks are shown. Tissue expression per anatomic site (blue), biological process (green), molecular function (yellow) and miRNA (red).