Hans Binder1,2, Maria Schmidt1, Lydia Hopp1, Suren Davitavyan3,4, Arsen Arakelyan2,3, Henry Loeffler-Wirth1.
Abstract
Multi-omics high-throughput technologies produce data sets that are not restricted to a single omics modality but comprise several, often measured on patient-matched tumour specimens. The integrative analysis of these omics modalities is essential to obtain a holistic view of the otherwise fragmented information hidden in these data. We present an intuitive method enabling the combined analysis of multi-omics data based on self-organizing maps (SOM) machine learning. It "portrays" the expression, methylation and copy number variation (CNV) landscapes of each tumour using the same gene-centred coordinate system. It enables the visual evaluation and direct comparison of the different omics layers on a personalized basis. We applied this combined molecular portrayal to lower grade gliomas (LGG), a heterogeneous brain tumour entity. This entity classifies into a series of molecular subtypes defined by genetic key lesions, which associate with large-scale effects on DNA methylation and gene expression and, in final consequence, drive cell fate decisions towards oligodendroglioma-, astrocytoma- and glioblastoma-like cancer cell lineages with different prognoses. Consensus modes of concerted changes of expression, methylation and CNV are governed by the degree of co-regulation within and between the omics layers. The similarity landscapes reflect partly independent effects of genetic lesions and DNA methylation with consequences for cancer hallmark characteristics such as proliferation, inflammation and blocked differentiation in a subtype-specific fashion. The method is not restricted to the triple-omics data used here. It can be extended to integrate other omics features such as genetic mutation and protein expression data, as well as to extract prognostic markers.
Keywords: DNA methylome and copy number variation data; integrative cancer bioinformatics; lower grade gliomas; modes of genomics regulation; self-organizing maps machine learning; transcriptome
Year: 2022 PMID: 35681780 PMCID: PMC9179546 DOI: 10.3390/cancers14112797
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Workflow of integral multi-omics portrayal using combiSOM analysis. Different omics data layers were combined using weight factors and then trained together into one combined self-organizing map (combiSOM). After training, the three layers were decomposed to provide one separate portrait for each of them. Genes are located identically in all three portraits. Comparison of the three profiles reveals virtual anticorrelation between gene expression (Gex) and DNA methylation (Dme) for the red group and correlation between CNV and Gex for the cyan and blue groups. Downstream analysis considers clinical data to generate prognostic maps and to extract the functional context and the relatedness between the omes and tumours. Symbols: w_i with i = e, m, c denotes the weighting factors for combining the expression, methylation and CNV domains, and Δe, Δm and Δc denote their feature values (see text).
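The weighting and decomposition steps of Figure 1 can be sketched as follows. This is a minimal illustration only: it assumes that the layers are combined by concatenating the weighted per-gene profiles (the caption does not state the exact scheme), the names `combine_layers` and `decompose_profile` and the toy values are hypothetical, and the SOM training itself is not shown.

```python
from typing import Dict, List, Tuple

Layer = Dict[str, List[float]]  # gene -> feature values, one per tumour


def combine_layers(gex: Layer, dme: Layer, cnv: Layer,
                   w_e: float, w_m: float, w_c: float) -> Layer:
    """Concatenate the weighted Gex, Dme and CNV profiles of each gene.

    Assumption: combination = concatenation of w_i * delta_i per gene.
    """
    return {
        gene: [w_e * v for v in gex[gene]]
        + [w_m * v for v in dme[gene]]
        + [w_c * v for v in cnv[gene]]
        for gene in gex
    }


def decompose_profile(profile: List[float], n_tumours: int,
                      w_e: float, w_m: float, w_c: float
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Split a combined profile into its three layers and undo the weights."""
    parts = [profile[i * n_tumours:(i + 1) * n_tumours] for i in range(3)]
    weights = (w_e, w_m, w_c)  # all weights must be non-zero
    e, m, c = ([v / w for v in part] for part, w in zip(parts, weights))
    return e, m, c


# Toy data: one gene measured in two tumours (values are made up).
gex = {"GENE1": [1.0, -1.0]}
dme = {"GENE1": [0.5, 0.5]}
cnv = {"GENE1": [0.0, 2.0]}

combined = combine_layers(gex, dme, cnv, 0.5, 0.25, 0.25)
layers = decompose_profile(combined["GENE1"], 2, 0.5, 0.25, 0.25)
```

In the workflow described in the caption, the split would be applied to the profiles obtained after training, so that each layer yields its own portrait while every gene keeps the same map position.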
Figure 2. Genetic stratification of LGG with patient, genetic, and methylation characteristics. LGG were classified according to their IDH mutation status (IDH-mutated tumours were subsumed as GCIMP), the co-deletion status of Chr. 1p/19q, and single deletion of Chr. 19q, and were sorted within each group by increasing GCIMP-methylation score [37]. Selected features such as TERT promoter mutation, CNV of Chr. 7+ (gain) and Chr. 10− (loss), WHO grade (II or III), and prognosis (hazard ratio) differ between the genetic groups. Expression (E1–E8) and methylation (M1–M6) groups as defined in [35] enrich in different genetic groups. The red lines serve as a guide for the eye to show trends of GCIMP and G-protein coupled receptor (GPCR) DNA methylation.
Figure A1. Gallery of single (personalized) tumour portraits in the different omics domains. Mean group portraits are shown in Figure 4a.
Figure 3. Similarity analysis of LGG using Gex, Dme and CNV data. Pairwise similarities of the SOM portraits are presented as heatmaps (part (a)) and as similarity networks (part (b); each circle indicates one tumour). CNV shows the clearest separation between genetic groups, mainly along two axes referring to the Chr. 1p/19q codeletion and the Chr. 7+ status, followed by Dme, which distributes the tumours along one axis according to their methylation level. Gex produces more diverse, multidimensional patterns resembling a closed circular net with strong intermixing of IDH-A, IDH-A’ and IDH-wt tumours. For a description of the methods, see Appendix B.
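A pairwise similarity matrix of the kind shown as a heatmap in Figure 3a could be computed along the following lines. The sketch assumes Pearson correlation between flattened portraits as the similarity measure; the measure actually used is described in Appendix B and may differ. The function names and toy portraits are hypothetical.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for constant portraits


def similarity_matrix(portraits: List[List[float]]) -> List[List[float]]:
    """Tumour-by-tumour correlation matrix of flattened SOM portraits."""
    return [[pearson(p, q) for q in portraits] for p in portraits]


# Toy portraits of three tumours with three metagenes each (made-up values).
portraits = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
]
sim = similarity_matrix(portraits)
```

Computing one such matrix per omics layer allows the layers to be compared directly, because all portraits share the same gene-centred coordinate system.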
Figure 4. SOM portrayal of the gene expression (Gex), DNA methylation (Dme), and CNV landscapes. (a) Mean portraits of the genetic groups indicate characteristic features as red (increased values) or blue (decreased values) spots, which are labelled by capital letters. The full gallery of individual portraits is shown in Figure A1. (b) ScoV portraits show cross-correlations between pairwise combinations of omics features. Gex and Dme predominantly anti-correlate (blue spots), reflecting the repressive effect of DNA promoter methylation on gene activity, while CNV and Gex/Dme mostly correlate positively (red), reflecting dose–response relationships. (c) Variance maps colour code the gene space from high (maroon) to low (blue) variance of the respective omics score. The variance patterns of Dme and CNV distribute along the two perpendicular diagonals, thus reflecting partial independence, while the Gex pattern mixes with them. (d) The spot map shows the areas of the map with high feature values in any of the group portraits. They are observed predominantly in the Gex (red colour along the frame), Dme (green) or CNV (yellow) domains and are labelled by capital letters A–K in a clockwise direction. (e) The population map visualizes the population of metagenes with single genes. In the Dme- and Gex-dominated regions (lower left to upper right diagonal), genes are more smoothly distributed, while the CNV domain is characterized by an “isolated island”-like distribution of genes. (f) Spot profiles in the three omics domains and their functional/genetic context indicate increased (red) and decreased (blue) feature scores. Details are provided in Table A1 (Appendix B) and Supplementary Table S1. (g) Correlation plots of genes from selected spots reveal negatively correlated repressive relations between Dme and Gex and positively correlated dose–response relations between CNV and Gex. (h) Ternary diagrams of the feature composition in each spot and tumour. Point clouds along the left Gex axis are driven by changing methylation (upper row of diagrams), while the lower row refers to spots governed mostly by CNV–Gex dose responses.
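The variance maps of panel (c) can be illustrated with a short sketch. It assumes that each portrait is stored as a flat list of metagene values and that the map shows the sample variance of each metagene across tumours; the function name and the toy values are hypothetical.

```python
from statistics import variance
from typing import List


def variance_map(portraits: List[List[float]]) -> List[float]:
    """Sample variance of every metagene (map pixel) across all tumours."""
    # zip(*portraits) regroups the values pixel by pixel.
    return [variance(pixel_values) for pixel_values in zip(*portraits)]


# Three tumours, two metagenes each (made-up values).
portraits = [
    [0.0, 1.0],
    [2.0, 1.0],
    [4.0, 1.0],
]
var_map = variance_map(portraits)
most_variant_pixel = max(range(len(var_map)), key=var_map.__getitem__)
```

Applied separately to the Gex, Dme and CNV portraits, such maps show which regions of the shared gene space vary most in each layer.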
Figure 5. Characteristics of the omics domains. (a) The metagene covariance maps colour code the mean covariance between the single-gene profiles and the metagene profiles in each pixel. High covariance values (red) agree with the spot regions. They preferentially associate with cellular programs (Gex), methylation modes (Dme), and chromosome-wise aberrations (CNV). (b) The prognostic map colour codes the overall survival (OS) hazard ratio (HR, compared with the mean OS averaged over the whole data set) between maximum (red, poor prognosis) and minimum (blue, good prognosis) HR. HR values are calculated metagene-wise by selecting tumours showing omics scores exceeding one standard deviation in the positive direction (i.e., with high values of the score; see [35] for details). (c) Diversity analysis includes spot frequency distributions and profiles of the mean omics score and variance per tumour. The row below maps the key genes driving the different genetic subtypes (see Discussion section).
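The tumour selection step behind the prognostic map in panel (b) might look as follows for a single metagene. The sketch assumes the threshold is the mean score plus one sample standard deviation; the exact procedure is given in [35]. The hazard ratio itself would be estimated afterwards from the survival data of the selected tumours and is not computed here. The function name and scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def high_score_tumours(scores: Dict[str, float]) -> List[str]:
    """Tumours whose score exceeds the mean by more than one standard deviation."""
    values = list(scores.values())
    threshold = mean(values) + stdev(values)
    return sorted(name for name, score in scores.items() if score > threshold)


# Omics scores of one metagene in four tumours (made-up values).
metagene_scores = {"T1": 0.1, "T2": 0.2, "T3": 3.0, "T4": -0.1}
selected = high_score_tumours(metagene_scores)
```

Repeating the selection for every metagene and every omics layer gives one group of tumours per map pixel, which is what allows the hazard ratio to be colour coded across the map.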
Figure 6. Sets of signature genes were analysed in terms of ranked profiles, maps, and ternary diagrams using the gene set Z score (GSZ) of expression, methylation, and CNV values. (a) LGG were ranked by increasing GSZ from left to right. Tumours of different genetic groups accumulate at low or high GSZ values as indicated (LGG bars are coloured according to their genetic group; see Figure 2 for assignment). The signature maps show the distribution of signature genes in the map. Their accumulation in selected spot areas is indicated by red circles (see Figure 4d). (b) Ternary diagrams show mutual dependencies where Gex of the respective set is driven via dose response by CNV (see chromosomal sets), via repression by Dme (see GCIMP set), by a combination of both, or by neither (see the legend in the right part). A fine structure extends in the direction (blue arrow) perpendicular to the major axis of variation (red arrow). Gene sets were taken from [43,44]. For a larger collection of gene signatures, see Figure A2.
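The ranking in panel (a) can be sketched with a simplified gene set Z score. The formula below (mean of the set minus mean of all genes, divided by the standard error expected for a set of that size) is an assumption made for illustration and is not necessarily the exact GSZ definition used by the authors. Gene names, tumour names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Iterable


def gene_set_z(values: Dict[str, float], gene_set: Iterable[str]) -> float:
    """Simplified gene set Z score of one tumour in one omics layer."""
    in_set = [values[g] for g in gene_set if g in values]
    all_values = list(values.values())
    standard_error = stdev(all_values) / sqrt(len(in_set))
    return (mean(in_set) - mean(all_values)) / standard_error


# Feature values of four genes in three tumours (made-up values).
tumours = {
    "T1": {"A": 2.0, "B": 2.0, "C": -2.0, "D": -2.0},
    "T2": {"A": -2.0, "B": -2.0, "C": 2.0, "D": 2.0},
    "T3": {"A": 1.0, "B": -1.0, "C": 1.0, "D": -1.0},
}
signature = ["A", "B"]

gsz = {name: gene_set_z(vals, signature) for name, vals in tumours.items()}
ranking = sorted(gsz, key=gsz.get)  # tumours ordered by increasing score
```

Evaluating the same signature on the Gex, Dme and CNV values of each tumour yields three scores per tumour, which is the kind of triple that the ternary diagrams in panel (b) relate to one another.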
Figure A2. Sets of signature genes were characterized using ranked profiles, ternary diagrams and signature maps: (a) genes from selected chromosomes that are typically aberrant in gliomas, (b) functional signatures [43,56], and (c) signatures differentiating glioma subtypes obtained in previous studies [22,24,37]. In the profiles, the LGG samples were coloured according to the genetic groups IDH-wt (red), IDH-A (blue), IDH-A’ (yellow) and IDH-O (light blue). (a) Genes from the key chromosomal aberrations at Chr. 1 and 19 (preferentially in IDH-O, partly in IDH-A’) and Chr. 7+ and Chr. 10− (in IDH-wt and, partly, in IDH-A/A’) accumulate in specific areas of the map and reveal orchestrated changes of Gex and CNV values due to dose–response relationships in the ternary diagrams. (b) Gene signatures characterizing different biological functions enrich in specific areas of the map. This is indicative of co-regulation of the involved genes, which is governed partly by Dme and CNV. For example, genes related to cell cycle activity accumulate in and around spot F and show high expression in IDH-wt and IDH-O. In IDH-O, this high expression associates with hypomethylation and a slight copy number gain, paralleled by hypomethylation of MYC targets, while cycling genes are hypermethylated in IDH-wt, thus showing the opposite methylation trend. Genes with functions in inflammatory response are downregulated in IDH-O due to their hypermethylation, a pattern that is partly observed for G-protein coupled receptors (GPCR), thus suggesting their role in this process. Targets of the polycomb repressive complex (PRC2) and so-called low transcription factor activity genes (low TF) are mostly regulated by epigenetic mechanisms in the context of differentiation and development of healthy tissues and cancer [43,56]. Genes from these sets split into two major populations found near opposite corners of the map, namely the lower left corner (Gex_UP in LGG) and the upper right corner (Gex_DN in LGG). This split indicates that the respective genes are affected by LGG group-specific hypermethylation patterns, which activate them either in healthy brain or in brain cancer. Conversely, so-called high-TF genes associate with regulatory modes governed by TF networks. They accumulate in regions of increased proliferative activity observed for MYC targets. (c) Glioma-related signatures were taken from [22,24,37,41]. They mostly show pronounced enrichment in spot areas and clearly associate with group-wise up- and/or down-regulation in the different omics domains. They were adequately extracted from glioma expression data sets. Therefore, these signatures give rise to the spread of the LGG along the Gex axis of the ternary diagram, however with segregation between subtypes, e.g., between IDH-A and IDH-O.
Figure 7. Stratification of genetic groups into expression (E-) and methylation (M-) subtypes defined previously [35]. (a) The flow diagrams visualize the distribution of the E- and M-groups across the G-groups. The group portraits reveal a high diversity of Gex and Dme patterns not fully resolved in the genetic groups (see text). The full gallery of individual tumour portraits is shown in Figure A1. (b) The spot summary maps provide an overview of the major spots due to high omics score values (Gex, Dme and CNV) in the respective group portraits. Their omics score profiles were sorted and coloured according to the genetic groups. (c) ScoV portraits of the Gex-versus-CNV type are dominated by red spots due to positive correlations between gene expression and CNV aberrations, while Gex-versus-Dme ScoV portraits show predominantly blue spots due to negative correlations between gene expression and DNA methylation. Positive correlations reflect either combined up- (++) or down- (−−) regulation, while negative correlations split into up-/down- (+− for Gex_UP and Dme_DN) or down-/up- (−+) combinations.
Figure A3. Omics (subtype-sorted) profiles and maps of selected gene signatures. Stratification into eight expression (E-) and six methylation (M-) subtypes shows partly sharp subtype-specific differences of feature values not resolved in the genetic groups (see red arrows).
Figure A4. CombiSOM of LGG using dominant weights (w = 0.99) for gene expression, DNA methylation and CNV, respectively. The mean portraits of the genetic groups in the Gex, Dme and CNV domains and the variance maps reflect different topologies of the SOM, which are governed by the respective “dominantly weighted” omics domain. They result in different distributions of genes across the maps (compare with Figure 4a,c, which refers to the equally weighted combiSOM). (a) The mean group portraits make it possible to identify modules of co-regulated genes with high feature (Gex, Dme, CNV) values in the different genetic groups. (b) The variance maps identify spot regions of co-expressed (Gex map), co-methylated (Dme) and co-aberrant (CNV) genes, which partly overlap between the different modalities. (c) The distribution of genes differs between the SOMs. It is governed by the dominantly weighted omics domain, giving rise to clusters of highly variant gene expression (Gex), methylation (Dme) or copy numbers (CNV). Selected gene signatures consequently distribute differently in the different SOMs. GCIMP genes were collected by combined hypermethylation and underexpression [37]. They consequently accumulate in certain areas in the Gex- and Dme-dominated maps but spread diffusely over different chromosomal locations in the CNV-dominated map. Inflammatory genes [44] show the strongest local enrichment in the Gex-dominated SOM but are more widely distributed in the two other SOMs. Genes encoding G-protein coupled receptors (GPCR) accumulate in the middle of the Gex-dominated SOM, an area of low expression variance, while in the Dme-dominated SOM these genes associate with higher variance, meaning that they vary to a higher degree with changing methylation than with changing expression. Low-expression transcription factor genes associate more with epigenetic modes of genomics regulation, while high-expression TF genes associate more with transcription factor networks [56].
Figure 8. Multi-omics cartography of gliomas. DNA methylation, gene expression, and copy number variations are visualized as three layers of mutually linked landscapes where the (red) peaks refer to variant gene-centred features (DNA promoter methylation, gene expression, and copy number gains, respectively) and the (blue) valleys to virtually invariant features. The right part schematically assigns the key topological features and their regulation in LGG.
Functional context of expression spots.
| Spot | Brief Characteristics | Up/DN | Top Genes (a) | Gene Sets and Enrichment p-Values (b) |
|---|---|---|---|---|
| A | Verhaak CL/MES_UP | IDH-wt, | | WILLSCHER_GBM_Verhaak−CL & MES_up 1 × 10⁻⁹, HALLMARK_EPITHELIAL_ |
| B | GCIMP-meth_UP | IDH-wt/ | | Hopp_Sturm_GBM_IDH_UP 1 × 10⁻⁹⁹, Noushmehr_GCIMP_hypermeth 2 × 10⁻²², |
| C | healthy_brain | /IDH-A’ | | GBM_DN 3 × 10⁻⁷⁷, WIRTH_Nervous System 1 × 10⁻⁴³, Sturm_RTK II ‘Classic’_UP_RTK I 2 × 10⁻²⁹, WIRTH_Normal Brain 7 × 10⁻³² |
| D | Chr. 10− | IDH-A, | | Chr 10 1 × 10⁻⁹⁹, HOPP_Weak_promoter 4 × 10⁻¹³, Reifenberger_GBM_IDH-wt_DN 1 × 10⁻¹⁰ |
| E | Chr. 10− | IDH-A, | | Chr 10 1 × 10⁻⁹⁹, LASTOWSKA_NEUROBLASTOMA_COPY_ |
| F | Chr. 13− | IDH-O/ | | Chr 13 1 × 10⁻⁹⁹ |
| G | Healthy_brain, anti-GCIMP | IDH-O// | | WIRTH_Nervous System 8 × 10⁻⁷⁶, GBM_DN 2 × 10⁻⁷⁶, WILLSCHER_GBM_Verhaak−PN (mut&wt)_up 2 × 10⁻⁶⁰, Sturm_E5_RTK II ‘Classic’_UP 8 × 10⁻⁴⁹, Lembcke_TCGA_meth_CIMP.L_UP_CIMP.H_DN 3 × 10⁻¹⁹ |
| H | Chr. 7+ | IDH-wt/ | | Chr 7 1 × 10⁻⁹⁹, AGUIRRE_PANCREATIC_ |
| I | Chr. 19− | IDH-A, | | Chr 19 1 × 10⁻⁹⁹, KUUSELO_PANCREATIC_ |
| J | Chr. 1− | IDH-A, | | Chr 1 1 × 10⁻⁹⁹, HOPP_Heterochrom 1 × 10⁻⁹⁹, LASTOWSKA_ |
| K | GBM_Mesenchymal, Inflammation | IDH-wt, IDH-A; | | Sturm_E4_Mesenchymal_RTK I ‘PDGFRA’_DN 1 × 10⁻⁹⁹, WILLSCHER_GBM_Verhaak−CL & MES_up 1 × 10⁻⁸⁵, Lembcke_TCGA−expr_CIMP.H_UP 5 × 10⁻⁸³, CHEN_METABOLIC_ |
(a) Top 10 genes with largest maximal expression. See Supplementary Table S1 for the full gene list, gene names and further details. (b) Gene sets were implemented in oposSOM ([19,39] and references cited therein). Enrichment was calculated using Fisher’s exact test [57]. Only gene sets with enrichment p < 10⁻⁶ were considered.
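The enrichment test named in footnote (b) can be illustrated with a small, self-contained sketch. It assumes a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution; the function name, the argument names and the example counts are hypothetical and do not reproduce the oposSOM implementation.

```python
from math import comb


def fisher_enrichment_p(n_universe: int, n_set: int,
                        n_spot: int, n_overlap: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    Probability of drawing at least n_overlap genes of a gene set of size
    n_set when n_spot genes are drawn at random from n_universe genes.
    """
    max_overlap = min(n_set, n_spot)
    total = comb(n_universe, n_spot)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_spot - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Made-up counts: 20,000 genes on the map, a gene set of 100 genes,
# a spot of 200 genes, 30 of which belong to the set.
p_value = fisher_enrichment_p(20000, 100, 200, 30)
is_reported = p_value < 1e-6  # threshold stated in the footnote
```

With these counts only about one shared gene would be expected by chance, so an overlap of 30 genes passes the reporting threshold by a wide margin.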