Integrative Multi-Omics Approaches in Cancer Research: From Biological Networks to Clinical Subtypes.

Yong Jin Heo^1,2, Chanwoong Hwa^1, Gang-Hee Lee^1, Jae-Min Park^1, Joon-Yong An^1,2.

Abstract

Multi-omics approaches are novel frameworks that integrate multiple omics datasets generated from the same patients to better understand the molecular and clinical features of cancers. A wide range of emerging omics and multi-view clustering algorithms now provide unprecedented opportunities to further classify cancers into subtypes, improve the survival prediction and therapeutic outcome of these subtypes, and understand key pathophysiological processes through different molecular layers. In this review, we provide an overview of the concept and rationale of multi-omics approaches in cancer research. We also introduce recent advances in the development of multi-omics algorithms and integration methods for multiple-layered datasets from cancer patients. Finally, we summarize the latest findings from large-scale multi-omics studies of various cancers and their implications for patient subtyping and drug development.

Keywords: cancer research; genomics; multi-omics approach; proteogenomics; proteomics; systems biology

Year: 2021 PMID: 34238766 PMCID: PMC8334347 DOI: 10.14348/molcells.2021.0042

Source DB: PubMed Journal: Mol Cells ISSN: 1016-8478 Impact factor: 5.034

INTRODUCTION

Living organisms experience millions of signals transferred every second between cells, tissues, organs, and external environmental stimuli. Fine-tuned responses at various degrees and scales within the human body are central to the homeostatic mechanism that copes with potentially harmful environmental perturbations, including pathogens, smoking, and drugs, and interacts with the genetic background arising from spontaneous somatic mutations and numerous germline variants. Thus, a holistic view of homeostatic mechanisms through the study of genomic and epigenetic aberrations is needed to understand the core of cancer biology and the pathophysiological features of cancer during oncogenesis and tumor progression. A multi-omics study is a data-driven scientific investigation that analyzes a range of high-dimensional datasets at multiple levels and scales to reveal the complexity of cells and their environment. This type of study can provide novel frameworks to untangle biological phenomena or models to test certain hypotheses using various datasets. In cancer research, a paradigm shift toward multi-omics approaches has been achieved with the recent development of high-throughput technologies in genomics and transcriptomics, increasing effort in large-scale research collaboration, and advancement of computational algorithms (Basu et al., 2013; Berns and Bernards, 2012; Cancer Genome Atlas Network, 2012b; Gentles and Gallahan, 2011; Whitehurst et al., 2007). Together with advances in genomics and transcriptomics, proteomics is emerging as a prominent field to elucidate the dynamics of gene activity. Large-scale proteomic research, such as that promoted by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), has uncovered the ubiquitous link of biomolecules to the environment and disease status (Gillette et al., 2020; Krug et al., 2020; Mertins et al., 2016; Mun et al., 2019; Zhang et al., 2016).
Such a transition has extensively deepened our knowledge of the function of driver genes and proteins and has provided a comprehensive understanding of the signaling networks occurring between cells, tissues, organs, and the entire organism. Multi-omics approaches have been applied to numerous clinical studies for better identification of clinical subtypes or drug resistance, prediction of effective combination therapies, and identification of predictive biomarkers to increase the response rate to targeted treatments. In this review, we introduce the concept of multi-omics approaches in cancer research and provide useful resources for applying them. We focus on some of the clinical and basic science studies that have benefited from the use of a multi-omics approach to uncover novel concepts and properties. We also discuss some of the challenges connected to multi-omics approaches and how this relatively young field of study can have a positive impact on cancer research.

MULTI-OMICS APPROACHES IN CANCER RESEARCH

Over the past decades, there have been rapid advances in high-throughput technologies, which enable a range of genomic analyses at the cellular and tissue levels. Furthermore, highly developed genome screening technologies, such as whole exome sequencing (WES) and whole genome sequencing (WGS), have enabled comprehensive collection of gene expression data (e.g., RNA sequencing [RNA-seq] and microRNA [miRNA] profiling) and DNA methylation profiles (Cancer Genome Atlas Network, 2012a; 2012b; Cancer Genome Atlas Research Network, 2011, 2013; Cancer Genome Atlas Research Network et al., 2013a; Chin et al., 2006; Hennessy et al., 2010; Neve et al., 2006). Single-cell technologies provide new biological insights for the understanding of gene activity and cytological characteristics at the cellular level (Lee et al., 2021; Stuart et al., 2019; Stuart and Satija, 2019). In addition, large amounts of proteins and metabolites can be detected with high accuracy owing to the maturation of mass spectrometry techniques (Lai et al., 2018; Palmer et al., 2017; Schubert et al., 2017). Proteomics technologies can now detect almost all human proteins and are advancing toward single-cell resolution (Marx, 2019; Vidova and Spacil, 2017). However, a single platform is insufficient to decipher the complexity underlying cancer genomes or to find a robust association with cancer driver mutations (Bozic et al., 2010; Greenman et al., 2007). Consequently, there is an emerging effort in the development of data-driven mathematical and computational methods to analyze high-dimensional datasets obtained from several novel analysis platforms (Bodenmiller et al., 2012; Hill et al., 2012; Pritchard et al., 2013; Qiu et al., 2011; Sumazin et al., 2011; Tentner et al., 2012; Teves and Won, 2020).
In this regard, multi-omics approaches have been introduced to integrate multiple omics datasets generated from patients and identify coherent and preserved molecular or clinical features across different datasets (Fig. 1). Multi-omics studies aim to identify patient subgroups and biological features underlying cancer pathophysiology; they have been applied to overcome current complexities, due to genetic and phenotypic heterogeneity, that hinder our understanding of cancer genesis and progression, and to design effective predictive models to validate novel therapies and drugs. Within such an integrative framework, there has been an emerging effort to develop computational and mathematical methods that can decipher the complexity of cancer heterogeneity, since genomic and epigenetic instability in tumors can alter intracellular responses to the local environment and affect the individual as a whole through the tumorigenic process.

Fig. 1

Overview of multi-omics approaches in cancer research.

The integration of omics datasets is a crucial step in multi-omics studies. Datasets covering somatic mutations, copy number variations (CNVs), gene expression, methylation, and the proteome are merged using various computational frameworks with distinct methods. The integration enables the comparison of molecular features across multiple viewpoints and the clustering of patients with relevant clinical features. Possible outcomes include enhanced identification of clinical subtypes, understanding of cancer pathophysiology, prediction of potential drug targets, and clinical decision support.

Over the last decade, a range of modeling approaches have been developed to deal with various aspects of cancer. In particular, the integration of large omics datasets has enabled modeling of cellular behaviors at the tissue level to understand cancer pathophysiology or the behavior of cancer cells in response to drugs and angiogenesis (Carro et al., 2010; Hong et al., 2020; Huang et al., 2013; Iadevaia et al., 2010; Pascal et al., 2013; Swanson et al., 2011). Multi-omics studies have opened new avenues for the implementation of targeted therapies for cancer treatment. Integrative approaches with large-scale multi-omics datasets have the potential to delineate the relationship between molecular markers and the response to targeted therapies. A more comprehensive understanding of the molecular characteristics of non-responsive or resistant tumors could enable more precise predictions of therapy outcomes, resulting in an increased therapeutic efficacy or in the ability to bypass drug resistance. In addition, multi-omics approaches might allow the identification of subgroups of patients that are most likely to benefit from therapy. Cancer cells exhibit extreme levels of genetic heterogeneity and genomic instability. Thus, many putative driver aberrations can be observed: some could be bona fide drivers of cancer, but most of them are passenger mutations. Therefore, a major challenge in cancer research is to identify biomarkers or potential targets for cancer treatment (Cancer Genome Atlas Research Network, 2013; Cancer Genome Atlas Research Network et al., 2013a). On the other hand, it remains to be elucidated whether passenger aberrations within cancer genes play a role in cellular functions associated with cancer pathophysiology and response to targeted therapeutics. To evaluate this, a recent study developed a systems-based computational method that can assess low-frequency mutations in impure and heterogeneous samples (Cibulskis et al., 2013).
This study successfully reported a range of sub-clonal drivers underpinning tumor progression and treatment resistance. Thus, multi-omics approaches can provide an efficient analytic framework to distinguish drivers from passenger mutations and dissect the genetic heterogeneity of cancer cells.

COMPUTATIONAL FRAMEWORKS FOR MULTI-OMICS STUDIES

Recent advances in high-throughput sequencing technologies have allowed the measurement of a large number of molecular patterns of cancer in a single experiment. High-throughput measurements enable rapid and unbiased profiling of somatic mutations, copy number variations (CNVs), and mRNA, non-coding RNA, and protein expression. Various computational algorithms have been proposed for multi-view clustering, to detect coherent features from heterogeneous inputs. In the biomedical domain, this has facilitated the definition of the clinical subtypes of complex disorders, such as cancers. Clustering methods have been widely developed to identify co-expressed gene modules and subgroups of patients within a certain disease (Langfelder and Horvath, 2008). The integration of multi-omics datasets for the same set of samples has been devised to better understand fine-tuned structures, which are not revealed by examining only a single data type. For instance, cancer subtypes can be classified based on multi-omics datasets, such as gene expression and mutation profiles, from the same patients (Chauvel et al., 2020). Multi-omics clustering can ameliorate potential bias or noise from a single omics dataset as the integration of multiple omics layers can fully represent different cellular aspects from the genomic to the epigenomic level (Nguyen and Wang, 2020; Wang et al., 2014). To date, various tools have been developed for multi-omics datasets with the following objectives: 1) identify disease subtypes or classify subgroups, 2) identify putative biomarkers for diagnostics and driver genes for diseases, and 3) gain insights into disease biology. 
Multi-omics frameworks are mostly based on Bayesian statistics (Kirk et al., 2012; Lock and Dunson, 2013; Shen et al., 2009; Vaske et al., 2010; Wu et al., 2015; Yuan et al., 2011), similarity networks (Nguyen et al., 2019; Wang et al., 2014), joint nonnegative matrix factorization (Yang and Michailidis, 2016), and sparse canonical correlation analysis (Witten and Tibshirani, 2009). Several multi-omics tools are widely used in the field or outperform alternatives in subtype prediction and survival analysis (Table 1). However, most multi-omics tools rely on different mathematical theories and support different ranges of data types. Even when using the same data, their performance varies greatly depending on the biological characteristics of the study objects. Therefore, acquiring biological insights from multi-omics data is both a computational and a biological challenge, and requires the researcher to select appropriate multi-omics tools.

Table 1

List of computational frameworks for multi-omics cancer studies

Method (reference)	Findings	Dataset	Principle
iCluster (Curtis et al., 2012; Shen et al., 2009)	Novel subgroups from 2,000 breast tumors	mRNA expression^a CNV^c	Joint latent variable model-based clustering method
iOmicsPASS (Koh et al., 2019)	Novel transcriptional regulatory network from TCGA/CPTAC breast cancer data	mRNA expression^a CNV^d Protein expression^e	Network construction using a modified nearest shrunken centroid algorithm
SALMON (Huang et al., 2019)	Improved survival analysis	Mutation^h mRNA/miRNA expression CNV^h	Deep learning based on co-expression modules
SNF (Wang et al., 2014)	Subtype classification of clinical relevance	mRNA^a/miRNA expression^b DNA methylation^g	Patient similarity networks using an iterative procedure based on message passing
NEMO (Rappoport and Shamir, 2019)	Novel subtypes from even partial AML datasets	mRNA^a/miRNA expression^b DNA methylation^g	Sample clustering from partial datasets using an adjusted Rand index
MONET (Rappoport et al., 2020)	Module detection of patient subtypes and improved survival analysis	mRNA^a/miRNA expression^b DNA methylation^g	Detection of similar modules commonly present across multi-omics datasets
PARADIGM (Vaske et al., 2010)	Detection of pathways affected by cancer with fewer false positives	mRNA expression^a CNV^c	Pathway recognition algorithm applied to multi-omics datasets
LRAcluster (Wu et al., 2015)	Subtype detection in both pan-cancer analysis and single cancer types	Mutation^i mRNA expression^a CNV^d DNA methylation^g	Low-rank approximation from probabilistic models
BCC (Lock and Dunson, 2013)	Detection of patient subtypes in response to survival rates and driver mutation signatures	mRNA^a/miRNA expression^b DNA methylation^g Protein expression^f	Bayesian framework for estimation of an integrative clustering model

^a Gene expression data with normalization (e.g., quantile normalization, fragments per kilobase of transcript per million mapped reads [FPKM]).

^b Quantification of miRNA expression.

^c Circular binary segmentation-based copy number segmented means.

^d Affymetrix 6.0 SNP arrays.

^e Protein quantification by iTRAQ (isobaric Tags for Relative and Absolute Quantification).

^f Reverse phase protein array (RPPA).

^g Illumina Human Methylation arrays.

^h In the SALMON method, the copy number burden (CNB) is calculated using the total gene length (Kb) from SNP 6 data, and the tumor mutation burden (TMB) is calculated using the total number of mutated genes reported in Mutation Annotation Format (MAF) files.

^i The LRAcluster method uses somatic mutation data converted into a binary form.
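The two mutation-related footnotes above can be illustrated with a minimal sketch. It assumes that MAF records have already been reduced to (sample, gene) pairs; the function names and the toy records are hypothetical and are not taken from SALMON or LRAcluster. The sketch converts mutation calls into a binary sample-by-gene matrix and counts the mutated genes per sample.

```python
def binarize_mutations(records):
    """Turn (sample, gene) mutation calls into a binary sample-by-gene matrix."""
    samples = sorted({sample for sample, _ in records})
    genes = sorted({gene for _, gene in records})
    mutated = set(records)  # repeated calls for the same gene collapse to one
    matrix = [[1 if (s, g) in mutated else 0 for g in genes] for s in samples]
    return samples, genes, matrix


def mutated_gene_counts(samples, matrix):
    """Number of mutated genes per sample (row sums of the binary matrix)."""
    return {s: sum(row) for s, row in zip(samples, matrix)}


# Hypothetical toy input: P1 carries two calls in TP53, counted once.
records = [("P1", "TP53"), ("P1", "KRAS"), ("P1", "TP53"),
           ("P2", "EGFR"), ("P3", "TP53")]
samples, genes, matrix = binarize_mutations(records)
burden = mutated_gene_counts(samples, matrix)
```

Here the rows follow the sorted sample identifiers and the columns follow the sorted gene symbols, so the matrix can be aligned with other omics layers by sample.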

iCluster

iCluster is an early multi-omics integration method that first integrates multiple inputs and then identifies multi-omics clusters by joint estimation of latent variables and through clustering and expectation–maximization-like algorithms (Shen et al., 2009). It was initially used for large-scale cancer genomic projects, for example for breast and lung cancer, in which gene expression and CNVs were summarized for multiple subgroups of patients. Since the runtime of iCluster increases with the number of features, iCluster+, providing full Bayesian regularization for clustering, has recently been proposed (Mo et al., 2013). iCluster+ identified colorectal cancer subtypes with different cancer progression pathways, one of which was found not to require aggressive drug treatment in addition to surgery.

iOmicsPASS

iOmicsPASS is a network-based algorithm that can merge genome-based networks with multi-omics datasets (Koh et al., 2019). Scores for biological interaction are computed by transformation of omics datasets and used as an input to construct networks, whose edges are defined for phenotypic groups using a modified nearest shrunken centroid algorithm. iOmicsPASS was shown to improve the identification of breast invasive ductal carcinoma (IDC) subtypes by integrating mRNA expression and protein abundance data. Such integrated analysis by iOmicsPASS revealed a new transcriptional regulatory network in a specific breast cancer subtype that could not be found through single-omics analysis.

SALMON (Survival Analysis Learning with Multi-Omics Neural Networks)

SALMON is a deep learning method based on co-expression networks (Huang et al., 2019). It takes multi-omics datasets from cancer patients and computes eigengenes from co-expression modules, and can thus ameliorate the overfitting that arises whenever multi-omics approaches are applied to datasets containing many features but few samples. For example, by analyzing mRNA and miRNA datasets from 583 female breast invasive carcinoma patients, SALMON provided a good prediction of survival.
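To make the notion of an eigengene concrete, the following is a minimal sketch, not the SALMON implementation. It assumes the common definition of a module eigengene as the first principal component of the standardized expression of the genes in one module, and it uses plain power iteration in place of a full singular value decomposition. The function names and the toy module are hypothetical.

```python
import math


def standardize(values):
    """Center one gene to mean 0 and scale to unit variance (gene must vary)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (len(values) - 1))
    return [(x - mean) / sd for x in values]


def module_eigengene(expr, n_iter=200):
    """expr: one list per gene, each holding expression values across samples."""
    z = [standardize(gene) for gene in expr]
    n = len(z[0])
    # Sample-by-sample Gram matrix of the standardized module.
    gram = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    # Start from the summed profile, which also orients the sign of the result.
    v = [sum(g[i] for g in z) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical module of three co-expressed genes measured in four samples.
module = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 9.0], [1.0, 1.0, 2.0, 3.0]]
eigengene = module_eigengene(module)
```

The result is a single value per sample, so a module of many genes is summarized by one feature, which is how eigengenes reduce the number of inputs relative to the number of patients.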

SNF (Similarity Network Fusion)

SNF is a novel algorithm for the generation of patient similarity networks that uses an iterative procedure based on message passing (Wang et al., 2014). It calculates a patient similarity network for each omics dataset and then merges these networks to identify disease subtypes and predict phenotypes. In contrast to early integration, SNF takes advantage of individual omics datasets to construct independent single-omics networks and find coherent modules sourced from similar biological features across patients with similar clinical features. SNF iteratively applies a local K-nearest neighbors (KNN) approach to compute a patient similarity matrix for each omics dataset. When merging the global similarity matrices from all omics datasets, SNF conducts averaging of similarity matrices with iterative updating. It has demonstrated high efficiency in identifying clinical subtypes of cancers and other disorders such as autism (Cavalli et al., 2017; Ramaswami et al., 2020).
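The fusion procedure can be sketched as follows. This is a simplified illustration rather than the published implementation: it assumes a Gaussian affinity with a fixed bandwidth and plain row normalization, whereas the original method uses a scaled exponential kernel and a slightly different normalization. All function names and the toy data are hypothetical.

```python
import math


def affinity(data, sigma=1.0):
    """Gaussian affinity between patients (rows of data) in one omics layer."""
    n = len(data)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(data[i], data[j]))
                      / (2 * sigma ** 2)) for j in range(n)] for i in range(n)]


def row_normalize(m):
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row])
    return out


def knn_kernel(w, k):
    """Keep each patient's k most similar patients (self included)."""
    out = []
    for row in w:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        out.append([row[j] if j in keep else 0.0 for j in range(len(row))])
    return row_normalize(out)


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def fuse(views, k=2, n_iter=10):
    """Fuse two or more omics layers into one patient similarity matrix."""
    ws = [affinity(v) for v in views]
    ps = [row_normalize(w) for w in ws]      # global similarity per layer
    ss = [knn_kernel(w, k) for w in ws]      # local (KNN) similarity per layer
    n = len(ps[0])
    for _ in range(n_iter):
        updated = []
        for v in range(len(views)):
            others = [p for u, p in enumerate(ps) if u != v]
            mean_other = [[sum(p[i][j] for p in others) / len(others)
                           for j in range(n)] for i in range(n)]
            s_t = [list(col) for col in zip(*ss[v])]
            # Diffuse the other layers' similarities through this layer's KNN graph.
            updated.append(row_normalize(matmul(matmul(ss[v], mean_other), s_t)))
        ps = updated
    return [[sum(p[i][j] for p in ps) / len(ps) for j in range(n)] for i in range(n)]


# Hypothetical data: patients 0 and 1 are alike, as are patients 2 and 3.
expression = [[0.0, 0.1], [0.1, 0.0], [3.0, 3.1], [3.1, 3.0]]
methylation = [[1.0], [1.1], [4.0], [4.2]]
fused = fuse([expression, methylation], k=2)
```

In this toy case the fused matrix assigns higher similarity within each pair of patients than between the pairs, and a clustering step applied to the fused matrix would then recover the two groups.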

NEMO (NEighborhood based Multi-Omics clustering)

NEMO is a multi-omics clustering method that can be used for partial datasets without the need for data imputation (Rappoport and Shamir, 2019). NEMO first calculates an inter-patient similarity matrix for each omics dataset and then combines the matrices of different omics datasets into a single matrix. Clusters are identified using an adjusted Rand index to compute the similarity between patients by distance. NEMO was shown to outperform other multi-omics clustering algorithms when tested on multi-omics datasets of 10 cancers, and exhibited enhanced cluster detection from partial datasets.
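The handling of partial datasets can be illustrated with a minimal sketch of the combination step. It assumes a Gaussian similarity and a plain average over the omics layers in which both patients were measured; the published method works with neighborhood-based relative similarities, and the subsequent clustering step is not shown. Function names and data are hypothetical.

```python
import math


def similarity(profiles, sigma=1.0):
    """Gaussian similarity between every pair of patients measured in one layer."""
    sim = {}
    for a in profiles:
        for b in profiles:
            d2 = sum((x - y) ** 2 for x, y in zip(profiles[a], profiles[b]))
            sim[(a, b)] = math.exp(-d2 / (2 * sigma ** 2))
    return sim


def combine(layers):
    """Average similarities over the layers in which both patients were measured."""
    sims = [similarity(layer) for layer in layers]
    patients = sorted(set().union(*layers))
    combined = {}
    for a in patients:
        for b in patients:
            values = [s[(a, b)] for s in sims if (a, b) in s]
            # None marks pairs that were never measured in the same layer.
            combined[(a, b)] = sum(values) / len(values) if values else None
    return combined


# Hypothetical partial data: P2 has no methylation profile.
mrna = {"P1": [0.0], "P2": [0.2], "P3": [3.0]}
methylation = {"P1": [1.0], "P3": [1.1]}
combined = combine([mrna, methylation])
```

No value is imputed for P2 in the methylation layer. Its similarities to other patients simply rely on the mRNA layer alone, while the P1/P3 pair is averaged over both layers.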

MONET (Multi Omic clustering by Non-Exhaustive Types)

MONET is a method for detecting similar modules commonly present across multi-omics datasets (Rappoport et al., 2020). MONET utilizes three omics datasets (mRNA expression, DNA methylation, and miRNA expression) to compute an edge-weighted graph per omics dataset, where nodes represent samples and edges represent the similarity between samples. It then detects a disjoint set of modules for patients from multiple omics graphs. MONET was used to conduct benchmarking on 287 patients with ovarian serous cystadenocarcinoma, and revealed four sample modules representing venous invasion status and survival rates.

PARADIGM (PAthway Recognition Algorithm using Data Integration on Genomic Models)

PARADIGM is a method to identify specific biological pathways from a multi-omics dataset (Vaske et al., 2010). It combines multi-omics-scale values derived from an individual sample with gene activities, products, and an overview of the pathway interactions included in the National Cancer Institute (NCI) database, which contains information on protein-protein interactions. PARADIGM utilizes factor graphs derived from variables representing the state of various entities (e.g., a specific mRNA molecule or protein complex), and then creates probabilistic graphical models. Using these, it infers significant and non-significant interactions between pathways involving different entities. This tool proved to be efficient, and revealed four subtypes of glioblastoma leading to significantly different survival outcomes according to the perturbed pathways. This result suggests that the cancer subtype could be used as a basis to support clinical decisions.

LRAcluster (Low Rank Approximation based multi-omics data clustering)

LRAcluster is a multi-omics approach that integrates data on somatic mutations, CNVs, DNA methylation, and gene expression, and performs low-rank approximation from the probabilistic models of various molecular features (Wu et al., 2015). All molecular features from the omics datasets are transformed into variables and arranged in a parameter matrix, which is subject to the low-rank assumption. Next, dimension reduction is conducted, revealing clusters associated with distinct clinical subtypes. LRAcluster outperformed other existing methods in terms of both time and classification accuracy when tested on multi-omics datasets of breast invasive carcinoma, colon adenocarcinoma, and lung adenocarcinoma (LUAD).

BCC (Bayesian Consensus Clustering)

BCC is a data-driven approach that performs consensus clustering across multi-omics datasets (Lock and Dunson, 2013). BCC is based on the finite Dirichlet mixture model to explain not only overall consensus clustering, but also important features inherent to an individual omics dataset. Given that clusters constructed using a single data type are roughly connected, BCC seeks an integrative point for their adherence to an overall cluster. BCC was applied to 384 breast cancer patients from TCGA datasets, including gene expression, DNA methylation, and protein data, and effectively revealed three cancer subtypes associated with specific clinical features.

LATEST FINDINGS AND IMPLICATIONS IN CANCER MULTI-OMICS STUDIES

Cancer research has taken advantage of advances in omics technologies from genomics to transcriptomics and of the wide range of resources of multiple omics datasets originating from the same patients. Multi-omics approaches provide a unique opportunity to identify the molecular and clinical features of cancer patients. In genomics and transcriptomics, there is an unmet need to disentangle incompatibility in related biological processes, such as differences in post-translational modifications or variability in expression profiles due to the role of mRNA transcripts in cancer development (Greenbaum et al., 2003; Hegde et al., 2003; Tyers and Mann, 2003). Recent advances in proteomics through the maturation of several mass spectrometry techniques have enabled the introduction of proteogenomic approaches, which can integrate genomic data with proteomics and information on post-translational modifications (e.g., protein phosphorylation and acetylation). Large-scale proteogenomic research, including that promoted by the CPTAC (Gillette et al., 2020; Krug et al., 2020; Mertins et al., 2016; Mun et al., 2019; Zhang et al., 2016), has been conducted to unravel new biological mechanisms in cancers and provide fundamental information on multi-omics approaches for the development of integration strategies or computational algorithms. Multi-omics clustering further refined the association between molecular profiles and clinical features among cancer patients (Fig. 2). The identification of coherent subtypes across multiple dataset layers could have major implications for predicting clinical relevance or therapeutic response regardless of the overall tumor mutational load. Moreover, the integration of proteomics datasets enables the identification of a direct connection between mutations and phenotypes, and therefore increases the resolution of clustering patterns across samples. Here, we summarize the latest findings obtained in cancer research using multi-omics approaches.

Fig. 2

Latest findings in cancer multi-omics research.

Multi-omics approaches integrate various high-throughput sequencing datasets across a range of molecular layers. Biological features are subject to multi-view clustering methods and account for distinct subtypes of cancer patients based on relevant clinical features.

Lung cancer

Despite extensive research on its mutation signature and gene expression landscape, LUAD shows a high level of intrinsic or acquired resistance after treatment. Therefore, recent multi-omics-based efforts have been made to integrate genomic, transcriptomic, and proteomic datasets and decipher the molecular features underlying durable treatment responses. Recently, the CPTAC has conducted a large-scale multi-omics study of LUAD by integrating WES, WGS, RNA-seq, miRNA and DNA methylation profiling, and high-resolution mass spectrometry-based proteomics, phosphoproteomics, and acetylproteomics. Integrative multi-omics clustering revealed four clusters of clinical and molecular features. For example, the patients in Cluster 1 were mostly TP53 positive but STK11 negative, and showed high gene expression in proximal inflammatory structures and high CpG methylation. In contrast, the patients in Cluster 2 were TP53 negative and their transcriptome was enriched in proximal proliferative subcluster genes. This multi-omics approach also enabled the dissection of ethnic differences in the cohort, represented by Cluster 3 (Vietnamese patients) and Cluster 4 (Chinese patients), which exhibited distinct mutation signatures (Gillette et al., 2020). Moreover, deep-scale proteogenomic studies revealed a novel KEAP1/NFE2L2 network mechanism based on cis and trans regulation. Driver mutations in KEAP1 did not impact the levels of KEAP1 and NFE2L2 transcripts but were highly correlated with the phosphorylation of NFE2L2 and low protein expression of KEAP1. The KEAP1/NFE2L2 heterocomplex upregulates the antioxidant pathway to protect cancer cells and can be used as a unique biomarker for LUAD. In another large-scale study, Chen et al. (2020) applied multi-omics approaches to early-stage, non-smoker patients in Taiwan using WES, RNA-seq, and proteomics datasets.
Clustering was performed separately for proteomics, transcriptomics, and phosphoproteomics datasets, and clustering of proteomics data into three subtypes was chosen as the best representative of tumor staging and driver mutation classification. The largest group, Subtype 1, was composed of late-stage tumors (> II) with a high mutation rate, including in TP53. Subtype 2 represented IA- and IB-stage patients that did not carry the EGFR-L858R mutation. Finally, early-stage (IA) patients that lacked the TP53 mutation were classified into Subtype 3. To further decipher the biological features of this cohort, these authors constructed protein-protein interaction network models using STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) (Szklarczyk et al., 2019). The constructed models explained the differential regulation of the three subtypes mentioned above. It was found that extracellular matrix (ECM)-regulated pathways, involving the proteins MMP7, MMP11, and MMP12, were significantly upregulated in Subtype 1 patients. Immunohistochemical staining for these three matrix metalloproteinases (MMPs) revealed that MMP11 was highly associated with patient survival and was a candidate biomarker. This study also showed a clear APOBEC signature in females, associated with upregulation of DNA damage proteins and phosphosites, implicating putative environmental carcinogens in cancer development of non-smoking patients.

Breast cancer

Multi-omics analyses have increased our knowledge of breast cancer biology. In particular, integrative analyses have revealed the recurrence of mutations in the TP53, PIK3CA, and GATA3 genes in breast cancer, but also the presence of specific mutations within subtypes, such as PIK3CA mutations in luminal tumors (Cancer Genome Atlas Network, 2012b). As a result, multi-omics approaches could reveal a new subtype of breast cancer that had not been previously detected from a single dataset. Similarly, integrated analyses revealed the activation of signaling pathways promoting HER2 or epidermal growth factor receptor (EGFR) activity. Given the observed downstream phosphorylation of EGFR, the activation of the HER2 signaling network might reflect the need for a treatment strategy tailored to this subgroup of patients. Endometrial, colon, and rectal cancers have been associated with hypermutation, which might be attributed to microsatellite instability, while a new type of instability driven by mutations of the POLE gene results in ultra-mutated tumors (Cancer Genome Atlas Network, 2012a). Multi-omics analyses have reported MYC-directed activation in aggressive colorectal carcinoma. In clear cell renal cell carcinoma, alterations in cellular oxygen sensing and chromatin remodeling/histone methylation, as well as metabolic shifts in the tricarboxylic acid (TCA) cycle, have been observed, and might be key processes in the pathology of this cancer type (Cancer Genome Atlas Research Network, 2013). An integrative analysis of gene expression and proteomics has been applied to the survival data of ERBB2-positive patients, and revealed breast tumors with acquired resistance to lapatinib and ability to block EGFR/ERBB2 signaling (Komurov et al., 2012). Nonetheless, an increase in glucose metabolism, unfolded protein response, and endoplasmic reticulum (ER) stress pathways reduced the ability of lapatinib to induce cell death. 
This suggests that targeting both metabolic and signaling networks may improve patient outcomes (Csibi et al., 2013; Komurov et al., 2012). A recent study on 122 patients integrating data on mutations, mRNA expression, protein expression, and post-translational modifications (phosphorylation and acetylation) has yielded robust profiles to elucidate the biological features of breast cancer (Krug et al., 2020). The resulting subtypes, that is, the basal-inclusive, HER2-inclusive, LumA-inclusive, and LumB-inclusive subtypes, were similar to those generated by the widely used PAM50 assay but revealed hidden biological structures such as the status of the ERBB2 amplicon, stratified by proteomics assessment; the RB status, which is closely related to CDK4/6 inhibitors; and post-translational cross-linkage between proteins involved in cytoplasmic and mitochondrial metabolic pathways. The acetylproteome was found to be useful for separating cancers into luminal-enriched and basal-enriched subtypes, based on their metabolic activity.

Gastric cancer

Multi-omics research on gastric cancers revealed four subtypes: 1) an Epstein–Barr virus subtype with recurrent PIK3CA mutations, 2) a microsatellite-unstable subtype with a high mutation rate, 3) a genomically stable type enriched in a diffuse histological variant, and 4) a chromosomally unstable type with aneuploidy and focal amplification of receptor tyrosine kinases (Cancer Genome Atlas Research Network, 2014). A recent proteogenomic study of early-onset gastric cancer revealed four subtypes through integrated analysis; moreover, phosphorylation data supported the classification into four subtypes and provided information about active signaling pathways (Mun et al., 2019). The authors of this study applied a network propagation method to mutation and phosphorylation data and calculated two types of network-smoothed scores. Two functionally related cellular processes, affiliated with gastric cancer pathogenesis, were identified using network-smoothed scores for pairs of mutated genes and phosphorylated proteins. The first cellular process was represented by Notch and caspase signaling with mutated genes and phosphorylated proteins. The second cellular process was associated with MAPK, AMPK, FOXO, mTOR, and T-cell receptor signaling. Therefore, multi-omics approaches enable the discovery of various subtypes of gastric cancer, thereby allowing a comprehensive understanding of patient stratification and suggesting novel possibilities for personalized targeted therapy.
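Network propagation of the kind mentioned above can be sketched as follows. The exact formulation used by Mun et al. (2019) is not given here, so this sketch assumes a common variant, random walk with restart on an undirected interaction network, in which each node repeatedly receives a share of its neighbors' scores while retaining part of its initial score. The function name, node labels, and parameters are hypothetical.

```python
def propagate(adjacency, seed_scores, alpha=0.5, n_iter=100):
    """Random walk with restart; adjacency maps each node to its neighbors."""
    nodes = sorted(adjacency)
    initial = dict.fromkeys(nodes, 0.0)
    initial.update(seed_scores)
    scores = dict(initial)
    for _ in range(n_iter):
        updated = {}
        for node in nodes:
            # Each neighbor splits its current score equally among its own neighbors.
            incoming = sum(scores[nb] / len(adjacency[nb]) for nb in adjacency[node])
            updated[node] = alpha * incoming + (1 - alpha) * initial[node]
        scores = updated
    return scores


# Hypothetical linear network A - B - C - D with a single altered node A.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
smoothed = propagate(network, {"A": 1.0})
```

In this toy network the smoothed score decays with distance from the altered node, so nodes close to an altered gene or phosphoprotein receive higher scores than distant ones. Running the procedure once with mutation-derived seeds and once with phosphorylation-derived seeds would give two sets of network-smoothed scores that can be compared node by node.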

Glioblastoma

In highly characterized samples from glioblastoma patients, a multi-omics approach has delineated core transcription factors (CEBP and STAT3) that broadly regulate mesenchymal transformation in glioblastoma (Carro et al., 2010). Integrative analyses of gene expression and phosphoproteomes have identified several cellular features that respond to stress and growth factors (Hill et al., 2012; Huang et al., 2013), are key regulators of the EGFR signaling pathway, and are associated with patient survival outcomes (Amit et al., 2007). Similarly, combining proteomic and metabolomic profiles revealed a unique regulatory function in a cellular network of stress and growth factors (Bordbar et al., 2012). Dekker et al. (2020) conducted an integrative multi-omics analysis of gene and protein expression, as well as phosphoproteomic profiles, using paired primary and recurrent tissue samples from eight glioblastoma patients. Half of the patients showed a marked difference in the phosphorylation of STMN1 (S38), a component of the ERBB4 signaling pathway.

Acute myeloid leukemia

Integrating methylation profiles with genomic and transcriptomic datasets has proven useful for studying acute myeloid leukemia (AML). A multi-omics analysis of 200 adult patients with AML showed distinct gene expression and methylation patterns across samples (Cancer Genome Atlas Research Network et al., 2013b). In particular, CpG-sparse regions showed marked differences in methylation that were associated with specific gene mutations. AML cells with IDH1 and IDH2 mutations exhibited more extensive methylation than normal CD34+CD38- cells, whereas AML cells with MLL fusions or co-occurring NPM1, DNMT3A, and FLT3 mutations were associated with loss of DNA methylation.

Pancreatic ductal adenocarcinoma

A multi-omics approach has also been applied to pancreatic ductal adenocarcinoma (PDAC) by integrating omics profiles of 150 patients covering mutations, gene expression (mRNA, miRNA, and long non-coding RNA [lncRNA]), DNA methylation, and protein expression (Cancer Genome Atlas Research Network, 2017). KRAS mutational heterogeneity and signatures of individual pancreatic cancers have been identified, indicating the existence of distinct molecular subtypes of pancreatic cancer. For multi-omics clustering, the similarity network fusion (SNF) method was applied to mRNA, miRNA, and DNA methylation data and identified three clusters, which were mostly associated with tumor purity and gene expression signatures. This finding highlights the importance of considering neoplastic cellularity in further analyses of PDAC and the need for molecular characterization platforms to further stratify samples.
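
As an illustration of how similarity network fusion combines data layers, the following is a simplified sketch of the cross-diffusion idea behind SNF. It is not the pipeline used in the PDAC study: it assumes a single global kernel bandwidth instead of the locally scaled kernel of the published method, and all function names (`affinity`, `full_kernel`, `local_kernel`, `fuse`), parameter values, and the three synthetic data matrices are hypothetical.

```python
import numpy as np

def affinity(X):
    """Gaussian similarity between samples (rows of X)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    scale = np.median(sq[sq > 0])  # assumed global bandwidth
    return np.exp(-sq / scale)

def full_kernel(W):
    """Scale off-diagonal entries of each row to sum to 1/2; set the diagonal to 1/2."""
    P = W.copy()
    np.fill_diagonal(P, 0.0)
    P = P / (2.0 * P.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def local_kernel(W, k):
    """Keep each sample's k most similar neighbours, then row-normalize."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    top = np.argsort(W, axis=1)[:, -k:]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, top] = W[rows, top]
    return S / S.sum(axis=1, keepdims=True)

def fuse(views, k=5, n_iter=20):
    """Fuse per-view similarity networks by cross-diffusion (needs >= 2 views)."""
    W = [affinity(np.asarray(X, dtype=float)) for X in views]
    S = [local_kernel(w, k) for w in W]
    P = [full_kernel(w) for w in W]
    m = len(P)
    for _ in range(n_iter):
        # Each view diffuses the average of the other views through its own
        # local neighbourhood structure, then is re-normalized.
        P = [
            full_kernel(S[v] @ (sum(P[u] for u in range(m) if u != v) / (m - 1)) @ S[v].T)
            for v in range(m)
        ]
    fused = sum(P) / m
    return (fused + fused.T) / 2.0

# Synthetic data: 20 made-up patients in two groups, measured in three layers.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)

def toy_view(n_features, shift):
    noise = rng.normal(0.0, 0.3, size=(20, n_features))
    return noise + shift * labels[:, None]

mrna = toy_view(5, 3.0)
mirna = toy_view(8, 2.0)
methylation = toy_view(6, 2.5)

fused = fuse([mrna, mirna, methylation])
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(20, dtype=bool)
print("mean within-group similarity :", fused[same & off_diag].mean())
print("mean between-group similarity:", fused[~same].mean())
```

In practice, the fused matrix would then be passed to a clustering step such as spectral clustering to assign patients to clusters; that step is omitted here to keep the sketch short.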

ADVANCES IN DRUG TARGET DISCOVERY USING CANCER MULTI-OMICS

Drug target discovery is a critical step in the development of cancer drugs and personalized therapeutics. In traditional drug target discovery, biomolecules with a confirmed mechanism of action are selected through a series of studies that require enormous manpower (Lindsay, 2003; Paananen and Fortino, 2020). Over the last decade, putative drug targets have been identified through high-throughput genomic approaches in combination with experimental validation, including overexpression or knockdown by RNAi and the use of transgenic animals and model organisms (Benson et al., 2006). Multi-omics is an interdisciplinary approach to studying biological characteristics and can yield many drug target candidates comprehensively and cost-effectively. The analysis of 14 cancer subtypes from TCGA multi-omics datasets revealed 40 driver genes associated with the Wnt, Notch, Hedgehog, JAK/STAT, NF-κB, and MAPK signaling pathways (Chen et al., 2014). Among them, well-known driver genes such as EGFR, ERBB2, PIK3CA, and KRAS were confirmed to be upregulated in several cancers, and DCUN1D1 and NSD3 were identified as new driver genes. Following the success of trastuzumab (an agent targeting HER2), multi-omics approaches have increasingly been used to discover new druggable targets in breast cancer. A recent proteomic analysis of 105 breast cancer patients elucidated the association of this cancer type with CDK12, PAK1, PTK2, RIPK2, and TLK2 amplicons, and highlighted the overexpression of EGFR following the loss of CETN3 and SKP1 (Mertins et al., 2016). Progress has also been made with regard to tumor metabolites. Jain et al. (2012) measured consumption and release (CORE) profiles of 219 metabolites from NCI-60 cell lines. After integrating the CORE profiles with gene expression data, these authors demonstrated that glycine consumption and upregulation of the mitochondrial glycine biosynthetic pathway were highly correlated with the proliferation of cancer cells.

Multi-omics approaches may allow systematic assessment of drug discovery for personalized cancer therapy and improve the efficacy of chemotherapy (Aguirre et al., 2018; Li et al., 2013; Pauli et al., 2017). Refining molecularly defined subsets of patients can provide information on drug response and resistance, which vary among patients. Cui et al. (2020) integrated lncRNA, miRNA, and mRNA expression, methylation, and somatic mutation profiles with the expression of drug response-related lncRNAs. These authors found that lncRNAs respond to diverse chemotherapeutic drugs and characterized some key lncRNAs, such as HOXA-AS2, that mediate resistance to adriamycin in breast cancer (BRCA) patients (Cui et al., 2020). Another proteogenomic study of breast cancer found that triple-negative breast cancer (TNBC) tumors with RB1 mutations or deletions are resistant to the CDK4/6 inhibitor palbociclib, unlike wild-type TNBC. However, most of the TNBC samples showed a low level of RB protein expression alongside a wild-type RB1 gene. Based on previous findings, analysis of the Genomics of Drug Sensitivity in Cancer (GDSC) data showed that the response to palbociclib was correlated with the total amount of RB protein, regardless of the RB1 genotype. An exception is that the I388S, P515L, and N480 (in-frame) mutations of the RB1 gene led to a poor palbociclib response (Krug et al., 2020). Collectively, these studies indicate that multi-omics analysis can unravel new biological characteristics and enable the discovery of drug targets that cannot be pinpointed based on single-omics data.

CONCLUDING REMARKS

In this review, we introduced computational methods for multi-omics studies and reported the latest findings in cancer research based on them. Multi-omics approaches can characterize the intersection between different layers of quantitative information, systematically summarizing biological interactions from an individual cell or tissue to an individual patient with a primary tumor and possible metastases. In addition, such integration can reflect the molecular characteristics of tumors at various levels, from genes to proteins, and at different cancer stages through multidisciplinary analysis. Multi-omics approaches may also make it possible to study different cancer types that share a high level of molecular similarity, such as basal-like breast cancer, high-grade serous ovarian cancer, and serous endometrial cancer (Cancer Genome Atlas Research Network et al., 2013a). A systems approach integrating multi-omics data is key to understanding cancer biology and investigating the molecular pathogenesis of cancer. Multi-omics data analysis across tumor types can identify molecular characteristics that commonly underlie a range of cancer types and can further refine patient subgroups as well as the molecular classification of cancer subtypes. Therefore, multiple data layers, including genomics, transcriptomics, epigenomics, and proteomics datasets, are required to fully represent the molecular and clinical structures of cancer patients. The generation of high-quality and unbiased datasets is a critical part of multi-omics approaches. In addition, further studies should consider proper integration methods and computational algorithms for robust and systematic assessment to obtain solid findings and predictive models.

84 in total

1. Defining principles of combination drug mechanisms of action.

Authors: Justin R Pritchard; Peter M Bruno; Luke A Gilbert; Kelsey L Capron; Douglas A Lauffenburger; Michael T Hemann
Journal: Proc Natl Acad Sci U S A Date: 2012-12-18 Impact factor: 11.205

2. Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma.

Authors: Cancer Genome Atlas Research Network
Journal: Cancer Cell Date: 2017-08-14 Impact factor: 31.743

3. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data.

Authors: Zi Yang; George Michailidis
Journal: Bioinformatics Date: 2015-09-15 Impact factor: 6.937

4. Multiomics profiling of paired primary and recurrent glioblastoma patient tissues.

Authors: Lennard J M Dekker; Nynke M Kannegieter; Femke Haerkens; Emma Toth; Johan M Kros; Dag Are Steenhoff Hov; Julien Fillebeen; Lars Verschuren; Sieger Leenstra; Anna Ressa; Theo M Luider
Journal: Neurooncol Adv Date: 2020-07-04

5. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM.

Authors: Charles J Vaske; Stephen C Benz; J Zachary Sanborn; Dent Earl; Christopher Szeto; Jingchun Zhu; David Haussler; Joshua M Stuart
Journal: Bioinformatics Date: 2010-06-15 Impact factor: 6.937

6. Integrated genomic analyses of ovarian carcinoma.

Authors: Cancer Genome Atlas Research Network
Journal: Nature Date: 2011-06-29 Impact factor: 49.962

7. Single-Cell Toolkits Opening a New Era for Cell Engineering. (Review)

Authors: Sean Lee; Jireh Kim; Jong-Eun Park
Journal: Mol Cells Date: 2021-03-31 Impact factor: 5.034

8. Linking proteomic and transcriptional data through the interactome and epigenome reveals a map of oncogene-induced signaling.

Authors: Shao-shan Carol Huang; David C Clarke; Sara J C Gosline; Adam Labadorf; Candace R Chouinard; William Gordon; Douglas A Lauffenburger; Ernest Fraenkel
Journal: PLoS Comput Biol Date: 2013-02-07 Impact factor: 4.475

9. Identification of druggable cancer driver genes amplified across TCGA datasets.

Authors: Ying Chen; Jeremy McGee; Xianming Chen; Thompson N Doman; Xueqian Gong; Youyan Zhang; Nicole Hamm; Xiwen Ma; Richard E Higgs; Shripad V Bhagwat; Sean Buchanan; Sheng-Bin Peng; Kirk A Staschke; Vipin Yadav; Yong Yue; Hosein Kouros-Mehr
Journal: PLoS One Date: 2014-05-29 Impact factor: 3.240

10. Integrative genomics identifies a convergent molecular subtype that links epigenomic with transcriptomic differences in autism.

Authors: Gokul Ramaswami; Hyejung Won; Michael J Gandal; Jillian Haney; Jerry C Wang; Chloe C Y Wong; Wenjie Sun; Shyam Prabhakar; Jonathan Mill; Daniel H Geschwind
Journal: Nat Commun Date: 2020-09-25 Impact factor: 17.694

4 in total