Literature DB >> 34900668

Navigating Multi-Scale Cancer Systems Biology Towards Model-Driven Clinical Oncology and Its Applications in Personalized Therapeutics.

Mahnoor Naseer Gondal^1,2, Safee Ullah Chaudhary¹.

Abstract

Rapid advancements in high-throughput omics technologies and experimental protocols have led to the generation of vast amounts of scale-specific biomolecular data on cancer that now populates several online databases and resources. Cancer systems biology models built using this data have the potential to provide specific insights into complex multifactorial aberrations underpinning tumor initiation, development, and metastasis. Furthermore, the annotation of these single- and multi-scale models with patient data can additionally assist in designing personalized therapeutic interventions as well as aid in clinical decision-making. Here, we have systematically reviewed the emergence and evolution of (i) repositories with scale-specific and multi-scale biomolecular cancer data, (ii) systems biology models developed using this data, (iii) associated simulation software for the development of personalized cancer therapeutics, and (iv) translational attempts to pipeline multi-scale panomics data for data-driven in silico clinical oncology. The review concludes that the absence of a generic, zero-code, panomics-based multi-scale modeling pipeline and associated software framework, impedes the development and seamless deployment of personalized in silico multi-scale models in clinical settings.

Entities: Chemical

Keywords: cancer systems biology; data-driven oncology; in silico cancer systems oncology; multi-scale cancer modeling; personalized cancer therapeutics; predictive systems oncology

Year: 2021 PMID： 34900668 PMCID： PMC8652070 DOI： 10.3389/fonc.2021.712505

Source DB: PubMed Journal: Front Oncol ISSN： 2234-943X Impact factor: 6.244

1 Introduction

In 1971, President Richard Nixon declared his euphemistic “war on cancer” through the promulgation of the National Cancer Act (1). Five decades later, despite ground-breaking discoveries and advancements in the field of cancer systems biology, a definitive and affordable cure for all types of cancer still evades humankind (2). Numerous “breakthrough” treatments have also gone on to exhibit adverse side effects (3, 4) that lower patients’ quality of life (QoL) or have reported degrading efficacies (5). At the heart of this problem lies our limited understanding of the bewildering multifactorial biomolecular complexity as well as patient-centricity of cancer. Recent advances in biomolecular cancer research have helped factor system-level oncological manifestations into mutations across genetic, transcriptomic, proteomic, and metabolomic scales (6–9) that also act in concert (10, 11). Crosstalk between multi-scale pathways comprising of these oncogenic mutations can further exacerbate the etiology of the disease (7, 12–14). The combination of mutational diversity and interplay between the constituent pathways adds genetic heterogeneity and phenotypic plasticity in cancer cells (15, 16). Hanahan and Weinberg (17, 18) summarized this heterogeneity and plasticity into “Hallmarks of Cancer” – a set of progressively acquired traits during the development of cancer. Experimental techniques such as high-throughput next-generation sequencing, and mass spectrometry-based proteomics are now providing specific spatiotemporal cues on patient-specific biomolecular aberrations involved in cancer development and growth. The voluminous high-throughput patient data coupled with the remarkable complexity of the disease has given impetus to data integrative in silico cancer modeling and therapeutic evaluation approaches (19). Specifically, scale-specific molecular insights into key regulators underpinning each hallmark of cancer are now helping unravel the complex dynamics of the disease (20) besides creating avenues for personalized therapeutics (21, 22). In this review, we will evaluate the emergence, evolution, and integration of multiscale cancer data towards building coherent and biologically plausible in silico models and their integrative analysis for employment in personalized cancer treatment in clinical settings. The review concludes by highlighting the need of integrating and modeling multi-omics data and associated software pipelines for employment in developing personalized therapeutics.

2 Scale-Specific Biomolecular Data and Its Applications in Cancer

Rapid advancements in molecular biology research, particularly in high-throughput genomics (23), transcriptomics (24), and proteomics (25) have resulted in the generation of big data on spatiotemporal measurements of scale-specific biomolecules ( ) in physiological as well as pathological contexts (26, 27). This vast and complex spatiotemporal data is expected to exceed 40 exabytes by 2025 (28) and is currently populating several online databases and repositories. These databases can be broadly categorized into seven salient database sub-types: biomolecules (29–31), pathways (32–34), networks (35–37), cellular environment (38, 39), cell lines (40–43), histopathological images (44–46), and mutations, and drug (47–52) databases, which are discussed below.

Figure 1

Overview of complex biomolecular regulation in cancer and scale-specific databases. (A) The complexity between genomic, transcriptomic, proteomic, metabolomic, cell-level, and environmental levels in a cancerous cell. Four examples of biomolecular signaling pathways are listed e.g., Hedgehog, Notch, (Wingless) Wnt, TGFβ, and AKT pathway. Stimuli from the extracellular environment signal the downstream pathway activation, in the cell, towards alternating the regulations in the proteomic, metabolomic, transcriptomic, and genomic scales, bringing out a system-level outcome in cancers. Lists (B) biomolecule (genes, transcripts, proteins, and metabolites) databases such as GenBank, GEO, TCGA, HPP, HMDB, etc. (C) Pathways databases such as PathDB, KEGG, STRING, etc. (D) Networks databases such as BioGRID, DIP, BIND, etc. (E) Environment databases e.g, ExoCarta, MatrixDB, MatrisomeDB, etc. (F) Cell lines databases such as CCLE, CLDB, Cellosaurus, etc. (G) Histopathological image database, for instance, TCIA, GTEx, TMAD, etc., and (H) Mutation and drug databases such as DrugBank, KEGG Drug, OncoKB, etc.

2.1 Biomolecular Databases

Biomolecular and clinical data generated from large-scale omics approaches for cancer research can be divided into four sub-categories: (i) genome, (ii) transcriptome, (iii) proteome, and (iv) metabolome (53).

2.1.1 Genome-Scale Databases

The foremost endeavor to collect and organize large-scale genomics data into coherent and accessible repositories led to the establishment of GenBank in 1986 (54) ( , ). This open-access resource now forms one of the largest public databases for nucleotide sequences from large-scale sequencing projects comprising over 300,000 species (55, 56). In a salient study employing GenBank, Diez et al. (57) screened breast and ovarian cancer families with mutations in BRCA1 and BRCA2 genes and its distribution in the Spanish population. Medrek et al. (58) employed microarray profile sets from GenBank to analyze gene levels for CD163 and CD68 in different breast cancer patient groups. The study established the need for localization of tumor-associated macrophages as a prognostic marker for breast cancer patients. To date, GenBank remains a comprehensive nucleotide database; however, its data heterogeneity poses a significant challenge in its employment in the development of personalized cancer therapeutics. Towards an improved data stratification and retrieval of genome-scale data, in 2002, Hubbard et al. (59) launched the Ensembl genome database. Ensembl provides a comprehensive resource for human genome sequences capable of automatic annotation and organization of large-scale sequencing data. Amongst various genome-wide studies utilizing Ensembl, Easton et al. (60) used this database to extract human sequence information to identify novel breast cancer susceptibility loci. Patient-specificity (10, 11) and mutational diversity (19) in cancer can manifest across spatiotemporal scales. Hence, the availability of patient-specific data for each type of cancer can furnish valuable insights into the biomolecular foundation of the disease. In an attempt to provide cancer type-specific mutation data, Wellcome Trust’s Sanger Institute developed Catalogue of Somatic Mutations in Cancer (COSMIC) (61) database. COSMIC comprises of 10,000 somatic mutations from 66,634 clinical samples. Schubbert et al. (62) employed COSMIC’s mutation data to investigate Ras activity in cancers as well as developmental disorders. The study concluded that the duration, as well as the strength of hyperactive Ras signaling controls the probability of tumorigenesis. Similarly, Weir et al. (63) utilized COSMIC data on tumor-suppressor and proto-oncogenes in the study to characterize the genome of lung adenocarcinomas. The systematic copy-number analysis with SNP data indicated that several lung cancer genes remain to be elucidated and characterized.

2.1.2 Transcriptome-Scale Databases

Gene-level information can facilitate the development of personalized cancer models; however, gene expression may vary from cell to cell and across cancer patients. As a result, cancer patients have divergent genetic signatures and transcript-level information. Hence, high-throughput transcriptomic data has the potential to provide valuable insights into the transcriptomic complexity in cancer cells and can be useful in investigating cell state, physiology, and relevant biological events (64) ( and ). Towards developing such a transcriptional information resource, in 2000, Edgar et al. (31) launched the Gene Expression Omnibus (GEO) initiative. GEO acts as a tertiary resource providing coherent high-throughput transcriptomic and functional genomics data. The platform now hosts over 3800 datasets and is expanding exponentially. GEO was employed by Chakraborty et al. (65) for annotation of chemo-resistant cell line models which helped investigate chemoresistance and glycolysis in ovarian cancers. The study identified Mitochondrial Calcium Uptake 1 (MICU1) as an important component for cancer metabolism that influences aerobic glycolysis and chemoresistance and can have a potential role in cancer therapeutics. The curation of patient-specific gene and protein expression data led to the development of The Cancer Genome Atlas (TCGA) (66). TCGA also captures the copy number variations and DNA methylation profiles for different cancer subtypes. TCGA’s potential (49, 67) is well exhibited by Leiserson et al.’s (68) pan-cancer analysis which helped identify 14 significantly mutated subnetworks containing numerous genes with rare somatic mutations across many cancers types. Davis et al. (69) further evaluated the genomic landscape of chromophobe renal cell carcinomas (ChRCCs) to elicit molecular patterns as clues for determining the origin of cancer cells. To facilitate in data management across different cancer projects as well as to ensure data uniformity towards developing data-driven models, the International Cancer Genome Consortium (ICGC) was launched in 2010 (70). ICGC adopts a federated data storage architecture that enables it to host a collection of scale-specific data from TCGA and 24 other projects (71). Burn et al. (72) estimated the distribution of cytosine in liver tumor data using ICGC. The study reported APOBEC3B (A3B) catalyzed deamination as a chronic source of DNA damage in breast cancer which also explains tumor cell evolution and heterogeneity. Supek et al. (73) compared mutation rates between different human cancers and reported the influence of “silent” mutations as a frequent contributor to cancer. Numerous databases have been established to store large-scale genomic data, however, insights from an integrated analysis of genomic data across databases have the potential to provide precise biomolecular cues into complex processes and evolution in cancer cells. This was enabled by cBio Cancer Genomics Portal (cBioPortal) (74) in 2012, with multidimensional dataset retrieval, and exploration from multiple databases. The platform additionally provides data visualization tools, pathway exploration, statistical analysis, and selective data download features for seamless utilization of large-scale genomics data across genes, patient samples, projects, and databases (75). Numerous studies have effectively employed cBioPortal (76–78); in particular, Jiao et al. (79) evaluated the prognostic value of TP53 and its correlation with EGFR mutations in advanced non-small-cell lung cancer (NSCLC). The study established that TP53 coupled with EGFR mutation can lead to the more accurate prognosis of advanced NSCLC. Hou et al. (80) also used cBioPortal to deduce targetable genotypes which are present in young patients with lung adenocarcinomas and revealed that young patients with lung adenocarcinoma were more likely to harbor targetable genotype.

2.1.3 Proteome-Scale Databases

Transcriptomic data remains limited in providing a deterministic proteome profile (81–83). Particularly, transcripts produced in a cell can be degraded, translated inefficiently, or modified due to post-translational modification (84, 85) resulting in no or a very small amount of functional protein (64). This relatively low correlation between transcriptome and proteome data was highlighted in 2019, by Bathke et al. (86) where it was shown that an increase in transcript synthesis cannot be directly associated with an increase in functional response in a cell. To facilitate functional analysis, there is a need to utilize proteomic-level data, which can help to capture a more accurate quantitative assessment of complex biomolecular regulation for functional studies ( and ). Following the successful completion of the Human Genome Project (HGP) (1998), in 2003, a group of Swedish researchers reported the Human Protein Atlas (HPA) (87, 88) with an aim to map the entire set of human proteins in cells, tissues, and organs for normal as well as cancerous state (89). HPA employs large-scale omics-based technologies to localize and quantify protein expression patterns. The database has successfully managed to host comprehensive information on human proteins from cells, tissues, pathology, brain, and blood region-related studies. HPA data can be employed for various purposes such as investigating the spatial distribution of proteins in different tissue and comparing normal and cancerous protein expression patterns across samples, etc (87). In a salient study employing HPA, Gámez-Pozo et al. (90) studied the localized expression pattern of proteins to help profile human lung cancer subtypes. The study reported a combination of peptide-level expressions which can distinguish between non-small lung cancer samples and normal lung cancer in different histological subtypes. Imberg-Kazdan et al. (91) employed HPA to identify novel regulators of androgen receptor (AR) function in prostate cancer towards therapy. Another significant stride towards generation of proteome-level information came with the establishment of the Human Proteome Project (HPP) (92, 93) in 2008 (94) by the Human Proteome Organization (HUPO). HPP consolidated mass spectrometry-based proteomics data, and bioinformatics pipelines, with the aim to organize and map the entire human proteome. To date, numerous studies have utilized HPP towards identifying the complex protein machinery involved in cancer cell fate outcomes (95–100). Amongst the earliest attempts, in 2001, Sebastian et al. (101) employed the HPP platform to deduce the complex regulatory region of the human CYP19 gene (‘armatose’), one of the contributors of breast cancer regulation. HPP project was later segmented into “biology and disease-oriented HPP” (B/D HPP) (102) and chromosome-centric HPP (C-HPP) (103). Specifically, Gupta et al. (104) carried out an extensive analysis of existing experimental and bioinformatics databases to annotate and decipher proteins associated with glioma on chromosome 12, while, Wang et al. (95) performed a qualitative and quantitative assessment of human chromosome 20 genes in cancer tissue and cells using C-HPP resources. The study revealed that several cancer-associated proteins on chromosome 20 were tissue or cell-type specific.

2.1.4 Metabolome-Scale Databases

Metabolic reprogramming is one of the earliest manifestations during tumorigenesis (105) and therefore, potentiates the usefulness of identifying metabolic biomarkers involved in cancer onset, its prognosis, as well as treatment. Large-scale efforts to collect metabolomics data have led to the development of several online databases (106–108) ( and ) including the Golm Metabolome Database (GMD) (106), in 2004. GMD provides a comprehensive resource on metabolic profiles, customized mass spectral libraries, along spectral information for use in metabolite identification. GMD was employed in 2011 by Wedge et al. (109) to identify and compare metabolic profiles in serum and plasma samples for small-cell lung cancer patients towards determining optimal agent for onwards analysis. The study showed that the discriminatory ability of both serum and plasma was equivalent. In 2013, Pasikanti et al. (110) utilized GMD to identify biomarker metabolites present in bladder cancer. The study proposed a potential role of kynurenine in malignancy and therapeutic intervention in bladder cancer. To allow for large-scale metabolic data stratification and retrieval, in 2007, Wishart et al. published the Human Metabolome Database (HMDB) (107, 111). HMDB contains organism-specific information on metabolites across various biospecimens and their accompanying environments. It is now the world’s largest metabolomics database with around 114,100 metabolites that have been characterized and annotated. HMDB was employed by Sugimoto et al. (112) to identify environmental compounds specific to oral, breast, and pancreatic cancer profiles. The study identified 57 principal metabolites to help predict disease susceptibility, besides being potential markers for medical screening. Agren et al. (113) employed metabolomic data from HMDB to construct metabolic network models for 69 human cells and 16 cancer types. The study’s comprehensive metabolic network analysis between disease and normal cell types has the potential to provide avenues for the identification of cancer-specific metabolic targets for therapeutic interventions. The HMDB supports data deposition and dissemination, however, integrated exploratory analysis is not available. The Metabolomics Workbench (108), reported in 2016, provides information on metabolomics metadata and experimental data across species, along with an integrated set of exploratory analysis tools. The platform also acts as a resource to integrate, deposit, track, analyze, as well as disseminate large-scale heterogeneous metabolomics data from a variety of studies. In a case study built using this platform, Hattori et al. (114) studied the aberrant BCAA (branched-chain amino acids) metabolism activation by MSI2 (Musashi2)-BCAT1 axis which they reported to drive myeloid leukemia progression.

2.2 Biomolecular Pathway Databases

Investigations restricted to single biomolecular scales have limited translational potential as cancer dysregulation is driven by tightly coupled biomolecular pathways constituted by biomolecules from a variety of spatiotemporal scales (discussed above). Such biomolecular pathways represent organized cascades of interactions integrating different spatiotemporal scales towards reaching specific phenotypic cell fate outcomes. Numerous scale-specific and multi-omics biomolecular pathway databases now exist to help retrieve, store and analyze existing pathway information towards understanding cellular communication in light of complex cancer regulation (32, 34, 115). One of the earliest attempts at integrating genomics data for pathway construction came in 1995, with the establishment of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (32) ( and ). Over time, KEGG has significantly expanded to include high-throughput multi-omics data (116). As a result, the resource is divided into fifteen sub-groups including KEGG Genome (for genome-level pathways), KEGG Compound (for small molecules level pathways), KEGG Gene (for gene and protein pathways), KEGG Reaction (for biochemical reaction and metabolic pathways), etc. Li et al. (117) used the KEGG database to perform pathway enrichment analysis for predicting the function of circular RNA (circRNA) dysregulation in colorectal cancer (CRC). The study highlighted circDDX17 potential role as a tumor suppressor and biomarker for CRC. While employing KEGG, Feng et al. (118) identified four up-regulated differentially expressed genes associated with poor prognosis in ovarian cancer. Further, to furnish information on pathways for high-throughput functional analysis studies, PANTHER (protein annotation through evolutionary relationship) database was established in 2010 (34, 115). PANTHER hosts information on ontological gene and protein-protein interaction pathways by leveraging GenBank and Human Gene Mutation Database (HGMD) (119), etc. Turcan et al. (120) employed PANTHER to perform network pathway enrichment for biological processes in differentially expressed genes, especially to investigate IDH1 mutations in glioma hypermethylation phenotype. The study provided a framework for understanding gliomas and the interplay between genomic as well as epigenomic regulation in cancer. To store metabolic pathway information in a cell, Karp et al. (121) developed MetaCyc, a comprehensive reference database comprising of metabolic pathways. MetaCyc is currently available as a web-based resource with metabolic pathway information which can be employed to investigate metabolic reengineering in cancers, carry out biochemistry-based studies, and explore cancer cell metabolism, etc. Miller et al. (122) demonstrated the utility of MetaCyc database by evaluating plasma metabolomic profiles after limonene intervention in breast cancer patients. The study employed MetaCyc to perform pathway-based interpretations and revealed that such alterations were related with tissue-level cyclin D1 expression changes.

2.3 Biomolecular Network Databases

The regulatory complexity of cancer is compounded by the crosstalk between numerous multi-scale biomolecular pathways resulting in the formation of complex interaction networks ( and ). One of the earliest biomolecular network databases, the Biomolecular Interaction Network Database (BIND) (123) was established in 2001, with an aim to organize biomolecular interactions between genes, transcripts, proteins, metabolites, as well as small molecules. Chen et al. (124) employed BIND to construct a biological interaction network (BIN) towards investigating tyrosine kinase regulation in breast cancer development. The study identified SLC4A7 and TOLLIP as novel tyrosine kinase substrates which are also linked to tumorigenesis. BIND provides a comprehensive resource of predefined interacting pathways; however, it does not contain ‘indirect’ interaction information. In contrast, the Molecular INTeraction database (MINT) (37), developed in 2002, curates existing literature to develop networks with both direct as well as indirect interactions from large-scale projects with information from genes, transcripts, proteins, promoter regions, etc. MINT can store data on “functional” interaction such as enzymatic properties and modifications present in biomolecular regulatory networks. The database was employed by Vinayagam et al. (125) to construct a human immunodeficiency virus (HIV) network that helped identify novel cancer genes across genomic datasets. The Database of Interacting Proteins (DIP) (126) was developed to mine existing literature and experimental studies on biomolecules and their pathways to construct protein interaction networks (127–129). Goh et al. (127) constructed a protein-protein interaction network for investigating liver cancer using DIP. The study revealed that hepatocellular carcinoma (HCC) at moderate stage is enriched in proteins that are involved in the immune response. Similarly, Zhao et al. (128) identified autophagic pathways in plants lectin-treated cancer cells which are regulated by microRNAs. The study showed that plant lectin has the potential to block sugar-containing receptor EGFR-mediated pathways thereby leading to autophagic cell death. To further consolidate and integrate protein interaction data across pathways as well as organisms, the Search Tool for the Retrieval of Interacting Genes/Proteins – STRING database was developed in 2005 (130). STRING provides a comprehensive text-mining and computational prediction platform which is accessible through an intuitive web interface (131). STRING database also provides information on interaction weights for edges between biomolecules to show an estimated likelihood for each interaction in the network (131). Mlecnik et al. (132) employed STRING database to study T-cells homing factors in colorectal cancer and demonstrated that specific chemokines and adhesion molecules had high densities of T-cell subsets in cancer.

2.4 Cellular Environment Databases

Each pathway within a biomolecular network requires input cues from the extracellular environment for onward downstream signal transduction (133–135). In the case of cancer, the biomolecular milieu constituting the tumor microenvironment (TME) acts as a niche for tumor development, metastasis, as well as therapy response (136) ( and ). Efforts to curate information from the environmental factors such as metabolites, matrisome, and other environmental compounds led to the development of MatrixDB (38) in 2011, which hosts matrix-based information on interactions between extracellular proteins and polysaccharides. MatrixDB additionally links databases with information on genes encoding extracellular proteins such as Human Protein Atlas (137) and UniGene (138) as well as host transcripts information. Celik et al. (139) employed MatrixDB data to evaluate epithelial-mesenchymal transition (EMT) inducers in the environment using nine ovarian cancer datasets and discovered a novel biomarker, HOPX, with the potential to drive tumor-associated stroma. To host studies on extracellular matrix (ECM) proteins from normal as well as disease-inflicted tissue samples, MatrisomeDB (39) was established in 2020. The database curates 17 studies on 15 physiologically healthy murine and human tissue as well as 6 cancer types from different stages (including breast, colon, ovarian, and lung cancer) along with other diseases. Levi-Galibov et al. (140) employed MatrisomeDB to investigate the progression of chronic intestinal inflammation in colon cancer. The study revealed the critical role of heat shock factor 1 (HSF1) during early changes in extracellular matrix structure as well as its composition.

2.5 Cell Line Databases

In vitro cell lines derived from cancer patients have become an essential tool for clinical and translational research (141). These cell lines are defined based on gene expression profiles and morphological features which have been cataloged in various databases such as the Cancer Cell Line Encyclopedia (CCLE) (42) ( and ). CCLE contains mutation data on 947 different human cancer cell lines coupled with pharmacological profiles of 24 anti-cancer drugs (42) for evaluating therapeutic effectiveness and sensitivity. Li et al. (142) employed CCLE data to investigate cancer cell line metabolism. The study showed that the mutated asparagine synthetase (ASNS) hypermethylation can cause gastric as well as hepatic cancers to sensitized asparaginase therapy. Hanniford et al. (143) demonstrated epigenetic silencing of RNA during invasion and metastasis in melanoma using the CCLE database. Other cell line databases include Cell Line Data Base (CLDB) (43), and The COSMIC Cell Lines Project (40), and CellMinerCDB (41). The CellMinerCDB (2018) curates data from National Cancer Institute (NCI) (144), BROAD institute (145), Sanger institute (146), and Massachusetts General Hospital (MGH) (147) and provides a platform for pharmacological and genomic analysis.

2.6 Histopathological Image Databases

Additionally, histopathological image datasets derived from the microscopic examination of tumor biopsy samples furnish information on cellular structure, function, chemistry, morphology, etc. Numerous histopathological image-based databases have been developed to store, manage, and retrieve such information ( and ). Amongst these databases, The Cancer Imaging Archive (TCIA) (46), reported in 2013, provides a multi-component architecture with various types of images including region-specific (e.g., Breast), cancer-type specific (e.g., TCGA-GBM, TCGA-BRCA), radiology, and anatomy images (e.g., Prostate-MRI). The cancer image collection in TCIA has been captured using a variety of modalities including radiation treatment, X-ray, mammography, and computed tomography (CT), etc (148). Li et al. (149) exploited TCIA by using radiomics data in predicting the risk for breast cancer recurrence, while Sun et al. (150) employed image data to perform a cohort study to validate a radiomics-based biomarker in cancer patients. The study developed a radiomic signature for CD8 cells using the TCGA dataset for predicting the immune phenotype of tumors and deduce clinical outcomes. In 2013, image data from TCIA was integrated with The Cancer Digital Slide Archive (CDSA) (44). The CDSA hosts imaging as well as histopathological data and provides more than 20,000 whole-slide images of 22 different cancer types. The whole-slide images of individual patients present in CDSA help in linking tumor morphology with the patient’s genomic and clinical data. Khosravi et al. (151) performed a deep convolution study using CDSA, to distinguish heterogeneous digital pathology images across different types of cancers. To associate patient’s genetic information and histology images, Genotype-Tissue Expression (GTEx) (45) was reported in 2014, and was curated using datasets from non-disease tissues of 1000 individuals. Patel et al. (152) employed GTEx to investigate intratumoral heterogeneity present in glioblastoma and concluded that glioblastoma subtype classifiers are variably expressed in individual cells.

2.7 Mutation and Drug Databases

Pharmacological investigations have elucidated the mechanism as well as efficacies of numerous cancer drugs, in clinical and preclinical studies (153, 154). Databases with drug-target information can be employed in precision oncology towards designing efficacious patient-centric therapeutic strategies ( , ). These databases include DrugBank (155), which was established in 2006 and contains information from over 4100 drug entries, 800 FDA-approved small molecules, and 14,000 protein or drug target sequences. DrugBank combines drug data with drug-target information thus enabling applications in cancer biology including in silico drug target discovery, drug design, drug interaction prediction, etc. In a study employing DrugBank, Augustyn et al. (156) evaluated potential therapeutic targets of achaete-scute homolog 1 (ASCL1) genes in lung cancers and reported unique molecular vulnerabilities for potential therapeutics, while Han et al. (157) determined synergistic combinations of drug targets in K562 chronic myeloid leukemia (CML) cells including BCL2L1 and MCL1 combination. Further to evaluate drugs in light of the patient’s genomic signature, PanDrugs (52) database was established in 2019 and currently hosts data from 24 sources and 56297 drug-target pairs along with 9092 unique compounds and 4804 genes. Using PanDrugs, Fernández-Navarro et al. (158) prioritized personalized drug treatments using PanDrugs, for T-cell acute lymphoblastic (T-ALL) patients. Altogether, the availability of voluminous high-resolution biomolecular data has enabled the development of a quantitative understanding of aberrant mechanisms underpinning hallmarks of cancer as well as create avenues for personalized therapeutic insights. Recently, Karimi et al. (159) systematically evaluated the current multi-omics data generation approaches, as well as their associated analysis pipelines for employment in cancer research.

3 Data-Driven Integrative Modeling in Cancer Systems Biology

The need to prognosticate system-level outcomes in light of oncogenic dysregulation (160, 161) has led researchers to develop integrative data-driven computational models (162–168). Such models can help decode emergent mechanisms underpinning tumorigenesis as well as aid in the development of patient-centered therapeutic strategies (162–168). These in silico models can be broadly grouped into four salient sub-scales as biomolecular (169–171), tumor environment (172–174), cell level (175, 176), and multi-scale integrative cancer models (177–179) ( ).

Figure 2

Evolution timeline of in silico scale-specific and multi-scale data-drive cancer models. Timeline of salient in silico scale-specific and multi-scale cancer models, along with PubMed yearly report (1990-2020) to display the evolutionary trends seen in the development of (A) genome-scale cancer models, (B) Transcript-level cancer models, (C) Proteome-scale models, (D) Metabolome scale models, (E) Environment-based models, (F) Cell-level models, and (G) Multi-scale cancer models.

3.1 Biomolecular-Scale Models

In silico biomolecular models of cancer can be classified into (i) genome-scale, (ii) transcriptome-scale, (iii) proteome-scale, (iv) metabolome-scale models.

3.1.1 Genome-Scale Models

Amongst initial attempts at developing cancer gene regulation models, in 1987, Leppert et al. (169) reported a computational model to study the genetic locus of familial polyposis coli and its involvement in colonic polyposis and colorectal cancer. The study was also validated by Mehl et al. (170) in 1991, which further elucidated the formation and development of familial polyposis coli genes in colorectal cancer patients. In 2014, Stratmann et al. (171) developed a personalized genome-scale 3D lung cancer model to study epithelial-mesenchymal transition (EMT) by TGFβ-based stimulation, while in 2016, Margolin et al. (180) developed a blood-based diagnostic model to help detect DNA hypermethylation of potential pan-cancer marker ZNF154. In 2017, Jahangiri et al. (181) employed an in silico pipeline to evaluate Staphylococcal Enterotoxin B for DNA-based vaccine for cancer therapy. A similar genome-scale model of DNA damage and repair was proposed by Smith et al. (182) to evaluate proton treatment in cancer ( ).

3.1.2 Transcriptome-Scale Models

In silico models developed using transcriptomic expression data can assist in comparing gene expression patterns in cancer for investigating genetic heterogeneity and cancer development as well as towards precision therapy ( ). In 2003, Huang et al. (183) presents a mathematical model using a large-scale transcriptional dataset of breast cancer patients to elucidate patterns of metagenes for nodal metastases and relapses. In another large-scale cancer transcriptomics study, 3,000 patient samples from nine different cancer types were used to decode the genomic evolution of cancer by Cheng et al. (184). In 2013, Chen et al. (185) employed an in silico pipeline which helped identify 183 new tumor-associated gene candidates with the potential to be involved in the development of hepatocellular carcinoma (HCC), while, in 2014, Agren et al. (186) developed a personalized transcriptomic data-based model to identify anticancer drugs for HCC. In 2019, Béal et al. (187) reported a logical network modeling pipeline for personalized cancer medicine using individual breast cancer patients’ data. The pipeline was validated in 2021 (188), using in silico personalized logical models for melanomas and colorectal cancers samples in response to BRAF treatments. In a similar study conducted in 2019, Rodriguez et al. (189) developed a mathematical model for breast cancer using transcriptional regulation data to predict hypervariability in a large dynamic dataset which revealed the basis of expression heterogeneity in breast cancer.

3.1.3 Proteome-Scale Models

To capture the quantitative aspects of biomolecular regulations and functional studies (82, 83), data-driven proteomic-based cancer models are essential ( ). Such models can be particularly helpful in diagnostic as well as prognostic purposes as well as for monitoring response to treatment (82, 83). In a study employing proteome-level information, in 2011, Baloria et al. (190) carried out an in silico proteome-based characterization of the human epidermal growth factor receptor 2 (HER-2) to evaluate its immunogenicity in an in silico DNA vaccine. Akhoon et al. (191) simplified this approach with the development of a new prophylactic in silico DNA vaccine using IL-12 as an adjuvant. In 2017, Fang et al. (192) employed proteome level data towards predicting in silico drug-target interactions for applications in targeted cancer therapeutics, while in 2018, Azevedo et al. (193) designed novel glycobiomarkers in bladder cancer. Recently, in 2020, Lee et al. (194) reported an integrated proteome model of macrophage migration in a complex tumor microenvironment. However, proteome-level models are limited in their ability to provide a complete analysis of the biomolecules present in a cell since they lack information on low molecular weight biomolecular compounds such as metabolites (105).

3.1.4 Metabolome-Scale Models

Metabolic data-driven models can be especially useful in understanding cancer cell metabolism, mitochondrial dysfunction, metabolic pathway alteration, etc ( ). In 2007, Ma et al. (195) developed the Edinburgh Human Metabolic Network (EHMN) model with more than 3000 metabolic reactions alongside 2000 metabolic genes for employment in metabolite-related studies and functional analysis. In 2011, Folger et al.’s (196) employed the EHMN model to propose a large-scale flux balance analysis (FBA) model for investigating metabolic alterations in different cancer types and for predicting potential drug targets. In 2014, Aurich et al. (197) reported a workflow to characterize cellular metabolic traits using extracellular metabolic data from lymphoblastic leukemia cell lines (Molt-4) towards investigating cancer cell metabolism. Yurkovich et al. (198) augmented this workflow in 2017 and reported eight biomarkers for accurately predicting quantitative metabolite concentration in human red blood cells. Alakwaa et al. (199) employed a mathematical model to predict the status of Estrogen Receptor in breast cancer metabolomics dataset, while in 2018, Azadi et al. (200) used an integrative in silico pipeline to evaluate the anti-cancerous effects of Syzygium aromaticum employing data from the Human Metabolome Database (107).

3.2 Tumor Environment Models

Integrative mathematical models of environmental cues and extracellular matrix can help researchers abstract tumor microenvironments. Such data-driven models can be used to study angiogenesis (201), cell adhesion (202), and vasculature (203), etc ( ). Amongst initial attempts at developing cancer environment-based in silico models, in 1972, Greenpan et al. (172) designed a solid carcinoma in silico model to evaluate cancer cell behavior in limited diffusion settings. In 1976, the model was expanded (173) to investigate tumor growth in asymmetric conditions. In 1996, Chaplain developed a mathematical model to elucidate avascular growth, angiogenesis, and vascular growth in solid tumors (174). Anderson and Chaplain (204), in 1998, expanded this strategy and reported a continuous and discrete model for tumor-induced angiogenesis. This modeling approach was further augmented in 2005 by Anderson (202) to a hybrid mathematical model of a solid tumor to study cellular adhesion in tumor cell invasion. Organ-specific metastases and associated survival ratios in small cell lung cancer patients have been modeled and evaluated (205) using similar models.

3.3 Cell Models

To model cell population-level behavior in cancer, researchers are increasingly developing innovative cell lines in silico models which can complement in vivo wet-lab experiments, while overcoming wet-lab limitations (206) ( ). Such models are employed to investigate cell-to-cell interactions as well as evaluate physical features of the synthetic extracellular matrix (ECM) (206), etc (175, 207). Amongst initial attempts at developing in silico cell line models, in 1989, Shackney et al. (175) proposed an in silico cancer cell model to study tumor evolution. Results showed an association between discrete aneuploidy peaks with the activation of growth-promoting genes. In 2007, Aubert et al. (207) developed an in silico glioma cell migration model and validated cell migration preferences for homotype and heterotypic gap junctions with experimental results. Gerlee and Nelander (176) expanded this work, in 2012, to investigate the effect of phenotype switching in glioblastoma growth. In conclusion, cell-based cancer models help provide scale-specific insights into cancer, however, they remain limited in investigating the spatiotemporal tissue diversity and heterogeneity in cancer patients.

3.4 Multi-Scale Models

Recently, data-driven multi-scale models are becoming increasingly popular in cancer (208) ( ). One of the earliest attempts at developing multi-scale cancer models was in 1985 when Balding developed a mathematical model to demonstrate tumor-induced capillary growth (201). In 2000, Swanson et al. (209) proposed a quantitative model to investigate glioma cells. In a similar study, Zhang et al. (210) generated a 3D, multi-scale agent-based model of the brain to study the role (EGFR)-mediated activation of signaling protein phospholipase role in a cell’s decision to either proliferate or migrate. In 2010, Wang et al. (178) also took a multi-scale agent-based modeling approach to identify therapeutic targets in concurrent EGFR-TGFβ signaling pathway in non-small cell lung cancer (NSCLC). Later in 2011 (179), they employed the approach to identify critical molecular components in NSCLC. Similarly, Perfahl et al. (211) formulated a multi-scale vascular tumor growth model to investigate spatiotemporal regulations in cancer and response to therapy. In 2007, Anderson et al. (212) proposed a mathematical model for studying cancer growth, evolution, and invasion. This model was later built upon by Chaudhary et al. (165, 213) with a multi-scale modeling strategy to investigate tumorigenesis induced by mitochondrial incapacitation in cell death, in 2011. In 2017, Vavourakis et al. (214) developed a multi-scale model to investigate tumor angiogenesis and growth. In 2017, Norton et al. (215) used an agent-based computational model of triple-negative breast cancer to study the effects of migration in CCR5+ cancer cells, stem cell proliferation, and hypoxia on the system. They later (216) reported an agent-based and hybrid model to investigate tumor immune microenvironment, in 2019. In the same year, Karolak et al. ( 217) modeled in silico breast cancer organoid morphologies (218) to help elucidate efficacies amongst drug treatment based on the morphophenotypic classification. Similarly, Berrouet et al. (219) employed a multi-scale mathematical model to evaluate the effect of drug concentration on monolayers and spheroid cultures. Summarily, data-integrative computational models have now assumed the forefront in decoding the complex biomolecular regulations involved in cancer and are increasingly been employed for the development of personalized preclinical models as well as therapeutics design (220).

4 Software Platforms for Modeling in Cancer Systems Biology

Over the past decade, in silico modeling of cancer has gained significant popularity in systems biology research (162–168). In particular, data-drive computational models are now acting as an enabling technology for precision medicine and personalized treatment of cancer. To date, several single and multi-scale software platforms have been reported to model biomolecules (171, 221–228), cellular environments (229), as well as cell-level (230), and multi-scale (231) information into coherent in silico cancer models ( ).

Figure 3

Feature-by-feature comparison of networks, environments, cell lines, and multi-scale modeling software in chronological order.

4.1 Biomolecular Modeling Platforms

Software platforms aimed at modeling biomolecular entities, abstract information from published literature as well as high-throughput technologies, and model them using a Boolean modeling approach or a differential equations modeling strategy.

4.1.1 Boolean Modeling Software

Boolean network modeling technique was first introduced by Kauffman (232, 233), in 1969. This approach has been widely adopted as a tool to model gene, transcript, protein, and metabolite regulatory networks. Numerous mathematical and computational cancer models have been developed using this representation (171, 221–225). To facilitate the Boolean model development and analysis process, several platforms have been devised (234–236) ( ). The applicability of these platforms can be further categorized into qualitative or quantitative Boolean network modeling.

4.1.1.1 Qualitative Boolean Modeling

Biomolecular qualitative Boolean models are a widely employed approach in cancer systems biology research to cater for cases where there is insufficient quantitative information, and/or lack of mechanistic understanding. Thus far, numerous platforms have been reported to help researchers develop qualitative Boolean network models. Amongst them, FluxAnalyzer (237), reported by Klamt et al. in 2003, was developed to undertake metabolic pathway construction, flux optimization, topological feature detection, flux analysis, etc. To expand the scope of the platform and include cell signaling, gene as well as protein regulatory networks, in 2007, Klamt et al. expanded FluxAnalyzer and reported CellNetAnalyzer (226). Tian et al. (238) employed CellNetAnalyzer to develop a p53 network model for evaluating DNA damage in cancer, while Hetmanski et al. (239) designed a MAPK-driven feedback loop in Rho-A-driven cancer cell invasion. Although CellNetAnalyzer remains a widely used logical modeling software, its programmability and MATLAB dependency hinders its clinical employment for developing personalized cancer models. Towards addressing this challenge, in 2008, Albert et al. published BooleanNet (235), an open-source, freely available Boolean modeling software for large-scale simulations of dynamic biological systems. Saadatpoort et al. (240) employed BooleanNet’s general asynchronous (GA) method to deduce therapeutic targets for granular lymphocyte leukemia. Similarly, in 2008, Kachalo et al. presented NET-SYNTHESIS (227); a platform for undertaking network synthesis, inference, and simplification. Steinway et al. (241) employed both BooleanNet and NET-SYNTHESIS platforms to model epithelial-to-mesenchymal transition (EMT) in light of TFGβ cell signaling, in hepatocellular carcinoma patients towards elucidating potential therapeutic targets. BooleanNet was used to undertake model simulation and NET-SYNTHESIS for carrying out network interference and simplification. Another logical modeling platform, GIMsim (242), published by Naldi et al., in 2009, also employed asynchronous state transition graphs to perform qualitative logical modeling which is especially useful for networks with large state space. This platform was employed by Flobak et al. (243) to map cell fate decisions in gastric adenocarcinoma cell-line towards evaluating drug synergies for treatment purposes, while Remy et al. (244) studied mutually exclusive and co-occurring genetic alterations in bladder cancer. GIMsim also employed multi-valued logical functions, useful in simulating qualitative dynamical behavior in cancer research. However, the platform was unable to program automatic theoretical predictions, moreover, it only employed qualitative analysis approaches and could not be used to accurately map cell fates based on quantitative biomolecular expression data. Taken together, classical qualitative Boolean modeling approaches remain limited in developing predictive cancer models that could leverage quantitative biomolecular expression data generated from next-generation proteomics and related-sequencing projects.

4.1.1.2 Quantitative Boolean Modeling

Platforms aimed to integrate quantitative expression data from existing literature and databases towards carrying out network annotation and onwards analysis can be particularly useful in developing personalized cancer models. One such platform, the Markovian Boolean Stochastic Simulator (MaBoss), was established by Stoll et al. (245) in 2017, for stochastic and semi-quantitative Boolean network model development, mutations, and drug evaluation, sensitivity analysis based on experimental data, and eliciting model predictions. In 2019, Béal et al. (187) employed MaBoss to develop a logical model to evaluate breast cancer in light of individual patients’ genomic signature for personalized cancer medicine. This model was later expanded, in 2021, to investigate BRAF treatments in melanomas and colorectal cancer patients (188). Similarly, Kondratova et al. (246) used MaBoss to model an immune checkpoint network to evaluate the synergistic effects of combined checkpoint inhibitors in different types of cancers. In a similar attempt at developing quantitative Boolean networks, BoolNet (247) was developed in 2010 by Müssel and Kestler. BoolNet allows its users to reconstruct networks from time-series data, perform robustness and perturbation analysis and visualize the resultant cell fates attractor. BoolNet was employed by Steinway et al. (248) to construct a metabolic network model towards evaluating gut microbiome in normal and disease conditions, whereas, Cohen et al. (249) studied tumor cell invasion and migration. BoolNet, however, lacks a graphical user interface, and results from the analysis cannot be visualized interactively, which hindered its employment. To address this issue, Shah et al. (250), in 2018, developed an Attractor Landscape Analysis Toolbox for Cell Fate Discovery and Reprogramming (ATLANTIS). ATLANTIS has an intuitive graphical user interface and interactive result visualization feature, for ease in utilization. The platform can be employed to perform deterministic as well as probabilistic analysis and was validated through literature-based case studies on the yeast cell cycle (251), breast cancer (252), and colorectal cancer (253).

4.1.2 Differential Equations Modeling Software

Boolean models have proven to be a powerful tool in modeling complex biomolecular signaling networks, however, these models are unable to describe continuous concentration, and cannot be used to quantify the time-dependent behavior of biological systems, necessitating the need to switch to quantitative differential equations (254). As a result, numerous stand-alone and web-based tools have been developed to build continuous network models to help describe the temporal evolution of biomolecules towards elucidating more accurate cell fate outcomes from quantitative expression data ( ). Amongst initial attempts at developing such software, GEPASI (GEneral PAthway Simulator) was designed by Pedro Mendes et al. (228), in 1993. GEPASI is a stand-alone simulator that facilitates formulating mathematical models of biochemical reaction networks. GEPASI can also be used to perform parametric sensitivity analysis using an automatic pipeline that evaluates networks in light of exhaustive combinatorial input parameters. Ricci et al. (255) employed GEPASI to investigate the mechanism of action of anticancer drugs, while Marín-Hernández et al. (256) constructed kinetic models of glycolysis in cancer. In 2006, Hoops et al. reported a successor of GEPASI; COPASI (COmplex PAthway SImulator) (257), a user-friendly independent biochemical simulator that can handles larger networks for faster simulation results, through parallel computing. Orton et al. (258) employed COPASI to model cancerous mutations in EGFR/ERK pathway, while cellular senescence was evaluated by Pezze et al. (259) for targeted therapeutic interventions. Towards establishing a user-friendly software with an intuitive graphical user interface (GUI), another desktop application, CellDesigner was published by Funahashi et al. (260). CellDesigner application can be extended to include various simulation and analysis packages through integration with systems biology workbench (SBW) (261). In a case study using CellDesigner, Calzone et al. (262) developed a network of retinoblastoma protein (RB/RB1) and evaluated its influence in cell cycle, while Grieco et al. (263) investigated the impact of Mitogen-Activated Protein Kinase (MAPK) network on cancer cell fate outcomes.

4.2 Cell Environment Modeling Software

To facilitate the development of environmental models that can help investigate inter-, intra-, and extracellular interactions between cellular network models and their dynamic environment, several software and platforms have been reported (229, 264). These platforms employ discrete, continuous, and hybrid approaches to develop models of cellular microenvironments towards setting up specific biological contexts such as normoxia, hypoxia, Warburg effect, etc ( ). In 2014, Starruß et al. (229) published Morpheus, a platform for modeling complex tumor microenvironment. Morpheus leverages a cellular potts modeling approach to integrate and stimulate cell-based biomolecular systems for modeling intra- and extra cellular dynamics. In a case study using Morpheus, Felix et al. (265) evaluated pancreatic ductal adenocarcinoma’s adaptive and innate immune response levels, while Meyer et al. (266) investigated the dynamics of biliary fluid in the liver lobule. Morpheus is a widely employed modeling software, however, its diffusion solver is limited in its capacity to model large 3D domains. Towards modeling fast simulations for larger cellular systems, the Finite Volume Method for biological problems (BioFVM) software (264) was reported by Ghaffarizadeh et al., in 2016. BioFVM is an efficient transport solver for single as well as multi-cell biological problems such as excretion, decomposition, diffusion, and consumption of substrates, etc (267). In a case study using BioFVM, Ozik et al. (268) evaluated tumor-immune interactions, while Wang et al. (269) elucidated the impact of tumor-parenchyma on the progression of liver metastasis. BioFVM, however, relies on its users to have programming-based knowledge to develop their models, which limits its translational potential. Towards minimizing programming requirements, SALSA (ScAffoLd SimulAtor) was developed by Cortesi et al. (206) in 2020. SALSA is general-purpose software that employs a minimum programming requirement, a significant advantage over its predecessors. The platform has been useful in studying cellular diffusion in 3D cultures. This recent tool was validated in 2021, with a case study that evaluated and predicted therapeutic agents in 3D cell cultures (270).

4.3 Cell-Level Modeling Software

Towards modeling cancer cell-specific behaviors such as cellular adhesion, membrane transport, loss of cell polarity, etc several software have been reported which can help develop in silico cancer cell models ( ). The foremost endeavor to develop software for cell-level modeling and simulation, led to the establishment of E-Cell (230), in 1999. E-Cell can be employed to model biochemical regulations and genetic processes using biomolecular regulatory networks in cells. Edwards et al. (271) employed E-cell to predict the metabolic capabilities of Escherichia coli and validated the results using existing literature, while Orton et al. (272) modeled the receptor-tyrosine-kinase-activated MAPK pathway. This software was expanded in 2001, with the development of Virtual Cell (V-Cell) (273), a web-based general-purpose modeling platform. V-Cell has an intuitive graphical and mathematical interface that allows ease in the design and simulation of whole cells, along with sub-cellular biomolecular networks and the external environment. Neves et al. (274) employed V-Cell to investigate the flow of spatial information in cAMP/PKA/B-Raf/MAPK1,2 networks, while Calebiro et al. (275) modeled cell signaling by internalized G-protein–coupled receptors. Similarly, to increase the efficiency in the design and modeling of synthetic regulatory networks in cells, in 2009, Chandran et al. reported TinkerCell (276) with computer-aided design (CAD) functionality which enabled faster simulation and associated analysis. The platform employed a modular approach for constructing networks and provides built-in features for ease in network construction, robustness analysis, and evaluating networks using existing databases. The evolutionary trend for TinkerCell’s platform adaptability and flexibility is highlighted in , which shows a gradual shift from being a model-specific platform to a domain modeling one. In a study employing TinkerCell, Renicke et al. (277) constructed a generic photosensitive degron (psd) model to investigate protein degradation and cellular function, while Chandran et al. (278) reported computer-aided biological circuits.

Figure 4

Evolution of scale-specific and multi-scale software. Evolution of multi-scale modeling software for abstracting and simulating the spatiotemporal biomolecular complexity. Highlighting the need for a generic, data-driven, zero-code software requirement.

4.4 Multi-Scale Modeling Software

Multi-scale cancer modeling approaches bring together scale-specific information towards undertaking an integrative analysis of heterogeneous experimental data by building coherent and biologically plausible models (279–281). Several multi-scale modeling platforms have been reported to help develop multi-level cancer models (231) ( ). Amongst these, the REcursive Porous Agent Simulation Toolkit (Repast) (282) published in 2003, provides a free and open-source tool for modeling and simulating agent-based models, with high-performance computing (HPC) capability. Repast toolkit was employed in 2007, by Folcik et al. (283) to develop an agent-based model which was then used to study interactions between cells and the immune system. Similarly, Mehdizadeh et al. (284) used Repast to model angiogenesis in porous biomaterial scaffolds. Although Repast can be used for simulating several types of evolutionary trends between agents, there is no established guideline for selecting a mechanism to model such trends, hindering its use by naïve users. Moreover, Repast does not have a GUI or a software development kit (SDK) interface for implementing subcellular mechanisms e.g., gene, protein, and metabolic networks. In contrast, CompuCell (279), published in 2004 by Izaguirre et al., provides an elaborative GUI to model cell-scale or tissue-scale simulations by integrating biomolecular networks, intra- and extracellular environment, and cell to environment interactions. Mahoney et al. (285) employed CompuCell to develop an angiogenesis-based model in cancer for investigating novel cancer therapies. Although CompuCell provides an intuitive framework modeling paradigm, the platform core is not conducive to multi-scale cancer modeling. The focus of the software is primarily multi-agent simulations rather than multi-scale cancer modeling. This poses a significant challenge in the utilization of the software. In 2013, CHASTE (280) (Cancer Heart and Soft Tissue Environment) was launched, which provides a computational simulation pipeline for the mathematical modeling of complex multi-scale models. Users can employ CHASTE for a wide range of problems involving on and off-lattice modeling workflows. CHASTE has also previously been employed to model colorectal cancer crypts. Nonetheless, CHASTE does not have a GUI and can only be executed by command line text commands. Furthermore, it requires recompilation on part of the modeler to use the code updates performed by the group. To further improve the multi-scale modeling approach for investigating cancer, in 2013, Chaudhary et al. (281) published ELECANS (Electronic Cancer System). ELECANS had an intuitive but rigid GUI along with a programmable SDK besides the lack of a high-performance simulation engine (286, 287). ELECANS was employed to model the mitochondrial processes in cancer towards elucidating the hidden mechanisms involved in cell death (165). ELECANS provided a feature-rich environment for constructing multi-scale models, however, the platform lacked a biomolecular network integration pipeline, and also placed a heavy programming requirement on its users. In contrast, in 2018, PhysiCell (288) was reported for 2D and 3D multi-cell off-lattice agent-based simulations. PhysiCell is coupled with BioFVM (264)’s finite volume method to model multi-scale cancer systems (268). In a salient example, Wang et al. (269) employed PhysiCell to model liver metastatic progression. In 2019, PhysiCell’s agent-based modeling features and MaBoss’s Boolean cell signaling network feature were coupled together to develop an integrated platform, PhysiBoss (289). As a result, PhysiBoss provided an agent-based modeling environment to study physical dimension and cell signaling networks in a cancer model. In 2020, Colin et al. (290) employed PhysiBoss’s source code to model diffusion in oocytes during prophase 1 and meiosis 1, while Getz et al. (291) proposed a framework using PhysiBoss to develop a multi-scale model of SARS-CoV-2 dynamics in lung tissue. Recently, in 2021, Gondal et al. (292) reported Theatre for in silico Systems Oncology (TISON), a web-based multi-scale “zero-code” modeling and simulation platform for in silico oncology. TISON aims to develop single or multi-scale models for designing personalized cancer therapeutics. To exemplify the use case for TISON, Gondal et al. employed TISON to model colorectal tumorigenesis in Drosophila melanogaster’s midgut towards evaluating efficacious combinatorial therapies for individual colorectal cancer patients (225). Summarily, multi-scale modeling software has enabled the development of biologically plausible cancer models to varying degrees. These platforms, however, fall short of providing a generic and high-throughput environment that could be conveniently translated into clinical settings.

5 Pipelining Panomics Data towards In Silico ClinicalSystems Oncology

In silico multi-scale cancer models, annotated with patient-specific biomolecular and clinical data, can help decode complex mechanisms underpinning tumorigenesis and assist in clinical decision-making. Clinically driven in silico multi-scale cancer models simulate in vivo tumor growth and response to therapies across biocomplexity scales, within a clinical environment, towards evaluating efficacious treatment combinations. To facilitate the development of multi-scale cancer models, several large-scale program projects have been launched ( ) such as Advancing Clinico-Genomic Trials on Cancer (ACGT) (293), Clinically Oriented Translational Cancer Multilevel Modeling (ContraCancrum) (294), Personalized Medicine (p-medicine) (295), Transatlantic Tumor Model Repositories (TUMOR) (296), and Computational Horizons In Cancer (CHIC) (297), amongst others ( ). Here, we review and evaluate five salient projects for multi-scale cancer modeling towards their clinical deployment.

Figure 5

Salient projects pipelining multi-scale panomics data into clinical settings – a timeline. Timeline highlighting salient project platforms for developing realistic and clinically-driven multi-scale cancer models, along with their associated leading case studies.

5.1 ACGT Project

The ACGT project (293), launched in 2007, proposed to develop Clinico-Genomic infrastructure for organizing clinical and genomic data towards investigating personalized therapeutics regimens for an individual cancer patient. The ACGT platform provides an open-source and open-access infrastructure designed to support the development of “oncosimulators” to help clinicians accurately compare results from different clinical trials and enhance their efficiency towards optimizing cancer treatment. The ACGT framework employs molecular and clinical data generated from different sources including whole genome sequencing, histopathological, imaging, molecular, and clinical data, etc. to develop simulators for mimicking clinical trials. Personalized panomics data employed to develop oncosimulators in ACGT is extracted from real patients which enables the oncosimulators to be clinically relevant for predictive purposes. Additionally, ACGT provides data retrieval, storage, integrative, anonymization, and analysis as well as results presentation capabilities. Using the platform, in 2004, a personalized, spatiotemporal oncosimulator model of breast cancer was developed to mimic a clinical trial based on protocols outlined in the Trial of Principle (TOP), towards evaluating the model response to chemotherapeutic treatment in neoadjuvant settings (298). Similarly, in 2009, Graf et al. (299) modeled nephroblastoma oncosimulator, a childhood cancer of the kidney, based on a clinical trial run by the International Society of Paediatric Oncology (SIOP) for simulating tumor response to therapeutic regimens in clinical trials. The results generated from the TOP and SIOP trials enabled the ACGT oncosimulators to adapt in light of real clinical conditions and the software to be validated against multi-scale patient data. The focus of ACGT oncosimulators, however, is limited to existing clinical trials for predicting efficacious treatment combinations.

5.2 The ContraCancrum Project

Towards establishing a platform for the development of composite multi-scale models for simulating malignant tumor models, in 2008, ContraCancrum project (294) was initiated. ContraCancrum aimed to construct a multi-scale computational framework for translating personalized cancer models into clinical settings towards simulating malignant tumor development and response to therapeutic regimens. For that, the Individualized MediciNe Simulation Environment (IMENSE) platform (300, 301) was established, to undertake the oncosimulator development process. The platform was employed for molecular and clinical data storage, retrieval, integration, and analysis. Oncosimulators, developed under the ContraCancrum project, employ patient data across biologically and clinically relevant scales including molecular, environmental, cell, and tissues level. These oncosimulators can be used to optimize personalized cancer therapeutics for assisting clinician decision-making process. To date, several oncosimulators have been reported, under the ContraCancrum project (27, 294, 299, 301–305). In particular, the initial validation of the ContraCancrum workflow was performed using two case studies; glioblastoma multiforme (GBM) (294, 305) and non-small cell lung cancer (NSCLC) (294, 304). In 2010, Folarin and Stamatakos (306) designed a glioblastoma oncosimulator using personalized molecular patient data to evaluate treatment response under the effect of a chemotherapeutic drug (temozolomide) (300). Similarly, Roniotis et al. (305) developed a multi-scale finite elements workflow to model glioblastoma growth, while Giatili et al. (307) outlined explicit boundary condition treatment in glioblastoma using an in silico tumor growth model. The results from the case studies were validated by comparing in silico prediction with pre- and post-operative imaging and clinical data (294, 302). Moreover, The ContraCancrum project hosts more than 100 lung patients’ tumor and blood samples (294). This data is employed to develop clinically validated in silico multilevel cancer models for NSCLC using patient-specific data (304). In 2010, using a biochemical oncosimulator, Wan et al. (304) investigated the binding affinities for AEE788 and Gefitinib tyrosine kinase inhibitor against mutated epidermal growth factor receptor (EGFR) for NSCLC treatment. Similarly, Wang et al. (308) developed a 2D agent-based NSCLC model to investigate proliferation markers and evaluated ERK as a suitable target for targeted therapy. Although ContraCancrum’s project has created numerous avenues for multilevel cancer model development, the platform is limited to only specific types of cancer for which data is internally available on the platform.

5.3 p-Medicine Project

Towards improving clinical deployment capabilities of oncosimulators, another pilot project: the personalized medicine (p-medicine) project was launched in 2011 (295). The main aim of p-medicine was to create biomedical tools facilitating translation of current medicine to P5 medicine (predictive, personalized, preventive, participatory, and psycho-cognitive) (309). For that, the p-medicine portal provides a web-based environment that hosts specifically-purpose tools for personalized panomics data integration, management, and model development. The portal has an intuitive graphical user interface (GUI) with an integrated workbench application for integrating information from clinical practices, histopathological imaging, treatment, and omics data, etc. Computational models developed under p-medicine workflow, are quantitatively adapted to clinical settings since they are derived using real multi-scale data. Several multi-scale cancer simulation models (oncosimulators) (295, 309) have been devised, using the p-medicine workflow. Amongst these, in 2012, Georgiadi et al. (310) developed a four-dimensional nephroblastoma treatment model and evaluated its employment in clinical decisions making. Towards evaluating personalized therapeutic combinations, in 2014, Blazewicz et al. (311) reported a p-medicine parallelized oncosimulator which evaluated nephroblastoma tumor response to therapy. The parallelization enhanced model usability and accuracy for eventual translation into clinical settings for supporting clinical decisions. In 2016, Argyri et al. (312) developed a breast cancer oncosimulator to evaluate vascular tumor growth in light of single-agent bevacizumab therapy (anti-angiogenic treatment), while in 2014, Stamatakos et al. (313) evaluated breast cancer treatment under an anthracycline drug for chemotherapy (epirubicin). In 2012, Ouzounoglou et al. (314) designed a personalized acute lymphoblastic leukemia (ALL) oncosimulator for evaluating the efficacy of prednisolone (a steroid medication). This study was further augmented, in 2015, by Kolokotroni et al. (315) to investigate the potential cytotoxic side effects of prednisolone. Later in 2017, Ouzounoglou et al. (316) expanded the ALL oncosimulator model to a hybrid oncosimulator for predicting pre-phase treatments for ALL patients. The validation of these models was undertaken using clinical trial data in pre and post-treatment (317). Although p-medicine presents a state-of-the-art in silico multi-scale cancer modeling environment, the project is limited in its application for determining individual biomarkers for potential novel therapy identification. Moreover, the project is limited to its niche set of tools and models that cannot be integrated with other similar model repositories and platforms.

5.4 TUMOR Project

In 2012, a transatlantic USA-EU partnership was initiated with the launch of Transatlantic Tumor Model Repositories (TUMOR) (296). TUMOR provides an integrated, interoperable transatlantic research environment for developing a clinically driven cancer model repository. The repository aimed to integrate computational cancer models developed by different research groups. TUMOR’s transatlantic aim was to couple models from ACGT and ContraCancrum projects with those at the Center for the Development of a Virtual Tumor (CViT) (318) as well as other relevant centers. TUMOR is predicted to serve as an international clinical translation, interoperable and validation platform for in silico oncology hosting multi-scale cancer models from different cancer model repositories and platforms such as E-Cell (230), CellML (319), FieldML (320), and BioModels (35, 321). To support model data interoperability between platforms, TumorML (Tumor model repositories Markup Language) (321) was developed towards facilitating inter-data operability in the TUMOR project. The TUMOR environment also offers a wide range of additional services supporting predictive oncology and individualized optimization of cancer treatment. For example, the platform allows remote access of predictive cancer models in hospitals and to clinicians for the development of quantitative cancer research and personalized cancer therapy model. TUMOR environment also incorporates deterministic as well as stochastic models through COPASI simulator (257). An automatic validation pipeline is also embedded for the execution and deployment of these models in clinical settings (296). TUMOR’s ability to couple and integrate models from different scales, approaches, and repositories towards increasing model accuracy in predictive oncology was exemplified with Wang et al.’s (322) NSCLC model. In 2007, Wang et al. developed a 2D multi-scale NSCLC model for evaluating tumor expansion dynamics in NSCLC patients, in CViT. This model was exported in TumorML and made available in the TUMOR repository to help evaluate growth factor influence in aggressive cancer (321). Similarly, in 2014 Sakkalis et al. (323) employed the TUMOR platform to interlink and couple three independent glioblastoma-specific cancer models (EGFR signaling (324), cancer metabolism (325), Oncosimulator (323), reported by different research groups. The resultant model was used to investigate the impact of radiation and temozolomide (chemotherapy) on glioblastoma multiforme to evaluate treatment effectiveness.

5.5 CHIC Project

Another transatlantic project Computational Horizons In Cancer (CHIC) (297) launched in 2013, aimed to provide an oncosimulator modeling platform for in silico oncology. CHIC was initiated to develop and implement predictive oncology and individualized multi-scale cancer modeling tools towards assisting quantitative cancer research and personalized cancer therapy. The workflow and tools established under the ambient of the CHIC project allowed the development of robust, interoperable, and collaborative in silico models in cancer and normal conditions. The CHIC project also proposed a pipeline to translate the model towards supporting clinicians to make optimal personalized treatment plans for individual patients. To this end, several models are created using CHIC such as non-small cell lung cancer (326), glioblastoma (327), leukemia model (328), etc. These models furnish a quantitative understanding of tumorigenesis towards providing avenues for promoting individual cancer patient treatment combinations. One such model reported by Kolokotroni et al. (326) evaluated the efficacy of cisplatin-based therapy for NSCLC patients using in silico multi-scale cancer model. While, in 2015, Antonopoulos and Stamatakos (327) modeled the infiltration of glioblastoma cells in normal brain regions using a novel treatment. In 2017, Stamatakos and Giatili (329) extended glioblastoma oncosimulator for modeling tumor growth using reaction-diffusion numerical handling based on multi-scale panomics data. They further proposed a clinical pipeline to translate the model into clinical settings towards supporting clinicians to make optimal personalized treatment plans for individual patients. Ouzounoglou et al. (328) developed an in silico multi-scale leukemia oncosimulator model towards modeling deregulations in the G1/S pathway to investigate the altered function of retinoblastoma in ALL patients. However, a clinical translation of these models is currently in the works (297).

6 Discussion

In this work, we have evaluated the use of data-driven multi-scale cancer models in deciphering complex biomolecular underpinnings in cancer research towards developing personalized therapeutics interventions for clinical decision-making. Specifically, we have discussed the chronological evolution of online cancer data repositories that host high-resolution datasets from multiple spatiotemporal scales. Next, we evaluate how this data drives single- and multi-scale systems biology models towards decoding complex cancer regulation in patients. We then track the development of various modeling software and associated applications in enhancing the translational role of cancer systems biology models in clinics. We conclude that the contemporary multi-scale modeling software line-up remains limited in their clinical employment due to the lack of a generic, zero-code, panomic-based framework for translating research models into clinical settings. Such a framework would help annotate in silico cancer models developed using single and multi-scale databases (45, 61, 330–334). The framework should also provide an environment for developing the extra-cellular matrix of a cancer cell which can then be integrated into cellular models. Existing environment (38, 39) and cell line databases (41–43) need to be integrated to design environmental models along with biologically plausible cell line structures. These cell lines could then be assembled into three-dimensional geometries to create multi-scale in silico organoids. The pipeline should also furnish capabilities such as a convenient import workflow for clinical data integration through histopathological image data repositories (44–46) for designing biologically accurate organoid structures based on each cancer patient’s underlying cellular morphology. Once the personalized multi-scale model has been constructed, the pipeline would allow investigation into the temporal evolution of the multi-scale organoid under personalized inputs and user-designed biomolecular entities. Data generated from the multi-scale model simulation can be analyzed to elicit biomolecular cues for each cancer patient as well as determine its role. Due to the heterogeneity and complexity of biomolecular data a coherent panomics-based pipeline can be challenging to develop and will require a collaborative effort by various research groups, through close collaborations and data standardization. Taken together, a translational in silico systems oncology pipeline is the need of the hour and will help develop and deliver personalized treatments of cancer as well as substantively inform clinical decision-making processes.

Author Contributions

SC and MG carried out the literature review and drafted the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National ICT-R&D Fund (SRG-209), RF-NCBC-015, NGIRI-2020-4771, HEC (21-30SRGP/R&D/HEC/2014, 20-2269/NRPU/R&D/HEC/12/4792 and 20-3629/NRPU/R&D/HEC/14/585), TWAS (RG 14-319 RG/ITC/AS_C) and LUMS (STG-BIO-1008, FIF-BIO-2052, FIF-BIO-0255, SRP-185-BIO, SRP-058-BIO and FIF-477-1819-BIO) grants.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

309 in total

1. The morphologies of breast cancer cell lines in three-dimensional assays correlate with their profiles of gene expression.

Authors: Paraic A Kenny; Genee Y Lee; Connie A Myers; Richard M Neve; Jeremy R Semeiks; Paul T Spellman; Katrin Lorenz; Eva H Lee; Mary Helen Barcellos-Hoff; Ole W Petersen; Joe W Gray; Mina J Bissell
Journal: Mol Oncol Date: 2007-06 Impact factor: 6.603

2. The Massachusetts General Hospital (MGH) Hairpulling Scale: 2. reliability and validity.

Authors: R L O'Sullivan; N J Keuthen; C F Hayday; J N Ricciardi; M L Buttolph; M A Jenike; L Baer
Journal: Psychother Psychosom Date: 1995 Impact factor: 17.659

3. Network modeling of TGFβ signaling in hepatocellular carcinoma epithelial-to-mesenchymal transition reveals joint sonic hedgehog and Wnt pathway activation.

Authors: Steven Nathaniel Steinway; Jorge G T Zañudo; Wei Ding; Carl Bart Rountree; David J Feith; Thomas P Loughran; Reka Albert
Journal: Cancer Res Date: 2014-09-04 Impact factor: 12.701

4. A quantitative model for differential motility of gliomas in grey and white matter.

Authors: K R Swanson; E C Alvord; J D Murray
Journal: Cell Prolif Date: 2000-10 Impact factor: 6.831

5. Biomolecular network reconstruction identifies T-cell homing factors associated with survival in colorectal cancer.

Authors: Bernhard Mlecnik; Marie Tosolini; Pornpimol Charoentong; Amos Kirilovsky; Gabriela Bindea; Anne Berger; Matthieu Camus; Mélanie Gillard; Patrick Bruneval; Wolf-Herman Fridman; Franck Pagès; Zlatko Trajanoski; Jérôme Galon
Journal: Gastroenterology Date: 2009-11-10 Impact factor: 22.682

6. Boolean network simulations for life scientists.

Authors: István Albert; Juilee Thakar; Song Li; Ranran Zhang; Réka Albert
Journal: Source Code Biol Med Date: 2008-11-14

7. Big Data: Astronomical or Genomical?

Authors: Zachary D Stephens; Skylar Y Lee; Faraz Faghri; Roy H Campbell; Chengxiang Zhai; Miles J Efron; Ravishankar Iyer; Michael C Schatz; Saurabh Sinha; Gene E Robinson
Journal: PLoS Biol Date: 2015-07-07 Impact factor: 8.029

Review 8. In Silico Neuro-Oncology: Brownian Motion-Based Mathematical Treatment as a Potential Platform for Modeling the Infiltration of Glioma Cells into Normal Brain Tissue.

Authors: Markos Antonopoulos; Georgios Stamatakos
Journal: Cancer Inform Date: 2015-08-10

Review 9. Integrating Multiscale Modeling with Drug Effects for Cancer Treatment.

Authors: Xiangfang L Li; Wasiu O Oduola; Lijun Qian; Edward R Dougherty
Journal: Cancer Inform Date: 2016-01-13

10. In Silico Oncology: Quantification of the In Vivo Antitumor Efficacy of Cisplatin-Based Doublet Therapy in Non-Small Cell Lung Cancer (NSCLC) through a Multiscale Mechanistic Model.

Authors: Eleni Kolokotroni; Dimitra Dionysiou; Christian Veith; Yoo-Jin Kim; Jörg Sabczynski; Astrid Franz; Aleksandar Grgic; Jan Palm; Rainer M Bohle; Georgios Stamatakos
Journal: PLoS Comput Biol Date: 2016-09-22 Impact factor: 4.475