
Progression inference for somatic mutations in cancer.

Leif E Peterson1,2,3,4,5, Tatiana Kovyrshina1,6.   

Abstract

Computational methods were employed to infer the progression of genomic alterations in commonly occurring cancers. Using cross-sectional TCGA data, we computed evolutionary trajectories involving selectivity relationships among pairs of gene-specific genomic alterations such as somatic mutations, deletions, amplifications, downregulation, and upregulation among the top 20 driver genes associated with each cancer. Results indicate that the majority of hierarchies involved TP53, PIK3CA, ERBB2, APC, KRAS, EGFR, IDH1, VHL, etc. Research into the order and accumulation of genomic alterations among cancer driver genes will continue to grow as the costs of nextgen sequencing subside, and as personalized/precision medicine incorporates whole-genome scans into the diagnosis and treatment of cancer.

Keywords:  Cancer research; Computational biology; Genetics; Oncology

Year:  2017        PMID: 28492066      PMCID: PMC5415494          DOI: 10.1016/j.heliyon.2017.e00277

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Tumors evolve through a multi-step mutagenic process in which cells acquire resistance to apoptotic and antiproliferative signals, self-sufficiency in growth signals, and unlimited proliferation potential [1]. Tumor cells also activate glycolysis and deactivate oxidative phosphorylation (Warburg effect), evade immune detection by decreasing pH with lactic acid, increase ROS, exhibit chromosome aberrations and telomere shortening, negotiate life support from stromal cells, activate invasion and motility, and coordinate neovascularization [2], [3]. Cancer avoids extinction through genetic selection because its onset commonly occurs at older ages, beyond reproductive years. If cancer mortality rates decreased with age, cancer would be selected out due to zero fitness within several generations. Genomic instability and the seeding of a mutator phenotype is another hallmark, which can receive positive feedback from clonal expansion of a single pathogenic mutation in key stability pathways, such as DNA repair, replication, or cell-cycle checkpoints [4]. Within several DNA replications, a cascade of mutations ensues, sacrificing stability in the entire genome. While tumors evolve as a consequence of the accumulation of somatic lesions, it is unclear how mutated genes interact to generate the phenotypic hallmarks of cancer. The accumulation of somatic mutations in tumors is rarely detectable during early stages of development, and difficult to detect at later stages because of the high mutational load [5]. From a population genetics standpoint, homozygous mutations in both alleles of mismatch-repair genes result in a 100-fold increase in mutation rates [6], [7], [8], [9], but have zero fitness and will not survive [10]. On the other hand, heterozygosity confers a 5- to 10-fold increase in mutations [11], [12], which in the presence of activated DNA repair pathways will result in a tolerable selective advantage.
The foundation of computational biology for tumor progression was laid by Vogelstein et al., who introduced the notion of trajectories of progression, in which tumor progression involves the activation of mutations that lead toward subpopulations of cells having clonal origin [13]. Attolini et al. [14] introduced a bioinformatic approach called RESIC (Retracing the Evolutionary Steps in Cancer) for deducing the temporal sequence of genetic events during tumorigenesis from cross-sectional genomic data of tumors. Several nextgen sequencing datasets were employed, consisting of 70 advanced colorectal cancers, 91 primary human glioblastomas, and 57 acute myelogenous leukemias. In the colorectal cancers, RESIC accurately predicted the temporal sequence of APC, KRAS, and TP53 mutations, which was in agreement with the order determined by analyzing tumors at different stages of colon cancer formation. For GBM tumors, it was observed that TP53 was the first gene showing selective pressure for somatic mutations, and in the AML samples, JAK2 and TET2 were the first genes to exhibit selective pressure. Youn and Simon [15] employed a highly-parameterized likelihood-based approach for inferring the order of mutational steps in genes, using nextgen sequencing data for 188 lung cancers [16] and 133 colorectal tumors [17]. Results indicated that KRAS, EGFR, and TP53 were among the first genes showing selective pressure in the lung tumors, while for colorectal tumors selective pressure first appeared in APC, KRAS, and TP53. Lecca et al. [18] describe the TO-DAG (Timed Oncogenetic Directed Acyclic Graph) algorithm applied to 74 human prostate cancer samples that include point mutations, copy number losses and gains, and rearrangements. Gerstung et al. [19] developed a computational approach to infer TO-DAGs from human tumor mutation data, and determined that TO-DAG shows high performance scores on synthetic data and recognizes mutations in gatekeeper tumor suppressor genes as a trigger for several downstream mutational events in the human tumor data. The models generated by TO-DAG have been extensively compared with the trees and the graphs inferred by recent representative tools such as RESIC [14] and CT-CBN [19]. Kang et al. [20] introduced a parametric approach, termed TOMC (Temporal Order based on Markov Chain), to estimate the sequential order of gene mutations during tumorigenesis from genome sequencing data based on a Markov chain model. TOMC revealed that tumor suppressor genes tend to be mutated ahead of oncogenes, events that are considered important for key functional loss and gain during tumorigenesis. A larger workflow approach was used to develop CAPRI [21], which generates acyclic graphs to capture branched, independent and confluent evolution via bootstrapping, shrinkage, maximum likelihood, and regularization. Caravagna et al. [22] reported on the PiCnIc pipeline, which incorporates CAPRI, and is a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes.
This investigation focuses on the development of cancer progression models derived from cross-sectional genomic data. The models employed focus on selectivity relationships between mutational events, such that event j selects for event k, resulting in a weighted directed acyclic graph (WDAG) of alterations representing the accumulation of events under selective pressure during cancer progression. We also introduce a permutation-based affinity metric (PBAM) approach, which is an iterative learning method that combines multi-tumor co-occurrence event statistics and within-tumor order permutations to extract affinity relationships between all possible pairs of events.
The affinity relationships are then filtered by probabilistic causation conditions based on temporality and probability raising, which removes relationships that are not admissible. The temporality condition assumes that, if event j occurs earlier than event k, then event j will occur more frequently than event k, that is, $P(j) > P(k)$. The probability raising condition, in turn, assumes that the occurrence of event k increases the occurrence of event j, i.e., $P(j \mid k) > P(j \mid \neg k)$. Selectivity relationships stem from the idea that, during tumor clonal evolution, there is a selective advantage of certain genomic alterations which increase the probability of subsequent downstream events. The pattern that emerges is one for which greater frequencies of driver events subsume a mixture of later events, above some background noise level. Estimating selectivity relationships among genomic alterations is important because early events may represent important therapeutic targets, while late mutations may play a role in metastases. Investigation of selectivity relationships can also deepen our understanding of tumorigenesis at the genome sequence level, and may help to elucidate functional roles of genomic alterations.
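
The two filtering conditions can be illustrated with a minimal Python sketch. It assumes a binary tumor-by-event matrix (rows are tumors, columns are events) and estimates all probabilities as simple sample proportions; the function name `passes_causation_filter` and the toy data are hypothetical and are not taken from the paper.

```python
def passes_causation_filter(rows, j, k):
    """Check whether 'event j selects for event k' is admissible.

    rows: list of 0/1 lists, one list per tumor, one column per event.
    Temporality:         P(j) > P(k)
    Probability raising: P(j | k) > P(j | not k)
    Probabilities are plain sample proportions (an assumption of this sketch).
    """
    n = len(rows)
    n_j = sum(r[j] for r in rows)
    n_k = sum(r[k] for r in rows)
    n_jk = sum(r[j] and r[k] for r in rows)
    if n_k == 0 or n_k == n:
        return False  # a conditional probability would be undefined
    temporality = n_j / n > n_k / n
    p_j_given_k = n_jk / n_k
    p_j_given_not_k = (n_j - n_jk) / (n - n_k)
    return temporality and p_j_given_k > p_j_given_not_k


# Toy data: event 0 is present in 4 of 6 tumors, event 1 in 2 of 6,
# and event 1 only ever appears together with event 0.
tumors = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 0], [0, 0]]
forward = passes_causation_filter(tumors, 0, 1)
backward = passes_causation_filter(tumors, 1, 0)
```

In this toy example the pair is admissible only in the direction from the more frequent event to the less frequent one, which is the asymmetry the temporality condition is meant to enforce.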

Cancer data

The cross-sectional data used in this investigation were derived from nextgen sequencing of tumors in The Cancer Genome Atlas (TCGA) [23]. We investigated mutational events in 10 cancers (Table 1) for which nextgen sequencing and RNA-seq expression data were available from cBio-Portal (http://www.cbioportal.org) [24], [25].
Table 1

Genomic annotation for driver genes employed, including Gene Ontology nomenclature and molecular pathway.

| Cancer | Symbol | Chr | GO: Biological Process | GO: Cellular Component | GO: Molecular Function | Pathway |
| Acute Myeloid Leukemia | CEBPA | 19 | urea cycle | nucleus | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Non-alcoholic fatty liver disease (NAFLD) |
| n = 200 | DIS3 | 13 | rRNA processing | nuclear exosome (RNase complex) | 3'–5'-exoribonuclease activity | RNA degradation |
| | DNMT3A | 2 | negative regulation of transcription from RNA polymerase II promoter | chromosome | DNA binding | Cysteine and methionine metabolism |
| | KIT | 4 | MAPK cascade | acrosomal vesicle | protease binding | Ras signaling pathway |
| | KRAS | 12 | MAPK cascade | intracellular | GTPase activity | MAPK signaling pathway |
| | RAD21 | 8 | double-strand break repair | nuclear chromosome | transcriptional activator activity | Cell cycle |
| | SUZ12 | 17 | negative regulation of transcription from RNA polymerase II promoter | sex chromatin | RNA polymerase II core promoter sequence-specific DNA binding | |
| | U2AF1 | 21 | mRNA splicing | nucleoplasm | nucleotide binding | Spliceosome |
| | WT1 | 11 | negative regulation of transcription from RNA polymerase II promoter | nucleus | transcriptional activator activity | Transcriptional misregulation in cancer |
| | FLT3 | 13 | leukocyte homeostasis | nucleus | transmembrane receptor protein tyrosine kinase activity | Cytokine–cytokine receptor interaction |
| | IDH1 | 2 | glyoxylate cycle | cytoplasm | magnesium ion binding | Citrate cycle (TCA cycle) |
| | IDH2 | 15 | carbohydrate metabolic process | mitochondrion | magnesium ion binding | Citrate cycle (TCA cycle) |
| | NRAS | 1 | MAPK cascade | Golgi membrane | GTP binding | MAPK signaling pathway |
| | NPM1 | 1, 10, 15, 2, 5, 8 | DNA repair | nucleus | nucleic acid binding | |
| | PHACTR1 | 6 | actomyosin structure organization | nucleus | actin binding | |
| | PTPN11 | 12, 3, 4 | DNA damage checkpoint | nucleus | phosphoprotein phosphatase activity | Ras signaling pathway |
| | PTPRT | 20 | protein dephosphorylation | plasma membrane | protein tyrosine phosphatase activity | |
| | RUNX1 | 21 | ossification | nucleus | regulatory region DNA binding | Pathways in cancer |
| | TET2 | 4 | kidney development | nucleus | sulfonate dioxygenase activity | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| Brain Lower Grade Glioma | ATRX | X | DNA repair | nuclear chromosome | DNA binding | |
| n = 530 | SETD2 | 3 | angiogenesis | nucleus | protein binding | Lysine degradation |
| | TMEM189-UBE2V1 | 20, 3 | protein polyubiquitination | ubiquitin ligase complex | ubiquitin protein ligase binding | |
| | CACNA1S | 1 | calcium ion transport | cytoplasm | voltage-gated calcium channel activity | MAPK signaling pathway |
| | CIC | 19 | negative regulation of transcription from RNA polymerase II promoter | nucleus | DNA binding | |
| | CDC27 | 17 | metaphase/anaphase transition of mitotic cell cycle | nucleus | protein binding | Cell cycle |
| | CHEK2 | 15, 22, Y | DNA damage checkpoint | chromosome | protein kinase activity | Cell cycle |
| | EGFR | 7 | MAPK cascade | Golgi membrane | glycoprotein binding | MAPK signaling pathway |
| | IDH1 | 2 | glyoxylate cycle | cytoplasm | magnesium ion binding | Citrate cycle (TCA cycle) |
| | IDH2 | 15 | carbohydrate metabolic process | mitochondrion | magnesium ion binding | Citrate cycle (TCA cycle) |
| | KRTAP1-5 | 17 | | keratin filament | | |
| | NF1 | 17 | MAPK cascade | nucleus | GTPase activator activity | MAPK signaling pathway |
| | NOTCH1 | 9 | negative regulation of transcription from RNA polymerase II promoter | Golgi membrane | core promoter binding | Dorso-ventral axis formation |
| | PTEN | 10, 9 | regulation of cyclin-dependent protein serine/threonine kinase activity | extracellular region | magnesium ion binding | Inositol phosphate metabolism |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | PIK3R1 | 5 | cellular glucose homeostasis | nucleus | transmembrane receptor protein tyrosine kinase adaptor activity | ErbB signaling pathway |
| | PLCG1 | 20 | activation of MAPKK activity | ruffle | phosphatidylinositol phospholipase C activity | Inositol phosphate metabolism |
| | PTPN11 | 12, 3, 4 | DNA damage checkpoint | nucleus | phosphoprotein phosphatase activity | Ras signaling pathway |
| | STK19 | 6 | protein phosphorylation | nucleus | protein serine/threonine kinase activity | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| Breast Invasive Carcinoma | AKT1 | 14 | protein import into nucleus | nucleus | protein kinase activity | MAPK signaling pathway |
| n = 1105 | ARID1A | 1 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | DNA binding | |
| | CTCF | 16 | negative regulation of transcription from RNA polymerase II promoter | chromosome | RNA polymerase II core promoter proximal region sequence-specific DNA binding | |
| | GATA3 | 10 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | transcription regulatory region sequence-specific DNA binding | |
| | SH3PXD2A | 10 | superoxide metabolic process | podosome | protein binding | |
| | CDH1 | 16 | homophilic cell adhesion via plasma membrane adhesion molecules | extracellular region | glycoprotein binding | Rap1 signaling pathway |
| | ERBB2 | 17 | MAPK cascade | nucleus | RNA polymerase I core binding | ErbB signaling pathway |
| | FOXA1 | 14 | negative regulation of transcription from RNA polymerase II promoter | nucleus | RNA polymerase II transcription factor activity | |
| | ITPR1 | 3 | response to hypoxia | nuclear inner membrane | ion channel activity | Calcium signaling pathway |
| | KMT2C | 7 | transcription | nucleus | DNA binding | Lysine degradation |
| | MAP2K4 | 17 | apoptotic process | nucleus | protein kinase activity | MAPK signaling pathway |
| | MAP3K1 | 5 | MAPK cascade | cytoplasm | protein kinase activity | MAPK signaling pathway |
| | NR1H2 | 19 | negative regulation of transcription from RNA polymerase II promoter | nucleus | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Insulin resistance |
| | OR5P2 | 11 | G-protein coupled receptor signaling pathway | plasma membrane | G-protein coupled receptor activity | Olfactory transduction |
| | PTEN | 10, 9 | regulation of cyclin-dependent protein serine/threonine kinase activity | extracellular region | magnesium ion binding | Inositol phosphate metabolism |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | PIK3R1 | 5 | cellular glucose homeostasis | nucleus | transmembrane receptor protein tyrosine kinase adaptor activity | ErbB signaling pathway |
| | RUNX1 | 21 | ossification | nucleus | regulatory region DNA binding | Pathways in cancer |
| | TPRX1 | 19 | regulation of transcription | nucleus | DNA binding | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| Colorectal Adenocarcinoma | APC | 5 | mitotic cytokinesis | kinetochore | protein binding | Wnt signaling pathway |
| n = 633 | BRAF | 7 | MAPK cascade | nucleus | protein kinase activity | MAPK signaling pathway |
| | FBXW7 | 4 | protein polyubiquitination | nucleoplasm | ubiquitin-protein transferase activity | Ubiquitin mediated proteolysis |
| | KRAS | 12 | MAPK cascade | intracellular | GTPase activity | MAPK signaling pathway |
| | SMAD4 | 18 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | FoxO signaling pathway |
| | TBP | 6 | DNA-templated transcription | nuclear chromatin | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Basal transcription factors |
| | AR | X | in utero embryonic development | nuclear chromatin | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Oocyte meiosis |
| | ATXN1 | 6 | transcription | nucleus | DNA binding | |
| | CACNA1B | 9 | transport | plasma membrane | voltage-gated calcium channel activity | MAPK signaling pathway |
| | CTNNB1 | 3 | negative regulation of transcription from RNA polymerase II promoter | spindle pole | RNA polymerase II transcription factor binding | Rap1 signaling pathway |
| | GRIA2 | 4 | signal transduction | endoplasmic reticulum membrane | ionotropic glutamate receptor activity | cAMP signaling pathway |
| | IRF5 | 7 | transcription | nucleus | regulatory region DNA binding | Toll-like receptor signaling pathway |
| | KRT1 | 12 | complement activation | extracellular space | receptor activity | |
| | LAMC3 | 9 | cell morphogenesis involved in differentiation | extracellular region | structural molecule activity | PI3K-Akt signaling pathway |
| | NRAS | 1 | MAPK cascade | Golgi membrane | GTP binding | MAPK signaling pathway |
| | NEFH | 22 | microtubule cytoskeleton organization | cytoplasm | structural molecule activity | Amyotrophic lateral sclerosis (ALS) |
| | OPRD1 | 1 | protein import into nucleus | cytoplasm | opioid receptor activity | cGMP-PKG signaling pathway |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | PPM1E | 17 | negative regulation of protein kinase activity | nucleus | protein serine/threonine phosphatase activity | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| Renal Clear Cell | ARAP3 | 5 | cytoskeleton organization | ruffle | GTPase activator activity | Rap1 signaling pathway |
| n = 538 | BAP1 | 3 | regulation of cell growth | intracellular | chromatin binding | |
| | GPR32 | 19 | complement receptor mediated signaling pathway | plasma membrane | complement receptor activity | |
| | SETD2 | 3 | angiogenesis | nucleus | protein binding | Lysine degradation |
| | SRGAP3 | 3 | signal transduction | cytoplasm | GTPase activator activity | Axon guidance |
| | SSX3 | X | transcription | intracellular | nucleic acid binding | |
| | ACACA | 17 | tissue homeostasis | nucleolus | acetyl-CoA carboxylase activity | Fatty acid biosynthesis |
| | BTRC | 10 | G2/M transition of mitotic cell cycle | nucleus | ubiquitin-protein transferase activity | Oocyte meiosis |
| | CDC27 | 17 | metaphase/anaphase transition of mitotic cell cycle | nucleus | protein binding | Cell cycle |
| | FAM104A | 17 | | | protein binding | |
| | FAM151A | 1 | | membrane | | |
| | HEBP1 | 12 | circadian rhythm | extracellular region | heme binding | |
| | KLK1 | 19 | proteolysis | nucleus | serine-type endopeptidase activity | Renin-angiotensin system |
| | OVGP1 | 1 | carbohydrate metabolic process | extracellular region | chitinase activity | |
| | PTEN | 10, 9 | regulation of cyclin-dependent protein serine/threonine kinase activity | extracellular region | magnesium ion binding | Inositol phosphate metabolism |
| | PABPC1 | 12, 8 | nuclear-transcribed mRNA catabolic process | nucleus | nucleotide binding | RNA transport |
| | PBRM1 | 3 | chromatin remodeling | nuclear chromosome | DNA binding | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| | VHL | 3 | negative regulation of transcription from RNA polymerase II promoter | nucleus | ubiquitin-protein transferase activity | HIF-1 signaling pathway |
| | ZNF175 | 19 | transcription | intracellular | nucleic acid binding | |
| Lung Adenocarcinoma | ATM | 11 | DNA damage checkpoint | chromosome | DNA binding | NF-kappa B signaling pathway |
| n = 522 | BRAF | 7 | MAPK cascade | nucleus | protein kinase activity | MAPK signaling pathway |
| | CREBBP | 16 | negative regulation of transcription from RNA polymerase II promoter | histone acetyltransferase complex | core promoter proximal region sequence-specific DNA binding | cAMP signaling pathway |
| | KRAS | 12 | MAPK cascade | intracellular | GTPase activity | MAPK signaling pathway |
| | LRP1B | 2 | receptor-mediated endocytosis | integral component of membrane | calcium ion binding | |
| | SOS1 | 2 | MAPK cascade | intracellular | DNA binding | MAPK signaling pathway |
| | U2AF1 | 21 | mRNA splicing | nucleoplasm | nucleotide binding | Spliceosome |
| | CHEK2 | 15, 22, Y | DNA damage checkpoint | chromosome | protein kinase activity | Cell cycle |
| | DMD | X | positive regulation of cell-matrix adhesion | nucleus | dystroglycan binding | Hypertrophic cardiomyopathy (HCM) |
| | EGFR | 7 | MAPK cascade | Golgi membrane | glycoprotein binding | MAPK signaling pathway |
| | FLG | 1 | multicellular organism development | nucleus | structural molecule activity | |
| | KEAP1 | 19 | in utero embryonic development | nucleoplasm | ubiquitin-protein transferase activity | Ubiquitin mediated proteolysis |
| | NF1 | 17 | MAPK cascade | nucleus | GTPase activator activity | MAPK signaling pathway |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | RYR2 | 1 | response to hypoxia | cell | ryanodine-sensitive calcium-release channel activity | Calcium signaling pathway |
| | STK11 | 19 | regulation of cell growth | nucleus | magnesium ion binding | FoxO signaling pathway |
| | SPTA1 | 1 | MAPK cascade | cytosol | Ras guanyl-nucleotide exchange factor activity | |
| | TNN | 1 | cell-matrix adhesion | proteinaceous extracellular matrix | integrin binding | PI3K-Akt signaling pathway |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| | USH2A | 1 | visual perception | photoreceptor inner segment | protein binding | |
| Ovarian Serous Carcinoma | KIT | 4 | MAPK cascade | acrosomal vesicle | protease binding | Ras signaling pathway |
| n = 603 | ACACB | 12 | acetyl-CoA metabolic process | nucleus | acetyl-CoA carboxylase activity | Fatty acid biosynthesis |
| | ANK1 | 8 | exocytosis | nucleus | structural molecule activity | Proteoglycans in cancer |
| | CDH11 | 16 | skeletal system development | cytoplasm | calcium ion binding | |
| | COL4A4 | 2 | extracellular matrix organization | extracellular region | extracellular matrix structural constituent | PI3K-Akt signaling pathway |
| | COL6A6 | 3 | cell adhesion | extracellular region | | PI3K-Akt signaling pathway |
| | CYP4A11 | 1 | long-chain fatty acid metabolic process | cytoplasm | monooxygenase activity | Fatty acid degradation |
| | EGFR | 7 | MAPK cascade | Golgi membrane | glycoprotein binding | MAPK signaling pathway |
| | GRIN2B | 12 | MAPK cascade | intracellular | NMDA glutamate receptor activity | Ras signaling pathway |
| | GNPAT | 1 | glycerophospholipid metabolic process | mitochondrion | receptor binding | Glycerophospholipid metabolism |
| | IL21R | 16 | natural killer cell activation | integral component of membrane | interleukin-21 receptor activity | Cytokine–cytokine receptor interaction |
| | KAT6B | 10 | nucleosome assembly | nucleosome | DNA binding | |
| | MYH13 | 17 | muscle contraction | muscle myosin complex | microfilament motor activity | Tight junction |
| | MYH2 | 17 | plasma membrane repair | Golgi apparatus | microfilament motor activity | Tight junction |
| | NF2 | 22 | mesoderm formation | ruffle | actin binding | Hippo signaling pathway |
| | PLCH1 | 3 | lipid catabolic process | cytoplasm | phosphatidylinositol phospholipase C activity | Inositol phosphate metabolism |
| | KCNQ5 | 6 | protein complex assembly | plasma membrane | inward rectifier potassium channel activity | Cholinergic synapse |
| | SNTG1 | 8 | cell communication | nucleus | actin binding | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| | ZAN | 7 | binding of sperm to zona pellucida | plasma membrane | | |
| Prostate Adenocarcinoma | APC | 5 | mitotic cytokinesis | kinetochore | protein binding | Wnt signaling pathway |
| n = 499 | BRAF | 7 | MAPK cascade | nucleus | protein kinase activity | MAPK signaling pathway |
| | POLI | 18 | DNA replication | intracellular | damaged DNA binding | Fanconi anemia pathway |
| | EP300 | 22 | negative regulation of transcription from RNA polymerase II promoter | histone acetyltransferase complex | RNA polymerase II core promoter sequence-specific DNA binding | cAMP signaling pathway |
| | RGPD8 | 2 | protein targeting to Golgi | intracellular | Ran GTPase binding | RNA transport |
| | CACNA1A | 19 | sulfur amino acid metabolic process | nucleus | ion channel activity | MAPK signaling pathway |
| | CDC27 | 17 | metaphase/anaphase transition of mitotic cell cycle | nucleus | protein binding | Cell cycle |
| | CHEK2 | 15, 22, Y | DNA damage checkpoint | chromosome | protein kinase activity | Cell cycle |
| | FOXA1 | 14 | negative regulation of transcription from RNA polymerase II promoter | nucleus | RNA polymerase II transcription factor activity | |
| | GRIK3 | 1 | adenylate cyclase-inhibiting G-protein coupled glutamate receptor signaling pathway | plasma membrane | adenylate cyclase inhibiting G-protein coupled glutamate receptor activity | Neuroactive ligand-receptor interaction |
| | IDH1 | 2 | glyoxylate cycle | cytoplasm | magnesium ion binding | Citrate cycle (TCA cycle) |
| | KRTAP4-11 | 17 | | keratin filament | protein binding | |
| | MSH3 | 5 | meiotic mismatch repair | nuclear chromosome | damaged DNA binding | Mismatch repair |
| | PTEN | 10, 9 | regulation of cyclin-dependent protein serine/threonine kinase activity | extracellular region | magnesium ion binding | Inositol phosphate metabolism |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | SPOP | 17 | regulation of proteolysis | nucleus | protein binding | |
| | SYNE1 | 6 | nucleus organization | nucleus | actin binding | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| | FRG1B | 4 | | | | |
| | KMT2C | 7 | transcription | nucleus | DNA binding | Lysine degradation |
| Stomach Adenocarcinoma | ARID1A | 1 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | DNA binding | |
| n = 478 | FAT4 | 4 | branching involved in ureteric bud morphogenesis | intracellular | calcium ion binding | |
| | KRAS | 12 | MAPK cascade | intracellular | GTPase activity | MAPK signaling pathway |
| | LRP1B | 2 | receptor-mediated endocytosis | integral component of membrane | calcium ion binding | |
| | RGPD4 | 2 | protein targeting to Golgi | intracellular | | RNA transport |
| | SMAD4 | 18 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | FoxO signaling pathway |
| | SMARCA4 | 19 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II core promoter proximal region sequence-specific DNA binding | |
| | CDH1 | 16 | homophilic cell adhesion via plasma membrane adhesion molecules | extracellular region | glycoprotein binding | Rap1 signaling pathway |
| | CTNNB1 | 3 | negative regulation of transcription from RNA polymerase II promoter | spindle pole | RNA polymerase II transcription factor binding | Rap1 signaling pathway |
| | CDC27 | 17 | metaphase/anaphase transition of mitotic cell cycle | nucleus | protein binding | Cell cycle |
| | CHEK2 | 15, 22, Y | DNA damage checkpoint | chromosome | protein kinase activity | Cell cycle |
| | FLG | 1 | multicellular organism development | nucleus | structural molecule activity | |
| | MUC6 | 11 | O-glycan processing | extracellular region | extracellular matrix structural constituent | |
| | OBSCN | 1 | protein phosphorylation | cytosol | protein kinase activity | |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | PCDHA3 | 5 | cell adhesion | plasma membrane | calcium ion binding | |
| | RHOA | 3 | transforming growth factor beta receptor signaling pathway | intracellular | GTPase activity | Ras signaling pathway |
| | SPTA1 | 1 | MAPK cascade | cytosol | Ras guanyl-nucleotide exchange factor activity | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |
| | USH2A | 1 | visual perception | photoreceptor inner segment | protein binding | |
| Uterine Corpus Endometrial Carcinoma | AKT1 | 14 | protein import into nucleus | nucleus | protein kinase activity | MAPK signaling pathway |
| n = 548 | ARID1A | 1 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | DNA binding | |
| | CTCF | 16 | negative regulation of transcription from RNA polymerase II promoter | chromosome | RNA polymerase II core promoter proximal region sequence-specific DNA binding | |
| | EP300 | 22 | negative regulation of transcription from RNA polymerase II promoter | histone acetyltransferase complex | RNA polymerase II core promoter sequence-specific DNA binding | cAMP signaling pathway |
| | FBXW7 | 4 | protein polyubiquitination | nucleoplasm | ubiquitin-protein transferase activity | Ubiquitin mediated proteolysis |
| | KRAS | 12 | MAPK cascade | intracellular | GTPase activity | MAPK signaling pathway |
| | TIAM1 | 21 | cardiac muscle hypertrophy | nucleus | receptor signaling protein activity | Ras signaling pathway |
| | CTNNB1 | 3 | negative regulation of transcription from RNA polymerase II promoter | spindle pole | RNA polymerase II transcription factor binding | Rap1 signaling pathway |
| | CHD4 | 12 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Viral carcinogenesis |
| | ESR1 | 6 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Estrogen signaling pathway |
| | FGFR2 | 10 | negative regulation of transcription from RNA polymerase II promoter | extracellular region | protein tyrosine kinase activity | MAPK signaling pathway |
| | FLNA | X | platelet degranulation | extracellular region | G-protein coupled receptor binding | MAPK signaling pathway |
| | FOXA2 | 20 | positive regulation of transcription from RNA polymerase II promoter by glucose | nucleus | RNA polymerase II core promoter proximal region sequence-specific DNA binding | Maturity onset diabetes of the young |
| | PTEN | 10, 9 | regulation of cyclin-dependent protein serine/threonine kinase activity | extracellular region | magnesium ion binding | Inositol phosphate metabolism |
| | PIK3CA | 3 | angiogenesis | intracellular | protein serine/threonine kinase activity | Inositol phosphate metabolism |
| | PIK3R1 | 5 | cellular glucose homeostasis | nucleus | transmembrane receptor protein tyrosine kinase adaptor activity | ErbB signaling pathway |
| | PRPF8 | 17 | spliceosomal tri-snRNP complex assembly | nucleus | second spliceosomal transesterification activity | Spliceosome |
| | PPP2R1A | 19 | G2/M transition of mitotic cell cycle | protein phosphatase type 2A complex | antigen binding | mRNA surveillance pathway |
| | SPOP | 17 | regulation of proteolysis | nucleus | protein binding | |
| | TP53 | 17 | negative regulation of transcription from RNA polymerase II promoter | nuclear chromatin | RNA polymerase II regulatory region sequence-specific DNA binding | MAPK signaling pathway |

Consensus driver genes

We obtained a consensus of the top 20 driver genes for each cancer considered from the DriverDB database [26], based on identification by at least 2 tools (the default) for each cancer, since requesting a higher consensus could result in fewer than 20 driver genes for some cancers. DriverDB assembles lists of the top-ranked driver genes identified by 15 packages, including ActiveDriver, Dendrix, MDPFinder, Simon, NetBox, OncodriveFM, MutSigCV, MEMo, CoMDP, DawnRank, DriverNet, e-Driver, iPAC, MSEA, and OncodriveCLUST. Table 1 lists the cancer types, sample sizes, and descriptions of the driver genes used, including chromosome location, gene ontology nomenclature, and molecular pathway.
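
The consensus rule (a gene is kept when at least 2 tools report it) can be sketched in Python as follows. This is an illustration only: the tool outputs are made-up toy lists, the function name `consensus_drivers` is hypothetical, and ranking by number of supporting tools is an assumption of the sketch rather than a description of how DriverDB orders its results.

```python
from collections import Counter


def consensus_drivers(tool_calls, min_tools=2, top_n=20):
    """tool_calls: dict mapping tool name -> iterable of gene symbols."""
    votes = Counter()
    for genes in tool_calls.values():
        votes.update(set(genes))  # each tool votes at most once per gene
    kept = [(gene, v) for gene, v in votes.items() if v >= min_tools]
    # Assumed ranking: most supporting tools first, then alphabetical.
    kept.sort(key=lambda gv: (-gv[1], gv[0]))
    return [gene for gene, _ in kept[:top_n]]


# Toy tool outputs (not real DriverDB results).
toy_calls = {
    "MutSigCV": ["TP53", "KRAS", "APC"],
    "OncodriveFM": ["TP53", "APC"],
    "DriverNet": ["TP53", "EGFR"],
}
consensus = consensus_drivers(toy_calls)
```

Raising `min_tools` shrinks the returned list, which mirrors the stated reason for staying with a threshold of 2.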

Driver events: mutations, deletions, amplifications, downregulation, and upregulation

Data for all somatic mutations were obtained directly from cBio-Portal. We also acquired high-confidence deletions and amplifications from cBio-Portal, where a deletion was defined as full homozygous loss with a GISTIC score [27] of −2, and an amplification was defined as high-level gain with a GISTIC score of 2. Low-level deletions (heterozygous loss) and low-level gains (low-level amplifications) with GISTIC scores of −1 and 1, respectively, were not used. Downregulation and upregulation of RNA-Seq based expression were also obtained from cBio-Portal, where Z-scores less than −1.96 were assumed to represent downregulation, and Z-scores greater than 1.96 were assumed to represent upregulation. Since there were 20 driver genes considered per cancer and 5 “driver events” considered per gene (mutations, deletions, amplifications, downregulation, upregulation), each datafile for n tumor samples consisted of 100 binary (0/1) variables, where 20 columns represented the presence of somatic mutations (1: yes, 0: no), 20 columns represented the presence of deletions (1: GISTIC = −2, 0: otherwise), 20 columns represented the presence of amplifications (1: GISTIC = 2, 0: otherwise), 20 columns represented downregulation (1: Z < −1.96, 0: Z ≥ −1.96), and 20 columns represented upregulation (1: Z > 1.96, 0: Z ≤ 1.96).
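
The thresholds above translate directly into a per-gene, per-tumor encoding. The following Python sketch is illustrative: the function name `encode_events` and its argument names are hypothetical, and the inputs are assumed to be a mutation flag, a discrete GISTIC score, and an expression Z-score already retrieved for one gene in one tumor.

```python
def encode_events(mutated, gistic, zscore):
    """Return the five binary driver-event indicators for one gene in one tumor."""
    return {
        "mutation": int(bool(mutated)),
        "deletion": int(gistic == -2),        # full homozygous loss only
        "amplification": int(gistic == 2),    # high-level gain only
        "downregulation": int(zscore < -1.96),
        "upregulation": int(zscore > 1.96),
    }


# A mutated gene with homozygous loss and strongly reduced expression.
example = encode_events(mutated=True, gistic=-2, zscore=-2.5)
# Low-level gain (GISTIC = 1) is ignored, as in the text.
ignored = encode_events(mutated=False, gistic=1, zscore=0.3)
```

Applying this to 20 genes yields the 100 binary columns per tumor described above.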

Permutation-based affinity matrix (PBAM)

The methods described in the current and following sections are collectively called “PRogression Inference of Somatic Mutations in CAncer” (PRISM-CA). A permutation-based affinity matrix (PBAM) approach was employed for identifying the order of events among the top 20 driver genes used. The PBAM approach to infer sequential order of genomic alterations is non-parametric and independent of fitness, waiting times, and baseline somatic mutation rates. Define an event as a boolean outcome of true (or binary 1 vs. 0), for either a somatic mutation, deletion, amplification, downregulation, or upregulation of a driver gene. Thus, for 20 top driver genes and 5 possible outcomes per gene (mutation, deletion, amplification, downregulation, or upregulation), there are $20 \times 5 = 100$ potential events for each tumor sample. Let n be the total number of tumor samples of a given cancer histological subtype. The assumed data file for each cancer employed will therefore have 100 columns (binary variables) for the 100 events and n rows representing tumor samples. For each jth event, calculate the frequency, , of the event among the n tumor samples. For a pair of events j and k, the affinity between event j and k is defined as , where is the co-occurrence frequency of events j and k among the n tumor samples, and is the between-event distance for events j and k defined as where is the cardinality of tumors with co-occurring events j and k, and is the co-occurrence frequency for all other events along with event k (within the ith sample). The co-occurrence frequency is initially set to , where is the calculated frequency based on the data. Note that and matrices have dimensions $p \times p$, where p is the total number of events for a particular cancer type. The summation on the right side of (1) is performed over all pairs of events in each ith sample, while the summation on the left is over all tumor samples with events j and k.
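
As a minimal sketch of the starting quantities only, the Python below computes per-event frequencies and pairwise co-occurrence counts from a binary tumor-by-event matrix. It does not reproduce the affinity or between-event distance of the paper; the function names are hypothetical, and treating "frequency" as a raw count is an assumption of the sketch.

```python
from itertools import combinations


def event_frequencies(rows):
    """Number of tumors in which each event (column) is present."""
    p = len(rows[0])
    return [sum(r[j] for r in rows) for j in range(p)]


def cooccurrence_counts(rows):
    """p x p matrix of how many tumors carry both event j and event k."""
    p = len(rows[0])
    c = [[0] * p for _ in range(p)]
    for r in rows:
        present = [j for j in range(p) if r[j]]
        for j, k in combinations(present, 2):
            c[j][k] += 1
            c[k][j] += 1
    return c


# Toy matrix: 4 tumors (rows) by 3 events (columns).
toy = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0]]
freq = event_frequencies(toy)
cooc = cooccurrence_counts(toy)
```

The counts start out symmetric; any asymmetry between the (j, k) and (k, j) entries arises only from the later permutation-based updates.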
Since each ith tumor sample with events j and k can have up to total events, the number of permutations of event labels, , for the ith tumor sample is ! For each permutation of event labels representing events in the ith tumor sample, we have where and are the subscripts for represented by events j and k, respectively, within a permutation. Once values of are determined for the ith tumor sample, calculate the permutation-specific probability for the ith tumor sample as A permutation, or specific order of events, was assumed to be significant if its observed probability, , was greater than the expected probability of ! After looping through all of the ! permutations per tumor sample with genes j and k mutated, we updated the co-occurrence frequency for events j and k using the relationship where represents cases where j precedes k within a permutation, and represents cases where k precedes j within the same permutation. The above steps were repeated 10 times to iteratively update values of , which were used to update values of and at the beginning of each iteration. As we looped through all possible pairwise values of j and k during the last iteration, we compiled a list based on (only) the most significant value of for each tumor sample for each pairwise combination of j and k. From this list of the most significant permutations over all values of j and k, we determined the frequency that each event occupies the 1st, 2nd, 3rd, 4th, or 5th element in all of the output permutations. At convergence, C is not a symmetric matrix, since non-zero elements in the jth row provide the out-links of the jth node (from node j to others), while non-zero elements in the kth column provide its in-links (from others to node k). This leads to a dichotomy for each event, where the in-link can be compared with its out-link, essentially revealing whether event j is more of an information provider than an information receiver based on the relative values of and . 
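The final tallying step, counting how often each event occupies the 1st, 2nd, 3rd, and later positions among the selected orderings, can be sketched as follows. The event names and the list of "most significant" orderings are invented for illustration; how an ordering is scored and selected is not shown here.

```python
from collections import Counter
from itertools import permutations
from math import factorial
from typing import Dict, Sequence, Tuple

# For m events in one tumor sample there are m! possible orderings.
events = ["TP53_m", "PIK3CA_m", "PTEN_d"]   # hypothetical event labels
orderings = list(permutations(events))
n_orderings = len(orderings)                 # equals factorial(3) = 6


def position_frequencies(selected: Sequence[Tuple[str, ...]]) -> Dict[int, Counter]:
    """Tally how often each event appears at each 1-based position."""
    tally: Dict[int, Counter] = {}
    for ordering in selected:
        for position, event in enumerate(ordering, start=1):
            tally.setdefault(position, Counter())[event] += 1
    return tally


# Made-up list of orderings, one per sample and event pair.
selected = [
    ("TP53_m", "PIK3CA_m", "PTEN_d"),
    ("TP53_m", "PTEN_d"),
    ("PIK3CA_m", "TP53_m"),
]
tally = position_frequencies(selected)
```

Because the number of orderings grows factorially, exhaustive enumeration of this kind is only practical when each sample carries a modest number of co-occurring events.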
We first filtered C by zeroing the smaller of each pair of and in the lower and upper triangulars. The next section describes an additional filtration method applied to C. Figure 1 lists the computational steps involved in equations (1)–(4). During all runs for all histological subtypes considered, we monitored , which increased with decreasing increments during the iterations.
Figure 1

Computational algorithm for Permutation-based Affinity Matrix (PBAM) method.
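The first filtering step on C, zeroing the smaller of each pair of mirrored elements, might look like the sketch below. The function name is hypothetical, and the tie rule (keeping the upper-triangular entry when both directions are equal) is an assumption, since the text does not say how ties were handled.

```python
from typing import List


def keep_dominant_direction(c: List[List[float]]) -> List[List[float]]:
    """For each pair (j, k), keep the larger of c[j][k] and c[k][j] and zero the other."""
    p = len(c)
    out = [row[:] for row in c]          # work on a copy
    for j in range(p):
        for k in range(j + 1, p):
            if out[j][k] >= out[k][j]:   # assumption: ties keep the j -> k direction
                out[k][j] = 0.0
            else:
                out[j][k] = 0.0
    return out


c = [
    [0.0, 0.8, 0.1],
    [0.3, 0.0, 0.5],
    [0.4, 0.2, 0.0],
]
filtered = keep_dominant_direction(c)
```

After this step every pair of events is connected in at most one direction, which is what allows the matrix to be read as a directed graph later on.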

Matrix filtration with Suppes probabilistic causation

Given a pair of events j and k, we can infer whether event j is likely to be causal of event k, or vice versa. Since the event data are binary (boolean), we first estimated the marginal and conditional probabilities of the two events from the data. Suppes' definition of probabilistic causation [28] states that j is a prima facie cause of k if there is temporality, such that event j occurs more frequently than event k, i.e., P(j) > P(k), and j is a probability raiser of k, i.e., P(k | j) > P(k | ¬j) [29]. Taken collectively, the temporality condition and the probability raising condition are well suited to describing the selective advantage characteristics of the accumulation of genomic alterations during tumor progression. Temporality assumes that earlier occurring genomic events occur more frequently, and probability raising assumes that observing j raises the probability of observing k. These conditions were used for filtering C: elements of C were set to zero if P(j) ≤ P(k) or P(k | j) ≤ P(k | ¬j).
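A minimal sketch of the two Suppes checks for one ordered pair of binary event vectors is given below. It assumes the usual formulation, temporality as P(j) > P(k) and probability raising as P(k | j) > P(k | ¬j), with probabilities estimated as simple sample proportions. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def suppes_conditions(x_j: Sequence[int], x_k: Sequence[int]) -> Tuple[bool, bool]:
    """Return (temporality, probability_raising) for the candidate edge j -> k."""
    n = len(x_j)
    n_j = sum(x_j)
    n_k = sum(x_k)
    n_jk = sum(a * b for a, b in zip(x_j, x_k))   # samples with both events

    temporality = n_j / n > n_k / n               # P(j) > P(k)

    if n_j == 0 or n_j == n:
        # P(k | j) or P(k | not j) is undefined; treat as "not raising".
        raising = False
    else:
        p_k_given_j = n_jk / n_j
        p_k_given_not_j = (n_k - n_jk) / (n - n_j)
        raising = p_k_given_j > p_k_given_not_j
    return temporality, raising


x_j = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # event j in 6 of 10 samples
x_k = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # event k in 3 of 10, always alongside j
forward = suppes_conditions(x_j, x_k)   # candidate edge j -> k
reverse = suppes_conditions(x_k, x_j)   # candidate edge k -> j
```

In this toy case only the j -> k direction satisfies both conditions, so the corresponding element of C would be kept and its mirror element zeroed.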

Polytree generation with weighted directed acyclic graphs

Once the affinity matrix C was filtered for Suppes probabilistic causation, we retained only the greatest column values of C, and zeroed out the remaining column entries. This was done to prevent child nodes from having multiple parent nodes. Matrix C was then employed as an adjacency matrix A to generate a weighted directed acyclic graph (WDAG), , with vertices (nodes) and edges between vertices j and k. This essentially resulted in . A depth first search of G was then performed in order to uncover the structure of all possible depth-first trees with separate roots. Plots of tree forests were generated with edges which were colored according to their value of . We also outlined graph vertices (nodes) with colors representing the number of Pubmed publications found whose titles included the common gene symbol and “mutation,” “deletion,” “amplification,” “downregulated” or “downregulation,” or “upregulated” or “upregulation” for the gene's relevant event.
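The two steps in this paragraph, keeping only the strongest parent of each node and then walking the resulting forest depth first, can be sketched as below. This is an illustrative stdlib-only version; the function names, the tie rule (first maximum wins), and the choice to skip isolated nodes are assumptions.

```python
from typing import Dict, List


def keep_strongest_parent(c: List[List[float]]) -> List[List[float]]:
    """Keep only the largest entry in each column, so each node has at most one parent."""
    p = len(c)
    a = [[0.0] * p for _ in range(p)]
    for k in range(p):
        column = [c[j][k] for j in range(p)]
        best = max(range(p), key=lambda j: column[j])   # first maximum on ties
        if column[best] > 0:
            a[best][k] = column[best]
    return a


def depth_first_trees(a: List[List[float]]) -> Dict[int, List[int]]:
    """Map each root (no parent, at least one child) to its depth-first node order."""
    p = len(a)
    has_parent = [any(a[j][k] > 0 for j in range(p)) for k in range(p)]
    has_child = [any(a[j][k] > 0 for k in range(p)) for j in range(p)]
    trees: Dict[int, List[int]] = {}
    for root in range(p):
        if has_parent[root] or not has_child[root]:
            continue
        order, stack = [], [root]
        while stack:
            node = stack.pop()
            order.append(node)
            children = [k for k in range(p) if a[node][k] > 0]
            stack.extend(reversed(children))   # visit lower-numbered children first
        trees[root] = order
    return trees


c = [
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.2, 0.0],
]
a = keep_strongest_parent(c)
trees = depth_first_trees(a)
```

Here node 3 loses both of its weaker out-links and drops out of the forest, leaving a single chain rooted at node 0.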

Mutual information Bayesian network (MIBN)

A Bayesian network approach was also developed using the Chow–Liu algorithm [30], for which between-event mutual information was determined in the form where , is the number of tumor samples with co-occurrence of events x and y, and and are the number of tumor samples having singleton events. Prim's algorithm [31] was then applied to the WDAG to remove unconnected edges in order to construct the forest of trees. It warrants noting that the MI event-by-event matrix in MIBN is symmetric, while PBAM's C matrix is not symmetric. The adjacency matrix for tree construction was based on connected edge weights, which were also filtered using Suppes definition of probabilistic causation.
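For orientation, the sketch below computes the textbook mutual information (in bits) between two binary event vectors and builds a maximum-weight spanning tree with Prim's algorithm, which is the core of the Chow–Liu procedure. The exact mutual information expression used in MIBN may differ from this textbook form, and the function names and toy weights are hypothetical.

```python
from math import log2
from typing import List, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Textbook MI (bits) between two binary vectors, from their 2 x 2 table."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            n_ab = sum(1 for u, v in zip(x, y) if u == a and v == b)
            n_a = sum(1 for u in x if u == a)
            n_b = sum(1 for v in y if v == b)
            if n_ab > 0:                      # empty cells contribute nothing
                mi += (n_ab / n) * log2(n_ab * n / (n_a * n_b))
    return mi


def maximum_spanning_tree(w: List[List[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm on a symmetric weight matrix, growing from node 0."""
    p = len(w)
    in_tree = {0}
    edges: List[Tuple[int, int]] = []
    while len(in_tree) < p:
        _, j, k = max(
            (w[j][k], j, k)
            for j in in_tree
            for k in range(p)
            if k not in in_tree
        )
        edges.append((j, k))
        in_tree.add(k)
    return edges


mi_same = mutual_information([1, 1, 0, 0], [1, 1, 0, 0])    # identical events
mi_indep = mutual_information([1, 1, 0, 0], [1, 0, 1, 0])   # independent events

weights = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.4],
    [0.1, 0.4, 0.0],
]
tree_edges = maximum_spanning_tree(weights)
```

Because the weight matrix is symmetric, the spanning tree itself is undirected; edge direction has to come from a separate criterion, such as the Suppes conditions described above.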

Randomization tests for temporality and probability raising

We developed a randomization test to determine the significance of temporality and probability raising between all possible pairs of events j and k represented by elements of C and MI which met the temporality and probability raising conditions during previous filtration. The p-value for temporality was based on the number of times the difference exceeded , divided by 500. The p-value for the test of significance for probability raising was based on the number of times the difference exceeded divided by 500. During each bth iteration, binary outcomes () were randomly shuffled between events (binary variables) j and k. Graph edges for pairs of nodes with significant p-values () were dashed using various line styles to represent full prima facie causation (—) where both temporality and probability raising conditions were met, only the temporality condition was met (- - -), only the probability raising condition existed (-.-.), or neither (....).
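One way to read the shuffling scheme is sketched below for the temporality test. It assumes that "shuffled between events" means swapping the j and k values within a sample with probability 0.5, that the test statistic is the difference in event proportions, and that ties count toward the p-value. All three choices are assumptions, as is the function name.

```python
import random
from typing import Sequence


def temporality_p_value(x_j: Sequence[int], x_k: Sequence[int],
                        n_iter: int = 500, seed: int = 0) -> float:
    """Randomization p-value for P(j) - P(k) under within-sample label swaps."""
    rng = random.Random(seed)
    n = len(x_j)
    observed = (sum(x_j) - sum(x_k)) / n
    exceed = 0
    for _ in range(n_iter):
        shuffled_j, shuffled_k = [], []
        for a, b in zip(x_j, x_k):
            if rng.random() < 0.5:        # swap the two event labels in this sample
                a, b = b, a
            shuffled_j.append(a)
            shuffled_k.append(b)
        diff = (sum(shuffled_j) - sum(shuffled_k)) / n
        if diff >= observed:              # assumption: ties count as exceedances
            exceed += 1
    return exceed / n_iter


x_j = [1] * 40 + [0] * 10   # event j in 40 of 50 samples
x_k = [1] * 5 + [0] * 45    # event k in 5 of 50 samples
p_strong = temporality_p_value(x_j, x_k)
p_equal = temporality_p_value(x_j, x_j)   # identical events: no evidence of ordering
```

The same loop could be reused for probability raising by replacing the statistic with the difference between the two conditional proportions.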

Results

Figures 2 and 3 illustrate the distribution of between-event PBAM affinity values (elements of C) and MIBN mutual information values MI(x,y) at convergence, which were employed for weighted directed acyclic graph generation. Figure 2 reveals that the distribution of affinity values for the cancers investigated is right-skewed, with the median typically falling near zero. Renal clear cell carcinoma resulted in the greatest interquartile range (IQR) of affinity values, and ovarian serous cancer resulted in the smallest. Driver genes having the greatest degree of “communication”, that is, the most extreme affinity values in the right whiskers of the boxes, are dominated by only a few drivers such as TP53 and PIK3CA, whereas the bulk of the remaining genes have their values within the IQRs. This would imply that renal clear cell cancer has a greater proportion of drivers sharing information (greatest IQR), while ovarian serous cancers, with a narrower IQR of affinity values, share information to a lesser extent. In contrast, Figure 3 reveals a left-skewed distribution of MI(x,y) values for the cancers considered, showing much more variation in median values. Acute myeloid leukemia exhibits the lowest median value of roughly 7.2, and breast carcinoma exhibits the highest median of 9.8. Overall, there was not as much variation in the range of the MI(x,y) values across the cancers when compared with the affinity values. In contrast to the cancer-specific IQRs for the affinity values, the IQRs of MI(x,y) are approximately the same across cancers; however, the median values of MI(x,y) are drastically different, especially between acute myeloid leukemia and breast invasive cancer. Thus, the between-cancer differences in information sharing represented by the affinity values tend to be revealed by the extreme values, tail length, and width of the IQR, whereas between-cancer differences in MI(x,y) tend to depend mostly on median values.
Figure 2

Box plot showing distribution of between-event PBAM c matrix element values used for weighted directed acyclic graph generation.

Figure 3

Box plot showing distribution of between-event MIBN MI(x,y) mutual information values used for weighted directed acyclic graph generation.

The clinical impact of driver genes can best be portrayed by their impact on overall survival (OS) of patients. Table 2 lists the significant hazard ratios (HR) for OS, adjusted for age at diagnosis, and the 95% confidence intervals. The majority of gene alterations were deleterious rather than protective, and we observed the greatest HR of 8.86 (95% CI, 2.80–27.98) from the mutation of PLCH1 in ovarian serous cancer. This is followed by the suppressed expression of KMT2C in breast adenocarcinoma, resulting in an HR of 7.83 (95% CI, 1.08–56.70), amplification of ACACA in renal clear cell carcinoma with an HR of 7.69 (95% CI, 1.06–56.03), and enhanced expression of IDH1 in prostate adenocarcinoma, resulting in an HR of 7.18 (95% CI, 1.72–29.98). We also observed a protective effect from several genes, including the mutations of CTNNB1 and SMARCA4 in stomach adenocarcinoma, which resulted in HRs of 0.30 (95% CI, 0.11–0.82) and 0.35 (95% CI, 0.13–0.96), respectively, and mutations of ARID1A, KRAS, and PTEN in uterine cancer, with HRs of 0.17 (95% CI, 0.05–0.54), 0.30 (95% CI, 0.09–0.95), and 0.40 (95% CI, 0.22–0.71), respectively.
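The hazard ratios and confidence intervals follow from the Cox coefficients and standard errors in the usual way, HR = exp(β) with limits exp(β ± 1.96 s.e.). The short check below uses the TP53 mutation row for acute myeloid leukemia from Table 2 (β = 0.9981, s.e. = 0.3093). The function name is hypothetical, and the 1.96 multiplier is assumed to be the normal quantile used for the 95% interval.

```python
from math import exp
from typing import Tuple


def hazard_ratio(coef: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (HR, lower 95% limit, upper 95% limit) from a Cox coefficient and its s.e."""
    return exp(coef), exp(coef - z * se), exp(coef + z * se)


# TP53_m in acute myeloid leukemia (Table 2): Coef. = 0.9981, s.e. = 0.3093
hr, lower, upper = hazard_ratio(0.9981, 0.3093)
```

The result agrees, after rounding, with the tabulated HR of 2.71 and interval (1.480, 4.975). A negative coefficient gives an HR below 1, which the table legend labels "protective".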
Table 2

Clinical relevance of genomic events evaluated in the context of survival analysis. Hazard ratios (HR) based on Cox proportional hazards regression with adjustment for age at diagnosis. HR compares the mortality rate among subjects with a given event with the average mortality rate among all subjects with the given cancer. Genes whose HR > 1 are termed “deleterious,” while genes whose HR < 1 are called “protective.” Survival time variable: overall survival in months (TCGA field: OS_MONTH), censoring variable: overall survival status (TCGA field: OS_STATUS). Gene symbol subscripts are: _m – mutation, _d – deletion, _a – amplification, _s – suppressed expression, _e – enhanced expression. Results sorted by p-values.

Cancer site
  Event      Coef. (βj)  s.e. (βj)  Z         Prob.   HR    (95% CI)

Acute Myeloid Leukemia
  TP53_m     0.9981      0.3093     3.2268    0.0012  2.71  (1.480, 4.975)
  RUNX1_a    1.8958      0.6300     3.0089    0.0026  6.66  (1.937, 22.893)
  DNMT3A_m   0.4935      0.2173     2.2708    0.0231  1.64  (1.070, 2.508)
  U2AF1_a    0.9604      0.4729     2.0307    0.0422  2.61  (1.034, 6.602)
  NPM1_m     0.4309      0.2137     2.0167    0.0437  1.54  (1.012, 2.339)

Brain Lower Grade Glioma
  EGFR_a     1.4690      0.2461     5.9676    0.0000  4.34  (2.682, 7.039)
  PTEN_s     1.3346      0.2641     5.0522    0.0000  3.80  (2.263, 6.375)
  EGFR_m     1.7053      0.3470     4.9133    0.0000  5.50  (2.787, 10.866)
  NF1_s      1.8884      0.4042     4.6710    0.0000  6.61  (2.992, 14.598)
  NF1_m      1.6238      0.3512     4.6232    0.0000  5.07  (2.548, 10.097)
  EGFR_e     0.8338      0.2034     4.0982    0.0000  2.30  (1.545, 3.430)
  IDH1_e     1.1112      0.2828     3.9286    0.0000  3.04  (1.745, 5.289)
  PTEN_m     1.5066      0.3858     3.9042    0.0000  4.51  (2.118, 9.611)
  CHEK2_e    0.9563      0.2592     3.6889    0.0002  2.60  (1.566, 4.326)
  PLCG1_e    0.9060      0.2575     3.5185    0.0004  2.47  (1.494, 4.099)
  IDH1_m     −0.50696    0.1881     −2.69438  0.0070  0.60  (0.417, 0.871)
  CACNA1S_m  1.7076      0.7172     2.3809    0.0172  5.52  (1.352, 22.498)
  PTEN_d     1.2200      0.5213     2.3401    0.0192  3.39  (1.219, 9.411)
  CIC_m      −0.78141    0.3463     −2.25623  0.0240  0.46  (0.232, 0.902)
  CDC27_e    0.7492      0.3464     2.1625    0.0305  2.12  (1.073, 4.172)
  TP53_e     0.6866      0.3306     2.0763    0.0378  1.99  (1.039, 3.799)
  CIC_s      −0.49682    0.2500     −1.98657  0.0469  0.61  (0.373, 0.993)

Breast Invasive Carcinoma
  FOXA1_m    1.2905      0.3919     3.2931    0.0009  3.63  (1.686, 7.836)
  CTCF_e     1.2368      0.3910     3.1627    0.0015  3.44  (1.601, 7.413)
  MAP2K4_d   1.1677      0.3928     2.9726    0.0029  3.21  (1.489, 6.943)
  ITPR1_a    1.0064      0.3457     2.9104    0.0036  2.74  (1.389, 5.388)
  NR1H2_a    1.0615      0.4240     2.5030    0.0123  2.89  (1.259, 6.638)
  ERBB2_m    1.1357      0.4571     2.4843    0.0129  3.11  (1.271, 7.628)
  ARID1A_d   1.5909      0.7159     2.2221    0.0262  4.91  (1.206, 19.969)
  PIK3R1_m   1.1007      0.5085     2.1648    0.0304  3.01  (1.110, 8.145)
  KMT2C_d    1.2208      0.5870     2.0796    0.0375  3.39  (1.073, 10.713)
  KMT2C_s    2.0576      1.0102     2.0367    0.0416  7.83  (1.081, 56.699)
  PIK3CA_e   0.5588      0.2826     1.9774    0.0479  1.75  (1.005, 3.043)
  CDH1_m     −0.59829    0.3037     −1.96998  0.0488  0.55  (0.303, 0.997)

Colorectal Adenocarcinoma
  GRIA2_m    1.4665      0.5148     2.8485    0.0043  4.33  (1.580, 11.889)
  FBXW7_e    1.0294      0.4200     2.4505    0.0142  2.80  (1.229, 6.378)
  KRT1_e     1.4667      0.7181     2.0423    0.0411  4.34  (1.061, 17.714)
  FBXW7_s    1.4227      0.7250     1.9623    0.0497  4.15  (1.002, 17.180)

Lung Squamous Cell Carcinoma
  PIK3CA_e   −0.35430    0.1420     −2.49367  0.0126  0.70  (0.531, 0.927)
  APC_m      0.9155      0.4191     2.1843    0.0289  2.50  (1.099, 5.681)
  CTNNA2_a   1.0102      0.5091     1.9842    0.0472  2.75  (1.012, 7.450)

Ovarian Serous Cystadenocarcinoma
  PLCH1_m    2.1810      0.5868     3.7163    0.0002  8.86  (2.803, 27.976)
  GRIN2B_a   0.4998      0.1932     2.5865    0.0096  1.65  (1.129, 2.407)

Prostate Adenocarcinoma
  IDH1_e     1.9711      0.7292     2.7031    0.0068  7.18  (1.719, 29.977)
  POLI_e     1.8943      0.8191     2.3124    0.0207  6.65  (1.335, 33.113)

Renal Clear Cell Carcinoma
  ACACA_e    1.1876      0.2354     5.0436    0.0000  3.28  (2.067, 5.203)
  PABPC1_e   1.1618      0.2350     4.9438    0.0000  3.20  (2.016, 5.065)
  TP53_e     0.9174      0.2377     3.8594    0.0001  2.50  (1.571, 3.988)
  SSX3_e     1.0375      0.2807     3.6964    0.0002  2.82  (1.628, 4.893)
  CDC27_e    1.0453      0.3446     3.0328    0.0024  2.84  (1.447, 5.590)
  FAM104A_e  0.6320      0.2553     2.4754    0.0133  1.88  (1.141, 3.103)
  BAP1_m     0.4940      0.2430     2.0326    0.0420  1.64  (1.018, 2.639)
  ACACA_a    2.0403      1.0130     2.0141    0.0440  7.69  (1.056, 56.026)

Stomach Adenocarcinoma
  USH2A_e    1.1331      0.2741     4.1341    0.0000  3.11  (1.815, 5.314)
  OBSCN_a    1.6007      0.3905     4.0990    0.0000  4.96  (2.306, 10.657)
  USH2A_a    1.7512      0.5906     2.9651    0.0030  5.76  (1.811, 18.337)
  FAT4_m     −0.64513    0.2350     −2.74459  0.0060  0.52  (0.331, 0.832)
  CTNNB1_m   −1.19370    0.5067     −2.35545  0.0185  0.30  (0.112, 0.818)
  SMAD_d     0.6054      0.2638     2.2946    0.0217  1.83  (1.092, 3.073)
  TP53_m     −0.34194    0.1622     −2.10777  0.0350  0.71  (0.517, 0.976)
  SMARCA4_m  −1.03735    0.5085     −2.03990  0.0413  0.35  (0.131, 0.960)

Uterine Corpus Endometrial Carcinoma
  PIK3CA_m   −1.06895    0.3110     −3.43620  0.0005  0.34  (0.187, 0.632)
  PTEN_m     −0.92347    0.2926     −3.15592  0.0016  0.40  (0.224, 0.705)
  AKT1_a     1.4284      0.4629     3.0856    0.0020  4.17  (1.684, 10.339)
  ARID1A_m   −1.78160    0.5895     −3.02214  0.0025  0.17  (0.053, 0.535)
  ESR1_d     1.8821      0.7272     2.5879    0.0096  6.57  (1.579, 27.317)
  PTEN_s     1.4300      0.5980     2.3913    0.0167  4.18  (1.294, 13.494)
  PTEN_d     0.9293      0.3965     2.3435    0.0191  2.53  (1.164, 5.510)
  PPP2R1A_a  1.3301      0.5894     2.2566    0.0240  3.78  (1.191, 12.006)
  AKT1_e     0.8895      0.4253     2.0911    0.0365  2.43  (1.057, 5.603)
  ESR1_a     0.8182      0.3986     2.0528    0.0400  2.27  (1.038, 4.951)
  FBXW7_e    1.2182      0.5935     2.0525    0.0401  3.38  (1.056, 10.821)
  KRAS_m     −1.21089    0.5911     −2.04842  0.0405  0.30  (0.094, 0.949)

Interpreting Figures 4–23

The following sections describe results observed for the 10 cancers considered, covering PBAM results first and then MIBN results for each cancer. Each figure illustrates the degree of information sharing between events by use of gradient color scales for (a) between-event affinity values (elements of C) for PBAM plots or MI(x,y) values for MIBN plots, and (b) the number of PubMed hits for each node (event), based on reports containing the gene symbol and either “somatic mutation”, “deletion”, “amplification”, “downregulation”, or “upregulation”, depending on the event being searched for. The line colors used between nodes (events) are based on the between-event affinity or MI(x,y) value, according to the color gradient scale shown. There are five node colors representing the various gene-related events: mutation, deletion, amplification, downregulation, or upregulation in expression. Since there are 20 driver genes used per cancer, there are 100 potential nodes that can appear in each plot. Events that never co-occur with other events are not shown in the plots; co-occurrence is therefore required for an event to appear in a plot. The number of samples harboring the event, along with the percentage of the total sample size (in parentheses) for each cancer, is listed above the relevant node. Varying line styles are also used to reflect the existence of between-event prima facie causality, temporality only, probability raising only, or neither.
Figure 4

PBAM analysis results for Invasive Breast Carcinoma.

Figure 23

MIBN analysis results for Brain Lower Grade Gliomas.

Breast invasive carcinoma

PBAM-based results for invasive breast carcinoma shown in Figure 4 illustrate that mutations in TP53, PIK3CA, and GATA3, and upregulation of ERBB2 were the main driver events observed in the data [33], [47], [48], [37], [38], [51], [52]. The events which followed TP53 mutations were comprised of a mixture of deletions, amplifications, downregulation and upregulation, with few mutational events, which include PTEN deletions [40], [49], PIK3CA amplifications [38], [50], downregulation of ERBB2 [51], [52], PTEN [51], [44], AKT1 [55], [56], TP53 [53], [54], ARID1A [57], [58], and CTCF [59], [60] and upregulation of TP53 [53], [54], AKT1 [55], [56], PIK3CA [61], [62], and ARID1A [57], [58]. Secondary events following mutations in PIK3CA included upregulation of PTEN [41], [44], RUNX1 [45], [46], and mutations in PTEN [40], [41], ERBB2 [42], [43] and CDH1 [39]. Events downstream of ERBB2 upregulation included amplification of ERBB2 [42], [63] and upregulation of FOXA1 [66], [67] and CDH1 [64], [65]. Events that were likely to be causal after GATA3 mutations included AKT1 mutations [34] and upregulation of GATA3 [35], [36] and CTCF [59], [60]. The MIBN-derived WDAG shown in Figure 5 illustrates that in invasive breast carcinoma, causal inference indicates three main root nodes involving mutations in TP53, PIK3CA, and GATA3. Interestingly, while PIK3CA and GATA3 mutations seem to precede mostly mutations, TP53 seemed to precede deletions, amplifications, and transcriptional alterations. The tree starting with mutations in GATA3 contained one node involving expression changes in GATA3, which have been reported and another involving mutations in AKT1. Within the PIK3CA tree, there were nodes representing mutations in CDH1, PTEN, and ERBB2. The tree rooted by TP53 contained the remaining alterations involving deletions in PTEN, amplifications in PIK3CA and ERBB2, and gene expression changes in AKT1, ERBB2, CDH1, CTCF, FOXA1, PTEN, ARID1A, RUNX1, TP53, and PIK3CA.
Figure 5

MIBN analysis results for Invasive Breast Carcinoma.

Colorectal adenocarcinoma

PBAM results for colorectal adenocarcinoma (Figure 6) indicate that mutations in APC [68], [69] and downregulation of SMAD4 [81], [82] were the main driver events. Although the literature commonly reports mutations in BRAF, PIK3CA, and KRAS as key driver events in colorectal cancer, our results show that mutations in APC are the driver for events in these three genes. Driver events which were secondary to APC mutations included mutations in BRAF [70], [71], KRAS [72], [73], NRAS [74], [75], PIK3CA [76], [77], SMAD4 [78], [79], and TP53 [80], [75]. Whereas the secondary driver events observed to follow SMAD4 downregulation were SMAD4 deletions [83], [84], KRAS amplifications [95], [96], downregulation of TP53 [85], [86], APC [89], [90], KRAS [72], [91], PIK3CA [93], [94], FBXW7 [97], [98], and upregulation of KRAS [72], [91], BRAF [87], [88], CTNNB1 [92], APC [89], [90], PIK3CA [93], [94], FBXW7 [97], [98], SMAD4 [81], [82], and TP53 [85], [86]. The main driver events for MIBN-based analysis (Figure 7) were also mutations in APC and downregulation of SMAD4. Furthermore, we observed that KRAS-induced mutations in SMAD4, upregulation and downregulation in APC, downregulation in PIK3CA, and downregulation in KRAS. A variety of events in the APC tree also include mutations in NRAS and TP53, and upregulation of FBXW7. An apparent pattern in Figure 7 is that the tree rooted by APC is mostly populated with daughter nodes representing deletions, amplifications, and transcriptional changes, while the primary daughter events in the SMAD4 tree are transcriptionally related. Transcriptional changes visible within the SMAD4 tree are downregulation of CTNNB1, FBXW7, TP53 and upregulation of BRAF, CTNNB1, TP53, KRAS, SMAD4, and PIK3CA.
Figure 6

PBAM analysis results for Colorectal Adenocarcinoma.

Figure 7

MIBN analysis results for Colorectal Adenocarcinoma.

Lung adenocarcinoma

PBAM results for lung adenocarcinoma in Figure 8 indicate that mutations in TP53 [99], [100] were the single main driver event, followed by three separate clusters of events. One cluster (left side) included a series of child nodes events representing mostly mutations in BRAF [101], [102], EGFR [103], [104], NF1 [105], and PIK3CA [106], [107]. A second cluster was comprised of a polytree rooted by mutations in RYR2, followed by mutations in KRAS [116], [117], STK11 [100], deletions in KRAS [108], [109], STK11 [110], EGFR [113], [114], amplifications in KRAS [119], [120], downregulation of STK11 [100], and upregulation of KEAP1 [111], [112], TP53 [115], [100], KRAS [118], [100], STK11 [100], NF1 [121]. The third cluster of events was merely an agglomeration of child events consisting of deletions in KEAP1 [122], amplifications of EGFR [125], [126] and PIK3CA [127], downregulation of NF1 [121] and KEAP1 [111], [112], and upregulation of EGFR [123], [124]. Regarding the MIBN-based results for lung adenocarcinoma (Figure 9), there was one tree identified which was rooted by mutations in TP53, and contained mutations in BRAF, EGFR, KRAS, NF1, STK11, and PIK3CA. We also identified reports for deletions in KEAP1, KRAS, STK11, EGFR, and amplifications in EGFR, KRAS, and PIK3CA, and downregulation of NF1, STK11, KEAP1, EGFR, and upregulation of KEAP1, KRAS, STK11, TP53, and NF1.
Figure 8

PBAM analysis results for Lung Adenocarcinoma.

Figure 9

MIBN analysis results for Lung Adenocarcinoma

Ovarian serous cystadenocarcinoma

Our PBAM results for ovarian cancer, shown in Figure 10, identified one major tree rooted by TP53 mutation [129], [130]. This tree contains clusters of alterations in gene amplification, mutation, and upregulation. The upregulation of EGFR and KIT in ovarian cancer has been widely reported in literature [131], [132], [133], [134]. Our results also identified two smaller trees, one rooted by amplification in GRIN2B, which appears to precede upregulation in TP53 [135], [136], and an even smaller tree rooted by amplification of PLCH1. Our MIBN results, shown in Figure 11, identified TP53 as the single precursor event of all alterations in ovarian cancer.
Figure 10

PBAM analysis results for Ovarian Serous Cystadenocarcinoma.

Figure 11

MIBN analysis results for Ovarian Serous Cystadenocarcinoma.

Prostate adenocarcinoma

Figure 12 illustrates results of the PBAM run for prostate adenocarcinoma. The tree has four main root nodes involving mutations in FRG1B, SPOP [139], [140], downregulation in PTEN [147], [148], and upregulation in APC [141]. FRG1B precedes mutations in CHEK2, which has been reported in literature [137], [138]. The tree rooted by SPOP causes deletion and downregulation of genes, including downregulation in APC [141]. The third root with downregulation in PTEN causes a variety of aberrations, including mutations in PTEN [144], downregulation in PTEN [142], [143] and CHEK2 [149], and upregulation in FOXA1 [145], [146]. The fourth tree rooted by upregulation in APC causes mostly other upregulations, such as in the gene PTEN [142], [143], and a few deletion, such as in the gene FOXA1 [145], [146]. By comparison, the MIBN results (Figure 13) have three main root nodes, one of which is mutations of TP53. In this result, the mutation of TP53 precedes mutation in PTEN, deletion of CHEK2, downregulation of FOXA1, and upregulation of PTEN and FOXA1. TP53 was only causal to upregulation and amplification of FOXA1 in the first model. The second root, downregulation of PTEN, drives the deletion of PTEN and mutation of CHEK2. The third root, FRG1BP mutation, is causal to clusters of mutations, including SPOP, deletions, and downregulation of APC. By contrast, we did not observe FRG1B to be such a major parent node in our PBAM results.
Figure 12

PBAM analysis results for Prostate Adenocarcinoma.

Figure 13

MIBN analysis results for Prostate Adenocarcinoma.

Renal clear cell cancer, RCC

From our PBAM run of renal clear cell (RCC) cancer in Figure 14, we observed that alterations in the BAP1 gene appear the most. Our results show six trees, generally broken up by alteration. The first tree, rooted by ARAP3, is causally linked to a multitude of mutations and an upregulation of BAP1 [150], [151]. The second tree is rooted by VHL [152], [153] and drives a mixture of aberrations, including mutation of BAP1 [154], downregulation of BAP1 [150], [151], and upregulation of VHL [155], [200]. Two small trees contained only deletions, none of which have been previously reported in RCC. The tree of mostly downregulations is rooted by PBRM1 [156], [150] and precedes downregulation of SETD2 [157]. Upregulation of ACACA was the parent root of most upregulations, including PBRM1 [156], [150] and SETD2 [157]. This was quite different from our tree results of the MIBN run in Figure 15, for which we had one tree rooted by VHL mutations which was likely causal to all the other alterations in RCC.
Figure 14

PBAM analysis results for Renal Clear Cell Cancer.

Figure 15

MIBN analysis results for Renal Clear Cell Cancer.

Stomach adenocarcinoma

PBAM results for stomach adenocarcinoma can be found in Figure 16. We identified that mutations in ARID1A, LRP1B, and TP53 [172], [173] formed three major parent nodes. Mutation in ARID1A appears to be causal to chains of alterations within the same gene. For example, we observed that ARID1A mutation precedes mutation, downregulation, and deletion within the CDH1 gene [158], [159], [160], [161], [162]. Similarly, we also observed a causal link to upregulation and deletion of TP53 [163], [164]. ARID1A also had several daughter nodes with mutations, including the genes PIK3CA [165], [166], [168] and RHOA [167], [169], [170]. The tree rooted by LRP1B mutation contains a small cluster of amplifications, which includes PIK3CA [168] and KRAS [169], [170], and a cluster of upregulation, which includes PIK3CA [171], [160]. The third tree, rooted by TP53 mutation, is likely causal of groups of mutations, amplifications, downregulations, and upregulations. Downregulation was observed in ARID1A and TP53, and upregulation was observed in ARID1A, CDH1, and MUC6, which have been reported in stomach cancer [174], [175], [163], [164], [174], [175], [160], [161], [176]. Our MIBN results, shown in Figure 17, identified only one parent node consisting of TP53 mutations.
Figure 16

PBAM analysis results for Stomach Adenocarcinoma.

Figure 17

MIBN analysis results for Stomach Adenocarcinoma.

Uterine corpus endometrial carcinoma

For uterine corpus endometrial carcinoma we identified two parent roots (Figure 18) with the PBAM method. One tree root, mutation of PTEN [177], [178], is causally linked to only other mutations, among which were the genes ARID1A, CTNNB1, FGFR2, KRAS, AKT1 [179], [180], [177], [181], [181], [182], [183]. The second tree root, mutation of PIK3CA [183], is causal to the mutation of TP53, which is the driver for a cluster of amplifications, including KRAS [181], and a cluster of upregulations which include KRAS, ARID1A, and PTEN [182], [184], [185], [186], [187]. PIK3CA also drove the deletion and downregulation of PTEN [188], [189], [190], [187], [188]. This was similar to our MIBN results (Figure 19) which show PTEN mutation as the only parent node and PIK3CA mutation as a daughter node but with similar causality as seen in the first run.
Figure 18

PBAM analysis results for Uterine Corpus Endometrial Carcinoma.

Figure 19

MIBN analysis results for Uterine Corpus Endometrial Carcinoma.

Acute myelogenous leukemia, AML

Figure 20 shows our results for a PBAM analysis of AML. There were five main root nodes, observed for mutations in FLT3, IDH2, NPM1, and TP53, and upregulation of RAD21. Most of these mutations have been observed previously in AML [191], [192], [201], [192], [210], [211]. Our first tree shows that mutations in FLT3 precede mutations in WT1 [193]. In the second tree we see that mutations in IDH2 cause further mutation in RUNX1 [194], as well as a host of upregulations in genes such as FLT3, RUNX1, and WT1 [195], [196], [197], [198], [199], [200]. The tree rooted by NPM1 drives mostly mutations, including in the genes DNMT3A, TET2, and KIT [202], [203], [204], [194], [207]; however, there is also upregulation in genes such as CEBPA and NPM1 [205], [206], [208], [209]. The tree rooted by TP53 drives a mixture of gene aberrations, but nothing previously reported in the literature for AML. The final tree shows upregulation of RAD21 as a driver for upregulation in other genes, including KIT [212], [213]. By comparison, Figure 21 shows the causal MIBN-based WDAG for AML mutations. Our MIBN results typically showed fewer trees and more daughters per tree, which is also true for AML. Our WDAG for AML had two main root nodes, observed for mutations in FLT3 and TP53. FLT3 mutations also precede additional mutations, which were observed in NPM1, DNMT3A, KIT, RUNX1, WT1, and TET2. Our WDAG results also identified a causal inference for transcriptional alterations in genes such as FLT3, RUNX1, CEBPA, KIT, NPM1, and WT1.
Figure 20

PBAM analysis results for Acute Myelogenous Leukemia.

Figure 21

MIBN analysis results for Acute Myelogenous Leukemia.

Brain lower grade gliomas

Lower grade brain glioma (Figures 22 and 23) was one of the few cancers for which we did not observe any major difference between the PBAM and MIBN runs. Both runs identified two main root nodes involving mutations in IDH1 [214], [215] and upregulation of EGFR [216], [217], [218], [219]. In both results, mutations in IDH1 preceded mutations in ATRX [216] as well as a variety of deletions, amplifications, and upregulations. The upregulation of NOTCH1 has been previously reported in low grade glioma [217]. Upregulation of EGFR seemed to be causally linked to amplifications in EGFR [220], [221], mutations in NF1 [222], upregulation of TP53 [128], and transcriptional alterations in other genes.
Figure 22

PBAM analysis results for Brain Lower Grade Gliomas.

Discussion

Cancer is a relatively short-term evolutionary process involving initiation and progression [223]. Genetic variation is the key to evolutionary existence and selection, and in cancer this variation is generated via somatic mutations. Natural selection determines the fate of somatic mutations by purifying selection, or reducing the likelihood that deleterious mutations persist, and by positive selection, for which functionally advantageous mutations persist. Tumor mutations persisting until the point of nextgen sequencing are referred to as substitutions, which are responsible, in part, for transformation, growth and progression, drug resistance, invasion and neovascularization, and metastasis. In addition to the viewpoint concerning mutations in tumor suppressor and oncogenes, genetic adaptation of tumors is commonly driven through somatic mutations in genes with basic cellular functions [224]. Therefore, somatic mutations among genes expressed globally throughout many tissues cannot be refuted as a major determinant of cancer. To date, the large-scale nextgen sequencing studies have revealed a high degree of prevalence of somatic mutations in human cancers. As more tumor DNA sequences are analyzed, there will continue to be new information available regarding driver and passenger genes, mutations in housekeeping genes, and the role of somatic mutations in cancer. Our PBAM approach employed iterative calculations to determine gene pairwise selectivity relationships using data on gene-specific alterations and their permutations. The results support tumor progression inference models of evolutionary trajectories, which can be applied to longitudinal studies and clinical trials for the purpose of stratifying patients based on somatic driver events and gene expression. The MIBN method was used as an alternative to PBAM to reveal results hinged to mutual information via a straightforward BN approach. 
Overall, our probabilistic causal models allowed us to draw conclusions about the relationships between genomic alterations among key driver genes for common cancers. For the cancers investigated, PBAM and MIBN were able to partition events into hierarchical trees representing groups of patients with similar events. Altogether, our results indicate that driver mutations in TP53 were the most common across the cancers considered, and these support prima facie causal inference for a host of other aberrations such as deletions, amplifications, downregulation, and upregulation. The major driver events in root nodes of trees often involved TP53, APC, PIK3CA, PTEN, SMAD4, KRAS, BRAF, EGFR, IDH1, SPOP, VHL, etc., which revealed the importance of these driver genes and their alterations. A major difference between PBAM and MIBN was that while PBAM partitioned the events into distinct clusters of patients, MIBN tended to agglomerate the events into a single hierarchy, mostly as a result of mutual information. We tend to favor the results of PBAM over MIBN because it partitions events (patients) into distinct evolutionary trajectories, each of which has a major driver event. MIBN results would suggest that there is commonly a single event with a selective advantage over all subsets of events. Thus, the partitioning characteristic of PBAM is more amenable to the clustering of patients with distinct evolutionary trajectories that parallel patient diagnosis, treatment, and follow-up. The translational value of our results is established by the potential identification of novel patient genotypes, which could prove useful in future studies of molecular markers of therapy and metastatic prediction. Our future investigations will link TCGA clinical data for recurrence, survival, and metastasis to the trees generated to identify whether certain patterns of events confer various levels of risk. 
Application of the PBAM and MIBN approaches to TCGA data has enabled us to view cancer from a distant perspective based on high-granularity genomic alterations which occurred in major driver genes. This view will hopefully foster an appreciation among oncologists and biologists for the translational value of diagnostically partitioning patients according to their deleterious genomic alterations, enrolling such patients in trials involving single- or multi-label treatments associated with prolonged survival, and pursuing longitudinal studies to improve therapeutic strategies. We did not comparatively assess numerous techniques for their computational efficiency, scalability, or differences in selectivity relationships. We also did not employ a gold standard to establish false positive and false negative rates. Rather, we highlighted differences in the polytrees portrayed by PBAM and MIBN. Estimation of the hierarchical structure of gene selectivity events may help identify the major partitions of patients having various mixtures of genomic alterations, as well as identify early and late driver events which may confer a positive selection advantage during tumor progression. The work presented here suggests that investigation of the selectivity relationships among genes can provide new insights into the development of human cancer and can establish new leads for future research on molecular diagnosis and therapeutics for cancer. There are several challenging issues surrounding development of tumor progression models using TCGA data. First, there is the problem of unknown upstream effects of germline polymorphisms, which may lead to a variety of outcomes including deletions and amplifications.
In low-grade gliomas, we did observe a tree rooted by upregulation of EGFR, which has been reported previously [32], and in breast cancer, we observed a tree rooted by upregulation of ERBB2 (HER2), which has been reported to be the result of amplification in 18–20% of breast cancer cases [225]. Second, there were cases of primary events involving gene loss; for example, we observed downregulation of SMAD4 as a root node in colorectal adenocarcinoma, which confers worse survival in stage I–II patients [226] and early recurrence after therapy [227]. Third, tumor heterogeneity is another hallmark of cancer that cannot be easily overcome when constructing models of tumor progression based on genomic alterations. TCGA data are not based on DNA and RNA extraction from single cells, which would be helpful for elucidating heterogeneity; however, the large variation in alleles identified throughout all the TCGA samples used would exacerbate the complexity surrounding our attempt to portray tumor progression via a single picture. We also did not consider DNA methylation status, chromosome aberrations, or microsatellite instability, which would overlay more complexity on the models developed. In conclusion, our data-driven approach to inferring tumor progression models from deleterious genomic alterations in the absence of germline polymorphisms should be interpreted cautiously. Although 90–95% of cancers are sporadic, the inherited component of cancer is nevertheless of great importance, because lifelong exposure to genetic polymorphisms can seed genomic alterations which were not considered in this investigation.

Conclusions

We employed computational methods to derive selectivity relationships between deleterious genomic alterations available in The Cancer Genome Atlas. The results of our computational methods translate to portraits of evolutionary trajectories of events among major cancer driver genes. The utility of our results can be realized by oncologists and biologists who envision partitions of clusters of patients for which selectivity relationships confer certain outcomes. We conclude that the portraits which emerged during construction of the graphs presented can be employed in longitudinal studies of cancer patients to fuse genotype data with prognostic indicators of recurrence, metastasis, and survival.

Declarations

Author contribution statement

Leif E. Peterson: Conceived and designed the experiments; Wrote the paper. Tatiana Kovyrshina: Analyzed and interpreted the data.

Funding statement

This work was funded by NASA (grant NNX12AO52A).

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.