Progression inference for somatic mutations in cancer.

Leif E Peterson^1,2,3,4,5, Tatiana Kovyrshina^1,6.

Abstract

Computational methods were employed to infer the progression of genomic alterations in commonly occurring cancers. Using cross-sectional TCGA data, we computed evolutionary trajectories involving selectivity relationships among pairs of gene-specific genomic alterations (somatic mutations, deletions, amplifications, downregulation, and upregulation) among the top 20 driver genes associated with each cancer. Results indicate that the majority of hierarchies involved TP53, PIK3CA, ERBB2, APC, KRAS, EGFR, IDH1, and VHL, among others. Research into the order and accumulation of genomic alterations among cancer driver genes will continue to grow as the costs of nextgen sequencing subside and as personalized/precision medicine incorporates whole-genome scans into the diagnosis and treatment of cancer.

Entities: Disease Gene

Keywords: Cancer research; Computational biology; Genetics; Oncology

Year: 2017 PMID: 28492066 PMCID: PMC5415494 DOI: 10.1016/j.heliyon.2017.e00277

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

Tumors evolve through a multi-step mutagenic process in which cells acquire resistance to apoptotic and antiproliferative signals, self-sufficiency in growth signals, and unlimited proliferation potential [1]. Tumor cells also activate glycolysis and deactivate oxidative phosphorylation (Warburg effect), evade immune detection by decreasing pH with lactic acid, increase ROS, exhibit chromosome aberrations and telomere shortening, negotiate life support from stromal cells, activate invasion and motility, and coordinate neovascularization [2], [3]. Cancer avoids extinction through genetic selection because its onset commonly occurs at older ages, beyond the reproductive years. If cancer mortality rates decreased with age, cancer would be selected out due to zero fitness within several generations. Genomic instability and the seeding of a mutator phenotype is another hallmark, which can receive positive feedback from clonal expansion of a single pathogenic mutation in key stability pathways, such as DNA repair, replication, or cell-cycle checkpoints [4]. Within several DNA replications, a cascade of mutations ensues, sacrificing stability in the entire genome. While tumors evolve as a consequence of the accumulation of somatic lesions, it is unclear how mutated genes interact to generate the phenotypic hallmarks of cancer. The accumulation of somatic mutations in tumors is rarely detectable during early stages of development, and is difficult to detect at later stages because of the high mutational load [5]. From a population genetics standpoint, homozygous mutations in mismatch-repair genes result in a 100-fold increase in mutation rates [6], [7], [8], [9], but have zero fitness and will not survive [10]. On the other hand, heterozygosity confers a 5- to 10-fold increase in mutations [11], [12], which in the presence of activated DNA repair pathways will result in a tolerable selective advantage.
The foundation of computational biology for tumor progression was laid by Vogelstein et al., who introduced the notion of trajectories of progression, in which tumor progression involves the activation of mutations that lead toward subpopulations of cells of clonal origin [13]. Attolini et al. [14] introduced a bioinformatic approach called RESIC (Retracing the Evolutionary Steps in Cancer) for deducing the temporal sequence of genetic events during tumorigenesis from cross-sectional genomic data of tumors. Several nextgen sequencing datasets were employed, consisting of 70 advanced colorectal cancers, 91 primary human glioblastomas, and 57 acute myelogenous leukemias. In the colorectal cancers, RESIC accurately predicted the temporal sequence of APC, KRAS, and TP53 mutations, in agreement with the order determined by analyzing tumors at different stages of colon cancer formation. For the glioblastoma (GBM) tumors, TP53 was the first gene showing selective pressure for somatic mutations, and in the acute myelogenous leukemia (AML) samples, JAK2 and TET2 were the first genes to exhibit selective pressure. Youn and Simon [15] employed a highly parameterized likelihood-based approach for inferring the order of mutational steps in genes, using nextgen sequencing data for 188 lung cancers [16] and 133 colorectal tumors [17]. Results indicated that KRAS, EGFR, and TP53 were among the first genes showing selective pressure in the lung tumors, while for colorectal tumors selective pressure first appeared in APC, KRAS, and TP53. Lecca et al. [18] describe the TO-DAG (Timed Oncogenetic Directed Acyclic Graph) algorithm applied to 74 human prostate cancer samples that include point mutations, copy number losses and gains, and rearrangements. Gerstung et al. [19] developed a computational approach to infer TO-DAGs from human tumor mutation data, and determined that TO-DAG shows high performance scores on synthetic data and recognizes mutations in gatekeeper tumor suppressor genes as a trigger for several downstream mutational events in the human tumor data. The models generated by TO-DAG have been extensively compared with the trees and graphs inferred by recent tools such as RESIC [14] and CT-CBN [19]. Kang et al. [20] introduced a parametric approach, TOMC (Temporal Order based on Markov Chain), which uses a Markov chain model to estimate the sequential order of gene mutations during tumorigenesis from genome sequencing data. TOMC revealed that tumor suppressor genes tend to be mutated ahead of oncogenes, and these mutations are considered important events for key functional loss and gain during tumorigenesis. A larger workflow approach was used to develop CAPRI [21], which generates acyclic graphs to capture branched, independent, and confluent evolution via bootstrapping, shrinkage, maximum likelihood, and regularization. Caravagna et al. [22] reported on the PiCnIc pipeline, which incorporates CAPRI and is a versatile, modular, and customizable pipeline for extracting ensemble-level progression models from cross-sectional sequenced cancer genomes.
This investigation focuses on the development of cancer progression models derived from cross-sectional genomic data. The models focus on selectivity relationships between mutational events, such that event j selects for event k, resulting in a weighted directed acyclic graph (WDAG) of alterations representing the accumulation of events under selective pressure during cancer progression. We also introduce a permutation-based affinity metric (PBAM) approach, an iterative learning method that combines multi-tumor co-occurrence event statistics and within-tumor order permutations to extract affinity relationships between all possible pairs of events.
The affinity relationships are then filtered by probabilistic causation conditions based on temporality and probability raising, so that relationships which are not admissible are removed. The temporality condition assumes that, if event j occurs earlier than event k, then event j will occur more frequently than event k, that is, $P(j) > P(k)$. The probability raising condition assumes that the occurrence of event k increases the occurrence of event j, i.e., $P(j \mid k) > P(j \mid \bar{k})$. Selectivity relationships stem from the idea that, during tumor clonal evolution, certain genomic alterations confer a selective advantage that increases the probability of subsequent downstream events. The pattern that emerges is one in which greater frequencies of driver events subsume a mixture of later events, above some background noise level. Estimating selectivity relationships among genomic alterations is important because early events may represent important therapeutic targets, while late mutations may play a role in metastases. Investigation of selectivity relationships can also deepen our understanding of tumorigenesis at the genome sequence level, and may help to elucidate functional roles of genomic alterations.
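The two filtering conditions can be illustrated with a minimal Python sketch. It assumes the conditions take the forms P(j) > P(k) (temporality) and P(j | k) > P(j | not k) (probability raising), estimated as simple proportions from a binary tumor-by-event matrix. The function name `passes_causation_filter` and the toy data are hypothetical and are not taken from the authors' software.

```python
def passes_causation_filter(samples, j, k):
    """Check temporality and probability raising for the ordered pair (j, k).

    samples: list of rows, one per tumor, each a list of 0/1 event indicators.
    Assumed forms (not confirmed by the source):
      temporality:         P(j) > P(k)
      probability raising: P(j | k) > P(j | not k)
    """
    n = len(samples)
    n_j = sum(row[j] for row in samples)
    n_k = sum(row[k] for row in samples)
    n_jk = sum(1 for row in samples if row[j] and row[k])

    # Conditional probabilities are undefined if k never or always occurs.
    if n_k == 0 or n_k == n:
        return False

    temporality = n_j > n_k  # same as P(j) > P(k), since n is shared
    p_j_given_k = n_jk / n_k
    p_j_given_not_k = (n_j - n_jk) / (n - n_k)
    return temporality and p_j_given_k > p_j_given_not_k


# Toy data: event 0 is frequent, event 1 occurs only alongside event 0.
toy = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 0], [0, 0]]
print(passes_causation_filter(toy, 0, 1))
print(passes_causation_filter(toy, 1, 0))
```

In this toy matrix the pair (0, 1) is admissible while the reversed pair (1, 0) fails the temporality condition, which is how the filter imposes a direction on an otherwise symmetric co-occurrence.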

Cancer data

The cross-sectional data used in this investigation were derived from nextgen sequencing of tumors in The Cancer Genome Atlas (TCGA) [23]. We investigated mutational events in 10 cancers (Table 1) for which nextgen sequencing and RNA-seq expression data were available from cBio-Portal (http://www.cbioportal.org) [24], [25].

Table 1

Genomic annotation for driver genes employed, including Gene Ontology nomenclature and molecular pathway.

Cancer (sample size)	Gene symbol	Chromosome	GO: Biological Process	GO: Cellular Component	GO: Molecular Function	Pathway
Acute Myeloid Leukemia	CEBPA	19	urea cycle	nucleus	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Non-alcoholic fatty liver disease (NAFLD)
n = 200	DIS3	13	rRNA processing	nuclear exosome (RNase complex)	3'–5'-exoribonuclease activity	RNA degradation
	DNMT3A	2	negative regulation of transcription from RNA polymerase II promoter	chromosome	DNA binding	Cysteine and methionine metabolism
	KIT	4	MAPK cascade	acrosomal vesicle	protease binding	Ras signaling pathway
	KRAS	12	MAPK cascade	intracellular	GTPase activity	MAPK signaling pathway
	RAD21	8	double-strand break repair	nuclear chromosome	transcriptional activator activity	Cell cycle
	SUZ12	17	negative regulation of transcription from RNA polymerase II promoter	sex chromatin	RNA polymerase II core promoter sequence-specific DNA binding
	U2AF1	21	mRNA splicing	nucleoplasm	nucleotide binding	Spliceosome
	WT1	11	negative regulation of transcription from RNA polymerase II promoter	nucleus	transcriptional activator activity	Transcriptional misregulation in cancer
	FLT3	13	leukocyte homeostasis	nucleus	transmembrane receptor protein tyrosine kinase activity	Cytokine–cytokine receptor interaction
	IDH1	2	glyoxylate cycle	cytoplasm	magnesium ion binding	Citrate cycle (TCA cycle)
	IDH2	15	carbohydrate metabolic process	mitochondrion	magnesium ion binding	Citrate cycle (TCA cycle)
	NRAS	1	MAPK cascade	Golgi membrane	GTP binding	MAPK signaling pathway
	NPM1	1, 10, 15, 2, 5, 8	DNA repair	nucleus	nucleic acid binding
	PHACTR1	6	actomyosin structure organization	nucleus	actin binding
	PTPN11	12, 3, 4	DNA damage checkpoint	nucleus	phosphoprotein phosphatase activity	Ras signaling pathway
	PTPRT	20	protein dephosphorylation	plasma membrane	protein tyrosine phosphatase activity
	RUNX1	21	ossification	nucleus	regulatory region DNA binding	Pathways in cancer
	TET2	4	kidney development	nucleus	sulfonate dioxygenase activity
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
Brain Lower Grade Glioma	ATRX	X	DNA repair	nuclear chromosome	DNA binding
n = 530	SETD2	3	angiogenesis	nucleus	protein binding	Lysine degradation
	TMEM189-UBE2V1	20, 3	protein polyubiquitination	ubiquitin ligase complex	ubiquitin protein ligase binding
	CACNA1S	1	calcium ion transport	cytoplasm	voltage-gated calcium channel activity	MAPK signaling pathway
	CIC	19	negative regulation of transcription from RNA polymerase II promoter	nucleus	DNA binding
	CDC27	17	metaphase/anaphase transition of mitotic cell cycle	nucleus	protein binding	Cell cycle
	CHEK2	15, 22, Y	DNA damage checkpoint	chromosome	protein kinase activity	Cell cycle
	EGFR	7	MAPK cascade	Golgi membrane	glycoprotein binding	MAPK signaling pathway
	IDH1	2	glyoxylate cycle	cytoplasm	magnesium ion binding	Citrate cycle (TCA cycle)
	IDH2	15	carbohydrate metabolic process	mitochondrion	magnesium ion binding	Citrate cycle (TCA cycle)
	KRTAP1-5	17		keratin filament
	NF1	17	MAPK cascade	nucleus	GTPase activator activity	MAPK signaling pathway
	NOTCH1	9	negative regulation of transcription from RNA polymerase II promoter	Golgi membrane	core promoter binding	Dorso-ventral axis formation
	PTEN	10, 9	regulation of cyclin-dependent protein serine/threonine kinase activity	extracellular region	magnesium ion binding	Inositol phosphate metabolism
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	PIK3R1	5	cellular glucose homeostasis	nucleus	transmembrane receptor protein tyrosine kinase adaptor activity	ErbB signaling pathway
	PLCG1	20	activation of MAPKK activity	ruffle	phosphatidylinositol phospholipase C activity	Inositol phosphate metabolism
	PTPN11	12, 3, 4	DNA damage checkpoint	nucleus	phosphoprotein phosphatase activity	Ras signaling pathway
	STK19	6	protein phosphorylation	nucleus	protein serine/threonine kinase activity
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
Breast Invasive Carcinoma	AKT1	14	protein import into nucleus	nucleus	protein kinase activity	MAPK signaling pathway
n = 1105	ARID1A	1	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	DNA binding
	CTCF	16	negative regulation of transcription from RNA polymerase II promoter	chromosome	RNA polymerase II core promoter proximal region sequence-specific DNA binding
	GATA3	10	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	transcription regulatory region sequence-specific DNA binding
	SH3PXD2A	10	superoxide metabolic process	podosome	protein binding
	CDH1	16	homophilic cell adhesion via plasma membrane adhesion molecules	extracellular region	glycoprotein binding	Rap1 signaling pathway
	ERBB2	17	MAPK cascade	nucleus	RNA polymerase I core binding	ErbB signaling pathway
	FOXA1	14	negative regulation of transcription from RNA polymerase II promoter	nucleus	RNA polymerase II transcription factor activity
	ITPR1	3	response to hypoxia	nuclear inner membrane	ion channel activity	Calcium signaling pathway
	KMT2C	7	transcription	nucleus	DNA binding	Lysine degradation
	MAP2K4	17	apoptotic process	nucleus	protein kinase activity	MAPK signaling pathway
	MAP3K1	5	MAPK cascade	cytoplasm	protein kinase activity	MAPK signaling pathway
	NR1H2	19	negative regulation of transcription from RNA polymerase II promoter	nucleus	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Insulin resistance
	OR5P2	11	G-protein coupled receptor signaling pathway	plasma membrane	G-protein coupled receptor activity	Olfactory transduction
	PTEN	10, 9	regulation of cyclin-dependent protein serine/threonine kinase activity	extracellular region	magnesium ion binding	Inositol phosphate metabolism
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	PIK3R1	5	cellular glucose homeostasis	nucleus	transmembrane receptor protein tyrosine kinase adaptor activity	ErbB signaling pathway
	RUNX1	21	ossification	nucleus	regulatory region DNA binding	Pathways in cancer
	TPRX1	19	regulation of transcription	nucleus	DNA binding
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
Colorectal Adenocarcinoma	APC	5	mitotic cytokinesis	kinetochore	protein binding	Wnt signaling pathway
n = 633	BRAF	7	MAPK cascade	nucleus	protein kinase activity	MAPK signaling pathway
	FBXW7	4	protein polyubiquitination	nucleoplasm	ubiquitin-protein transferase activity	Ubiquitin mediated proteolysis
	KRAS	12	MAPK cascade	intracellular	GTPase activity	MAPK signaling pathway
	SMAD4	18	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	FoxO signaling pathway
	TBP	6	DNA-templated transcription	nuclear chromatin	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Basal transcription factors
	AR	X	in utero embryonic development	nuclear chromatin	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Oocyte meiosis
	ATXN1	6	transcription	nucleus	DNA binding
	CACNA1B	9	transport	plasma membrane	voltage-gated calcium channel activity	MAPK signaling pathway
	CTNNB1	3	negative regulation of transcription from RNA polymerase II promoter	spindle pole	RNA polymerase II transcription factor binding	Rap1 signaling pathway
	GRIA2	4	signal transduction	endoplasmic reticulum membrane	ionotropic glutamate receptor activity	cAMP signaling pathway
	IRF5	7	transcription	nucleus	regulatory region DNA binding	Toll-like receptor signaling pathway
	KRT1	12	complement activation	extracellular space	receptor activity
	LAMC3	9	cell morphogenesis involved in differentiation	extracellular region	structural molecule activity	PI3K-Akt signaling pathway
	NRAS	1	MAPK cascade	Golgi membrane	GTP binding	MAPK signaling pathway
	NEFH	22	microtubule cytoskeleton organization	cytoplasm	structural molecule activity	Amyotrophic lateral sclerosis (ALS)
	OPRD1	1	protein import into nucleus	cytoplasm	opioid receptor activity	cGMP-PKG signaling pathway
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	PPM1E	17	negative regulation of protein kinase activity	nucleus	protein serine/threonine phosphatase activity
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
Renal Clear Cell	ARAP3	5	cytoskeleton organization	ruffle	GTPase activator activity	Rap1 signaling pathway
n = 538	BAP1	3	regulation of cell growth	intracellular	chromatin binding
	GPR32	19	complement receptor mediated signaling pathway	plasma membrane	complement receptor activity
	SETD2	3	angiogenesis	nucleus	protein binding	Lysine degradation
	SRGAP3	3	signal transduction	cytoplasm	GTPase activator activity	Axon guidance
	SSX3	X	transcription	intracellular	nucleic acid binding
	ACACA	17	tissue homeostasis	nucleolus	acetyl-CoA carboxylase activity	Fatty acid biosynthesis
	BTRC	10	G2/M transition of mitotic cell cycle	nucleus	ubiquitin-protein transferase activity	Oocyte meiosis
	CDC27	17	metaphase/anaphase transition of mitotic cell cycle	nucleus	protein binding	Cell cycle
	FAM104A	17			protein binding
	FAM151A	1		membrane
	HEBP1	12	circadian rhythm	extracellular region	heme binding
	KLK1	19	proteolysis	nucleus	serine-type endopeptidase activity	Renin-angiotensin system
	OVGP1	1	carbohydrate metabolic process	extracellular region	chitinase activity
	PTEN	10, 9	regulation of cyclin-dependent protein serine/threonine kinase activity	extracellular region	magnesium ion binding	Inositol phosphate metabolism
	PABPC1	12, 8	nuclear-transcribed mRNA catabolic process	nucleus	nucleotide binding	RNA transport
	PBRM1	3	chromatin remodeling	nuclear chromosome	DNA binding
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
	VHL	3	negative regulation of transcription from RNA polymerase II promoter	nucleus	ubiquitin-protein transferase activity	HIF-1 signaling pathway
	ZNF175	19	transcription	intracellular	nucleic acid binding
Lung Adenocarcinoma	ATM	11	DNA damage checkpoint	chromosome	DNA binding	NF-kappa B signaling pathway
n = 522	BRAF	7	MAPK cascade	nucleus	protein kinase activity	MAPK signaling pathway
	CREBBP	16	negative regulation of transcription from RNA polymerase II promoter	histone acetyltransferase complex	core promoter proximal region sequence-specific DNA binding	cAMP signaling pathway
	KRAS	12	MAPK cascade	intracellular	GTPase activity	MAPK signaling pathway
	LRP1B	2	receptor-mediated endocytosis	integral component of membrane	calcium ion binding
	SOS1	2	MAPK cascade	intracellular	DNA binding	MAPK signaling pathway
	U2AF1	21	mRNA splicing	nucleoplasm	nucleotide binding	Spliceosome
	CHEK2	15, 22, Y	DNA damage checkpoint	chromosome	protein kinase activity	Cell cycle
	DMD	X	positive regulation of cell-matrix adhesion	nucleus	dystroglycan binding	Hypertrophic cardiomyopathy (HCM)
	EGFR	7	MAPK cascade	Golgi membrane	glycoprotein binding	MAPK signaling pathway
	FLG	1	multicellular organism development	nucleus	structural molecule activity
	KEAP1	19	in utero embryonic development	nucleoplasm	ubiquitin-protein transferase activity	Ubiquitin mediated proteolysis
	NF1	17	MAPK cascade	nucleus	GTPase activator activity	MAPK signaling pathway
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	RYR2	1	response to hypoxia	cell	ryanodine-sensitive calcium-release channel activity	Calcium signaling pathway
	STK11	19	regulation of cell growth	nucleus	magnesium ion binding	FoxO signaling pathway
	SPTA1	1	MAPK cascade	cytosol	Ras guanyl-nucleotide exchange factor activity
	TNN	1	cell-matrix adhesion	proteinaceous extracellular matrix	integrin binding	PI3K-Akt signaling pathway
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
	USH2A	1	visual perception	photoreceptor inner segment	protein binding
Ovarian Serous Carcinoma	KIT	4	MAPK cascade	acrosomal vesicle	protease binding	Ras signaling pathway
n = 603	ACACB	12	acetyl-CoA metabolic process	nucleus	acetyl-CoA carboxylase activity	Fatty acid biosynthesis
	ANK1	8	exocytosis	nucleus	structural molecule activity	Proteoglycans in cancer
	CDH11	16	skeletal system development	cytoplasm	calcium ion binding
	COL4A4	2	extracellular matrix organization	extracellular region	extracellular matrix structural constituent	PI3K-Akt signaling pathway
	COL6A6	3	cell adhesion	extracellular region		PI3K-Akt signaling pathway
	CYP4A11	1	long-chain fatty acid metabolic process	cytoplasm	monooxygenase activity	Fatty acid degradation
	EGFR	7	MAPK cascade	Golgi membrane	glycoprotein binding	MAPK signaling pathway
	GRIN2B	12	MAPK cascade	intracellular	NMDA glutamate receptor activity	Ras signaling pathway
	GNPAT	1	glycerophospholipid metabolic process	mitochondrion	receptor binding	Glycerophospholipid metabolism
	IL21R	16	natural killer cell activation	integral component of membrane	interleukin-21 receptor activity	Cytokine–cytokine receptor interaction
	KAT6B	10	nucleosome assembly	nucleosome	DNA binding
	MYH13	17	muscle contraction	muscle myosin complex	microfilament motor activity	Tight junction
	MYH2	17	plasma membrane repair	Golgi apparatus	microfilament motor activity	Tight junction
	NF2	22	mesoderm formation	ruffle	actin binding	Hippo signaling pathway
	PLCH1	3	lipid catabolic process	cytoplasm	phosphatidylinositol phospholipase C activity	Inositol phosphate metabolism
	KCNQ5	6	protein complex assembly	plasma membrane	inward rectifier potassium channel activity	Cholinergic synapse
	SNTG1	8	cell communication	nucleus	actin binding
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
	ZAN	7	binding of sperm to zona pellucida	plasma membrane
Prostate Adenocarcinoma	APC	5	mitotic cytokinesis	kinetochore	protein binding	Wnt signaling pathway
n = 499	BRAF	7	MAPK cascade	nucleus	protein kinase activity	MAPK signaling pathway
	POLI	18	DNA replication	intracellular	damaged DNA binding	Fanconi anemia pathway
	EP300	22	negative regulation of transcription from RNA polymerase II promoter	histone acetyltransferase complex	RNA polymerase II core promoter sequence-specific DNA binding	cAMP signaling pathway
	RGPD8	2	protein targeting to Golgi	intracellular	Ran GTPase binding	RNA transport
	CACNA1A	19	sulfur amino acid metabolic process	nucleus	ion channel activity	MAPK signaling pathway
	CDC27	17	metaphase/anaphase transition of mitotic cell cycle	nucleus	protein binding	Cell cycle
	CHEK2	15, 22, Y	DNA damage checkpoint	chromosome	protein kinase activity	Cell cycle
	FOXA1	14	negative regulation of transcription from RNA polymerase II promoter	nucleus	RNA polymerase II transcription factor activity
	GRIK3	1	adenylate cyclase-inhibiting G-protein coupled glutamate receptor signaling pathway	plasma membrane	adenylate cyclase inhibiting G-protein coupled glutamate receptor activity	Neuroactive ligand-receptor interaction
	IDH1	2	glyoxylate cycle	cytoplasm	magnesium ion binding	Citrate cycle (TCA cycle)
	KRTAP4-11	17		keratin filament	protein binding
	MSH3	5	meiotic mismatch repair	nuclear chromosome	damaged DNA binding	Mismatch repair
	PTEN	10, 9	regulation of cyclin-dependent protein serine/threonine kinase activity	extracellular region	magnesium ion binding	Inositol phosphate metabolism
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	SPOP	17	regulation of proteolysis	nucleus	protein binding
	SYNE1	6	nucleus organization	nucleus	actin binding
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
	FRG1B	4
	KMT2C	7	transcription	nucleus	DNA binding	Lysine degradation
Stomach Adenocarcinoma	ARID1A	1	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	DNA binding
n = 478	FAT4	4	branching involved in ureteric bud morphogenesis	intracellular	calcium ion binding
	KRAS	12	MAPK cascade	intracellular	GTPase activity	MAPK signaling pathway
	LRP1B	2	receptor-mediated endocytosis	integral component of membrane	calcium ion binding
	RGPD4	2	protein targeting to Golgi	intracellular		RNA transport
	SMAD4	18	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	FoxO signaling pathway
	SMARCA4	19	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II core promoter proximal region sequence-specific DNA binding
	CDH1	16	homophilic cell adhesion via plasma membrane adhesion molecules	extracellular region	glycoprotein binding	Rap1 signaling pathway
	CTNNB1	3	negative regulation of transcription from RNA polymerase II promoter	spindle pole	RNA polymerase II transcription factor binding	Rap1 signaling pathway
	CDC27	17	metaphase/anaphase transition of mitotic cell cycle	nucleus	protein binding	Cell cycle
	CHEK2	15, 22, Y	DNA damage checkpoint	chromosome	protein kinase activity	Cell cycle
	FLG	1	multicellular organism development	nucleus	structural molecule activity
	MUC6	11	O-glycan processing	extracellular region	extracellular matrix structural constituent
	OBSCN	1	protein phosphorylation	cytosol	protein kinase activity
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	PCDHA3	5	cell adhesion	plasma membrane	calcium ion binding
	RHOA	3	transforming growth factor beta receptor signaling pathway	intracellular	GTPase activity	Ras signaling pathway
	SPTA1	1	MAPK cascade	cytosol	Ras guanyl-nucleotide exchange factor activity
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway
	USH2A	1	visual perception	photoreceptor inner segment	protein binding
Uterine Corpus Endometrial Carcinoma	AKT1	14	protein import into nucleus	nucleus	protein kinase activity	MAPK signaling pathway
n = 548	ARID1A	1	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	DNA binding
	CTCF	16	negative regulation of transcription from RNA polymerase II promoter	chromosome	RNA polymerase II core promoter proximal region sequence-specific DNA binding
	EP300	22	negative regulation of transcription from RNA polymerase II promoter	histone acetyltransferase complex	RNA polymerase II core promoter sequence-specific DNA binding	cAMP signaling pathway
	FBXW7	4	protein polyubiquitination	nucleoplasm	ubiquitin-protein transferase activity	Ubiquitin mediated proteolysis
	KRAS	12	MAPK cascade	intracellular	GTPase activity	MAPK signaling pathway
	TIAM1	21	cardiac muscle hypertrophy	nucleus	receptor signaling protein activity	Ras signaling pathway
	CTNNB1	3	negative regulation of transcription from RNA polymerase II promoter	spindle pole	RNA polymerase II transcription factor binding	Rap1 signaling pathway
	CHD4	12	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Viral carcinogenesis
	ESR1	6	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Estrogen signaling pathway
	FGFR2	10	negative regulation of transcription from RNA polymerase II promoter	extracellular region	protein tyrosine kinase activity	MAPK signaling pathway
	FLNA	X	platelet degranulation	extracellular region	G-protein coupled receptor binding	MAPK signaling pathway
	FOXA2	20	positive regulation of transcription from RNA polymerase II promoter by glucose	nucleus	RNA polymerase II core promoter proximal region sequence-specific DNA binding	Maturity onset diabetes of the young
	PTEN	10, 9	regulation of cyclin-dependent protein serine/threonine kinase activity	extracellular region	magnesium ion binding	Inositol phosphate metabolism
	PIK3CA	3	angiogenesis	intracellular	protein serine/threonine kinase activity	Inositol phosphate metabolism
	PIK3R1	5	cellular glucose homeostasis	nucleus	transmembrane receptor protein tyrosine kinase adaptor activity	ErbB signaling pathway
	PRPF8	17	spliceosomal tri-snRNP complex assembly	nucleus	second spliceosomal transesterification activity	Spliceosome
	PPP2R1A	19	G2/M transition of mitotic cell cycle	protein phosphatase type 2A complex	antigen binding	mRNA surveillance pathway
	SPOP	17	regulation of proteolysis	nucleus	protein binding
	TP53	17	negative regulation of transcription from RNA polymerase II promoter	nuclear chromatin	RNA polymerase II regulatory region sequence-specific DNA binding	MAPK signaling pathway

Consensus driver genes

We obtained a consensus of the top 20 driver genes for each cancer from the DriverDB database [26], requiring identification by at least 2 tools (the default) for each cancer, since requiring a higher consensus could result in fewer than 20 driver genes for some cancers. DriverDB assembles lists of the top-ranked driver genes determined by 15 packages: ActiveDriver, Dendrix, MDPFinder, Simon, NetBox, OncodriveFM, MutSigCV, MEMo, CoMDP, DawnRank, DriverNet, e-Driver, iPAC, MSEA, and OncodriveCLUST. Table 1 lists the cancer types, sample sizes, and descriptions of the driver genes used, including chromosome location, Gene Ontology nomenclature, and molecular pathway.
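The "at least 2 tools" consensus rule can be sketched as follows. This is an illustration only: it does not use DriverDB's interface, the ranking by number of supporting tools is an assumption, and the tool names and gene calls are toy values rather than real DriverDB output.

```python
from collections import Counter


def consensus_drivers(calls_by_tool, min_tools=2, top_n=20):
    """Return genes called by at least `min_tools` tools.

    calls_by_tool: dict mapping a tool name to the genes it reports.
    Genes are ranked by the number of supporting tools (an assumed ranking),
    with ties broken alphabetically, and truncated to `top_n`.
    """
    counts = Counter(
        gene for genes in calls_by_tool.values() for gene in set(genes)
    )
    kept = [(gene, c) for gene, c in counts.items() if c >= min_tools]
    kept.sort(key=lambda gc: (-gc[1], gc[0]))
    return [gene for gene, _ in kept[:top_n]]


# Toy calls from three hypothetical tools (not real results).
toy_calls = {
    "tool_a": ["TP53", "KRAS", "APC"],
    "tool_b": ["TP53", "KRAS"],
    "tool_c": ["TP53", "EGFR"],
}
print(consensus_drivers(toy_calls))
```

Raising `min_tools` shrinks the list, which mirrors the reason given above for staying with the default of 2.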

Driver events: mutations, deletions, amplifications, downregulation, and upregulation

Data for all somatic mutations were obtained directly from cBio-Portal. We also acquired high-confidence deletions and amplifications from cBio-Portal, where a deletion was defined as full homozygous loss with a GISTIC score [27] of −2, and an amplification was defined as high-level gain with a GISTIC score of 2. Low-level deletions (heterozygous loss) and low-level gains (low-level amplifications), with GISTIC scores of −1 and 1, respectively, were not used. Downregulation and upregulation of RNA-Seq based expression were also obtained from cBio-Portal, where Z-scores less than −1.96 were assumed to represent downregulation, and Z-scores greater than 1.96 were assumed to represent upregulation. Since there were 20 driver genes considered per cancer and 5 “driver events” considered per gene (mutations, deletions, amplifications, downregulation, upregulation), each datafile for n tumor samples consisted of 100 binary (0/1) variables, where 20 columns represented the presence of somatic mutations (1: yes, 0: no), 20 columns represented the presence of deletions (1: GISTIC score of −2, 0: otherwise), 20 columns represented the presence of amplifications (1: GISTIC score of 2, 0: otherwise), 20 columns represented downregulation (1: $Z < -1.96$, 0: $Z \geq -1.96$), and 20 columns represented upregulation (1: $Z > 1.96$, 0: $Z \leq 1.96$).
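The encoding described above can be sketched in a few lines of Python. The thresholds (GISTIC scores of −2 and 2, Z-score cutoffs of ±1.96) come from the text; the function names, the column naming scheme `GENE_EVENT`, and the input tuple layout are assumptions made for illustration.

```python
def encode_gene_events(mutated, gistic, zscore, z_cut=1.96):
    """Return the five binary driver-event indicators for one gene in one tumor."""
    return {
        "MUT": int(bool(mutated)),     # somatic mutation present
        "DEL": int(gistic == -2),      # homozygous loss only; -1 is ignored
        "AMP": int(gistic == 2),       # high-level gain only; 1 is ignored
        "DOWN": int(zscore < -z_cut),  # expression Z-score below -1.96
        "UP": int(zscore > z_cut),     # expression Z-score above 1.96
    }


def encode_sample(gene_records):
    """Flatten per-gene records into one row of 5 * (number of genes) binary columns.

    gene_records: dict mapping gene symbol -> (mutated, gistic, zscore).
    """
    row = {}
    for gene, (mutated, gistic, zscore) in gene_records.items():
        for event, flag in encode_gene_events(mutated, gistic, zscore).items():
            row[f"{gene}_{event}"] = flag
    return row


# Toy tumor with two genes (values are illustrative, not TCGA data).
toy_tumor = {"TP53": (True, -1, -2.5), "ERBB2": (False, 2, 3.0)}
row = encode_sample(toy_tumor)
print(row)
```

With 20 driver genes per cancer, the same function yields the 100 binary columns per tumor sample described in the text.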

Permutation-based affinity matrix (PBAM)

The methods described in the current and following sections are collectively called “PRogression Inference of Somatic Mutations in CAncer” (PRISM-CA). A permutation-based affinity matrix (PBAM) approach was employed to identify the order of events among the top 20 driver genes. The PBAM approach to inferring the sequential order of genomic alterations is non-parametric and independent of fitness, waiting times, and baseline somatic mutation rates. Define an event as a boolean outcome of true (binary 1 vs. 0) for a somatic mutation, deletion, amplification, downregulation, or upregulation of a driver gene. Thus, for the top 20 driver genes and 5 possible outcomes per gene, there are 20 × 5 = 100 potential events for each tumor sample. Let n be the total number of tumor samples of a given cancer histological subtype. The data file for each cancer therefore has 100 columns (binary variables), one per event, and n rows representing tumor samples. For each jth event, calculate the frequency of the event among the n tumor samples. For a pair of events j and k, the affinity between events j and k, element $c_{jk}$ of the affinity matrix C, is defined in terms of the co-occurrence frequency of events j and k among the n tumor samples and the between-event distance for events j and k. The between-event distance, given by equation (1), depends on the cardinality of the set of tumors with co-occurring events j and k and on the co-occurrence frequency for all other events along with event k (within the ith sample). The co-occurrence frequency is initially set to the frequency calculated from the data. Note that these matrices have dimensions p × p, where p is the total number of events for a particular cancer type. The summation on the right side of (1) is performed over all pairs of events in each ith sample, while the summation on the left is over all tumor samples with events j and k.
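As a minimal sketch of the starting quantities named above, the per-event frequency and the initial pairwise co-occurrence frequency can be tabulated directly from the binary data file. The function names are hypothetical, and treating "frequency" as a proportion of the n samples (rather than a raw count) is an assumption of this example; the affinity and distance formulas themselves are not reproduced here.

```python
def event_frequencies(events):
    """Proportion of tumor samples harboring each event (column means)."""
    n = len(events)
    p = len(events[0])
    return [sum(row[j] for row in events) / n for j in range(p)]


def cooccurrence_frequencies(events):
    """p x p matrix of proportions of samples in which events j and k co-occur."""
    n = len(events)
    p = len(events[0])
    counts = [[0] * p for _ in range(p)]
    for row in events:
        present = [j for j in range(p) if row[j]]
        for j in present:
            for k in present:
                if j != k:
                    counts[j][k] += 1
    return [[c / n for c in count_row] for count_row in counts]


# Toy data: 4 tumor samples (rows) by 3 events (columns).
toy_events = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0]]
freq = event_frequencies(toy_events)
cooc = cooccurrence_frequencies(toy_events)
```

The initial co-occurrence matrix is symmetric with a zero diagonal; asymmetry only arises later, once the permutation-based updates distinguish "j precedes k" from "k precedes j".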
Since each ith tumor sample with events j and k can harbor additional events, the number of permutations of event labels for the ith tumor sample is the factorial of the total number of events in that sample. For each permutation of event labels representing events in the ith tumor sample, a permutation-specific quantity is computed using the subscripts that correspond to events j and k within the permutation. Once these quantities are determined for the ith tumor sample, the permutation-specific probability for the ith tumor sample is calculated. A permutation, or specific order of events, was assumed to be significant if its observed probability was greater than the expected probability, the reciprocal of the number of permutations. After looping through all of the permutations per tumor sample with events j and k present, we updated the co-occurrence frequency for events j and k using a relationship that distinguishes cases where j precedes k within a permutation from cases where k precedes j within the same permutation. The above steps were repeated 10 times to iteratively update the co-occurrence frequencies, which were used to update the distances and affinities at the beginning of each iteration. As we looped through all possible pairwise values of j and k during the last iteration, we compiled a list based on (only) the most significant permutation-specific probability for each tumor sample for each pairwise combination of j and k. From this list of the most significant permutations over all values of j and k, we determined the frequency with which each event occupies the 1st, 2nd, 3rd, 4th, or 5th element in all of the output permutations. At convergence, C is not a symmetric matrix, since non-zero elements in the jth row provide the out-links of the jth node (from node j to others), while non-zero elements in the kth column provide its in-links (from others to node k). This leads to a dichotomy for each event, where the in-link can be compared with its out-link, essentially revealing whether event j is more of an information provider than an information receiver based on the relative values of $c_{jk}$ and $c_{kj}$.
We first filtered C by zeroing the smaller of each pair of $c_{jk}$ and $c_{kj}$ in the lower and upper triangles. The next section describes an additional filtration method applied to C. Figure 1 lists the computational steps involved in equations (1)–(4). During all runs for all histological subtypes considered, we monitored convergence, and the monitored quantity increased with decreasing increments during the iterations.
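The first filtration step, keeping only the stronger direction of each pair of affinities, might look like the following sketch. The function name is hypothetical, and leaving exact ties untouched is an assumption, since the text does not say how ties were handled.

```python
def keep_dominant_direction(c):
    """Zero the smaller of each pair c[j][k], c[k][j]; exact ties are left as-is."""
    p = len(c)
    out = [row[:] for row in c]  # copy so the input matrix is not modified
    for j in range(p):
        for k in range(j + 1, p):
            if out[j][k] > out[k][j]:
                out[k][j] = 0.0   # j is the stronger "provider": keep j -> k only
            elif out[k][j] > out[j][k]:
                out[j][k] = 0.0   # k is the stronger "provider": keep k -> j only
    return out


toy_c = [
    [0.0, 0.4, 0.1],
    [0.2, 0.0, 0.3],
    [0.5, 0.3, 0.0],
]
filtered_c = keep_dominant_direction(toy_c)
```

After this step each pair of events is connected in at most one direction (ties aside), which is what lets the matrix be read as a set of directed links.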

Figure 1

Computational algorithm for Permutation-based Affinity Matrix (PBAM) method.

Matrix filtration with Suppes probabilistic causation

Given a pair of events j and k, we can infer whether event j is likely to be causal of event k, or vice versa. Since the event data are binary (boolean), we first determined the probabilities $P(j)$, $P(k)$, $P(k \mid j)$, and $P(k \mid \bar{j})$. Suppes' definition of probabilistic causation [28] states that j is a prima facie cause of k, $j \rightarrow k$, if there is temporality such that event j occurs more frequently than event k, i.e., $P(j) > P(k)$, and j is a probability raiser of k, i.e., $P(k \mid j) > P(k \mid \bar{j})$ [29]. Taken collectively, the temporality condition and the probability raising condition are amenable to describing the selective advantage characteristics of the accumulation of genomic alterations during tumor progression. Temporality assumes that earlier occurring genomic events occur more frequently, and probability raising assumes that observing j raises the probability of observing k. These conditions were used for filtering C: an element $c_{jk}$ was set to zero if $P(j) \le P(k)$ or $P(k \mid j) \le P(k \mid \bar{j})$.
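A minimal sketch of this filter, assuming the standard form of the two conditions (temporality as a comparison of marginal frequencies, and probability raising as a comparison of the probability of k given j with the probability of k in the absence of j). The function names are hypothetical, and treating an event that occurs in no sample or in every sample as failing the test is an assumption, because one of the conditional probabilities is then undefined.

```python
def is_prima_facie_cause(events, j, k):
    """True if j -> k satisfies both temporality and probability raising."""
    n = len(events)
    n_j = sum(row[j] for row in events)
    n_k = sum(row[k] for row in events)
    n_jk = sum(1 for row in events if row[j] and row[k])
    if n_j == 0 or n_j == n:
        return False  # P(k | j) or P(k | not j) is undefined
    temporality = n_j > n_k                      # j is more frequent than k
    p_k_given_j = n_jk / n_j
    p_k_given_not_j = (n_k - n_jk) / (n - n_j)
    return temporality and p_k_given_j > p_k_given_not_j


def suppes_filter(c, events):
    """Zero every affinity c[j][k] whose pair fails either condition."""
    p = len(c)
    return [
        [c[j][k] if j != k and is_prima_facie_cause(events, j, k) else 0.0
         for k in range(p)]
        for j in range(p)
    ]


# Toy data: event 0 is frequent, event 1 occurs only alongside event 0.
toy_events = [[1, 1], [1, 1], [1, 0], [0, 0], [0, 0]]
toy_c = [[0.0, 0.7], [0.6, 0.0]]
filtered = suppes_filter(toy_c, toy_events)
```

In the toy data only the link from event 0 to event 1 survives: event 0 is the more frequent of the two, and event 1 is observed only when event 0 is present.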

Polytree generation with weighted directed acyclic graphs

Once the affinity matrix C was filtered for Suppes probabilistic causation, we retained only the greatest value in each column of C and zeroed out the remaining column entries. This was done to prevent child nodes from having multiple parent nodes. Matrix C was then employed as an adjacency matrix A to generate a weighted directed acyclic graph (WDAG), $G = (V, E)$, with vertices (nodes) V and edges E between vertices j and k, so that each edge weight was the retained value of $c_{jk}$. A depth-first search of G was then performed to uncover the structure of all possible depth-first trees with separate roots. Plots of tree forests were generated with edges colored according to their value of $c_{jk}$. We also outlined graph vertices (nodes) with colors representing the number of PubMed publications whose titles included the common gene symbol and “mutation,” “deletion,” “amplification,” “downregulated” or “downregulation,” or “upregulated” or “upregulation” for the gene's relevant event.
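The two steps described here, one parent per child followed by a depth-first traversal from each root, can be sketched as follows. The function names and the tie-breaking rule (the first maximal entry wins) are assumptions for illustration; isolated nodes are skipped, in line with the later remark that events with no links do not appear in the plots.

```python
def keep_strongest_parent(c):
    """Keep only the largest positive entry in each column (one parent per child)."""
    p = len(c)
    out = [[0.0] * p for _ in range(p)]
    for k in range(p):
        best = max(range(p), key=lambda j: c[j][k])  # first maximal row on ties
        if c[best][k] > 0:
            out[best][k] = c[best][k]
    return out


def depth_first_forest(adj):
    """Map each root (a node with children but no parent) to its DFS preorder."""
    p = len(adj)
    has_parent = [any(adj[j][k] > 0 for j in range(p)) for k in range(p)]
    has_child = [any(adj[j][k] > 0 for k in range(p)) for j in range(p)]
    forest = {}
    for root in range(p):
        if has_parent[root] or not has_child[root]:
            continue
        order, stack, seen = [], [root], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # guard against revisiting a node
            seen.add(node)
            order.append(node)
            children = [k for k in range(p) if adj[node][k] > 0]
            stack.extend(reversed(children))  # visit lower-numbered children first
        forest[root] = order
    return forest


toy_c = [
    [0.0, 0.5, 0.2, 0.0],
    [0.0, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
]
adj = keep_strongest_parent(toy_c)
forest = depth_first_forest(adj)
```

In the toy matrix, node 1 keeps node 0 as its parent (0.5 beats 0.1) and node 2 keeps node 1 (0.4 beats 0.2), giving a single chain rooted at node 0; node 3 loses its only link and drops out.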

Mutual information Bayesian network (MIBN)

A Bayesian network approach was also developed using the Chow–Liu algorithm [30], for which the between-event mutual information, MI(x,y), was determined from the number of tumor samples with co-occurrence of events x and y and the numbers of tumor samples having each singleton event. Prim's algorithm [31] was then applied to the WDAG to remove unconnected edges in order to construct the forest of trees. It warrants noting that the MI event-by-event matrix in MIBN is symmetric, while PBAM's C matrix is not. The adjacency matrix for tree construction was based on connected edge weights, which were also filtered using Suppes' definition of probabilistic causation.
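For orientation, the general Chow–Liu recipe (pairwise mutual information followed by a maximum-weight spanning tree) can be sketched as below. This uses the textbook definition of mutual information for two binary variables, which may differ in form and scaling from the expression used in the paper; both function names are hypothetical, and the Prim implementation assumes a fully connected, symmetric weight matrix.

```python
import math


def mutual_information(events, x, y):
    """Textbook mutual information (in bits) between binary columns x and y."""
    n = len(events)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for row in events:
        counts[(row[x], row[y])] += 1
    mi = 0.0
    for (a, b), n_ab in counts.items():
        if n_ab == 0:
            continue  # a zero cell contributes nothing
        n_a = counts[(a, 0)] + counts[(a, 1)]  # marginal count for x == a
        n_b = counts[(0, b)] + counts[(1, b)]  # marginal count for y == b
        mi += (n_ab / n) * math.log2(n_ab * n / (n_a * n_b))
    return mi


def prim_maximum_spanning_tree(weights):
    """Prim's algorithm on a symmetric weight matrix, keeping the heaviest edges."""
    p = len(weights)
    in_tree = [0]  # grow the tree from node 0
    edges = []
    while len(in_tree) < p:
        best = None
        for u in in_tree:
            for v in range(p):
                if v in in_tree:
                    continue
                if best is None or weights[u][v] > best[2]:
                    best = (u, v, weights[u][v])
        edges.append(best)
        in_tree.append(best[1])
    return edges


dependent = [[0, 0], [1, 1], [0, 0], [1, 1]]
independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
mi_dependent = mutual_information(dependent, 0, 1)
mi_independent = mutual_information(independent, 0, 1)
tree = prim_maximum_spanning_tree([[0.0, 0.9, 0.1], [0.9, 0.0, 0.5], [0.1, 0.5, 0.0]])
```

Because mutual information is symmetric, the spanning tree is undirected; in the workflow described above, direction comes from the subsequent Suppes filtering rather than from the tree itself.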

Randomization tests for temporality and probability raising

We developed a randomization test to determine the significance of temporality and probability raising between all possible pairs of events j and k represented by elements of C and MI that met the temporality and probability raising conditions during the previous filtration. The test used 500 iterations. During each bth iteration, binary outcomes (0/1) were randomly shuffled between events (binary variables) j and k. The p-value for temporality was the number of iterations in which the difference between the frequencies of events j and k computed from the shuffled data exceeded the observed difference, divided by 500. The p-value for probability raising was the number of iterations in which the difference between the probability of k given j and the probability of k in the absence of j, computed from the shuffled data, exceeded the observed difference, divided by 500. Graph edges were drawn with line styles reflecting which conditions were significant: a solid line (—) for full prima facie causation, where both the temporality and probability raising conditions were met; a dashed line (- - -) where only the temporality condition was met; a dash-dot line (-.-.) where only the probability raising condition was met; and a dotted line (....) where neither was met.
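One possible reading of this test is sketched below. How outcomes were "shuffled between events" is not spelled out, so swapping the two values within each tumor sample with probability 0.5 is an assumption of this example, as are the function name, the fixed seed, and the convention of scoring an undefined conditional probability as a difference of zero.

```python
import random


def randomization_pvalues(events, j, k, iterations=500, seed=0):
    """Return (p_temporality, p_probability_raising) for the ordered pair j -> k."""
    rng = random.Random(seed)
    n = len(events)
    xj = [row[j] for row in events]
    xk = [row[k] for row in events]

    def differences(a, b):
        n_a, n_b = sum(a), sum(b)
        n_ab = sum(1 for u, v in zip(a, b) if u and v)
        temporality = (n_a - n_b) / n            # P(a) - P(b)
        if n_a == 0 or n_a == n:
            raising = 0.0                        # conditional undefined: assumed 0
        else:
            raising = n_ab / n_a - (n_b - n_ab) / (n - n_a)
        return temporality, raising

    obs_t, obs_r = differences(xj, xk)
    exceed_t = exceed_r = 0
    for _ in range(iterations):
        a, b = [], []
        for u, v in zip(xj, xk):
            if rng.random() < 0.5:               # assumed shuffle: swap within sample
                u, v = v, u
            a.append(u)
            b.append(v)
        t, r = differences(a, b)
        exceed_t += int(t > obs_t)
        exceed_r += int(r > obs_r)
    return exceed_t / iterations, exceed_r / iterations


# Event 0 in all 20 samples, event 1 in only 5 of them.
frequent_first = [[1, 1]] * 5 + [[1, 0]] * 15
p_t, p_r = randomization_pvalues(frequent_first, 0, 1)
# Reversed roles: event 0 in 5 samples, event 1 in all 20.
rare_first = [[1, 1]] * 5 + [[0, 1]] * 15
p_t_rev, p_r_rev = randomization_pvalues(rare_first, 0, 1)
```

With the frequent event listed first, no shuffle can produce a larger frequency difference than the observed one, so the temporality p-value is at its minimum; with the roles reversed, almost every shuffle exceeds the observed difference.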

Results

Figures 2 and 3 illustrate the distribution of between-event PBAM values of $c_{jk}$ and MIBN values of MI(x,y) at convergence, which were employed for weighted directed acyclic graph generation. Figure 2 reveals that the distribution of $c_{jk}$ values for the cancers investigated is right-skewed, with the median typically falling near zero. Renal clear cell carcinoma resulted in the greatest interquartile range (IQR) of $c_{jk}$ values, and ovarian serous carcinoma resulted in the smallest. The greatest degree of “communication”, that is, the most extreme values of $c_{jk}$ in the right whiskers of the boxes, is dominated by only a few drivers such as TP53 and PIK3CA, whereas the bulk of the remaining genes have their values within the IQRs. This would imply that renal clear cell cancer has a greater proportion of drivers sharing information (greatest IQR), while ovarian serous cancers, with a narrower IQR of $c_{jk}$, share information to a lesser extent. In contrast, Figure 3 reveals a left-skewed distribution of MI(x,y) values for the cancers considered, with much more variation in median values. Acute myeloid leukemia exhibits the lowest median value, roughly 7.2, and breast carcinoma exhibits the highest, 9.8. Overall, there was less variation in the range of the MI(x,y) values across the cancers than in the range of the $c_{jk}$ values. Whereas the cancer-specific IQRs of $c_{jk}$ differ, the IQRs of MI(x,y) are approximately the same across cancers; however, the median values of MI(x,y) are drastically different, especially between acute myeloid leukemia and breast invasive cancer. Thus, the between-cancer differences in information sharing represented by $c_{jk}$ tend to be revealed by the extreme values, tail length, and width of the IQR, whereas between-cancer differences in MI(x,y) tend to depend mostly on median values.

Figure 2

Box plot showing distribution of between-event PBAM c matrix element values used for weighted directed acyclic graph generation.

Figure 3

Box plot showing distribution of between-event MIBN MI(x,y) mutual information values used for weighted directed acyclic graph generation.

The clinical impact of driver genes is best portrayed by their effect on the overall survival (OS) of patients. Table 2 lists the significant hazard ratios (HR) for OS, adjusted for age at diagnosis, with 95% confidence intervals. The majority of gene alterations were deleterious rather than protective, and the greatest HR, 8.86 (95% CI, 2.80–27.98), was observed for mutation of PLCH1 in ovarian serous cancer. This was followed by suppressed expression of KMT2C in breast invasive carcinoma, with an HR of 7.83 (95% CI, 1.08–56.70); amplification of ACACA in renal clear cell carcinoma, with an HR of 7.69 (95% CI, 1.06–56.03); and enhanced expression of IDH1 in prostate adenocarcinoma, with an HR of 7.18 (95% CI, 1.72–29.98). We also observed a protective effect for several events, including mutations of CTNNB1 and SMARCA4 in stomach adenocarcinoma, with HRs of 0.30 (95% CI, 0.11–0.82) and 0.35 (95% CI, 0.13–0.96), respectively, and mutations of ARID1A, KRAS, and PTEN in uterine cancer, with HRs of 0.17 (95% CI, 0.05–0.54), 0.30 (95% CI, 0.09–0.95), and 0.40 (95% CI, 0.22–0.71), respectively.

Table 2

Clinical relevance of genomic events evaluated in the context of survival analysis. Hazard ratios (HR) based on Cox proportional hazards regression with adjustment for age at diagnosis. HR compares the mortality rate among subjects with a given event with the average mortality rate among all subjects with the given cancer. Genes whose HR > 1 are termed “deleterious,” while genes whose HR < 1 are called “protective.” Survival time variable: overall survival in months (TCGA field: OS_MONTH), censoring variable: overall survival status (TCGA field: OS_STATUS). Gene symbol subscripts are: _m – mutation, _d – deletion, _a – amplification, _s – suppressed expression, _e – enhanced expression. Results sorted by p-values.

Cancer site	Event	Coef. (βj)	s.e. (βj)	Z	p-value	HR (95% CI)
Acute Myeloid Leukemia	TP53_m	0.9981	0.3093	3.2268	0.0012	2.71(1.480,4.975)
	RUNX1_a	1.8958	0.6300	3.0089	0.0026	6.66(1.937,22.893)
	DNMT3A_m	0.4935	0.2173	2.2708	0.0231	1.64(1.070,2.508)
	U2AF1_a	0.9604	0.4729	2.0307	0.0422	2.61(1.034,6.602)
	NPM1_m	0.4309	0.2137	2.0167	0.0437	1.54(1.012,2.339)

Brain Lower Grade Glioma	EGFR_a	1.4690	0.2461	5.9676	0.0000	4.34(2.682,7.039)
	PTEN_s	1.3346	0.2641	5.0522	0.0000	3.80(2.263,6.375)
	EGFR_m	1.7053	0.3470	4.9133	0.0000	5.50(2.787,10.866)
	NF1_s	1.8884	0.4042	4.6710	0.0000	6.61(2.992,14.598)
	NF1_m	1.6238	0.3512	4.6232	0.0000	5.07(2.548,10.097)
	EGFR_e	0.8338	0.2034	4.0982	0.0000	2.30(1.545,3.430)
	IDH1_e	1.1112	0.2828	3.9286	0.0000	3.04(1.745,5.289)
	PTEN_m	1.5066	0.3858	3.9042	0.0000	4.51(2.118,9.611)
	CHEK2_e	0.9563	0.2592	3.6889	0.0002	2.60(1.566,4.326)
	PLCG1_e	0.9060	0.2575	3.5185	0.0004	2.47(1.494,4.099)
	IDH1_m	−0.50696	0.1881	−2.69438	0.0070	0.60(0.417,0.871)
	CACNA1S_m	1.7076	0.7172	2.3809	0.0172	5.52(1.352,22.498)
	PTEN_d	1.2200	0.5213	2.3401	0.0192	3.39(1.219,9.411)
	CIC_m	−0.78141	0.3463	−2.25623	0.0240	0.46(0.232,0.902)
	CDC27_e	0.7492	0.3464	2.1625	0.0305	2.12(1.073,4.172)
	TP53_e	0.6866	0.3306	2.0763	0.0378	1.99(1.039,3.799)
	CIC_s	−0.49682	0.2500	−1.98657	0.0469	0.61(0.373,0.993)

Breast Invasive Carcinoma	FOXA1_m	1.2905	0.3919	3.2931	0.0009	3.63(1.686,7.836)
	CTCF_e	1.2368	0.3910	3.1627	0.0015	3.44(1.601,7.413)
	MAP2K4_d	1.1677	0.3928	2.9726	0.0029	3.21(1.489,6.943)
	ITPR1_a	1.0064	0.3457	2.9104	0.0036	2.74(1.389,5.388)
	NR1H2_a	1.0615	0.4240	2.5030	0.0123	2.89(1.259,6.638)
	ERBB2_m	1.1357	0.4571	2.4843	0.0129	3.11(1.271,7.628)
	ARID1A_d	1.5909	0.7159	2.2221	0.0262	4.91(1.206,19.969)
	PIK3R1_m	1.1007	0.5085	2.1648	0.0304	3.01(1.110,8.145)
	KMT2C_d	1.2208	0.5870	2.0796	0.0375	3.39(1.073,10.713)
	KMT2C_s	2.0576	1.0102	2.0367	0.0416	7.83(1.081,56.699)
	PIK3CA_e	0.5588	0.2826	1.9774	0.0479	1.75(1.005,3.043)
	CDH1_m	−0.59829	0.3037	−1.96998	0.0488	0.55(0.303,0.997)

Colorectal Adenocarcinoma	GRIA2_m	1.4665	0.5148	2.8485	0.0043	4.33(1.580,11.889)
	FBXW7_e	1.0294	0.4200	2.4505	0.0142	2.80(1.229,6.378)
	KRT1_e	1.4667	0.7181	2.0423	0.0411	4.34(1.061,17.714)
	FBXW7_s	1.4227	0.7250	1.9623	0.0497	4.15(1.002,17.180)

Lung Squamous Cell Carcinoma	PIK3CA_e	−0.35430	0.1420	−2.49367	0.0126	0.70(0.531,0.927)
	APC_m	0.9155	0.4191	2.1843	0.0289	2.50(1.099,5.681)
	CTNNA2_a	1.0102	0.5091	1.9842	0.0472	2.75(1.012,7.450)

Ovarian Serous Cystadenocarcinoma	PLCH1_m	2.1810	0.5868	3.7163	0.0002	8.86(2.803,27.976)
	GRIN2B_a	0.4998	0.1932	2.5865	0.0096	1.65(1.129,2.407)

Prostate Adenocarcinoma	IDH1_e	1.9711	0.7292	2.7031	0.0068	7.18(1.719,29.977)
	POLI_e	1.8943	0.8191	2.3124	0.0207	6.65(1.335,33.113)

Renal Clear Cell Carcinoma	ACACA_e	1.1876	0.2354	5.0436	0.0000	3.28(2.067,5.203)
	PABPC1_e	1.1618	0.2350	4.9438	0.0000	3.20(2.016,5.065)
	TP53_e	0.9174	0.2377	3.8594	0.0001	2.50(1.571,3.988)
	SSX3_e	1.0375	0.2807	3.6964	0.0002	2.82(1.628,4.893)
	CDC27_e	1.0453	0.3446	3.0328	0.0024	2.84(1.447,5.590)
	FAM104A_e	0.6320	0.2553	2.4754	0.0133	1.88(1.141,3.103)
	BAP1_m	0.4940	0.2430	2.0326	0.0420	1.64(1.018,2.639)
	ACACA_a	2.0403	1.0130	2.0141	0.0440	7.69(1.056,56.026)

Stomach Adenocarcinoma	USH2A_e	1.1331	0.2741	4.1341	0.0000	3.11(1.815,5.314)
	OBSCN_a	1.6007	0.3905	4.0990	0.0000	4.96(2.306,10.657)
	USH2A_a	1.7512	0.5906	2.9651	0.0030	5.76(1.811,18.337)
	FAT4_m	−0.64513	0.2350	−2.74459	0.0060	0.52(0.331,0.832)
	CTNNB1_m	−1.19370	0.5067	−2.35545	0.0185	0.30(0.112,0.818)
	SMAD_d	0.6054	0.2638	2.2946	0.0217	1.83(1.092,3.073)
	TP53_m	−0.34194	0.1622	−2.10777	0.0350	0.71(0.517,0.976)
	SMARCA4_m	−1.03735	0.5085	−2.03990	0.0413	0.35(0.131,0.960)

Uterine Corpus Endometrial Carcinoma	PIK3CA_m	−1.06895	0.3110	−3.43620	0.0005	0.34(0.187,0.632)
	PTEN_m	−0.92347	0.2926	−3.15592	0.0016	0.40(0.224,0.705)
	AKT1_a	1.4284	0.4629	3.0856	0.0020	4.17(1.684,10.339)
	ARID1A_m	−1.78160	0.5895	−3.02214	0.0025	0.17(0.053,0.535)
	ESR1_d	1.8821	0.7272	2.5879	0.0096	6.57(1.579,27.317)
	PTEN_s	1.4300	0.5980	2.3913	0.0167	4.18(1.294,13.494)
	PTEN_d	0.9293	0.3965	2.3435	0.0191	2.53(1.164,5.510)
	PPP2R1A_a	1.3301	0.5894	2.2566	0.0240	3.78(1.191,12.006)
	AKT1_e	0.8895	0.4253	2.0911	0.0365	2.43(1.057,5.603)
	ESR1_a	0.8182	0.3986	2.0528	0.0400	2.27(1.038,4.951)
	FBXW7_e	1.2182	0.5935	2.0525	0.0401	3.38(1.056,10.821)
	KRAS_m	−1.21089	0.5911	−2.04842	0.0405	0.30(0.094,0.949)
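The columns of Table 2 are linked by the usual Cox regression relationships: the hazard ratio is the exponential of the coefficient, the Wald statistic is the coefficient divided by its standard error, and the 95% confidence limits are the exponentials of the coefficient plus or minus 1.96 standard errors. The short sketch below, with a hypothetical function name, checks these relationships against the first row of the table (TP53_m in acute myeloid leukemia); small differences in the last digit reflect rounding of the published coefficient and standard error.

```python
import math


def hazard_ratio(beta, se, z_crit=1.96):
    """Return (HR, lower limit, upper limit, Wald Z) from a Cox coefficient."""
    hr = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return hr, lower, upper, beta / se


# First row of Table 2: TP53_m, coefficient 0.9981, standard error 0.3093.
hr, lower, upper, z = hazard_ratio(0.9981, 0.3093)
print(f"HR {hr:.2f} ({lower:.3f}, {upper:.3f}), Z = {z:.4f}")
```

A negative coefficient gives an HR below 1 (a "protective" event in the table's terminology), and a confidence interval that excludes 1 corresponds to a two-sided p-value below 0.05.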

Interpreting Figures 4–23

The following sections describe results for the 10 cancers considered, covering PBAM results first and MIBN results second for each cancer. Each figure illustrates the degree of information sharing between events by use of gradient color scales for (a) values of between-event $c_{jk}$ for PBAM plots or MI(x,y) for MIBN plots, and (b) the number of PubMed hits for each node (event) based on reports containing the gene symbol and either “somatic mutation”, “deletion”, “amplification”, “downregulation”, or “upregulation”, depending on the event being searched for. The line colors used between nodes (events) are based on the value of the between-event $c_{jk}$ or MI(x,y), as indicated by the color gradient scale. There are five node colors representing the various gene-related events: mutation, deletion, amplification, downregulation, or upregulation in expression. Since there are 20 driver genes used per cancer, there are 100 potential nodes that can appear in each plot. Events that never co-occur with other events are not shown in the plots; co-occurrence is therefore required for an event to appear in a plot. The number of samples harboring the event, along with the percentage of the total sample size (in parentheses) for each cancer, is listed above the relevant node. Varying line styles are used to reflect the existence of between-event prima facie causality, temporality, probability raising, or neither.

Figure 4

PBAM analysis results for Invasive Breast Carcinoma.

Figure 23

MIBN analysis results for Brain Lower Grade Gliomas.

Breast invasive carcinoma

PBAM-based results for invasive breast carcinoma (Figure 4) show that mutations in TP53, PIK3CA, and GATA3 and upregulation of ERBB2 were the main driver events observed in the data [33], [47], [48], [37], [38], [51], [52]. The events that followed TP53 mutations comprised a mixture of deletions, amplifications, downregulation, and upregulation, with few mutational events. They included PTEN deletions [40], [49]; PIK3CA amplifications [38], [50]; downregulation of ERBB2 [51], [52], PTEN [51], [44], AKT1 [55], [56], TP53 [53], [54], ARID1A [57], [58], and CTCF [59], [60]; and upregulation of TP53 [53], [54], AKT1 [55], [56], PIK3CA [61], [62], and ARID1A [57], [58]. Secondary events following mutations in PIK3CA included upregulation of PTEN [41], [44] and RUNX1 [45], [46], and mutations in PTEN [40], [41], ERBB2 [42], [43], and CDH1 [39]. Events downstream of ERBB2 upregulation included amplification of ERBB2 [42], [63] and upregulation of FOXA1 [66], [67] and CDH1 [64], [65]. Events inferred to follow GATA3 mutations included AKT1 mutations [34] and upregulation of GATA3 [35], [36] and CTCF [59], [60]. The MIBN-derived WDAG in Figure 5 indicates three main root nodes in invasive breast carcinoma, involving mutations in TP53, PIK3CA, and GATA3. Interestingly, while PIK3CA and GATA3 mutations mostly preceded other mutations, TP53 mutations preceded deletions, amplifications, and transcriptional alterations. The tree starting with mutations in GATA3 contained one node involving expression changes in GATA3, which have been reported previously, and another involving mutations in AKT1. Within the PIK3CA tree, there were nodes representing mutations in CDH1, PTEN, and ERBB2. The tree rooted by TP53 contained the remaining alterations, involving deletions in PTEN, amplifications in PIK3CA and ERBB2, and gene expression changes in AKT1, ERBB2, CDH1, CTCF, FOXA1, PTEN, ARID1A, RUNX1, TP53, and PIK3CA.

Figure 5

MIBN analysis results for Invasive Breast Carcinoma.

Colorectal adenocarcinoma

PBAM results for colorectal adenocarcinoma (Figure 6) indicate that mutations in APC [68], [69] and downregulation of SMAD4 [81], [82] were the main driver events. Although the literature commonly reports mutations in BRAF, PIK3CA, and KRAS as key driver events in colorectal cancer, our results show that mutations in APC are the driver for events in these three genes. Driver events secondary to APC mutations included mutations in BRAF [70], [71], KRAS [72], [73], NRAS [74], [75], PIK3CA [76], [77], SMAD4 [78], [79], and TP53 [80], [75]. The secondary driver events observed to follow SMAD4 downregulation were SMAD4 deletions [83], [84]; KRAS amplifications [95], [96]; downregulation of TP53 [85], [86], APC [89], [90], KRAS [72], [91], PIK3CA [93], [94], and FBXW7 [97], [98]; and upregulation of KRAS [72], [91], BRAF [87], [88], CTNNB1 [92], APC [89], [90], PIK3CA [93], [94], FBXW7 [97], [98], SMAD4 [81], [82], and TP53 [85], [86]. The main driver events in the MIBN-based analysis (Figure 7) were also mutations in APC and downregulation of SMAD4. Furthermore, we observed KRAS-induced mutations in SMAD4, upregulation and downregulation of APC, downregulation of PIK3CA, and downregulation of KRAS. The APC tree also includes mutations in NRAS and TP53 and upregulation of FBXW7. An apparent pattern in Figure 7 is that the tree rooted by APC is mostly populated with daughter nodes representing deletions, amplifications, and transcriptional changes, while the primary daughter events in the SMAD4 tree are transcriptional. Transcriptional changes visible within the SMAD4 tree are downregulation of CTNNB1, FBXW7, and TP53 and upregulation of BRAF, CTNNB1, TP53, KRAS, SMAD4, and PIK3CA.

Figure 6

PBAM analysis results for Colorectal Adenocarcinoma.

Figure 7

MIBN analysis results for Colorectal Adenocarcinoma.

Lung adenocarcinoma

PBAM results for lung adenocarcinoma (Figure 8) indicate that mutations in TP53 [99], [100] were the single main driver event, followed by three separate clusters of events. One cluster (left side) included a series of child node events representing mostly mutations in BRAF [101], [102], EGFR [103], [104], NF1 [105], and PIK3CA [106], [107]. A second cluster comprised a polytree rooted by mutations in RYR2, followed by mutations in KRAS [116], [117] and STK11 [100]; deletions in KRAS [108], [109], STK11 [110], and EGFR [113], [114]; amplifications in KRAS [119], [120]; downregulation of STK11 [100]; and upregulation of KEAP1 [111], [112], TP53 [115], [100], KRAS [118], [100], STK11 [100], and NF1 [121]. The third cluster of events was merely an agglomeration of child events consisting of deletions in KEAP1 [122], amplifications of EGFR [125], [126] and PIK3CA [127], downregulation of NF1 [121] and KEAP1 [111], [112], and upregulation of EGFR [123], [124]. In the MIBN-based results for lung adenocarcinoma (Figure 9), one tree was identified, which was rooted by mutations in TP53 and contained mutations in BRAF, EGFR, KRAS, NF1, STK11, and PIK3CA. We also identified reports for deletions in KEAP1, KRAS, STK11, and EGFR; amplifications in EGFR, KRAS, and PIK3CA; downregulation of NF1, STK11, KEAP1, and EGFR; and upregulation of KEAP1, KRAS, STK11, TP53, and NF1.

Figure 8

PBAM analysis results for Lung Adenocarcinoma.

Figure 9

MIBN analysis results for Lung Adenocarcinoma.

Ovarian serous cystadenocarcinoma

Our PBAM results for ovarian cancer, shown in Figure 10, identified one major tree rooted by TP53 mutation [129], [130]. This tree contains clusters of gene amplifications, mutations, and upregulation. The upregulation of EGFR and KIT in ovarian cancer has been widely reported in the literature [131], [132], [133], [134]. Our results also identified two smaller trees: one rooted by amplification of GRIN2B, which appears to precede upregulation of TP53 [135], [136], and an even smaller tree rooted by amplification of PLCH1. Our MIBN results, shown in Figure 11, identified TP53 as the single precursor event of all alterations in ovarian cancer.

Figure 10

PBAM analysis results for Ovarian Serous Cystadenocarcinoma.

Figure 11

MIBN analysis results for Ovarian Serous Cystadenocarcinoma.

Prostate adenocarcinoma

Figure 12 illustrates results of the PBAM run for prostate adenocarcinoma. The forest has four main root nodes, involving mutations in FRG1B and SPOP [139], [140], downregulation of PTEN [147], [148], and upregulation of APC [141]. FRG1B mutations precede mutations in CHEK2, which have been reported in the literature [137], [138]. The tree rooted by SPOP causes deletion and downregulation of genes, including downregulation of APC [141]. The third root, downregulation of PTEN, causes a variety of aberrations, including mutations in PTEN [144], downregulation of PTEN [142], [143] and CHEK2 [149], and upregulation of FOXA1 [145], [146]. The fourth tree, rooted by upregulation of APC, causes mostly other upregulations, such as that of PTEN [142], [143], and a few deletions, such as that of FOXA1 [145], [146]. By comparison, the MIBN results (Figure 13) have three main root nodes, one of which is mutation of TP53. In this result, the mutation of TP53 precedes mutation of PTEN, deletion of CHEK2, downregulation of FOXA1, and upregulation of PTEN and FOXA1. TP53 was only causal to upregulation and amplification of FOXA1 in the first model. The second root, downregulation of PTEN, drives the deletion of PTEN and mutation of CHEK2. The third root, FRG1B mutation, is causal to clusters of mutations, including SPOP, deletions, and downregulation of APC. By contrast, we did not observe FRG1B to be such a major parent node in our PBAM results.

Figure 12

PBAM analysis results for Prostate Adenocarcinoma.

Figure 13

MIBN analysis results for Prostate Adenocarcinoma.

Renal clear cell cancer, RCC

In our PBAM run for renal clear cell cancer (RCC), shown in Figure 14, alterations in the BAP1 gene appeared most often. Our results show six trees, generally separated by type of alteration. The first tree, rooted by ARAP3, is causally linked to a multitude of mutations and to upregulation of BAP1 [150], [151]. The second tree is rooted by VHL [152], [153] and drives a mixture of aberrations, including mutation of BAP1 [154], downregulation of BAP1 [150], [151], and upregulation of VHL [155], [200]. Two small trees contained only deletions, none of which have been previously reported in RCC. The tree of mostly downregulations is rooted by PBRM1 [156], [150] and precedes downregulation of SETD2 [157]. Upregulation of ACACA was the parent root of most upregulations, including those of PBRM1 [156], [150] and SETD2 [157]. This was quite different from the results of the MIBN run in Figure 15, which yielded one tree rooted by VHL mutations that was likely causal to all the other alterations in RCC.

Figure 14

PBAM analysis results for Renal Clear Cell Cancer.

Figure 15

MIBN analysis results for Renal Clear Cell Cancer.

Stomach adenocarcinoma

PBAM results for stomach adenocarcinoma can be found in Figure 16. We identified that mutations in ARID1A, LRP1B, and TP53 [172], [173] formed three major parent nodes. Mutation in ARID1A appears to be causal to chains of alterations within the same gene. For example, we observed that ARID1A mutation precedes mutation, downregulation, and deletion within the CDH1 gene [158], [159], [160], [161], [162]. Similarly, we also observed a causal link to upregulation and deletion of TP53 [163], [164]. ARID1A also had several daughter nodes with mutations, including the genes PIK3CA [165], [166], [168] and RHOA [167], [169], [170]. The tree rooted by LRP1B mutation contains a small cluster of amplifications, which includes PIK3CA [168] and KRAS [169], [170], and a cluster of upregulation, which includes PIK3CA [171], [160]. The third tree, rooted by TP53 mutation, is likely causal of groups of mutations, amplifications, downregulations, and upregulations. Downregulation was observed in ARID1A and TP53, and upregulation was observed in ARID1A, CDH1, and MUC6, which have been reported in stomach cancer [174], [175], [163], [164], [174], [175], [160], [161], [176]. Our MIBN results, shown in Figure 17, identified only one parent node consisting of TP53 mutations.

Figure 16

PBAM analysis results for Stomach Adenocarcinoma.

Figure 17

MIBN analysis results for Stomach Adenocarcinoma.

Uterine corpus endometrial carcinoma

For uterine corpus endometrial carcinoma we identified two parent roots (Figure 18) with the PBAM method. One tree root, mutation of PTEN [177], [178], is causally linked to only other mutations, among which were the genes ARID1A, CTNNB1, FGFR2, KRAS, AKT1 [179], [180], [177], [181], [181], [182], [183]. The second tree root, mutation of PIK3CA [183], is causal to the mutation of TP53, which is the driver for a cluster of amplifications, including KRAS [181], and a cluster of upregulations which include KRAS, ARID1A, and PTEN [182], [184], [185], [186], [187]. PIK3CA also drove the deletion and downregulation of PTEN [188], [189], [190], [187], [188]. This was similar to our MIBN results (Figure 19) which show PTEN mutation as the only parent node and PIK3CA mutation as a daughter node but with similar causality as seen in the first run.

Figure 18

PBAM analysis results for Uterine Corpus Endometrial Carcinoma.

Figure 19

MIBN analysis results for Uterine Corpus Endometrial Carcinoma.

Acute myelogenous leukemia, AML

Figure 20 shows our results for a PBAM analysis of AML. There were five main root nodes, observed for mutations in FLT3, IDH2, NPM1, and TP53 and for upregulation of RAD21. Most of these events have been observed previously in AML [191], [192], [201], [192], [210], [211]. Our first tree shows that mutations in FLT3 precede mutations in WT1 [193]. In the second tree, mutations in IDH2 cause further mutation in RUNX1 [194], as well as a host of upregulations in genes such as FLT3, RUNX1, and WT1 [195], [196], [197], [198], [199], [200]. The tree rooted by NPM1 drives mostly mutations, including those in DNMT3A, TET2, and KIT [202], [203], [204], [194], [207]; however, there is also upregulation of genes such as CEBPA and NPM1 [205], [206], [208], [209]. The tree rooted by TP53 drives a mixture of gene aberrations, but nothing previously reported in the literature for AML. The final tree shows upregulation of RAD21 as a driver for upregulation of other genes, including KIT [212], [213]. By comparison, Figure 21 shows the causal MIBN-based WDAG for AML. MIBN-based WDAGs typically showed fewer trees and more daughter nodes, which is also true for AML. Our WDAG for AML had two main root nodes, observed for mutations in FLT3 and TP53. FLT3 mutations also precede additional mutations, which were observed in NPM1, DNMT3A, KIT, RUNX1, WT1, and TET2. Our WDAG results also identified a causal inference for transcriptional alterations in genes such as FLT3, RUNX1, CEBPA, KIT, NPM1, and WT1.

Figure 20

PBAM analysis results for Acute Myelogenous Leukemia.

Figure 21

MIBN analysis results for Acute Myelogenous Leukemia.

Brain lower grade gliomas

Lower grade brain gliomas (Figures 22 and 23) were one of the few cancers for which we did not observe any major difference between the PBAM and MIBN runs. Both runs identified two main root nodes involving mutations in IDH1 [214], [215] and upregulation of EGFR [216], [217], [218], [219]. In both results, mutations in IDH1 preceded mutations in ATRX [216] as well as a variety of deletions, amplifications, and upregulations. The upregulation of NOTCH1 has been previously reported in low grade glioma [217]. Upregulation of EGFR appeared to be causally linked to amplifications in EGFR [220], [221], mutations in NF1 [222], upregulation of TP53 [128], and transcriptional alterations in other genes.

Figure 22

PBAM analysis results for Brain Lower Grade Gliomas.

Discussion

Cancer is a relatively short-term evolutionary process involving initiation and progression [223]. Genetic variation is the key to evolutionary existence and selection, and in cancer this variation is generated via somatic mutations. Natural selection determines the fate of somatic mutations through purifying selection, which reduces the likelihood that deleterious mutations persist, and through positive selection, under which functionally advantageous mutations persist. Tumor mutations persisting until the point of nextgen sequencing are referred to as substitutions, which are responsible, in part, for transformation, growth and progression, drug resistance, invasion and neovascularization, and metastasis. Beyond the established view centered on mutations in tumor suppressor genes and oncogenes, genetic adaptation of tumors is commonly driven through somatic mutations in genes with basic cellular functions [224]. Therefore, somatic mutations among genes expressed globally throughout many tissues cannot be ruled out as a major determinant of cancer. To date, large-scale nextgen sequencing studies have revealed a high prevalence of somatic mutations in human cancers. As more tumor DNA sequences are analyzed, new information will continue to become available regarding driver and passenger genes, mutations in housekeeping genes, and the role of somatic mutations in cancer. Our PBAM approach employed iterative calculations to determine gene pairwise selectivity relationships using data on gene-specific alterations and their permutations. The results support tumor progression inference models of evolutionary trajectories, which can be applied to longitudinal studies and clinical trials for the purpose of stratifying patients based on somatic driver events and gene expression. The MIBN method was used as an alternative to PBAM to provide results based on mutual information via a straightforward BN approach.
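As an illustration of the quantity on which MIBN-style results depend, the following minimal sketch computes the mutual information between two binary alteration indicators across patients. It is not the authors' implementation: the function name `mutual_information`, the toy vectors, and the 0/1 encoding (1 = alteration present in a patient) are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # counts of each observed (x, y) pair
    px = Counter(x)              # marginal counts for x
    py = Counter(y)              # marginal counts for y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Only observed pairs contribute; unobserved pairs add zero.
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical toy data for eight patients (not TCGA data).
event_a = [1, 1, 1, 1, 0, 0, 0, 0]
event_b = [1, 1, 1, 1, 0, 0, 0, 0]   # always co-occurs with event_a
event_c = [1, 1, 0, 0, 1, 1, 0, 0]   # statistically independent of event_a

mi_dependent = mutual_information(event_a, event_b)
mi_independent = mutual_information(event_a, event_c)
```

Pairs of events that always co-occur carry high mutual information, whereas independent pairs carry none. Note that mutual information is symmetric, so on its own it does not indicate which event came first.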
Overall, our probabilistic causal models allowed us to draw conclusions about the relationships between genomic alterations among key driver genes for common cancers. For the cancers investigated, PBAM and MIBN were able to partition events into hierarchical trees representing groups of patients with similar events. Altogether, our results indicate that driver mutations in TP53 were the most common across the cancers considered, which support prima facie causal inference for a host of other aberrations such as deletions, amplifications, downregulation, and upregulation. The major driver events in root nodes of trees were often TP53, APC, PIK3CA, PTEN, SMAD4, KRAS, BRAF, EGFR, IDH1, SPOP, VHL, etc., which revealed the importance of these driver genes and their alterations. A major difference between PBAM and MIBN was that while PBAM partitioned the events into distinct clusters of patients, MIBN tended to agglomerate the events into a single hierarchy, mostly as a result of mutual information. We tend to favor the results of PBAM over MIBN because it partitions events (patients) into distinct evolutionary trajectories of events, each of which has a major driver event. MIBN results would suggest that there is commonly a single event with a selective advantage over all subsets of events. Thus, the partitioning characteristic of PBAM is more amenable to the clustering of patients with distinct histories of evolutionary trajectories, which parallels patient diagnosis, treatment, and follow-up. The translational value of our results is established by the potential identification of novel patient genotypes, which could prove useful in future studies of molecular markers of therapy and metastatic prediction. Our future investigations will link TCGA clinical data for recurrence, survival, and metastasis to the trees generated to identify whether certain patterns of events confer various levels of risk.
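For readers unfamiliar with the term, prima facie causation is often formalized in the progression-inference literature using Suppes' two conditions: the candidate cause is the more frequent event (a proxy for temporal priority in cross-sectional data), and it raises the probability of the effect. The sketch below checks these two conditions for a pair of binary events. It is an assumption-laden illustration, not the PBAM computation itself: the function name `is_prima_facie_cause` and the toy vectors are hypothetical.

```python
def is_prima_facie_cause(a, b):
    """Check Suppes-style prima facie conditions for 'a precedes/causes b'.

    a, b: equal-length sequences of 0/1 indicators (1 = event present in a patient).
    Returns True when P(a) > P(b) and P(b | a) > P(b | not a).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    n_a = sum(a)
    n_not_a = n - n_a
    if n_a == 0 or n_not_a == 0:
        return False  # conditional probabilities are undefined
    p_a = n_a / n
    p_b = sum(b) / n
    p_b_given_a = sum(1 for i, j in zip(a, b) if i and j) / n_a
    p_b_given_not_a = sum(1 for i, j in zip(a, b) if not i and j) / n_not_a
    temporal_priority = p_a > p_b                        # frequency as a proxy for order
    probability_raising = p_b_given_a > p_b_given_not_a
    return temporal_priority and probability_raising


# Hypothetical toy data for ten patients (not TCGA data).
driver = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]       # present in 6 of 10 patients
downstream = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # present only when driver is present

forward = is_prima_facie_cause(driver, downstream)
reverse = is_prima_facie_cause(downstream, driver)
```

Unlike a symmetric association measure, the frequency condition makes the relation directional, which is what allows pairs of events to be arranged into rooted trees.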
Application of the PBAM and MIBN approaches to TCGA data has enabled us to view cancer from a distant perspective based on high-granularity genomic alterations which occurred in major driver genes. This view will hopefully reinforce an appreciation among oncologists and biologists for the translational value of diagnostically partitioning patients according to their deleterious genomic alterations, enrolling such patients in trials involving single- or multi-label treatments associated with prolonged survival, and pursuing longitudinal studies to improve therapeutic strategies. We did not comparatively assess numerous techniques for their computational efficiency, scalability, or differences in selectivity relationships. We also did not employ a gold standard to establish false positive and false negative rates. Rather, we highlighted differences in the polytrees portrayed by PBAM and MIBN. Estimation of the hierarchical structure of gene selectivity events may help identify the major partitions of patients having various mixtures of genomic alterations, as well as identify early and late driver events which may confer a positive selection advantage during tumor progression. The work presented here suggests that investigation of the selectivity relationships among genes can provide new insights into the development of human cancer and can establish new leads for future research on molecular diagnosis and therapeutics for cancer. There are several challenging issues surrounding development of tumor progression models using TCGA data. First, there is the problem of unknown upstream effects of germline polymorphisms, which may produce a variety of downstream alterations including deletions and amplifications.
In low grade gliomas, we did observe a tree rooted by upregulation of EGFR, which has been reported [32], and in breast cancer, we observed a tree rooted by upregulation of ERBB2 (HER2), which has been reported to be the result of amplification in 18–20% of breast cancer cases [225]. Second, there were cases of primary events involving gene loss; for example, we observed downregulation of SMAD4 as a root node in colorectal adenocarcinoma, which confers worse survival in stage I–II patients [226] and early recurrence after therapy [227]. Third, tumor heterogeneity is another hallmark of cancer that cannot be easily overcome when constructing models of tumor progression based on genomic alterations. TCGA data are not based on DNA and RNA extraction from single cells, which would be helpful for elucidating heterogeneity; however, the large variation in alleles identified throughout all the TCGA samples used would exacerbate the complexity surrounding our attempt to portray tumor progression via a single picture. We also did not consider DNA methylation status, chromosome aberrations, and microsatellite instability, which would overlay more complexity on the models developed. In conclusion, our data-driven approach to infer tumor progression models from deleterious genomic alterations in the absence of germline polymorphisms should be cautiously interpreted. Although 90–95% of cancers are sporadic, the inherited component of cancer is nevertheless of great importance, since lifelong exposure to genetic polymorphisms can seed genomic alterations which were not considered in this investigation.

Conclusions

We employed computational methods to derive selectivity relationships between deleterious genomic alterations available in The Cancer Genome Atlas (TCGA). Results of our computational methods translate to portraits of evolutionary trajectories of events among major cancer driver genes. The utility of our results can be realized by oncologists and biologists who envision partitions of clusters of patients for which selectivity relationships confer certain outcomes. We conclude that the portraits which emerged during construction of the graphs presented can be employed in longitudinal studies of cancer patients to fuse genotype data with prognostic indicators of recurrence, metastasis, and survival.

Declarations

Author contribution statement

Leif E. Peterson: Conceived and designed the experiments; Wrote the paper. Tatiana Kovyrshina: Analyzed and interpreted the data.

Funding statement

This work was funded by NASA (grant NNX12AO52A).

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

