Ana B Pavel, Dmitriy Sonkin, Anupama Reddy.
Abstract
BACKGROUND: High-throughput technologies have been used to profile genes in multiple dimensions, such as genetic variation, copy number, gene and protein expression, epigenetics, and metabolomics. Computational analyses often treat these data types as independent, which inflates the number of features, leaves studies under-powered and, more importantly, fails to provide a comprehensive view of a gene's state. We sought to infer gene activity by integrating these dimensions using biological knowledge of oncogenes and tumor suppressors.
Year: 2016 PMID: 26864072 PMCID: PMC4750289 DOI: 10.1186/s12918-016-0260-9
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Inferring gene activity by integrating different data types and biological knowledge. a Example showing how mutation, copy number and expression data are important for inferring the activity of PIK3CA (oncogene) and PTEN (tumor suppressor). b Schematic for Fuzzy Logic Modeling (FLM)
Fuzzy rules
| RULE 1: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS high) and (CN IS amplified) THEN Gene_activity IS high_GoF; |
| RULE 2: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS high) and (CN IS neutral) THEN Gene_activity IS high_GoF; |
| RULE 3: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS high) and (CN IS deleted) THEN Gene_activity IS GoF; |
| RULE 4: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS medium) and (CN IS amplified) THEN Gene_activity IS high_GoF; |
| RULE 5: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS medium) and (CN IS neutral) THEN Gene_activity IS GoF; |
| RULE 6: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS medium) and (CN IS deleted) THEN Gene_activity IS low_GoF; |
| RULE 7: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS low) and (CN IS amplified) THEN Gene_activity IS high_GoF; |
| RULE 8: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS low) and (CN IS neutral) THEN Gene_activity IS GoF; |
| RULE 9: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS recurrent) and (Expression IS low) and (CN IS deleted) THEN Gene_activity IS no_effect; |
| RULE 10: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS high) and (CN IS amplified) THEN Gene_activity IS GoF; |
| RULE 11: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS high) and (CN IS neutral) THEN Gene_activity IS low_GoF; |
| RULE 12: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS high) and (CN IS deleted) THEN Gene_activity IS low_GoF; |
| RULE 13: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS medium) and (CN IS amplified) THEN Gene_activity IS low_GoF; |
| RULE 14: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS medium) and (CN IS neutral) THEN Gene_activity IS low_GoF; |
| RULE 15: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS medium) and (CN IS deleted) THEN Gene_activity IS no_effect; |
| RULE 16: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS low) and (CN IS amplified) THEN Gene_activity IS low_GoF; |
| RULE 17: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS low) and (CN IS neutral) THEN Gene_activity IS no_effect; |
| RULE 18: IF ((Mutation IS Missense_Mutation) or (Mutation IS In_Frame_Del) or (Mutation IS In_Frame_Ins)) and (Recurrence IS non_recurrent) and (Expression IS low) and (CN IS deleted) THEN Gene_activity IS no_effect; |
| RULE 19: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS high) and (CN IS amplified) THEN Gene_activity IS low_LoF; |
| RULE 20: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS high) and (CN IS neutral) THEN Gene_activity IS LoF; |
| RULE 21: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS high) and (CN IS deleted) THEN Gene_activity IS high_LoF; |
| RULE 22: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS medium) and (CN IS amplified) THEN Gene_activity IS LoF; |
| RULE 23: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS medium) and (CN IS neutral) THEN Gene_activity IS LoF; |
| RULE 24: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS medium) and (CN IS deleted) THEN Gene_activity IS high_LoF; |
| RULE 25: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS low) and (CN IS amplified) THEN Gene_activity IS high_LoF; |
| RULE 26: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS low) and (CN IS neutral) THEN Gene_activity IS high_LoF; |
| RULE 27: IF ((Mutation IS Frame_Shift_Ins) or (Mutation IS Frame_Shift_Del) or (Mutation IS Nonsense_Mutation) or (Mutation IS Nonstop_Mutation) or (Mutation IS Splice_Site)) and (Expression IS low) and (CN IS deleted) THEN Gene_activity IS high_LoF; |
| RULE 28: IF ((Mutation IS No_Mutation) and ((Expression IS low) or (CN IS deleted))) THEN Gene_activity IS LoF; |
| RULE 29: IF ((Mutation IS No_Mutation) and ((Expression IS high) or (CN IS amplified))) THEN Gene_activity IS low_GoF; |
| RULE 30: IF ((Mutation IS No_Mutation) and (Expression IS medium) and (CN IS neutral)) THEN Gene_activity IS no_effect; |
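
As a rough illustration of how a rule of this form can be evaluated, the sketch below computes the firing strength of RULE 1 using the common min/max operators for fuzzy AND/OR. The membership degrees, the choice of min/max, and the helper names (`fuzzy_and`, `fuzzy_or`, `rule1_strength`) are assumptions made for illustration; the rule table does not specify the membership functions or the inference operators.

```python
def fuzzy_and(*degrees):
    """Fuzzy AND as the minimum of membership degrees (a common choice)."""
    return min(degrees)


def fuzzy_or(*degrees):
    """Fuzzy OR as the maximum of membership degrees (a common choice)."""
    return max(degrees)


def rule1_strength(mutation, recurrence, expression, cn):
    """Firing strength of RULE 1 given membership degrees in [0, 1].

    Each argument is a dict mapping a fuzzy term to a hypothetical
    membership degree; terms that are absent are treated as 0.
    """
    in_frame_or_missense = fuzzy_or(
        mutation.get("Missense_Mutation", 0.0),
        mutation.get("In_Frame_Del", 0.0),
        mutation.get("In_Frame_Ins", 0.0),
    )
    return fuzzy_and(
        in_frame_or_missense,
        recurrence.get("recurrent", 0.0),
        expression.get("high", 0.0),
        cn.get("amplified", 0.0),
    )


# Hypothetical sample: a recurrent missense mutation, fairly high
# expression, and a moderately amplified copy number.
strength = rule1_strength(
    mutation={"Missense_Mutation": 1.0},
    recurrence={"recurrent": 1.0},
    expression={"high": 0.7, "medium": 0.3},
    cn={"amplified": 0.6, "neutral": 0.4},
)
print(strength)  # degree to which "Gene_activity IS high_GoF" fires
```

In a full system, each of the 30 rules would produce such a strength, and the strengths would then be aggregated and defuzzified into a single activity score; that step is omitted here because its details are not given in this record.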
Fig. 2 Gene activity scores and inferred GoF/LoF status using Fuzzy Logic Modeling. a Distribution of GoF and LoF activity scores across all genes and all samples. b For each gene mutated in more than 1 % of the CCLE samples, two scores are computed (a GoF gene score and a LoF gene score). The GoF gene score is the percentage of mutated samples with GoF > |LoF|. The LoF gene score is the percentage of mutated samples with |LoF| > GoF. A gene is classified as GoF (oncogene) if the GoF gene score is >50 % or as LoF (tumor suppressor) if the LoF gene score is >50 %. c Known oncogenes [3] were correctly predicted by our method with an accuracy of 90 % (19/21). d Known tumor suppressors [3] were correctly predicted by our method with an accuracy of 86 % (18/21). Note that the known oncogenes and tumor suppressors were restricted to those that were found to be mutated in the CCLE at >1 % frequency
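
The gene-level classification described in panel b can be sketched as follows. This is a minimal illustration, assuming each mutated sample carries a pair of activity scores with the LoF score stored as a non-positive number; the function name `classify_gene`, the `"unclassified"` fallback label, and the sample values are hypothetical.

```python
def classify_gene(scores, threshold=50.0):
    """Classify a gene from per-sample (GoF, LoF) activity scores.

    `scores` holds one (gof, lof) pair per mutated sample. LoF scores
    are compared by absolute value, as in the caption.
    Returns (GoF gene score, LoF gene score, label), scores in percent.
    """
    if not scores:
        raise ValueError("need at least one mutated sample")
    n = len(scores)
    gof_pct = 100.0 * sum(1 for gof, lof in scores if gof > abs(lof)) / n
    lof_pct = 100.0 * sum(1 for gof, lof in scores if abs(lof) > gof) / n
    if gof_pct > threshold:
        label = "GoF"
    elif lof_pct > threshold:
        label = "LoF"
    else:
        # Assumption: genes where neither score exceeds the threshold
        # are left unlabeled; the caption does not cover this case.
        label = "unclassified"
    return gof_pct, lof_pct, label


# Hypothetical scores for four mutated samples of one gene.
samples = [(0.8, -0.1), (0.6, 0.0), (0.2, -0.5), (0.7, -0.2)]
print(classify_gene(samples))
```

Here three of the four samples have GoF > |LoF|, so the GoF gene score is 75 % and the gene is labeled GoF.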
Gene targets and the predictors of sensitivity for the compounds
| Compounds | Direct gene targets | Known sensitivity predictors | Sensitivity threshold | Resistance threshold |
|---|---|---|---|---|
| PLX4720 | BRAF | BRAF mutation | | |
| AZD6244 | MEK | BRAF mutation | | |
| Erlotinib | EGFR | EGFR mutation | | |
| Nutlin-3 | MDM2 | TP53 mutation | | |
| | | TP53 expression | | |
| BYL719 | PIK3CA | PIK3CA mutation | | |
Fig. 4 FLM gene activity scores differentiate the sensitive vs. resistant groups better than the relevant mutations (colored red) for each compound: a PLX4720, c Nutlin-3, e AZD6244, g Erlotinib. FLM scores improve prediction of drug sensitivity compared to gene expression, somatic mutation and copy number data used separately: b PLX4720, p<0.00002, d Nutlin-3, p<0.06, f AZD6244, p<0.22, h Erlotinib using the EGFR-KRAS predictor, p<0.01. We denote by * the significance level of 0.05
Fig. 3 FLM gene activity scores improve prediction of BYL719 drug sensitivity compared to using expression, mutation and copy number data separately. a Boxplot for PIK3CA FLM scores vs. BYL719 (PIK3CA inhibitor) sensitivity. The BYL719 sensitive group has higher activity scores compared to the resistant group (t-test p < 10^-4). Even within the PIK3CA missense mutants (colored in red), we see that FLM GoF scores are higher in the sensitive compared to the resistant group (t-test p < 0.0008). b Using PIK3CA FLM GoF scores to predict sensitivity, the AUC significantly improved compared to expression, mutation and copy number data separately, p<0.05. We denote by * the significance level of 0.05. c Heatmap showing the FLM activity scores for PIK3CA, PTEN and the individual data types. All values are scaled to [-1, 1]. Note that our algorithm correctly labeled PIK3CA as a GoF gene, and PTEN as a LoF gene, consistent with their classification in the literature. The color bar on top indicates the sensitivity groups for the samples (green = sensitive, black = resistant). The combined predictor of PIK3CA GoF scores and PTEN LoF scores significantly improves performance compared to combinations of individual data types, p<0.009
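
To make the AUC comparisons in these captions concrete, the sketch below computes AUC as the probability that a randomly chosen sensitive sample scores higher than a randomly chosen resistant one, with ties counted as one half. The function name `auc` and all score values are made up for illustration and are not data from the study; a continuous activity score is contrasted with a binary mutation flag only to show how the two predictor types are handled by the same calculation.

```python
def auc(sensitive_scores, resistant_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (sensitive, resistant) pairs in which the
    sensitive sample has the higher score; ties count as 0.5.
    """
    if not sensitive_scores or not resistant_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for s in sensitive_scores:
        for r in resistant_scores:
            if s > r:
                wins += 1.0
            elif s == r:
                wins += 0.5
    return wins / (len(sensitive_scores) * len(resistant_scores))


# Hypothetical continuous activity scores per sample.
activity_auc = auc([0.9, 0.7, 0.4], [0.5, 0.2, 0.1])

# Hypothetical binary mutation status (1 = mutated) for the same samples.
mutation_auc = auc([1, 1, 0], [1, 0, 0])

print(round(activity_auc, 3), round(mutation_auc, 3))
```

A binary predictor produces many ties, which is one reason a graded score can separate the groups more finely than mutation status alone. Testing whether two AUCs differ significantly requires a separate procedure that is not shown here.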
Fig. 5 Identifying unsupervised clusters in colorectal cancer and finding differential gene activity within each cluster. a Consensus matrix for K=2,3,4,5, using k-means clustering on colorectal cell lines. The consensus matrices show that there are two distinct subtypes which are stable even when K is increased. b Principal component analysis (PCA) plot of the FLM gene activity scores for 42 colorectal cancer cell lines. Colors indicate the two subtypes found using consensus clustering. c Subtypes found by FLM in CCLE are validated by comparing with subtypes in TCGA [36]. CCS2 is correlated with cluster 2 (green), while cluster 1 is split between CCS1 and CCS3. d Heatmap of the significantly differential gene activity scores (Student's t-test, FDR < 0.05) which differentiate the two FLM subtypes
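
The FDR filter in panel d can be illustrated with a short multiple-testing correction. The caption does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the p-values are invented placeholders standing in for per-gene t-test results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is
    # capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a two-group t-test.
pvals = [0.001, 0.02, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in qvals]
print(qvals, significant)
```

Genes whose adjusted value falls below 0.05 would be the ones retained for a heatmap of this kind.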