From genome to phenome: Predicting multiple cancer phenotypes based on somatic genomic alterations via the genomic impact transformer.

Yifeng Tao, Chunhui Cai, William W Cohen, Xinghua Lu.

Abstract

Cancers are mainly caused by somatic genomic alterations (SGAs) that perturb cellular signaling systems and eventually activate oncogenic processes. Therefore, understanding the functional impact of SGAs is a fundamental task in cancer biology and precision oncology. Here, we present a deep neural network model with encoder-decoder architecture, referred to as genomic impact transformer (GIT), to infer the functional impact of SGAs on cellular signaling systems through modeling the statistical relationships between SGA events and differentially expressed genes (DEGs) in tumors. The model utilizes a multi-head self-attention mechanism to identify SGAs that likely cause DEGs, or in other words, differentiating potential driver SGAs from passenger ones in a tumor. GIT model learns a vector (gene embedding) as an abstract representation of functional impact for each SGA-affected gene. Given SGAs of a tumor, the model can instantiate the states of the hidden layer, providing an abstract representation (tumor embedding) reflecting characteristics of perturbed molecular/cellular processes in the tumor, which in turn can be used to predict multiple phenotypes. We apply the GIT model to 4,468 tumors profiled by The Cancer Genome Atlas (TCGA) project. The attention mechanism enables the model to better capture the statistical relationship between SGAs and DEGs than conventional methods, and distinguishes cancer drivers from passengers. The learned gene embeddings capture the functional similarity of SGAs perturbing common pathways. The tumor embeddings are shown to be useful for tumor status representation, and phenotype prediction including patient survival time and drug response of cancer cell lines.

Year: 2020 PMID: 31797588 PMCID: PMC6932864

Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928

Introduction

Cancer is mainly caused by the activation of oncogenes or deactivation of tumor suppressor genes (collectively called “driver genes”) as a result of somatic genomic alterations (SGAs),[1] including somatic mutations (SMs),[2,3] somatic copy number alterations (SCNAs),[4,5] DNA structure variations (SVs),[6] and epigenetic changes.[7] Precision oncology relies on the capability of identifying and targeting tumor-specific aberrations resulting from driver SGAs and their effects on molecular and cellular phenotypes. However, our knowledge of driver SGAs and cancer pathways remains incomplete. In particular, it remains a challenge to determine which SGAs (among often hundreds) in a specific tumor are drivers, which cellular signals or biological processes a driver SGA perturbs, and which molecular/cellular phenotypes a driver SGA affects. Current methods for identifying driver genes mainly concentrate on identifying genes that are mutated at a frequency above expectation, based on the assumption that mutations in these genes may provide oncogenic advantages and thus are positively selected.[8,9] Some works further focus on mutations perturbing conserved (potentially functional) domains of proteins as indications that they may be driver events.[10,11] However, these methods do not provide any information regarding the functional impact of enriched mutations on molecular/cellular phenotypes of cells. Without knowledge of the functional impact, it is difficult to further determine whether an SGA will lead to specific molecular, cellular and clinical phenotypes, such as response to therapies. Moreover, while both SMs and SCNAs may activate/deactivate a driver gene, there is no well-established frequency-based method that combines different types of SGAs to determine their functional impact.
Conventionally, an SGA event perturbing a gene in a tumor is represented as a “one-hot” vector spanning gene space, in which the element corresponding to the perturbed gene is set to “1”. This representation simply indicates which gene is perturbed, but it does not reflect the functional impact of the SGA, nor can it represent the similarity of distinct SGAs that perturb a common signaling pathway. We conjecture that it is possible to represent an SGA as a low-dimensional vector, in the same manner as the “word embedding”[12-14] in the natural language processing (NLP) field, such that the representation reflects the functional impact of a gene on biological systems, and genes sharing similar functions are closely located in the embedding space. Here “similar function” is broadly defined, e.g., genes from the same pathway or of the same biological process.[15] Motivated by this, we propose a scheme for learning “gene embeddings” for SGA-affected genes, i.e., a mapping from individual genes to low-dimensional vectors of real numbers that are useful in multiple prediction tasks. Based on the assumption that SGAs perturbing cellular signaling systems often eventually lead to changes in gene expression,[16] we introduce an encoder-decoder neural network model called “genomic impact transformer” (GIT) to predict DEGs and detect potential cancer drivers with the supervision of DEGs. While deep learning models are increasingly used to model different bioinformatics problems,[17,18] to our knowledge there are few studies using neural networks to model the relationships between SGAs and molecular/cellular phenotypes in cancers.
The proposed GIT model has the following innovative characteristics: (1) The encoder part of the transformer[19] first uses SGAs observed in a tumor as inputs, maps each SGA into a gene embedding representation, and combines the gene embeddings of SGAs to derive a personalized “tumor embedding”. The decoder part then decodes and translates the tumor embedding into DEGs. (2) A multi-head self-attention mechanism[20,21] is utilized in the encoder, which is a technique widely used in NLP to choose the input features that significantly influence the output. It differentiates SGAs by assigning different weights to them so that it can potentially distinguish SGAs that have an impact on DEGs from those that do not, i.e., distinguish drivers from passengers. (3) Pooling the inferred weighted impact of SGAs in a tumor produces a personalized tumor embedding, which can be used as an effective feature to predict DEGs and other phenotypes. (4) Gene embeddings are pre-trained by a “Gene2Vec” algorithm and further refined by the GIT, which captures the functional impact of SGAs on the cellular signaling system. Our results and analysis indicate that the above approaches enable us to derive powerful gene embedding and tumor embedding representations that are highly informative of molecular, cellular and clinical phenotypes.

Materials and methods

SGAs and DEGs pre-processing

We obtained SGA data, including SMs and SCNAs, and DEGs of 4,468 tumors from 16 cancer types directly from the TCGA portal.[22] Details are available in the SI (Sec. S1).

The GIT neural network

GIT network structure: encoder-decoder architecture

Figure 1a shows the general structure of the GIT model with an overall encoder-decoder architecture. GIT mimics the hierarchically organized cellular signaling system,[23,24] in which a neuron may potentially encode the signal of one or more signaling proteins. When a cellular signaling system is perturbed by SGAs, this can often lead to changes in measured molecular phenotypes, such as gene expression changes. Thus, for a tumor t, the set of its SGAs is connected to the GIT neural network as observed input (Fig. 1a, bottom squares). The impact of SGAs is represented as embedding vectors, which are further linearly combined to produce a tumor embedding vector $e_t$ through an attention mechanism in the encoder (Fig. 1a, middle part). We explicitly represent the cancer type s and its influence on the encoding of the tumor through a cancer type embedding $e_s$, because tissue type also influences which genes are expressed in cells of a specific tissue. Finally, the decoder module, which consists of a feed-forward multi-layer perceptron (MLP),[25] transforms the functional impact of SGAs and cancer type into DEGs of the tumor (Fig. 1a, top part).

Fig. 1.

(a) Overall architecture of GIT. An example case and its detected drivers are shown. (b) A two-dimensional demo that shows how the attention mechanism combines multiple gene embeddings of SGAs and the cancer type embedding $e_s$ into a tumor embedding vector $e_t$ using attention weights. (c) Calculation of attention weights using gene embeddings.

Pre-training gene embeddings using Gene2Vec algorithm

In this study, we projected the discrete binary representation of SGAs perturbing a gene into a continuous embedding space, which we call “gene embeddings” of corresponding SGAs, using a “Gene2Vec” algorithm, based on the assumption of co-occurrence pattern of SGAs in each tumor, including mutually exclusive patterns of mutations affecting a common pathway.[26] These gene embeddings were further updated and fine-tuned by the GIT model with the supervision of affected DEGs. Algorithm details available in SI (Sec. S2).

Encoder: multi-head self-attention mechanism

To detect the difference in functional impact of SGAs in a tumor, we designed a multi-head self-attention mechanism (Fig. 1a, middle part). For all SGA-affected genes and the cancer type s of a tumor t, we first mapped them to the corresponding gene embeddings $e_g$ and a cancer type embedding $e_s$ from a look-up table, where $e_g$ and $e_s$ are real-valued vectors. From the implementation perspective, we treated cancer types in the same way as SGAs, except that the attention weight of the cancer type is fixed to “1”. The overall idea of producing the tumor embedding $e_t$ is to use the weighted sum of the cancer type embedding $e_s$ and the gene embeddings $e_g$ (Fig. 1b):
$$e_t = 1 \cdot e_s + \sum_{g} \alpha_g \cdot e_g \qquad (1)$$
The attention weights $\alpha_g$ were calculated by a multi-head self-attention mechanism from the gene embeddings of the SGAs in the tumor (Fig. 1c). See SI (Sec. S3) for mathematical details. Overall we have three parameters {W0, Θ, ε} to train in the multi-head attention module using back-propagation.[27] The look-up table was initialized with Gene2Vec pre-trained gene embeddings and refined by GIT here.
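The weighted-sum step described above can be illustrated with a small sketch. This is not the authors' implementation: the scoring function (a single attention head of the form `theta . tanh(W0 e_g)` followed by a softmax) and the names `attention_weights`, `tumor_embedding`, `W0` and `theta` as used here are assumptions made for illustration; the actual multi-head formulation is given in the SI (Sec. S3). Only the final combination, with the cancer type weight fixed to 1, follows the text directly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def attention_weights(gene_embs, W0, theta):
    """Assumed single-head scoring: score_g = theta . tanh(W0 e_g)."""
    scores = []
    for e in gene_embs:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, e))) for row in W0]
        scores.append(sum(t * h for t, h in zip(theta, hidden)))
    return softmax(scores)

def tumor_embedding(cancer_emb, gene_embs, alphas):
    """e_t = 1 * e_s + sum_g alpha_g * e_g (cancer type weight fixed to 1)."""
    e_t = list(cancer_emb)
    for alpha, e in zip(alphas, gene_embs):
        for i, x in enumerate(e):
            e_t[i] += alpha * x
    return e_t

# Toy two-dimensional example with three hypothetical SGA-affected genes.
gene_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cancer_emb = [0.5, -0.5]
W0 = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical projection matrix
theta = [1.0, 1.0]              # hypothetical scoring vector
alphas = attention_weights(gene_embs, W0, theta)
e_t = tumor_embedding(cancer_emb, gene_embs, alphas)
```

In this toy setting the third gene receives the largest weight because its projected embedding scores highest; in GIT the weights are learned under the supervision of DEGs.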

Decoder: multi-layer perceptron (MLP)

For a specific tumor t, we fed the tumor embedding $e_t$ into an MLP with one hidden layer as the decoder, using non-linear activation functions and fully connected layers, to produce the final predictions for DEGs y (Fig. 1a, top part). The decoder uses the rectified linear unit, ReLU(x) = max(0, x), and the sigmoid activation function, $\sigma(x) = (1+\exp(-x))^{-1}$. The output of the decoder and the actual values of DEGs were used to calculate the $\ell_2$-regularized cross entropy, i.e., the sum of a cross entropy loss and an $\ell_2$ regularizer, which was minimized during training.
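A decoder of this kind can be sketched as follows. The sketch assumes a standard layout (one ReLU hidden layer, a sigmoid output per DEG, a mean binary cross entropy, and a penalty `lam` times the sum of squared parameters); the parameter names `W1`, `b1`, `W2`, `b2` and `lam` are hypothetical, and the exact form used by the authors may differ.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(W, b, v):
    """Fully connected layer: W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def decode(e_t, W1, b1, W2, b2):
    """One hidden layer with ReLU, then a sigmoid output per DEG."""
    hidden = [relu(h) for h in linear(W1, b1, e_t)]
    return [sigmoid(o) for o in linear(W2, b2, hidden)]

def l2_regularized_bce(y_true, y_pred, params, lam, eps=1e-12):
    """Mean binary cross entropy plus lam * sum of squared parameters."""
    ce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(y_true, y_pred)
    ) / len(y_true)
    return ce + lam * sum(w * w for w in params)

# Toy example: a 2-d tumor embedding decoded into two DEG probabilities.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[2.0, 0.0], [-2.0, 0.0]], [0.0, 0.0]
y_hat = decode([1.0, -1.0], W1, b1, W2, b2)
flat_params = [w for row in W1 + W2 for w in row]
loss = l2_regularized_bce([1, 0], y_hat, flat_params, lam=0.01)
```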

Training and evaluation

We utilized PyTorch (https://pytorch.org/) to train, validate and test the Gene2Vec, GIT (variants) and other conventional models (Lasso and MLPs; Section 3.1). The training, validation and test sets were split in the ratio of 0.33:0.33:0.33 and fixed across different models. Unless mentioned otherwise below, the hyperparameters were tuned over the training and validation sets to obtain the best F1 scores; the models were then trained on the training and validation sets and finally applied to the test set for evaluation. The models were trained by updating parameters using backpropagation,[27] specifically, using mini-batch Adam[28] with default momentum parameters. Gene2Vec used mini-batch stochastic gradient descent (SGD) instead of Adam. Dropout[29] and weight decay ($\ell_2$-regularization) were used to prevent overfitting. We trained all the models for 30 to 42 epochs until they fully converged. The output DEGs were represented as a sparse binary vector. We utilized various performance metrics including accuracy, precision, recall, and F1 score, where F1 is the harmonic mean of precision and recall. Training and testing were repeated for five runs to obtain the mean and variance of the evaluation metrics. We designed two metrics in the present work for evaluating the functional similarity among genes sharing similar gene embeddings: “nearest neighborhood (NN) accuracy” and “GO enrichment”. See SI (Sec. S4) for their definitions and meaning.
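For reference, the four metrics named above can be computed from binary DEG vectors as in the sketch below. The function name `binary_metrics` and the toy vectors are illustrative only; how the authors aggregate the metrics over genes and tumors is not specified here.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for binary label vectors."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Toy example: 8 genes, 3 truly differentially expressed.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                         [1, 1, 0, 1, 0, 0, 0, 0])
```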

Results

GIT statistically detects real biological signals

The task of GIT is to predict DEGs (dependent variables) using SGAs as input (independent variables). Our results of GIT performance on both real and shuffled data demonstrate that GIT is able to capture real statistical relationships between SGAs and DEGs from the noisy biological data (SI: Sec. S5). As a comparison, we also trained and tested the Lasso (multivariate regression with $\ell_1$-regularization)[30] and MLPs[25] as baseline prediction models to predict DEGs based on SGAs. The Lasso model is appealing in our setting because, when predicting a DEG, it can filter out most of the irrelevant input variables (SGAs) and keep only the most informative ones, and it is a natural choice in our case where there are 19.8k possible SGAs. However, in comparison to the MLP, it lacks the capability of portraying complex relationships between SGAs and DEGs. On the other hand, while conventional MLPs have sufficient power to capture complex relationships (in particular, the neurons in hidden layers may mimic signaling proteins[24]), they cannot utilize any biological knowledge extracted from cancer genomics, nor do they explain the signaling process or distinguish driver SGAs. We employed precision, recall, F1 score, and accuracy to compare GIT and traditional methods (Table 1: 1st to 4th, and last rows). GIT outperforms all of these conventional baseline methods for predicting DEGs in all metrics, indicating that the specifically designed structure of GIT improves performance in the task of predicting DEGs from SGAs.

Table 1.

Performances of GIT (variants) and baseline methods.

Methods	Precision	Recall	F1 score	Accuracy

Lasso	59.6±0.05	52.8±0.03	56.0±0.01	74.0±0.02
1 layer MLP	61.9±0.09	50.4±0.17	55.6±0.07	74.7±0.02
2 layer MLP	64.2±0.39	52.0±0.66	56.5±0.19	75.9±0.09
3 layer MLP	64.2±0.37	50.5±0.30	52.1±0.29	75.7±0.13

GIT - can	60.5±0.34	45.8±0.38	52.1±0.29	73.6±0.14
GIT - attn	67.6±0.32	55.3±0.77	60.8±0.35	77.7±0.05
GIT - init	69.8±0.28	54.1±0.37	60.9±0.16	78.3±0.06

GIT	69.5±0.09	57.1±0.18	62.7±0.08	78.7±0.01

In order to evaluate the utility of each module (procedure) in GIT, we conducted an ablation study by removing one module at a time: the cancer type input (“can”), the multi-head self-attention module (“attn”), and the initialization with pre-trained gene embeddings (“init”). The impact of each module can be assessed by comparing each variant to the full GIT model. All the modules in GIT help to improve the prediction of DEGs from SGAs in terms of overall performance, i.e., F1 score and accuracy (Table 1: 5th to last rows).

Gene embeddings compactly represent the functional impact of SGAs

We examined whether the gene embeddings capture the functional similarity of SGAs, using mainly two metrics: NN accuracy and GO enrichment (defined in SI Sec. S4). NN accuracy: By capturing the co-occurrence pattern of somatic alterations, the Gene2Vec pre-trained gene embeddings improve NN accuracy by 36% over the random chance of any pair of genes sharing a Gene Ontology (GO) annotation[15] (Table 2). The embeddings fine-tuned by GIT raise this further, to a 100% improvement over random pairs, i.e., roughly double the NN accuracy. These results indicate that the learned gene embeddings are consistent with the gene functions, and that they map the discrete binary SGA representation into a meaningful and compact space. GO enrichment: We performed clustering analysis of SGAs in the embedding space using k-means clustering and calculated GO enrichment, varying the number of clusters (k) to derive clusters with different degrees of granularity (Fig. 2a). When the genes are randomly distributed in the embedding space, they get a GO enrichment of 1. In the gene embedding space, however, the GO enrichment increases rapidly until the number of clusters reaches 40, indicating a strong correlation between the clusters in the embedding space and the functions of the genes.
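A nearest-neighbor check of this kind could be sketched as below. The precise definition of NN accuracy is in the SI (Sec. S4) and is not reproduced here; this sketch assumes that a gene counts as a hit when its nearest neighbor in Euclidean distance shares at least one GO term with it. The gene names, GO labels and function names are hypothetical.

```python
import math

def nn_accuracy(embeddings, annotations):
    """Fraction of genes whose nearest neighbor shares a GO term (assumed definition)."""
    genes = list(embeddings)
    hits = 0
    for g in genes:
        nearest = min(
            (h for h in genes if h != g),
            key=lambda h: math.dist(embeddings[g], embeddings[h]),
        )
        if annotations[g] & annotations[nearest]:
            hits += 1
    return hits / len(genes)

def relative_improvement(baseline, value):
    """Relative gain over a baseline, e.g. 0.36 for a 36% improvement."""
    return (value - baseline) / baseline

# Toy example with four hypothetical genes and GO labels.
embeddings = {"geneA": [0.0, 0.0], "geneB": [0.1, 0.0],
              "geneC": [5.0, 5.0], "geneD": [5.1, 5.0]}
annotations = {"geneA": {"GO:x"}, "geneB": {"GO:x"},
               "geneC": {"GO:y"}, "geneD": {"GO:z"}}
acc = nn_accuracy(embeddings, annotations)
```

Applied to the values in Table 2, `relative_improvement(5.3, 7.2)` is about 0.36 and `relative_improvement(5.3, 10.7)` is about 1.0, matching the reported 36% and 100%.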

Table 2.

NN accuracy with respect to GO in different gene embedding spaces.

Gene embeddings	NN accuracy	Improvement

Random pairs	5.3±0.36	–
Gene2Vec	7.2	36%
Gene2Vec + GIT	10.7	100%

Fig. 2.

(a) GO enrichment vs. number of groups in k-means clustering. (b) t-SNE visualization of gene embeddings. The different colors represent k-means (40 clusters) clustering labels. An enlarged inset of a cluster is shown, which contains a set of closely related genes that we refer to as the “IFN pathway”. (c) Landscape of attention of SGAs based on attention weights and frequencies.

To visualize the manifold of gene embeddings, we grouped the genes into 40 clusters and conducted t-SNE[31] on the genes (Fig. 2b, left panel). Using PANTHER GO enrichment analysis,[32] 12 out of 40 clusters are shown to be enriched in at least one biological process (SI Sec. S6). Most of the gene clusters are well-defined and tightly located in the projected t-SNE space. As a case study, we took a close look at one cluster (Fig. 2b, right panel), which contains a set of functionally similar genes, such as those that code for a protein family of type I interferons (IFNs), which are responsible for immune and viral response.[33]

Self-attention reveals impactful SGAs on cancer cell transcriptome

While it is widely accepted that cancer is mainly caused by SGAs, not all SGAs observed in a cancer cell are causative.[1] Previous methods mainly concentrate on searching for SGAs with higher than expected frequency to differentiate candidate driver SGAs from passenger SGAs. GIT provides a novel perspective on the problem: identifying the SGAs that have a functional impact on cellular signaling systems and eventually lead to DEGs as the tumor-specific candidate drivers. Here we compare the overall attention weights (inferred by the GIT model) with the frequencies of somatic alterations (used as the benchmark/control group) across all cancer types (Pan-Cancer) in our test data (Fig. 2c). In general, the attention weights are correlated with the alteration frequencies of genes, e.g., common cancer drivers such as TP53 and PIK3CA are the top two SGAs selected by both methods.[2] However, our self-attention mechanism assigns high weights to many genes not previously designated as drivers, indicating that these genes are potential cancer drivers, although their roles in cancer development remain to be further studied. Table 3 lists the top SGAs ranked according to GIT attention weights in pan-cancer and five selected cancer types, where known cancer drivers from TumorPortal[3] and IntOGen[34] are marked in bold font. Apart from TP53 and PIK3CA as drivers in the pan-cancer analysis,[2] we also find that the top cancer drivers in specific cancer types are consistent with current knowledge of oncology. For example, CDH1 and GATA3 are drivers of breast invasive carcinoma (BRCA),[35] CASP8 is a known driver of head and neck squamous cell carcinoma (HNSC),[36] STK11, KRAS, and KEAP1 are known drivers of lung adenocarcinoma (LUAD),[37] PTEN and RB1 are drivers of glioblastoma (GBM),[38] and FGFR3, RB1, HSP90AA1, and STAG2 are known drivers in urothelial bladder carcinoma (BLCA).[39] In contrast, the most frequently mutated genes (control group) are quite different from those selected by the attention mechanism (experiment group), and only a few of them are known drivers (SI Sec. S7).

Table 3.

Top five SGA-affected genes ranked according to attention weight.

Rank	PANCAN	BRCA	HNSC	LUAD	GBM	BLCA

1	TP53	TP53	TP53	STK11	TP53	TP53
2	PIK3CA	PIK3CA	CASP8	TP53	PTEN	FGFR3
3	RB1	CDH1	PIK3CA	KRAS	C9orf53	RB1
4	PBRM1	GATA3	CYLD	CYLC2	RB1	HSP90AA1
5	PTEN	MED24	RB1	KEAP1	CHIC2	STAG2

Personalized tumor embeddings reveal distinct survival profiles

Besides learning the specific functional impact of SGAs on DEGs, we further examined the utility of tumor embeddings $e_t$ from two perspectives: (1) discovering patterns of tumors potentially sharing common disease mechanisms across different cancer types; (2) using tumor embeddings to predict patient survival. We first used the t-SNE plot of tumor embeddings to illustrate the common disease mechanisms across different cancer types (Fig. 3a). When the cancer type embedding $e_s$ is included in the full tumor embedding $e_t$, it has a much higher weight than any individual gene embedding (Fig. 1b, Eq. 1) and dominates the full tumor embedding, so tumor samples are clustered according to cancer types. This is not surprising, as it is well appreciated that the expression of many genes is tissue-specific.[40] To examine the pure effect of SGAs on the tumor embedding, we removed the effect of tissue by subtracting the cancer type embedding $e_s$, followed by clustering tumors in the stratified tumor embedding space (Fig. 3b). It is interesting to see that each dense area (potential tumor cluster) includes tumors from different tissues of origin, indicating that SGAs in these tumors may reflect shared disease mechanisms (pathway perturbations) among tumors, warranting further investigation.

Fig. 3.

(a) t-SNE of full tumor embedding $e_t$. (b) t-SNE of stratified tumor embedding ($e_t - e_s$). (c) PCA of tumor embedding shows internal subtype structure of BRCA tumors. Color labels the group index of k-means clustering. (d) KM estimators of the three breast cancer groups. (e) Cox regression using tumor embeddings.

The second set of experiments tested whether differences in tumor embeddings (and thereby differences in disease mechanisms) are predictive of patient clinical outcomes. We conducted unsupervised k-means clustering using only breast cancer tumors from our test set, which reveals three groups (Fig. 3c) with a significant difference in survival profiles as evaluated by the log-rank test[41] (Fig. 3d; p-value=0.017). In addition, using tumor embeddings as input features, we trained $\ell_1$/$\ell_2$-regularized (elastic net)[42] Cox proportional hazard models[43] in a 10-fold cross-validation (CV) experiment. This led to an informative ranked list of tumors according to predicted survivals/hazards, evaluated by the concordance index (CI) value (CI=0.795), indicating that the trained model is accurate. We further split the test samples into two groups divided by the median of predicted survivals/hazards, which also yields a significant separation of patients in survival profiles (Fig. 3e; p-value=5.1 × 10), indicating that our algorithm has correctly ranked the patients according to characteristics of the tumor. As shown above, distinct SGAs may share similar embeddings if they share similar functional impact. Thus, two tumors may have similar tumor embeddings even though they do not share any SGAs, as long as the functional impact of the distinct SGAs in these tumors is similar. Therefore, tumor embedding makes it easier to discover common disease mechanisms and their impact on patient survival. To further test this, we also performed clustering analysis on breast cancer tumors represented in the original SGA space, followed by a survival analysis similar to the one described above (SI Sec. S8).
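The concordance index used to evaluate the ranking can be illustrated with a minimal sketch of Harrell's C-index. It assumes that a higher predicted risk should correspond to a shorter survival time, that a pair is comparable only when the earlier time is an observed event (not censored), and that ties in predicted risk count as half; the function name and toy data are illustrative and not taken from the study.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked in the right order."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable when subject i had an observed event before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy example: four patients, the third one censored (event = 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
ci_good = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
ci_bad = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the reported CI of 0.795 should be read.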

Tumor embeddings are predictive of drug responses of cancer cell lines

Precision oncology concentrates on using patient-specific omics data to determine optimal therapies for a patient. We set out to see if SGA data of cancer cells can be used to predict their sensitivity to anti-cancer drugs. We used the CCLE dataset,[44] which reports drug sensitivity screening over hundreds of cancer cell lines and 24 anti-cancer drugs. The study collects genomic and transcriptomic data of these cell lines, but in general, the genomic data (except the molecularly targeted genes) from a cell line are not sufficient to predict its sensitivity to different drugs. We discretized the response to each drug following the procedure in previous research.[44,45] Since CCLE only contains a small subset of the mutations in the TCGA dataset (around 1,600 gene mutations), we retrained the GIT with this limited set of SGAs in TCGA, using the default hyperparameters we set before. The cancer type input was removed as well, since it is not explicitly provided in the CCLE dataset. The output tumor embeddings $e_t$ were then extracted as features. We formulated drug response prediction as a binary classification problem with $\ell_1$-regularized cross entropy loss (Lasso), where the input can be raw sparse SGAs or tanh-curved tumor embeddings tanh($e_t$). Following previous work,[44] we performed a 10-fold CV experiment, training Lasso using either input, to test the drug response prediction task for four drugs with distinct targets. Lasso regression using tumor embeddings consistently outperforms the models trained with original SGAs as inputs (Fig. 4). Specifically, in the case of Sorafenib, the raw mutations give essentially random predictions, whereas the tumor embeddings yield predictive results. It is possible that certain cancer cells host SGAs along the pathways related to FGFR, RAF, EGFR, and RTK, rendering them sensitive to the above drugs. Such information can be implicitly captured and represented by the tumor embeddings, so that the information from raw SGAs is captured and pooled to enhance classification accuracy.

Fig. 4.

ROC curves and the areas under the curve (AUCs) of Lasso models trained with original SGAs and tumor embeddings representations on predicting responses to four drugs.
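The AUC summarizing each ROC curve can be computed directly from labels and classifier scores, as in the sketch below. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen sensitive cell line scores higher than a randomly chosen insensitive one; `tanh_features` mirrors the tanh squashing of tumor embeddings mentioned in the text. The function names and toy data are illustrative, and the Lasso training itself is not shown.

```python
import math

def tanh_features(embedding):
    """Element-wise tanh squashing of a tumor embedding."""
    return [math.tanh(x) for x in embedding]

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: two sensitive (1) and two insensitive (0) cell lines.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
features = tanh_features([0.0, 2.0, -2.0])
```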

Conclusion and Future Work

Despite the significant advances in cancer biology, it remains a challenge to reveal the disease mechanisms of each individual tumor, particularly which SGAs in a cancer cell lead to the development of cancer, and how. Here we propose the GIT model to learn the general impact of SGAs, in the form of gene embeddings, and to portray their effects on the downstream DEGs with higher accuracy. With the supervision of DEGs, we can further assess the importance of an SGA in each individual tumor using multi-head self-attention mechanisms. More importantly, while the tumor embeddings are trained with DEG prediction as the task, they contain information for predicting other phenotypes of cancer cells, such as patient survival and cancer cell drug sensitivity. The key advantage of transforming SGAs into a gene embedding space is that it enables the detection and representation of the functional impact of SGAs on cellular processes, which in turn enables detection of common disease mechanisms of tumors even if they host different SGAs. We anticipate that GIT, or other future models like it, can be applied broadly to gain mechanistic insights into how genomic alterations (or other perturbations) lead to specific phenotypes, thus providing a general tool to connect genome to phenome in different biological fields and genetic diseases. One should also be careful: despite the correlation between genomic alterations and phenotypes such as survival profiles and drug response, the model may not fully reveal the causalities, and there may exist other confounding factors not considered.
There are a few future directions for further improving the GIT model. First, decades of biomedical research have accumulated a rich body of knowledge, e.g., Gene Ontology and gene regulatory networks, which may be incorporated as a prior of the model to boost performance.[46] Second, we expect that with a larger corpus of tumor data with mutations and gene expressions, we will be able to train better models and reduce potential overfitting or variance. Lastly, more clinically oriented investigations are warranted to examine whether, when trained with a large volume of tumor omics data, the learned embeddings of SGAs and tumors can be applied to predict sensitivity or resistance to anti-cancer drugs based on SGA data, which are becoming readily available in contemporary oncology practice.
