Christof Winter, Glen Kristiansen, Stephan Kersting, Janine Roy, Daniela Aust, Thomas Knösel, Petra Rümmele, Beatrix Jahnke, Vera Hentrich, Felix Rückert, Marco Niedergethmann, Wilko Weichert, Marcus Bahra, Hans J Schlitt, Utz Settmacher, Helmut Friess, Markus Büchler, Hans-Detlev Saeger, Michael Schroeder, Christian Pilarsky, Robert Grützmann.
Abstract
Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state-of-the-art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles obtained from 30 patients with pancreatic cancer and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state-of-the-art methods, such as Pearson correlation of gene expression with survival time, our approach improves prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data from individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploiting these data for personalized cancer therapies in clinical practice.
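The PageRank-like ranking can be illustrated with a minimal personalized-PageRank sketch in pure Python. It assumes, as an illustration and not as the authors' published implementation, that each gene's seed score is its absolute Pearson correlation with survival time and that network edges point from regulator to target, so each iteration computes r_j = (1 - d) s_j + d * sum over regulators i of r_i / outdeg(i). The function names `pearson` and `netrank`, the damping factor default `d=0.3`, and the edge direction are hypothetical choices; reversing the edge pairs would instead let regulators accumulate score from their targets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def netrank(expression, survival, edges, d=0.3, max_iter=100, tol=1e-10):
    """Personalized-PageRank ranking of genes (hypothetical illustrative sketch).

    expression: dict gene -> expression values, one per patient
    survival:   survival times in the same patient order
    edges:      (regulator, target) pairs; score flows regulator -> target
    d:          damping factor; d = 0 gives pure correlation ranking
    """
    genes = sorted(expression)
    # Seed scores: |correlation with survival|, normalized to sum to 1.
    seed = {g: abs(pearson(expression[g], survival)) for g in genes}
    total = sum(seed.values()) or 1.0
    seed = {g: v / total for g, v in seed.items()}

    out_degree = {g: 0 for g in genes}
    incoming = {g: [] for g in genes}
    for src, dst in edges:
        if src in out_degree and dst in incoming:  # skip genes not measured
            out_degree[src] += 1
            incoming[dst].append(src)

    rank = dict(seed)
    for _ in range(max_iter):
        # r_j = (1 - d) * s_j + d * sum over regulators i of r_i / outdeg(i)
        new_rank = {
            j: (1 - d) * seed[j] + d * sum(rank[i] / out_degree[i] for i in incoming[j])
            for j in genes
        }
        delta = sum(abs(new_rank[g] - rank[g]) for g in genes)
        rank = new_rank
        if delta < tol:
            break
    return sorted(genes, key=rank.get, reverse=True), rank
```

Setting `d=0` reduces the ranking to plain correlation ranking, the kind of baseline the abstract compares against; a larger `d` lets a weakly correlated gene rise when it sits downstream of strongly correlated regulators.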
Year: 2012 PMID: 22615549 PMCID: PMC3355064 DOI: 10.1371/journal.pcbi.1002511
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Clinical characteristics of patients used in this study.
| Characteristic | Screening n | Screening % | Validation, adjuvant therapy n | Validation, adjuvant therapy % | Validation, no adjuvant therapy n | Validation, no adjuvant therapy % |
|---|---|---|---|---|---|---|
| Total number of patients | 30 | 100 | 172 | 100 | 240 | 100 |
| Sex | | | | | | |
| female | 19 | 63 | 73 | 42 | 115 | 48 |
| male | 11 | 37 | 99 | 58 | 125 | 52 |
| Age | | | | | | |
| Median (years) | 65 | | 65 | | 66 | |
| Range (years) | 40–82 | | 33–81 | | 31–85 | |
| Patient survival | | | | | | |
| 1 year | 17 | 57 | 104 | 60 | 146 | 61 |
| 2 years | 9 | 30 | 38 | 22 | 83 | 35 |
| 3 years | 6 | 20 | 12 | 7 | 54 | 23 |
| 4 years | 3 | 10 | 4 | 2 | 32 | 13 |
| 5 years | 0 | 0 | 3 | 2 | 15 | 6 |
| Median (months) | 17.5 | | 14.4 | | 15.5 | |
| Range (months) | 5–53 | | 1–109 | | 1–189 | |
| Primary tumor† | | | | | | |
| pT1 | 0 | 0 | 3 | 2 | 6 | 2 |
| pT2 | 2 | 7 | 28 | 16 | 42 | 18 |
| pT3 | 26 | 87 | 137 | 80 | 182 | 76 |
| pT4 | 2 | 7 | 4 | 2 | 9 | 4 |
| Regional lymph nodes† | | | | | | |
| pN0 | 12 | 40 | 43 | 25 | 88 | 37 |
| pN1 | 18 | 60 | 129 | 75 | 148 | 62 |
| Distant metastasis | | | | | | |
| M0 | 29 | 97 | 152 | 88 | 182 | 76 |
| M1 | 1 | 3 | 11 | 6 | 17 | 7 |
| Histologic grade | | | | | | |
| G1 | 1 | 3 | 7 | 4 | 17 | 7 |
| G2 | 11 | 37 | 67 | 39 | 108 | 45 |
| G3 | 18 | 60 | 97 | 56 | 111 | 46 |
| Residual tumor | | | | | | |
| R0 | 24 | 80 | 119 | 69 | 167 | 70 |
| R1 | 3 | 10 | 40 | 23 | 57 | 24 |
| R2 | 3 | 10 | 6 | 3 | 5 | 2 |
| Stage grouping‡ | | | | | | |
| I | 2 | 7 | 11 | 6 | 18 | 8 |
| IIA | 10 | 33 | 26 | 15 | 54 | 22 |
| IIB | 15 | 50 | 112 | 65 | 106 | 44 |
| III | 2 | 7 | 3 | 2 | 4 | 2 |
| IV | 1 | 3 | 11 | 6 | 17 | 7 |
The screening dataset (genome-wide gene expression profiling) comprises 30 samples of surgically resected pancreatic ductal adenocarcinoma from patients who did not receive adjuvant chemotherapy. The validation dataset (immunohistochemistry of the seven marker candidates) comprises samples from 412 patients, of whom 172 had received adjuvant therapy and 240 had not. Significant differences between the adjuvant and no-adjuvant therapy subgroups were found for regional lymph node status and for stage grouping (Fisher's exact test in both cases). Differences in all other variables were not significant.
†: Based on postsurgical histopathological assessment (indicated by the p prefix).
‡: Stage was assessed by the American Joint Committee on Cancer 2006 guidelines.
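Fisher's exact test, cited in the table note above, can be sketched for a 2×2 contingency table in pure Python. With all row and column totals held fixed, the top-left cell follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of every table that is no more likely than the observed one. The function name `fisher_exact_2x2` is hypothetical, and this sketch covers only the 2×2 case; the five-category stage-grouping comparison would need an r×c generalization.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance keeps exact ties from being lost to rounding.
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For example, the regional lymph node rows can be arranged with therapy subgroups as rows and pN0/pN1 as columns, giving the call `fisher_exact_2x2(43, 129, 88, 148)`.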
Figure 1. Monte Carlo cross-validation workflow to evaluate the accuracy of methods for ranking genes for outcome prediction.
The full dataset is a gene expression matrix with 8,000 features (the genes) as rows and 30 samples (the patients) as columns. For each patient, the outcome (poor or good) is given (1). The dataset is randomly divided into a training and a test set (2). Within the training set, genes are ranked by how different they are between patients with poor and good outcome (3). The most different genes are selected (4). They are used to train a machine learning classifier on the training set (5). After training, the classifier is asked to predict the outcome of the test set patients (6). The predicted outcome is compared with the true outcome and the number of correctly classified patients is noted (7). Steps 2–7 are repeated 1,000 times, and the resulting final accuracy is obtained by averaging over the 1,000 accuracies of step 7.
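Steps 2 to 7 of this workflow can be sketched in pure Python. To stay self-contained, the sketch swaps in a nearest-centroid classifier for the support vector machine used in the study, ranks genes by the absolute difference of class means (one simple choice for step 3; NetRank or Pearson correlation would slot in at the same place), and stratifies each random split so both outcomes occur in the training set. All function names, the 30% test fraction, and the default of seven selected genes are illustrative assumptions.

```python
import random
from statistics import mean


def rank_genes(X, y, train_idx):
    """Step 3: rank genes by |mean(poor) - mean(good)| within the training set.

    X is genes x patients (rows are genes); y[i] is 1 for poor, 0 for good outcome.
    """
    scores = []
    for g, row in enumerate(X):
        poor = mean(row[i] for i in train_idx if y[i] == 1)
        good = mean(row[i] for i in train_idx if y[i] == 0)
        scores.append((abs(poor - good), g))
    return [g for _, g in sorted(scores, reverse=True)]


def nearest_centroid(X, y, genes, train_idx, test_idx):
    """Steps 5-6: fit class centroids on training patients, predict test patients.

    A stand-in for the SVM classifier used in the study.
    """
    centroids = {
        label: [mean(X[g][i] for i in train_idx if y[i] == label) for g in genes]
        for label in (0, 1)
    }
    preds = []
    for i in test_idx:
        x = [X[g][i] for g in genes]
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def monte_carlo_cv(X, y, n_genes=7, test_frac=0.3, repeats=1000, seed=0):
    """Steps 2-7 repeated `repeats` times; returns the mean test accuracy.

    Assumes each outcome class has at least two patients.
    """
    rng = random.Random(seed)
    by_class = [[i for i, v in enumerate(y) if v == label] for label in (0, 1)]
    accuracies = []
    for _ in range(repeats):
        train_idx, test_idx = [], []
        for members in by_class:  # Step 2: stratified random split
            shuffled = members[:]
            rng.shuffle(shuffled)
            k = min(max(1, round(test_frac * len(shuffled))), len(shuffled) - 1)
            test_idx += shuffled[:k]
            train_idx += shuffled[k:]
        genes = rank_genes(X, y, train_idx)[:n_genes]  # Steps 3-4
        preds = nearest_centroid(X, y, genes, train_idx, test_idx)  # Steps 5-6
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))  # Step 7
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)
```

Comparing ranking methods then amounts to replacing `rank_genes` and rerunning with the same `seed`, so every method is evaluated on the same sequence of random splits.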
Figure 2. NetRank feature selection outperforms standard feature selection methods.
(A) The accuracy of different feature selection methods for predicting patient outcome was tested on the screening dataset. The NetRank feature selection using a transcription factor network is shown in red. For smaller training set sizes, our method is superior to all other feature selection methods, reaching an accuracy of 72% in a Monte Carlo cross-validation. (B) Markers found with NetRank are more accurate than markers described in literature.
Figure 3. Regulatory network around signature genes.
(A) All direct neighbors of the seven candidates STAT3, FOS, JUN, SP1, CDX2, CEBPA, and BRCA1 (marked yellow). Transcription factors are marked with a dot. Genes that GoGene reports as associated with pancreatic cancer survival in the literature are shown as larger circles. The absolute correlation coefficient of gene expression with survival in the screening dataset is shown in red. (B) Subnetwork of genes regulated by FOS and SP1; it contains many literature-associated and highly correlated genes. (C) Protein–protein interactions among all signature genes, representing physical interactions between the transcription factors SP1, STAT3, JUN, and FOS and the transcription coactivator BRCA1.
Figure 4. Signature to predict risk in patients with and without adjuvant therapy.
(A) Signature to predict risk in patients with adjuvant therapy. The signature was developed with patients receiving adjuvant therapy, separated by their median survival into two groups: a high-risk group with shorter survival and a low-risk group with longer survival. A classifier trained with the signature using leave-one-out cross-validation shows a significant difference between the predicted low- and high-risk groups (log-rank test). (B) Signature to predict risk in patients without adjuvant therapy. The signature was developed with patients not receiving adjuvant therapy, separated by their median survival into two groups: a high-risk group with shorter survival and a low-risk group with longer survival. A classifier trained with the signature using leave-one-out cross-validation shows a significant difference between the predicted low- and high-risk groups (log-rank test).
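The risk-group analysis in this figure can be sketched as a simplified pipeline: patients whose survival falls below the median are labeled high risk, a classifier on the signature (marker score) features is evaluated by leave-one-out cross-validation, and the predicted high- and low-risk groups are compared with a two-group log-rank test. This is an illustration under stated assumptions, not the study's pipeline: it uses a nearest-centroid classifier as a stand-in, ignores censoring when forming the median split, and all function names are hypothetical.

```python
import math
from statistics import mean, median


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (death observed) or 0 (censored).

    Returns (chi2, p), with p from a chi-square distribution with 1 df.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n_a = len(at_risk), at_risk.count(0)
        deaths = sum(1 for tt, e, _ in data if tt == t and e)
        deaths_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        # Observed minus expected deaths in group A at this event time.
        observed_minus_expected += deaths_a - deaths * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += deaths * (n_a / n) * (1 - n_a / n) * (n - deaths) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Chi-square (1 df) survival function: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def loocv_predict(features, labels):
    """Leave-one-out predictions from a nearest-centroid classifier (stand-in)."""
    preds = []
    for i, x in enumerate(features):
        centroids = {}
        for label in (0, 1):
            rows = [f for j, f in enumerate(features) if j != i and labels[j] == label]
            if rows:  # skip a class with no remaining training patients
                centroids[label] = [mean(col) for col in zip(*rows)]
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def signature_risk_groups(features, times, events):
    """Median split, LOOCV risk prediction, then log-rank test of predicted groups."""
    cut = median(times)
    labels = [1 if t < cut else 0 for t in times]  # 1 = high risk (shorter survival)
    preds = loocv_predict(features, labels)
    high = [i for i, p in enumerate(preds) if p == 1]
    low = [i for i, p in enumerate(preds) if p == 0]
    return logrank_test([times[i] for i in high], [events[i] for i in high],
                        [times[i] for i in low], [events[i] for i in low])
```

If every patient is predicted into one group, the variance term is zero and the sketch returns chi2 = 0 with p = 1, that is, no evidence of separation.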