Steven A Buechler, Melissa T Stephens, Amanda B Hummon, Katelyn Ludwig, Emily Cannon, Tonia C Carter, Jeffrey Resnick, Yesim Gökmen-Polar, Sunil S Badve.
Abstract
Colorectal cancer (CRC) tumors can be partitioned into four biologically distinct consensus molecular subtypes (CMS1-4) using gene expression. Evidence is accumulating that tumors in different subtypes are likely to respond differently to treatments. However, to date, there is no clinical diagnostic test for CMS subtyping. In this study, we used novel methodology in a multi-cohort training domain (n = 1,214) to develop the ColoType scores and classifier, which predict CMS1-4 from the expression of 40 genes. In three validation cohorts (n = 1,744 in total) representing three distinct gene-expression measurement technologies, ColoType predicted gold-standard CMS subtypes with accuracies of 0.90, 0.91, and 0.88, respectively. To accommodate potential intratumoral heterogeneity and tumors of mixed subtypes, ColoType was designed to report continuous scores measuring the prevalence of each of CMS1-4 in a tumor, in addition to specifying the most prevalent subtype. For analysis of clinical specimens, ColoType was also implemented with targeted RNA-sequencing (Illumina AmpliSeq). In a series of formalin-fixed, paraffin-embedded CRC samples (n = 49), ColoType by targeted RNA-sequencing agreed with subtypes predicted by two independent methods with accuracies of 0.92 and 0.82, respectively. With further validation, ColoType by targeted RNA-sequencing may enable clinical application of CMS subtyping with widely available and cost-effective technology.
Year: 2020 PMID: 32694712 PMCID: PMC7374173 DOI: 10.1038/s41598-020-69083-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Characteristics of colon cancer samples used for training and validation of ColoType.
| Characteristic | Cohort A (Affymetrix), training (n = 683) [a] | Cohort A (Affymetrix), validation (n = 1,205) [b] | Cohort B (PETACC-3), training (n = 342) | Cohort B (PETACC-3), validation (n = 346) | Cohort C (COAD), training (n = 189) | Cohort C (COAD), validation (n = 193) |
|---|---|---|---|---|---|---|
| Assay platform | Affymetrix hgu133plus2 | Affymetrix hgu133plus2 | custom array | custom array | RNA-seq | RNA-seq |
| Stage 1 | 61 | 134 | NA | NA | 28 | 29 |
| Stage 2 | 293 | 206 | NA | NA | 65 | 62 |
| Stage 3 | 241 | 164 | NA | NA | 58 | 48 |
| Stage 4 | 88 | 30 | NA | NA | 15 | 14 |
| Stage NA | 0 | 612 | NA | NA | 7 | 2 |
| Median age | 68 | 70 | NA | NA | 67 | 69 |
| MSI | 72 | 86 | NA | NA | 32 | 28 |
| MSS | 402 | 133 | NA | NA | 133 | 123 |
| MSI status NA | 209 | 653 | NA | NA | 8 | 4 |
| CMS1 [c] | 117 | 135 | 39 | 39 | 28 | 26 |
| CMS2 [c] | 281 | 315 | 120 | 121 | 60 | 50 |
| CMS3 [c] | 81 | 121 | 36 | 37 | 25 | 24 |
| CMS4 [c] | 141 | 200 | 72 | 73 | 41 | 37 |
| NONE [c] | 63 | 101 | 75 | 76 | 19 | 18 |
[a] Formed from datasets (n): GSE17536 (167), GSE39582 (516).
[b] Formed from datasets (n): GSE13067 (69), GSE13294 (150), GSE14333 (149), GSE2109 (264), GSE23878 (31), GSE35896 (57), GSE37892 (121), KFSYCC (276).
[c] The CMS classification as reported by CRCSC, combining the network and random forest classifiers; termed CMS-final in this study.
Summary clinico-pathological features of the Marshfield cohort.
| Feature | Marshfield cohort (n = 49) |
|---|---|
| Age at surgery (median) | 76.4 |
| Stage 2 | 49 |
| T2 | 1 |
| T3 | 44 |
| T4 | 4 |
| Location: ascending colon | 17 |
| Location: transverse colon | 21 |
| Location: descending colon | 2 |
| Location: sigmoid colon | 9 |
| Distant metastases | 7 |
| Deaths | 18 |
| Median follow-up (days) | 2,212 |
Figure 1. Receiver operating characteristic (ROC) curves for the ColoType CMS1-score, CMS2-score, CMS3-score, and CMS4-score in the Cohort A validation set (n = 1,205). Area under the curve (AUC) values are displayed on the panels.
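As an illustration of how an AUC such as those reported in Figure 1 can be computed for a single score, the following minimal Python sketch uses the rank-based (Mann-Whitney) formulation. The function name `roc_auc` and the toy inputs are hypothetical; the source does not state which software produced the reported curves.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative sample counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made-up values, not from the study):
# label 1 = tumor belongs to the subtype of interest, 0 = any other subtype.
toy_scores = [0.92, 0.81, 0.40, 0.77, 0.15]
toy_labels = [1, 0, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based rank computation would scale better for cohorts of the size described here.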
Classification thresholds for ColoType CMS scores in each cohort.
| Cohort | CMS1-score | CMS2-score | CMS3-score | CMS4-score |
|---|---|---|---|---|
| Cohort A | 0.751 | 0.792 | 0.663 | 0.684 |
| Cohort B | 0.450 | 0.760 | 0.447 | 0.612 |
| Cohort C, log2 TPM normalized | 0.790 | 0.698 | 0.564 | 0.767 |
| Cohort C, size-factor normalized | 0.630 | 0.613 | 0.632 | 0.613 |
| Marshfield, whole-genome [a] | 0.630 | 0.613 | 0.632 | 0.613 |
| Marshfield, AmpliSeq [a] | 0.630 | 0.613 | 0.632 | 0.613 |
[a] Classification thresholds for these cohorts were selected in reference to Cohort C with size-factor normalized gene expression.
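The thresholds in this table can be held as a simple lookup keyed by cohort. The sketch below shows one plausible way to turn four continuous scores into a single call. The decision rule used here (keep the scores that reach their threshold, report the one with the largest margin, otherwise report NONE) is an assumption made for illustration and is not stated in the source; the names `THRESHOLDS` and `call_subtype` are likewise hypothetical.

```python
SUBTYPES = ("CMS1", "CMS2", "CMS3", "CMS4")

# Threshold values copied from the table (order: CMS1, CMS2, CMS3, CMS4).
THRESHOLDS = {
    "Cohort A": (0.751, 0.792, 0.663, 0.684),
    "Cohort B": (0.450, 0.760, 0.447, 0.612),
    "Cohort C, log2 TPM normalized": (0.790, 0.698, 0.564, 0.767),
    "Cohort C, size-factor normalized": (0.630, 0.613, 0.632, 0.613),
}


def call_subtype(scores, cohort):
    """Return a subtype label for four scores under an ASSUMED decision rule.

    Assumption (not from the source): a subtype is a candidate when its score
    reaches the cohort threshold; among candidates, the largest margin wins;
    with no candidate the sample is reported as "NONE".
    """
    margins = [s - t for s, t in zip(scores, THRESHOLDS[cohort])]
    best = max(range(len(SUBTYPES)), key=lambda i: margins[i])
    return SUBTYPES[best] if margins[best] >= 0 else "NONE"


# Toy scores (made-up values, not from the study).
print(call_subtype((0.80, 0.10, 0.20, 0.30), "Cohort A"))
print(call_subtype((0.10, 0.10, 0.10, 0.10), "Cohort A"))
```

Keeping the thresholds separate from the rule makes it easy to reuse the Cohort C size-factor values for the Marshfield data, as the table footnote describes.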
Confusion matrix of ColoType CMS predictions and CRCSC CMS subtypes (CMS-final) in the combined validation sets.
| ColoType prediction | CRCSC CMS1 | CRCSC CMS2 | CRCSC CMS3 | CRCSC CMS4 | CRCSC NONE |
|---|---|---|---|---|---|
| CMS1 | 190 | 1 | 18 | 19 | 28 |
| CMS2 | 0 | 529 | 4 | 59 | 65 |
| CMS3 | 6 | 5 | 183 | 2 | 25 |
| CMS4 | 13 | 4 | 0 | 290 | 31 |
| NONE | 33 | 88 | 31 | 32 | 88 |
| Sensitivity [a] | 0.91 | 0.98 | 0.89 | 0.78 | |
| Specificity [a] | 0.99 | 0.92 | 0.99 | 0.98 | |
[a] Statistics are reported for prediction of each CMS subtype individually; unclassified samples were excluded from the computation.
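The sensitivity row can be recomputed from the counts in the matrix. The minimal Python sketch below assumes that rows are ColoType predictions, columns are CRCSC CMS-final labels, and that both the NONE row and the NONE column are dropped before computing; the exact exclusion rule used by the authors is only summarized in the footnote, so this is an interpretation. The names `MATRIX`, `sensitivity`, and `accuracy` are hypothetical.

```python
SUBTYPES = ["CMS1", "CMS2", "CMS3", "CMS4"]

# Rows: ColoType prediction; columns: CRCSC CMS-final.
# The NONE row and NONE column are omitted (assumed exclusion of unclassified samples).
MATRIX = [
    [190, 1, 18, 19],
    [0, 529, 4, 59],
    [6, 5, 183, 2],
    [13, 4, 0, 290],
]


def sensitivity(matrix, j):
    """Fraction of samples with true subtype j that were predicted as subtype j."""
    column_total = sum(row[j] for row in matrix)
    return matrix[j][j] / column_total


def accuracy(matrix):
    """Fraction of classified samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for j, name in enumerate(SUBTYPES):
    print(name, round(sensitivity(MATRIX, j), 2))
print("accuracy among classified samples:", round(accuracy(MATRIX), 3))
```

Under these assumptions the computed sensitivities round to the tabulated values (0.91, 0.98, 0.89, 0.78).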
Figure 2. Multiple independent analyses were performed on the Marshfield cohort samples to assess the accuracy of subtyping by ColoType targeted RNA-seq analysis. Count data were generated by whole-genome RNA-seq and by targeted RNA-seq with the ColoType custom AmpliSeq library. Gene expression values were computed from both sets of count data by size-factor normalization followed by log2 transformation. ColoType was applied to expression data from AmpliSeq, and three independent classifiers were applied to whole-genome data. Results of the four subtype systems were compared.
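The normalization step named in this caption can be sketched as follows. The source says only "size factor normalization, log2 transformed"; the median-of-ratios estimator used here and the pseudocount of 1 are assumptions for illustration, and the function names `size_factors` and `log2_normalized` are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (an ASSUMED estimator, not confirmed by the source).

    counts: list of samples, each a list of raw counts per gene in the same gene order.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])
    # Use only genes with a nonzero count in every sample so the geometric mean is defined.
    usable = [g for g in range(n_genes) if all(sample[g] > 0 for sample in counts)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    log_geo_mean = {
        g: sum(math.log(sample[g]) for sample in counts) / n_samples for g in usable
    }
    return [
        math.exp(median([math.log(sample[g]) - log_geo_mean[g] for g in usable]))
        for sample in counts
    ]


def log2_normalized(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(value + pseudocount)."""
    factors = size_factors(counts)
    return [
        [math.log2(c / f + pseudocount) for c in sample]
        for sample, f in zip(counts, factors)
    ]


# Toy counts (made-up): the second sample is the first sequenced at twice the depth.
toy_counts = [[10, 20, 40, 0], [20, 40, 80, 0]]
print(size_factors(toy_counts))
print(log2_normalized(toy_counts))
```

In the toy input the two samples differ only in depth, so their normalized expression values coincide, which is the behavior a depth correction is meant to produce.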
Figure 3. For each sample in the Marshfield cohort, the CMS classification is displayed for ColoType by targeted RNA-sequencing (AmpliSeq) and for the classification methods CMScaller, CMSclassifier, and ColoType applied to whole-genome data with size-factor normalized expression values.
Figure 4. ROC curves are plotted in the Cohort A validation set (n = 1,205) for the scores developed herein to predict CRCassigner subtypes. The Enterocyte-score was derived from risk scores for genes CA1 and CA2. The other scores were derived from the ColoType CMS scores and the Enterocyte-score.