Helena Liljedahl, Anna Karlsson, Gudrun N Oskarsdottir, Annette Salomonsson, Hans Brunnström, Gigja Erlingsdottir, Mats Jönsson, Sofi Isaksson, Elsa Arbajian, Cristian Ortiz-Villalón, Aziz Hussein, Bengt Bergman, Anders Vikström, Nastaran Monsef, Eva Branden, Hirsh Koyi, Luigi de Petris, Annika Patthey, Annelie F Behndig, Mikael Johansson, Maria Planck, Johan Staaf.
Abstract
Disease recurrence in surgically treated lung adenocarcinoma (AC) remains high. New approaches for risk stratification beyond tumor stage are needed. Gene expression-based AC subtypes such as the Cancer Genome Atlas Network (TCGA) terminal-respiratory unit (TRU), proximal-inflammatory (PI) and proximal-proliferative (PP) subtypes have been associated with prognosis, but show methodological limitations for robust clinical use. We aimed to derive a platform-independent single sample predictor (SSP) for molecular subtype assignment and risk stratification that could function in a clinical setting. Two-class (TRU/nonTRU=SSP2) and three-class (TRU/PP/PI=SSP3) SSPs using the AIMS algorithm were trained in 1655 ACs (n = 9659 genes) from public repositories vs TCGA centroid subtypes. Validation and survival analysis were performed in 977 patients using overall survival (OS) and distant metastasis-free survival (DMFS) as endpoints. In the validation cohort, SSP2 and SSP3 showed accuracies of 0.85 and 0.81, respectively. SSPs captured relevant biology previously associated with the TCGA subtypes and were associated with prognosis. In survival analysis, OS and DMFS for cases discordantly classified between TCGA and SSP2 favored the SSP2 classification. In resected Stage I patients, SSP2 identified TRU-cases with better OS (hazard ratio [HR] = 0.30; 95% confidence interval [CI] = 0.18-0.49) and DMFS (TRU HR = 0.52; 95% CI = 0.33-0.83) independent of age, Stage IA/IB and gender. SSP2 was transformed into a NanoString nCounter assay and tested in 44 Stage I patients using RNA from formalin-fixed tissue, providing prognostic stratification (relapse-free interval, HR = 3.2; 95% CI = 1.2-8.8). In conclusion, gene expression-based SSPs can provide molecular subtype and independent prognostic information in early-stage lung ACs. SSPs may overcome critical limitations in the applicability of gene signatures in lung cancer.
Keywords: gene expression; lung adenocarcinoma; molecular subtypes; prognosis; single sample predictor
Year: 2020 PMID: 32745259 PMCID: PMC7689824 DOI: 10.1002/ijc.33242
Source DB: PubMed Journal: Int J Cancer ISSN: 0020-7136 Impact factor: 7.396
FIGURE 1. Flow‐chart of study. A, Approach to derive molecular subtype training class through nearest centroid classification (NCC) of all datasets individually using the scheme reported by Wilkerson et al. For the two‐class subtype approach, PP and PI subtypes were combined to a single nonTRU class. B, Training and validation scheme for deriving a two‐class SSP for TRU/nonTRU (SSP2) and a three‐class SSP for TRU/PI/PP subtypes (SSP3) based on the AIMS single sample method. Of the total 22 datasets included, 5 were reserved as independent validation datasets and were also used for evaluation of prognostic performance of the SSP models in both surgically treated only and adjuvantly treated patients. A patient overlap existed for the Shedden et al and Zhu et al cohorts. Patients overlapping were excluded from one cohort in survival analyses. An additional external validation of the SSP2 model was also performed in archival RNA from 44 Stage‐I patients treated with surgery only, by pairing the SSP2 model with the NanoString nCounter XT technology.
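The nearest centroid step in panel A can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and uses made-up centroid values over five hypothetical genes; it is not the Wilkerson et al centroid set or the authors' code.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def nearest_centroid_subtype(sample, centroids):
    """Return the subtype whose centroid correlates best with the sample."""
    return max(centroids, key=lambda name: pearson(sample, centroids[name]))

def to_two_class(subtype):
    """Collapse a three-class call: PP and PI are merged into nonTRU."""
    return "TRU" if subtype == "TRU" else "nonTRU"

# Hypothetical centroids (illustrative values only, assumption).
centroids = {
    "TRU": [1.0, 0.8, -0.5, -1.0, 0.2],
    "PP": [-0.9, -0.4, 1.0, 0.3, -0.2],
    "PI": [-0.2, -0.8, -0.3, 1.0, 0.9],
}
sample = [0.9, 0.7, -0.4, -0.8, 0.1]
call = nearest_centroid_subtype(sample, centroids)
two_class_call = to_two_class(call)
```

Because centroid-based calls typically depend on how a dataset is centered and scaled, they are tied to the cohort at hand, which is the limitation the single sample predictors are meant to address.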
Datasets included in our study
| Datasets | Total (N) | Accession | Platform | Sex: males (%) | Stage I (%) | OS | DMFS | Adj. chemo (N) | NCC status: TRU vs nonTRU (%) | Cohort assignment |
|---|---|---|---|---|---|---|---|---|---|---|
| Chitale et al | 102 | Chitale U133 2plus | Affymetrix | 41 | 69 | Yes | No | 0 | 41 | Training |
| CLCGP | 98 | CLCGP | Illumina | 48 | 44 | Yes | Yes | 0 | 34 | Training |
| Bild et al | 58 | GSE3141 | Affymetrix | NA | 45 | Yes | Yes | 0 | 36 | Training |
| Lee et al | 63 | GSE8894 | Affymetrix | 54 | NA | No | Yes | 0 | 38 | Training |
| Tomida et al | 117 | GSE13213 | Agilent | 51 | 68 | Yes | No | 0 | 40 | Training |
| Hou et al | 45 | GSE19188 | Affymetrix | 56 | NA | Yes | No | 0 | 31 | Training |
| Lu et al | 60 | GSE19804 | Affymetrix | NA | 58 | No | No | 0 | 38 | Training |
| Wilkerson et al | 116 | GSE26939 | Agilent | 46 | 53 | Yes | No | 0 | 41 | Training |
| Rousseaux et al | 85 | GSE30219 | Affymetrix | 78 | 95 | Yes | Yes | 0 | 34 | Training |
| Botling et al | 106 | GSE37745 | Affymetrix | 43 | 66 | Yes | No | 0 | 35 | Training |
| Seo et al | 87 | GSE40419 | RNAseq | 61 | 63 | No | No | 0 | 41 | Training |
| Tarca et al | 77 | GSE43580 | Affymetrix | 68 | 53 | No | No | 0 | 42 | Training |
| Chen et al | 92 | GSE46539 | Illumina | 17 | NA | No | No | 0 | 37 | Training |
| Der et al | 127 | GSE50081 | Affymetrix | 51 | 72 | Yes | Yes | 0 | 39 | Training |
| Karlsson et al | 77 | GSE60644 | Illumina | 42 | 88 | Yes | No | 0 | 40 | Training |
| Djureinovic et al | 115 | GSE81089 | RNAseq | 37 | 58 | Yes | No | 0 | 35 | Training |
| TCGA | 230 | TCGA | RNAseq | NA | NA | No | No | 0 | 39 | Training |
| Shedden et al | 444 | Shedden | Affymetrix | 50 | 62 | Yes | Yes | 89 | 37 | Validation |
| Okayama et al | 226 | GSE31210 | Affymetrix | 46 | 74 | Yes | Yes | 0 | 43 | Validation |
| Fouret et al | 103 | E‐MTAB‐923 | Affymetrix | 16 | 58 | Yes | No | 33 | 42 | Validation |
| Zhu et al | 71 | GSE14814 | Affymetrix | 52 | 59 | Yes | Yes | 39 | 35 | Validation |
| Tang et al | 133 | GSE42127 | Illumina | 51 | 67 | Yes | No | 39 | 38 | Validation |
Chitale et al: samples were divided into two cohorts based on the different Affymetrix platforms, U133A and U133 2plus. Only the latter subset was included in the analysis.
CLCGP: The Clinical Lung Cancer Genome Project (http://www.uni-koeln.de/med-fak/clcgp/).
TCGA: The Cancer Genome Atlas Network.
Fouret et al: data obtained from the “ArrayExpress” database (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-923/).
Zhu et al: this dataset overlaps with Shedden et al (43 samples).
FIGURE 2. Training and validation of SSPs for prediction of molecular subtypes in lung adenocarcinoma. A, Proportion of TRU and nonTRU cases predicted by the NCC method per dataset in the study. For each dataset, assignment to training or validation cohort and technical gene expression platform is shown. Top‐axis indicates dataset size. B, Schematic overview of the SSP2 classifier for TRU/nonTRU status based on training vs NCC subtype classes in the training cohort. The SSP2 classifier comprises 18 gene rules (pairs), that is, 36 genes. Gene rules are shown with indication of their highest posterior probability in the AIMS model. Based on all individual gene rule probabilities a final prediction is made. C, Overlap of genes in the SSP2 (top) and SSP3 classifiers vs the original NCC centroid genes from Wilkerson et al. D, Proportions of TRU classified cases in the five validation datasets for the NCC and SSP2 models, showing differences across datasets. E, Classification performance (accuracy and balanced accuracy) in the validation cohort for the SSP2 model vs TRU/nonTRU NCC classifier, and the SSP3 model vs the TRU/PI/PP NCC classifications.
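How gene rules can be combined into a single call (panel B) is sketched below. The sketch assumes rules of the form "gene A is expressed lower than gene B within the same sample" combined by a naive Bayes model; the gene names, rule probabilities and priors are hypothetical placeholders, not the published SSP2 rules.

```python
import math

# Each rule asks whether the first gene is lower than the second in a sample.
# p_true[cls] is P(rule is true | class); all values are invented (assumption).
RULES = [
    {"pair": ("GENE_A", "GENE_B"), "p_true": {"TRU": 0.9, "nonTRU": 0.2}},
    {"pair": ("GENE_C", "GENE_D"), "p_true": {"TRU": 0.8, "nonTRU": 0.3}},
    {"pair": ("GENE_E", "GENE_F"), "p_true": {"TRU": 0.1, "nonTRU": 0.7}},
]
PRIORS = {"TRU": 0.4, "nonTRU": 0.6}

def classify(sample, rules=RULES, priors=PRIORS):
    """Return (predicted class, posterior probabilities) for one sample."""
    log_post = {cls: math.log(p) for cls, p in priors.items()}
    for rule in rules:
        gene_a, gene_b = rule["pair"]
        outcome = sample[gene_a] < sample[gene_b]
        for cls in log_post:
            p = rule["p_true"][cls]
            log_post[cls] += math.log(p if outcome else 1.0 - p)
    # Normalize in log space to avoid underflow with many rules.
    top = max(log_post.values())
    weights = {cls: math.exp(v - top) for cls, v in log_post.items()}
    total = sum(weights.values())
    posterior = {cls: w / total for cls, w in weights.items()}
    return max(posterior, key=posterior.get), posterior

sample = {"GENE_A": 1.0, "GENE_B": 5.0, "GENE_C": 2.0,
          "GENE_D": 3.0, "GENE_E": 9.0, "GENE_F": 4.0}
label, posterior = classify(sample)

# Multiplying every value by a positive constant leaves each comparison,
# and therefore the prediction, unchanged.
rescaled = {gene: 10.0 * value for gene, value in sample.items()}
label_rescaled, _ = classify(rescaled)
```

Only the within-sample ordering of each gene pair enters the prediction, so no cohort-level normalization is needed, which is what makes a rule-based predictor of this kind applicable to a single sample.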
FIGURE 3. Comparison of classification methods and implication on survival outcome in lung adenocarcinoma. For details about the groups used in the Kaplan‐Meier plots, see the Results section. A, Kaplan‐Meier plot of OS for 590 surgically treated lung adenocarcinoma patients combined from the five validation datasets stratified by concordant or discordant NCC and SSP2 classifications. B, OS for 94 of 104 patients with discrepant SSP2/NCC classification from (A). C, DMFS for 454 surgically treated lung adenocarcinoma patients combined from the five validation datasets stratified by concordant or discordant NCC and SSP2 classifications. D, Kaplan‐Meier plot of DMFS for 86 patients with discrepant SSP2/NCC classification from (C). E, Kaplan‐Meier plot of OS for 176 lung adenocarcinoma patients treated with adjuvant chemotherapy combined from the five validation datasets. F, DMFS for 105 adjuvant treated lung adenocarcinoma patients combined from the five validation datasets. In all plots, P‐values were calculated using the log‐rank test.
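For readers unfamiliar with the log‐rank test used in these plots, a minimal two-group sketch follows. It uses the standard observed-minus-expected statistic with a hypergeometric variance and a chi-square reference distribution with 1 degree of freedom; the function name and the toy follow-up times are illustrative assumptions, not study data.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value).

    events are 1 for an observed event and 0 for a censored observation.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)  # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)  # events at t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 df
    return chi2, p_value

# Toy data: group A fails early, group B fails late (all events observed).
chi2_sep, p_sep = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                               [6, 7, 8, 9, 10], [1] * 5)
# Identical groups give no evidence of a difference.
chi2_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

With more than two strata, as in panels A and C, the same idea generalizes to a multi-group statistic with additional degrees of freedom; this sketch covers only the two-group case.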
Cox regression analysis of transcriptional subtypes in lung adenocarcinoma (surgically treated patients)
| Variable | Univariable: Events (N) | Univariable: HR | Univariable: 95% CI | Univariable: P | Multivariable: Events (N) | Multivariable: HR | Multivariable: 95% CI | Multivariable: P | Included confounders |
|---|---|---|---|---|---|---|---|---|---|
| Overall survival | | | | | | | | | |
| Subtypes | 159/590 | | | | 157/586 | | | | |
| TRU‐TRU | | 1.00 | Ref | (<.001) | | 1.00 | Ref | (<.001) | Stage, gender, age |
| TRU‐nonTRU | | 2.9 | 1.37‐6.32 | .005 | | 3.0 | 1.38‐6.42 | .005 | |
| nonTRU‐TRU | | 0.69 | 0.32‐1.48 | .3 | | 0.76 | 0.35‐1.65 | .5 | |
| nonTRU‐nonTRU | | 3.3 | 2.30‐4.84 | <.001 | | 2.9 | 1.92‐4.23 | <.001 | |
| Stage | 157/586 | | | | | | | | |
| I | | 1.00 | Ref | (<.001) | | 1.00 | Ref | | |
| II | | 3.1 | 2.17‐4.41 | <.001 | | 2.5 | 1.73‐3.61 | <.001 | |
| III | | 7.2 | 4.66‐11.10 | <.001 | | 5.4 | 3.46‐8.41 | <.001 | |
| Gender | 159/590 | | | | | | | | |
| Female | | 1.00 | Ref | (.3) | | 1.00 | Ref | | |
| Male | | 1.2 | 0.87‐1.61 | .3 | | 0.96 | 0.70‐1.33 | .8 | |
| Age (yr) | 159/590 | 1.04 | 1.02‐1.06 | (<.001) | | 1.03 | 1.01‐1.05 | <.001 | |
| Distant metastasis‐free survival | | | | | | | | | |
| Subtypes | 146/454 | | | | 145/452 | | | | |
| TRU‐TRU | | 1.00 | Ref | (<.001) | | 1.00 | Ref | (<.001) | Stage, gender, age |
| TRU‐nonTRU | | 3.0 | 1.38‐6.39 | .005 | | 3.0 | 1.39‐6.52 | .005 | |
| nonTRU‐TRU | | 1.8 | 1.06‐3.02 | .03 | | 1.4 | 0.77‐2.35 | .2 | |
| nonTRU‐nonTRU | | 2.8 | 1.88‐4.15 | <.001 | | 2.1 | 1.37‐3.21 | <.001 | |
| Stage | 145/452 | | | | | | | | |
| I | | 1.00 | Ref | (<.001) | | 1.00 | Ref | | |
| II | | 3.2 | 2.28‐4.58 | <.001 | | 2.8 | 1.89‐4.02 | <.001 | |
| III | | 3.3 | 1.71‐6.39 | <.001 | | 3.0 | 1.48‐5.65 | .001 | |
| Gender | 146/454 | | | | | | | | |
| Female | | 1.00 | Ref | (.3) | | 1.00 | Ref | | |
| Male | | 1.2 | 0.87‐1.66 | .3 | | 1.1 | 0.77‐1.49 | .7 | |
| Age (yr) | 146/454 | 1.02 | 1.002‐1.04 | (.03) | | 1.02 | 1.01‐1.05 | .02 | |
Abbreviation: CI, confidence interval.
Multivariable analysis: the following confounders were included in the model: Stage (not Stage IV because of too few cases), gender and age. The confounders were selected based on their significance in the univariable analysis with P ≤ .05 (except for gender).
Overall survival: follow‐up starts after surgical resection of the tumor lesion and ends at death from any cause (=event).
Subtypes: groups were created based on the combined outcome (TRU or nonTRU) of two classifiers, written as Classifier1 (=NCC) ‐ Classifier2 (=SSP).
Distant metastasis‐free survival: follow‐up starts after surgical resection of the tumor lesion and ends at distant metastasis occurrence (=event).
P‐values for the pairwise comparisons were calculated using the Wald test. Overall P‐values (also from the Wald test) are given within parentheses.
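As a rough consistency check on the tabulated values, a Wald P‐value can be approximated from a hazard ratio and its 95% CI. The sketch below assumes the CI is symmetric on the log scale with a normal critical value of about 1.96; the function name is hypothetical and this is not the authors' analysis code.

```python
import math

def wald_p_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate the Wald z statistic and two-sided P from HR and 95% CI."""
    # Standard error of log(HR), recovered from the width of the log-scale CI.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Univariable OS row for TRU-nonTRU: HR 2.9, 95% CI 1.37-6.32.
z, p = wald_p_from_ci(2.9, 1.37, 6.32)
```

Because the published HR and CI are rounded, the result lands near the reported .005 rather than matching it exactly.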
FIGURE 4. SSP2 performance on surgically treated Stage‐I lung adenocarcinomas. A, Kaplan‐Meier plot of OS for surgically treated Stage‐I patients in the validation datasets (only patients with outcome data), stratified by SSP2 classification. B, DMFS for surgically treated Stage‐I patients in the validation datasets, stratified by SSP2 classification. C, Hierarchical clustering (Pearson correlation and ward.D linkage) of log2 count NanoString data for 44 FFPE Stage‐I tumors using the 36 genes present in the SSP2 model through the CLAMS package. D, Confusion matrix of CLAMS prediction vs clinical status of relapse (loco‐regional/distant) yes/no. E, Gene expression of MKI67 (Ki67) and NAPSA (Napsin A) across the 44 NanoString cases stratified by CLAMS prediction and clinical relapse status. Groups in gray represent agreement between TRU/no relapse and nonTRU/relapse. F, Kaplan‐Meier plot of recurrence‐free (loco‐regional/distant) interval for the 44 NanoString cases stratified by CLAMS prediction. G, Kaplan‐Meier plot of OS for 30 Stage‐I tumors from GSE143486 stratified by CLAMS prediction. FFPE RNA for these samples was analyzed by RNA sequencing. In all Kaplan‐Meier plots, P‐values were calculated using the log‐rank test.