Bo Ci1, Donghan M Yang1, Ling Cai1,2,3, Lin Yang1,4, Luc Girard5,6, Junya Fujimoto7, Ignacio I Wistuba7, Yang Xie1,2,8, John D Minna5,6,9, William Travis10, Guanghua Xiao1,2,8. 1. Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA. 2. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA. 3. Children's Research Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA. 4. Department of Pathology, National Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. 5. Hamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, TX, USA. 6. Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX, USA. 7. Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA. 8. Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA. 9. Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA. 10. Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
Abstract
BACKGROUND: Lung adenocarcinomas (ADCs) show heterogeneous morphological patterns that are classified into five subgroups: lepidic predominant, papillary predominant, acinar predominant, micropapillary predominant and solid predominant. The morphological classification of ADCs has been reported to be associated with patient prognosis and adjuvant chemotherapy response. However, the molecular mechanisms underlying the morphology differences among different subgroups remain largely unknown. METHODS: Using the molecular profiling data from The Cancer Genome Atlas (TCGA) lung ADC (LUAD) cohort, we studied the molecular differences across invasive ADC morphological subgroups. RESULTS: We showed that the expression of proteins and mRNAs, but not the gene mutations copy number alterations (CNA), were significantly associated with lung ADC morphological subgroups. In addition, expression of the FOXM1 gene (which is negatively associated with patient survival) likely plays an important role in the morphological differences among different subgroups. Moreover, we found that protein abundance of PD-L1 were associated with the malignancy of subgroups. These results were validated in an independent cohort. CONCLUSIONS: This study provides insights into the molecular differences among different lung ADC morphological subgroups, which could lead to potential subgroup-specific therapies. 2020 Translational Lung Cancer Research. All rights reserved.
BACKGROUND: Lung adenocarcinomas (ADCs) show heterogeneous morphological patterns that are classified into five subgroups: lepidic predominant, papillary predominant, acinar predominant, micropapillary predominant and solid predominant. The morphological classification of ADCs has been reported to be associated with patient prognosis and adjuvant chemotherapy response. However, the molecular mechanisms underlying the morphology differences among different subgroups remain largely unknown. METHODS: Using the molecular profiling data from The Cancer Genome Atlas (TCGA) lung ADC (LUAD) cohort, we studied the molecular differences across invasive ADC morphological subgroups. RESULTS: We showed that the expression of proteins and mRNAs, but not the gene mutations copy number alterations (CNA), were significantly associated with lung ADC morphological subgroups. In addition, expression of the FOXM1 gene (which is negatively associated with patient survival) likely plays an important role in the morphological differences among different subgroups. Moreover, we found that protein abundance of PD-L1 were associated with the malignancy of subgroups. These results were validated in an independent cohort. CONCLUSIONS: This study provides insights into the molecular differences among different lung ADC morphological subgroups, which could lead to potential subgroup-specific therapies. 2020 Translational Lung Cancer Research. All rights reserved.
Lung cancer is the leading cause of death from cancer in the United States and worldwide (1-3). Lung adenocarcinoma (ADC) is the most common histologic subtype, comprising 40% of lung cancers (1,4). Over 70–90% of surgically resected lung cancers were invasive ADCs (1,5). In 2011, the International Association for the Study of Lung Cancer (IASLC), American Thoracic Society (ATS), and European Respiratory Society (ERS) proposed to classify invasive lung ADCs into 5 morphological subgroups, lepidic predominant ADC (LPA), papillary predominant ADC (PPA), acinar predominant ADC (APA), micropapillary predominant ADC (MPA) and solid predominant ADC (SPA) (1).The morphological classification was reported to be a prognostic factor of lung ADC patients. For example, a study with 292 lung ADC patients by Gu et al. showed that patients with LPAs, PPAs/APAs and MPAs/SPAs formed three distinct survival groups (6). Similar results were also reported by Russell et al. (7). In addition, Hung et al. (8) showed that lung ADC patients with MPAs and SPAs had worse disease-specific survival. In a study with 440 lung ADC patients, Yoshizawa et al. (9) showed that the 5-year disease free survival rate was the lowest in MPAs and SPAs, and the tumors in these subgroups recurred more frequently compared with those in other morphological subgroups. Overall, LPAs were associated with more favorable survival outcomes, while patients whose tumors were in the MPA and SPA morphological subgroups showed worse prognosis. Patients with PPA and APA tumors were in the intermediate groups. The ADC morphological classification was also reported to be associated with patient adjuvant chemotherapy response. Tsao et al. (10) showed that MPA/SPA patients benefit from adjuvant chemotherapy in terms of disease-specific survival. Luo et al.’s study of invasive lung ADC stage IB patients (11) showed that MPA/SPA classification was a predictive factor for adjuvant chemotherapy benefit in terms of disease-specific survival. Hung et al. (8) reported that the SPA morphological subtype was predictive of treatment benefit for patients undergoing adjuvant chemotherapy.Although the morphological subgroups have been reported to be associated with prognosis and adjuvant chemotherapy response in lung ADC patients, the underlying molecular mechanisms that lead to the morphological differences among the subgroups is largely unknown. Tsuta et al. (12) examined the mutation status of KRAS and EGFR, and ALK rearrangements in 757 ADC patients, and showed that EGFR mutations were prevalent in PPA, while ALK rearrangements were prevalent in MPA and APA. In a study with 19 lung ADC patients, Vinayanuwattikun et al. (13) reported no apparent difference in the patterns of mutated genes or somatic copy number alterations (CNA) across 3 groups, LPA, Non-LPA and adenocarcinoma in situ (AIS)/minimally invasive adenocarcinoma (MIA). Zhang et al. compared 4 SPA patients with 4 APA patients and showed that differentially expressed genes were enriched in pathways of RNA polymerase activity and p53 inactivation (14). Zabeck et al. also reported distinct transcriptomic differences using a cohort of 48 invasive lung ADC patients (15). Although these studies provide valuable insights into the underlying molecular mechanisms, they are largely limited by the small number of genes being examined or the small sample size. A study of genome-wide molecular profiles in a large ADC patient cohort is greatly needed for a comprehensive investigation of the molecular mechanisms underlying the morphological subgroups.In this study we used the comprehensive molecular profiling data from The Cancer Genome Atlas (TCGA) lung ADC (LUAD) dataset (16), including DNA mutation, CNAs, mRNA expression and protein abundance, to characterize the molecular patterns of these morphological subgroups. Our analysis showed that the protein abundance and mRNA expression, but not the patterns of gene mutations or CNAs, were significantly associated with the morphological subgroups of invasive lung ADCs. FOXM1 is likely an important molecular biomarker associated with different morphological subgroups. Moreover, we found that both the gene expression and protein abundance of PD-L1 were associated with the malignancy of the morphological subgroups. These results were validated in an independent cohort. Our work provides insights into the molecular mechanisms underlying the ADC morphological subgroups and provides hints for future subgroup-specific therapies.
Methods
Statement of ethics approval
This study is a computational study of existing datasets, so the Ethics Approval is not required.
Source of data
The discovery cohort was originally from TCGA-LUAD (16). The mutation annotation, mRNA expression and protein abundance datasets used in this study were downloaded from the third-party website Synapse in November 2015 (17). The CNA dataset was obtained from Firehose in November 2016 (18). The protein abundance was measured by reverse phase protein array (RPPA), mRNA expression was measured by RNA-seq, DNA mutation was measured by exon-seq, and CNAs were measured by single nucleotide polymorphisms (SNP) array.Morphological subgroup classification of the TCGA LUAD patient cohort was performed by Drs. William D. Travis and Adi Gazdar by examining pathology slides according to the IASLC/ATS/ERS classification of lung ADCs (1). In total, 212 patients with both morphological classification and molecular profiling data were selected as the discovery cohort (). The validation cohort was from the Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM) (19). In total, 38 patients with both subgroup classification and mRNA profiling data were selected as the validation cohort (). Due to the small sample size of the validation cohort, 5 subgroups were combined into 2 categories for analysis, non-solid predominant ADCs (Non-SPAs, n=27) and solid predominant ADCs (SPAs, n=11), since these two sub-types are the most clinically important sub-groups.
Table S1
Number of patients in each omics dataset in the discovery cohort
The pathway analysis was performed using DAVID Bioinformatics Resources (version 6.8) (https://david.ncifcrf.gov/) with the default settings (20,21). The categories included molecular function (MF), biological process (BP) and cellular component (CC) categories from Gene Ontology (GOTERM). Fisher’s exact tests were performed to assess the enrichment of each pathway using DAVID. Bonferroni P value adjustment was used to control the multiple comparisons issue.
The adenocarcinomas-squamous cell carcinomas (ADC-SCC) score
The degree of differentiation for each ADC patient was estimated by using an adenocarcinoma-squamous cell carcinoma (ADC-SCC) score derived from a 42-gene signature, which was differentially expressed in ADCs and SCCs (22). Based on the original study (22), the higher the ADC-SCC score (in absolute value), the more differentiated the ADCs.
Single sample gene set enrichment analysis
Single sample Gene Set Enrichment Analysis (ssGSEA) (23) and immune cell gene sets (24) were used to evaluate the immune cell activity in the discovery cohort. The enrichment score generated by ssGSEA indicates the activity level of the biological process represented by the gene set. In the ssGSEA, the alpha value was set to 0.25 according to the original reference (23). This calculation was repeated for each gene set and each sample. We used the immune cell type-specific gene sets (24) to estimate the levels of immune cell activities.
Associations between molecular profiles and morphological subgroups
Welch’s ANOVA was used to test whether the mean was the same across different morphological subgroups for the copy number, mRNA expression and protein abundance of each gene. Compared with the classical ANOVA method which assumes homogeneity of variance, Welch’s ANOVA does not rely on the assumption of homogeneity of variance, therefore it is a better fit for this study as some LPA morphological subgroups have relatively small sample size, and the equal variance assumption is hard to test. In addition, we performed classical ANOVA test with and without adjustment for stage, in order to investigate whether the stage of ADCs affects the results. The Fisher’s exact test was used to test whether the proportion of each mutated gene was the same across subgroups. The Jonckheere-Terpstra test was used to test the monotonic trend of molecular features. Bonferroni p-value adjustment was used to control the multiple comparisons issue. Results with Bonferroni-adjusted P value ≤0.05 were considered statistically significant.
Permutation of morphological subgroup labels to estimate the false discovery rate (FDR)
In order to construct a negative control dataset to estimate the FDR in the analysis, the original morphological subgroup labels of patients were randomly assigned to the patients. Welch’s ANOVA was performed for each antibody in the protein abundance dataset based on the new assignment. This process was repeated 100 times. The averaged frequency of p-values in each bin was calculated. Theoretically, if the P values are completely random and independent, they will follow a uniform distribution between 0 and 1, i.e., the average number of P values in each bin should be close to each other.
Computational environment
All computations were conducted in the R environment, version 3.3.2 (RStudio Team 2016; R Core Team 2017). R packages “clinfun” (version 1.0.15), “GSVA” (version 1.22.4), “survival” (version 2.40-1) and “beeswarm” (version 0.2.3) were used.
Results
Patient characteristics
Among the 212 patients in the discovery cohort (TCGA LUAD dataset), 13 (6.1%) patients were diagnosed with LPAs, 27 (12.7%) with PPAs, 82 (38.7%) with APAs, 26 (12.3%) with MPAs and 64 (30.2%) with SPAs. Among the 38 patients in the validation cohort (the CLCGP and NGM dataset), 2 (5.3%) were diagnosed with LPAs, 9 (23.7%) with PPAs, 11 (28.9%) with APAs, 5 (13.2%) with MPAs and 11 (28.9%) with SPAs. The Non-SPAs accounted for 71.1% of the validation cohort. The median overall survival of the discovery cohort was 13.2 months [low quartile (LQ)–high quartile (HQ): 3–30 months], and it was 18 months (LQ–HQ: 15–29 months) in the validation cohort ().
Table 1
Characteristics of the discovery and validation cohorts
Characteristics
Discovery cohort
Validation cohort
Number of patients
212
38
Gender (%)
Male
90 (42.5)
18 (47.4)
Female
122 (57.5)
20 (52.6)
Age at diagnosis, years, median [LQ–HQ]
67 [59, 72]
66 [57, 74]
Smoking status (%)
Current smoker
39 (18.4)
26 (68.4)
Former
133 (62.7)
6 (15.8)
Never
29 (13.7)
5 (13.2)
NA
11 (5.2)
1 (2.6)
Morphological subgroup (%)
Lepidic predominant
13 (6.1)
2 (5.3)
Papillary predominant
27 (12.7)
9 (23.7)
Acinar predominant
82 (38.7)
11 (28.9)
Micropapillary predominant
26 (12.3)
5 (13.2)
Solid predominant
64 (30.2)
11 (28.9)
Follow-up, months, median [LQ–HQ]
13.2 [3, 30]
18 [15, 29]
Vital status (%)
Alive
151 (71.2)
29 (76.3)
Deceased
61 (28.8)
9 (23.7)
Stage* (%)
I
3 (1.4)
0 (0)
IA
51 (24.0)
6 (15.8)
IB
62 (29.2)
14 (36.8)
IIA
15 (7.1)
1 (2.6)
IIB
28 (13.2)
4 (10.5)
IIIA
33 (15.6)
5 (13.2)
IIIB
8 (3.8)
8 (21.1)
IV
12 (5.7)
0 (0)
Discovery cohort is from The Cancer Genome Atlas (TCGA) – lung ADC (LUAD) dataset. Validation cohort is generated by the Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM). *, American Joint Committee on Cancer (AJCC) pathologic stage for the discovery cohort and the Union for International Cancer Control (UICC) stage for the validation cohort. LQ, low quartile; HQ, high quartile.
Discovery cohort is from The Cancer Genome Atlas (TCGA) – lung ADC (LUAD) dataset. Validation cohort is generated by the Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM). *, American Joint Committee on Cancer (AJCC) pathologic stage for the discovery cohort and the Union for International Cancer Control (UICC) stage for the validation cohort. LQ, low quartile; HQ, high quartile.
Overall molecular differences across morphological subgroups
The mutation, CNA, mRNA expression and protein abundance for each gene was tested across different morphological subgroups. No gene mutation rates or copy number values were statistically different across the five morphological subgroups (Figure S1A,B), for example, EGFR, KRAS, ALK and STK11 were not different. In total, 414 mRNAs in the RNA-sequencing dataset and 8 antibodies in the RPPA dataset showed significant statistical difference across the five subgroups (Figure S1C,D). The pathway analysis of these 414 differentially expressed mRNAs unveiled a strong association with the cell division process, including mitotic nuclear division, sister chromatid cohesion and DNA replication ().
Table S3
Top 10 terms in the pathway analysis with 414 mRNA hits identified from the mRNA expression dataset
Category
Term
Fisher’s exact test P value
Bonferroni-adjusted P value
GOTERM_BP_DIRECT
Cell division
1.70E-36
1.51E-33
GOTERM_BP_DIRECT
Mitotic nuclear division
9.50E-32
8.46E-29
GOTERM_BP_DIRECT
Sister chromatid cohesion
4.30E-27
3.83E-24
GOTERM_BP_DIRECT
DNA replication
1.30E-24
1.16E-21
GOTERM_CC_DIRECT
Condensed chromosome kinetochore
1.00E-21
8.90E-19
GOTERM_BP_DIRECT
G1/S transition of mitotic cell cycle
1.80E-18
1.60E-15
GOTERM_CC_DIRECT
Kinetochore
2.00E-17
1.78E-14
GOTERM_CC_DIRECT
Midbody
3.20E-17
2.85E-14
GOTERM_BP_DIRECT
DNA replication initiation
3.70E-17
3.29E-14
GOTERM_CC_DIRECT
Chromosome, centromeric region
1.90E-15
1.69E-12
BP, biological process; CC, cellular components.
Proteins with differential abundances across morphological subgroups
The top two hits identified in the RPPA dataset (ranked by the P values of Welch’s ANOVA) were PD-L1-R-V and CD274-R-E. They are two different antibodies targeting the same protein programmed death ligand 1 (PD-L1). Both PD-L1-R-V and CD274-R-E showed a significant monotonic increasing trend from LPAs, PPAs, APAs, MPAs to SPAs (Jonckheere-Terpstra test, P<0.001, ). The third hit, Napsin-A (NAPSA), was a protease to process the pulmonary surfactant protein B (). NAPSA was used as a marker for distinguishing primary lung ADCs and as a prognostic factor (25,26). The fourth and fifth hits were Cyclin-B1 (CCNB1) and Forkhead box protein M1 (FOXM1), respectively (). CCNB1 is a G2/mitotic-specific cyclin regulating cell cycles. It has been reported to be involved in multiple human cancers (27-30). The expression of CCNB1 can be regulated by FOXM1, a transcriptional factor (31-33). FOXM1 regulates the expression of cell cycle genes that are essential for DNA replication and mitosis. FOXM1 was reported as an oncogene for lung ADCs (31,34,35). These top hits identified in the protein abundance dataset showed significant monotonic (either increasing or decreasing) trends (Jonckheere-Terpstra test, Bonferroni adjustment, P<0.01, ). Moreover, the top hits identified in the mRNA sequencing dataset demonstrated the same tendency (Jonckheere-Terpstra test, P<0.001, ). In addition, we have compared the protein abundances across morphological subgroups adjusting for stage, and the results were similar to those without adjusting for stage ().
Figure 1
Top hits identified in the RPPA dataset. (A,B,C,D,E,F) Protein levels of PD-L1, CD274, NAPSA, CCNB1, FOXM1 and ADAR1. P value, Jonckheere-Terpstra test. (G) The rank, antibody name, protein name, Welch’s ANOVA P values and Jonckheere-Terpstra test P values for each hit identified from the RPPA dataset. Bonferroni method was used to adjust the Welch’s ANOVA P values. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC; JT, Jonckheere-Terpstra; RPPA, reverse phase protein array.
Figure S2
mRNA expressions of the top 4 hits identified Welch’s ANOVA in the mRNA expression dataset. (A) Metallothionein 1H (MT1H); (B) sorbin and SH3 domain containing 2 (SORBS2); (C) cilia and flagella associated protein 221 (PCDP1); (D) RAP1 GTPase activating protein (RAP1GAP). P value, Jonckheere-Terpstra test.
Table S4
Compare the P values from different analysis methods, with and without adjustment of stage
Antibody name
P value
Welch ANOVA P value with Bonferroni adjustment
Welch ANOVA
ANOVA
ANOVA adjusted by stage
PD-L1-R-V
8.6E-08
5.0E-09
5.7E-09
2.1E-05
CD274-R-E
2.9E-07
1.1E-08
1.2E-08
6.8E-05
Napsin-A-R-E
8.0E-07
4.0E-06
4.4E-06
1.9E-04
Cyclin-B1-R-V
1.8E-05
4.9E-08
4.1E-08
4.2E-03
FoxM1-R-V
1.1E-04
1.7E-05
1.6E-05
2.7E-02
ADAR1-M-V
1.3E-04
8.8E-03
9.1E-03
3.1E-02
4E-BP1-R-V
1.5E-04
4.3E-06
4.4E-06
3.5E-02
INPP4B-G-E
1.9E-04
1.9E-02
1.8E-02
4.6E-02
Top hits identified in the RPPA dataset. (A,B,C,D,E,F) Protein levels of PD-L1, CD274, NAPSA, CCNB1, FOXM1 and ADAR1. P value, Jonckheere-Terpstra test. (G) The rank, antibody name, protein name, Welch’s ANOVA P values and Jonckheere-Terpstra test P values for each hit identified from the RPPA dataset. Bonferroni method was used to adjust the Welch’s ANOVA P values. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC; JT, Jonckheere-Terpstra; RPPA, reverse phase protein array.
Morphological subgroups were associated with the degree of differentiation
NAPSA is one of the genes most frequently used to demonstrate ADC differentiation in non-small cell lung cancer (NSCLC) (36-38). The protein level and mRNA expression of NAPSA decreased from LPAs to SPAs (Jonckheere-Terpstra test, protein level P<0.001, mRNA expression P<0.001, and Figure S3A), indicating the decreasing trend of ADC differentiation. The protein level and mRNA expression of NK2 homeobox 1 (NKX2-1), a thyroid-specific transcription factor used as an ADC differentiation indicator, showed a similar decreasing tendency across subgroups (Jonckheere-Terpstra test, protein level P<0.01, mRNA expression P<0.001, Figure S3B,C). Moreover, we estimated the degree of differentiation using ADC-SCC scores (22) (see “Methods” section). Similar to the results for NAPSA and NKX2-1, the degrees of differentiation (i.e., ADC-SCC scores) monotonically decreased from LPAs to SPAs (Jonckheere-Terpstra test, P<0.001, Figure S3D).
FOXM1: a transcription factor associated with ADC morphological subgroups
Our results showed that morphological differences among subgroups were associated with mRNA and protein expression, but not with gene mutations or CNA patterns. This implies that transcription regulation may be a key process where the morphological difference may appear. Coincidentally, FOXM1, the fifth hit from the RPPA dataset, is a transcription factor, while CCNB1, the fourth hit, is one of the FOXM1 downstream targets (31-33) (). Moreover, the permutation results () indicate that the FDR is less than 0.05. As result, the possibility of both FOMX1 and CCNB1 being false positive hits is low.
Figure S4
The histograms of Welch’s ANOVA P values in the RPPA dataset (238 antibodies) with real labels and permutated labels. Green, the histogram of the Welch’s ANOVA P values in the RPPA dataset with real labels; Pink, the histogram of the Welch’s ANOVA P values in the RPPA dataset with permutated labels; Brown, the overlapped region of the above two histograms. RPPA, reverse phase protein array.
To test our hypothesis that FOXM1 may play an important role in the morphological differences among different subgroups, we examined the mRNA expressions of FOXM1 and its downstream targets VEGFA, PLK4, MMP9 and STMN (31-33) () in the discovery cohort. The mRNA expressions of VEGFA, PLK4, MMP9 and STMN were consistent with FOXM1, significantly increasing from LPAs to SPAs (Jonckheere-Terpstra test, P<0.001, ). It has been reported that E2F and ESR1 facilitate the expression of FOXM1, but FOXO3A inhibits FOXM1 (31,39,40). The mRNA expressions of E2F and ESR1 were positively associated with FOXM1, significantly increasing from LPAs to SPAs (Jonckheere-Terpstra test, P<0.01 ), while FOXO3A showed a decreasing trend (Jonckheere-Terpstra test, P<0.01, ). Moreover, FOXM1 expression was negatively associated with patient survival in the meta-analysis (P<0.001, ) across 21 datasets (2,968 lung ADC patients in total). Similarly, CCNB1, a downstream target of FOXM1, was also negatively associated with patient survival in the meta-analysis (P<0.001, ).
Figure 2
mRNA expressions of FOXM1, its downstream and upstream molecules. (A) A scheme of FOXM1 and part of its downstream targets involved in cancer. (B) mRNA expression of FOXM1. (C,D,E,F,G) mRNA expressions of FOXM1 downstream targets CCNB1, VEGFA, PLK4, MMP9 and STMN1. (H,I,J) mRNA expressions of FOXM1 upstream molecules E2F1, ESR1 and FOXO3. P value, Jonckheere-Terpstra test. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.
Figure 3
Forest-plot of FOXM1 gene expression: meta-analysis of the effect of FOXM1 gene expression on patient overall survival outcome across 21 datasets (2,968 lung ADC patients in total). CI, confidence interval; HR, hazard ratio; seTE, standard error of treatment estimate; TE, estimated treatment effect.
Figure S5
Forest-plot of CCNB1 gene expression: meta-analysis of the effect of CCNB1 gene expression on patient overall survival outcome across 22 datasets (2,998 lung ADC patients in total). CI, confidence interval; HR, hazard ratio; seTE, standard error of treatment estimate; TE, estimated treatment effect.
mRNA expressions of FOXM1, its downstream and upstream molecules. (A) A scheme of FOXM1 and part of its downstream targets involved in cancer. (B) mRNA expression of FOXM1. (C,D,E,F,G) mRNA expressions of FOXM1 downstream targets CCNB1, VEGFA, PLK4, MMP9 and STMN1. (H,I,J) mRNA expressions of FOXM1 upstream molecules E2F1, ESR1 and FOXO3. P value, Jonckheere-Terpstra test. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.Forest-plot of FOXM1 gene expression: meta-analysis of the effect of FOXM1 gene expression on patient overall survival outcome across 21 datasets (2,968 lung ADC patients in total). CI, confidence interval; HR, hazard ratio; seTE, standard error of treatment estimate; TE, estimated treatment effect.
Validating our findings in the validation cohort
The mRNA expression dataset from the CLCGP and NGM Cohorts was used to validate the results in the TCGA cohort (see “Methods” section). In the validation cohort, PD-L1 was highly expressed in SPAs compared to Non-SPAs (t-test, P<0.05, ). SPAs showed lower mRNA levels of NAPSA (), but a higher mRNA level of FOMX1 compared to Non-SPAs (t-test, P<0.05, ). Although the difference was not statistically significant, the trends of FOXM1 downstream targets (31-33) (CCNB1, VEGFA, PLK4, MMP9 and STMN1) and upstream molecules (31,39,40) (E2F1 and ESR1) were consistent with the results in the discovery cohort ().
Figure 4
mRNA expression results in the validation cohort. (A,B,C) mRNA expressions of PD-L1, NAPSA and FOMX1; (D,E,F,G,H) mRNA expressions of FOXM1 downstream targets CCNB1, VEGFA, PLK4, MMP9 and STMN1; (I,J,K) mRNA expressions of FOXM1 upstream molecules E2F1, ESR1 and FOXO3. P value, t-test. Non-S, non-solid predominant ADC; S, solid predominant ADC.
mRNA expression results in the validation cohort. (A,B,C) mRNA expressions of PD-L1, NAPSA and FOMX1; (D,E,F,G,H) mRNA expressions of FOXM1 downstream targets CCNB1, VEGFA, PLK4, MMP9 and STMN1; (I,J,K) mRNA expressions of FOXM1 upstream molecules E2F1, ESR1 and FOXO3. P value, t-test. Non-S, non-solid predominant ADC; S, solid predominant ADC.
Mutation status across morphological subgroups
summarizes the association between ADC morphological subgroups and mutation status of 19 selected genes. Among the 19 genes, only TP53 mutation showed numerical differences across the 5 ADC morphological subgroups (P=0.023). The patients with SPAs had higher TP53 mutation rate (63.5%) compared with other morphological subgroups (30.8%, 34.6%, 41.5% and 40% for lepidic, papillary, acinar, micropapillary, respectively). The mutation of other genes showed no significant differences across morphological subgroups.
Table S5
Association between ADC subtype and mutation status of selected genes (the P value is calculated based on Fisher’s exact test) and is without any multiple hypothesis adjustment
Gene
# of patients with mutation in each ADC subtype
Total # of patients with mutation
P value
L (%)
P (%)
A (%)
M (%)
S (%)
TP53
4 (30.8)
9 (34.6)
34 (41.5)
10 (40.0)
40 (63.5)
97 (46.4)
0.023
ALK
2 (15.4)
3 (11.5)
2 (2.4)
2 (8.0)
8 (12.7)
17 (8.1)
0.064
EGFR
1 (7.7)
2 (7.7)
22 (26.8)
4 (16.0)
7 (11.1)
36 (17.2)
0.068
SMAD4
0 (0)
0 (0)
4 (4.9)
4 (16.0)
1 (1.6)
9 (4.3)
0.069
PIK3CA
3 (23.1)
2 (7.7)
4 (4.9)
3 (12.0)
3 (4.8)
15 (7.2)
0.125
BRAF
2 (15.4)
6 (23.1)
5 (6.1)
2 (8.0)
7 (11.1)
22 (10.5)
0.141
STK11
5 (38.5)
7 (26.9)
12 (14.6)
3 (12.0)
13 (20.6)
40 (19.1)
0.191
ERBB2
0 (0)
1 (3.8)
1 (1.2)
2 (8.0)
3 (4.8)
7 (3.3)
0.345
LRP1B
5 (38.5)
6 (23.1)
27 (32.9)
5 (20.0)
25 (39.7)
68 (32.5)
0.348
EML4
0 (0)
0 (0)
2 (2.4)
2 (8.0)
4 (6.3)
8 (3.8)
0.409
CTNNB1
0 (0)
2 (7.7)
2 (2.4)
1 (4.0)
4 (6.3)
9 (4.3)
0.590
INHBA
1 (7.7)
1 (3.8)
4 (4.9)
3 (12.0)
6 (9.5)
15 (7.2)
0.593
AKT1
0 (0)
0 (0)
0 (0)
0 (0.0)
1 (1.6)
1 (0.5)
0.608
FGFR3
0 (0)
0 (0)
0 (0)
0 (0.0)
1 (1.6)
1 (0.5)
0.608
KRAS
4 (30.8)
9 (34.6)
22 (26.8)
9 (36.0)
14 (22.2)
58 (27.8)
0.608
MTOR
1 (7.7)
0 (0)
5 (6.1)
2 (8.0)
3 (4.8)
11 (5.3)
0.612
ROS1
1 (7.7)
1 (3.8)
2 (2.4)
1 (4.0)
4 (6.3)
9 (4.3)
0.618
MAP2K1
0 (0)
0 (0)
2 (2.4)
1 (4.0)
1 (1.6)
4 (1.9)
0.773
MET
1 (7.7)
1 (3.8)
7 (8.5)
1 (4.0)
5 (7.9)
15 (7.2)
0.940
L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.
Discussion
Our results showed that mRNA and protein expression, but not gene mutation rates or CNAs, were associated with lung ADC morphological subgroups (). These results indicated that the different morphological subgroups may share similar genomic backgrounds, while gene expression regulation components such as transcription factors may play an important role in morphological differentiation in lung ADCs. Furthermore, our analysis suggested a transcription factor, FOXM1, as a potential molecule functioning in the divergences of tumor morphology ( and ). Although FOXM1 was reported to be associated with lung cancer genesis (31,34,35), our study also reports the strong association and thus potential function of FOXM1 in divergent lung ADC morphological patterns. Further experimental validation of the functional roles of FOXM1 in ADC subtypes will be important.
Figure S1
P value plots of each omics dataset. Each dot represents the tested P value of each gene/copy number value/mRNA expression/protein abundance. (A) Fisher’s exact test of gene mutation proportions; (B) Welch’s ANOVA P values of CNAs; (C) Welch’s ANOVA P values of mRNA expressions; (D) Welch’s ANOVA P values of protein abundance. Red line, Bonferroni adjusted P=0.05. CNA, copy number alteration.
PD-L1 is an important target for immunotherapy (41,42). However, it is hard to predict which patients will respond to immune checkpoint blockade therapies (CBTs) (43,44). In this study, we observed the higher expression of PD-L1 in MPAs and SPAs () compared to other subgroups. Therefore, MPAs and SPAs may be the subgroups that respond better to CBTs. Since multiple types of cells could express PD-L1 in the tumor microenvironment, such as macrophages, dendritic cells and tumor cells (45,46), we further evaluated whether the high expression of PD-L1 was actually from tumor cells by using a computational approach, ssGSEA (23), to evaluate the activity level of the macrophages and dendritic cells in each sample. The absence of significant differential activity levels across subgroups in both macrophages and dendritic cells (Figure S6A,B) suggests that the differential expression of PD-L1 likely came from tumor cells. Further evidence is still required to pinpoint that tumor cells are the cause of the PD-L1 differential expression across subgroups. This question could be further addressed using the data from single cell sequencing, if available, to rule out the confounding effects of other type of cells.The degree of NSCLC differentiation is associated with disease progression and clinical outcomes. Our study shows that the lung ADC morphological subgroups are associated with the degree of differentiation. Both the protein abundances and mRNA expressions of NAPSA and NKX2-1 suggest a decreasing differentiation trend from LPAs, PPAs, APAs, MPAs to SPAs, which was supported by the ADC-SCC scores ( and ). This tendency could also explain the previously reported outcome results of subgroups: the more differentiated the ADCs were, the better outcomes the patients had (6-9).
Figure S3
The differentiation status of invasive ADC subgroups. (A) mRNA expression of NAPSA; (B,C) mRNA expression and protein level of NKX2-1; (D) the estimated ADC-SCC scores. P value, Jonckheere-Terpstra test. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.
One limitation of this work is the accuracy of the subgroup classification. The materials provided by TCGA for the pathological classification were less than those a pathologist usually would have in a real clinical setting. Thus, if the materials were not representative, the subgroup classification may be not accurate. However, the result consistency across different omics datasets and the validation results in the independent cohort indicate the high quality of this pathological classification.Another limitation of this work is that the results were based on statistical analysis of existing datasets. In addition, the sample size, especially for the validation dataset, is relatively small. In order to further validate the findings from this study, data from perspectival collected larger patient cohorts will be very helpful. Furthermore, experimental validation will be needed to pinpoint the underlying mechanisms.In the current stage, the morphological classification requires experienced pathologists to examine the tumor tissue slide in details. It would be helpful to develop some computerized tools using molecular features or pathology image to assist pathologist in morphological classification. It will also be interesting to test whether the molecular markers identified in this study could facilitate the diagnosis of different types of lung ADCs in clinics.LPA, lepidic predominant ADC; PPA, papillary predominant ADC; APA, acinar predominant ADC; MPA, micropapillary predominant ADC; SPA, solid predominant ADC.LPA, lepidic predominant ADC; PPA, papillary predominant ADC; APA, acinar predominant ADC; MPA, micropapillary predominant ADC; SPA, solid predominant ADC.BP, biological process; CC, cellular components.L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.P value plots of each omics dataset. Each dot represents the tested P value of each gene/copy number value/mRNA expression/protein abundance. (A) Fisher’s exact test of gene mutation proportions; (B) Welch’s ANOVA P values of CNAs; (C) Welch’s ANOVA P values of mRNA expressions; (D) Welch’s ANOVA P values of protein abundance. Red line, Bonferroni adjusted P=0.05. CNA, copy number alteration.mRNA expressions of the top 4 hits identified Welch’s ANOVA in the mRNA expression dataset. (A) Metallothionein 1H (MT1H); (B) sorbin and SH3 domain containing 2 (SORBS2); (C) cilia and flagella associated protein 221 (PCDP1); (D) RAP1 GTPase activating protein (RAP1GAP). P value, Jonckheere-Terpstra test.The differentiation status of invasive ADC subgroups. (A) mRNA expression of NAPSA; (B,C) mRNA expression and protein level of NKX2-1; (D) the estimated ADC-SCC scores. P value, Jonckheere-Terpstra test. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.The histograms of Welch’s ANOVA P values in the RPPA dataset (238 antibodies) with real labels and permutated labels. Green, the histogram of the Welch’s ANOVA P values in the RPPA dataset with real labels; Pink, the histogram of the Welch’s ANOVA P values in the RPPA dataset with permutated labels; Brown, the overlapped region of the above two histograms. RPPA, reverse phase protein array.Forest-plot of CCNB1 gene expression: meta-analysis of the effect of CCNB1 gene expression on patient overall survival outcome across 22 datasets (2,998 lung ADC patients in total). CI, confidence interval; HR, hazard ratio; seTE, standard error of treatment estimate; TE, estimated treatment effect.Immune cell activity enrichment scores in the discovery cohort. (A,B) the estimated enrichment score of macrophages and dendritic cells. P value, Welch’s ANOVA. L, lepidic predominant ADC; P, papillary predominant ADC; A, acinar predominant ADC; M, micropapillary predominant ADC; S, solid predominant ADC.The article’s supplementary files as
Authors: Prudence A Russell; Zoe Wainer; Gavin M Wright; Marissa Daniels; Matthew Conron; Richard A Williams Journal: J Thorac Oncol Date: 2011-09 Impact factor: 15.609
Authors: Il-Man Kim; Timothy Ackerson; Sneha Ramakrishna; Maria Tretiakova; I-Ching Wang; Tanya V Kalin; Michael L Major; Galina A Gusarova; Helena M Yoder; Robert H Costa; Vladimir V Kalinichenko Journal: Cancer Res Date: 2006-02-15 Impact factor: 12.701
Authors: Julie Millour; Natalia de Olano; Yoshiya Horimoto; Lara J Monteiro; Julia K Langer; Rosa Aligue; Nabil Hajji; Eric W F Lam Journal: Mol Cancer Ther Date: 2011-04-25 Impact factor: 6.261
Authors: Eric J Burks; Jiarui Zhang; Travis B Sullivan; Xingyi Shi; Jacob M Sands; Shawn M Regis; Brady J McKee; Andrea B McKee; Sherry Zhang; Hanqiao Liu; Gang Liu; Avrum Spira; Jennifer Beane; Marc E Lenburg; Kimberly M Rieger-Christ Journal: Cancer Treat Res Commun Date: 2021-11-10