A prognostic model for overall survival of patients with early-stage non-small cell lung cancer: a multicentre, retrospective study.

Cheng Lu1, Kaustav Bera1, Xiangxue Wang1, Prateek Prasanna2, Jun Xu3, Andrew Janowczyk4, Niha Beig1, Michael Yang5, Pingfu Fu6, James Lewis7, Humberto Choi8, Ralph A Schmid9, Sabina Berezowska10, Kurt Schalper11, David Rimm11, Vamsidhar Velcheti12, Anant Madabhushi13.   

Abstract

Background: Intratumoural heterogeneity has been previously shown to be related to clonal evolution and genetic instability and associated with tumour progression. Phenotypically, it is reflected in the diversity of appearance and morphology within cell populations. Computer-extracted features relating to tumour cellular diversity on routine tissue images might correlate with outcome. This study investigated the prognostic ability of computer-extracted features of tumour cellular diversity (CellDiv) from haematoxylin and eosin (H&E)-stained histology images of non-small cell lung carcinomas (NSCLCs).
Methods: In this multicentre, retrospective study, we included 1057 patients with early-stage NSCLC with corresponding diagnostic histology slides and overall survival information from four different centres. CellDiv features quantifying local cellular morphological diversity from H&E-stained histology images were extracted from the tumour epithelium region. A Cox proportional hazards model based on CellDiv was used to construct risk scores for lung adenocarcinoma (LUAD; 270 patients) and lung squamous cell carcinoma (LUSC; 216 patients) separately using data from two of the cohorts, and was validated in the two remaining independent cohorts (comprising 236 patients with LUAD and 335 patients with LUSC). We used multivariable Cox regression analysis to examine the predictive ability of CellDiv features for 5-year overall survival, controlling for the effects of clinical and pathological parameters. We did a gene set enrichment and Gene Ontology analysis on 405 patients to identify associations with differentially expressed biological pathways implicated in lung cancer pathogenesis.
Findings: For prognosis of patients with early-stage LUSC, the CellDiv LUSC model included 11 discriminative CellDiv features, whereas for patients with early-stage LUAD, the model included 23 features. In the independent validation cohorts, patients predicted to be at a higher risk by the univariable CellDiv model had significantly worse 5-year overall survival (hazard ratio 1·48 [95% CI 1·06-2·08]; p=0·022 for The Cancer Genome Atlas [TCGA] LUSC group, 2·24 [1·04-4·80]; p=0·039 for the University of Bern LUSC group, and 1·62 [1·15-2·30]; p=0·0058 for the TCGA LUAD group). The identified CellDiv features were also found to be strongly associated with apoptotic signalling and cell differentiation pathways.
Interpretation: CellDiv features were strongly prognostic of 5-year overall survival in patients with early-stage NSCLC and also associated with apoptotic signalling and cell differentiation pathways. The CellDiv-based risk stratification model could potentially help to determine which patients with early-stage NSCLC might receive added benefit from adjuvant therapy.
Funding: National Institutes of Health and US Department of Defense.

Year:  2020        PMID: 33163952      PMCID: PMC7646741          DOI: 10.1016/s2589-7500(20)30225-9

Source DB:  PubMed          Journal:  Lancet Digit Health        ISSN: 2589-7500


Introduction

Tumour cellular heterogeneity has been shown to be a hallmark of all cancers, with a diverse group of cell populations including cancer cells, immune cells, mesenchymal cells, and the like making up a heterogeneous solid tumour.[1-3] Several studies have shown that tumour progression and carcinogenesis are related to clonal evolution and genetic instability, with highly aggressive tumours being far more heterogeneous than less aggressive variants. The presence of genetic sub-clonal populations in cancers, termed intratumoural heterogeneity, has been shown to be an independent prognostic factor of outcome in several different cancer types such as breast cancer[3] and head and neck cancer,[4] with high intratumoural heterogeneity having markedly worse patient survival and implicated in drug resistance.[5] Nuclear morphological differences or nuclear pleomorphism have traditionally been known to be pathognomonic of cancer and a marker for tumour differentiation.[6] This genetic intratumoural heterogeneity is reflected in the morphological makeup of the tissue, with more rapidly growing or aggressive cancers showing a greater cellular heterogeneity among cancer cells compared with a relatively indolent tumour.[7] Tumours with higher intratumoural heterogeneity and higher nuclear diversity have been shown to have a poorer prognosis than cancers with low intratumoural heterogeneity and less nuclear diversity. Thus, quantifying this sub-visual morphological diversity in tissues would be a good surrogate for genetic intratumoural heterogeneity. In early-stage (stage I and II) non-small cell lung carcinomas (NSCLCs), surgical resection is the treatment of choice, but almost 40–55% of these tumours recur after surgery.[8] Several genomic-based prognostic biomarkers of outcome exist in early-stage NSCLC, but these are typically developed from a single biopsy and thus might not comprehensively account for genetic and morphological intratumoural heterogeneity present in NSCLC. 
Features relating to tumour cellular diversity and local morphological heterogeneity within tissue slides might be driven by the genomic and epigenetic alterations and could potentially provide a tissue non-destructive way of predicting disease outcome. Here, we present a new histogenomic approach—local cellular morphological diversity (referred to as CellDiv)—to interrogate the diversity of nuclear morphology in the epithelium region, and employ it in conjunction with a Cox proportional hazards model to predict overall survival in early-stage NSCLC. A deep-learning neural network model based on U-net[9] was first employed to segment the epithelium region and nuclei in the image for downstream calculation of the nuclear morphological features. Because early-stage NSCLC can be broadly differentiated into the two major groups—lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD)—with varying driver mutations and epigenetic pathways,[10] we independently analysed the two subpopulations, keeping in mind the intrinsic differences between them. Our histogenomic analysis involved investigating the associations of these computerised CellDiv features with biological pathways implicated in carcinogenesis as well as studying the underexpression and overexpression of biological pathways associated with the CellDiv-derived prognostic risk groups.

Methods

Study design

The experimental design of this study has five key steps: data acquisition, local cellular diversity computation, calculation of the cellular diversity-based risk score, survival analysis, and histogenomic analysis (appendix 1 pp 9–10). Digitised tissue micro-arrays (TMAs) and whole slide images (WSIs) were obtained from four independent cohorts, which were divided into two training cohorts, and two independent validation cohorts. The nuclei identified in the haematoxylin and eosin (H&E)-stained images were segmented by an automatic method and a local nuclear graph (LNG) was constructed based on nuclear proximity. CellDiv features were then extracted from each LNG and used to calculate the cellular diversity-based risk score. We used the Least Absolute Shrinkage and Selection Operator (LASSO) method to discover the top features for constructing risk score, for LUAD and LUSC specifically, using a Cox proportional hazard model on the training cohorts. After locking down the Cox model, a risk score was generated for each patient in the two independent validation cohorts and survival analysis was done to evaluate the pre-trained Cox model. We compared the CellDiv model with existing models based on clinical variables in terms of precision-recall area under the receiver operating characteristic curve (AUC). Finally, we did histogenomic analysis to explore the association of morphological tumour cellular diversity with biological pathways.

Datasets

Formalin-fixed paraffin-embedded H&E-stained WSIs and TMAs collated from four independent and well characterised NSCLC cohorts were included in this study, representing 2213 patients. We required routine H&E-stained diagnostic images from patients with overall stage I and II cancer, for whom overall survival information was available (appendix 1 p 8). We excluded patients with locally advanced and metastatic (stage III and IV) tumours (n=1155), TMA spots that were not usable due to a lack of sufficient tissue for analysis (n=62), and slides with artifacts such as tissue folding and bubbles (n=37). Of the 1057 patients retained for this study, 506 had early-stage LUAD and 551 had early-stage LUSC (appendix 1 p 8). The four cohorts are represented by D1 (n=395), D2 (n=91), D3 (n=473), and D4 (n=98). D1 comprised TMA samples from the Cleveland Clinic, resected between 2004 and 2014, with a mean follow-up of 53·8 months (SD 9·6). D2 comprised TMA samples from Yale Medical School, resected between 1988 and 2003, with a mean follow-up of 41·7 months (SD 11·5). D3 comprised diagnostic WSIs from The Cancer Genome Atlas (TCGA). D4 comprised TMA samples from the University of Bern,[11] resected between 2000 and 2013, with a mean follow-up of 29·1 months (SD 1838). All cohorts featured patients with LUAD and LUSC, except for the University of Bern cohort, which featured LUSC only. Scanning details of the different cohorts are included in appendix 1 (p 5). Clinicopathological and outcome information for patients in D1, D2, and D4 was obtained from Institutional Review Board-approved retrospective chart review from the respective institutions. The corresponding information for patients in D3 was obtained from the TCGA. Cohorts D1 and D2 were used for feature discovery and model training, whereas D3 and D4 were used for independently validating the trained model.
This study conforms to Health Insurance Portability and Accountability Act guidelines and was approved by the Institutional Review Board at University Hospitals Cleveland Medical Center (number 02–13–42C). Informed consent requirement was waived as the study used archival tissue. Usage of the University of Bern cohort was approved by the local Ethics Commission (KEK 200/14), which waived the requirement for written informed consent.

Automatic characterisation of cellular diversity

A U-net-based convolutional neural network model[9] was employed to detect the epithelial region from the digitised H&E-stained images, and then to detect and segment the nuclei. Once nuclei were detected and segmented, LNGs were constructed on the basis of the proximity of the individual nuclei (appendix 1 p 11). The intuition behind using LNGs was to capture the dissimilarity of proximally situated nuclei. The process of construction of an LNG involves first representing the centroids of the individual nuclei as nodes of a graph. Using the approach described by Foulkes[12] and Corredor and colleagues,[13] each node is then connected to other nodes through a weighting function of the Euclidean distance that favours connectivity between proximal nodes. After this process, multiple disconnected subgraphs or clusters of nuclei are generated (appendix 1 p 2). A set of 11 nuclear morphologic features, quantifying nuclear shape and appearance, was first extracted from the H&E-stained images on the basis of the presegmented nuclei (appendix 1 pp 20–22). Each individual nuclear feature was discretised into five levels. We explored the discretisation criterion ω for values ranging from 3 to 7. Setting ω=3 will lead to a small co-occurrence matrix of size 3 × 3, which limits the spectrum of the diversity that can be captured, whereas setting ω=7 will lead to a very sparse co-occurrence matrix. Thus, empirically, we identified ω=5 to be the ideal level for discretising the nuclear diversity features. To explore the local tumour cellular diversity in terms of different shape and texture attributes, corresponding co-occurrence matrices based on the 11 extracted nuclear features were constructed (appendix 1 pp 2–4). 13 high-order statistical features (eg, entropy, energy)[14] were then extracted from each of the 11 co-occurrence matrices.
Thus, each of the M different LNGs in each WSI, denoted $G_u$ with $u \in \{1, 2, \ldots, M\}$, is uniquely represented by a total of 11 different 13-dimensional feature vectors $H_u^k = [h_{u,1}^k, \ldots, h_{u,13}^k]$, where k ranges from 1 to 11. The final CellDiv signature (a 715-dimensional vector, ie, 11 nuclear features × 13 statistics × 5 aggregations) for each single WSI is formed by the first-order statistics (mean, SD, kurtosis, skewness, and range) of these vectors aggregated across all $G_u$.
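The co-occurrence step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning, the fixed value range, the fully connected toy cluster, and all function names are assumptions made for illustration, and only two of the 13 statistics (entropy and energy) are shown.

```python
import math
from itertools import combinations

def discretise(values, lo, hi, levels=5):
    """Map raw feature values to integer bins 0..levels-1 (equal-width bins; assumed scheme)."""
    width = (hi - lo) / levels
    return [min(int((v - lo) / width), levels - 1) for v in values]

def cooccurrence(bins, edges, levels=5):
    """Symmetric, normalised co-occurrence matrix of bin pairs over the edges of one LNG."""
    m = [[0.0] * levels for _ in range(levels)]
    for i, j in edges:
        m[bins[i]][bins[j]] += 1
        m[bins[j]][bins[i]] += 1
    total = sum(sum(row) for row in m)
    if total:
        m = [[c / total for c in row] for row in m]
    return m

def entropy(m):
    """Shannon entropy (bits) of the co-occurrence distribution; higher means more diverse."""
    return -sum(p * math.log2(p) for row in m for p in row if p > 0)

def energy(m):
    """Sum of squared entries; equals 1 when every connected pair falls in the same bin."""
    return sum(p * p for row in m for p in row)

# Toy cluster of four nuclei, every pair connected (hypothetical graph).
edges = list(combinations(range(4), 2))

# Hypothetical nuclear areas, discretised on a shared 0-100 range.
uniform = cooccurrence(discretise([50, 50, 50, 50], 0, 100), edges)
mixed = cooccurrence(discretise([10, 30, 50, 90], 0, 100), edges)

uniform_entropy, mixed_entropy = entropy(uniform), entropy(mixed)
uniform_energy, mixed_energy = energy(uniform), energy(mixed)

# Size of the slide-level signature: 11 nuclear features x 13 statistics x 5 aggregations.
signature_length = 11 * 13 * 5
```

In this toy case the morphologically uniform cluster yields zero entropy and unit energy, whereas the mixed cluster spreads its mass over many bin pairs, which is the kind of contrast the CellDiv features are meant to capture.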

Cox proportional hazard model

A Cox proportional hazard model, henceforth referred to as the Cox model, was trained using the top CellDiv features identified from D1 and D2 to generate continuous risk scores for all patients. We chose this model because it considers the time-to-event duration as well as censoring information to construct the model. The top discriminant CellDiv features were identified using LASSO with the Cox model as the cost function. The LASSO model was fitted under a ten-fold cross-validation scheme. The risk score for each patient was then calculated as the linear combination of the weights, β, of the top CellDiv features and associated values. The median value Topt of all the risk scores in training cohorts D1 and D2 was locked down as the optimal threshold for separating patients by risk level, with any value higher than the median categorised as high risk and median or lower categorised as low risk. We constructed the image models specifically for early-stage LUAD and LUSC and evaluated them separately. The performance of the locked-down Cox model was evaluated in a blinded fashion on the independent validation test sets D3 and D4. The locked-down Cox model generated a risk score for each patient in the validation test set. The optimal threshold Topt learnt from the training cohorts was then applied to these risk scores to separate the patients into low risk and high risk.
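As a minimal sketch of the scoring and thresholding just described, assuming the LASSO-Cox weights β have already been estimated: the weights, feature values, and function names below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import median

def risk_score(features, weights):
    """Linear combination of selected feature values and their Cox weights."""
    return sum(w * x for w, x in zip(weights, features))

def risk_group(score, threshold):
    """Scores strictly above the locked-down threshold are high risk; the rest are low risk."""
    return "high" if score > threshold else "low"

# Hypothetical weights for two selected features (not taken from the study).
beta = [0.5, -0.2]

# Hypothetical training patients: the threshold is the median training risk score.
train_features = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
train_scores = [risk_score(f, beta) for f in train_features]
t_opt = median(train_scores)

# The threshold is then applied unchanged to validation patients.
validation_features = [[2.0, 1.0], [1.0, 3.0]]
validation_groups = [risk_group(risk_score(f, beta), t_opt) for f in validation_features]
```

Fixing the threshold on the training cohorts and reusing it unchanged is what keeps the validation cohorts independent of any tuning.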

Survival analysis

We chose overall survival as our endpoint because it is considered the gold standard in outcome for clinical trials and studies. We focused on 5-year overall survival because studies have shown that in early-stage NSCLC, 5-year and 10-year overall survival were equivalent.[15] Overall survival was defined as the time interval between the date of diagnosis and the date of death. Patients who were still alive at the last reported date were labelled as censored. We used Kaplan-Meier survival analysis to examine the difference in overall survival between patients categorised as high risk or low risk by the model, and the difference of overall survival in each group was assessed by the log-rank test. Univariable Cox regression analysis was calculated to examine the prognostic ability of CellDiv features and other clinical and pathological parameters including age (>65 years vs ≤65 years), sex (male vs female), race (white vs other), smoking status (ever smoker vs never smoker), overall stage (II vs I), T stage (T2/T2a/T2b/T3 vs T1/T1a/T1b), and N stage (N1 vs N0). Multivariable Cox regression analysis was calculated to examine the predictive ability of CellDiv risk group when controlling for the effects of clinical and pathological parameters including age, smoking status, overall stage, T stage, and N stage. Mantel-Haenszel hazard ratios were calculated in univariable and multivariable analysis. p values were two sided and p<0·05 was considered to be statistically significant.
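The Kaplan-Meier estimate used for the risk-group comparison can be sketched as follows. This is a generic product-limit estimator written for illustration, not the software used in the study; the follow-up times (in months) and event indicators are invented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve

# Invented example: five patients, two of them censored (still alive at last contact).
km_curve = kaplan_meier([6, 12, 12, 30, 60], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last reported date but never trigger a drop in the curve, which is how the estimator uses the censoring information mentioned above.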

Histogenomics analysis

We first used CellDiv features to construct a machine learning classifier for KRAS mutational status, using data from the 236 patients with LUAD with data on KRAS mutational status (appendix 1 pp 4–5). We evaluated the association of CellDiv-identified prognostic risk groups and differentially expressed pathways, to help to elucidate the relationship between the histological image phenotype and the corresponding genotype. For the TCGA LUAD and LUSC cohorts, normalised mRNA expression data were available for 405 patients (195 with LUAD and 210 with LUSC), obtained from the Genomic Data Commons portal. These transcriptomic data (IlluminaHiSeq), which consisted of 20 531 annotated genes, were used to investigate the underlying biological pathways of the risk scores derived from the pathological image analysis. First, all the normalised genes were recorded based on their association with the CellDiv risk group, with patients categorised as high risk or low risk. Based on an assumption that gene expression values are not normally distributed,[16] genes that were differentially expressed across patients in the two risk categories were selected using the Wilcoxon rank sum test, using a statistical significance threshold of 0·05. The Benjamini and Hochberg method was used to adjust p values and control for the false discovery rate in multiple testing.[17] The most differentially expressed genes, which were significantly associated with the risk score, were then used in Gene Ontology analysis to identify distinct Gene Ontology-based biological processes.[18] Gene Ontology provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology. Gene Ontology analysis highlights the most over-represented genes and finds the systematic linkages between those genes and biological processes.
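The gene-selection step (a rank sum test per gene followed by Benjamini-Hochberg adjustment) can be sketched as below. This is an illustration under stated assumptions rather than the study's pipeline: the rank sum p value uses the normal approximation without a tie correction, the expression values and gene names are invented, and the cut-off `alpha` is an illustrative choice.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p value (normal approximation, average ranks for ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average
        i = j
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented expression values per gene: (low-risk patients, high-risk patients).
expression = {
    "GENE_A": ([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]),
    "GENE_B": ([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]),
}
alpha = 0.05  # illustrative cut-off
genes = list(expression)
raw_p = [rank_sum_p(*expression[g]) for g in genes]
adj_p = benjamini_hochberg(raw_p)
selected = [g for g, p in zip(genes, adj_p) if p < alpha]
```

With about 20 000 genes tested, the adjustment matters far more than in this two-gene toy example, because unadjusted testing at any fixed cut-off would admit many false positives.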
The next step in the histogenomic analysis involved selecting a set of pathways that were representative of biological processes and doing single-sample gene set enrichment analysis (ssGSEA). ssGSEA, an extension of GSEA, is a computational method that determines whether a predefined set of genes shows significant, concordant differences between two biological states (eg, phenotypes),[19] and calculates an enrichment score for every patient in the cohort. Each ssGSEA enrichment score represents the degree to which the genes in a particular gene set are coordinately upregulated or downregulated within a sample. The predefined sets of genes for the Gene Ontology-based biological processes were acquired from the Molecular Signatures Database. In our case, ssGSEA was used to find pathway associations with tumour cellular diversity-defined phenotypes individually for LUSC and LUAD. This helps to overcome limitations of single-gene analysis which often misses important biological pathways that tend to affect a set of genes acting together, rather than a single gene-based analysis.[19] Significant differentially expressing pathways with respect to CellDiv features that contributed to the risk score were then selected using the Wilcoxon rank sum test.

Role of the funding source

The funders of the study played no role in study design, data collection, data analysis, data interpretation, or writing of the report. All authors had full access to all the data in the study and the corresponding author had final responsibility for the decision to submit for publication.

Results

Clinical and pathological data for all four cohorts are summarised in table 1. Patients were primarily white men in their mid-60s, and about 80% of patients were current or former smokers.
Table 1: Summary of clinical and pathological data by cohort

| Characteristic | Entire cohort: LUSC (n=551) | Entire cohort: LUAD (n=506) | D1 Cleveland Clinic: LUSC (n=152) | D1 Cleveland Clinic: LUAD (n=243) | D2 Yale Medical School: LUSC (n=64) | D2 Yale Medical School: LUAD (n=27) | D3 TCGA: LUSC (n=237) | D3 TCGA: LUAD (n=236) | D4 University of Bern: LUSC only (n=98) |
|---|---|---|---|---|---|---|---|---|---|
| Age, years | 66·8 (9·4) | 65·8 (10·4) | 66·3 (9·9) | 67·5 (9·9) | 64·7 (8·7) | 63·6 (11·6) | 67·0 (9·5) | 66·3 (9·7) | 69·2 (7·9) |
| Sex: male | 360 (65%) | 238 (47%) | 82 (54%) | 128 (53%) | 54 (84%) | 21 (78%) | 146 (62%) | 89 (38%) | 78 (80%) |
| Sex: female | 191 (35%) | 268 (53%) | 70 (46%) | 115 (47%) | 10 (16%) | 6 (22%) | 91 (38%) | 147 (62%) | 20 (20%) |
| Race: white | 492 (89%) | 429 (85%) | 133 (88%) | 213 (88%) | 59 (92%) | 23 (85%) | 202 (85%) | 193 (82%) | 98 (100%) |
| Race: other | 47 (9%) | 64 (13%) | 18 (12%) | 30 (12%) | 5 (8%) | 4 (15%) | 24 (10%) | 30 (13%) | 0 |
| Smoking status: ever | 436 (79%) | 389 (77%) | 139 (91%) | 186 (77%) | NA | NA | 211 (89%) | 203 (86%) | 86 (88%) |
| Smoking status: never | 24 (4%) | 66 (13%) | 1 (1%) | 36 (15%) | NA | NA | 23 (10%) | 30 (13%) | 0 |
| T stage: T1/T1a/T1b | 181 (33%) | 213 (42%) | 64 (42%) | 126 (52%) | 8 (13%) | 3 (11%) | 86 (36%) | 84 (36%) | 23 (23%) |
| T stage: T2/T2a/T2b | 309 (56%) | 242 (48%) | 68 (45%) | 96 (40%) | 31 (48%) | 12 (44%) | 135 (57%) | 134 (57%) | 75 (77%) |
| T stage: T3 | 41 (7%) | 43 (8%) | 20 (13%) | 21 (9%) | 5 (8%) | 4 (15%) | 16 (7%) | 18 (8%) | 0 |
| N stage: N0 | 414 (75%) | 391 (77%) | 127 (84%) | 205 (84%) | 32 (50%) | 11 (41%) | 180 (76%) | 175 (74%) | 75 (77%) |
| N stage: N1 | 87 (16%) | 103 (20%) | 25 (16%) | 38 (16%) | 12 (19%) | 8 (30%) | 27 (11%) | 57 (24%) | 23 (23%) |
| Overall stage: I/IA/IB | 309 (56%) | 278 (55%) | 75 (49%) | 108 (44%) | 39 (61%) | 15 (56%) | 146 (62%) | 155 (66%) | 49 (50%) |
| Overall stage: II/IIA/IIB | 177 (32%) | 107 (21%) | 15 (10%) | 14 (6%) | 22 (34%) | 12 (44%) | 91 (38%) | 81 (34%) | 49 (50%) |

Data are n (%) or mean (SD). NA=not applicable. LUAD=lung adenocarcinoma. LUSC=lung squamous cell carcinoma. TCGA=The Cancer Genome Atlas.

For early-stage LUSC prognostication, the CellDiv LUSC model included the 11 most discriminative CellDiv features, which were related to the nuclear shape (ie, major axis length of nuclei) and the nuclear intensity; the full list of feature names and their associated weight is presented in appendix 1 (p 23). In the univariable analysis of LUSC in D3, the CellDiv model was prognostic of 5-year overall survival, whereas none of the included clinical and pathological factors were significant (table 2). CellDiv LUSC was prognostic of overall survival in D4 as well, along with sex (table 2). In multivariable analysis, while controlling for clinicopathological factors, CellDiv was independently prognostic of overall survival in both validation test sets (table 2). This was supported by the Kaplan-Meier analysis (figure 1). When considering two representative cases of patients with LUSC who were identified as high risk and low risk by CellDiv with feature maps overlaid, the model determined that the low-risk tissue image had more local cell clusters (represented by the coloured patches) than the high-risk tissue image, with relatively lower CellDiv (figure 2).
Table 2: Univariable and multivariable analysis for 5-year overall survival on the validation test sets D3 and D4

Univariable Cox model analysis for overall survival

| Variable | D3 TCGA (LUSC): HR (95% CI) | D3 TCGA (LUSC): p value | D3 TCGA (LUAD): HR (95% CI) | D3 TCGA (LUAD): p value | D4 University of Bern (LUSC): HR (95% CI) | D4 University of Bern (LUSC): p value |
|---|---|---|---|---|---|---|
| Age: >65 years vs ≤65 years | 1·13 (0·81–1·57) | 0·488 | 1·00 (0·72–1·41) | 0·982 | 1·71 (0·84–3·50) | 0·139 |
| Sex: male vs female | 0·81 (0·58–1·13) | 0·219 | 1·30 (0·92–1·86) | 0·142 | 3·62 (1·30–10·12) | 0·014 |
| Race: white vs other | 1·60 (0·93–2·74) | 0·087 | 0·81 (0·53–1·24) | 0·326 | NA | NA |
| Smoking status: ever vs never | 1·29 (0·79–2·10) | 0·303 | 1·36 (0·79–2·34) | 0·266 | NA | NA |
| T stage: T2/T2a/T2b/T3 vs T1/T1a/T1b | 1·24 (0·88–1·74) | 0·213 | 1·22 (0·86–1·72) | 0·273 | 1·26 (0·60–2·61) | 0·542 |
| N stage: N1 vs N0 | 1·35 (0·90–2·01) | 0·146 | 1·92 (1·18–3·13) | 0·0083 | 1·19 (0·60–2·35) | 0·614 |
| Overall stage: II vs I | 1·22 (0·86–1·73) | 0·259 | 1·25 (0·85–1·84) | 0·257 | 1·05 (0·59–1·85) | 0·867 |
| Image model: high risk vs low risk | 1·48 (1·06–2·08) | 0·022 | 1·62 (1·15–2·30) | 0·006 | 2·24 (1·04–4·80) | 0·039 |

Multivariable Cox model analysis controlling for clinical and pathological variables

| Variable | D3 TCGA (LUSC): HR (95% CI) | D3 TCGA (LUSC): p value | D3 TCGA (LUAD): HR (95% CI) | D3 TCGA (LUAD): p value | D4 University of Bern (LUSC): HR (95% CI) | D4 University of Bern (LUSC): p value |
|---|---|---|---|---|---|---|
| Age: >65 years vs ≤65 years | 1·14 (0·81–1·61) | 0·451 | 1·12 (0·78–1·60) | 0·540 | 1·80 (0·84–3·85) | 0·131 |
| Smoking status: ever vs never | 1·36 (0·83–2·23) | 0·221 | 1·14 (0·64–2·01) | 0·661 | NA | NA |
| Overall stage: II vs I | 1·13 (0·66–1·94) | 0·651 | 1·86 (1·04–3·32) | 0·037 | 1·13 (0·66–1·94) | 0·708 |
| T stage: T2/T2a/T2b/T3 vs T1/T1a/T1b | 1·26 (0·85–1·87) | 0·244 | 1·25 (0·85–1·85) | 0·263 | 1·39 (0·60–3·19) | 0·438 |
| N stage: N1 vs N0 | 1·36 (0·77–2·41) | 0·292 | 3·11 (1·55–6·23) | 0·0012 | 1·17 (0·52–2·65) | 0·709 |
| Image model: high risk vs low risk | 1·52 (1·08–2·13) | 0·016 | 1·55 (1·09–2·22) | 0·015 | 2·34 (1·07–5·14) | 0·034 |

Mantel-Haenszel HRs are provided. TCGA=The Cancer Genome Atlas. LUSC=lung squamous cell carcinoma. LUAD=lung adenocarcinoma. HR=hazard ratio. NA=not applicable.

Figure 1:

Kaplan-Meier 5-year overall survival according to risk category

HR=hazard ratio. LUAD=lung adenocarcinoma. LUSC=lung squamous cell carcinoma. NA=not applicable. TCGA=The Cancer Genome Atlas.

Figure 2:

Cellular diversity feature maps in LUSC risk model (A), LUAD risk model (B), and mutational status classification (C)

(A) Representative cases of LUSC and CellDiv feature map illustration. (B) Representative cases of LUAD and CellDiv feature map illustration. In (A) and (B), the first column shows haematoxylin and eosin-stained images with low-risk and high-risk patients as identified by the CellDiv model. The segmented nuclei contour and connecting edges are shown in the second column. The third column shows CellDiv features that capture the CellDiv in terms of nuclear shape (ie, area in panel A and eccentricity in panel B). Each colour patch represents individual LNGs in the image, where the blue and yellow colours represent the low and high normalised feature values. (C) Representative cases of KRAS mutation positive versus KRAS mutation negative, and the corresponding CellDiv feature map. LNG=local nuclear graph. LUAD=lung adenocarcinoma. LUSC=lung squamous cell carcinoma.

For early-stage LUAD prognostication, the CellDiv-LUAD model included the 23 most discriminative CellDiv features, which were related to the nuclear shape (eg, the solidity of nuclei) and the nuclear intensity; the full list of feature names and their associated weight is presented in appendix 1 (p 24). In the univariable analysis of LUAD in D3, the CellDiv model was prognostic of 5-year overall survival, while N stage was also significant (table 2). In multivariable analysis, CellDiv was independently prognostic of overall survival, along with overall stage and N stage (table 2). This was supported by the Kaplan-Meier analysis (figure 1). When considering two representative cases with local nuclear shape diversity feature maps overlaid, the high-risk example had a higher expression of the CellDiv feature relating to nuclear shape than the low-risk example (figure 2). In precision-recall AUC analysis, the CellDiv model outperformed existing clinical variable-based models for LUSC and LUAD (appendix 1 p 26). We obtained a mean AUC of 0·63 in classification of KRAS status (60 KRAS mutation positive vs 176 KRAS mutation negative) using top six discriminative CellDiv features under five-fold cross-validation over 100 iterations (appendix 1 pp 4–5). As part of our histogenomics analysis, we did an empirical analysis of the 20 531 annotated genes across the D3 LUSC and LUAD cohorts, which resulted in 299 and 207 differentially expressing genes (DEGs), respectively, between CellDiv-defined low-risk and high-risk groups based on 5-year overall survival (the full list of DEGs is presented in appendix 2). Our Gene Ontology analysis using these DEGs identified 23 significant biological pathways for LUSC and 15 for LUAD (a complete list of pathways is presented in appendix 2). These significant pathways were chosen on the basis of their biological significance in regulating tumour cellular diversity and carcinogenesis. 
In LUSC and LUAD, these pathways were broadly concerned with cell signalling, adhesion, division, localisation, apoptosis, and replication. Specifically, in LUAD, dendritic cell cytokine production, mast cell proliferation, regulation of apoptosis, pathways leading to DNA replication, and nucleus development were overexpressed in high-risk patients with higher cellular diversity. In LUSC, pathways of apoptotic signalling by p53, regulation of protein imports into the nucleus, cell adhesion and negative regulation of cellular differentiation, and cell signalling were differentially expressed between the CellDiv risk groups. The fold enrichment changes and strength of association between the CellDiv risk groups and significant biological processes in LUAD and LUSC are shown in appendix 2. For a comprehensive histogenomics analysis, we evaluated the molecular underpinning of the prognostic CellDiv features by studying the corresponding association with ssGSEA. Gene set annotations for the 15 and 23 biological processes that were found significant in Gene Ontology analysis were used to calculate ssGSEA scores for each of the 23 most discriminative tumour cellular diversity features for LUAD and the 11 features for LUSC, respectively. In LUAD, CellDiv features were strongly associated with gene sets corresponding to apoptotic signalling, DNA replication, acute inflammatory response, and chromosome separation in meiosis pathways (figure 3). Meanwhile, in LUSC, CellDiv features were strongly correlated with pathways related to adhesion, cytokine activity, cell differentiation, leucocyte activation, and apoptotic signalling, among others (figure 4). A complete list of these differentially expressing genes can be found in appendix 2. 
In the case of LUSC, the local CellDiv features in terms of nuclear intensity (eg, mean intensity and mean inside boundary intensity: median [energy], which measure the nuclear intensity diversity in a local region) were strongly associated with cell ageing, adhesion, localisation, replication, apoptosis, and cytokine production. CellDiv features related to nuclear shape (eg, length of minor axis) were similarly strongly associated with pathways regulating cellular differentiation, cell signalling including bone morphogenetic protein signalling, and extracellular organisation. Similarly, in the case of LUAD, the local CellDiv features in terms of shape (solidity and circularity) were strongly associated with pathways controlling histone acetylation, nuclear division, apoptosis, cellular differentiation, and nuclear autophagy, among others. CellDiv features related to nuclear intensity, meanwhile, were found to be strongly correlated with apoptosis, nuclear autophagy, inflammatory response, nuclear division, and protein targeting.
Figure 3:

Association between biological processes and the CellDiv features used to construct the prognostic models for LUAD

The strength of association of biological processes, shown in rows, with the CellDiv features, shown in columns, by ssGSEA. Wilcoxon rank sum test p values are shown, where p<0·05 indicates an association between histomorphometric features used in the CellDiv models and certain pathways. LUAD=lung adenocarcinoma. ssGSEA=single-sample gene set enrichment analysis.

Figure 4:

Association between biological processes and the CellDiv features used to construct the prognostic models for LUSC

The strength of association of biological processes, shown in rows, with the CellDiv features, shown in columns, by ssGSEA. Wilcoxon rank sum test p values are shown, where p<0·05 indicates an association between histomorphometric features used in the CellDiv models and certain pathways. LUSC=lung squamous cell carcinoma. BMP=bone morphogenetic protein. TGF=transforming growth factor. ssGSEA=single-sample gene set enrichment analysis.

Discussion

Definitive resection in early-stage NSCLC is potentially curative and the standard of care, yet almost half of these patients experience recurrence following surgery. While adjuvant chemotherapy is routinely used in patients with stage II NSCLC, it is currently not recommended in patients with stage IA disease and there is controversy regarding its use in stage IB NSCLC due to contradictory results from prospective clinical trials.[20] There is thus a need to develop a prognostic biomarker that can identify which patients with stage I NSCLC have more aggressive disease and can derive potential benefit from additional therapy following resection. Subsequently, a prognostic biomarker would also work to eliminate unnecessary chemotherapy for patients with low-risk stage II disease who would do well with surgery alone. Existing prognostic biomarkers in NSCLC mostly rely on molecular or multigene-based assays.[21] These tend to be expensive, time consuming, and tissue destructive while also not accounting for the inherent intratumoural heterogeneity present in tissues. For instance, Sandoval and colleagues[22] presented a prognostic five-gene DNA methylation signature analysing 450 000 CpG sites from the tumoural DNA for stage I NSCLC. On an independent test cohort of 143 patients with stage I disease, the signature had a hazard ratio of 3·24 (95% CI 1·61–6·54; p<0·001) in prognosticating recurrence-free survival. Chen and colleagues[23] presented a five-gene signature panel using RT-PCR that was prognostic of recurrence-free survival and overall survival on an independent test cohort of 42 patients with early-stage NSCLC, with a hazard ratio of 3·36 (1·35–8·35; p=0·009). 
Several studies have also shown the usefulness of single gene-based biomarkers including p53,[24] ERBB2,[25] RRM1,[26] and BRCA for prognosticating survival in early-stage NSCLC.[21]
In this work, a risk score leveraging quantitative pathomorphometric features related to nuclear and morphological diversity (CellDiv) was used to prognosticate overall survival in early-stage (stage I and II) NSCLC. Given the well-explored morphological and genetic differences between LUSC and LUAD, independent CellDiv models were developed for each histological subtype to maximise model performance and to show the distinct biological underpinnings of the CellDiv features in each tumour subtype. The developed CellDiv models were independently validated on a large multi-institutional cohort from the TCGA as well as an independent and blinded test cohort from the University of Bern.
Previous work in the area of computational pathology-based prognostic predictors for early-stage NSCLC includes works by Corredor and colleagues,[13] Wang and colleagues,[27] Saltz and colleagues,[28] and Yu and colleagues.[29] While not explicitly capturing cellular diversity, these approaches involved characterising the spatial arrangement and appearance of tumour-infiltrating lymphocytes and nuclei and relating these measurements to the likelihood of disease recurrence and progression. Andor and colleagues[6] showed that diversity in nuclear intensity and shape was correlated with intratumoural heterogeneity in four different cancer types (LUAD, head and neck squamous cell carcinoma, and bladder and renal cell carcinomas, in 382 patients). Coudray and colleagues[30] showed that a deep learning model could classify NSCLC tissue into LUAD, LUSC, and normal with an AUC of 0·97. In addition, the trained deep learning model could predict the ten most commonly mutated genes in LUAD, with AUCs from 0·73 to 0·86.
Unlike the approach presented in this work, which relied on computationally derived intuitive features representing local cellular diversity, Coudray and colleagues[30] used so-called black-box deep learning features with little explainability. Additionally, their work considered associations with single-gene driver mutations, whereas we explicitly looked at genome-level representations. Similarly, the work of Kather and colleagues[31] showed an association between deep learning representations from WSIs of gastrointestinal cancer and microsatellite instability. Note that while we did employ deep learning, it was used solely for tissue partitioning and nuclear segmentation, which are pre-processing steps for the subsequent feature extraction.
Our work differed from these studies by developing and validating CellDiv features that represent morphological intratumoural heterogeneity, are associated with gene expression, and are prognostic of survival in early-stage NSCLC. The CellDiv features employed a mathematical and computational model to capture local morphological heterogeneity. Given that there are both morphological and biological differences between LUAD and LUSC, dedicated prognostic models were constructed separately for LUAD and LUSC. The present work also encompasses histogenomics analysis, investigating by Gene Ontology and ssGSEA the molecular and biological pathways that might drive these histomorphometric prognostic features. Additionally, we believe this is the first work to show that computer-extracted histomorphometric features were not only strongly prognostic of overall survival but also associated with underlying biological pathways. In this work, we also explored the molecular underpinning of the CellDiv-defined prognostic risk groups on the TCGA dataset with available mRNA sequencing data.
We showed the associations between specific CellDiv features and the significant biological pathways determined by ssGSEA. In adenocarcinomas, for instance, the selected CellDiv features relating to nuclear solidity and mean inside boundary were associated with higher expression of genes in the biological pathways of DNA replication[32] and nucleus development.[33] With the CellDiv features essentially capturing the degree of heterogeneity and diversity in shape, size, and texture of cancer nuclei, this seems to suggest that higher expression of those developmental pathways leads to more disordered or chaotic nuclei. Meanwhile, in LUSC, the family of bone morphogenetic protein and transforming growth factor β receptors, which have already been shown to be implicated in lung cancer carcinogenesis,[34] were found to be associated with CellDiv features that measure nuclear shape and intensity, suggesting that the diversity features are driven by cell differentiation and adhesion pathways.[35] This possibly reflects the increased differentiation present in high-risk tumours as defined by the CellDiv risk groups. Additionally, CellDiv features were found to be associated with KRAS mutational status in adenocarcinomas, with a classification AUC of 0·63.
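To put an AUC such as 0·63 in context, the AUC can be read as the probability that a randomly chosen positive case (here, a KRAS-mutant tumour) receives a higher classifier score than a randomly chosen negative case, with 0·5 corresponding to chance. The sketch below computes it directly from that definition. The scores are invented for illustration and the function name is hypothetical; it is not the classifier used in the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. This equals the Mann-Whitney U
    statistic divided by the number of pairs.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores, invented for illustration only.
mutant = [0.62, 0.48, 0.71, 0.55, 0.40]
wild_type = [0.45, 0.58, 0.36, 0.52, 0.66]

print(f"AUC = {auc_from_scores(mutant, wild_type):.2f}")
```

In this toy example 15 of the 25 mutant and wild-type pairs are ordered correctly, giving an AUC of 0·60, a modest separation of the same order as the value reported above.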
Unlike the deep learning-based methods presented by Coudray and colleagues[30] and Kather and colleagues,[31] CellDiv features explicitly capture morphological heterogeneity in terms of cellular diversity, as opposed to more opaque representations that are not as intuitive or explainable.[5] In LUAD, the apoptotic signalling pathway was found to be significantly associated with the degree of solidity of the nuclei, possibly suggesting that the degree of nuclear diversity depends on the targeted cellular destruction of cancer nuclei, leading to more disordered cellular structure in more aggressive cancers.[36] Nuclear autophagy, which could be another reason for the disordered and more heterogeneous cellular structure,[37] was also correlated with a CellDiv feature relating to the nuclear boundary intensity, reflecting textural heterogeneity in nuclei. For LUSC, biological pathways connected with the regulation of DNA replication and cell differentiation[38] were strongly associated with the CellDiv feature analysing nuclear textural heterogeneity. This seems to suggest that more aggressive cancers are represented by a more heterogeneous cellular and nuclear organisation and architecture.
Our study had some limitations. First, the CellDiv prognostic model was developed and validated using retrospective data for prognosticating patient outcome, but was not validated for predicting the added benefit of adjuvant therapy. Future work will entail validation of the CellDiv model on appropriate early-stage NSCLC clinical trial datasets (eg, one arm with surgery alone, the other arm with surgery plus adjuvant therapy). Similarly, another future direction would be to evaluate the ability of the CellDiv features to predict response to therapies such as checkpoint inhibitors. In addition, the correlative analysis between morphology and gene expression was done only on TCGA patients.
Another limitation of the study was the evaluation of the CellDiv features solely on TMA spot images, but not the WSIs. However, despite the limited tissue area, the CellDiv features were still able to prognosticate patient outcomes. In the precision-recall AUC analysis, a marginal improvement was seen in the TCGA LUSC cohort with CellDiv features only compared with the clinical variable-based model (0·85 vs 0·84; appendix 1 p 26).
To summarise, we presented a histogenomics approach that attempts to capture tumour cellular diversity in the tissue. The CellDiv feature-based classifier was evaluated in H&E-stained image cohorts. The CellDiv features showed a strong correlation with overall survival in early-stage NSCLC, were associated with biological pathways of cellular differentiation, apoptosis, and signalling, and could distinguish KRAS status (in LUAD). Before it can be deployed clinically, CellDiv needs to be clinically validated, first on archived clinical samples from completed clinical trials in early-stage NSCLC (eg, the International Adjuvant Lung Cancer Trial and SWOG Cancer Research Network’s JBR10 trial) to generate level 1 evidence. Our goal following validation is to provide oncologists with a risk score to guide their treatment decision making. Patients with early-stage lung cancer who are identified by the CellDiv feature classifier as high risk might be good candidates for adjuvant chemotherapy, whereas those identified as being at low risk are likely to do well with surgery alone. Additionally, following validation on archived clinical trial samples, we will deploy CellDiv as a biomarker to guide therapy in a prospective clinical trial setting, where CellDiv scores will be used to randomly assign patients to either adjuvant chemotherapy or surgery alone for early-stage NSCLC.
  35 in total

Review 1.  Adjuvant chemotherapy in non-small cell lung cancer: state-of-the-art.

Authors:  Ángel Artal Cortés; Lourdes Calera Urquizu; Jorge Hernando Cubero
Journal:  Transl Lung Cancer Res       Date:  2015-04

2.  A five-gene signature and clinical outcome in non-small-cell lung cancer.

Authors:  Hsuan-Yu Chen; Sung-Liang Yu; Chun-Houh Chen; Gee-Chen Chang; Chih-Yi Chen; Ang Yuan; Chiou-Ling Cheng; Chien-Hsun Wang; Harn-Jing Terng; Shu-Fang Kao; Wing-Kai Chan; Han-Ni Li; Chun-Chi Liu; Sher Singh; Wei J Chen; Jeremy J W Chen; Pan-Chyr Yang
Journal:  N Engl J Med       Date:  2007-01-04       Impact factor: 91.245

3.  Exploring and comparing of the gene expression and methylation differences between lung adenocarcinoma and squamous cell carcinoma.

Authors:  Yang Yang; Meng Wang; Bao Liu
Journal:  J Cell Physiol       Date:  2018-10-14       Impact factor: 6.384

Review 4.  Molecular and cellular heterogeneity: the hallmark of glioblastoma.

Authors:  Diane J Aum; David H Kim; Thomas L Beaumont; Eric C Leuthardt; Gavin P Dunn; Albert H Kim
Journal:  Neurosurg Focus       Date:  2014-12       Impact factor: 4.047

Review 5.  Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology.

Authors:  Kaustav Bera; Kurt A Schalper; David L Rimm; Vamsidhar Velcheti; Anant Madabhushi
Journal:  Nat Rev Clin Oncol       Date:  2019-08-09       Impact factor: 66.675

6.  Intra-tumor heterogeneity in head and neck cancer and its clinical implications.

Authors:  Edmund A Mroz; James W Rocco
Journal:  World J Otorhinolaryngol Head Neck Surg       Date:  2016-07-22

7.  Differentiated regulation of immune-response related genes between LUAD and LUSC subtypes of lung cancers.

Authors:  Mengzhu Chen; Xiuying Liu; Jie Du; Xiu-Jie Wang; Lixin Xia
Journal:  Oncotarget       Date:  2017-01-03

8.  Prognostic implications of autophagy-associated gene signatures in non-small cell lung cancer.

Authors:  Yang Liu; Ligao Wu; Haijiao Ao; Meng Zhao; Xue Leng; Mingdong Liu; Jianqun Ma; Jinhong Zhu
Journal:  Aging (Albany NY)       Date:  2019-12-07       Impact factor: 5.682

9.  Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning.

Authors:  Nicolas Coudray; Paolo Santiago Ocampo; Theodore Sakellaropoulos; Navneet Narula; Matija Snuderl; David Fenyö; Andre L Moreira; Narges Razavian; Aristotelis Tsirigos
Journal:  Nat Med       Date:  2018-09-17       Impact factor: 53.440

10.  Prediction of recurrence in early stage non-small cell lung cancer using computer extracted nuclear features from digital H&E images.

Authors:  Xiangxue Wang; Andrew Janowczyk; Yu Zhou; Rajat Thawani; Pingfu Fu; Kurt Schalper; Vamsidhar Velcheti; Anant Madabhushi
Journal:  Sci Rep       Date:  2017-10-19       Impact factor: 4.379

