
Development and Validation of a Pathology Image Analysis-based Predictive Model for Lung Adenocarcinoma Prognosis - A Multi-cohort Study.

Xin Luo, Shen Yin, Lin Yang, Junya Fujimoto, Yikun Yang, Cesar Moran, Neda Kalhor, Annikka Weissferdt, Yang Xie, Adi Gazdar, John Minna, Ignacio Ivan Wistuba, Yousheng Mao, Guanghua Xiao.

Abstract

Prediction of disease prognosis is essential for improving cancer patient care. Previously, we have demonstrated the feasibility of using quantitative morphological features of tumor pathology images to predict the prognosis of lung cancer patients in a single cohort. In this study, we developed and validated a pathology image-based predictive model for the prognosis of lung adenocarcinoma (ADC) patients across multiple independent cohorts. Using quantitative pathology image analysis, we extracted morphological features from H&E stained sections of formalin fixed paraffin embedded (FFPE) tumor tissues. A prediction model for patient prognosis was developed using tumor tissue pathology images from a cohort of 91 stage I lung ADC patients from the Chinese Academy of Medical Sciences (CAMS), and validated in ADC patients from the National Lung Screening Trial (NLST), and the UT Special Program of Research Excellence (SPORE) cohort. The morphological features that are associated with patient survival in the training dataset from the CAMS cohort were used to develop a prognostic model, which was independently validated in both the NLST (n = 185) and the SPORE (n = 111) cohorts. The association between predicted risk and overall survival was significant for both the NLST (Hazard Ratio (HR) = 2.20, pv = 0.01) and the SPORE cohorts (HR = 2.15 and pv = 0.044), respectively, after adjusting for key clinical variables. Furthermore, the model also predicted the prognosis of patients with stage I ADC in both the NLST (n = 123, pv = 0.0089) and SPORE (n = 68, pv = 0.032) cohorts. The results indicate that the pathology image-based model predicts the prognosis of ADC patients across independent cohorts.

Year:  2019        PMID: 31053738      PMCID: PMC6499884          DOI: 10.1038/s41598-019-42845-z

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Early diagnosis and accurate staging are among the key challenges for lung cancer patient care[1]. Patients’ survival outcomes vary substantially even within the same histology subtype and pathological stage, which is partially attributed to the highly heterogeneous nature of tumor cells and their close interaction with the diverse tumor microenvironment[2,3]. Recently, different technologies and methods have been developed to stratify cancer patients based on their molecular profiles[4,5] or histopathological factors[6,7], in order to facilitate personalized treatment of individual patients. Formalin fixed paraffin embedded (FFPE) tumor tissue slides provide a vast amount of information about the tumor and its surrounding microenvironment[8]; however, their potential for cancer diagnosis and treatment planning is still far from being fully explored. Currently, H&E stained tumor tissue slide scanning is becoming a routine clinical procedure. Recently, we[9] and Yu et al.[10] have demonstrated that pathology image analysis could be a promising tool to assist pathologists in lung cancer diagnosis and prognosis. However, both studies trained and validated the model using The Cancer Genome Atlas (TCGA) cohort alone. Since pathology images and patients from different cohorts may display different characteristics, in order to test the generalizability of the model, it is essential to evaluate the performance of a predictive model across multiple independent cohorts. In this study, we developed a pathology image-based prognostic model for lung adenocarcinoma (ADC) patients and validated the model in two independent lung ADC patient cohorts. This study established a generalized model that could be applied across different lung ADC patient cohorts.

Materials and Methods

Ethics approval and consent to participate

The University of Texas Southwestern Institutional Review Board granted approval for this research (IRB#: STU 072016-028). Informed consent was obtained from all study participants. All methods were performed in accordance with the relevant guidelines and regulations.

Datasets

We acquired H&E-stained histological images and the corresponding clinical information for 91 stage I ADC patients from the Chinese Academy of Medical Sciences, China (CAMS), 185 ADC patients from the National Lung Screening Trial (NLST), and 111 ADC patients from the University of Texas Special Program of Research Excellence (SPORE) in Lung Cancer project. There are 91, 433, and 130 tissue slides for the CAMS, NLST and SPORE cohorts, respectively. When a patient had multiple tissue slides, each morphological feature was averaged across that patient's slides, and the averaged value represented the patient in further statistical analyses. All tumor tissue slides are FFPE and were scanned at ×20 or ×40 magnification. Our pathologists, Drs. Lin Yang and Junya Fujimoto, manually inspected the tissue slide images, and low-quality images were removed from further analysis. The images captured at ×40 were normalized to ×20 using the method described in our previous study[9]. The characteristics of patients from different cohorts are summarized in Table 1.
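Table 1

Patient Data Summary.

| Characteristic | CAMS | NLST | SPORE |
|---|---|---|---|
| Number of Patients | 91 | 185 | 111 |
| Number of Slides (Tumor) | 95 | 357 | 129 |
| Age at Diagnosis (Years), Median [LQ-HQ] | 60 [55–67] | 64 [60–68] | 64 [58–72] |
| Follow-up (Years), Median [LQ-HQ] | 5.0 [4.0–6.0] | 6.6 [5.3–6.9] | 3.6 [2.0–5.2] |
| Vital Status (%): Alive | 67 (73.6) | 122 (65.9) | 75 (67.6) |
| Vital Status (%): Deceased | 24 (26.4) | 63 (34.1) | 36 (32.4) |
| Gender (%): Male | 42 (46.2) | 103 (55.7) | 56 (50.5) |
| Gender (%): Female | 49 (53.8) | 82 (44.3) | 55 (49.5) |
| Cancer Stage (%): I | 91 (100.0) | 123 (66.5) | 68 (61.3) |
| Cancer Stage (%): II | 0 (0.0) | 19 (10.3) | 17 (15.3) |
| Cancer Stage (%): III | 0 (0.0) | 31 (16.8) | 24 (21.6) |
| Cancer Stage (%): IV | 0 (0.0) | 12 (6.5) | 1 (0.9) |
| Cancer Stage (%): NA | 0 (0.0) | 0 (0.0) | 1 (0.9) |
| Smoking Status (%): Smoker | 37 (40.7) | 103 (55.7) | 97 (87.4) |
| Smoking Status (%): Non-Smoker | 54 (59.3) | 82 (44.3) | 13 (11.7) |
| Smoking Status (%): NA | 0 (0.0) | 0 (0.0) | 1 (0.9) |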

Summary of the number of histological slides and patient clinical information in our study. LQ, the lower quartile, 25th percentile; HQ, the higher quartile, 75th percentile; NA, not available.


Extract Morphological Features

Using the method described in our previous study[9], morphological features for each image slide were extracted with CellProfiler[11,12] software by choosing different analysis modules. These features include global features such as tissue texture and granularity, as well as nucleus-based features such as the size, shape, distribution, texture and neighboring architecture of nuclei. Together, these features cover the morphological information provided by the histological images. For patients with multiple image slides, the average value of each feature was used.
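The patient-level averaging described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (the analyses were performed in R); the function name, the input layout and the toy feature names are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def average_features_per_patient(slides):
    """Average each morphological feature over all slides of the same patient.

    `slides` is a list of (patient_id, {feature_name: value}) pairs.
    This layout is hypothetical and chosen only for the example.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for patient_id, features in slides:
        for name, value in features.items():
            grouped[patient_id][name].append(value)
    # One averaged value per feature and patient.
    return {
        patient_id: {name: mean(values) for name, values in feats.items()}
        for patient_id, feats in grouped.items()
    }


# Toy values, not data from the study.
slides = [
    ("P1", {"Mean_Nuclei_Area": 100.0, "Granularity_7": 0.25}),
    ("P1", {"Mean_Nuclei_Area": 120.0, "Granularity_7": 0.75}),
    ("P2", {"Mean_Nuclei_Area": 90.0, "Granularity_7": 0.5}),
]
patient_features = average_features_per_patient(slides)
print(patient_features["P1"])
```

A patient with a single slide simply keeps that slide's values, so the same function covers both cases.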

Prognostic Model Development and Validation

Since all the pathology images and clinical information from the CAMS cohort had been strictly reviewed and assessed by a pathologist, Dr. Lin Yang, this cohort was used as the training set to develop a pathology image-based prognostic model for lung ADC patients. The morphological features were first screened by their association with patients’ survival using a univariate Cox proportional hazards regression model. Morphological features that were significantly associated (Z score <−2 or >2) with patients’ overall survival were selected to build a prognostic model using the random survival forest method[12]. The model was then validated in ADC patients from the NLST and SPORE cohorts, respectively. Using the risk scores assigned by the model, the patients were separated into high- and low-risk groups by the median risk score in each of the two testing sets.
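The screening step can be sketched as follows. This is an illustrative pure-Python version of a univariate Cox fit, not the code used in the study: it assumes a single covariate, Breslow-style risk sets, Newton-Raphson optimization, and a feature that is neither constant nor perfectly separating. The function names and toy data are hypothetical. The Z score returned is the Wald statistic (coefficient divided by its standard error).

```python
import math


def cox_univariate_z(times, events, x, max_iter=50, tol=1e-10):
    """Fit a one-covariate Cox model by Newton-Raphson; return (beta, Wald Z)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # patient j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean_x = s1 / s0
            score += x_i - mean_x           # first derivative of the log partial likelihood
            info += s2 / s0 - mean_x ** 2   # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, beta * math.sqrt(info)


def screen_features(features, times, events, threshold=2.0):
    """Keep features whose Z score is below -threshold or above +threshold."""
    selected = {}
    for name, values in features.items():
        _, z = cox_univariate_z(times, events, values)
        if abs(z) > threshold:
            selected[name] = z
    return selected


# Toy follow-up data (years; 1 = deceased, 0 = censored), not values from the study.
times = [2, 3, 5, 7, 8, 11, 12, 14]
events = [1, 1, 0, 1, 1, 0, 1, 0]
feature = [1.8, 0.4, 1.1, 1.5, 0.2, -0.3, 0.9, -1.0]
beta, z = cox_univariate_z(times, events, feature)
print(f"beta = {beta:.3f}, Z = {z:.2f}")
```

A positive Z score indicates that higher feature values go with shorter survival, and a negative Z score the opposite. Standard survival packages handle ties and convergence more carefully and would be preferable in practice.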

Statistical Analysis

The survival curve for each group was estimated by the Kaplan-Meier method. The differences in overall survival between the high- and low-risk groups were compared using the log-rank test. Multivariate Cox proportional hazards models were used to determine the association between predicted risk groups and overall survival after adjusting for key clinical variables, including age, sex, smoking status, grade, race and stage. All analyses were performed with R[10] version 3.4.1.
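As an illustration of the group comparison, the sketch below splits patients at the median risk score and computes a two-group log-rank test in pure Python. It is not the R code used in the study. Assigning scores strictly above the median to the high-risk group is an assumption (the text does not state how ties at the median are handled), and the function names and toy data are hypothetical.

```python
import math
from statistics import median


def split_by_median(risk_scores):
    """Return True for patients whose risk score is above the cohort median."""
    cutoff = median(risk_scores)
    return [score > cutoff for score in risk_scores]


def logrank_test(times, events, high_risk):
    """Two-group log-rank test; returns (chi-square statistic, p value), 1 df."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # (died at t, in high-risk group) for every patient still at risk at t
        at_risk = [(bool(e) and ti == t, g)
                   for ti, e, g in zip(times, events, high_risk) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, g in at_risk if g)
        d = sum(1 for died, _ in at_risk if died)
        d1 = sum(1 for died, g in at_risk if died and g)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (years; 1 = deceased, 0 = censored), not values from the study.
risk_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
times = [2, 3, 4, 6, 8, 9, 11, 12]
events = [1, 1, 1, 0, 1, 0, 0, 0]
groups = split_by_median(risk_scores)
chi2, p_value = logrank_test(times, events, groups)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

The function assumes that both groups contain patients at risk at some event time; otherwise the variance is zero and the statistic is undefined.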

Results

Extracted Morphological Features are Associated with Patients’ Survival Outcome

In total, 943 morphological features were extracted from H&E stained tumor tissue images. Among these, 15 features were significantly associated with patients’ survival outcome in the CAMS cohort (Table 2). The features with the most significant Z scores were enriched in the categories of “Tissue Granularity”, “Nuclei Texture” and “Nuclei Size Shape”. Some of the features showed elevated levels of measurement in the high-risk group, whereas others showed the opposite pattern.
Table 2

Selected Morphological Features for Predictive Model.

| Feature | Category | Z score |
|---|---|---|
| Granularity_14_MaskedHema | Tissue_Granularity | −2.01 |
| Granularity_7_MaskedEosin | Tissue_Granularity | −2.06 |
| Mean_Nuclei_AreaShape_Zernike_2_2 | Nuclei_Size_Shape | −2.65 |
| Mean_Nuclei_AreaShape_Zernike_4_4 | Nuclei_Size_Shape | −2.44 |
| Mean_Nuclei_Texture_Contrast_Inverted_3_0 | Nuclei_Texture | −2.27 |
| Mean_Nuclei_Texture_Contrast_Inverted_3_135 | Nuclei_Texture | −2.28 |
| Mean_Nuclei_Texture_Contrast_Inverted_3_45 | Nuclei_Texture | −2.44 |
| Mean_Nuclei_Texture_Correlation_Inverted_3_45 | Nuclei_Texture | 2.52 |
| Mean_Nuclei_Texture_InverseDifferenceMoment_Inverted_3_135 | Nuclei_Texture | 2.01 |
| Mean_Nuclei_Texture_InverseDifferenceMoment_Inverted_3_45 | Nuclei_Texture | 2.25 |
| Mean_Nuclei_Texture_Variance_Inverted_3_90 | Nuclei_Texture | −2.07 |
| Mean_Tissue_Texture_InfoMeas2_Inverted_80_135 | Tissue_Texture | −2.6 |
| Mean_Tissue_Texture_InfoMeas2_MaskedEosin_80_135 | Tissue_Texture | −2.03 |
| Mean_Tissue_Texture_InfoMeas2_MaskedEosin_80_45 | Tissue_Texture | −2.11 |
| Texture_InfoMeas2_Inverted_20_90 | Tissue_Texture | −2.03 |

The 15 morphological features used in the predictive model to classify low- and high-risk ADC patients. Z scores from univariate Cox proportional hazards analysis in CAMS ADC patients and morphological categories are reported for each feature.


Predictive Model is Robust in Different Cohorts

A prognostic model was developed from the CAMS patient cohort using the 15 top features as predictors. The model was then validated in both the NLST and SPORE patient cohorts. The model separated the patients in each test cohort into high- and low-risk groups. The patients in the predicted high-risk group showed significantly worse survival than those in the predicted low-risk group in both the NLST dataset (pv = 0.0406) and the SPORE dataset (pv = 0.0288). In the NLST dataset, the 5-year survival rate for the group with low risk scores was 81% (95% confidence interval (CI) = [73–89%]) versus 73% (95% CI = [64–83%]) for the group with high risk scores. In the SPORE dataset, the 5-year survival rate for the group with low risk scores was 73% (95% CI = [60–87%]) versus 58% (95% CI = [44–76%]) for the group with high risk scores (Fig. 1a,b). In multivariate analysis adjusting for clinical variables, including age, sex, smoking status, grade, race and stage (Tables 3 and 4), the association between predicted risk group and overall survival was significant for both the NLST cohort, with HR = 2.20 (predicted high-risk vs. low-risk group) and pv = 0.01, and the SPORE cohort, with HR = 2.15 and pv = 0.044. Furthermore, the model predicted the prognosis of patients with stage I ADC in both the NLST (n = 123, pv = 0.0089) and the SPORE (n = 68, pv = 0.032) cohorts (Fig. 1c,d).
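For readers unfamiliar with how a 5-year survival rate is read off a Kaplan-Meier curve, the following pure-Python sketch computes the Kaplan-Meier estimate at a fixed time horizon. It is illustrative only: the function name and toy data are hypothetical, confidence intervals are omitted, and it does not reproduce the figures reported here.

```python
def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of the probability of surviving beyond `horizon`."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        # Multiply by the conditional probability of surviving past time t.
        survival *= 1 - deaths / at_risk
    return survival


# Toy follow-up data (years; 1 = deceased, 0 = censored), not values from the study.
times = [1.0, 2.5, 3.0, 4.0, 5.5, 6.0, 6.5, 7.0]
events = [1, 0, 1, 1, 0, 1, 0, 0]
s5 = kaplan_meier_survival(times, events, horizon=5.0)
print(f"Estimated 5-year survival: {s5:.3f}")
```

Applying the function separately to the predicted high- and low-risk groups gives the kind of per-group 5-year rates quoted in the paragraph above.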
Figure 1

Kaplan-Meier survival curves for predicted high- and low-risk ADC patients. Using the risk score assigned by the model, the ADC patients were separated into high- and low-risk groups in (a) NLST cohort ADC patients, (b) SPORE cohort ADC patients, (c) NLST cohort Stage I ADC patients, and (d) SPORE cohort Stage I ADC patients. Kaplan-Meier survival curves were created for each risk group. The performance of the predictive model was evaluated by a log-rank test. Black line: predicted low-risk group. Red line: predicted high-risk group.

Table 3

Multivariate analysis for NLST cohort.

| Variable | HR | pv |
|---|---|---|
| Predicted risk group | 2.20 | 0.010 |
| Age | 1.01 | 0.65 |
| Gender (Male vs. Female) | 0.72 | 0.27 |
| Smoke (Yes vs. No) | 1.26 | 0.43 |
| Stage II vs. I | 1.19 | 0.69 |
| Stage III vs. I | 4.04 | <0.001 |
| Stage IV vs. I | 2.97 | 0.095 |
| Grade 2 vs. 1 | 2.15 | 0.17 |
| Grade 3 vs. 1 | 2.91 | 0.053 |
| Grade 4 vs. 1 | <0.01 | 1.00 |

Table 4

Multivariate analysis for SPORE cohort.

| Variable | HR | pv |
|---|---|---|
| Predicted risk group | 2.15 | 0.044 |
| Age | 0.99 | 0.63 |
| Gender (Male vs. Female) | 1.19 | 0.64 |
| Smoke (Yes vs. No) | 1.35 | 0.71 |
| Stage II vs. I | 2.86 | 0.019 |
| Stage III vs. I | 3.36 | 0.005 |
| Stage IV vs. I | 57.4 | 0.004 |
| Race: African American vs. Caucasian | 0.85 | 0.83 |
| Race: Asian vs. Caucasian | <0.01 | 1.00 |
| Race: Hispanic vs. Caucasian | 3.25 | 0.29 |

Discussion

Because of the lack of standard guidelines for pathology images, images from different cohorts may vary substantially in slide thickness, sectioning, staining quality and scanning magnification. Patients in different cohorts may also display different demographic and clinical characteristics. It is therefore essential to test the generalizability of such prognostic models by evaluating their prediction performance across multiple independent test cohorts. In this study, we have successfully validated the H&E stained tumor pathology image-based prognostic model in two independent cohorts, demonstrating the feasibility of integrating such analysis into real medical practice to assist pathologists in cancer diagnosis. Obtaining good-quality and highly representative image data from patients may further improve the prediction accuracy, which underscores the need for standard guidelines for pathology image acquisition and processing in the field.
References

1.  Improved structure, function and compatibility for CellProfiler: modular high-throughput image analysis software.

Authors:  Lee Kamentsky; Thouis R Jones; Adam Fraser; Mark-Anthony Bray; David J Logan; Katherine L Madden; Vebjorn Ljosa; Curtis Rueden; Kevin W Eliceiri; Anne E Carpenter
Journal:  Bioinformatics       Date:  2011-02-23       Impact factor: 6.937

2.  Intratumour heterogeneity in lung cancer.

Authors:  Farhat Yaqub
Journal:  Lancet Oncol       Date:  2014-10-17       Impact factor: 41.316

Review 3.  Refining the treatment of NSCLC according to histological and molecular subtypes.

Authors:  Anish Thomas; Stephen V Liu; Deepa S Subramaniam; Giuseppe Giaccone
Journal:  Nat Rev Clin Oncol       Date:  2015-05-12       Impact factor: 66.675

4.  Comprehensive Computational Pathological Image Analysis Predicts Lung Cancer Prognosis.

Authors:  Xin Luo; Xiao Zang; Lin Yang; Junzhou Huang; Faming Liang; Jaime Rodriguez-Canales; Ignacio I Wistuba; Adi Gazdar; Yang Xie; Guanghua Xiao
Journal:  J Thorac Oncol       Date:  2016-11-04       Impact factor: 15.609

5.  Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling.

Authors:  Yinyin Yuan; Henrik Failmezger; Oscar M Rueda; H Raza Ali; Stefan Gräf; Suet-Feung Chin; Roland F Schwarz; Christina Curtis; Mark J Dunning; Helen Bardwell; Nicola Johnson; Sarah Doyle; Gulisa Turashvili; Elena Provenzano; Sam Aparicio; Carlos Caldas; Florian Markowetz
Journal:  Sci Transl Med       Date:  2012-10-24       Impact factor: 17.956

6.  CellProfiler: image analysis software for identifying and quantifying cell phenotypes.

Authors:  Anne E Carpenter; Thouis R Jones; Michael R Lamprecht; Colin Clarke; In Han Kang; Ola Friman; David A Guertin; Joo Han Chang; Robert A Lindquist; Jason Moffat; Polina Golland; David M Sabatini
Journal:  Genome Biol       Date:  2006-10-31       Impact factor: 13.583

7.  Supportive and rejective functions of tumor stroma on tumor cell growth, survival, and invasivity: the cancer evolution.

Authors:  Jozsef Dudas
Journal:  Front Oncol       Date:  2015-02-20       Impact factor: 6.244

8.  Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features.

Authors:  Kun-Hsing Yu; Ce Zhang; Gerald J Berry; Russ B Altman; Christopher Ré; Daniel L Rubin; Michael Snyder
Journal:  Nat Commun       Date:  2016-08-16       Impact factor: 14.919
