
Computer-extracted features of nuclear morphology in hematoxylin and eosin images distinguish stage II and IV colon tumors.

Neeraj Kumar¹, Ruchika Verma², Chuheng Chen², Cheng Lu², Pingfu Fu³, Joseph Willis⁴,⁵, Anant Madabhushi²,⁶.

Abstract

We assessed the utility of quantitative features of colon cancer nuclei, extracted from digitized hematoxylin and eosin-stained whole slide images (WSIs), to distinguish between stage II and stage IV colon cancers. Our discovery cohort comprised 100 stage II and 100 stage IV colon cancer cases sourced from the University Hospitals Cleveland Medical Center (UHCMC). We performed initial (independent) model validation on 51 (143) stage II and 79 (54) stage IV colon cancer cases from UHCMC (The Cancer Genome Atlas's Colon Adenocarcinoma, TCGA-COAD, cohort). Our approach comprised the following steps: (1) a fully convolutional deep neural network with VGG-18 architecture was trained to locate cancer on WSIs; (2) another deep-learning model based on Mask R-CNN with ResNet-50 architecture was used to segment all nuclei from within the identified cancer region; (3) a total of 26 641 quantitative morphometric features pertaining to nuclear shape, size, and texture were extracted from within and outside tumor nuclei; (4) a random forest classifier was trained to distinguish between stage II and stage IV colon cancers using the five most discriminatory features selected by the Wilcoxon rank-sum test. Our trained classifier using these top five features yielded AUCs of 0.81 and 0.78, respectively, on the held-out cases in the UHCMC and TCGA validation sets. For 197 TCGA-COAD cases, the Cox proportional hazards model yielded a hazard ratio of 2.20 (95% CI 1.24-3.88) with a concordance index of 0.71, using only the top five features for risk stratification of overall survival. The Kaplan-Meier estimate also showed statistically significant separation between the low-risk and high-risk patients, with a log-rank P value of 0.0097. Finally, unsupervised clustering of the top five features revealed that stage IV colon cancers with peritoneal spread were morphologically more similar to stage II colon cancers with no long-term metastases than to stage IV colon cancers with hematogenous spread.

© 2022 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.

Keywords: colon cancer; computational pathology; hematogenous spread; peritoneal spread; quantitative histomorphometric image analysis

Year: 2022 PMID: 35007352 PMCID: PMC9007877 DOI: 10.1002/path.5864

Source DB: PubMed Journal: J Pathol ISSN: 0022-3417 Impact factor: 9.883

Introduction

A critical unmet need in gastrointestinal oncology is to identify colorectal cancer patients at high risk of tumor recurrence after potentially curative surgery [1, 2]. Although AJCC tumor, node, metastasis (TNM) staging remains the bedrock of patient risk stratification [3], it is widely recognized that better systems are needed. This is highlighted by the fact that up to 25% of stage II colon cancer (CC) patients will develop distant metastases within a 10‐year period [4, 5, 6]. Multiple morphological and molecular parameters are predictive of patient outcomes, including poor differentiation, lymphovascular or perineural invasion, and tumor infiltration pattern [7, 8, 9]. Tumor‐infiltrating lymphocyte density and tumor budding, along with other parameters, have also been identified as promising prognostic features in CC [10, 11, 12, 13, 14, 15, 16, 17]. However, problems pertaining to the lack of workable quantitative classification schemes and inter‐pathologist reproducibility make implementation of morphology‐based patient risk stratification difficult to achieve [13, 18, 19, 20, 21, 22]. Numerous studies have identified molecular markers that are associated with CC patient outcomes [23, 24]. However, apart from mismatch repair/microsatellite status and BRAF mutation status in microsatellite‐stable CC, none of these markers have proven robust enough to warrant inclusion into standard clinical care pathways [24]. In the past decade, the availability of digital whole slide imaging (WSI) has paved the way for computerized assessment of tissue pathology through quantitative histomorphometric analysis (QHA) for disease characterization. QHA uses computer‐extracted features to decrypt sub‐visual differences of tumor morphology in digital tissue images. 
Recently, several deep‐learning‐based approaches composed of multiple processing layers to learn feature representations with multiple levels of abstractions [25] have been developed to learn the feature representations for QHA in both supervised [26] and unsupervised [27] approaches from WSI. Such approaches have been widely used for predicting patient outcomes, mutational profiles, and microsatellite instability across tumors of various organs [28, 29, 30, 31, 32, 33]. An alternate, but more interpretable, approach is to use explainable handcrafted features that relate to specific structures in pathology images, e.g. cancer nuclei for predicting disease outcomes [34, 35]. QHA with handcrafted features has demonstrated an ability to reproducibly define patient outcomes in multiple cancer systems [21, 36, 37], to correlate with molecular classifications [36, 38], and tumor–host responses [36]. Recently, it has been shown that nuclear architecture including nuclear shape, size, and texture is useful in cancer diagnosis, grading, prognostication, and prediction of response to therapy in a number of cancer types [35, 39, 40, 41, 42, 43]. Approximately 25% of CC patients will have distant disease at initial diagnosis and about 50% of all CC patients will develop distant metastases – most commonly to the liver. These are assumed to be via venous hematogenous spread [44]. The peritoneal cavity is the third most common site of metastases – after the lung – and is likely to be caused by direct peritoneal extension of the primary cancer in the majority of cases. Apart from definition of involvement of the visceral peritoneum, there are no known morphological or molecular features that separate primary colon cancers with hematogenous metastases from those with peritoneal metastases. 
In this study, we investigated the hypothesis that stage II CCs of standard histologic type with no evidence of long‐term recurrence are morphologically distinguishable through QHA from stage IV CCs which present with hematogenously derived metastases (typically to the liver and lung). We also investigated whether CCs that recurred via intraperitoneal spread had different QHA profiles to those with hematogenous dissemination. QHA results were generated via ‘handcrafted’ computational image analyses to evaluate the role of CC nuclear shape and texture features from a pathological spectrum of N = 527 WSIs of stage II and stage IV CCs, with 200 cases in a discovery set and 327 cases in a validation set.

Materials and methods

Brief overview

The main steps adopted in our approach were as follows. First, WSIs of hematoxylin and eosin (H&E)‐stained surgical pathology slides of formalin‐fixed, paraffin‐embedded (FFPE) CC specimens were obtained. Second, a deep convolutional neural network was trained to separate tumor regions from non‐tumor regions. Third, segmentation of nuclei was performed using another deep‐learning algorithm trained on a publicly available dataset containing 29 000 manually annotated nuclei, spanning several organs, patients, disease states, and tissue source sites [45]. Fourth, we extracted several features pertaining to architecture, shape, texture, and spatial arrangement of tumor nuclei. These were used in conjunction with our machine learning classifier to distinguish between stage II and stage IV CC and independently validated on the TCGA‐COAD cohort cases. Finally, we interrogated the nuclear features of stage IV CC with peritoneal metastases and compared these with both stage II CC and stage IV CC with hematogenous spread through unsupervised clustering (Figure 1).

Figure 1

Patient inclusion criteria and distribution of cases in the training and validation cohorts.

Dataset description

Figure 1 shows a flow chart of the patient inclusion and exclusion criteria for this study. We obtained H&E WSIs for 151 stage II CCs with long‐term disease‐free survival and 179 stage IV CCs with hematogenous spread (mostly the liver and lung) from University Hospitals Cleveland Medical Center (UHCMC). The UHCMC cases were divided into two subsets: a training dataset, Str, of 100 stage II and 100 stage IV CCs, and a UHCMC validation set, Sv, containing 51 stage II and 79 stage IV CCs for performance evaluation of the trained model. Another independent validation set, St, from the TCGA‐COAD cohort with WSIs of 143 stage II and 54 stage IV CCs (FFPEs) was used for external performance evaluation. All stage IV CCs included in the study had evidence of hematogenous metastases. An expert gastrointestinal pathologist (JW) reviewed all the UHCMC and TCGA‐COAD cases, and only CCs with standard (not otherwise specified)‐type adenocarcinoma were included in the study. All the UHCMC cases included in this study were scanned at 40× microscopic magnification on a Ventana iScan HT scanner (Roche, Nutley, NJ, USA). An additional 28 UHCMC stage IV CCs (FFPEs) with peritoneal metastases (21 without documented hematogenous metastases) were added to the validation set (Sv) to assess the efficacy of the trained classifier to distinguish between stage IV CC with peritoneal metastases and stage IV CC with hematogenous metastases. The number of WSIs used in the training, validation, and test sets is also presented in supplementary material, Table S1.

Tumor segmentation

To localize the tumor region within the WSI, a fully convolutional neural network (FCN) with VGG‐18 architecture [46] was employed for tumor segmentation (see supplementary material, Table S2 for hyperparameter settings). Vahadane et al's [47] approach for color normalization was applied to each WSI before feeding it to the tumor segmentation network. For training, 50 images per class (stage II and stage IV CC) were randomly selected from Str and an expert pathologist performed manual tumor annotations. For validation, an expert pathologist marked the tumor regions in 50 randomly selected cases from the UHCMC validation set Sv. Comparison of the ground‐truth (pathologist marked) and algorithm‐computed tumor segmentation masks for these 50 cases yielded an average intersection‐over‐union (IoU) value of 0.81. The tumor segmentation network predicted the probability of each input patch (size 512 × 512 pixels) belonging to the tumor or not, and cross‐entropy loss function for binary classification (tumor versus non‐tumor) was employed during the training phase. An illustration of the tumor segmentation convolutional neural network is shown in the top row of Figure 2. The trained model was then applied to segment tumor regions in both UHCMC (Sv) and TCGA‐COAD (St) cases used in this study.
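The intersection‐over‐union agreement reported above can be illustrated with a minimal sketch. It assumes the pathologist annotation and the network output are available as binary masks of equal shape; the function name and the toy masks are hypothetical and are not taken from the study's code.

```python
import numpy as np

def intersection_over_union(pred_mask, truth_mask):
    """IoU between two binary masks of identical shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    truth = np.asarray(truth_mask, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)

# Toy example: two 8-pixel bands that overlap in one 4-pixel row.
truth = np.zeros((4, 4), dtype=bool)
truth[0:2, :] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, :] = True
iou = intersection_over_union(pred, truth)  # 4 shared pixels / 12 in the union
```

An average IoU such as the 0.81 reported here would then be the mean of this quantity over the annotated cases.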

Figure 2

Illustration of the tumor and nucleus segmentation modules.

Segmentation of nuclei

Following tumor segmentation, another convolutional neural network to segment nuclei within the tumor region was trained. The training and validation data for segmentation of nuclei were obtained from an international competition on multi‐organ nuclei segmentation – MoNuSeg [45]. The advantage of using the MoNuSeg dataset was that it was sourced from several organs and covered a wide range of nucleus morphologies across cancer types and stages. The nucleus segmentation module used in the current project obtained an average aggregated Jaccard index (AJI) of 0.70 on the challenge's validation dataset – which is on par with the winning entry of the challenge [45]. Although details of the nucleus segmentation module appear in the Supplementary materials and methods, an illustration of our Mask R‐CNN [48]‐based nucleus segmentation is shown in the bottom row of Figure 2.

Feature extraction

A total of 26 641 nuclear features were extracted from the segmented tumor nuclei per 1000 × 1000 patch, where each patch contained around 30 (±10) nuclei on average. The quantified nuclear features extracted were size, shape, texture, orientation, architecture, and spatial organization. Shape features included invariant moment, Fourier descriptor, and length/width ratios. Nuclear texture was captured using the Haralick texture features. We also computed cell cluster graphs (CCGs) to extract the local neighborhood‐based basic shape features as previously described [43]. Disorder in the orientation of tumor nuclei was captured using cell graph tensors (CGTs) defined over the local CCG [49]. A more detailed description of the extracted nuclear features is provided in Supplementary materials and methods.
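To make the simplest of these descriptors concrete, the sketch below computes the area, major‐axis length, and orientation of a single segmented nucleus from its binary mask, and a histogram‐based entropy of orientations across nuclei. This is an illustration under stated assumptions, not the feature code used in the study: the major axis is taken from the second moments of the pixel coordinates, the entropy uses 18 equal‐width bins, and the graph‐based CCG and CGT features are not reproduced.

```python
import numpy as np

def nucleus_shape(mask):
    """Area, major-axis length, and orientation of one binary nucleus mask."""
    rows, cols = np.nonzero(mask)
    area = rows.size
    # Covariance of pixel coordinates (x = column, y = row).
    cov = np.cov(np.vstack([cols, rows]), bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Length of the major axis of the ellipse with the same second moments.
    major_axis_length = 4.0 * np.sqrt(max(eigvals[-1], 0.0))
    vx, vy = eigvecs[:, -1]
    orientation = np.arctan2(vy, vx) % np.pi  # axis direction in [0, pi)
    return area, major_axis_length, orientation

def orientation_entropy(orientations, n_bins=18):
    """Shannon entropy (bits) of orientations binned over [0, pi]."""
    hist, _ = np.histogram(orientations, bins=n_bins, range=(0.0, np.pi))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

# Toy nucleus: a 3 x 9 pixel horizontal rectangle.
mask = np.zeros((7, 13), dtype=bool)
mask[2:5, 2:11] = True
area, major_axis, theta = nucleus_shape(mask)

# Identical orientations give zero entropy; one nucleus per bin gives the maximum.
aligned = orientation_entropy(np.full(10, 0.3))
spread = orientation_entropy((np.arange(18) + 0.5) * np.pi / 18)
```

Higher entropy corresponds to more disordered nuclear orientation, which is the sense in which the orientation feature is interpreted in the Results.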

Feature selection and classifier construction

The most relevant nuclear features for discriminating between stage II and stage IV CC were identified using the Wilcoxon rank‐sum test (WRST). We limited the number of selected features to five to mitigate the curse of dimensionality and reduce the risk of overfitting in the subsequent classifier. The top five WRST‐identified features were then used to construct a random forest (RF) classifier to distinguish between stage II and stage IV CC. The RF classifier was trained on the UHCMC training data (see Dataset description) by keeping 80% of the data for training and the remaining 20% for initial model evaluation. The classifier was trained on a per‐patch basis; the patient‐level classification decision (stage II versus stage IV) was obtained by tallying the number of patches identified as stage II or stage IV CC and assigning the patient to the stage with the majority of patches. This voting method was applied to the patches of each patient in the training set, for whom the tumor stage was known. The per‐patient patch voting accuracy was defined as the percentage of patients whose tumor stage was classified correctly using this method.
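The selection and voting steps can be sketched as follows. This is a simplified illustration rather than the study's implementation: the rank‐sum statistic uses the normal approximation without tie correction, features are ranked by the absolute value of that statistic, the random forest itself is omitted, and all function names, the 0/1 label coding, and the synthetic data are hypothetical.

```python
import numpy as np

def rank_sum_z(x, y):
    """Normal-approximation Wilcoxon rank-sum statistic (no tie correction)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.size, y.size
    pooled = np.concatenate([x, y])
    ranks = np.empty(pooled.size)
    ranks[np.argsort(pooled)] = np.arange(1, pooled.size + 1)
    w = ranks[:n1].sum()  # rank sum of the first group
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - expected) / sd

def top_k_features(features, labels, k=5):
    """Indices of the k columns with the largest |z| between classes 0 and 1."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    z = np.array([rank_sum_z(col[labels == 0], col[labels == 1])
                  for col in features.T])
    return np.argsort(-np.abs(z))[:k]

def patient_vote(patch_predictions):
    """Majority vote over per-patch labels (ties go to the lower label here)."""
    values, counts = np.unique(patch_predictions, return_counts=True)
    return values[np.argmax(counts)]

# Synthetic data: 100 samples, 20 features, two of which differ between classes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)            # 0 = stage II, 1 = stage IV (assumed coding)
features = rng.normal(size=(100, 20))
features[labels == 1, 3] += 3.0
features[labels == 1, 7] += 2.0
selected = top_k_features(features, labels, k=2)

# Five patches from one patient, three of them predicted as stage IV.
patient_label = patient_vote(np.array([1, 1, 0, 1, 0]))
```

In practice the selected columns would be passed to a classifier trained per patch, and `patient_vote` would be applied to that classifier's patch‐level predictions for each patient.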

Statistical analyses

We evaluated the classification accuracy of the dichotomous machine learning classifier, for stage II versus stage IV classification, in terms of the area under the receiver operating characteristic curve (ROC‐AUC). We also assessed the distribution of the top five most relevant and discriminative tumor nuclear features using violin plots. A supplementary survival analysis was performed on overall survival data available for TCGA‐COAD using Kaplan–Meier plots and a Cox proportional hazards model. No survival analysis was carried out for UHCMC cases, as survival information was not available for those patients. We also employed uniform manifold approximation and projection (UMAP) embeddings and violin plots to illustrate the clustering of stage II, peritoneal stage IV, and hematogenous stage IV CC in the nuclear feature space to highlight the separation between these cases according to the top discriminatory features obtained from the supervised classification analysis. We also assessed the groupings of stage II and stage IV CC with both peritoneal and hematogenous metastases using an unsupervised hierarchical clustering‐based heatmap. To assess the effectiveness of the trained models, the model with the highest performance in terms of AUC was tested on an external validation set (TCGA‐COAD cohort). Models were trained over the entire primary cohort (UHCMC training set) before being applied, without any retraining, to the external validation set.
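As a reminder of what the AUC measures, the sketch below computes it directly as the fraction of (stage IV, stage II) pairs in which the stage IV case receives the higher classifier score, counting ties as one half. The scores and labels are invented for illustration, and the pairwise formulation is chosen for clarity rather than efficiency.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC-AUC as the probability that a positive outranks a negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]          # all positive-negative pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Hypothetical patient-level scores (higher = more stage IV-like).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]                     # 1 = stage IV, 0 = stage II
auc = roc_auc(scores, labels)                   # 8 of 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance‐level ranking and 1.0 to perfect separation of the two stages.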

Results

Patient characteristics

The data in the UHCMC cohort comprised 53% women and 47% men, with an average age of 69 years. Around 72% of patients were Caucasian, while the rest were African Americans. The TCGA‐COAD cohort consisted of 48% women and 52% men, with an average age of 66 years, who were predominantly Caucasian. The tumors of all but one patient included in this study were microsatellite‐stable.

Experiment 1: evaluating the ability of nuclear histomorphometric features to distinguish stage II from stage IV colon cancers

Following feature extraction, the Wilcoxon rank‐sum test (WRST) was employed to select the top five discriminatory features in the UHCMC training set (Str). The selected feature set comprised (1) nuclear area, (2) nuclear perimeter, (3) major‐axis length of nuclei, (4) variance of nuclear contrast, and (5) entropy of nuclear orientation as the most statistically significant discriminatory features. An RF classifier with these features yielded AUCs of 0.81 and 0.78 on the UHCMC validation set (Sv) and the TCGA‐COAD independent validation set (St), respectively. The AUC‐ROC plots of the combined model with the top five discriminatory features are shown in supplementary material, Figure S1. Figure 3 shows an illustrative example of the quantitative nuclear features extracted to distinguish between stage II and stage IV CC. Features pertaining to nuclear shape and orientation were identified as the most important ones for distinguishing between stage II and stage IV CC. From Figure 3, we can deduce that the nuclei of stage II CC were in general smaller and had less variation in the directionality of their principal axis compared with the nuclei of stage IV CC with hematogenous spread. Further, the nuclei of stage IV CC with peritoneal metastases had an intermediate nuclear size and less variation in their orientation than the stage IV CC with hematogenous metastases.

Figure 3

Illustration of the features extracted from the segmented tumor nuclei for stage II CC and stage IV CC with both peritoneal and hematogenous metastases. (A) Patches of size 1000 × 1000 pixels were extracted from the tumor region of the input WSI. (B) An input patch to the nuclei segmentation module. (C) Output of the nuclei segmentation module, where each nucleus is shown using different colors to show the separation of touching and overlapping nuclei. (D) Nuclear shape features quantifying attributes such as circumference, area, length of major axis, etc. and (E) nuclear orientation features quantifying the direction (in red arrows) of the major axis of each of the segmented nuclei are shown as an illustration.

Additional comparative strategies involving feature selection and machine classifiers are provided in supplementary material, Table S3. Additional comparisons were conducted using other nuclear features including cell run length and graph features, and the corresponding results are provided in supplementary material, Table S4 and Figure S2.

Experiment 2: assessing the survival characteristics of the histomorphometrically determined staging groups

Survival analysis was conducted on the independent TCGA validation set to evaluate the efficacy of the model‐generated classification labels for individual tumors to see if there were differences in the survival probabilities across the cases labeled as stage II or IV by the model. The Kaplan–Meier (KM) survival curves of overall survival for the two categories are shown in Figure 4A. The Cox proportional hazards model yielded a hazard ratio of 2.196 (95% CI 1.24–3.88) with a concordance index of 0.71 when only the top five features were used to compute the hazards for high‐risk tumors, treating low‐risk tumors as the baseline. Figure 4B shows the KM curves when unsupervised hierarchical clustering was used to categorize patients into two groups based on the top five quantitative nuclear features. The Cox proportional hazards model yielded a hazard ratio of 1.951 (95% CI 1.18–3.23) with a concordance index of 0.68 when the top five features were used to compute the hazards for the two risk groups identified by the hierarchical clustering. Finally, the KM curves generated from the true stage labels available in the TCGA data are shown in Figure 4C. From Figure 4, it is evident that the top five quantitative nuclear features were able to group the CCs accurately into two categories in both a supervised (low risk versus high risk) and an unsupervised learning framework (class 1 versus class 2).
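For readers less familiar with the survival curves referred to here, the following is a minimal sketch of the Kaplan–Meier product‐limit estimator for one group. The follow‐up times and event indicators are invented, the function name is hypothetical, and neither the log‐rank test nor the Cox model is reproduced.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events).astype(bool)
    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = (times >= t).sum()              # still under observation at t
        deaths = ((times == t) & events).sum()    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Six hypothetical patients; two are censored (event = 0).
times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 1]
event_times, surv = kaplan_meier(times, events)
# Survival steps: 5/6, then 5/6 * 4/5, then * 2/3, then * 0
```

Computing this estimate separately for the model‐assigned low‐risk and high‐risk groups gives the kind of paired curves that are compared in Figure 4.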

Figure 4

K–M curves for overall survival on the independent validation set of the TCGA‐COAD cohort. (A) The Cox proportional hazards model generated low‐risk and high‐risk categories using the top five quantitative features. (B) Classes generated using unsupervised hierarchical clustering based on the top five quantitative nuclear features. (C) Original stage labels available in the TCGA‐COAD cohort's clinical data.

Experiment 3: assessing the similarity of stage IV peritoneal versus hematogenous tumors versus stage II CC in terms of nuclear histomorphometric features

To examine the quantitative resemblance of stage IV CC with peritoneal spread to the stage II CC, we created a two‐dimensional embedding of the top five quantitative features that we extracted from each of the stage II CCs (n = 51) and stage IV CCs with either peritoneal (n = 28) or hematogenous (n = 79) metastases using the UMAP algorithm. This embedding is shown as a scatter plot in Figure 5A. From Figure 5A, it is clear that the stage IV CCs with peritoneal metastases adhered more closely to the stage II CCs, which did not progress, than to the stage IV CCs with hematogenous spread. It should be noted, however, that stage IV CC with peritoneal spread was used neither for training nor for testing of the classifiers, and that only quantitative nuclear features were extracted from stage IV CC with peritoneal spread.

Figure 5

Comparison of hematogenous and peritoneal metastases of stage IV CC with stage II CC. (A) UMAP illustration of hematogenous versus peritoneal metastases of stage IV CC. (B) Unsupervised clustering‐based heatmap of the top five features that generated maximum cluster separation. True class labels are shown on the left vertical bar beside the heatmap – stage II CCs are shown in green, while blue and red represent stage IV CC with peritoneal and hematogenous metastases, respectively.

Another way to examine the closeness of stage II CC and stage IV CC with peritoneal metastases is through unsupervised hierarchical clustering of the quantitative nuclear features. We obtained quantitative nuclear features from all validation cases and performed hierarchical clustering in the nuclear feature space. The hierarchical clustering dendrogram was cut at three clusters, and the corresponding heatmap is shown in Figure 5B. Due to the large feature space (>25 000 features), the heatmap is shown only for the top five features that generated the maximum cluster separation. These features were nuclear area, nuclear perimeter, major‐axis length of nuclei, variance of nuclear contrast, and entropy of nuclear orientation – which are similar to those obtained from our previously explained supervised analysis. This indicates that these features are the most relevant features to distinguish between stage II and stage IV CC. It is also evident from Figure 5B that stage IV CCs with peritoneal metastases cluster closer to stage II CC, while stage IV CCs with hematogenous metastases form a separate cluster.

Figure 6

Violin plots of the top discriminatory features between stage II and stage IV CC with both peritoneal and hematogenous metastases. The best‐performing feature from each of the top feature families is shown in this illustration – average nuclear area from nuclear shape features, variance of nuclear orientation from cell‐graph tensor (CGT) features, and variance in local nuclear contrast from local cell‐cluster co‐occurrence nuclear morphology matrix (cCCM) based features.

To further examine the differences between the top discriminatory features between stage II and stage IV CC, we analyzed the distributions of normalized feature values (0–1) using violin plots, as shown in Figure 6. It should be noted that the distributions of the most statistically significant features, between stage II and stage IV CC, within each of the top feature families are shown in Figure 6. From Figure 6, it is evident that the nuclei in stage IV CC were larger on average than their stage II counterparts (0.61 versus 0.26, p < 0.01). Additionally, the distribution of nuclear area for stage IV CC was more concentrated around the mean, with a heavy tail, compared with the nuclear area distribution of stage II CC, which was more spread out. Similar trends were observed for the other two discriminatory nuclear features. These trends indicate that the nuclei in stage IV CC were uniformly larger, while the nuclei in stage II tumors were smaller and had a higher degree of variance in their sizes. Furthermore, the entropy of the distribution of the nuclear orientation was higher for stage IV nuclei than for stage II nuclei (0.59 versus 0.40, p < 0.01), indicating that the nuclei in stage IV tumors have a higher degree of orientation disorder, while stage II nuclei were more uniformly oriented. Finally, stage IV nuclei show higher variance in local nuclear contrast compared with stage II nuclei (0.30 versus 0.60, p < 0.01), according to the features obtained from the local cell‐cluster co‐occurrence nuclear morphology matrix (cCCM; see Supplementary materials and methods for details) shown in Figure 6. In summary, nuclear area, perimeter, major‐axis length, nuclear contrast, and entropy of nuclear orientation were the most discriminatory features for distinguishing between stage II and stage IV colon tumors.

Discussion

A number of studies have sought to define clinically useful prognostic markers in CC, although apart from features related to stage and grade, along with determination of microsatellite stability and BRAF status, no other markers have been universally incorporated into clinical practice [24]. The need for improved predictive markers in CC is obvious. For example, although most patients with stage II CC are cured by surgery alone, approximately 25% recur – the majority without having received post‐operative adjuvant therapy [1]. Also, 30% of all CCs present as stage III, which has an approximately 40–50% recurrence rate [50]. Even ‘low‐risk’ stage III CC (defined as having 1–3 positive lymph nodes) has a recurrence rate of at least 20%. The majority of CC recurrences are lethal [51]. Thus, a reliable CC prognostication scheme would define patients for whom intense monitoring and potential changes for management could be considered. Quantitative histomorphometric analysis (QHA) methods may play a role in better defining these patients, facilitating enhanced clinical decision support for their treating physicians. In this study, we developed and validated a quantitative, histomorphometric‐based image risk classifier to accurately segregate stage II with at least 5‐year recurrence‐free survival from stage IV CC from digital WSIs of H&E tumor sections. We identified nuclear size features including area, perimeter, and major‐axis length along with nuclear orientation, and local variance in nuclear contrast as the top five discriminatory features that successfully distinguished between stage II and stage IV CC in a single institution validation cohort. These top five tumor nucleus features were independently validated on the TCGA‐COAD cohort cases for stage II versus stage IV classification and were also associated with patients’ overall survival outcomes for respective classes. 
Finally, in a unique study of the subtypes of stage IV CC, CC with dissemination into the peritoneal cavity via direct extension had nuclear features in between those of stage II CC with no evidence of recurrence and CC with hematogenous spread. Nuclear changes are integral to cancer biology, being one of the earliest recognized features of cancer [52]. Oncogenesis, in the majority of cancer types, is associated with increasing abnormality of nuclear size and shape accompanied by chromatin and nuclear envelope irregularities. These changes are directly related to molecular events of cancer progression and are often specific for individual cancer types [53]. Routine pathological assessments of cancer nuclei are fundamental to cancer classification schemes and are incorporated into prognosis assessments in many cancers. Use of artificial intelligence for computer aided image analysis and classification has revealed important nuclear features that correlate with molecular profiles and clinical outcomes in multiple cancer types [28, 54, 55, 56, 57, 58]. Our findings support these general concepts and specifically identify quantifiable changes in nuclear features of size, entropy of orientation, and local cellular diversity, which are highly correlative with patient outcomes and are likely to be additive to currently accepted prognostic and potentially predictive markers currently used in managing CC patients. The study appears to support the concept that cumulative molecular abnormalities, which are linked to increasing nuclear disorganization, play a pivotal role in CC patient outcomes. Previous studies have failed to identify driver gene mutation accumulation to be associated with CC metastasis [59]. However, it is now recognized that a more complex interaction of oncogenic pathways, such as activated stem‐cell programs, is associated with the likelihood of metastases – though these mechanisms need to be further elucidated [60, 61]. 
The CC nuclear changes, which are more prominent in stage IV than in stage II CC, are reflective of these molecular processes, although not reproducibly identifiable by a pathologist. Peritoneal cavity seeding, as opposed to hematogenous metastases, requires different molecular events – such as tumor microenvironment interactions – though this has not been well studied. CC with direct local extension to the serosa is a known high‐risk feature of peritoneal cavity recurrence. Our data, using QHA, demonstrating that CCs with peritoneal metastases have an intermediate nuclear phenotype between CCs that do not metastasize and those with hematogenous spread is consistent with this concept. While a number of studies have recently been reported on the use of machine learning for prognosticating survival for colon cancer from WSIs [58], our study was different from these studies in several ways. Specifically, unlike previous black‐box deep‐learning models, our approach used interpretable handcrafted features for quantitative histomorphometric analysis to develop explainable models for downstream digital pathology analysis. As demonstrated through our extensive experiments, the ‘handcrafted’ feature‐based approach showed satisfactory performance on both the initial validation set (UHCMC, Sv) and the independent TCGA‐COAD validation set (St). These results indicate that our approach is not biased to a particular dataset and could be used on external datasets without retraining, which is well‐suited to clinical adoption with further validation. We do acknowledge that our study does have some limitations, the foremost of which being that our analysis was retrospective in nature and was performed on a limited number of patients. Furthermore, additional clinical variables (such as patient age, race, tumor grade) were not combined with image‐derived quantitative histomorphometric features for a comprehensive multivariate analysis. 
Future efforts will be made to investigate our model on a large patient cohort with multi‐institutional validation. Furthermore, we plan to address some of the critical questions around predictive analytics such as identifying the need for adjuvant chemotherapy for stage II CC with poor prognosis through quantitative histomorphometric analysis. Despite the aforementioned limitations, our study demonstrated that quantitative features pertaining to nuclear area, perimeter, major‐axis length, orientation diversity, and local texture variance are useful in distinguishing between stage II CC with no long‐term metastases and stage IV CC with hematogenous spread. We validated these findings on an independent validation set obtained from the publicly available TCGA‐COAD cohort and also found an association of the identified nuclear features with patients’ survival outcomes. Finally, we also found that stage IV CCs with peritoneal carcinomatosis resemble the stage II CCs with no long‐term metastases more closely than their stage IV CC counterparts with hematogenous metastases in the selected feature space. Further studies to validate these findings on independent multi‐institutional datasets and also for potentially prospective validation are warranted. In conclusion, this study enabled the identification of quantifiable changes in nuclear features of size, entropy of orientation, and local cellular diversity of CC WSI using QHA which are highly correlative with patient outcomes and are likely to be additive to currently accepted prognostic and potentially predictive markers used currently in managing CC patients. Furthermore, this study demonstrates the utility of artificial intelligence‐enhanced ‘handcrafted’ nuclear segmentation image analysis to accurately differentiate between CCs which do and do not metastasize. 
As opposed to ‘deep‐learning’ platforms, the ‘handcrafted’ approach allows for the translation of image analysis results into oncology and diagnostic pathology practice by defining the abnormalities being measured and allowing for seamless integration of other image analysis pipelines.
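
As an illustration of what a defined, interpretable measurement can look like, the following minimal sketch computes the Shannon entropy of nuclear major‐axis orientations within a patch. It is a simplified stand‐in, not the method used in this study, where orientation features were obtained from cell‐graph tensors (Figure S7). The function name `orientation_entropy`, the eight‐bin histogram, and the example angles are all assumptions made for illustration.

```python
import math
from collections import Counter


def orientation_entropy(angles_deg, n_bins=8):
    """Shannon entropy (in bits) of nuclear major-axis orientations.

    Hypothetical helper for illustration only. Orientations are axial
    (a nucleus at 10 degrees is the same as one at 190 degrees), so
    angles are folded into [0, 180) before being binned.
    """
    if not angles_deg:
        raise ValueError("need at least one orientation")
    bin_width = 180.0 / n_bins
    # Histogram of folded angles; min() guards against float edge cases.
    counts = Counter(
        min(int((a % 180.0) / bin_width), n_bins - 1) for a in angles_deg
    )
    total = len(angles_deg)
    # H = -sum(p * log2(p)) over the occupied bins.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Assumed example data: nuclei that are nearly parallel versus
# nuclei whose orientations are spread across all directions.
aligned = [10.0, 12.0, 11.0, 9.0, 13.0, 10.5, 11.5, 12.5]
disordered = [5.0, 30.0, 50.0, 75.0, 95.0, 120.0, 140.0, 165.0]

low = orientation_entropy(aligned)
high = orientation_entropy(disordered)
```

In this toy example the aligned nuclei fall into a single bin and give the minimum entropy, while the disordered nuclei occupy every bin once and give the maximum for eight bins. A pathologist can relate such a number directly to how uniformly nuclei are oriented, which is the kind of traceability the handcrafted approach offers.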

Author contributions statement

NK, JW and AM were responsible for conceptualization. JW and NK curated the data. NK, RV, CC, CL and AM developed the methodology. NK and RV carried out formal analysis and validation. NK, RV, CC, CL, PF, JW and AM wrote, reviewed, and/or revised the manuscript. AM and JW administered the project, supervised the study, and acquired funding. All the authors read, and agreed with, the final version of the manuscript.

Supplementary materials and methods

Figure S1. AUC‐ROC curves for UHCMC and TCGA validation sets for the top five most significant discriminatory features for classification between stage II and stage IV CC with hematogenous metastases using a random forest classifier
Figure S2. AUC‐ROC curves for CMC and TCGA validation sets for each feature using a Wilcoxon rank‐sum test for feature selection and random forest as the classification algorithm
Figure S3. (A) Example patch with boundaries of nuclei highlighted in green. (B) Table summarizing the first‐order statistics calculated from the nuclei within the patch (referred to in Supplementary materials and methods)
Figure S4. An example of a Delaunay Triangulation graph (referred to in Supplementary materials and methods)
Figure S5. Example of nuclear Haralick texture features (referred to in Supplementary materials and methods)
Figure S6. Relationship between alpha and CCG (referred to in Supplementary materials and methods)
Figure S7. Nuclei orientation features obtained from cell‐graph tensors (referred to in Supplementary materials and methods)
Figure S8. Illustration of the basic concept of cell run‐length graph with real image examples (referred to in Supplementary materials and methods)
Figure S9. Flowchart for cellular diversity computation (referred to in Supplementary materials and methods)
Table S1. Number of whole slide images used in the training, validation, and test sets for experiments reported in this article
Table S2. Hyperparameter settings of our tumor and nuclei segmentation convolutional neural networks (CNNs)
Table S3. Classification performance of all features as a function of the statistical test of significance (for feature selection) and machine learning model (for classification)
Table S4. Classification performance (in terms of AUC‐ROC) of each feature family using a Wilcoxon rank‐sum test for feature selection and random forest as the classification algorithm
Table S5. Thirteen Haralick measurements of the co‐occurrence matrix (CM) (referred to in Supplementary materials and methods)
