Subrata Bhattacharjee, Kobiljon Ikromjanov, Kouayep Sonia Carole, Nuwan Madusanka, Nam-Hoon Cho, Yeong-Byn Hwang, Rashadul Islam Sumon, Hee-Cheol Kim, Heung-Kook Choi.
Abstract
Biomarker identification is important for differentiating the grade groups in histopathological sections of prostate cancer (PCa), and assessing clusters of cell nuclei is essential for pathological investigation. In this study, we present a computer-based method for cluster analysis of cell nuclei and apply traditional (i.e., unsupervised) and modern (i.e., supervised) artificial intelligence (AI) techniques to distinguish the grade groups of PCa. Two PCa datasets were collected to carry out this research. Histopathology samples were obtained from whole slides stained with hematoxylin and eosin (H&E). State-of-the-art approaches were used for color normalization, cell nuclei segmentation, feature selection, and classification. A traditional minimum spanning tree (MST) algorithm was employed to identify the clusters and better capture the proliferation and community structure of cell nuclei. K-medoids clustering and stacked ensemble machine learning (ML) were used for the traditional and modern AI-based classification, respectively. Binary and multiclass classifications were performed to compare model quality and results across the grades of PCa. Furthermore, the traditional and modern AI techniques were compared using different performance metrics (i.e., statistical parameters). Cluster features of the cell nuclei can be useful information for cancer grading. However, further validation of the cluster analysis is required to achieve robust classification results.
Keywords: artificial intelligence; classification; cluster analysis; histopathology; prostate cancer; segmentation
Year: 2021 PMID: 35054182 PMCID: PMC8774423 DOI: 10.3390/diagnostics12010015
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Summary of some existing papers that performed PCa analysis using histopathology images.
| Author | Techniques | Classification Types | Description and Performance |
|---|---|---|---|
| Uthappa et al., 2019 | CNN-based | Multiclass | Developed a hybrid unified deep learning network to grade PCa and achieved an accuracy of 98.0% |
| Khouzani et al., 2003 | Handcrafted-based | Multiclass | Calculated energy and entropy features of the multiwavelet coefficients of each image and used an ML classifier to assign each image to the appropriate grade. They achieved an accuracy of 97.0% |
| Kwak et al., 2017 | CNN-based texture and nuclear architectural analysis | Binary class | The authors presented a CNN approach to identify PCa. In addition, they extracted handcrafted nuclear architecture features and performed ML classification. The performance of their CNNs (0.95 AUC) was significantly better than that of the other ML algorithms |
| Linkon et al. | Review of techniques for PCa detection and histopathology image analysis | N/A | The authors discussed recent advances in CAD systems that use DL for automatic detection and recognition. In addition, they discussed the current state of the field and existing techniques for PCa detection, and described research findings, current limitations, and future scope for research |
| Wang et al., 2020 | Morphological, texture, and contrastive predictive coding feature analysis | Binary class | The authors proposed a weakly supervised approach for grade classification in tissue micro-arrays using a graph CNN. Their proposed model achieved an accuracy of 88.6% and an AUC of 0.96 |
| Bhattacharjee et al., 2019 | Morphological | Binary class | The authors used histopathology images to perform morphological analysis of the cell nucleus and lumen and carried out multiclass and binary classification. The best accuracy of 92.5% was achieved for binary classification (grade 4 vs. grade 5) using a support vector machine classifier |
| Bhattacharjee et al., 2020 | Handcrafted and non-handcrafted feature analysis using AI techniques | Binary class | The authors introduced two lightweight CNN models for histopathology image classification and performed a comparative analysis with other state-of-the-art models. An accuracy of 94.0% was achieved using the proposed DL model |
| Nir et al., 2018 | Glandular-, nuclear-, and image-based feature analysis | Binary class | Proposed novel features based on intra- and inter-nuclei properties for classification using ML and DL algorithms and achieved the best accuracy of 91.6% for benign vs. all grades using linear discriminant analysis |
| Ali et al., 2013 | Morphological and architectural feature analysis from cell cluster graph | Binary class | The authors defined each cell cluster as a node and constructed a novel graph called the Cell Cluster Graph (CCG). In addition, they extracted global and local features from the CCG that best capture the morphology of the tumor. A randomized three-fold cross-validation with a support vector machine classifier achieved an accuracy of 83.1% |
| Kim et al., 2021 | Texture analysis using DL and ML techniques | Binary class | The authors used DL (long short-term memory network) and ML (logistic regression, bagging tree, boosting tree, and support vector machine) techniques to classify dual-channel tissue features extracted from hematoxylin and eosin tissue images |
Figure 1. Histologic findings for each grade of prostate cancer. (a–c) Dataset 1: grade 3, grade 4, and grade 5, respectively. (d–f) Dataset 2: grade 3, grade 4, and grade 5, respectively.
Figure A1. Prostate adenocarcinoma with Gleason scores 4 and 3 annotated in red and blue, respectively.
Figure A2. Prostate adenocarcinoma with Gleason score 4 annotated in red.
Figure A3. Prostate adenocarcinoma with Gleason scores 5 and 4 annotated in orange and red, respectively.
Figure 2. Analytical pipeline for the cluster analysis and AI classification of cancer grades observed in histological sections.
Figure 3. Stain normalization. (a) Raw image. (b) Reference image. (c) Normalized image.
Figure 4. Stain deconvolution. (a) Normalized image. (b) Hematoxylin channel. (c) Eosin channel.
Figure 5. The complete process for nuclear segmentation of cancer cells. (a) Hematoxylin channel extracted after stain deconvolution. (b) HSI color space image converted from (a). (c) Saturation channel extracted from (b). (d) Contrast-adjusted image obtained from (c). (e) Binary image after applying global thresholding to (d). (f) Nuclei segmentation after applying the watershed algorithm to (e). Some small objects and artifacts were removed before and after applying the watershed algorithm.
Figure 6. Examples of MST cluster analysis. (a) An MST based on the minimum distances between vertex coordinates. The red dashed lines indicate the inconsistent edges to be removed. (b) The intra-cluster MST obtained after removing the nine longest edges from (a); the red circles indicate inter- and intra-cluster similarity. (c) The inter-cluster MST obtained from (b).
Figure 7. Flow chart of MST construction.
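
The MST clustering idea in Figures 6 and 7 can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' implementation: it builds the MST with Kruskal's algorithm over nucleus centroids, removes a chosen number of the longest edges, and reads the clusters off as connected components. The function names, the `n_cut` parameter, and the sample coordinates are hypothetical.

```python
import math
from itertools import combinations


def minimum_spanning_tree(points):
    """Kruskal's algorithm on the complete Euclidean graph over `points`.

    Returns the MST edges as (weight, i, j) tuples in ascending weight order.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree


def clusters_from_mst(points, n_cut):
    """Drop the n_cut longest MST edges and return clusters as sorted index lists."""
    tree = minimum_spanning_tree(points)
    kept = tree[: max(len(tree) - n_cut, 0)]
    adjacency = {i: [] for i in range(len(points))}
    for _, i, j in kept:
        adjacency[i].append(j)
        adjacency[j].append(i)

    seen, clusters = set(), []
    for start in range(len(points)):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical nucleus centroids: two tight groups that are far apart.
nuclei = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
clusters = clusters_from_mst(nuclei, n_cut=1)
print(clusters)
```

Cutting a fixed number of longest edges mirrors the caption of Figure 6 (nine edges removed in that example); a length threshold would be an alternative criterion, and the source does not say which rule was used in general.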
Feature selection based on majority voting. The most significant features were selected based on majority “True”. True: Selected, False: Not selected, χ2: Chi-Square Test, FS: Fisher Score, IG: Information Gain, RFE: Recursive Feature Elimination, and PI: Permutation Importance.
| Features | χ2 | FS | IG | ANOVA | RFE | PI | Boruta | Votes | Select/Reject |
|---|---|---|---|---|---|---|---|---|---|
| total intra-cluster total MST distance | True | True | True | True | True | True | True | 7 | Select |
| total intra-cluster nucleus to nucleus maximum distance | True | True | True | True | True | True | True | 7 | Select |
| inter-cluster centroid to centroid total distance | True | False | True | True | True | True | True | 6 | Select |
| inter-cluster total MST distance | True | True | True | True | True | False | True | 6 | Select |
| number of clusters | True | True | True | True | True | False | True | 6 | Select |
| total intra-cluster maximum MST distance | True | True | True | True | True | False | True | 6 | Select |
| average intra-cluster nucleus to nucleus minimum distance | False | True | True | True | True | False | True | 5 | Select |
| average intra-cluster nucleus to nucleus maximum distance | False | True | True | True | True | False | True | 5 | Select |
| average intra-cluster maximum MST distance | False | True | True | True | True | False | True | 5 | Select |
| average cluster area | True | True | False | False | True | True | True | 5 | Select |
| total intra-cluster nucleus to nucleus total distance | True | False | False | True | True | True | True | 5 | Select |
| total intra-cluster minimum MST distance | True | True | True | True | False | False | True | 5 | Select |
| total intra-cluster nucleus to nucleus minimum distance | True | True | True | True | False | False | True | 5 | Select |
| inter-cluster maximum MST distance | True | True | False | False | True | False | True | 4 | Select |
| average intra-cluster total MST distance | False | True | True | False | True | False | True | 4 | Select |
| average intra-cluster minimum MST distance | False | True | True | True | False | False | True | 4 | Select |
| total cluster area | True | False | False | False | False | True | True | 3 | Reject |
| inter-cluster average MST distance | False | False | True | True | False | False | True | 3 | Reject |
| average intra-cluster nucleus to nucleus average distance | False | False | True | True | False | False | True | 3 | Reject |
| inter-cluster centroid to centroid average distance | False | True | False | False | True | False | False | 2 | Reject |
| minimum area of cluster | True | False | False | False | True | False | False | 2 | Reject |
| average intra-cluster nucleus to nucleus total distance | True | False | False | False | False | False | True | 2 | Reject |
| inter-cluster centroid to centroid minimum distance | False | False | False | False | False | False | True | 1 | Reject |
| inter-cluster centroid to centroid maximum distance | False | False | False | False | False | False | True | 1 | Reject |
| maximum area of cluster | True | False | False | False | False | False | False | 1 | Reject |
| inter-cluster minimum MST distance | False | False | False | False | False | False | True | 1 | Reject |
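
The voting rule behind the table can be expressed as a short sketch. It assumes that "majority" means at least four of the seven methods, which is consistent with every Select/Reject entry above; the function name and method labels are hypothetical, and the three example rows are copied from the table.

```python
METHODS = ("chi2", "FS", "IG", "ANOVA", "RFE", "PI", "Boruta")


def majority_vote(flags, threshold=None):
    """Count the True votes and apply a strict-majority threshold.

    With seven methods the default threshold is 4 (assumed from the table).
    """
    if threshold is None:
        threshold = len(flags) // 2 + 1
    votes = sum(flags)
    return votes, "Select" if votes >= threshold else "Reject"


# Rows copied from the table above (True/False per method, in table order).
rows = {
    "number of clusters": (True, True, True, True, True, False, True),
    "inter-cluster maximum MST distance": (True, True, False, False, True, False, True),
    "total cluster area": (True, False, False, False, False, True, True),
}
results = {name: majority_vote(flags) for name, flags in rows.items()}
for name, (votes, decision) in results.items():
    print(f"{name}: {votes} votes -> {decision}")
```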
Figure 8. Machine learning stacking-based ensemble classification. The data were scaled before training and testing. The classification was carried out in two steps: initial and final predictions using base and meta classifiers, respectively.
Comparative analysis of the performance of supervised and unsupervised classification using the test and whole datasets, respectively. A five-fold technique was used for both supervised and unsupervised classification. The best results are marked in bold: splits 1 and 2 for supervised classification and split 2 for unsupervised classification.
| (A) Supervised Ensemble Classification—Modern AI Techniques | ||||
|---|---|---|---|---|
| Multiclass Classification (Grade 3 vs. Grade 4 vs. Grade 5) | ||||
| Test Split | Accuracy | Precision | Recall | F1-Score |
| **Split 1** | **97.2%** | | | |
| Split 2 | 91.7% | 92.0% | 91.7% | 91.7% |
| Split 3 | 97.2% | 97.3% | 97.3% | 97.3% |
| Split 4 | 94.4% | 94.7% | 94.7% | 94.7% |
| Split 5 | 91.7% | 91.7% | 91.7% | 91.7% |
| Average Split | 94.4% | 94.7% | 94.3% | 94.7% |
| Binary Classification (Grade 3 vs. Grade 5) | ||||
| Test Split | Accuracy | Precision | Recall | F1-Score |
| Split 1 | 91.7% | 91.6% | 91.6% | 91.6% |
| **Split 2** | **100%** | **100%** | **100%** | **100%** |
| Split 3 | 95.8% | 96.2% | 95.8% | 95.9% |
| Split 4 | 95.8% | 96.2% | 95.8% | 95.9% |
| Split 5 | 91.7% | 92.8% | 91.6% | 92.2% |
| Average Split | 95.0% | 95.0% | 95.0% | 95.0% |
| (B) Unsupervised Classification: Traditional AI Techniques | ||||
| Multiclass Classification (Grade 3 vs. Grade 4 vs. Grade 5) | ||||
| Data Split | Accuracy | Precision | Recall | F1-Score |
| Split 1 | 86.1% | 87.0% | 86.0% | 86.3% |
| **Split 2** | **92.3%** | | | |
| Split 3 | 86.7% | 88.3% | 86.7% | 87.0% |
| Split 4 | 88.3% | 88.3% | 88.3% | 88.0% |
| Split 5 | 91.6% | 91.7% | 91.7% | 91.7% |
| Average Split | 88.5% | 89.7% | 88.3% | 88.7% |
| Binary Classification (Grade 3 vs. Grade 5) | ||||
| Data Split | Accuracy | Precision | Recall | F1-Score |
| Split 1 | 81.7% | 82.0% | 81.5% | 81.5% |
| **Split 2** | **96.7%** | | | |
| Split 3 | 89.2% | 89.5% | 89.0% | 89.0% |
| Split 4 | 86.7% | 87.5% | 86.5% | 86.5% |
| Split 5 | 93.3% | 93.5% | 93.5% | 93.5% |
| Average Split | 88.3% | 88.5% | 88.5% | 88.5% |
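
The unsupervised results rely on K-medoids clustering. The sketch below shows the general technique in its simple alternating form (assign each point to the nearest medoid, then re-pick each medoid as the member with the lowest total distance to its cluster). It is not the authors' code: the feature vectors, the starting medoids, and the function names are hypothetical, and the source does not state which K-medoids variant or initialization was used.

```python
import math


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternating K-medoids; assumes all points are distinct.

    `initial_medoids` holds indices into `points`. Returns (medoids, labels).
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, label in enumerate(labels) if label == c]
            # The new medoid minimizes the summed distance to its cluster members.
            best = min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Hypothetical two-dimensional cluster features for six images.
features = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
medoids, labels = k_medoids(features, initial_medoids=[0, 3])
print(medoids, labels)
```

The result depends on the starting medoids: this alternating scheme only finds a local optimum, so several initializations are usually compared in practice.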
Figure 9. Confusion matrices of the supervised and unsupervised classification using the test and whole datasets, respectively. (a,b) Confusion matrices of multiclass and binary classification using the supervised ensemble technique, based on test splits 1 and 2 in Table 3A, respectively. (c,d) Confusion matrices of multiclass and binary classification using the unsupervised technique, based on data split 2.
Figure 10. Bar charts of the accuracy scores of unsupervised and supervised classifications. (a) Multiclass classification. (b) Binary classification. The performance of each PCa grade was obtained from the confusion matrices.
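
As a rough illustration of how per-grade and overall scores can be read from a confusion matrix, the sketch below computes accuracy plus per-class and macro-averaged precision, recall, and F1-score. The matrix values are hypothetical and are not taken from Figure 9, and macro averaging is an assumption, since the source does not state how the scores were averaged.

```python
def confusion_metrics(cm):
    """Compute metrics from cm[i][j] = count of true class i predicted as class j.

    Returns (accuracy, per_class, macro), where per_class is a list of
    (precision, recall, f1) tuples and macro is their unweighted mean.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    per_class = []
    for k in range(n):
        true_positive = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))  # column sum
        actual = sum(cm[k])                          # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        per_class.append((precision, recall, f1))

    macro = tuple(sum(column) / n for column in zip(*per_class))
    return accuracy, per_class, macro


# Hypothetical 3-class matrix (grade 3, grade 4, grade 5) with one error.
cm = [
    [12, 0, 0],
    [1, 11, 0],
    [0, 0, 12],
]
accuracy, per_class, macro = confusion_metrics(cm)
print(f"accuracy = {accuracy:.3f}")
print(f"macro precision, recall, F1 = {macro}")
```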
Figure 11. Bar chart of the overall performance scores of supervised and unsupervised classifications.
Figure 12. Visualization of intra- and inter-cluster MST graphs. (a–c) The intra-cluster MSTs of grade 3, grade 4, and grade 5, respectively. (d–f) The inter-cluster MSTs generated from (a), (b), and (c), respectively. The dotted red circle indicates a cluster of cell nuclei. The different line colors in (a–c) and (d–f) indicate intra- and inter-clusters, respectively.
Figure A4. The proliferation and community structure of cell nuclei in the annotated region of grade 3.
Figure A5. The proliferation and community structure of cell nuclei in the annotated region of grade 4.
Figure A6. The proliferation and community structure of cell nuclei in the annotated region of grade 5.
Comparison with other state-of-the-art approaches. AUC: Area under the curve, DL: Deep learning, ML: Machine learning.
| Authors | Methods | Classification Type | Classes | Performance |
|---|---|---|---|---|
| Uthappa et al., 2019 | Hybrid DL | Multiclass | grade 2, 3, 4, and 5 | 98.0% (Accuracy) |
| Khouzani et al., 2003 | ML | Multiclass | grade 2, 3, 4, and 5 | 97.0% (Accuracy) |
| Kwak et al., 2017 | CNN | Binary | benign and cancer | 0.95 (AUC) |
| Wang et al., 2020 | Graph CNN | Binary | score 3 + 3 and 3 + 4 | 88.6% (Accuracy) |
| Bhattacharjee et al., 2019 | ML | Binary | benign vs. malignant | 88.7% (Accuracy) |
| Bhattacharjee et al., 2019 | ML | Binary | grade 3 vs. grade 4, 5 | 85.0% (Accuracy) |
| Bhattacharjee et al., 2019 | ML | Binary | grade 4 vs. grade 5 | 92.5% (Accuracy) |
| Bhattacharjee et al., 2020 | DL | Binary | benign vs. malignant | 94.0% (Accuracy) |
| Nir et al., 2018 | ML | Binary | benign vs. all grades | 88.5% (Accuracy) |
| Nir et al., 2018 | ML | Binary | grade 3 vs. grade 4, 5 | 73.8% (Accuracy) |
| Ali et al., 2013 | ML | Binary | no recurrence vs. recurrence | 83.1% (Accuracy) |
| Kim et al., 2021 | DL | Binary | benign vs. malignant | 98.6% (Accuracy) |
| Kim et al., 2021 | DL | Binary | low- vs. high-grade | 93.6% (Accuracy) |
| Proposed | ML | Binary (Split 2) | grade 3 vs. grade 5 | 100% (Accuracy) |
| Proposed | ML | Multiclass (Split 1) | grade 3 vs. grade 4 vs. grade 5 | 97.2% (Accuracy) |
| Proposed | K-Medoids Clustering | Binary (Split 2) | grade 3 vs. grade 5 | 96.7% (Accuracy) |
| Proposed | K-Medoids Clustering | Multiclass (Split 2) | grade 3 vs. grade 4 vs. grade 5 | 92.3% (Accuracy) |