János Bencze1,2, Máté Szarka3,4,5, Balázs Kóti4, Woosung Seo6, Tibor G Hortobágyi7, Viktor Bencs8, László V Módis9, Tibor Hortobágyi2,7,10,11.
Abstract
Semi-quantitative scoring is widely used to estimate the quantity of proteins on chromogen-labelled immunohistochemical (IHC) tissue sections. However, it has several disadvantages, including a lack of objectivity and the time it takes. Our aim was to test a recently established artificial intelligence (AI)-aided digital image analysis platform, Pathronus, and to compare it to conventional scoring by five observers on chromogenic IHC-stained slides belonging to three experimental groups. Because Pathronus operates on grayscale values of 0–255, we transformed the data to a seven-point scale for use by pathologists and scientists. The accuracy of these methods was evaluated by comparing the statistical significance among groups with quantitative fluorescent IHC reference data on subsequent tissue sections. The pairwise inter-rater reliability of the scoring and the converted Pathronus data varied from poor to moderate with Cohen's kappa, and overall agreement was poor within every experimental group using Fleiss' kappa. Only the original and converted data obtained from Pathronus were able to reproduce the statistical significance among the groups that was determined by the reference method. In this study, we present AI-aided software that, after an initial training period, can identify cells of interest and differentiate among organelles, protein-specific chromogenic labelling, and nuclear counterstaining, providing a feasible and more accurate alternative to semi-quantitative scoring.
Keywords: artificial intelligence (AI); digital image analysis; immunohistochemistry; semi-quantitative scoring
Year: 2021 PMID: 35053167 PMCID: PMC8774232 DOI: 10.3390/biom12010019
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Figure 1. The different immunolabelling intensities in neurons: (A) strong positivity (3+), (B) moderate positivity (2+), (C) mild positivity (1+), (D) negative (0). The protein was visualized with 3,3′-Diaminobenzidine (DAB) chromogen. Nuclear counterstain with haematoxylin.
Figure 2. The selection criteria (A,C) and the deconvoluted 3,3′-Diaminobenzidine (DAB) chromogen (B,D). Class 0 represents an example of a misidentified item (vessels that mimic the shape of a neuron). Class 1 depicts an ideal neuron that could be used for the intensity measurements, with a large amount of cytoplasm and an easily observed nucleus. H = haematoxylin (nuclear counterstain).
Confusion matrix on the test dataset of the convolutional neural network model trained to differentiate between Class 0 and Class 1 type objects (see Figure 2). n = 907; FN = False Negative; TP = True Positive; TN = True Negative; FP = False Positive.
| Outcome | Count |
|---|---|
| True Positive (TP) | 441 |
| False Negative (FN) | 21 |
| True Negative (TN) | 388 |
| False Positive (FP) | 57 |
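The four counts reported in the confusion matrix can be turned into the usual summary metrics. The following is a minimal sketch rather than part of the published analysis: the function name `classification_metrics` and the choice of metrics are illustrative assumptions, and only the counts come from the table.

```python
# Counts taken from the confusion matrix reported above (n = 907).
TP, FN, TN, FP = 441, 21, 388, 57


def classification_metrics(tp, fn, tn, fp):
    """Return common summary metrics for a binary confusion matrix.

    Hypothetical helper for illustration; the source reports only the counts.
    """
    total = tp + fn + tn + fp
    return {
        "n": total,
        "accuracy": (tp + tn) / total,      # share of all objects classified correctly
        "sensitivity": tp / (tp + fn),      # recall for the positive class
        "specificity": tn / (tn + fp),      # recall for the negative class
        "precision": tp / (tp + fp),        # share of positive calls that are correct
    }


metrics = classification_metrics(TP, FN, TN, FP)
for name, value in metrics.items():
    print(name, value if name == "n" else round(value, 3))
```

With these counts the accuracy works out to roughly 0.914, the sensitivity to roughly 0.955, and the specificity to roughly 0.872.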
Figure 3. Mean semi-quantitative scores defined by five observers and Pathronus-converted values are depicted in Panels A, B, and C for the CNT, AD, and DLB groups, respectively. Panel D shows the original Pathronus inverse mean gray intensities of the experimental groups. The inset of Panel B identifies the colored lines shown in Panels A, B, and C. (CNT = control; DLB = dementia with Lewy bodies; AD = Alzheimer’s disease; #1–5 = observers).
Figure 4. A notable limitation of semi-quantitative scoring compared to digital image analysis is its much smaller evaluation range (7 vs. 256 levels), which stems from the difficulty of detecting subtle differences in labelling intensity with the human eye alone. The observers allocated a score of 2 to both images, whereas the Pathronus original method measured an intensity of 110.02 for Panel A and 123.03 for Panel B on the grayscale (range 0–255). Although the human eye can perceive small differences, objective and reproducible categorization of hundreds of images on an extended scale is not possible for human observers, whereas it is possible for digital image analysis software. This shortcoming of semi-quantitative scoring may result in statistical bias compared to software-based results.
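To illustrate how a 256-level grayscale collapses onto a seven-point scale, the sketch below uses equal-width binning. This is an assumption made purely for illustration: the conversion rule actually used for the Pathronus-converted values is not described in this record, the function name `to_seven_point_bin` is hypothetical, and the bin indices 0–6 are not the observers' score labels.

```python
def to_seven_point_bin(gray_value, n_bins=7, n_levels=256):
    """Map a grayscale intensity in [0, 255] to an equal-width bin index 0..6.

    Equal-width binning is an illustrative assumption, not the published rule.
    """
    if not 0 <= gray_value <= n_levels - 1:
        raise ValueError("gray_value must lie in [0, 255]")
    # Each bin spans n_levels / n_bins gray levels; clamp to the last bin.
    return min(int(gray_value * n_bins / n_levels), n_bins - 1)


# Intensities reported for Panels A and B of Figure 4.
panel_a = to_seven_point_bin(110.02)
panel_b = to_seven_point_bin(123.03)
print(panel_a, panel_b, panel_a == panel_b)
```

Under this assumed binning the two intensities fall into the same bin even though they differ by about 13 gray levels, which is the loss of resolution the caption describes.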
Decreasing order of experimental groups based on the mean intensities assessed by semi-quantitative scoring, Pathronus original analysis, and reference data.
| Observers | #1 | #2 | #3 | #4 | #5 | Pathronus Converted | Pathronus Original | Reference Data |
|---|---|---|---|---|---|---|---|---|
| Strength of immunopositivity among groups | CNT > DLB > AD | CNT > DLB > AD | CNT > DLB > AD | CNT > DLB > AD | DLB > CNT > AD | CNT > DLB > AD | CNT > DLB > AD | CNT > DLB > AD |
| Statistical significance (ANOVA) | CNT vs. DLB; CNT vs. AD; DLB vs. AD | CNT vs. AD | CNT vs. AD | CNT vs. DLB; CNT vs. AD; DLB vs. AD | - | CNT vs. AD; DLB vs. AD | CNT vs. AD; DLB vs. AD | CNT vs. AD; DLB vs. AD |
Strength of immunopositivity is listed in decreasing order based on the mean immunohistochemical (IHC) intensities of the experimental groups, which were determined by the semi-quantitative scoring of five observers, by Pathronus original and converted values, and by the immunofluorescent IHC reference method [16]. Statistical significance among groups by analysis of variance (ANOVA) is also presented for every observer and method. (CNT = control; DLB = dementia with Lewy bodies; AD = Alzheimer’s disease; #1–5 = observers).
Comparison of digital image analysis and semi-quantitative scoring based on relevant factors according to the literature [3,21,22,25,26,33,34,37,39,40,41,42,43,44]. (DAB = 3,3′-Diaminobenzidine).
| Factor | Digital Image Analysis | Semi-Quantitative Scoring |
|---|---|---|
| Cost | Expensive | Cheap |
| Speed | Fast | Slow |
| Histological experiment | Not required (except training period) | Required |
| Objectivity | Objective (with standard settings) | Subjective |
| Inter-rater variability | Based on software and settings | Considerable |
| Intra-rater variability | Not applicable | Notable |
| Quantification | Yes (except DAB labelling) | Not applicable |
| Operation | Automatic (after training period) | Manual |
| Data volume | Large | Limited |
| Equipment | IT background, slide scanner | Light microscope |
| Research purposes | New era | Gold standard |

(A) CNT: Cohen’s kappa values (above the diagonal) and strength of agreement (below the diagonal)

| Observers | #1 | #2 | #3 | #4 | #5 | Pathronus |
|---|---|---|---|---|---|---|
| #1 | - | 0.091 | 0.103 | 0.6 | 0.048 | −0.01 |
| #2 | poor | - | 0.301 | 0.195 | 0.008 | −0.012 |
| #3 | poor | fair | - | 0.169 | −0.023 | −0.009 |
| #4 | moderate | poor | poor | - | −0.004 | −0.034 |
| #5 | poor | poor | poor | poor | - | 0.262 |
| Pathronus | poor | poor | poor | poor | fair | - |

(B) DLB: Cohen’s kappa values (above the diagonal) and strength of agreement (below the diagonal)

| Observers | #1 | #2 | #3 | #4 | #5 | Pathronus |
|---|---|---|---|---|---|---|
| #1 | - | 0.063 | 0.138 | 0.516 | 0.226 | 0.177 |
| #2 | poor | - | 0.316 | 0.141 | 0.087 | 0.048 |
| #3 | poor | fair | - | 0.114 | 0.143 | −0.022 |
| #4 | moderate | poor | poor | - | 0.286 | 0.196 |
| #5 | fair | poor | poor | fair | - | 0.270 |
| Pathronus | poor | poor | poor | poor | fair | - |

(C) AD: Cohen’s kappa values (above the diagonal) and strength of agreement (below the diagonal)

| Observers | #1 | #2 | #3 | #4 | #5 | Pathronus |
|---|---|---|---|---|---|---|
| #1 | - | 0.204 | 0.179 | 0.457 | 0.034 | 0.195 |
| #2 | poor | - | 0.297 | 0.285 | 0.118 | 0.232 |
| #3 | poor | fair | - | 0.178 | 0.180 | 0.062 |
| #4 | moderate | fair | poor | - | 0.260 | 0.214 |
| #5 | poor | poor | poor | fair | - | 0.116 |
| Pathronus | poor | fair | poor | fair | poor | - |

(D) Fleiss’ kappa by group

| | CNT | DLB | AD |
|---|---|---|---|
| Fleiss’ kappa | 0.091 | 0.176 | 0.183 |
| Statistical significance | <0.005 | <0.005 | <0.005 |
| Agreement | poor | poor | poor |

Pairwise inter-rater reliability of semi-quantitative scoring by five observers and converted Pathronus data in the CNT (A), DLB (B), and AD (C) groups. Crosstabs contain the Cohen’s kappa values (yellow background) and the strength of agreement (blue background) between two different observers. Sub-table D: Fleiss’ kappa values show the overall inter-rater reliability by group and their statistical significance. (CNT = control; DLB = dementia with Lewy bodies; AD = Alzheimer’s disease).
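The two agreement statistics reported above can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook formulas, not the software used in the study. The example ratings are invented, and the cut-offs in `strength_of_agreement` are an assumption chosen to be consistent with the labels in the tables (for example, 0.204 is labelled poor, 0.214 fair, and 0.6 moderate).

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(ratings_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


def fleiss_kappa(count_table):
    """Fleiss' kappa; count_table[i][j] = raters assigning item i to category j."""
    n_items = len(count_table)
    n_raters = sum(count_table[0])  # assumes every item has the same number of raters
    # Mean per-item agreement across all rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in count_table
    ) / n_items
    # Chance agreement from overall category proportions.
    n_categories = len(count_table[0])
    proportions = [
        sum(row[j] for row in count_table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in proportions)
    return (p_bar - p_e) / (1 - p_e)


def strength_of_agreement(kappa):
    """Verbal label for kappa; the thresholds are an assumption (see lead-in)."""
    for upper, label in ((0.21, "poor"), (0.41, "fair"), (0.61, "moderate"), (0.81, "good")):
        if kappa < upper:
            return label
    return "very good"


# Invented example scores for two hypothetical raters.
rater_1 = [0, 1, 1, 2, 2, 2]
rater_2 = [0, 1, 2, 2, 2, 1]
pair_kappa = cohen_kappa(rater_1, rater_2)
print(round(pair_kappa, 3), strength_of_agreement(pair_kappa))
```

Cohen's kappa compares one pair of raters at a time, which is why the crosstabs hold one value per pair, whereas Fleiss' kappa summarizes all raters at once and yields a single value per group.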