Stuart Astbury1,2, Jane I Grove1,2, David A Dorward3,4, Indra N Guha1,2, Jonathan A Fallowfield3, Timothy J Kendall3,4.
Abstract
Biopsy remains the gold-standard measure for staging liver disease, both to inform prognosis and to assess the response to a given treatment. Semiquantitative scores such as the Ishak fibrosis score are used for evaluation. These scores are utilised in clinical trials, with the US Food and Drug Administration mandating particular scores as inclusion criteria for participants and using the change in score as evidence of treatment efficacy. There is an urgent need for improved, quantitative assessment of liver biopsies to detect small incremental changes in liver architecture over the course of a clinical trial. Artificial intelligence (AI) methods have been proposed as a way to increase the amount of information extracted from a biopsy and to potentially remove bias introduced by manual scoring. We have trained and evaluated an AI tool for measuring the amount of scarring in sections of picrosirius red-stained liver. The AI methodology was compared with both manual scoring and widely available colour space thresholding. Four sequential sections from each case were stained on two separate occasions by two independent clinical laboratories using routine protocols to study the effect of inter- and intra-laboratory staining variation on these tools. Finally, we compared these methods to second harmonic generation (SHG) imaging, a stain-free quantitative measure of collagen. Although AI methods provided a modest improvement over simpler computer-assisted measures, staining variation both within and between laboratories had a dramatic effect on quantitation, with manual assignment of scar proportion being the most consistent. Manual assessment also most strongly correlated with collagen measured by SHG. In conclusion, results suggest that computational measures of liver scarring from stained sections are compromised by inter- and intra-laboratory staining. 
Stain-free quantitative measurement using SHG avoids staining-related variation and may prove more accurate in detecting small changes in scarring that may occur in therapeutic trials.
Keywords: artificial intelligence; digital pathology; histological scoring; liver fibrosis
Year: 2021 PMID: 34076968 PMCID: PMC8363922 DOI: 10.1002/cjp2.227
Source DB: PubMed Journal: J Pathol Clin Res ISSN: 2056-4538
Figure 1. Outline of the study design. (A) Twenty explants were PSR stained at two different laboratories (Edinburgh and Nottingham), with each laboratory staining two batches 6 months apart, giving four sets of 20 slides each (E1, E2, N1, and N2). The stained slides were then scored using three different methods (human, HSB, and WEKA). A fifth set of slides was sectioned and left unstained for SHG/TPEF imaging. (B) The four stained sets of slides give six measurement pairs that can be compared to assess inter‐ and intra‐laboratory variation with each scoring method. A single set of stained slides (E1) was used as the comparator with the stain‐free SHG/TPEF set.
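The six measurement pairs in panel B follow from choosing two of the four stained sets. A minimal Python sketch of that enumeration, assuming the set labels E1, E2, N1, and N2 from the caption and treating the first letter as the laboratory; the function name `classify_pairs` is hypothetical and not taken from the study:

```python
from itertools import combinations

STAIN_SETS = ["E1", "E2", "N1", "N2"]  # set labels as given in Figure 1


def classify_pairs(sets):
    """Map every unordered pair of stain sets to 'intra' or 'inter'."""
    pairs = {}
    for a, b in combinations(sets, 2):
        # Assumption: the first character identifies the laboratory
        # (E = Edinburgh, N = Nottingham).
        pairs[(a, b)] = "intra" if a[0] == b[0] else "inter"
    return pairs


pairs = classify_pairs(STAIN_SETS)
for pair, kind in pairs.items():
    print(pair, kind)
```

Under these assumptions, two of the six pairs are intra‐laboratory (E1 with E2, N1 with N2) and the remaining four are inter‐laboratory.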
Figure 2. Representative illustration of intra‐ and inter‐laboratory PSR staining differences and the effect on segmentation using HSB and WEKA classifiers and comparison to SHG/TPEF imaging. WEKA features are coloured as purple = PSR positivity, yellow = lumen, green = tissue, and red = blank space. WEKA_c1: WEKA classifier trained on sections from both laboratories. WEKA_c2: WEKA classifier c1 with further targeted training on sections with greater than 2× divergence in PSR quantification between stain pairs. SHG/TPEF image is coloured as collagen in green/yellow and parenchyma in red.
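Colour space thresholding of the kind labelled HSB above can be illustrated with a short Python sketch. This is not the study's pipeline: every threshold below is a hypothetical value chosen only for illustration, and the function names are invented for this example. It classifies RGB pixels by hue, saturation, and brightness and reports stain‐positive pixels as a percentage of non‐blank pixels.

```python
import colorsys

# Hypothetical thresholds for illustration only; not the values used in the study.
RED_HUE_MAX = 0.05       # red hue wraps around 0, so accept h <= 0.05 ...
RED_HUE_MIN = 0.92       # ... or h >= 0.92
MIN_SATURATION = 0.40
MIN_BRIGHTNESS = 0.20
BLANK_MIN_BRIGHTNESS = 0.90
BLANK_MAX_SATURATION = 0.10


def to_hsb(r, g, b):
    """Convert 8-bit RGB to (hue, saturation, brightness), each in 0..1."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


def is_blank(r, g, b):
    """Near-white, unsaturated pixels are treated as blank slide."""
    _, s, v = to_hsb(r, g, b)
    return v >= BLANK_MIN_BRIGHTNESS and s <= BLANK_MAX_SATURATION


def is_stain_positive(r, g, b):
    """Red-hued, sufficiently saturated pixels are treated as stain-positive."""
    h, s, v = to_hsb(r, g, b)
    red_hue = h <= RED_HUE_MAX or h >= RED_HUE_MIN
    return red_hue and s >= MIN_SATURATION and v >= MIN_BRIGHTNESS


def percent_positive(pixels):
    """Stain-positive pixels as a percentage of non-blank pixels."""
    tissue = [p for p in pixels if not is_blank(*p)]
    if not tissue:
        return 0.0
    positive = sum(1 for p in tissue if is_stain_positive(*p))
    return 100.0 * positive / len(tissue)


# Made-up pixels: one red, three pale yellow, one white (blank).
demo_pixels = [(200, 30, 40), (230, 210, 120), (230, 210, 120),
               (230, 210, 120), (250, 250, 250)]
print(percent_positive(demo_pixels))
```

Because the decision depends on fixed colour cut-offs, a shift in stain hue or intensity between batches moves pixels across the thresholds, which is one way staining variation can alter the measured proportion.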
Figure 3. Scatterplots comparing all possible PSR stain pairs specified in Figure 1, between and within the two laboratories. All correlations are Spearman's rho. (A) HSB. (B) WEKA_i: WEKA classifier trained only on sections from either Edinburgh or Nottingham laboratories individually. (C) WEKA_c1: WEKA classifier trained on sections from both laboratories. (D) WEKA_c2: WEKA classifier c1 with further targeted training on sections with greater than 2× divergence in PSR quantification between stain pairs.
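Spearman's rho, the statistic reported for each stain pair, is the Pearson correlation of the rank-transformed values. A minimal pure-Python sketch follows; the function names and the example values are illustrative assumptions, not data or code from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up percentage-stained values for the same five cases in two stain sets.
set_a = [2.0, 5.5, 9.1, 14.0, 30.2]
set_b = [1.1, 3.0, 2.5, 8.0, 12.4]
rho = spearman_rho(set_a, set_b)
print(round(rho, 3))
```

Because rho depends only on rank order, it measures whether two stain sets order the cases consistently, not whether they agree on absolute percentages.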
Figure 4. Spearman correlations of all slide pairs scored on four separate occasions by four human volunteers (two trained pathologists [hu1 and hu2] and two non‐clinical researchers [hu3 and hu4]).
Figure 5. Boxplot of Spearman correlations for every measurement method (left) and combined scatterplot of scores for each measurement method (right).
Figure 6. Using a single stained set of slides (E1), the stain‐based scoring methods were compared to percentage SHG measured using stain‐free SHG/TPEF imaging and the qFibrosis index derived from the measured parameters. WEKA_c2: WEKA classifier c1 with further targeted training on sections with greater than 2× divergence in PSR quantification between stain pairs.