High-Throughput, Machine Learning-Based Quantification of Steatosis, Inflammation, Ballooning, and Fibrosis in Biopsies From Patients With Nonalcoholic Fatty Liver Disease.

Roberta Forlano¹, Benjamin H Mullish¹, Nikolaos Giannakeas², James B Maurice¹, Napat Angkathunyakul³, Josephine Lloyd³, Alexandros T Tzallas², Markos Tsipouras⁴, Michael Yee⁵, Mark R Thursz¹, Robert D Goldin³, Pinelopi Manousou⁶.

Abstract

BACKGROUND & AIMS: Liver biopsy is the reference standard for staging and grading nonalcoholic fatty liver disease (NAFLD), but histologic scoring systems are semiquantitative with marked interobserver and intraobserver variation. We used machine learning to develop fully automated software for quantification of steatosis, inflammation, ballooning, and fibrosis in biopsy specimens from patients with NAFLD and validated the technology in a separate group of patients.
METHODS: We collected data from 246 consecutive patients with biopsy-proven NAFLD and followed up in London from January 2010 through December 2016. Biopsy specimens from the first 100 patients were used to derive the algorithm and biopsy specimens from the following 146 were used to validate it. Biopsy specimens were scored independently by pathologists using the Nonalcoholic Steatohepatitis Clinical Research Network criteria and digitalized. Areas of steatosis, inflammation, ballooning, and fibrosis were annotated on biopsy specimens by 2 hepatobiliary histopathologists to facilitate machine learning. Images of biopsies from the derivation and validation sets then were analyzed by the algorithm to compute percentages of fat, inflammation, ballooning, and fibrosis, as well as the collagen proportionate area, and compared with findings from pathologists' manual annotations and conventional scoring systems.
RESULTS: In the derivation group, results from manual annotation and the software had an intraclass correlation coefficient (ICC) of 0.97 for steatosis (95% CI, 0.95–0.99; P < .001); ICC of 0.96 for inflammation (95% CI, 0.9–0.98; P < .001); ICC of 0.94 for ballooning (95% CI, 0.87–0.98; P < .001); and ICC of 0.92 for fibrosis (95% CI, 0.88–0.96; P = .001). Percentages of fat, inflammation, ballooning, and the collagen proportionate area from the derivation group were confirmed in the validation cohort. The software identified histologic features of NAFLD with levels of interobserver and intraobserver agreement ranging from 0.95 to 0.99; this was higher than that of semiquantitative scoring systems, which ranged from 0.58 to 0.88. In a subgroup of paired liver biopsy specimens, quantitative analysis was more sensitive in detecting differences than the Nonalcoholic Steatohepatitis Clinical Research Network scoring system.
CONCLUSIONS: We used machine learning to develop software to rapidly and objectively analyze liver biopsy specimens for histologic features of NAFLD. The results from the software correlate with those from histopathologists, with high levels of interobserver and intraobserver agreement. Findings were validated in a separate group of patients. This tool might be used for objective assessment of response to therapy for NAFLD in practice and clinical trials.

Keywords: Artificial Intelligence; Diagnostics; NASH; NASH CRN

Year: 2019; PMID: 31887451; PMCID: PMC7397508; DOI: 10.1016/j.cgh.2019.12.025

Source DB: PubMed; Journal: Clin Gastroenterol Hepatol; ISSN: 1542-3565; Impact factor: 11.382

Histologic scoring systems are subjective and do not reproducibly identify patients with nonalcoholic fatty liver disease (NAFLD). Automated techniques for liver biopsy analysis have required expensive reagents and specialized equipment.

We developed and validated a user-friendly, high-throughput, automated technique for quantitation of fat, inflammation, ballooning, and collagen in liver biopsy specimens. An algorithm was devised using machine learning and developed using liver biopsy specimens from patients with NAFLD. Results correlated with those from histopathologists and there was a high level of reproducibility among users. Results also were more sensitive in detecting changes compared with traditional scores in a cohort of paired liver biopsy specimens.

Automated quantitation of features of liver biopsy specimens might support histopathologists and increase reproducibility in detection of histologic features of NAFLD. This tool might be developed to determine responses to therapeutic agents in practice and clinical trials.

Nonalcoholic fatty liver disease (NAFLD) is an increasing cause of chronic liver disease worldwide, with an estimated global prevalence of approximately 25%. It is closely associated with type 2 diabetes and the metabolic syndrome, and its increasing incidence closely reflects population trends toward obesity, to the extent that NAFLD is now the second most common etiology of liver disease requiring liver transplantation in the United States. Liver biopsy remains the reference standard for the diagnosis and staging of NAFLD, with the Nonalcoholic Steatohepatitis Clinical Research Network (NASH CRN) Scoring System commonly used to stage disease severity. This semiquantitative system consists of a set of scores allocated by the pathologists for each of 4 key histologic features: steatosis (0–3), lobular inflammation (0–3), hepatocyte ballooning (0–2), and fibrosis (0–4). The first 3 features have their respective scores summed to generate the NAFLD Activity Score (NAS) (0–8), and the fibrosis score is allocated based on an assessment of specific architectural patterns of fibrosis.

The NASH CRN scoring system was developed by a group of 9 expert academic liver pathologists, between whom there was a high level of agreement. However, other studies have identified poor reproducibility in the assessment of key features of NASH, even among specialist pathologists, with even lower reproducibility between general pathologists. This lack of consistency and objectivity is a concern, particularly in the context of NAFLD clinical trials using histologic end points. More specifically, the resolution of NASH without worsening of fibrosis, or the improvement of fibrosis without resolution of NASH, are commonly used criteria in current NAFLD trials, and the need for rapidly assessed, objective, and reproducible end points currently is unmet.

For more than a decade, a range of morphometric techniques and computerized image analysis programs have been developed with the aim of providing more reproducible results for grading histologic features in liver disease, principally steatosis and fibrosis. Such methods consistently show clear advantages related to reproducibility and objectivity over semiquantitative scoring, but none of them is presently in clinical use because most require high-resolution images and often require specialized equipment. Furthermore, to our knowledge, very few studies have attempted a quantitative assessment of ballooning and inflammation in NAFLD. A recent consensus document from the Case Definitions Working Group of the Liver Forum recognized the potential role of quantitation as an entry criterion to drug trials within the field. This study’s primary aim was to develop and validate a high-throughput, fully automated, machine learning–based system for the quantitation of all 4 key histologic features contributing to the NASH CRN score, using liver biopsy specimens obtained from patients with NAFLD.
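The way the semiquantitative scores combine can be illustrated with a short sketch. The score ranges and the rule that the NAS is the sum of the steatosis, lobular inflammation, and ballooning scores come from the description above; the class name `NashCrnScores` and its field names are hypothetical and are used here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NashCrnScores:
    """Semiquantitative NASH CRN scores for one biopsy (hypothetical container)."""
    steatosis: int      # grade 0-3
    inflammation: int   # lobular inflammation score 0-3
    ballooning: int     # hepatocyte ballooning score 0-2
    fibrosis: int       # stage 0-4, reported separately from the NAS

    def __post_init__(self):
        # Reject values outside the ranges stated in the text.
        limits = {"steatosis": 3, "inflammation": 3, "ballooning": 2, "fibrosis": 4}
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name} must be in 0..{upper}, got {value}")

    @property
    def nas(self) -> int:
        """NAFLD Activity Score (0-8): sum of the first three features."""
        return self.steatosis + self.inflammation + self.ballooning


example = NashCrnScores(steatosis=2, inflammation=1, ballooning=2, fibrosis=3)
print(example.nas)  # 5
```

Fibrosis is kept out of the sum on purpose, because the text describes it as a separate stage based on architectural patterns.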

We retrospectively assessed all consecutive patients with biopsy-proven NAFLD followed up at the Liver Unit of St. Mary’s Hospital (Imperial College Healthcare NHS Trust, London, United Kingdom) from January 2010 to December 2016. The study population was divided into 2 subgroups: the derivation cohort (patients who underwent liver biopsy from January 2010 to December 2012) and the validation cohort (those who had the procedure from January 2013 to December 2016). At the time of the liver biopsy, a full range of clinical parameters was recorded. Exclusion criteria were the use of steatogenic drugs, excess alcohol consumption (>14 units/wk), as well as comorbidities.

Liver biopsies were performed using the Menghini technique. Further details are available in the Supplementary Methods section. All 4 features were annotated manually in the images of liver biopsy specimens from the derivation group by either one or the other of the expert hepatobiliary pathologists (working independently of each other) to allow training of the machine learning algorithm used to perform the automated image analysis. Finally, the image analysis developed from the derivation group was used for the quantitation of all 4 features in images of the liver biopsy specimens from the validation cohort.

The proposed methodology for quantitation of these features combined machine learning techniques with conventional image processing methods. Full details are provided in the Supplementary Methods section. The results of the quantitation are expressed as the percentage of fat (fat%), percentage of inflammation (inflammation%), and percentage of ballooning (ballooning%). An example of the output from the machine learning algorithm is shown in Figure 1.

Figure 1

Image analysis for quantitation of steatosis, inflammation, and ballooning. (A) Magnified image of a liver biopsy specimen stained in H&E and scored as steatosis grade 3 (moderate, ≥66%), lobular inflammation score of 1 (≤2 foci), and ballooning score of 1 (few ballooned cells). (B) Results of image analysis were as follows: fat was 30.9% (in green), inflammation was 3.4% (in purple), and ballooning was 10.8% (in blue).

The proposed methodology to quantify fibrosis already has been validated in patients with chronic hepatitis C infection. Briefly, it provides a fully automated image analysis of liver biopsy specimens to extract the collagen proportionate area (CPA) (Figure 2). This algorithm also includes a final step that allows the user to remove any structural collagen (eg, collagen from large portal tracts, blood vessel wall, and capsule) from the final quantitation of CPA, similar to the methodology used in comparable studies.
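As a rough illustration of what a proportionate area is, the sketch below computes CPA as the percentage of tissue pixels classified as collagen. It is not the authors' algorithm, which is described in their Supplementary Methods; the function name, the boolean-mask inputs, and the choice to drop structural-collagen pixels from both the numerator and the denominator are all assumptions made for this example.

```python
def collagen_proportionate_area(tissue_mask, collagen_mask, structural_mask=None):
    """Return CPA as a percentage of tissue pixels.

    All masks are equally sized 2D lists of booleans (hypothetical inputs).
    Pixels flagged in structural_mask (e.g. large portal tracts, vessel
    walls, capsule) are excluded from both collagen and tissue counts;
    that exclusion rule is an assumption of this sketch.
    """
    tissue_pixels = 0
    collagen_pixels = 0
    for r, row in enumerate(tissue_mask):
        for c, is_tissue in enumerate(row):
            if not is_tissue:
                continue  # background, not part of the biopsy
            if structural_mask is not None and structural_mask[r][c]:
                continue  # structural collagen removed by the user
            tissue_pixels += 1
            if collagen_mask[r][c]:
                collagen_pixels += 1
    if tissue_pixels == 0:
        raise ValueError("no tissue pixels left to measure")
    return 100.0 * collagen_pixels / tissue_pixels


T, F = True, False
tissue = [[T, T, T, T], [T, T, T, F]]
collagen = [[T, F, F, F], [T, T, F, F]]
structural = [[F, F, F, F], [T, F, F, F]]

cpa_all = collagen_proportionate_area(tissue, collagen)
cpa_clean = collagen_proportionate_area(tissue, collagen, structural)
print(round(cpa_all, 1), round(cpa_clean, 1))
```

In the toy masks, removing one structural pixel lowers the result, which mirrors why the final clean-up step matters for the reported CPA.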

Figure 2

Image analysis for quantitation of fibrosis. (A) Image of a liver biopsy stained in Sirius red and scored as fibrosis stage 4 as per the Nonalcoholic Steatohepatitis Clinical Research Network scoring system. (B) Result of image analysis showing a collagen percentage area of 22.5%.

Statistical analysis and details regarding the analysis of reproducibility are provided in the Supplementary Methods section.

A total of 246 consecutive patients with biopsy-proven NAFLD (190 with NASH and 56 with simple steatosis) were evaluated retrospectively. The first 100 patients were included in the derivation cohort and the following 146 patients were included in the validation cohort. Clinical and histologic characteristics of the included patients are shown in Tables 1 and 2, respectively.

Table 1

Clinical, Demographic, and Biochemical Characteristics of the Study Population

	Study population (N = 246), N (%)	Derivation cohort (N = 100), N (%)	Validation cohort (N = 146), N (%)	P value^a
Male sex	169 (69)	65 (65)	104 (71)	.23
Ethnic group
White non-Hispanic	112 (46)	50 (50)	62 (42)	.08
White Hispanic	16 (6)	6 (6)	10 (6)	.78
Asiatic	69 (28)	24 (24)	45 (31)	.23
Black	49 (20)	19 (19)	30 (21)	.67
Type 2 DM	121 (49)	41 (41)	80 (54)	.35
Arterial hypertension	110 (44)	31 (31)	79 (54)	.001
Dyslipidemia	132 (53)	54 (54)	78 (53)	.12

ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; HbA1c, glycated hemoglobin; HDL, high-density lipoprotein; LDL, low-density lipoprotein; PLT, platelet; type 2 DM, type 2 diabetes mellitus.

^a P value for the difference between the derivation group and the validation group.

Table 2

Histologic Characteristics of the Study Population and the Derivation and Validation Cohorts

	Study population (N = 246), N (%)	Derivation cohort (N = 100), N (%)	Validation cohort (N = 146), N (%)	P value^a
Steatosis				.44
Grade 1	70 (28)	26 (26)	44 (30)
Grade 2	139 (57)	58 (58)	81 (56)
Grade 3	37 (15)	16 (16)	21 (14)
Lobular inflammation				.18
Score 0	41 (17)	10 (10)	31 (21)
Score 1	163 (67)	73 (73)	90 (62)
Score 2	38 (15)	16 (16)	22 (15)
Score 3	4 (1)	1 (1)	3 (2)
Ballooning				.41
Score 0	56 (23)	14 (14)	42 (29)
Score 1	116 (47)	54 (54)	62 (42)
Score 2	74 (30)	32 (32)	42 (29)
Fibrosis				.3
0	24 (10)	9 (9)	15 (10)
1	67 (27)	20 (20)	47 (32)
1a	27 (11)	10 (10)	17 (11.4)
1b	3 (1)	2 (2)	1 (0.6)
1c	37 (15)	8 (8)	29 (19)
2	40 (16)	21 (21)	19 (13)
3	82 (34)	35 (35)	47 (33)
4	33 (13)	15 (15)	18 (12)

^a P value for the difference between the derivation and validation groups.

In the derivation group, the median percentage of fat for each grade was as follows: 2.6% (interquartile range [IQR], 1.7%–3.8%) for grade 1 (5%–33%); 15.1% (IQR, 10.1%–20.1%) for grade 2 (34%–67%); and 28.4% (IQR, 20.2%–31.9%) for grade 3 (>67%) (Table 3). The Spearman correlation between the percentage of fat and steatosis grade was strong (Rho = 0.66; P < .001), but with considerable overlap between the groups (Figure 3A and Supplementary Table 1).

Table 3

Results of the Image Analysis for the Derivation and Validation Cohorts

	Derivation cohort, median (IQR)	Validation cohort, median (IQR)	P value
Steatosis	Fat%
Grade 1	2.65 (1.7–3.8)	2.5 (1.8–4.8)	.18
Grade 2	15.1 (10.1–20.1)	15.6 (9.8–20.7)	.42
Grade 3	28.4 (20.2–31.9)	26.1 (22.2–30.5)	.61
Lobular inflammation	Inflammation%
Score 0	0.9 (0.35–1.7)	1.3 (0.2–1.7)	.82
Score 1	1.1 (0.7–3.3)	1.2 (0.6–3.2)	.91
Score 2	3.8 (3.15–4.17)	2.85 (3.3–7.7)	.36
Score 3	5.1 (N/A)	4.7 (4.4–5)	.24
Ballooning	Ballooning%
Score 0	4.9 (4.3–8.7)	6.7 (2.8–8.8)	.2
Score 1	17.8 (13.5–24)	17.6 (13.5–22.8)	.79
Score 2	23 (20.2–32.3)	23.3 (15.9–28.8)	.37
Fibrosis	CPA
0	1.3 (0.6–2)	2 (0.9–2.6)	.1
1	2.3 (1.9–4.3)	2.1 (1.1–3.7)	.87
2	2.4 (1.6–3.6)	2.1 (1.5–3.8)	.9
3	5.1 (2.8–8.2)	5.5 (3.8–7.4)	.11
4	13 (5.5–20.9)	11.1 (7.6–16.6)	.19

ballooning%, ballooning percentage; CPA, collagen proportionate area; Fat%, fat percentage; inflammation%, inflammation percentage; IQR, interquartile range.

Figure 3

Nonalcoholic Steatohepatitis Clinical Research Network (NASH CRN) scoring system against quantitation in the derivation and validation groups. (A) Fat percentage (fat%) and steatosis grade. (B) Inflammation percentage (inflammation%) and inflammation score. (C) Ballooning percentage (ballooning%) and ballooning score. (D) Collagen percentage area (CPA%) and fibrosis stage.

Supplementary Table 1

Semiquantitative Scores Vs Increasing Values of Quantitation

		Cases, N
Fat%	Steatosis grade, NASH CRN scoring system
<10	Mild	61
	Moderate	36
	Severe	3
10.1–20	Mild	7
	Moderate	69
	Severe	7
20.1–30	Mild	2
	Moderate	25
	Severe	13
>30	Mild	0
	Moderate	9
	Severe	14
Inflammation%	Inflammation score, NASH CRN scoring system
<1	Absent	19
	<2 foci	65
	2–4 foci	3
	>4 foci	0
1.1–5	Absent	22
	<2 foci	73
	2–4 foci	19
	>4 foci	1
>5	Absent	0
	<2 foci	25
	2–4 foci	14
	>4 foci	3
Ballooning%	Ballooning score, NASH CRN scoring system
<10	Absent	37
	Few ballooned cells	19
	Many ballooned cells	7
10.1–20	Absent	18
	Few ballooned cells	44
	Many ballooned cells	21
20.1–30	Absent	1
	Few ballooned cells	34
	Many ballooned cells	23
>30.1	Absent	0
	Few ballooned cells	19
	Many ballooned cells	23
CPA%	Fibrosis stage, NASH CRN scoring system
<2	Stage 0	12
	Stage 1	28
	Stage 2	20
	Stage 3	10
	Stage 4	0
2.1–5	Stage 0	10
	Stage 1	34
	Stage 2	17
	Stage 3	34
	Stage 4	0
5.1–10	Stage 0	2
	Stage 1	5
	Stage 2	3
	Stage 3	28
	Stage 4	11
>10.1	Stage 0	0
	Stage 1	0
	Stage 2	0
	Stage 3	11
	Stage 4	22

Ballooning%, percentage of ballooning; CPA%, percentage of collagen proportionate area; Fat%, percentage of fat; Inflammation%, percentage of inflammation; NASH CRN, Nonalcoholic Steatohepatitis Clinical Research Network.

The fat% derived from the automated quantitation then was compared with the ratio obtained by the manual annotations of the histopathologists. There was excellent concordance between manual annotations and automatic measurements, with an intraclass correlation coefficient (ICC) of 0.97 (95% CI, 0.95–0.99; P < .001).

In the derivation group, the median percentage of inflammation for each score was as follows: 0.9% (IQR, 0.3%–1.7%) for a score of 0; 1.1% (IQR, 0.7%–3.3%) for a score of 1; 3.8% (IQR, 3.15%–4.17%) for a score of 2; and 5.1% for a score of 3 (Table 3). The Spearman correlation between inflammation% and inflammation score was significant (Rho = 0.36; P < .001) and the relation was linear (Jonckheere–Terpstra test [JTT], z = 4.2; P < .001). A significant overlap was evident between the percentage of inflammation and inflammation scores (Figure 3B and Supplementary Table 1). The percentage of inflammation derived from the automated quantitation then was compared with the ratio obtained by the manual annotations of the histopathologists. There was excellent concordance between manual annotations and automatic measurements, with an ICC of 0.96 (95% CI, 0.9–0.98; P < .001).

In the derivation cohort, the median percentage of ballooning for each score was as follows: 4.9% (IQR, 4.3%–8.7%) for a score of 0; 17.8% (IQR, 13.5%–24%) for a score of 1; and 23% (IQR, 20.2%–32.3%) for a score of 2 (Table 3). The Spearman correlation between the percentage of ballooning and ballooning score was statistically significant (Rho = 0.52; P < .001) and the relation was linear (JTT, z = 4.4; P < .001). There was a significant overlap between ballooning% and ballooning scores (Figure 3C and Supplementary Table 1). The percentage of ballooning derived from the automated quantitation then was compared with the ratio obtained by the manual annotations of the histopathologists. There was excellent concordance between manual annotations and automatic measurements, with an ICC of 0.94 (95% CI, 0.87–0.98; P < .001).

In the derivation group, the median CPA for each stage was as follows: 1.3% (IQR, 0.6%–2%) for stage 0; 2.3% (IQR, 1.9%–4.3%) for stage 1; 2.4% (IQR, 1.6%–3.6%) for stage 2; 5.1% (IQR, 2.8%–8.2%) for stage 3; and 13% (IQR, 5.5%–20.9%) for stage 4 (Table 3). The Spearman correlation between CPA and fibrosis stage had a Rho value of 0.57 (P = .01). Significant overlap was evident between early stages of fibrosis (Figure 3D and Supplementary Table 1). CPA derived from the automated quantitation then was compared with the ratio obtained by the manual annotations of the histopathologists. There was excellent concordance between manual annotations and automatic measurements, with an ICC of 0.92 (95% CI, 0.88–0.96; P < .001).

In the validation cohort, the median percentage of fat was 2.5% (IQR, 1.8%–4.8%) for grade 1; 15.6% (IQR, 9.8%–20.7%) for grade 2; and 26.1% (IQR, 22.2%–30.5%) for grade 3. There was no difference between the derivation and validation groups in terms of the median percentage of fat (Table 3). The median percentage of inflammation was 1.3% (IQR, 0.2%–1.7%) for a score of 0; 1.2% (IQR, 0.6%–3.2%) for a score of 1; 2.85% (IQR, 3.3%–7.7%) for a score of 2; and 4.7% (IQR, 4.4%–5%) for a score of 3. There was no difference between the derivation and validation groups in terms of median percentage of inflammation (Table 3). The median percentage of ballooning was 6.7% (IQR, 2.8%–8.8%) for a score of 0; 17.6% (IQR, 13.5%–22.8%) for a score of 1; and 23.3% (IQR, 15.9%–28.8%) for a score of 2. There was no difference between the derivation and validation groups in terms of the median percentage of ballooning (Table 3). The median CPA was 2% (IQR, 0.9%–2.6%) for stage 0; 2.1% (IQR, 1.1%–3.7%) for stage 1; 2.1% (IQR, 1.5%–3.8%) for stage 2; 5.5% (IQR, 3.8%–7.4%) for stage 3; and 11.1% (IQR, 7.6%–16.6%) for stage 4. There was no difference between the derivation and validation groups in terms of CPA (Table 3).

Binary logistic regression was used to generate a variable that combined the percentages of fat, ballooning, and inflammation for predicting the presence of NASH (NAS ≥5): combined variable = 0.058 × (fat%) + 0.079 × (ballooning%) + 0.485 × (inflammation%) − 3.882. The area under the receiver operating characteristic curve of this variable for diagnosing NASH (NAS ≥5) was 0.802 (95% CI, 0.68–0.89; P = .001) (Supplementary Figure 1). A cut-off value of 0.31 showed a sensitivity of 80%, a specificity of 62%, a positive predictive value of 60%, and a negative predictive value of 72%.
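The combined variable can be evaluated directly from the three percentages. The sketch below uses the coefficients and the 0.31 cut-off reported above; the function names are hypothetical. The text does not say whether the cut-off applies to the linear score as written or to a predicted probability, so the sketch assumes the linear score and exposes the logistic transform separately. The input values are the fat%, ballooning%, and inflammation% reported for the biopsy in Figure 1.

```python
import math

# Coefficients and cut-off as reported in the text.
COEF_FAT = 0.058
COEF_BALLOONING = 0.079
COEF_INFLAMMATION = 0.485
INTERCEPT = -3.882
CUTOFF = 0.31  # assumed to apply to the linear score, not to a probability


def combined_variable(fat_pct, ballooning_pct, inflammation_pct):
    """Linear score from the reported logistic regression."""
    return (COEF_FAT * fat_pct
            + COEF_BALLOONING * ballooning_pct
            + COEF_INFLAMMATION * inflammation_pct
            + INTERCEPT)


def logistic(score):
    """Standard logistic transform, shown for reference only."""
    return 1.0 / (1.0 + math.exp(-score))


def suggests_nash(fat_pct, ballooning_pct, inflammation_pct):
    """True when the linear score reaches the reported cut-off (assumed >=)."""
    return combined_variable(fat_pct, ballooning_pct, inflammation_pct) >= CUTOFF


score = combined_variable(fat_pct=30.9, ballooning_pct=10.8, inflammation_pct=3.4)
print(round(score, 4), suggests_nash(30.9, 10.8, 3.4))
```

Because the inflammation coefficient is the largest by far, a change of 1 percentage point in inflammation% moves the score about as much as 8 points of fat%.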

Supplementary Figure 1

The receiver operating characteristic (ROC) curve for the variable combining the percentage of fat, percentage of ballooning, and percentage of inflammation for diagnosing nonalcoholic steatohepatitis (nonalcoholic fatty liver disease activity score, ≥5).

The areas under the receiver operating characteristic curves of CPA for diagnosing fibrosis F ≥ F2, F ≥ F3, and F4 were 0.72 (95% CI, 0.66–0.8; P < .001), 0.82 (95% CI, 0.76–0.88; P < .001), and 0.89 (95% CI, 0.82–0.95; P < .001), respectively, with the best cut-off values of 2.05%, 3.1%, and 8.1%, respectively (Supplementary Table 2).
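The reported cut-offs can be applied to a single CPA value as follows. The thresholds are those stated above; the names `CPA_CUTOFFS` and `fibrosis_flags` are hypothetical, and treating a value exactly at a cut-off as positive is an assumption, because the text does not specify the boundary rule.

```python
# Best CPA cut-offs (%) reported in the text for each fibrosis threshold.
CPA_CUTOFFS = (("F>=2", 2.05), ("F>=3", 3.1), ("F4", 8.1))


def fibrosis_flags(cpa_pct):
    """Map a CPA percentage to the thresholds it meets (>= is assumed)."""
    if cpa_pct < 0:
        raise ValueError("CPA cannot be negative")
    return {label: cpa_pct >= cutoff for label, cutoff in CPA_CUTOFFS}


# CPA of 22.5% is the value shown for the stage 4 biopsy in Figure 2.
print(fibrosis_flags(22.5))
print(fibrosis_flags(2.3))
```

With AUROCs of 0.72 to 0.89, these flags are screening indications rather than a replacement for staging, which also depends on architectural pattern.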

Supplementary Table 2

AUROCs of Fat%, Inflammation%, and Ballooning% for Diagnosing NASH (NAS Score, ≥5) and AUROCs of CPA for Diagnosing F ≥ 2, F ≥ 3, and F4

AUROC, area under the receiver operating characteristic curve; CPA, collagen proportionate area; NAS, Nonalcoholic Fatty Liver Disease Activity Score; NASH, nonalcoholic steatohepatitis.

In the whole population, intraobserver and interobserver agreement with automated quantitation was excellent and higher than with the NASH CRN scoring system. Full details are shown in Supplementary Table 3.

Supplementary Table 3

Interobserver and Intraobserver Agreement κ Coefficients Between Two Pathologists for the NASH CRN Scoring System and Image Analysis for the Whole Population

Histologic features	NASH CRN Scoring System		Image analysis
	Interobserver agreement, weighted κ (95% CI)^a	Intraobserver agreement, weighted κ (95% CI)^a	Interobserver agreement, ICC (95% CI)^a	Intraobserver agreement, ICC (95% CI)^a
Steatosis	0.73 (0.52–0.94)	0.88 (0.73–0.99)	0.98 (0.95–0.99)	0.96 (0.92–0.98)
Lobular inflammation	0.68 (0.43–0.94)	0.6 (0.29–0.9)	0.99 (0.97–0.99)	0.99 (0.98–0.99)
Ballooning	0.6 (0.3–0.89)	0.58 (0.3–0.87)	0.96 (0.92–0.98)	0.95 (0.788–0.98)
Fibrosis	0.69 (0.48–0.89)	0.61 (0.37–0.85)	0.98 (0.96–0.99)	0.97 (0.95–0.99)

ICC, intraclass correlation coefficient; NASH CRN, Nonalcoholic Steatohepatitis Clinical Research Network.

^a P < .001.
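For readers unfamiliar with the weighted κ reported for the semiquantitative scores, the following sketch computes Cohen's weighted κ for two raters on an ordinal scale. It is a textbook formulation, not the authors' code; the source does not state whether linear or quadratic weights were used, so the `power` parameter (1 for linear, 2 for quadratic) is an assumption left to the caller, and the ratings are made-up values.

```python
def weighted_kappa(rater_a, rater_b, n_categories, power=1):
    """Cohen's weighted kappa for ordinal scores 0..n_categories-1.

    power=1 gives linear disagreement weights, power=2 quadratic.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    k = n_categories
    # Joint proportion table of the two raters' scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]  # expected by chance
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical ballooning scores (0-2) from two pathologists.
pathologist_1 = [0, 1, 2, 2, 1, 0]
pathologist_2 = [0, 1, 2, 1, 1, 1]
print(round(weighted_kappa(pathologist_1, pathologist_2, n_categories=3), 3))
```

Weighted κ penalizes a disagreement of 2 categories more than a disagreement of 1, which suits ordered grades; the ICC used for the image analysis plays the analogous role for continuous percentages.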

A subset of 20 patients underwent paired liver biopsies, with a median time interval of 45 months (range, 15–88 mo) between biopsies. The repeated liver biopsy was performed for clinical reasons (ie, to restage NAFLD). Of note, 7 patients reported significant weight gain, 9 reported stable weight, and 4 reported significant weight loss. The changes in the 4 histologic features were analyzed in each of the 3 groups (Supplementary Figures 2, 3, and 4).

Supplementary Figure 2

Analysis of the percentage of fat, inflammation, and ballooning in paired liver biopsy specimens. (A and B) Paired liver biopsy specimens in a patient who gained weight in a time interval of 24 months. (A) At the baseline liver biopsy, the Nonalcoholic Steatohepatitis Clinical Research Network (NASH CRN) scoring system was steatosis grade 2 (33%–66%), inflammation score of 1 (<2 foci), ballooning score of 2 (many ballooned cells), and the percentage of fat (fat%) was 19.8%, the percentage of inflammation (inflammation%) was 1.9%, and the percentage of ballooning (ballooning%) was 24.5%. (B) In the follow-up liver biopsy, the NASH CRN scoring system was steatosis grade 2 (33%–66%), inflammation score of 1 (<2 foci), ballooning score of 1 (few ballooned cells), and fat% was 30.5%, inflammation% was 0.5%, and ballooning% was 18.5%. (C and D) Paired liver biopsy specimens in a patient who lost weight during a time interval of 60 months. (C) At the baseline liver biopsy, the NASH CRN scoring system was steatosis grade 2 (33%–66%), inflammation score of 1 (<2 foci), ballooning score of 1 (few ballooned cells), and fat% was 13.5%, inflammation% was 1.44%, and ballooning% was 32%. (D) In the follow-up liver biopsy, the NASH CRN scoring system was steatosis grade 1 (<33%), inflammation score of 1 (<2 foci), ballooning score of 1 (few ballooned cells), and fat% was 6.49%, inflammation% was 1.28%, and ballooning% was 17.6%.

Supplementary Figure 3

Comparison of key features from paired liver biopsy specimens, as assessed by the NASH CRN scoring system and automated quantitation. (A) Steatosis grade; (B) fat percentage; (C) inflammation score; (D) inflammation percentage; (E) ballooning score; (F) ballooning percentage; (G) fibrosis stage; and (H) collagen proportionate area. The Mann–Whitney test was used to compare pre- and post-biopsies in all cases. ∗P < .05.

Supplementary Figure 4

Analysis of the collagen proportionate area (CPA) in paired biopsy specimens. Differences in CPA in follow-up liver biopsy specimens compared with baseline. Patients were divided into 3 groups: those with weight gain, those with stable weight, and those with weight loss.

Patients with weight gain

Overall, the median steatosis grade was 2 at baseline and 3 at follow-up evaluation (FU) (P = .58), with a Δsteatosis grade of +0.5. The median fat% was 19.25% at baseline and 23.43% at FU (P = .48), with a median Δfat% of +1.77%. The inflammation score was 1 at baseline and 1 at FU (P = .9), with a Δinflammation score of 0. Inflammation% was 1.23% at baseline and 1.28% at FU (P = .05), with a Δinflammation% of +0.4%. The ballooning score was 1 at baseline and 2 at FU (P = .57), with a Δballooning score of +0.5. Ballooning% was 15.7% at baseline and 20.3% at FU (P = .03), with a Δballooning% of +6.25%. The fibrosis stage was 2 at baseline and 3 at FU (P = .05), with a Δfibrosis stage of +1. The median CPA was 4.6% at baseline and 7.5% at FU (P = .028), with a ΔCPA of +2.25%.

Patients with stable weight

Overall, the median steatosis grade was 2 at baseline and 2 at FU (P = .9), with a Δsteatosis grade of 0. The median fat% was 19.5% at baseline and 13.7% at FU (P = .05), with a median Δfat% of −6.3%. The inflammation score was 1 at baseline and 1 at FU (P = .69), with a Δinflammation score of 0. The median inflammation% was 0.87% at baseline and 1.53% at FU (P = .12), with a median Δinflammation% of +0.12%. The ballooning score was 1 at baseline and 1 at FU (P = .63), with a Δballooning score of 0. The median ballooning% was 13.4% at baseline and 19.4% at FU (P = .78), with a Δballooning% of +3.76%. The fibrosis stage was 3 at baseline and 4 at FU (P = .02), with a Δfibrosis stage of +1. The median CPA was 4.1% at baseline and 11.5% at FU (P = .001), with a ΔCPA of +6.3%.

Patients with weight loss

Overall, the median steatosis grade was 2 at baseline and 1 at FU (P = .12), with a Δsteatosis grade of −0.5. The median fat% was 16.5% at baseline and 10.5% at FU (P = .08), with a median Δfat% of −9.95%. The inflammation score was 1 at baseline and 1 at FU (P = .18), with a Δinflammation score of +0.5. The median inflammation% was 3.4% at baseline and 1.6% at FU (P = .04), with a Δinflammation% of −0.44%. The ballooning score was 1.5 at baseline and 1 at FU (P = .17), with a Δballooning score of −0.5. The median ballooning% was 19.3% at baseline and 12.35% at FU (P = .04), with a Δballooning% of −5.47%. The fibrosis stage was 1.5 at baseline and 2.5 at FU (P = .05), with a Δfibrosis stage of +1. The median CPA was 6.55% at baseline and 6.75% at FU (P = .12), with a ΔCPA of +1.75%.
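In the paired-biopsy summaries, the reported median Δ does not always equal the difference between the baseline and follow-up medians. That is expected if Δ is computed per patient and then summarized, which is how the sketch below reads the text (an assumption, since the computation is not spelled out). The function name and the three-patient values are hypothetical and chosen only to make the effect visible.

```python
from statistics import median


def paired_summary(baseline, follow_up):
    """Summarize paired measurements: group medians and median per-patient change."""
    if len(baseline) != len(follow_up) or not baseline:
        raise ValueError("need one follow-up value per baseline value")
    deltas = [after - before for before, after in zip(baseline, follow_up)]
    return {
        "median_baseline": median(baseline),
        "median_follow_up": median(follow_up),
        "median_delta": median(deltas),
    }


# Hypothetical fat% values for three patients (not data from the study).
summary = paired_summary(baseline=[10.0, 20.0, 30.0], follow_up=[26.0, 21.0, 25.0])
print(summary)
```

Here the medians move from 20 to 25, yet the median per-patient change is only +1, because the median of differences is not the difference of medians.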

Histology remains the reference standard to diagnose and stage NAFLD. In the absence of validated noninvasive markers, liver biopsy remains the only modality through which the presence of NASH may be assessed. The NASH CRN score, the widely validated histologic system for grading NASH, was not designed to replace the histopathologist’s overall assessment of disease category (eg, NASH/borderline NASH/not NASH), but rather to provide a measurable scale for use in trial end points. However, significant concerns exist regarding the reproducibility of conventional scoring of these histologic features between different pathologists. There also are questions about the objectivity of these techniques, as shown by the apparent significant disparities between pathologists’ quantitation of fat on liver biopsy specimens and more objective assessment methods.

In this study, we propose a technique based on image analysis and machine learning for the quantitation of all 4 key histologic features included within the NASH CRN scoring system. The study involved 2 hepatobiliary pathologists examining biopsy specimens from a large cohort of patients with NAFLD. The cohort included patients with the full spectrum of the condition, with typical comorbidities seen in Western practice, and across a range of ethnicities. The techniques described here require only modest computational effort, thus consuming very little time and avoiding the need to purchase specialist equipment. The machine learning software is straightforward to install on any device, and quantitation usually is performed within 2 minutes. Therefore, this technology could be applied broadly, even in nonspecialist centers. Moreover, these image analyses are fully automated through machine learning techniques and do not require manual intervention at any step. This is a major advantage compared with other approaches presented in the literature that require manual input, which have an inherent risk of introducing bias. However, it also should be appreciated that a liver biopsy in a patient with NAFLD may provide other valuable histologic information, including assessment of other potential diagnoses or features, such as iron overload.

Our study raises some important issues with the traditional reporting systems, showing significant overlap as well as only a moderate correlation (Rho, ∼0.5) between semiquantitative scores and quantitative results. First, in the sole category in which a direct comparison of quantitation can be made (steatosis), the pathologists consistently overestimated the fat content: median values by quantitation for NASH CRN grades 1–3 were 2.5%, 15.6%, and 26.1%, respectively, whereas these grades nominally correspond to 5%–33%, >33%–66%, and >66% steatosis. This highlights the limitation of making a quantitative assessment by visual inspection alone. Second, inflammation quantitation overlapped significantly across inflammation scores, although the relation between the two was linear. This may be because the inflammation score assesses the number of foci of inflammation, whereas the image analysis provides the proportional area of inflammation. Of note, our image analysis includes both lobular and portal inflammation, whereas the score captures lobular inflammation only. Further discussion about steatosis, inflammation, and ballooning% is provided in the Supplementary Discussion section.

In terms of fibrosis evaluation, the CPA increased with each fibrosis stage in an exponential rather than linear fashion, in keeping with previous reports. The Brunt et al system for reporting fibrosis, used alongside the NASH CRN score, describes architectural features rather than the quantity of collagen, and its prognostic significance has been well validated by large cohorts with long-term follow-up data (21, 22, 23). Interestingly, CPA also has been associated independently with clinical outcomes in NAFLD, in addition to fibrosis stage.

Taken together, our results raise important questions on how to use liver histology to inform end points of clinical trials. Analyzing a subgroup of paired liver biopsy specimens, we have shown that the CRN scoring system is less sensitive to change than quantitation of histologic features. This finding was particularly striking in the assessment of inflammation and ballooning. Moreover, by combining the percentages of fat, ballooning, and inflammation, it was possible to diagnose NASH accurately using our algorithm; however, the gold standard for the diagnosis of NASH is based on variable combinations of semiquantitative scores in the NAS system, which remains primarily academic rather than embedded in clinical practice. Furthermore, our quantitation software was not designed primarily to diagnose NASH, but to stage the disease more accurately. By introducing a more sensitive and reliable system, automated quantitation may provide different results in clinical trials and new insights into the pathophysiology of the disease.

We also have shown that CPA increases exponentially with fibrosis stage, challenging the dogma of using a reduction of 1 or more stages, or no worsening of fibrosis, as trial outcomes. Given the pattern we have shown, a reduction from stage 4 to stage 3 would reflect a markedly greater antifibrotic effect than a reduction from stage 2 to stage 1. It also may be that a reduction in CPA within stage 4 has important clinical benefits, such as a lower risk of decompensation. This needs to be shown in more studies, but we agree with recent calls to include CPA within trial end points.

Our present study has an important limitation: the absence of an external validation cohort. However, we conducted an internal validation across a large cohort of patients who collectively represent the full spectrum of NAFLD severity.

In conclusion, we have developed a fast-operating and accurate automated image analysis method to quantitate steatosis, ballooning, inflammation, and fibrosis in routine histologic images of patients with NAFLD. These methodologies do not require sophisticated equipment and have shown reliable and reproducible results. Given the key role for the assessment of these features in NASH clinical trials, there is a compelling argument that these techniques should be considered for use as clinical trial end points. There is now a pressing need for related outcome data to assess their role in everyday practice.
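
The argument in the discussion above, that a reduction from stage 4 to stage 3 reflects a larger antifibrotic effect than a reduction from stage 2 to stage 1, can be made concrete with a toy model. The sketch below assumes CPA(stage) = a · exp(b · stage). The parameters `a` and `b` are hypothetical placeholders chosen for illustration, not values fitted to this cohort, and the function names are likewise assumptions. Under this model the absolute CPA drop for a one-stage reduction grows by a factor of exp(b) per stage, so the 4-to-3 drop is exp(2b) times the 2-to-1 drop.

```python
import math


def cpa_model(stage, a=1.0, b=0.6):
    """Toy exponential model CPA(stage) = a * exp(b * stage).

    a and b are hypothetical illustration values, not fitted to any data.
    """
    return a * math.exp(b * stage)


def one_stage_drop(stage, a=1.0, b=0.6):
    """Absolute CPA reduction when moving from `stage` to `stage - 1`."""
    return cpa_model(stage, a, b) - cpa_model(stage - 1, a, b)


# Compare the collagen reduction implied by 4 -> 3 against 2 -> 1.
ratio = one_stage_drop(4) / one_stage_drop(2)
print(f"4->3 drop is {ratio:.2f}x the 2->1 drop")
```

With the hypothetical b = 0.6, the ratio is exp(1.2), roughly 3.3. The real ratio depends on the fitted growth rate, which is not reproduced here.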

29 in total

Authors: Evangelos Cholongitas; Marco Senzolo; Richard Standish; Laura Marelli; Alberto Quaglia; David Patch; Amar P Dhillon; Andrew Kenneth Burroughs
Journal: Am J Clin Pathol Date: 2006-05 Impact factor: 2.493

Authors: Paul Angulo; David E Kleiner; Sanne Dam-Larsen; Leon A Adams; Einar S Bjornsson; Phunchai Charatcharoenwitthaya; Peter R Mills; Jill C Keach; Heather D Lafferty; Alisha Stahler; Svanhildur Haflidadottir; Flemming Bendtsen
Journal: Gastroenterology Date: 2015-04-29 Impact factor: 22.682

Authors: Pierre Bedossa
Journal: Hepatology Date: 2014-06-26 Impact factor: 17.425

Authors:
Journal: J Hepatol Date: 2016-04-07 Impact factor: 25.083

Authors: Markos G Tsipouras; Nikolaos Giannakeas; Alexandros T Tzallas; Zoe E Tsianou; Pinelopi Manousou; Andrew Hall; Ioannis Tsoulos; Epameinondas Tsianos
Journal: Comput Methods Programs Biomed Date: 2016-11-29 Impact factor: 5.428

Authors: E M Brunt; C G Janney; A M Di Bisceglie; B A Neuschwander-Tetri; B R Bacon
Journal: Am J Gastroenterol Date: 1999-09 Impact factor: 10.864

Authors: Robert J Wong; Maria Aguilar; Ramsey Cheung; Ryan B Perumpail; Stephen A Harrison; Zobair M Younossi; Aijaz Ahmed
Journal: Gastroenterology Date: 2014-11-25 Impact factor: 22.682

Authors: David E Kleiner; Elizabeth M Brunt; Mark Van Natta; Cynthia Behling; Melissa J Contos; Oscar W Cummings; Linda D Ferrell; Yao-Chang Liu; Michael S Torbenson; Aynur Unalp-Arida; Matthew Yeh; Arthur J McCullough; Arun J Sanyal
Journal: Hepatology Date: 2005-06 Impact factor: 17.425

Authors: Pik Eu Chang; George Boon Bee Goh; Wei Qiang Leow; Liang Shen; Kiat Hon Lim; Chee Kiat Tan
Journal: PLoS One Date: 2018-06-20 Impact factor: 3.240

Authors: Mohammad Shadab Siddiqui; Stephen A Harrison; Manal F Abdelmalek; Quentin M Anstee; Pierre Bedossa; Laurent Castera; Lara Dimick-Santos; Scott L Friedman; Katherine Greene; David E Kleiner; Sophie Megnien; Brent A Neuschwander-Tetri; Vlad Ratziu; Elmer Schabel; Veronica Miller; Arun J Sanyal
Journal: Hepatology Date: 2018-02-22 Impact factor: 17.425

Diagnosis of F ≥ 2: 0.72 (0.66–0.8)
Diagnosis of F ≥ 3: 0.82 (0.76–0.88)
Diagnosis of F4: 0.89 (0.82–0.95)