
Application of artificial intelligence in non-alcoholic fatty liver disease and liver fibrosis: a systematic review and meta-analysis.

Pakanat Decharatanachart, Roongruedee Chaiteerakij, Thodsawit Tiyarattanachai, Sombat Treeprasertsuk.

Abstract

BACKGROUND: The global prevalence of non-alcoholic fatty liver disease (NAFLD) continues to rise. Non-invasive diagnostic modalities including ultrasonography and clinical scoring systems have been proposed as alternatives to liver biopsy but with limited performance. Artificial intelligence (AI) is currently being integrated with conventional diagnostic methods in the hopes of performance improvements. We aimed to estimate the performance of AI-assisted systems for diagnosing NAFLD, non-alcoholic steatohepatitis (NASH), and liver fibrosis.
METHODS: A systematic review was performed to identify studies integrating AI in the diagnosis of NAFLD, NASH, and liver fibrosis. Pooled sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and summary receiver operating characteristic curves were calculated.
RESULTS: Twenty-five studies were included in the systematic review. Meta-analysis of 13 studies showed that AI significantly improved the diagnosis of NAFLD, NASH and liver fibrosis. AI-assisted ultrasonography had excellent performance for diagnosing NAFLD, with a sensitivity, specificity, PPV, and NPV of 0.97 (95% confidence interval (CI): 0.91-0.99), 0.98 (95% CI: 0.89-1.00), 0.98 (95% CI: 0.93-1.00), and 0.95 (95% CI: 0.88-0.98), respectively. AI-assisted ultrasonography performed better than AI-assisted clinical data sets for the identification of NAFLD; the latter provided a sensitivity, specificity, PPV, and NPV of 0.75 (95% CI: 0.66-0.82), 0.82 (95% CI: 0.74-0.88), 0.75 (95% CI: 0.60-0.86), and 0.82 (95% CI: 0.74-0.87), respectively. The areas under the curve were 0.98 and 0.85 for AI-assisted ultrasonography and AI-assisted clinical data sets, respectively. AI-integrated clinical data sets had a pooled sensitivity and specificity of 0.80 (95% CI: 0.75-0.85) and 0.69 (95% CI: 0.53-0.82) for identifying NASH, as well as 0.99-1.00 and 0.76-1.00 for diagnosing liver fibrosis stage F1-F4, respectively.
CONCLUSION: AI-supported systems provide promising performance improvements for diagnosing NAFLD, NASH, and identifying liver fibrosis among NAFLD patients. Prospective trials with direct comparisons between AI-assisted modalities and conventional methods are warranted before real-world implementation. PROTOCOL REGISTRATION: PROSPERO (CRD42021230391).
© The Author(s), 2021.

Keywords:  AI-assisted system; NAFLD; NASH; artificial intelligence; diagnostic tool; fatty liver; liver fibrosis; machine-learning; non-invasive tests

Year:  2021        PMID: 34987607      PMCID: PMC8721422          DOI: 10.1177/17562848211062807

Source DB:  PubMed          Journal:  Therap Adv Gastroenterol        ISSN: 1756-283X            Impact factor:   4.409


Introduction

Chronic liver disease (CLD) and cirrhosis impose a high burden on global health. CLD is the 11th leading cause of death globally, accounting for 1.1 million deaths annually. In previous decades, the major causes of cirrhosis were chronic hepatitis B (HBV) and hepatitis C (HCV) infection. More recently, the main cause of cirrhosis has shifted to non-alcoholic steatohepatitis (NASH). The global prevalence of non-alcoholic fatty liver disease (NAFLD) is estimated at 25% and is predicted to increase to up to 30% in 2030.[3,4] Moreover, liver-specific deaths are also significantly increasing in patients with NAFLD, especially patients with NASH. An updated term, ‘metabolic associated fatty liver disease (MAFLD)’, has been proposed to replace NAFLD; it establishes the disease as a metabolic disorder.[5,6] This revision highlights the importance of early detection and risk factor modification to slow steatosis and fibrosis progression. The gold standard for the diagnosis of NAFLD, NASH, and cirrhosis is liver biopsy. It provides an assessment of hepatic steatosis, inflammation, and fibrosis. However, liver biopsy is relatively invasive and carries complications such as hemoperitoneum and hemothorax. Due to its invasive nature, liver biopsy is also not practical as a follow-up tool. Alternative diagnostic methods for NAFLD, such as clinical/laboratory scores and imaging modalities, have been proposed, but with limited performance. For example, the NAFLD Liver Fat Score has a sensitivity of 86% and specificity of 71%, whereas ultrasonography has a reasonable performance for the diagnosis of moderate steatosis (>33% of hepatocytes contain steatosis) but is less reliable for mild steatosis (⩽33% steatosis). Magnetic resonance imaging proton density fat fraction (MRI-PDFF) has greater accuracy but comes with a high cost and limited availability. Moreover, these limitations also extend to the detection of NASH and significant fibrosis among NAFLD patients.
For example, the previously reported areas under the receiver operating characteristic curve (AUROCs) for the diagnosis of NASH among NAFLD patients were up to 0.82 for ultrasonography scores (e.g. ultrasonography fatty liver indicator and ultrasonography fatty score) and 0.82 for transient elastography (TE). For detecting significant fibrosis among NAFLD patients, the AUROCs were 0.83 for TE, 0.88 for magnetic resonance elastography (MRE) and 0.64–0.75 for clinical scoring systems, for example, BARD score (0.64) and FIB-4 (0.75). Artificial intelligence (AI) has begun to be incorporated into these clinical scoring systems and imaging modalities in order to improve diagnostic performance. Over the past decade, AI has been used to identify and predict patterns or connections within large data sets in various fields of medicine, demonstrating particular usefulness in the diagnostic process. Previous systematic reviews of AI in hepatology reported on the use of machine learning for assessing liver fibrosis, predicting liver decompensation, screening eligible liver transplant recipients, and predicting post-transplant survival and complications.[13,14] Another recent systematic review summarized the integration of AI in imaging modalities, digital pathology, and electronic health records for the diagnosis and staging of NAFLD. That review emphasized the high accuracy of AI-based systems for NAFLD diagnosis and staging. However, very few meta-analyses have been conducted to summarize the overall diagnostic performance of AI-assisted diagnosis of liver diseases. In this systematic review and meta-analysis, we aimed to determine the performance of AI-assisted systems for the diagnosis of NAFLD, NASH, and liver fibrosis.

Methods

The study was conducted based on the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) checklist. The protocol was registered with PROSPERO (CRD42021230391).

Search strategy

The objective of the search was to identify studies utilizing AI in the diagnosis and classification of NAFLD, NASH and liver fibrosis among NAFLD patients. A literature search was conducted on the MEDLINE, Scopus, Web of Science, and Google Scholar databases, covering January 2000 through September 2021. We excluded studies published prior to the year 2000 to avoid obsolete computer-based algorithms which are not consistent with the modern AI classification. The keywords for the search included: ‘artificial intelligence’, ‘computer-assisted’, ‘computer-aided’, ‘neural network’, ‘machine learning’, ‘deep learning’, ‘liver’, ‘hepatic’, ‘steatosis’, ‘fatty’, ‘NAFLD’, ‘NASH’, ‘steatohepatitis’, ‘fibrosis’, and ‘cirrhosis’. Due to the previously mentioned updated nomenclature, the search terms ‘metabolic associated fatty liver disease’ and ‘MAFLD’ were also included. However, at the time of the literature search, no studies with MAFLD and AI were identified. The search strategies for all databases are presented in the Supplemental method.

Inclusion and exclusion criteria

We included articles using AI to assist in the diagnosis and grading of NAFLD. The inclusion criteria consisted of studies with sufficient data to generate a 2 × 2 table of true positive (TP), true negative (TN), false positive (FP), and false negative (FN). The articles also had to specify the reference standard (diagnostic method) and class(es) of AI. The exclusion criteria were studies which did not report the desired outcomes or did not have sufficient data to complete the 2 × 2 table. We also excluded studies that did not clearly describe validation methods or characteristics of training and validation cohorts. Studies in languages other than English as well as reviews, editorials, conference proceedings, and abstracts with incomplete information on the study population or characteristics of source image data sets were also excluded.
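The accuracy measures used throughout this review all derive from the 2 × 2 table described above. The following is a minimal Python sketch of those definitions; the class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic 2 x 2 table (index test vs reference standard)."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    tn: int  # index test negative, reference negative
    fn: int  # index test negative, reference positive

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only.
table = TwoByTwo(tp=90, fp=20, tn=80, fn=10)
print(round(table.sensitivity(), 3), round(table.specificity(), 3))
print(round(table.ppv(), 3), round(table.npv(), 3))
```

Note that PPV and NPV depend on the proportion of diseased participants in the cohort, whereas sensitivity and specificity do not.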

Data extraction

Two authors (PD and TT) independently screened the abstracts and titles to select the studies for full-text review. After screening, data extraction and quality assessment were also independently performed and cross-checked by the two authors (PD and TT). Any disagreements were discussed and decided by the third author (RC). Extracted data included author’s last name, publication year, study location, study design (prospective or retrospective cohort), validation methods (k-fold cross-validation and independent validation cohort), characteristics of training and validation cohorts (general population or at-risk population with specific diseases), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) as well as TP, TN, FP, and FN values. For studies with multiple AI classifiers, we selected the AI classifier with the best performance indicated by the best accuracy or greatest area under the curve (AUC).
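When a study reports sensitivity and specificity but not the four cell counts, the counts can be approximated from the cohort composition. The sketch below shows one way to do this in Python; the function name `reconstruct_counts` and the example values are hypothetical, and rounding to whole participants is an assumption rather than a description of the software used by the authors.

```python
def reconstruct_counts(sensitivity, specificity, n_diseased, n_total):
    """Approximate (TP, FP, TN, FN) from reported accuracy and cohort size.

    Counts are rounded to whole participants, so the reconstructed table may
    differ from the true one by a participant or so per cell.
    """
    n_healthy = n_total - n_diseased
    tp = round(sensitivity * n_diseased)
    fn = n_diseased - tp
    tn = round(specificity * n_healthy)
    fp = n_healthy - tn
    return tp, fp, tn, fn


# Hypothetical example: 50 diseased participants in a cohort of 150.
print(reconstruct_counts(0.80, 0.75, n_diseased=50, n_total=150))
```

Because of rounding, sensitivity and specificity recomputed from the reconstructed counts may differ slightly from the reported values.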

Quality assessment

The quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool, which comprises 12 questions assessing risk of bias and applicability in four domains (patient selection, appropriate index test, reference standard, and flow and timing). As mentioned in our previous work, some questions were slightly modified to better assess the quality of AI-related studies. For instance, in clinical diagnostic studies the index test should be interpreted against an optimal pre-specified threshold in order to avoid overfitting. In AI-related research, the equivalent safeguard is the use of separate validation or testing cohorts. Therefore, we assessed whether the included studies described their validation methods clearly. Other questions for the assessment of human-oriented bias were also modified, including whether knowing the reference standard results influenced the index test results. This was interpreted as a risk of bias caused by human manipulation in the AI protocol which could affect the AI output.

Statistical analysis

We used Covidence (Veritas Health Innovation, Melbourne, Australia) for the screening, data extraction, and quality assessment process. After data extraction, TP, FP, TN, and FN values were exported from Covidence. If not available, the values were calculated from sensitivity, specificity, and prevalence using Review Manager version 5.3.5. All statistical analyses were conducted using R software, version 3.6.3 (Vienna, Austria). Pooled sensitivity, specificity, PPV, NPV, and diagnostic odds ratio (DOR) with 95% confidence intervals (95% CI) were calculated using a random effects model. Summary receiver operating characteristic (SROC) curves with AUCs were also generated. AUC values of 0.5–0.7, 0.7–0.9, and 0.9–1 indicated low, moderate, and high accuracy, respectively. Heterogeneity was assessed using the I² and Cochran’s Q statistics. Publication bias was assessed with Deeks’ funnel plot. Subgroup analysis and meta-regression were pre-specified according to population, AI classifiers and diagnostic methods. P values of <0.05 were considered statistically significant. Sensitivity analysis was also performed by excluding studies with uncertain or high risk for bias and applicability as assessed by the QUADAS-2 criteria.
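To make the pooling step concrete, the following Python sketch pools per-study proportions (for example, sensitivities) with a DerSimonian-Laird random effects model on the logit scale and reports I². This is an illustrative assumption: the text states only that a random effects model was fitted in R, not which estimator or package was used, and the function name `pool_proportions` and the example counts are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_proportions(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Returns (pooled proportion, lower 95% limit, upper 95% limit, I^2 in %).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite at 0% or 100%.
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0
        y.append(logit(e / n))
        v.append(1 / e + 1 / (n - e))  # variance of a logit proportion

    # Fixed-effect estimate and Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and I^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects estimate with a Wald-type 95% interval.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se), i2


# Hypothetical true-positive counts and numbers of diseased participants.
pooled, low, high, i2 = pool_proportions([40, 45, 30], [50, 50, 40])
print(round(pooled, 3), round(low, 3), round(high, 3), round(i2, 1))
```

Pooling sensitivity and specificity separately, as in this sketch, ignores their correlation across studies; bivariate models address this at the cost of more complexity.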

Results

Literature search

The searching process and results are shown in Figure 1. After literature search, a total of 430 articles were identified. After removing 173 duplicates, 257 abstracts were screened and 183 articles were excluded due to the following reasons: conducted on animals (n = 24), meeting abstracts or proceedings (n = 71), editorials or reviews (n = 20), irrelevant articles (n = 66), and written in languages other than English, that is, Chinese (n = 1) and Arabic (n = 1). Next, 74 full-text articles were assessed for eligibility and 49 were excluded due to the following reasons: focusing on other objectives (n = 17), unclear diagnostic method (n = 11), no desired outcomes (n = 15), no validation cohort characteristics (n = 5), and unclear validation methods (n = 1). A final total of 25 studies were included in the systematic review with 13 of the studies identified for the meta-analysis.
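The selection counts reported above can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the text; the variable and dictionary key names are illustrative.

```python
identified = 430
duplicates = 173

# Exclusions at abstract screening, as listed in the text.
screening_exclusions = {
    "conducted on animals": 24,
    "meeting abstracts or proceedings": 71,
    "editorials or reviews": 20,
    "irrelevant articles": 66,
    "written in Chinese": 1,
    "written in Arabic": 1,
}

# Exclusions at full-text review, as listed in the text.
full_text_exclusions = {
    "focusing on other objectives": 17,
    "unclear diagnostic method": 11,
    "no desired outcomes": 15,
    "no validation cohort characteristics": 5,
    "unclear validation methods": 1,
}

screened = identified - duplicates
full_text = screened - sum(screening_exclusions.values())
included = full_text - sum(full_text_exclusions.values())

print(screened, full_text, included)  # 257 74 25
```

The per-reason counts sum to the stated totals of 183 and 49 exclusions, so the flow from 430 records to 25 included studies is internally consistent.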
Figure 1.

Flow diagram of search methodology and literature selection process.

The studies in the systematic review were divided into five categories: (1) AI-assisted ultrasonography to diagnose NAFLD (n = 6), (2) AI-assisted analysis of clinical data sets to diagnose NAFLD (n = 6), (3) AI-assisted system for the diagnosis of NASH (n = 5), (4) AI-assisted system for the diagnosis of liver fibrosis in NAFLD (n = 5), and (5) AI-assisted system for steatosis quantification in pathological specimens (n = 4). One study evaluated AI-integrated diagnosis of both NASH and fibrosis among NAFLD patients; its results were therefore extracted separately and included in both categories. Seven studies contained multiple AI classifiers. Of the studies with a single AI model (n = 18), nine used neural network models, including artificial neural network (ANN) and convolutional neural network (CNN), and nine used non-neural network models such as regression tree (RT), rule extraction algorithm, and lasso regression. The diagnostic methods for NAFLD, NASH, and liver fibrosis applied in the included articles were liver biopsy, MRI, elastography, ultrasonography, and ultrasonography in combination with elevated liver chemistries. Details of the extracted data on the studies’ information, development and validation cohort characteristics, validation methods, diagnostic method, AI classifiers, and performance are shown in Table 1.
Table 1.

Characteristics of included studies in systematic review (13 studies included in meta-analysis are in bold).

Study | Country | Study cohort | Diagnostic method | AI classifier | Development cohort (n) | Validation cohort (n) | Validation methods | Sensitivity | Specificity | TP | FP | TN | FN

AI-assisted ultrasonography to diagnose NAFLD (cohorts given as NAFLD/total, % steatosis)
Kuppili et al. [22] (b) | Portugal | Retrospective | Liver biopsy (not defined) | ELM (a), SVM | 36/63, N/A | N/A | k-fold cross-validation | 0.913 | 0.921 | 33 | 2 | 25 | 3
Byra et al. [23] (c) | Poland | Prospective | Liver biopsy (>5% hepatocyte steatosis) | CNN | 38/55, 50% had steatosis <30% | N/A | 5-fold cross-validation | 1.00 | 0.882 | 38 | 3 | 15 | 0
Biswas et al. [24] (c) | Portugal | Retrospective | Liver biopsy (not defined) | CNN (a), SVM, ELM | 36/63, N/A | N/A | 10-fold cross-validation | 1.00 | 1.00 | 36 | 0 | 27 | 0
Shi et al. [25] | China | Prospective | MRI (>5% hepatic fat content) | RT | 34/60, 92% had steatosis <20% | N/A | 10-fold cross-validation | 0.875 | 0.9286 | 30 | 2 | 24 | 4
Han et al. [26] | US | Prospective | MRI (>5% hepatic fat content) | CNN | 70/102, average 11 ± 9% | 70/102, average 11 ± 8% | Validation cohort | 0.97 | 0.94 | 68 | 2 | 30 | 2
Zamanian et al. [27] (c) | Poland | Prospective | Liver biopsy (>5% hepatocyte steatosis) | CNN + SVM | 38/55, 50% had steatosis <30% | N/A | 10-fold cross-validation | 0.972 | 1.00 | 70 | 0 | 74 | 2

AI-assisted clinical data sets to diagnose NAFLD (cohorts given as NAFLD/total)
Ma et al. [28] | China | Prospective | Ultrasonography | BN (a), kNN, SVM, LR, NB, RF, BN, AdaBoost, HNB, Bagging, AODE | 2522/10,508 | N/A | 10-fold cross-validation | 0.675 | 0.878 | 1702 | 974 | 7012 | 820
Islam et al. [29] | Taiwan | Retrospective | Ultrasonography | LR (a), RF, SVM, ANN | 593/994 | N/A | 10-fold cross-validation | 0.741 | 0.649 | 439 | 141 | 260 | 154
Wu et al. [30] | Taiwan | Retrospective | Ultrasonography | RF (a), LR, ANN, NB | 377/577 | N/A | 10-fold cross-validation | 0.872 | 0.859 | 329 | 28 | 172 | 48
Atabaki-Pasdar et al. [31] | United Kingdom | Retrospective | MRI (⩾5% hepatic fat content) | RF | 640/1514 | 1011/4617 | Validation cohort | 0.67 | 0.74 | 677 | 838 | 2668 | 334
Chen et al. [32] | China | Retrospective | Ultrasonography | ANN | Total 10,354 | 2218/4436 | Validation cohort | 0.837 | 0.804 | 1857 | 435 | 1783 | 361
Liu et al. [33] | China | Retrospective | Ultrasonography | XGBoost (a), LR, SVM, SGD, CNN, MLP, LSTM | 4018/10,373 | 1860/4942 | Validation cohort | 0.611 | 0.909 | 1136 | 280 | 2802 | 724

AI-assisted diagnosis of NASH in patients at-risk for NASH
Gallego-Duran et al. [21] | Spain | Prospective | Liver biopsy | LR | NASH/NAFLD 21/39 | NASH/NAFLD 44/87 | Validation cohort | 0.87 | 0.60 | 38 | 17 | 26 | 6
Naganawa et al. [34] | Japan | Retrospective | Liver biopsy | LR | Total 53 | NASH/non-NASH 7/28 | Validation cohort | No suspicion of fibrosis: 1.00; Suspicion of fibrosis: 1.00 | No suspicion of fibrosis: 0.92; Suspicion of fibrosis: 0.31 | 4; 3 | 1; 11 | 11; 5 | 0; 0
Uehara et al. [35] | Japan | Retrospective | Liver biopsy | Rule extraction algorithm | NASH/non-NASH 79/23 | NASH/non-NASH 65/12 | Validation cohort | 0.862 | 0.417 | 56 | 7 | 5 | 9
Garcia-Carretero et al. [36] | Spain | Retrospective | Ultrasonography with LFTs | Lasso regression | NASH/non-NASH 204/1587 | NASH/non-NASH 51/397 | Validation cohort | 0.70 | 0.79 | 36 | 83 | 314 | 15
Docherty et al. [37] | United States | Retrospective | Liver biopsy | kNN, RF, XGBoost (a) | NASH/NAFLD 270/152 | NASH/NAFLD 180/102 | Validation cohort | 0.81 | 0.66 | 146 | 34 | 68 | 34

AI-assisted diagnosis of liver fibrosis in NAFLD
Pournik et al. [38] | Iran | Retrospective | Liver biopsy | ANN | Cirrhotic/non-cirrhotic 52/248 | Cirrhotic/non-cirrhotic 15/65 | Validation cohort | 0.657 | 0.987 | 44 | 4 | 309 | 23
Gallego-Duran et al. [21] | Spain | Prospective | Liver biopsy | LR | F0-1/F2-4 20/19 | F0-1/F2-4 56/31 | Validation cohort | F2-4: 0.77 | F2-4: 0.80 | 24 | 11 | 45 | 7
Shahabi et al. [39] | Iran | Retrospective | Elastography | ANN | F0/F1/F2/F3/F4 415/151/132/23/5 | 15% of data set (same proportion) | Validation cohort | F1: 0.993; F2: 0.939; F3: 1.000; F4: 1.000 | F1: 0.757; F2: 0.938; F3: 0.993; F4: 1.000 | - | - | - | -
Okanoue et al. [40] (d) | Japan | Retrospective | Liver biopsy and ultrasonography | ANN | Normal/F0/F1/F2/F3/F4 48/106/74/56/65/23 | F0/F1/F2/F3-F4 17/18/15/24 | Validation cohort | NAFLD (F0) vs. NASH (F1-4): 0.877 | NAFLD (F0) vs. NASH (F1-4): 0.941 | 50 | 7 | 1 | 16
Okanoue et al. [41] (d) | Japan | Retrospective | Liver biopsy | ANN | F0/F1/F2/F3/F4 106/74/56/65/23 | F0/F1/F2/F3-F4 30/27/24/29 | Validation cohort | F0 vs. F1-4: 0.85; F0-1 vs. F2-4: 0.755; F0-2 vs. F3-4: 0.828 | F0 vs. F1-4: 0.867; F0-1 vs. F2-4: 0.877; F0-2 vs. F3-4: 0.877 | 68; 40; 24 | 4; 7; 10 | 26; 50; 71 | 12; 13; 5

AI-assisted steatosis quantification of pathological specimen
Vanderbeck et al. [42] | United States | Retrospective | Pathologist | SVM | Macrosteatosis/other features 1100/859 | N/A | 10-fold cross-validation | 0.98 | 0.94 | 1072 | 48 | 859 | 28
Liu et al. [43] | China | Prospective | Pathologist | Linear regression | Steatosis grade 0: 0; Grade 1: 77; Grade 2: 45; Grade 3: 24 | Steatosis grade 0: 1; Grade 1: 41; Grade 2: 22; Grade 3: 9 | Validation cohort | Steatosis grade 0 vs. ⩾ 1: 0.99; grade ⩽ 1 vs. ⩾ 2: 0.91; grade ⩽ 2 vs. 3: 0.67 | Steatosis grade 0 vs. ⩾ 1: 1.00; grade ⩽ 1 vs. ⩾ 2: 0.85; grade ⩽ 2 vs. 3: 0.98 | 71; 28; 6 | 0; 6; 1 | 1; 36; 63 | 0; 3; 3
Sun et al. [44] | United States | Prospective | Pathologist | CNN | 30 | 66 | Validation cohort | ⩾ 30% steatosis: 0.714 | ⩾ 30% steatosis: 0.973 | 15 | 2 | 73 | 6
Teramoto et al. [45] | Japan | Retrospective | Pathologist | Logistic regression | Matteoni classification [46], Type 1/type 2/type 3-4 (NASH) 33/33/33 | Matteoni classification, Type 1/type 2/type 3-4 (NASH) 33/33/33 | Validation cohort | type 1 vs. NASH: 0.879; type 2 vs. NASH: 0.909 | type 1 vs. NASH: 1.00; type 2 vs. NASH: 0.909 | 29; 30 | 0; 6 | 66; 60 | 4; 3

ANN, artificial neural network; AODE, aggregating one-dependence estimators; BN, Bayesian network; CNN, convolutional neural network; ELM, extreme learning machine; F0-4, METAVIR fibrosis staging; FLD, fatty liver disease; HNB, hidden naïve Bayes; kNN, k-nearest neighbors; LFTs, liver function tests; LR, logistic regression; LSTM, long short-term memory; MLP, multilayer perceptron; MRI, magnetic resonance imaging; NAFLD, non-alcoholic fatty liver disease; NASH, non-alcoholic steatohepatitis; NB, naïve Bayes; RF, random forest; RT, regression tree; SGD, stochastic gradient descent; SVM, support vector machine; XGBoost, extreme gradient boosting.

a: Selected AI in the analysis.

b, c, d: Studies conducted on the same population cohorts.

Quality assessment by QUADAS-2 showed that most studies had a low risk of bias and no applicability concerns, except for four studies with uncertain risk of bias, one study with high risk of bias, and one study with high risk for applicability concerns. Three of the studies with uncertain risk of bias referred to both alcoholic and non-alcoholic fatty liver disease (n = 3); two of these three did not provide a detailed distribution of the patients’ degree of liver steatosis, which could affect the performance of AI-assisted methods. The other study with uncertain risk of bias used a non-standard reference diagnostic method, that is, ultrasonography with elevated liver function tests for the diagnosis of NASH (n = 1). The study with high risk of bias had multiple diagnostic methods, that is, ultrasonography for the control group and liver biopsy for the NAFLD group. Regarding applicability concerns, one high-risk study included genomic data as AI inputs, which could be difficult to obtain in a clinical setting (n = 1). Detailed assessments for each study are summarized in Supplemental Table 1.

Performance of AI-assisted ultrasonography for the diagnosis of NAFLD

The systematic review included six studies incorporating AI into ultrasonography for NAFLD diagnosis.[22-27] Three studies relied on multiple AI classifiers[22,24,27] and three studies utilized a single AI classifier (2 CNN[23,26] and 1 RT). Liver biopsy was employed as the diagnostic method for NAFLD in four studies,[22-24,27] whereas the other two studies chose MRI-PDFF.[25,26] In two studies, 50% of patients had less than 30% steatosis.[23,27] In one study, 92% of patients had less than 20% steatosis, and the other study had a mean steatosis of 11%. Two pairs of studies (Kuppili et al. and Biswas et al.; Byra et al. and Zamanian et al.) were conducted in the same patient cohorts. In the meta-analysis, we included from each population cohort the one study that was more recent and reported the better performance of the AI system.[24,27] Eventually, a total of four studies were included in the meta-analysis.[24-27] The pooled sensitivity, specificity, PPV, NPV, and DOR for the four studies were 0.97 (95% CI: 0.91–0.99), 0.98 (95% CI: 0.89–1.00), 0.98 (95% CI: 0.93–1.00), 0.95 (95% CI: 0.88–0.98), and 599.53 (95% CI: 96.73–3716.06), respectively (Figure 2(a)–(e)). Heterogeneity was relatively low, with I² of 0% for pooled specificity and PPV, and I² of 30%, 29%, and 53% for sensitivity, NPV, and DOR, respectively. Cochran’s Q results were also not significant (p ⩾ 0.1) for all analyses. The SROC curve, with an AUC of 0.98, is shown in Figure 3.
Figure 2.

Sensitivity (a), specificity (b), positive predictive value (c), negative predictive value (d), and diagnostic odds ratio (e) of AI-assisted ultrasonography for the diagnosis of NAFLD.

Figure 3.

SROC curves demonstrating performance of AI-assisted diagnosis of NAFLD (AI-assisted ultrasonography and AI-assisted clinical data sets) and AI-assisted diagnosis of NASH.

We further performed meta-regression with the diagnostic method (liver biopsy vs MRI), AI classifier (neural network vs non-neural network) and study population (general population vs specific at-risk population) as covariates in order to determine whether these factors affected the overall results of the meta-analysis. The p values for the AI classifier, diagnostic method, and study population as covariates were 0.04, 0.04, and 0.23, respectively. This finding suggested that the choice of AI classifier and diagnostic method significantly affected the overall performance of AI-assisted diagnosis of NAFLD. Subgroup analysis by AI classifier revealed that neural network AI had higher sensitivity, NPV, and DOR than non-neural network AI, with sensitivity of 0.98 (95% CI: 0.94–0.99) vs 0.88 (95% CI: 0.73–0.97), NPV of 0.97 (95% CI: 0.92–0.99) vs 0.86 (95% CI: 0.67–0.96) and DOR of 1197.75 (95% CI: 255.84–5607.32) vs 90.00 (95% CI: 15.17–533.81), respectively (Supplemental Table 2). Nevertheless, this subgroup analysis should be interpreted with caution because only one study utilized a non-neural network classifier. Moreover, subgroup analysis by diagnostic method showed that studies using liver biopsy had higher NPV and DOR than studies using MRI, with NPV of 0.98 (95% CI: 0.93–1.00) vs 0.90 (95% CI: 0.79–0.95) and DOR of 4130.95 (95% CI: 368.73–46279.93) vs 200.90 (95% CI: 36.88–1094.46), respectively. Interestingly, heterogeneity was also notably lower in the liver biopsy subgroup, with I² of 0% for all pooled analyses (Supplemental Table 2).
However, no significant difference was found in the subgroup analysis based on the population. Sensitivity analysis excluding studies with uncertain or high risk for bias according to QUADAS-2 showed consistent results, with sensitivity, specificity, PPV, NPV, and DOR of 0.96 (95% CI: 0.87–0.99), 0.95 (95% CI: 0.88–0.98), 0.97 (95% CI: 0.93–0.99), 0.94 (95% CI: 0.82–0.98), and 336.58 (95% CI: 53.80–2105.69), respectively (Supplemental Table 3).
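The very large DOR point estimates and wide confidence intervals reported in this section are typical when studies have few or no false positives or false negatives. The Python sketch below computes a DOR with a Wald-type 95% CI from a single 2 × 2 table. The 0.5 correction for zero cells is a common convention and is an assumption here, since the text does not state how zero cells were handled; the function name and counts are hypothetical.

```python
import math


def diagnostic_odds_ratio(tp, fp, tn, fn):
    """Return (DOR, lower 95% limit, upper 95% limit) from 2 x 2 counts."""
    cells = [tp, fp, tn, fn]
    if any(c == 0 for c in cells):
        # Assumed convention: add 0.5 to every cell when any cell is zero,
        # so that the ratio and its standard error stay finite.
        tp, fp, tn, fn = (c + 0.5 for c in cells)
    dor = (tp * tn) / (fp * fn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / tn + 1 / fn)
    lower = math.exp(math.log(dor) - 1.96 * se_log)
    upper = math.exp(math.log(dor) + 1.96 * se_log)
    return dor, lower, upper


# Hypothetical table with errors in both directions.
print(diagnostic_odds_ratio(90, 20, 80, 10))

# Hypothetical table with no misclassification: the corrected DOR is very
# large and its interval is very wide.
print(diagnostic_odds_ratio(50, 0, 50, 0))
```

Because the standard error of the log DOR is driven by the smallest cells, near-perfect classifiers in small cohorts yield intervals spanning several orders of magnitude.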

Performance of AI-assisted clinical data sets for the diagnosis of NAFLD

We performed a meta-analysis of six studies incorporating AI into clinical data sets for NAFLD diagnosis.[28-33] The clinical data sets primarily included demographic data (age, sex, weight, and height) and laboratory values (liver and renal function tests, lipid profile, and plasma glucose). Multiple AI classifiers were used in four studies,[28-30,33] while the other two studies used a single AI classifier (1 ANN and 1 random forest). Five articles selected ultrasonography as the diagnostic method,[28-30,32,33] while one study relied on MRI. The pooled sensitivity, specificity, PPV, NPV, and DOR were 0.75 (95% CI: 0.66–0.82), 0.82 (95% CI: 0.74–0.88), 0.75 (95% CI: 0.60–0.86), 0.82 (95% CI: 0.74–0.87), and 13.29 (95% CI: 8.32–21.21), respectively (Figure 4(a)–(e)). Figure 3 shows the SROC curve with an AUC of 0.85. We observed a high degree of heterogeneity, with I² of 98%, 99%, 99%, 99%, and 98% for pooled sensitivity, specificity, PPV, NPV, and DOR, respectively.
Figure 4.

Sensitivity (a), specificity (b), positive predictive value (c), negative predictive value (d), and diagnostic odds ratio (e) of AI-assisted clinical data sets for the diagnosis of NAFLD.

Meta-regression performed with diagnostic method and AI classifier as covariates resulted in p values of 0.20 and 0.55, respectively. Subgroup analysis by AI classifier revealed that neural network AI had higher sensitivity and DOR than non-neural network AI, with sensitivity of 0.84 (95% CI: 0.82–0.85) vs 0.72 (95% CI: 0.63–0.80) and DOR of 21.08 (95% CI: 18.08–24.59) vs 12.09 (95% CI: 7.13–20.50), respectively. Subgroup analysis according to the diagnostic method yielded a specificity, PPV, NPV, and DOR of 0.84 (95% CI: 0.75–0.89), 0.80 (95% CI: 0.70–0.87), 0.80 (95% CI: 0.72–0.86), and 15.60 (95% CI: 10.62–22.92) for studies using ultrasonography (n = 5), compared to 0.74 (95% CI: 0.73–0.75), 0.42 (95% CI: 0.40–0.44), 0.89 (95% CI: 0.88–0.90), and 5.77 (95% CI: 4.96–6.70) for the study using MRI (n = 1); specificity, PPV, and DOR were thus higher with ultrasonography, whereas NPV was higher in the MRI study. Since only one study utilized a neural network as the AI classifier and only one study used MRI as the diagnostic method, these subgroup analyses should be interpreted cautiously. Results for the subgroup analyses are presented in Supplemental Table 2. Sensitivity analysis excluding articles with uncertain or high risk for bias revealed similar results, with sensitivity, specificity, PPV, NPV, and DOR of 0.72 (95% CI: 0.63–0.80), 0.83 (95% CI: 0.72–0.90), 0.76 (95% CI: 0.69–0.82), 0.80 (95% CI: 0.70–0.88), and 12.94 (95% CI: 8.74–19.15), respectively (Supplemental Table 3).

Performance of AI-assisted diagnosis of NASH in patients at-risk for NASH

We identified five studies focusing on the diagnosis of NASH among patients with NAFLD or at risk for NAFLD (i.e. obese and hypertensive).[21,34-37] In this category, two studies integrated AI with imaging modalities[21,34] and three studies incorporated AI with clinical data sets.[35-37] Almost all studies selected liver biopsy as the diagnostic method, except for one study which used ultrasonography findings in combination with elevated liver enzymes. The pooled sensitivity, specificity, PPV, NPV, and DOR for the diagnosis of NASH were 0.80 (95% CI: 0.75–0.85), 0.69 (95% CI: 0.53–0.82), 0.71 (95% CI: 0.36–0.91), 0.75 (95% CI: 0.35–0.94), and 8.27 (95% CI: 5.53–12.37), respectively. Heterogeneity varied widely across the pooled measures, with I² ranging from 0% to 98% (Supplemental Figure 1A–1E). The SROC curve showed an AUC of 0.81 (Figure 3).

Performance of AI-assisted diagnosis of liver fibrosis in NAFLD

The systematic review included a total of five studies integrating AI for the diagnosis of liver fibrosis among NAFLD patients.[21,38-41] However, meta-analysis was not feasible due to differences in the diagnostic modalities and outcomes of the included studies. Three studies integrated AI with clinical data[38,39,41] and one study incorporated AI with imaging biomarkers to evaluate liver fibrosis in NAFLD patients. The other study investigated AI-assisted clinical data sets for evaluating both the diagnosis of NASH and fibrosis. Two studies conducted by the same investigator group had overlapping study populations.[40,41] Regarding the diagnostic method in each study, three studies relied on liver biopsy,[21,38,41] one study used elastography, and one study selected liver biopsy and ultrasonography as the diagnostic methods for the NAFLD group and control group, respectively. Overall, the reported sensitivity and specificity varied by stage of fibrosis. For example, one study found that the performance for identifying METAVIR F1-F4 ranged from a sensitivity of 0.993 for F1 to 1.00 for F4 and a specificity of 0.757 for F1 to 1.00 for F4.

Performance of AI-assisted steatosis quantification in pathological specimen

Our systematic review identified four studies integrating AI with pathological imaging analysis for steatosis quantification and diagnosis of NAFLD.[42-45] The outcomes differed between studies and included grading steatosis, differentiating macrosteatosis from other structures, identifying significant steatosis or macrosteatosis, and diagnosing NASH among NAFLD samples. Therefore, meta-analysis was not performed. All studies relied on pathologist assessment as the reference standard. The diagnostic performance varied by study outcome. For example, the AI-assisted identification of macrosteatosis showed a sensitivity and specificity of 0.98 and 0.94, respectively, while the sensitivity and specificity for diagnosing ⩾30% steatosis were 0.714 and 0.973, respectively. The performance of the AI-assisted system for steatosis grading according to the NASH Clinical Research Network histological scoring system ranged from a sensitivity of 0.99 for grade 1 to 0.67 for grade 3 and a specificity of 1.00 for grade 1 to 0.98 for grade 3 steatosis. Furthermore, the AI-assisted pathological identification of NASH among NAFLD had a sensitivity and specificity of 0.879–0.909 and 0.909–1.00, respectively.

Publication bias

The Deeks funnel plots were relatively symmetrical, with slope-coefficient p-values of 0.40 for AI-assisted ultrasonography for the diagnosis of NAFLD, 0.78 for AI-assisted clinical data sets for the diagnosis of NAFLD, and 0.23 for AI-assisted clinical data sets for the diagnosis of NASH, indicating that no publication bias was detected among the selected studies (Supplemental Figure 2A–C).
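For readers unfamiliar with the Deeks test, the slope coefficient comes from a regression of the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size. The sketch below illustrates that regression in plain Python. It is not the authors' analysis code; the helper names and the 2×2 counts in `hypothetical_studies` are assumptions made for illustration. The p-value would be obtained by comparing slope divided by its standard error against a t distribution with k − 2 degrees of freedom, which is left out here to keep the example dependency-free.

```python
import math
from typing import List, Sequence, Tuple

Study = Tuple[int, int, int, int]  # (TP, FP, FN, TN) for one study


def log_dor(tp: float, fp: float, fn: float, tn: float) -> float:
    """Natural log of the diagnostic odds ratio, with a 0.5 correction if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(tp: int, fp: int, fn: int, tn: int) -> float:
    """ESS = 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)."""
    diseased, healthy = tp + fn, fp + tn
    return 4.0 * diseased * healthy / (diseased + healthy)


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares slope of y on x and its standard error."""
    k = len(x)
    if k < 3:
        raise ValueError("at least three studies are needed")
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (k - 2) / sxx)
    return slope, se


def deeks_slope(studies: List[Study]) -> Tuple[float, float]:
    """Slope and standard error of the Deeks funnel plot asymmetry regression."""
    ess = [effective_sample_size(*s) for s in studies]
    x = [1.0 / math.sqrt(e) for e in ess]
    y = [log_dor(*s) for s in studies]
    return weighted_slope(x, y, ess)


# Hypothetical 2x2 counts, not taken from the included studies.
hypothetical_studies = [(45, 5, 5, 45), (80, 15, 20, 85), (30, 2, 6, 62), (150, 20, 30, 200)]
slope, se = deeks_slope(hypothetical_studies)
print(f"slope={slope:.3f}, standard error={se:.3f}, t={slope / se:.3f}")
```

A slope close to zero relative to its standard error corresponds to a symmetrical plot, that is, no evidence that smaller studies report systematically different diagnostic odds ratios.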

Discussion

This systematic review and meta-analysis identified several types of AI-assisted methods to diagnose NAFLD, NASH, and fibrosis among NAFLD patients and to quantify liver steatosis in pathological specimens. The meta-analysis showed excellent performance of AI-assisted ultrasonography for the diagnosis of NAFLD, with an AUC of 0.98 and relatively low heterogeneity. Combining AI with clinical data sets also demonstrated acceptable performance for the diagnosis of NAFLD, with an AUC of 0.85 but a higher degree of heterogeneity, which was likely due to variations in clinical input data.
Integrating AI into ultrasonography can improve the performance of NAFLD diagnosis. Ultrasonography is widely available in most hospitals and healthcare facilities, the equipment is relatively inexpensive, and the procedure is non-invasive. However, because image analysis is user-dependent, it is subject to inter- and intra-observer variation, and conventional ultrasonography is often less reliable for diagnosing early-stage NAFLD. Incorporating AI into ultrasonography image analysis can therefore both reduce human-related errors and improve overall performance. Three of the four studies in our meta-analysis enrolled patients with mild steatosis (50–92% of patient cohorts had less than 30% steatosis), emphasizing the ability of AI-integrated methods to identify early-stage steatosis. The pooled results for AI-assisted ultrasonography were promising, with sensitivity, specificity, PPV, and NPV all 0.95 or above and an AUC of 0.98. Heterogeneity was also relatively low, with I² below 40% for all pooled analyses except the DOR (I² of 53%). Subgroup analysis by AI classifier indicated superior performance of neural network AI over non-neural network AI. Interestingly, subgroup analysis also showed that studies using liver biopsy as the diagnostic method had a significantly lower degree of heterogeneity, with I² of 0% for all pooled analyses, implying that differences in diagnostic method could be the cause of heterogeneity for AI-assisted ultrasonography. Moreover, compared with currently available imaging modalities, the performance of the AI-assisted system for the diagnosis of NAFLD exceeded that of conventional ultrasonography, TE, and dual-gradient echo magnetic resonance imaging (DGE-MRI) reported in previous studies (Table 2).[47,48] Our results support the benefits and robustness of AI-assisted ultrasonography. Nevertheless, randomized controlled trials with head-to-head comparisons between AI-assisted systems and conventional imaging modalities are warranted to validate the performance differences.
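The I² values quoted here express the share of between-study variability that exceeds what chance alone would produce. As a rough illustration, I² can be derived from Cochran's Q as max(0, (Q − df)/Q) × 100, where df is the number of studies minus one. The Python sketch below assumes simple inverse-variance weighting of per-study effect estimates (for example, logit-transformed sensitivities). The function name and the inputs are hypothetical, and the pooling model used in the meta-analysis itself may differ.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2 in percent) using fixed-effect inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2 is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical logit-scale effects and variances for four studies.
q, i2 = cochran_q_and_i2([2.9, 3.4, 3.1, 4.0], [0.20, 0.35, 0.15, 0.50])
print(f"Q={q:.2f}, I2={i2:.1f}%")
```

Under this definition, an I² of 0% means the observed spread of study estimates is no larger than expected from sampling error, which matches the interpretation given for the liver biopsy subgroup.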
Table 2.

Comparisons between the performance of AI-assisted systems in this meta-analysis and the performance of conventional methods reported in previous studies for the diagnosis of NAFLD.

| Analysis | AI-assisted ultrasonography | AI-assisted clinical datasets | Conventional ultrasonography[48] (⩾5% steatosis) | Transient elastography[47] (S0 vs S1–3) | DGE-MRI[48] (⩾5% steatosis) |
| --- | --- | --- | --- | --- | --- |
| Sensitivity | 0.97 (0.91–0.99) | 0.75 (0.66–0.82) | 0.62 (0.49–0.73) | 0.69 (0.60–0.75) | 0.77 (0.65–0.86) |
| Specificity | 0.98 (0.89–1.00) | 0.82 (0.74–0.88) | 0.81 (0.72–0.88) | 0.82 (0.76–0.90) | 0.87 (0.79–0.92) |
| Positive predictive value | 0.98 (0.93–1.00) | 0.75 (0.60–0.86) | 0.66 (0.53–0.77) | | 0.78 (0.66–0.87) |
| Negative predictive value | 0.95 (0.88–0.98) | 0.82 (0.74–0.87) | 0.78 (0.69–0.85) | | 0.86 (0.78–0.92) |
| AUC | 0.98 | 0.85 | | 0.82 | 0.88 |

DGE-MRI, dual-gradient echo magnetic resonance imaging.
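When comparing the predictive values in Table 2, it helps to remember that PPV and NPV depend on disease prevalence in the study population, whereas sensitivity and specificity do not. The Python sketch below applies Bayes' rule to the pooled sensitivity and specificity reported for AI-assisted clinical data sets (0.75 and 0.82). The prevalence values in the loop are hypothetical and chosen only to show the trend; `ppv_npv` is an illustrative helper, not part of the original analysis.

```python
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive values from sensitivity, specificity, and prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    tp = sensitivity * prevalence                   # true-positive fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false-positive fraction
    fn = (1.0 - sensitivity) * prevalence           # false-negative fraction
    tn = specificity * (1.0 - prevalence)           # true-negative fraction
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Pooled values for AI-assisted clinical data sets from Table 2; prevalences are assumed.
for assumed_prevalence in (0.10, 0.25, 0.50):
    ppv, npv = ppv_npv(0.75, 0.82, assumed_prevalence)
    print(f"prevalence={assumed_prevalence:.2f}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

In a low-prevalence screening population the same test yields a lower PPV and a higher NPV than in a referral cohort, so predictive values from different studies are only directly comparable when the underlying prevalence is similar.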

AI has also been employed to analyze large clinical data sets with various inputs, such as demographic data, physical findings, and laboratory results. The performance of AI in this category is promising but less satisfactory, showing only moderate accuracy compared with AI-assisted ultrasonography (AUC: 0.85 vs 0.98) and large heterogeneity (I²: 98–99%). To identify the source of heterogeneity, we performed a meta-regression, which suggested that heterogeneity was not driven by differences in AI classifier or diagnostic method. We hypothesized that the differences in performance are likely due to differences in the type and quantity of the AI inputs. The inputs for AI-assisted ultrasonography are usually images of the liver, which contain diverse and potentially relevant features for the AI to extract. In contrast, the inputs for AI-assisted clinical data sets are limited to clinical parameters pre-selected by the investigators. The number of clinical parameters fed into the AI was relatively small and varied greatly among studies. This may explain the lower performance and higher degree of heterogeneity. Nonetheless, the overall performance of AI-assisted clinical data sets was still comparable to that of TE and slightly lower than that of DGE-MRI, as shown in Table 2. Applying AI to patient information already available in routine clinical practice could provide a preliminary screening method to identify patients at risk for NAFLD, especially in resource-limited settings where TE or MRI machines are unavailable or cost-prohibitive.
Other applications of AI in NAFLD are the identification of NASH and fibrosis, which could offer substantial clinical benefits because the degree of hepatic inflammation or fibrosis is associated with liver-related mortality. Regarding the AI-assisted diagnosis of NASH, our meta-analysis showed an acceptable sensitivity of 80% and an AUC of 0.8 but relatively high heterogeneity. We hypothesized that differences in diagnostic methods and study populations might in part contribute to the high heterogeneity. Given the limited number of studies included in the meta-analysis (n = 3), the results need to be interpreted with caution, and more studies on this topic are required for a more comprehensive analysis. Moreover, various scoring systems and imaging modalities have been proposed as screening tools for early detection of fibrosis, including the aspartate aminotransferase-to-platelet ratio index (APRI) and the Fibrosis-4 score (FIB-4). A previous meta-analysis reported relatively low diagnostic performance of these conventional scoring systems, with a pooled sensitivity and specificity of 60% and 77%, respectively, for diagnosing significant fibrosis, and 67% and 77%, respectively, for diagnosing advanced fibrosis. We found that integrating AI into clinical data sets provided a better tool for screening fibrosis. For example, AI-integrated clinical data sets had a sensitivity and specificity of 0.99–1.00 and 0.76–1.00, respectively, for diagnosing liver fibrosis stage F1–F4. Nevertheless, more studies focusing on using AI to improve the diagnostic capabilities of clinical scoring systems are critically needed.
The last application of AI in NAFLD is the quantification of liver steatosis in pathological specimens. Previous studies have shown that conventional assessment of pathological specimens is susceptible to inter- and intra-observer variation and is time-consuming.[49,50] AI-supported analysis has been shown to provide reliable results with acceptable performance, including a sensitivity and specificity of 0.71 and 0.97 for the diagnosis of more than 30% steatosis, as well as 0.67–0.99 and 0.85–1.00 for steatosis grading.
This manuscript represents one of the very first meta-analyses focusing on the application of AI in the diagnosis of NAFLD. We conducted a comprehensive literature search that included articles from medical, computer science, and engineering journals. Our selection criteria also included only articles with clear validation methods, which is crucial for evaluating the performance of AI technology.
We recognize that this study has some limitations. No AI algorithms were completely identical among the included articles. Because AI inputs differed slightly among studies classified as similar, the pooled diagnostic performance must be interpreted with caution. More studies in each subgroup are required for comprehensive subgroup analysis. Another limitation is the difference in diagnostic method among the included studies. The gold standard for the diagnosis of NAFLD and steatosis quantification is liver biopsy, with MRI-PDFF as the best alternative. However, some studies integrating AI with clinical data sets relied on ultrasonography instead, which may affect the performance results. To accurately evaluate the performance of an AI-assisted diagnostic system, liver biopsy or MRI-PDFF should be employed as the diagnostic method for NAFLD. Finally, prospective or randomized controlled studies comparing AI-supported analysis with conventional methods would be beneficial in assessing the potential utility of AI in clinical practice.

Conclusion

AI-assisted ultrasonography and clinical data sets delivered satisfactory performance as diagnostic tools for NAFLD. AI-assisted systems used in the identification of fibrosis and NASH, as well as the quantification of steatosis in pathological specimens, also yielded promising results, albeit based on a limited number of studies available for review. Randomized controlled studies or prospective studies are warranted to validate the benefit of AI use in clinical settings.

Supplemental material, sj-docx-1-tag-10.1177_17562848211062807 and sj-docx-2-tag-10.1177_17562848211062807, for Application of artificial intelligence in non-alcoholic fatty liver disease and liver fibrosis: a systematic review and meta-analysis by Pakanat Decharatanachart, Roongruedee Chaiteerakij, Thodsawit Tiyarattanachai and Sombat Treeprasertsuk in Therapeutic Advances in Gastroenterology.