Ji Li, Jia-Ming Qian
Department of Gastroenterology, Chinese Academy of Medical Sciences & Peking Union Medical College Hospital, Beijing 100730, China.
In the past decade, artificial intelligence (AI) has been applied in several clinical studies to improve the medical care of patients with gastroenterological diseases: to accurately detect polyps and early cancer lesions, to facilitate the analysis of inflammatory lesions, to assess liver fibrosis, and to predict the response to and clinical outcomes of medications. Inflammatory bowel disease (IBD), comprising Crohn disease (CD) and ulcerative colitis (UC), is a prototypical complex disease that places a significant burden on the healthcare systems of Western countries. Machine learning (ML), one major form of AI, iteratively refines models to progressively improve performance on a specific task, using methods including random forest (RF), support vector machine (SVM), longitudinal regression (LR), and elastic net regularized generalized linear model. ML has been used in several clinical studies of IBD patients to determine the differential diagnosis of IBD, to assess IBD, and to predict the response to and clinical outcomes of medications used to treat IBD. However, ML-assisted diagnostic or predictive measures have not yet been applied in the daily medical care of IBD patients. Here, we briefly review some representative studies and then address the challenges and opportunities of AI in the medical care of IBD patients.

Based on the clinical, endoscopic, and histopathologic features, the majority of CD and UC cases can be clearly differentiated.
However, approximately 10% to 30% of patients cannot be categorized as having CD or UC and are defined as IBD unclassified. Several biomarkers, including anti-Saccharomyces cerevisiae antibodies and antineutrophil cytoplasmic antibodies, T helper lymphocyte polarization in the lamina propria, and certain micro-RNAs (miRNAs), have been indicated to have discriminating potential for the differential diagnosis of IBD, but they are not yet well adopted in clinical practice. Some studies attempted to construct a discriminating model using ML to improve the accuracy of the differential diagnosis of IBD based on endoscopic and histopathologic findings, single nucleotide polymorphisms (SNPs), or microbiota. One retrospective study with a large sample size enrolled 20,076 CD and 15,307 UC patients from the International IBD Genetics Consortium genomic databases and constructed a combined model involving multiple SNPs using SVM, which had an area under the receiver operating characteristic curve (AUROC) of 0.864 for the differential diagnosis between CD and UC. Another study combined the endoscopic and histopathologic lesions in pediatric IBD patients (143 CD and 67 UC) to construct a model using SVM, which had an AUROC of 0.87. One study analyzed the gut microbiota in 20 CD and 19 UC patients and constructed a model based on operational taxonomic units using RF, which had an AUROC of 0.72 for the differential diagnosis between CD and UC. However, all these studies were retrospective and did not involve a large sample of IBD unclassified patients. Besides the abovementioned studies, a few studies used ML to construct different models to discriminate UC or CD from healthy controls based on SNPs, miRNAs, or multi-omics, with accuracy ranging from 78.9% to 92.8%.

Endoscopic procedures are important for assessing IBD.
Some studies focused on wireless capsule endoscopy for evaluating ulcerative lesions of the gut in CD patients using SVM or convolutional networks, and the accuracy of these models was relatively high, ranging from 89.3% to 93.8%. However, these studies did not systematically evaluate the endoscopic disease status of CD patients. One study from Japan used endocytoscopy to evaluate active histopathologic lesions with SVM and obtained an algorithm with a high accuracy of 91.0% in 187 UC patients and 22,853 images. Another study enrolled 952 UC patients and 30,322 colonoscopic images and constructed a convolutional network model with high AUROCs of 0.94–0.99 for differentiating between Mayo 0–1 and Mayo 2–3 lesions. As discussed above, endoscopic studies using ML involved large numbers of images and showed significant potential for endoscopic disease assessment.

Clearly determining the predictors of response to, and the clinical outcomes of, certain medications is important for selecting effective treatments for IBD. Waljee et al used RF methods in several studies to construct different predictive models in IBD patients. They developed an algorithm involving age and laboratory tests to evaluate disease remission in patients taking thiopurine, which had an AUROC of 0.79, greater than the 0.49 obtained from 6-thioguanine nucleotide (6-TGN) levels, indicating the superiority of the ML-based algorithm and the ineffectiveness of the 6-TGN test in predicting disease remission.
Another study obtained the data of the phase 3 clinical trial of vedolizumab in CD patients through the Clinical Study Data Request website and constructed a model using routine laboratory tests (C-reactive protein and biochemical tests) 6 weeks after vedolizumab administration, with an AUROC of 0.75, suggesting the possibility of using laboratory data at week 6 to predict disease remission at week 52. Through the Veterans Health Administration Electronic Database, they also constructed an RF longitudinal algorithm model using the demographics, laboratory tests, and medications of 20,368 IBD patients, with a high AUROC of 0.85 in predicting hospitalization; this model was considered beneficial for stratifying patients by their risk of hospitalization and for individualizing medication for each patient.

Following the increased clinical use of immunosuppressants and biological agents, the incidence of surgery for IBD has decreased. However, it is still important to determine the predictors of surgery. One study enrolled 239 CD patients and constructed algorithms from the patients’ clinical manifestations, radiologic findings, and laboratory tests using several ML methods, including RF, SVM, and artificial neural networks (ANNs). The RF model had the highest accuracy, 96.26%. An Indian team used RF to construct an algorithm to predict colectomy in patients with severe colitis, with an accuracy of 77%.

Big data refers to sets of data whose scale and complexity require the use of dedicated analytical and statistical approaches. Currently, the most important data sources for medical big data include administrative databases, online clinical trial registries, electronic medical records, medical images, and omics data, which have been used in the abovementioned studies.
However, these datasets were heterogeneous, were retrospectively collected, and lacked a randomized controlled study design, limiting their value for future clinical application. Meanwhile, the lack of a gold standard for establishing the diagnosis of IBD, the poor consistency of disease assessment based on clinical manifestations, endoscopic scores, and histopathologic scores in CD patients, and the many novel biological agents for the treatment of IBD pose significant challenges to constructing high-quality datasets. Considering the increasing prevalence of IBD, developing countries have significant opportunities to construct big datasets. High-quality databases are consistently needed, with the following characteristics: high volume, high speed of data collection, a significant number of structured or unstructured variables, and inclusion of crossover and longitudinal clinical information and novel biomarkers based on multi-omics.

Never causing foreseeable or unintentional harm is the cornerstone of the ethics of AI. At present, the majority of AI studies in the field of IBD report accuracies varying from 72% to 96%. Misdiagnosis or misclassification by an AI system is therefore inevitable, possibly affecting the decision-making process and the patient's beneficence, preference, and right of informed consent. Minority bias, label bias, agency bias, and informed mistrust might all contribute to bias in model development and deployment, which affects the fairness and accuracy of AI-derived algorithms. Sometimes, disparity is the major confounding factor influencing the accuracy and efficacy of the model.
For example, there are significant genotypic differences between the East and West, specifically the rarity of nucleotide-binding oligomerization domain-containing protein 2 (NOD2) mutations in Asian CD patients. When deploying a model, these disparities should be taken into consideration.

Considering the AI model's generic characteristics and the unavailability or lack of transparency of its calculation process, it is always challenging to conduct external validation of an AI model. Moreover, it is also relatively difficult to determine why an AI model erred when it fails in clinical practice. Hence, AI models should be explainable, that is, able to explain what happens when the model makes a decision. However, more powerful ML models involve more parameters, which might not be easily explainable. Hence, when making decisions using an AI-assisted system in future clinical practice, healthcare providers should be well informed about the system.

One of the most important drivers of the development of AI is the evident progress of computer science and bioinformatics, which is essential to adequately manage and integrate big data. Several methods have been developed with varying levels of performance. Among the abovementioned studies, several compared multiple ML methods (RF, ANN, SVM, and LR) to determine the best algorithm model. Statisticians should be involved in the study design as early as possible, especially in multi-center clinical studies with large sample sizes and novel interventions, as they can give valuable suggestions for data collection, data analysis with appropriate methods, and interpretation of the results.

Following the application and progress of AI in economics and some medical fields, AI will also contribute to improving the medical care of IBD patients.
On the road to applying AI-assisted medicine in IBD, prospective randomized controlled studies with large sample sizes and high-quality databases are required to determine its effectiveness and safety. Meanwhile, it is essential to address the ethical concerns and to facilitate collaboration among clinicians, statisticians, and bioinformaticians.
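Most of the studies reviewed above report model performance as an AUROC. As a reading aid, the following minimal Python sketch shows one standard way to compute that quantity: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is not taken from any of the cited studies; the function name `auroc` and the toy data are hypothetical and serve only to illustrate why a value near 0.5 (such as the 0.49 reported for 6-TGN levels) indicates discrimination no better than chance.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUROC as P(score of a positive > score of a negative).

    labels: 1 for a positive case (e.g. remission), 0 for a negative case.
    scores: predicted risk or marker value, higher meaning "more likely positive".
    Tied positive/negative pairs contribute 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, invented for illustration only.
remission = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # a reasonably informative score
uninformative = [0.5] * 6                        # a marker carrying no information

print(auroc(remission, model_scores))
print(auroc(remission, uninformative))
```

In this toy example the informative score ranks eight of the nine positive/negative pairs correctly, whereas the constant marker yields exactly 0.5. The pairwise loop is quadratic in the number of cases and is chosen for clarity; real analyses would normally rely on an established statistics library and report confidence intervals.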
Funding
This work was supported by grants from the Fundamental Research Funds for the Central Universities (No. 3332018012) and the Funding for Elite Training in the Dongcheng District, Beijing (No. 2018-34).