Abstract
Ulcerative colitis is a bowel disease of unknown cause. This research is a proof-of-concept exercise focused on determining whether it is possible to identify the genes associated with ulcerative colitis using artificial intelligence. Several machine learning and artificial neural network analyses were performed using an autoimmune discovery transcriptomic panel of 755 genes to predict and model ulcerative colitis versus healthy donors. The dataset GSE38713 of 43 cases from the Hospital Clinic of Barcelona was selected, and 16 models were used, including C5, logistic regression, Bayesian network, discriminant analysis, KNN algorithm, LSVM, random trees, SVM, Tree-AS, XGBoost linear, XGBoost tree, CHAID, Quest, C&R tree, random forest, and neural network. Conventional analyses, including a volcano plot and gene set enrichment analysis (GSEA), were also performed. As a result, ulcerative colitis was successfully predicted with several machine learning techniques and artificial neural networks (multilayer perceptron), with an overall accuracy of 95-100%, and relevant pathogenic genes were highlighted. One of them, programmed cell death 1 ligand 1 (PD-L1, CD274, PDCD1LG1, B7-H1), was validated by immunohistochemistry in a series from the Tokai University Hospital. In conclusion, artificial intelligence analysis of transcriptomic data of ulcerative colitis is a feasible analytical strategy.
Keywords: PD-L1; artificial intelligence; artificial neural networks; autoimmunity; immune checkpoint; immune microenvironment; immuno-oncology; machine learning; transcriptome; ulcerative colitis
Year: 2022 PMID: 36011133 PMCID: PMC9408181 DOI: 10.3390/healthcare10081476
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Figure 1. Volcano plot. This type of plot is useful for identifying genes whose expression differs significantly between healthy controls and active ulcerative colitis. It relates fold change to p values. Upregulated genes are highlighted in red and downregulated genes in blue.
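The relation between fold change and p value that a volcano plot displays can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `classify_gene`, the thresholds (`min_abs_log2fc=1.0`, `alpha=0.05`), and the assumption that the inputs are linear-scale group means with a precomputed p value are all hypothetical.

```python
import math


def classify_gene(mean_uc, mean_control, p_value,
                  min_abs_log2fc=1.0, alpha=0.05):
    """Return the volcano-plot coordinates and a label for one gene.

    x coordinate: log2 fold change (ulcerative colitis vs. control).
    y coordinate: -log10(p value).
    The thresholds are illustrative assumptions, not values from the paper.
    """
    log2fc = math.log2(mean_uc / mean_control)
    neg_log10_p = -math.log10(p_value)

    if p_value < alpha and log2fc >= min_abs_log2fc:
        label = "up"      # would be drawn in red
    elif p_value < alpha and log2fc <= -min_abs_log2fc:
        label = "down"    # would be drawn in blue
    else:
        label = "ns"      # not significant under these thresholds
    return log2fc, neg_log10_p, label


if __name__ == "__main__":
    # Made-up expression values, for illustration only.
    for args in [(8.0, 2.0, 0.001), (1.0, 4.0, 0.01), (2.0, 2.0, 0.5)]:
        print(args, classify_gene(*args))
```

If the expression values are already log2-transformed, as is common for microarray data, the fold change is the difference of the group means rather than the log of their ratio.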
Figure 2. Gene set enrichment analysis (GSEA) using an autoimmune discovery panel. The GSEA confirmed that the a priori gene set of the autoimmune discovery panel differed significantly between ulcerative colitis and healthy controls, with enrichment toward ulcerative colitis. The most relevant genes of the leading edge were IL1RN, MMP3, OSMR, FCGR3B, FCGR3A, TNC, TNFRSF6B, CD274 (PD-L1), PLAU, and S100A9.
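The idea of an enrichment score and a leading edge can be sketched with a running sum over a ranked gene list. The example below uses the simple unweighted (Kolmogorov-Smirnov-like) form, which is an assumption made for brevity; the GSEA software typically applies a weighted statistic and permutation testing, neither of which is shown. The function name and the placeholder genes (`GENE_X`, etc.) are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score and leading-edge genes.

    ranked_genes: genes ordered from most upregulated to most
                  downregulated in the phenotype of interest.
    gene_set:     the a priori set being tested.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and outside the set")

    running, best, best_idx = 0.0, 0.0, -1
    for i, is_hit in enumerate(hits):
        # Step up on a set member, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best, best_idx = running, i

    if best >= 0:
        # Positive enrichment: set members ranked at or before the peak.
        leading = [g for g in ranked_genes[:best_idx + 1] if g in gene_set]
    else:
        # Negative enrichment: set members ranked after the trough.
        leading = [g for g in ranked_genes[best_idx + 1:] if g in gene_set]
    return best, leading


if __name__ == "__main__":
    # Toy ranking: two panel genes among four placeholder genes.
    ranking = ["MMP3", "GENE_X", "OSMR", "GENE_Y", "GENE_Z", "GENE_W"]
    print(enrichment_score(ranking, {"MMP3", "OSMR"}))
```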
Prediction of ulcerative colitis using machine learning and artificial neural network modeling.
| Model | Overall Accuracy (%) | No. Fields (Genes) Used | Most Relevant Genes |
|---|---|---|---|
| C5 | 100 | 2 | |
| Logistic regression | 100 | 734 | |
| Discriminant | 100 | 734 | - |
| LSVM | 100 | 734 | |
| SVM | 100 | 734 | - |
| XGBoost Linear | 100 | 734 | - |
| XGBoost Tree | 100 | 734 | - |
| Neural Network | 100 | 734 | |
| CHAID | 97.7 | 2 | |
| Random Forest | 97.7 | 734 | |
| KNN Algorithm | 95.4 | 734 | - |
| C&R Tree | 95.4 | 12 | |
| Quest | 83.7 | 6 | |
| Bayesian Network | 65.1 | 734 | - |
| Random Trees | 0 | 734 | N/A |
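Overall accuracy is the percentage of cases classified correctly. As a hedged illustration, and assuming each model was evaluated on all 43 cases of the dataset (an assumption; the evaluation split is not stated here), the snippet below shows how a count of correct predictions maps to a percentage. The function name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(n_correct, n_total):
    """Percentage of cases whose predicted class matches the true class."""
    if n_total <= 0 or not 0 <= n_correct <= n_total:
        raise ValueError("counts must satisfy 0 <= n_correct <= n_total, n_total > 0")
    return 100.0 * n_correct / n_total


if __name__ == "__main__":
    # Assumption: 43 evaluated cases, as in the dataset size given in the abstract.
    for n_correct in (43, 42, 36, 28):
        print(n_correct, round(overall_accuracy(n_correct, 43), 1))
```

Under that assumption, an accuracy of 97.7% would correspond to a single misclassified case out of 43.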
Figure 3. Modeling ulcerative colitis versus healthy controls using C5 tree, CHAID tree, and artificial neural networks. Several machine learning techniques, including artificial neural networks, were used to predict ulcerative colitis using gene expression data from the autoimmune discovery panel. This figure shows the results of the C5 tree (which used GART and IL21R genes in the final model), CHAID tree (IP6K1 and ZFP90), and the neural network (which used the 734 genes of the autoimmune discovery panel). The accuracy of these 3 methods was high: 100%, 98%, and 100%, respectively.
Figure 4. Modeling ulcerative colitis versus healthy controls using random forest and Bayesian network. This figure shows the results of modeling the prediction of ulcerative colitis against healthy controls using gene expression data from the autoimmune discovery panel. The random forest plot shows the genes of the model, ranked according to their predictor importance. The Bayesian network also predicted the ulcerative colitis cases (subtype 2 in the figure). The Bayesian network shows the genes (nodes) and the probabilistic, or conditional, independencies between them. Causal relationships may be represented, but the links (arcs) of the network do not necessarily represent direct cause and effect.
Prediction of ulcerative colitis (active, non-involved active, and inactive) using machine learning and artificial neural network modeling.
| Model | Overall Accuracy (%) | No. Fields (Genes) Used | Most Relevant Genes |
|---|---|---|---|
| Logistic regression | 100 | 734 | - |
| Discriminant | 100 | 734 | - |
| SVM | 100 | 734 | - |
| XGBoost Linear | 100 | 734 | - |
| XGBoost Tree | 100 | 734 | - |
| CHAID | 97.7 | 4 | |
| Random Forest | 97.7 | 734 | |
| Neural Network | 97.7 | 734 | |
| Bayesian Network | 95.4 | 734 | - |
| KNN Algorithm | 93.0 | 734 | - |
| LSVM | 86.1 | 734 | - |
| C5 | 83.7 | 2 | |
| C&R Tree | 65.1 | 6 | |
| Quest | 62.8 | 6 | |
| Random Trees | 0 | 734 | N/A |
Figure 5. Modeling ulcerative colitis versus healthy controls. The target variable was the disease status: ulcerative colitis (involved active (2), non-involved active (3), and inactive/remission (4)) or healthy controls (1). Using a CHAID tree and the expression of 4 genes (MMP3, OSMR, GSDMB, and ZFP90), it was possible to classify the cases by histological subtype with 97.7% accuracy.
Figure 6. Modeling ulcerative colitis versus healthy controls. The target variable was the disease status: ulcerative colitis (involved active (2), non-involved active (3), and inactive/remission (4)) or healthy controls (1). Using an artificial neural network, it was possible to classify the patients with 97.7% accuracy; the most relevant gene for predicting the subtype was UBASH3A. The modeling was also completed with a Bayesian network and a C5 tree. Of note, the C5 tree used only 2 genes, CD274 (PD-L1) and SULTA1, and had an accuracy of 83.7%.
Figure 7. Programmed cell death 1 ligand 1 (PD-L1, CD274) expression in ulcerative colitis. PD-L1 expression was higher in ulcerative colitis than in healthy controls (p = 0.015). Ulcerative colitis samples were characterized by disruption of the epithelial layer, inflammation of the lamina propria, and crypt branching, shortening, and disarray.
Clinicopathological characteristics of the cases of ulcerative colitis.
| Type | | Biopsy | Age | Sex | Baron Score | Geboes Score |
|---|---|---|---|---|---|---|
| Control | 0.84 | Rectum | 64 | Male | - | - |
| Control | 1.14 | Descending | 56 | Male | - | - |
| Control | 1.45 | Descending | 59 | Male | - | - |
| Control | 1.45 | Rectum | 26 | Male | - | - |
| Control | 3.09 | Rectum | 59 | Female | - | - |
| Ulcerative colitis | 1.38 | Rectum | 51 | Male | 1 | 1 |
| Ulcerative colitis | 1.47 | Sigmoid | 31 | Female | 2 | 2 |
| Ulcerative colitis | 1.61 | Rectum | 37 | Female | 1 | 2 |
| Ulcerative colitis | 1.80 | Rectum | 37 | Female | 2 | 2 |
| Ulcerative colitis | 2.06 | Rectum | 33 | Male | 2 | 3 |
| Ulcerative colitis | 2.24 | Rectum | 77 | Female | 2 | 2 |
| Ulcerative colitis | 2.97 | Rectum | 46 | Male | 1 | 3 |
| Ulcerative colitis | 2.98 | Sigmoid | 41 | Male | 2 | 3 |
| Ulcerative colitis | 4.74 | Rectum | 59 | Male | 1 | 2 |
| Ulcerative colitis | 6.34 | Rectum | 23 | Male | 2 | 2 |
| Ulcerative colitis | 4.04 | Rectum | 22 | Female | 2 | 4 |
| Ulcerative colitis | 6.52 | Sigmoid | 43 | Female | 3 | 2 |
| Ulcerative colitis | 6.89 | Descending | 54 | Female | 2 | 4 |
| Ulcerative colitis | 10.99 | Rectum | 20 | Male | 3 | 4 |
| Ulcerative colitis | 14.55 | Descending | 17 | Female | 2 | 2 |
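As a rough check of the group difference, the numeric value that follows the Type in each row of the table can be summarized per group. The sketch below assumes that this unlabeled column holds the quantified PD-L1 measurement (an assumption; the column has no header here) and only compares medians; it does not reproduce the statistical test behind the reported p value.

```python
from statistics import median

# Values copied from the table above; the interpretation of this column
# as the PD-L1 measurement is an assumption.
control = [0.84, 1.14, 1.45, 1.45, 3.09]
ulcerative_colitis = [
    1.38, 1.47, 1.61, 1.80, 2.06, 2.24, 2.97, 2.98,
    4.74, 6.34, 4.04, 6.52, 6.89, 10.99, 14.55,
]

median_control = median(control)
median_uc = median(ulcerative_colitis)

if __name__ == "__main__":
    print("control median:", median_control)
    print("ulcerative colitis median:", median_uc)
```

With these values the median is 1.45 for the 5 controls and 2.98 for the 15 ulcerative colitis cases.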