Jan K Nowak1, Cyntia J Szymańska1, Aleksandra Glapa-Nowak1, Rémi Duclaux-Loras2, Emilia Dybska1, Jerzy Ostrowski3,4, Jarosław Walkowiak1, Alex T Adams5.
Abstract
Although big data from transcriptomic analyses have helped transform our understanding of inflammatory bowel disease (IBD), they remain underexploited. We hypothesized that the application of machine learning using lasso regression to transcriptomic data from IBD patients and controls can help identify previously overlooked genes. Transcriptomic data provided by Ostrowski et al. (ENA PRJEB28822) were subjected to a two-stage process of feature selection to discriminate between IBD and controls. First, a principal component analysis was used for dimensionality reduction. Second, the least absolute shrinkage and selection operator (lasso) regression was employed to identify genes potentially involved in the pathobiology of IBD. The study included data from 294 participants: 100 with ulcerative colitis (48 adults and 52 children), 99 with Crohn's disease (45 adults and 54 children), and 95 controls (46 adults and 49 children). IBD patients presented a wide range of disease severity. Lasso regression preceded by principal component analysis successfully selected interesting features in the IBD transcriptomic data and yielded 12 models. The models achieved high discriminatory value (range of the area under the receiver operating characteristic curve 0.61-0.95) and identified over 100 genes as potentially associated with IBD. PURA, GALNT14, and FCGR1A were the most consistently selected, highlighting the role of the cell cycle, glycosylation, and immunoglobulin binding. Several known IBD-related genes were among the results. The results included genes involved in the TGF-beta pathway, expressed in NK cells, and they were enriched in ontology terms related to immunity. Future IBD research should emphasize the TGF-beta pathway, immunoglobulins, NK cells, and the role of glycosylation.
Keywords: Crohn’s disease; TGF-beta; expression; inflammatory bowel disease; ulcerative colitis
Year: 2022 PMID: 36140740 PMCID: PMC9498489 DOI: 10.3390/genes13091570
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
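The two-stage procedure described in the abstract (principal component analysis for dimensionality reduction, then lasso regression for gene selection) can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic, the rule "keep the genes with the largest loadings on the first principal component" is only one assumed way of using PCA as a pre-filter, and the function names (`first_pc_loadings`, `lasso_cd`) and the value `lam=0.1` are made up for illustration. The lasso here is a plain linear model fitted by coordinate descent on a 0/1 label.

```python
import random

def center(X):
    """Subtract each gene's (column's) mean."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def first_pc_loadings(Xc, iters=200, seed=1):
    """First principal-component loadings by power iteration on X^T X."""
    rng = random.Random(seed)
    n, p = len(Xc), len(Xc[0])
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in Xc]
        w = [sum(Xc[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def standardize(Xc, cols):
    """Keep the given columns of centered data and scale each to unit variance."""
    n = len(Xc)
    sds = [(sum(row[j] ** 2 for row in Xc) / n) ** 0.5 for j in cols]
    return [[row[j] / s for j, s in zip(cols, sds)] for row in Xc]

def lasso_cd(X, y, lam, sweeps=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            rho = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n)) / n
            z = sum(c * c for c in col) / n
            sign = 1.0 if rho > 0 else -1.0
            new = sign * max(abs(rho) - lam, 0.0) / z  # soft-thresholding step
            delta = new - b[j]
            resid = [resid[i] - col[i] * delta for i in range(n)]
            b[j] = new
    return b

# Synthetic data (assumption): 30 controls, 30 patients, 8 "genes";
# genes 0 and 1 are shifted in patients, the rest are pure noise.
rng = random.Random(0)
labels = [0] * 30 + [1] * 30
X = []
for lab in labels:
    row = [rng.gauss(0, 1) for _ in range(8)]
    row[0] += 3 * lab
    row[1] -= 3 * lab
    X.append(row)

# Stage 1: PCA pre-filter, keeping the 4 genes with the largest |loading| on PC1.
Xc = center(X)
loadings = first_pc_loadings(Xc)
kept = sorted(range(8), key=lambda j: -abs(loadings[j]))[:4]

# Stage 2: lasso on the kept genes; nonzero coefficients are the selected genes.
Xs = standardize(Xc, kept)
y_mean = sum(labels) / len(labels)
yc = [lab - y_mean for lab in labels]
coefs = lasso_cd(Xs, yc, lam=0.1)
selected = sorted(kept[k] for k, c in enumerate(coefs) if c != 0.0)
print("kept after PCA:", sorted(kept))
print("selected by lasso:", selected)
```

For real data one would more likely rely on an established implementation (for example glmnet in R or scikit-learn in Python) and choose λ by cross-validation rather than fixing it by hand.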
Genes were selected by lasso regression to best discriminate patients with IBD from controls; both adults and children were included. Coefficients are presented together with the number of times the given transcript appeared in all tested models across the cross-validation/lambda grid (greater values indicate transcripts more systematically linked to IBD). The top three genes (highest n) are indicated in bold. The area under the curve (AUC; 90% confidence interval), lambda shrinkage parameter, and the intercept for each model are also presented. Please note that the genes with the most discriminatory power are present at the top and the bottom of the list.
| IBD | Ulcerative Colitis | Crohn’s Disease | |||
|---|---|---|---|---|---|
| AUC = 0.85 (0.79–0.92), λ = 0.11 | AUC = 0.87 (0.79–0.95), λ = 0.16 | AUC = 0.83 (0.75–0.83), λ = 0.14 | |||
| Gene | Coefficient, n | Gene | Coefficient, n | Gene | Coefficient, n |
|
| 0.79 |
| 0.06 |
|
|
|
| 0.05, 122 |
| −0.01, 38 |
|
|
|
| 0.05, 112 |
| −0.01, 8 |
|
|
|
| 0.05, 105 |
| −0.01, 44 |
| 0.04 |
|
| 0.03, 110 |
| −0.02, 146 |
| 0.02, 104 |
|
| 0.02, 89 |
| −0.05, 50 |
| 0.00, 19 |
|
| 0.01, 108 |
|
|
| −0.02, 44 |
|
| −0.03, 66 |
| −0.05, 159 |
| −0.05, 140 |
|
| −0.04, 44 |
|
|
| −0.05, 95 |
|
| −0.04, 93 |
| −0.05, 153 |
| −0.05, 91 |
|
| −0.04, 30 |
| −0.05, 150 |
| −0.05, 112 |
|
| −0.05, 30 |
| −0.05, 129 |
| −0.05, 94 |
|
| −0.05, 92 |
| −0.05, 164 |
| −0.05, 101 |
|
| −0.05, 105 |
|
|
| −0.05, 141 |
|
| −0.05, 135 |
| −0.05, 138 |
| −0.05, 69 |
|
| −0.05, 129 |
| −0.05, 44 | ||
|
|
|
| −0.05, 161 | ||
|
| −0.05, 127 | ||||
|
|
| ||||
|
| −0.05, 125 | ||||
|
|
| ||||
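The count n reported next to each coefficient is described as the number of times a transcript appeared across the models tested over the cross-validation/lambda grid. A minimal sketch of such a tally follows. The gene names and coefficients are placeholders, not values from the study, and `selection_counts` is a hypothetical helper that assumes each fitted model is available as a gene-to-coefficient mapping.

```python
from collections import Counter

def selection_counts(models):
    """Count, per gene, how many fitted models gave it a nonzero coefficient.

    `models` is an iterable of dicts {gene_name: coefficient}, one dict per
    model fitted on the cross-validation/lambda grid.
    """
    counts = Counter()
    for coefs in models:
        counts.update(gene for gene, c in coefs.items() if c != 0)
    return counts

# Placeholder models (assumption): three fits with made-up coefficients.
toy_models = [
    {"GENE_A": 0.05, "GENE_B": -0.02, "GENE_C": 0.0},
    {"GENE_A": 0.04, "GENE_B": 0.0, "GENE_C": 0.0},
    {"GENE_A": 0.05, "GENE_B": -0.01, "GENE_C": 0.01},
]

counts = selection_counts(toy_models)
for gene, n in counts.most_common():
    print(gene, n)
```

A gene that keeps a nonzero coefficient under many penalties and data splits receives a high n, which is why the tables treat n as a measure of how systematically a transcript is linked to IBD.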
Genes were selected by lasso regression to best discriminate patients with severe IBD from controls. Coefficients are presented together with the number of times the given transcript appeared in the tested models across the cross-validation/lambda grid (greater values indicate transcripts more systematically linked to IBD). The top three genes (highest n) are indicated in bold. The area under the curve (AUC, 90% confidence interval), lambda shrinkage parameter, and the intercept for each model are also presented. Please note that genes with the most discriminatory power are present at the top and the bottom of the list.
| Severe IBD | Severe Ulcerative Colitis | Severe Crohn’s Disease | |||
|---|---|---|---|---|---|
| AUC = 0.91 (0.83–0.98), λ = 0.13 | AUC = 0.90 (0.73–1.0), λ = 0.15 | AUC = 0.92 (0.79–1.0), λ = 0.25 | |||
| Gene | Coefficient, n | Gene | Coefficient, n | Gene | Coefficient, n |
|
| 0.05, 171 |
| 0.05, 196 |
|
|
|
|
|
|
|
|
|
|
| 0.05, 116 |
| 0.05, 193 |
| −0.97 |
|
| 0.05, 60 |
| 0.05, 138 | ||
|
|
|
| 0.05, 131 | ||
|
| 0.05, 93 |
| 0.05, 122 | ||
|
| 0.05, 143 |
|
| ||
|
| 0.05, 67 |
| 0.05, 129 | ||
|
| 0.05, 119 |
| 0.05, 152 | ||
|
| 0.05, 77 |
| 0.05, 88 | ||
|
| 0.05, 137 |
| 0.03, 117 | ||
|
|
|
| 0.02, 84 | ||
|
| 0.05, 216 |
| 0.01, 96 | ||
|
| 0.05, 124 |
| −0.01, 63 | ||
|
| 0.01, 96 |
| −0.03, 73 | ||
|
| 0.00, 44 |
| −0.05, 108 | ||
|
| 0.00, 93 |
|
| ||
|
| 0.00, 47 |
| −0.05, 183 | ||
|
| −0.03, 122 |
| −0.05, 185 | ||
|
| −0.03, 55 |
| −0.97 | ||
|
| −0.04, 93 | ||||
|
| −0.05, 54 | ||||
|
| −0.05, 95 | ||||
|
| −0.05, 123 | ||||
|
| −0.05, 205 | ||||
|
| −0.05, 124 | ||||
|
| −0.05, 142 | ||||
|
| −0.27 | ||||
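Each model above is summarized by an AUC with a 90% confidence interval. The source does not state how the intervals were obtained, so the sketch below uses a percentile bootstrap purely as an assumption. The scores and labels are toy values, and `auc` and `bootstrap_ci` are illustrative names.

```python
import random

def auc(scores, labels):
    """Probability that a random patient scores above a random control
    (ties count as one half); equivalent to the area under the ROC curve."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, level=0.90, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:  # a resample needs both classes
            stats.append(auc([scores[i] for i in idx], lab))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

# Toy predictions (assumption): 1 = patient, 0 = control.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

point = auc(scores, labels)
lo, hi = bootstrap_ci(scores, labels)
print(f"AUC = {point:.2f} ({lo:.2f}-{hi:.2f})")
```

With small groups the interval is wide and can reach 1.0, which is consistent with the upper bounds of 1.0 reported for the severe ulcerative colitis and severe Crohn's disease models.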
Genes were selected by lasso regression to best identify IBD in children and adults (adult patients presented a more quiescent disease relative to children). Coefficients are presented together with the number of times the given transcript appeared in all tested models across the cross-validation/lambda grid (greater values indicate transcripts more systematically linked to IBD). The top three genes (highest n) are indicated in bold. The area under the curve (AUC; 90% confidence interval), lambda shrinkage parameter, and the intercept for each model are also presented. Please note that the genes with the most discriminatory power are present at the top and the bottom of the list.
| IBD | IBD | ||
|---|---|---|---|
| AUC = 0.95 (0.89–1.0), λ = 0.20 | AUC = 0.86 (0.75–0.98), λ = 0.15 | ||
| Gene | Coefficient, n | Gene | Coefficient, n |
|
| 0.78 |
| 0.73 |
|
| 0.05, 132 |
| 0.00, 118 |
|
|
|
| −0.01, 45 |
|
| 0.05, 178 |
| −0.03, 31 |
|
| 0.02, 137 |
| −0.05, 142 |
|
| 0.01, 176 |
| −0.05, 63 |
|
| 0.00, 154 |
|
|
|
|
|
|
|
|
|
|
| −0.05, 156 |
|
| −0.05, 41 |
| −0.05, 174 |
|
|
| ||
|
| −0.05, 59 | ||
|
| −0.05, 83 | ||
|
| −0.05, 96 | ||
Figure 1. Gene ontology analysis of the genes included in the lasso models for the prediction of IBD, UC, and CD status. The ratio of the number of genes in overlap (k) to all genes in the given gene set (K) is indicated.
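The ratio k/K in the figure caption can be made concrete with a short sketch. The gene names and the gene set are placeholders rather than real ontology terms, and `overlap_ratio` is a hypothetical helper.

```python
def overlap_ratio(selected_genes, gene_set):
    """Return (k, K, k/K): k genes shared with the set, K genes in the set."""
    gene_set = set(gene_set)
    k = len(set(selected_genes) & gene_set)
    K = len(gene_set)
    return k, K, k / K

# Placeholder inputs (assumption): genes kept by the models and one gene set.
model_genes = ["GENE_A", "GENE_B", "GENE_C"]
ontology_term_genes = ["GENE_A", "GENE_C", "GENE_X", "GENE_Y"]

k, K, ratio = overlap_ratio(model_genes, ontology_term_genes)
print(f"k = {k}, K = {K}, k/K = {ratio:.2f}")
```

Here two of the four genes in the set are among the model genes, so k/K is 0.5. The ratio by itself does not indicate statistical significance, which depends on the enrichment test used.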