Fernando Gomollón, Javier P Gisbert, Iván Guerra, Rocío Plaza, Ramón Pajares Villarroya, Luis Moreno Almazán, Mª Carmen López Martín, Mercedes Domínguez Antonaya, María Isabel Vera Mendoza, Jesús Aparicio, Vicente Martínez, Ignacio Tagarro, Alonso Fernández-Nistal, Sara Lumbreras, Claudia Maté, Carmen Montoto.
Abstract
BACKGROUND: The impact of relapses on disease burden in Crohn's disease (CD) warrants searching for predictive factors to anticipate relapses. This requires analysis of large datasets, including elusive free-text annotations from electronic health records. This study aims to describe clinical characteristics and treatment with biologics of CD patients and generate a data-driven predictive model for relapse using natural language processing (NLP) and machine learning (ML).
Year: 2022 PMID: 34882644 PMCID: PMC8876385 DOI: 10.1097/MEG.0000000000002317
Source DB: PubMed Journal: Eur J Gastroenterol Hepatol ISSN: 0954-691X Impact factor: 2.566
Fig. 1. Study design and timeline. For each patient in the database, the index date (i.e., baseline) was defined as the timepoint when the diagnostic criteria for CD were first identified. All available EHRs before January 2014 were processed to extract information regarding the clinical history of patients (dotted line). The follow-up period ranged from the index date to the end of the study period or the last data point available. Data from patients’ EHRs were extracted and organized with the EHRead technology. See the Methods section for further details. CD, Crohn’s disease; EHR, electronic health record.
Demographics, clinical characteristics, and medication use
| Characteristic | Value |
|---|---|
| Demographics | |
| Sex | |
| Female | 3034 (51.1) |
| Male | 2904 (48.9) |
| Age (years) | |
| | 5934 |
| Mean (SD) | 48.3 (18.3) |
| Median | 46.3 |
| (Q1-Q3) | (35.0-61.0) |
| Adults (≥18 years old) | 5782 (97.37) |
| Children (<18 years old) | 152 (2.56) |
| Missing | 4 |
| Substance use | |
| Tobacco use | 3465 |
| Ex | 1010 (29.15) |
| No | 1091 (31.5) |
| Yes | 1364 (39.36) |
| Missing | 2473 |
| Alcohol use | 893 |
| Ex | 81 (9.1) |
| No | 324 (36.3) |
| Yes | 492 (55.1) |
| Missing | 5045 |
| Disease characteristics | |
| Location | |
| L1 ileal | 913 (54.0) |
| L2 colonic | 199 (11.8) |
| L3 ileocolonic | 507 (30.0) |
| L4 isolated upper disease | 73 (4.3) |
| Missing | 4246 |
| Behavior | |
| B1 nonstricturing, nonpenetrating: inflammatory | 589 (48.8) |
| B2 stricturing | 356 (29.5) |
| B3 penetrating | 262 (21.7) |
| Missing | 4731 |
Age at registered Crohn’s disease diagnosis.
Subjects with missing values are not included in percentage calculations.
Based on the Montreal Classification.
L4 is a modifier that can be added to L1–L3 when concomitant upper gastrointestinal disease is present.
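The footnote on missing values can be checked with a short calculation. The sketch below is illustrative only: it uses the disease location counts from the table above and assumes a cohort of 5938 patients, a total inferred here from the female and male counts (3034 + 2904) rather than stated explicitly. The function name is hypothetical.

```python
# Counts copied from the "Location" rows of the table above.
location_counts = {
    "L1 ileal": 913,
    "L2 colonic": 199,
    "L3 ileocolonic": 507,
    "L4 isolated upper disease": 73,
}
missing = 4246

# Assumption: cohort size inferred from the sex rows (3034 female + 2904 male).
TOTAL_PATIENTS = 3034 + 2904


def percentages_excluding_missing(counts):
    """Return each category's share of the non-missing total, in percent."""
    denominator = sum(counts.values())
    return {name: round(100 * n / denominator, 1) for name, n in counts.items()}


non_missing = sum(location_counts.values())
location_pct = percentages_excluding_missing(location_counts)

for name, pct in location_pct.items():
    print(f"{name}: {location_counts[name]} ({pct})")
print(f"Non-missing + missing = {non_missing + missing} of {TOTAL_PATIENTS}")
```

With the denominator restricted to patients who have a recorded location, the computed shares agree with the percentages reported in the table, and the non-missing and missing counts add up to the inferred cohort size.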
Family history
| Condition | n (%) |
|---|---|
| Crohn’s disease | 401 (6.8) |
| Other medical conditions and diseases | |
| Blood and lymphatic system disorders | 87 (1.5) |
| Cardiovascular disorders | 421 (7.1) |
| Disorder of cardiovascular system | 151 (2.5) |
| Myocardial infarction | 76 (1.3) |
| Structural disorder of heart | 67 (1.1) |
| Cerebrovascular disease | 45 (0.8) |
| Others | 189 (3.2) |
| Congenital, familial, and genetic disorders | 31 (0.5) |
| Ear and labyrinth disorders | 12 (0.2) |
| Endocrine disorders | 339 (5.7) |
| Diabetes mellitus | 148 (2.5) |
| Disorder of endocrine system | 108 (1.8) |
| Disorder of thyroid gland | 63 (1.1) |
| Others | 79 (1.3) |
| Eye disorders | 69 (1.2) |
| Gastrointestinal and hepatobiliary disorders | 557 (9.4) |
| Inflammatory disorder of digestive tract | 136 (2.3) |
| Disorder of lower gastrointestinal tract | 97 (1.6) |
| Colitis | 75 (1.3) |
| Viral hepatitis | 54 (0.9) |
| Disorder of rectum | 34 (0.6) |
| Others | 293 (4.9) |
| General disorders | 22 (0.4) |
| Immune system disorders | 88 (1.5) |
| Infections and infestations | 82 (1.4) |
| Injury, poisoning, and procedural complications | 13 (0.2) |
| Metabolism and nutrition disorders | 130 (2.2) |
| Disorder of lipoprotein and/or lipid metabolism | 35 (0.6) |
| Others | 103 (1.7) |
| Musculoskeletal and connective tissue disorders | 183 (3.1) |
| Neoplasms, benign (incl. cysts and polyps) | 109 (1.8) |
| Neoplasms, malignant (incl. cysts and polyps) | 457 (7.7) |
| Malignant tumor of breast | 190 (3.2) |
| Malignant neoplasm of intraabdominal organ | 47 (0.8) |
| Primary malignant neoplasm of colon | 33 (0.6) |
| Neoplastic disease | 32 (0.5) |
| Others | 250 (4.2) |
| Neoplasms, unspecified (incl. cysts and polyps) | 757 (12.7) |
| Malignant tumor of large intestine | 251 (4.2) |
| Neoplasm of large intestine | 137 (2.3) |
| Carcinoma of stomach | 107 (1.8) |
| Neoplasm of lung | 96 (1.6) |
| Neoplasm of breast | 68 (1.1) |
| Neoplasm of prostate | 42 (0.7) |
| Neoplasm of ovary | 36 (0.6) |
| Others | 286 (4.8) |
| Nervous system disorders | 196 (3.3) |
| Cerebral degeneration presenting primarily with dementia | 41 (0.7) |
| Others | 158 (2.7) |
| Pregnancy, puerperium, and perinatal conditions | 37 (0.6) |
| Psychiatric disorders | 72 (1.2) |
| Renal and urinary disorders | 56 (0.9) |
| Reproductive system and breast disorders | 50 (0.8) |
| Respiratory, thoracic, and mediastinal disorders | 171 (2.9) |
| Bronchial hyperreactivity/hyperresponsiveness | 33 (0.6) |
| Others | 146 (2.5) |
| Skin and subcutaneous tissue disorders | 162 (2.7) |
| Acquired disorder of keratinization | 52 (0.9) |
| Others | 116 (2) |
| Social circumstances | 5 (0.1) |
| Surgical and medical procedures | 13 (0.2) |
Percentage based on the total number of patients. All medical terms were obtained from the standardized SNOMED CT glossary.
Procedures and surgical interventions during follow up
| Procedure | n (%) |
|---|---|
| Endoscopy | 2161 (36.4) |
| Colonoscopy | 1705 (28.7) |
| Esophagogastroduodenoscopy | 673 (11.3) |
| Rectoscopy | 112 (1.9) |
| Rectosigmoidoscopy | 72 (1.2) |
| Missing | 3777 |
| Imaging | 2558 (43.1) |
| Diagnostic radiography of abdomen | 1883 (31.7) |
| CT of abdomen | 1068 (18.0) |
| Ultrasonography of abdomen | 975 (16.4) |
| CT of abdomen and pelvis | 744 (12.5) |
| MRI of abdomen | 682 (11.5) |
| Magnetic resonance enterography | 598 (10.1) |
| MRI of abdomen and pelvis | 24 (0.4) |
| Barium enema | 18 (0.3) |
| Missing | 3380 |
| Surgical interventions | 2260 (38.1) |
| Excision | 1550 (26.1) |
| Excision of intestinal structure | 933 (15.7) |
| Colectomy | 585 (9.9) |
| Gastrointestinal and digestive anastomosis | 561 (9.4) |
| Perianal region operations | 122 (2.1) |
| Colostomy | 138 (2.3) |
| Ileostomy operation | 183 (3.1) |
| Proctocolectomy | 53 (0.9) |
| Anal fistulectomy | 32 (0.5) |
| Small intestinal strictureplasty | 9 (0.2) |
| Missing | 3678 |
CT, computed tomography.
Percentage based on the total number of patients.
Biologics treatment by treatment line during the last year of the follow-up period
| Line | Treatment | n (%) |
|---|---|---|
| 1L | Adalimumab | 207 (46.7) |
| | Infliximab | 192 (43.3) |
| | Vedolizumab | 26 (5.9) |
| | Ustekinumab | 14 (3.2) |
| | Certolizumab pegol | 3 (0.7) |
| | Natalizumab | 1 (0.2) |
| 2L | Adalimumab | 65 (45.8) |
| | Infliximab | 33 (23.2) |
| | Ustekinumab | 26 (18.3) |
| | Vedolizumab | 14 (9.8) |
| | Certolizumab pegol | 4 (2.8) |
| | NA | 301 |
| 3L | Ustekinumab | 21 (38.8) |
| | Vedolizumab | 20 (37) |
| | Infliximab | 5 (9.3) |
| | Adalimumab | 4 (7.4) |
| | Certolizumab pegol | 4 (7.4) |
| | NA | 389 |
| 4L | Ustekinumab | 11 (50) |
| | Vedolizumab | 8 (36.4) |
| | Infliximab | 2 (9.1) |
| | Adalimumab | 1 (4.5) |
| | NA | 421 |
| 5L | Vedolizumab | 2 (50) |
| | Ustekinumab | 1 (25) |
| | Certolizumab pegol | 1 (25) |
| | NA | 439 |
Percentage based on the total number of biologics-treated patients during the selected window.
NA, not available: patients who (1) continued treatment with the same biologic until the end of the study period, (2) discontinued the treatment and did not receive any other biologic, or (3) had no information regarding biologic treatment available in their electronic health records.
Predictive model for relapse risks at different timepoints
| Model, prediction window | Accuracy | Confusion matrix | | Precision | Recall | | AUC |
|---|---|---|---|---|---|---|---|
| Decision tree, 3 months | 0.81 | 2191 | 281 | 0.50 | 0.50 | 0.50 | 0.84 |
| | | 288 | 291 | | | | |
| Logistic regression, 3 months | 0.82 | 2213 | 259 | 0.50 | 0.52 | 0.51 | 0.85 |
| | | 287 | 292 | | | | |
| Random forest, 3 months | 0.84 | 2295 | 177 | 0.46 | 0.60 | 0.52 | 0.88 |
| | | 309 | 270 | | | | |
| Random forest, 6 months | 0.84 | 2184 | 189 | 0.56 | 0.67 | 0.61 | 0.89 |
| | | 301 | 377 | | | | |
| Random forest, 1 year | 0.83 | 2089 | 186 | 0.59 | 0.71 | 0.65 | 0.90 |
| | | 318 | 458 | | | | |
| Random forest, 2 years | 0.83 | 1984 | 230 | 0.67 | 0.71 | 0.69 | 0.91 |
| | | 275 | 562 | | | | |
AUC, area under the curve.
In each confusion matrix, the rows give the observed classes (absence or presence of relapse), whereas the columns give the classes predicted by the model. The elements on the main diagonal are the cases that were correctly predicted; the off-diagonal elements are the false negatives (element 2,1) and the false positives (element 1,2). See the Methods section for further details regarding the calculation of each performance metric. See Table S2, Supplemental digital content 1, http://links.lww.com/EJGH/A730, for the relative importance of the most relevant variables included in the predictive model.
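As a rough illustration of how a confusion matrix relates to the summary metrics, the sketch below applies the standard textbook definitions of accuracy, precision, and recall to the decision tree matrix at 3 months. It assumes the layout described in the footnote (false positives at element 1,2 and false negatives at element 2,1, so true negatives sit at 1,1 and true positives at 2,2). The Methods section is not reproduced here, so the authors' exact calculation may differ, and the function name is hypothetical.

```python
def metrics_from_confusion_matrix(matrix):
    """Compute accuracy, precision, and recall from a 2x2 confusion matrix.

    Assumed layout (from the table footnote):
        row 1 = observed no relapse -> [true negatives, false positives]
        row 2 = observed relapse    -> [false negatives, true positives]
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,   # share of all cases predicted correctly
        "precision": tp / (tp + fp),     # share of predicted relapses that occurred
        "recall": tp / (tp + fn),        # share of observed relapses that were predicted
    }


# Decision tree, 3 months (values copied from the table above).
decision_tree_3m = [[2191, 281], [288, 291]]
result = metrics_from_confusion_matrix(decision_tree_3m)

for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the computed accuracy rounds to the 0.81 reported in the table. Precision and recall computed this way need not match the reported columns to the second decimal, since the rounding rule and the exact definitions used by the authors are not given in this excerpt.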
Relative importance of the 25 most relevant variables included in the predictive model for CD-related relapse
| Variable | Relative importance |
|---|---|
| Cumulative past flare | 0.358901 |
| Age | 0.029441 |
| Difference between event value and basal value leukocytes | 0.023361 |
| Difference between event value and basal value hemoglobin | 0.010256 |
| Increment respect to maximum normal value fibrinogen | 0.009158 |
| Disposition events - Past admissions | 0.008597 |
| Montreal Scale | 0.008082 |
| Proton pump inhibitors - A02BC | 0.007368 |
| Substance use findings (habits) - TOBACCO USE | 0.006733 |
| Acetic acid derivatives and related substances - M01AB | 0.006705 |
| Belladonna and derivatives in combination with analgesics - A03DB | 0.006624 |
| Evaluation procedures - CT of chest and abdomen | 0.005912 |
| methylprednisolone - H02AB04 | 0.005684 |
| ciprofloxacin - J01MA02 | 0.005551 |
| prednisone - H02AB07 | 0.005274 |
| Evaluation procedures - Source-specific culture | 0.004973 |
| Diagnostic procedures - Laboratory procedure | 0.004925 |
| Medical history - Infection due to Enterobacteriaceae | 0.004657 |
| Difference between event value and basal value CRP - 48 | 0.004641 |
| Sex | 0.004507 |
| mesalazine - A07EC02 | 0.004463 |
| Increment respect to maximum normal value leukocytes - 9 | 0.004463 |
| Evaluation procedures - Imaging of abdomen | 0.004428 |
| Increment respect to maximum normal value CRP - 48 | 0.004324 |
| Evaluation procedures - Blood gas measurement | 0.004311 |
CD, Crohn’s disease; CRP, C-reactive protein.