Hanieh Marvi Khorasani, Hamid Usefi, Lourdes Peña-Castillo.
Abstract
Ulcerative colitis (UC) is one of the most common forms of inflammatory bowel disease (IBD) and is characterized by inflammation of the mucosal layer of the colon. Diagnosis of UC is based on clinical symptoms and is then confirmed by endoscopic, histologic and laboratory findings. Feature selection and machine learning have previously been used to create models that facilitate the diagnosis of certain diseases. In this work, we used a recently developed feature selection algorithm (DRPT) combined with a support vector machine (SVM) classifier to generate a model that discriminates between healthy subjects and subjects with UC based on the expression values of 32 genes in colon samples. We validated our model with an independent gene expression dataset of colonic samples from subjects in active and inactive periods of UC. Our model perfectly detected all active cases and had an average precision of 0.62 in the inactive cases. Compared with results reported in previous studies and with a model generated by recently published software for biomarker discovery using machine learning (BioDiscML), our final model for detecting UC shows better performance in terms of average precision.
Year: 2020 PMID: 32792678 PMCID: PMC7426912 DOI: 10.1038/s41598-020-70583-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
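
The evaluation described in the abstract (train an SVM on a fixed gene panel, then score an independent cohort by average precision) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the expression values are synthetic, the cohort sizes are borrowed from the dataset summary only to make the shapes realistic, the linear kernel and the standardization step are assumptions, and `make_cohort` is a hypothetical helper. It assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_GENES = 32  # size of the gene panel named in the abstract


def make_cohort(n_controls, n_cases, shift):
    """Hypothetical helper: synthetic expression matrix in which
    cases are shifted by `shift` in every gene (NOT real data)."""
    x_controls = rng.normal(0.0, 1.0, size=(n_controls, N_GENES))
    x_cases = rng.normal(shift, 1.0, size=(n_cases, N_GENES))
    X = np.vstack([x_controls, x_cases])
    y = np.array([0] * n_controls + [1] * n_cases)
    return X, y


# Sizes mirror the pooled model-selection sets and the active-UC evaluation set.
X_train, y_train = make_cohort(38, 39, shift=1.0)
X_test, y_test = make_cohort(11, 74, shift=1.0)

# Assumed setup: standardize each gene, then fit a linear-kernel SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

# Average precision is computed from continuous decision scores, not hard labels.
scores = model.decision_function(X_test)
ap = average_precision_score(y_test, scores)
print(f"average precision on held-out cohort: {ap:.2f}")
```

Because the synthetic classes are well separated, the score here will be close to 1; real expression data would need platform-specific normalization before cohorts measured on different arrays can be compared.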
Summary of datasets used in this study.
| Accession number | # of controls | # of UC cases | Description of samples | Platform | # of genes (features) | Usage |
|---|---|---|---|---|---|---|
| GSE1152 | 4 | 4 | Mucosal biopsies from uninflamed colonic tissues | Affymetrix Human Genome U133A Array and Affymetrix Human Genome U133B Array | 19,353 | Model selection |
| GSE11223 | 24 | 25 | Biopsies from uninflamed sigmoid colon | Agilent-012391 Whole Human Genome Oligo Microarray G4112A | 18,626 | Model selection |
| GSE22619 | 10 | 10 | Mucosal colonic tissue from discordant twins | Affymetrix Human Genome U133 Plus 2.0 Array | 22,189 | Model selection |
| GSE75214-active | 11 | 74 | Mucosal colonic biopsies from active UC patients and from controls | Affymetrix Human Gene 1.0 ST Array | 20,358 | Model evaluation |
| GSE75214-inactive | 11 | 23 | Mucosal colonic biopsies from inactive UC patients and from controls | Affymetrix Human Gene 1.0 ST Array | 20,358 | Model evaluation |
Ten top subsets of genes with the highest cross-validated average AP.
| Subset | AP | # of Features |
|---|---|---|
| Subset 10 | 0.97 | 42 |
| Subset 51 | 0.97 | 47 |
| Subset 58 | 0.97 | 32 |
| Subset 83 | 0.97 | 39 |
| Subset 5 | 0.96 | 37 |
| Subset 16 | 0.96 | 30 |
| Subset 33 | 0.96 | 27 |
| Subset 55 | 0.96 | 22 |
| Subset 62 | 0.96 | 46 |
| Subset 74 | 0.96 | 50 |
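
For readers unfamiliar with the AP column, average precision summarizes a precision-recall curve as the sum of precision values at each rank where a true case is retrieved, weighted by the increase in recall. The sketch below is a plain-Python illustration of that standard definition on made-up labels and scores; it ignores tied scores and is not taken from the study's code.

```python
def average_precision(labels, scores):
    """AP = sum_k (R_k - R_{k-1}) * P_k over samples ranked by decreasing score.

    labels: 1 for a case, 0 for a control. Tied scores are not treated specially.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank where recall increases
    return total / n_pos  # each recall step has size 1 / n_pos


# Toy example: cases are ranked 1st, 3rd and 4th out of five samples.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(average_precision(toy_labels, toy_scores), 3))  # 0.806
```

An AP of 1.0 means every case is ranked above every control, so the metric rewards good ranking rather than a particular decision threshold.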
Figure 1. Identifying the most frequently selected genes. Top: Number of times each gene was selected. Genes were sorted based on the number of times they were selected by DRPT. Bottom: Normal QQ-plot. The horizontal line at 31 indicates the threshold used to deem a gene frequently selected.
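
The counting step behind Figure 1 can be illustrated with a short sketch: tally how often each gene appears across repeated feature-selection runs, then keep the genes at or above a cutoff. The function name `frequently_selected`, the toy runs, and the use of `>=` at the cutoff are assumptions for illustration; the caption gives only the threshold value of 31.

```python
from collections import Counter


def frequently_selected(subsets, threshold):
    """Return {gene: count} for genes appearing in at least `threshold` subsets.

    subsets: iterable of gene-symbol collections, one per feature-selection run.
    Whether the cutoff is inclusive is an assumption of this sketch.
    """
    # set(subset) guards against a gene being counted twice within one run.
    counts = Counter(gene for subset in subsets for gene in set(subset))
    return {gene: n for gene, n in counts.items() if n >= threshold}


# Toy runs (gene symbols are from the table, but memberships are invented).
toy_runs = [
    ["MMP2", "TFRC", "KRT8"],
    ["MMP2", "TFRC"],
    ["MMP2", "PGF"],
    ["MMP2", "TFRC", "PGF"],
]
print(frequently_selected(toy_runs, threshold=3))
```

With the study's threshold, the same call would take the form `frequently_selected(runs, threshold=31)`, where `runs` holds one selected subset per DRPT run.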
Figure 2. Precision-recall curve of top selected subsets on GSE75214-active.
Figure 3. Precision-recall curve of top selected subsets on GSE75214-inactive.
Phenotypes associated with the 32 most frequently selected genes by DRPT as obtained from Ensembl REST API (Version 11.0)[36].
| Gene symbol | Associated phenotypes | # of times selected |
|---|---|---|
| CWF19L1 | Spinocerebellar ataxia, autosomal recessive 17; depressive disorder, Major | 100 |
| FCER2 | Blood protein levels; post bronchodilator FEV1 | 100 |
| MMP2 | Multicentric Osteolysis-Nodulosis-Arthropathy (MONA) spectrum disorders; cholesterol, HDL; lip and oral cavity carcinoma; body height; Winchester syndrome | 99 |
| PPP1CB | Noonan Syndrome-like disorder with loose anagen hair 2; heel bone mineral density; blood pressure; basophils; RASopathy with developmental delay; short stature and sparse slow-growing hair | 99 |
| RPL23AP32 | Attention deficit disorder with hyperactivity; body height | 95 |
| ZNF624 | None | 94 |
| REG1B | Contrast sensitivity; Body Mass Index | 93 |
| TFRC | Breast ductal adenocarcinoma; esophageal adenocarcinoma; thyroid carcinoma; clear cell renal carcinoma; prostate carcinoma; pancreatic cancer; gastric adenocarcinoma; hepatocellular carcinoma; lung adenocarcinoma; rectal adenocarcinoma; basal cell carcinoma | 91 |
| FAM118A | | 89 |
| CFHR2 | Macular degeneration; blood protein levels; feeling miserable; alanine aminotransferase (ALT) levels after remission induction therapy in acute lymphoblastic leukaemia (ALL); asthma | 88 |
| KRT8 | Cirrhosis; familial cirrhosis; hepatitis C virus; susceptibility to, cirrhosis, cryptogenic cirrhosis, noncryptogenic cirrhosis; susceptibility to, gamma glutamyl transferase levels, cancer (pleiotropy) | 88 |
| PRELID1 | Body fat distribution; heel bone mineral density; activated partial thromboplastin time | 87 |
| ZNF92 | None | 86 |
| ABHD2 | Itch intensity from mosquito bite adjusted by bite size; gut microbiota; Obesity-related traits; coronary artery disease; advanced age related macular degeneration; squamous cell lung carcinoma; pulse pressure | 79 |
| C16orf89 | None | 79 |
| CAB39L | Hemoglobin S; erythrocyte count; pancreatic neoplasms | 79 |
| SPATC1L | None | 76 |
| DUOXA2 | Familial thyroid dyshormonogenesis; thyroglobulin synthesis defect | 72 |
| MESP1 | None | 70 |
| MAML3 | Social science traits; intelligence (MTAG); chronic mucus hypersecretion; borderline personality disorder; congenital heart malformation | 65 |
| PITX2 | Axenfeld-Rieger syndrome; ring dermoid of cornea; iridogoniodysgenesis type 2; Peters anomaly; familial atrial fibrillation; Rieger anomaly; stroke; ischemic stroke; cataract; PITX2-related eye abnormalities; phosphorus; cognitive decline rate in late mild cognitive impairment; creatinine; intraocular pressure; incident atrial fibrillation; Wolff-Parkinson-White pattern; Parkinson disease; early onset atrial fibrillation; anterior segment dysgenesis 4 | 65 |
| DMTN | Total cholesterol levels; LDL cholesterol | 62 |
| ASF1B | None | 52 |
| PGF | Mood instability; blood protein levels | 50 |
| BEX4 | None | 49 |
| ODF1 | Body weight; body mass index; glucose; IgA nephropathy; Chronic lymphocytic leukaemia; type 2 diabetes; erythrocyte indices | 47 |
| PTGR1 | Body height; menarche; monocyte count; blood protein levels | 45 |
| ZNF35 | None | 44 |
| LIPF | Maximal midexpiratory flow rate; blood protein levels; respiratory function tests; blood pressure | 39 |
| SLC25A13 | Citrullinemia type II; neonatal intrahepatic cholestasis due to citrin deficiency; citrin deficiency; citrullinemia type I; bone mineral density | 38 |
| BARX2 | Type 2 diabetes; breast cancer; night sleep phenotypes; response to cyclophosphamide in systemic lupus erythematosus with lupus nephritis; stroke | 35 |
| C2orf42 | None | 34 |