Tianyi Zhang, Bowen Song, Wei Zhu, Xiao Xu, Qing Qing Gong, Christopher Morando, Themistocles Dassopoulos, Rodney D Newberry, Steven R Hunt, Ellen Li.
Abstract
Previous genome-wide expression studies have highlighted distinct gene expression patterns in inflammatory bowel disease (IBD) compared with control samples, but the interpretation of these studies has been limited by sample heterogeneity with respect to disease phenotype, disease activity, and anatomic site. To improve molecular classification of IBD phenotypes, we focused on a single anatomic site, the disease-unaffected proximal ileal margin of resected ileum, and on three phenotypes that were unlikely to overlap: ileal Crohn's disease (ileal CD), ulcerative colitis (UC), and control patients without IBD. Whole human genome (Agilent) expression profiling was conducted on two independent sets of disease-unaffected ileal samples collected from the proximal margin of resected ileum. Set 1 (47 ileal CD, 27 UC, and 25 control non-IBD patients) was used as the training set, and Set 2 (10 ileal CD, 10 UC, and 10 control non-IBD patients) was subsequently collected as an independent test set. We compared the 17 gene signatures selected by four different feature-selection methods to distinguish the ileal CD phenotype from the non-CD phenotype. The four methods yielded different but overlapping solutions that were highly discriminating. All four methods selected FOLH1 as a common feature. This gene is an established biomarker for prostate cancer, but has not previously been associated with Crohn's disease. Immunohistochemical staining confirmed increased expression of FOLH1 in the ileal epithelium. These results provide evidence for convergent molecular abnormalities in the macroscopically disease-unaffected proximal margin of resected ileum from ileal CD subjects.
Year: 2012 PMID: 22606341 PMCID: PMC3351422 DOI: 10.1371/journal.pone.0037139
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Patient characteristics associated with each disease phenotype in the training and test sets.
| Training Set | | | |
| Variables | Ileal CD (n = 47) | UC (n = 27) | Control (n = 25) |
| Gender (male) | 43% | 59% | 32% |
| Race (white) | 96% | 100% | 96% |
| Median Age (range) y | 35 (20–75) | 43 (17–64) | 55 (18–84) |
| Current smoker | 32% | 10% | 24% |
| Positive fecal | 0% | 30% | 0% |
| Median BMI (range) kg/m2 | 24 (16–38) | 24 (18–43) | 28 (20–38) |
| 5-ASA | 55% | 63% | 0% |
| Steroids | 43% | 67% | 0% |
| Immunomodulators | 45% | 44% | 0% |
| Anti-TNFα biologics | |||
| Current (≤8 weeks of surgery) | 28% | 41% | 0% |
| Past (>8 weeks of surgery) | 8% | 7% | 0% |
| Never | 64% | 52% | 0% |
Comparison of 17 ileal gene signatures selected by four different feature selection methods.
| Methods | AUC | Accuracy |
| PAM | 0.895 | 88.9% |
| Random forest | 0.902 | 85.9% |
| LASSO | 0.895 | 85.9% |
Boosting [16], PAM [17], random forest [18], and LASSO [19] were applied to the SAM-filtered training microarray dataset to select 17 ileal gene signatures. The AUC and overall accuracy for each signature were calculated from the majority vote of 7 classifiers (Boosting, PAM, Random Forest, LASSO, Support Vector Machine, Linear Discriminant Analysis, and Naive Bayes), which is equivalent to a decision based on the median score using the usual probability threshold of 0.5 (see Materials and Methods).
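The equivalence between the majority vote and the median score can be checked with a short sketch. It assumes each of the 7 classifiers outputs a probability of ileal CD and votes CD when that probability exceeds 0.5. The function names and the example scores are hypothetical and are not taken from the study.

```python
from itertools import product
from statistics import median

THRESHOLD = 0.5  # probability cut-off named in the table note


def majority_vote(scores, threshold=THRESHOLD):
    """True (ileal CD) when more than half of the classifiers vote CD."""
    votes = sum(1 for s in scores if s > threshold)
    return votes > len(scores) / 2


def median_rule(scores, threshold=THRESHOLD):
    """True (ileal CD) when the median classifier score exceeds the cut-off."""
    return median(scores) > threshold


def rules_agree_on_grid(n_classifiers=7, grid=(0.2, 0.5, 0.8)):
    """Compare both rules on every combination of grid scores."""
    return all(
        majority_vote(combo) == median_rule(combo)
        for combo in product(grid, repeat=n_classifiers)
    )


# Hypothetical scores from seven classifiers for a single sample.
example_scores = [0.91, 0.62, 0.55, 0.48, 0.73, 0.30, 0.51]
print(majority_vote(example_scores), median_rule(example_scores))
```

The two rules coincide because the number of classifiers is odd: the median is then the 4th of 7 sorted scores, and it exceeds 0.5 exactly when at least 4 scores do. With an even number of classifiers, ties and the averaged median could make the two rules differ.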
Figure 1. Receiver operating characteristic (ROC) curve for different classification methods on the training set.
Figure 2. Receiver operating characteristic (ROC) curve for different classification methods on the test set.
Classification results on the training and test sets.
| Classification Method | Accuracy | Sensitivity | Specificity |
| Training set (leave-one-out cross-validation) | | | |
| Support Vector Machine (SVM) | 90.9% | 91.5% | 90.4% |
| Random Forest (RF) | 86.9% | 87.2% | 86.5% |
| Linear Discriminant Analysis (LDA) | 90.9% | 89.4% | 92.3% |
| Predictive Analysis of Microarray (PAM) | 88.9% | 89.4% | 88.5% |
| Lasso | 91.9% | 91.5% | 92.3% |
| Boosting | 88.9% | 89.4% | 88.5% |
| Naïve Bayes | 88.9% | 89.4% | 88.5% |
| Test set | | | |
| Support Vector Machine (SVM) | 83.3% | 80.0% | 85.0% |
| Random Forest (RF) | 73.3% | 90.0% | 65.0% |
| Linear Discriminant Analysis (LDA) | 76.7% | 80.0% | 75.0% |
| Predictive Analysis of Microarray (PAM) | 86.7% | 100.0% | 80.0% |
| Lasso | 86.7% | 80.0% | 90.0% |
| Boosting | 86.7% | 90.0% | 85.0% |
| Naïve Bayes | 83.3% | 100.0% | 75.0% |
The accuracy, sensitivity, and specificity of the ileal gene signature selected by the boosting method [16] were calculated using leave-one-out cross-validation on the training set and, subsequently, by direct classification of the test set based on the training set.
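As a minimal sketch of how these three measures and leave-one-out cross-validation fit together, the code below treats ileal CD as the positive class. The confusion counts are not reported in this record. They are back-calculated here, as an assumption, from the SVM test-set percentages and the test-set sizes (10 ileal CD, 20 non-CD). The toy nearest-mean classifier and its data are purely hypothetical stand-ins for the classifiers named in the table.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)   # fraction of ileal CD samples called CD
    specificity = tn / (tn + fp)   # fraction of non-CD samples called non-CD
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def leave_one_out_predictions(features, labels, fit, predict):
    """Predict each sample with a model fitted on all the other samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, features[i]))
    return predictions


def fit_class_means(xs, ys):
    """Hypothetical toy model: the mean feature value of each class."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Assign the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))


# Assumed counts: 8 of 10 ileal CD and 17 of 20 non-CD classified correctly.
acc, sens, spec = classification_metrics(tp=8, fn=2, tn=17, fp=3)
print(f"accuracy {acc:.1%}, sensitivity {sens:.1%}, specificity {spec:.1%}")

# Hypothetical one-feature data: 1 = ileal CD, 0 = non-CD.
toy_x = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
toy_y = [0, 0, 0, 1, 1, 1]
toy_pred = leave_one_out_predictions(
    toy_x, toy_y, fit_class_means, predict_nearest_mean
)
```

With the assumed counts, the printed values reproduce the SVM test-set row (83.3%, 80.0%, 85.0%). For the test set, the study describes a single model built on the training set rather than a leave-one-out loop.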
Figure 3. Partial correlation network among the 17 selected genes.
FOLH1 is linked to multiple genes and serves as a hub gene. A red line between genes indicates a positive non-zero partial correlation and a blue line indicates a negative non-zero partial correlation.
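To illustrate what a zero versus non-zero partial correlation means, here is a minimal sketch for three variables. It uses the standard first-order formula, which conditions on a single variable. A network among 17 genes would condition each pair on all the other genes, and the estimation method used in the study is not described in this record. The variable names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical expression values: hub drives both gene_a and gene_b,
# and their remaining variation is constructed to be uncorrelated.
hub = [1.0, 2.0, 3.0, 4.0]
gene_a = [1.5, 1.5, 2.5, 4.5]
gene_b = [0.8, 2.6, 2.4, 4.2]

marginal = pearson(gene_a, gene_b)
conditional = partial_correlation(gene_a, gene_b, hub)
print(f"marginal r = {marginal:.3f}, partial r given hub = {conditional:.3f}")
```

In this constructed example the two genes are strongly correlated overall, but their partial correlation given the hub is essentially zero, so a partial correlation network would draw no line between them. A hub gene is one that keeps many non-zero partial correlations of its own.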
Figure 4. Immunohistochemical localization of FOLH1 in disease-unaffected ileal mucosa from the proximal margin of resected ileum from an ileal CD subject (left panel) and a control non-IBD subject.
The more prominent FOLH1 staining in the ileal CD sample is localized to the villous epithelium. Magnification is 100×. Bar is 200 µm.