Jacob Tveiten Bjerrum, Mattias Rantalainen, Yulan Wang, Jørgen Olsen, Ole Haagen Nielsen.
Abstract
A systems biology approach to multi-faceted diseases has provided an opportunity to establish a holistic understanding of the processes at play. The current study therefore merges transcriptomics and metabonomics data in order to improve diagnostics and biomarker identification, and to explore the possibilities of molecular phenotyping of ulcerative colitis (UC) patients. Biopsies were obtained from the descending colon of 43 UC patients (22 active UC and 21 quiescent UC) and 15 controls. Genome-wide gene expression analyses were performed using the Affymetrix GeneChip Human Genome U133 Plus 2.0 array. Metabolic profiles were generated using ¹H nuclear magnetic resonance spectroscopy (Bruker 600 MHz, Bruker BioSpin, Rheinstetten, Germany). Data were analyzed using orthogonal projections to latent structures discriminant analysis (OPLS-DA) and a multivariate logistic regression model fitted by lasso. Prediction performance was evaluated using nested Monte Carlo cross-validation. The prediction performance of the merged data sets and that of relatively small (<20 variables) multivariate biomarker panels suggests that it is possible to discriminate between active UC, quiescent UC, and controls; between patients with or without steroid dependency; and between early and late disease onset. Consequently, this study demonstrates that the novel approach of integrating metabonomics and transcriptomics combines the best of both worlds and provides clinically applicable candidate biomarker panels. These combined panels improve diagnostics and, more importantly, the molecular phenotyping of UC, and they provide insight into the pathophysiological processes at play, making optimized and personalized medication a possibility.
Keywords: Colonic biopsy; Gene expression profiles; Inflammatory bowel disease; Metabolic profiles; Metabolomics; NMR spectroscopy
Year: 2013 PMID: 25221466 PMCID: PMC4161940 DOI: 10.1007/s11306-013-0580-3
Source DB: PubMed Journal: Metabolomics ISSN: 1573-3882 Impact factor: 4.290
Clinical details
| | Inactive UC | Active UC | Controls |
|---|---|---|---|
| Gender (male/female) | 8/13 | 8/14 | 4/11 |
| Age, years (mean, range) | 53 (27–78) | 40 (18–76) | 43 (19–65) |
| Age at diagnosis (<25 years/>25 years) | 6/15 | 6/16 | – |
| Years with disease (<10 years/>10 years) | 12/9 | 16/6 | – |
| Mayo score (mean, range) | 0.2 (0–1) | 6 (2–9) | – |
| Smoker/non-smoker | 1/20 | 3/19 | 2/13 |
| Steroids: Si/Sd/unknown | 10/0/11 | 9/8/5 | – |
| Daily medication | | | |
| Systemic mesalazine (1.6–3.2 g) | 18 | 19 | – |
| Topical mesalazine (1,000 mg) | 6 | 9 | – |
| Systemic glucocorticoids (75 mg) | 2a | 4 | – |
| Topical glucocorticoids (100 mg) | 1 | 2 | – |
| Azathioprine (100–150 mg) | 0 | 6 | – |
| Infliximab (5 mg/kg/infusion) | 0 | 1 | – |
| None | 2 | 2 | 15 |
Steroid dependency is defined as the need for re-introduction of systemic glucocorticoids during tapering, or within 30 days of drug discontinuation, to maintain symptom control. None of the patients were steroid refractory.
UC: ulcerative colitis; Si: steroid independence; Sd: steroid dependence
a: Tapering dose of 5 mg/day
Sample sizes in the different subgroups of data
| Analysis name | Case name | N | N cases | Proportion cases |
|---|---|---|---|---|
| ActiveControl | Active | 36 | 21 | 0.58 |
| InactiveControl | Inactive | 34 | 19 | 0.56 |
| InactiveActive | Active | 40 | 21 | 0.53 |
| SiSd | Si | 24 | 16 | 0.67 |
| DurationLess10 | Duration | 40 | 12 | 0.30 |
| DebutLess25 | DebutLess25 | 40 | 29 | 0.73 |
Analysis name: subgroup of data; Case name: group defined as cases; N: total number of observations; N cases: number of cases; Proportion cases: fraction of the total number of samples that belong to the case group; Si: steroid independence; Sd: steroid dependence; DurationLess10: disease duration less than 10 years; DebutLess25: age at diagnosis less than 25 years
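The "Proportion cases" column is simply the number of cases divided by the total number of observations in each subgroup. The short Python sketch below recomputes it from the N and N cases values in the table; the dictionary layout and the function name `proportion_cases` are choices made for this illustration only.

```python
# (N, N cases) per analysis, copied from the table above.
subgroups = {
    "ActiveControl": (36, 21),
    "InactiveControl": (34, 19),
    "InactiveActive": (40, 21),
    "SiSd": (24, 16),
    "DurationLess10": (40, 12),
    "DebutLess25": (40, 29),
}


def proportion_cases(n_total, n_cases):
    """Fraction of the samples in a subgroup that belong to the case group."""
    return n_cases / n_total


proportions = {name: proportion_cases(n, k) for name, (n, k) in subgroups.items()}

for name, p in proportions.items():
    print(f"{name}: {p:.3f}")
```

The recomputed values agree with the tabulated proportions to two decimal places.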
Prediction performance
| Analysis name | Metabonomics (NOE) | Metabonomics (CPMG) | Transcriptomics | Omics |
|---|---|---|---|---|
| ActiveControl | 0.95 | 0.92 | 0.97 | 0.97 |
| InactiveControl | 0.79 | 0.65 | 0.57 | 0.58 |
| InactiveActive | 0.98 | 0.95 | 0.96 | 0.96 |
| SiSd | 0.73 | 0.76 | 0.80 | 0.78 |
| DurationLess10 | 0.43 | 0.35 | 0.63 | 0.63 |
| DebutLess25 | 0.43 | 0.34 | 0.38 | 0.36 |
Prediction performance estimates presented as area under the curve (AUC) for each sub-population and for each full data set, including combined data sets: Omics. AUC estimates are based on classification using OPLS-DA and Monte Carlo cross-validation
NOE: nuclear Overhauser effect; CPMG: Carr–Purcell–Meiboom–Gill; Si: steroid independence; Sd: steroid dependence; DurationLess10: disease duration less than 10 years; DebutLess25: age at diagnosis less than 25 years
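As a rough illustration of how an AUC estimate of this kind can be obtained, the following Python sketch repeats random stratified train/test splits (Monte Carlo cross-validation) and averages the test-set AUC. It is not the study's code: OPLS-DA is replaced by a simple mean-difference classifier as a stand-in, the data are synthetic, and `n_rounds`, `test_fraction` and all function names are assumptions made for this example.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Stand-in classifier (NOT OPLS-DA): project onto the class-mean difference."""
    n_feat = len(X[0])

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]

    m1 = mean([x for x, t in zip(X, y) if t == 1])
    m0 = mean([x for x, t in zip(X, y) if t == 0])
    w = [a - b for a, b in zip(m1, m0)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def monte_carlo_cv_auc(X, y, n_rounds=100, test_fraction=0.3, seed=0):
    """Average test AUC over repeated random stratified train/test splits."""
    rng = random.Random(seed)
    idx_pos = [i for i, t in enumerate(y) if t == 1]
    idx_neg = [i for i, t in enumerate(y) if t == 0]
    aucs = []
    for _ in range(n_rounds):
        test = set()
        for group in (idx_pos, idx_neg):
            k = max(1, round(test_fraction * len(group)))
            test.update(rng.sample(group, k))
        train = [i for i in range(len(y)) if i not in test]
        score = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
        test = sorted(test)
        aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
    return sum(aucs) / len(aucs)


# Synthetic data only: 21 "cases" and 15 "controls" with 5 shifted features.
rng = random.Random(1)
y = [1] * 21 + [0] * 15
X = [[rng.gauss(2.0 * t, 1.0) for _ in range(5)] for t in y]
mean_auc = monte_carlo_cv_auc(X, y)
print(round(mean_auc, 2))
```

An AUC near 0.5, as reported for several of the subgroup analyses above, corresponds to scores that rank cases and non-cases no better than chance.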
Fig. 1 Prediction performance estimates presented as area under the curve. Prediction performance estimates using OPLS-DA (a, c) or logistic regression fitted by lasso (b, d) on metabonomics (NOE and CPMG), transcriptomics (mrna), and omics (all) data sets
Prediction performance of small candidate biomarker panels
| Analysis name | Metabonomics (NOE) | Metabonomics (CPMG) | Transcriptomics | Omics |
|---|---|---|---|---|
| ActiveControl | 0.93 | 0.90 | 0.97 | 0.96 |
| InactiveControl | 0.76 | 0.68 | 0.68 | 0.76 |
| InactiveActive | 0.94 | 0.89 | 0.93 | 0.93 |
| SiSd | 0.62 | 0.71 | 0.63 | 0.70 |
| DurationLess10 | 0.54 | 0.50 | 0.38 | 0.38 |
| DebutLess25 | 0.49 | 0.67 | 0.71 | 0.69 |
Prediction performance estimates presented as area under the curve (AUC) for each sub-population under selection of small candidate biomarker panels for each full data set and the combined data sets: Omics. AUC estimates are based on classification using logistic regression fitted by lasso and a nested Monte Carlo cross-validation procedure
NOE: nuclear Overhauser effect; CPMG: Carr–Purcell–Meiboom–Gill; Si: steroid independence; Sd: steroid dependence; DurationLess10: disease duration less than 10 years; DebutLess25: age at diagnosis less than 25 years
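The nested procedure named in the caption can be sketched in Python as follows: an inner Monte Carlo loop picks the lasso penalty using the outer-training samples only, and the outer loop then estimates AUC and records how many variables the fitted model keeps. This is a minimal sketch under stated assumptions, not the authors' implementation: the proximal-gradient solver, the penalty grid, the split fraction, the small round counts, and the synthetic data are all choices made for this example.

```python
import math
import random


def auc(scores, labels):
    """Probability that a random case outscores a random non-case (ties = 1/2)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))


def fit_lasso_logistic(X, y, lam, n_iter=100, step=0.1):
    """L1-penalised logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, x))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - t
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        b -= step * grad_b
        for j in range(d):
            v = w[j] - step * grad_w[j]
            # Soft-thresholding sets small coefficients exactly to zero.
            w[j] = math.copysign(max(abs(v) - step * lam, 0.0), v)
    return w, b


def split(y, test_fraction, rng):
    """One stratified random train/test split (a single Monte Carlo round)."""
    test = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        test += rng.sample(idx, max(1, round(test_fraction * len(idx))))
    held_out = set(test)
    return [i for i in range(len(y)) if i not in held_out], sorted(test)


def predict(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x))


def cv_auc(X, y, lam, n_rounds, rng):
    """Mean test AUC for one penalty value over Monte Carlo splits."""
    aucs = []
    for _ in range(n_rounds):
        tr, te = split(y, 0.3, rng)
        w, b = fit_lasso_logistic([X[i] for i in tr], [y[i] for i in tr], lam)
        aucs.append(auc([predict(w, b, X[i]) for i in te], [y[i] for i in te]))
    return sum(aucs) / len(aucs)


def nested_mc_cv(X, y, lambdas, n_outer=10, n_inner=3, seed=0):
    rng = random.Random(seed)
    outer_aucs, panel_sizes = [], []
    for _ in range(n_outer):
        tr, te = split(y, 0.3, rng)
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        # Inner loop: tune the penalty without touching the outer test samples.
        best = max(lambdas, key=lambda lam: cv_auc(Xtr, ytr, lam, n_inner, rng))
        w, b = fit_lasso_logistic(Xtr, ytr, best)
        outer_aucs.append(auc([predict(w, b, X[i]) for i in te], [y[i] for i in te]))
        panel_sizes.append(sum(1 for wj in w if wj != 0.0))
    return sum(outer_aucs) / n_outer, panel_sizes


# Synthetic data only: 2 informative features and 6 noise features.
rng = random.Random(1)
y = [1] * 21 + [0] * 15
X = [[rng.gauss(2.5 * t if j < 2 else 0.0, 1.0) for j in range(8)] for t in y]
mean_auc, panel_sizes = nested_mc_cv(X, y, lambdas=[0.01, 0.05, 0.2])
print(round(mean_auc, 2), panel_sizes)
```

Keeping the penalty selection inside the outer training set is what prevents the reported AUC from being inflated by tuning on the samples used for evaluation.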
Fig. 2 Distribution of the size of the selected biomarker panels over 100 (outer) cross-validation rounds. a Metabonomics NOE, b Metabonomics CPMG, c transcriptomics, and d Omics data set. The box border represents the interquartile range and the horizontal line in the box is the median. The whiskers show the largest/smallest observation falling within a distance of 1.5 times the box size
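The box-and-whisker convention described in the caption can be sketched in Python as follows. The panel sizes are made-up numbers for illustration, not values from the study, and the quartile interpolation rule (`method="inclusive"`) is an assumption; plotting software may compute quartiles slightly differently.

```python
import statistics


def box_stats(values):
    """Median, quartiles and whisker ends using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # the "box size"
    # Whiskers end at the most extreme observations still within 1.5 box sizes.
    lower = min(v for v in values if v >= q1 - 1.5 * iqr)
    upper = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3,
            "lower_whisker": lower, "upper_whisker": upper}


# Hypothetical panel sizes from ten cross-validation rounds.
panel_sizes = [3, 4, 4, 5, 5, 6, 6, 7, 8, 19]
stats = box_stats(panel_sizes)
print(stats)
```

In this toy example the panel of size 19 lies beyond the upper whisker and would be drawn as an individual outlying point.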
Most frequently selected variables in the inactive UC versus control analysis
| Name | HGNC | ppm | Data set | Selection freq. |
|---|---|---|---|---|
| ENSG00000243024_at | RPS11P6 | | Transcriptomics | 0.81 |
| | | 3.176 | Metabonomics | 0.77 |
| ENSG00000165669_at | FAM204A | | Transcriptomics | 0.69 |
| ENSG00000121351_at | IAPP | | Transcriptomics | 0.55 |
| ENSG00000120675_at | DNAJC15 | | Transcriptomics | 0.5 |
| ENSG00000100664_at | EIF5 | | Transcriptomics | 0.48 |
| ENSG00000128881_at | TTBK2 | | Transcriptomics | 0.47 |
| ENSG00000116885_at | OSCP1 | | Transcriptomics | 0.46 |
| ENSG00000166323_at | C11orf65 | | Transcriptomics | 0.31 |
| | | 3.093 | Metabonomics | 0.29 |
| ENSG00000159208_at | C1orf51 | | Transcriptomics | 0.29 |
| ENSG00000119487_at | MAPKAP1 | | Transcriptomics | 0.27 |
| ENSG00000164291_at | ARSK | | Transcriptomics | 0.26 |
| ENSG00000171786_at | NHLH1 | | Transcriptomics | 0.23 |
| ENSG00000198590_at | C3orf35 | | Transcriptomics | 0.17 |
| ENSG00000051620_at | HEBP2 | | Transcriptomics | 0.15 |
| ENSG00000175550_at | DRAP1 | | Transcriptomics | 0.14 |
| ENSG00000122585_at | NPY | | Transcriptomics | 0.13 |
| ENSG00000131697_at | NPHP4 | | Transcriptomics | 0.12 |
| | | 2.8 (Aspartate) | Metabonomics | 0.11 |
| ENSG00000111224_at | PARP11 | | Transcriptomics | 0.11 |
| ENSG00000189068_at | VSTM1 | | Transcriptomics | 0.11 |
| ENSG00000257726_at | | | Transcriptomics | 0.11 |
Although fewer metabolites than transcripts are selected among the top predictors, the metabolites carry a substantial amount of information for classification; e.g. the NMR peak at 3.176 ppm (unknown annotation) is selected in 77 % of all cross-validation rounds. Even though the table is dominated by transcriptomics variables, the metabonomics data add substantially to the classification performance.
Name: ENSEMBL ID/metabolite name (if available); HGNC: HGNC gene ID; ppm: peak position in the NMR spectrum for metabonomic variables; Data set: data set the variable belongs to; Selection freq.: frequency with which the variable is selected over the (outer) cross-validation rounds
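The selection frequency defined above is the fraction of outer cross-validation rounds in which a variable ends up in the selected panel. A minimal Python sketch of that bookkeeping is given below; the four example rounds are invented for illustration (only the variable names are borrowed from the table), and the function name `selection_frequency` is an assumption.

```python
from collections import Counter


def selection_frequency(selected_per_round):
    """Fraction of rounds in which each variable was selected, highest first."""
    n_rounds = len(selected_per_round)
    # set() guards against counting a variable twice within one round.
    counts = Counter(v for panel in selected_per_round for v in set(panel))
    return {v: c / n_rounds for v, c in counts.most_common()}


# Hypothetical panels from four outer rounds (not the study's actual rounds).
rounds = [
    ["RPS11P6", "3.176 ppm", "FAM204A"],
    ["RPS11P6", "3.176 ppm"],
    ["RPS11P6", "IAPP"],
    ["3.176 ppm", "FAM204A"],
]
freq = selection_frequency(rounds)
for name, f in freq.items():
    print(f"{name}: {f:.2f}")
```

Under this definition, a value of 0.77 for the 3.176 ppm peak means it was selected in 77 of the 100 outer rounds.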