Enrico Mossotto, Joanna Boberska, James J Ashton, Imogen S Stafford, Guo Cheng, Jonathan Baker, Florina Borca, Hang T T Phan, Tracy F Coelho, R Mark Beattie, Sandrine P Claus, Sarah Ennis.
Abstract
Crohn's disease (CD) is characterised by chronic inflammation. We aimed to identify a relationship between the plasma inflammatory metabolomic signature and genomic data in CD using blood plasma metabolic profiles. Proton NMR spectra were acquired for 228 paediatric CD patients. Regression (OPLS) modelling and machine learning (ML) approaches were independently applied to establish the metabolic inflammatory signature. This signature was correlated against gene-level pathogenicity scores generated for all patients, and functional enrichment was analysed. OPLS modelling of metabolomic spectra from unfasted patients revealed distinctive shifts in plasma metabolites corresponding to regions of the spectrum assigned to N-acetyl glycoprotein, glycerol and phenylalanine that were highly correlated (R² = 0.62) with C-reactive protein levels. The same metabolomic signature was independently identified using ML to predict patient inflammation status. Correlation of the individual peaks comprising this metabolomic signature of inflammation with pathogenic burden across 15,854 unselected genes identified significant enrichment for genes functioning within 'intrinsic component of membrane' (p = 0.003) and 'inflammatory bowel disease (IBD)' (p = 0.003). The seven genes contributing to the IBD enrichment are critical regulators of pro-inflammatory signalling. Overall, a metabolomic signature of inflammation can be detected from blood plasma in CD. This signal is correlated with pathogenic mutation in pro-inflammatory immune response genes.
Year: 2022 PMID: 35982195 PMCID: PMC9388636 DOI: 10.1038/s41598-022-18178-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. NMR and genomic data integration. Phase (I) NMR spectra and patient CRP data were input to the RF model using (a) RFECV and (b) cross-validated methods to select spectral regions discriminating inflamed from non-inflamed patients. Phase (II) Informative data-points were clustered and peaks reduced to a single eigenvector. Phase (III) Eigenvectors for each peak were individually correlated against all genes and tested for enrichment. Created with BioRender.com.
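Phase II groups the informative data-points into peaks, but the clustering rule is not described here. The following is a minimal sketch of one plausible approach, assuming that selected points closer together than a tolerance belong to the same peak. The function name `group_into_peaks`, the `max_gap` tolerance and the example shifts are all hypothetical.

```python
def group_into_peaks(ppm_values, max_gap=0.01):
    """Group chemical-shift positions (ppm) into contiguous peaks.

    max_gap is a hypothetical tolerance: a selected point further than
    max_gap from the previous one starts a new peak.
    """
    peaks = []
    for ppm in sorted(ppm_values):
        if peaks and ppm - peaks[-1][-1] <= max_gap:
            peaks[-1].append(ppm)   # extend the current peak
        else:
            peaks.append([ppm])     # open a new peak
    return peaks


# Hypothetical selected data-points, given in arbitrary order.
selected = [7.21, 2.03, 2.05, 7.20, 2.04, 3.56]
for peak in group_into_peaks(selected, max_gap=0.02):
    print(f"{min(peak):.2f}-{max(peak):.2f} ppm, {len(peak)} points")
```

A gap rule keeps the start and end of each peak explicit, which is the information marked by the dashed lines in the machine learning figure. Any other one-dimensional clustering method could be substituted.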
Demographic and blood result data.
| Clinical data | All patients | Inflamed | Non-inflamed |
|---|---|---|---|
| Number of samples | 228 | 58 | 96 |
| % Caucasian | 93.0 | 97.6 | 91.2 |
| % Male | 71.2 | 75.9 | 66.7 |
| Age in years at plasma extraction | 14.0 (2.6–17.9) | 14.0 (5.4–17.2) | 14.1 (10.3–16.1) |
| Age in years at diagnosis | 12.2 (1.3–16.9) | 12.6 (4.1–16.1) | 11.9 (2.4–16.6) |
| Time in years since diagnosis to point of sampling | 1.8 (0.0–16.1) | 1.4 (0.0–6.5) | 2.2 (0.0–9.4) |
| Fasted (% of samples) | 30 (13%) | 0 | 0 |
| CRP (mg/L) | 8.75 (0–155) | 20.1 (5–155) | 1.1 (0–4) |
| ALB (g/L) | 38.4 (23–51) | 35.4 (25–45) | 40.5 (26–51) |
| ESR (mm/h) | 14 (1–68) | 21.6 (5–68) | 9 (1–41) |
| HB (g/L) | 123.3 (73–166) | 118.6 (89–144) | 126.5 (80–166) |
| PCV (%/L) | 0.4 (0.2–0.5) | 0.35 (0.3–0.4) | 0.4 (0.3–0.5) |
| PLT (10⁹/L) | 343.5 (138–1018) | 382 (148–1018) | 316.3 (138–568) |
| WBC (10⁹/L) | 7.6 (3.1–20.3) | 8.4 (3.5–16.7) | 7.1 (3.1–17.1) |
Mean value is shown with (minimum–maximum). Ancestry was inferred from genomic data.
Medication usage between inflamed and uninflamed patient groups.
| | Thiopurine | Anti-TNF (infliximab or adalimumab) | Steroids | Exclusive enteral nutrition | Ustekinumab | Vedolizumab |
|---|---|---|---|---|---|---|
| CRP ≥ 5 (n = 58) | 27 patients | 6 patients | 5 patients | 7 patients | 0 patients | 0 patients |
| CRP < 5 (n = 96) | 53 patients | 26 patients | 14 patients | 3 patients | 0 patients | 0 patients |
| p value* | 0.30 | | 0.28 | | n/a | n/a |
Significant values are in [bold].
Patients were frequently on multiple therapies. Twenty-seven patients were on no medications, or only nutritional therapy, at the time of plasma sampling.
*Calculated using a χ² test.
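The χ² comparison of medication use between the two CRP groups can be sketched from the counts in the table. This assumes a Pearson χ² test on a 2×2 table (on drug or not, by CRP group) without continuity correction; whether a correction was applied is not stated, and the helper names are hypothetical. Under these assumptions the thiopurine and steroid counts give p values close to the 0.30 and 0.28 listed.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square, no continuity correction, for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def medication_test(on_inflamed, on_noninflamed, n_inflamed=58, n_noninflamed=96):
    """Compare the proportion of patients on a drug between the CRP groups."""
    return chi2_2x2(on_inflamed, n_inflamed - on_inflamed,
                    on_noninflamed, n_noninflamed - on_noninflamed)


# Counts taken from the medication table.
for drug, inflamed, noninflamed in [("Thiopurine", 27, 53), ("Steroids", 5, 14)]:
    stat, p = medication_test(inflamed, noninflamed)
    print(f"{drug}: chi2 = {stat:.2f}, p = {p:.2f}")
```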
Figure 2. CRP prediction and spectra deconvolution. (A) OPLS scores plot. Each point represents one patient spectrum, colour-coded according to CRP levels. Strong correlation between T and Tcv indicates a robust model. (B) Loadings plot; colour-scale indicates the correlation magnitude of metabolites with the model scores (r²).
List of selected signals from the OPLS model.
| Peak δ (ppm) | Multiplicity | OPLS weight | Variation | Assigned metabolite |
|---|---|---|---|---|
| 2.01 | Singlet | 0.49 | ↑ | Composite glycoprotein |
| 2.04 | Singlet | 0.48 | ↑ | Composite glycoprotein |
| 2.66 | Multiplet | 0.69 | ↑ | Unassigned |
| 3.56 | Doublet of doublets | 0.64 | ↑ | Glycerol |
| 3.64 | Doublet of doublets | 0.52 | ↑ | Glycerol |
| 7.33 | Multiplet | 0.46 | ↑ | Phenylalanine |
| 7.38 | Multiplet | 0.45 | ↑ | Phenylalanine |
| 7.43 | Multiplet | 0.49 | ↑ | Phenylalanine |
Reported peaks showed an OPLS weight > 0.4. The OPLS weight represents the R² for each metabolite.
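As an illustration of the weight described in the footnote, the sketch below computes a squared Pearson correlation between the intensity at one chemical shift and the model scores across patients, then applies the 0.4 reporting threshold. Reading the OPLS weight as this squared correlation is an assumption based on the footnote and the loadings-plot caption. The function name and all numeric values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: intensity at one chemical shift and the OPLS
# predictive score for each of five patients.
intensity = [0.8, 1.1, 1.9, 2.4, 3.1]
scores = [-1.2, -0.5, 0.1, 0.9, 1.4]

weight = r_squared(intensity, scores)
print(f"r2 = {weight:.2f}, reported: {weight > 0.4}")
```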
Figure 3. Machine learning classification of patients using NMR data. (A) Most informative regions selected by the RF model to discriminate patient inflammation status. (B) PCA of patients' spectra using the 258 most informative NMR datapoints. (C) Distribution of the selected most informative datapoints by their shift δ (ppm) and importance. Green and red dashed lines indicate the start and end of a peak.
Machine learning selected NMR peaks.
| Peak | Peak min ppm | Peak max ppm | Delta ppm | # of NMR data points | Max importance observed | Average importance observed | PC1 explained variance (%) | PC2 explained variance (%) | Components selected for gene correlation | Identified by OPLS modelling |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.45 | 1.50 | 0.058 | 18 | 0.25 | 0.14 | 79.5 | 14.9 | PC1 | No |
| 2 | 2.03 | 2.07 | 0.047 | 21 | 0.28 | 0.15 | 92.4 | 5.1 | PC1, PC4, PC5 | Yes ( |
| 3 | 2.65 | 2.66 | 0.011 | 40 | 0.54 | 0.25 | 98.6 | 0.8 | PC1 | Yes (unassigned peak) |
| 4 | 3.56 | 3.57 | 0.017 | 47 | | | 82.1 | 15.2 | PC1, PC2 | Yes (glycerol) |
| 5 | 7.20 | 7.22 | 0.018 | 58 | 0.57 | 0.22 | 84.7 | 2.7 | PC1 | Yes (phenylalanine) |
| 6 | 7.25 | 7.26 | 0.016 | 15 | 0.27 | 0.16 | 84.9 | 4.9 | PC1 | Yes (phenylalanine) |
Significant values are in [bold].
Peaks identified by the RF classifier in the discrimination of CD patients by their CRP status. Reported importance is scaled by the maximum importance observed.
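The table lists, for each peak, how much variance the first principal components explain. The following sketch shows how the data-points of one peak could be reduced to a single per-patient score with its explained-variance fraction. It uses power iteration on the covariance matrix in plain Python, which is a stand-in and not necessarily how the authors computed their components. The function name and the example matrix are hypothetical.

```python
import math


def first_component(rows, iterations=200):
    """Return (scores, explained_fraction) for PC1 of a patients x points matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]

    # Power iteration; a non-uniform start vector lowers the chance of
    # starting exactly orthogonal to the leading eigenvector.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have no variance")
        v = [x / norm for x in w]

    # Rayleigh quotient gives the leading eigenvalue (v has unit length).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    total_variance = sum(cov[i][i] for i in range(m))
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centred]
    return scores, eigenvalue / total_variance


# Hypothetical peak: 5 patients x 3 adjacent NMR data-points.
peak = [[1.0, 2.1, 1.2], [1.4, 2.9, 1.5], [2.1, 4.0, 2.2],
        [2.4, 5.1, 2.4], [3.0, 6.2, 3.1]]
scores, fraction = first_component(peak)
print(f"PC1 explains {100 * fraction:.1f}% of the variance")
```

Because adjacent points within one peak rise and fall together, PC1 usually captures most of the variance, consistent with the high PC1 percentages in the table.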
Enrichment results of gene-peak correlations.
| Peak | Enrichment term (term_id) | Term sizeᵃ | Intersectionᵇ | Correlation set | Adjusted p-valueᶜ | Enriching genesᵈ |
|---|---|---|---|---|---|---|
| 1 | | 75 | 11 | All | | |
| | Rectum; glandular cells [High] (HPA:0400053) | 2641 | 66 | Negative | 0.026 | Supplementary Table |
| | Peptide GPCRs (WP:WP24) | 75 | 7 | Negative | 0.033 | |
| | GPCRs, Class A Rhodopsin-like (WP:WP455) | 256 | 20 | All | 0.037 | Supplementary Table |
| | hSIR2-p53 complex (CORUM:2821) | 2 | 2 | Positive | 0.050 | |
| | SEC23–SEC24 adaptor complex (CORUM:7139) | 2 | 2 | Positive | 0.050 | |
| 2 | Receptor complex (GO:0043235) | 379 | 18 | Positive | 0.019 | Supplementary Table |
| | Regulation of actin cytoskeleton (KEGG:04810) | 217 | 12 | Positive | 0.028 | |
| | RNA polymerase I transcription regulatory region sequence-specific DNA binding (GO:0001163) | 8 | 3 | Negative | 0.043 | |
| | RNA polymerase I core promoter sequence-specific DNA binding (GO:0001164) | 8 | 3 | Negative | 0.043 | |
| 3 | Intrinsic component of membrane (GO:0031224) | 2464 | 111 | All | 0.027 | Supplementary Table |
| | DTNBP1(1A)-HDAC3 complex (CORUM:7487) | 2 | 2 | Negative | 0.050 | |
| | BKCA-beta2AR complex (CORUM:672) | 2 | 2 | Positive | 0.050 | |
| 4 | | 2464 | 110 | All | | Supplementary Table |
| | | 63 | 7 | Negative | | |
| | | 6 | 4 | All | | |
| | | 2355 | 105 | All | | Supplementary Table |
| | Chromatin silencing complex (GO:0005677) | 6 | 3 | Positive | 0.018 | |
| | Oxidoreductase activity, acting on the CH-NH2 group of donors (GO:0016638) | 17 | 4 | Negative | 0.035 | |
| | BKCA-beta2AR complex (CORUM:672) | 2 | 2 | Positive | 0.050 | |
| 5 | | 181 | 13 | Negative | | |
| | Postsynaptic membrane (GO:0045211) | 103 | 8 | Positive | 0.022 | |
| | RFC complex (CORUM:277–279-2799) | 5 | 3 | All | 0.050 | |
| | MSP58-RINT1 complex (CORUM:6314) | 5 | 2 | Negative | 0.050 | |
| 6 | Plasma membrane (GO:0005886) | 5 | 105 | Positive | 0.050 | Supplementary Table |
| | Cell periphery (GO:0071944) | 2 | 105 | Positive | 0.050 | Supplementary Table |
| | Intrinsic component of plasma membrane (GO:0031226) | 4879 | 83 | All | 0.008 | Supplementary Table |
| | Protein-arginine deiminase activity (GO:0004668) | 4971 | 3 | Negative | 0.018 | |
| | Integral component of membrane (GO:0016021) | 1591 | 112 | All | 0.018 | Supplementary Table |
| | Intrinsic component of membrane (GO:0031224) | 5 | 116 | All | 0.025 | Supplementary Table |
| | SPG3A–SPG33 complex (CORUM:6525) | 2355 | 2 | Positive | 0.040 | |
Significant values are in [bold].
Enriched terms for genes that positively or negatively correlate with the identified peaks.
ᵃ The term size indicates the number of genes belonging to a specific term in the relative dataset.
ᵇ The intersection refers to the number of genes from the correlation analysis that overlap with a specific term.
ᶜ SCS correction method embedded in gProfiler2.
ᵈ The complete list of genes enriching for the named term is reported in Supplementary Table 1.
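To show how term size and intersection feed into an enrichment p-value, the sketch below computes an unadjusted hypergeometric tail probability. It assumes a background of 15,854 genes (the figure quoted in the abstract) and a query of 300 correlated genes. The query size is hypothetical because it is not given in the table, and the SCS multiple-testing correction is not reproduced, so the output is not comparable to the adjusted values above.

```python
from math import comb


def hypergeom_tail(background, term_size, query_size, intersection):
    """P(X >= intersection) under the hypergeometric distribution.

    background:   number of genes that could have been drawn
    term_size:    genes in the background annotated to the term
    query_size:   genes in the correlated set being tested
    intersection: query genes that are annotated to the term
    """
    upper = min(term_size, query_size)
    favourable = sum(
        comb(term_size, k) * comb(background - term_size, query_size - k)
        for k in range(intersection, upper + 1)
    )
    return favourable / comb(background, query_size)


# Term size 63 and intersection 7 are taken from one row of the table;
# the query size of 300 is a placeholder, not a reported value.
p_raw = hypergeom_tail(background=15854, term_size=63,
                       query_size=300, intersection=7)
print(f"unadjusted p = {p_raw:.3g}")
```

Exact integer binomials avoid floating-point underflow for large gene sets, at the cost of speed; for many terms a library implementation would be preferable.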