Joram M Posma, Isabel Garcia-Perez, Timothy M D Ebbels, John C Lindon, Jeremiah Stamler, Paul Elliott, Elaine Holmes, Jeremy K Nicholson.
Abstract
Metabolism is altered by genetics, diet, disease status, environment, and many other factors. Any one of these factors is often modeled without considering the effects of the other covariates. Attributing differences in metabolic profile to one of these factors must be done while controlling for the metabolic influence of the rest. We describe here a data analysis framework and a novel confounder-adjustment algorithm for multivariate analysis of metabolic profiling data. Using simulated data, we show that, compared with other commonly used methods, it finds similar numbers of true associations and significantly fewer false positives. Covariate-adjusted projections to latent structures (CA-PLS) are exemplified here using a large-scale metabolic phenotyping study of two Chinese populations at different risks for cardiovascular disease. Using CA-PLS, we find that some previously reported differences are actually associated with external factors, and we discover a number of previously unreported biomarkers linked to different metabolic pathways. CA-PLS can be applied to any multivariate data where confounding may be an issue, and the confounder-adjustment procedure is translatable to other multivariate regression techniques.
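The abstract does not spell out the CA-PLS adjustment itself (the full algorithm is given in the paper's Supporting Information). The sketch below illustrates only the general idea of confounder adjustment, assuming plain linear residualization rather than the authors' procedure: each metabolic variable is regressed on the covariates in the training data and replaced by its residual, and the same coefficients are then applied to held-out samples so the test data never influence the adjustment. The function names and the toy data are hypothetical.

```python
import numpy as np

def fit_covariate_adjustment(X_train, C_train):
    """Least-squares coefficients mapping covariates (plus intercept) to each column of X."""
    # Prepend a column of ones so every variable gets its own intercept
    D = np.column_stack([np.ones(len(C_train)), C_train])
    beta, *_ = np.linalg.lstsq(D, X_train, rcond=None)
    return beta  # shape: (1 + n_covariates, n_variables)

def apply_covariate_adjustment(X, C, beta):
    """Subtract the covariate-explained part of X, using coefficients fitted on training data."""
    D = np.column_stack([np.ones(len(C)), C])
    return X - D @ beta

# Example: 3 variables partly driven by 2 covariates (stand-ins for, e.g., age and BMI)
rng = np.random.default_rng(0)
C_train, C_test = rng.normal(size=(80, 2)), rng.normal(size=(20, 2))
W = np.array([[2.0, -1.0, 0.5], [0.0, 3.0, 1.0]])
X_train = C_train @ W + rng.normal(scale=0.1, size=(80, 3))
X_test = C_test @ W + rng.normal(scale=0.1, size=(20, 3))

beta = fit_covariate_adjustment(X_train, C_train)
X_train_adj = apply_covariate_adjustment(X_train, C_train, beta)
X_test_adj = apply_covariate_adjustment(X_test, C_test, beta)
```

The adjusted matrices would then be passed to a PLS-type model; fitting the coefficients on the training split only is what keeps the adjustment from leaking information into the validation data.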
Keywords: Monte Carlo cross-validation; biomarker discovery; chemometrics; confounder elimination; covariate adjustment; metabolic phenotyping; multivariate data analysis; random matrix theory; reanalysis; sampling bias
Year: 2018 PMID: 29457906 PMCID: PMC5891819 DOI: 10.1021/acs.jproteome.7b00879
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Data analysis framework and covariate-adjustment algorithm. The left panel shows the different stages of the data analysis and how the introduction of bias is avoided by carefully splitting and scaling the data before modeling. The right panel (cyan box) outlines the covariate-adjustment algorithm, which is applied at the cyan-colored steps of the framework in the left panel. The green outline indicates the entire Monte Carlo cross-validation (MCCV) procedure, the red dashed box the regression analysis performed for each covariate, and the blue dotted boxes a CV loop. Here, β are regression coefficients, RMT stands for random matrix theory (see Supporting Information for the algorithm), and ◦ denotes an element-wise operation. See Supporting Information for a glossary of the mathematical operations used here.
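The caption stresses that bias is avoided by splitting the data before scaling and modeling. A minimal sketch of that ordering follows, assuming random train/test partitions for each Monte Carlo iteration and unit-variance (auto)scaling; the split fraction, the number of iterations, and the choice of autoscaling are illustrative assumptions, and the helper names are hypothetical. The key point is that the mean and standard deviation come from the training split alone.

```python
import numpy as np

def mccv_splits(n_samples, n_iter, test_fraction, seed=0):
    """Yield one random (train_idx, test_idx) partition per Monte Carlo iteration."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * n_samples))
    for _ in range(n_iter):
        perm = rng.permutation(n_samples)
        yield perm[n_test:], perm[:n_test]

def autoscale_from_train(X_train, X_test):
    """Unit-variance scaling with mean and SD estimated on the training split only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid division by zero for constant variables
    return (X_train - mu) / sd, (X_test - mu) / sd

# Example: 5 MCCV iterations with 30% of samples held out each time
X = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(100, 4))
for train_idx, test_idx in mccv_splits(len(X), n_iter=5, test_fraction=0.3):
    X_tr, X_te = autoscale_from_train(X[train_idx], X[test_idx])
    # ... covariate adjustment and model fitting on X_tr, prediction on X_te ...
```

Because the test samples are scaled with training statistics, they are treated exactly as genuinely new samples would be.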
Figure 2. Score plots of the MCCV models showing the predictive and first orthogonal components, with kernel density estimates (KDE), R² and Q² shown for the predictive axis. North Chinese individuals (Beijing and Shanxi) are shown as red circles and south Chinese individuals (Guangxi) as cyan crosses. (a) Unadjusted model of urine collection 1. (b) Unadjusted model of urine collection 2. (c) Covariate-adjusted model of urine collection 1. (d) Covariate-adjusted model of urine collection 2. Age, gender, BMI, HBP (or being on medication for HBP), smoking status, physical activity, Na/K ratio, and total fat intake were adjusted for in the CA-(O)PLSDA models (c and d).
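R² and Q² in the score plots summarize goodness of fit and predictive ability, respectively. Under one common definition, both equal one minus the sum of squared prediction errors divided by the total sum of squares of the response about its mean; R² uses fitted values, whereas Q² uses predictions for samples left out of model building, so the numerator becomes the predictive residual sum of squares (PRESS). The sketch below assumes that definition and a 0/1 coding of the two groups; how the paper pools predictions across MCCV iterations is not stated in the caption.

```python
import numpy as np

def r2_or_q2(y_true, y_pred):
    """1 - (sum of squared errors) / (total sum of squares of y_true about its mean).

    Gives R^2 when y_pred are fitted values, and Q^2 when y_pred are
    cross-validated (held-out) predictions, so the numerator is PRESS.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Example: two groups coded 0 and 1 (hypothetical coding), held-out predictions
print(r2_or_q2([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # ≈ 0.9
```

Q² is typically lower than R² and can even be negative when a model predicts held-out samples worse than simply using the mean.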
Figure 3. The top panel shows the average ¹H NMR spectrum from the first visit; the bottom panel shows the variable contributions across MCCV models. Models were adjusted for age, gender, HBP/medication, BMI, physical activity, smoking status, Na/K ratio, and total fat intake. Labels: 1, 2-oxoisocaproate; 2, leucine; 3, valine; 4, unknown (1.15(s), 3.49(d), 3.61(d), 3.67(m), 3.83(m)); 5, ethylglucuronide; 6, 2-hydroxyisobutyrate; 7, unknown (1.42(d), 1.46(d), 1.51(d)); 8, unknown (1.82(m), 3.52(s)); 9, N-acetyl-S-(1Z)-propenyl-cysteine-sulfoxide; 10, glutamine; 11, acetone; 12, unknown (2.32(d), 2.34(d), 2.38(d), 2.40(d), 3.52(m)); 13, prolinebetaine; 14, sarcosine; 15, dimethylglycine; 16, unknown (1.84(m), 2.78(m), 2.95(s), 3.36(m), 3.59(m), 3.62(m)); 17, creatine; 18, N⁶,N⁶,N⁶-trimethyllysine; 19, dimethylsulfone; 20, O-acetylcarnitine; 21, carnitine; 22, taurine; 23, 4-hydroxyhippurate; 24, 1-methylhistidine; 25, histidine; 26, tyrosine; 27, pseudouridine; 28, formate; 29, N-methylnicotinic acid. Supplementary Figure 5 shows the results for the unadjusted model.
Figure 4. Perturbations to a living system often instigate changes to multiple pathways simultaneously. Shown here is a condensed multicompartmental metabolic reaction network of the homeostatic urinary signature of differences between north and south Chinese individuals for the human supra-organism, created using MetaboNetworks. A link is shown between two metabolites if the reaction is listed in KEGG and can occur in Homo sapiens (solid lines) or in the most abundant endosymbionts (dotted lines). Metabolites not connected in the network, and those not listed in KEGG, were connected to the closest related metabolite in the network, indicated by a dashed line. The background shading illustrates different types of metabolism based on closest affinity, with some overlap between groups. A table with the full names of the abbreviated metabolites can be found in Supplementary Table 5.