Anna Niehues, Daniele Bizzarri, Marcel J T Reinders, P Eline Slagboom, Alain J van Gool, Erik B van den Akker, Peter A C 't Hoen.
Abstract
Population-scale expression profiling studies can provide valuable insights into biological and disease-underlying mechanisms. The availability of phenotypic traits is essential for studying clinical effects. Missing, incomplete, or inaccurate phenotypic information can therefore make analyses challenging and prevent RNA-seq or other omics data from being reused. One possible solution is to use predictors that infer clinical or behavioral phenotypic traits from molecular data. While such predictors have been developed for different omics data types and are being applied in various studies, metabolomics-based surrogates are less commonly used than predictors based on DNA methylation profiles.

In this study, we inferred 17 traits, including diabetes status and exposure to lipid medication, using previously trained metabolomic predictors. We evaluated whether these metabolomic surrogates can be used as an alternative to reported information for studying the respective phenotypes using expression profiling data of four population cohorts. For the majority of the 17 traits, the metabolomic surrogates performed similarly to the reported phenotypes in terms of effect sizes, number of significant associations, replication rates, and significantly enriched pathways.

The application of metabolomics-derived surrogate outcomes opens new possibilities for reuse of multi-omics data sets. In studies where availability of clinical metadata is limited, missing or incomplete information can be complemented by these surrogates, thereby increasing the size of available data sets. Additionally, such surrogates could be used to correct for potential biological confounding. In the future, it would be interesting to further investigate the use of molecular predictors across different omics types and cohorts.
Keywords: Clinical surrogates; Expression profiling; Meta-analysis; Metabolomics; Multi-omics; Population cohort study; Predictors; Surrogate outcomes; Surrogates; Transcriptomics
Year: 2022 PMID: 35907790 PMCID: PMC9339202 DOI: 10.1186/s12864-022-08771-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
Fig. 1 Overview of study workflow. Gene-wise models are fitted for various outcome variables based on either reported information or metabolomic surrogates.
Overview of phenotypic traits. Availability of each variable is indicated by ‘x’.
| Phenotypic trait | Reported outcome | Surrogate outcome |
|---|---|---|
| Low estimated Glomerular Filtration Rate (eGFR) | x | |
| High triglycerides | x | x |
| High LDL-associated cholesterol | x | x |
| High total cholesterol | x | x |
| Low HDL-associated cholesterol | x | x |
| Diabetes | x | |
| Metabolic syndrome | x | |
| Sex | x | x |
| Lipid medication | x | x |
| BMI/obesity status | x | x |
| High high-sensitivity C-reactive protein (hsCRP) | x | x |
| Blood pressure lowering medication | x | |
| Low hemoglobin | x | x |
| Low white blood cells | x | x |
| Current smoking | x | x |
| Alcohol consumption | x | |
| High age (≥65 y.o.) | x | x |
Fig. 2 Comparison of association study result characteristics. Number of significant associations (based on bacon-corrected and FDR-corrected p-values) (A), mean absolute effect sizes (based on bacon-corrected effect sizes) across all genes (B), and bias (C) and inflation (D) of test statistics (t-statistic) for alternative models per comparison and cohort. Horizontal lines mark the mean (0) and standard deviation (1) of the theoretical null distribution of the test statistic. Comparisons are ordered by performance of metabolomic predictors for binary outcome measures. Type of outcome variable is indicated by color: reported or measured variable = black, metabolomic surrogate = orange. Mean values across four cohorts (two cohorts for hsCRP) are plotted as horizontal bars. Note the log10 scale on the y-axis of the upper plot.
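The quantities summarized in Fig. 2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names `benjamini_hochberg` and `summarize_association` are hypothetical, FDR correction is assumed to mean the Benjamini-Hochberg procedure, and bias and inflation are approximated by the sample mean and standard deviation of the t-statistics. The bacon method named in the caption estimates these from a fitted empirical null distribution, which is not reproduced here.

```python
from statistics import mean, stdev


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def summarize_association(t_stats, p_values, alpha=0.05):
    """Summarize one gene-wise association study (hypothetical helper).

    Assumption: bias and inflation are taken as the plain mean and standard
    deviation of the t-statistics, a crude stand-in for bacon's estimates.
    """
    adjusted = benjamini_hochberg(p_values)
    return {
        "n_significant": sum(p < alpha for p in adjusted),
        "bias": mean(t_stats),        # theoretical null: 0
        "inflation": stdev(t_stats),  # theoretical null: 1
    }


if __name__ == "__main__":
    # Toy values for four genes, for illustration only.
    print(summarize_association([-1.0, 0.0, 1.0, 2.0], [0.01, 0.04, 0.03, 0.5]))
```

Under the theoretical null, the bias would be close to 0 and the inflation close to 1, which is what the horizontal reference lines in panels C and D represent.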
Fig. 3 Pairwise comparisons of association study results. Absolute Pearson correlation coefficients (Pearson r) of bacon-adjusted regression coefficients of gene-wise linear models (limma/voom) for outcome variables in alternative models per comparison and cohort. Comparisons are ordered by performance (AUC) of metabolomic predictors for binary outcome measures. Mean values across four cohorts (two cohorts for hsCRP) are plotted as horizontal bars (gray).
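The comparison in Fig. 3 reduces to an absolute Pearson correlation between two vectors of gene-wise regression coefficients, one from the model with the reported outcome and one from the model with the metabolomic surrogate. The sketch below shows that calculation in plain Python. The function name `abs_pearson_r` and the toy coefficients are hypothetical, and both vectors are assumed to be aligned on the same genes in the same order.

```python
from math import sqrt


def abs_pearson_r(x, y):
    """Absolute Pearson correlation between two equally long coefficient vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return abs(sxy / sqrt(sxx * syy))


if __name__ == "__main__":
    # Hypothetical effect sizes for three genes under the two alternative models.
    reported = [0.10, -0.20, 0.30]
    surrogate = [-0.20, 0.40, -0.60]
    print(abs_pearson_r(reported, surrogate))
```

Taking the absolute value ignores the sign of the relationship, so a surrogate coded in the opposite direction to the reported outcome still scores as highly concordant.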
Fig. 4 Meta-analyses and replication studies. Number of meta-analyzed genes (significant associations, bacon-adjusted p-values FDR-adjusted for multiple testing, p < 0.05) in leave-one-cohort meta-analyses (A) and percentage of genes replicated (significant associations, Bonferroni-adjusted for multiple testing, p < 0.05) in replication cohort (B). Type of outcome variable is indicated by color: reported or measured variable = black, metabolomic surrogate = orange. Mean values across four meta-analyses/replication studies are plotted as horizontal bars. Note the log10 scale on the y-axis of the upper plot.
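The replication percentage in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The function name `replication_rate` is hypothetical, and the Bonferroni factor is assumed to be the number of discovery genes that were also tested in the replication cohort, a detail the caption does not specify.

```python
def replication_rate(discovery_genes, replication_p, alpha=0.05):
    """Percentage of discovery genes that replicate after Bonferroni adjustment.

    discovery_genes: genes significant in the meta-analysis of the other cohorts.
    replication_p:   dict mapping gene -> unadjusted p-value in the held-out cohort.
    Assumption: the Bonferroni factor is the number of discovery genes that
    were actually tested in the replication cohort.
    """
    tested = [g for g in discovery_genes if g in replication_p]
    if not tested:
        return 0.0
    n_tests = len(tested)
    replicated = [g for g in tested if min(1.0, replication_p[g] * n_tests) < alpha]
    return 100.0 * len(replicated) / n_tests


if __name__ == "__main__":
    # Hypothetical gene labels and p-values, for illustration only.
    discovery = ["geneA", "geneB", "geneC", "geneD"]
    held_out = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.5, "geneD": 0.3}
    print(replication_rate(discovery, held_out))
```

In a leave-one-cohort design, this calculation would be repeated once per held-out cohort, giving the four values per outcome that the figure averages.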
Fig. 5 Gene-set enrichment analyses of association study results. Numbers of significantly enriched (Bonferroni-adjusted p < 0.05) pathways (Reactome) for each outcome found in each cohort and in meta-analysis of all four (two for hsCRP) cohorts (top). Values for each type of outcome variable are represented as colored bars: reported variable = black, metabolomic surrogate = orange, intersection, i.e., pathways found by all outcome variables = blue.