Vânia Tavares, Joana Monteiro, Evangelos Vassos, Jonathan Coleman, Diana Prata.
Abstract
Predicting gene expression from genotyped data is valuable for studying inaccessible tissues such as the brain. Herein we present eGenScore, a polygenic/poly-variation method, and compare it with PrediXcan, a method based on regularized linear regression using elastic nets. While both methods have the same purpose of predicting gene expression based on genotype, they carry important methodological differences. We compared the performance of expression quantitative trait loci (eQTL) models to predict gene expression in the frontal cortex, comparing across these frameworks (eGenScore vs. PrediXcan) and training datasets (BrainEAC, which is brain-specific, vs. GTEx, which has data across multiple tissues). In addition to internal five-fold cross-validation, we externally validated the gene expression models using the CommonMind Consortium database. Our results showed that (1) PrediXcan outperforms eGenScore regardless of the training database used; and (2) when using PrediXcan, the performance of the eQTL models in frontal cortex is higher when trained with GTEx than with BrainEAC.
Keywords: expression quantitative trait loci; gene expression; genome wide association study; polygenic score; transcriptome
Year: 2021 PMID: 34680927 PMCID: PMC8536060 DOI: 10.3390/genes12101531
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Representation of the steps taken for the selection of genes (A) for which an expression quantitative trait loci (eQTL) model was trained and validated, both internally (B) and externally (C). RNA: ribonucleic acid.
Figure 2. Comparison of the eGenScore’s and PrediXcan’s methodological features. LD: linkage disequilibrium; SNP: single nucleotide polymorphism.
Figure 3. Venn diagram representing the number of genes for which an eQTL model trained with (a) BrainEAC and eGenScore (orange); (b) GTEx and eGenScore (green); (c) BrainEAC and PrediXcan (blue); or (d) GTEx and PrediXcan (yellow) was found to be statistically significant during the internal validation (i.e., |r| > 0.1 and Fisher’s p-value < 0.05).
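The internal-validation criterion quoted above (|r| > 0.1 and Fisher’s p-value < 0.05) can be illustrated with a minimal sketch. The text here does not say how the Fisher’s p-value is computed; the code below assumes one common reading, Fisher’s combined probability test applied to the per-fold correlation p-values, and that assumption may not match the authors’ pipeline. The function names, thresholds as default arguments, and the per-fold inputs are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X2 = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null.
    Each p-value must lie in (0, 1]."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X2 / 2
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def summarize_gene(fold_r, fold_p, r_min=0.1, alpha=0.05):
    """Average the per-fold Pearson r, square it, and apply the
    significance filter (ASSUMED to use Fisher's combined p-value)."""
    mean_r = sum(fold_r) / len(fold_r)
    combined_p = fisher_combined_p(fold_p)
    return {
        "r2": mean_r ** 2,  # squared averaged correlation
        "combined_p": combined_p,
        "significant": abs(mean_r) > r_min and combined_p < alpha,
    }


# Hypothetical five-fold results for a single gene (not data from the study).
example = summarize_gene(
    fold_r=[0.21, 0.18, 0.25, 0.15, 0.20],
    fold_p=[0.04, 0.08, 0.02, 0.12, 0.05],
)
```

The closed-form survival function keeps the sketch free of third-party dependencies; with SciPy available, `scipy.stats.combine_pvalues` covers the same combination step.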
Comparison of the gene expression models’ internal validation performance (i.e., the squared averaged Pearson correlation coefficient between the predicted and observed gene expressions) across datasets (i.e., BrainEAC vs. GTEx) and across frameworks (i.e., eGenScore vs. PrediXcan).
| Comparison | df, t | p-value | Cohen’s d |
|---|---|---|---|
| eGenScore framework (BrainEAC vs. GTEx) | 61, 3.10 | 0.003 ** | 0.39 |
| PrediXcan framework (BrainEAC vs. GTEx) | 30, −3.63 | 0.001 *** | 0.65 |
| BrainEAC dataset (eGenScore vs. PrediXcan) | 20, −1.79 | 0.088 | 0.39 |
| GTEx dataset (eGenScore vs. PrediXcan) | 227, −13.86 | <0.001 *** | 0.92 |
Only genes with a significant model (i.e., with an absolute averaged Pearson correlation coefficient between the predicted and observed gene expressions above 0.1 and a Fisher’s p-value below 0.05) were considered for this comparison. A two-sided paired-sample t-test was conducted and considered statistically significant at a p-value < 0.05. df: degrees of freedom (i.e., number of genes for which there was a significant model minus one). **: p < 0.01; ***: p < 0.001; t: t-statistic.
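The paired comparison described in this note can be sketched as follows. The effect-size formula is not stated here; the sketch assumes the paired-samples variant of Cohen’s d (mean difference divided by the standard deviation of the differences), which equals |t| / sqrt(df + 1). The tabulated values appear consistent with that reading (for example, 3.10 / sqrt(62) ≈ 0.39), but this remains an inference. The function names and the example r² values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_and_d(a, b):
    """Return (df, t, d) for paired samples a and b, where d is the
    paired-samples Cohen's d (ASSUMED variant: |mean diff| / sd diff)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)                      # sample SD of the differences
    t = mean(diffs) / (sd / math.sqrt(n))  # paired t-statistic
    d = abs(mean(diffs)) / sd
    return n - 1, t, d


def d_from_t(t, df):
    """Recover the paired-samples Cohen's d from a reported t and df."""
    return abs(t) / math.sqrt(df + 1)


# Hypothetical per-gene r2 values under two conditions (not study data).
r2_first = [0.10, 0.20, 0.15, 0.30]
r2_second = [0.12, 0.25, 0.22, 0.31]
df, t, d = paired_t_and_d(r2_first, r2_second)

# (df, t) pairs as reported in the internal-validation table above.
reported = [(61, 3.10), (30, -3.63), (20, -1.79), (227, -13.86)]
recovered = [round(d_from_t(t_val, df_val), 2) for df_val, t_val in reported]
```

The two-sided p-value is left out to keep the sketch within the standard library; `scipy.stats.ttest_rel` returns both the statistic and the p-value for the same test.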
Figure 4. Comparison of the squared averaged Pearson correlation coefficient (r²) between the predicted and observed gene expressions during the internal cross-validation across databases (i.e., BrainEAC vs. GTEx) using the eGenScore (A) or the PrediXcan (B) framework and across frameworks (i.e., eGenScore vs. PrediXcan) using the BrainEAC (C) or GTEx (D) dataset. The model’s performance is represented with a filled black or hollow gray marker if trained with the BrainEAC or GTEx database, respectively, and with a circle or triangle if trained with the eGenScore or PrediXcan framework, respectively.
Comparison of the gene expression models’ external validation performance (i.e., the squared Pearson correlation coefficient between the predicted and observed gene expressions in the CMC dataset) across datasets (i.e., BrainEAC vs. GTEx) and across frameworks (i.e., eGenScore vs. PrediXcan).
| Comparison | df, t | p-value | Cohen’s d |
|---|---|---|---|
| eGenScore framework (BrainEAC vs. GTEx) | 33, 2.57 | 0.015 * | 0.44 |
| PrediXcan framework (BrainEAC vs. GTEx) | 15, −2.04 | 0.060 | 0.51 |
| BrainEAC dataset (eGenScore vs. PrediXcan) | 8, −2.95 | 0.018 * | 0.98 |
| GTEx dataset (eGenScore vs. PrediXcan) | 115, −15.76 | <0.001 *** | 1.46 |
Only genes with a statistically significant Pearson correlation coefficient between the predicted and observed gene expressions in the CMC dataset (p-value < 0.05) were considered for this comparison. A two-sided paired-sample t-test was conducted and considered statistically significant at a p-value < 0.05. df: degrees of freedom (i.e., number of genes for which there was a model whose predicted expression correlated significantly with the observed expression of that gene minus one). *: p < 0.05; ***: p < 0.001; t: t-statistic.