Weiguang Mao, Elena Zaslavsky, Boris M Hartmann, Stuart C Sealfon, Maria Chikina.
Abstract
A major challenge in gene expression analysis is to accurately infer relevant biological insights, such as variation in cell-type proportion or pathway activity, from global gene expression studies. We present pathway-level information extractor (PLIER) (https://github.com/wgmao/PLIER and http://gobie.csb.pitt.edu/PLIER), a broadly applicable solution for this problem that outperforms available cell proportion inference algorithms and can automatically identify specific pathways that regulate gene expression. Our method improves interstudy replicability and reveals biological insights when applied to trans-eQTL (expression quantitative trait loci) identification.
Year: 2019 PMID: 31249421 PMCID: PMC7262669 DOI: 10.1038/s41592-019-0456-1
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1: PLIER overview.
PLIER is a matrix factorization approach that decomposes gene expression data into a product of a small number of latent variables and their corresponding gene associations or loadings, while constraining the loadings to align with the most relevant automatically selected subset of prior knowledge. A, Given two inputs, the gene expression matrix Y and the prior knowledge (represented as binary geneset membership in matrix C), the method returns the latent variables (B), their loadings (Z), and an additional sparse matrix (U) that specifies which (if any) prior information genesets and pathways are used for each latent variable. The light gray area of U indicates the large number of zero elements of the matrix. We apply our method to a whole blood human gene expression dataset. B, The positive entries of the resulting U matrix are visualized as a heatmap, facilitating the identification of the correspondence between specific latent variables and prior biological knowledge. Since the absolute scale of the U matrix is arbitrary, each column is normalized to a maximum of 1. C, We validate the latent variables mapped to specific leukocyte cell types by comparing PLIER-estimated relative cell-type proportions with direct measurements by mass cytometry. Dashed lines represent the 0.05, 0.01, and 0.001 significance levels for Spearman rank correlation (single-tailed test). We find that the PLIER estimates are highly accurate, outperforming other matrix decomposition methods. Moreover, PLIER estimates are competitive with, and in four cases outperform, both of the dedicated blood mixture deconvolution methods, NNLS [Abbas et al., 2009] and Cibersort [Newman et al., 2015].
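The roles of the matrices in panel A, and the column normalization of U used for the heatmap in panel B, can be illustrated with a toy example. This is only a sketch of the shapes involved, not the PLIER algorithm: the numbers are invented, the helper names `matmul` and `normalize_columns_to_max_one` are hypothetical, and using C U directly as the loadings is a simplifying assumption (in the method, Z is only constrained to align with the selected prior knowledge).

```python
# Illustrative sketch only: toy shapes and made-up numbers, not the PLIER implementation.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_columns_to_max_one(u):
    """Scale each column of U so its largest entry is 1 (all-zero columns stay zero)."""
    n_rows, n_cols = len(u), len(u[0])
    col_max = [max(u[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [[u[r][c] / col_max[c] if col_max[c] > 0 else 0.0
             for c in range(n_cols)] for r in range(n_rows)]

# Prior knowledge C: 4 genes x 3 genesets (binary membership).
C = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]

# Sparse U: 3 genesets x 2 latent variables; geneset 3 is not used by any LV.
U = [[2.0, 0.0],
     [0.0, 0.5],
     [0.0, 0.0]]

# Assumption for this sketch: take the loadings to be exactly C U (4 genes x 2 LVs).
Z = matmul(C, U)

# Latent variables B: 2 LVs x 3 samples.
B = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]

# Expression is modeled as the product of loadings and latent variables (4 genes x 3 samples).
Y_hat = matmul(Z, B)

# Column-wise rescaling of U, as used before plotting the heatmap.
U_scaled = normalize_columns_to_max_one(U)
```

After rescaling, each latent variable's strongest geneset has weight 1, so columns of the heatmap are comparable even though the absolute scale of U is arbitrary.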
Summary table of all pathway-level effects found in the DGN dataset.
Statistics were computed using Spearman rank correlation across 922 subjects with a two-sided test. False discovery rates are computed using the Benjamini-Hochberg procedure on the total number of tests (number of LVs × number of SNPs). SNP-LV associations that passed FDR<0.05 were further filtered to account for potential cis genes or mismapped cis homologs contributing to the LV estimate (see Methods for details). In most cases pathways were named based on their geneset association captured in the U matrix. Some pathways are named based on further analysis of the expression patterns of top genes in an independent dataset of mouse immune cells, ImmGen [Heng et al., 2008] (see Supplementary Fig. 5), and/or the presence of a putative cis eQTL transcriptional mediator. The complete pathway utilization for these LVs can be seen in Fig. 2. The expression patterns for the top 15 genes driving each latent variable are plotted in Supplementary Fig. 6. Latent variables with no pathway association in the PLIER decomposition (that is, no positive entries in U) are starred.
| LV id | LV name | SNP | cis-Gene(s) | Benjamini-Hochberg FDR |
|---|---|---|---|---|
| 44 | Mega/platelet 1 | rs1354034 | ARHGEF3 | 1.707e-41 |
| 133 | Mega/platelet 2 | rs1354034 | ARHGEF3 | 0.03095 |
| 120 | Histones | rs1354034 | ARHGEF3 | 0.0336191 |
| 97 | Zinc fingers, pseudogenes | rs1471738 | SENP7 | 4.011e-13 |
| 56 | PLAGL1 associated, myeloid | rs9321957 | PLAGL1 | 0.0001421 |
| 42* | IKZF1 associated, myeloid | rs10251980 | IKZF1 | 3.395e-61 |
| 17 | NEK6 associated, myeloid | rs16927294 | NEK6 | 0.008223 |
| 67 | Neutrophils | rs13289095 | PKN3,SET,ZDHHC12 | 0.03361 |
| 55* | NFE2 associated, erythrocyte | rs35979828 | NFE2 | 3.538e-10 |
| 21 | Interferon-gamma | rs3184504 | SH2B3 | 0.0002198 |
| 40 | NFKB/TNF | rs12100841 | PPP2R3C | 0.005094 |
| 16 | Myeloid/ILC | rs1138358 | BCL2A1,MTHFS,ST20 | 0.0008103 |
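The Benjamini-Hochberg adjustment named in the table caption can be sketched as follows. This is a minimal stdlib-only illustration with hypothetical p-values, not the authors' code; in the analysis described above, the input would contain one raw p-value for every (LV, SNP) pair, so that the number of tests equals number of LVs × number of SNPs.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, one per (LV, SNP) test.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)

# Keep associations that pass the FDR < 0.05 threshold used in the caption.
significant = [q < 0.05 for q in fdr]
```

Because every p-value is scaled by the total number of tests, adding more LVs or SNPs to the scan raises the bar that each individual association must clear.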
Figure 2: A, A heatmap of a subset of the U matrix corresponding to LVs with a genotype effect (LV eQTLs). Only pathways with a cross-validation FDR of < 0.05 are shown. We find that two latent variables (LV44 and LV133) share pathway annotations (albeit with different coefficients) that suggest a relationship with megakaryocyte and platelet biology. B, Heatmap of the top genes in the loadings for LV44 and LV133. Genes that are annotated to the pathways shown in panel A are in bold. C, Boxplots of the association of LV44 and LV133 with SNP rs1354034 (n=344, 429, 149 for 0, 1, 2 respectively). While the LV estimates are positively correlated, the effects of rs1354034 are opposite. These results indicate that the pathways captured by the expression patterns of LV44 and LV133 are independently regulated by the rs1354034 locus. Boxplots display the 25th, 50th and 75th percentiles, with whiskers extending to 1.5x the interquartile range or the range of the data, whichever is smaller. P-values indicate uncorrected two-tailed Spearman rank correlation tests.
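The Spearman rank correlation between a genotype coded as 0, 1 or 2 and an LV estimate involves many tied genotype values, so ranks must be averaged within ties. The following stdlib-only sketch shows one way to compute the coefficient; the function names and the six-subject data are hypothetical, and the toy values only illustrate how opposite genotype effects appear as opposite-signed coefficients (they do not reproduce the positive correlation between LV44 and LV133).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: allele counts and two LV estimates for six subjects.
genotype = [0, 0, 1, 1, 2, 2]
lv_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # increases with allele count
lv_b = [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]  # decreases with allele count

rho_a = spearman_rho(genotype, lv_a)
rho_b = spearman_rho(genotype, lv_b)
```

A p-value for the coefficient would additionally require a null distribution (for example a permutation test or a t approximation), which this sketch omits.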
Summary table of the associations between the two mega/platelet LVs and SNPs known to affect only one platelet phenotype.
Statistics were computed using Spearman rank correlation across 922 subjects with a two-sided test. Raw p-values are reported. A total of 80 SNPs with known platelet phenotypes were tested [Gieger et al., 2011]. While no SNPs outside of the ARHGEF3 locus achieved genome-wide significance, some associations were significant at FDR<0.05 when we consider only the 160 (80 SNPs × 2 LVs) hypotheses that are tested (significant p-values are in bold). We find that the associations of the two mega/platelet LVs with other loci known to affect platelet biology are distinct. Our analysis suggests that the early mega/platelet LV (LV133) is more closely related to the process controlling platelet number (PLT), while the late mega/platelet LV (LV44) is related to the process controlling platelet volume (MPV).
| Phenotype | Reported SNP | Close gene | LV44 p-value | LV133 p-value | Proxy SNP |
|---|---|---|---|---|---|
| MPV | rs10876550 | COPZ1 | 0.69933 | rs10876550 | |
| PLT | rs2911132 | ERAP2 | 0.13817361 | rs2549803 |