| Literature DB >> 25617623 |
Yong Zeng1, Gang Wang2, Ence Yang2, Guoli Ji3, Candice L Brinkmeyer-Langford2, James J Cai4.
Abstract
Gene expression as an intermediate molecular phenotype has been a focus of research interest. In particular, studies of expression quantitative trait loci (eQTL) have offered promise for understanding gene regulation through the discovery of genetic variants that explain variation in gene expression levels. Existing eQTL methods are designed for assessing the effects of common variants, but not rare variants. Here, we address the problem by establishing a novel analytical framework for evaluating the effects of rare or private variants on gene expression. Our method starts from the identification of outlier individuals that show markedly different gene expression from the majority of a population, and then reveals the contributions of private SNPs to the aberrant gene expression in these outliers. Using population-scale mRNA sequencing data, we identify outlier individuals using a multivariate approach. We find that outlier individuals are more readily detected with respect to gene sets that include genes involved in cellular regulation and signal transduction, and less likely to be detected with respect to the gene sets with genes involved in metabolic pathways and other fundamental molecular functions. Analysis of polymorphic data suggests that private SNPs of outlier individuals are enriched in the enhancer and promoter regions of corresponding aberrantly-expressed genes, suggesting a specific regulatory role of private SNPs, while the commonly-occurring regulatory genetic variants (i.e., eQTL SNPs) show little evidence of involvement. Additional data suggest that non-genetic factors may also underlie aberrant gene expression. Taken together, our findings advance a novel viewpoint relevant to situations wherein common eQTLs fail to predict gene expression when heritable, rare inter-individual variation exists. The analytical framework we describe, taking into consideration the reality of differential phenotypic robustness, may be valuable for investigating complex traits and conditions.Entities:
Mesh:
Year: 2015 PMID: 25617623 PMCID: PMC4305293 DOI: 10.1371/journal.pgen.1004942
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1MD-based multivariate outlier detection.
(A) Scatter plot for the expression levels of two hypothetical genes. Three outliers indicated with red stars have the largest MD values to the population mean. (B) The chi-square plot showing the relative position and order of the three outlier data points, compared to those of non-outlier data points.
Gene sets that tend to be aberrantly expressed in LCLs of European descent.
| Gene set | # of genes | |
|---|---|---|
|
| ||
| 1. AIGNER_ZEB1_TARGETS | Genes up-regulated in MDA-MB-231 cells (breast cancer) after knockdown of ZEB1 [GeneID = 6935] by RNAi | 28 / 35 |
| 2. CAFFAREL_RESPONSE_TO_THC_8HR_3_UP | Genes up-regulated in EVSA-T cells (breast cancer) treated with 3 micromolar THC (delta-9-tetrahydrocannabinol) [PubChem = 6610319] for 8 h. | 5 / 5 |
| 3. GAUSSMANN_MLL_AF4_FUSION_TARGETS_E_UP | Up-regulated genes from the set E ( | 76 / 97 |
| 4. HOFMANN_MYELODYSPLASTIC_SYNDROM_RISK_UP | Genes up-regulated in bone marrow hematopoietic stem cells (HSC, CD34+ [GeneID = 947]) from patients with high risk of myelodysplastic syndrom (MDS) compared to the low risk patients. | 19 / 24 |
| 5. IWANAGA_CARCINOGENESIS_BY_KRAS_UP | Cluster 3: genes up-regulated in lung tissue samples from mice with tumor-bearing genotypes (activated KRAS [GeneID = 3845] alone or together with inactivated PTEN [GeneID = 5728]). | 141 / 170 |
| 6. LEIN_CHOROID_PLEXUS_MARKERS | Genes enriched in choroid plexus cells in the brain identified through correlation-based searches seeded with the choroid plexus cell-type specific gene expression patterns. | 79 /103 |
| 7. LIEN_BREAST_CARCINOMA_METAPLASTIC_VS_DUCTAL_DN | Genes down-regulated between two breast carcinoma subtypes: metaplastic (MCB) and ductal (DCB). | 77 / 114 |
| 8. LIU_PROSTATE_CANCER_UP | Genes up-regulated in prostate cancer samples. | 79 / 96 |
| 9. MASRI_RESISTANCE_TO_TAMOXIFEN_AND_AROMATASE_INHIBITORS_UP | Genes up-regulated in derivatives of MCF-7aro cells (breast cancer) that developed resistance to tamoxifen [PubChem = 5376] or inhibitors of aromatase (CYP19A1) [GeneID = 1588]. | 11 / 20 |
| 10. MIKKELSEN_MEF_ICP_WITH_H3K27ME3 | Genes with intermediate-CpG-density promoters (ICP) bearing the tri-methylation mark at H3K27 (H3K27me3) in MEF cells (embryonic fibroblasts). | 115 / 206 |
| 11. PEPPER_CHRONIC_LYMPHOCYTIC_LEUKEMIA_DN | Genes down-regulated in CD38+ [GeneID = 952] CLL (chronic lymphocytic leukemia) cells. | 11 / 21 |
| 12. POTTI_ETOPOSIDE_SENSITIVITY | Genes predicting sensitivity to etoposide [PubChem = 36462]. | 37 / 43 |
| 13. QI_PLASMACYTOMA_DN | Down-regulated genes that best disciminate plasmablastic plasmacytoma from plasmacytic plasmacytoma tumors. | 85 / 100 |
| 14. REACTOME_CGMP_EFFECTS | Genes involved in cGMP effects | 15 / 19 |
| 15. REACTOME_LIGAND_GATED_ION_CHANNEL_TRANSPORT | Genes involved in Ligand-gated ion channel transport | 6 / 21 |
| 16. VANHARANTA_UTERINE_FIBROID_UP | Genes up-regulated in uterine fibroids vs normal myometrium samples. | 39 / 45 |
| 17. WU_CELL_MIGRATION | Genes associated with migration rate of 40 human bladder cancer cells. | 143 / 184 |
|
| ||
| 18. TCCAGAG, MIR-518C | Targets of MicroRNA TCCAGAG, MIR-518C | 132 / 148 |
|
| ||
| 19. MODULE_122 | Genes in the cancer module 122 | 111 / 141 |
| 20. MODULE_215 | Genes in the cancer module 215 | 3 / 15 |
| 21. MODULE_274 | Genes in the cancer module 274 | 44 / 82 |
| 22. MORF_BCL2L11 | Neighborhood of BCL2L11 | 123 / 188 |
| 23. MORF_MYL3 | Neighborhood of MYL3 | 44 / 71 |
|
| ||
| 24. EXTRACELLULAR_LIGAND_GATED_ION_CHANNEL_ACTIVITY | Genes annotated by the GO term GO:0005230. Catalysis of the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. | 14 / 22 |
| 25. G_PROTEIN_COUPLED_RECEPTOR_ACTIVITY | Genes annotated by the GO term GO:0004930. A receptor that binds an extracellular ligand and transmits the signal to a heterotrimeric G-protein complex. These receptors are characteristically seven-transmembrane receptors and are made up of hetero- or homodimers. | 94 / 191 |
| 26. TRANSMISSION_OF_NERVE_IMPULSE | Genes annotated by the GO term GO:0019226. The sequential electrochemical polarization and depolarization that travels across the membrane of a nerve cell (neuron) in response to stimulation. | 108 / 189 |
|
| ||
| 27. MEL18_DN.V1_DN | Genes down-regulated in DAOY cells (medulloblastoma) upon knockdown of PCGF2 [GeneID = 7703] gene by RNAi. | 104 / 148 |
|
| ||
| 28. GSE19825_NAIVE_VS_DAY3_EFF_CD8_TCELL_UP | Genes up-regulated in comparison of naive CD8 T cells versus effector CD8 T cells. | 128 / 200 |
| 29. GSE19825_NAIVE_VS_IL2RALOW_DAY3_EFF_CD8_TCELL_UP | Genes up-regulated in comparison of naive CD8 T cells versus effector CD8 IL2RA [GeneID = 3559] low T cells at. | 133 / 200 |
| 30. GSE3982_NKCELL_VS_TH2_UP | Genes up-regulated in comparison of NK cells versus Th2 cells. | 136 / 200 |
| 31. GSE8515_CTRL_VS_IL6_4H_STIM_MAC_DN | Genes down-regulated in comparison of untreated macrophages versus those treated with IL6 [GeneID = 3569]. | 144 / 200 |
The names of gene sets and MSigDB subclasses are given. Number (#) of genes shows the number of genes included in SSMD computation and the number of genes in the original gene set.
Figure 2Gene expression profiles and outlier detection in the gene set, G-protein coupled receptor activity.
(A) The expression profiles of 326 EUR samples for 94 genes in the gene set. The expression profile of the outlier individual with the largest SSMD is outlined in red. (B) The chi-square plot showing three outliers, as highlighted with the star symbol. (C) The null distribution of SSMD established from 1,000 permutations of 94 randomly selected genes. The red vertical line indicates the observed value of SSMD computed for the original gene set.
Gene sets that tend not to be aberrantly expressed in LCLs of European descent.
| Gene set | # of genes | |
|---|---|---|
|
| ||
| 1. PID_ATM_PATHWAY | ATM pathway | 34 / 34 |
| 2. KEGG_HOMOLOGOUS_RECOMBINATION | Homologous recombination | 28 / 28 |
| 3. MORI_PRE_BI_LYMPHOCYTE_DN | Down-regulated genes in the B lymphocyte developmental signature, based on expression profiling of lymphomas from the Emu-myc transgenic mice: the Pre-BI stage. | 73 / 77 |
| 4. XU_RESPONSE_TO_TRETINOIN_UP | Genes up-regulated in NB4 cells (acute promyelocytic leukemia, APL) by tretinoinalone. | 14 / 16 |
| 5. FLECHNER_PBL_KIDNEY_TRANSPLANT_OK_VS_DONOR_DN | Genes downregulated in peripheral blood lymphocytes (PBL) from patients with well functioning kidneys more than 1-year post transplant compared to those from normal living kidney donors | 40 / 41 |
| 6. GARGALOVIC_RESPONSE_TO_OXIDIZED_PHOSPHOLIPIDS_LIGHTYELLOW_UP | Genes from the lightyellow module which are up-regulated in HAEC cells (primary aortic endothelium) after exposure to the oxidized 1-palmitoyl-2-arachidonyl-sn-3-glycerophosphorylcholine (oxPAPC). | 11 / 11 |
| 7. REACTOME_HOMOLOGOUS_RECOMBINATION_REPAIR_OF_REPLICATION_INDEPENDENT_DOUBLE_STRAND_BREAKS | Genes involved in Homologous recombination repair of replication-independent double-strand breaks | 16 / 17 |
| 8. REACTOME_G1_PHASE | Genes involved in G1 Phase. | 35 / 38 |
| 9. BIOCARTA_ATRBRCA_PATHWAY | Role of BRCA1, BRCA2 and ATR in Cancer Susceptibility. | 21 / 21 |
|
| ||
| 10. MODULE_87 | Genes in the cancer module 87 | 44 / 44 |
| 11. MORF_PRKAR1A | Neighborhood of PRKAR1A protein kinase, cAMP-dependent, regulatory, type I, alpha (tissue specific extinguisher 1) in the MORF expression compendium | 139 / 142 |
| 12. MORF_REV3L | Neighborhood of REV3L | 55 / 57 |
| 13. GNF2_DDX5 | Neighborhood of DDX5 DEAD (Asp-Glu-Ala-Asp) box polypeptide 5 in the GNF2 expression compendium | 62 / 63 |
|
| ||
| 14. CARBOHYDRATE_KINASE_ACTIVITY | Genes annotated by the GO term GO:0019200. Catalysis of the transfer of a phosphate group, usually from ATP, to a carbohydrate substrate molecule. | 15 / 15 |
Figure 3Power of SSMD test and validation of significant L- and S-SSMD gene sets.
(A) The change of power as a function of sample size. (B) The change of power as a function of the size of a gene set. (C) Validation of significant L- and S-SSMD gene sets using different expression data. Original: Geuvadis LCL expression data normalized using PEER (i.e., data used for the main results); Rep1: first set of replication of Geuvadis LCL expression data without PEER normalization; Rep2: second set of replication of Geuvadis LCL expression data without PEER normalization; Whole blood: GTEx whole blood expression data; and Muscle: GTEx muscle expression data. The boxplot shows the frequency of observed SSMD is greater than the control SSMD of 1,000 random replicates.
Figure 4Change of diffSSMD as a function of the ratio between partitioned samples and the power of diffSSMD test under varying sample size.
(A) The change of diffSSMD as a function of the size ratio of partitioned samples. The results with respect to two gene sets of size 20 and 40 are shown. For each ratio of partition, the distribution of diffSSMD rand were constructed from 100 randomly shuffled samples. (B) The change of the power of the diffSSMD test between EUR and AFR populations for the population-specific effect as a function of the size of EUR samples. The red line is fitted by using polynomial regression with the cubic model.
Figure 5Differences in expression discordance, heritability and variability between L- and S-SSMD genes.
(A) Normalized mean discordant expression (measure as the relative mean difference, RMD) per gene. (B) Heritability of gene expression. (C) Coefficient of variation of single-cell expression.
Figure 6Distributions of nonzero effect size β of cis-eSNPs of L-SSMD genes in outlier and non-outlier individuals.
The effect size β is genotype-weighted (i.e., β =|β|*genotype, where genotype={0,1,2}).
Density of private SNPs in ENCODE regulatory regions of L-SSMD genes.
| Density of private SNP (per million bp) | ||||||
|---|---|---|---|---|---|---|
| Abbreviation | Description | Observed (#/Mb) | Control 1 | Control 2 | Control 3 | Control 4 |
| E | Predicted enhancer | 2.07 (308/149) | 1.54* | 1.41* | 1.76* | 1.73* |
| TSS | Predicted promoter region including transcription start site | 1.91 (408/214) | 1.51* | 1.23* | 1.45* | 1.82 |
| CTCF | CTCF enriched element | 1.89 (213/113) | 1.71* | 1.34* | 1.56* | 1.79 |
| T | Predicted transcribed region | 2.00 (4184/2092) | 1.79* | 1.52* | 1.83 | 1.92 |
| PF | Predicted promoter flanking region | 1.79 (94/53) | 1.46* | 1.33* | 1.69 | 1.96 |
| R | Predicted repressed or low activity region | 1.88 (10152/5400) | 1.72* | 1.45* | 1.68* | 1.79 |
| WE | Predicted weak enhancer or open chromatin | 1.93 (102/53) | 1.59* | 1.63* | 2.15 | 2.00 |
| UNCL | Unclassified region | 1.64 (833/508) | 1.41* | 0.84* | 1.53 | 1.60 |
The symbol * indicates the SNP density in the corresponding control regions is significantly lower than that in the test regions of the outlier. The significance is assessed by one-tailed t test at the level of P = 0.001. Control 1: randomly selected non-outlier individuals to replace outlier individuals. Control 2: randomly selected genomic region that locate 10 Mb away from L-SSMD genes. Control 3: select randomly shuffled L-SSMD genes to the same amount of original gene set. Control 4: select randomly shuffled S-SSMD genes to the same amount of original gene set.