Debraj GuhaThakurta, Tao Xie, Manish Anand, Stephen W Edwards, Guoya Li, Susanna S Wang, Eric E Schadt.
Abstract
BACKGROUND: Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to diseases. Identification and annotation of the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of the cause of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes for which the mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more of the cis-regulatory elements. This led us to characterize the SNPs in promoter (5 Kb upstream) and non-coding gene regions (introns and 5 Kb downstream) (cis-SNPs) and the effects they may have on putative transcription factor binding sites.
Year: 2006 PMID: 16978413 PMCID: PMC1618400 DOI: 10.1186/1471-2164-7-235
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Determining nIBD-BXH regions from IBD blocks between mouse strains. B6 refers to C57BL/6J, DBA refers to DBA/2J, and C3H refers to C3H/HeJ. Horizontal bars represent genomic sequence. Regions that are in the same color between two or more strains represent the IBD blocks between those strains. nIBD-BXH (indicated with a box) are regions that are IBD between C3H/HeJ and DBA/2J, but nIBD between C3H/HeJ and C57BL/6J, and nIBD between C57BL/6J and DBA/2J, as explained in the text.
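The nIBD-BXH definition in the caption amounts to interval arithmetic on IBD blocks: take the blocks that are IBD between C3H and DBA, then remove anything that is IBD between C3H and B6 or between B6 and DBA. The following is a minimal sketch of that logic only, not the authors' pipeline. The function names, the half-open `(start, end)` representation, and the coordinates are all hypothetical.

```python
def subtract(intervals, holes):
    """Remove every part of `intervals` covered by `holes`.

    Both arguments are lists of half-open (start, end) tuples on one chromosome.
    """
    result = []
    holes = sorted(holes)
    for start, end in sorted(intervals):
        cursor = start
        for h_start, h_end in holes:
            if h_end <= cursor or h_start >= end:
                continue  # hole does not overlap the remaining part
            if h_start > cursor:
                result.append((cursor, h_start))
            cursor = max(cursor, h_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def nibd_bxh(ibd_c3h_dba, ibd_c3h_b6, ibd_b6_dba):
    """Regions IBD for C3H/DBA but not IBD for C3H/B6 or for B6/DBA."""
    return subtract(subtract(ibd_c3h_dba, ibd_c3h_b6), ibd_b6_dba)


# Hypothetical IBD blocks (arbitrary coordinates, for illustration only)
regions = nibd_bxh(
    ibd_c3h_dba=[(0, 100), (200, 300)],
    ibd_c3h_b6=[(50, 120)],
    ibd_b6_dba=[(250, 260)],
)
print(regions)  # [(0, 50), (200, 250), (260, 300)]
```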
Genes containing SNPs in promoters (Prom) and non-coding regions (NCR)
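| Region | Total genes | Total CEGs | Genes with SNPs | CEGs with SNPs | ROR | p-value (FET) |
| --- | --- | --- | --- | --- | --- | --- |
| NCR | 4752 | 2047 | 3514 | 1769 | 1.169 | |
| Prom 2 Kb | 4752 | 2047 | 1569 | 863 | 1.277 | |
| Prom 5 Kb | 4752 | 2047 | 2260 | 1220 | 1.253 | |
| Cons NCR | 4752 | 2047 | 1476 | 782 | 1.230 | |
| Cons Prom 2 Kb | 4752 | 2047 | 236 | 122 | 1.200 | |
| Cons Prom 5 Kb | 4752 | 2047 | 388 | 196 | 1.173 | |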
CEG and non-CEG sets are defined in the text. Data for genes containing SNPs in non-coding region (NCR), 2 Kb upstream (Prom 2 Kb) and 5 Kb (Prom 5 Kb) upstream regions are given. Data on genes containing SNPs in conserved regions between mouse and human are indicated by 'Cons'. p-values less than 0.01 are in bold. All p-values are based on the Fisher exact test (FET). The ratio of over-representation (ROR) is defined as: ratio of the fraction of CEGs containing cis-SNPs to the fraction of all genes (CEGs+non-CEGs) containing cis-SNPs.
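As a worked check of the ROR definition, the sketch below recomputes the ratio from the NCR counts (4752 genes in total, 2047 CEGs, 3514 genes with SNPs, 1769 CEGs with SNPs) and adds a one-sided Fisher exact test computed as a hypergeometric upper tail. The function names are hypothetical, and the layout of the contingency table (CEG versus non-CEG against SNP versus no SNP, one-sided) is an assumption; the caption does not state which layout or sidedness the authors used. Binomial coefficients are evaluated in log space with `math.lgamma` because the raw values overflow a float at these counts.

```python
from math import exp, lgamma


def ror(cegs_with_snps, total_cegs, genes_with_snps, total_genes):
    """Ratio of over-representation: CEG fraction over all-gene fraction."""
    return (cegs_with_snps / total_cegs) / (genes_with_snps / total_genes)


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_upper_tail(k, total_genes, total_cegs, genes_with_snps):
    """One-sided Fisher exact p-value, P(X >= k), under the hypergeometric null.

    X is the number of CEGs among the genes with SNPs (assumed table layout).
    """
    lo = max(k, genes_with_snps - (total_genes - total_cegs), 0)
    hi = min(total_cegs, genes_with_snps)
    log_denom = log_comb(total_genes, genes_with_snps)
    return sum(
        exp(log_comb(total_cegs, x)
            + log_comb(total_genes - total_cegs, genes_with_snps - x)
            - log_denom)
        for x in range(lo, hi + 1)
    )


# NCR row: counts taken from the table
print(round(ror(1769, 2047, 3514, 4752), 3))  # 1.169
print(fisher_upper_tail(1769, 4752, 2047, 3514))
```

The same two calls can be repeated for each row of the table by swapping in that row's counts.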
SNP density in the promoter or non-coding regions of CEGs and non-CEGs
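| Gene set | Region | No. of CEGs | No. of non-CEGs | Mean SNP density (normalized by total non-coding or promoter length), CEGs | Mean SNP density (normalized by total non-coding or promoter length), non-CEGs | p-value (WRST) | Mean SNP density (normalized by conserved or non-cons region length), CEGs | Mean SNP density (normalized by conserved or non-cons region length), non-CEGs | p-value (WRST) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full combined set | NCR | 1769 | 1745 | 0.630 | 0.463 | | NA | NA | NA |
| Full combined set | Prom 2 Kb | 863 | 706 | 1.291 | 1.172 | | NA | NA | NA |
| Full combined set | Prom 5 Kb | 1220 | 1040 | 0.789 | 0.685 | | NA | NA | NA |
| Genes with no SNPs in conserved regions (subset 1) | NCR | 987 | 1051 | 0.546 | 0.352 | | NA | NA | NA |
| Genes with no SNPs in conserved regions (subset 1) | Prom 2 Kb | 741 | 592 | 1.243 | 1.106 | | NA | NA | NA |
| Genes with no SNPs in conserved regions (subset 1) | Prom 5 Kb | 1024 | 848 | 0.731 | 0.620 | | NA | NA | NA |
| Genes having SNPs in conserved regions (subset 2) | All NCR | 782 | 694 | 0.736 | 0.630 | | NA | NA | NA |
| Genes having SNPs in conserved regions (subset 2) | Non Cons NCR | 782 | 694 | 0.657 | 0.554 | | 0.732 | 0.620 | |
| Genes having SNPs in conserved regions (subset 2) | Cons NCR | 782 | 694 | 0.091 | 0.101 | 0.050 | 1.359 | 1.193 | |
| Genes having SNPs in conserved regions (subset 2) | All Prom 2 Kb | 122 | 114 | 1.581 | 1.513 | 0.172 | NA | NA | NA |
| Genes having SNPs in conserved regions (subset 2) | Non Cons 2 Kb | 122 | 114 | 1.376 | 1.415 | 0.498 | 1.886 | 2.169 | 0.482 |
| Genes having SNPs in conserved regions (subset 2) | Cons Prom 2 Kb | 122 | 114 | 0.758 | 0.706 | 0.278 | 5.227 | 5.721 | 0.026 |
| Genes having SNPs in conserved regions (subset 2) | All Prom 5 Kb | 196 | 192 | 1.091 | 0.967 | 0.036 | NA | NA | NA |
| Genes having SNPs in conserved regions (subset 2) | Non Cons 5 Kb | 196 | 192 | 0.910 | 0.822 | 0.119 | 1.139 | 1.025 | 0.174 |
| Genes having SNPs in conserved regions (subset 2) | Cons Prom 5 Kb | 196 | 192 | 0.357 | 0.317 | 0.036 | 3.763 | 3.276 | |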
"SNP density (Normalized by total non-coding or promoter length)" = 1000*(total number of SNPs in non-coding or promoter sequence)/(total non-coding or promoter length). "SNP density (Normalized by Conserved or Non-Cons region length)" = 1000*(the number of SNPs in conserved or non-conserved regions)/(total length of the conserved or non-conserved sequence in promoters or non-coding regions). "Mean SNP density" gives the average SNP-density over all the genes in a particular set. The means are only given for reference, and have not been used for calculation of p-values (which were done using a non-parametric method). p-values of significance with the Wilcoxon rank sum test (WRST) are given. H0 = CEGs and non-CEGs have equal SNP density, HA = CEGs have higher SNP density compared to non-CEGs. p-values less than 0.01 are in bold.
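The density formula and the one-sided rank-sum comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the per-gene densities are made-up toy values, and the p-value uses a plain normal approximation without tie or continuity correction, which a statistics package would normally apply.

```python
from statistics import NormalDist


def snp_density(n_snps, length_bp):
    """SNPs per 1000 bp: 1000 * (number of SNPs) / (sequence length)."""
    return 1000.0 * n_snps / length_bp


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def rank_sum_greater(x, y):
    """One-sided p-value for HA: x tends to be larger than y (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for x
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return 1.0 - NormalDist().cdf((u - mean) / sd)


# Toy per-gene densities (hypothetical values, not taken from the study)
ceg = [snp_density(n, 5000) for n in (10, 9, 8, 11, 12)]
non_ceg = [snp_density(n, 5000) for n in (3, 4, 5, 2, 6)]
print(rank_sum_greater(ceg, non_ceg))
```

Here `ceg` plays the role of the CEG densities and `non_ceg` the non-CEG densities, matching H0 and HA as stated in the table note.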
Perturbation of TFBSs by cis-SNPs and summary gene counts where TFBS predictions are affected by cis-SNPs
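| Region | Genes with cis-SNPs (total gene-set) | CEGs with cis-SNPs | Genes with TFBS predictions affected (total gene-set) | CEGs with TFBS predictions affected | p-value (FET) |
| --- | --- | --- | --- | --- | --- |
| NCR | 3514 | 1769 | 610 | 344 | |
| Prom 2 Kb | 1569 | 863 | 94 | 61 | 0.0174 |
| Prom 5 Kb | 2260 | 1220 | 193 | 111 | 0.1346 |
| Cons NCR | 1476 | 782 | 129 | 66 | 0.6338 |
| Cons Prom 2 Kb | 236 | 122 | 12 | 8 | 0.0852 |
| Cons Prom 5 Kb | 388 | 196 | 27 | 15 | 0.2293 |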
The total gene-set consisted of the combined set of CEGs and non-CEGs as described in the text. All p-values are based on the Fisher exact test (FET); p-values less than 0.01 are in bold.
Figure 2. An example of a putative candidate. A. cis-acting LOD scores for Casc4 on chromosome 2 in multiple tissues and sample sets (male, female or combined/all) in the BXH cross. x-axis – genomic location in Mb, y-axis – LOD score from interval mapping. The physical location of the gene is indicated with a red arrow-head. Only LOD scores >10 are shown. B. Association of expression levels of Casc4 with genotypes of the promoter SNP, mCV23866990. The distribution of the expression levels in brain (left) and adipose (right) is shown according to the genotypes of this SNP in the F2 animals. A_A represents the DBA and C3H allele, and C_C the B6 allele. C. A binding site for transcription factor Hand1 is affected by SNP mCV23866990. The polymorphism changes a highly conserved base in the binding site (T→C change on the reverse strand, boxed and shaded). The frequency matrix and a sequence logo of the profile representing the binding site are shown. D. Scatter plot of Casc4 (x-axis) versus Hand1 expression levels in the body atlas data set [49, 50]. Hand1 expression is correlated with that of Casc4 with a p-value of < 10⁻⁶ (Spearman rank order correlation -0.58).