Use of alternative promoters may hide genetic effects on phenotypic traits.

Abstract

Genome-wide association studies have identified a multitude of single-nucleotide polymorphisms (SNPs) associated with a wide spectrum of human phenotypic traits. However, the SNPs identified so far do not explain much of the expected genetic variation and they are poor predictors of the occurrence of disease. I recently advanced the hypothesis that there is person-to-person variation in the use of alternative regulatory elements (for example, gene promoters) and that this new source of variation may explain in part the low genetic variation accounted for by known genetic variants. In the present report a simple mathematical model is developed to explore the biological consequences of the proposed hypothesis. The model predicts that, in the presence of person-to-person variation in the use of alternative promoters, the observable effects of genetic variants located inside promoters will be smaller than their actual effects. As a consequence, the genetic variation attributable to those observed polymorphisms will be reduced. The present report suggests new paths of research to elucidate the genetic basis of human complex traits.

Year: 2012 PMID: 23034535 PMCID: PMC3538106 DOI: 10.1038/jhg.2012.115

Source DB: PubMed Journal: J Hum Genet ISSN: 1434-5161 Impact factor: 3.172

INTRODUCTION

A persistent observation in the current field of human genetics is that single nucleotide polymorphisms (SNPs) identified through genome-wide association studies (GWAS) explain only a small fraction of the genetic variation of complex human traits,[1-3] the so-called missing heritability problem. Several non-mutually exclusive hypotheses, such as rare alleles with large effects and gene-gene and gene-environment interactions (reviewed by Manolio et al.[4] and Gibson[5]), have been proposed to explain this lack of explained genetic variation, and they may all account in part for the missing heritability. However, most of these hypotheses assume that the genome is read in the same way across all individuals and that, therefore, the same SNP has exactly the same functionality from one person to another. This assumption may not hold for the use of alternative regulatory elements (e.g. gene promoters).[6] For example, in a gene with multiple promoters some persons would preferentially use a particular promoter, and other subjects would tend to use another promoter of the same gene. Under this scenario, a particular SNP would have functional significance only among those individuals who use the promoter inside which the SNP is located. The observed effect of that SNP would be attenuated relative to its actual effect on a phenotypic trait of interest. In the present report, I explore some of the quantitative consequences of the hypothesis of alternative use of regulatory elements.

MATERIALS AND METHODS

Let us assume a gene X that controls a continuous phenotypic trait Y. Expression of the gene X is under the control of M alternative promoters, and different SNPs may be present inside each promoter (Figure 1). The model assumes the existence of person-to-person variation in the use of the alternative promoters, possibly through the action of epigenetic marks (e.g. DNA methylation). Although in the present work the model is restricted to a gene with only two promoters, P1 and P2, and two SNPs, G1 (inside P1) and G2 (inside P2), the results can be easily generalized to a gene with more than two promoters and more than one SNP inside each promoter. The SNP G1 has two alleles, A1 with frequency equal to p₁ and A2 with frequency equal to p₂ = 1 − p₁. The allele A1 increases by a units the value of the phenotypic trait Y compared to the allele A2. The SNP G2 has two alleles, B1 with frequency equal to q₁ and B2 with frequency equal to q₂ = 1 − q₁. The allele B1 increases by b units the value of the phenotypic trait Y compared to the allele B2. Each allele will affect the phenotypic trait Y only when its corresponding promoter is being used (i.e. the allele A1 increases the value of Y only when the promoter P1 is used, and the allele B1 increases the value of Y only when the promoter P2 is used). The promoter P1 is used in a proportion f₁ of the chromosomes in the population, and the promoter P2 is used in a proportion f₂ = 1 − f₁ of the chromosomes in the population. Chromosomes that use the promoter P1 have an increase of e units of the phenotypic trait Y compared to chromosomes that use the promoter P2. Hardy-Weinberg equilibrium is assumed for each SNP. Since the goal of the present analysis is to show how genetic variability may be hidden due to the use of alternative promoters, in the following results it is assumed that we do not observe which one of the alternative promoters is being used. We only observe the genotypes in G1 and G2 as well as the individuals’ values of the phenotypic trait Y.

Figure 1

Gene with alternative promoters

A gene X is transcribed from M alternative promoters, P1, P2, …, PM. Person-to-person variation in which of the promoters is used is proposed. Each promoter contains different SNPs. A polymorphism G1 is located inside the P1 promoter, and a different polymorphism G2 is located inside the P2 promoter.

Table 1 shows both the unobserved types of chromosomes, defined by the unobserved promoter use together with the observed genotypes at the G1 and G2 SNPs (upper half of the table, chromosome types H1 through H8). Note that additive effects are measured relative to the chromosome H8 (P2A2B2), which by definition has a value equal to zero for the phenotypic trait Y. Observed chromosomes based only on the genotypes at the G1 and G2 SNPs are shown in the bottom half of Table 1 (chromosome types J1 through J4). Chromosome frequencies are shown for four different models of linkage disequilibrium (LD) between the G1 and G2 SNPs. The most general scenario (model 1) makes no assumption about any particular value of LD (given by the D coefficient or covariance between the G1 and G2 SNPs). Models 2, 3, and 4 are particular cases of model 1. Model 2 assumes linkage equilibrium between the G1 and G2 SNPs. Scenarios portrayed by models 3 and 4 refer to complete LD between the G1 and G2 SNPs. In model 3, the A1 and B1 (and A2 and B2) alleles are always present together (i.e. only unobserved chromosomes P1A1B1, P1A2B2, P2A1B1, and P2A2B2 exist in the population). The opposite pattern of complete LD is shown in model 4, where the A1 and B2 (and A2 and B1) alleles are always transmitted together (i.e. only unobserved chromosomes P1A1B2, P1A2B1, P2A1B2, and P2A2B1 are present in the population). Note that observed chromosomes are obtained from the unobserved chromosomes after collapsing over promoters P1 and P2. For example, the observed J1 chromosome (A1B1) is a mixture of the unobserved H1 (P1A1B1) and H5 (P2A1B1) chromosomes. The phenotypic value of the J1 chromosome is the average of the phenotypic values of the H1 and H5 chromosomes weighted by the f₁ and f₂ proportions, respectively. The frequency of the J1 chromosome is just the sum of the frequencies of the H1 and H5 chromosomes. The rest of the observed chromosomes can be obtained in a similar way: J2 = H2 + H6, J3 = H3 + H7, and J4 = H4 + H8.
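The collapsing step described above can be sketched in a few lines of Python. This is only an illustrative sketch of the general model (model 1); the function names `unobserved_chromosomes` and `collapse_over_promoters` and the parameter values in the usage lines are assumptions made for the example and are not part of the original report.

```python
from itertools import product

def unobserved_chromosomes(a, b, e, f1, p1, q1, D=0.0):
    """Phenotypic value and frequency of H1..H8 under the general model.

    Keys are (promoter, G1 allele, G2 allele); D is the LD coefficient.
    """
    f2, p2, q2 = 1.0 - f1, 1.0 - p1, 1.0 - q1
    hap_freq = {
        ("A1", "B1"): p1 * q1 + D,
        ("A1", "B2"): p1 * q2 - D,
        ("A2", "B1"): p2 * q1 - D,
        ("A2", "B2"): p2 * q2 + D,
    }
    chromosomes = {}
    for promoter, (g1, g2) in product(("P1", "P2"), hap_freq):
        if promoter == "P1":
            # Only allele A1 acts when P1 is used; e is the P1 baseline shift.
            value = e + (a if g1 == "A1" else 0.0)
            freq = f1 * hap_freq[(g1, g2)]
        else:
            # Only allele B1 acts when P2 is used.
            value = b if g2 == "B1" else 0.0
            freq = f2 * hap_freq[(g1, g2)]
        chromosomes[(promoter, g1, g2)] = (value, freq)
    return chromosomes

def collapse_over_promoters(chromosomes):
    """Observed chromosomes J1..J4: frequency-weighted mean value and summed frequency."""
    sums = {}
    for (_, g1, g2), (value, freq) in chromosomes.items():
        total, weighted = sums.get((g1, g2), (0.0, 0.0))
        sums[(g1, g2)] = (total + freq, weighted + value * freq)
    # Haplotypes with zero frequency (complete LD) are dropped.
    return {hap: (weighted / total, total)
            for hap, (total, weighted) in sums.items() if total > 0}

# Illustrative (assumed) parameter values.
observed = collapse_over_promoters(
    unobserved_chromosomes(a=5, b=5, e=5, f1=0.3, p1=0.4, q1=0.2))
for hap, (value, freq) in observed.items():
    print(hap, round(value, 4), round(freq, 4))
```

With these assumed values the difference between the observed values of A1B1 and A2B1 equals f₁ × a, which illustrates how the observed additive effect of A1 is scaled by the proportion of chromosomes that use P1.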

Table 1

List of actual and observed types of chromosomes with their respective phenotypic values and frequencies under linkage disequilibrium (LD) patterns

		Frequency under four different models of linkage disequilibrium (LD)[1]
Type	Phenotypic value	General (1)	No LD (2) (r² = 0, D' = 0)	Complete positive LD (3) (r² = 1, D' = 1, p₁ = q₁)	Complete negative LD (4) (r² = 1, D' = −1, p₁ = q₂)
Actual (unobserved) chromosomes
H1 = P1A1B1	y₁ = a + e	z₁ = f₁(p₁q₁ + D)	z₁ = f₁p₁q₁	z₁ = f₁p₁	z₁ = 0
H2 = P1A1B2	y₂ = a + e	z₂ = f₁(p₁q₂ − D)	z₂ = f₁p₁q₂	z₂ = 0	z₂ = f₁p₁
H3 = P1A2B1	y₃ = e	z₃ = f₁(p₂q₁ − D)	z₃ = f₁p₂q₁	z₃ = 0	z₃ = f₁p₂
H4 = P1A2B2	y₄ = e	z₄ = f₁(p₂q₂ + D)	z₄ = f₁p₂q₂	z₄ = f₁p₂	z₄ = 0
H5 = P2A1B1	y₅ = b	z₅ = f₂(p₁q₁ + D)	z₅ = f₂p₁q₁	z₅ = f₂p₁	z₅ = 0
H6 = P2A1B2	y₆ = 0	z₆ = f₂(p₁q₂ − D)	z₆ = f₂p₁q₂	z₆ = 0	z₆ = f₂p₁
H7 = P2A2B1	y₇ = b	z₇ = f₂(p₂q₁ − D)	z₇ = f₂p₂q₁	z₇ = 0	z₇ = f₂p₂
H8 = P2A2B2	y₈ = 0	z₈ = f₂(p₂q₂ + D)	z₈ = f₂p₂q₂	z₈ = f₂p₂	z₈ = 0
Observed chromosomes
J1 = A1B1	v₁ = f₁(a + e) + f₂b	w₁ = p₁q₁ + D	w₁ = p₁q₁	w₁ = p₁	w₁ = 0
J2 = A1B2	v₂ = f₁(a + e)	w₂ = p₁q₂ − D	w₂ = p₁q₂	w₂ = 0	w₂ = p₁
J3 = A2B1	v₃ = f₁e + f₂b	w₃ = p₂q₁ − D	w₃ = p₂q₁	w₃ = 0	w₃ = p₂
J4 = A2B2	v₄ = f₁e	w₄ = p₂q₂ + D	w₄ = p₂q₂	w₄ = p₂	w₄ = 0

[1] D is the linkage disequilibrium coefficient that measures the deviation of the observed frequency of a haplotype from its expected frequency under linkage equilibrium.

D' is defined as the LD coefficient normalized to the maximum D that is possible given the observed allele frequencies: D' = D/Dmax.

r² is the squared correlation coefficient between the G1 and G2 SNPs: r² = D²/(p₁p₂q₁q₂).

SUMMARY STATISTICS

The mean chromosome value of the phenotypic trait Y is equal to

$$\text{mean} = \sum_i \text{phenotype}(i) \times \text{frequency}(i),$$

where phenotype(i) and frequency(i) are the phenotype value and frequency of the i-th chromosome (either observed or unobserved). The variance of the chromosome phenotype values would be equal to

$$\text{variance} = \sum_i \text{frequency}(i) \times \left[\text{phenotype}(i) - \text{mean}\right]^2.$$

It is noteworthy that the mean of the chromosome phenotype values does not depend on LD and is the same for the actual (unobserved) and observed chromosomes. However, as will be shown below, the actual variance due to unobserved chromosomes (i.e. total variance) will always be greater than or equal to the variance due to observed chromosomes. In other words, the variance due to measurable genetic variation (i.e. G1 and G2 SNPs) will fail to explain 100% of the actual variance due to the totality of unobserved chromosomes in the population.
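As a rough numerical illustration of these summary statistics, the following Python sketch evaluates the frequency-weighted mean and variance for both sets of chromosomes, using the values and frequencies listed in Table 1 (model 1). The function names and the parameter values (a = b = 5, e = 0, f₁ = p₁ = q₁ = 0.5, D = 0) are assumptions chosen for the example only.

```python
def mean_and_variance(values, freqs):
    """Frequency-weighted mean and variance of chromosome phenotype values."""
    mean = sum(v * w for v, w in zip(values, freqs))
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, freqs))
    return mean, var

def table1(a, b, e, f1, p1, q1, D=0.0):
    """Values and frequencies of H1..H8 and J1..J4 as listed in Table 1."""
    f2, p2, q2 = 1 - f1, 1 - p1, 1 - q1
    w = [p1 * q1 + D, p1 * q2 - D, p2 * q1 - D, p2 * q2 + D]            # J1..J4 frequencies
    v = [f1 * (a + e) + f2 * b, f1 * (a + e), f1 * e + f2 * b, f1 * e]  # J1..J4 values
    y = [a + e, a + e, e, e, b, 0.0, b, 0.0]                            # H1..H8 values
    z = [f1 * x for x in w] + [f2 * x for x in w]                       # H1..H8 frequencies
    return (y, z), (v, w)

# Assumed example values, not taken from the report.
(y, z), (v, w) = table1(a=5, b=5, e=0, f1=0.5, p1=0.5, q1=0.5)
mean_unobserved, var_unobserved = mean_and_variance(y, z)
mean_observed, var_observed = mean_and_variance(v, w)

print("means:", mean_unobserved, mean_observed)          # the two means coincide
print("variances:", var_unobserved, var_observed)        # observed <= total
print("observed / total:", var_observed / var_unobserved)
```

For this assumed parameter set the observed chromosomes account for half of the total variance even though no epigenetic effect is present.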

RESULTS

Let us define K as the ratio of the variance due to observed chromosomes to the total variance due to unobserved chromosomes. Figure 2 shows K under three different particular scenarios: 1) no LD between the G1 and G2 SNPs, 2) positive LD between the G1 and G2 SNPs (i.e. A1 and B1 alleles tend to be transmitted together), and 3) negative LD between the G1 and G2 SNPs (i.e. A1 and B2 alleles tend to be transmitted together).

Figure 2

Proportion of total variance that is explained by observed chromosomes

In the presence of person-to-person variation in the use of alternative promoters, the variance due to observed chromosomes is always lower than the total variance of the genetic system (K < 1). Only when use of one promoter is fixed in the population (f₁ = 0 or f₁ = 1) would the observed chromosomes explain 100% of the total genetic variance (K = 1). K is shown under three possible scenarios of linkage disequilibrium between the G1 and G2 SNPs: A) linkage equilibrium, B) positive linkage disequilibrium, and C) negative linkage disequilibrium. The additive effects of the A1 and B1 alleles were assumed to be equal to 5 units of the phenotypic trait. The epigenetic effect was allowed to take four different values in A) (e = 0, 5, 10, and 20 units), and kept constant in B) and C) (e = 5 units).

SCENARIO 1

Figure 2A shows K under linkage equilibrium between the G1 and G2 SNPs (model 2 in Table 1) for different epigenetic effects and proportions of chromosomes using the promoter P1. The additive effects of the A1 and B1 alleles were assumed to be equal to 5 units of the continuous phenotypic trait (a = b = 5 units). The epigenetic effect was allowed to take four different values: e = 0, 5, 10, and 20 units of the phenotypic trait. It is clear that K ≤ 1, and the higher the epigenetic effect the lower the K ratio (i.e. the observed chromosomes explain less of the total variance due to the actual unobserved chromosomes). Even in the absence of any epigenetic effect (i.e. e = 0, meaning that the P1 and P2 promoters have the same baseline level of the phenotypic trait Y) the observed chromosomes do not explain the totality of the variance due to unobserved chromosomes. The only instances when K = 1 are when only one promoter is used in the population (i.e. f₁ = 1, use of promoter P1 is fixed; or f₁ = 0, use of promoter P2 is fixed).
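The qualitative pattern of this scenario can be explored with a short Python sketch that evaluates K over a grid of f₁ values under linkage equilibrium (D = 0). The allele frequencies used for the figure are not stated in the text, so p₁ = q₁ = 0.5 is an assumption here, as are the helper names `variance`, `k_ratio`, and `k_curve`.

```python
def variance(values, freqs):
    """Frequency-weighted variance of chromosome phenotype values."""
    mean = sum(v * w for v, w in zip(values, freqs))
    return sum(w * (v - mean) ** 2 for v, w in zip(values, freqs))

def k_ratio(a, b, e, f1, p1=0.5, q1=0.5, D=0.0):
    """K = variance of observed chromosomes / variance of unobserved chromosomes."""
    f2, p2, q2 = 1 - f1, 1 - p1, 1 - q1
    w = [p1 * q1 + D, p1 * q2 - D, p2 * q1 - D, p2 * q2 + D]            # J1..J4 frequencies
    v = [f1 * (a + e) + f2 * b, f1 * (a + e), f1 * e + f2 * b, f1 * e]  # J1..J4 values
    y = [a + e, a + e, e, e, b, 0.0, b, 0.0]                            # H1..H8 values
    z = [f1 * x for x in w] + [f2 * x for x in w]                       # H1..H8 frequencies
    return variance(v, w) / variance(y, z)

def k_curve(e, a=5.0, b=5.0, steps=10):
    """K evaluated on an evenly spaced grid of f1 values from 0 to 1."""
    return [(i / steps, k_ratio(a, b, e, i / steps)) for i in range(steps + 1)]

# Epigenetic effects considered in this scenario: e = 0, 5, 10, and 20 units.
for e in (0, 5, 10, 20):
    print("e =", e, [round(k, 3) for _, k in k_curve(e)])
```

Under these assumptions each curve equals 1 at f₁ = 0 and f₁ = 1, stays below 1 in between, and drops further as e increases.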

SCENARIO 2

Figure 2B shows K under positive LD between the G1 and G2 SNPs (i.e. the A1 and B1 alleles tend to be transmitted together in the same chromosome) for different r² values (squared correlation between the G1 and G2 SNPs) and proportions of chromosomes using the promoter P1. The additive effects of the A1 and B1 alleles as well as the epigenetic effect were kept constant and equal to 5 units of the phenotypic trait (a = b = e = 5 units). For this scenario K ≤ 1 too, and it is noteworthy that the stronger the LD between both SNPs (i.e. the higher r²) the more of the total variance the observed chromosomes would explain. When r² = 1.0 (complete positive LD as shown in model 3 of Table 1) the reduction of K is attenuated in comparison to the case of linkage equilibrium (r² = 0.0).
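A minimal Python sketch of this scenario follows. It converts r² into a positive D through r² = D²/(p₁p₂q₁q₂) and then evaluates K. The choice p₁ = q₁ = 0.5 is an assumption (with equal allele frequencies every r² between 0 and 1 is attainable), and the helper names, including `positive_d_from_r2`, are hypothetical.

```python
from math import sqrt

def variance(values, freqs):
    """Frequency-weighted variance of chromosome phenotype values."""
    mean = sum(v * w for v, w in zip(values, freqs))
    return sum(w * (v - mean) ** 2 for v, w in zip(values, freqs))

def k_ratio(a, b, e, f1, p1, q1, D):
    """K = variance of observed chromosomes / variance of unobserved chromosomes."""
    f2, p2, q2 = 1 - f1, 1 - p1, 1 - q1
    w = [p1 * q1 + D, p1 * q2 - D, p2 * q1 - D, p2 * q2 + D]            # J1..J4 frequencies
    v = [f1 * (a + e) + f2 * b, f1 * (a + e), f1 * e + f2 * b, f1 * e]  # J1..J4 values
    y = [a + e, a + e, e, e, b, 0.0, b, 0.0]                            # H1..H8 values
    z = [f1 * x for x in w] + [f2 * x for x in w]                       # H1..H8 frequencies
    return variance(v, w) / variance(y, z)

def positive_d_from_r2(r2, p1, q1):
    """Positive LD coefficient D implied by r2 = D^2 / (p1 p2 q1 q2)."""
    return sqrt(r2 * p1 * (1 - p1) * q1 * (1 - q1))

# Assumed allele frequencies; a = b = e = 5 as in this scenario.
p1 = q1 = 0.5
for r2 in (0.0, 0.25, 0.5, 0.75, 1.0):
    D = positive_d_from_r2(r2, p1, q1)
    print("r2 =", r2, "K at f1 = 0.5:", round(k_ratio(5, 5, 5, 0.5, p1, q1, D), 3))
```

With these assumed values K at f₁ = 0.5 rises steadily as r² goes from 0 to 1, in line with the attenuated reduction of K described for complete positive LD.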

SCENARIO 3

Figure 2C shows K under negative LD between the G1 and G2 SNPs (i.e. the A1 and B2 alleles tend to be transmitted together in the same chromosome) for different r² values and proportions of chromosomes using the promoter P1. The additive effects of the A1 and B1 alleles as well as the epigenetic effect were kept constant and equal to 5 units of the phenotypic trait (a = b = e = 5 units). As in the previous two scenarios, K ≤ 1; however, in the presence of negative LD, the higher r² the lower the variance that is explained by the observed chromosomes. The maximum reduction of K is observed when r² = 1.0 (complete negative LD as shown in model 4 of Table 1). Only the haplotypes A1B2 and A2B1 are observed in the presence of complete negative LD, and as Figure 2C shows, the variance due to the observed chromosomes can completely disappear (K = 0). A simple calculation shows that K vanishes when the proportion of chromosomes using the P1 promoter is equal to f₁ = b/(a + b). K will disappear at f₁ = 0.5 when both A1 and B1 alleles have the same additive effect (a = b); at f₁ < 0.5 when the A1 allele has a higher additive effect than allele B1 (a > b); and at f₁ > 0.5 when the A1 allele has a lower additive effect than allele B1 (a < b).
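The simple calculation mentioned above can be written out as follows. This is a sketch based on the values and frequencies listed for model 4 in Table 1, using the standard variance of a two-valued variable; the notation Var_obs for the variance due to observed chromosomes is introduced here for convenience.

```latex
% Complete negative LD (model 4): only J2 = A1B2 and J3 = A2B1 are observed,
% with frequencies w_2 = p_1 and w_3 = p_2 and values v_2 and v_3 from Table 1.
\begin{align*}
\operatorname{Var}_{\text{obs}}
  &= p_1 p_2 \,(v_2 - v_3)^2 \\
  &= p_1 p_2 \,\bigl[f_1(a + e) - f_1 e - f_2 b\bigr]^2 \\
  &= p_1 p_2 \,(f_1 a - f_2 b)^2 .
\end{align*}
% The observed variance, and hence K, is zero when f_1 a = f_2 b.
% Substituting f_2 = 1 - f_1:
\begin{align*}
f_1 a = (1 - f_1)\, b
  \quad\Longrightarrow\quad
  f_1 = \frac{b}{a + b}.
\end{align*}
```

The epigenetic effect e cancels in the difference v₂ − v₃, so under complete negative LD the value of f₁ at which K vanishes depends only on the additive effects a and b.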

DISCUSSION

The current model offers a potential mechanism to explain in part why genetic variants discovered so far do not explain much of the expected genetic variability. Although part of the unexplained variability may be due to rare genetic polymorphisms still to be found,[4] the model predicts that person-to-person variation in the use of alternative promoters would reduce the observed genetic variance of a genetic system. Thus, even complete knowledge of all the genetic variants involved in a particular phenotypic trait would not be enough to explain the whole genetic variance of the trait. Three major factors explain the reduction of the genetic variance according to the model discussed in the present work. First, the observed additive effects of the SNPs inside each of the alternative promoters are attenuated in comparison to their actual effects. For example, because the allele A1 of the G1 SNP exerts its effect only when the promoter P1 is being used, its observed additive effect would be reduced by a factor equal to f₁ relative to its actual effect. The same situation applies to the B1 allele of the G2 SNP, whose observed additive effect would be attenuated by a factor equal to f₂. Second, because the use of alternative promoters is not being measured (e.g. in current genetic epidemiology studies such a scenario is not even considered as a possibility), the dimensionality of the observed data would always be lower than the actual dimensionality of the population data. The number of observed chromosome types will be less than the number of actual chromosome types in the population. Third, different promoters may have different baseline levels of the phenotypic trait under study, further reducing the proportion of the actual variance that is due to measured genetic polymorphisms.

Recently published evidence supports the proposed hypothesis of person-to-person variation in the use of alternative promoters. Turner et al. reported the presence of high inter-individual variability in the methylation patterns of alternative promoters of the glucocorticoid receptor (NR3C1) gene in twenty-six healthy subjects, suggesting person-to-person variation in epigenetic regulatory mechanisms.[7] A small study that measured promoter activity of the aromatase (CYP19A1) gene in skin fibroblasts from 4 normal volunteers found that one subject showed increased activity of the promoters I.3 and II in response to cAMP, in contrast to the other 3 subjects, who expressed the cAMP-unresponsive promoter I.4.[8] In non-malignant lung tissue from 15 patients with non-small cell lung cancer, two cases used mostly promoters I.3 and II of the CYP19A1 gene and the rest of the patients used the promoter I.4.[9] It is noteworthy that there may even be ethnic differences in the use of alternative promoters. A recent study in 101 women with uterine leiomyoma (31 African American, 34 white American, and 36 Japanese women) reported that leiomyoma tissue from African American women expressed the promoter II in a higher proportion compared to Japanese women.[10] Finally, the CD36 gene showed inter-individual variability in the use of four out of five alternative promoters in cultured monocytes from 10 subjects.[11] The present results, the published evidence about variability in the use of alternative promoters, and the fact that more than half of human genes have alternative promoters,[12] with a mean of 3.1 promoters per gene,[13] stress the need to carry out extensive studies in human populations to determine and quantify inter-individual variation in the use of alternative promoters.

To date there are few approaches to assess the use of alternative promoters on a genome-wide scale. Singer et al.[14] developed a promoter tiling array that can identify about 35,000 alternative promoters from almost 7,000 human genes, and Jacox et al.[15] described a computational approach to determine alternative promoter usage in nearly 1,500 genes using the Affymetrix Exon 1.0 array. Although those microarrays interrogate only a subset of genes in the genome (i.e. those genes with known alternative promoters), they would provide enough data to test the proposed hypothesis on a genome-wide scale. A comprehensive assessment should ideally measure person-to-person variation across different types of tissue.

The present model can be easily extended to include cases of genes with more than two promoters and more than one SNP in each of the promoters. In a gene with multiple promoters, the observed additive effect of a particular SNP would be reduced by a factor equal to the proportion of chromosomes in the population using the promoter in which the SNP is located. The model may also be used for other types of alternative regulatory elements, such as multiple enhancers affecting gene expression, the so-called shadow enhancers.[16-18] A limitation of the presented model is that it depends on knowledge about alternative promoters, or regulatory elements in general. More experimental work, such as chromatin immunoprecipitation (ChIP)-chip assays validated with transgenic models, is needed to identify new regulatory elements.

In summary, the present report shows that in the presence of inter-individual variation in the use of alternative promoters the observable effects of genetic variants will be lower than their actual effects. The proposed model may explain in part why GWAS-identified variants are for the most part poor predictors of human complex traits. Future studies are needed to determine and quantify the person-to-person variability in the use of alternative promoters as well as to identify new regulatory elements in the human genome.

18 in total

1. Genetics. Enhancing gene regulation.

Authors: Gregory A Wray; Courtney C Babbitt
Journal: Science Date: 2008-09-05 Impact factor: 47.728

2. Many sequence variants affecting diversity of adult human height.

Authors: Daniel F Gudbjartsson; G Bragi Walters; Gudmar Thorleifsson; Hreinn Stefansson; Bjarni V Halldorsson; Pasha Zusmanovich; Patrick Sulem; Steinunn Thorlacius; Arnaldur Gylfason; Stacy Steinberg; Anna Helgadottir; Andres Ingason; Valgerdur Steinthorsdottir; Elinborg J Olafsdottir; Gudridur H Olafsdottir; Thorvaldur Jonsson; Knut Borch-Johnsen; Torben Hansen; Gitte Andersen; Torben Jorgensen; Oluf Pedersen; Katja K Aben; J Alfred Witjes; Dorine W Swinkels; Martin den Heijer; Barbara Franke; Andre L M Verbeek; Diane M Becker; Lisa R Yanek; Lewis C Becker; Laufey Tryggvadottir; Thorunn Rafnar; Jeffrey Gulcher; Lambertus A Kiemeney; Augustine Kong; Unnur Thorsteinsdottir; Kari Stefansson
Journal: Nat Genet Date: 2008-04-06 Impact factor: 38.330

3. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes.

Authors: Kouichi Kimura; Ai Wakamatsu; Yutaka Suzuki; Toshio Ota; Tetsuo Nishikawa; Riu Yamashita; Jun-ichi Yamamoto; Mitsuo Sekine; Katsuki Tsuritani; Hiroyuki Wakaguri; Shizuko Ishii; Tomoyasu Sugiyama; Kaoru Saito; Yuko Isono; Ryotaro Irie; Norihiro Kushida; Takahiro Yoneyama; Rie Otsuka; Katsuhiro Kanda; Takahide Yokoi; Hiroshi Kondo; Masako Wagatsuma; Katsuji Murakawa; Shinichi Ishida; Tadashi Ishibashi; Asako Takahashi-Fujii; Tomoo Tanase; Keiichi Nagai; Hisashi Kikuchi; Kenta Nakai; Takao Isogai; Sumio Sugano
Journal: Genome Res Date: 2005-12-12 Impact factor: 9.043

4. Rare and common variants: twenty arguments.

Authors: Greg Gibson
Journal: Nat Rev Genet Date: 2012-01-18 Impact factor: 53.242

5. CpG dinucleotide methylation of the CYP19 I.3/II promoter modulates cAMP-stimulated aromatase activity.

Authors: Masashi Demura; Serdar E Bulun
Journal: Mol Cell Endocrinol Date: 2007-12-08 Impact factor: 4.102

6. Distinct class of putative "non-conserved" promoters in humans: comparative studies of alternative promoters of human and mouse genes.

Authors: Katsuki Tsuritani; Takuma Irie; Riu Yamashita; Yuta Sakakibara; Hiroyuki Wakaguri; Akinori Kanai; Junko Mizushima-Sugano; Sumio Sugano; Kenta Nakai; Yutaka Suzuki
Journal: Genome Res Date: 2007-06-13 Impact factor: 9.043

7. Shadow enhancers as a source of evolutionary novelty.

Authors: Joung-Woo Hong; David A Hendrix; Michael S Levine
Journal: Science Date: 2008-09-05 Impact factor: 47.728

8. Genome-wide association analysis identifies 20 loci that influence adult height.

Authors: Michael N Weedon; Hana Lango; Cecilia M Lindgren; Chris Wallace; David M Evans; Massimo Mangino; Rachel M Freathy; John R B Perry; Suzanne Stevens; Alistair S Hall; Nilesh J Samani; Beverly Shields; Inga Prokopenko; Martin Farrall; Anna Dominiczak; Toby Johnson; Sven Bergmann; Jacques S Beckmann; Peter Vollenweider; Dawn M Waterworth; Vincent Mooser; Colin N A Palmer; Andrew D Morris; Willem H Ouwehand; Jing Hua Zhao; Shengxu Li; Ruth J F Loos; Inês Barroso; Panagiotis Deloukas; Manjinder S Sandhu; Eleanor Wheeler; Nicole Soranzo; Michael Inouye; Nicholas J Wareham; Mark Caulfield; Patricia B Munroe; Andrew T Hattersley; Mark I McCarthy; Timothy M Frayling
Journal: Nat Genet Date: 2008-04-06 Impact factor: 38.330

9. Highly individual methylation patterns of alternative glucocorticoid receptor promoters suggest individualized epigenetic regulatory mechanisms.

Authors: Jonathan D Turner; Laetitia P L Pelascini; Joana A Macedo; Claude P Muller
Journal: Nucleic Acids Res Date: 2008-11-12 Impact factor: 16.971

10. Genome-wide analysis of alternative promoters of human genes using a custom promoter tiling array.

Authors: Gregory A C Singer; Jiejun Wu; Pearlly Yan; Christoph Plass; Tim H M Huang; Ramana V Davuluri
Journal: BMC Genomics Date: 2008-07-25 Impact factor: 3.969
