Marion Dawn Teare, Suteeraporn Pinyakorn, James Heighway, Mauro F Santibanez Koref.
Abstract
Genome-wide association studies frequently reveal associations between disease susceptibility and polymorphisms outside coding regions. Such associations cannot always be explained by linkage disequilibrium with changes affecting the transcription products. This has stimulated interest in characterising sequence variation that influences gene expression levels, in particular changes acting in cis. Differences in transcription between the two alleles at an autosomal locus can be used to test the association between candidate polymorphisms and the modulation of gene expression in cis. This type of approach requires at least one transcribed polymorphism and one candidate polymorphism. In the past five years, different methods have been proposed to analyse such data. Here we use simulations and real data sets to compare the power of some of these methods. The results show that when it is not possible to determine the phase between the transcribed and potentially cis acting allele, there is some advantage in using methods that estimate phased genotype and effect on expression simultaneously. However, when the phase can be determined, simple regression models seem preferable because of their simplicity and flexibility. The simulations and the analysis of experimental data suggest that in the majority of situations, methods that assume a lognormal distribution of the allelic expression ratios are both robust to deviations from this assumption and more powerful than alternatives that do not make it.
Year: 2011 PMID: 22174852 PMCID: PMC3236754 DOI: 10.1371/journal.pone.0028636
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Diagrammatic representation of the effect of a cis acting polymorphism upon allelic expression.
Depicted is the situation for an individual who is heterozygous for a cis acting polymorphism with alleles A and C and is also heterozygous for a polymorphism within the affected transcript.
Figure 2. Observed allelic expression ratios measured at rs5854, a transcribed polymorphism at the 3′ end of the MMP1 gene, grouped according to the genotype for rs11292517, a polymorphism in the promoter region of the gene.
Figure 3. A visualisation of different approaches for testing an association between allelic expression and a biallelic polymorphism.
The distribution of allelic expression ratios across a population is represented. We consider here two polymorphisms: a transcribed one, with alleles m and M, used to measure allelic expression; and a cis acting one with alleles c and C. Each elongated diamond represents the mean and the spread of the AEI measurements by specific genotypes. A) The general situation. B) Perfect disequilibrium (D′ = 1, R2 = 1) between the cis acting and the transcribed polymorphism, only two distinct haplotypes exist. C) Complete disequilibrium (D′ = 1, R2<1), only three distinct haplotypes exist. D) Situation when the phase between alleles at both sites is known.
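When the phase is known (panel D), one simple way to examine the association is an ordinary least-squares regression of the log allelic expression ratio on the phased genotype. The Python sketch below illustrates that idea only. The genotype coding (+1 when C sits on the M haplotype, -1 when C sits on the m haplotype, 0 for homozygotes at the cis acting site), the helper name `fit_line`, and all data values are assumptions made for illustration and are not taken from the study.

```python
from math import exp, log

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical phased genotype codes for six transcribed-marker heterozygotes:
# +1: C on the M haplotype, -1: C on the m haplotype, 0: homozygous at the cis site.
phase = [1, 1, 0, 0, -1, -1]

# Hypothetical allelic expression ratios (expression of M divided by expression of m).
aer = [2.0, 2.0, 1.0, 1.0, 0.5, 0.5]

intercept, slope = fit_line(phase, [log(r) for r in aer])

# exp(slope) is the estimated fold change attributed to carrying C in cis;
# a non-zero intercept would point to an effect of the transcribed marker itself.
fold_change = exp(slope)
```

Working on the log scale makes the two phase configurations symmetric around zero, which is why a single slope can describe both.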
Summary of tests used.
| Test | Motivation | Advantages | Disadvantages | Notes |
|---|---|---|---|---|
| | General situation | Easy to expand (e.g. several cis acting sites). | Assumption of lognormality. | Assume that expression from one allele is drawn from a lognormal distribution. |
| | | | Requires specialised software | Enables joint estimation of phase and effect. |
| | | | Compared to LRT.j, reduced power in the absence of disequilibrium | Two step procedure: in the first step the phased genotype probabilities are estimated and in the second the effect is assessed. |
| | | Simple calculation | Lack of power when phase uncertain | As LRT.p but uses the most likely (best) phased genotype for each individual. For R2<1 corresponds to regression of the log AER onto the most likely genotype. |
| | | | | As LRT.b but uses true simulated genotype. Represents the outcome of the LRT tests once phase uncertainty has been eliminated. |
| | Perfect disequilibrium (R2 = 1) | No assumption on distribution | Diminishing power when SNPs tend to equilibrium | Tests systematic overexpression of one of the alleles. We use here the Sign test. |
| | Linkage equilibrium (\|D′\| = 0, R2 = 0) | Does not require estimating phase | Diminishing power with increasing disequilibrium. Assumes lognormality | Tests whether the spread of AER is larger among heterozygous at the |
| | \|D′\|<1, R2<1. | Insensitive to transcribed marker effect | Lack of power when phase uncertain. Assumes lognormality | Requires at least two distinct genotypes to be observed at the cis acting site among transcribed marker heterozygotes. Assumes that the phase can be inferred in double heterozygotes, so we use here the most likely genotype. |
| | \|D′\|<1, R2<1. | | | Represents the outcome of the test above once phase uncertainty has been eliminated. |
| | Complete disequilibrium (\|D′\| = 1, R2<1) | No assumption on distribution | Assumes that all double heterozygotes have the same phased genotype | Tests whether there is a difference in AER between heterozygotes and homozygotes for the cis acting polymorphism. We use here the Wilcoxon test. |
: Pattern of disequilibrium, as represented in Figure 3, for which the test is most appropriate.
: Assumes that given the genotype AERs follow a log normal distribution.
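The table lists the Sign test as the distribution-free option under perfect disequilibrium, where a cis acting effect shows up as systematic overexpression of one transcribed allele. A minimal sketch of an exact two-sided sign test on log allelic expression ratios might look as follows. The function name `sign_test_p` and the example ratios are hypothetical, and the authors' actual implementation may differ (for instance in how ties are handled).

```python
from math import comb, log

def sign_test_p(log_ratios):
    """Exact two-sided sign test of H0: the median log ratio is zero."""
    nonzero = [x for x in log_ratios if x != 0]  # ties carry no sign information
    n = len(nonzero)
    if n == 0:
        return 1.0
    positives = sum(1 for x in nonzero if x > 0)
    tail = min(positives, n - positives)
    # P(X <= tail) under Binomial(n, 0.5), doubled for a two-sided test.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

# Hypothetical allelic expression ratios in eight heterozygotes
# (expression of one transcribed allele divided by the other).
aer = [1.8, 1.5, 2.1, 1.2, 1.6, 0.9, 1.7, 1.4]
p_value = sign_test_p([log(r) for r in aer])
```

Because the test only uses the sign of each log ratio, it makes no distributional assumption, at the cost of ignoring how large the imbalance is.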
Experimental data sets.
| Data set name | | |
|---|---|---|
| References | | |
| Genotyped individuals | 107 | 257 |
| Transcribed SNP | rs5854 | rs1799977 |
| | rs11292517 | rs1800734 |
| AER | | |
| Individuals analysed | 38 | 74 |
| Method | RFLP and gel densitometry | MALDI-TOF |
| Comments | Samples affected by non-sense mediated decay have been excluded |
: [32].
: [33].
Figure 4. Power comparisons when data are simulated assuming a log normal distribution for the allelic expression ratios.
For all simulations: . Panel A: Effect of sample size assuming transcribed and cis acting polymorphism are in linkage equilibrium (Simulation parameters: and ). Panel B: The influence of the extent of disequilibrium (Simulation parameters: ). Panels C and D: The influence of effect size (Panel C for and panel D for , other simulation parameters: ). Panels E and F: The influence of allele frequency for the transcribed polymorphism (Panel E for and panel F for , other simulation parameters: ). Panels G and H: The influence of allele frequency for the cis acting variant (Panel G for and Panel H for , other parameters: ).
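To make the simulation setting concrete, the sketch below draws log allelic expression ratios under a lognormal model: each allele's expression is lognormal, so the log of their ratio is normally distributed. This is an illustration under stated assumptions, not the authors' simulation code. The function name `simulate_log_aer` and the values chosen for the effect size, the spread `sigma`, the sample size, and the seed are all hypothetical, since the parameter values used in the figure are not recoverable from this text.

```python
import random
from statistics import mean

def simulate_log_aer(n, effect, sigma, rng):
    """Simulate log allelic expression ratios for n phased double heterozygotes.

    Each allele's log expression is normal with standard deviation sigma;
    the allele in cis with the high-expressing variant is shifted by `effect`.
    """
    log_ratios = []
    for _ in range(n):
        log_high = effect + rng.gauss(0.0, sigma)  # allele carrying the cis variant
        log_low = rng.gauss(0.0, sigma)            # the other allele
        log_ratios.append(log_high - log_low)
    return log_ratios

rng = random.Random(2011)          # hypothetical seed, for reproducibility only
sample = simulate_log_aer(n=20000, effect=0.4, sigma=0.5, rng=rng)
estimated_effect = mean(sample)    # should lie close to the simulated effect
```

Repeating such draws many times and recording how often a test rejects the null hypothesis gives an empirical estimate of power for a given sample size and effect size.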
Figure 5. Power comparisons when the simulated model allows for one transcribed marker allele to be consistently over-expressed.
Simulation parameters: . Analysis in greyscale is conducted using (misspecified) methods that do not allow for an allele specific expression effect from the transcribed polymorphism (Panel A: , i.e. no effect from the cis acting polymorphism, and panel B: ). Panels C and D: Analysis conducted using models that do allow for an effect from the transcribed polymorphism (Panel C: and panel D: ).
Figure 6. Additional sites affecting the expression in cis.
The graph represents the influence of the number of sites upon the power to detect the SNP with the largest effect. All polymorphisms are assumed to be in linkage disequilibrium. Simulation parameters: .
Figure 7. Deviation from a simple log normal distribution (Simulation parameters ).
Panels A and B show the effects of outliers (). In panel A and in Panel B the outlier frequency, p, is 0.03. Panels C and D present the situation when the log of the expression of each allele follows a t-distribution with 2 degrees of freedom (C for and D for ).
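The two kinds of deviation described here can be sketched in Python as follows: a normal distribution contaminated with occasional outliers, and a t-distribution with 2 degrees of freedom. Only the outlier frequency of 0.03 and the 2 degrees of freedom come from the caption. The function names, the outlier scale factor, `sigma`, the seed, and the sample sizes are hypothetical, and the outlier mechanism used by the authors may differ from this scale mixture.

```python
import random

def t2_draw(rng):
    """One draw from Student's t with 2 degrees of freedom.

    Uses t = Z / sqrt(V / 2) with V chi-squared on 2 df; V / 2 is Exp(1).
    """
    return rng.gauss(0.0, 1.0) / (rng.expovariate(1.0) ** 0.5)

def contaminated_draw(rng, sigma, p_outlier, outlier_scale):
    """Normal draw whose spread is inflated with probability p_outlier."""
    scale = sigma * outlier_scale if rng.random() < p_outlier else sigma
    return rng.gauss(0.0, scale)

rng = random.Random(7)  # hypothetical seed
heavy_tailed = [t2_draw(rng) for _ in range(5000)]
with_outliers = [
    contaminated_draw(rng, sigma=0.3, p_outlier=0.03, outlier_scale=10.0)
    for _ in range(5000)
]

# Share of values beyond five standard deviations of the uncontaminated component.
extreme_share = sum(1 for x in with_outliers if abs(x) > 1.5) / len(with_outliers)
```

Feeding such draws into a lognormal-based test and a rank- or sign-based test side by side is one way to probe how robust each is to the violated assumption.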
Figure 8. Effect of sample size in experimental data.
We examine here the power to detect the cis acting effect of polymorphisms known to affect transcription for MMP1 (panel A) and MLH1 (panel B).