Ning Jiang, Minghui Wang, Tianye Jia, Lin Wang, Lindsey Leach, Christine Hackett, David Marshall, Zewei Luo.
Abstract
BACKGROUND: It is well established that the theoretical kernel of the recently surging genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either by predicting the structure parameters or by correcting inflation in the test statistic due to stratification, these may not be feasible or may impose further statistical problems in practical implementation.
Year: 2011 PMID: 21858027 PMCID: PMC3153488 DOI: 10.1371/journal.pone.0023192
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Probability distribution of joint genotypes at a test marker and a putative QTL and genotypic values at the QTL.
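| QTL genotype | Marker genotype | Probability | Genotypic value at QTL |
| AA | TT | q²Q² | μ + d |
| AA | Tt | 2q²Q(1−Q) | μ + d |
| AA | tt | q²(1−Q)² | μ + d |
| Aa | TT | 2q(1−q)QR | μ + h |
| Aa | Tt | 2q(1−q)[Q(1−R) + R(1−Q)] | μ + h |
| Aa | tt | 2q(1−q)(1−Q)(1−R) | μ + h |
| aa | TT | (1−q)²R² | μ − d |
| aa | Tt | 2(1−q)²R(1−R) | μ − d |
| aa | tt | (1−q)²(1−R)² | μ − d |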
where A and a are the segregating alleles at a putative QTL, and T and t are the alleles at the test marker locus. The frequency of allele A is q and the frequency of allele T is p. Q and R are the conditional probabilities of marker allele T given QTL alleles A and a respectively, formulated as Q = p + D/q and R = p − D/(1 − q), where D is the coefficient of linkage disequilibrium between the marker and the QTL. μ, d and h are the population mean, additive and dominance genic effects at the QTL.
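The joint distribution described above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes random union of haplotypes, takes Q = p + D/q and R = p − D/(1 − q) as implied by the definitions of p, q and D, and uses a hypothetical function name (`joint_genotype_table`) and hypothetical parameter values.

```python
def joint_genotype_table(p, q, D, mu=0.0, d=1.0, h=0.0):
    """Joint probabilities of (QTL genotype, marker genotype) with QTL genotypic values.

    Assumption: genotypes are formed by random union of haplotypes.
    Q = P(T | A) and R = P(T | a) follow from p, q and D.
    """
    Q = p + D / q
    R = p - D / (1.0 - q)

    def marker_probs(x, y):
        # x, y: probability of carrying T on each of the two haplotypes
        return {
            "TT": x * y,
            "Tt": x * (1 - y) + y * (1 - x),
            "tt": (1 - x) * (1 - y),
        }

    qtl = {
        "AA": (q * q, marker_probs(Q, Q), mu + d),
        "Aa": (2 * q * (1 - q), marker_probs(Q, R), mu + h),
        "aa": ((1 - q) ** 2, marker_probs(R, R), mu - d),
    }
    table = {}
    for qtl_genotype, (freq, conditional, value) in qtl.items():
        for marker_genotype, prob in conditional.items():
            table[(qtl_genotype, marker_genotype)] = (freq * prob, value)
    return table


# Hypothetical parameter values, chosen only so that Q and R are valid probabilities.
table = joint_genotype_table(p=0.4, q=0.3, D=0.05)
total = sum(prob for prob, _ in table.values())

# Marginal frequency of marker allele T recovered from the joint table.
t_dosage = {"TT": 1.0, "Tt": 0.5, "tt": 0.0}
freq_T = sum(prob * t_dosage[m] for (_, m), (prob, _) in table.items())
```

The nine probabilities sum to one and the marginal frequency of T recovered from the table equals p, which gives a quick consistency check on Q and R.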
Means and standard errors of regression coefficients (b±se) and proportions of significant statistical tests of the regression coefficients from three methods.
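| Pop | LD before correction | LD after correction | Method 1 simulated b±se | Method 1 simulated proportion | Method 1 predicted b | Method 1 predicted proportion | Method 2 simulated b±se | Method 2 simulated proportion | Method 2 predicted b | Method 2 predicted proportion | Method 3 simulated b±se (all correctly allocated) | Method 3 simulated proportion (all correctly allocated) | Method 3 simulated b±se (half correctly allocated) | Method 3 simulated proportion (half correctly allocated) | Method 3 predicted b | Method 3 predicted proportion |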
| 1 | 0.04 | 0.00 | −0.078±0.015 | 0.06 | 0.00 | 0.00 | 1.293±0.006 | 0.98 | 1.278 | 1.00 | 0.006±0.007 | 0.00 | 1.035±0.006 | 0.84 | 0.00 | 0.00 |
| 2 | 0.04 | 0.00 | −0.087±0.015 | 0.07 | 0.00 | 0.00 | 1.162±0.006 | 0.97 | 1.163 | 0.98 | −0.008±0.007 | 0.00 | 0.940±0.007 | 0.74 | 0.00 | 0.00 |
| 3 | −0.09 | 0.00 | 0.015±0.008 | 0.00 | 0.00 | 0.00 | −2.371±0.005 | 1.00 | −2.368 | 1.00 | 0.006±0.007 | 0.00 | −2.038±0.006 | 1.00 | 0.00 | 0.00 |
| 4 | −0.09 | 0.00 | 0.005±0.011 | 0.00 | 0.00 | 0.00 | −3.157±0.007 | 1.00 | −3.157 | 1.00 | −0.007±0.009 | 0.00 | −2.725±0.008 | 1.00 | 0.00 | 0.00 |
| 5 | 0.02 | 0.05 | 0.965±0.021 | 0.48 | 0.828 | 0.55 | −0.159±0.007 | 0.00 | −0.166 | 0.00 | 0.997±0.006 | 0.85 | 0.082±0.007 | 0.00 | 0.994 | 0.91 |
| 6 | 0.04 | 0.07 | 1.086±0.008 | 0.86 | 1.062 | 0.92 | 0.130±0.007 | 0.00 | 0.125 | 0.00 | 1.280±0.006 | 1.00 | 0.375±0.007 | 0.01 | 1.274 | 1.00 |
| 7 | 0.05 | 0.08 | 1.341±0.008 | 0.98 | 1.325 | 1.00 | 0.333±0.007 | 0.01 | 0.331 | 0.01 | 1.593±0.006 | 1.00 | 0.597±0.007 | 0.14 | 1.59 | 1.00 |
| 8 | 0.05 | 0.08 | 1.260±0.006 | 0.99 | 1.249 | 0.99 | 0.313±0.007 | 0.01 | 0.312 | 0.01 | 1.503±0.006 | 1.00 | 0.572±0.007 | 0.13 | 1.499 | 1.00 |
| 9 | 0.04 | 0.08 | 1.307±0.014 | 0.92 | 1.234 | 0.99 | −0.005±0.006 | 0.00 | 0.00 | 0.00 | 1.698±0.006 | 1.00 | 0.333±0.007 | 0.02 | 1.704 | 1.00 |
| 10 | −0.04 | 0.00 | 0.008±0.009 | 0.01 | 0.00 | 0.00 | −1.233±0.006 | 0.99 | −1.234 | 0.99 | −0.003±0.007 | 0.00 | −0.995±0.007 | 0.80 | 0.00 | 0.00 |
The second and third columns give the coefficients of LD between the marker and QTL in the simulated mixed population before and after correction for population structure, respectively.
For Method 3, simulated results are given for two allocation scenarios: all individuals allocated to their correct subpopulations; and half of all individuals correctly allocated to their subpopulations, with the other half randomly allocated to either of the two subpopulations. The predicted values were estimated from theoretical analysis, while the simulated values were estimated from the simulation studies.
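To illustrate why LD measured in a mixed population can differ from LD after accounting for structure, the sketch below mixes two subpopulations and computes D in the mixture from haplotype frequencies. It is an illustration only, not the simulation design behind the table: the function names, the subpopulation allele frequencies and the mixing proportion are all hypothetical.

```python
def haplotype_freqs(p, q, D):
    """Haplotype frequencies for marker allele T (frequency p) and QTL allele A (frequency q)."""
    return {
        "TA": p * q + D,
        "Ta": p * (1 - q) - D,
        "tA": (1 - p) * q - D,
        "ta": (1 - p) * (1 - q) + D,
    }


def ld_in_mixture(sub1, sub2, m):
    """LD coefficient in a mixture drawing proportion m from sub1 and 1 - m from sub2.

    Each subpopulation is given as a tuple (p, q, D).
    """
    h1, h2 = haplotype_freqs(*sub1), haplotype_freqs(*sub2)
    mixed = {k: m * h1[k] + (1 - m) * h2[k] for k in h1}
    p_mix = mixed["TA"] + mixed["Ta"]  # frequency of T in the mixture
    q_mix = mixed["TA"] + mixed["tA"]  # frequency of A in the mixture
    return mixed["TA"] - p_mix * q_mix


# Hypothetical subpopulations with no LD within either one (D = 0),
# but different allele frequencies at both loci.
D_mix = ld_in_mixture((0.8, 0.7, 0.0), (0.2, 0.3, 0.0), m=0.5)

# Same allele frequencies in both subpopulations: mixing creates no LD.
D_same = ld_in_mixture((0.5, 0.5, 0.0), (0.5, 0.5, 0.0), m=0.3)
```

With zero LD inside each subpopulation, the mixture still shows D = m(1 − m)(p₁ − p₂)(q₁ − q₂), which is the spurious association that stratification introduces.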
Figure 1. The first two principal components from PCA of 142 mixed HapMap Project human samples.
The first and second principal components explained 60.77% and 1.34% of total variability respectively.
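The share of variability explained by a principal component is its eigenvalue divided by the total variance. The sketch below computes this for the first component only, using power iteration on the covariance matrix rather than a full eigendecomposition. The function name and the toy genotype matrix are hypothetical and are not the HapMap data.

```python
import math


def first_pc_variance_share(rows, iterations=200):
    """Proportion of total variance explained by the first principal component.

    rows: list of samples, each a list of numeric genotype codes (e.g. 0/1/2).
    Uses power iteration; assumes the starting vector is not orthogonal
    to the leading eigenvector of the covariance matrix.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centred = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(k)]
        for i in range(k)
    ]

    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient of the unit vector v gives the leading eigenvalue.
    leading = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k)) for i in range(k))
    total_variance = sum(cov[i][i] for i in range(k))
    return leading / total_variance


# Toy data: two groups of three samples with contrasting genotypes at three SNPs.
genotypes = [
    [0, 0, 1], [0, 1, 0], [1, 0, 0],
    [2, 2, 1], [2, 1, 2], [1, 2, 2],
]
share = first_pc_variance_share(genotypes)
```

When samples fall into groups with different allele frequencies, as in this toy matrix, the first component absorbs most of the variance, which is the pattern the figure reports for the mixed samples.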
The number of eQTLs detected by three different methods (Methods 1, 2 and 3, abbreviated M1, M2 and M3) or detected in common by two of these methods in the CEU samples, the CHB+JPT samples and their mixed samples.
| eQTLs per expression trait | CEU M1 | CEU M2 | CEU M1+2 | CHB+JPT M1 | CHB+JPT M2 | CHB+JPT M1+2 | Mixed M1 | Mixed M3 | Mixed M1+3 | Mixed M3a | Mixed M3+3a |
| 1 | 280 | 312 | | 263 | 255 | | 206 | 251 | | 398 | |
| 2 | 58 | 57 | | 43 | 41 | | 16 | 13 | | 136 | |
| 3 | 20 | 21 | | 13 | 16 | | 2 | 7 | | 97 | |
| 4 | 10 | 16 | | 8 | 6 | | 2 | 2 | | 72 | |
| 5 | 4 | 4 | | 5 | 6 | | 0 | 0 | | 48 | |
| 6 | 3 | 1 | | 1 | 3 | | 0 | 0 | | 37 | |
| 7 | 3 | 3 | | 0 | 2 | | 0 | 0 | | 22 | |
| 8 | 0 | 2 | | 1 | 0 | | 1 | 0 | | 22 | |
| 9 | 2 | 1 | | 0 | 0 | | 0 | 1 | | 14 | |
| ≥ 10 | 19 | 22 | | 6 | 7 | | 2 | 2 | | 1,111 | |
| Total eQTLs | 1,009 | 1,149 | | 633 | 670 | | 296 | 354 | | 1,975 | |
| | 21 | 22 | | 48 | 49 | | 51 | 58 | | 618 | |
| | 988 | 1,127 | | 585 | 621 | | 245 | 296 | | 1,339 | |
M3a denotes Method 3 when individuals were randomly assigned to the European-derived sample (CEU) with a probability of 58%, or to the Asian-derived sample (CHB+JPT) otherwise.
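The random assignment used for M3a can be pictured as one Bernoulli draw per individual. The sketch below is an assumption about how such an allocation might be coded, not the authors' procedure: the function name, the seed and the use of Python's `random.Random` are illustrative, and the sample size of 142 is taken from the figure captions for the mixed sample set.

```python
import random


def random_allocation(n_individuals, prob_ceu=0.58, seed=0):
    """Assign each individual to CEU with probability prob_ceu, else to CHB+JPT."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    return [
        "CEU" if rng.random() < prob_ceu else "CHB+JPT"
        for _ in range(n_individuals)
    ]


labels = random_allocation(142)
n_ceu = labels.count("CEU")
```

Because the labels are drawn independently of the true ancestry, an allocation like this carries no information about population structure.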
Figure 2. Manhattan plots for the genome-wide eQTL analysis of two genes, POMZP3 and HSD17B12, and quantile-quantile (QQ) plots comparing the distributions of expected and observed p-values.
Plots show the score (−log10 p-value) for all SNPs by physical position for POMZP3 and HSD17B12 respectively, based on simple linear regression (Method 2, a and b) and corrected linear regression (Method 1, c and d) in the 142 mixed population samples.
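The quantities plotted in Manhattan and QQ plots are simple transformations of the p-values. The sketch below shows one common way to compute them. The plotting position (i − 0.5)/n for the expected quantiles is an assumption (other conventions exist), and the function names and p-values are hypothetical.

```python
import math


def manhattan_scores(p_values):
    """Score plotted on the y-axis of a Manhattan plot: -log10 of each p-value."""
    return [-math.log10(p) for p in p_values]


def qq_points(p_values):
    """Pairs (expected, observed) of -log10 p-values under a uniform null.

    The expected quantile for the i-th smallest of n p-values is (i - 0.5) / n.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return [
        (-math.log10((i - 0.5) / n), -math.log10(p))
        for i, p in enumerate(ordered, start=1)
    ]


# Hypothetical p-values for five SNPs.
p_values = [0.5, 0.01, 0.2, 1e-6, 0.9]
scores = manhattan_scores(p_values)
points = qq_points(p_values)
```

Points lying well above the diagonal across the whole range of a QQ plot suggest inflated test statistics, which is one symptom of uncorrected stratification.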
Figure 3. Histograms of the coefficient of determination for eQTLs from the 142 mixed sample set.
Panel a is for Method 1 and panel b is for Method 3.
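For a single SNP, the coefficient of determination of a simple linear regression of expression on genotype is the squared correlation between the two. The sketch below computes the slope and R² under that simple model. It does not include the structure correction of Methods 1 or 3, and the function name, the 0/1/2 genotype coding and the data are hypothetical.

```python
def slope_and_r_squared(x, y):
    """Least-squares slope and coefficient of determination for y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, r2


# Hypothetical genotypes (copies of one allele) and expression values.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r2 = slope_and_r_squared(genotypes, expression)
```

R² here is the fraction of expression variance attributable to the SNP under the linear model, which is the quantity summarised in the histograms.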