Abstract
Next-generation sequencing allows for a new focus on rare variant density for conducting analyses of association to disease and for narrowing down the genomic regions that show evidence of functionality. In this study we use the 1000 Genomes Project pilot data as distributed by Genetic Analysis Workshop 17 to compare rare variant densities across seven populations. We made the comparisons using regressions of rare variants on total variant counts per gene for each population and Tajima's D values calculated for each gene in each population, using data on 3,205 genes. We found that the populations clustered by continent for both the regression slopes and Tajima's D values, with the African populations (Yoruba and Luhya) showing the highest density of rare variants, followed by the Asian populations (Han and Denver Chinese followed by the Japanese) and the European populations (CEPH [European-descent] and Tuscan) with the lowest densities. These significant differences in rare variant densities across populations seem to translate to measures of the rare variant density more commonly used in rare variant association analyses, suggesting the need to adjust for ancestry in such analyses. The selection signal was high for AHNAK, HLA-A, RANBP2, and RGPD4, among others. RANBP2 and RGPD4 showed a marked difference in rare variant density and potential selection between the Luhya and the other populations. This may suggest that differences between populations should be considered when delimiting genomic regions according to functionality and that these differences can create potential for disease heterogeneity.
Year: 2011 PMID: 22373165 PMCID: PMC3287875 DOI: 10.1186/1753-6561-5-S9-S39
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. Rare variant density for each population and all populations pooled using all available SNPs. The slope for the linear regression of rare variants on total variants per gene for all populations pooled is shown in black (each point represents an individual gene). The highlighted genes are those with the greatest difference between observed and predicted values. The slopes for the seven individual populations are shown in different colors. The Denver Chinese (D. Chinese) and Han Chinese (H. Chinese) are superimposed in such a way that only one line is observed for both slopes.
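As an illustration of the per-gene regression summarized in Figure 1, the following is a minimal sketch of an ordinary least-squares fit of rare variant counts on total variant counts. It assumes a model with an intercept, which the text here does not confirm; the function name `ols_slope` and the gene counts are hypothetical and are not taken from the study.

```python
def ols_slope(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-product of deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-gene counts for one population (illustrative only).
total_variants = [4, 10, 16, 22, 30]
rare_variants = [3, 7, 11, 15, 20]

slope, intercept = ols_slope(total_variants, rare_variants)
print(f"rare-on-total slope: {slope:.3f}")
```

Under this reading, a steeper slope means that a larger share of each gene's variants is rare in that population.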
Rare variant density and values of Tajima’s D across populations
| Population | Slope estimate | Slope standard error | p-value for equality of slopes | Mean Tajima’s D, nonsynonymous SNPs | Mean Tajima’s D, synonymous SNPs |
|---|---|---|---|---|---|
| Tuscan | 0.576 | 0.005 | 3.30 × 10^−15 a | −0.097 | 0.161 |
| CEPH | 0.626 | 0.004 | 5.90 × 10^−3 a | −0.151 | 0.140 |
| Japanese | 0.644 | 0.005 | 6.60 × 10^−8 a | −0.197 | 0.090 |
| Han Chinese | 0.677 | 0.004 | 9.33 × 10^−1 | −0.296 | −0.032 |
| Denver Chinese | 0.677 | 0.004 | 1.30 × 10^−3 a | −0.265 | −0.028 |
| Yoruba | 0.694 | 0.003 | 8.80 × 10^−2 | −0.461 | −0.213 |
| Luhya | 0.701 | 0.003 | <2.2 × 10^−16 a | −0.444 | −0.215 |
| All | 0.902 | 0.002 | | | |
Rare variant to total variant slope estimates and standard errors are given for each population and for all the populations pooled. The populations are ranked according to slope. Equality of slopes for contiguous populations (i.e., Tuscan vs. CEPH, CEPH vs. Japanese, etc.) is tested, and the p-value for each comparison is given. Finally, the mean values of Tajima’s D for each population calculated on the basis of nonsynonymous and synonymous SNPs are given.
a Significant difference between the population and the one immediately below.
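The equality-of-slopes comparison between contiguous populations can be approximated from the tabulated estimates alone. The sketch below uses a two-sided z-test on the difference of two slopes. This is an assumption: the text does not state which test the authors used, the inputs are rounded table values, and the two estimates are treated as independent, so the result need not match the tabulated p-values exactly. The function name `slope_difference_z_test` is hypothetical.

```python
import math


def slope_difference_z_test(b1, se1, b2, se2):
    """Two-sided z-test for H0: slope1 == slope2, treating the estimates as independent."""
    z = (b2 - b1) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Tuscan vs. CEPH, using the rounded slope estimates and standard errors from the table.
z, p = slope_difference_z_test(0.576, 0.005, 0.626, 0.004)
print(f"z = {z:.2f}, p = {p:.2e}")
```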
Figure 2. Comparison of rare variant density when using synonymous and nonsynonymous SNPs. Synonymous SNPs show greater differentiation between the populations. D. Chinese, Denver Chinese; H. Chinese, Han Chinese.
Figure 3. Measures of rare variant density used in rare variant case-control association analysis. Comparison of the distribution of rare variants between the Yoruba and CEPH populations. Each point represents the value for one of the genes for both populations. The identity lines are plotted to aid in the comparison.
Figure 4. Values of Tajima’s D. Genes present in more than one population are presented in color. D. Chinese, Denver Chinese; H. Chinese, Han Chinese.
Figure 5. Values of Tajima’s D. Genes present in more than one population are presented in color. D. Chinese, Denver Chinese; H. Chinese, Han Chinese.
Statistics aggregated across populations
| Gene | Mean Tajima’s D | Variance of Tajima’s D | Total number of variants | Proportion of rare variants | Proportion of rare variants that are nonsynonymous |
|---|---|---|---|---|---|
| | −2.08 | 0.06 | 65.86 | 0.86 | 0.69 |
| | −1.97 | 0.02 | 33.43 | 0.74 | 0.59 |
| | −1.95 | 0.04 | 24.29 | 0.72 | 0.69 |
| | −1.83 | 0.19 | 21.29 | 0.62 | 0.56 |
| | −1.49 | 0.39 | 22.86 | 0.71 | 0.61 |
| | NA | NA | 24.57 | 0.82 | NA |
| | −1.72 | 0.22 | 9.43 | 0.82 | 0.75 |
| | −1.70 | 0.10 | 13.71 | 0.83 | 0.58 |
| | −1.73 | 0.05 | 13.29 | 0.81 | 0.63 |
| | −1.16 | 0.83 | 18.86 | 0.83 | 0.62 |
| | −1.59 | 0.06 | 17.14 | 0.85 | 0.65 |
| | −1.38 | 0.19 | 5.29 | 0.95 | 0.77 |
| | NA | NA | 9.14 | 0.54 | NA |
| | −1.82 | 0.04 | 19.43 | 0.69 | 0.66 |
| | −1.24 | 0.29 | 6.86 | 0.63 | 0.93 |
| | −1.64 | 0.04 | 8.00 | 0.90 | 0.77 |
| Average | −1.66 | 0.18 | 19.59 | 0.77 | 0.68 |
| | −1.20 | 0.36 | 36.71 | 0.67 | 0.63 |
| | −1.21 | 0.48 | 14.14 | 0.78 | 0.47 |
| | −1.33 | 0.36 | 0.57 | NA | NA |
| | −1.20 | 0.95 | 54.29 | 0.71 | 0.64 |
| | −1.21 | 0.36 | 18.86 | 0.78 | 0.37 |
| | −1.62 | 0.13 | 18.86 | 0.83 | 0.62 |
| | −0.89 | 1.48 | 21.71 | 0.73 | 0.71 |
| | NA | NA | 3.29 | NA | NA |
| Average | −1.24 | 0.59 | 21.05 | 0.75 | 0.57 |
Tajima’s D, its variance, total number of polymorphisms, and the proportions of rare and nonsynonymous variants are given for each of the genes that reflect the most negative values of Tajima’s D (less than −2 for the analysis based on nonsynonymous SNPs and less than −1.8 for the analysis based on synonymous SNPs). NA (not applicable) corresponds to instances in which the value is not defined for at least one of the populations.
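For reference, Tajima’s D compares the mean number of pairwise differences with the number of segregating sites, scaled by its estimated standard deviation. The sketch below implements the standard formula from per-site allele counts. The function name `tajimas_d` and the example counts are hypothetical, and the per-gene, per-population pipeline used in the study is not reproduced here.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D from the allele count at each site, given n sampled chromosomes.

    Sites that are not segregating (count 0 or n) are ignored.
    Returns NaN when D is undefined (no segregating sites, or n < 4).
    """
    counts = [k for k in allele_counts if 0 < k < n]
    s = len(counts)  # number of segregating sites
    if n < 4 or s == 0:
        return float("nan")

    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))

    # Standard constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)


# Hypothetical gene with five singleton variants among 10 chromosomes:
# an excess of rare variants gives a negative D.
d_rare = tajimas_d([1, 1, 1, 1, 1], 10)
# Hypothetical gene with five intermediate-frequency variants: positive D.
d_common = tajimas_d([5, 5, 5, 5, 5], 10)
print(d_rare, d_common)
```

A gene with no segregating sites in a population yields an undefined value in this sketch, which is one way an NA can arise in a table of per-population statistics.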