Damian Labuda, Catherine Labbé, Sylvie Langlois, Jean-Francois Lefebvre, Virginie Freytag, Claudia Moreau, Jakub Sawicki, Patrick Beaulieu, Tomi Pastinen, Thomas J Hudson, Daniel Sinnett.
Abstract
It is likely that evolutionary differences among species are driven by sequence changes in regulatory regions. Likewise, polymorphisms in promoter regions may be responsible for interindividual differences at the population level. We present an unbiased survey of genetic variation in 2-kb segments upstream of the transcription start sites of 28 protein-coding genes, characterized in five population groups of different geographic origin. On average, we found 9.1 polymorphisms and 8.8 haplotypes per segment, with corresponding nucleotide and haplotype diversities of 0.082% and 58%, respectively. We characterized these segments using summary statistics, tests of Hardy-Weinberg equilibrium, fixation index (Fst) estimates, and neutrality tests, and by analyzing the distributions of haplotype allelic classes. These distributions, introduced here to assess departures from neutrality, were examined by coalescent simulations under a simple population model, allowing for recombination or different demographic histories. Our results suggest that genetic diversity in some of these regions could have been shaped by purifying selection and in others driven by adaptive changes, which would explain the relatively large variance in genetic diversity indices among loci. However, some of these effects could also be due to linkage with surrounding sequences, and neutral explanations cannot be ruled out given the uncertainty in the underlying demographic histories and the possibility of random effects due to the small size of the studied segments. (c) 2007 Wiley-Liss, Inc.
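The two diversity indices quoted above can be illustrated with a short sketch. It assumes aligned haplotype strings of equal length with no missing data: nucleotide diversity π is the mean proportion of differing sites over all pairs of chromosomes, and haplotype diversity is 1 − Σp² over haplotype frequencies p, optionally with Nei's n/(n − 1) small-sample correction. The function names and toy sequences are hypothetical and are not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites (pi), per site."""
    n, length = len(seqs), len(seqs[0])
    # Total number of mismatches summed over every pair of chromosomes.
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total / n_pairs / length


def haplotype_diversity(seqs: list[str], unbiased: bool = True) -> float:
    """1 - sum(p_i^2); multiplied by n/(n-1) when unbiased=True (Nei)."""
    n = len(seqs)
    homozygosity = sum((c / n) ** 2 for c in Counter(seqs).values())
    h = 1.0 - homozygosity
    return n / (n - 1) * h if unbiased else h


if __name__ == "__main__":
    sample = ["AAAAAAAAAA", "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAACT"]
    print(f"pi = {100 * nucleotide_diversity(sample):.2f}%")
    print(f"H  = {haplotype_diversity(sample):.3f}")
```

Whether the published values include the n/(n − 1) correction is not stated here; for n = 80 chromosomes the two versions differ by only about 1%.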
Year: 2007 PMID: 17274005 PMCID: PMC2683062 DOI: 10.1002/humu.20463
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
TABLE 1. Diversity Indices and Different Estimators of θ (for a Sample of 80 Initially Ascertained Chromosomes)
| Gene | k | S | Haplotype diversity | Length (bp) | | θS | θπ | θH | θ | θ | θk | Steps to major haplotype | Four-gamete test | Fst (%), total sample | Fst (%), non-African sample |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BTN3A2 | 9 | 14 | 0.57 | 2032 | 1.09 | 2.84 | 2.20 | 5.62 | 5 | 0.98 | 2.42 | 5 | | ns | ns |
| CAT | 8 | 7 | 0.74 | 2092 | 1.81 | 1.41 | 1.65 | 0.69 | 1 | 2.17 | 2.02 | 1 | (1) | 7.9 | 8.5 |
| CCND1 | 11 | 13 | 0.62 | 1606 | 1.28 | 2.62 | 1.43 | 0.99 | 3 | 1.258 | 3.23 | 1 | (1) | ns | ns |
| CCNE1 | 3 | 2 | 0.10 | 844 | 1.25 | 0.40 | 0.10 | 0.00 | 1 | 0.08 | 0.46 | 0 | | ns | ns |
| CDC25A | 7 | 7 | 0.34 | 1664 | 1.69 | 1.41 | 0.38 | 0.01 | 3 | 0.37 | 1.66 | 0 | | 7.5 | 8.2 |
| CDKN1A | 13 | 11 | 0.76 | 1597 | 1.53 | 2.22 | 3.18 | 2.58 | 0 | 2.42 | 4.16 | 2 | + | 3.7 | ns |
| CDKN1B | 11 | 10 | 0.78 | 2012 | 1.29 | 2.02 | 1.60 | 0.42 | 2 | 2.74 | 3.23 | 1/0 | | 7.5 | 6.9 |
| CDKN2A | 7 | 8 | 0.35 | 2069 | 0.90 | 1.61 | 0.58 | 0.03 | 2 | 0.41 | 1.66 | 0 | | 7.1 | 6.7 |
| CX3CR1 | 17 | 16 | 0.86 | 1987 | 1.25 | 3.23 | 4.03 | 6.07 | 2 | 5.01 | 6.32 | 6 | + | 11.4 | 7.8 |
| E2F1 | 5 | 4 | 0.43 | 1925 | 1.09 | 0.81 | 0.48 | 0.08 | 2 | 0.55 | 1.01 | 0 | | ns | ns |
| FEN1 | 6 | 5 | 0.55 | 1992 | 0.86 | 1.01 | 0.61 | 0.30 | 3 | 0.91 | 1.32 | 0 | | 18.7 | 9.8 |
| FGB | 11 | 13 | 0.80 | 1951 | 0.91 | 2.62 | 2.06 | 1.72 | 5 | 3.09 | 3.23 | 1 | | 11.4 | 9.9 |
| GPX2 | 17 | 16 | 0.74 | 2077 | 0.90 | 3.23 | 2.27 | 2.98 | 6 | 2.17 | 6.32 | 2 | + | 15.1 | 18.5 |
| GPX3 | 10 | 11 | 0.64 | 2157 | 1.05 | 2.25 | 2.61 | 7.61 | 2 | 1.31 | 2.89 | 6 | + | 7.4 | ns |
| GSS | 9 | 9 | 0.73 | 1931 | 1.26 | 1.82 | 1.59 | 2.45 | 3 | 2.11 | 2.40 | 2 | | 3.4 | ns |
| GSTM3 | 7 | 6 | 0.70 | 1951 | 1.09 | 1.21 | 1.38 | 3.99 | 2 | 1.82 | 1.66 | 3 | (1) | 9.3 | 6.4 |
| GSTM4 | 9 | 10 | 0.73 | 1780 | 1.11 | 2.02 | 2.14 | 1.59 | 2 | 2.05 | 2.40 | 1 | | ns | ns |
| GSTP1 | 12 | 13 | 0.71 | 2060 | 1.38 | 2.62 | 4.11 | 5.10 | 0 | 1.89 | 3.68 | 5 | + | 5.7 | 2.1 |
| HDAC1 | 11 | 12 | 0.67 | 2029 | 1.55 | 2.42 | 1.03 | 2.39 | 5 | 1.52 | 3.23 | 2 | | 18.6 | ns |
| HTR2A | 13 | 18 | 0.71 | 2053 | 1.58 | 3.63 | 2.39 | 6.84 | 6 | 1.83 | 4.16 | 5 | + | 1.9 | ns |
| IL1A | 4 | 3 | 0.67 | 2008 | 1.26 | 0.60 | 0.99 | 0.31 | 0 | 1.58 | 0.73 | 0 | | 13.5 | 4.7 |
| MICA | 17 | 15 | 0.87 | 2164 | 2.12 | 3.04 | 1.61 | 0.26 | 5 | 5.38 | 6.41 | 0 | (1) | 3.5 | 3.9 |
| RB1 | 6 | 4 | 0.44 | 1963 | 0.98 | 1.01 | 0.53 | 0.06 | 1 | 0.59 | 1.32 | 0 | | 3.9 | ns |
| SKP2 | 3 | 2 | 0.10 | 1934 | 1.00 | 0.40 | 0.10 | 0.00 | 0 | 0.08 | 0.46 | 0 | | 3.7 | ns |
| SMAD3 | 5 | 5 | 0.72 | 1366 | 1.53 | 1.01 | 1.32 | 0.32 | 0 | 1.94 | 1.01 | 0 | | ns | ns |
| SMAD4 | 2 | 2 | 0.03 | 1725 | 1.03 | 0.40 | 0.05 | 0.00 | 2 | 0.02 | 0.22 | 0 | | 7.7 | ns |
| TFDP1 | 4 | 8 | 0.28 | 939 | 1.92 | 1.62 | 0.83 | 0.05 | 0 | 0.29 | 0.73 | 0 | | 10.6 | 7.0 |
| TGFBI | 9 | 9 | 0.62 | 2012 | 1.31 | 1.82 | 2.18 | 1.74 | 4 | 1.25 | 2.42 | 2 | | ns | ns |
Number of mutational steps from the ancestral to the observed major haplotype (i.e., the allelic class of the most frequent haplotype).
(1) indicates the presence of only one recombinant haplotype (see Supplementary Fig. 1).
Significance levels were obtained with ARLEQUIN; FST values were calculated for an extended sample of 80 genotyped individuals.
P<0.001
P<0.01
P<0.001
ns, not significant.
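Several columns of the table above are θ estimators for n = 80 chromosomes. The sketch below assumes the textbook definitions: Watterson's θS = S/a_n with a_n = Σ_{i=1}^{n−1} 1/i, a per-segment θπ equal to per-site π times segment length, and the Ewens [1972] estimator θk obtained by solving E[k] = k numerically. It also includes the classic four-gamete check for pairs of biallelic sites. All function names are hypothetical, and the authors' handling of details such as missing data is not described here.

```python
from itertools import combinations


def harmonic(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, Watterson's normalising constant."""
    return sum(1.0 / i for i in range(1, n))


def theta_watterson(s: int, n: int) -> float:
    """theta_S = S / a_n for S segregating sites in n chromosomes."""
    return s / harmonic(n)


def theta_pi(pi_percent: float, length_bp: int) -> float:
    """Per-segment theta_pi: per-site diversity (given in %) times segment length."""
    return pi_percent / 100.0 * length_bp


def expected_haplotypes(theta: float, n: int) -> float:
    """Ewens (1972): E[k] = sum_{i=0}^{n-1} theta / (theta + i), for theta > 0."""
    return sum(theta / (theta + i) for i in range(n))


def theta_ewens(k: int, n: int, tol: float = 1e-10) -> float:
    """Solve E[k] = k for theta by bisection; E[k] increases with theta."""
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    if k == 1:
        return 0.0
    lo, hi = 0.0, 1.0
    while expected_haplotypes(hi, n) < k:  # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_haplotypes(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def four_gamete_pairs(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Site pairs showing all four two-site gametes (assumes biallelic sites)."""
    n_sites = len(haplotypes[0])
    hits = []
    for i, j in combinations(range(n_sites), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(round(theta_watterson(16, 80), 2))  # S = 16, as in CX3CR1 and GPX2
    print(round(theta_ewens(9, 80), 2))       # k = 9
    print(four_gamete_pairs(["00", "01", "10", "11"]))
```

With these definitions, S = 16 gives θS ≈ 3.23, matching the CX3CR1 and GPX2 rows, and k = 9 gives θk ≈ 2.41, between the 2.40 and 2.42 listed for the k = 9 rows. Bisection is used because E[k] is monotone in θ, so the root is unique once bracketed.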
Neutrality Tests*
| Observed values | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| π(%) | S | k | hom | D(p) | kexp | kexp( | homexp(pw;ps) | |||
| BTN3A2 | 0.108 | 14 | 9 | 0.43 | −3.57 | 8.4 | 4.9 | 0.28 | ||
| (0.050) | ns | (0.031) | ns | |||||||
| CCND1 | 0.090 | 13 | 11 | 0.38 | 6.4 | 0.23 | ||||
| ns | (ns;0.970) | |||||||||
| CDC25A | 0.023 | 7 | 7 | 0.67 | − | 2.7 | 0.36 | |||
| − | (0.970;ns) | |||||||||
| CDKN2A | 0.027 | 8 | 7 | 0.65 | − | 0.36 | ||||
| − | (0.967;ns) | |||||||||
| GPX2 | 0.110 | 16 | 17 | 0.27 | 0.14 | |||||
| − | 0.14 | |||||||||
| GPX3 | 0.121 | 11 | 10 | 0.36 | −5.96 | 9.4 | 5.9 | 0.25 | ||
| (0.009) | ns | (0.042) | ns | |||||||
| GSTM3 | 0.068 | 6 | 7 | 0.33 | −3.32 | 6.2 | 7.5 | 0.36 | ||
| (0.013) | ns | ns | ns | |||||||
| GSTP1 | 0.200 | 13 | 12 | 0.30 | 12.9 | 7.7 | 0.21 | |||
| ns | (0.053) | (ns;0.965) | ||||||||
| HDAC1 | 0.051 | 12 | 11 | 0.34 | −1.57 | 5.1 | 6.6 | 0.23 | ||
| (0.026) | −5.36/(0.0095) | (0.040) | (ns;0.976) | |||||||
| HTR2A | 0.117 | 18 | 13 | 0.30 | −4.53 | 9.0 | 0.19 | |||
| (0.036) | ns | (ns;0.968) | ||||||||
| IL1A | 0.050 | 3 | 4 | 0.33 | 4.9 | 6.7 | 0.55 | |||
| ns | (0.946) | (ns;0.043) | ||||||||
| MICA | 0.062 | 15 | 17 | 0.14 | 6.8 | 15.2 | 0.14 | |||
| −9.84/(0.000) | ns | ns | ||||||||
| SKP2 | 0.005 | 2 | 3 | 0.90 | −1.21 | 1.5 | 1.4 | 0.66 | ||
| (0.041) | −2.41/(0.010) | ns | ns | |||||||
| SMAD4 | 0.003 | 2 | 2 | 0.97 | − | 1.3 | 1.1 | 0.80 | ||
| ns | ns | ns | ||||||||
| TGFBI | 0.109 | 9 | 9 | 0.38 | 8.4 | 5.7 | 0.29 | |||
| ns | ns | (ns;0.982) | ||||||||
Values in bold are significant after correction for multiple testing.
Exceptionally, the values for the Fay and Wu [2000] test are based on a sample of 160 chromosomes.
ns, not significant.
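The neutrality statistics can be sketched from the unfolded site frequency spectrum ξ_i, the number of sites at which the derived allele is carried by i of n chromosomes; this requires knowing the ancestral state of each site. The code below implements the standard formulas for Tajima's D and the unnormalized Fay and Wu [2000] H = θπ − θH. It computes the statistics only; p values such as those in the table require coalescent simulations, which are not shown. Function names and the toy input are hypothetical.

```python
from math import sqrt


def sfs(derived_counts: list[int], n: int) -> list[int]:
    """Unfolded SFS: xi[i] = number of sites whose derived allele occurs i times."""
    xi = [0] * n
    for i in derived_counts:
        if not 1 <= i <= n - 1:
            raise ValueError("derived-allele count must lie in 1..n-1")
        xi[i] += 1
    return xi


def theta_pi_sfs(xi: list[int], n: int) -> float:
    """Mean pairwise differences: sum 2 i (n - i) xi_i / (n (n - 1))."""
    return sum(2 * i * (n - i) * xi[i] for i in range(1, n)) / (n * (n - 1))


def theta_h_sfs(xi: list[int], n: int) -> float:
    """Fay and Wu's theta_H: sum 2 i^2 xi_i / (n (n - 1))."""
    return sum(2 * i * i * xi[i] for i in range(1, n)) / (n * (n - 1))


def fay_wu_h(xi: list[int], n: int) -> float:
    """Unnormalized H; negative values reflect excess high-frequency derived alleles."""
    return theta_pi_sfs(xi, n) - theta_h_sfs(xi, n)


def tajima_d(xi: list[int], n: int) -> float:
    """Tajima's D with the usual a1, a2, b1, b2, c1, c2, e1, e2 constants."""
    s = sum(xi)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (theta_pi_sfs(xi, n) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    xi = sfs([1, 1, 2, 3], n=4)  # toy sample of 4 chromosomes, 4 sites
    print(xi, round(fay_wu_h(xi, 4), 3), round(tajima_d(xi, 4), 3))
```

The exact normalization and sample sizes used in the published tests (e.g., 160 chromosomes for the Fay and Wu test) are not reproduced by this sketch.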
FIGURE 1. Distributions of allelic frequency classes (left panels), of haplotype frequencies [Middleton et al., 1993], and of haplotype allelic classes (right panels) in CDC25A, CX3CR1, and GSTM3. Bars represent the observed values; lines represent theoretical distributions. The occupancy of an allelic frequency class is the count of sites at which the new allele occurs i times in a sample of n chromosomes (i = 1, 2, 3, …, n − 1). Here, the theoretical curve (solid line) is the distribution given by $E(S_i) = \theta / i$ [Fan et al., 2002; Fu, 1997], using $\theta_\pi$ (Table 1) as the estimator of θ. The theoretical distribution (solid line) of haplotype frequencies expected given k observed haplotypes (Table 1) follows Ewens [1972]. Haplotype names are arbitrary and correspond to their names in our database. Haplotype allelic classes group haplotypes sharing the same number of mutations from the ancestral haplotype; their theoretical occupancy was obtained by coalescent simulation under the standard model with constant population size, without (solid line) and with (dotted line) recombination, the latter set at 10-fold the genomic average for segments in which crossovers were detected.
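The two observed-versus-expected comparisons in Figure 1 can be sketched as follows, assuming haplotypes are aligned strings and the ancestral haplotype is already known (how it was inferred is not stated in the caption). `expected_sfs` evaluates the neutral expectation E(S_i) = θ/i for a chosen θ, here θπ = 0.38 from the CDC25A row, and `allelic_classes` groups chromosomes by their number of differences from the ancestral haplotype. Both names are hypothetical, and counting chromosomes rather than distinct haplotypes in each class is an assumption.

```python
from collections import Counter


def expected_sfs(theta: float, n: int) -> list[float]:
    """Neutral expectation E(S_i) = theta / i for i = 1, ..., n - 1."""
    return [theta / i for i in range(1, n)]


def allelic_classes(haplotypes: list[str], ancestral: str) -> dict[int, int]:
    """Number of chromosomes in each class of mutational distance from the ancestor."""
    classes = Counter(
        sum(a != b for a, b in zip(h, ancestral)) for h in haplotypes
    )
    return dict(sorted(classes.items()))


if __name__ == "__main__":
    # theta_pi = 0.38 (CDC25A), n = 80 chromosomes.
    spectrum = expected_sfs(0.38, 80)
    print([round(x, 3) for x in spectrum[:4]])
    print(allelic_classes(["AAAA", "AAAA", "ATAA", "ATAC", "GAAA"], "AAAA"))
```

Summing the expected spectrum gives θ·a_n, the neutral expectation for the total number of segregating sites S.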
FIGURE 2. Frequencies of the ancestral haplotypes (A) and distribution of major haplotypes among haplotype allelic classes (B) (cf. Fig. 1) in the 28 studied segments. The data (solid bars) are compared with theoretical expectations from coalescent simulations under the standard model in the absence (solid line) and presence (dashed line) of recombination at 10-fold the genomic average (10 cM/Mb). Simulations used a sample size of 80 chromosomes, a mutation rate of 2.13 × 10⁻⁸ per bp per generation (corresponding to the average of 9.1 segregating sites per segment), and N = 10,000.
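A minimal sketch of the kind of simulation behind the no-recombination curves is shown below. It builds a neutral coalescent genealogy for n = 80 chromosomes under constant population size, drops infinite-sites mutations as Poisson counts on each branch, and records the frequency of the ancestral haplotype and the allelic class of the most frequent haplotype. N = 10,000 and μ = 2.13 × 10⁻⁸ come from the caption; the 2,000-bp segment length, the number of replicates, the seed, and all function names are assumptions, and the recombination case (dashed line) is not modelled.

```python
import math
import random
from collections import Counter


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication sampler; fine for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_once(n: int, theta: float, rng: random.Random) -> tuple[float, int]:
    """One neutral genealogy (constant size, no recombination, infinite sites).

    Time is in units of 2N generations, so each lineage mutates at rate theta/2.
    Returns (ancestral-haplotype frequency, allelic class of the major haplotype).
    """
    parent: dict[int, int] = {}
    node_time = {i: 0.0 for i in range(n)}
    active, next_id, t = list(range(n)), n, 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time to next coalescence
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        parent[a] = parent[b] = next_id
        node_time[next_id] = t
        active.append(next_id)
        next_id += 1
    # Poisson number of new, unique mutations on each child -> parent branch.
    branch_muts, mut_id = {}, 0
    for child, par in parent.items():
        m = poisson(theta / 2 * (node_time[par] - node_time[child]), rng)
        branch_muts[child] = range(mut_id, mut_id + m)
        mut_id += m
    # A chromosome's haplotype is the set of mutations on its path to the root.
    haplotypes = []
    for leaf in range(n):
        muts, node = set(), leaf
        while node in parent:
            muts.update(branch_muts[node])
            node = parent[node]
        haplotypes.append(frozenset(muts))
    counts = Counter(haplotypes)
    major, _ = counts.most_common(1)[0]  # ties broken arbitrarily
    return counts[frozenset()] / n, len(major)


def simulate(n: int = 80, n_e: int = 10_000, mu: float = 2.13e-8,
             length_bp: int = 2_000, reps: int = 1_000, seed: int = 1):
    """Mean ancestral-haplotype frequency and counts of major-haplotype classes."""
    theta = 4 * n_e * mu * length_bp  # per-segment theta = 4 N mu L
    rng = random.Random(seed)
    total_freq, major_classes = 0.0, Counter()
    for _ in range(reps):
        freq, cls = simulate_once(n, theta, rng)
        total_freq += freq
        major_classes[cls] += 1
    return total_freq / reps, dict(sorted(major_classes.items()))


if __name__ == "__main__":
    mean_freq, classes = simulate()
    print(f"mean ancestral-haplotype frequency: {mean_freq:.3f}")
    print("major-haplotype allelic classes:", classes)
```

Tracking explicit branches rather than only coalescence times is what makes the allelic class of each haplotype recoverable; adding recombination would require an ancestral recombination graph instead of a single tree.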