Eric de Silva, Lawrence A Kelley, Michael P H Stumpf.
Abstract
We have studied the recombination rate behaviour of a set of 140 genes which were investigated for their potential importance in inflammatory disease. Each gene was extensively sequenced in 24 individuals of African descent and 23 individuals of European descent, and the recombination process was studied separately in the two population samples. The results obtained from the two populations were highly correlated, suggesting that demographic bias does not affect our population genetic estimation procedure. We found evidence that levels of recombination correlate with levels of nucleotide diversity. High marker density allowed us to study recombination rate variation on a very fine spatial scale. We found that about 40 per cent of genes showed evidence of uniform recombination, while approximately 12 per cent of genes carried distinct signatures of recombination hotspots. On studying the locations of these hotspots, we found that they are not always confined to introns but can also stretch across exons. An investigation of the protein products of these genes suggested that recombination hotspots can sometimes separate exons belonging to different protein domains; however, this occurs much less frequently than might be expected based on evolutionary studies into the origins of recombination. This suggests that evolutionary analysis of the recombination process is greatly aided by considering nucleotide sequences and protein products jointly.
Year: 2004 PMID: 15606996 PMCID: PMC3500195 DOI: 10.1186/1479-7364-1-6-410
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Figure 1. Potential explanation for why recombination at the DNA sequence level may be under evolutionary pressure resulting from selection at the protein level. If a protein has two variable sites in different domains, then, if the relative viabilities of the possible combinations change over time, recombination between exons carrying the different variants may be evolutionarily advantageous.
Heuristic assignment of genes to the seven classes of observed recombination properties outlined in the 'Materials and methods' section.
| Class | Gene name |
|---|---|
The number of genes in each class is given in brackets.
Statistical correlations of various summary statistics concerning the data between the two populations.
| Comparison | Kendall's τ | Spearman's ρ |
|---|---|---|
| Recombination distances | 0.59 | 0.75 |
| Average recombination rates | 0.50 | 0.66 |
| Heterozygosities | 0.19 | 0.28 |
| Nucleotide diversities | 0.55 | 0.71 |
| Tajima's D | 0.20 | 0.29 |
| Number of non-synonymous polymorphisms | 0.66 | 0.75 |
| Number of synonymous polymorphisms | 0.67 | 0.74 |
All correlation coefficients are statistically significant. Note, in particular, that the average recombination rates across the genes correlate well between the two populations.
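The two rank statistics reported in this table can be sketched as follows. This is a minimal illustration only, assuming paired per-gene estimates from the two samples; the function names and the example values are hypothetical and are not taken from the study, and Kendall's statistic is computed here in its tie-corrected τ-b form, which the source does not specify.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical per-gene estimates in two population samples (not study data).
sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_b = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(sample_a, sample_b)
tau = kendall_tau_b(sample_a, sample_b)
```

Both statistics depend only on the ordering of the values, so they are insensitive to a monotone rescaling of the estimates in either sample.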
Inferred correlations between estimated average recombination rates and recombination distances (in brackets) with heterozygosities, nucleotide diversities, GC content, Tajima's D statistic and the numbers of non-synonymous and synonymous polymorphisms, in the two populations as measured using Spearman's ρ and Kendall's τ statistics.
| Test statistic | African-derived population sample: Spearman's ρ | African-derived population sample: Kendall's τ | European-derived population sample: Spearman's ρ | European-derived population sample: Kendall's τ |
|---|---|---|---|---|
| Heterozygosity | -0.11 (-0.04) | -0.08 (-0.03) | 0.10 (0.09) | 0.07 (0.06) |
| Nucleotide diversity | | | | |
| GC content | | | | |
| Tajima's D | 0.04 (0.07) | 0.04 (0.07) | 0.09 (0.09) | 0.05 (0.06) |
| Number of non-synonymous polymorphisms | 0.13 ( | 0.10 ( | | |
| Number of synonymous polymorphisms | 0.01 (0.11) | 0.01 (0.08) | -0.12 (-0.05) | -0.09 (-0.04) |
Correlations which differ significantly from 0 (at the 5 per cent level) are highlighted in bold.
Figure 2. Illustrations of possible recombination rate profiles used in the heuristic analysis of intragenic recombination rate variation. Based on the observed behaviour, we assigned each of the 140 genes under consideration to one of the seven different classes depicted here.
Figure 3. (a) Inferred recombination distances and (b) average rates across genes and their flanking regions in the African-derived sample compared with the European-derived sample. We found a strong correlation between the estimated recombination properties in the two populations [(a) Spearman's ρ = 0.75, Kendall's τ = 0.59; and (b) Spearman's ρ = 0.66, Kendall's τ = 0.50, with p values less than 10⁻¹⁰ in all cases]. Note that in (b), the correlation still holds with the removal of the rightmost point.
Figure 4. Average inferred recombination rates versus levels of: (a) nucleotide diversity, (b) heterozygosity, (c) values of Tajima's D statistic, (d) average GC content and (e) and (f) the numbers of non-synonymous and synonymous polymorphisms, respectively. The results obtained from the African-derived sample are shown in •, and those from the European-derived sample in •.
Figure 5. Recombination rate profiles for four different genes. The results obtained from the African-derived sample are shown in grey, those from the European-derived sample are in black. The solid lines indicate the mean values for the local values of ρ, while the dashed lines show the 2.5 and 97.5 percentiles obtained from the recombination rate estimator. In each case, the profiles of the quantiles are very similar to those of the average behaviour; all curves show comparable levels of recombination rate variation. Note that the recombination profiles in the two populations are vertically shifted, but areas of local change occur in similar positions across the genes.
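A profile of this kind, with a mean line and 2.5 and 97.5 percentile bands, can be summarised from per-position samples of ρ roughly as in the sketch below. It assumes the estimator yields a list of sampled values at each position; that input layout, the function names and the example values are hypothetical, and linear interpolation between order statistics is one of several percentile conventions.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarise_profile(samples_by_position):
    """Map each position to (mean, 2.5th percentile, 97.5th percentile)."""
    summary = {}
    for position, samples in samples_by_position.items():
        ordered = sorted(samples)
        mean = sum(ordered) / len(ordered)
        summary[position] = (
            mean,
            percentile(ordered, 2.5),
            percentile(ordered, 97.5),
        )
    return summary


# Hypothetical samples at two positions along a gene (not study data).
example_samples = {
    0: [float(v) for v in range(101)],
    500: [2.0, 2.0, 2.0, 2.0],
}
profile = summarise_profile(example_samples)
```

Plotting the first element of each tuple as a solid line and the other two as dashed lines gives the same layout as the caption describes.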
Figure 6. The sequence of the gene vegf (http://pga.gs.washington.edu), and the assignment of the exons to the inferred protein folds. The positions of exons along the gene are indicated in dark grey. The shaded region denotes the position of the inferred hotspot (in both populations). Exons 1 to 4 lie 5' of the hotspot, while exons 5 and 6 are on the 3' side. Below the sequence are the assignments of exonic DNA to the inferred protein folds. Exons 1 to 3 belong to the first fold (1FLTv), while DNA in exons 5 and 6 codes for protein parts assigned to the second fold (1VGH). We note that in the case of vegf, no non-synonymous polymorphism was detected in any of the exons.