David J Witherspoon, W Scott Watkins, Yuhua Zhang, Jinchuan Xing, Whitney L Tolpinrud, Dale J Hedges, Mark A Batzer, Lynn B Jorde.
Abstract
BACKGROUND: Recombination rates vary widely across the human genome, but little of that variation is correlated with known DNA sequence features. The genome contains more than one million Alu mobile element insertions, and these insertions have been implicated in non-homologous recombination, modulation of DNA methylation, and transcriptional regulation. If individual Alu insertions have even modest effects on local recombination rates, they could collectively have a significant impact on the pattern of linkage disequilibrium in the human genome and on the evolution of the Alu family itself.
Year: 2009 PMID: 19917129 PMCID: PMC2785838 DOI: 10.1186/1471-2164-10-530
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. A typical genomic region surrounding a focal AluY. Estimates of the recombination rate parameter ρ (log10 scale) are shown for the eleven inter-SNP intervals. The sixth ρ-estimate (labeled ρAlu) is for the interval containing the AluY, which has the highest recombination rate in this particular region. The positions of the 12 SNPs chosen for analysis are shown relative to the center of the AluY; other SNPs in the region are indicated by small tick marks. This region spans ~45 kb on chromosome 7, centered on the AluY at 32,081,567 bp (UCSC hg18; [35]).
Tests of effects of fixed and polymorphic AluY insertions on local recombination rates. P-values for the significance of the AluY presence variable are italicized.
| AluY insertions | Regions | Total intervals (b) | Africa | Europe | East Asia |
|---|---|---|---|---|---|
| Polymorphic | 5 | 30 | | | |
| Fixed | 14 | 99 | | | |
| # individuals (a) | | | 138 to 147 | 96 to 108 | 67 to 74 |
a Number of individuals used to infer recombination rates, minimum to maximum (number varies between regions because of individuals removed due to missing data).
b Total number of inter-SNP intervals analyzed for a data set.
Means and standard deviations of the regression variables, by data set.
| Data Set | Numbers of Regions (Intervals) | Interval log10(ρ) (s.d.) | Interval length, bp (s.d.) | Regional log10(ρ) (a) (s.d.) | Interval G+C (s.d.) | Core motif count | Ext. motif count | Hot spot (s.d.) | AluY presence (s.d.) | AluY length, bp (b) (s.d.) |
|---|---|---|---|---|---|---|---|---|---|---|
| Diversity panel: Africa, fixed | 14 | -3.34 | 5,240 | -3.44 | 0.383 | 1.00 | 0.323 | 0.131 | 0.141 | 291 |
| | (99) | (1.09) | (2,840) | (0.705) | (0.0664) | (1.55) | (0.636) | (0.339) | (0.35) | (48.4) |
| HapMap YRI, fixed | 6,235 | -3.60 | 4,210 | -3.43 | 0.403 | 1.18 | 0.358 | 0.142 | 0.143 | 301 |
| | (43,645) | (0.844) | (1,270) | (0.837) | (0.0573) | (1.56) | (0.735) | (0.349) | (0.350) | (13.3) |
| HapMap CEU, fixed | 5,344 | -4.13 | 4,210 | -3.96 | 0.403 | 1.18 | 0.357 | 0.142 | 0.143 | 301 |
| | (37,408) | (0.981) | (1,230) | (1.03) | (0.0570) | (1.56) | (0.741) | (0.349) | (0.350) | (13.1) |
a For each interval, the regional log10(ρ) is the weighted average taken over all intervals in the region, excluding that focal interval.
b AluY length statistics are given for descriptive purposes. Only the presence or absence of an AluY is used as a regression variable.
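Footnote a can be read as a leave-one-out weighted mean. The sketch below is one possible reading, not the authors' code: the function name `regional_log10_rho` is hypothetical, the choice of interval length as the weight is an assumption (the footnote does not say what the weights are), and averaging the log10 values rather than the raw ρ values is likewise an assumption.

```python
def regional_log10_rho(log10_rhos, weights, focal_index):
    """Weighted mean of log10(rho) over a region, skipping the focal interval.

    log10_rhos  -- per-interval log10(rho) estimates for one region
    weights     -- per-interval weights (ASSUMED here to be interval lengths)
    focal_index -- index of the interval to leave out of the average
    """
    if len(log10_rhos) != len(weights):
        raise ValueError("log10_rhos and weights must have the same length")
    weighted_sum = 0.0
    weight_total = 0.0
    for i, (value, weight) in enumerate(zip(log10_rhos, weights)):
        if i == focal_index:
            continue  # the focal interval does not contribute to its own regional value
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no weight left after excluding the focal interval")
    return weighted_sum / weight_total


# Toy region with three intervals (made-up numbers, not data from the paper).
toy_log10_rhos = [-3.0, -4.0, -2.0]
toy_lengths_bp = [1000.0, 2000.0, 1000.0]
regional_for_last = regional_log10_rho(toy_log10_rhos, toy_lengths_bp, focal_index=2)
```

Excluding the focal interval keeps the regional covariate from carrying the very value that the regression is trying to explain.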
Stepwise linear regression results (effect size coefficient, standard error, and p-value) for each variable, by data set. P-values < 10^-50 are shown as 0.
| Data Set | Statistic | Interval length | Regional log10(ρ) | Interval G+C | Core motif | Extended motif | Hot spot | AluY presence |
|---|---|---|---|---|---|---|---|---|
| Diversity panel: Africa, fixed | Coefficient | -1.07 × 10^-7 | 1.29 | -2.69 | -0.0439 | -0.0725 | 0.250 | 0.395 |
| | Std. Err. | 2.27 × 10^-5 | 0.0919 | 0.978 | 0.0572 | 0.104 | 0.202 | 0.182 |
| | P-value | 0.996 | 7.3 × 10^-25 | 0.0071 | 0.44 | 0.49 | 0.22 | 0.033 |
| HapMap YRI, fixed | Coefficient | -6.17 × 10^-6 | 0.824 | 0.625 | 0.00891 | 0.0296 | 0.356 | 0.0221 |
| | Std. Err. | 1.66 × 10^-6 | 0.00248 | 0.0408 | 0.00146 | 0.00283 | 0.00570 | 0.00579 |
| | P-value | 2.0 × 10^-4 | 0 | 0 | 9.4 × 10^-10 | 0 | 0 | 1.4 × 10^-4 |
| HapMap CEU, fixed | Coefficient | 2.90 × 10^-6 | 0.815 | 0.555 | 0.00837 | 0.0378 | 0.389 | 0.0320 |
| | Std. Err. | 1.86 × 10^-6 | 0.00221 | 0.0443 | 0.00154 | 0.00304 | 0.00624 | 0.00611 |
| | P-value | 0.12 | 0 | 0 | 6.1 × 10^-8 | 0 | 0 | 1.7 × 10^-7 |
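The three statistics reported per variable are linked: the p-value follows from the ratio of the coefficient to its standard error. As a rough consistency check, and not the authors' procedure, the sketch below converts that ratio to a two-sided p-value with a normal approximation. The function name `two_sided_p_value` is hypothetical. The approximation is reasonable for the HapMap data sets, which have tens of thousands of intervals; for the 99-interval diversity panel a t-distribution would be more appropriate, so small discrepancies are expected there.

```python
import math


def two_sided_p_value(coefficient, std_err):
    """Two-sided p-value for coefficient / std_err under a normal approximation."""
    z = coefficient / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Last column of the HapMap YRI rows: coefficient and standard error from the table.
p_last_column_yri = two_sided_p_value(0.0221, 0.00579)

# Interval length, HapMap YRI.
p_length_yri = two_sided_p_value(-6.17e-6, 1.66e-6)
```

Both values come out on the order of 10^-4, in line with the tabulated entries for those two cells.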
Figure 2. Size and significance of the effect of AluY insertions. (A) The effect of AluY insertions (linear regression coefficient for the AluY variable) is plotted against the percent divergence of AluY elements (binned into non-overlapping groups of approximately uniform number, i.e., 600-800 elements; divergences taken from RepeatMasker). The red and black lines correspond to results from the HapMap YRI and CEU data sets, respectively. (B) Histogram of AluY element frequencies vs. percent divergence from their respective subfamily consensus sequences. Only elements between 250 and 350 bp long, with no more than 10% of their sequence deleted or composed of non-Alu insertions, were counted. The dark gray histogram shows the distribution of AluY elements that were chosen for regions analyzed in this work (magnified vertically by fivefold for visibility), while the light gray histogram includes all AluY elements. The horizontal axes in both panels are identically scaled and aligned.
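The binning described for panel A, non-overlapping groups of approximately uniform number, can be illustrated with a simple equal-count split of sorted divergence values. This is a minimal sketch under stated assumptions: the function name `equal_count_bins` is hypothetical, the divergence values are made up, and the caption does not say how ties or bin boundaries were actually handled.

```python
def equal_count_bins(values, n_bins):
    """Split values into n_bins non-overlapping groups of near-equal size.

    Values are sorted first, so each group covers a contiguous range.
    Group sizes differ by at most one element.
    """
    if n_bins < 1 or n_bins > len(values):
        raise ValueError("n_bins must be between 1 and len(values)")
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins = []
    start = 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)  # the first `extra` bins get one more element
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Made-up percent-divergence values, for illustration only.
toy_divergences = [0.5, 1.2, 0.1, 3.4, 2.2, 0.9, 4.1, 1.8, 2.9, 0.3]
toy_bins = equal_count_bins(toy_divergences, 3)
bin_sizes = [len(group) for group in toy_bins]
```

With equal-count bins, each regression coefficient in panel A would rest on a similar number of elements, which keeps the standard errors comparable across the divergence range.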