Sol Katzman, Andrew D Kern, Katherine S Pollard, Sofie R Salama, David Haussler.
Abstract
Regions of the genome that have been the target of positive selection specifically along the human lineage are of special importance in human biology. We used high-throughput sequencing combined with methods to enrich human genomic samples for particular targets to obtain the sequence of 22 chromosomal samples at high depth in 40 kb neighborhoods of 49 previously identified 100-400 bp elements that show evidence for human accelerated evolution. In addition to selection, the pattern of nucleotide substitutions in several of these elements suggested an historical bias favoring the conversion of weak (A or T) alleles into strong (G or C) alleles. Here we found strong evidence in the derived allele frequency spectra of many of these 40 kb regions for ongoing weak-to-strong fixation bias. Comparison of the nucleotide composition at polymorphic loci to the composition at sites of fixed substitutions additionally reveals the signature of historical weak-to-strong fixation bias in a subset of these regions. Most of the regions with evidence for historical bias do not also have signatures of ongoing bias, suggesting that the evolutionary forces generating weak-to-strong bias are not constant over time. To investigate the role of selection in shaping these regions, we analyzed the spatial pattern of polymorphism in our samples. We found no significant evidence for selective sweeps, possibly because the signal of such sweeps has decayed beyond the power of our tests to detect them. Together, these results do not rule out functional roles for the observed changes in these regions (indeed, there is good evidence that the first two are functional elements in humans), but they suggest that a fixation process (such as biased gene conversion) that is biased at the nucleotide level, but is otherwise selectively neutral, could be an important evolutionary force at play in them, both historically and at present.
Year: 2010 PMID: 20502635 PMCID: PMC2873926 DOI: 10.1371/journal.pgen.1000960
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Table 1. MWU test significant regions.
| region | offset | p-value | compare: ctl | compare: sea | p-value with mask: 500 | p-value with mask: 1kb | p-value with mask: 5kb | p-value with mask: 10kb | p-value with mask: 5kbHi | recomb: avg | recomb: male | telo |
| harseq1* | +1.25 | 0.03562 | 0 | 3 | 0.03562 | 0.07069 | 0.29830 | 0.72170 | 0.29830 | 0.00 | 0.00 | 0 |
| harseq18 | +3.00 | 0.02016 | 0 | 2 | 0.08212 | 0.11110 | 0.10930 | 0.19430 | 0.10930 | 1.70 | 1.39 | 19 |
| harseq20 | +1.09 | 0.01450 | 0 | 2 | 0.00892 | 0.01128 | 0.00861 | 0.01574 | 0.18860 | 1.21 | 1.56 | 6 |
| harseq21 | +2.00 | 0.00090 | 0 | 0 | 0.00090 | 0.00090 | 0.00086 | 0.00138 | 0.00510 | 3.43 | 4.29 | 24 |
| harseq25 | +1.65 | 0.00627 | 0 | 1 | 0.00553 | 0.00826 | 0.00694 | 0.01436 | 0.04611 | 1.81 | 0.39 | 2 |
| harseq27 | +1.16 | 0.01303 | 0 | 2 | 0.01496 | 0.01496 | 0.01932 | 0.12340 | 0.06593 | 1.99 | 1.84 | 2 |
| harseq32 | +0.22 | 0.02369 | 0 | 2 | 0.03274 | 0.03140 | 0.03333 | 0.01323 | 0.15330 | 2.04 | 1.94 | 6 |
| harseq34 | +1.65 | 0.00014 | 0 | 0 | 0.00026 | 0.00030 | 0.00258 | 0.00167 | 0.00258 | 1.42 | 0.00 | 3 |
| harseq35 | +1.00 | 0.04483 | 0 | 3 | 0.04483 | 0.03461 | 0.00175 | 0.01625 | 0.46510 | 2.22 | 1.71 | 4 |
| harseq43 | +1.00 | 0.01237 | 0 | 2 | 0.01237 | 0.00721 | 0.01772 | 0.05556 | 0.08562 | 0.28 | 0.14 | 15 |
| harseq46 | +1.10 | 0.02302 | 0 | 2 | 0.01439 | 0.02300 | 0.00568 | 0.00189 | 0.34770 | 0.37 | 0.44 | 15 |
Target regions with the most significant p-values for the Mann-Whitney U (MWU) test distinguishing the derived allele frequency spectra of weak-to-strong (W2S) and strong-to-weak (S2W) mutations. Starred regions are also significant in the MK test (Table 2). offset: the estimated offset between the two spectra, normalized to 22 chromosomal samples; positive values indicate that W2S mutations are shifted to higher derived allele frequencies. compare: the number of regions among the 13 ctlreg50-62 regions (ctl) or the 62 SeattleSNPs genic regions (sea) with a more significant p-value and a positive W2S offset. p-value with mask: the p-values obtained by omitting all segregating sites at the indicated number of bases from the center of the target region. The 5kbHi column gives the highest (least significant) p-value obtained by omitting all segregating sites in each of a set of overlapping 5kb windows centered at every 2.5kb in the target region. recomb: sex-averaged and male recombination rates from deCODE 1Mb regions. telo: distance of the region from the closer telomere, measured in number of karyotype bands, where 0 is the telomeric band.
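The comparison summarized in the caption above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names `mann_whitney_u` and `split_spectra`, the `(position, derived_count, category)` site tuples, and the example numbers are all hypothetical. It assumes a two-sided test with a normal approximation, tie correction, and no continuity correction, and it reads the masking rule as "drop sites closer than the mask radius to the region center".

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    tie_term = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))

def split_spectra(sites, center=None, mask_radius=0):
    """Split derived allele counts into W2S and S2W lists, skipping sites
    closer than mask_radius bases to center (assumed masking rule)."""
    w2s, s2w = [], []
    for pos, derived_count, category in sites:
        if center is not None and abs(pos - center) < mask_radius:
            continue
        (w2s if category == "W2S" else s2w).append(derived_count)
    return w2s, s2w

# Hypothetical segregating sites in a 40 kb region, 22 chromosomes sampled.
sites = [(1200, 19, "W2S"), (8000, 17, "W2S"), (15000, 20, "W2S"),
         (20100, 4, "S2W"), (26000, 2, "S2W"), (33000, 5, "S2W")]
w2s, s2w = split_spectra(sites, center=20000, mask_radius=500)
u, p = mann_whitney_u(w2s, s2w)
print(f"U = {u}, two-sided p = {p:.4f}")
```

With very few segregating sites the normal approximation is crude; an exact or permutation-based p-value would be the safer choice in that case.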
Table 2. MK test significant regions.
| region | p-value | S2W? | compare: ctl | compare: sea | p-value with mask: 500 | p-value with mask: 1kb | p-value with mask: 5kb | p-value with mask: 10kb | p-value with mask: 5kbHi | recomb: avg | recomb: male | telo |
| harseq1* | 0.00015 | . | 0 | 0 | 0.04230 | 0.03905 | 0.02225 | 0.00994 | 0.02225 | 0.00 | 0.00 | 0 |
| harseq5 | 0.01073 | . | 0 | 5 | 0.00638 | 0.01267 | 0.00623 | 0.00378 | 0.14533 | 1.69 | 3.39 | 0 |
| harseq9 | 0.02186 | . | 0 | 8 | 0.01548 | 0.01927 | 0.01493 | 0.01402 | 0.15412 | 2.38 | 0.33 | 7 |
| harseq11 | 0.00079 | . | 0 | 1 | 0.00034 | 0.00014 | 0.00122 | 0.00080 | 0.00651 | 2.10 | 0.00 | 10 |
| harseq19 | 0.00394 | . | 0 | 3 | 0.00660 | 0.02085 | 0.06709 | 0.09509 | 0.07281 | 0.34 | 0.05 | *1 |
| harseq22 | 0.04702 | . | 0 | 9 | 0.06066 | 0.05761 | 0.10223 | 0.10821 | 0.14642 | 1.11 | 1.36 | 3 |
| harseq29 | 0.04060 | . | 0 | 8 | 0.03907 | 0.05275 | 0.13001 | 0.43571 | 0.24840 | 2.52 | 3.58 | 0 |
| harseq36 | 0.02390 | . | 0 | 8 | 0.03300 | 0.02129 | 0.09404 | 0.13464 | 0.12102 | 0.34 | 0.05 | *0 |
| harseq38 | 0.00050 | . | 0 | 1 | 0.00166 | 0.00145 | 0.00073 | 0.00259 | 0.01245 | 2.71 | 2.60 | 1 |
| harseq39 | 0.01470 | S2W | na | na | 0.01957 | 0.03485 | 0.04600 | 0.04711 | 0.11567 | 1.99 | 1.37 | 2 |
| harseq42 | 0.00383 | . | 0 | 3 | 0.00509 | 0.00512 | 0.02561 | 0.00465 | 0.02798 | 3.12 | 5.90 | 0 |
Target regions with the most significant p-values for a McDonald-Kreitman-like (MK) test that contrasts fixed substitutions on the chimp or human lineage with mutations at segregating sites, comparing weak-to-strong and strong-to-weak mutations. Starred regions are also significant in the MWU test (Table 1). S2W indicates that the fixed differences are biased in the direction of strong-to-weak mutations. Other headings as in Table 1. Starred telo values are measured from the chromosome 2 fusion of ancestral telomeres presumed at human chr2:q14.1.
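A McDonald-Kreitman-like comparison of this kind reduces to a 2x2 table: fixed versus segregating sites, crossed with W2S versus S2W. The sketch below applies a two-sided Fisher's exact test to such a table. This is one common choice for 2x2 counts, used here as an assumption; the text above does not say which statistic the authors applied. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
           W2S  S2W
    fixed   a    b
    poly    c    d
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k W2S events among the fixed sites.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)

# Hypothetical counts for one region: fixed substitutions skew toward W2S.
p_value = fisher_exact_two_sided(a=12, b=3, c=20, d=25)
print(f"two-sided p = {p_value:.4f}")
```

An excess of W2S among fixed differences relative to polymorphisms would point to historical weak-to-strong bias, while the reverse pattern corresponds to the S2W flag in the table.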
Figure 1. Frequency offset of weak-to-strong versus strong-to-weak mutations.
The MWU test is strongly significant for harseq21 (upper panel) and harseq34 (lower panel). This reflects the fact that the derived allele frequency spectrum for weak-to-strong mutations (dark bars) is offset towards higher frequencies compared to strong-to-weak mutations (light bars). N: count of segregating sites of the indicated category in the region.
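For readers who want to reproduce this kind of plot from their own data, a derived allele frequency spectrum is simply a histogram of derived allele counts across segregating sites. The following is a minimal sketch under stated assumptions: 22 sampled chromosomes as in the abstract, a hypothetical function name, and made-up counts.

```python
from collections import Counter

N_CHROM = 22  # chromosomes sampled, per the abstract

def derived_allele_spectrum(derived_counts, n_chrom=N_CHROM):
    """Return a list whose entry k-1 is the number of segregating sites
    at which the derived allele is carried by k of n_chrom chromosomes."""
    for count in derived_counts:
        if not 1 <= count <= n_chrom - 1:
            raise ValueError(f"not a segregating site: derived count {count}")
    tally = Counter(derived_counts)
    return [tally.get(k, 0) for k in range(1, n_chrom)]

# Hypothetical derived allele counts for the two mutation categories.
w2s_spectrum = derived_allele_spectrum([3, 15, 18, 18, 20, 21])
s2w_spectrum = derived_allele_spectrum([1, 1, 2, 2, 3, 6])
print("N (W2S) =", sum(w2s_spectrum), "N (S2W) =", sum(s2w_spectrum))
```

Plotting the two lists side by side as bars gives the layout described in the caption, with N equal to the sum of each list.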
SweepFinder hits.
| region | p-value: HAR bgrnd | p-value: Seasnp bgrnd | recomb: avg | recomb: male | karyo |
| harseq9 | 0.049 | 0.048 | 2.383 | 0.332 | chr20.q12 |
| harseq11 | 0.050 | 0.043 | 2.103 | 0 | chrX.p21.1 |
| harseq16 | 0.052 | 0.055 | 0.803 | 0.679 | chr2.q22.3 |
| harseq24 | 0.113 | 0.035 | 0.724 | 0.247 | chr7.p15.2 |
| harseq25 | 0.026 | 0.036 | 1.807 | 0.387 | chr4.q34.3 |
Target regions with the most significant p-values for SweepFinder. HAR bgrnd and Seasnp bgrnd refer to the null-model background frequency spectrum derived from all target regions or from SeattleSNPs, respectively (see Materials and Methods). recomb: the sex-averaged or male-only recombination rate at the target region. karyo: the chromosomal karyotype band where the target region is found.