Min Hu, Qasim Ayub, José Afonso Guerra-Assunção, Quan Long, Zemin Ning, Ni Huang, Irene Gallego Romero, Lira Mamanova, Pelin Akan, Xin Liu, Alison J Coffey, Daniel J Turner, Harold Swerdlow, John Burton, Michael A Quail, Donald F Conrad, Anton J Enright, Chris Tyler-Smith, Yali Xue.
Abstract
We have investigated whether regions of the genome showing signs of positive selection in scans based on haplotype structure also show evidence of positive selection when sequence-based tests are applied, whether the target of selection can be localized more precisely, and whether such extra evidence can lead to increased biological insights. We used two tools: simulations under neutrality or selection, and experimental investigation of two regions identified by the HapMap2 project as putatively selected in human populations. Simulations suggested that neutral and selected regions should be readily distinguished and that it should be possible to localize the selected variant to within 40 kb at least half of the time. Re-sequencing of two ~300 kb regions (chr4:158Mb and chr10:22Mb) lacking known targets of selection in HapMap CHB individuals provided strong evidence for positive selection within each and suggested the micro-RNA gene hsa-miR-548c as the best candidate target in one region, and changes in regulation of the sperm protein gene SPAG6 in the other.
Year: 2011 PMID: 22057783 PMCID: PMC3325425 DOI: 10.1007/s00439-011-1111-9
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Fig. 1 Simulation design. Dotted boxes represent simulated haplotype samples; the star indicates the presence of a positively selected SNP. Arrows show the performance of the analyses described in the oval boxes
Fig. 2 Simulation results. a Simulations were carried out under neutrality, and tests for selection [−ln combined p values for Tajima’s D and Fay and Wu’s H (top) or Nielsen et al.’s CLR (bottom)] were calculated in non-overlapping 10 kb windows across 300 kb. Values of the test were averaged over 16 independent neutral simulations that passed the XP-EHH filter. No departures from neutrality were seen. b 1,752 simulations with selection (selection coefficient 0.001, 0.004, 0.007, 0.01) that passed the XP-EHH filter and neutrality tests were averaged as in a. Departures from neutrality are seen most strongly in the window containing the selected SNP. c The distribution of the top signal (lowest combined p value) or highest CLR in each simulation is shown across the 300-kb region. d Probability that the known selected variant is found at each distance from the peak test value
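The windowed scan described in the Fig. 2 caption can be illustrated with a minimal Python sketch. It computes only Tajima's D, using the standard published formula, in non-overlapping 10 kb windows across a 300 kb region. The function names, the input layout (per-site positions and derived-allele counts) and the choice to return `None` for windows without segregating sites are assumptions made for illustration; this is not the authors' pipeline, and it does not cover Fay and Wu's H, the CLR, or the combination of p values.

```python
from math import sqrt


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived-allele counts in n chromosomes.

    Returns None when the statistic is undefined (no segregating sites,
    or n < 4, where the variance term is zero).
    """
    # Keep only sites that are polymorphic in the sample.
    seg = [k for k in derived_counts if 0 < k < n]
    s = len(seg)
    if s == 0 or n < 4:
        return None

    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Difference between pi and Watterson's estimator, scaled by its SD.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(positions, derived_counts, n,
                       region_len=300_000, window=10_000):
    """Tajima's D in non-overlapping windows (hypothetical helper).

    positions are 0-based offsets within the region; sites outside
    [0, region_len) are ignored.
    """
    bins = [[] for _ in range(region_len // window)]
    for pos, k in zip(positions, derived_counts):
        if 0 <= pos < region_len:
            bins[pos // window].append(k)
    return [tajimas_d(b, n) for b in bins]
```

With this layout, a window dominated by low-frequency variants gives a negative D, which is the direction expected after a selective sweep, while a window with no segregating sites is reported as `None` rather than as zero.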
Fig. 3 Experimental results: localization of likely selection targets in the chr4 and chr10 regions. a −ln of combined p values from Tajima’s D and Fay and Wu’s H (top) and Nielsen et al.’s CLR (bottom) calculated from re-sequencing data in windows corresponding to two or three PCR fragments (10–20 kb). The most significant statistics are shown in red, and fall into the same window at ~158.98 Mb (blue highlight). b Corresponding analysis of the chr10:22Mb region, where the most significant signals again fall into the same window, this time at ~22.78 Mb. c, d Protein-coding genes from the Vega annotation, non-coding RNA and miRNA genes, and relevant ENCODE chromatin modifications in the two regions. e Predicted miRNA in the chr4:158Mb target region. Two SNPs are present, including a G > A at the end of the miRNA carried on the major haplotype (49/50 chromosomes, selected in CHB) that may influence the strand forming the mature miRNA. f H3K4me1 chromatin modifications indicating enhancer regions in GM12878 (second) and K562 (third) cells, SNPs with high derived allele frequencies (fourth), predicted regulatory potential (fifth) and 28 species conservation (bottom). Three high-frequency derived SNPs lie within candidate enhancers in one or other of the cell lines, but high-frequency derived SNPs do not lie within regions with high predicted regulatory potential or conservation
Fig. 4 Localization of the signal of selection within the chr4 and chr10 regions using different approaches. The two starting regions are shown at the top (Sabeti et al. 2007), localizations using sequence data (gray bars) or HapMap2 genotype data (white bars) by this study in the middle, and the localization by the CMS statistic (Grossman et al. 2010 or this work) at the bottom