Y Ma1,2, X Ding1, S Qanbari2, S Weigend3, Q Zhang1, H Simianer2.
Abstract
Identifying signatures of recent or ongoing selection is highly relevant in livestock population genomics. From a statistical perspective, choosing a proper testing procedure and combining various test statistics is challenging. Based on extensive simulations, we discuss the statistical properties of eight established selection signature statistics. In the scenario considered, reasonable power to detect selection signatures is achieved with high marker density (>1 SNP/kb), as obtained from sequencing, while rather small sample sizes (~15 diploid individuals) appear to be sufficient. Most selection signature statistics, such as the composite likelihood ratio and cross population extended haplotype homozygosity, have the highest power when the selected allele has reached fixation, whereas the integrated haplotype score has the highest power when selection is ongoing. We propose a novel strategy, called de-correlated composite of multiple signals (DCMS), that combines different statistics for detecting selection signatures while accounting for the correlation between them. When examined with simulated data, DCMS consistently has higher power than most of the single statistics and shows reliable positional resolution. We apply the new statistic to the established selective sweep around the lactase gene in human HapMap data, which provides further evidence of its reliability. We then use it to scan for selection signatures in two chicken samples differing in skin color. Our analysis suggests that a set of well-known genes such as BCO2, MC1R, ASIP and TYR were involved in the divergent selection for this trait.
Year: 2015 PMID: 25990878 PMCID: PMC4611237 DOI: 10.1038/hdy.2015.42
Source DB: PubMed Journal: Heredity (Edinb) ISSN: 0018-067X Impact factor: 3.821
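The abstract describes DCMS only in words: several statistics are combined while accounting for their mutual correlation. A minimal sketch of one way to do this follows. It assumes the combined score at a locus is the sum, over statistics, of log((1 - p) / p) divided by the summed absolute correlations of that statistic with all statistics (itself included). That weighting, the natural log, and the function name `dcms` are assumptions for illustration and are not stated in this record.

```python
import math


def dcms(p_values, corr):
    """Combine per-locus p-values of n statistics into one score per locus.

    p_values: list of rows, p_values[l][t] is the p-value (strictly between
              0 and 1) of statistic t at locus l.
    corr:     n x n correlation matrix between the statistics, diagonal = 1.

    Assumed weighting (not taken from this record): statistic t is divided
    by sum_i |r_it|, so strongly correlated statistics are down-weighted.
    """
    n = len(corr)
    weights = [1.0 / sum(abs(corr[i][t]) for i in range(n)) for t in range(n)]
    scores = []
    for row in p_values:
        # log((1 - p) / p) is large when p is small, i.e. strong evidence.
        scores.append(sum(w * math.log((1.0 - p) / p)
                          for w, p in zip(weights, row)))
    return scores


# Hypothetical example: three loci, two statistics with correlation 0.5.
example_corr = [[1.0, 0.5], [0.5, 1.0]]
example_p = [[0.5, 0.5], [0.01, 0.2], [0.001, 0.002]]
example_scores = dcms(example_p, example_corr)
```

With this weighting, two perfectly correlated statistics contribute together as much as one independent statistic, which is the sense in which the composite is "de-correlated".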
Parameter settings varied in the simulated selection scenarios
| Parameter | Values |
|---|---|
| Selection coefficient s | 0.005, 0.01, |
| Allele frequency | 0.2, 0.4, 0.6, |
| Sample size | 10, 30, |
| Marker distance (kb) | 0.1, 0.5, |
Reference values are underlined.
Absolute values of the correlation coefficients between the eight statistical methods under the null hypothesis (upper triangle) and in the chicken data (lower triangle)
| | XPEHH | XPCLR | \|iHS\| | CLR | Tajima D | FuLi D | FuLi F | FST |
|---|---|---|---|---|---|---|---|---|
| XPEHH | - | 0.04 | 0.03 | 0.00 | 0.05 | 0.03 | 0.04 | 0.05 |
| XPCLR | 0.03 (0.03) | - | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 0.10 |
| \|iHS\| | 0.13 (0.08) | 0.01 (0.02) | - | 0.02 | 0.05 | 0.07 | 0.04 | 0.01 |
| CLR | 0.03 (0.02) | 0.00 (0.01) | 0.01 (0.02) | - | 0.03 | 0.03 | 0.03 | 0.00 |
| Tajima D | 0.16 (0.13) | 0.05 (0.05) | 0.18 (0.16) | 0.17 (0.19) | - | 0.61 | 0.75 | 0.02 |
| FuLi D | 0.03 (0.01) | 0.03 (0.02) | 0.17 (0.13) | 0.12 (0.10) | 0.07 (0.18) | - | 0.97 | 0.01 |
| FuLi F | 0.11 (0.09) | 0.01 (0.02) | 0.04 (0.03) | 0.20 (0.20) | 0.82 (0.80) | 0.63 (0.73) | - | 0.01 |
| FST | 0.03 (0.03) | 0.23 (0.26) | 0.05 (0.07) | 0.01 (0.02) | 0.02 (0.03) | 0.07 (0.04) | 0.03 (0.01) | - |
Abbreviations: CLR, composite likelihood ratio; iHS, integrated haplotype score; XPCLR, cross-population composite likelihood ratio; XPEHH, cross population extended haplotype homozygosity.
Note: the correlation coefficients in the lower triangle were calculated after removing all loci that fell in the top 5% quantile of any of the statistics used. Values outside and inside the brackets are the correlation coefficients in the yellow skin population and the white skin population, respectively. The absolute values of the correlation coefficients of the eight statistical methods in the human population are shown in Supplementary Table S6.
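The filtering step in the note above can be sketched as follows: flag loci in the top 5% of any statistic, drop them, and compute absolute Pearson correlations on the remainder. This is an illustrative sketch only. The function names are hypothetical, and it assumes every statistic is oriented so that larger values indicate stronger evidence of selection, which the caller would need to ensure (for example by negating statistics whose signal lies in the lower tail).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def top_quantile_mask(values, q=0.05):
    """True for loci whose value lies in the top q fraction.

    Ties at the threshold are all flagged, so slightly more than a
    fraction q of the loci can be marked.
    """
    k = int(round(len(values) * q))
    if k == 0:
        return [False] * len(values)
    threshold = sorted(values, reverse=True)[k - 1]
    return [v >= threshold for v in values]


def abs_corr_without_outliers(stats, q=0.05):
    """stats: dict mapping statistic name -> list of per-locus scores.

    Returns {(name_a, name_b): |r|} for every pair, computed only on loci
    that are not in the top q quantile of any statistic.
    """
    names = list(stats)
    n_loci = len(stats[names[0]])
    masks = [top_quantile_mask(stats[name], q) for name in names]
    keep = [l for l in range(n_loci) if not any(m[l] for m in masks)]
    kept = {name: [stats[name][l] for l in keep] for name in names}
    return {(a, b): abs(pearson(kept[a], kept[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Removing the extreme loci first keeps putative selection signals from inflating the correlations, so the remaining values approximate the dependence between statistics under neutrality.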
Figure 1. Power of eight different selection signature test statistics and the novel combining strategy when varying four different parameters: (a) marker interval distance; (b) frequency of the selected allele; (c) sample size; (d) selection coefficient. In all methods, the simulated selection scenarios were treated as the observed population, and the neutral (no selection) scenarios were treated as the reference population when between-population statistics were computed.
Figure 2. Selection signatures detected by DCMS on (a) chromosome 2 in human HapMap data, comparing the CEU population with the ASW population, and (b) chromosome 24 in the comparison of yellow skin vs white skin chicken populations. The y axis shows the −log (P-values). The red dashed line in (a) marks the location of the LCT gene in the human genome, and the red dashed line in (b) marks the location of the BCO2 gene in the chicken genome. Dark-colored symbols mark loci at which the P-value of an individual statistic is below 1%.
Figure 3. Observed values of the eight test statistics and the combining strategy in two replicates of the simulated reference scenario. The red dashed lines indicate the position of the SNP under selection. In the left column, each statistic was calculated between the selected population and a population without selection (Sel vs noSel); in the right column, both populations were under selection (Sel_1 vs Sel_2). Dark-colored symbols mark scores in the top 1% quantile of each statistic.
Figure 4. Heat map of the empirical power (in per cent) of eight different selection signature test statistics and the novel combining strategy in 50 kb intervals. The simulated scenario was s=0.02, N=50, d=0.1 kb and P=1.0 (for |iHS|, P=0.8). The center of the graph corresponds to the position of the SNP under selection. The clustering of the eight test statistics is shown in the left margin.
Figure 5. Comparison of the novel combining strategy DCMS and the alternative combining methods CSS (Randhawa ) and meta-SS (Utsunomiya ) when varying four different parameters: (a) marker interval distance; (b) frequency of the selected allele; (c) sample size; (d) selection coefficient.
A partial list of candidate genes revealed by DCMS analyses
| Chromosome | Gene | Position | P | Population (Y/W) | Function |
|---|---|---|---|---|---|
| 1 | | 145325; 145275 | 0.011; 0.037 | Y; W | Pathway: Melanogenesis, organism-specific biosystem, Pigmentation ( |
| 1 | | 187125 | 0.017 | Y | Melanogenesis ( |
| 3 | | 67325; 65875 | <0.001; 0.047 | Y; W | Pigmentation ( |
| 5 | OTX2 | 51375 | 0.003 | W | The developing retinal pigment epithelium ( |
| 11 | MC1R | 18275; 18275 | <0.001; 0.044 | Y; W | The regulation of coat, skin and feather color ( |
| 12 | MITF | 15375; 15425 | 0.008; 0.021 | Y; W | Regulating the differentiation of pigment cells ( |
| 20 | ASIP | 1425 | <0.001 | Y | Pigmentation in skin (yellow) and feather ( |
| 20 | EDN3 | 11175 | 0.003 | Y | Promoting melanoblast proliferation ( |
| 24 | BCO2 | 6125; 6125 | <0.001; 0.003 | Y; W | The deposition of yellow carotenoids in the skin ( |
All related genes are close to potential selection regions. See Supplementary Table S4.
The position column gives the midpoint of each selection region. Gene names in bold italics denote genes without detailed annotation in chicken.