| Literature DB >> 27226166 |
Jeffrey D Wall, Laurie S Stevison.
Abstract
With recent advances in DNA sequencing technologies, it has become increasingly easy to use whole-genome sequencing of unrelated individuals to assay patterns of linkage disequilibrium (LD) across the genome. One type of analysis that is commonly performed is to estimate local recombination rates and identify recombination hotspots from patterns of LD. One method for detecting recombination hotspots, LDhot, has been used in a handful of species to further our understanding of the basic biology of recombination. For the most part, the effectiveness of this method (e.g., power and false positive rate) is unknown. In this study, we run extensive simulations to compare the effectiveness of three different implementations of LDhot. We find large differences in the power and false positive rates of these different approaches, as well as a strong sensitivity to the window size used (with smaller window sizes leading to more accurate estimation of hotspot locations). We also compare our LDhot simulation results with comparable simulation results obtained from a Bayesian maximum-likelihood approach for identifying hotspots. Surprisingly, we find that the latter, computationally intensive approach has substantially lower power over the parameter values considered in our simulations.
Keywords: composite likelihood; linkage disequilibrium; recombination hotspots
Year: 2016 PMID: 27226166 PMCID: PMC4978882 DOI: 10.1534/g3.116.029587
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Key differences between three implementations of LDhot
| Method | mlehot | | |
|---|---|---|---|
| Window size | 200 kb | 100 kb | 20 kb |
| | 0.01 | 0.001 | 0.01 |
| Max. size | 5 kb | None | None |
| LDhat | Peak must be ≥5/kb | – | Intersection with 1 kb regions with ρ ≥ 5 |
| Simulations | Region-specific | Region-specific | Lookup table |
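The "Max. size" and "LDhat" rows describe post-filters applied to candidate hotspots. The sketch below only illustrates what such filters could look like; it is not the code of Auton et al. or of the authors. It assumes that hotspot calls are half-open base-pair intervals, that LDhat output has been summarized as one ρ estimate per 1 kb bin (in ρ/kb), and that "Intersection with 1 kb regions with ρ ≥ 5" means trimming a call down to the bins that pass the threshold. `Interval`, `length_ok`, `peak_ok`, and `trim_to_hot_bins` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    """A candidate hotspot as a half-open base-pair interval [start, end)."""
    start: int
    end: int


def length_ok(h: Interval, max_size_bp: Optional[int]) -> bool:
    """'Max. size' row: reject calls longer than max_size_bp (None = no limit)."""
    return max_size_bp is None or (h.end - h.start) <= max_size_bp


def _bins(h: Interval, n_bins: int, bin_size: int) -> range:
    """Indices of the bins overlapping h, clipped to the bins that exist."""
    first = h.start // bin_size
    last = min((h.end - 1) // bin_size, n_bins - 1)
    return range(first, last + 1)


def peak_ok(h: Interval, bin_rates: List[float],
            bin_size: int = 1000, min_rate: float = 5.0) -> bool:
    """Assumed 'peak' filter: at least one overlapping bin has rho/kb >= min_rate."""
    return any(bin_rates[i] >= min_rate for i in _bins(h, len(bin_rates), bin_size))


def trim_to_hot_bins(h: Interval, bin_rates: List[float],
                     bin_size: int = 1000, min_rate: float = 5.0) -> List[Interval]:
    """Assumed 'intersection' filter: keep only the parts of h inside hot bins."""
    pieces: List[Interval] = []
    for i in _bins(h, len(bin_rates), bin_size):
        if bin_rates[i] < min_rate:
            continue
        s, e = max(h.start, i * bin_size), min(h.end, (i + 1) * bin_size)
        if pieces and pieces[-1].end == s:  # extend a run of adjacent hot bins
            pieces[-1] = Interval(pieces[-1].start, e)
        else:
            pieces.append(Interval(s, e))
    return pieces


# Made-up rho/kb estimates for five consecutive 1 kb bins.
rates = [0.5, 6.0, 7.0, 0.4, 5.0]
call = Interval(500, 4500)
print(length_ok(call, 5000), peak_ok(call, rates))  # True True
print(trim_to_hot_bins(call, rates))
# [Interval(start=1000, end=3000), Interval(start=4000, end=4500)]
```

Under this reading, the "Peak" variant keeps or discards a call as a whole, while the "Intersection" variant can shrink or split it, which changes how much called sequence is later scored against the true hotspot.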
Figure 1 (A) Power to detect a hotspot as a function of the window size (x-axis) and the strength of the hotspot using the protocol of Auton et al. Background ρ is 0.5/kb. See text for further details. (B) False positive rate for estimated hotspots, defined as the proportion of estimated hotspot sequence that was not actually a simulated hotspot. (C) Estimated hotspot locations without the length limitation for a 100 kb window and a 50-fold hotspot. Dashed vertical lines show the location of the actual hotspot. Results are similar for other parameter combinations.
Figure 2 (A) Power to detect a hotspot as a function of the window size and strength of the hotspot using the protocol of Auton et al. Results are directly comparable to those of Figure 1A and Figure 3A. (B) False positive rate for estimated hotspots, defined as the proportion of estimated hotspot sequence that was not actually a simulated hotspot.
Figure 3 (A) Power to detect a hotspot as a function of the window size and strength of the hotspot using our new protocol. Results are directly comparable to those of Figure 1A and Figure 2A. (B) False positive rate for estimated hotspots, defined as the proportion of estimated hotspot sequence that was not actually a simulated hotspot.
Figure 4 Power to detect hotspots using our new protocol over (A) different haploid sample sizes, (B) different background recombination rates, and (C) higher mutation and (background) recombination rates.
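The false positive rate in panel B of Figures 1 to 3 is measured in sequence (base pairs), not in numbers of hotspots. A minimal sketch of that proportion, assuming hotspots are half-open base-pair intervals given as `(start, end)` tuples; overlapping calls are merged first so that no base is counted twice. The function names are hypothetical and not taken from the authors' pipeline.

```python
import math
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) in base pairs


def merge(intervals: List[Span]) -> List[Span]:
    """Sort and merge overlapping or touching intervals."""
    out: List[Span] = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def overlap_bp(a: List[Span], b: List[Span]) -> int:
    """Total bases shared by two sorted, merged interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        s, e = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if s < e:
            total += e - s
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def false_positive_rate(estimated: List[Span], true_hotspots: List[Span]) -> float:
    """Fraction of estimated hotspot sequence lying outside simulated hotspots."""
    est, tru = merge(estimated), merge(true_hotspots)
    est_bp = sum(e - s for s, e in est)
    if est_bp == 0:
        return math.nan  # undefined when nothing was called
    return 1.0 - overlap_bp(est, tru) / est_bp
```

For example, a single 2 kb call at `(10000, 12000)` scored against true hotspots `(10500, 11500)` and `(11800, 13000)` shares 1,200 bp with them, giving a false positive rate of 0.4. Because the rate is per base, a call that covers the true hotspot but extends far beyond it is penalized, which is why unrestricted call lengths (Figure 1C) matter.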
Comparison between our implementation of LDhot and Inferrho over 20 different 1 Mb regions
| Power (%) | LDhot | Inferrho |
|---|---|---|
| 10-fold hotspot | 17.5 | 0.9 |
| 20-fold hotspot | 48.8 | 2.8 |
| 50-fold hotspot | 87.5 | 15.0 |
| 100-fold hotspot | 95.0 | 26.5 |
| False positive rate (%) | 1.59 | <0.2 |
| False discovery rate (%) | 55.6 | ≥4.1 |
Due to computational limitations, we could not calculate exact values of the false positive and false discovery rates for Inferrho.
Comparison between the results published in Wang and Rannala (2009, Table 1) and our computations of Inferrho using the same data sets
| | IRv1 | INFERrho | |
|---|---|---|---|
| Power (%) | – | 19.0 | 28.3 |
| False positive (%) | 4 | 0 | 0 |
| Overlap (%) | 74 | 33 | 39 |
False positive refers to the Wang and Rannala (2009) definition – called hotspots that do not overlap at all with true hotspots.
Cannot be determined from the original study.
This refers to the Wang and Rannala (2009) definition – called hotspots that do not overlap with all true hotspots.
This is Wang and Rannala’s (2009) definition of power – the proportion of true hotspots that overlap by at least 1 bp with a called hotspot.
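Unlike the per-base false positive rate used in Figures 1 to 3, the Wang and Rannala (2009) criteria described in the notes above count hotspots. A sketch under stated assumptions: intervals are half-open `(start, end)` base-pair tuples, "overlap by at least 1 bp" means the two intervals share at least one base, and the false positive percentage is taken as a fraction of all called hotspots (that denominator is an assumption; the notes do not state it). `wr_power` and `wr_false_positive` are hypothetical names.

```python
import math
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end) in base pairs


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open intervals share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def wr_power(true_hotspots: List[Span], called: List[Span]) -> float:
    """Proportion of true hotspots hit by at least one called hotspot."""
    if not true_hotspots:
        return math.nan
    hits = sum(1 for t in true_hotspots if any(overlaps(t, c) for c in called))
    return hits / len(true_hotspots)


def wr_false_positive(true_hotspots: List[Span], called: List[Span]) -> float:
    """Assumed: proportion of called hotspots that overlap no true hotspot at all."""
    if not called:
        return math.nan
    misses = sum(1 for c in called if not any(overlaps(c, t) for t in true_hotspots))
    return misses / len(called)
```

With true hotspots at `(1000, 3000)`, `(10000, 12000)`, and `(20000, 22000)` and calls at `(2999, 4000)` and `(15000, 16000)`, the first call touches the first true hotspot by a single base, so power is 1/3 and one of the two calls counts as a false positive. A 1 bp overlap is enough to count as a detection under these definitions, so they are more lenient than a per-base measure.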