Diego A Hartasánchez, Marina Brasó-Vives, Jose Maria Heredia-Genestar, Marc Pybus, Arcadi Navarro.
Abstract
The study of segmental duplications (SDs) and copy-number variants (CNVs) is of great importance in the fields of genomics and evolution. However, SDs and CNVs are usually excluded from genome-wide scans for natural selection. Because of the high identity between copies, SDs and CNVs that are not included in reference genomes are prone to being collapsed (that is, mistakenly aligned to the same region) when sequence data from single individuals are aligned to the reference. Such collapsed duplications are additionally challenging because concerted evolution between duplications alters their site frequency spectrum and linkage disequilibrium patterns. To investigate the potential effect of collapsed duplications on natural selection scans, we obtained expectations for four summary statistics from simulations of duplications evolving under a range of interlocus gene conversion and crossover rates. We confirm that summary statistics traditionally used to detect the action of natural selection on DNA sequences cannot be applied to SDs and CNVs, since in some cases values for known duplications mimic selective signatures. As a proof of concept of the pervasiveness of collapsed duplications, we analyzed data from the 1000 Genomes Project. We find that, within regions identified as variable in copy number, diversity between individuals with the duplication is consistently higher than between individuals without the duplication. Furthermore, the frequency of single nucleotide variants (SNVs) deviating from Hardy-Weinberg equilibrium is higher in individuals with the duplication, which strongly suggests that the higher diversity is a consequence of collapsed duplications and incorrect evaluation of SNVs within these CNV regions.
Year: 2018 PMID: 30364947 PMCID: PMC6239678 DOI: 10.1093/gbe/evy223
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Average values across 1,000 SeDuS simulations for average pairwise differences (π), Tajima's D, Fay and Wu's H, and Nei's haplotype diversity (dh). Values are shown for single-copy, duplicated, and collapsed regions across a range of crossover (CO) rates (R = 1, 10, 100) and interlocus gene conversion (IGC) rates (C = 0.5, 1, 5).
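As a rough illustration of the four summary statistics named in this caption, the following minimal Python sketch computes them from a small set of aligned haplotypes using the standard textbook formulas. It is not the SeDuS implementation: the function name `summary_stats`, the encoding of haplotypes as strings of `0` (assumed ancestral) and `1` (assumed derived), and the use of the unnormalized form of Fay and Wu's H are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def summary_stats(haps):
    """Return (pi, Tajima's D, Fay and Wu's H, haplotype diversity).

    haps: list of equal-length strings over {"0", "1"}, one per sampled
    chromosome. Assumption for this sketch: "0" is the ancestral allele
    and "1" the derived allele. Requires at least two haplotypes.
    """
    n = len(haps)
    length = len(haps[0])
    # Derived-allele count at each site, keeping segregating sites only.
    counts = [sum(h[j] == "1" for h in haps) for j in range(length)]
    seg = [k for k in counts if 0 < k < n]
    S = len(seg)
    pairs = n * (n - 1) / 2

    # Average pairwise differences and the theta_H estimator.
    pi = sum(k * (n - k) for k in seg) / pairs
    theta_h = sum(k * k for k in seg) / pairs
    fay_wu_h = pi - theta_h  # unnormalized Fay and Wu's H

    # Tajima's D (Tajima 1989); undefined when there are no segregating sites.
    if S == 0:
        tajima_d = float("nan")
    else:
        a1 = sum(1 / i for i in range(1, n))
        a2 = sum(1 / i**2 for i in range(1, n))
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
        e1 = c1 / a1
        e2 = c2 / (a1**2 + a2)
        tajima_d = (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

    # Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).
    freqs = [c / n for c in Counter(haps).values()]
    dh = n / (n - 1) * (1 - sum(f * f for f in freqs))

    return pi, tajima_d, fay_wu_h, dh


# Toy sample: four chromosomes carrying three singleton variants.
pi, d, h, dh = summary_stats(["0000", "1000", "0100", "0010"])
print(pi, d, h, dh)
```

In this toy sample all variants are singletons, so Tajima's D is negative, the kind of excess of rare variants that could also be read as a signature of a selective sweep.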
Fig. 2. Boxplot comparison between simulation results from MSMS (complete sweep, incomplete sweep, balancing selection, and neutrality) and SeDuS (single-copy, duplicated, collapsed) with low (C = 0.5) and high (C = 5) IGC rates and CO rate R = 10. The boxplot whiskers extend to 1.5 times the interquartile range. Distributions of Fay and Wu's H for an incomplete sweep resemble those from duplicates with a low IGC rate. π and dh are the two statistics that clearly differentiate duplicated and collapsed regions from regions under selection.
Fig. 3. Violin plots showing the distribution of differences in average pairwise differences, π, between the CNr and CN+ groups (CN+ minus CNr) for the CNV region (blue) and for the 5′ (yellow) and 3′ (green) regions flanking each CNV, pooling data from all CNVs and from the three populations analyzed. Black points indicate the median of each distribution. Mean increases are 15.5%, 4.7%, and 3.2%, with paired t-test P-values (represented by asterisks) of 0.181, 1.7e-04, and 0.326, for the 5′, CNV, and 3′ regions, respectively.
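The abstract's observation that SNVs in collapsed duplications tend to deviate from Hardy-Weinberg equilibrium can be illustrated with a small sketch. The scenario is hypothetical and is not taken from the paper's data: a site that differs between the two copies of a duplication is assumed to be called heterozygous in every carrier once reads from both copies are collapsed onto one reference locus, while non-carriers are called homozygous. The function name `hwe_chi2` and the genotype counts are assumptions for illustration, and the test shown is the ordinary one-degree-of-freedom chi-square goodness-of-fit test, not necessarily the test used by the authors.

```python
from math import erfc, sqrt


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic site.

    Arguments are the observed counts of the three genotypes.
    Returns (chi2, p_value), with the p-value from a chi-square
    distribution with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0, 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical sample: 60 individuals without the duplication (called
# homozygous) and 40 carriers whose collapsed reads look heterozygous.
chi2, p_value = hwe_chi2(60, 40, 0)
print(chi2, p_value)
```

The excess of apparent heterozygotes and the absence of the alternative homozygote push the statistic well above what Hardy-Weinberg proportions predict, even though no real polymorphism segregates at the site in this scenario.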