Adam H Freedman1, Rena M Schweizer1, Diego Ortega-Del Vecchyo1, Eunjung Han1, Brian W Davis2, Ilan Gronau3, Pedro M Silva4, Marco Galaverni5, Zhenxin Fan6, Peter Marx7, Belen Lorente-Galdos8, Oscar Ramirez8, Farhad Hormozdiari9, Can Alkan10, Carles Vilà11, Kevin Squire12, Eli Geffen13, Josip Kusak14, Adam R Boyko15, Heidi G Parker8, Clarence Lee16, Vasisht Tadigotla16, Adam Siepel17, Carlos D Bustamante18, Timothy T Harkins16, Stanley F Nelson12, Tomas Marques-Bonet8,19, Elaine A Ostrander2, Robert K Wayne1, John Novembre1.
Abstract
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using the inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, a finding supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlap between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
Year: 2016 PMID: 26943675 PMCID: PMC4778760 DOI: 10.1371/journal.pgen.1005851
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. Distribution of CMS1-FDR statistic calculated in 100kb sliding windows, with a 10kb step.
Fig 2. Distributions of observed values for selection scan statistics and those computed from neutral coalescent simulations based on the inferred demographic history. Dashed lines indicate threshold values for FDR ≤ 0.01.
FDR threshold values and window counts for selection scan statistics.
| Statistic | FDR | Windows < FDR | Minimum value ≤ FDR threshold | P at FDR |
|---|---|---|---|---|
| Δπ | 0.05 | 519 | 3.127 | 1.25 × 10⁻⁴ |
| Δπ | 0.01 | 353 | 3.332 | 1.0 × 10⁻⁵ |
| FST | 0.05 | 1495 | 0.481 | 3.8 × 10⁻⁴ |
| FST | 0.01 | 827 | 0.544 | 4.0 × 10⁻⁵ |
| Δ Tajima’s D | 0.05 | 2329 | 1.466 | 6.0 × 10⁻⁴ |
| Δ Tajima’s D | 0.01 | 982 | 1.763 | 5.0 × 10⁻⁵ |
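The relationship between the threshold value, the neutral tail probability (P), and the number of windows passing the threshold can be illustrated with a minimal sketch. It assumes one common simulation-based FDR estimate: the expected number of neutral windows at or above a threshold divided by the number of observed windows at or above it. This is not necessarily the exact procedure used in the study; the function name `empirical_fdr` and the toy inputs are hypothetical.

```python
import math


def empirical_fdr(observed, simulated, threshold):
    """Estimate the FDR for calling windows whose statistic is >= threshold.

    observed  : statistic values for the real genomic windows
    simulated : statistic values from neutral simulations (assumed input)
    """
    n_called = sum(1 for x in observed if x >= threshold)
    if n_called == 0:
        return math.nan  # no windows called, so the FDR is undefined
    # Tail probability of the statistic under the neutral model.
    p_neutral = sum(1 for x in simulated if x >= threshold) / len(simulated)
    # Expected false positives among all observed windows.
    expected_false = p_neutral * len(observed)
    return min(1.0, expected_false / n_called)


# Toy data: 100 observed windows, 10 of them above the threshold of 4,
# and 1 in 1000 neutral simulations above the same threshold.
observed = [0.0] * 90 + [5.0] * 10
simulated = [0.0] * 999 + [5.0]
fdr = empirical_fdr(observed, simulated, threshold=4.0)
```

Under this definition, a stricter FDR requires a larger threshold value, a smaller neutral tail probability, and fewer passing windows, which is the pattern seen in each pair of rows in the table.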
Fig 3. Z-transformed selection scan statistics, CMS1-FDR, and gene annotations within the (A) top ranked, (B) 3rd ranked, (C) 4th ranked, and (D) 5th ranked candidate regions for positive selection on the dog lineage.
Fig 4. Biplots of summary statistics for 100kb sliding windows classified by their (A, C) CMS1-FDR and (B, D) joint percentile.
CMS1-FDR is classified according to whether it is ≥ the minimum value observed in the top 100 regions for the maximum of CMS1-FDR comprising the region (i.e. “high CMS”), and whether at least one summary statistic has an FDR ≤ 0.01 (i.e. low FDR). Thus, windows can be classified as “low CMS, high FDR”, “high CMS, high FDR”, “low CMS, low FDR”, and “high CMS, low FDR.” The first two categories are consistent with neutral expectations, the third is characterized by very weak evidence for selection, and the last category includes those windows with the strongest evidence for selection. For more details on these categories, see Regions under selection in Results.
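The four-way classification described above can be sketched as a small function. The names `classify_window`, `cms_cutoff`, and `stat_fdrs`, and the numeric values in the usage example, are hypothetical; only the two criteria (CMS1-FDR at or above the top-100 minimum, and at least one summary statistic with FDR ≤ 0.01) come from the caption.

```python
def classify_window(cms, stat_fdrs, cms_cutoff, fdr_cutoff=0.01):
    """Label a window by its CMS1-FDR value and its per-statistic FDRs.

    cms        : CMS1-FDR value of the window
    stat_fdrs  : FDR values of the individual summary statistics
    cms_cutoff : minimum peak CMS1-FDR among the top 100 regions
                 (must be supplied by the caller; not given here)
    """
    # "High CMS" means at or above the top-100 minimum.
    cms_label = "high CMS" if cms >= cms_cutoff else "low CMS"
    # "Low FDR" means at least one summary statistic passes the cutoff.
    fdr_label = "low FDR" if min(stat_fdrs) <= fdr_cutoff else "high FDR"
    return f"{cms_label}, {fdr_label}"


# Hypothetical window: high CMS1-FDR and one statistic with FDR <= 0.01.
label = classify_window(5.0, [0.2, 0.005, 0.5], cms_cutoff=4.0)
```

Only windows labeled "high CMS, low FDR" fall in the category with the strongest evidence for selection.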
Fig 5. Top 25 outlier regions identified with the FDR-based methodology using Δπ, FST, and Δ Tajima’s D and validated with the 12-breed dog diversity panel (see text), with regions ranked according to their respective maximum CMS1-FDR statistic.
Columns within “This study” are based on the sequencing data generated here, while those under “CanMap” are computed from a ~48k SNP data set for a large set of wolves and ancient/basal dog breeds [35]. Heat map colors reflect upper percentiles of the calculated metrics, with warmer colors indicating higher percentiles. Overlaps with previous studies: 1, vonHoldt et al. (2010) [35]; 2, Vaysse et al. (2011) [25]; 3, Boyko et al. (2010) [23]; and 4, Axelsson et al. (2013) [27]; with numbers indicating the joint percentile, FST, FST, and region id, respectively, for each study.
Enrichment categories discovered from the top 100 regions within 25kb of peak in joint statistic signal, excluding regions that fail to show reduced diversity in the 12-breed data set and categories with FDR >10%.
Input and background total number of genes are 50 and 13,528, respectively.
| Category | P | P-corrected (a) | FDR (%) | Background in Term | Fold Enrichment (b) | Genes |
|---|---|---|---|---|---|---|
| Biological process: behavior | 0.0014 | 0.5877 | 2.1 | 469 | 4.6 | DOCK2, GLRA1, LYST, ABAT, NTAN1, MBD2, ASIP, CXCL10 |
| Biological process: locomotory behavior | 0.0030 | 0.6163 | 4.4 | 274 | 5.9 | DOCK2, GLRA1, LYST, ABAT, NTAN1, CXCL10 |
| Biological process: adult behavior | 0.0037 | 0.5421 | 5.4 | 86 | 12.6 | GLRA1, ABAT, NTAN1, ASIP |
(a) Benjamini-corrected P-value.
(b) (number of enriched genes in GO term/50)/(background genes in GO term/13528).
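As a worked check of the fold enrichment formula in the table footnote, the following short sketch recomputes the three tabulated values from the gene counts in the "Genes" column and the "Background in Term" column. The function name `fold_enrichment` is hypothetical; the input total (50) and background total (13,528) are taken from the table caption.

```python
def fold_enrichment(genes_in_term, background_in_term,
                    input_total=50, background_total=13528):
    """(enriched genes in term / input total) / (background in term / background total)."""
    input_fraction = genes_in_term / input_total
    background_fraction = background_in_term / background_total
    return input_fraction / background_fraction


# Gene counts are the number of genes listed per category in the table.
behavior = round(fold_enrichment(8, 469), 1)    # 8 genes, 469 background
locomotory = round(fold_enrichment(6, 274), 1)  # 6 genes, 274 background
adult = round(fold_enrichment(4, 86), 1)        # 4 genes, 86 background
```

Rounded to one decimal place, these reproduce the tabulated fold enrichments of 4.6, 5.9, and 12.6.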