Literature DB >> 26151449

Genome Scan for Selection in Structured Layer Chicken Populations Exploiting Linkage Disequilibrium Information.

Mahmood Gholami¹, Christian Reimer¹, Malena Erbe¹, Rudolf Preisinger², Annett Weigend³, Steffen Weigend³, Bertrand Servin⁴, Henner Simianer¹.

Abstract

An increasing interest is being placed in the detection of genes, or genomic regions, that have been targeted by selection because identifying signatures of selection can lead to a better understanding of genotype-phenotype relationships. A common strategy for the detection of selection signatures is to compare samples from distinct populations and to search for genomic regions with outstanding genetic differentiation. The aim of this study was to detect selective signatures in layer chicken populations using a recently proposed approach, hapFLK, which exploits linkage disequilibrium information while accounting appropriately for the hierarchical structure of populations. We performed the analysis on 70 individuals from three commercial layer breeds (White Leghorn, White Rock and Rhode Island Red), genotyped for approximately 1 million SNPs. We found a total of 41 and 107 regions with outstanding differentiation or similarity using hapFLK and its single SNP counterpart FLK respectively. Annotation of selection signature regions revealed various genes and QTL corresponding to productions traits, for which layer breeds were selected. A number of the detected genes were associated with growth and carcass traits, including IGF-1R, AGRP and STAT5B. We also annotated an interesting gene associated with the dark brown feather color mutational phenotype in chickens (SOX10). We compared FST, FLK and hapFLK and demonstrated that exploiting linkage disequilibrium information and accounting for hierarchical population structure decreased the false detection rate.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2015 PMID： 26151449 PMCID： PMC4494984 DOI： 10.1371/journal.pone.0130497

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

A local reduction of genetic variation up- and downstream of the beneficial mutation is caused by the rapid fixation of a beneficial mutation, leaving special patterns of DNA behind, which is commonly referred to as a “selective sweep” [1]. The study of such signatures of selection can provide valuable insights into genomic regions harboring interesting genes that are or have been under selective pressure and hence can help to understand the mechanisms that led to the differentiation of various genotypes and their influenced phenotypes during selection. Recently, an increasing interest has been placed in the detection of genes, or genomic regions, that are targeted by selection [2], permitted by the availability of large-scale SNP datasets that allow to scan the genome for positions that may have been targets of recent selection [3]. Many different methods are available for detecting selective sweeps from DNA sequence data. Qanbari et al. (2014) [4] classified these methods in two main groups: intra-population statistics (e.g. Kim and Nielsen (2004) [5] and Sabeti et al. (2002) [3]) and inter-populations statistics (e.g. Lewontin and Krakauer (1973) [6] and Beaumont and Balding (2004) [7]). Innan and Kim (2008) [8] and Yi et al. (2010) [9] showed that between recently diverged populations, inter-populations statistics have more statistical power for the detection of selection signatures. These methods are particularly suited for studying species that are structured in well-defined populations, which is the case in many domesticated species. Inter-populations statistics can be divided into two groups based on single site or haplotype differentiation analyses [4]. The most widely used single site differentiation statistic is Wright’s fixation index, FST [10]. A major concern with Wright’s FST is that it implicitly assumes that populations have the same effective size (N ) and to be derived independently from an ancestral population. When this is not true FST will produce false positive signals, similar to the well-known effects of cryptic structure in genome-wide association studies [11]. Bonhomme et al. (2010) [12] proposed a new statistic, termed FLK, that deals with N variation and historical branching of populations by incorporating a population kinship matrix into the Lewontin and Krakauer (LK) [6] statistic and showed that FLK is indeed more powerful than FST for a given false positive rate. Another group of methods builds on the fact that haplotype diversity and linkage disequilibrium (LD) patterns contain useful information for the detection of selection signatures [13] and therefore, usage of haplotype or LD based differentiation analyses has its own advantages. Browning and Weir (2010) [14] showed that SNP ascertainment bias has less impact on haplotype based differentiation analyses compared to single site differentiation. A major challenge regarding the haplotype differentiation scans is that it does not account for the possibility of hierarchical structure between populations. Therefore Fariello et al. (2013) [15] proposed the hapFLK statistic which is a haplotype based extension of the FLK statistic that accounts for both hierarchical structure and haplotype information. They showed that using haplotype information to detect selection in FST-like approaches greatly increases the detection power. Specifically, they demonstrated that the hapFLK statistic has more power in detecting soft sweeps, incomplete sweeps and sweeps occurring in several populations. The chicken is an excellent model for studying the signatures of selection under artificial breeding conditions due to growing genomic resources, the relatively rapid reproduction time and the existence of several inbred lines together with strong agricultural interest [16]. Several studies have investigated selection signatures in chicken either using sequence data or genotype data from low to medium density SNP chips. Rubin et al. (2010) [17] studied the signatures of domestication and selective sweeps using the “Pooled Heterozygosity” (HP) statistic in various commercial broiler and layer lines. Johansson et al. (2010) [18] explored the genomes of two lines of chickens subjected to 50 generations of divergent selection using a 60k SNP assay. Qanbari et al. (2012) [19] applied a modified sliding window, called “creeping window”, of HP measures in pooled sequence data in laying chickens. In an earlier work we [20] studied the signatures of selection by FST in seven commercial breeds using approximately one million SNPs which, however, ignored the hierarchical structure of the populations analyzed. Recent divergence of certain commercial breeds [21] and the introduction of strong selection for production traits (in the 20th century) [22] fosters the interest in detecting selective sweeps in chicken using statistical methods that account for the strong hierarchical structure between these populations. Therefore, this dataset offers an interesting opportunity to evaluate methods that account for population structure in a setting characterized by a strong past selection pressure, high genetic drift and clear population structure, which has never been done before. In this study, FLK [12] and hapFLK [15] statistics were applied on the same data as in our previous study on selection signatures in commercial chicken [20], allowing a comparison between FST, FLK and hapFLK. In contrast to our previous work, the approaches used in the current study have the potential to identify genomic regions which have been selected more recently (e.g. soft sweeps) and are associated with specific layer traits.

Materials and Methods

Animals, Data collection and filtering

Two sets of samples—commercial egg layers and wild chicken (coded respectively LAY and ANC)—were used in this study. The commercial individuals from Lohmann Tierzucht GmbH originated from three different breeds. One commercial white egg layer breed based on White Leghorn (WL), with three separate lines, and the other two brown egg layer breeds based on White Rock (WR) and Rhode Island Red (RIR), respectively, each with two separate lines per breed. In each of these seven lines, ten individuals were sampled and genotyped. The wild chickens, comprising Red Jungle fowl (Cochin-Chinese) (G. g. gallus) and Red Jungle fowl (Burmese) (G. g. spadiceus) were sampled within the AVIANDIV project. A more detailed list of breeds is presented in Table 1. The ANC group consisted of two subspecies of Gallus gallus that are believed to stem in straight line from wild ancestors of domestic chickens. Data is publicly available (S1 Dataset).

Table 1

Name, abbreviation, number of individuals and the egg color for each breed used in this study.

Breed	Abbreviation	# of lines	# of individuals	Egg color
White Leghorn	WL(1/2/3)	3	30(0♂,30♀)	White
Rhode Island Red	RIR(1/2)	2	20(2♂,18♀)	Brown
White Rock	WR(1/2)	2	20(2♂,18♀)	Brown
Gallus gallus gallus	ANC/Ggal	1	2(0♂,2♀)	Brown
Gallus gallus spadiceus	ANC/Gspa	1	2(0♂,2♀)	Brown

Genotyping was done with three Affymetrix 600K SNP arrays. Overlapping SNPs between the three 600K SNP arrays were removed by the data provider and a total of 1,139,073 SNPs remained. For this study we included only the SNPs that were located on autosomal chromosomes (1–28), SNPs that were located on sex chromosomes and linkage groups were removed (62,337 were removed). SNPs with at least one missing value and SNPs with minor allele frequencies lower than 5% (172,344 SNPs) were removed in order to avoid dealing with genotyping errors; this approach was suggested by the data provider. A total of 904,392 SNPs remained after filtering. The entire filtering process was done using the PLINK software (http://pngu.mgh.harvard.edu/purcell/plink/) [23].

Population structure analysis

Using Reynolds’ genetic distances [24], a phylogenetic tree was constructed to retrieve the structure of the studied samples.

FLK and hapFLK calculation

To identify regions under selection, FLK and hapFLK were calculated in all LAY breeds, using ANC individuals for rooting the population tree. FLK calculates variation of the inbreeding coefficient and incorporate hierarchical structure by using a population kinship matrix (for details see Bonhomme et al. (2010) [12]). The same matrix is used in hapFLK, but the statistic is computed from haplotype frequencies rather than SNP allele frequencies. Here, the haplotypes considered are in fact latent states extracted from the multipoint linkage disequilibrium model of Scheet and Stephens [25] (for details read Fariello et al. (2013) [15]). To determine the number of underlying latent states we used the fastPHASE [25] cross validation procedure, which indicated that 5 or 10 haplotype clusters were adequate. We found that using either 5 or 10 haplotype clusters gave nearly identical results and therefore present those obtained assuming 5 haplotype clusters.

Assigning signatures of selection to specific population groups

When using differentiation-based approaches, it is sometimes difficult to pinpoint the population(s) that have been the target of selection. Fariello et al. (2013) [15] proposed to decompose the hapFLK statistic by projecting it on principal components (PC) of the population kinship matrix to identify which part of the population tree exhibits an outlying differentiation in a particular genomic region. Here, we employed this approach to look for selection signatures that affected either (i) the whole population set (LAY), (ii) white layer populations or (iii) brown layer populations. For (i) we used the hapFLK statistic, for (ii) and (iii) we considered the projection of the statistic on the subtree corresponding to white (resp. brown) layer populations. In each case we considered that a position lying in the top and bottom 0.05% of the empirical distribution was potentially within a selection signature. For each selection signature, we then re-estimated the branch lengths of the population tree, using local allele or haplotype clusters frequencies (see Fariello et al. (2013) [15] for details) and identified the branch lengths that seem significantly larger than the branches of whole genome tree to pinpoint selected populations.

Fitting of gamma distribution

As hapFLK statistic does not follow a known distribution under neutrality, the null distribution has to be estimated from the data. As hapFLK is similar to FLK, a good approximation to the asymptotic distribution of hapFLK comes from the gamma distribution family. To estimate p-values of selection signatures, we fitted a gamma distribution to the hapFLK observed distribution, using the minimum distance estimation method [26,27] which is robust to outliers, which helps to reduce the influence of selection signatures in estimating the null distribution. This was done for false detection rate (FDR) estimation.

Annotation

As explained above, regions with extreme FLK and hapFLK values were considered as candidates for selective sweeps. For all the three groups (all layers, white layers and brown layers) the extreme values (the upper and lower 0.05%) that were within 500 kb of each other were grouped together. For all joined groups gene annotations, QTL annotations and pathway annotations were completed. Gene annotations were done with the biomaRt R package [28] based on the Ensembl database [29] of Gallus_gallus-4.0 assembly. Animal QTL database [30] was used for QTL annotation, KEGG database for pathway annotation [31] and Gene Ontology (GO) database for GO annotation [32]. Gene enrichment analysis was done with Fisher’s exact test [33] for all annotated genes in all groups (all layers, white layers and brown layers) separately. Pathways and gene ontologies with p ≤ 0.05 were identified as being under selection.

Results

Population structure

A phylogenetic tree based on Reynolds’ genetic distances with 100,000 randomly selected SNPs (100 replications) was constructed and is shown in Fig 1. As Fig 1 shows, commercial white egg-layer breeds were separated from brown egg-layers and grouped in one sub-tree. In the sub-trees, the two white-layer lines WL2 and WL3 as well as the two brown-layer lines WR1 and WR2 form a separate sub-cluster, respectively. The population specific fixation indices of all populations, also shown in Fig 1, are extremely high (ranging from 0.45 to 0.75), reflecting the very strong effect of genetic drift in these populations, with the three White Leghorn populations notably more inbred than the Brown layer populations.

Fig 1

Reynolds’ genetic distances population tree of seven commercial breeds and histogram of fixation index for each line.

FLK

Based on the FLK values distribution, a total of 107 regions (63 in all layers, 27 in white layers and 17 in brown layers) were detected as signatures of selection (S1 Table). All these regions were in the upper 0.05% of the distribution which is representative of regions with fixed difference between populations. The genome-wide distribution of FLK values obtained from each group—all, white and brown—are depicted in Fig 2A, 2B and 2C, respectively. Annotation was carried out for all regions with extreme FLK values, i.e. potential selection signatures. The lists of genes in selective sweeps detected with FLK are available in the supplementary tables (S2, S3 and S4 Tables). The annotation list is enriched with genes of biological interest involved in various pathways such as ATP metabolic process (P = 0.023), metal ion binding (P = 0.001), nucleic acid binding (P = 0.008) and metabolic pathways (P<0.001), all of which can be related to production traits under selection in layers. The lists of pathways and gene ontologies under selection are available in supplementary tables (S5, S6 and S7 Tables). We identified three candidate genes which can be related to the breeding goals of chickens. H3F3C and AGRP which are associated with body growth and body weight [34,35], and IL19 which is associated to the immune system in chicken [36]. More details about gene locations and study groups are available in Table 2. We also detected several QTL overlapping selection signatures for traits such as breast muscle weight, abdominal fat weight and liver weight, which all are related to the breeding goals of chickens (Table 3).

Fig 2

Manhattan plot of FLK analysis over the entire genome.

Blue line indicates the upper 0.05% of FLK distribution, for (A) within all breeds, (B) within white breeds, and, (C) within brown breeds.

Table 2

Genes associated with productive traits in FLK and hapFLK analysis in all three studies. ‘All’, ‘White’, and ‘Brown’ stand for inclusion of all the commercial breeds, analysis within white layers and analysis within brown layers, respectively. ‘s’ stands for similarity and ‘d’ for difference.

Chr	Gene	Function	Test	Group
1	SOX10	Causal mutation underlying the dark brown mutational phenotype in chickens.	hapFLK	All(d) and Brown(d)
3	H3F3C	Potential role in early feed stress responses and adaptation to feed intake stress.	FLK	All(d), White(d)
10	IGF-1R	Associated with chicken early growth and carcass traits.	hapFLK	Brown(s)
11	AGRP	Associated with chest width, body weight, and high slaughter rate.	FLK	All(d), White(d)
20	BPIFB8	A molecular actor of the avian egg natural defense.	hapFLK	Brown(s)
26	IL19	Associated with immunoprotection.	FLK	All(d), White(d)
27	STAT5B	A potential genetic marker for growth and reproduction traits.	hapFLK	Brown(s)

Table 3

QTL associated with productive traits in FLK analysis in all three studies. ‘All’ stands for inclusion of all commercial breeds, and ‘White’ for analysis within white layers.

Chr	QTL	Group
1	Fear-tonic immobility duration	All, White
4	Disease-related traits	All, White
5	Disease-related traits	All, White
6	Liver weight	All, White
11	Breast muscle weight	All, White
26	Abdominal fat weight	All, White
26	Abdominal fat percentage	All, White

Manhattan plot of FLK analysis over the entire genome.

Blue line indicates the upper 0.05% of FLK distribution, for (A) within all breeds, (B) within white breeds, and, (C) within brown breeds.

hapFLK

Based on the hapFLK values distribution, a total of 41 regions (17 in all layers, 12 in white layers and 12 in brown layers) were detected as selection signatures (S8 Table). All these regions were in either the upper or the lower 0.05% of the distribution, which represent regions with a fixed difference or fixed similarity between populations, respectively. The genome-wide distribution of hapFLK values with 5 haplotype clusters obtained for each group—all, white and brown—are depicted in Fig 3A, 3B and 3C, respectively. Annotation was carried out for all regions with extreme hapFLK values, i.e. potential selective sweeps. The lists of genes for selective sweeps detected with hapFLK are available in the supplementary tables (S9, S10 and S11 Tables). The annotation list is enriched with genes of biological interest involved in various pathways such as nerve development (p = 0.027), growth factor receptor (p = 0.008), RNA metabolic process (p = 0.042) and skeletal muscle cell differentiation (p = 0.032), all of which could be related to production traits indirectly. The lists of pathways and gene ontologies under which were detected under selection in this study are available in the supplementary tables (S12, S13 and S14 Tables). We identified four genes that were related to the breeding goals of chickens with the hapFLK method. IGF-1R and STAT5B are associated with growth and carcass traits [37,38]. BPIFB8 and SOX10, which are associated with egg natural defense [39] and dark brown mutational phenotype [40] respectively (more details is available in Table 2). Several QTL, which were related to the breeding goals of egg-layer chickens were detected as well, for traits such as drumstick and thigh morphology, carcass weight and shank length. A complete list of all QTL with more details is available in Table 4.

Fig 3

Manhattan plot of hapFLK analysis over the entire genome with 5 clusters.

Blue (red) line indicates the upper (lower) 0.05% of hapFLK distribution, for, (A) within all breeds, (B) within white breeds, and, (C) Within brown breeds.

Table 4

QTL associated with productive traits in hapFLK analysis in all three studies. ‘All’, ‘White’, and ‘Brown’, stand for inclusion of all the commercial breeds, analysis within white layers and analysis within brown layers, respectively. ‘s’ stands for similarity and ‘d’ for difference.

Chr	QTL	Group
1	Abdominal fat percentage	All(d), Brown(d) and White(d)
1	Heart weight	All(s) and Brown(s)
2	Carcass weight	All(s) and Brown(s)
2	Drumstick and thigh weight	All(s) and Brown(s)
2	Drumstick and thigh muscle weight	All(s) and Brown(s)
2	Shank length	All(s) and Brown(s)
2	Shank circumference	All(s) and Brown(s)
2	Heart weight	White(s)
9	Liver percentage	White(s)

Manhattan plot of hapFLK analysis over the entire genome with 5 clusters.

Blue (red) line indicates the upper (lower) 0.05% of hapFLK distribution, for, (A) within all breeds, (B) within white breeds, and, (C) Within brown breeds.

Discussion

Structure analysis and P0 comparison

Our population structure analyses are largely in agreement with the expected historical origin of the breeds [21] and as expected, they are also similar to the previous study using the same data [20]. One of the issues in the FLK and hapFLK analysis in this study is using only 4 wild chickens for development of the population's kinship matrix. We assessed whether using a different set of outgroup individuals could possibly change our findings by verifying the influence of the outgroup set on the estimation of the ancestral allele frequency (p0). p0 can be seen as a nuisance parameter in the model that has to be estimated from the data through the kinship matrix. We studied the possible impact of the number of wild chickens used by comparing p0 when being calculated from 4 wild chickens (our ANC group) vs. 40 wild chickens (consisted of 20 Gallus gallus gallus and 20 Gallus gallus spadiceus which were genotyped with Axiom Genome-Wide Chicken Genotyping Array of Affymetrix and were available only for this comparison). p0 was calculated for each group (ANC group and 40 wild chickens) for every SNP on the 600K SNP chip. Pairwise comparison of each group’s p0 values along the genome gave an average correlation of 0.95. This high correlation suggests that there is no vital difference in development of population's kinship matrix with 4 or 40 wild chickens. Therefore the kinship matrix calculated based on four wild chickens, which had been genotyped for the complete set of close to one million SNPs was considered sufficient. A histogram of the differences in p0 estimated with the two outgroup sets is shown in S1 Fig, showing that more than 90% of the differences are less than ± 0.02. Although the outlier approach is an effective and widely used method for identification of genes under selection lacking known phenotypes [41], an outlier signal is not necessarily synonymous with regions being under selection [42]. Therefore we fitted a gamma distribution to the hapFLK in order to estimate the false discovery rate (FDR). This approach suggested an FDR of 10–20% in our analysis.

FST, FLK and hapFLK

An overlap exists between the regions that have been determined as regions under selection in a previous study with FST [20] and the current analysis of FLK and hapFLK as shown by the Venn diagram for the number of SNPs identified as being under selection with either of the methods shown in Fig 4. Using the same threshold as in our previous work [20] (upper and lower 1%) resulted in detection of a lower number of selection signatures with FLK (73.2%) and much lower with hapFLK (13.4%) compared to the FST based results reported in our earlier study on the same data (list of regions detected with FST method is available in S15 Table) [20]. A finding suggested that ten-thousands of polymorphisms respond to selection, which was the case in our earlier work [20], does not appear realistic [43]. Many of the outliers detected with FST must be considered as false positives, which might be partly due to the fact that the method assumes populations to have the same effective size and to have emerged independently from the same ancestral population. Therefore we used a much stricter threshold (upper and lower 0.05%) in the study presented here than in our previous work (upper and lower 1%) [20]. Accordingly, the use of a stricter threshold and the application of methods that account for different effective population sizes and hierarchical phylogenies (FLK and hapFLK), resulted in the detection of much lower number of selection signatures. There is also an overlap between regions detected by hapFLK and FLK (44.2%) which is due to the use of same statistic in both methods. The difference between regions detected by FLK and hapFLK can be due to the fact that haplotype and SNPs harbor different information.

Fig 4

Venn diagram of overlapping SNPs identified as under selection, with FST, FLK and hapFLK methods using same threshold (upper and lower 1%).

As an example, in Fig 5A we demonstrate allele frequencies at SNP positions around the TGFB2 gene (Chr3: 18,690,003–18,753,123) which was detected as a gene under selection by FST [20] due to a reduction of diversity within the WL breed. However, since this reduction exists only within the WL breed this can also be explained by drift alone. By taking the population tree into consideration, FLK does not detect any signals in this region. Another example is the region around the H3F3C gene (Chr3: 16,483,162–16,487,393) which was detected to be under selection by FLK. Allele frequencies around this region shows that a huge diversity exists between some breeds (Fig 5B). We detect an outlier with FLK in particular because WR1 and WR2 show very different patterns of allele frequencies in this region although they are closely related in the population tree. However FST was not able to detect any signal here, since FST treats each population as an independent evidence for sweep detection and does not consider the huge difference between WL, RIR and WR breeds. There are as well cases in which all three methods (FST, FLK and hapFLK) were able to detect the region under selection. An example is a 60Kb region on chromosome 10 (6,799,776–6,738,610). Fig 5C shows allele frequencies around this region.

Fig 5

Allele frequency in different breeds for 2 Mbp around the intended region.

Red box indicates, for (A) TGFB2 gene (Chr3: 18,690,003–18,753,123), (B) H3F3C gene (Chr3: 16,483,162–16,487,393) and (C) 60Kb region on chromosome 10 (6,799,776–6,738,610).

Allele frequency in different breeds for 2 Mbp around the intended region.

Red box indicates, for (A) TGFB2 gene (Chr3: 18,690,003–18,753,123), (B) H3F3C gene (Chr3: 16,483,162–16,487,393) and (C) 60Kb region on chromosome 10 (6,799,776–6,738,610). A complete hard sweep is expected to be large [44], while a soft sweep is more likely to have smaller size [45]. In the current study we detected smaller sweeps (bp length) compared to our FST study, which may be due to the fact that hapFLK has greater power in detection of soft sweeps. Nevertheless we should as well take into account the false positive rate of our FST study. A boxplot of sweep size with FST and hapFLK method is shown in S2 Fig. A vast majority of differentiated polymorphisms in our data set could be caused by genetic drift. Genetic drift is high when the (effective) population size is small [46] which is the case in commercial laying breeds [47]. Since regions differentiated by selection and regions differentiated by drift alone may overlap, there is a lack of power in our analysis. This could be solved by using a larger number of populations to minimize the risk that a systematic pattern of differentiation in many breeds (say, several white layers vs. several brown layers) is created at random by drift alone. Other obstacles in this study are the use of only 10 animals per sample and filtering for minor allele frequencies; these two issues might have an effect on the estimation of allele frequencies, comparison of rarer alleles and identification of all haplotypes. In a recent simulation study [48] it was shown that the power of most selection signature tests is more dependent on marker density than on sample size, and that with a marker density similar to the one used in the present study a high power and positional resolution was achieved with 15 sampled individuals per population. We detected several genes related to the breeding goals of egg-layer chickens, such as low body weight, high reproduction performance and good feed conversion [49], both with FLK and hapFLK. For instance, with the FLK method we detected several QTL associated to disease-related traits and breast muscle weight, as well as AGRP (agouti related protein homolog), which is associated with breast muscle water loss rate, chest width, body weight, slaughter rate and semi-evisceration weight [35]. In the hapFLK analysis, we also detected several genes, which are associated with growth and carcass traits, such as IGF-1R and STAT5B. STAT5B (signal transducer and activator of transcription 5B) is associated with growth and reproduction traits [38]. IGF-1R (insulin-like growth factor 1) is similar to IGF2 [50], which was detected in our previous work [20]. IGF-1R is associated with chicken early growth and carcass traits [37]. We additionally detected several QTL associated to carcass weight, drumstick weight and shank length. QTL associated with meat production, as well as both IGF-1R and STAT5B, were located in regions that were similar between brown layers. Supporting results were found in our previous study [20], where we detected genes associated to meat quality and production in brown layers, which reflects the fact that brown egg-layers were originally a dual-purpose breed [21]. Bonhomme et al. (2010) [12] and Fariello et al. (2013) [15] showed with simulation that using FLK or hapFLK method to detect selection signatures in comparison to other FST-like approaches greatly increases the detection power. Specifically, hapFLK statistic has more power in detecting sweeps occurring in several populations. Due to this, we were able to detect SOX10 with hapFLK which was not detected by FST or FLK method. SOX10 is a gene on chromosome one underlying the dark brown mutational phenotype in chickens plumage [40]. SOX10 was detected in regions that were different between brown layers. Re-estimation of the local tree using haplotype clusters frequencies (Fig 6A) and haplotype frequencies (Fig 6B) for the region surrounding SOX10 revealed selection in the RIR breeds in this region. RIR is the only breed with dark brown feather in our data set [51], which is in great agreement with our selection signature detection.

Fig 6

(A) Re-estimation of local tree using haplotype clusters frequencies for surrounding region of SOX10 gene.

(B) Haplotype frequencies for the surrounding region of SOX10 gene (50.8 Mbp).

(A) Re-estimation of local tree using haplotype clusters frequencies for surrounding region of SOX10 gene.

(B) Haplotype frequencies for the surrounding region of SOX10 gene (50.8 Mbp).

Conclusions

In conclusion we were able to identify several putative selection signature regions with genes corresponding to the traits associated to growth and reproduction traits. Some of these annotated genes were similar (or had similar functions) to our findings in our previous work [20]. However, several of the detected regions were not associated with any genes related to production traits, which could be due to insufficient knowledge about these regions [52]. We did not identify selection signatures that were reported in other studies on chicken [17,53] which could be due to lack of diversity in our data compared to their data set. By detection of SOX10 as a gene under selection, we demonstrated that the use of haplotype frequencies and consideration of hierarchical structure can improve the power of detection of soft sweep in our data set.

Histogram of p0 difference between the calculation with 4 wild chickens (ANC group) and 40 wild chickens.

(TIFF) Click here for additional data file.

Boxplot of sweep size with FST and hapFLK method.

(TIFF) Click here for additional data file.

Regions detected as putative selective sweeps detected with FLK with upper 0.05% threshold.

All, White and Brown stands for studies with all layers, within white layers and within brown layers, respectively. (PDF) Click here for additional data file.

List of genes for selective sweeps detected with FLK with 0.05% threshold in all layers.