Literature DB >> 30105047

Genome-Wide Signatures of Selection Reveal Genes Associated With Performance in American Quarter Horse Subpopulations.

Felipe Avila1, James R Mickelson2, Robert J Schaefer1, Molly E McCue1.   

Abstract

Selective breeding for athletic performance in various disciplines has resulted in population stratification within the American Quarter Horse (QH) breed. The goals of this study were to utilize high density genotype data to: (1) identify genomic regions undergoing positive selection within and among QH subpopulations; (2) investigate haplotype structure within each QH subpopulation; and (3) identify candidate genes within genomic regions of interest (ROI), as well as biological pathways, predicted to play a role in elite performance in each group. For that, 65K SNP genotyping data on 143 elite individuals from 6 QH subpopulations (cutting, halter, racing, reining, western pleasure, and working cow) were imputed to 2M SNPs. Signatures of selection were identified using FST-based (di ) and haplotype-based (hapFLK) analyses, accompanied by identification of local haplotype structure and sharing within subpopulations (hapQTL). Regions undergoing positive selection were identified on all 31 autosomes, and ROI on 2 chromosomes were identified by all 3 methods combined. Genes within each ROI were retrieved and used to identify pathways and genes that might contribute to performance in each subpopulation. These included, among others, candidate genes associated with skeletal muscle development, metabolism, and central nervous system development. This work improves our understanding of equine breed development, and provides breeders with a better understanding of how selective breeding impacts the performance of QH populations.

Entities:  

Keywords:  SNP; ancestral haplotypes; genotyping; horse; imputation; selection signatures

Year:  2018        PMID: 30105047      PMCID: PMC6060370          DOI: 10.3389/fgene.2018.00249

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Introduction

The American Quarter Horse is the most common horse breed in the United States, with approximately 2.5 million registered individuals (2016 AQHA Annual Report) representing one third of the country's equine population. When accounting for the international equine population, the worldwide number of Quarter Horses (QH) reaches close to 3 million registered individuals (2016 AQHA Annual Report). Formally recognized as a breed with the establishment of the American Quarter Horse Association (AQHA) in 1940, the QH is a versatile and rugged breed that has been increasingly bred for performance in various disciplines. This versatility translates into profitability for the US equine industry: in 2015, over 130 million dollars were paid in awards or purses in AQHA-approved shows and races (2015 AQHA Annual Report). Previous studies have shown that genetic diversity within the QH is high when compared to other horse breeds (McCue et al., 2012; Petersen et al., 2013a,b), which can be attributed to factors including genetic influence from other breeds (such as the Thoroughbred), as well as rapid and continuous population expansion (Petersen et al., 2014). Petersen et al. (2014) used both pedigree analysis and SNP genotyping data to determine if genetic differences exist across the QH breed itself by examining elite individuals from 6 performance groups (cutting, halter, racing, reining, western pleasure, and working cow). This work demonstrated that, with the exception of the working cow and cutting horse subpopulations, each performance group constitutes a genetically distinct subpopulation, a result of increased inbreeding over the past 75 years. Moreover, these data demonstrated that the QH subpopulations are genetically distinct due to the use of different breeding programs and different sire lines among different performance groups (Petersen et al., 2014). Selection for a trait, accomplished through selective breeding practices, results in increased frequency of haplotype(s) containing the gene(s) and functional allele(s) conferring that phenotype at a rate greater than that expected under the null model of neutral evolution. This concept, termed a “selective sweep,” was first proposed by Smith and Haigh (1974). Genomic studies aiming at the identification of signatures of selection for phenotypes of interest, particularly performance traits, are being conducted in many livestock and domestic animal species; for example, cattle (Druet et al., 2014; Gutiérrez-Gil et al., 2015; Boitard et al., 2016; Randhawa et al., 2016), pigs (Li et al., 2014; Yang et al., 2014; Ma et al., 2015), sheep (Moradi et al., 2012; Fariello et al., 2013, 2014; Kijas, 2014), cats (Montague et al., 2014; Bertolini et al., 2016), and dogs (Akey et al., 2010; Boyko et al., 2010; Axelsson et al., 2013; Cagan and Blass, 2016). These studies have typically used a single method, such as the fixation index (FST), extended haplotype homozygosity (EHH) or runs of homozygosity (ROH), to identify from several to hundreds of loci potentially under positive selection. Although some studies were able to identify functional alleles that are potentially driving selective sweeps, the use of complementary approaches such as the composite selection signals (CSS) method to aid in corroboration and prioritization of the identified ROI has been suggested (Randhawa et al., 2014, 2015). Signatures of selection have also been identified in the horse genome. Petersen et al. (2013b) used Equine SNP50 BeadChip (Illumina, San Diego, CA, USA) data on 744 horses from 33 breeds, coupled with d, a modified FST statistic (Akey et al., 2010), to identify and confirm loci associated with skeletal muscle fiber types in the QH (ECA18, MSTN gene), ability to perform alternative gaits in many breeds (ECA23, DMRT3 gene), and size in draft breeds and the Miniature horse (ECA11:23.4Mb). Numerous other highly significant ROI identified in that study require further investigation to define candidate genes and propose putative functional alleles. Frischknecht et al. (2016) identified a selection signature on ECA1, containing the IFF1R and ADAMTS17 genes, that is putatively associated with height in Shetland ponies, while Kader et al. (2015) reported genomic regions harboring candidate genes for height as undergoing positive selection in the Chinese Debao pony, including a region on ECA6 containing the height-associated HMGA2 gene in many pony breeds (Frischknecht et al., 2015). Moon et al. (2015) collected whole genome sequence (WGS) data from Thoroughbred stallions and, using FST analysis between the Thoroughbred and the Jeju pony (an ancestral-type population), identified 12 regions putatively undergoing directional selection in the Thoroughbred. And, McCoy et al. (2014) found evidence of historical positive selection around the GYS1 gene in Belgian horses, likely due to selection for enhanced glycogen deposition conferred by a gain of function mutation in glycogen synthase. Lastly, putative signatures of selection have been identified in racing and cutting QH subpopulations through relative extended haplotype homozygosity (REHH) analysis, combined with FST statistics (Meira et al., 2014; Beltrán et al., 2015). A total of 27 and 36 putative signatures of selection were identified in the racing and cutting lines, respectively, and putative candidate genes related to muscle, skeletal, cardiovascular, respiratory and nervous systems, vision, hearing, and cognition were identified (Meira et al., 2014; Beltrán et al., 2015). In this study, we hypothesized that selective breeding within 6 QH subpopulations has increased the frequency of alleles responsible for elite performance, resulting in selective sweeps within each subpopulation. Based on this hypothesis, and relying on the principal of genetic hitchhiking, we used a combination of three approaches to identify genomic regions undergoing selection in elite QH performers that are likely to harbor alleles underlying performance traits. Within each of these regions, we identified genes and biological pathways predicted to play an important role in performance, which warrant further investigation to identify the underlying functional alleles.

Materials and methods

Sample collection and genotyping

The individuals and Equine SNP70 genotyping data used in this study have been previously described (Petersen et al., 2014) and are publically available (animalgenome.org) Briefly, all horses included in this study were among the 200 top performers of each of the 6 subpopulations (cutting, halter, racing, reining, western pleasure, and working cow) as determined by money or points earned in 2009 and 2010 according to the AQHA, after eliminating full and half-sibships. Primary performance characteristics of these populations are provided in Table 1.
Table 1

Primary characteristics of the six QH performance groups evaluated in this study, as described by the American Quarter Horse Association.

Performance TypeCharacteristics
CuttingHorse and rider must move quietly into a herd of cattle, cut one cow from the herd, drive it to the center of the arena and “hold” it away from the herd. The horse is scored on its ability to keep the cow from returning to the herd, cow sense, attentiveness, and courage.
HalterHorses are led before judges so that lameness and quality of movement can be evaluated. Horses are judged on conformation including: balance, structural correctness, breed, sex characteristics and degree of muscling.
RacingHorses race against one another at distances between 220 and 870 yards. The classic distance is 440 yards (1/4 mile).
ReiningThe horse is judged on movement under saddle, mastery of a prescribed maneuver and attitude as it is guided through a specific pattern. The horse is required to perform stops, spins, rollbacks, lead changes and circles at a lope.
Western pleasureHorses are evaluated on quality of movement under saddle at the walk, jog, and lope, while staying quiet, calm, and traveling on a loose rein.
Working cow (Reined cow horse)Combines reining ability and cow sense. The competition consists of two parts: prescribed reined work and actual cow work. Judging is based on good manners, smoothness, cow sense and ease of reining.

Originally provided in Petersen et al. (.

Primary characteristics of the six QH performance groups evaluated in this study, as described by the American Quarter Horse Association. Originally provided in Petersen et al. (. DNA was isolated from hair root samples of 143 elite individuals (24 individuals each from the cutting, halter, racing, reining, and working cow subpopulations, and 23 from the western pleasure group). These samples were genotyped on the Illumina Equine SNP70 BeadChip (Petersen et al., 2014) that contains approximately 65,000 SNP markers. For the current study, a subset of 47 individuals (16 from the halter, 15 from the racing, 9 from the western pleasure, and 7 from the working cow subpopulations), were also genotyped using the custom MNEc2M SNP array, which contains approximately 2 million SNPs (Schaefer et al., 2017).

Genotype imputation

The Beagle 4.0 software (Browning and Browning, 2007) was used to impute the original 65,000 SNP genotyping data to approximately 2 million SNPs for all 143 individuals included in this study, on a per chromosome basis using default settings. Genotypes were imputed using a reference panel consisting of 1,934,984 SNPs and 485 individuals from 24 different breeds encompassing wide genetic diversity (Petersen et al., 2013a; Schaefer et al., 2017). The average genotype concordance across all chromosomes, measured by dividing the number of accurate calls (best-guess genotypes that matched the true genotype for a particular individual in the reference panel) by the total number of genotype calls, was 99.5% (Schaefer et al., 2017).

SNP pruning

Because sex chromosomes are subjected to different selective pressures, have different effective population sizes and undergo more drift than autosomes (Heyer and Segurel, 2010), only autosomal SNPs were used in this study. After imputation, SNPs with genotyping call rates lower than 95% and minor allele frequency (MAF) of less than 0.05 across all individuals were removed (PLINK, Purcell et al., 2007). After quality control, a total of 1,928,388 SNPs remained for further analysis.

Analysis using the d statistic

Locus-specific divergence in allele frequencies for each QH subpopulation was calculated within non-overlapping 10 kb windows across the 31 autosomes using the d statistic, as previously described (Akey et al., 2010; Petersen et al., 2013b). For each genomic window, d, or the locus-specific divergence of allele frequency for a given subpopulation compared to the genome-wide average of pair-wise FST values for that subpopulation relative to all other subpopulations (expected divergence), was computed (Schaefer, 2018). In other words, this statistic detects locus-specific deviation in allele frequencies for each subpopulation relative to the genome-wide average of pairwise FST values summed across all 6 subpopulations. This method was chosen for calculation because normalizing the standard FST statistic reduces the incidence of false positive tests that are driven by evolutionary relationships between populations rather than by selective pressures. Further, unlike single marker tests, the computing of FST across haplotype windows helps to control for stochastic variation and SNP ascertainment bias. The mean number of SNPs per 10kb window was 8.2 (± 3.2), and the total number of windows analyzed per subpopulation was 217,806. Significant d windows were considered as those corresponding to the top 0.1% of the empirical distribution (approximately 218 windows per subpopulation) and were considered putative signatures of selection. Two or more significant d windows located within 500 kb of each other were considered as a single ROI for analysis purposes.

Analysis using hapFLK

The hapFLK is a haplotype-based approach applied to unphased genotypic data. The hapFLK 1.3.0 program version (https://forge-dga.jouy.inra.fr/projects/hapflk) was used to detect signatures of selection through haplotype differentiation among hierarchically structured populations, as described by Fariello et al. (2013). Briefly, FLK is an extension of the Lewontin and Krakauer (LK) test for the heterogeneity of the inbreeding coefficient, which uses phylogenetically estimated relationships between the subpopulations, overcoming limitations to the LK test due to highly correlated allele frequencies between subpopulations. HapFLK uses local haplotype data, the differences in haplotype allele frequencies between populations, and the hierarchical structure of subpopulations to identify genomic regions undergoing selection. The software was run on a per chromosome basis using a kinship matrix, constructed from population-based Reynolds distances, and the genotypes (*.PED and *.MAP files) for each population. For this study, no outgroups were defined, 10 clusters (-K 10) were used for the fastPHASE model, and the hapFLK statistic was computed for 20 EM runs to fit the LD model (–nfit = 20). Then, p-values were computed for each SNP-specific value using a Python script provided with the hapFLK program, and values were considered significant ROI if –log10 (p-value) > 4, as applied previously (Fariello et al., 2013).

Analysis using hapQTL

Local haplotype sharing within each subpopulation was calculated with hapQTL (http://www.haplotype.org) using default settings (Xu and Guan, 2014). This approach relies on a statistical model for linkage disequilibrium (LD) to infer ancestral haplotypes and their frequencies at each SNP marker for each individual within a population. In hapQTL, each SNP is used as a core marker to calculate the extent of local haplotype sharing (LHS)—or the probability of two individuals descending from the same ancestral haplotypes—within each subpopulation. For each analysis 1 EM run was used with 50 steps (-w 50), 3 upper clusters (-C 3), 10 lower clusters (-c 10), and with a prior LD length of 0.5 centimorgan (-mg 200). Clusters of SNPs with –log10 (Bayes Factor) > 5 were considered significant ROI, and orphan signals were removed from the analysis due to the fact that these do not constitute haplotypes.

Candidate gene identification and prioritization

Candidate genes within the ROI identified by d and hapFLK were retrieved with the Ensembl BioMart tool (www.ensembl.org), using the EquCab2 dataset for each subpopulation. To refine the list of candidate genes in ROI and further understand their association with phenotypes of interest for each subpopulation based on their biological features, functionalities, and biological pathways, a list of seed genes based on hypothesized tissue and metabolic pathways of interest was generated using the PolySearch2.0 algorithm (Liu et al., 2015). The resultant list of seed genes was retrieved from PolySearch2.0 for skeletal muscle development, metabolism, and nervous system development (Supplementary Table 1). This list was further curated manually by analyzing relevant literature using PubMed (https://www.ncbi.nlm.nih.gov/pubmed) for candidate genes associated with phenotypes or pathways of interest for each subpopulation. The number of seed genes chosen for each phenotype was based on work described by Liu et al. (2015). Candidate gene prioritization within the ROI was then performed using Endeavor-GW (https://endeavour.esat.kuleuven.be/Endeavour.aspx) (Tranchevent et al., 2016).

Pathway and functional analysis

Functional classification, statistical overrepresentation and pathway analyses of candidate genes for each subpopulation were conducted using PANTHER 11.1 (http://pantherdb.org/) (Mi et al., 2017), and FunRich3.0 (http://funrich.org/index.html) (Pathan et al., 2015), using default parameters. Specifically, fold enrichment analyses were included for: cellular component, molecular function, biological process, biological pathway, protein domain, site of expression, transcription factor, clinical phenotype associated with human homologs, their expression sites in humans, and results from the statistical overrepresentation analysis.

Results

Identifying signatures of selection using the d statistic

To identify population-specific loci under positive selection in the 6 QH subpopulations, we calculated the d value (Akey et al., 2010; Petersen et al., 2014) for approximately 218,000 non-overlapping 10 kb windows along the genome (Figure 1). Each subpopulation had several clearly defined ROI comprising contiguous windows with significant d values, in addition to numerous ROI comprising a single or just a few windows. The number of ROI identified per subpopulation varied from 39 in the western pleasure subpopulation to 77 in the racing subpopulation, for a total of 346 ROI, with sizes ranging from 10 kb to 1.8 Mb (Table 2). The number of annotated genes located within these ROI ranged from 78 in the working cow subpopulation to 171 in the racing group, with an average of 105 genes identified per subpopulation. Across all 6 QH performance groups a total of 635 genes were situated within these putative signatures of selection. A complete list of the significant d windows (top 0.1% of the empirical distribution) for each subpopulation is provided in Supplementary Table 2.
Figure 1

Genome-wide d values for the 6 QH subpopulations. Each d value is plotted on the y axis and each autosome is shown on the x axis in alternating colors. Each point represents a 10 kb window. The red horizontal line represents the top 0.1% of the empirical distribution of dfor each QH subpopulation.

Table 2

ROI identified by d analysis in the 6 QH subpopulations.

QH subpopulationNumber of ROI identifiedNumber of genes
Cutting60134
Halter4790
Racing77171
Reining5281
Western pleasure3981
Working cow5778
Total346635

Totals do not consider overlap between populations or analysis methods.

Genome-wide d values for the 6 QH subpopulations. Each d value is plotted on the y axis and each autosome is shown on the x axis in alternating colors. Each point represents a 10 kb window. The red horizontal line represents the top 0.1% of the empirical distribution of dfor each QH subpopulation. ROI identified by d analysis in the 6 QH subpopulations. Totals do not consider overlap between populations or analysis methods. A total of 56 d windows were found to be shared between 2 or more subpopulations; of those, 50 were shared by 2 performance groups, 5 by 3 performance groups and 1 was shared by 4 performance groups (Table 3). The subpopulations with the most shared signatures of selection identified by dstatistics were racing and reining (14), and halter and reining (9). Interestingly, no significant d windows were shared between the racing and cutting subpopulations, nor between the racing and the western pleasure subpopulations. No dwindows were shared by all six subpopulations.
Table 3

ROI identified by d analysis that are shared by two or more QH subpopulations.

ChromosomePosition (bp)CuttingHalterRacingReiningWestern pleasureWorking cow
113,685,246XX
1119,564,500XX
1119,676,493XX
1161,715,511XX
228,112,330XXX
263,922,661XX
293,844,075XX
293,904,657XX
295,577,244XX
295,962,979XX
296,425,441XX
312,705,501XX
368,445,127XX
418,376,592XX
419,475,682XX
435,285,735XX
4106,747,371XXX
565,517,082XX
625,394,821XX
626,913,211XX
630,595,130XX
630,637,559XX
630,686,504XX
630,705,641XX
630,724,394XX
630,753,624XX
631,086,772XX
631,334,561XX
681,474,997XXXX
681,534,461XXX
75,529,473XX
765,654,968XX
969,646,156XX
970,735,787XX
970,814,115XX
1023,574,776XXX
1023,956,861XXX
1024,014,382XX
1066,622,713XXX
1071,364,970XX
1083,744,000XX
113,863,250XX
1135,553,211XX
1136,006,028XX
1138,335,749XX
1624,163,505XX
1715,344,171XX
1819,864,538XX
1850,904,645XX
2012,885,496XX
2025,186,056XX
2035,886,068XX
256,263,862XX
256,734,228XX
2514,862,570XX
2814,573,111XX
ROI identified by d analysis that are shared by two or more QH subpopulations.

Identifying signatures of selection using hapFLK

Per chromosome hapFLK values were computed for all 6 subpopulations combined (total of 143 individuals), p-values were calculated from the null distribution of empirical values, and the negative log of p-values was plotted against the genomic position for each chromosome (Figure 2). A total of 6 ROI containing at least one significant peak [–log10 [p-value] > 4] in the hapFLK scan were identified on 5 different autosomes (Table 4). These regions, located on ECA1, 7, 9, 15, and 21 (with two nearby regions), harbor a total of 7 genes putatively undergoing selection.
Figure 2

Haplotype-based hapFLK results for all 6 QH subpopulations. Chromosome number and statistical significance ([–log10] p-values) are plotted in the x and y axes, respectively. The genome-wide significance threshold corresponding to P < 0.0001 is shown as a horizontal red line.

Table 4

ROI identified by hapFLK and their overlap with ROI identified by d and hapQTL.

ChromosomePosition (bp)Lowest p-valueGene symbolsCuttingHalterRacingReiningWestern pleasureWorking cow
140,502,663-40,641,8890.000128973PTEN, ATAD1hapQTLdihapQTL
75,501,423-5,596,6280.000068200CERS4, CD320, RPS28Bothdi
970,643,295-70,822,5360.0000146BothdiBoth
1523,550,251-23,566,2680.000127614hapQTL
2129,967,251-29,989,8560.00006170PRLRhapQTLhapQTLhapQTLhapQTL
2131,684,806-31,731,7260.00004790NPR3hapQTL

Chromosomal location, genomic coordinates, lowest p-value and genes contained within each hapFLK ROI (left). For each QH subpopulation (right), “d.

Haplotype-based hapFLK results for all 6 QH subpopulations. Chromosome number and statistical significance ([–log10] p-values) are plotted in the x and y axes, respectively. The genome-wide significance threshold corresponding to P < 0.0001 is shown as a horizontal red line. ROI identified by hapFLK and their overlap with ROI identified by d and hapQTL. Chromosomal location, genomic coordinates, lowest p-value and genes contained within each hapFLK ROI (left). For each QH subpopulation (right), “d.

Identifying haplotype structure using hapQTL

Local haplotype sharing (LHS) within subpopulations was delineated by analyses using hapQTL (Xu and Guan, 2014). Clusters of contiguous significant SNP markers with –log (Bayes Factor) > 5 were considered as ROI as they most likely represent putative ancestral haplotypes shared by individuals from each subpopulation. Genome-wide hapQTL results for each QH subpopulation are shown in Figure 3, where shared ancestral haplotypes are shown as peaks containing contiguous significant SNPs. Each subpopulation displays numerous significant ROI, typically many on each chromosome, and spread across the entire genome. This approach was used to further validate the signatures of selection detected by the previously described methods, as such regions are most likely found in locally conserved haplotypes that are fixed or nearly fixed within each subpopulation. The number of significant SNPs varies widely across the 6 populations, from < 4,000 in the cutting and reining groups, to > 100,000 in the halter, racing, western pleasure, and working cow subpopulations. A complete list of all SNPs with significant hapQTL values for each subpopulation is provided in Supplementary Table 3.
Figure 3

Genome-wide hapQTL values for the 6 QH subpopulations. Bayes factor values for each SNP are plotted on the y axis and each autosome is shown on the x axis in alternating colors.

Genome-wide hapQTL values for the 6 QH subpopulations. Bayes factor values for each SNP are plotted on the y axis and each autosome is shown on the x axis in alternating colors.

Summation and overlap among selection signature detection methods

Genome-wide overlap in significant ROI identified by each of the three methods was assessed in each subpopulation (Figure 4; Supplementary Figures 1–6). Approximately 90% of significant d windows were contained within shared ancestral haplotypes identified by hapQTL, with little variation in the percent overlap between d and hapQTL ROI among the subpopulations. The observed overlap ranged from 178 shared hits (81% of significant d windows) in the reining subpopulation, to 206 common ROI (94% of significant d windows) in the racing subpopulation (Supplementary Figures 1–6). In addition, all 6 ROI identified by hapFLK were also found to be significant by the dstatistic, hapQTL or both methods (Table 4).
Figure 4

Circos plot showing genome-wide significant d values (red layer), hapQTL values (orange layer), and hapFLK values (blue layer) across all 31 autosomes for the reining QH subpopulation.

Circos plot showing genome-wide significant d values (red layer), hapQTL values (orange layer), and hapFLK values (blue layer) across all 31 autosomes for the reining QH subpopulation.

Candidate gene prioritization

All annotated genes contained within ROI were retrieved using Ensembl BioMart (complete list provided in Supplementary Table 4). After retrieving annotated equine genes within ROI, PolySearch 2.0 (http://polysearch.ca) was used to generate a list of seed genes associated with traits of interest for improved performance (metabolism, skeletal muscle development, central nervous system). A manual curation of each list was performed using literature obtained from PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) for their association with the aforementioned phenotypes. The resultant list of 33 seed genes: 11 using “skeletal muscle” as seed, 11 using “central nervous system” as seed, and 11 using “metabolism” as seed, is shown in Supplementary Table 1. These seed lists were used to prioritize candidate genes retrieved for each subpopulation based on phenotypes of interest using Endeavor-GW. Thus, in addition to the original seed genes, the output resulted in a total of 78–171 ranked genes for each subpopulation based on the phenotypes of interest: 635 across all subpopulations (602 unique, and 33 shared between 2 or more subpopulations) for use in pathway analysis (Supplementary Tables 5–7). Supplementary Table 8 lists the top 3 candidate genes per subpopulation, chosen according to possible associations with the desired performance characteristics based on their biological pathways and/or functions. Table 5 lists the 33 genes putatively undergoing selection in 2 or more subpopulations.
Table 5

Genes undergoing selection in 2 or more subpopulations.

Gene NameCuttingHalterRacingReiningWestern pleasureWorking cow
ABCA13XX
ANKFN1XXX
ARHGAP20XX
ARMC12XX
C11orf53XX
CACNA2D1XX
COLCA2XX
DCAF11XX
DDX11XX
DYNC1/2XX
FDX1XX
GLE1XX
GNEXX
HECW1XX
INVSXX
KIAA1191XX
LENG8XX
LOXL1XX
MAP3K20XX
PARD3XX
PIP4K2AXXX
POU2AF1XX
SLC25A27XX
SRPK1XX
STOML1XX
STX12XXXX
TBC1D21XX
TPD52L1XX
TSPAN9XXX
TTYH1XXX
UNC79XX
WASHC1XX
ZPBPXX

Two genes (7SK and U6) were not counted because they have copies in several genomic loci.

Genes undergoing selection in 2 or more subpopulations. Two genes (7SK and U6) were not counted because they have copies in several genomic loci. The biological functions and pathways in which genes from ROI are involved were assessed using PANTHER 11.1 (Mi et al., 2017) and FunRich3.0 (http://funrich.org/index.html) (Pathan et al., 2015). Supplementary Tables 9–14 provide PANTHER 11.1 and FunRich3.0 analysis reports for each QH subpopulation. These analyses provided information on the common features and shared functions of the genes putatively undergoing genomic selection in different QH performance groups, according to the expected performance profile of elite individuals in each category.

Discussion

Our approach to detecting genome-wide signatures of selection in the QH subpopulations combines different methods to increase efficiency, reliability, and scope. This study revealed that there are hundreds of loci likely undergoing selection, of which many are almost certainly contributing to desired specialized phenotypes, in each subpopulation. Overlap between statistical methods to detect selection (namely d and hapFLK) assisted in pinpointing ROI that most likely constituted true signatures of selection within each subpopulation. Furthermore, the use of hapQTL to confirm that the vast majority of genomic regions undergoing selection constituted shared haplotypes within each subpopulation increased the robustness of our analysis. Although hapQTL was not designed specifically to identify genomic signatures of selection per se, it has been used to infer ancestral haplotype associations among individuals with similar phenotype (Xu and Guan, 2014). Therefore, combined with ROI identified by d and hapFLK, this approach allowed for a more confident interpretation of the association between candidate genes found within such ROI and the unique phenotypic features of each QH subpopulation. This work also reveals the many types of biological mechanisms that can contribute to the complex muscle, nervous system and metabolic phenotypes, as well as the difficulties likely inherent in attempting to identify the functional alleles associated with them. Among the 27 signatures of selection identified by Meira et al. (2014) for the racing QH via REHH and FST, a total of 14 overlap with the ROI identified by the present study: 6 of those were identified by d statistics, and 8 were highlighted by hapQTL. Of those, 4 were identified by both d statistics and hapQTL analysis. In addition, when comparing the 36 signatures of selection identified by Beltrán et al. (2015) in the cutting QH to those detected for this subpopulation in the present study, a total of 11 overlap: 2 identified by both dand hapQTL, and 9 identified exclusively by hapQTL. A summary of these comparisons is shown in Supplementary Table 15. Overlap across studies for these ROI in these two subpopulations is encouraging, making them a top priority in attempting to identify the putative functional alleles associated with performance phenotypes. Nevertheless, our study also benefitted from a higher SNP density, utilizing different methods of detection, and the specific inclusion of only elite performers from 6 different QH subpopulations. This enabled us to define more, and likely highly important, signatures of selection. Moreover, our functional, network, and pathway analyses in the different populations show different populations of genes undergoing selection with different functions and interactions in each QH subpopulation. Several examples from each subpopulation are discussed below.

The cutting subpopulation

Pathway analysis shows an approximate 30-fold enrichment in genes associated with formation of the pyruvate dehydrogenase (PDH) mitochondrial complex and with pyruvate metabolism (DLAT, PDHX; p-value = 0.001) in this subpopulation (Supplementary Table 9). Besides being involved in glucose metabolism, human studies suggest that variants in these genes are associated with neuromuscular disorders and neuronal function (Barnerias et al., 2010). Candidate gene prioritization for skeletal muscle development (Supplementary Table 5) and metabolism (Supplementary Table 7) confirms these findings by assigning high ranks for the aforementioned genes.

The halter subpopulation

In the halter subpopulation, pathways associated with lipid metabolism and function, such as digestion of dietary lipid (59-fold enrichment; p-value = 0.016) and platelet sensitization by LDL (58-fold enrichment; p-value = 0.0004) were identified. It can be hypothesized that the selective breeding for lean and muscular phenotypes in this subpopulation may have led to the selection of genes involved in such metabolic pathways. On the other hand, we found no evidence for a signature of selection near the SCN4A gene locus on chromosome 11 (ECA11:15 Mb) in the halter horse subpopulation. It has been widely assumed that a combination of extensive breeding to a halter horse stallion carrying a SCN4A mutation responsible for the neuromuscular disorder hyperkalemic periodic paralysis (HYPP) (Rudolph et al., 1992), as well as resultant epistatic effects of this mutation on muscle contractility, was responsible for the perceived increase in muscling desired in this population. Our data suggests that SCN4A locus itself is not driving selection for the muscling in this halter horse line. On the other hand there is an ROI located 20 Mb upstream from SCN4A (ECA11:34–36 Mb) identified both by dand hapQTL analyses, containing a total of 13 genes (Supplementary Table 4), that are potentially worth investigating for the presence of variants associated with skeletal muscle development in halter horses, such as MYO19 (ECA11:35.7 Mb), associated with mitochondrial transport during mitosis.

The racing subpopulation

Our d analysis identified four ROI on ECA1 (ECA1:12,256,704-14,342,963; ECA1:40,505,469-41,555,488; ECA1:45,034,220-46,901,028; ECA1:115,914,274-122,185,069) that are found exclusively in the racing QH subpopulation, were found to constitute ancestral shared haplotypes by hapQTL, and were not identified by Meira et al. (2014). Together, these “racing” ROI contain 99 genes, or approximately 58% of all genes undergoing selection in this subpopulation (see Supplementary Table 11). After prioritization, top candidate genes located in these and other loci showing association with the racing phenotype were also identified. One of the top candidate genes for racing ability is BAG3 (BCL2-associated athanogene 3, ECA1:12.8 Mb); variants and/or deletions in this gene have been associated with myopathies, neuropathies and cardiomyopathies in various human subpopulations (Franaszczyk et al., 2014; Kostera-Pruszczyk et al., 2015), suggesting a potential involvement of this gene in skeletal muscle and heart function, essential phenotypes for individuals with increased racing ability. Another example is PRKCA (Protein kinase C alpha, ECA11:13.4 Mb), a member of the of serine- and threonine-specific protein kinase family. Knockout studies in mice suggest that this kinase may be a fundamental regulator of cardiac contractility and calcium handling in myocytes (Braz et al., 2004). Future studies of putative selection in Thoroughbreds and Standardbreds, breeds that have historically been selected for racing performance, will help validate and possibly expand this list of putative candidate genes for racing ability on ECA1 and elsewhere across the genome. Finally, pathway analysis showed a ten-fold enrichment in genes associated with netrin-1 signaling (p-value = 0.00054). These genes play an important role in axon guidance, on the structural integrity, and on the regulation of the transmission of the nervous stimulus through neuronal axons (Supplementary Table 7; Koticha et al., 2006). Regulation of such physiological functions is essential for improved athletic performance in racehorses.

The reining subpopulation

Elite reining performers must possess skills that allow them to move effortlessly around the arena, quickly reacting to subtle hand and foot cues from the rider in order to execute movements such as loping, galloping, circling, spinning, and stopping. Therefore, reining horses must possess improved athleticism and intellectual abilities in order to excel in the sport. Analysis of genomic regions undergoing selection within this subpopulation reveals genes associated with such traits. Examples include RDX (ECA7:18.9 Mb), which plays a role in the binding of the barbed end of actin filaments to the plasma membrane and has been associated with skeletal muscle as well as behavior/neurological phenotypes in humans; KIF1A (ECA6:26.1 Mb), involved in axonal transport of synaptic vesicle precursors, and BACE1 (ECA7:25.2 Mb), both associated with different neurological phenotypes in humans. It is worth noting that highly significant signals were detected using all 3 statistical approaches on a ~300 kb region of the genome in the reining QH subpopulation (ECA9:70.6 Mb−70.9 Mb; see Table 3, Supplementary Tables 2–3). Interestingly, this region does not contain any annotated genes in EquCab2, which might indicate that another genomic feature such as a regulatory sequence, non-coding RNA, or transposable element for example, might be driving selection in this region but cannot yet be identified due to the current state of the equine genome annotation.

The western pleasure subpopulation

In the western pleasure subpopulation, candidate genes located on ECA7 are also of interest when considering selection for the phenotype exhibited by individuals from this group. Western pleasure is a performance style in which horses need to show a relaxed and collected disposition during competition. In this event, animals are asked by their riders to walk, trot and lope in both directions of the arena, as well as to gently come to a stop and to stand still at command. In order to excel in western pleasure, individuals need to have a calm and quiet temperament and smooth, controlled movements. Moreover, elite athletes are expected to possess cognitive and decision-making skills that are not exclusively dependent on the rider in order to excel in competitions. Genomic selection analyses show a collection of 24 genes located within ROI on ECA7 (see Supplementary Table 13). Interestingly, half (12) of these ECA7 genes are associated with cognition, behavior and nervous system phenotypes, including AARS, GRIK4, NTM, PGLS, and SORL1.

The working cow subpopulation

Working cow horses, also known as reined cow horses, perform in events in which they are asked by the rider to work a single cow in an arena, performing specific maneuvers that include circling the cow, directing it in specific patterns, and performing a reining routine. This sport is similar to cutting because horses are also required to work alongside a cow, and for that they must possess “cow sense.” Genes associated with these phenotypes were identified within genomic signatures of selection in the working cow subpopulation, such as DRD2 (ECA7:21.8 Mb) and GRID2 (ECA3:44.7 Mb), associated with cognitive and behavioral traits in humans. The APP gene (ECA26:23.5 Mb), one of the top candidate genes for central nervous system and skeletal muscle development in this subpopulation, is involved in neural growth and maturation during brain development, and has been associated with the pathology of Alzheimer's disease in humans (Coronel et al., 2018). Across all the subpopulations, approximately 90% of significant d windows were also observed by hapQTL as ancestral haplotypes; further, all 6 ROI identified by hapFLK were also significant either by dstatistics or hapQTL analysis. These results corroborate the efficacy of the methods utilized in this study, and show that the use of a combination of different methods to detect genomic selection, coupled with hapQTL, increases the chances of confidently identifying a larger spectrum of regions undergoing selection. It is also interesting to note that the hapFLK analysis identified far fewer selection signatures in the present study than d. hapFLK is expected to be more stringent than d, which typically suffers from selection bias and false positives; this is because hapFLK accounts for haplotype structure of each subpopulation for the significance calculation (Fariello et al., 2013). Moreover, these statistical tests can capture different selection signals: hapFLK is not as powerful for detection of more ancient signatures of selection, whereas d may fail to detect signatures of selection when SNPs within a window are not in high LD with the causative variant (Kijas et al., 2012; Fariello et al., 2013). Therefore, only the top 0.1% of d windows was considered for this analysis, which is consistent with studies in other species using similar approaches (Kijas et al., 2012; McRae et al., 2014; Zhao et al., 2015; Purfield et al., 2017). Limitations to our study include the fact that several of the significant ROI identified did not contain protein-coding genes annotated in the current equine genome assembly (EquCab2). We hypothesize that this is due to limitations in the current equine genome annotation for regulatory regions, non-coding RNAs and sequence motifs, which might be driving selection in some of the identified ROI but are not identified as such in EquCab2. However, there are 20,449 protein coding genes annotated in EquCab2, so pathway analysis still constitutes a useful tool to predict putative functionality of candidate genes within ROI. Another potential limitation to the approach used in this study is that population differentiation at the genome level is not always a product of selective sweeps; it can also arise due to inversions and random genetic drift, for example. This can be tested with functional studies, which are not part of the scope of this work. Other limitations include arbitrary significance thresholds utilized in different detection methods for genomic selection (which might hinder the ability to detect ROI just below said threshold), and the possibility of false positives stemming from each statistical method used. Also, the close genetic relatedness among the performance groups used in this study hinder the ability to detect signatures of selection due to the occurrence of a high number of shared ancestral haplotypes among subpopulations. These subpopulations are comprised of individuals belonging to the same breed, that have been selectively bred for only approximately 60 years; thus, a high level of genetic similarity between different subpopulations is to be expected. Finally, pathways analyzed for gene prioritization were limited to skeletal muscle development, CNS and metabolism, while selection may also be occurring for other less obvious phenotypes of interest for each subpopulation. As shown by previous work (Petersen et al., 2013b), signatures of selection are evident in all major breeds of horse, and understanding the alleles and functional bases for their phenotypic effects will greatly expand our understanding of equine biology. The aforementioned work (Petersen et al., 2013b) utilized 744 individuals from 33 breeds, and a low-density 54K SNP genotyping array, to identify breed-specific signatures of selection using d statistics calculated in 500-kb windows. d windows falling into the 99th percentile of the empirical distribution (a total of 33 for each breed) were considered significant. Among the top 33 d windows identified for the Quarter Horse, 4 are shared with the top 218 dhits in this study: ECA2:95.7 Mb (shared by the cutting, western pleasure and working cow subpopulations), ECA3:26.5 Mb (found in the western pleasure subpopulation), ECA4:19.4 Mb (shared by the racing and working cow subpopulations), and ECA18:65.2 Mb (found in the cutting group). While none of the significant signatures of selection identified by hapFLK in this study were found in the work performed by Petersen et al. (2013b), a total of 18 of the 33 top dhits found in that study were also significant by hapQTL analysis in the present work, constituting shared ancestral haplotypes between individuals of one or more QH subpopulations. Among those, one of particular interest is the ROI located on ECA18:66.3 Mb which contains the myostatin (MSTN) gene, where a SINE insertion allele in the promoter region is associated with an increased gluteal muscle type 2A fiber proportion and contributes to the outstanding sprinting ability of this breed (Petersen et al., 2013b). A common 780.7 kb haplotype encompassing MSTN was shared by 91.3% of Quarter Horses used in the study (Petersen et al., 2013b), in agreement with the hapQTL results presented herein, where this genomic region constitutes an ancestral haplotype shared by all 6 QH subpopulations (Supplementary Table 3). The use of 2 million SNP markers and the comparison of subpopulations within the QH breed, as opposed to comparing the breed as a whole to other horse breeds, likely contribute to differences between the results of this study and that of Petersen et al. (2013b). Our work represents a more thorough study of selection in horse breeding by leveraging high-density SNP data within a single breed to demonstrate that there are many regions of selection ongoing within each of the performance categories of the American Quarter Horse. Moreover, the use of a combination of different statistical and haplotype analysis methods in this study has allowed for a broader range of detection of selection signatures. Although at present we cannot assess issues of false positives, and gene missingness due to incorrect/incomplete genome annotation, we consider this an excellent start. We expect that future high-density analyses will be forthcoming across racing breeds and again across a wide diversity of other breeds, and that from this collection key haplotypes and functional alleles underlying selection will be discovered and used to further select and hopefully improve the populations.

Author contributions

FA: performed the signatures of selection, candidate gene, and pathway analyses; RS: managed the data pipeline and performed the imputation; MM, FA, and JM: designed the study; FA, JM, and MM: wrote the manuscript; RS: edited the manuscript. All authors have read and approved the final version.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  50 in total

1.  The hitch-hiking effect of a favourable gene.

Authors:  J M Smith; J Haigh
Journal:  Genet Res       Date:  1974-02       Impact factor: 1.588

2.  Uncovering Adaptation from Sequence Data: Lessons from Genome Resequencing of Four Cattle Breeds.

Authors:  Simon Boitard; Mekki Boussaha; Aurélien Capitan; Dominique Rocha; Bertrand Servin
Journal:  Genetics       Date:  2016-03-26       Impact factor: 4.562

3.  Selection signatures in Shetland ponies.

Authors:  M Frischknecht; C Flury; T Leeb; S Rieder; M Neuditschko
Journal:  Anim Genet       Date:  2016-02-09       Impact factor: 3.169

4.  Pyruvate dehydrogenase complex deficiency: four neurological phenotypes with differing pathogenesis.

Authors:  Christine Barnerias; Jean-Marie Saudubray; Guy Touati; Pascale De Lonlay; Olivier Dulac; Gerard Ponsot; Cécile Marsac; Michèle Brivet; Isabelle Desguerre
Journal:  Dev Med Child Neurol       Date:  2009-12-01       Impact factor: 5.449

Review 5.  Looking for signatures of sex-specific demography and local adaptation on the X chromosome.

Authors:  Evelyne Heyer; Laure Segurel
Journal:  Genome Biol       Date:  2010-01-28       Impact factor: 13.583

6.  Genomic scan of selective sweeps in thin and fat tail sheep breeds for identifying of candidate regions associated with fat deposition.

Authors:  Mohammad Hossein Moradi; Ardeshir Nejati-Javaremi; Mohammad Moradi-Shahrbabak; Ken G Dodds; John C McEwan
Journal:  BMC Genet       Date:  2012-02-26       Impact factor: 2.797

7.  Composite selection signals can localize the trait specific genomic regions in multi-breed populations of cattle and sheep.

Authors:  Imtiaz Ahmed Sajid Randhawa; Mehar Singh Khatkar; Peter Campbell Thomson; Herman Willem Raadsma
Journal:  BMC Genet       Date:  2014-03-17       Impact factor: 2.797

8.  Population Variation Reveals Independent Selection toward Small Body Size in Chinese Debao Pony.

Authors:  Adiljan Kader; Yan Li; Kunzhe Dong; David M Irwin; Qianjun Zhao; Xiaohong He; Jianfeng Liu; Yabin Pu; Neena Amatya Gorkhali; Xuexue Liu; Lin Jiang; Xiangchen Li; Weijun Guan; Yaping Zhang; Dong-Dong Wu; Yuehui Ma
Journal:  Genome Biol Evol       Date:  2015-12-03       Impact factor: 3.416

9.  Genome-wide analysis reveals selection for important traits in domestic horse breeds.

Authors:  Jessica L Petersen; James R Mickelson; Aaron K Rendahl; Stephanie J Valberg; Lisa S Andersson; Jeanette Axelsson; Ernie Bailey; Danika Bannasch; Matthew M Binns; Alexandre S Borges; Pieter Brama; Artur da Câmara Machado; Stefano Capomaccio; Katia Cappelli; E Gus Cothran; Ottmar Distl; Laura Fox-Clipsham; Kathryn T Graves; Gérard Guérin; Bianca Haase; Telhisa Hasegawa; Karin Hemmann; Emmeline W Hill; Tosso Leeb; Gabriella Lindgren; Hannes Lohi; Maria Susana Lopes; Beatrice A McGivney; Sofia Mikko; Nicholas Orr; M Cecilia T Penedo; Richard J Piercy; Marja Raekallio; Stefan Rieder; Knut H Røed; June Swinburne; Teruaki Tozaki; Mark Vaudin; Claire M Wade; Molly E McCue
Journal:  PLoS Genet       Date:  2013-01-17       Impact factor: 5.917

10.  Genetic diversity in the modern horse illustrated from genome-wide SNP data.

Authors:  Jessica L Petersen; James R Mickelson; E Gus Cothran; Lisa S Andersson; Jeanette Axelsson; Ernie Bailey; Danika Bannasch; Matthew M Binns; Alexandre S Borges; Pieter Brama; Artur da Câmara Machado; Ottmar Distl; Michela Felicetti; Laura Fox-Clipsham; Kathryn T Graves; Gérard Guérin; Bianca Haase; Telhisa Hasegawa; Karin Hemmann; Emmeline W Hill; Tosso Leeb; Gabriella Lindgren; Hannes Lohi; Maria Susana Lopes; Beatrice A McGivney; Sofia Mikko; Nicholas Orr; M Cecilia T Penedo; Richard J Piercy; Marja Raekallio; Stefan Rieder; Knut H Røed; Maurizio Silvestrelli; June Swinburne; Teruaki Tozaki; Mark Vaudin; Claire M Wade; Molly E McCue
Journal:  PLoS One       Date:  2013-01-30       Impact factor: 3.240

View more
  12 in total

1.  Dietary tributyrin supplementation and submaximal exercise promote activation of equine satellite cells.

Authors:  Madison L Gonzalez; Robert D Jacobs; Kristine M Ely; Sally E Johnson
Journal:  J Anim Sci       Date:  2019-12-17       Impact factor: 3.159

2.  Runs of homozygosity in Sable Island feral horses reveal the genomic consequences of inbreeding and divergence from domestic breeds.

Authors:  Julie Colpitts; Philip Dunstan McLoughlin; Jocelyn Poissant
Journal:  BMC Genomics       Date:  2022-07-12       Impact factor: 4.547

3.  Exploring the genetics of trotting racing ability in horses using a unique Nordic horse model.

Authors:  Brandon D Velie; Mette Lillie; Kim Jäderkvist Fegraeus; Maria K Rosengren; Marina Solé; Maja Wiklund; Carl-Fredrik Ihler; Eric Strand; Gabriella Lindgren
Journal:  BMC Genomics       Date:  2019-02-04       Impact factor: 3.969

4.  Selection signatures in four German warmblood horse breeds: Tracing breeding history in the modern sport horse.

Authors:  Wietje Nolte; Georg Thaller; Christa Kuehn
Journal:  PLoS One       Date:  2019-04-25       Impact factor: 3.240

5.  Exploration of fine-scale recombination rate variation in the domestic horse.

Authors:  Samantha K Beeson; James R Mickelson; Molly E McCue
Journal:  Genome Res       Date:  2019-08-21       Impact factor: 9.043

6.  Multiple Selection Signatures in Farmed Atlantic Salmon Adapted to Different Environments Across Hemispheres.

Authors:  María Eugenia López; Tyler Linderoth; Ashie Norris; Jean Paul Lhorente; Roberto Neira; José Manuel Yáñez
Journal:  Front Genet       Date:  2019-10-01       Impact factor: 4.599

Review 7.  Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era.

Authors:  T Raudsepp; C J Finno; R R Bellone; J L Petersen
Journal:  Anim Genet       Date:  2019-09-30       Impact factor: 3.169

8.  Genome-wide detection of signatures of selection in indicine and Brazilian locally adapted taurine cattle breeds using whole-genome re-sequencing data.

Authors:  Elisa Peripolli; Christian Reimer; Ngoc-Thuy Ha; Johannes Geibel; Marco Antonio Machado; João Cláudio do Carmo Panetto; Andréa Alves do Egito; Fernando Baldi; Henner Simianer; Marcos Vinícius Gualberto Barbosa da Silva
Journal:  BMC Genomics       Date:  2020-09-11       Impact factor: 3.969

9.  Genomic Divergence in Swedish Warmblood Horses Selected for Equestrian Disciplines.

Authors:  Michela Ablondi; Susanne Eriksson; Sasha Tetu; Alberto Sabbioni; Åsa Viklund; Sofia Mikko
Journal:  Genes (Basel)       Date:  2019-11-27       Impact factor: 4.096

10.  Selection in Australian Thoroughbred horses acts on a locus associated with early two-year old speed.

Authors:  Haige Han; Beatrice A McGivney; Gabriella Farries; Lisa M Katz; David E MacHugh; Imtiaz A S Randhawa; Emmeline W Hill
Journal:  PLoS One       Date:  2020-02-12       Impact factor: 3.240

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.