Identification of selection signatures in livestock species.

João José de Simoni Gouveia¹, Marcos Vinicius Gualberto Barbosa da Silva², Samuel Rezende Paiva³, Sônia Maria Pinheiro de Oliveira⁴.

Abstract

The identification of regions that have undergone selection is one of the principal goals of theoretical and applied evolutionary genetics. Such studies can also provide information about the evolutionary processes involved in shaping genomes, as well as physical and functional information about genes/genomic regions. Domestication followed by breed formation and selection schemes has allowed the formation of very diverse livestock breeds adapted to a wide variety of environments and with special characteristics. The advances in genomics in the last five years have enabled the development of several methods to detect selection signatures and have resulted in the publication of a considerable number of studies involving livestock species. The aims of this review are to describe the principal effects of natural/artificial selection on livestock genomes, to present the main methods used to detect selection signatures and to discuss some recent results in this area. This review should be useful also to research scientists working with wild animals/non-domesticated species and plant biologists working with breeding and evolutionary biology.

Keywords: artificial selection; domestic animals; selective sweep

Year: 2014 PMID: 25071397 PMCID: PMC4094609 DOI: 10.1590/s1415-47572014000300004

Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771

Introduction

Selection tends to cause specific changes in the patterns of variation among selected loci and in neutral loci linked to them. These genomic footprints left by selection are known as selection signatures and can be used to identify loci subjected to selection (Kreitman, 2000). The recent availability of genomic information on domestic animal species and the development of improved statistical tools make the identification of these footprints in a given species possible (International Chicken Genome Sequencing Consortium, 2004; The Bovine Genome Sequencing and Analysis Consortium, 2009; The International Sheep Genomics Consortium, 2010; Groenen ; Dong ). The identification of selection signatures is currently one of the principal interests of evolutionary geneticists because it can provide information ranging from basic knowledge about the evolutionary processes that are shaping genomes to functional information about genes/genomic regions (Nielsen, 2001, 2005; Schlötterer, 2003). For example, if a region that was not previously identified as contributing to any special trait in mapping experiments is targeted by selection in a specific population, then this information could lead to an initial inference about the functional characteristics of that region. This approach could also lead to the identification of genes related to ecological traits (e.g., genes related to tropical adaptation) that are difficult to identify through laboratory experiments and may also be useful in corroborating quantitative trait loci (QTL) mapping experiments in production animals. The final and certainly most ambitious aim of these studies is to identify the causal mutations that confer a selective advantage in a specific population or species (Nielsen, 2001; Schlötterer, 2003; Hayes ). 
Domestication greatly changed the morphological and behavioral characteristics of modern domestic animals and, along with breed formation and selection schemes for improving the production of specific products or achieving a morphological/behavioral standard, allowed the formation of very diverse modern breeds (Diamond, 2002; Toro and Mäki-Tanila, 2007; Flori ). These features, along with extensive knowledge about genomic regions that affect economically important traits and recent advances in the field of genomics, provide an excellent opportunity for identifying loci subjected to selection and for the validation of new methods developed to detect selection signatures (Hayes ; Flori ). In this review, we describe the effects of natural/artificial selection on genomes, summarize the main methods of detecting the footprints of selection and, finally, indicate and discuss studies aimed at detecting selection signatures in livestock.

Natural and Artificial Selection

Natural selection is a phenomenon driven by the environment in which individuals with specific genotypes have a differential capacity for contributing to the next generation’s gene pool (Falconer and Mackay, 1996; Templeton, 2006; Driscoll ). Natural selection could basically act in three ways: positive selection, purifying selection (also known as negative or background selection) and balancing selection. Each form of selection is a response to environmental pressure and acts differentially to alter the allelic and genotypic frequencies (Harris and Meyer, 2006; Oleksyk ). Positive selection occurs when a newly arisen mutation has a selective advantage over other mutations and, therefore, increases in frequency in the population (Kaplan ). In purifying selection, the disadvantageous variants that appear in the population tend to be removed, thereby maintaining the functional integrity of DNA sequences (Charlesworth ). Balancing selection occurs when polymorphism is favored, leading to increased genetic variability. Several biological processes can be grouped in this type of selection, e.g., overdominant selection (in which the heterozygote has a selective advantage), frequency dependent selection (in which different alleles are favored at different time intervals) and temporally or spatially heterogeneous selection (Charlesworth, 2006). In contrast to natural selection, artificial selection (also called selective breeding) is a human-mediated process in which the gene pool of the next generation does not depend exclusively (or necessarily) on fitness components, but also on traits chosen by humans. Artificial selection can be classified as unconscious selection or methodical selection – the former occurs when there is no long-term objective, and this has been suggested as the cause of the early domestication process. The second occurs when a standard or objective drives the choice of parents for the next generation. 
Despite these differences and considering that the time frame in which these changes occur is often considerably different, the genetic consequences of natural and artificial selection are essentially the same (Avise and Ayala, 2009; Driscoll ; Gregory, 2009).

Selection Signatures

The occurrence of selection creates departures from the neutral theory expectations in the patterns of molecular variation. Each form of selection causes specific changes in the selected loci and in neutral loci linked to them (Kreitman, 2000). When positive selection operates in a newly arisen allele that has a selective advantage it tends to increase in frequency in the population and carries linked neutral alleles along with it. This phenomenon is known as the hitchhiking effect or selective sweep (Maynard-Smith and Haigh, 1974; Charlesworth, 2007). The selective sweep reduces the heterozygosity of regions surrounding the selected locus (Kaplan ; Kim and Stephan, 2002) and introduces a skew in the site frequency spectrum (SFS) because of an excess of rare variants in the selected region (Braverman ; Kim and Stephan, 2002). An increase in the average linkage disequilibrium (LD) leading to long haplotypes is also expected in the region surrounding the selected site (Kim and Stephan, 2002). As LD decays and high frequency neutral alleles become fixed in the population after fixation of the selected mutation, this selection signature vanishes rapidly (Przeworski, 2002; Kim and Nielsen, 2004; McVean, 2007). Thus, a high frequency derived allele surrounded by a long-range LD is indicative of a recent selective sweep (Sabeti ; Voight ). In addition, the levels of within-population diversity tend to decrease while the between-population levels of diversity tend to increase in the region surrounding the selected locus (Beaumont, 2005; Storz, 2005). Furthermore, the number of nonsynonymous substitutions per nonsynonymous site (dN) tends to be higher than the number of synonymous substitutions per synonymous site (dS) (Nei, 2005; Harris and Meyer, 2006). The model of selective sweep in which a newly arisen allele with a strong selective advantage increases quickly in frequency until reaching fixation is known as “hard sweep”. 
In contrast, when the selected allele is part of existent genetic variation, it causes a “soft sweep” in which the footprint left by selection tends to be less pronounced and the frequency of the selected allele at the beginning of the selected phase is the crucial factor influencing the selective sweep (Przeworski ; Pritchard ). Balancing selection favors the maintenance of polymorphism (Harris and Meyer, 2006; Oleksyk ). The persistence of the same alleles for a long time is known as long-term balancing selection and, in addition to maintaining polymorphism in the selected locus, it also tends to increase diversity in tightly linked neutral sites; if the region under selection has low recombination rates then it generally also has longer coalescence times than other regions (Charlesworth, 2006). In the presence of long-term balancing selection, the within-population diversity levels tend to increase and the between-population levels of diversity tend to decrease (Navarro and Barton 2002; Charlesworth ; Charlesworth, 2006), leading to reduced inbreeding coefficient (FST) values among populations compared to neutral expectations (Beaumont, 2005; Storz, 2005). However, in some cases, the FST levels may be higher than expected by neutrality (Beaumont, 2005; Charlesworth, 2006). When negative (background) selection occurs, the novel variants are disadvantageous and are consequently removed from the population, along with neutral variations linked to them (Innan and Stephan, 2003). If the recombination rate in the region is restricted or the population is highly inbred then background selection reduces the variability around the eliminated sites (Charlesworth , 1995; Andolfatto, 2001; Stephan, 2010). 
An excess of low frequency alleles is also observed in small to moderately sized populations (Charlesworth , 1995) and the number of nonsynonymous substitutions per nonsynonymous site tends to be lower than the number of synonymous substitutions per synonymous site (Nei, 2005; Harris and Meyer, 2006). However, in regions with normal recombination rates, or when inbreeding is restricted, no reduction in variability is observed (Charlesworth , 1995; Stephan, 2010). Furthermore, background selection does not cause a marked bias in the frequency spectrum (Charlesworth , 1995; Kim and Stephan, 2000; Andolfatto, 2001; Stephan, 2010). Selection signatures can be influenced by several factors, including the type of selection, the relative age of the neutral linked alleles, the strength of selection and the recombination rate (Braverman ; Kaplan ; Kim and Stephan, 2002; Charlesworth, 2007; McVean, 2007). Recognition of the molecular footprints left by different types of selection is a crucial task in identifying genomic regions subjected to selection. In this case, the neutral theory serves as the backbone for the statistical tests developed to detect selection signatures. However, in natural populations, some assumptions of the neutral theory can be violated (e.g., population expansion, subdivision and bottlenecking) and this can lead to signals that mimic the footprints of selection. The interaction of different types of selection and interaction between selection and demographic factors can bias the footprints left in the genome (Barton, 1998; Kim and Stephan, 2000; Kreitman, 2000; Charlesworth ; Harris and Meyer, 2006; Toro and Mäki-Tanila, 2007). Because of this, it is worth noting that in studies designed to detect selection signatures in livestock a considerably high rate of false positives is expected as a result of genetic drift and founder effect, both of which were particularly important during the development of livestock breeds (Petersen ).

Methods for Detecting Selected Loci

The methods proposed for detecting selected loci can be classified in different ways (Harris and Meyer, 2006; Oleksyk ). Based on the main variables that affect the patterns of molecular variation left by selection, Hohenlohe proposed a decision tree designed to identify the most appropriate method for each case. This decision tree is based primarily on the time scale in which selection can occur, but also considers other factors (e.g., the number of populations in the study, mode of selection, etc.) and can be used by researchers in studies designed to detect selection signatures.

Tests based on synonymous and non-synonymous substitution rates

When the coding sequences of orthologous genes of interest are compared, it is expected that under neutral evolution, dN/dS = 1. When positive selection is in effect, dN/dS > 1, and under negative selection, dN/dS < 1. Differences in dN/dS are also expected among lineages when selection is in effect (Yang, 1998). Several methods have been proposed to estimate dN and dS (Nei, 2005). These methods were initially approximations based on the comparison of two sequences (Nei and Gojobori, 1986). More recently, maximum likelihood estimates from multiple alignments that account for transition/transversion rate bias, codon usage bias, selective restraints at the protein level (Goldman and Yang, 1994), and variable dN/dS among sites and among lineages have been proposed (Nielsen and Yang, 1998; Yang ; Yang, 2002; Yang and Nielsen, 2002; O’Brien ). Hypothesis testing can be done using a likelihood ratio test that compares the null model (assuming neutrality) with alternative models (Yang, 1998; Yang and Nielsen, 1998; Yang , 2005). Packages such as MEGA (Tamura ) and PAML (Yang, 2007) implement the dN/dS selection tests.
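
As a rough illustration of the likelihood ratio test mentioned above, the following Python sketch computes the test statistic from the log-likelihoods of a neutral and an alternative model and converts it to a p-value. The function names, the log-likelihood values and the choice of two degrees of freedom are illustrative assumptions and are not taken from MEGA or PAML; in practice the degrees of freedom depend on the difference in the number of free parameters between the two models being compared.

```python
from math import exp


def omega(dn, ds):
    """Return the dN/dS ratio; values above 1 suggest positive selection,
    values below 1 suggest negative (purifying) selection."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test assuming 2 degrees of freedom (an assumption).

    For a chi-square distribution with 2 degrees of freedom the upper-tail
    probability has the closed form exp(-x / 2), so no external library
    is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for a neutral and a selection model.
stat, p_value = lrt_two_df(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"omega = {omega(0.6, 0.3):.2f}, 2*delta lnL = {stat:.1f}, p = {p_value:.4f}")
```

A point estimate of dN/dS above 1 is not by itself evidence of positive selection; the comparison of nested models is what provides the test.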

Tests based on the frequency spectrum

The θ parameter can be estimated from DNA sequences in several ways, and comparison of the different estimates of θ is the basis for some tests aimed at identifying selected regions (Tajima, 1989; Fu and Li, 1993; Fu, 1996, 1997). Tajima (1989) proposed a test based on the difference between θ̂π (estimated from the average number of nucleotide differences) and θ̂S (estimated from the number of segregating sites along the DNA sequence) because the presence of selection tends to alter the value of θ̂π while that of θ̂S tends to remain unaffected (Tajima, 1989; Hartl and Clark, 2010). The proposed statistic (Tajima’s D) corresponds to the standardized difference between θ̂π and θ̂S (Tajima, 1989; Harris and Meyer, 2006). Under neutrality, the value of D tends to be zero. Positive and negative selection tend to reduce heterozygosity and cause an excess of rare variants surrounding the selected locus, leading to D < 0 (Kaplan ; Tajima, 1989; Charlesworth , 1995; Braverman ; Andolfatto, 2001; Kim and Stephan, 2002; Stephan, 2010). In contrast, long-term balancing selection increases the diversity around the selected locus, leading to D > 0 (Tajima, 1989; Navarro and Barton, 2002; Charlesworth, 2006). Several other tests for detecting selection based on the excess of rare alleles have been developed (Fu and Li, 1993; Fu, 1996, 1997). However, the results of these tests do not always have a straightforward biological interpretation because in some situations it is impossible to differentiate between positive and negative selection (Tajima, 1989; Harris and Meyer, 2006), and also because these tests are sensitive to demography (Tajima, 1989; Charlesworth ; Fu and Li, 1993; Fu, 1996, 1997). While a reduction in heterozygosity and an excess of rare variants are not necessarily a specific pattern left by selection, an excess of derived variants (non-ancestral allele determined by an outgroup) has been identified as a unique feature produced by positive selection (Fay and Wu, 2000). 
To assess this feature, Fay and Wu (2000) proposed a statistic called Fay and Wu’s H that is calculated as the difference between θ̂π and θ̂H (where θ̂H is an estimator of θ weighted by the homozygosity of the derived alleles). When positive (but not negative) selection acts, the value of θ̂H tends to increase because of an excess of derived alleles, leading to H < 0. Thus, in contrast with Tajima’s and Fu and Li’s statistics, Fay and Wu’s H allows the distinction between positive and negative selection (Fay and Wu, 2000). The decrease in variability caused by positive selection tends to be broken by recombination events. Consequently, “valleys” of reduced heterozygosity have been suggested to be footprints of recent hitchhiking events. The depth and extent of the “valleys” is influenced by several factors, such as the strength of selection, recombination rates and effective population size. Because of this, Kim and Stephan (2002) proposed a composite likelihood approach for detecting positive selection in a recombining chromosome. The test is based on the expected number of sites where the derived allele is part of a given frequency interval in the population. More recently, extensions of these tests based on the frequency spectrum around a selective sweep have been proposed. These new methods can deal with genomic data and account for the ascertainment bias (Nielsen ; Kelley ; Williamson ).
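
To make the frequency spectrum statistics in this section more concrete, the Python sketch below computes Tajima’s D and an unnormalized Fay and Wu’s H from the derived allele count at each segregating site in a sample of n chromosomes. The function names and the toy data are assumptions made for illustration, the input is assumed to be already polarized with an outgroup, and no significance test (which would normally rely on coalescent simulations) is attempted.

```python
from math import sqrt


def theta_pi(derived_counts, n):
    """Average number of pairwise differences over all segregating sites."""
    pairs = n * (n - 1) / 2.0
    return sum(k * (n - k) for k in derived_counts) / pairs


def theta_h(derived_counts, n):
    """Estimator weighted by the homozygosity of the derived alleles."""
    pairs = n * (n - 1) / 2.0
    return sum(k * k for k in derived_counts) / pairs


def tajimas_d(derived_counts, n):
    """Standardized difference between theta_pi and S / a1 (Tajima, 1989)."""
    s = len(derived_counts)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (theta_pi(derived_counts, n) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def fay_wu_h(derived_counts, n):
    """Unnormalized H: negative when high frequency derived alleles are in excess."""
    return theta_pi(derived_counts, n) - theta_h(derived_counts, n)


# Toy samples of n = 6 chromosomes (hypothetical data).
print(tajimas_d([1, 1, 1, 1], 6))  # only singletons: excess of rare variants
print(tajimas_d([3, 3, 3, 3], 6))  # intermediate frequencies
print(fay_wu_h([5, 5, 5], 6))      # derived alleles close to fixation
```

Note that Tajima’s D gives the same value whether an allele is counted k or n − k times, whereas H depends on knowing which allele is derived, which is why an outgroup is required for the latter.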

Tests based on linkage disequilibrium

Exploitation of the LD patterns is the focus of several tests for detecting selection (Sabeti , 2007; Kim and Nielsen 2004; Voight ; Kimura ). However, these signatures tend to be transient since the recombination tends to quickly break down this LD as soon as the selected locus reaches fixation (Przeworski, 2002; Kim and Nielsen, 2004; McVean, 2007). Sabeti proposed an approach referred to as the long-range haplotype (LRH) test to detect recent selective sweeps by focusing on the relationship between the allele frequency and the LD level surrounding it. This test starts with identification of the core haplotypes (through genotyping a set of single nucleotide polymorphisms (SNPs) in a region so small that recombination may not occur). Subsequently, other SNPs at increasing distances from the core haplotypes are analyzed to evaluate the decay of LD according to distance (Sabeti ). The LD is measured at increasing distances from the core haplotypes through calculation of the extended haplotype homozygosity (EHH), which is the probability that two chromosomes carrying a specific core haplotype are homozygous for the whole region from the core to a distance x (Sabeti ). The relative EHH (REHH) is then calculated to compare the decay of EHH of one specific core haplotype to the decay of EHH of all the other core haplotypes combined. To test for selection, REHH and the frequency for each core haplotype is compared to REHH and the frequency of the other core haplotypes. Positive selection is inferred if one core haplotype has a combination of high REHH and high frequency in the population (Sabeti ). An extension of the LRH test was proposed by Voight . This test is referred to as the iHS (integrated haplotype score) and was designed to work on a genomic scale using information from dense SNP chips. The iHS value can be defined simply as a measure of how unusual the haplotypes around an SNP are, compared to the genome (Voight ). 
In this approach, each SNP is treated as a core SNP and the test starts with calculation of the EHH for each core SNP. As SNPs are biallelic loci, each core SNP can be ancestral or derived. For the test, the integral of the observed decay of EHH from a core SNP until EHH reaches 0.05 is computed (the area under the curve in an EHH vs. distance plot). This value is referred to as the integrated EHH (iHH) and is identified as iHH_A or iHH_D, depending on whether it was computed from the ancestral or the derived allele of the core SNP. This value is then standardized to allow direct comparisons among different SNPs regardless of allele frequencies (Voight ). Hussin proposed a method based on the haplotype allelic classes (HAC). This measure can be defined as the count of allelic differences between the reference allelic class and the individual haplotypes in the sample. The statistic proposed is referred to as Svd, with positive values suggesting positive selection (Hussin ). The LRH and iHS tests rely on the frequencies of alleles at core SNP and therefore have reduced power for detecting selection when the selected allele has reached fixation. To deal with situations in which the selected allele is fixed in one population but remains polymorphic in others, LRH-derived tests based on pairwise comparisons among populations have been proposed (Kimura ; Sabeti ; Tang ). The XP-EHH statistic can be defined as the normalized log-ratio between I_A and I_B, where I_A is the integral of the observed decay of EHH from a core SNP to an SNP X (which has an EHH value as close as possible to 0.04 in both populations) in population A, and I_B is the analogous measure in population B (Sabeti ). The ln(Rsb) statistic proposed by Tang is very similar to XP-EHH. The main difference between them is that the former calculates the EHH based on the status of each core SNP allele and the latter calculates the EHH based on the core SNP site (Sabeti ; Tang ).
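
The haplotype-based quantities described in this section can be sketched in a few lines of Python. The example below computes EHH for the chromosomes carrying a given allele at a core SNP, integrates its decay in both directions to obtain an integrated EHH, and forms the unstandardized log-ratio between the ancestral and derived values. The function names, the coding of haplotypes as strings with '0' for ancestral and '1' for derived alleles, and the toy data are assumptions for illustration. The integration simply stops at the last marker with EHH at or above the cutoff, which is a simplification, and the genome-wide standardization by allele frequency class is not included.

```python
from collections import Counter
from math import log


def ehh(haplotypes, core, allele, idx):
    """Probability that two chromosomes carrying `allele` at the core SNP
    are identical at every marker between the core and marker `idx`."""
    lo, hi = min(core, idx), max(core, idx)
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(carriers).values())
    return identical_pairs / (n * (n - 1) // 2)


def ihh(haplotypes, positions, core, allele, cutoff=0.05):
    """Area under the EHH decay curve on both sides of the core SNP,
    using the trapezoid rule and stopping once EHH falls below `cutoff`."""
    area = 0.0
    for step in (-1, 1):
        prev_idx, prev_ehh = core, 1.0
        idx = core + step
        while 0 <= idx < len(positions):
            cur = ehh(haplotypes, core, allele, idx)
            if cur < cutoff:
                break
            width = abs(positions[idx] - positions[prev_idx])
            area += 0.5 * (prev_ehh + cur) * width
            prev_idx, prev_ehh = idx, cur
            idx += step
    return area


def unstandardized_ihs(haplotypes, positions, core):
    """ln(iHH_ancestral / iHH_derived); strongly negative values point to
    unusually long haplotypes on the derived allele."""
    ihh_a = ihh(haplotypes, positions, core, "0")
    ihh_d = ihh(haplotypes, positions, core, "1")
    if ihh_a == 0.0 or ihh_d == 0.0:
        raise ValueError("iHH is zero for one allele; the ratio is undefined")
    return log(ihh_a / ihh_d)


# Hypothetical sample: derived carriers share one long haplotype.
haps = ["00100", "00100", "00100", "10001", "01010", "11011"]
pos = [0, 1, 2, 3, 4]
print(unstandardized_ihs(haps, pos, core=2))
```

A cross-population comparison in the spirit of XP-EHH would apply the same integration to one core SNP in two populations and take the log-ratio of the two integrals, again followed by normalization.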

Tests based on population differentiation

The estimation of FST from multiple loci and comparison of these values with its neutral expectations is the basis of several tests aimed at identifying selection (Lewontin and Krakauer, 1973; Bowcock ; Vitalis , 2003; Beaumont and Balding, 2004; Foll and Gaggiotti, 2008; Excoffier ; Bonhomme ). The first effort in this direction was proposed by Lewontin and Krakauer (1973). They suggested that the FST estimated from several loci under neutrality must show small heterogeneity; however, if selection is acting on some of them then the estimates of FST tend to vary widely. The Lewontin and Krakauer test involves comparison between the variance of FST estimated from the data and the expected variance of FST under neutrality through a variance ratio test (Lewontin and Krakauer, 1973). Lewontin and Krakauer’s test was severely criticized soon after publication because of the assumptions they made in estimating the variance of FST under neutrality (Nei and Maruyama, 1975; Robertson, 1975). To avoid the effects of population structure, Bowcock suggested the use of a null distribution obtained by calculating an FST distribution using simulations that take into account the populations’ phylogenetic history. More recently, models capable of generating the null distribution of FST that are robust to population history and structure (recent divergence and growth, isolation by distance and heterogeneous levels of gene flow between populations) have been proposed (Beaumont and Nichols, 1996; Beaumont and Balding, 2004; Foll and Gaggiotti, 2008; Excoffier ) and implemented in freely distributed software packages such as BayesFST (Beaumont and Balding, 2004), BayeScan (Foll and Gaggiotti, 2008) and Arlequin (Excoffier ). The methods proposed by Beaumont and Nichols (1996) and Excoffier are computationally feasible, but the presence of some complex demographic histories can lead to important biases. 
On the other hand, Markov chain Monte Carlo (MCMC) based methods (Beaumont and Balding, 2004; Foll and Gaggiotti, 2008) efficiently accommodate some departures from model assumptions but are computationally very intensive. Another way to avoid the effects of demography is to perform pairwise comparisons between populations (Tsakas and Krimbas, 1976). Based on this idea, Vitalis proposed a simple model of population divergence from which they obtained the joint distribution of population-specific estimators of branch length which were used to construct the confidence interval. This approach seems to be robust against departures from model assumptions and also tends to remove the bias introduced by unknown population structure. However, the pairwise comparison tends to reduce the power of the test because information from other populations is discarded (Tsakas and Krimbas, 1976; Vitalis ). This analysis is implemented in the software DetSel 1.0 (Vitalis ). The foregoing discussion has shown that there are currently several approaches for detecting footprints left by selection. Each of these approaches can capture specific patterns of molecular variation. The use of a combination of alternative approaches for detecting selection signals is an interesting strategy that has been suggested as a means of increasing the reliability of these studies. However, the success of one test and failure of another does not exclude the region of interest from having been subjected to selection since different tests can focus on different signals left by selection or look for different time scales in which the selection can act (Hohenlohe ; Oleksyk ).
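
Returning to the population differentiation tests discussed in this section, the following Python sketch computes a simple per-locus FST from allele frequencies in several populations and the variance ratio that underlies the Lewontin and Krakauer comparison. The function names and the input values are hypothetical, populations are weighted equally, sampling error is ignored, and the expected variance is assumed to follow the commonly quoted approximation k × (mean FST)² / (n − 1) with k = 2 and n populations. It is a didactic simplification and not a substitute for the simulation-based or Bayesian methods cited above.

```python
from statistics import mean, pvariance, variance


def wright_fst(freqs):
    """FST at one biallelic locus: variance of the allele frequency among
    populations divided by p_bar * (1 - p_bar)."""
    p_bar = mean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # locus is monomorphic across all populations
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def lk_variance_ratio(fst_values, n_pops, k=2.0):
    """Observed variance of FST across loci divided by the variance expected
    under neutrality (assumed here to be k * mean^2 / (n_pops - 1))."""
    f_bar = mean(fst_values)
    if f_bar == 0.0 or n_pops < 2:
        raise ValueError("need a positive mean FST and at least 2 populations")
    expected = k * f_bar ** 2 / (n_pops - 1)
    return variance(fst_values) / expected


# Hypothetical allele frequencies: one row per locus, one column per population.
loci = [[0.10, 0.20, 0.15], [0.50, 0.45, 0.55], [0.05, 0.90, 0.40]]
fst = [wright_fst(freqs) for freqs in loci]
print(fst, lk_variance_ratio(fst, n_pops=3))
```

A ratio well above 1 indicates more heterogeneity in FST among loci than this neutral approximation predicts, although, as noted above, demographic history can produce the same pattern.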

Selection signatures in livestock

Domestication has resulted in considerable changes in the morphology and behavior of livestock species. In the early stages of domestication, unconscious selection for behavioral traits was applied. This early stage was followed by methodical selection in which specific traits were selected based on goals (Diamond, 2002; Gregory, 2009). The development of specialized breeds, improved to produce specific products or to reach a morphological standard, increased the differences between domesticated animals and their wild relatives and also generated an enormous variety of different populations, with specific traits related to their specialization. Some of these traits are controlled by several interacting genes with minor effects. This creates an exceptional opportunity to gain knowledge of the molecular basis of these traits, particularly since most economically important traits in livestock are quantitative (Andersson and Georges, 2004). The identification of genes targeted by selection in livestock can help to find and prove causal mutations in regions previously identified by QTL mapping experiments and can reveal genes related to ecological traits (e.g., genes related to tropical adaptation) that are difficult to find experimentally. Furthermore, these studies can help to identify the genes or gene networks that contribute to the same trait but that were selected differentially between breeds; they can also unveil genes responsible for genetic correlations and the domestication process (Schlötterer, 2003; Hayes ; Ojeda ; Flori ; MacEachern ).

Signatures associated with domestication and early breed development

In some wild species, the expression both of eumelanin and phaeomelanin pigments is related to a camouflaged coat color. During domestication, non-camouflaged coat patterns were selected because of their direct effect on animal husbandry and also because these patterns may have been used as markers associated with improved individuals, or because of cultural preferences (Fang ; Wiener and Wilkinson, 2011). The melanocyte stimulating hormone receptor gene (MC1R) influences the production of eumelanin and phaeomelanin pigments (Werth ; Kijas ; Fang ; Li ) and is under selection in domestic cattle (Flori ; Stella ) and pig (Fang ; Li ; Amaral ) breeds. Other genes that influence coat color pattern were also suggested to be under selection in domestic species. Selection signatures around the V-Kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog (KIT) have been reported for cattle (Stella ; Wiener ), pigs (Fontanesi ; Amaral ) and sheep (Kijas ). The melanocyte protein 17 precursor (PMEL17), also known as the Silver gene (SILV), is suggested to be under selection in some cattle breeds (Gautier ; Wiener ). The presence/absence of horns is another important feature in breed definition in some livestock species. Recently, the relaxin-like receptor 2 (RXFP2) gene was associated with this trait (Johnston ), and a SNP surrounding this gene showed a strong selection signal in an analysis involving 74 sheep breeds. In cattle, the region surrounding the polled locus was shown to be under selection, although the gene responsible for this trait was not mapped (Drögemüller ; Li ; Stella ). Behavioral changes, such as a reduction in fear and anti-predator responses and an increase in sociability, are believed to be important reflections of animal domestication (Diamond, 2002; Amaral ; Wiener and Wilkinson, 2011). 
Indeed, several studies in livestock suggest selection signatures surrounding genes related to nervous system development and function (The Bovine HapMap Consortium, 2009; Gautier ; Stella ; Amaral ).

Cattle

Modern bovine breeds can basically be grouped into two major types, the taurine and indicine groups. Within each group, several breeds have been developed, and there is considerable intra- and inter-group variability in productive (milk yield and quality, meat production), morphological (coat color, presence/absence of horns) and adaptive (disease resistance, heat tolerance) traits (The Bovine HapMap Consortium, 2009). Several genome-wide studies focusing on different approaches and using different sets of breeds have searched for selection signatures in bovines (Prasad ; Barendse ; Flori ; Gautier ; Hayes ; MacEachern ; The Bovine HapMap Consortium, 2009; Li ; Qanbari , 2011; Stella ; Wiener ; Hosokawa ). Various studies in beef cattle using approaches such as differences in allele frequencies, iHS and FST have found selection signals in the centromeric region of BTA14 (Hayes ; The Bovine HapMap Consortium, 2009; Wiener ), a region involved in the control of marbling and fatness traits (Barendse, 1999; Moore ; Thaller ; Casas ; Pannier ; Veneroni ). An increase in intramuscular fat percentage in Australian Angus in recent years, together with a significant effect of this region on fat traits, may corroborate the selection signature found in these studies (Hayes ). The double muscled phenotype has been selected in some beef breeds and mutations in the Growth Differentiation Factor 8 (also known as myostatin or GDF-8) gene are related to this phenotype (Bellinge ). A decrease in heterozygosity around this gene has been demonstrated in double muscled breeds (Wiener ; Wiener and Gutiérrez-Gil, 2009) and an increase in LD (measured using the iHS approach) has been reported in this region (The Bovine HapMap Consortium, 2009). Using the FST approach, a selection signature was found in the median region of BTA2 (Barendse ; The Bovine HapMap Consortium, 2009; Qanbari ). 
This region was associated with feed efficiency and intramuscular fat in beef breeds (Barendse , 2009) and contains the R3H Domain Containing 1 (R3HDM1) and Zinc Finger, RAN Binding Domain Containing 3 (ZRANB3) genes, which have been suggested to be involved in feed efficiency (Barendse ; The Bovine HapMap Consortium, 2009). Chromosome BTA6 harbors at least three QTLs that affect milk traits (Khatkar ; Ogorevc ; Weikard ) and these regions have been suggested to be under selection in dairy breeds (Hayes ; Barendse ; The Bovine HapMap Consortium, 2009; Qanbari ; Schwarzenbacher ). The first region contains the ATP-binding cassette, Sub-family G (WHITE), Member 2 (ABCG2) gene that was previously related to milk yield and quality traits (Cohen-Zinder ; Olsen ; Cole ; Weikard ). The second region contains the Peroxisome Proliferator-Activated Receptor Gamma, Coactivator 1 Alpha (PPARGC1A) gene that mediates the expression of genes involved in adipogenesis, gluconeogenesis and oxidative metabolism (Weikard ; Ogorevc ) and the third region contains the Casein Cluster associated with milk and protein yield (Boettcher ; Nilsen ; Sodeland ). The increase in allele frequency differences between meat and dairy cattle and the high linkage disequilibrium in dairy breeds (using EHH and iHS methods) suggest that the region surrounding DGAT1 is under selection (Hayes ; Qanbari ; Hosokawa ; Schwarzenbacher ). This gene is suggested to be responsible for a QTL with a major effect on milk fat percentage (Grisart ; Khatkar ; Cole ; Hayes ; Jiang ). At least two QTLs affecting milk traits are located in the BTA20 chromosome. The first QTL was mapped surrounding the Growth Hormone Receptor Gene (GHR) and has a marked effect on protein percentage and a minor effect on fat percentage and milk yield, while the second overlaps the Prolactin Receptor (PRLR) and affects protein and fat yield (Blott ; Khatkar ; Schnabel ; Viitala ; Cole ; Ogorevc ; Jiang ). 
These regions have been reported to be under selection (Flori ; Hayes ; The Bovine HapMap Consortium, 2009; Qanbari , 2011; Stella ; Wiener ). Some studies have shown the presence of QTLs affecting milk fat and protein traits in the region surrounding the Signal Transducer and Activator of Transcription 1 (STAT1) gene. This gene has been implicated in mammary gland development and is associated with milk, fat and protein yield in Holstein cattle (Cobanoglu ). Two studies comparing allele frequency differences between beef and dairy cattle suggested a selection signal in the region surrounding this gene (Hayes ; Hosokawa ). The region surrounding the Sialic Acid Binding Ig-Like Lectin 5 (SIGLEC-5) and Zinc Finger Protein 577 (ZNF577) genes was shown to be associated with Net Merit and several related traits, such as conformation, longevity and calving ease in Holstein cattle (Cole ). Based on findings using the iHS approach, this region was suggested to be under selection in Holstein cattle and, although these traits were not the main objective of breeding improvement programs, weak selection against unfavorable alleles may be responsible for this signature (Qanbari ). Several other regions have been suggested to be under selection in cattle, but for most of them no specific gene can be proposed as the target of selection. Functional analysis of these regions reveals the presence of genes involved in the gonadotropic and somatotropic axes, muscle development, growth, nervous system development and immune response (Barendse ; Flori ; Gautier ; The Bovine HapMap Consortium, 2009; Qanbari , 2011; Stella ; Wiener ).
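
Several of the cattle signals described above rest on LD-based statistics (EHH and iHS), which measure how slowly haplotype identity decays away from a core SNP. As a rough illustration only, and not the procedure used in any of the studies cited, the following Python sketch computes extended haplotype homozygosity on one side of a core SNP from phased haplotypes. The function name `ehh`, the 0/1 string encoding and the toy haplotypes are assumptions made for this example.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, core_allele):
    """EHH to the right of a core SNP, for carriers of one core allele.

    haplotypes  : list of equal-length strings of '0'/'1' (phased; hypothetical encoding)
    core        : index of the core SNP
    core_allele : allele at the core SNP that defines the carrier haplotypes
    Returns one value per marker from the core outwards: the fraction of
    carrier pairs that are identical over the whole interval core..marker.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    total_pairs = comb(n, 2)
    values = []
    for end in range(core, len(carriers[0])):
        # Group carriers by their extended haplotype from the core to `end`.
        groups = Counter(h[core:end + 1] for h in carriers)
        identical_pairs = sum(comb(count, 2) for count in groups.values())
        values.append(identical_pairs / total_pairs)
    return values


# Toy data (made up): four carriers of allele '1' at the core, one non-carrier.
toy = ["1000", "1000", "1001", "1100", "0000"]
decay = ehh(toy, core=0, core_allele="1")
```

In this sketch a slow decay of the returned values for one core allele, compared with the alternative allele, is the pattern that iHS-type tests look for; real analyses also integrate over genetic distance and standardise by allele frequency, which is omitted here.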

Pigs

Pig domestication occurred independently multiple times in diverse locations across Eurasia approximately 9000 years ago (Larson ). Domestic pigs are found in a wide range of environments and show extensive variation in morphological, behavioral and ecological characteristics (Larson ; Chen ). The use of this species in very different production systems and environmental conditions around the globe has resulted in an enormous variety of breeds, each one harboring adaptations to special conditions. Currently, most pig production systems are based on five breeds (Large White, Duroc, Landrace, Hampshire and Pietrain) that have been subjected to intense artificial selection focused on productivity traits. Moreover, there is a considerable number of related species and wild individuals that can be used to infer some aspects of selection (Chen ). Increased muscle mass and decreased fat content have been subject to strong selective pressure in commercial pig populations and are related to a substitution in intron 3 of the Insulin-Like Growth Factor 2 (IGF2) gene (Van Laere ). Using Tajima’s D, Ojeda identified a selection signature in the IGF2 gene in three breeds (Pietrain, Hampshire and Duroc) that are commonly used as sire lines and have been selected for growth and meat leanness. The Melanocortin 4 Receptor (MC4R) gene, which is related to growth and fatness traits, has also been suggested to be under selection in pigs (Rubin ; Onteru ). An intronic substitution in the Estrogen Receptor (ESR) gene has been associated with litter size in pigs (Rothschild ; Short ). Although some studies have reported divergent results (Muñoz ), this marker has been used by the pig breeding industry in Marker Assisted Selection (Dekkers, 2004). Recently, Bonhomme suggested that this gene is under selection in the Large White breed. 
Functional analysis of regions under positive selection in pig breeds has identified genes involved in development of the nervous system and muscle, growth, pigmentation, metabolism, visual/odor perception, immune and inflammatory responses and reproduction (Amaral ; Rubin ; Esteve-Codina ).
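
The IGF2 result mentioned above was obtained with Tajima's D, a frequency-spectrum statistic that contrasts the mean number of pairwise differences with the number of segregating sites. The Python sketch below follows the standard formula for D and is intended only as an illustration; the function name `tajimas_d`, the string encoding of sequences and the toy alignments are assumptions, and nothing here reproduces the data or pipeline of the cited study.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (strings)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are needed")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        return 0.0  # convention chosen for this sketch: no variation, no signal

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)

    # Constants of the standard variance formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments (made up). Only rare variants, as expected after a sweep:
rare_variants = ["0000", "1000", "0100", "0010", "0001"]
# Two divergent haplotype classes at intermediate frequency:
intermediate = ["00", "00", "11", "11"]

d_rare = tajimas_d(rare_variants)
d_intermediate = tajimas_d(intermediate)
```

In this sketch an excess of rare variants yields a negative D, the direction usually interpreted as compatible with a recent selective sweep (or with population expansion), whereas intermediate-frequency variants yield a positive D.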

Sheep and goats

Sheep and goats were the first livestock species to be domesticated, approximately 9000 years ago. The wide distribution of these species is a reflection of their adaptability to different environments and this has resulted in enormous morphological variation among populations (Diamond, 2002; Gentry ; Naderi ; Chessa ; Kijas ). Since their domestication, sheep have been selected for meat, wool and milk production (Chessa ; Kijas ). Kijas performed a genome scan based on FST to detect selection signatures in a panel of 2819 individuals from 74 sheep breeds. Thirty-one regions showed selection signals and contained genes related to coat color, bone morphology, growth and reproduction traits. This analysis revealed a strong peak of differentiation surrounding the Growth Differentiation Factor 8 (GDF-8) gene when Texel individuals were compared with all other breeds (Kijas ). In addition, Clop showed a reduction in the variability of microsatellites surrounding this gene upon comparing hyper-muscled Texels with other sheep breeds. The region surrounding GDF-8 was associated with QTLs for carcass traits in the Texel breed (Johnson ) and a point mutation in the 3’ UTR of this gene was suggested to be the causal mutation affecting extreme muscling in Texel individuals (Clop ). Moradi performed a genome scan with approximately 50K SNPs to search for signatures of divergent selection in a comparison between fat-tailed and thin-tailed sheep breeds; their study identified at least three regions (on chromosomes OAR5, OAR7 and OARX) that have undergone selection. Interestingly, most of the regions identified by Moradi intersect with QTLs for carcass traits. Improvement in the sheep genome annotation will facilitate the search for and validation of candidate genes related to these traits.
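
The sheep scans described above look for peaks of population differentiation along the genome. A minimal Python sketch of that idea is given below, assuming biallelic SNPs, two populations of equal weight, and the simple heterozygosity-based definition FST = (HT - HS) / HT; the published studies may have used other estimators. The function names and the allele frequencies are hypothetical and serve only to show the mechanics.

```python
def fst_two_pops(p1, p2):
    """Per-SNP FST from the frequency of one allele in two populations."""
    # Mean within-population expected heterozygosity.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population (equal weights).
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    return (h_t - h_s) / h_t


def windowed_mean(values, size, step=1):
    """Average consecutive values in sliding windows of `size` SNPs."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, step)
    ]


# Made-up allele frequencies at six consecutive SNPs in a focal breed
# and in a pool of comparison breeds.
focal = [0.50, 0.55, 0.90, 0.95, 0.90, 0.50]
others = [0.50, 0.50, 0.20, 0.10, 0.20, 0.55]

per_snp = [fst_two_pops(a, b) for a, b in zip(focal, others)]
windows = windowed_mean(per_snp, size=3)
peak_window = windows.index(max(windows))  # index of the most differentiated window
```

Averaging over windows rather than reading single SNPs is a common way to reduce the noise of per-marker FST; window size and the threshold used to call a peak are choices that this sketch leaves open.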

Horses

Horse domestication appears to have occurred approximately 6000 years ago and was central to the development of human history. The major attraction for domestication of this species was probably its ability to run fast for long distances, but its value as a source of meat may also have been an important factor. The domestic horse shows marked variation in morphological traits, including shape, size, colour and gait (Bowling and Ruvinsky, 2000; Levine, 2005). Thoroughbred horses have been selected for athletic performance traits and this has led to individuals with extreme phenotypes related to anaerobic and aerobic metabolic capabilities. A genome scan aimed at identifying putative regions under selection in this breed (based on a combination of reduced heterozygosity and increased population differentiation) revealed the presence of genes related to phosphoinositide 3-kinase (PI3K) and insulin signalling pathways, oxidative stress, energy regulation, adipocyte differentiation and muscle regulation and development. These functions are directly related to the main focus of selection in this breed, namely, racetrack performance (Gu ). Among the genes suggested to be under selection in Thoroughbred horses, the Pyruvate Dehydrogenase Kinase, isozyme 4 (PDK4) gene has been associated with racing performance phenotypes (Hill ). Petersen identified a strong signal of differentiation around the myostatin (GDF-8) gene in a comparison of the American Paint Horse and Quarter Horse with other breeds. This gene was also associated with muscle fiber type proportions in these breeds. Another important trait for particular horse breeds is their ability to perform alternate gaits. Recently, it was shown that the Doublesex and Mab-3 Related Transcription Factor 3 (DMRT3) gene is involved in this trait in several breeds (Andersson ). 
In addition, the region encompassing this gene was suggested (based on population differentiation) to be under selection in several breeds that have been selected for alternative gaits (identified as a breed-defining characteristic) (Petersen ).
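
The Thoroughbred scan mentioned above combined reduced heterozygosity with increased population differentiation. The first of these two components can be sketched in Python as follows, assuming that expected heterozygosity (gene diversity with the usual n/(n-1) small-sample correction) is compared locus by locus between a target breed and a reference population. The function names, the 0.5 ratio threshold and the allele data are hypothetical choices for illustration and are not taken from the cited study.

```python
from collections import Counter


def gene_diversity(alleles):
    """Expected heterozygosity at one locus from a list of sampled allele copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    sum_sq_freq = sum((count / n) ** 2 for count in counts.values())
    # n / (n - 1) corrects for the finite number of sampled allele copies.
    return n / (n - 1) * (1 - sum_sq_freq)


def low_diversity_loci(target, reference, ratio=0.5):
    """Loci whose diversity in `target` is below `ratio` times that in `reference`.

    Both arguments map a locus name to a list of sampled allele copies.
    """
    flagged = []
    for locus, alleles in target.items():
        h_target = gene_diversity(alleles)
        h_reference = gene_diversity(reference[locus])
        if h_reference > 0 and h_target < ratio * h_reference:
            flagged.append(locus)
    return flagged


# Made-up allele samples (eight allele copies per locus and population).
target_breed = {
    "locus1": list("AAAAAAAB"),  # nearly fixed in the target breed
    "locus2": list("AABBCCDD"),
}
reference_pop = {
    "locus1": list("AABBCCDD"),
    "locus2": list("AABBCCDD"),
}

candidates = low_diversity_loci(target_breed, reference_pop)
```

A locus flagged in this way is only a candidate: low diversity can also result from drift or a low local mutation rate, which is why such scans are usually combined with a differentiation measure.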

Conclusions

Domestication and artificial selection processes have definitely shaped livestock genomes. The identification of candidate regions as being under selection can help researchers understand the molecular mechanisms involved in adaptation and may also be useful in identifying regions associated with important traits that are under selection.
