
Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era.

T Raudsepp, C J Finno, R R Bellone, J L Petersen.

Abstract

The horse reference genome from the Thoroughbred mare Twilight has been available for a decade and, together with advances in genomics technologies, has led to unparalleled developments in equine genomics. At the core of this progress is the continuing improvement of the quality, contiguity and completeness of the reference genome, and its functional annotation. Recent achievements include the release of the next version of the reference genome (EquCab3.0) and generation of a reference sequence for the Y chromosome. Horse satellite-free centromeres provide unique models for mammalian centromere research. Despite extremely low genetic diversity of the Y chromosome, it has been possible to trace patrilines of breeds and pedigrees and show that Y variation was lost in the past approximately 2300 years owing to selective breeding. The high-quality reference genome has led to the development of three different SNP arrays and WGSs of almost 2000 modern individual horses. The collection of WGS of hundreds of ancient horses is unique and not available for any other domestic species. These tools and resources have led to global population studies dissecting the natural history of the species and genetic makeup and ancestry of modern breeds. Most importantly, the available tools and resources, together with the discovery of functional elements, are dissecting molecular causes of a growing number of Mendelian and complex traits. The improved understanding of molecular underpinnings of various traits continues to benefit the health and performance of the horse while also serving as a model for complex disease across species.
© 2019 The Authors. Animal Genetics published by John Wiley & Sons Ltd on behalf of Stichting International Foundation for Animal Genetics.

Keywords:  Mendelian traits; Y chromosome; ancient genomes; centromeres; complex traits; domestication; modern breeds; signatures of selection

Year:  2019        PMID: 31568563      PMCID: PMC6825885          DOI: 10.1111/age.12857

Source DB:  PubMed          Journal:  Anim Genet        ISSN: 0268-9146            Impact factor:   3.169


Introduction

The horse (Equus caballus, ECA) occupies a special place amongst farm animals. Since domestication about 5500 years ago (Outram et al. 2009; Librado et al. 2016; Gaunitz et al. 2018), horses have served humans in agriculture, warfare and transportation, and as valued companions. In modern times, they continue to interact with humans in many different ways and are an important part of the leisure industry. Humans have selectively bred horses for performance traits (speed, endurance, strength, gait), appearance (size, color, conformation) and temperament, resulting in 400–500 different breeds (Hendricks 2007; Petersen et al. 2013a). The derivation of breeds from selective breeding and the inclusion of only individuals with breed‐defining characteristics have resulted in genomic features that vary among populations (Petersen et al. 2013b). Furthermore, over 130 equine hereditary traits (e.g. muscle disorders, allergies, asthma) can serve as valuable models for the study of similar human conditions (Wade et al. 2009; OMIA: https://omia.org/home/). Interest in economic, biomedical, evolutionary and basic science aspects of the horse combined with intense passion for advancing knowledge on equids have promoted organized studies of the horse genome, initiated at the First International Equine Gene Mapping Workshop in 1995 (reviewed by Chowdhary & Bailey 2003; Finno & Bannasch 2014). Although multiple important achievements mark the 25 years of horse genomics (reviewed by Chowdhary & Bailey 2003; Chowdhary & Raudsepp 2006; Chowdhary et al. 2008; Chowdhary & Raudsepp 2008; Finno et al. 2009; Brosnahan et al. 2010; Bailey & Brooks 2013; Chowdhary 2013; Finno & Bannasch 2014; Librado et al. 2016), without doubt the most important milestone was the generation of the reference genome assembly (Wade et al. 2009), made possible through collaborative efforts of the research community.
The completion of a reference genome has shaped horse genome studies during the past 10 years and, together with the ongoing revolution in genomics technologies, particularly next‐generation sequencing (reviewed by van Dijk et al. 2014), has taken equine genomics to a new level. This review will focus on these technology‐driven achievements in the post‐genome era of horse genomics: the development of new genomics tools and resources, the improvement of the reference sequence and functional annotation of the horse genome. We will discuss how cutting‐edge genomics technologies have improved understanding of the evolutionary history of the horse, the genetic makeup of individual horse breeds, and the study of Mendelian and complex traits.

The horse genome

EquCab2.0

In 2019, we celebrate the tenth anniversary of the publication of the genome sequence of the Thoroughbred mare, Twilight (Wade et al. 2009). The sequence of Twilight represented the first equine and Perissodactyl genome and established a reference sequence for the domestic horse (Wade et al. 2009). Sequencing was performed using the Sanger method with an average of 6.8‐fold genomic coverage. The contiguity of the assembly was increased by the inclusion of end sequences of approximately 315 000 BAC clones from the CHORI‐241 BAC library, which represents sequences from Twilight's half‐brother Bravo (https://bacpacresources.org/; Leeb et al. 2006). Radiation hybrid and cytogenetic maps (Raudsepp et al. 2008) assisted with chromosomal anchoring and orienting the scaffolds. The result was a high‐quality, 2.5 Gb draft assembly, denoted as EquCab2.0, which incorporated the 31 horse autosomes, the X chromosome and the mitochondrial genome. Sequence annotation by the ENSEMBL pipeline predicted 20 322 protein‐coding genes, comparable with human, mouse and other mammals. An important part of the horse genome project was the identification of over a million SNPs, which directed the development of genomic tools for mapping in the horse. The SNPs were generated from the diploid genome of Twilight and by partial sequencing of seven additional horses of diverse breeds. The reference assembly and the SNP map marked a turning point in horse genomics by providing resources for driving subsequent molecular, clinical and evolutionary studies in the horse (reviewed by Wade 2013; Finno & Bannasch 2014; Ghosh et al. 2018).

EquCab3.0

Despite its high quality, EquCab2.0 is a draft assembly and, as a product of the available technology of the time, has limitations (Kalbfleisch et al. 2018). The assembly contains numerous gaps, mainly in structurally complex genomic regions, which include segmental duplications and CNV sites (Doan et al. 2012; Ghosh et al. 2014). Approximately 0.2 Gb of sequence reads in EquCab2.0, mainly highly repetitive regions, remain unassembled and unassigned to chromosomes (Wade et al. 2009; Wade 2013). Furthermore, recent re‐sequencing of the whole genome (Rebolledo‐Mendez et al. 2015) and of selected complex regions such as the MHC (Viluma et al. 2017) and part of the PAR (Rafati et al. 2016), together with transcriptome sequencing (Coleman et al. 2010, 2013b) and gene annotations (Hestand et al. 2015; Balmer et al. 2017; Mansour et al. 2017), have revealed several inconsistencies in the assembly. Therefore, taking advantage of new genomic technologies, the genome of Twilight was recently re‐sequenced and assembled, resulting in EquCab3.0 (Kalbfleisch et al. 2018). The new assembly was built upon the solid foundation of EquCab2.0 (Wade et al. 2009), physical maps (Raudsepp et al. 2008) and BAC end sequences (Leeb et al. 2006). These were augmented with 45‐fold short‐read data that improved the accuracy of unique regions of the genome. Chromosome length scaffolding was achieved by including Chicago® and Hi‐C proximity ligation data and 16‐fold long‐read Pacific Biosciences (PacBio) data. The new assembly improved both the contiguity and composition of the horse reference genome. For example, the number of gaps was reduced 10‐fold, from 55 Mb (2.2% of the genome) in EquCab2.0 to 9 Mb (0.34% of the genome) in EquCab3.0, and the number of assembled bases in the incorporated chromosomes improved from 2.33 to 2.41 Gb (3% increase). Contiguity improved nearly 40‐fold and it is noteworthy that only ECA6 comprises two scaffolds; all other chromosomes comprise a single scaffold.
In addition, the use of the Chromium 10X platform allowed for true haplotype phasing, so the final assembly has the most common and likely ancestral allele at each heterozygous site. Comparison of EquCab2.0 and EquCab3.0 mapping statistics for 13 previously published ancient DNA samples (Schubert et al. 2014; Librado et al. 2015, 2017; Gaunitz et al. 2018) showed that significantly more reads mapped to the new assembly, demonstrating its improved utility for the mapping of highly fragmented and damaged DNA samples. Table 1 demonstrates that the size and gene content of the reference genome have marginally but consistently increased for all chromosomes. The exception is ECA5, which is smaller and has fewer genes in EquCab3.0 than in EquCab2.0, most likely owing to inconsistencies in the previous assembly. Two chromosomes, ECA12 and ECAX, increased in length by almost 4 Mb owing to incorporating previously unplaced contigs, and the number of annotated non‐coding genes has increased substantially for all chromosomes. This is probably due to improved sequence composition and better annotation pipelines, although detailed annotation of the horse genome will be the task of the ongoing Functional Annotation of ANimal Genomes (FAANG) project (described in detail below). The necessary prerequisite for FAANG is a high‐quality, contiguous reference genome (Andersson et al. 2015), and EquCab3.0 serves this purpose. Nevertheless, EquCab3.0 is also a tribute to EquCab2.0 and testimony that the first Sanger assembly of the Twilight genome 10 years ago was an outstanding achievement, providing the foundation for answering genetic questions related to the horse.
Table 1

Chromosome‐wise comparison of EquCab2.0 and EquCab3.0.

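Horse chromosome | Size, Mb (EquCab2) | Size, Mb (EquCab3) | Protein coding genes (EquCab2) | Protein coding genes (EquCab3) | Non‐coding genes (EquCab2) | Non‐coding genes (EquCab3)
--- | --- | --- | --- | --- | --- | ---
ECA1 | 185.8 | 188.3 | 1656 | 1683 | 166 | 705
ECA2 | 120.9 | 121.4 | 1062 | 1077 | 103 | 450
ECA3 | 119.5 | 121.4 | 838 | 883 | 79 | 391
ECA4 | 108.6 | 109.5 | 750 | 735 | 90 | 354
ECA5 | 99.7 | 96.8 | 1004 | 999 | 99 | 322
ECA6 | 84.7 | 87.2 | 962 | 988 | 63 | 326
ECA7 | 98.5 | 100.8 | 1236 | 1367 | 86 | 386
ECA8 | 94.1 | 97.6 | 714 | 773 | 69 | 397
ECA9 | 83.6 | 85.8 | 451 | 447 | 39 | 296
ECA10 | 84.0 | 85.2 | 1032 | 1146 | 61 | 346
ECA11 | 61.3 | 61.7 | 1086 | 1141 | 109 | 282
ECA12 | 33.1 | 37.0 | 639 | 738 | 34 | 177
ECA13 | 42.6 | 43.8 | 657 | 711 | 44 | 175
ECA14 | 94.0 | 94.6 | 665 | 693 | 63 | 368
ECA15 | 91.6 | 92.9 | 659 | 664 | 54 | 376
ECA16 | 87.4 | 89.0 | 683 | 713 | 73 | 299
ECA17 | 80.8 | 80.7 | 352 | 338 | 44 | 288
ECA18 | 82.5 | 82.6 | 422 | 416 | 62 | 259
ECA19 | 60.0 | 62.7 | 407 | 418 | 47 | 208
ECA20 | 64.2 | 65.3 | 709 | 738 | 50 | 262
ECA21 | 57.7 | 59.0 | 376 | 388 | 37 | 199
ECA22 | 49.9 | 50.9 | 533 | 560 | 53 | 244
ECA23 | 55.7 | 55.6 | 296 | 294 | 53 | 251
ECA24 | 46.7 | 48.3 | 381 | 453 | 141 | 298
ECA25 | 39.5 | 40.3 | 523 | 554 | 42 | 160
ECA26 | 41.9 | 43.1 | 221 | 232 | 19 | 129
ECA27 | 40.0 | 40.3 | 215 | 232 | 20 | 110
ECA28 | 46.2 | 47.3 | 383 | 388 | 46 | 189
ECA29 | 33.7 | 34.8 | 181 | 170 | 25 | 156
ECA30 | 30.1 | 31.4 | 171 | 185 | 20 | 104
ECA31 | 25.0 | 26.0 | 140 | 154 | 13 | 86
ECAX | 124.1 | 128.2 | 853 | 821 | 151 | 371
ECAY | n/a | 9.5 | n/a | 52 | n/a |
Total | 2440.8 | 2419 | 15 428 | 21 151 | 2055 | 8964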

Ensembl: http://www.ensembl.org/index.html for assembly size and annotated gene content.

Y chromosome data are from Janečka et al. (2018).
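
The per‐chromosome differences reported in Table 1 can be recomputed directly from the tabulated values. The following is a minimal illustrative sketch, not part of the original analysis: it hard‐codes four rows of Table 1, and the names `TABLE1_SUBSET` and `assembly_changes` are hypothetical.

```python
# Selected rows copied from Table 1:
# (size in EquCab2 [Mb], size in EquCab3 [Mb],
#  protein-coding genes in EquCab2, protein-coding genes in EquCab3)
TABLE1_SUBSET = {
    "ECA5": (99.7, 96.8, 1004, 999),
    "ECA11": (61.3, 61.7, 1086, 1141),
    "ECA12": (33.1, 37.0, 639, 738),
    "ECAX": (124.1, 128.2, 853, 821),
}


def assembly_changes(rows):
    """Return {chromosome: (size change in Mb, change in protein-coding genes)}."""
    changes = {}
    for chrom, (size2, size3, genes2, genes3) in rows.items():
        # Round to one decimal place, the precision used in Table 1.
        changes[chrom] = (round(size3 - size2, 1), genes3 - genes2)
    return changes


if __name__ == "__main__":
    for chrom, (d_size, d_genes) in assembly_changes(TABLE1_SUBSET).items():
        print(f"{chrom}: {d_size:+.1f} Mb, {d_genes:+d} protein-coding genes")
```

For these rows the sketch reproduces the roughly 4 Mb growth of ECA12 and ECAX and the shrinkage of ECA5 noted in the text.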


Unique features of horse centromeres

Centromeres are typically not part of reference genomes because they are composed of arrays of nearly identical tandem repeats, known as satellite DNA, and are extremely difficult to assemble even with long‐read sequencing technologies (Miga et al. 2014; Jain et al. 2018). An outstanding exception is horse ECA11, which has no satellite DNA and is the first sequenced example of a natural satellite‐free and evolutionary ‘immature’ centromere (Wade et al. 2009; Wade 2013). This unusual and most interesting discovery led to in‐depth studies of centromeres in horses and equids. Three centromere satellite families, 37cen, 2PI and EC137, have been isolated, characterized and localized in horse and equid chromosomes, revealing that in donkeys and zebras a large number of centromeres lack satellite DNA (Piras et al. 2010; Nergadze et al. 2014). With the help of chromatin immunoprecipitation sequencing (ChIP‐seq) methodology, it was shown that 37cen satellite binds to the centromeric histone H3 variant CENPA protein, is transcriptionally active and probably required for centromere function (Cerutti et al. 2016). However, further studies of satellite‐free horse ECA11 revealed that the centromere is defined epigenetically by binding of the CENPA protein and not by the presence of satellite repeats (Purgato et al. 2015). Interestingly, the size and exact location of the approximately 100 kb (kilo‐base) CENPA binding region differ between individuals, giving rise to epialleles, a phenomenon known as centromere sliding (Purgato et al. 2015; Giulotto et al. 2017). These observations were further strengthened by a study of donkey centromeres (Nergadze et al. 2018). The donkey genome has 16 chromosomes with satellite‐free centromeres, which is perfectly compatible with genome stability and species survival.
Analysis of the transmission of CENPA‐binding epialleles in mules and hinnies shows that centromeric domains are inherited as Mendelian traits, although centromere sliding can happen in one generation. Overall, the discovery of satellite‐free centromeres in horses and equids has provided a unique model for the study of the evolution, dynamics and molecular regulation of mammalian centromeres.

The Y chromosome

The female‐based horse reference genome is incomplete because it does not include the Y chromosome. It is therefore noteworthy that, concurrently with the release of EquCab3.0 (Kalbfleisch et al. 2018), the first comprehensive assembly and annotation of the male‐specific region of the horse Y (MSY) was published (Janečka et al. 2018). The 9.5 Mb assembly represents the Y chromosome of a Thoroughbred stallion Bravo, a half‐brother of Twilight, thus completing the Thoroughbred‐based horse reference genome. The assembly provides information about the horse Y chromosome organization, sequence classes and gene content (52 genes and 174 transcripts). Notably, the study identified a novel testis‐expressed XY ampliconic sequence class ETSTY7, which is shared with the parasite Parascaris genome, providing evidence for eukaryotic horizontal gene transfer. Alignment of MSY assembly with horse, donkey and mule testis transcriptome data suggests candidate genes for stallion fertility. The MSY assembly provides a much‐needed reference for improving understanding of the role of the Y chromosome in equine male development and fertility. Another reason for keen interest in the male‐specific and non‐recombining Y chromosome is its inheritance exclusively through male lineages. This makes Y chromosome sequence polymorphisms excellent markers for tracing the patriline history of ancient horses and modern breeds. However, until recently, the main problem with the horse Y chromosome was the lack of sequence polymorphism (Lindgren et al. 2004; Wallner et al. 2004; Brandariz‐Fontes et al. 2013), suggesting a limited number of patrilines in horse domestication and precluding the use of the Y to trace those patrilines. Even though one polymorphic microsatellite, YA16, with two alleles was detected in a few individuals of indigenous Chinese horses (Ling et al. 2010), no variants were found in other modern breeds.
In contrast, sequencing just a 4 kb Y chromosome fragment from eight ancient horses revealed 28 segregating sites and eight haplotypes (Lippold et al. 2011), demonstrating considerable diversity in the ancestral horse Y and the loss of this diversity during domestication. Nevertheless, persistent search, combined with the use of high‐throughput next‐generation sequencing technologies, led to step‐wise discovery of a limited number of variable sites also in the modern horse Y chromosome. These included two SNPs that defined six Y haplotypes in modern breeds (Wallner et al. 2013), and two microsatellites—YP9 in Hucul and Mongolian horses and YN04 in a Shetland pony (Kreutzmann et al. 2014). This small but significant progress led to more systematic discoveries of Y chromosome variants based on WGS and MSY assembly. Fifty‐three variants (50 SNPs and three indels) and 24 haplotypes were identified from a 1.46 Mb MSY sequence of 52 males of 21 different breeds (Wallner et al. 2017), followed by the description of another 211 variants and 58 haplotypes by screening 5.8 Mb of MSY in 130 horses of rural breeds and nine Przewalski's horses (Felkel et al. 2019). This is an awaited breakthrough in horse Y chromosome research and has already launched a number of studies to determine the time and cause(s) of the loss of Y variation (Wutke et al. 2018), as well as enabling the tracing of patrilines of modern breeds and pedigrees (Wallner et al. 2017; Felkel et al. 2018, 2019; Kakoi et al. 2018; Khaudov et al. 2018; Han et al. 2019). Recent studies of ancient samples provide clues about when and why the horse Y chromosome lost its genetic diversity. A study of four MSY polymorphic markers in 96 ancient stallions from early domestication indicates that the reduction of Y diversity over time was not due to genetic drift or founder effect, but the result of artificial selection that started during the Iron Age and continued during the Roman period (Wutke et al. 2018). 
This is supported and further elaborated by a more extensive study involving 105 ancient stallions and over 1500 polymorphic MSY sites showing that Y chromosome nucleotide diversity decreased steadily during the last approximately 2000 years but dropped to present levels only after 850–1350 AD (Fages et al. 2019).
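
Nucleotide diversity, the statistic used above to describe the loss of Y chromosome variation, is conventionally defined as the mean number of pairwise differences per site among a set of aligned sequences. The sketch below shows that textbook calculation on invented toy haplotypes; it is not the pipeline used in the cited studies, which must also handle missing data and ancient DNA damage. The function name and the sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site across equal-length sequences."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must have equal length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


# Invented 8 bp haplotypes: a more variable "ancient" sample set and a
# nearly uniform "modern" one (toy data, for illustration only).
ANCIENT = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTTCGT"]
MODERN = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "ACGTACGA"]

if __name__ == "__main__":
    print("ancient:", nucleotide_diversity(ANCIENT))
    print("modern:", nucleotide_diversity(MODERN))
```

With these toy inputs the "ancient" set has a threefold higher diversity than the "modern" set, mirroring in miniature the direction of the trend described in the text.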

Genomics tools and resources

SNP genotyping array

The most impactful tool and resource for horse genomics has certainly been the reference genome. As noted, EquCab2.0 has been the critical template for the discovery of millions of sequence polymorphisms from diverse horse breeds, leading to the development of three generations of SNP chips which have been utilized to map traits and understand breed diversity and signatures of selection. First‐ and second‐generation DNA genotyping arrays, containing 54 602 and 74 500 SNP markers, respectively, became available in 2011 (reviewed in Finno & Bannasch 2014). A number of phenotypic traits of interest and disease traits were identified using these arrays, including Lavender Foal Syndrome (Brooks et al. 2010), alternate gait (Andersson et al. 2012), iris color variation (Mack et al. 2017) and ocular squamous cell carcinoma (Bellone et al. 2017; Tables 2 and 3). Additionally, these resources were used to identify breed specific signatures of selection (Petersen et al. 2013b) that aid in our understanding of the biology behind performance and other selected traits in the horse (Andersson et al. 2012; Petersen et al. 2014b).
Table 2

Genetic variants identified for traits influencing pigmentation.

Phenotype | Gene | Allele | Type of variant | Chromosome | Breed | Year published | PubMed ID
--- | --- | --- | --- | --- | --- | --- | ---
Chestnut | MC1R | e | Missense | 3 | Many | 1996 | 8995760
Frame overo (Lethal White Foal Syndrome, LWFS) | EDNRB | O | Missense | 17 | American Paint Horse, Miniature Horse, Pinto Horse, Quarter Horse, Thoroughbred, Appaloosa | 1998 | 9530628
Chestnut | MC1R | e^a | Missense | 3 | Black Forest | 2000 | 11086549
Recessive black | ASIP | a | 11 bp deletion | | Many | 2001 | 11353392
Cream dilution | SLC45A2 | C^Cr | Missense | 21 | Many | 2003 | 12605854
Sabino 1 | KIT | SB1 | Splicing | 3 | Appaloosa, Haflinger, Lipizzan, Noriker, Quarter Horse | 2005 | 16284805
Silver (Multiple Congenital Ocular Anomalies, MCOA) | PMEL | Z | Missense | | American Miniature Horse, Icelandic, Rocky Mountain, Kentucky Mountain Horse | 2006 | 17029645
Tobiano | KIT (proposed) | To | ~43 Mb inversion | 3 | Many | 2007 | 18253033
Dominant white | KIT | W1 | Nonsense (stop‐gain) | 3 | Franches‐Montagnes | 2007 | 17997609
Dominant white | KIT | W2 | Missense | 3 | Thoroughbred | 2007 | 17997609
Dominant white | KIT | W4 | Missense | 3 | Camarillo White Horse | 2007 | 17997609
Dominant white | KIT | W3 | Nonsense (stop‐gain) | 3 | Arabian | 2007 | 17997609
Grey (melanoma susceptibility) | STX17 | G | 4.6 kb intronic duplication | 25 | Many | 2008 | 18641652
Champagne dilution | SLC36A1 | Ch | Missense | 14 | Spanish Mustang, Tennessee Walking Horse, Quarter Horse, and pony breeds | 2008 | 18802473
Dominant white | KIT | W11 | Splicing | 3 | South German Draft | 2009 | 19456317
Dominant white | KIT | W8 | Splicing | 3 | Icelandic | 2009 | 19456317
Dominant white | KIT | W7 | Splicing | 3 | Thoroughbred | 2009 | 19456317
Dominant white | KIT | W5 | 1 bp deletion, frameshift | 3 | Thoroughbred | 2009 | 19456317
Dominant white | KIT | W9 | Missense | 3 | Holstein | 2009 | 19456317
Dominant white | KIT | W10 | 4 bp deletion, frameshift | 3 | Quarter Horse | 2009 | 19456317
Dominant white | KIT | W6 | Missense | 3 | Thoroughbred | 2009 | 19456317
Lavender foal syndrome | MYO5A | LFS | 1 bp deletion, frameshift | | Arabian | 2010 | 20419149
Dominant white | KIT | W12 | 5 bp deletion | 3 | Thoroughbred | 2010 | https://doi.org/10.1111/j.1365-2052.2010.02135.x
Dominant white | KIT | W13 | Splicing | 3 | American Miniature Horse, Quarter Horse | 2011 | 21554354
Dominant white | KIT | W16 | Missense | 3 | Oldenburg | 2011 | 21554354
Dominant white | KIT | W14 | 54 bp deletion | 3 | Thoroughbred | 2011 | 21554354
Dominant white | KIT | W17b | Missense | 3 | Japanese Draft | 2011 | 21554354
Dominant white | KIT | W17a | Missense | 3 | Japanese Draft | 2011 | 21554354
Dominant white | KIT | W15 | Missense | 3 | Arabian | 2011 | 21554354
Macchiato | MITF | macchiato | Missense | 16 | Franches‐Montagnes | 2012 | 22511888
Splashed white | MITF | SW3 | 5 bp deletion, frameshift | 16 | Quarter Horse | 2012 | 22511888
Splashed white | MITF | SW1 | Insertion 11 bp, regulatory | 16 | American Miniature Horse, American Paint Horse, Appaloosa, Icelandic, Morgan, Old‐Tori, Quarter Horse, Shetland Pony, Trakehner | 2012 | 22511888
Splashed white | PAX3 | SW2 | Missense | 6 | Lipizzan, Noriker, Quarter Horse | 2012 | 22511888
Dominant white | KIT | W18 | Splicing | 3 | Swiss Warmblood | 2013 | 23659293
Dominant white | KIT | W20 | Missense | 3 | American Paint Horse, Appaloosa, German Riding Pony, Gipsy, Noriker, Old‐Tori, Oldenburg, Quarter Horse, Thoroughbred, Warmblood, Welsh Pony | 2013 | 23659293
Dominant white | KIT | W19 | Missense | 3 | Arabian | 2013 | 23659293
Splashed white | PAX3 | SW4 | Missense | 6 | Appaloosa | 2013 | 23659293
Leopard Complex Spotting (Congenital Stationary Night Blindness, CSNB) | TRPM1 | LP | Insertion 1378 bp | 1 | American Miniature Horse, Appaloosa, Australian Spotted Pony, British Spotted Pony, Knabstrupper, Noriker, Pony of the Americas | 2013 | 24167615
Brindle (Incontinentia pigmenti) | IKBKG | | Nonsense (stop‐gain) | X | Quarter Horse | 2013 | 24324710
Dominant white | KIT | W21 | 1 bp deletion, frameshift | 3 | Icelandic | 2015 | 26059442
Non‐dun with primitive markings | TBX3 | nd1 | Regulatory | 8 | Many | 2016 | 26691985
Non‐dun | TBX3 | nd2 | 1609‐bp and 8‐bp deletion, regulatory | 8 | Many | 2016 | 26691985
LP pattern modifier | RFWD3 | PATN1 | Regulatory | 3 | American Miniature Horse, Appaloosa, Australian Spotted Pony, British Spotted Pony, Knabstrupper, Noriker, Pony of the Americas | 2016 | 26568529
Brindle 1 | MBTPS2 | Br1 | Splicing | | Quarter Horse | 2016 | 27449517
White | MITF | MITF 244Glu | Missense | 16 | American Standardbred | 2017 | 27592871
White leg markings | MITF | | Regulatory | 16 | Menorca Purebred | 2017 | 28084638
Dominant white | KIT | W23 | Splicing | 3 | Arabian | 2017 | 28378922
Dominant white | KIT | W22 | Deletion 1898 bp | 3 | Thoroughbred | 2017 | 28444912
Tiger eye | SLC24A5 | TE2 | Deletion 626 bp | 1 | Paso Fino | 2017 | 28655738
Tiger eye | SLC24A5 | TE1 | Missense | 1 | Paso Fino | 2017 | 28655738
Dominant white | KIT | W24 | Splicing | 3 | Italian Trotter | 2017 | 28856698
Dominant white | KIT | W27 | Missense | 3 | Thoroughbred | 2018 | 29333746
Dominant white | KIT | W26 | 1 bp deletion, frameshift | 3 | Thoroughbred | 2018 | 29333746
Dominant white | KIT | W25 | Missense | 3 | Thoroughbred | 2018 | 29333746
Curly coat | KRT25 | Crd | Missense | 11 | Bashkir Curly Horse | 2018 | 29686323, 29141579
Splashed white, blue eyes and deafness | MITF | SW5 | 63 kb deletion | 16 | American Paint Horse | 2019 | 30644113
Pearl dilution | SLC45A2 | C^prl | Missense | 21 | American Paint Horse, Lusitano, Purebred Spanish horse, Quarter Horse | 2019 | 31006892, 30968968
Sunshine dilution | SLC45A2 | C^sun | Missense | 21 | Standardbred × Tennessee Walking Horse cross | 2019 | 31006892

Variants influencing pigmentation with known pleiotropic effects are in bold; details for genomic, coding and protein coordinates are in Table S1.

Table 3

Genetic variants underlying disease and performance traits in the horse.

Disease | Gene | Type of variant | Mode of inheritance | Chromosome | Breed | Year published | PubMed ID
--- | --- | --- | --- | --- | --- | --- | ---
Hyperkalemic periodic paralysis | SCN4A | Missense | Dominant | 11 | American Quarter Horse and related breeds | 1992 | 1338908
Ovotesticular disorder of sexual development (DSD) | SRY | Large deletion of the DNA‐binding domain of the SRY gene | Y‐linked | Y | Standardbred | 1995 | 7558880
Severe combined immunodeficiency disease (SCID) | PRKDC | Deletion 5 bp | Recessive | 9 | Arabian | 1997 | 9103416
Junctional epidermolysis bullosa (JEB1) | LAMC2 | Insertion 1 bp | Recessive | 5 | Belgian and Italian draft horse | 2002 | 12230513
Malignant hyperthermia (MH) | RYR1 | Missense | Dominant | 10 | American Quarter Horse | 2004 | 15318347
Glycogen branching enzyme deficiency (GBED) | GBE1 | Nonsense (stop‐gain) | Recessive | 26 | American Quarter Horse and related breeds | 2004 | 15366377
Thrombasthenia | ITGA2B | Missense | Recessive | 11 | American Quarter Horse & Thoroughbred | 2006 | 16407493
Thrombasthenia | ITGA2B | Deletion 10 bp | Recessive | 11 | American Quarter Horse | 2007 | 17338169
Hereditary equine regional dermal asthenia (HERDA) | PPIB | Missense | Recessive | 1 | American Quarter Horse | 2007 | 17498917
Polysaccharide storage myopathy (PSSM1) | GYS1 | Missense | Incompletely dominant | 10 | American Quarter Horse, American Paint Horse, Appaloosa, Draft, Pony of the Americas, and Warmblood | 2008 | 18358695
Junctional epidermolysis bullosa (JEB2) | LAMA3 | Deletion 6589 bp | Recessive | 5 | American Saddlebred | 2009 | 19016681
Racing distance | MSTN | Insertion 227 bp, regulatory | | 18 | Thoroughbred | 2010 | 20098749, 25160752, 30379863
Cerebellar abiotrophy (CA) | MUTYH | Regulatory | Recessive | 2 | Arabian, Bashkir Curly Horse, Danish Sport Horse, Trakehner, and Welsh Pony | 2011 | 21126570, 29103988
Foal immunodeficiency syndrome in the Fell and Dales pony (FIS) | SLC5A3 | Missense | Recessive | 26 | Dales Pony and Fell Pony | 2011 | 21750681
Androgen insensitivity syndrome (AIS) | AR | Regulatory | X‐linked | X | American Quarter Horse | 2012 | 22095250
Myotonia | CLCN1 | Missense | Recessive | 4 | New Forest Pony | 2012 | 22197188
Permissive to gait | DMRT3 | Nonsense (stop‐gain) | Dominant | 23 | Numerous breeds | 2012 | 22932389
Warmblood fragile foal syndrome (WFFS) or Ehlers–Danlos syndrome, type VI | PLOD1 | Missense | Recessive | 2 | Warmblood | 2015 | 25637337
Hoof wall separation syndrome | SERPINB11 | Insertion 1 bp | Recessive | 8 | Connemara | 2015 | 25875171
Hydrocephalus | B3GALNT2 | Nonsense (stop‐gain) | Recessive | 1 | Friesian | 2015 | 26452345
Androgen insensitivity syndrome (AIS) | AR | Missense | X‐linked | X | Thoroughbred | 2016 | 27073903
Skeletal atavism | SHOX | Two overlapping deletions, 160–180 kb and 60–80 kb | Recessive | X and Y PAR | Shetland | 2016 | 27207956
Dwarfism, Friesian | B4GALT7 | Missense | Recessive | 14 | Friesian | 2016 | 27793082
Dwarfism, ACAN‐related D3* | ACAN | Missense | Recessive | 1 | Miniature Shetland | 2017 | 27942904
Occipitoatlantoaxial malformation (OAAM) | HOXD3 | Deletion 2.7 kb | Recessive | 18 | Arabian | 2017 | 28111759
Androgen insensitivity syndrome (AIS) | AR | Deletion 25 bp | X‐linked | X | Warmblood | 2017 | 28192783
Naked foal syndrome | ST14 | Nonsense (stop‐gain) | Recessive | 7 | Akhal‐Teke | 2017 | 28235824
Ocular squamous cell carcinoma (ocular SCC) | DDB2 | Missense | Recessive | 12 | Belgian, Haflinger, Percheron, Rocky Mountain Horse | 2017 | 28425625
Immune‐mediated myositis (IMM/MYH1) | MYH1 | Missense | Recessive | 11 | American Quarter Horse | 2018 | 29510741
Curly coat with hypotrichosis Crd | KRT25 | Missense | Dominant | 11 | Bashkir Curly Horse | 2018 | 29686323, 29141579
Curly coat without hypotrichosis | SP6 | Missense | Dominant | 11 | American Bashkir Curly Horse and Missouri Foxtrotter | 2018 | 29686323
Dwarfism, ACAN‐related D4 | ACAN | Deletion 42 bp | Recessive | 1 | Miniature | 2018 | 30058072
Dwarfism, ACAN‐related D2 | ACAN | Missense | Recessive | 1 | Miniature | 2018 | 30058072
Dwarfism, ACAN‐related D1 | ACAN | Deletion 1 bp | Recessive | 1 | Miniature | 2018 | 30058072

Genomic, coding and protein sequence coordinates are in Table S2.

In 2017, a third‐generation SNP array was developed containing 670 805 SNP markers (MNEc670k array, Affymetrix; Schaefer et al. 2017). This array was designed using WGS from 156 horses representing 24 distinct breeds. Mean inter‐SNP distance was estimated at 3756 bp with SNP selection aimed at tagging approximately 2 million SNPs. To date, two published studies have successfully mapped traits with the 670K array, followed by fine‐mapping with WGS. The first identified a nonsense variant in ST14 associated with Naked Foal Syndrome in the Akhal‐Teke (Bauer et al. 2017) and the second identified genetic variants in two genes, KRT25 and SP6, responsible for a curly coat in horses (Thomer et al. 2018). With the recently updated reference assembly of the equine genome (Kalbfleisch et al. 2018), the SNP array coordinates were remapped to EquCab3.0 (Beeson et al. 2019). The raw reports with EquCab3.0 SNP coordinates for the MNEc670k array are hosted at https://www.animalgenome.org/repository/pub/UMN2018.1003/. Furthermore, coordinates between the two assemblies can be easily converted now at NCBI: https://www.ncbi.nlm.nih.gov/genome/tools/remap. The high‐density SNP array resource is undoubtedly aiding in the mapping of other important traits and we anticipate a continued increase in the number of discoveries made possible.
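
As a rough consistency check on the MNEc670k figures quoted above, the marker count and the mean inter‐SNP distance can be multiplied to estimate the genomic span covered by the array. This is simple arithmetic on numbers taken from the text, assuming evenly spaced markers; the function names are hypothetical and the result is only an approximation.

```python
N_SNPS = 670_805          # markers on the MNEc670k array (from the text)
MEAN_SPACING_BP = 3_756   # mean inter-SNP distance in bp (from the text)
N_TAGGED = 2_000_000      # approximate number of SNPs the array aims to tag


def approx_span_gb(n_snps, mean_spacing_bp):
    """Approximate span covered by evenly spaced markers, in Gb."""
    return n_snps * mean_spacing_bp / 1e9


def tagging_ratio(n_tagged, n_on_array):
    """Average number of tagged SNPs represented by each array marker."""
    return n_tagged / n_on_array


if __name__ == "__main__":
    print(f"approximate span: {approx_span_gb(N_SNPS, MEAN_SPACING_BP):.2f} Gb")
    print(f"tagged SNPs per array marker: {tagging_ratio(N_TAGGED, N_SNPS):.1f}")
```

The estimated span of about 2.5 Gb is in line with the size of the horse reference assembly, and each array marker stands in for roughly three tagged SNPs on average.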

WGSs of individual horses

As next‐generation sequencing continues to become more affordable, WGSs of horses are being generated worldwide. At the time of this writing, 1936 public WGSs are available for the horse through the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra). This tremendous resource provides investigators with a database of horse genomes to screen potential variants, and with the continuing addition of phenotypic metadata, this resource will facilitate future studies for many years to come. WGSs are often used to identify putative genetic variants within regions identified by GWAS. Two recent examples include the discovery of the splice site mutation in B4GALT7 associated with dwarfism and joint laxity in Friesian horses (Leegwater et al. 2016). Using the first‐generation SNP array (~50K markers), a region on ECA14 was initially identified by GWAS. Four dwarf cases and three unrelated controls then underwent WGS. Pooling data from the four cases and variant calling identified the putative variant in B4GALT7, later confirmed via Sanger sequencing and demonstrated to affect splicing through analysis of cDNA from affected horses. Notably, this is the second gene with a role in protein glycosylation in which a pathogenic mutation has been identified in Friesian horses. A nonsense mutation in B3GALNT2 involved in muscular dystrophy with hydrocephalus in stillborn foals was discovered previously by the same group using similar techniques (Ducro et al. 2015). This approach was also utilized to unravel the genetic mutation responsible for both immune‐mediated myositis (Finno et al. 2018) and non‐exertional rhabdomyolysis (Valberg et al. 2018) in the Quarter Horse as well as congenital hepatic fibrosis in the Swiss Franches‐Montagnes (Drogemuller et al. 2014). Many recently discovered, potential causal genetic variants in the horse have been identified through WGS alone. 
Public WGS was screened for putative deleterious variants associated with stallion infertility and further evaluated in a group of 337 fertile stallions across 19 breeds (Schrimpf et al. 2016). A variant in NOTCH1 (g.37455302G>A) was identified as a significant stallion fertility locus in Hanoverian stallions. Additionally, nine candidate fertility loci with missing homozygous mutant genotypes were validated. WGS has recently identified a genetic cause of occipitoatlantoaxial malformation in an Arabian horse (Bordbari et al. 2017) and dwarfism in a Miniature Shetland pony (Metzger et al. 2017). WGS data have also been used to identify novel pigmentation variants (Henkel et al. 2019). Finally, WGS has been used to investigate the characteristics of highly selected breeds (Metzger et al. 2014). With the continued reduction in the cost of WGS, we can expect to soon see many more publicly available genomes for the horse. The beauty of this resource is that, once available, the data can be used for diverse projects worldwide.
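
The strategy described in this section, in which WGS data from a few affected horses and unrelated controls are used to narrow down variants inside a GWAS region, can be illustrated with a simple genotype filter. The sketch below assumes a fully penetrant recessive model and per‐sample genotype calls, which is a simplification of the pooled analysis described for the Friesian dwarfism study. All variant and sample names are hypothetical placeholders, and the data are invented.

```python
# Toy genotype calls: alternate-allele counts (0, 1 or 2) per sample.
# Variant and sample identifiers are hypothetical placeholders.
GENOTYPES = {
    "var_A": {"case1": 2, "case2": 2, "case3": 1, "case4": 2,
              "ctrl1": 0, "ctrl2": 1, "ctrl3": 0},
    "var_B": {"case1": 2, "case2": 2, "case3": 2, "case4": 2,
              "ctrl1": 0, "ctrl2": 1, "ctrl3": 0},
    "var_C": {"case1": 2, "case2": 2, "case3": 2, "case4": 2,
              "ctrl1": 2, "ctrl2": 0, "ctrl3": 0},
}
CASES = ["case1", "case2", "case3", "case4"]
CONTROLS = ["ctrl1", "ctrl2", "ctrl3"]


def candidate_variants(genotypes, cases, controls):
    """Keep variants that are homozygous-alternate in every case and
    never homozygous-alternate in any control (recessive model)."""
    hits = []
    for variant, calls in genotypes.items():
        all_cases_hom = all(calls[s] == 2 for s in cases)
        no_control_hom = all(calls[s] < 2 for s in controls)
        if all_cases_hom and no_control_hom:
            hits.append(variant)
    return hits


if __name__ == "__main__":
    print(candidate_variants(GENOTYPES, CASES, CONTROLS))
```

In practice, candidates that survive such a filter would still need the kind of follow‐up described in the text, such as Sanger confirmation and functional analysis of cDNA.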

Other tools

In the past 10 years, several other array platforms have been generated to study the horse genome. On the basis of EquCab2.0, an exon array (Doan et al. 2012) and two whole genome tiling arrays (Ghosh et al. 2014; Wang et al. 2014) were constructed for the discovery of CNVs. However, in a short time, these platforms have given way to more comprehensive WGSs. Likewise, cDNA and oligonucleotide microarrays, developed for gene expression studies (reviewed by Coleman et al. 2013a), have been completely replaced by RNA‐seq. Nevertheless, a few genomics resources from the past, such as the genomic BAC library CHORI‐241 (https://bacpacresources.org/), remain in use in the post‐genome era. End sequences of CHORI‐241 BAC clones (Leeb et al. 2006) helped to validate the EquCab3.0 assembly (Kalbfleisch et al. 2018); BAC tiling paths are used to re‐sequence complex genomic regions such as the terminal end of the pseudoautosomal region in ECAX (Rafati et al. 2016) and the male‐specific region of ECAY (Janečka et al. 2018), and cytogenetic mapping of BAC clones is still a reliable method to validate copy number changes (Ghosh et al. 2014; Staiger et al. 2016a) and other large‐scale structural rearrangements.

Ancient genomics and natural history of the horse

The assembly of the horse reference genome EquCab2.0 (Wade et al. 2009), along with success in extracting and sequencing ancient genomic DNA (Orlando et al. 2011), has essentially expanded our knowledge about the natural history of equids (Orlando et al. 2013; Jonsson et al. 2014), horse domestication and the dynamics of the horse genome over time, from the pre‐domestication era to the present (Schubert et al. 2014; Librado et al. 2015, 2017; Gaunitz et al. 2018; Janečka et al. 2018; Wutke et al. 2018; Fages et al. 2019). The findings of ancient DNA studies were summarized and discussed in an excellent review by Librado et al. (2016). Therefore, here we highlight only the most notable discoveries and discuss the studies published since that review. It is noteworthy that, with regards to ancient DNA, horses/equids have made history three times—by being the first and the oldest, and most recently, by spanning the largest time‐scale of ancient genome data among non‐human organisms (Fages et al. 2019). The first successful ancient DNA extraction and analysis ever was reported in 1984 from 150‐year‐old tissue from the extinct quagga (Higuchi et al. 1984), preceding by a year the first human ancient DNA study from a 2400‐year‐old Egyptian Mummy (Pääbo 1985). Likewise, to date the oldest genomic DNA sample which has been successfully sequenced is that of a 700 000‐year‐old Pleistocene horse (Orlando et al. 2011, 2013), exceeding the age of the oldest hominin genomic DNA extracted from 430 000‐year‐old bones from Sima de los Huesos (Meyer et al. 2016). The current peak of ancient genomics of non‐human organisms, but perhaps not the limit, is a study tracking 5000 years of horse domestication based on genome‐scale data from 278 ancient animals (Fages et al. 2019).

Horse ancestry and domestication

Genomics‐based searches into the wild ancestry of the domestic horse and the origins of horse domestication have been ongoing for decades. Yet the answers have only recently started to emerge, largely thanks to the contribution of WGS from ancient equine samples (Schubert et al. 2014; Librado et al. 2015, 2016; Gaunitz et al. 2018; Fages et al. 2019). These studies reveal that, in addition to the two extant horses, the domestic horse (Equus caballus) and the Przewalski's horse (E. przewalskii) (Der Sarkissian et al. 2015), there existed other, now extinct, horse lineages at the time of early domestication. One lineage, which was initially identified from approximately 43 000‐ to 5000‐year‐old bones from the Holarctic region (Schubert et al. 2014), extended to Southern Siberia (Fages et al. 2019). This lineage shares morphological similarities with an extinct horse described as Equus lenensis (Boeskorov et al. 2018; Fages et al. 2019). Recent mtDNA and Y haplotype analyses of an approximately 24 000‐year‐old specimen from the Tuva Republic suggest that there may have been another genetically divergent lineage of horses in Siberia and the New Siberian Islands, although the genetic contact between E. lenensis and this ‘ghost’ lineage remains unknown (Fages et al. 2019). Finally, Iberian samples from the third and early second millennia BCE cluster separately from E. caballus, E. przewalskii and E. lenensis and have extremely divergent Y and mtDNA, suggesting that there was a different, now extinct, horse lineage in Iberia during the early phase of horse domestication (Fages et al. 2019). Earlier, it was reported that the Holarctic horse (E. lenensis) contributed 12.9% to the genetic makeup of domestic horses (Schubert et al. 2014). However, the most recent study of 278 ancient equids and modern horses shows that none of the above extinct horse lineages contributed significantly to modern horse diversity (Fages et al. 2019).
Another question of interest is the genetic relationship between the two surviving horses—the domestic horse and the Przewalski's horse. Until recently, the overall consensus, based on modern and ancient WGS, was that the two are separate species, diverged approximately 45 000 years ago (Goto et al. 2011; Schubert et al. 2014; Der Sarkissian et al. 2015), with extensive bi‐directional gene flow (Der Sarkissian et al. 2015; Librado et al. 2016). All studies agree that the domestic horse is not a direct descendant of the Przewalski's horse (Librado et al. 2016). These views were recently shaken by a study of over 40 ancient horse genomes from Eurasia, providing striking evidence that the Przewalski's horse is not truly wild, but rather a feral horse descended from the horses domesticated by Botai culture some 5500 years ago (de Barros Damgaard et al. 2018; Gaunitz et al. 2018). At the same time, all studied domestic horses dated from 4000 years ago to present show only 2.7% Botai ancestry, suggesting they descended from a different lineage of wild horses that subsequently went extinct (Fages et al. 2019). Ancient DNA studies have also attempted to decipher, but have not completely resolved, the timeline and geography of horse domestication. Current archeological and DNA evidence suggests multiple sites of horse domestication of which the earliest (~5000–5500 years ago) were in the Western Eurasian steppes: Botai culture in Northern Kazakhstan and the Pontic–Caspian steppe (Outram et al. 2009; Warmuth et al. 2012; Librado et al. 2016; Gaunitz et al. 2018; Fages et al. 2019), followed by additional candidate sites in Iberia, Eastern Anatolia, Western Iran, Levant and Eastern Europe (Hungary; Gaunitz et al. 2018). It is noteworthy that native breeds from the main domestication sites such as the Pontic–Caspian steppes still represent hotspots of genetic diversity for horses (Warmuth et al. 2011; Librado et al. 2016). 
However, even though the main candidate sites for domestication have been identified, the geographic origins of the modern domestic horse remain unknown. Given that the ancient Botai and Iberian lineages did not contribute substantially to modern domesticates and that the temporal origins of the modern horse are modeled within the third and fourth millennia BCE, future studies of this period in other candidate regions of early domestication are needed (Fages et al. 2019).

Genetic cost of domestication

Domestication reduces overall fitness, a phenomenon known as ‘the genetic cost of domestication’ (Lu et al. 2006; Moyers et al. 2018). However, because of the extinction of wild horses, it has been difficult to evaluate the extent of this cost for the horse. Once again, the major breakthrough came with ancient DNA sequencing, providing a comparison with horses predating domestication (Schubert et al. 2014) and from early stages of domestication (Librado et al. 2017; Fages et al. 2019). Genetic changes associated with horse domestication can be summarized as follows:

(i) Extreme mitochondrial DNA (mtDNA) diversity. Modern horses have high mtDNA diversity and lack phylogeographic mitochondrial structure, resulting in limited correspondence between mtDNA haplotypes, breeds and geography (Vila et al. 2001; Achilli et al. 2012; Librado et al. 2016). Sequencing of horse genomes from early domestication sites shows that similar mtDNA diversity was present in Scythian horses some 2300 years ago (Librado et al. 2017) and perhaps even earlier. Mitochondrial Bayesian Skylines reconstructed from 211 mitochondrial genomes suggest horse demographic expansion about 4500 years ago (Gaunitz et al. 2018; Fages et al. 2019). Reasons for the lack of mitochondrial structure are thought to be many, including sex‐biased restocking from the wild and human management (Librado et al. 2016).

(ii) Extreme lack of Y chromosome diversity. In contrast to the diversity of mtDNA, the Y chromosome of domestic horses has become almost homogeneous, with just a few haplotypes segregating in modern populations (Wallner et al. 2017; Wutke et al. 2018). At the same time, Y sequences of ancient Scythian horses indicate that a large diversity of domestic male founders contributed to early domestication (Librado et al. 2017). Thus, the observed decline in Y chromosome variation happened only in the past few thousand years, probably because of the human‐mediated reduction in the stallion population size and selective breeding (Librado et al. 2017; Wutke et al. 2018; Fages et al. 2019).

(iii) Genetic load. As a side‐effect of human‐driven selective breeding, the genome of the domestic horse has an increased level of deleterious mutations compared with pre‐domestication genomes (Schubert et al. 2014). However, as mutation load is also lower in early domesticates from ancient Sintashta and Scythia, the excess of deleterious mutations in present day horses is probably a consequence of the past 2300 years of selective breeding (Librado et al. 2017), with the most striking changes having occurred during the past 200 years (Fages et al. 2019).

(iv) Domestication‐associated genetic changes. One of the first comparisons between ancient pre‐domestic horse genomes and those of modern breeds revealed 125 candidate genes that underwent episodes of positive selection during domestication (Schubert et al. 2014). These include genes involved in the cardiac and circulatory system, bone, limb and face morphogenesis, brain development and behavior, and coat color (Schubert et al. 2014; Librado et al. 2016). Genetic changes during horse domestication agree with the neural crest hypothesis and involve developmental changes affecting tissues and cell types of neural crest origin (Librado et al. 2017).

However, the genomic cost of domestication and modern breeding is best illustrated by a striking discovery of a recent comparative study of ancient and modern horse genomes, showing that early breeders managed to maintain genetic diversity for millennia after domestication, and that the genetic diversity of the modern horse has dropped by 16% during just the past 200 years (Fages et al. 2019). It is not a coincidence that the last 200 years also cover the time of the development of horse breeds, the establishment of studbooks and the implementation of extensive selective breeding.

Genetic makeup of modern horse breeds

Since domestication, the genetic diversity present in ancient horse populations has been exploited for selective breeding for a wide range of phenotypes. However, the creation of the about 500 specialized horse populations or breeds by intense artificial selection and the establishment of (closed) studbooks happened only during the past 100–200 years (Hendricks 2007). Owing to the wealth of available genome‐wide tools and resources (see above), these breeds can now be studied in detail for genetic makeup, signatures of selection and relatedness to other breeds. The number of publications in the field is growing almost exponentially and these include a few seminal studies encompassing a global collection of breeds (McCue et al. 2012; Petersen et al. 2013a,b; Jagannathan et al. 2019), and many studies specializing in a single breed or a group of related breeds, for example, the American Quarter Horse (Petersen et al. 2014a; Avila et al. 2018; Marchiori et al. 2019; Pereira et al. 2019), the Hanoverian and other German Warmblood breeds (Nanaei et al. 2019; Nolte et al. 2019), the Arabian and related Middle Eastern breeds (Almarzook et al. 2017; Sadeghi et al. 2019), and some native/primitive breeds such as Hucul and Konik (Gurgul et al. 2019), Chinese native horses (Zhang et al. 2018), Japanese native breeds (Tozaki et al. 2019), the Yakutian Horse (Librado et al. 2015) and Korean horses (Seong et al. 2019). This is by no means an exhaustive list and it is even longer when including breeds that have recently been studied based on a genome‐wide collection of microsatellite markers, such as Estonian horse breeds (Sild et al. 2019) or Konik (Szwaczkowski et al. 2016). Despite the large number of studies, the core findings are rather similar. Collectively, modern horse breeds are characterized by high inter‐breed and low intra‐breed genetic diversity (McCue et al. 2012; Petersen et al. 2013a). 
Genomes of modern horses show multiple regions with signatures of selective pressures on performance traits and phenotypes. Among these, the most prominent are the MSTN gene in ECA18 for muscle fiber composition in racing breeds (Petersen et al. 2013b; Avila et al. 2018), the DMRT3 gene in ECA23 for the ability to perform alternative gaits in many breeds (Petersen et al. 2013b), and a region in ECA11 for body size in draught breeds and Miniature Horses (Petersen et al. 2013b). Clear signatures of selection are also found at known coat color loci. For example, the recessive chestnut coat color locus at MC1R is defined by a conserved approximately 750 kb haplotype across all breeds studied (McCue et al. 2012) and robust signatures of selection at the TBX3 locus in ECA8 are found in Konik horses, known to be selected for the dun color (Gurgul et al. 2019). Finally, selective breeding for the desired traits in modern breeds has unintentionally led to the accumulation of deleterious mutations (Librado et al. 2016, 2017; Jagannathan et al. 2019). For example, recent WGSs of 88 horses across 25 breeds identified heterozygotes for two potentially deleterious recessive alleles: a nonsense variant in the PALB2 gene, which is essential for mesoderm development, and a nonsense variant in the PLEKHM1 gene, necessary for osteoclast functions (Jagannathan et al. 2019).

Investigating Mendelian traits

The investigation of Mendelian traits in the horse began in the early 1900s. Because of the relative ease of tracking simply inherited phenotypes across generations, some of the first Mendelian traits to be studied involved variations in pigmentation. In fact, Alfred Sturtevant, an undergraduate student working on mapping traits in Drosophila under Thomas Hunt Morgan, was among the first to publish on inheritance of coat color in harness horses (Sturtevant 1920). However, it was almost 100 years later when the genetic mechanism proposed by Sturtevant for the chestnut coat color was identified at the molecular level (Table 2). This and other variants identified in the early 2000s influencing pigmentation were discovered using candidate gene approaches. To date, 58 variants affecting pigmentation have been described, including 27 in the KIT gene that contribute to the dominant white phenotype. The identification of the majority of these pigmentation variants was made possible by the high quality of the horse reference genome sequence and available SNP array tools (Tables 2 and S1 and described above). For example, the initial horse linkage map developed by the horse genomics community (Penedo et al. 2005) allowed for the mapping of several pigmentation traits with pleiotropic effects, such as leopard complex spotting (Terry et al. 2004) and gray depigmentation (Pielberg et al. 2005). However, the discovery of causal variants was possible only through the availability of a reference genome that enabled both whole genome and RNA‐seq data analyses to uncover large structural variants (Rosengren Pielberg et al. 2008; Bellone et al. 2013). The first genetic disorder to be identified at the molecular level in the horse was hyperkalemic periodic paralysis, reported in 1992. Much like the early work investigating pigmentation in the horse, this was discovered by a candidate gene approach. 
At the time the first draft of the horse genome sequence was complete, nearly 15 years later, only nine disease‐causing mutations had been discovered (Tables 3 and S2). In the last several years, there has been an acceleration in the discovery of causal or highly associated variants for Mendelian traits, with 14 variants reported in the last four years. This acceleration in discovery is due to advances in genomic tools and resources. DNA testing for these Mendelian traits has enabled horse breeders to use marker‐assisted selection and in some cases is assisting in the clinical management of disease.

Genetics of complex traits

Traits considered complex are those which are not determined by one or a few genomic variants but rather by small contributions of many, perhaps hundreds of, variants across the genome. These traits generally have low heritability, as the phenotypic outcome is determined not only by the underlying genetic variation but also by a significant impact of the environment (e.g. nutrition, training). The role of the environment complicates both the identification and understanding of the genetic components driving these phenotypes. That said, complex traits are of significance to the industry, both economically and for animal wellbeing, and include measures of athletic performance, growth and body size, and disorders such as metabolic syndrome, laminitis, equine asthma or recurrent airway obstruction (RAO) and osteochondrosis (OC), among others. Given the availability of genome‐wide genotyping technologies, researchers have been working to piece together the role of genetics in these complex traits. What is known regarding the genetic basis of several of the most highly studied complex traits in horses is outlined below.
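For orientation, the heritability values quoted in the following sections can be read against the standard quantitative‐genetics decomposition of variance. The expressions below are textbook definitions rather than formulas taken from the cited studies, and they assume, as a simplification, no genotype‐by‐environment interaction or covariance. Here V_P is the phenotypic variance, V_G the total genetic variance, V_A its additive component and V_E the environmental variance.

```latex
V_P = V_G + V_E, \qquad
H^2 = \frac{V_G}{V_P}, \qquad
h^2 = \frac{V_A}{V_P}
```

Under these definitions, a large environmental contribution V_E inflates V_P and therefore lowers both the broad‐sense (H^2) and narrow‐sense (h^2) heritability, even when the underlying genetic variance is unchanged.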

Size

Unlike many complex traits, the size of the horse is highly heritable with estimates for wither height ranging from 0.52 to 0.78 (Molina et al. 1999; Zechner et al. 2001; Suontama et al. 2009; Signer‐Hasler et al. 2012). Not surprisingly, positive genetic and phenotypic correlations are reported between wither height and other growth phenotypes such as body length, heart girth and cannon bone circumference (Molina et al. 1999; Sadek et al. 2006). Providing insight into the biology underlying size, Makvandi‐Nejad et al. (2012) employed the 50K SNP array to identify four loci that explained a majority of variation in size across‐breeds. The loci identified in this work include LCORL/NCAPG, HMGA2, ZFAT and LASP1, most of which have also been shown to play a role in size and growth phenotypes in other species (Snelling et al. 2010; Weikard et al. 2010; Lindholm‐Perry et al. 2013; Saatchi et al. 2014). Additional work has supported the association of LCORL/NCAPG (Signer‐Hasler et al. 2012; Tetens et al. 2013; Staiger et al. 2016a; Tozaki et al. 2017), ZFAT (Signer‐Hasler et al. 2012; Tozaki et al. 2017), and HMGA2 (Frischknecht et al. 2015) with height in various other populations. Whereas the means by which these loci act to alter growth traits is not fully understood, a missense mutation in HMGA2 was reported to alter DNA binding, which was attributed to reduced size in ponies (Frischknecht et al. 2015) and was also recently correlated with metabolic syndrome in Welsh ponies (Norton et al. 2019a). Functionally, variation of LCORL was shown to alter its expression, which may explain its role in determining an individual's size (Metzger et al. 2013). Interestingly, LCORL has also been associated with other complex disorders, including recurrent laryngeal neuropathy (Boyko et al. 2014) and OC (Corbin et al. 2012), serving as another example of the pleiotropic effects of loci involved in complex traits. 
As height has been associated with both OC and recurrent laryngeal neuropathy (Brakenhoff et al. 2006; McGivney et al. 2019), these associations are biologically relevant. The power of high‐density genome‐wide genotyping arrays has also enabled the identification of loci with more minor effects than those found by Makvandi‐Nejad et al. (2012) or which may be breed specific (Meira et al. 2014b; Frischknecht et al. 2016; Metzger et al. 2018). It is important that height is treated distinctly from dwarfism, which is simply inherited in several populations including the Miniature Horse (Metzger et al. 2017; Eberth et al. 2018) and Friesian (Leegwater et al. 2016; Table 3).

Athletic performance

The athletic ability of a horse, whether it be jumping, racing, cutting or pulling, is clearly complex, depending upon efficient metabolic and musculoskeletal properties as well as intricate interactions and the influence of training and husbandry. Particularly in Thoroughbreds, the genetics of racing performance has been of long‐standing interest. Prior to the availability of genotyping technologies, pedigree and performance data were used to examine the genetic components of phenotypic variance for Thoroughbred, Standardbred and sport horse performance (Hintz & Vanvleck 1978; Ojala et al. 1987; Tavernier 1990; Árnason 2001; Ricard & Chanu 2001; Langlois & Blouin 2004, 2007). Heritability estimates for racing in the Thoroughbred range widely, from nearly zero to upwards of 0.75, depending upon the specific phenotype measured (e.g. race time, race length or race winnings) and the model assumed (O'Ferrall & Cunningham 1974; Gaffney & Cunningham 1988; Williamson & Beilharz 1996; Thiruvenkadan et al. 2009a). As genotyping methodologies became available, regions implicated in racing performance have been identified using candidate gene (Gu et al. 2010; Hill et al. 2010a), genome‐wide association (Binns et al. 2010; Hill et al. 2010b; Tozaki et al. 2010; Shin et al. 2015) and transcriptome analyses (Park et al. 2014), and through the identification of selective sweeps in racing populations (Moon et al. 2015). Genes implicated include COX4I2 (Gu et al. 2010) and PDK4 (Hill et al. 2010a), both involved in cellular respiration. Although these studies have uncovered evidence of genetic factors involved in the racing performance of Thoroughbreds, a single locus, MSTN, has been repeatedly associated with racing performance (Binns et al. 2010; Tozaki et al. 2010).
In 2010, an intronic variant of MSTN was noted to be predictive of a horse's best racing distance (Hill et al. 2010b); the use of a genetic test for this variant has been adopted as a means to tailor training programs or choose matings. Since the first publication of this variant, several lines of evidence have supported the role of a SINE insertion in the promoter of MSTN, in high linkage disequilibrium with the intronic SNP in the Thoroughbred breed, as the functional variant (Petersen et al. 2014b; Santagostino et al. 2015; Rooney et al. 2018). Outside of the Thoroughbred, the MSTN variant predictive of better suitability as a sprinter is highly frequent in the Quarter Horse and associated with a higher proportion of fast‐twitch muscle fibers (Petersen et al. 2013b, 2014b). Additional studies in the racing Quarter Horse have identified other loci associated with racing performance; as a result of these works, genes involved in muscle contractility, skeletal development and neurologic function have been suggested to be associated with sprinting (Meira et al. 2014a,c; Beltran et al. 2015). In some cases, variants for racing speed are common across breeds (Shin et al. 2015; Pereira et al. 2016), fitting a hypothesis that these horses shared selective pressures for superior metabolic and musculoskeletal traits. Efforts to understand the genetic components of variation in harness racing horses suggest that the heritability of performance traits is low to moderate (reviewed in Thiruvenkadan et al. 2009b). In the harness racing populations, similar to the role of MSTN in the Thoroughbreds, a single variant was reported to impact performance to the extent that the variant is nearly fixed in trotting breeds (Promerova et al. 2014). The gene implicated, DMRT3, alters motor coordination and stride length (Andersson et al. 2012); horses homozygous for the variant perform at a higher level than those that are heterozygous or wild type (Jaderkvist et al. 2014; Jaderkvist Fegraeus et al. 2015).
Interestingly, this variant was identified not in a study of racing performance but in a study of another complex trait in the horse, the ability to perform an alternative gait (Andersson et al. 2012), again demonstrating the interplay among physiological pathways. As an aside, whereas the DMRT3 variant is deemed necessary for ‘gaitedness’, the basis of variations in gait is yet to be understood (Patterson et al. 2015; Staiger et al. 2016a; Fegraeus et al. 2017; Fonseca et al. 2017). Requiring a different type of athleticism, endurance racing studied in Arabian horses identified five QTL, including genes hypothesized to be involved in neuronal and cardiac function (Ricard et al. 2017). As with other complex traits, how the regulation of these genes or variants within them may enhance performance is an area of study. The differential expression of microRNAs prior to and after endurance exercise is probably one means by which gene regulation is altered to allow horses to endure and excel at this type of performance (Mach et al. 2016). Sport horses require yet another type of athleticism and the heritability and genetics of show jumping and dressage have been studied in several European populations. Heritability estimates, calculated for the longevity (years) of performance, have been similar across studies ranging from 0.07 to 0.18 (Ricard & FournetHanocq 1997; Braam et al. 2011; Seiero et al. 2016; Ricard et al. 2017). Heritability estimates for show jumping range from 0.11 (Sole et al. 2017) to 0.61 (Stock & Distl 2007), and in most cases are greater than estimates for dressage in the same population (Viklund et al. 2010; Braam et al. 2011), although this is not the case in the Swedish Warmblood studied by Wallin et al. (2003).
Finally, the use of the Illumina Equine50 SNP array to investigate the biology underlying the success of jumpers revealed a QTL explaining 0.7% of the phenotypic variance in French Warmbloods in a region including the candidate gene RYR2 (Brard & Ricard 2015). In the Hanoverian, a GWAS using the same genotyping platform identified six QTL including genes predicted to function in muscle structure and metabolism (Schroder et al. 2012).

Osteochondrosis

Osteochondrosis is a dysregulation of endochondral ossification of cartilage at the articular/epiphyseal complex most commonly occurring at the fetlock, hock and stifle joints (Jeffcott 1996). As a result, the cartilage becomes thickened and/or is retained, interfering with the normal function of the joint (Jeffcott 1996). The prevalence of OC varies by breed with relatively low incidence (~7%) in Thoroughbreds (Kane et al. 2003) and moderate frequency (~30%) in Danish Warmbloods (van Grevenhof et al. 2009), and with estimates of as many as 50% of Standardbred and German coldblooded horses (Wittwer et al. 2006; Lykkjen et al. 2012) being affected. Although it can be quite common, the heritability of OC is low to moderate (reviewed in Distl 2013; Naccache et al. 2018; McCoy et al. 2019), with a variety of environmental factors identified as having a significant role in its occurrence (Lepeule et al. 2009, 2013; Vander Heyden et al. 2013). Incidence has been positively correlated with size (Stock et al. 2005) and LCORL, itself associated with the size of a horse (described above), has been noted as a risk factor for OC and associated with incidence in GWAS (Teyssedre et al. 2012; Orr et al. 2013; Naccache et al. 2018). In Thoroughbreds, a QTL also on ECA3, although over 20 Mb distant from LCORL, was estimated to explain over 30% of the genetic variation (Corbin et al. 2012). Several research groups have been working to identify genetic risk factors for OC. The complexity as well as hypothesized population‐specific risk factors are evident in the many loci associated using genome‐wide SNP assays, many of which do not overlap between populations (Dierks et al. 2007, 2010; Wittwer et al. 2007; Lampe et al. 2009a,b; Lykkjen et al. 2010; Corbin et al. 2012; Teyssedre et al. 2012; Orr et al. 2013; McCoy et al. 2016; Lewczuk et al. 2017; Table 4).

Table 4. Complex equine diseases and traits with ongoing genetic studies.

| Disease/trait (reference) | Breed | Type of genetic study | Genomic region(s) identified |
| --- | --- | --- | --- |
| Atrial fibrillation (Physick‐Sheard et al. 2014; Kraus et al. 2017) | Standardbred | Heritability only | Probably polygenic; no region identified to date |
| Body size (e.g. Makvandi‐Nejad et al. 2012) | Multiple | GWAS | Loci on ECA3, 6, 9 and 11 |
| Bone fracture (Blott & Vaudin 2013; Blott et al. 2014) | Thoroughbred | GWAS | ECA18 |
| Brachygnathia (Signer‐Hasler et al. 2014) | Franches‐Montagnes | GWAS | ECA13 |
| Chronic progressive lymphedema (De Keyser et al. 2014) [a] | Draft breeds | Candidate gene approach | Continuing to pursue sequencing ELN |
| Common variable immunodeficiency [b] | Various | Epigenetic investigation | RNA‐seq and Methyl‐Seq of E2A and PAX5 |
| Corneal stromal loss (Lassaline‐Utter et al. 2014; Alberi et al. 2018) | Friesian | Heritability only/candidate gene | Likely heritable; excluded BGN variants |
| Cribbing (crib‐biting) (Hemmann et al. 2014) | Multiple | Candidate gene | Excluded subset of stereotypic genes |
| Cryptorchidism [a] | Icelandic | Heritability only | Likely to be heritable |
| Degenerative joint disease (Welsh et al. 2013) | Thoroughbred | Heritability only | Small to moderate heritability identified |
| Guttural pouch tympany (Metzger et al. 2012) | Arabians and German Warmbloods | GWAS | ECA15 (Arabians) and ECA3 (German Warmbloods) |
| Insect bite hypersensitivity (Schurink et al. 2012; Velie et al. 2016) | Icelandic, Shetland and Exmoor | GWAS | ECA7, 9, 10 and 17; ECA8 (Exmoor) |
| Recurrent laryngeal neuropathy (Dupuis et al. 2011; Boyko et al. 2014) | Thoroughbred, Warmblood, Trotter and Draft | GWAS | ECA3 (height locus; Thoroughbred), ECA21 and 31 (multiple breeds) |
| Metabolic syndrome (Lewis et al. 2017; Norton et al. 2019a,b) [a] | Welsh pony, Arabian, Morgan | GWAS | ECA6 (Welsh pony), ECA14 (Arabian), multiple regions (Morgan) |
| Navicular disease (Diesterbeck et al. 2007; Lopes et al. 2009, 2010) | Warmbloods | GWAS | ECA2 and ECA10 |
| Neuroaxonal dystrophy/equine degenerative myeloencephalopathy (Finno et al. 2013, 2014) | Quarter Horse | GWAS | ECA8 region exclusion, exclusion of TTPA candidate gene |
| Osteochondrosis/osteochondrosis dissecans (Dierks et al. 2007; Lampe et al. 2009a,b,c; Sevane et al. 2016, 2017) | Warmbloods, Trotters, Standardbreds, Spanish Purebred | GWAS | ECA2, 4, 5, 16, 18 (Warmblood), ECA10, 14, 21 (Standardbred), candidate gene analysis (Spanish Purebred) |
| Polysaccharide storage myopathy, type II [a] | Quarter Horses | GWAS | ECA18 |
| Recurrent airway obstruction (Swinburne et al. 2009; Schnider et al. 2017; Mason et al. 2018) | Warmblood | GWAS | ECA11, 13, 15 (Warmblood) |
| Recurrent exertional rhabdomyolysis (Fritz et al. 2012) | Thoroughbred, Standardbred | GWAS | ECA11, 16, 30 (Thoroughbred), ECA10, 11 (Standardbred) |
| Recurrent uveitis (Kulbrock et al. 2013; Fritz et al. 2014) | Appaloosa, German Warmblood | Candidate gene/GWAS | ECA1, 20 (Appaloosa), ECA18, 20 (German Warmblood) |
| Sarcoid (Christen et al. 2013; Staiger et al. 2016b) | Franches‐Montagnes, Quarter Horse, Thoroughbred | GWAS | ECA20, 22 (QH, TB) |
| Stallion subfertility owing to impaired acrosome reaction (Raudsepp et al. 2012) | Thoroughbred | Susceptibility gene | ECA13: FKBP6 |
| Stallion fertility (Schrimpf et al. 2015) | Hanoverian | GWAS | ECA13: FKBP6 |
| Stallion fertility (Schrimpf et al. 2016) | 19 European breeds | GWAS | High‐impact variants in CFTR (ECA4), OVGP1 (ECA5), FBXO43 (ECA9), TSSK6 (ECA21), PKD1 (ECA13), FOXP1 (ECA16), TCP11 (ECA20), SPATA31E1 (n/a), NOTCH1 (ECA25) |
| Swayback (lordosis) (Cook et al. 2010) | Saddlebred | GWAS | ECA20 |

ECA, Equus caballus chromosome; GWAS, genome‐wide association study.
[a] Abstract presented at the Dorothy Russell Havemeyer Foundation International Equine Genome Mapping Workshop.
[b] Abstract presented at the Plant and Animal Genome Conference.

Functional studies show differential expression of the MMP‐13 gene, which encodes a matrix metallopeptidase, in the cartilage of horses with OC compared with controls (Mirams et al. 2009; Riddick et al. 2012), consistent with hypothesized dysfunction in cartilage maturation and endochondral ossification. Candidate gene expression studies also suggest that chondrocyte maturation and catabolism are altered through dysregulated Wnt signaling (Kinsley et al. 2015). Finally, Mirams et al. (2016) used subtractive hybridization of the transcriptome from cartilage of affected and unaffected foals, leading to a hypothesized etiology that involves cartilage retention in subchondral bone. In addition to protein‐coding transcripts, differentially expressed miRNAs have been identified and may play a role in the altered gene expression associated with OC (Desjardin et al. 2014).

Equine metabolic syndrome and laminitis

Horses affected by equine metabolic syndrome (EMS) present with insulin resistance and obesity and/or regional adiposity; additionally, hypertriglyceridemia, elevated leptin and hypertension may occur (Frank et al. 2010). Horses with EMS have an increased risk of laminitis or the disruption of the attachment between the distal phalanx (coffin bone) and inner hoof wall, leading to the coffin bone being rotated downward into the sole of the hoof. The incidence of EMS and associated laminitis is greatest in ponies and obese horses (Treiber et al. 2006; Bailey et al. 2008). Not all cases of laminitis, however, are attributed to EMS as it can occur in conjunction with other wellbeing issues such as colic, abdominal or reproductive infection, or excessive concussion on hard surfaces (Hood 1999). SNP‐based heritability estimates for traits associated with EMS (e.g. circulating insulin, glucose, ACTH, leptin) have been reported to be quite high (Norton et al. 2019b). Genetic factors contributing to risk are also supported by varied rates of prevalence among breeds and variation in insulin responsiveness by breed (Treiber et al. 2006; Bailey et al. 2008; Bamford et al. 2014). In a study of pedigreed ponies, Treiber et al. (2006) proposed that one or a few major dominant genes contribute to predisposition to laminitis. Changes in gene expression related to the onset of laminitis have been studied as a possible means of both understanding the progression of the condition and identifying biomarkers to detect a possible bout prior to the onset of clinical symptoms. With a hypothesized inflammatory component, Tadros et al. (2013) found that circulating cytokine expression of IL‐1B, IL‐8 and IL‐10 was elevated several hours prior to the detectable onset of laminitis. A similar inflammatory response has been identified in the laminar tissue itself in horses induced to develop laminitis compared with healthy controls (Belknap et al. 2007; Loftus et al. 2007; Leise et al. 2010). 
These studies show a systemic inflammatory response associated with laminitis. However, whereas these studies help identify pathways involved in disease progression, they fail to answer questions of genetic risk. Toward the goal of understanding genetic risk factors for disease, Lewis et al. (2017) identified a single candidate gene, FAM174A, in Arabian horses associated with laminitis and correlated with an increased insulin‐to‐glucose ratio. However, a more complete understanding of the genetic risk factors for EMS remains elusive. Finally, the influence of the microbiome on the development of EMS and the associated laminitis is a recent area of study. Characterization of the hindgut microbiome revealed differences in microbial composition in horses with chronic (Steelman et al. 2012) and induced (Milinovich et al. 2006) laminitis, as well as in horses with EMS (Elzinga et al. 2016) compared with healthy controls. The composition of the microbiome itself has been suggested to be heritable (Blekhman et al. 2015; Goodrich et al. 2016), associated with genes involved in metabolism and immunity; these data connect host genetics to yet another means by which risk for EMS and/or laminitis may be amplified.

Equine asthma or RAO

Equine asthma, also known as RAO or heaves, is a chronic disease of the lower airway, particularly problematic in environments where air flow is limited and in which bedding or hay has high levels of dust or other respiratory irritants (reviewed in Woods et al. 1993; Ramseyer et al. 2007; Pirie 2014). The prevalence of RAO is estimated to be 14% in a sample of British horses (Hotchkiss et al. 2007) and 10% in Swiss Warmbloods (Ramseyer et al. 2007). Often compared with asthma in humans, affected individuals have difficulty breathing, neutrophil and mucus accumulation in the airway, cholinergic bronchospasm and coughing, and are especially sensitive to inhaled allergens (Robinson et al. 1996; Gerber et al. 2004). Whereas symptoms can be mediated by removing the horse from the problematic environment (Vandenput et al. 1998a,b), observations of increased risk in foals with affected parents, or foals of particular families (Marti et al. 1991; Ramseyer et al. 2007), have suggested a heritable component to its etiology. Before genomic tools, a hypothesis derived from the clinical accumulation of mucus in RAO‐affected horses led to the investigation of mucin gene variation as a possible risk factor. As a result, the equine ortholog of MUC5AC was identified as being upregulated in affected horses (Gerber et al. 2003). A candidate gene study also found an isoform of MYH11 to be overexpressed in affected horses (Boivin et al. 2014). The origin of the MYH11 isoform identified in Boivin et al. (2014) has been associated with alternative regulatory mechanisms (Issouf et al. 2018), providing an example of how variable genome regulation contributes to important phenotypes associated with animal health and wellbeing. In perhaps the most highly studied populations of horses, and prior to the genome assembly, the candidate gene IL4R was investigated in two half‐sibling families of Warmbloods using microsatellite genotyping (Jost et al. 2007). 
In one family, a significant association with RAO was found with a haplotype in ECA13 near the cytokine receptor gene IL4R, and a recessive mode of inheritance was suggested. However, this association was not consistent in the other family, where an association with RAO was identified in ECA15 with a predicted dominant mode of inheritance. In a GWAS of these families using microsatellites, QTL were identified on ECA13, and whereas no association was found with the positional candidate gene ITGAX, IL4R was noted to be proximal to the hit (Swinburne et al. 2009). Shakhsi‐Niaei et al. (2012), from the same research group, used the SNP50 array to fine‐map the region, which resulted in the identification of a signal on ECA13 in both the family from which the original QTL was identified and a group of unrelated horses; however, the result was not statistically significant and no clear causal mutation or genes were identified. Chromosome 13 was again the strongest region of association in a GWAS repeated on these horses with the high‐density Affymetrix SNP array after it became available (Schnider et al. 2017); the authors note a positional candidate gene, TXNDC11, from those analyses. Sequencing of IL4R and expression analysis of bronchoalveolar lavage fluid identified increased expression in horses from which the association was derived but not in other horse families (Klukowska‐Rotzler et al. 2012), further supporting its hypothesized role in RAO in this population. Additional follow‐up studies included quantitative PCR of candidate genes (Lanz et al. 2013) and RNA‐seq (Pacholewska et al. 2015) of peripheral blood mononuclear cells (PBMCs) derived from RAO‐affected and control horses that were stimulated with irritants such as hay dust or lipopolysaccharide. The role of IL4R in the population in which the QTL was identified was supported, whereas other data continued to support different mechanisms of disease between the two families (Lanz et al. 2013). 
RNA‐seq of horses from the same families found CXCL13 to be upregulated along with cell cycle regulatory transcripts such as CDC20, and genes involved in immune function and development (Pacholewska et al. 2015). Mason et al. (2018) conducted expression QTL studies of experimentally treated PBMCs from horses with prior RAO and healthy controls; whereas the studies supported prior SNP associations in these families, the identification of functional risk markers remains elusive. Lastly, considering the possible impact of variations other than nucleotide substitutions, genomic copy number variants (CNVs) were analyzed using a tiling array to compare RAO‐affected horses with control horses (Ghosh et al. 2014). Over 700 CNVs were identified across samples, with the RAO horses having, on average, more CNVs than the controls. Although no significant associations were identified, NME7, involved in ciliary function of the lung, had a suggestive (P = 0.06) association with the RAO phenotype, warranting further investigation (Ghosh et al. 2016).
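A comparison of per‐horse CNV counts between affected and control groups, as described above, can be illustrated with a simple permutation test. The sketch below is only an illustration of the general idea and is not the method used by Ghosh et al.; the function name, the counts and the number of permutations are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (cases minus controls) and an
    empirical p-value obtained by shuffling the group labels.
    """
    rng = random.Random(seed)
    observed = mean(cases) - mean(controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical per-horse CNV counts, invented purely for illustration.
rao_counts = [14, 17, 12, 19, 15, 16]
control_counts = [11, 13, 12, 10, 14, 12]

diff, p_value = permutation_test(rao_counts, control_counts)
print(f"mean difference = {diff}, empirical p = {p_value:.4f}")
```

A label‐shuffling test makes no distributional assumption about the counts, which is one reason it is a common first check for small groups; it says nothing about which individual CNVs drive a difference.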

Reproduction

Compared with other complex traits, the genomics of equine reproduction has been given relatively less attention, even though reproductive performance is of high economic importance for purebred horses and vital for survival in feral populations (Raudsepp et al. 2013; Metzger et al. 2015). Only a few candidate loci or genomic regions have been associated with various fertility parameters and phenotypes (Raudsepp et al. 2013). Among these, the FKBP6 gene is of particular interest because of contrasting associations: in Thoroughbreds it is associated with subfertility owing to impaired acrosome reaction (Raudsepp et al. 2012), but in Hanoverians it is associated with improved conception rates (Schrimpf et al. 2015). Like for other complex traits, genome‐wide SNP‐ or WGS‐based analyses are expected to also make a difference in fertility research. A good example is a recent whole‐genome screening that revealed high‐impact variants in nine putative male fertility genes (Schrimpf et al. 2016; Table 4). On a different note, breeding animals are typically selected on the basis of their pedigrees, athletic performance and appearance, but not for their reproductive potential. This suggests that there are no strong signatures of selection for reproductive performance. Surprisingly, this is not true as shown by a whole‐genome analysis of runs of homozygosity in six diverse breeds, including commercial breeds such as Hanoverian and Thoroughbred, and native breeds such as Sorraia (Metzger et al. 2015). The findings suggest a significant artificial as well as natural positive selection on reproduction performance in all types of horse populations.
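The runs‐of‐homozygosity analysis mentioned above rests on scanning each genome for long stretches without heterozygous genotypes. A minimal sketch of that scan is given below, assuming genotypes coded as alternate‐allele counts (0, 1, 2) at ordered SNP positions on one chromosome; the function name, thresholds and data are hypothetical, and published pipelines typically also tolerate a few heterozygous or missing calls per window, which this sketch does not.

```python
def find_roh(genotypes, positions, min_snps=4, min_length_bp=300_000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygosity.

    genotypes: alternate-allele counts per SNP (1 means heterozygous).
    positions: base-pair coordinates, sorted, same length as genotypes.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i  # a homozygous run begins here
        elif start is not None:
            runs.append((start, i - 1))  # a heterozygous call ends the run
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))

    segments = []
    for first, last in runs:
        n_snps = last - first + 1
        length_bp = positions[last] - positions[first]
        if n_snps >= min_snps and length_bp >= min_length_bp:
            segments.append((positions[first], positions[last], n_snps))
    return segments


# Hypothetical genotypes for one horse, with one SNP every 100 kb.
genotypes = [0, 0, 2, 2, 0, 1, 2, 1, 0, 0, 0, 0, 0, 1]
positions = [100_000 * (i + 1) for i in range(len(genotypes))]

segments = find_roh(genotypes, positions)
print(segments)
```

Regions where such segments overlap across many individuals of a breed are the candidates that are then examined for signatures of selection.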

Horse as a large animal model for humans

Unlike rats or mice, horses are large and expensive animals with long generation intervals, and therefore are not typical model species for studying human disease and physiology. However, some equine conditions, such as obesity, respiratory disease, orthopedic disease, equine recurrent uveitis and certain cancers translate into similar human conditions better than those from classical model species. For example, because of unique similarities between human and horse insulin resistance response to overfeeding, EMS has the potential to serve as a model for human obesity (Frank et al. 2010; Jacob et al. 2018). Likewise, naturally occurring equine asthma is recognized as a model for some forms of asthma in humans (Bullone & Lavoie 2017; Bond et al. 2018). As an athletic species, the horse is considered as an important large animal model for cardiovascular disease (Tsang et al. 2016) and musculoskeletal disorders, including osteoarthritis (McCoy 2015), as well as a model for articular cartilage repair and regeneration studies (Dias et al. 2018). The horse has also been proposed as a potential model for immune response for infections, such as acute synovitis and septic arthritis (Ludwig et al. 2016), and autoimmune disorders like recurrent uveitis (Witkowski et al. 2016). Furthermore, studies of melanoma in gray horses are expected to help dissect the molecular mechanisms underlying melanoma as well as vitiligo in humans (Rosengren Pielberg et al. 2008; van der Weyden et al. 2016).

Functional annotation of the equine genome

Despite the recent and rapid successes in identifying causative mutations for these mostly simple inherited traits, the genetic investigation of complex traits has not been as straightforward. Although there is strong evidence for the heritability of many complex diseases and traits (described above), and despite a significant financial investment by the equine industry in phenotyping and genotyping large numbers of horses, functional genetic mutations have not been discovered for most complex traits examined (Table 4). Finding genetic associations of chromosomal segments to a specific phenotype without identifying functional or causative mutations within protein‐coding genes is not a problem unique to the horse. The idea that much of the non‐coding genome is ‘junk DNA’ (Ohno 1972) and uninteresting to consider further has been reconsidered after overwhelming evidence from the human and murine Encyclopedia of DNA Elements (ENCODE) projects, demonstrating that these regions are functionally important (Consortium et al. 2007; Yue et al. 2014). As many as 93% of the associated and potentially causative variants from human GWAS publications are outside of annotated protein‐coding regions (Hindorff et al. 2009). These associations are unlikely to be aberrant; rather, they identify genomic regions involved in processes other than protein coding, such as regulation of gene expression (Maurano et al. 2012). In addition to the recent discovery that the majority of the genome is transcribed (Consortium et al. 2007), it has also been shown that 92–94% of human protein‐coding genes express multiple mRNA variants or isoforms (Wang et al. 2008). Many of the variants responsible for mapped equine diseases and traits in Table 3, in addition to others, may be regulatory, non‐coding mutations.

The need for defining the tissue‐specific gene expression and regulation (i.e. functional annotation) across domestic animal species was acknowledged by the establishment of the International FAANG project (Andersson et al. 2015). The ultimate goal of this initiative is to provide high‐quality functional annotation of animal genomes in a coordinated effort that facilitates data sharing and analysis, while establishing standards for assay quality and continuity for metadata analysis. In an effort to establish a more direct connection between genome function and phenotype, the FAANG consortium has focused on initially assaying tissues from one to two individual animals representing a breed with minimal genetic diversity within a species (Andersson et al. 2015). The resulting data from this annotation across species will provide power to associate phenotypes with functional data, making it possible to generate and test hypotheses regarding the functional mechanisms underlying associations (Tuggle et al. 2016; Giuffra et al. 2019). As demonstrated by the human and murine ENCODE projects, five types of assays uncover the majority of variation in tissue‐specific gene expression and epigenetic modifications:

Expression: RNA‐seq identifies expression levels of primarily protein‐coding genes.

RNA expression: small RNA‐seq identifies expression of miRNAs, non‐coding RNAs that function primarily in RNA silencing.

Histone modifications: ChIP‐seq identifies genome‐wide patterns of histone modifications using antibodies against the modified histone proteins. The marks prioritized for the FAANG initiative have been standardized across species; these marks, and the genomic features they identify, are H3K4me1 (enhancers and distal regulatory elements), H3K4me3 (active promoters and enhancers), H3K27me3 (gene silencing) and H3K27ac (active regulatory elements; Lee et al. 2014).

Chromatin accessibility: the Assay for Transposase‐Accessible Chromatin using sequencing (ATAC‐seq) identifies regions of open chromatin.

DNA methylation: reduced representation bisulfite sequencing identifies DNA methylation across the genome.

In line with the priorities of the FAANG initiative, in 2016, an equine biobank was created based on the sampling and preservation of 86 tissues, two cell lines and fluids from two Thoroughbred mares (Table 5; Burns et al. 2018). Tissues were flash frozen, preserved for histopathology and fixed for chromatin immunoprecipitation, and nuclei were isolated from 16 tissues. This biobank is available for all researchers to access for assays appropriate to the goals of the FAANG initiative. Notably, extensive ante‐ and post‐mortem evaluation by veterinarians was conducted on these two horses to provide the highest standard of a true ‘reference’ database for researchers to utilize. In addition, both mares represented in the biobank had normal karyotypes. Utilizing the strict standard of phenotyping allows for both association of phenotype with genomic data and standardization of future sampling efforts, enhancing the utility of the data generated (Burns et al. 2018).
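For readers organizing data from these assays, the five assay types and four histone marks listed above can be captured in a small lookup structure. The sketch below is one hypothetical way to encode them; the class, constant and function names are illustrative assumptions and do not correspond to any FAANG software or metadata standard. The descriptions are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assay:
    category: str    # what is being measured
    method: str      # sequencing-based assay used
    identifies: str  # genomic feature the assay reports


# The five assay types described in the text (names are illustrative).
CORE_ASSAYS = (
    Assay("Expression", "RNA-seq",
          "expression levels of primarily protein-coding genes"),
    Assay("RNA expression", "small RNA-seq",
          "expression of miRNAs"),
    Assay("Histone modifications", "ChIP-seq",
          "genome-wide patterns of histone modifications"),
    Assay("Chromatin accessibility", "ATAC-seq",
          "regions of open chromatin"),
    Assay("DNA methylation", "reduced representation bisulfite sequencing",
          "DNA methylation across the genome"),
)

# Prioritized histone marks and the features they identify, as listed above.
HISTONE_MARKS = {
    "H3K4me1": "enhancers and distal regulatory elements",
    "H3K4me3": "active promoters and enhancers",
    "H3K27me3": "gene silencing",
    "H3K27ac": "active regulatory elements",
}


def marks_for(keyword):
    """Return the marks whose described feature mentions the keyword."""
    return sorted(m for m, feature in HISTONE_MARKS.items() if keyword in feature)


print(marks_for("enhancers"))
```

Keeping the mark‐to‐feature mapping in one place makes it easier to label ChIP‐seq peak sets consistently across tissues when datasets are later integrated.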
Table 5

Biobank tissues collected from two Thoroughbred mares.

Musculoskeletal system:

Cartilage (Fetlock)

Cartilage (Stifle)

Coronary Band

Deep Digital Flexor

Gluteal Muscle

Lamina

Long Bone Marrow

Longissimus Dorsi

Metacarpal Bone Diaphysis

Rib Bone Marrow

Sacrocaudalis Muscle

Sesamoid Bone

Superficial Digital Flexor

Suspensory Ligament

Cardiovascular system:

Aortic Valve

Heart Left Atrium

Heart Left Ventricle

Heart Right Atrium

Heart Right Ventricle

Left Lung

Mitral Valve

Pulmonic Valve

Trachea

Tricuspid Valve

Urogenital System:

Cervix

Mammary Gland

Ovary

Oviduct

Urinary Bladder

Uterus

Vagina

Nervous system:

Cerebellum Vermis

Cerebellum Lateral Hemisphere

Cornea

Corpus Callosum

C1 Spinal Cord

C6 Spinal Cord

Dorsal Root Ganglia

Dura Mater

Frontal Cortex

Hypothalamus

L1 Spinal Cord

L6 Spinal Cord

Occipital Cortex

Parietal Cortex

Pituitary

Pons

Retina

Sciatic Nerve

Temporal Cortex

Thalamus

T8 Spinal Cord

Cell types:

Fibroblasts (culture)

Keratinocytes (culture)

PBMCs

Body fluids:

Plasma

Serum

Cerebrospinal fluid

Synovial fluid

Digestive system:

Cecum

Duodenum

Epiglottis

Esophagus

Ileum

Jejunum

Left Dorsal Colon

Left Ventral Colon

Right Dorsal Colon

Right Ventral Colon

Small Colon

Stomach

Tongue

Abdominal/thoracic organs:

Adrenal Cortex

Adrenal Medulla

Kidney Cortex

Kidney Medulla

Larynx

Liver

Lymph Node

Pancreas

Spleen

Thyroid

Integumentary system:

Dorsum Skin (over back)

Gluteal Adipose

Loin Adipose

Neck Skin

Updated from Burns et al. (2018). Prioritized tissues for study are in bold. Additional tissues from which RNA‐seq data have been collected as funded by outside collaborators are shown in italics.

Eight tissues were prioritized in the initial equine annotation efforts (Table 5) owing to their cross‐species application (e.g. skeletal muscle, liver, lung and ovary) as well as importance to the horse (laminae). 
Additionally, as an international collaborative initiative, 24 individuals representing 16 research institutions in 10 countries voluntarily participated to support RNA‐seq of additional tissues by providing funding for transcriptome characterization of the tissue(s) most relevant to their own research; RNA‐seq datasets from these tissues are publicly available (EMBL: http://www.ebi.ac.uk/embl/). Seven laboratories also conducted additional assays, such as karyotype analyses, centromere mapping of fibroblast cells, reduced representation bisulfite sequencing of the eight priority tissues, fibroblast functional assays, functional assays on tissues of the suspensory apparatus and further phenotyping through the sequencing of microbiome samples. ChIP assays for four histone modification marks were recently completed on the eight prioritized tissues. At this time, ChIP assays for CCCTC‐binding factor (CTCF), the major insulator‐binding protein in vertebrates, and ATAC‐seq experiments are underway. All datasets will then be fully integrated and correlated with gene expression data and made publicly available as an equine‐specific tissue atlas to the entire equine research community.

Future directions

With the sheer volume of sequencing efforts currently underway in the horse, in addition to the FAANG efforts to define regulatory regions of the genome, the genetic contributions to complex traits can be discovered. Many of these more complex diseases (Table 4) probably have strong environmental contributions to the overall phenotype. Therefore, educating veterinarians and horse owners on the proper use of these genetic ‘risk factors’ in breeding management will be essential to advance the health of horses.

Concluding remarks and future perspectives

In this review, we demonstrate that the 10 years of post‐genome era in equine genomics have been unparalleled and decorated with outstanding achievements in almost all conceivable directions. These include improved characterization of the structure and function of the horse genome, delineating the genetic makeup of breeds and populations, deciphering the evolutionary ancestry of horses and continuing search for molecular causes of Mendelian and complex traits and diseases. The central pillar of support for this success is definitely the high‐quality horse reference genome combined with unprecedented advances in genomics technologies and global collaborations between researchers of diverse disciplines, clinicians, breeders and horse owners. Whereas similar achievements characterize the post‐genome era of all main domestic species, it must be emphasized that the collection of genome‐level sequence data from hundreds of ancient horses from the past few thousand years is unparalleled and not available for any other domestic or non‐primate species. This unique resource allows researchers to track the evolutionary past of any genomic features, particularly the sequence variants that are associated with equine traits of importance, such as performance, coat color, disease and adaptations. This also demonstrates that the history of domestic animals cannot be fully understood without ancient genomic data. The enhanced knowledge about the horse genome, biology and populations also highlights the areas that require critical improvement. Among these, high expectations are put on the equine FAANG initiative in order to identify important functional elements in the horse genome, particularly those underlying simple and complex traits. Also, the growing number of available WGSs of individual horses of diverse breeds and phenotypes allows for a comprehensive catalog of common and rare variants in the horse genome. 
This in turn is the foundation for equine precision medicine, which should identify novel genetic mutations in a small number of individuals and connect the variation with function.

Conflict of interests

The authors declare no conflict of interests.

Table S1 Genetic variants identified for traits influencing pigmentation.

Table S2 Genetic variants underlying disease and performance traits in the horse.
  262 in total

1.  Genome-wide association study of osteochondrosis in the tarsocrural joint of Dutch Warmblood horses identifies susceptibility loci on chromosomes 3 and 10.

Authors:  N Orr; E W Hill; J Gu; P Govindarajan; J Conroy; E M van Grevenhof; B J Ducro; J A M van Arendonk; J H Knaap; P R van Weeren; D E Machugh; S Ennis; P A J Brama
Journal:  Anim Genet       Date:  2012-12-25       Impact factor: 3.169

2.  The interleukin 4 receptor gene and its role in recurrent airway obstruction in Swiss Warmblood horses.

Authors:  J Klukowska-Rötzler; J E Swinburne; C Drögemüller; G Dolf; J Janda; T Leeb; V Gerber
Journal:  Anim Genet       Date:  2011-10-28       Impact factor: 3.169

3.  Radiographic changes in Thoroughbred yearlings. Part 1: Prevalence at the time of the yearling sales.

Authors:  A J Kane; R D Park; C W McIlwraith; N W Rantanen; J P Morehead; L R Bramlage
Journal:  Equine Vet J       Date:  2003-06       Impact factor: 2.888

4.  The DMRT3 'Gait keeper' mutation affects performance of Nordic and Standardbred trotters.

Authors:  K Jäderkvist; L S Andersson; A M Johansson; T Árnason; S Mikko; S Eriksson; L Andersson; G Lindgren
Journal:  J Anim Sci       Date:  2014-08-01       Impact factor: 3.159

5.  A genome-wide association study of osteochondritis dissecans in the Thoroughbred.

Authors:  Laura J Corbin; Sarah C Blott; June E Swinburne; Charlene Sibbons; Laura Y Fox-Clipsham; Maud Helwegen; Tim D H Parkin; J Richard Newton; Lawrence R Bramlage; C Wayne McIlwraith; Stephen C Bishop; John A Woolliams; Mark Vaudin
Journal:  Mamm Genome       Date:  2011-11-04       Impact factor: 2.957

6.  Genetic Determinants of the Gut Microbiome in UK Twins.

Authors:  Julia K Goodrich; Emily R Davenport; Michelle Beaumont; Matthew A Jackson; Rob Knight; Carole Ober; Tim D Spector; Jordana T Bell; Andrew G Clark; Ruth E Ley
Journal:  Cell Host Microbe       Date:  2016-05-11       Impact factor: 21.023

7.  A genome-wide association study reveals loci influencing height and other conformation traits in horses.

Authors:  Heidi Signer-Hasler; Christine Flury; Bianca Haase; Dominik Burger; Henner Simianer; Tosso Leeb; Stefan Rieder
Journal:  PLoS One       Date:  2012-05-16       Impact factor: 3.240

8.  Copy number variation in the horse genome.

Authors:  Sharmila Ghosh; Zhipeng Qu; Pranab J Das; Erica Fang; Rytis Juras; E Gus Cothran; Sue McDonell; Daniel G Kenney; Teri L Lear; David L Adelson; Bhanu P Chowdhary; Terje Raudsepp
Journal:  PLoS Genet       Date:  2014-10-23       Impact factor: 5.917

Review 9.  Cross-species models of human melanoma.

Authors:  Louise van der Weyden; E Elizabeth Patton; Geoffrey A Wood; Alastair K Foote; Thomas Brenn; Mark J Arends; David J Adams
Journal:  J Pathol       Date:  2015-10-09       Impact factor: 7.996

10.  A genome-wide association study identifies risk loci to equine recurrent uveitis in German warmblood horses.

Authors:  Maike Kulbrock; Stefanie Lehner; Julia Metzger; Bernhard Ohnesorge; Ottmar Distl
Journal:  PLoS One       Date:  2013-08-14       Impact factor: 3.240
