Literature DB >> 31698676

Genome-Wide Association Analyses of Equine Metabolic Syndrome Phenotypes in Welsh Ponies and Morgan Horses.

Elaine Norton1, Nichol Schultz1, Ray Geor2, Dianne McFarlane3, James Mickelson4, Molly McCue1.   

Abstract

Equine metabolic syndrome (EMS) is a complex trait for which few genetic studies have been published. Our study objectives were to perform within breed genome-wide association analyses (GWA) to identify associated loci in two high-risk breeds, coupled with meta-analysis to identify shared and unique loci between breeds. GWA for 12 EMS traits identified 303 and 142 associated genomic regions in 264 Welsh ponies and 286 Morgan horses, respectively. Meta-analysis demonstrated that 65 GWA regions were shared across breeds. Region boundaries were defined based on a fixed-size or the breakdown of linkage disequilibrium, and prioritized if they were: shared between breeds or across traits (high priority), identified in a single GWA cohort (medium priority), or shared across traits with no SNPs reaching genome-wide significance (low priority), resulting in 56 high, 26 medium, and seven low priority regions including 1853 candidate genes in the Welsh ponies; and 39 high, eight medium, and nine low priority regions including 1167 candidate genes in the Morgans. The prioritized regions contained protein-coding genes which were functionally enriched for pathways associated with inflammation, glucose metabolism, or lipid metabolism. These data demonstrate that EMS is a polygenic trait with breed-specific risk alleles as well as those shared across breeds.

Entities:  

Keywords:  EMS; fat metabolism; genetic risk factors; genetics; genome-wide association analysis; horses; insulin dysregulation

Mesh:

Substances:

Year:  2019        PMID: 31698676      PMCID: PMC6895807          DOI: 10.3390/genes10110893

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

Equine metabolic syndrome (EMS) is now best described as a clustering of risk factors often leading to laminitis, which is the primary clinical concern due to the painful and often career ending outcome of this disease [1]. Although laminitis itself is not fatal, in the best interest of the patient, the severity and crippling pain often leads to a decision of euthanasia [2]. The key component of EMS is insulin dysregulation, which are derangements in the balanced relationship between plasma insulin, glucose and lipids, and manifests clinically as baseline hyperinsulinemia, an exaggerated or prolonged insulin or glucose response post oral or intravenous carbohydrate challenge, tissue insulin resistance, or hypertriglyceridemia [1]. Although a dominant mode of inheritance for laminitis status was originally proposed for a small group of ponies [3], breed differences in EMS susceptibility, metabolic profiles, and clinical severity have led to the more widely applicable, alternative hypothesis that EMS is a complex disease, with both environmental and genetic risk factors contributing to disease severity. As a complex trait, it is likely that EMS is the result of a combination of genes with variable modes of inheritance, penetrance and effect size [4]. Recently, our laboratory provided evidence for this hypothesis through estimation of narrow sense heritability in a cohort of Morgan horses and Welsh ponies, where eight metabolic measurements were estimated to have low, moderate or high heritability [5]. Further, several heritability estimates varied across the two breeds, providing further evidence for breed related differences and are consistent with heritability estimates across ethnic groups for metabolic syndrome (MetS) in humans, a syndrome with shares distinct metabolic similarities to EMS. Although heritability estimates provide valuable insight into the genetic contribution to a trait, they do not provide information on the number of contributing genes, specific genes involved, or where in the genome these genes are located. Identification of the coding and non-coding variants contributing to a complex trait are important for understanding its complete pathophysiology and to gain a better understanding of how genes interact or are influenced by the environment. Further, this information is necessary for the development of genetic tests which would allow veterinarians to assess a patient’s risk for developing EMS before they show clinical signs, identify horses that need frequent monitoring or early environmental modifications, and provide responsible breeding recommendations. Genome-wide association analyses (GWA) has been used and validated across multiple species for both simple and complex traits to narrow down the genome to specific regions of interest harboring the risk alleles and to provide valuable information about the genetic architecture of a trait. For example, GWA for MetS have led to identification of quantitative trait loci harboring candidate genes in several metabolic pathways, including alleles influencing lipoprotein particle size and glucose, insulin and lipid homeostasis [6,7]. Further, these studies have identified different risk alleles amongst ethnic groups [8]. We hypothesized that major genetic risk factors leading to EMS are shared across breeds, and that differences in the severity and secondary features of the EMS phenotype between breeds, or between individuals within a breed, are the result of modifying genetic risk alleles with variable frequencies between breeds. The objectives of this study were to perform within breed GWA to identify significant contributing loci in Welsh ponies and Morgan horses, two breeds known to be high risk for EMS, and to use meta-analysis to identify shared and unique loci between both breeds. We further prioritized regions identified on GWA and performed a functional enrichment analysis on protein-coding genes within the prioritized regions.

2. Materials and Methods

2.1. Samples

Horses used in this study were a part of a large across-breeds study evaluating the EMS phenotype [9]. From this dataset, 264 Welsh ponies (194 females and 70 males with a mean age of 11.7 years) and 287 Morgan horses (184 females and 102 males with a mean age of 12.3 years) were included in this analysis. The Welsh pony cohort represented 74 section As, 146 section Bs, three section Cs, 15 section Ds, 19 section Hs, seven sections Ps, and 10 unknown/unregistered Welsh ponies (Appendix A). Samples were collected from 31 and 28 farms for the Morgan horses and Welsh ponies, respectively, throughout the United States and Canada. Sample procedures and protocols have been described elsewhere [5,9,10]. Briefly, phenotype data collected on all horses included signalment, medical history, laminitis status, environmental management (feed, supplements, turnout and exercise regimen), and morphometric measurements (body condition score (BCS), wither height, and neck and girth circumference). After an eight hour fast, an oral sugar test (OST) was performed using 0.15 mg/kg Karo lite corn syrup as previously described [11]. Biochemical measurements at baseline included insulin (TKIN1 Insulin Coat-A-Count Kit, Siemens Healthcare Diagnostics Inc., Los Angeles, CA, USA), glucose (YSI 2300 STAT Plus glucose and lactate analyzer, YSI Inc., Yellow Springs, OH, USA), non-esterified fatty acids (NEFA; Wako HR Series NEFA-HR kit, Wako Pure Chemical Industries, Ltd., Chuo-Ku Osaka, Japan), triglycerides (TG; TR0100 Serum Triglyceride Determination kit, MilliporeSigma, Saint Louis, MO, USA), adiponectin (EZHMWA-64K Human High Molecular Weight Adiponectin ELISA, MilliporeSigma, Saint Louis, MO, USA), leptin (XL-85K Multi-Species Leptin RIA, MilliporeSigma, Saint Louis, MO, USA) and ACTH (LKAC1 ACTH kits, Siemens Healthcare Diagnostics Inc., Los Angeles, CA, USA). Biochemical measurements 75 minutes after the OST included insulin (INS-OST) and glucose (GLU-OST). Horses with a history or phenotypic appearance of pars pituitary intermedia dysfunction (PPID) were excluded from the study. A previously laminitic horse was defined as an individual who had been diagnosed with pasture-associated or endocrinopathic laminitis by a veterinarian, had radiographic evidence of laminitis, or had signs indicative of chronic laminitis observed by the researchers at the time of sampling. Horses in which laminitis could have been caused by another inciting factor (history of illness, grain-overload, corticosteroid administration or PPID), or who had clinically-evident, acute laminitis at the time of sampling, were excluded from the study.

2.2. Genotype Data

DNA was isolated from whole blood or hair roots using the Puregene Blood Core Kit, (Qiagen, Germantown, MD, USA) per the manufacturer’s instructions. Genome-wide single nucleotide polymorphism (SNP) genotyping was performed on all horses. Horses were genotyped either on the EquineSNP50 Genotyping BeadChip (Illumina, Inc., San Diego, CA, USA; 268 Morgan horses), Axiom Equine MCEc670 array (Thermo Fisher Scientific, Coon Rapids, MN, USA; 220 Welsh ponies), or Axiom Equine MCEc2M array (Thermo Fisher Scientific, Coon Rapids, MN, USA; 44 Welsh ponies and 43 Morgan horses), containing 54,602 SNPs, 670,795 SNPs, and 2,011,826 SNPs across the equine genome including the 31 autosomes and X chromosome, respectively. Haplotype phasing and genotype imputation up to the ~two million SNPs present on the Axiom Equine MCEc2M array were performed on horses genotyped on the two lower density arrays using Beagle software [12]. Based on published recommendations [13], a cross breed population of 496 horses genotyped on the MCEc2M array, including the Welsh ponies and Morgan horses described above, were used as the reference population. Imputation concordance was determined to be 99.2% in the Morgan horses and 99.1% in the Welsh ponies. SNPs that did not have 100% concordance were removed from the data, yielding a total of 1,931,327 SNPs in the Welsh ponies and 1,932,766 SNPs in the Morgan horses. Quality control on the imputed data was performed using the Plink software package [14]. All horses passed quality control, including evaluation for discordant sex information and SNP genotyping rate (>95%). Individual SNPs that had a genotyping rate of <90%, a minor allele frequency (MAF) of <1.0% or were outside of Hardy-Weinberg equilibrium (p-values < 1.0 × 10−05), were removed. After genotype pruning, 1,428,337 and 1,158,831 SNPs remained for subsequent analyses in the Welsh ponies and Morgan horses, respectively. Of these, a total 688,471 SNPs were shared between both breeds. Base pair positions for all SNPs were mapped to EquCab3 [15].

2.3. Genome-Wide Association Analyses (GWA)

Eleven traits significantly associated with EMS, including insulin, glucose, adiponectin, leptin, NEFA, TG, ACTH, INS-OST, GLU-OST, and measures of obesity (neck circumference to withers height ratio [NH] and girth circumference to withers height ratio [GH]), were treated as quantitative response variables in the GWA analyses. Laminitis status was treated as a binary response variable. All quantitative traits were tested for normality using a normal probability plot and Shapiro test and adjusted for normality as appropriate. Adiponectin, leptin, and NEFA were square root adjusted and insulin, INS-OST and triglycerides and ACTH were log transformed. Glucose, GLU-OST, and NH and GH ratios were normally distributed and did not need to be adjusted. Trait measurements were adjusted to account for known confounding covariates using the residuals from a linear mixed effects model in the R software program Linear and Nonlinear Mixed Effects Models (nlme), with sex and age included as fixed effects and farm as a random effect [16]. For each trait, within breed GWA were performed from the imputed SNP genotype data using a custom code for an improved mixed linear regression analysis [9]. This algorithm utilizes a three-step process, which combines a Bayesian Sparse Linear Mixed Model (BSLMM) [17] available in the software program Genome-wide Efficient Mixed Model Association (GEMMA) [18] and a linear mixed model implemented in FaST-LMM [19] (see Appendix B for a full description of this model). The threshold for genome-wide significance was based on the effective number of independent tests (SNPs not in linkage disequilibrium [LD]) as calculated by the Genetic Type 1 Error Calculator [20]. In the Welsh ponies, this value was 841,750 SNPs, resulting in a Bonferroni-corrected threshold for genome-wide significance of 5.98 × 10−08. For the Morgan horses, the effective number of independent tests was 657,030 SNPs, resulting in a Bonferroni corrected threshold for genome-wide significance of 7.61 × 10−08. The suggestive threshold for both breeds was set at 1.00 × 10−05 [21,22]. Principal components analysis (PCA) revealed population stratification in the Welsh pony cohort based on clustering of the registered sections (Figure A1). To account for this population substructure, and avoid over-fitting the model, three separate GWA were performed using the full cohort (n = 264), sections A, B, C and D (n = 238) and sections A and B (n = 220). In order to maximize sensitivity, the union of the GWA results from all three cohorts was used (Appendix A).
Figure A1

Principal components analysis for the Welsh ponies showing clustering between sections.

2.4. Meta-analysis

A GWA meta-analysis was performed with the software program METASOFT [23] using the Morgan horse and Welsh pony GWA summary data from the 688,471 SNPs that were shared between breeds. Briefly, the METASOFT algorithm uses a random effects model which adjusts for heterogeneity between studies by allowing the effect size of the alternative allele to vary between populations. Unlike other random effects models, where both the null and alternative models assume heterogeneity, METASOFT uses a likelihood ratio test that assumes heterogeneity only under the alternative model [24]. The effective number of shared SNPs was 306,023 in the Morgan horses and 307,349 in the Welsh ponies as calculated by GEC. For a more conservative p-value, the threshold for genome-wide significance was determined using the effective number of SNPs for the Welsh ponies (0.05/307,349) and set at 1.63 × 10−07. The suggestive threshold was set at 1.00 × 10−05 [21,22]. To be considered a region of interest identified on meta-analysis (MA-ROI), at least one SNP needed to exceed the threshold for genome-wide significance.

2.5. Prioritization of GWA Regions and Identification of Positional Candidate Genes

All GWA regions where SNPs exceeded the suggestive threshold for significance were reviewed. To be considered within a single region, consecutive SNPs on the same chromosome had to be within 500 kb of each other [24,25]. Regions of interest had to contain a minimum of five SNPs exceeding the suggestive threshold, with at least one SNP exceeding the threshold for genome-wide significance.

2.5.1. Fixed-Size Regions

The boundaries of the fixed-size region were defined as 500 kb 5′ of the base pair position of the minimum SNP within the region and 500 kb 3′ of the base pair position of the maximum SNP [24,25,26,27,28,29]. A region was identified as shared if it was within the boundaries of another region and prioritized as described below.

2.5.2. LD-Bound Regions

To define the boundaries of the LD-bound region, the software program Plink was utilized to calculate the pairwise LD measures for all SNPs within the region [14]. Window size was set at 1 Mb from the minimum and maximum SNP within the region; a pairwise calculation for LD with the test SNP was performed for all SNPs within the window. The threshold for SNPs within LD was set at an r2 of greater than 0.3 [27]. A custom code was used to identify regions where LD for all SNPs dropped below 0.3 and spanned at least 100 kb both 5′ and 3′ to the widest peak of LD within the window, which was used to define the boundaries of the region. If LD did not drop for at least 100 kb on either side of the LD peak, window size was increased by 1 Mb until the region could be defined. An LD- bound region was identified as shared if it was within the boundaries of another LD-bound region and prioritized as described below.

2.5.3. Prioritization

Regions were prioritized if they were identified as shared between breeds on meta-analysis (MA-ROI), shared across traits within a single GWA cohort (for example, a region shared between insulin and adiponectin in the Morgan horses), or breed specific. The prioritized regions were categorized as high, medium or low priority (Figure 1) as follows:
Figure 1

Flow chart of the prioritization of the regions identified on GWA.

High priority: Region was identified as an MA-ROI or it was shared across traits with at least one region being considered an ROI. Medium priority: Region was identified as an ROI in at least one GWA cohort. Low priority: Region was shared across traits, but no regions met the criteria to be considered an ROI. If a region met the criteria for more than one category (for example a region identified as a MA-ROI and was also shared across traits but not an ROI) then the region was assigned the higher priority level.

2.5.4. Identification of Positional Candidate Genes and Functional Enrichment Analysis

Positional candidate genes were identified using the Bioconductor/R software package biomaRt [30] with EquCab3 as the reference genome [31]. Positional candidate genes were defined as all protein coding genes, pseudogenes, and RNA genes within each GWA region as defined by the fixed-size region or the LD-bound regions as described above. Based on the comparison between the fixed-size and LD-bound regions, protein-coding genes within all prioritized LD-bound regions were assessed for functional enrichment analysis between traits using a multi-query approach available in the g: Profiler toolset [32].

3. Results

3.1. GWA Results

The union of GWA results across all twelve traits for the Welsh ponies identified 303 regions where at least one SNP exceeded the suggestive threshold. Of these regions, 64 were considered ROI (i.e., five SNPs exceeding the suggestive threshold and one or more SNPs exceeded the threshold for genome-wide significance) and included: seven ROI for baseline insulin, one ROI for INS-OST, three ROI for baseline glucose, two ROI for GLU-OST, two ROI for NEFA, one ROI for triglycerides, two ROI for adiponectin, three ROI for leptin, three ROI for ACTH, 14 ROI for NH, 16 ROI for GH, and 10 ROI for laminitis status (Tables S1 and S2). GWA across all twelve traits for the Morgan horses identified 142 regions where at least one SNP exceeded the suggestive threshold. Of these regions, 37 ROI were identified and included one ROI for baseline insulin, one ROI for INS-OST, two ROI for baseline glucose, three ROI for GLU-OST, four ROI for NEFA, four ROI for adiponectin, three ROI for leptin, three ROI for ACTH, five ROI for NH, four ROI for GH, and seven ROI for laminitis status (Table S1).

3.2. Shared Regions Across Welsh Ponies and Morgan Horses and Across Traits

Identification of the shared regions between the Morgan horses and the Welsh pony cohort from the boundaries of the fixed region obtained from the GWA results identified one shared region for laminitis status, one shared region for ACTH, and one shared region for insulin-OST (Figure 2). The boundaries defined by LD identified the above shared regions as well as an additional shared region for GH on ECA 22.
Figure 2

Manhattan plots of the genome-wide association results for insulin concentration post oral sugar test in (A) Morgan horses and (B) the section A and B Welsh ponies. The equine chromosomes (ECA) are plotted on the x-axis and the −log of the p-value is plotted on the y-axis. The blue line indicates the suggestive threshold (1.0 × 10−05) and the red line represents the genome-wide significant threshold (7.61 × 10−08 in the Morgan horses and 5.98 × 10−08 the Welsh ponies). In the section A, B, C and D Welsh ponies (not shown) and A and B Welsh ponies (B), the same region on ECA10 exceeds the suggestive threshold, which was also identified as an ROI in the Morgan horses (A). GWA meta-analysis identified this region as shared across both breeds.

Meta-analysis across breeds identified the four shared regions above, as well as an additional 56 regions and five unique regions (regions not identified in either breed as significant on GWA), for a total of 65 shared regions of interest (MA-ROI). MA-ROI included two for insulin and four for glucose post oral sugar challenge, three for insulin, two for glucose, four for NEFA, seven for adiponectin, five for leptin, 15 for NH, eight for GH, and 12 for laminitis status. Unique regions were found for insulin (one MA-ROI) and glucose (one MA-ROI) post oral sugar test and NH (three MA-ROI). Across the MA-ROI, three regions were also shared across traits. No MA-ROI were identified for plasma triglyceride levels (Table 1).
Table 1

Meta-analysis across breeds and across EMS metabolic traits.

TraitChrMin_SNPMax_SNPSugg_SNPsSign_SNPsFE
Insulin 54410412945081679394-
1558878736225014212-
24288040432907691432-
INS-OST 107162083572425049194X
28383076993834459421-
Glucose 41805335718550035201-
89289661931261121-
GLU-OST 355982921565587425739X
42780267428514796184X
15796973637971760333-
28348616643486842022-
NEFA 11835323791841789322115X
171335595814014858231X
2420975408NA11-
302014817320205201103X
Adiponectin 216725632175319032519X
4371059383752304662X
63158234531708194171X
66709762868036518161-
18413998624153308191-
186013840060241267102-
2034470453609674104X
Leptin 7657310126580497463X
10871456NA11-
1948839140496276834422X
242855154428744981176-
ACTH 1697308867025718741-
18275570882879246101-
341684754NA11-
31012362871016186454224X
52882251529342972123X
1078846710NA11-
NH 188009187NA11-
358464229NA11-
451903203534747576440-
663614756638149842010-
922745020NA11-
93354979734165892311-
111898727219176693108-
14637789316387699872X
191134701113966922-
193223024533643392552-
20397975614016278574X
205965999760403627114X
2120193411212560321811X
2433852631348120353623X
GH 11214840571217758734719-
1 13151223913162182633X
484181768852751832911X
111898727219176693109X
173212014532544617234X
1928934939NA11-
206356097163691145106-
22401359634016750244X
LAM 149077969NA11-
2361041513610821966-
41776547318991639113-
1229378128302965091911X
148843022289591967205-
1831679672331345565126X
19280577562841733552-
1957605404584292063620X
22356531543076796238X
2312226548127630203524X
2894465079643240135X

To be considered an MA-ROI, at least one SNP had to exceed the threshold for genome-wide significance (1.6 × 10−07) on meta-analysis. Provided is the base pair position of the lowest (Min_SNP) and highest (Max_SNP) SNP, as well as the number of SNPs per region which exceeded the suggestive (Sugg_SNPs) and genome-wide significance (Sign_SNPs) threshold. Regions shared across two traits in the meta-analysis are represented by the corresponding highlighted chromosomes (Chr). Regions which were statistically significant using a standard fixed effects models (FE) are indicated by an X.

Of the 56 regions identified on meta-analysis that were only significant in one breed-specific GWA, 30 (22 ROI) were called in at least one Welsh pony cohort and 26 (20 ROI) were called in the Morgan horses. Twenty-eight of the MA-ROI contained less than five SNPs of which 11 were single SNP regions. Comparison of the results using a fixed effects model identified 32 of the 65 MA-ROI (Table 1).

3.3. Prioritization of GWA Results and Identification of Positional Candidate Genes Based on Fixed-Size Regions in Welsh Ponies

For the Welsh pony cohorts, 189 of the 303 regions were eliminated from further prioritization. For the remaining 114 regions, 46 regions were considered high priority regions and contained 890 positional candidate genes, 34 regions were considered medium priority regions and contained 719 positional candidate genes, and 35 regions were considered low priority regions and contained 289 positional candidate genes. Accounting for the 19 shared regions resulted in 91 unique regions and 1511 positional candidate genes (Table S7).

3.4. Prioritization of GWA Results and Identification of Positional Candidate Genes Based on Fixed-Size Regions in Morgan Horses

For the Morgan horses, 88 of the 142 regions were eliminated from further prioritization (Table S8). This resulted in 54 regions being prioritized and 1104 positional candidate genes with 38 high priority regions containing 801 positional candidate genes, eight medium priority regions containing 139 positional candidate genes, and eight low priority regions containing 164 positional candidate genes. Accounting for the 10 shared regions resulted in 44 unique regions and 963 positional candidate genes (Tables S8 and S9).

3.5. Prioritization of LD-Bound GWA Regions, Identification of Positional Candidate Genes, and Network Analysis in Welsh Ponies

In the Welsh ponies, the LD-bound regions identified five additional regions shared across traits (ECA1 for adiponectin and INS-OST, ECA5 for insulin and leptin, ECA6 for leptin and GH, ECA9 for INS-OST and NEFA, and ECA18 for insulin and GH). However, the LD-bound regions did not identify six regions as shared across traits that were identified with the fixed-size boundaries (ECA4 for leptin and GH, ECA10 for NH and GH, ECA14 for leptin and laminitis status, ECA19 for ACTH and laminitis status, ECA 28 for insulin and INS-OST, and ECA28 for adiponectin and leptin). This resulted in 214 regions being removed and 89 regions being prioritized with 56 high priority regions containing 1567 positional candidate genes (Table 2). Further, 26 regions were given medium priority and contained 620 positional candidate genes and seven regions were given low priority and contained 30 positional candidate genes for a total of 2217 positional candidate genes. Accounting for the 18 shared regions resulted in 1853 positional candidate genes (Table S13).
Table 2

LD-bound high priority GWA regions for the Welsh pony cohorts.

TraitChrMinMaxProteinCodingRNAGenesTotalGenes
Insulin 5354091044480645826738306
86935084475906595322355
1557483776612684011
187872085879634082246
242845101229887250246
INS-OST 1176773704176873704011
87317345573699198112
107196778372438937303
283932218839488807819
Glucose 158372817883828178202
GLU-OST 283427194935138699909
Adiponectin 1171861236178270042252449
1860060215613490457613
Leptin 5397517975043176920732239
64881374012580151025
76567837668117086123
106920551068890055
212294068123516697101
NEFA 1910057181105718202
283290954235703535651176
ACTH 14294440345232767549
1695587377096058971623
105506051256255134112
10787957108030661320525
206038185060481850000
NH 467130904698732968816
47729824181186565241540
48314484283244842101
79317699193628686011
9326322353758726910818
11183421171987624755460
146370252263847210022
20402440074121087631114
206072301461735694022
2152809936396786268
211951528025046226222749
243184348036758215471057
GH 11321847721337161249716
46842567869636837516
47002625481648125494595
48257001186366835492575
79319167693628672011
11154143371645146324125
11186138951931753626026
187952748481467661131225
193120459631799125000
20294866303097676354862
205946456661015217123
206472242765336095134
212061196322057711347
224103288941066045000
251943504119535041404
LAM 14939103249491032011
235880861366654738614
195708202562825378421659
28999089210844823404
Total 11464151567

Final region boundaries were based on LD and are indicated by the lowest base pair position (Min) and the highest base pair position (Max). The total number of genes includes all protein-coding genes, pseudogenes, and RNA genes identified for region based on EquCab3. Shared regions across prioritized traits are indicated by highlighted chromosomes.

Across nine EMS traits, functional enrichment analysis of the protein-coding genes within the prioritized regions identified enrichment (adjusted p-value < 0.05) for Gene Ontology (GO) terms associated with inflammation and fatty acid metabolism, including: interleukin-1 receptor binding (GH), lipid antigen binding (insulin), RAGE receptor binding (insulin and leptin), toll-like receptor binding (insulin and leptin), fatty-acyl-coA-synthase (INS-OST), cytokine activity (ACTH, NEFA, GH and NH), and regulation of NK-kappa signaling (GH, glucose, insulin, laminitis status, leptin, NEFA and NH). Enrichment for protein-coding genes associated with the GO terms thiosulfate sulfurtransferase activity was identified for NEFA concentrations (p-value = 4.9 × 10−02).

3.6. Prioritization of LD-Bound GWA Regions, Identification of Positional Candidate Genes, and Network Analysis in Morgan Horses

The LD-bound GWA regions for the Morgan horse identified three additional regions shared across traits (ECA 21 for triglycerides and adiponectin, ECA 6 for adiponectin and INS-OST, and ECA 19 for NH and laminitis status), but did not identify two regions as shared across traits that were identified with the fixed boundaries (ECA 20 for adiponectin and insulin and ECA 24 for insulin and NEFA) (Table S14). This resulted in 39 high priority regions containing 1142 positional candidate genes (Table 3). Further, eight regions were assigned medium priority and contained 155 positional candidate genes, and nine regions were assigned low priority and contained 176 positional candidate genes for a total of 1473 positional candidate genes. Accounting for the 12 shared regions resulted in 1167 positional candidate genes (Tables S14 and S15).
Table 3

LD-bound high priority GWA regions for the Morgan horses.

TraitChrMinMaxProteinCodingRNAGenesTotalGenes
Insulin 242113489721184897101
INS-OST 42837320228423202000
63275155234029749121022
1071666607735340536512
Glucose 417239374190438316511
811193683124045728917
GLU-OST 3557463385808599715621
42669561629116058549
NEFA 1184859013187238015241741
171265383514464765426
24202878352097340115116
302091547321380977000
Adiponectin 21636290418105119212142
43472339839321960361047
63248628732841880347
6642974037149304716822191
184144841441498414011
20364905243258728311
Leptin 45159068052810437459
19512864935395902871421
24255647652938467971421
1827009338426978318524
342674448444220132710
3102944842103801021246
52537887827689002121628
NH 1820977188361852314620
4520244705423774781220
6604106477057077314426172
196619781345372416
193296279537391949531973
GH 1120644115124691346362056
173180606033720086337
LAM 417301415198126538716
123288527834800986291645
148791619091602875322658
183009526635177011231336
193013382630183826202
2228434765225020131023
23765640412984095112334
Total 7643671142

Final region boundaries were based on LD and are indicated by the lowest base pair position (Min) and the highest base pair position (Max). The total number of genes includes all protein-coding genes, pseudogenes, and RNA genes identified for region based on EquCab3. Shared regions across prioritized traits are indicated by highlighted chromosomes.

Functional enrichment analysis of the protein-coding genes within the prioritized regions identified enrichment (adjusted p-value < 0.05) for the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms PI3K-AKT signaling pathway (ACTH, adiponectin, GH, glucose-OST, INS-OST, laminitis status, leptin, NEFA, and NH) and the Wnt-signaling pathway (ACTH, adiponectin, INS-OST, NH, and TG). Enrichment was also identified for the GO terms arylesterase activity for adiponectin (adjusted p-value = 1.3 × 10−02) and G-protein-coupled receptor activity for leptin (adjusted p-value = 1.2 × 10−21).

4. Discussion

In this study, we used high density SNP genotype data and GWA in two high risk breeds to identify hundreds of regions of the genome contributing to 12 EMS traits. Both fixed (500 kb) and linkage disequilibrium-based approaches were used to identify the boundaries of genomic regions identified on GWA and from there positional candidate genes within these regions. Within breed prioritization of the LD-bound regions was used to begin the process of organizing and managing this large list, and resulted in 56 high, 26 medium, and seven low priority regions, for a total of 1853 candidate genes in the Welsh ponies; and 39 high, eight medium, and nine low priority regions, for a total of 1167 candidate genes in the Morgan horses. Meta-analysis demonstrated that 65 of these regions were shared across breeds. These data support the hypothesis that EMS is a polygenic trait with both across breed and breed-specific genetic variants and predict that identifying even the major functional variants and risk loci will be a challenging task. Comparison of the regions identified from the within breed GWA identified fewer shared regions across breeds than we anticipated. This could indicate that breed differences account for more of the risk alleles for EMS than previously thought, or that additional regions were shared but not identified in the breed-specific GWA, which can occur for several reasons. First, if the allele frequency of the variant is low in one breed, then it will not be detected on that GWA. Second, the effect of the variant on a trait can vary between breeds. The within-breed population sizes were powered to detect variants of moderate to high effect but would not find variants of low effect [33,34]. Third, variations in recombination of the ancestral chromosome can lead to differences in marker alleles between populations [35]. Depending on the markers represented on the genotyping array, the variant may be identified in one breed but not the other. Increasing the power of the study by performing across-breed GWA could identify more shared regions between breeds. However, combining data can lead to the inclusion of additional population substructure and unknown confounding variables into the model [36]. Therefore, we used meta-analysis to increase the number of individuals within the analysis and to improve the power to find unique associations, variants of low effect, and additional shared regions across breeds [37,38], and identified 65 shared regions of which five were unique (not identified in either breed specific GWA). Selective breeding can lead to population stratification within breeds [39], and not accounting for this population stratification can lead to spurious associations on GWA [40]. For this data, principal components analysis revealed population stratification in the Welsh pony cohort based on clustering of the registered sections (Figure A1). This was not unexpected as the Welsh pony sections are distinct subpopulations based on pedigree and conformation. Mixed linear models are a common way to account for population stratification and relatedness in GWA with the inclusion of a genetic relationship matrix (GRM) [18,41]. However, the Welsh ponies presented a unique challenge since, although the GRM would account for genetic similarities between Welsh pony sections, it would not account for all the phenotypic (conformation traits) variation between sections. On the other hand, including both the GRM and section as a covariate would lead to over-fitting of the model by accounting for relatedness both as a random effect (GRM) and fixed effect (section). Therefore, to account for population stratification within our Welsh pony cohort while maximizing sensitivity to identify genetic variations contributing to EMS both within and across sections, we chose to perform the GWA using the full data set and then subset the data to the section A, B, C and D ponies, and the section A and B ponies. Notably, there is likely a degree of population segregation in the section A and B which we are not fully controlling with a mixed model analysis, and ideally, we would have included the section A, B, C and D ponies as a separate GWA cohorts in order to account for all population substructure; however, our sample size made these analyses unfeasible due to the limited statistical power. To reduce false positives, regions were prioritized based on sharing across breeds and traits. Regions shared across breeds (MA-ROI) were given high priority, as these regions were not breed specific and likely to be found in other high-risk breeds. Regions shared across traits with at least one ROI were also assigned high priority since many components and downstream effects of the endocrine system are highly interrelated; therefore, a variant affecting multiple traits would be expected to have a larger biological effect then a variant affecting a single trait. An ROI identified in one GWA cohort was assigned medium priority as these regions were likely breed or section (Welsh pony) specific and, based on the power of our study, variants of moderate to high effect. Finally, regions that were not an ROI but shared across traits were assigned low priority, because these regions were identified across multiple GWA and were less likely to be false positives and/or regions that contain variants of low effect. Our prioritization removed 71% of the 303 GWA regions for the Welsh ponies and 58% of the 142 GWA regions in the Morgan horses. Of the removed regions, 49% were single SNP regions, 38% were regions with less than five SNPs, and 13% were regions with greater than or equals to five SNPs but no SNPs which exceeded the threshold for genome wide significance. Given: (i) the large percentage of single or low SNP regions that were removed, (ii) the high-density genotype data used in these analyses, and (iii) the use of the max gamma value for BSLMM (improving sensitivity at the cost of specificity), it is likely most of these regions were false positives. However, we utilized Bonferroni corrected p-values which tend to be more conservative corrections [42]; therefore, some of the removed regions may harbor genetic variants associated with EMS but represent variants with very low effect or poorly annotated regions of the genome (relative decreased number of SNPs in that region). Increasing the number of individuals, or represented Welsh pony sections, would improve the power of the study to determine which of these regions were true or false positives. In order to identify candidate genes, we first used a fixed boundary of 500 kb 5′ of the SNP identified on GWA with the lowest base pair position and 3’ of the SNP with the highest base pair position. 500 kb was chosen based on the average distance for LD to breakdown in Thoroughbred horses [26,27,29]. Although LD decay varies between horse breeds [28], using the more conservative Thoroughbred estimate gave a higher likelihood that we would capture all variants within LD (r2 > 0.3) of the marker SNPs in our cohorts. From the fixed boundaries, 1511 and 963 positional candidate genes were identified in the Welsh ponies and Morgan horses, respectively. Estimates of LD decay are based on the average r2 across chromosomal segments and do not represent specific regions of the genome [26,28]. Newer variants or variants within regions under selection will have longer LD blocks whereas older/ancestral variants will have shorter LD blocks due to longer periods of recombination. Therefore, using a fixed region has the potential to exclude causal variants or to include candidate genes that are not in LD with the marker SNPs. To more precisely call positional candidate genes for GWA regions, we calculated LD using the squared correlation coefficients between SNPs. SNPs within LD were defined as an r2 > 0.3 [28]. Boundaries were identified based on gaps of LD, i.e. where all SNPs dropped below 0.3 for a span of 100 kb 5′ (defined the start of the LD block) and 3’ (defined the end of LD block) to the widest peak of LD. Across all Welsh pony cohorts, 70% of the boundaries identified by LD were smaller than those identified by the fixed-size region, with an average difference of 645.4 kb (range of 11.4 kb to 1.7 Mb); whereas, in the Morgan horses, 57% of the LD boundaries were smaller than that of the fixed regions, with an average difference of 566.6 kb (range of 51.5 kb to 2.2 Mb). The large percentage of fixed-size boundaries, likely overestimating the region size, is not surprising given that the fixed-size regions were based on data from Thoroughbreds which have one of the highest inbreeding coefficients and LD amongst horse breeds [24,28]. Ponies and Morgan horses were identified to have LD similar to Quarter Horses [28], a breed with a high level of genetic diversity. For the remaining regions, the LD boundaries were an average of 1.9 Mb longer (range of 22.8 kb to 9.3 Mb) in the Welsh ponies and 1.4 Mb longer (range of 12.6 kb to 8.2 Mb) in the Morgan horses then defined by the fixed-size region and likely represent regions under selection. Results of the functional enrichment analysis of the prioritized regions based on LD provided support for our approach. In the Welsh ponies, nine traits were enriched for pathways associated with inflammation or fatty acid metabolism. This is not surprising given that chronic, low-grade inflammation and dyslipidemia are common components of EMS. The latter results are also consistent with human GWA studies of MetS which have consistently identified a large number of variants related to lipid metabolism [43,44,45]. In addition, the GO term thiosulfate sulfurtransferase (Tst, Rhodanese) activity was enriched in positional genes identified for plasma NEFA concentrations. Tst encodes a mitochondrial enzyme which is involved in mitochondrial energetics and removal of reactivate-oxygen species and was identified as a candidate obesity-resistant gene in the polygenic Lean mouse model [46]. In this study, the authors evaluated Tst mRNA concentrations in multiple cross-ethnic human populations and found that they were higher in lean individuals compared with obese individuals or individuals with type 2 diabetes and negatively correlated with body mass index [46]. In the Morgan horses, the KEEG terms for PI3K/AKT and Wnt signaling pathways were functionally enriched across several traits. Both of these pathways have significant roles in metabolism, with the PI3K/AKT signaling pathway being essential for glucose homeostasis and lipid metabolism [47,48], and the Wnt signaling pathway regulating body mass, glucose metabolism, de novo lipogenesis, and low-density lipoprotein clearance [49]. Further, the GO term arylesterase (ARE) activity was enriched for plasma adiponectin concentrations. ARE activity has been linked to the paraoxonase-1 gene which has roles in lipoprotein oxidative damage prevention and being protective against metabolic syndrome [50,51]. Lastly, serum leptin concentrations were enriched for the GO term G-protein-coupled receptor activity, which have been found to be directly involved in pancreatic β-cell destruction and insulin resistance and remain potential targets for drug therapy for metabolic syndrome and type II diabetes [52].

5. Conclusions

In conclusion, the results of these data provide strong evidence that EMS is a complex, polygenic syndrome with dozens of risk alleles contributing to the phenotype. Prioritization of the hundreds of regions identified on the GWA of 12 individual traits led to the identification of thousands of positional candidate genes which contained protein-coding genes which were functionally enriched for pathways associated with glucose metabolism, lipid metabolism and inflammation. However, further work is required to narrow down the candidate gene pool. Nonetheless, these data were an important first step in the identification of the genetic risk alleles associated with EMS. In addition, this data concludes that a single variant genetic test will not provide enough information to accurately predict an individual’s risk for EMS, and a future DNA test will require a panel of genetic variants including both shared and breed specific variants.
Table A1

Repeatability across results for the Bayesian sparse linear mixed model (BSLMM).

10 M IterationsSeeds 1–1010 M IterationsSeeds 11–2020 M IterationsSeeds 1–1030 M IterationsSeeds 1–10
CHRSuggSignSuggSignSuggSignSuggSign
1 20NANANANANANA
1 10NANA1040
2 38275132251
3 NANANANA1NA
4 5442523052
6 68481112112
7 1401005080
8 60NANANANANANA
9 NANA20NANANANA
14 NANANANA10NANA
15 615152121
16 NANANANA20NANA
18 NANA10NANANANA
19 NANANANANANA30
20 NANANANANANA10
22 NANANANA40NANA
23 NANA110NANANANA
24 NANA41160NANA

Repeatability across results for the Bayesian sparse linear mixed model (BSLMM) using the max gamma values from 10 million (M) iterations with seeds 1–10, 10 M iterations with seeds 11–20, 20 M iterations with seeds 1–10, and 30 M iterations with seeds 1–10 for adiponectin concentrations in the Morgan horses. Regions which are highlighted in yellow indicate those which would have been identified as a region of interest (contained a minimum of five SNPs exceeding the suggestive threshold, with at least one SNP exceeding the threshold for genome wide significance). Abbreviations: Sugg (total number of SNPs which exceeded the suggested threshold for genome-wide significance), Sign (total number of SNPs which exceed the threshold for genome-wide significance).

  61 in total

1.  Seasonal changes in the combined glucose-insulin tolerance test in normal aged horses.

Authors:  R A Funk; A A Wooldridge; A J Stewart; E N Behrend; R J Kemppainen; Q Zhong; A K Johnson
Journal:  J Vet Intern Med       Date:  2012-05-18       Impact factor: 3.333

2.  Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

Authors:  Sharon R Browning; Brian L Browning
Journal:  Am J Hum Genet       Date:  2007-09-21       Impact factor: 11.025

Review 3.  Genetic variants and the metabolic syndrome: a systematic review.

Authors:  C M Povel; J M A Boer; E Reiling; E J M Feskens
Journal:  Obes Rev       Date:  2011-07-12       Impact factor: 9.213

Review 4.  Meta-analysis methods for genome-wide association studies and beyond.

Authors:  Evangelos Evangelou; John P A Ioannidis
Journal:  Nat Rev Genet       Date:  2013-05-09       Impact factor: 53.242

5.  Serum paraoxonase and arylesterase activities in metabolic syndrome in Zahedan, southeast Iran.

Authors:  Mohammad Hashemi; Dor Mohammad Kordi-Tamandani; Nooshin Sharifi; Abdolkarim Moazeni-Roodi; Mahmoud-Ali Kaykhaei; Behzad Narouie; Adam Torkmanzehi
Journal:  Eur J Endocrinol       Date:  2010-11-08       Impact factor: 6.664

6.  Robust remapping of equine SNP array coordinates to EquCab3.

Authors:  Samantha K Beeson; Robert J Schaefer; Victor C Mason; Molly E McCue
Journal:  Anim Genet       Date:  2018-11-13       Impact factor: 3.169

7.  Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.

Authors:  Steffen Durinck; Paul T Spellman; Ewan Birney; Wolfgang Huber
Journal:  Nat Protoc       Date:  2009-07-23       Impact factor: 13.491

8.  Heritability of metabolic traits associated with equine metabolic syndrome in Welsh ponies and Morgan horses.

Authors:  E M Norton; N E Schultz; A K Rendahl; D Mcfarlane; R J Geor; J R Mickelson; M E McCue
Journal:  Equine Vet J       Date:  2018-12-15       Impact factor: 2.888

9.  A bivariate genome-wide approach to metabolic syndrome: STAMPEED consortium.

Authors:  Aldi T Kraja; Dhananjay Vaidya; James S Pankow; Mark O Goodarzi; Themistocles L Assimes; Iftikhar J Kullo; Ulla Sovio; Rasika A Mathias; Yan V Sun; Nora Franceschini; Devin Absher; Guo Li; Qunyuan Zhang; Mary F Feitosa; Nicole L Glazer; Talin Haritunians; Anna-Liisa Hartikainen; Joshua W Knowles; Kari E North; Carlos Iribarren; Brian Kral; Lisa Yanek; Paul F O'Reilly; Mark I McCarthy; Cashell Jaquish; David J Couper; Aravinda Chakravarti; Bruce M Psaty; Lewis C Becker; Michael A Province; Eric Boerwinkle; Thomas Quertermous; Leena Palotie; Marjo-Riitta Jarvelin; Diane M Becker; Sharon L R Kardia; Jerome I Rotter; Yii-Der Ida Chen; Ingrid B Borecki
Journal:  Diabetes       Date:  2011-03-08       Impact factor: 9.461

10.  A phenomics-based strategy identifies loci on APOC1, BRAP, and PLCG1 associated with metabolic syndrome phenotype domains.

Authors:  Christy L Avery; Qianchuan He; Kari E North; Jose L Ambite; Eric Boerwinkle; Myriam Fornage; Lucia A Hindorff; Charles Kooperberg; James B Meigs; James S Pankow; Sarah A Pendergrass; Bruce M Psaty; Marylyn D Ritchie; Jerome I Rotter; Kent D Taylor; Lynne R Wilkens; Gerardo Heiss; Dan Yu Lin
Journal:  PLoS Genet       Date:  2011-10-13       Impact factor: 5.917

View more
  2 in total

Review 1.  The Genetic Basis of Obesity and Related Metabolic Diseases in Humans and Companion Animals.

Authors:  Natalie Wallis; Eleanor Raffan
Journal:  Genes (Basel)       Date:  2020-11-20       Impact factor: 4.096

2.  Heritability and Genomic Architecture of Episodic Exercise-Induced Collapse in Border Collies.

Authors:  Elaine M Norton; Katie M Minor; Susan M Taylor; Molly E McCue; James R Mickelson
Journal:  Genes (Basel)       Date:  2021-11-29       Impact factor: 4.096

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.