Mehdi Momen1, Sabrina H Brounts1, Emily E Binversie1, Susannah J Sample1, Guilherme J M Rosa2, Brian W Davis3, Peter Muir1. 1. Department of Surgical Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI 53706, USA. 2. Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA. 3. Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843, USA.
Abstract
Degenerative suspensory ligament desmitis is a progressive idiopathic condition that leads to scarring and rupture of suspensory ligament fibers in multiple limbs in horses. The prevalence of degenerative suspensory ligament desmitis is breed related. Risk is high in the Peruvian Horse, whereas pony and draft breeds have low breed risk. Degenerative suspensory ligament desmitis occurs in families of Peruvian Horses, but its genetic architecture has not been definitively determined. We investigated contrasts between breeds with differing risk of degenerative suspensory ligament desmitis and identified associated risk variants and candidate genes. We analyzed 670k single nucleotide polymorphisms from 10 breeds, each of which was assigned one of the four breed degenerative suspensory ligament desmitis risk categories: control (Belgian, Icelandic Horse, Shetland Pony, and Welsh Pony), low risk (Lusitano, Arabian), medium risk (Standardbred, Thoroughbred, Quarter Horse), and high risk (Peruvian Horse). Single nucleotide polymorphisms were used for genome-wide association and selection signature analysis using breed-assigned risk levels. We found that the Peruvian Horse is a population with low effective population size and our breed contrasts suggest that degenerative suspensory ligament desmitis is a polygenic disease. Variant frequency exhibited signatures of positive selection across degenerative suspensory ligament desmitis breed risk groups on chromosomes 7, 18, and 23. Our results suggest degenerative suspensory ligament desmitis breed risk is associated with disturbances to suspensory ligament homeostasis where matrix responses to mechanical loading are perturbed through disturbances to aging in tendon (PIN1), mechanotransduction (KANK1, KANK2, JUNB, SEMA7A), collagen synthesis (COL4A1, COL5A2, COL5A3, COL6A5), matrix responses to hypoxia (PRDX2), lipid metabolism (LDLR, VLDLR), and BMP signaling (GREM2). Our results do not suggest that suspensory ligament proteoglycan turnover is a primary factor in disease pathogenesis.
Degenerative suspensory ligament desmitis is a progressive idiopathic condition that leads to scarring and rupture of suspensory ligament fibers in multiple limbs in horses. The prevalence of degenerative suspensory ligament desmitis is breed related. Risk is high in the Peruvian Horse, whereas pony and draft breeds have low breed risk. Degenerative suspensory ligament desmitis occurs in families of Peruvian Horses, but its genetic architecture has not been definitively determined. We investigated contrasts between breeds with differing risk of degenerative suspensory ligament desmitis and identified associated risk variants and candidate genes. We analyzed 670k single nucleotide polymorphisms from 10 breeds, each of which was assigned one of the four breed degenerative suspensory ligament desmitis risk categories: control (Belgian, Icelandic Horse, Shetland Pony, and Welsh Pony), low risk (Lusitano, Arabian), medium risk (Standardbred, Thoroughbred, Quarter Horse), and high risk (Peruvian Horse). Single nucleotide polymorphisms were used for genome-wide association and selection signature analysis using breed-assigned risk levels. We found that the Peruvian Horse is a population with low effective population size and our breed contrasts suggest that degenerative suspensory ligament desmitis is a polygenic disease. Variant frequency exhibited signatures of positive selection across degenerative suspensory ligament desmitis breed risk groups on chromosomes 7, 18, and 23. Our results suggest degenerative suspensory ligament desmitis breed risk is associated with disturbances to suspensory ligament homeostasis where matrix responses to mechanical loading are perturbed through disturbances to aging in tendon (PIN1), mechanotransduction (KANK1, KANK2, JUNB, SEMA7A), collagen synthesis (COL4A1, COL5A2, COL5A3, COL6A5), matrix responses to hypoxia (PRDX2), lipid metabolism (LDLR, VLDLR), and BMP signaling (GREM2). Our results do not suggest that suspensory ligament proteoglycan turnover is a primary factor in disease pathogenesis.
Degenerative suspensory ligament (SL) desmitis (DSLD) is an untreatable progressive idiopathic disorder that affects the connective tissue of the lower limbs in horses and often leads to euthanasia (Halper , 2011). DSLD is an important health and welfare problem that is of great concern to the community of owners of Peruvian Horses (PHs) (Peruvian Pasos) and DSLD-affected horses of other breeds. DSLD was first described in the PH, a highly predisposed breed, and has subsequently described in other breeds including the Arabian (AR), American Saddlebred, Quarter Horse (QH), Morgan, Thoroughbred (TB), Paso Fino, Akhal-Teke, and European Warmbloods (Mero and Pool 2002; Mero and Scarlett 2005). Prevalence of DSLD is breed-related; pony and draft breeds do not develop DSLD based on clinical knowledge and published reports; other athletic breeds, such as the AR and TB, have intermediate risk (Young 1993; Mero and Scarlett 2005; Halper , 2011; Luo ). In certain PH bloodlines or families, DSLD prevalence is up to 40% (Luo ).Onset of DSLD is subtle with progressive multilimb lameness developing with associated fetlock hyperextension and SL thickening (Halper ). Age at diagnosis is variable, ranging from 3 to 17 years (Mero and Pool 2002). DSLD is characterized by increased diameter of the body and branches of the SL (Fig. 1). Collagen disruption, accumulation of interfibrillar matrix proteoglycans, and chondroid metaplasia are key histologic features in affected horses (Plaas ). In the PH, DSLD can develop with no history of athletic work or SL injury. This suggests a specific etiology that includes a genetic contribution to disease risk given the strong breed predisposition in PHs (Mero and Scarlett 2005), although the genetic architecture of the disease is unclear.
Fig. 1.
DSLD is a crippling, painful spontaneous equine disease. A Peruvian Horse (PH) (a) that is severely affected with DSLD and a (b) phenotype-negative control PH. As the disease develops, the SL progressively thickens. Over time, the SL mechanically weakens and ruptures, resulting in a classic sign of dropped fetlocks in multiple limbs. In the severe case, obvious thickening and dropping of the fetlocks is evident (inset a) compared with the normal standing posture (inset b). DSLD is typically more evident in the pelvic limbs verses the thoracic limbs, although in some PHs DSLD develops in all four limbs.
DSLD is a crippling, painful spontaneous equine disease. A Peruvian Horse (PH) (a) that is severely affected with DSLD and a (b) phenotype-negative control PH. As the disease develops, the SL progressively thickens. Over time, the SL mechanically weakens and ruptures, resulting in a classic sign of dropped fetlocks in multiple limbs. In the severe case, obvious thickening and dropping of the fetlocks is evident (inset a) compared with the normal standing posture (inset b). DSLD is typically more evident in the pelvic limbs verses the thoracic limbs, although in some PHs DSLD develops in all four limbs.Breed development of the horse has generated selection pressures to enable to work in agriculture and transport. Rare breeds can be exposed to loss of population size and genetic diversity (Ablondi ). Morphological and performance traits have been targets for selective breeding (Petersen ; de Simoni Gouveia ). These genetic differentiation events have been generated by natural and artificial selection, shaping genomes individually over time with unique traits and specific genomic footprints. An unintended consequence of this selection is an increased incidence of disease within certain breeds. When comparing breeds with different disease risk, it is important to highlight the locus undergoing selection where deleterious allele frequency may be increasing, even if it has not yet reached fixation (Gurgul ). Many spontaneous equine diseases, such as DSLD, closely mimic human heritable disorders. Tendon and ligament degeneration is an important health problem in humans and animals, but the contributing genes and pathways are poorly understood. Comparative studies of a disease in a spontaneous animal model where reduced genetic diversity within a breed is associated with long stretches of linkage disequilibrium (LD) can be advantageous as power of association is higher.During past decades, investigation of the molecular pathology of DSLD, disturbances to signaling pathways, genome-wide analysis with low-density markers, and case–control gene expression analysis have been performed (Young ; Mero and Pool 2002; Haythorn ). A low-resolution genome-wide association study (GWAS) in the PH identified candidate loci on Equus caballus autosomes (ECAs) 6, 7, 11, 14, and 26 that did not meet genome-wide significance (Strong 2005; Metzger and Distl 2020). Therefore, improved understanding of the genetic contribution to DSLD is needed. Carrier horses can be inadvertently used for breeding before development of a late-onset disease condition, such as DSLD, causing a welfare concern because of substantial horse morbidity. Breed predisposition suggests that DSLD-associated genetic variants are enriched in the PH through linkage to desirable phenotypes. From an evolutionary perspective, selection for a desired phenotype through breeding practices results in an increased frequency of haplotypes containing the gene(s) and functional allele(s) conferring the phenotype at a rate higher than expected under a null model of neutral evolution (Cutter and Payseur 2013). GWAS and detection of signatures of selection (SOS) are two complimentary approaches for association between a disease phenotype and genetic variation (Patron ). In a comparative analysis, we investigated the breed risk of DSLD using a categorical genome-wide SOS and GWAS approach in the PH and nine other breeds using breed-assigned risk levels that reflect varying levels of breed risk of DSLD, to capitalize on the availability of public single-nucleotide polymorphism (SNP) data from different breeds of horse. A candidate locus from GWAS that localizes in a region with a signature of positive selection signal indicates potential influential variation, particularly for simple Mendelian diseases (Kemper ).DSLD heritability has not been estimated and no associated genetic markers have been identified. Genetic testing for DSLD is not available. Clinically, it is recommended not to use affected horses for breeding. Our rationale for this work was that large effect DSLD genetic variants that have become highly frequent in susceptible breeds should exhibit association signals in a categorical across breed study examining horses classified using average phenotypes for varying breed related DSLD risk. A locus under natural or artificial selection would be expected to contain influential variation. To test this hypothesis, we quantified the rate and strength of positive SOS and GWAS signals for breed-related risk of DSLD in 10 horse breeds classified by levels of risk. The work is innovative because as it departs from the usual case–control GWAS approach by performing across breed categorical genome-wide association using breed-assigned risk levels for breed risk of DSLD. We identified novel candidate genes for breed risk of DSLD that are involved in disturbances to aging in tendon, mechanotransduction, collagen synthesis, matrix responses to hypoxia, and lipid metabolism. Such genes are targets for investigation in human tendon/ligament degeneration and could form the basis for polygenic risk score prediction of disease risk.
Materials and methods
Sampling
Client-owned PHs were recruited at the UW Madison School of Veterinary Medicine, Texas A&M College of Veterinary Medicine, and through online advertising. Pedigrees were used to confirm purebred status. Blood samples or hair bulb samples pulled from the tail or mane were collected from 162 PHs. All owners gave informed consent to participate in the study. Buffy coat or hair bulbs underwent DNA isolation using the Gentra Puregene reagents (Qiagen, Valencia, CA). Concentrated DNA was stored at −20°C for genotyping. SNP genotyping was performed using the Axiom Equine Genotyping Array (Axiom MNEC670K), which includes a total of 670,796 SNPs across all chromosomes. SNP genomic coordinates were based on the latest genome assembly, EquCab3 (Kalbfleisch ). Two additional datasets, described below, were also used by obtaining genotypes for the relevant SNP positions from published whole-genome sequence and high-density genotype data.Genotypic data were obtained from 304 horses representing nine breeds including: TB (n = 28), QH (n = 72), Belgian (BE) (n = 22), AR (n = 36), Welsh Pony (WP) (n = 44), Standardbred (SB) (n = 39), Icelandic Horse (IH) (n = 18), Lusitano (LU) (n = 21), and Shetland Pony (SP) (n = 24) (Table 1). For all breeds other than the SP, data were previously generated during development of the high-density equine SNP array (Schaefer ). In this study, whole-genome sequence data compiled from 153 horses representing 24 separate breeds were used to discover ∼23 million biallelic candidate SNPs. After quality control and filtration based on breed representation, even spacing across the genome, and probe design considerations, 2,001,826 SNPs were selected for the Affymetrix equine MNEc2M SNP array (Schaefer ). Of these SNPs, we selected those SNPs shared with Axiom MNEc670 that were located on ECAs 1–31. Horse samples from 24 SPs provided by Dr. Rebecca R. Bellone, from Veterinary Genetics Laboratory, School of Veterinary Medicine, University of California, Davis were also included in this study. These samples were genotyped using the Axiom Equine Genotyping 670K array and were remapped to EquCab3 reference genome (Tanaka ).
Table 1.
Breeds of horse used for signature of selection (SOS) and genome-wide association study (GWAS) for equine DSLD after disease risk was assigned to each breed based on clinical knowledge.
Breed
# horses
Source
Horse type
DSLD risk
Categorical risk score
Arabian
36
Beeson et al. (2019) and Cosgrove et al. (2020)
Ancient
Low
1
Belgian
22
Beeson et al. (2019)
Draft
Control
0
Icelandic Horse
18
Beeson et al. (2019)
Pony
Control
0
Lusitano
21
Beeson et al. (2019)
Ancient
Low
1
Peruvian Horse
162
University of Wisconsin-Madison, Comparative Genetics & Orthopaedic Research Laboratory
Ambling
High
3
Quarter Horse
72
Beeson et al. (2019)
Athletic
Medium
2
Shetland Pony
24
Tanaka et al. (2019)
Pony
Control
0
Standardbred
39
Beeson et al. (2019)
Athletic
Medium
2
Thoroughbred
28
Beeson et al. (2019)
Athletic
Medium
2
Welsh Pony
44
Beeson et al. (2019)
Pony
Control
0
Breeds of horse used for signature of selection (SOS) and genome-wide association study (GWAS) for equine DSLD after disease risk was assigned to each breed based on clinical knowledge.
Genetic population structure
We investigated patterns of population structure using five methods.
Principal component analysis
The genomic information was used to compute a genotypic (co)variance matrix between all individuals (Vitezica ). By performing eigen decomposition on the matrix using the base eigen() function in R (R Core Team 2021), the eigenvectors and eigen values were obtained and normalized dividing each component of the vector by the length of the vector to vectors with a length of 1. Finally, PCs were computed by multiplying eigenvectors by the square root of the associated eigenvalues (Scholkopf ). To review the results, we plotted the projection of the individuals on the first two PCs, with colors corresponding to their breed assignment.
Phylogenetic tree analysis
A pairwise identity-by-state distance matrix was computed using PLINK v1.09 and the –genome command followed by the –cluster command (Chang ). To produce a bootstrapping procedure, we resampled 500 datasets with replacement from the original genotypes. Neighbor-joining cladograms were constructed with these matrices using PHYLIP (Felsenstein 1993). The “consense” program in PHYLIP was used to combine the bootstrap results and build a majority rule consensus tree. Cladograms were built using iTOL v6 RRID: SCR_018174 (Letunic and Bork 2021).
Admixture analysis
A maximum-likelihood based approach, was used to infer the ancestry proportions population using ADMIXTURE RRID: SCR_001263 (Alexander ). To identify the most likely number of ancestral populations (K) in the dataset, a series of runs with predefined K values ranging from K = 3 to 30, were examined using 20 cross-validation runs (CV = 20). The termination convergence criterion (delta) was set to to stop when the log-likelihood increases by less than this value between two consequent iterations.
Effective population size
Effective population size (Ne) trajectories were estimated from LD, across the horse breeds. We used SneP v1.1 (Barbato ) to estimate breed-specific Ne trajectory, as an indicator of genetic drift and population demography in recent years for each population. The software estimates the historic effective population size based on the relationship between LD (, Ne, and recombination rate (Sved 1971; Weir and Hill 1980).
LD-decay estimation
To measure LD content across breeds, we computed pairwise LD based on SNP using a window size of 100kb, using all markers with a minor allele frequency (MAF) ≥5% on each chromosome using PLINK v1.9 (Chang ). Then, we fitted a nonlinear spline regression (Van Inghelandt ; Vos ). All chromosomes were concatenated to get an overall genome-wide LD-decay estimation.We also generated a genomic relationship matrix (GRM) for each breed of horse to evaluate within-breed subpopulation structure (R Core Team 2021) using the method of VanRaden (2008).
DSLD risk categorization
Contrasts between breed risk groups were used to identify DSLD-associated regions across the genome. We classified each breed into one of the four DSLD risk categories which were then used for individual horses belonging to that breed: control (1) (BE, IH, SP, and WP), low risk (2) (LU, AR), medium risk (3) (SB, TB, QH), and high risk (4) (PH). These breed risk phenotypes were then used for ordinal GWAS. This risk coding reflects current clinical knowledge of DSLD, of which the authors have extensive experience, where draft horses and pony breeds are protected from DSLD, certain athletic breeds have intermediate risk, and other breeds have high risk (Young 1993; Mero and Scarlett 2005; Halper , 2011; Luo ). Some of the breeds in the public SNP dataset were discarded if clinical and epidemiological knowledge could not estimate DSLD risk for that breed.
Selection and differentiation analyses
Since we were interested in selection along the evolutionary branch leading to breeds with high risk of DSLD, the control population was considered as the reference for comparison. As many approaches have been suggested, we scanned the genome for multiple patterns of molecular variation by (1) locally elevated levels of genetic differentiation based on allele frequency differences and (2) differences in long-range haplotype frequencies between risk groups. To infer these types of signatures, we estimated levels and patterns of genetic diversity and differentiation using two approaches. A pairwise population haplotype frequency test using hapFLK (Fariello ), which accounts for the hierarchical structure of the sampled populations, was used. Sex chromosomes were excluded from phasing, so haplotype statistics were limited to autosomes. We also used a window-based fixation index (FST) that is a measurement of population differentiation due to genetic structure (Weir and Cockerham 1984). Finally, we generated locus-specific diagrams of SOS positive hotspot regions reflecting LD as well as position relative to nearby tendinopathy candidate genes.
hapFLK
We performed hapFLK tests to contrast the high risk group with each of the control, low risk and medium risk groups using hapFLK v1.2 (Fariello ). Computation of hapFLK proceeds in three steps. First, we estimated a kinship matrix across groups, which was calculated using the Reynold’s genetic distances approach. For each SNP in the genome and across risk groups, we then performed the hapFLK test that incorporates haplotypic information to increase the power to detect selective sweeps, where a new mutation that increases its frequency and becomes quickly fixed in the population such that linked alleles also become fixed. So, the hapFLK statistic calculates the deviation of haplotypic frequencies with respect to the neutral model estimated by the kinship matrix (Reynolds ). To exploit LD information, hapFLK uses a multipoint model for multilocus genotypes that can be fitted to unphased data (Scheet and Stephens 2006). One of the main applications of this model is to perform phase estimation using fastPHASE (Scheet and Stephens 2006). In our analysis, the model was trained on unphased data and, therefore, our analysis accounts for phase uncertainty associated with estimation of LD content during haplotype analysis.The method was used to regroup local haplotypes along chromosomes in a specified number of clusters K set to 25, using a Hidden Markov Model. We used the following parameters: 25 clusters (-K 25), 20 EM (expectation maximization) runs to fit the LD model (-nfit = 20). Once hapFLK values were generated we implement the “rglm()” function from MASS R package to fit a robust linear model to normalize hapFLK scores and calculate P-values. hapFLK P-values at each SNP were computed from this estimated distribution. To identify region of interest, we used a Bonferroni-corrected threshold of P ≤ 1.13E−07, obtained by dividing P < 0.05 by the number of SNPs in the model.
Fixation index (FST)
The fixation index (Weir and Cockerham 1984) is a moment estimator of FST and uses unbiased estimators of the numerator (between-population variance) and the denominator (total variance). Here, we used the FST statistic to identify unique, divergent regions of the affected population showing differentiation across risk groups. FST was calculated in sliding windows of 50kb and a step size of 25kb using VCFtools 0.1.15 (Danecek ). Then, the estimated FST values were scaled in R software to follow a normal distribution. Standard score or z-score is a measure of standard deviation that estimates the distance from the mean. , here, is mean value of all computed FST and SD(FST) is the standard deviation of them. Then, we defined a stringent cut off value and selected only the 0.005 most divergent windows.
Liability threshold modeling of ordinal DSLD status for across breed GWAS
We also employed a liability categorical threshold (cumulative probit) model. The response variable (disease risk category), , represents an assignment into one of the four mutually exclusive and exhaustive categories that follow an order, where 1 indicates no risk (control), 2 low risk, 3 moderate risk, and 4 high risk. A schematic representation of this model is shown in Supplementary Fig. 1. Therefore, in a GLMM framework, this model can be described by defining the distribution, the linear predictor and a link function. This threshold model assumes that the process that gives rise to the observed categories is an underlying continuous variable with a normal distribution:
where represents liabilities, conditionally independent and distributed as, x denotes the j-th row of the fixed effects design matrix X, with the corresponding fixed effects coefficients denoted by , z denotes the j-th row of the random effects design matrix Z with corresponding random effects , and is the error term, which follows a normal distribution as ; was set to 1 to achieve variance identifiability in the likelihood because the liabilities are unobservable (Gianola 1982; Sorensen ). Also, the vector was assumed to follow the normal distribution , where is a marker-derived GRM, and , represents genomic variance. The ordinal categorical phenotypes were mapped to four categories (C = 4), based on the threshold parameters , with and , which are cut points of the continuous scale such that the assigned ordinal disease risk category () is given by:This model was implemented via a Bayesian approach using the Gibbs sampler algorithm, and sampling from the fully conditional distributions (Sorensen ) in the BGLR R package (Perez and de Los Campos 2014; R Core Team 2021).We then implemented a univariate linear mixed model regression GWAS model using the “gaston” R package (Dandine-Roulland and Perdry 2017) to test each single genetic variant independently. We regressed each SNP and used the Wald test to determine P-values. We included a GRM using the “GRM()” internal function from the “gaston” package to correct for population structure and cryptic relatedness and remove potential sources of spurious associations. After obtaining the P-values, we computed the chi-squared statistic (chisq<-qchisq(1-data$pval,1) and then the genomic inflation factor (λ) (median of the resulting chi-squared test statistics divided by the expected median of the chi-squared distribution with one degree of freedom (df = 1) (median(chisq)/qchisq(0.5,1)) using R (R Core Team 2021) to investigate possible systematic bias in our association results and reviewed results using a quantile–quantile (Q–Q) plot.Next, we considered two approaches to define significant thresholds. First, a Bonferroni correction was used (P ≤ 1.13E−07). However, Bonferroni correction may be too conservative as each SNP is not an independent test when SNPs reside within long blocks of LD. As an alternative, we defined genome-wide significance using 95% confidence intervals (CI) calculated from the empirical distribution of P-values observed in the absence of real association (Karlsson ). We determined this distribution by rerunning the GWAS with randomly permuted phenotypes 500 times. We defined genome-wide significance as associations exceeding the 5% upper empirical CI (P ≤ 3.24E−5).Finally, we recalculated principal component analysis (PCA) as described above using the GWAS SNPs with P-values lower than the permutation threshold for association with breed risk of DSLD and plotted the projection of the horses on the first two PCs, with colors corresponding to their breed assignment.
Candidate gene identification and pathway analysis
The biological functions of genes mapped to the human genome and located within positive SOS genomic regions or within flanking regions (50kb) of GWAS-associated SNPs were screened for relevance to tendon/ligament homeostasis. We also used the University of California Santa Cruz genome browser (https://genome.ucsc.edu/) to map the significant SNPs that were found to be under positive selection and associated with breed group risk of DSLD to the EquCab3.0 assembly (Casper ). Lists of associated genes were submitted for pathway analysis using Panther software (Mi , 2019).
Results
Genetic diversity and population structure
The population architecture and substructures were investigated using several approaches as a prerequisite for SOS analysis. Genomic relationship matrices were plotted for each breed to evaluate population substructure (Supplementary Fig. 2). PCA showed good differentiation of the 10 horse breeds by the first and second principal components (PC1, PC2) (Fig. 2). The first two PCs explained 34.8% and 12.5% of total genetic variation, respectively. Adding additional PCs did not result in any further observable genetic clusters (Supplementary Fig. 3). The Peruvian Horse breed was separated from other breeds using PC1. The SP, IH, BE, WP, and LU breeds were separated along PC2. The PC results showed a great overlap between AR and QH breeds, indicating a close genetic relationship between these two breeds in our data. Furthermore, SB, QH, AR, and TB breeds showed a closer relationship with each other than with other breeds along both PC1 and PC2.
Fig. 2.
A Principal component analysis (PCA) plot representing the genetic landscape of 10 horse breeds extended across first and second components (PC1 and PC2) derived from eigen vectors and eigen values obtained from eigen decomposition of a genotypic (co)variance matrix between all individuals. Each color shows a different breed, and each point represents 1 sample.
A Principal component analysis (PCA) plot representing the genetic landscape of 10 horse breeds extended across first and second components (PC1 and PC2) derived from eigen vectors and eigen values obtained from eigen decomposition of a genotypic (co)variance matrix between all individuals. Each color shows a different breed, and each point represents 1 sample.We also assessed population history for admixture events using model-based ancestry clustering. The ADMIXTURE software was used to determine the number of genetic background ancestors (K) that explain the observed total sum of between-breed genetic variation. For SOS analysis, it is important to know the composition of breed ancestry. To estimate the optimum number of ancestors, we tested a range of K values from 3 to 33. The lowest 20-fold cross-validation mean squared error (MSE) was obtained at K = 14 (MSE = 0.431, Supplementary Fig. 4). There were only minor differences between the four K values, K = 8, 12, 14, and 18 with the lowest MSE (Supplementary Fig. 4); results from K = 12, 14, and 18 had less than 0.001 MSE differences. This analysis indicated the presence of at least 12 genetic backgrounds. The individual assignment probabilities indicate distinct genetic backgrounds exist for the AR, SB, TB, BE, and IH, suggesting clear genetic divergence. This was evident across the four lowest values of K tested. In all these scenarios, the SB, PH, LU, and AR showed some degree of ancestral gene flow based on admixture analysis (Supplementary Fig. 4). For the PH, our PCA results show that ancestors are distant and different from the other breeds in the data set, different from the admixture analysis. All 10 breeds formed single, breed-specific nodes with good bootstrap support (Figs. 2 and 3). Like the PCA plot, the identity by descent similarity revealed a close genetic relationship between AR, QH, TB, SB, WP, SP, LU, BE, and IH breeds. Also, the PH resides in a distinct clade. We observed horses from the same breed almost perfectly clustered together, but slight differences were found in the internal branches within a breed. Next, we estimated patterns of LD decay among clusters over all 31 autosomal chromosomes and found the average correlation coefficient (r2), generally decayed rapidly. The decay trend differed between breeds (Supplementary Fig. 5). The largest LD was observed in the TB and the lowest in the PH. The breeds with second and third largest LD content were the SP and AR, respectively. Other breeds with the smallest LD were the WP and the QH.
Fig. 3.
Cladogram of 10 horse breeds obtained using a bootstrapped procedure by identity-by-state distance matrix (IBS) and a neighbor-joining tree algorithm. The outer circle color indicates how breeds constituted their own branches. The number on the branches and sub-branches indicates how they were supported by 500 bootstrap replicates.
Cladogram of 10 horse breeds obtained using a bootstrapped procedure by identity-by-state distance matrix (IBS) and a neighbor-joining tree algorithm. The outer circle color indicates how breeds constituted their own branches. The number on the branches and sub-branches indicates how they were supported by 500 bootstrap replicates.To further investigate differences in LD content across breeds, we estimated ancestral and recent effective population sizes (Ne), which is an important genetic parameter that relates to the amount of LD content, genetic variation and genetic drift in a population and represents the minimum number of breeding individuals in an idealized population. The estimated effective population sizes for breeds over the past 100 generations showed an increasing trend across generations for effective population size and the Ne parameter diversified between breeds (Supplementary Fig. 6). The QH, PH, and WP breeds had larger Ne values, respectively, although the PH had a higher slope, than other breeds and after generation 75 had the highest Ne estimates. The lowest Ne estimates were obtained for TB, SP, and SB breeds. The Ne estimates agreed with LD content among breeds.
Candidate regions and genes under positive selection in horse breeds with differing breed risk of DSLD
The hapFLK analysis identified 6 regions under positive selection, which passed the Bonferroni-corrected threshold when the breed group with high risk of DSLD was compared with the control breed group (i.e. PH vs. BE, IH, SP, and WP) and distributed on ECA2, 3, 6, 7, 10, and 23 (Fig. 4). These regions were also detected when the breed group with high risk of DSLD was compared with low and medium risk breed groups. The largest regions were located on the ECA7 and 23, with a length of ∼3.3 and ∼4.6Mb, respectively (Table 2). The region on ECA7 contained the COL5A3 (collagen type V alpha 3 chain), KANK2 (KN Motif and Ankyrin Repeat Domains 2), LDLR (low-density lipoprotein receptor), and PIN1 (Peptidyl-prolyl cis-trans isomerase NIMA-interacting 1) genes (Fig. 5).
Fig. 4.
Manhattan plot of hapFLK and FST analysis over the 31 autosomal chromosomes across 3 group comparisons for the high risk (Peruvian Horse), medium risk (Standardbred, Thoroughbred, Quarter Horse), low risk (Lusitano, Arabian), and control (Belgian, Icelandic Horse, Shetland Pony, Welsh Pony) breed groups. a) High risk vs. control, b) high risk vs. low risk, and c) high risk vs. medium risk. The red line in the hapFLK Manhattan plot indicates the Bonferroni threshold line and in FST shows the upper 0.005% of the top windows of FST values distribution. The x-axes show the chromosomes and the y-axis the –log10(P-value) for hapFLK and Z-score based for FST. All SNPs and windows passed the defined thresholds highlighted with red color.
Table 2.
Significant signature of selection regions identified using hapFLK, across four DSLD risk groups, high risk vs. control (HR-Cont), high risk vs. low risk (HR-LR), and high risk vs. medium risk (HR-MR).
Regions shared between risk groups. Genes highlighted in bold underlined text may have biological effects on tendon and ligament homeostasis.
Fig. 5.
Locus Zoom plot of hotspot regions containing eight tendinopathy-related candidate genes for breed group risk of degenerative suspensory ligament desmitis on ECA7, 18, and 23. Each point represents an SNP. The color of each SNP indicates its LD quality, as indicated by color index tab. The most significant SNP in each region is indicated by a purple diamond.
Manhattan plot of hapFLK and FST analysis over the 31 autosomal chromosomes across 3 group comparisons for the high risk (Peruvian Horse), medium risk (Standardbred, Thoroughbred, Quarter Horse), low risk (Lusitano, Arabian), and control (Belgian, Icelandic Horse, Shetland Pony, Welsh Pony) breed groups. a) High risk vs. control, b) high risk vs. low risk, and c) high risk vs. medium risk. The red line in the hapFLK Manhattan plot indicates the Bonferroni threshold line and in FST shows the upper 0.005% of the top windows of FST values distribution. The x-axes show the chromosomes and the y-axis the –log10(P-value) for hapFLK and Z-score based for FST. All SNPs and windows passed the defined thresholds highlighted with red color.Locus Zoom plot of hotspot regions containing eight tendinopathy-related candidate genes for breed group risk of degenerative suspensory ligament desmitis on ECA7, 18, and 23. Each point represents an SNP. The color of each SNP indicates its LD quality, as indicated by color index tab. The most significant SNP in each region is indicated by a purple diamond.Significant signature of selection regions identified using hapFLK, across four DSLD risk groups, high risk vs. control (HR-Cont), high risk vs. low risk (HR-LR), and high risk vs. medium risk (HR-MR).Regions shared between risk groups. Genes highlighted in bold underlined text may have biological effects on tendon and ligament homeostasis.The second largest region seen in the high risk vs. control breed group comparison, located on the ECA23, contained the KANK1 (KN motif and ankyrin repeat domain-containing protein 1) and VLDLR (very low-density lipoprotein receptor) genes (Fig. 5). Interestingly, a highly conserved region under positive selection on ECA10 (27.8–28.3Mb) also contained a cluster of Zinc finger protein-related genes, ZNF135, ZNF274, ZNF606, C10H19orf18 (chromosome 10 C19orf18 homolog), and ZNF671.SOS analysis of the breed group with the high risk of DSLD vs. the breed group with medium risk of DSLD identified a ∼1.9Mb region under positive selection on ECA18. This region contained the COL5A2 (collagen type 5 alpha 2 chain) gene (Fig. 5). In this contrast, the ECA7 region also contained the JUNB (proto-oncogene, AP-1 transcription factor subunit) and PDRX2 (peroxiredoxin 2) genes.Finally, our SOS results also showed that for comparison of the breed group with high risk of DSLD vs. the control breed group, there were ∼0.11, 0.52, and ∼1.1Mb regions with a high selection signal on the ECA2, 3, and 6, respectively, that did not contain genes whose function is known to be related to tendon biology. Evaluation of genomic regions under selection using analysis with the FST statistic identified regions shared with hapFLK analysis, particularly loci on ECA2, 10, and 23 (Fig. 5). FST results showed a more polygenic trend than hapFLK analysis.
Association testing of breed risk groups using a liability ordinal threshold model
GWAS results showed many SNPs deviating from the null line, and 51 SNPs showed a −log10 P-value, larger than the Bonferroni threshold (Fig. 6). The estimated lambda value was ∼1 indicating that there is no systematic bias in the analysis. The QQ plot showed deviation of many observed P-values for significant SNPs from the null hypothesis, indicative of polygenicity in breed risk of DSLD.
Fig. 6.
Manhattan plot of GWAS results from the liability estimates used as input for an ordinary linear mixed model of breed groups with differing risk of degenerative suspensory ligament desmitis. Each point represents a SNP. The first dashed dark red line represents the permuted threshold line and the second dark red solid line represents the genome-wide significance level for Bonferroni correction in −log10(P-value) scale. The top right plot shows the Q–Q plot of the genome-wide association, where the −log10-transformed observed P-values (y-axis) are plotted against –log10-transformed expected P-values (x-axis).
Manhattan plot of GWAS results from the liability estimates used as input for an ordinary linear mixed model of breed groups with differing risk of degenerative suspensory ligament desmitis. Each point represents a SNP. The first dashed dark red line represents the permuted threshold line and the second dark red solid line represents the genome-wide significance level for Bonferroni correction in −log10(P-value) scale. The top right plot shows the Q–Q plot of the genome-wide association, where the −log10-transformed observed P-values (y-axis) are plotted against –log10-transformed expected P-values (x-axis).The strongest single-variant associations between breed groups with differing risk of DSLD detected in the GWAS are reported in Supplementary File 2. We evaluated the 51 SNPs that passed the stringent Bonferroni correction threshold for associated genes by investigating 50kb up- and down-stream from each SNP; this analysis resulted in the identification of 46 genes (Table 3).
Table 3.
Significant genome-wide SNPs identified using ordinal GWAS across four DSLD risk groups and genes located in the associated ±50kb window.
SNP position
P-Value
Chr.
Window (bp)
Genes
1:167,889,750
5.23E−09
1
167,839,750–167,939,750
None
1:16,979,620
8.12E−09
1
16,929,620–17,029,620
ATRNL1
1:152,659,086
2.48E−08
1
152,609,086–152,709,086
MEIS2
1:172,709,687
2.64E−08
1
172,659,687–172,759,687
NPAS3
1:165,399,279
3.08E−08
1
165,349,279–165,449,279
STXBP6
1:120,389,182
3.72E−08
1
120,339,182–120,439,182
SEMA7A
1:181,251,053
4.45E−08
1
181,201,053–181,301,053
None
1:161,411,374
6.58E−08
1
161,361,374–161,461,374
None
2:85,873,720
5.69E−13
2
85,823,720–85,923,720
None
2:92,260,610
2.58E−09
2
92,210,610–92,310,610
RAB33B/NAA15
2:95,726,382
9.07E−09
2
95,676,382–95,776,382
None
2:83,316,550
2.40E−08
2
83,266,550–83,366,550
LRBA/DCLK2
2:113,433,033
4.46E−08
2
113,383,033–113,483,033
CAMK2D/ANK2
2:103,773,160
4.54E−08
2
103,723,160–103,823,160
None
2:113,793,329
8.30E−08
2
113,743,329–113,843,329
ANK2
4:29,201,049
1.27E−11
4
29,151,049–29,251,049
None
4:85,780,520
2.34E−09
4
85,730,520–85,830,520
MKLN1
5:60,855,540
5.92E−08
5
60,805,540–60,905,540
None
5:21,231,893
1.09E−07
5
21,181,893–21,281,893
KCNK2
6:33,050,364
2.67E−09
6
33,000,364–33,100,364
CCND2
6:77,717,350
1.51E−08
6
77,667,350–77,767,350
SLC16A7
8:35,272,427
6.35E−10
8
35,222,427–35,322,427
LRRC30
8:31,344,118
1.47E−08
8
31,294,118–31,394,118
None
8:96,476,384
5.20E−08
8
96,426,384–96,526,384
ATP9B
8:32,610,990
9.96E−08
8
32,560,990–32,660,990
ZNF891/ZNF10
8:57,317,599
1.04E−07
8
57,267,599–57,367,599
CCDC178
9:72,122,737
7.83E−09
9
72,072,737–72,172,737
None
10:84,997,106
3.41E−09
10
84,947,106–85,047,106
NHSL1/CCDC28A/ECT2L
10:69,894,473
4.22E−08
10
69,844,473–69,944,473
TBC1D32
13:16,237,653
3.58E−09
13
16,187,653–16,287,653
AUTS2
15:55,512,469
1.63E−09
15
55,462,469–55,562,469
PPM1B
15:74,044,214
3.55E−09
15
73,994,214–74,094,214
None
16:30,408,620
9.94E−09
16
30,358,620–30,458,620
FHIT
16:68,091,834
1.91E−08
16
68,041,834–68,141,834
COL6A5
17:77,491,730
2.65E−09
17
77,441,730–77,541,730
COL4A1
17:6,212,612
1.88E−08
17
6,162,612–6,262,612
RNF6/CDK8
17:40,253,974
6.67E−08
17
40,203,974–40,303,974
None
18:27,056,892
8.22E−08
18
27,006,892–27,106,892
GTDC1
19:61,510,513
4.96E−09
19
61,460,513–61,560,513
None
20:61,064,890
7.51E−08
20
61,014,890–61,114,890
None
22:33,468,031
1.85E−08
22
33,418,031–33,518,031
PTPRT
24:22,791,028
6.75E−11
24
22,741,028–22,841,028
SPTLC2
24:46,690,329
2.23E−09
24
46,640,329–46,740,329
CEP170B/PLD4/AHNAK2/CLBA1
27:1,613,634
6.87E−11
27
1,563,634–1,663,634
PSD3
27:18,207,816
8.31E−08
27
18,157,816–18,257,816
SGCZ/MIR383
27:10,846,899
9.65E−08
27
10,796,899–10,896,899
None
29:17,669,169
2.26E−08
29
17,619,169–17,719,169
MALRD1
30:4,279,845
4.07E−10
30
4,229,845–4,329,845
GREM2/FMN2
31:14,453,924
7.19E−09
31
14,403,924–14,503,924
None
31:24,992,307
4.26E−08
31
24,942,307–25,042,307
None
31:6,076,336
1.01E−07
31
6,026,336–6,126,336
PACRG
Genes highlighted in bold underlined text may have biological influences on tendon and ligament homeostasis. Windows represent 50kb flanking regions around each associated SNP.
Significant genome-wide SNPs identified using ordinal GWAS across four DSLD risk groups and genes located in the associated ±50kb window.Genes highlighted in bold underlined text may have biological influences on tendon and ligament homeostasis. Windows represent 50kb flanking regions around each associated SNP.Several genes were identified with high relevance to tendon/ligament homeostasis (Table 3). Associated genes we found included SEMA7A (semaphorin 7A) on ECA1, COL6A5 (collagen type VI alpha 5 chain) on ECA16, COL4A1 (collagen type IV alpha 1 chain) gene on ECA17, and GREM2 (gremlin 2 DAN family BMP antagonist) on ECA30. Finally, associations with zinc finger proteins were also identified on ECA8. When PCA was repeated using the 1,440 significant GWAS SNPs with P-values lower than the permutation testing threshold (P ≤ 3.24E−5), we found that the first 2 PCs explained most of the genetic variation (Supplementary Fig. 7). The PH population separated into two clusters and the other horses in the remaining nine breeds cluster together (Fig. 7).
Fig. 7.
Principal component analysis (PCA) plot using the top 1,440 GWAS SNPs with P-values below the significance threshold determined by permutation testing for association with the breed risk of DSLD. In contrast to the single cluster of Peruvian Horses in Fig. 2, Peruvian Horses now separate into 2 clusters with the remaining breeds clustered together.
Principal component analysis (PCA) plot using the top 1,440 GWAS SNPs with P-values below the significance threshold determined by permutation testing for association with the breed risk of DSLD. In contrast to the single cluster of Peruvian Horses in Fig. 2, Peruvian Horses now separate into 2 clusters with the remaining breeds clustered together.
Pathway analysis
Results of pathway analysis with Panther are presented in Supplementary File 3. Overrepresented functions for genes associated with differing breed risk of DSLD from SOS analysis included cellular and metabolic processes and binding. Highlighted pathways included signaling through the EGF receptor, inflammation mediated by chemokines and cytokines, nicotinic acetylcholine receptor, gonadotropin-releasing hormone receptor, and the wnt pathway. Overrepresented protein classes included nucleic acid metabolism proteins, metabolite interconversion enzymes, and gene-specific transcriptional regulators. Similar results were obtained from the GWAS analysis. Here, pathways enriched by GWAS-associated genes include integrin, inflammation mediated by chemokines and cytokines, PI3 kinase, ionotropic glutamate receptor, and cell cycle signaling.
Discussion
Categorical GWAS approach using breed-assigned risk levels
In a One Health comparative analysis undertaking research that is beneficial to both humans and animals, a genome-wide SOS analysis and a GWAS were used to identify genome-wide associations between breed groups categorized by differing risk of DSLD. Genes residing in candidate loci were evaluated for relevance to tendon/ligament injury. As a part of our SOS analysis, population structure within the data was evaluated. We confirmed homogeneity within horse breeds and heterogeneity between breeds in our sample population of 10 breeds based on PCA (Schaefer ) and findings from the phylogenetic cladogram. LD decay within breeds generally decayed as previously described (Petersen ; Schaefer ). The admixture patterns in our results showed some admixture between the PH and other breeds with differing DSLD risk. We based risk categorization on clinical knowledge of the incidence of DSLD in different breeds (Young 1993; Mero and Scarlett 2005; Halper , 2011; Luo ). We found that the PH, which was the only breed assigned a high risk of DSLD, formed a distinct cluster genetically based on PCA, but admixture analysis suggests sharing of ancestry with the SB, the AR, and the LU breeds. Here, it is important not to overinterpret admixture plots where recent genetic drift is strong (Lawson ). The AR breed as a genetic root also formed a distinct cluster; results that align with knowledge of the origin of this horse breed (Cosgrove ). This type of categorical GWAS research approach with breed-assigned risk levels has previously been used in dogs (Smith ; Doherty ), but not horses. Further validation of candidate genes is important as biologically relevant candidate genes may still represent false-positive discoveries (Pavlidis ).
Effective population size
The effectiveness of selection of a mutation depends on both the fitness effect of new beneficial mutations and the effective population size (Ne). We computed effective population size for each breed separately. The breeds we studied have small recent effective population size, except for the QH (Ne > 100). It has been suggested that a minimum Ne of 50–100 is needed for sustaining reproductive fitness in the short term (∼100 years) (Frankham ). Horse breeds are largely defined by segregation of alleles because of founder effect, small effective population size, and maintenance of deleterious alleles through artificial selection (Gossmann ).
DSLD pathology
A key feature of DSLD is acellular accumulation of proteoglycans, such as aggrecan and decorin, replacing cells and collagen bundles, suggesting disturbance to matrix homeostasis (Halper , 2011; Schenkman ; Young ). Such features parallel tendon aging and tendinopathy in humans (Winnicki ; Smith and McIlwraith 2021). A certain amount of proteoglycan turnover may be required to maintain normal tendon/ligament collagen assembly and matrix homeostasis by aggrecanases that are constitutively active in the tissue (Rees ). In this regard, it is interesting to note that accumulation of ADAMTS4 (∼chr5:32.4Mb) and ADAMTS5 (∼chr26:25.1Mb) aggrecanases has been identified in DSLD-affected tendons suggesting sequestration of these enzymes (Plaas ).
Candidate genetic variants associated with differing breed risk of DSLD influence tissue aging
Variants in candidate genes discovered in the present study have the potential to influence SL homeostasis and matrix composition through effects on mechanotransduction, collagen assembly, and turnover and, consequently, contribute to DSLD pathology. PIN1 (∼chr7:51.8Mb) is a gene with an important role in tendon aging in humans through modulation of diverse cellular functions; such effects may influence tendon degeneration (Chen ). Aging is a dominant risk factor for tendon injury and impaired tendon healing, but the associated mechanisms are poorly understood. PIN1 has regulatory effects on cellular aging; overexpression delays the progression of cellular senescence as indicated by the downregulation of senescence-associated -galactosidase and enhanced telomerase activity (Chen ; Lv ). PIN1 is also recognized to have a vital role in tendon stem/progenitor cell aging and its upstream miRNA is a prospective target for preventing progenitor cell aging (Dai ). Mechanical loading influences expression of multiple miRNAs in tendon including downregulation of miRNA-140-5p (Mendias ).
Candidate genetic variants associated with differing breed risk of DSLD influence tissue mechanotransduction
Other genes associated with differing breed group risk of DSLD include KANK1 and KANK2. The KANK protein family is characterized by an N-terminal KN motif, coiled-coil motifs, and 4–5 ankyrin repeat domains located in the C-terminus and are key regulators of adhesion dynamics (Kakinuma ; Zhu ). KANK is a novel Akt substrate and its interaction with 14–3–3 proteins is controlled by the phosphoinositide-3 kinase (PI3K)–Akt signaling pathway (Kakinuma ); a pathway that is critically important to tendon mechanotransduction (Wang ). KANK proteins act to decrease the RhoA-dependent formation of actin stress fibers and cell migration (Kakinuma ; Seetharaman and Etienne-Manneville 2020). PI3K/Akt signaling regulates important cellular functions including apoptosis, cell growth, and cell migration; this pathway has an important role in tendon/ligament homeostasis and repair (Zhang ). KANK2, located on ECA7, is a paralog of this gene and has similar physiological functions (Zhu ; Gee ) and likely similar effects on mechanotransduction. Disruption of KANK1/2 function may have important novel effects on tendon/ligament mechanotransduction and collagen synthesis (Paradzik ).Other candidate tendinopathy genes associated with differing breed group risk of DSLD identified in this study include JUNB, and SEMA7A. The transcription factor JUNB has been linked to type I collagen disruption in fibrous connective tissue disease (Ponticos ) and involvement in signaling pathways for tendon (Tan ) and muscle (Verbrugge ) homeostasis. Semaphorins are cell surface signaling molecules that regulate cell migration. SEMA7A may influence mechanotransduction in tendon/ligament (Spencer and Lallier 2009), but more studies are needed. SEMA7A also has a role in TGF-beta1-mediated tissue remodeling and fibrosis (Kang ).
Candidate genetic variants associated with differing breed risk of DSLD influence matrix composition
Interestingly, we also identified additional genes associated with differing breed group risk of DSLD that influence low abundance fibrillar collagen homeostasis in tendon/ligament. Fibrillar collagen molecules are trimers that can be composed of 1 or more types of alpha chains. SNP associations within the COL5A2 gene on ECA18 and COL5A3 gene on ECA7 were identified from our SOS analysis. Type V collagen is found in tissues containing type I collagen and acts to regulate assembly of heterotypic fibers composed of both type I and type V collagen (Mak ). In humans, Ehlers–Danlos syndrome associated with connective tissue hyperelasticity and fragility is linked with heterozygous mutations in COL5A2 (Chiarelli ); COL5A3 mutations have also been implicated in the syndrome (Imamura ). Collagen V plays an important role in regulating fibrillogenesis and associated recovery of mechanical integrity in tendons after injury (Johnston ). A proteomic composition analysis of the muscle, tendon, and junction tissues showed that COL5A3 as a potential marker for the muscle-tendon junction, a highly interdigitated interface that seamlessly transfers muscle-generated forces to the tendon (Jacobson ). These observations suggest more detailed study of myofibers in the equine SL may be indicated. Risk of Achilles tendinopathy is influenced by interactions between extracellular matrix proteins and cell signaling pathways and this risk is influenced by COL5A2 and COL5A3 variants (Bancelin ; Bittermann ; Jacobson ). The proportion of collagen type V is increased in Warmblood injured SL, a breed registry that is predisposed to DSLD (Shikh-Alsook ).We also found that SNPs within COL4A1 and COL6A5 on ECA16 and 17 were associated with breed group differences in the risk of DSLD. Collagen IV is a major component of basement membrane. The α1(IV) chain, encoded by COL4A1, is expressed ubiquitously and associates with the α2(IV) chain to form the α1α1α2(IV) heterotrimer. Analysis of skeletal muscle from COL4A1 mutant animals identified a muscular dystrophy phenotype with myofiber atrophy, centronucleation, focal inflammatory infiltrates, and fibrosis (Guiraud ). Collagen VI is an extracellular matrix protein with critical roles in maintaining muscle and skin integrity (Lamande and Bateman 2018). The role of COL6A5 mutations in the development of myopathy and fibrosis is not understood (Sabatelli ), but the expression of COL6A5 is decreased in the skin of DSLD-affected horses (Haythorn ).
Candidate genetic variants associated with differing breed risk of DSLD influence lipid metabolism
There are few studies specifically focused on the role of low-density lipoprotein (LDL) and very low-density lipoprotein (VLDL) mutations in tendon/ligament homeostasis, but there is strong evidence that links these 2 lipoproteins with tendinopathy in familial hypercholesterolemia (Tilley ; Aljenedil ). Elevated plasma cholesterol, low-density lipoprotein cholesterol and triglyceride, and lower high-density lipoprotein cholesterol are associated with an increased risk of tendinopathy and xanthoma formation because of LDLR mutations (Raal and Santos 2012; Tilley ). Linkage of VLDLR variants with tendinopathy has not been established.
Other candidate genetic variant effects on differing breed risk of DSLD
Other candidate tendinopathy genes associated with breed group differences in DSLD risk identified in this study include PRDX2 and GREM2. Upregulation of the antioxidant enzyme PRDX2 has a functional role in oxidative stress (Chu ; Jeong ) and has been linked to development of rotator cuff tendinopathy in humans (Thankam and Agrawal 2021). GREM2 is a BMP antagonist that is regulated by the circadian clock; GREM2 has been linked to the development of tendon calcification and the development of tendinopathy (Yeung ).Many associations were identified with zinc finger proteins. Zinc finger proteins are among the most abundant proteins in eukaryotic genomes and their functions are extraordinarily diverse (Laity ; Gurgul ), but a role in tendinopathy has not been clearly defined.In general, candidate genes associated with breed group differences in DSLD risk were supported by pathway analysis results. Pathway analysis suggested that disturbances to mechanotransduction are an important component of DSLD pathogenesis. It is also interesting to note that structural and extracellular matrix proteins were not highlighted in our pathway analysis.
Study limitations regarding use of breed repository SNP data
More work validating our research findings and expansion of the breed repository SNP set is needed to fully understand the usefulness of this approach, since biologically relevant candidate genes may still represent false-positive discoveries (Pavlidis ). Our current data set is limited to a single breed with high risk of DSLD, the PH, which showed some admixture with other breeds with differing DSLD risk, such as the AR, the SB, and the LU. The extent to which contrasts with other breeds may reflect variant differences between breed characteristics as opposed to causal associations with DSLD is unclear and needs more investigation. However, associations with candidate tendinopathy genes are biologically plausible. When PCA using the top GWAS SNPs was repeated, the PH population segregated into 2 clusters likely with differing within-breed risk of DSLD. This observation supports the categorical GWAS approach we have used in this study. These findings would be strengthened by further across breed analysis after addition of other high-risk breeds to the data set. Enlargement of the breed repository data set over time could be very valuable with regarding to analysis of other disease phenotypes in horses where clinical knowledge includes information on breed incidence or breed categorical risk of disease. Additional analysis through case–control association within the PH breed using individually phenotyped horses would also provide further validation for specific candidate genes.
Conclusions
Use of categorical GWAS with breed-assigned risk levels is a potentially useful research approach in horses. Our findings contribute to knowledge of the genetic background that explains differing breed risk of DSLD. Using a One Health comparative analysis investigating SOS and categorical GWAS of breeds of horse with different breed risk of DSLD, our results suggest that DSLD is a complex disease with a polygenic architecture. We have identified several novel candidate genes associated with elevated breed risk of DSLD that are also novel candidates for human tendinopathy. When taken together with existing knowledge of the pathology of the disease, our findings suggest that DSLD pathogenesis is associated with disturbances to SL homeostasis where matrix responses to mechanical loading are perturbed through disturbances to aging in tendon (PIN1), mechanotransduction (KANK1, KANK2, JUNB, SEMA7A), collagen synthesis (COL4A1, COL5A2, COL5A3, COL6A5), matrix response to hypoxia (PRDX2), and lipid metabolism (LDLR, VLDLR). Our results do not suggest that proteoglycan turnover in SL matrix is a primary factor in the disease pathogenesis. Candidate loci can now be specifically characterized using whole-genome sequencing for mutation discovery in both humans and the equine DSLD spontaneous tendinopathy model. These results are also encouraging regarding future development of a genetic risk test or prediction model to predict DSLD risk in horses before onset of clinical signs. To avoid inadvertent use of affected individuals for breeding, genetic testing will need a polygenic risk scoring approach.Click here for additional data file.
Authors: Kamil Winnicki; Anna Ochała-Kłos; Bartosz Rutowicz; Przemysław A Pękala; Krzysztof A Tomaszewski Journal: Ann Anat Date: 2020-01-21 Impact factor: 2.698
Authors: Elissa J Cosgrove; Raheleh Sadeghi; Florencia Schlamp; Heather M Holl; Mohammad Moradi-Shahrbabak; Seyed Reza Miraei-Ashtiani; Salma Abdalla; Ben Shykind; Mats Troedsson; Monika Stefaniuk-Szmukier; Anil Prabhu; Stefania Bucca; Monika Bugno-Poniewierska; Barbara Wallner; Joel Malek; Donald C Miller; Andrew G Clark; Douglas F Antczak; Samantha A Brooks Journal: Sci Rep Date: 2020-06-16 Impact factor: 4.379