Sook Hee Yoon1, Wonseok Lee2, Hyeonju Ahn3, Kelsey Caetano-Anolles1, Kyoung-Do Park4, Heebal Kim1,5.
Abstract
The Thoroughbred horse breed was developed primarily for racing and has contributed significantly to the qualitative improvement of many other horse breeds. Despite the historical, cultural, and economic importance of Thoroughbred racehorses, their temporal and spatial dynamics had not been studied using mitogenome sequences. To explore this topic, the complete mitochondrial genome sequences of 14 Thoroughbreds and two Przewalski's horses were determined. These sequences were analyzed together with 151 previously published horse mitochondrial genomes from a range of breeds across the globe using a Bayesian coalescent approach as well as Bayesian inference and maximum likelihood methods. The racing horses were revealed to have multiple maternal origins and to be closely related to horses from one Asian, two Middle Eastern, and five European breeds. The Thoroughbred breed was not directly related to the Przewalski's horse, which has been regarded as the closest taxon to all domestic horses and the only true wild horse species left in the world. Our phylogenomic analyses also indicated that there was no apparent correlation between geographic origin or breed and the evolution of global horses. The most recent common ancestor of the Thoroughbreds lived approximately 8,100-111,500 years ago, which was significantly younger than the most recent common ancestor of modern horses (0.7286 My). The Bayesian skyline plot revealed that the population expansion of modern horses, including Thoroughbreds, occurred approximately 5,500-11,000 years ago, which coincides with the start of domestication. This is the first phylogenomic study on the Thoroughbred racehorse in association with its spatio-temporal dynamics.
The database and genetic history information of Thoroughbred mitogenomes obtained from the present study provide useful information for future horse improvement projects, as well as for the study of horse genomics, conservation, and geographical distribution.
Year: 2018 PMID: 30216366 PMCID: PMC6138400 DOI: 10.1371/journal.pone.0203917
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Plot of the nucleotide (a) and amino acid (b) variations throughout the mitochondrial genomes of 15 Thoroughbred horses. The number of dissimilarities was calculated as the total number of altered nucleotides at each site, compared using the multiple sequence alignment method. Each gene is indicated at the top of each plot. Single-letter abbreviations of tRNA genes stand for the following full names: F, tRNA-Phenylalanine; V, tRNA-Valine; L, tRNA-Leucine; I, tRNA-Isoleucine; Q, tRNA-Glutamine; M, tRNA-Methionine; W, tRNA-Tryptophan; A, tRNA-Alanine; N, tRNA-Asparagine; C, tRNA-Cysteine; Y, tRNA-Tyrosine; S, tRNA-Serine; D, tRNA-Aspartate; K, tRNA-Lysine; G, tRNA-Glycine; R, tRNA-Arginine; H, tRNA-Histidine; E, tRNA-Glutamate; T, tRNA-Threonine; P, tRNA-Proline.
Fig 2. Plot of the nucleotide (a) and amino acid (b) differences throughout the mitochondrial genomes of 167 modern horses. The number of differences was estimated as the total number of altered nucleotides at each site, compared using the multiple sequence alignment method. Each gene is indicated at the top of each plot. Single-letter abbreviations of tRNA genes stand for the following full names: F, tRNA-Phenylalanine; V, tRNA-Valine; L, tRNA-Leucine; I, tRNA-Isoleucine; Q, tRNA-Glutamine; M, tRNA-Methionine; W, tRNA-Tryptophan; A, tRNA-Alanine; N, tRNA-Asparagine; C, tRNA-Cysteine; Y, tRNA-Tyrosine; S, tRNA-Serine; D, tRNA-Aspartate; K, tRNA-Lysine; G, tRNA-Glycine; R, tRNA-Arginine; H, tRNA-Histidine; E, tRNA-Glutamate; T, tRNA-Threonine; P, tRNA-Proline.
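The per-site counts plotted in Figs 1 and 2 can be illustrated with a minimal sketch. It assumes that "altered nucleotides at each site" means the number of sequences that differ from the most common character in that alignment column; the source does not spell out the exact rule, and the function name `site_differences` and the toy alignment are hypothetical.

```python
from collections import Counter


def site_differences(alignment):
    """Count, per alignment column, the sequences that differ from the
    most common character in that column (an assumed definition)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    diffs = []
    for column in zip(*alignment):
        # Size of the majority class in this column.
        most_common_count = Counter(column).most_common(1)[0][1]
        diffs.append(len(column) - most_common_count)
    return diffs


# Toy alignment of four sequences, not real horse data.
toy_alignment = ["ACGTAC", "ACGTAC", "ACATAC", "TCATAC"]
print(site_differences(toy_alignment))  # [1, 0, 2, 0, 0, 0]
```

The same function works on an amino acid alignment, since it only compares characters column by column.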
Summary of mitochondrial genome regions of 15 Thoroughbred horses used in this study.
| Genomic region | Total sites including gaps, nt/ aa | Variable sites (%), nt/ aa | Sequence identities (%) (average), nt/ aa | Ts/Tv ratio | Base frequencies, A, C, G (%) | ω (dN/dS) |
|---|---|---|---|---|---|---|
| 12S rRNA | 977/ - | 15(1.5%)/ - | 99.2–100.0(99.7)/ - | 29.67 | 36.6, 24.1, 17.0 | - |
| 16S rRNA | 1,581/ - | 19(1.2%)/ - | 99.4-100(99.8)/ - | 19.46 | 37.8, 22.2, 16.9 | - |
| NADH1 | 954/ 318 | 14(1.5%)/ 4(1.3%) | 99.3-100(99.7)/ 98.7-100(98.3) | 29.67 | 36.1, 24.1, 17.4 | 0.031 |
| NADH2 | 1,038/ 346 | 15(1.4%)/ 4(1.2%) | 99.2-100(99.7)/ 99.1-100(99.8) | 29.68 | 36.8, 23.8, 17.5 | 0.021 |
| COX1 | 1,542/ 514 | 18(1.2%)/ 0(0.0%) | 99.4-100(99.8)/100-100(100) | 18.48 | 37.8, 22.4, 16.9 | 0.010 |
| COX2 | 681/ 227 | 8(1.2%)/ 0(0.0%) | 99.0-100(99.7)/ 100-100(100) | 28.60 | 35.8, 24.2, 16.4 | 0.014 |
| ATPase8 | 201/ 67 | 1(0.5%)/ 1(1.5%) | 99.5-100(99.8)/ 98.5-100(99.8) | 29.67 | 34.8, 20.2, 17.4 | 0.343 |
| ATPase6 | 678/ 226 | 12(1.8%)/ 4(1.8%) | 99.0-100(99.7)/ 98.7-100(99.6) | 29.67 | 35.8, 24.3, 16.5 | 0.026 |
| COX3 | 783/ 261 | 13(1.7%)/ 4(1.5%) | 99.1-100(99.7)/ 98.8-100(99.7) | 29.67 | 36.0, 24.4, 16.7 | 0.005 |
| NADH3 | 345/ 115 | 3(0.9%)/ 0(0.0%) | 99.1-100(99.7)/ 100-100(100) | 29.66 | 34.2, 22.3, 18.3 | 0.014 |
| NADH4L | 294/ 98 | 3(1%)/ 2(2.0%) | 99.0-100(99.6)/ 97.9-100(99.7) | 29.67 | 35.4, 21.7, 17.3 | 0.028 |
| NADH4 | 1,377/ 459 | 16(1.2%)/ 9(2.0%) | 99.3-100(99.7)/ 99.6-100(99.9) | 29.67 | 37.7, 23.1, 16.8 | 0.019 |
| NADH5 | 1,812/ 604 | 24(1.3%)/ 9(1.5%) | 99.5-100(99.7)/ 99.3-100(99.7) | 13.08 | 38.1, 22.2, 16.3 | 0.025 |
| NADH6 | 528/ 175 | 26(4.9%)/ 5(2.9%) | 97.9-100(98.7)/ 98.9-100(99.6) | 14.05 | 25.7, 13.1, 28.3 | 0.088 |
| Cytb | 1,140/ 379 | 15(1.3%)/ 2(0.5%) | 99.3-100(99.7)/ 99.5-100(99.9) | 29.67 | 37.2, 23.7, 17.1 | 0.017 |
| Control region | 962/ - | 15(1.6%)/ - | 99.2-100(99.7)/ - | 29.67 | 36.2, 24.2, 17.3 | - |
| 13 protein coding genes | 11,373/ 3,789 | 196(1.7%)/ 44(1.2%) | 99.2-100(99.6)/ 99.6-100(99.8) | 29.56 | 30.2, 29.3, 13.3 | 0.049 |
| 22 tRNAs | 1,520/ - | 13(0.9%)/ - | 99.7-100(99.9)/ - | 29.67 | 32.7, 18.5, 19.4 | - |
| Overall | 16,432/ - | 299(1.8%)/ - | 99.8-100(99.6)/ - | 29.44 | 32.4, 28.4, 13.2 | - |
*Ts/Tv ratio = transition versus transversion ratio
**ω (dN/dS) value = relative rates of nonsynonymous and synonymous substitutions
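Two of the quantities in the table above, the percentage of variable sites and the transition/transversion (Ts/Tv) ratio, can be sketched as follows. This is a simple count-based illustration with hypothetical function names and toy sequences. The source does not state how the tabulated ratios were estimated, so values from this sketch need not match the table.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variable_site_percent(alignment):
    """Percentage of alignment columns holding more than one character.
    Gap characters count as a difference in this simplified version."""
    columns = list(zip(*alignment))
    if not columns:
        raise ValueError("alignment is empty")
    variable = sum(1 for col in columns if len(set(col)) > 1)
    return 100.0 * variable / len(columns)


def ts_tv_counts(seq_a, seq_b):
    """Count transitions (A<->G, C<->T) and transversions between two
    aligned sequences, skipping gaps and ambiguous bases."""
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Toy data, not real horse sequences.
print(variable_site_percent(["ACGT", "ACGA", "ACGT"]))  # 25.0
ts, tv = ts_tv_counts("ACGTACGT", "GCATATGA")
print(ts, tv)  # 3 1
```

As a check on the table itself, 15 variable sites out of 977 total sites for 12S rRNA gives 100 × 15 / 977 ≈ 1.5%, which matches the reported value.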
Fig 3. Pairwise dN/dS (ω) values of the mitochondrial genomes of 167 modern horses.
By comparing the relative rates of nonsynonymous and synonymous substitutions in 13 protein coding genes, we discovered that the ATP8 gene in both Thoroughbred racehorses and other modern horses has the highest level of adaptive variation.
Summary of mitochondrial genome regions of 167 modern horses used in this study.
| Genomic region | Total sites including gaps, nt/ aa | Variable sites (%), nt/ aa | Sequence identities (%) (average), nt/ aa | Ts/Tv ratio | Base frequencies, A, C, G (%) | Evolutionary model | Nst | Rates | ω (dN/dS) |
|---|---|---|---|---|---|---|---|---|---|
| 12S rRNA | 977/ - | 50(5.1%)/ - | 98.7–100.0(99.7)/ - | 9.21 | 36.7, 24.2, 17.0 | TIM+I+G | 6 | gamma | - |
| 16S rRNA | 1,581/ - | 65(4.1%)/ - | 99.3-100(99.7)/ - | 10.04 | 37.8, 22.2, 16.9 | TrN+I+G | 6 | gamma | - |
| NADH1 | 954/ 318 | 47(4.9%)/ 10(3.1%) | 98.7-100(99.7)/ 98.4-100(99.7) | 8.10 | 36.1, 24.2, 17.5 | GTR+G | 6 | gamma | 0.031 |
| NADH2 | 1,038/ 346 | 50(4.8%)/ 19(5.5%) | 98.7-100(99.6)/ 98.5-100(99.7) | 9.14 | 36.8, 23.8, 17.6 | HKY+I | 2 | equal | 0.022 |
| COX1 | 1,542/ 514 | 62(4.0%)/ 4(0.8%) | 99.1-100(99.7)/ 99.8-100(99.9) | 9.65 | 37.8, 22.4, 16.9 | HKY+I | 2 | equal | 0.010 |
| COX2 | 681/ 227 | 37(5.4%)/ 4(1.8%) | 98.5-100(99.7)/ 99.1-100(99.9) | 5.64 | 35.9, 24.3, 16.5 | HKY+I | 2 | equal | 0.014 |
| ATPase8 | 201/ 67 | 12(6.0%)/ 5(7.5%) | 98.5-100(99.8)/ 97.0-100(99.5) | 3.01 | 34.9, 20.4, 17.4 | K81uf | 6 | equal | 0.342 |
| ATPase6 | 678/ 226 | 37(5.5%)/ 14(6.2%) | 98.5-100(99.7)/ 97.8-100(99.5) | 5.64 | 35.9, 24.4, 16.6 | TrN+G | 6 | gamma | 0.024 |
| COX3 | 783/ 261 | 41(5.2%)/ 7(2.7%) | 98.6-100(99.7)/ 98.8-100(99.8) | 6.93 | 36.1, 24.5, 16.8 | HKY+I | 2 | equal | 0.005 |
| NADH3 | 345/ 115 | 20(5.8%)/ 2(1.7%) | 98.5-100(99.7)/ 98.3-100(99.9) | 4.91 | 34.2, 22.6, 18.2 | K81uf | 6 | equal | 0.014 |
| NADH4L | 294/ 98 | 22(5.8%)/ 4(4.1%) | 98.3-100(99.7)/ 97.9-100(99.8) | 3.95 | 35.4, 22.0, 17.3 | K81uf+G | 6 | gamma | 0.030 |
| NADH4 | 1,377/ 459 | 54(3.9%)/ 22(4.8%) | 99.1-100(99.7)/ 99.1-100(99.8) | 9.88 | 37.8, 23.1, 16.8 | K81uf+I | 6 | equal | 0.019 |
| NADH5 | 1,812/ 604 | 73(4.0%)/ 26(4.3%) | 99.2-100(99.7)/ 98.8-100(99.7) | 9.7 | 38.1, 22.2, 16.3 | TIM+I | 6 | equal | 0.025 |
| NADH6 | 528/ 175 | 49(9.3%)/ 20(11.4%) | 97.1-100(98.7)/ 97.1-100(99.6) | 8.61 | 25.6, 13.1, 28.3 | TVM+G | 6 | gamma | 0.086 |
| Cytb | 1,140/ 379 | 51(4.5%)/ 11(2.9%) | 98.5-100(98.9)/ 98.9-100(99.9) | 9.37 | 37.2, 23.8, 17.2 | TrN+I | 6 | equal | 0.016 |
| Control region | 963/ - | 49(5.1%)/ - | 98.6-100(99.6)/ - | 9.13 | 36.3, 24.2, 17.3 | GTR+I+G | 6 | gamma | - |
| 13 protein coding genes | 11,373/ 3,789 | 589(5.2%)/ 148(3.9%) | 98.9-100(99.5)/ 99.5-100(99.8) | 26.20 | 30.2, 29.3, 13.3 | TIM+I+G | 6 | gamma | 0.049 |
| 22 tRNAs | 1,520/ - | 46(3%)/ - | 99.3-100(99.8)/ - | 3.46 | 32.7, 18.5, 19.5 | TrN+I+G | 6 | gamma | - |
| Overall | 16,432/ - | 845(5.1%)/ - | 99.0-100(99.5)/ - | 18.29 | 32.5, 28.4, 13.2 | GTR+I+G | 6 | gamma | - |
*Ts/Tv ratio = transition versus transversion ratio
**ω (dN/dS) value = relative rates of nonsynonymous and synonymous substitutions
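The ω (dN/dS) column of the 167-horse table can be ranked with a short sketch to show which protein coding gene has the largest value. The numbers below are copied from that column; the variable names are illustrative only.

```python
# ω (dN/dS) values per protein coding gene, copied from the
# 167-horse summary table.
omega = {
    "NADH1": 0.031, "NADH2": 0.022, "COX1": 0.010, "COX2": 0.014,
    "ATPase8": 0.342, "ATPase6": 0.024, "COX3": 0.005, "NADH3": 0.014,
    "NADH4L": 0.030, "NADH4": 0.019, "NADH5": 0.025, "NADH6": 0.086,
    "Cytb": 0.016,
}

# Sort genes from highest to lowest ω.
ranked = sorted(omega.items(), key=lambda item: item[1], reverse=True)

for gene, value in ranked[:3]:
    print(f"{gene}: {value:.3f}")
# ATPase8: 0.342
# NADH6: 0.086
# NADH1: 0.031
```

ATPase8 ranks first by a wide margin, in line with the statement accompanying Fig 3 about the ATP8 gene.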
Fig 4. Bayesian maximum clade credibility phylogenomic tree based on the mitochondrial genome sequences of 167 modern horses.
The data set (16,432 base pairs) was also analyzed phylogenetically using Bayesian inference (BI) and maximum likelihood (ML) methods, which yielded the same topology. The 95% highest posterior density (HPD) intervals of node heights are shown as blue bars. Groups are marked by a “G”. Numbers at the nodes represent (left to right): posterior probabilities (≥0.80) for the BI tree and bootstrap values (≥70%) for the ML tree. The racing horses were revealed to have multiple maternal origins and to be closely related to horses from one Asian, two Middle Eastern, and five European breeds. The phylogenomic analyses also uncovered no apparent association between geographic origin or breed and the heterogeneity of global horses. The most recent common ancestor of the Thoroughbreds lived approximately 8,100–111,500 years ago, which was significantly younger than the most recent common ancestor of modern horses (0.7286 My).
Fig 5. Distribution of Thoroughbred horses within the Bayesian maximum clade credibility phylogenetic tree derived from the complete mitochondrial genome sequences of 167 global horses.
Thoroughbred horse samples are shown in italic and bold type.
Summary of the six horse groups.
| Group | No. of horses | Kinds of breeds | Geographic regions | tMRCA (Mya) |
|---|---|---|---|---|
| 1 | 43 | AkT, Arb, BlF, ChP, CsP, Hol, IcH, Irn, Ita, Jeju, KiH, Kla, Kuz, Lus, Mrm, NoF, RHD, RRH, ShA, She, SuP, Syr, | Asia, Middle East, Europe, America | 0.3704 |
| 2 | 39 | AkT, Alt, Arb, Cam, CsP, Gia, Irn, Ita, Jeju, Kab, Kla, Kus, Lie, Mrm, Naq, NoF, Nor, OrT, Prz, Shi, Syr, | Asia, Middle East, Europe | 0.3964 |
| 3 | 4 | BeD, Irn, Jeju, Mrm (n = 4) | Asia, Middle East, Europe | 0.2892 |
| 4 | 38 | AkT, AmP, And, App, Arb, BaC, Bar, CsP, GRP, GSH, HuC, Irn, Ita, KiH, Kon, Lie, Mon, Mrm, Old, PaH, Per, Sil, SpH, | Asia, Middle East, Europe, Africa, America | 0.2599 |
| 5 | 41 | AkT, Alt, And, Arb, Ard, Cly, CsP, Deq, EnS, ExP, Fre, Haf, Han, IcH, Irn, Jeju, Kla, Lew, Mrm, RHD, Sad, | Asia, Middle East, Europe, America | 0.4789 |
| 6 | 2 | Mrm, Irn (n = 2) | Middle East, Europe | 0.07 |
Fig 6. Bayesian skyline plot (BSP) based on mitochondrial genome sequences from 167 modern horses.
The dark line in the BSP represents the estimated effective population size through time. The green area represents the 95% highest posterior density (HPD) interval for this estimate.
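For readers unfamiliar with highest posterior density intervals, a minimal sketch of one common way to compute them from posterior samples follows: take the shortest interval that contains the requested fraction of the sorted samples. This is a generic illustration for a single unimodal quantity, not the software used in the study; the function name `hpd_interval` and the toy samples are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples.
    Assumes a unimodal posterior; illustrative only."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples each candidate interval must cover.
    window = max(1, math.ceil(mass * n))
    # Slide the window over the sorted samples and keep the narrowest.
    best_start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best_start], ordered[best_start + window - 1]


# Toy samples with one outlier; a 90% interval excludes it.
toy_samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(hpd_interval(toy_samples, mass=0.9))  # (1, 9)
```

In a skyline plot, an interval of this kind would be computed separately at each time point from the sampled population sizes, which produces the shaded band around the central estimate.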