Gabriel Renaud1, Bent Petersen2,3, Andaine Seguin-Orlando1,4,5, Mads Frost Bertelsen6, Andrew Waller7, Richard Newton7, Romain Paillot7, Neil Bryant7, Mark Vaudin7, Pablo Librado1,5, Ludovic Orlando1,5.
Abstract
Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome-wide and locally, and to detect runs of homozygosity potentially pertaining to positive selection in domestic donkeys. Finally, this new assembly allowed us to identify fine-scale chromosomal rearrangements between the horse and the donkey that likely played an active role in their divergence and, ultimately, speciation.
Year: 2018 PMID: 29740610 PMCID: PMC5938232 DOI: 10.1126/sciadv.aaq0392
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
Quality metrics for this assembly compared to previous donkey genome assemblies.
The number of annotated genes (lower than that in previous assemblies) shows a better homologous correspondence with the horse gene set (see Gene annotation).
| Metric | This assembly | Previous assembly 1 | Previous assembly 2 |
| --- | --- | --- | --- |
| N50 contigs | 140.3 kb | 66.7 kb | 6.38 kb |
| N50 scaffolds | 15.4 Mb | 3.8 Mb | 100.94 kb |
| Coverage | 61.2× | 42.4× | 12.4× |
| Total bases | 2.320 Gb | 2.391 Gb | 2.293 Gb |
| Largest scaffold | 84.20 Mb | 17.06 Mb | 1.09 Mb |
| Unresolved bases per 100 kb | 1121.61 | 1384.93 | 4128.43 |
| Total number of predicted protein-coding genes | 18,984 | 23,850* | 24,156 |
*Calculated using one isoform per gene and 42,247 total transcripts.
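Two of the metrics in the table, N50 and unresolved bases per 100 kb, follow standard definitions that can be sketched in a few lines. The snippet below is a minimal illustration on toy data, not the pipeline used to produce the table; the function names and the toy inputs are assumptions made for this example.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequences given")


def unresolved_per_100kb(sequences):
    """Count unresolved bases (N or n) per 100,000 bases over all sequences."""
    total = sum(len(s) for s in sequences)
    unresolved = sum(s.upper().count("N") for s in sequences)
    return 100_000 * unresolved / total


# Toy scaffold lengths (hypothetical, arbitrary units): total is 300,
# and the two longest scaffolds (80 + 70) already reach half of it.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)

# Toy sequences (hypothetical): 2 unresolved bases out of 10.
toy_unresolved = unresolved_per_100kb(["ACGTNN", "ACGT"])
```

The same function applies to contigs or scaffolds; only the list of lengths passed in changes.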
Fig. 1. Distribution of the cumulative scaffold length compared to previously published genome assemblies.
The red line represents the genome assembly obtained in this work using the Chicago HiRise technology. It shows that the greater N50 value of our new assembly is not simply due to a few scaffolds being longer than those of the two previously reported assemblies. Mbp, million base pairs.
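A cumulative scaffold length curve of this kind can be computed by sorting scaffolds from longest to shortest and accumulating their lengths. The sketch below assumes only a list of scaffold lengths per assembly; the function name and the toy values are hypothetical and do not come from the study.

```python
from itertools import accumulate


def cumulative_lengths(lengths):
    """Return the running total of scaffold lengths, longest scaffold first."""
    return list(accumulate(sorted(lengths, reverse=True)))


# Hypothetical toy assembly: the i-th value is the number of bases
# contained in the i + 1 longest scaffolds.
toy_curve = cumulative_lengths([10, 80, 30, 50])
```

Plotting one such curve per assembly against the scaffold rank shows whether an assembly is more contiguous across the whole size range or only because of a handful of very long scaffolds.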
Fig. 2. Heterozygosity rates for various equine species.
The heterozygosity estimates were computed using the same data aligned both to the horse genome (EquCab2.0) from a previous study and to the donkey reference presented in this study.
Fig. 3. Demographic trajectories of zebras and asses during the last ~2.5 million years (Ma).
(A and B) PSMC reconstruction of the effective population size over time, for different ass species (A) and zebra species (B). The first 100 ka are highlighted for the ass and zebra species.
Fig. 4. Dot plot showing the correspondence of unique 101-nucleotide oligomers from the donkey scaffolds to their location on the horse genome, using exact matches.
Because the orientation of the donkey scaffolds is unknown a priori, they were oriented using the strand that minimized the number and the size of inversions with respect to the horse chromosomes. The large inversions on the donkey scaffolds aligning to ECA7, ECA28, and ECA31 are enlarged for clarity. In the enlarged alignment to ECA7, donkey scaffold ScCGjx6_197 is not reverse-complemented consistently with the figures found in the Supplementary Materials.
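The idea behind such a dot plot can be sketched as follows: collect the k-mers that occur exactly once in each sequence, keep those with an exact match on either strand, and orient the scaffold so that fewer matches fall on the opposite strand. This is a simplified illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the toy sequences use k = 4 rather than 101, and the majority-strand rule is only a crude proxy for minimizing the number and size of inversions.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_kmers(seq, k):
    """Map every k-mer occurring exactly once in seq to its start position.
    Uniqueness is checked on the given strand only (a simplification)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def dot_plot_points(scaffold, chromosome, k):
    """Return sorted (scaffold_pos, chromosome_pos, strand) exact matches.
    strand is "+" for a direct match and "-" for a reverse-complement match."""
    chrom_index = unique_kmers(chromosome, k)
    points = []
    for kmer, pos in unique_kmers(scaffold, k).items():
        if kmer in chrom_index:
            points.append((pos, chrom_index[kmer], "+"))
        else:
            rc = reverse_complement(kmer)
            if rc in chrom_index:
                points.append((pos, chrom_index[rc], "-"))
    return sorted(points)


def orient(scaffold, chromosome, k):
    """Reverse-complement the scaffold if most matches are on the "-" strand."""
    points = dot_plot_points(scaffold, chromosome, k)
    minus = sum(1 for _, _, strand in points if strand == "-")
    if minus > len(points) - minus:
        return reverse_complement(scaffold)
    return scaffold


# Hypothetical toy data: the scaffold is stored on the opposite strand.
toy_chromosome = "ACGGTCATTGCA"
toy_scaffold = "AATGACCG"
oriented = orient(toy_scaffold, toy_chromosome, 4)
```

After orientation, a region whose matches still fall on the "-" strand, running against the main diagonal, is a candidate inversion.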
Fig. 5. Genetic distance of the donkey scaffold to ECA28.
The middle part of the scaffold (~20 Mb) represents a good candidate for an inversion in either lineage and shows an inflated level of divergence at the breakpoints. The dotted lines in the bottom panel represent the genomic average and the upper and lower bounds of the 95% confidence interval of the divergence.
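A divergence profile along a scaffold can be approximated by computing the proportion of differing sites in consecutive windows of a pairwise alignment. The sketch below uses a plain p-distance as an assumption; the caption does not specify the distance measure or the window size used in the study, and the function name and toy alignment are hypothetical.

```python
def windowed_divergence(aligned_a, aligned_b, window):
    """Return (window_start, proportion of mismatches) for consecutive windows.
    Columns with a gap or an ambiguous base in either sequence are skipped,
    and windows without any usable column are omitted."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    profile = []
    for start in range(0, len(aligned_a), window):
        pairs = [
            (a, b)
            for a, b in zip(aligned_a[start:start + window],
                            aligned_b[start:start + window])
            if a in "ACGT" and b in "ACGT"
        ]
        if pairs:
            mismatches = sum(a != b for a, b in pairs)
            profile.append((start, mismatches / len(pairs)))
    return profile


# Hypothetical toy alignment: one mismatch in the first window,
# one ambiguous base (skipped) in the second.
toy_profile = windowed_divergence("AAAAACCCCC", "AAAATCCNCC", 5)
```

Windows whose value lies well above the genome-wide average, such as those near candidate breakpoints, would stand out in a plot of this profile.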