Robert R Fitak, Elmira Mohandesan, Jukka Corander, Pamela A Burger.
Abstract
The single-humped dromedary (Camelus dromedarius) is the most numerous and widespread of domestic camel species and is a significant source of meat, milk, wool, transportation and sport for millions of people. Dromedaries are particularly well adapted to hot, desert conditions and harbour a variety of biological and physiological characteristics with evolutionary, economic and medical importance. To understand the genetic basis of these traits, an extensive resource of genomic variation is required. In this study, we assembled, at 65× coverage, a 2.06 Gb draft genome of a female dromedary whose ancestry can be traced to an isolated population from the Canary Islands. We annotated 21,167 protein-coding genes and estimated ~33.7% of the genome to be repetitive. A comparison with the recently published draft genome of an Arabian dromedary resulted in 1.91 Gb of aligned sequence with a divergence of 0.095%. An evaluation of our genome against the reference revealed that our assembly contains more error-free bases (91.2%) and fewer scaffolding errors. We identified ~1.4 million single-nucleotide polymorphisms with a mean density of 0.71 × 10⁻³ per base. An analysis of demographic history indicated that changes in effective population size corresponded with recent glacial epochs. Our de novo assembly provides a useful resource of genomic variation for future studies of the camel's adaptations to arid environments and economically important traits. Furthermore, these results suggest that draft genome assemblies constructed with only two differently sized sequencing libraries can be comparable to those sequenced using additional library sizes, highlighting that additional resources might be better placed in technologies alternative to short-read sequencing to physically anchor scaffolds to genome maps.
Keywords: Camelus dromedarius; adaptation; demography; domestication; next-generation sequencing
Year: 2015 PMID: 26178449 PMCID: PMC4973839 DOI: 10.1111/1755-0998.12443
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
Read statistics after quality and length trimming
| Library | # Reads with partner | # Reads without partner | Mean length (SD) | Total number of bases | Sequence coverage |
|---|---|---|---|---|---|
| 500‐bp PE | 579 823 726 | 5 045 754 | 98.2 (6.4) | 114 374 878 323 | 55.7× |
| 500‐bp PE‐corrected | 562 416 289 | 22 102 005 | 98.1 (7.0) | 112 536 342 122 | 54.8× |
| 5‐kb MP | 224 408 840 | 2 834 348 | 48.6 (1.8) | 21 970 012 359 | 10.7× |
| Total (Corrected+MP) | 786 825 129 | 24 936 353 | — | 134 506 354 481 | 65.5× |
PE, paired‐end library; MP, mate‐pair library.
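The sequence coverage column can be reproduced by dividing each library's total number of bases by a genome size. The following is a minimal sketch, assuming the denominator is the total assembly length reported for this study (2 055 063 633 bp); the source does not state which genome size was used, and the names `GENOME_SIZE_BP`, `total_bases` and `coverage` are illustrative only.

```python
# Base counts are copied from the read-statistics table above.
# ASSUMPTION: the denominator is the total assembly length of this study
# (2,055,063,633 bp); the source does not say which genome size was used.
GENOME_SIZE_BP = 2_055_063_633

total_bases = {
    "500-bp PE": 114_374_878_323,
    "500-bp PE-corrected": 112_536_342_122,
    "5-kb MP": 21_970_012_359,
}


def coverage(bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Return fold coverage as sequenced bases divided by genome size."""
    return bases / genome_size


for library, bases in total_bases.items():
    print(f"{library}: {coverage(bases):.1f}x")

# The "Total" row combines the corrected paired-end and mate-pair libraries.
combined = total_bases["500-bp PE-corrected"] + total_bases["5-kb MP"]
print(f"Total (Corrected+MP): {coverage(combined):.1f}x")
```

Under this assumption the rounded values agree with the table, which suggests (but does not prove) that the assembly length was the denominator.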
Summary of the dromedary genome assembly presented in this study compared with the current reference
| Statistic | This study | Reference |
|---|---|---|
| # Scaffolds | 35 752 | 32 572 |
| Mean length (bp) | 57 481.1 | 61 526.7 |
| Total length (bp) | 2 055 063 633 | 2 004 047 047 |
| Longest (bp) | 9 719 801 | 23 736 781 |
| GC content | 41.3% | 41.2% |
| Repeat content | 33.7% | 28.4% |
| N50 (count) | 1 482 444 (393) | 4 188 677 (132) |
| N60 (count) | 1 108 832 (553) | 2 993 967 (190) |
| N70 (count) | 842 144 (764) | 2 137 136 (268) |
| N80 (count) | 558 658 (1063) | 1 311 427 (389) |
| N90 (count) | 260 185 (1592) | 689 795 (594) |
| Number of gaps | 150 386 | 72 775 |
| Total gap length | 53 439 631 | 22 596 073 |
| CEGs | 98.7% | 98.5% |
Reference, accession no. GCA_000767585.1; Wu et al. (2014).
CEGs, proportion of 458 core eukaryotic genes identified using cegma.
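The N50 to N90 rows follow the usual definition of an Nx statistic: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches x% of the assembly, together with the number of scaffolds needed to get there. A minimal sketch on toy lengths follows; the function name `nx_statistic` and the example lengths are hypothetical and are not taken from the assembly.

```python
from typing import List, Tuple


def nx_statistic(lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (Nx length, scaffold count) for a fraction such as 0.5 for N50."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Guard against floating-point rounding when fraction is 1.0.
    return ordered[-1], len(ordered)


# Toy scaffold lengths (hypothetical), total = 30.
toy = [10, 8, 6, 4, 2]
print(nx_statistic(toy, 0.5))  # N50 as (length, count)
print(nx_statistic(toy, 0.9))  # N90 as (length, count)
```

The mean length row is simply total length divided by the number of scaffolds, for example 2 055 063 633 / 35 752 ≈ 57 481.1 bp for this study.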
Figure 1. Cumulative length of the African Camelus dromedarius assembly. Scaffolds are sorted from longest to shortest along the horizontal axis. The vertical dotted line indicates the number of scaffolds containing 95% of the total assembly.
Frequency of different assembly errors compared with the reference genome for short‐insert reads (separated by insert size)
| Metric | This study | Reference | Reference | Reference |
|---|---|---|---|---|
| Insert size | 500 bp | 170 bp | 500 bp | 800 bp |
| Error‐free bases | 91.8% | 83.4% | 74.9% | 68.6% |
| FCD | 37 015 | 9 641 002 | 203 806 | 195 429 |
| Collapsed repeats | 10 233 | 86 488 | 8694 | 4659 |
| Wrong read orientation | 113 677 | 95 230 | 215 821 | 210 951 |
Reference, accession no. GCA_000767585.1; Wu et al. (2014).
FCD, fragment coverage distribution.
Figure 2. Calculation of the fragment coverage distribution (FCD) error cut‐off. For each potential FCD cut‐off, each solid line represents the proportion of 100‐bp windows that would fail and subsequently be labelled as an assembly error. The vertical dashed lines are the cut‐off scores determined in reapr using the value where the normalized (between −1 and 1) first and second derivatives are ≥0.05. See Hunt et al. (2013) for a complete description of the method. The colours correspond with the different read alignments separated by genome and insert size.
Figure 3. Density of SNPs within the African dromedary genome assembly (dark grey bars) and density of divergent sites (light grey bars) from the alignment with the reference genome (Accession no. GCA_000767585.1). * Genome‐wide density is based upon 1000‐bp nonoverlapping windows.
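A per-window density of the kind described in the Figure 3 footnote can be computed by counting variant positions in consecutive 1000-bp nonoverlapping windows and dividing by the window size. The sketch below is illustrative only: the function name `window_densities`, the 0-based coordinates, the choice to drop a trailing partial window, and the toy positions are all assumptions, not details from the source.

```python
from collections import Counter
from typing import Iterable, List


def window_densities(
    positions: Iterable[int], scaffold_length: int, window: int = 1000
) -> List[float]:
    """Per-base SNP density in consecutive nonoverlapping windows.

    ASSUMPTIONS: positions are 0-based, and a trailing window shorter than
    `window` is dropped rather than rescaled.
    """
    counts = Counter(pos // window for pos in positions)
    n_windows = scaffold_length // window
    return [counts.get(i, 0) / window for i in range(n_windows)]


# Hypothetical SNP positions on a 4200-bp toy scaffold.
toy_positions = [5, 250, 999, 1000, 3500]
densities = window_densities(toy_positions, scaffold_length=4200)
mean_density = sum(densities) / len(densities)
print(densities)
print(f"mean density: {mean_density:.5f} per base")
```

Averaging the window values across all scaffolds would give a genome-wide figure comparable in kind to the 0.71 × 10⁻³ per base reported in the abstract.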
Figure 4. Historical effective population size of the African dromedary inferred with the filtered, repeat‐masked set of variants (black line; strict conditions) and with default parameters (red line; lenient conditions) in psmc (Li & Durbin 2011). The lighter‐coloured lines of the same colour represent the 100 bootstrap replicates. The result is scaled using a generation time (g) of five years and a per‐base mutation rate (μ) of 2.5 × 10⁻⁸. The light‐blue and blue‐shaded regions indicate the last glacial period (LGP) and last glacial maximum (LGM), respectively.
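The scaling step mentioned in the Figure 4 caption can be sketched as follows, assuming the conventional PSMC relations N0 = θ0 / (4μs), time in years = 2·N0·t·g, and Ne = N0·λ, where s is the bin size in base pairs. Only g = 5 years and μ = 2.5 × 10⁻⁸ come from the source; the bin size of 100 bp, the treatment of μ as a per-generation rate, and the θ0, t and λ values are hypothetical placeholders.

```python
from typing import List, Tuple

G_YEARS = 5.0   # generation time from the figure caption
MU = 2.5e-8     # mutation rate from the caption; ASSUMED to be per generation
BIN_SIZE = 100  # ASSUMPTION: bin size in bp, not stated in the source


def scale_psmc(
    theta0: float,
    t_scaled: List[float],
    lambda_scaled: List[float],
    mu: float = MU,
    g: float = G_YEARS,
    bin_size: int = BIN_SIZE,
) -> Tuple[float, List[float], List[float]]:
    """Convert scaled PSMC time and population-size values to years and Ne."""
    n0 = theta0 / (4.0 * mu * bin_size)
    years = [2.0 * n0 * t * g for t in t_scaled]
    ne = [n0 * lam for lam in lambda_scaled]
    return n0, years, ne


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
n0, years, ne = scale_psmc(
    theta0=0.1, t_scaled=[0.0, 0.5, 1.0], lambda_scaled=[1.0, 2.0, 0.5]
)
print(f"N0 = {n0:.0f}")
for y, n in zip(years, ne):
    print(f"{y:>10.0f} years ago: Ne = {n:.0f}")
```

Because both axes scale with 1/μ and the time axis scales with g, a different choice of either parameter shifts the curve without changing its shape.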