Evan W London, Alfred L Roca, Jan E Novakofski, Nohra E Mateus-Pinilla.
Abstract
Cervids are distinguished by the shedding and regrowth of antlers. They also provide insights into prion and other diseases. Genomic resources can facilitate studies of the genetic underpinnings of deer phenotypes, behavior, and disease resistance. Widely distributed in North America, the white-tailed deer (Odocoileus virginianus) has recreational, commercial, and food-source value for many households. We present a genome generated using DNA from a single Illinois white-tailed deer, sequenced on the PacBio Sequel II platform and assembled using Wtdbg2. Omni-C chromatin conformation capture sequencing was used to scaffold the genome contigs. The final assembly was 2.42 Gb, consisting of 508 scaffolds with a contig N50 of 21.7 Mb, a scaffold N50 of 52.4 Mb, and a BUSCO complete score of 93.1%. Thirty-six chromosome pseudomolecules comprised 93% of the entire sequenced genome length. A total of 20 651 genes predicted with the BRAKER pipeline were validated using InterProScan. Chromosome-length assembly sequences were aligned to the genomes of related species to identify corresponding chromosomes. © The American Genetic Association. 2022.
Keywords: Illumina; Omni-C; Pacific Biosciences; annotation; haploid; non-model species
Year: 2022 PMID: 35511871 PMCID: PMC9308042 DOI: 10.1093/jhered/esac022
Source DB: PubMed Journal: J Hered ISSN: 0022-1503 Impact factor: 2.679
Bioinformatics software used for assembly and analysis
| Task | Software | Version |
|---|---|---|
| Long-read filtering | Fastp | 0.20.0 |
| De novo assembly | Wtdbg2 | 2.5 |
| Contig polishing (long reads) | | 8.0.0 |
| Short-read pre-processing | Bcl2fastq2 | 2.20 |
| Short-read filtering | Fastp | 0.20.0 |
| Contig polishing (short reads) | Pilon | 1.2.2 |
| Contig deduplication | Purge-Haplotigs | 1.1.1 |
| Contamination screen | BLAST+ | 2.10.1 |
| Omni-C™ read filtering | | 0.3.0 |
| Arima Genomics mapping pipeline | | 0.7.17 |
| Omni-C™ scaffolding | SALSA2 | 2.2 |
| Omni-C™ contact map | Juicebox | 1.11.08 |
| Scaffold deduplication | Purge-Haplotigs | 1.1.1 |
| Genome completeness | BUSCO | 4.1.4 |
| Synteny with other species | | 1.0 |
| Repeat assessment | RepeatMasker | 4.1.1 |
| Protein alignments | ProtHint | 2.5.0 |
| RNA alignments | STAR | 2.7.6a |
| Gene prediction | BRAKER | 2.1.6 |
| Prediction filtering | InterProScan | 5.52-86 |
Software presented in relative order of use in the pipeline. See citations in-text.
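The last step in the software table, prediction filtering, pairs BRAKER gene models with InterProScan. The Python sketch below assumes the filter keeps a gene when at least one of its predicted proteins has an InterProScan match; the exact criterion used for this assembly is not stated here. It relies only on InterProScan's tab-separated output listing the protein accession in the first column, and on AUGUSTUS-style transcript IDs such as `g1.t1`. The helper names `proteins_with_hits` and `filter_genes` and all example rows are hypothetical.

```python
import csv
import io


def proteins_with_hits(tsv_text: str) -> set:
    """Protein accessions (first TSV column) that have at least one InterProScan match."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0] for row in reader if row}


def filter_genes(transcript_ids: list, supported: set) -> list:
    """Gene IDs with at least one supported transcript, in first-seen order.

    Hypothetical naming assumption: 'g12.t1' is transcript 1 of gene 'g12'.
    """
    genes = (tid.rsplit(".t", 1)[0] for tid in transcript_ids if tid in supported)
    return list(dict.fromkeys(genes))  # de-duplicate while keeping order


# Hypothetical InterProScan rows; only the first column is used here.
tsv = "\n".join([
    "g1.t1\tmd5a\t350\tPfam\tPF00001\tdesc\t40\t300\t1.2e-30\tT\t01-01-2021",
    "g1.t2\tmd5b\t320\tPfam\tPF00001\tdesc\t30\t280\t3.4e-25\tT\t01-01-2021",
    "g3.t1\tmd5c\t120\tSMART\tSM00320\tdesc\t10\t50\t2.0e-5\tT\t01-01-2021",
])
predicted = ["g1.t1", "g1.t2", "g2.t1", "g3.t1"]  # hypothetical BRAKER transcripts
kept = filter_genes(predicted, proteins_with_hits(tsv))
print(kept)  # ['g1', 'g3']
```

Here `g2` is dropped because none of its transcripts has a match. Counting genes rather than transcripts keeps a gene with several supported isoforms from being counted more than once.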
Assembly statistics and BUSCO scores for white-tailed deer
| Statistic | Contig assembly | Scaffold assembly |
|---|---|---|
| Total length (bp) | 2 461 348 864 | 2 424 946 708 |
| Number of sequences | 2420 | 508 |
| Number of “N” gaps | n/a | 311 |
| % “N” | n/a | 0.006% |
| Largest sequence (bp) | 108 025 303 | 108 602 581 |
| Smallest sequence (bp) | 1939 | 2657 |
| Average length (bp) | 1 017 086.3 | 4 773 517.1 |
| N50 (bp) | 21 776 300 | 52 482 646 |
| L50 (# of sequences) | 32 | 18 |
| N90 (bp) | 3 308 695 | 10 477 849 |
| L90 (# of sequences) | 134 | 49 |
| BUSCO | | |
| C: complete | 93.2% (12 433) | 93.2% (12 424) |
| S: single copy | 90.9% (12 128) | 91.0% (12 129) |
| D: duplicated | 2.3% (305) | 2.2% (295) |
| F: fragmented | 0.4% (53) | 0.4% (51) |
| M: missing | 6.4% (849) | 6.4% (860) |
Single-copy orthologous genes from the 22 species in the Cetartiodactyla lineage dataset.
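N50, L50, N90, and L90 in the table follow from the sorted sequence lengths: the longest sequences are summed until the running total reaches 50% (or 90%) of the assembly length. Nx is the length of the sequence that crosses the threshold, and Lx is how many sequences were needed. The BUSCO percentages are each count divided by the number of BUSCO groups searched, which the table implies is C + F + M = 13 335. A minimal Python sketch of both calculations follows. The toy lengths are hypothetical, and the BUSCO counts are taken from the table.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for an integer percent, e.g. 50 for N50/L50.

    Sequences are summed from longest to shortest; integer arithmetic avoids
    floating-point error in the coverage threshold.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("no sequences supplied")


def busco_percentages(single, duplicated, fragmented, missing):
    """Percentages (one decimal) of the searched BUSCO groups in each class."""
    complete = single + duplicated
    searched = complete + fragmented + missing
    counts = {"C": complete, "S": single, "D": duplicated,
              "F": fragmented, "M": missing}
    return {key: round(100 * n / searched, 1) for key, n in counts.items()}


toy = [100, 80, 60, 40, 20]  # hypothetical sequence lengths (bp)
print(nx_lx(toy, 50))  # (80, 2)
print(nx_lx(toy, 90))  # (40, 4)

# Scaffold-assembly BUSCO counts from the table above.
print(busco_percentages(12_129, 295, 51, 860))
# {'C': 93.2, 'S': 91.0, 'D': 2.2, 'F': 0.4, 'M': 6.4}
```

Applied to the contig-assembly counts (12 128, 305, 53, 849), the same function reproduces that column's percentages as well.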
Figure 1. Contig, scaffold, and chromosome-level assemblies of the white-tailed deer genome. (A) Scaffolds are arranged by size (bottom), and their component contigs are arranged by scaffold (top). The largest scaffolds, which together make up 50% (orange) and 90% (orange + red) of the assembly, are colored; the remaining 10% of the assembly is shown in black and gray. Scaffolds below 3 Mb (gray) are not visually separated. The number of contigs per scaffold is presented in Table 3. (B) Scaffold contact map generated from Omni-C chromatin conformation capture sequencing and visualized with HiCExplorer. Contact frequency between scaffolds increases from blue through white to red, and the strong diagonal signal represents scaffold self-association based on nuclear proximity. (C) Contact map for the chromosome-sized pseudomolecule sequences after manual curation into chromosomes.
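Panel B summarizes Omni-C read pairs as a scaffold-by-scaffold contact map. As a rough illustration of how such a matrix is tallied, the Python sketch below assumes each read pair has already been aligned and reduced to the pair of scaffolds its two ends landed on. The scaffold names and pairs are hypothetical, and tools such as Juicebox and HiCExplorer bin contacts along genomic coordinates rather than per whole scaffold. Pairs whose ends fall on the same scaffold land on the diagonal, which is why the diagonal dominates the map.

```python
def contact_matrix(pairs, scaffolds):
    """Symmetric matrix of read-pair contact counts between scaffolds.

    pairs: iterable of (scaffold_a, scaffold_b), one tuple per mapped read pair.
    scaffolds: ordered scaffold names defining the matrix rows and columns.
    """
    index = {name: i for i, name in enumerate(scaffolds)}
    size = len(scaffolds)
    matrix = [[0] * size for _ in range(size)]
    for a, b in pairs:
        i, j = index[a], index[b]
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror inter-scaffold contacts
    return matrix


def cis_fraction(matrix):
    """Share of read pairs whose two ends fall on the same scaffold."""
    size = len(matrix)
    diagonal = sum(matrix[i][i] for i in range(size))
    # Off-diagonal counts are mirrored, so each inter-scaffold pair appears twice.
    off_diagonal = sum(
        matrix[i][j] for i in range(size) for j in range(size) if i != j
    ) // 2
    return diagonal / (diagonal + off_diagonal)


scaffolds = ["scf1", "scf2", "scf3"]  # hypothetical scaffold names
pairs = ([("scf1", "scf1")] * 5 + [("scf2", "scf2")] * 4 + [("scf3", "scf3")] * 6
         + [("scf1", "scf2")] * 2 + [("scf2", "scf3")])
m = contact_matrix(pairs, scaffolds)
print(m)  # [[5, 2, 0], [2, 4, 1], [0, 1, 6]]
print(round(cis_fraction(m), 3))  # 0.833
```

Mirroring the off-diagonal counts keeps the matrix symmetric, matching how contact maps are usually drawn.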
Genome annotations and homology for the 36 chromosome pseudomolecules of white-tailed deer
| Chrom. ID | Ungapped length (bp) | No. of gaps | No. of genes | No. of repeats | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 108 600 581 | 4 | 721 | 174 661 | 3 | 18 | 4 | 4 | 4 | 7 |
| 2 | 102 048 420 | 2 | 929 | 173 101 | 5 | 11 | 6 | 11 | 1 | 9 |
| 3 | 100 279 162 | 1 | 1 253 | 169 281 | 4 | 9 | 5 | 7 | 5 | 5 |
| 4 | 93 958 800 | 7 | 628 | 158 735 | 7 | 19 | 8 | 1 | 1 | 3 |
| 5 | 93 570 283 | 16 | 956 | 164 814 | 2 | 20 | 3 | 3 | 1 | 1 |
| 6 | 89 349 494 | 3 | 813 | 150 017 | 6 | 12 | 7 | 10 | 7 | 15 |
| 7 | 85 956 676 | 3 | 583 | 141 968 | 8 | 15 | 9 | | 22 | 10 |
| 8 | 80 668 930 | 2 | 385 | 133 362 | 9 | 30 | 10 | 12 | 10 | 13 |
| 9 | 78 136 789 | 7 | 685 | 134 814 | 10 | 23 | 1 | 13 | 13 | 20 |
| 10 | 73 421 497 | 8 | 704 | 130 149 | 11 | 1 | 11 | 15 | 15 | 11 |
| 11 | 72 668 630 | 4 | 589 | 123 588 | 13 | 14 | 13 | 16 | 12 | 1 |
| 12 | 68 288 379 | 2 | 574 | 118 932 | 1 | 16 | 2 | 17 | 17 | 22 |
| 13 | 68 111 889 | 5 | 304 | 109 301 | 15 | 33 | 14 | 2 | 2 | 2 |
| 14 | 67 564 244 | 4 | 319 | 117 421 | 16 | 25 | 15 | 20 | 16 | 5 |
| 15 | 66 412 021 | 3 | 332 | 115 247 | 12 | 21 | 12 | 14 | 9 | 8 |
| 16 | 61 986 249 | 6 | 472 | 107 329 | 17 | 13 | 16 | 21 | 18 | 15 |
| 17 | 60 095 371 | 8 | 1 077 | 101 312 | 1 | 5 | 2 | 19 | 11 | 17 |
| 18 | 57 744 214 | 5 | 336 | 93 516 | 14 | 29 | 18 | 8 | 2 | 9 |
| 19 | 57 482 545 | 4 | 252 | 93 221 | 20 | 28 | 26 | 9 | 8 | 6 |
| 20 | 57 216 540 | 5 | 1 059 | 98 539 | 18 | 4 | 1 | 18 | 14 | |
| 21 | 56 275 464 | 4 | 254 | 91 774 | 19 | | 17 | 6 | 6 | 4 |
| 22 | 55 840 698 | 2 | 246 | 91 928 | 23 | 27 | 21 | 24 | 23 | 18 |
| 23 | 53 708 925 | 2 | 546 | 96 459 | 21 | 22 | 20 | 5 | 3 | 12 |
| 24 | 52 991 459 | 1 | 465 | 88 100 | 25 | 3 | 23 | 5 | 3 | 12 |
| 25 | 51 970 072 | 1 | 210 | 88 431 | 27 | 31 | 24 | 1 | 1 | 21 |
| 26 | 47 961 987 | 1 | 376 | 76 790 | 22 | 24 | 19 | 22 | 19 | 3 |
| 27 | 45 470 101 | 4 | 603 | 74 352 | 28 | 7 | 25 | 23 | 20 | 6 |
| 28 | 44 772 963 | 4 | 506 | 77 294 | 29 | 2 | 28 | 29 | 21 | 11 |
| 29 | 43 582 846 | 2 | 249 | 81 117 | 26 | 6 | 27 | 6 | 6 | 4 |
| 30 | 43 498 002 | 3 | 426 | 77 730 | 24 | 33 | 22 | 2 | 2 | 1 |
| 31 | 43 483 628 | 4 | 329 | 77 828 | 30 | 16 | 29 | 8 | 2 | 9 |
| 32 | 41 958 503 | 5 | 221 | 66 679 | 31 | 32 | 30 | 27 | 26 | 8 |
| 33 | 40 612 519 | 2 | 659 | 77 187 | 32 | 10 | 31 | 25 | 24 | 16 |
| 34 | 35 913 106 | 0 | 238 | 57 077 | 33 | 26 | 32 | 9 | 8 | 6 |
| X | 54 563 062 | 18 | 340 | 97 953 | X | X | X | X | X | X |
| Y | 2 343 217 | 3 | 11 | 4 570 | Y | Y | n/a | X | X | X |
| Placed | 2 258 507 266 | | 18 869 | 3 834 577 | n/a | n/a | n/a | n/a | n/a | n/a |
| Unplaced | 166 333 442 | n/a | 1 782 | 281 882 | n/a | n/a | n/a | n/a | n/a | n/a |
| Total | 2 424 840 708 | | 20 651 | 4 116 459 | n/a | n/a | n/a | n/a | n/a | n/a |
Gray cells: multiple chromosomes in the Odocoileus virginianus assembly aligned to the same chromosome in another organism. Bold cells: a single chromosome in the Odocoileus virginianus assembly aligned to multiple chromosomes in another organism.
Chromosomes (chrom.) for this species are not numbered in order of size.
No Y chromosome sequence available for Cervus nippon.
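Two quantities in the chromosome table can be checked directly. Dividing the "Placed" row by the "Total" row gives the share of sequence in the 36 pseudomolecules, which is the 93% quoted in the abstract. The gray and bold cells correspond to many-to-one and one-to-many chromosome correspondences. The Python sketch below computes both. The mapping used for the second part is hypothetical, though its first pattern mirrors rows 12 and 17, which share the value 1 in the first homology column.

```python
from collections import defaultdict


def placed_percent(placed_bp, total_bp):
    """Percentage (one decimal) of assembly length in chromosome pseudomolecules."""
    return round(100 * placed_bp / total_bp, 1)


def classify_correspondence(mapping):
    """Split a deer-to-other-species chromosome mapping into two patterns.

    mapping: deer chromosome ID -> list of aligned chromosome IDs in the other species.
    Returns (many_to_one, one_to_many):
      many_to_one: other-species chromosome -> deer chromosomes aligned to it (gray)
      one_to_many: deer chromosome -> other-species chromosomes it aligned to (bold)
    """
    sources = defaultdict(list)
    for deer_chrom, hits in mapping.items():
        for other in hits:
            sources[other].append(deer_chrom)
    many_to_one = {o: srcs for o, srcs in sources.items() if len(srcs) > 1}
    one_to_many = {d: hits for d, hits in mapping.items() if len(hits) > 1}
    return many_to_one, one_to_many


print(placed_percent(2_258_507_266, 2_424_840_708))  # 93.1

# Hypothetical mapping for illustration only.
example = {"12": ["1"], "17": ["1"], "3": ["4"], "5": ["3", "7"]}
print(classify_correspondence(example))
# ({'1': ['12', '17']}, {'5': ['3', '7']})
```

Inverting the mapping once with a `defaultdict` means both patterns are found in a single pass over the table rows.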