Roger Huerlimann, Jeff A Cowley, Nicholas M Wade, Yinan Wang, Naga Kasinadhuni, Chon-Kit Kenneth Chan, Jafar S Jabbari, Kirby Siemering, Lavinia Gordon, Matthew Tinning, Juan D Montenegro, Gregory E Maes, Melony J Sellars, Greg J Coman, Sean McWilliam, Kyall R Zenger, Mehar S Khatkar, Herman W Raadsma, Dallas Donovan, Gopala Krishna, Dean R Jerry.
Abstract
Shrimp are a valuable aquaculture species globally; however, disease remains a major hindrance to shrimp aquaculture sustainability and growth. Mechanisms mediated by endogenous viral elements have been proposed as a means by which shrimp that encounter a new virus start to accommodate rather than succumb to infection over time. However, evidence on the nature of such endogenous viral elements and how they mediate viral accommodation is limited. More extensive genomic data on Penaeid shrimp from different geographical locations should assist in exposing the diversity of endogenous viral elements. In this context, reported here is a PacBio Sequel-based draft genome assembly of an Australian black tiger shrimp (Penaeus monodon) inbred for 1 generation. The 1.89 Gbp draft genome comprises 31,922 scaffolds (N50: 496,398 bp) covering 85.9% of the projected genome size. The genome repeat content (61.8% with 30% representing simple sequence repeats) is almost the highest identified for any species. The functional annotation identified 35,517 gene models, of which 25,809 were protein-coding and 17,158 were annotated using InterProScan. Scaffold scanning for specific endogenous viral elements identified an element comprising a 9,045-bp stretch of repeated, inverted, and jumbled genome fragments of infectious hypodermal and hematopoietic necrosis virus bounded by a repeated 591/590 bp host sequence. As only near complete linear ∼4 kb infectious hypodermal and hematopoietic necrosis virus genomes have been found integrated in the genome of P. monodon previously, its discovery has implications regarding the validity of PCR tests designed to specifically detect such linear endogenous viral element types. The existence of joined inverted infectious hypodermal and hematopoietic necrosis virus genome fragments also provides a means by which hairpin double-stranded RNA could be expressed and processed by the shrimp RNA interference machinery.
Keywords: Penaeus monodon; Australia; IHHNV EVE; PacBio; genome assembly
Year: 2022 PMID: 35143647 PMCID: PMC8982415 DOI: 10.1093/g3journal/jkac034
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Illumina, PacBio, 10× Genomics, and DoveTail sequencing data used for the assembly and scaffolding of the black tiger shrimp genome.
| Sequencing platform | Paired end reads | Yield (Gb) | Coverage | GenBank accessions |
|---|---|---|---|---|
| Illumina (250 bp PE) | 315 M | 158 | 72× | SRR10713996, SRR10713997 |
| PacBio Sequel | N/A | 165 | 75× | SRR10713990–SRR10713995, SRR10713998–SRR10714025 |
| 10× Genomics (250 bp PE) | 987 M | 494 | 224× | N/A |
| DoveTail (100 bp PE) | 1.2 B | 119 | 54× | N/A |
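The coverage column can be approximated from the yield column. The following is a minimal sketch, assuming that coverage was calculated as yield divided by the projected genome size of 2.20 Gb; that assumption is not stated in the table itself, and the function and variable names are illustrative only.

```python
# Assumption: fold coverage = sequenced bases / projected genome size (2.20 Gb).
GENOME_SIZE_GB = 2.20

# Yields in Gb, copied from the sequencing table.
yields_gb = {
    "Illumina": 158,
    "PacBio Sequel": 165,
    "10x Genomics": 494,
    "DoveTail": 119,
}


def fold_coverage(yield_gb, genome_size_gb=GENOME_SIZE_GB):
    """Return fold coverage as yield divided by genome size."""
    return yield_gb / genome_size_gb


# Round to whole-number fold coverage for comparison with the table.
coverage = {name: round(fold_coverage(y)) for name, y in yields_gb.items()}

for name, cov in coverage.items():
    print(f"{name}: {cov}x")
```

Under this assumption the computed values agree with the reported coverage to within 1× for every platform; small differences may reflect rounding of the published yields.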
Summary of assembly statistics for the Australian and Thai P. monodon, and P. vannamei genomes.
| Metrics | P. monodon (Australia) | P. monodon (Thailand) | P. vannamei |
|---|---|---|---|
| No. of contigs | 47,607 | 70,380 | 50,304 |
| Largest contig | 1,147,530 | 1,387,722 | 739,419 |
| Total length of contigs | 1.89 Gb | 2.39 Gb | 1.62 Gb |
| Contig N50 | 78 kb | 79 kb | 58 kb |
| No. of scaffolds | 31,922 | 44 | – |
| Largest scaffold | 21.70 Mb | 65.87 Mb | – |
| Total length of scaffolds | 1.89 Gb | 1.99 Gb | 1.66 Gb |
| Scaffold N50 | 0.50 Mb | 49.0 Mb | 0.60 Mb |
| Projected genome size | 2.20 Gb | 2.20 Gb | 2.45 Gb |
| Percentage covered by scaffolds | 86.1% | 90.3% | 67.7% |
| GC (%) | 35.6 | 36.6 | 35.7 |
| Complete BUSCOs (C) | 86.8 | 87.9 | 78.0 |
| Complete and single-copy BUSCOs (S) | 85.8 | 84.8 | 74.0 |
| Complete and duplicated BUSCOs (D) | 1.0 | 3.1 | 4.0 |
| Fragmented BUSCOs (F) | 4.5 | 4.0 | 4.0 |
| Missing BUSCOs (M) | 8.7 | 8.0 | 18.0 |
| No. of predicted gene models | 35,517 | 31,640 | 25,596 |
| No. of protein-coding genes | 25,809 | 30,038 | – |
| No. of genes annotated in interproscan | 17,158 | 20,615 | – |
| References | This study | Uengwetwanit et al. | Zhang et al. |
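For readers unfamiliar with the contig and scaffold N50 metrics reported above, the standard definition can be sketched as follows. This is an illustration of the general definition, not the assembly pipeline used in the study, and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths, for illustration only (total = 300).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

In the toy example the two longest scaffolds (80 + 70) already hold half of the 300 total bases, so the N50 is the length of the second one.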
Fig. 1. Kimura distances of repetitive sequences in the genome assemblies of Australian black tiger shrimp (Pmono Australia, P. monodon, NCBI accession: JAAFYK000000000, this study), Thai black tiger shrimp (Pmono Thailand, P. monodon, Uengwetwanit et al.), whiteleg shrimp (Pvana, Penaeus vannamei, NCBI accession: QCYY00000000.1, Zhang et al.), Japanese blue crab (Ptrit, Portunus trituberculatus, gigadb.org/dataset/100678, Tang et al.), and Chinese mitten crab (Ejapo, Eriocheir japonica sinensis, NCBI accession: LQIF00000000.1) determined by using either (a) repeat length or (b) repeat class.
Fig. 2. a) Schematic diagram of a 3,832 bp ssDNA genome of infectious hypodermal and hematopoietic necrosis virus (IHHNV) showing the relative positions of coding sequences (arrows) for the virus replicase (ORF1), NS1 nonstructural protein (ORF2), and viral capsid protein (ORF3). A color gradient was applied to visualize relative genome positions. b) Schematic diagram of the positions and orientations of IHHNV genome fragments comprising the Scaffold_97 EVE (S97-EVE). The orientations of the IHHNV fragments (colored arrows) and the flanking repeated 591/590 bp host sequence (black arrows) are shown by arrow directions. The origins of the S97-EVE fragments relative to their positions in a linear IHHNV-EVE (see a) are identified by color. The 10,226 bp S97-EVE resided between positions 1,656,907 and 1,667,132 in the 2,608,951 bp Scaffold_97 sequence. The larger gray arrows identify the positions and orientations of at least 6 core repeat blocks comprising 2 smaller inverted repeats. Gray vertical bars show the location of a 34 bp sequence in each flanking repeat capable of folding into a stable secondary structure. The purple vertical bars show the locations of the 18 bp palindromic sequence present at the boundaries of each RU and partial RU. Dashed lines (>–<) identify the regions amplified by the 4 PCR tests S97-1a, S97-2, S97-3, and S97-4a. c) Coverage depth across the S97-EVE sequence of raw short reads used to assemble genome scaffolds of P. monodon from Australia (this study), Thailand (Uengwetwanit et al.), Vietnam (Van Quyen et al.), and China (Yuan et al.). d) Agarose gel image showing DNA products amplified by the S97-1a, S97-2, S97-3, and S97-4a PCR tests.
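The caption refers to a palindromic sequence and to inverted repeats, both of which are defined through the reverse complement. The sketch below shows how such features could be tested in principle; it is not the method used in the study. The example sequences are hypothetical stand-ins (the actual 18 bp palindrome is not given here), and the helper names are illustrative.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """A DNA palindrome reads the same as its own reverse complement."""
    s = seq.upper()
    return s == reverse_complement(s)


def are_inverted_repeats(left_arm, right_arm):
    """Two arms are inverted repeats if one is the reverse complement of
    the other; a transcript spanning both could fold back into a hairpin."""
    return reverse_complement(left_arm.upper()) == right_arm.upper()


# Hypothetical examples: GAATTC is a well-known 6 bp palindrome.
print(is_palindromic("GAATTC"))
print(are_inverted_repeats("ATGC", "GCAT"))
```

Real inverted repeats such as those in S97-EVE are rarely perfect, so a practical check would tolerate mismatches, for example by aligning one arm against the reverse complement of the other.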
Detection and notable features of IHHNV-EVE sequences identified in other genomes of P. monodon.
| Reference genome IDs | Start | End | Length (bp) | Orientation | Homology (%) |
|---|---|---|---|---|---|
| | | | | | |
| | 770,236 | 778,124 | 7,888 | | |
| | 772,730 | 773,391 | 661 | Minus | 99.9 |
| | 773,450 | 774,111 | 661 | Plus | 100.0 |
| | 774,170 | 774,831 | 661 | Minus | 99.9 |
| | 774,890 | 775,551 | 661 | Plus | 97.9 |
| | 862,618 | 878,928 | 16,310 | | |
| | 866,534 | 867,145 | 611 | Minus | 79.4 |
| | 867,204 | 867,791 | 587 | Plus | 81.3 |
| | 867,840 | 868,467 | 627 | Minus | 83.5 |
| | 868,515 | 869,130 | 615 | Plus | 80.0 |
| | 872,127 | 872,754 | 627 | Plus | 78.9 |
| | 872,799 | 873,434 | 635 | Minus | 90.0 |
| | 873,492 | 874,152 | 660 | Plus | 97.2 |
| | 875,469 | 876,168 | 699 | Plus | 92.1 |
| | | | | | |
| | | | 4,003 | | 98.4 |
| | | | 1,917 | | 99.0 |
| | | | 2,220 | | 98.9 |
| | | | | | |
| | | | 645 | | 98.9 |
| | | | 848 | | 98.3 |
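The coordinates and lengths in the table above can be cross-checked with simple arithmetic. The sketch below assumes, based only on the numbers themselves, that the table lengths are End minus Start, whereas the S97-EVE length in Fig. 2 counts both end positions. Only a subset of rows is included, and the names are illustrative.

```python
# (start, end, reported length) for a subset of rows from the table.
rows = [
    (770_236, 778_124, 7_888),
    (772_730, 773_391, 661),
    (862_618, 878_928, 16_310),
    (875_469, 876_168, 699),
]


def span(start, end, inclusive=False):
    """Length of an interval, optionally counting both end positions."""
    return end - start + (1 if inclusive else 0)


# Rows whose reported length differs from End - Start.
mismatches = [row for row in rows if span(row[0], row[1]) != row[2]]

# S97-EVE coordinates from Fig. 2, counted inclusively.
s97_length = span(1_656_907, 1_667_132, inclusive=True)

# The 9,045 bp viral stretch plus the two flanking host repeats.
s97_parts = 9_045 + 591 + 590

print(mismatches, s97_length, s97_parts)
```

Both routes give the same S97-EVE length, which matches the 10,226 bp stated in the Fig. 2 caption.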