Guilherme B Dias¹, Musaad A Altammami², Hamadttu A F El-Shafie³, Fahad M Alhoshani², Mohamed B Al-Fageeh⁴, Casey M Bergman⁵, Manee M Manee⁶.
Abstract
The red palm weevil Rhynchophorus ferrugineus (Coleoptera: Curculionidae) is an economically important invasive species that attacks multiple species of palm trees around the world. A better understanding of gene content and function in R. ferrugineus has the potential to inform pest control strategies and thereby mitigate the economic and biodiversity losses caused by this species. Using 10x Genomics linked-read sequencing, we produced a haplotype-resolved diploid genome assembly for R. ferrugineus from a single heterozygous individual with modest sequencing coverage (~62x). Benchmarking against conserved single-copy arthropod orthologs suggests that both pseudo-haplotypes in our R. ferrugineus genome assembly are highly complete with respect to gene content and do not suffer from the haplotype-induced duplication artifacts present in a recently published hybrid assembly for this species. Annotation of the larger pseudo-haplotype in our assembly provides evidence for 23,413 protein-coding loci in R. ferrugineus, including over 13,000 predicted proteins annotated with Gene Ontology terms and over 6,000 loci independently supported by high-quality Iso-Seq transcriptomic data. Our assembly also includes 95% of the R. ferrugineus chemosensory, detoxification and neuropeptide-related transcripts identified previously using RNA-seq transcriptomic data, and provides a platform for the molecular analysis of these and other functionally relevant genes that can help guide management of this widespread insect pest.
Year: 2021 PMID: 33976235 PMCID: PMC8113489 DOI: 10.1038/s41598-021-89091-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
General assembly statistics and BUSCO scores for RPW genome assemblies. The first three assemblies are from this study; the remaining four are from Hazzouri et al.[18]. BUSCO scores are percentages.
| | 10x pseudo-haplotype1 | 10x pseudo-haplotype2 | 10x diploid megabubbles | Male diploid ABySS | 10x mixed-sex megabubbles | ABySS+10x (M_v.1) | hybrid (M_pseudochr) |
|---|---|---|---|---|---|---|---|
| Assembly size (bp) | 589,402,552 | 588,821,663 | 730,004,643 | 749,083,425 | 967,735,890 | 780,518,639 | 782,098,041 |
| Assembly size for scaffolds (bp) | 589,402,552 | 588,821,663 | 730,004,643 | 596,550,086 | 967,735,890 | 780,518,639 | 782,098,041 |
| Contig count (#) | 42,051 | 41,976 | 47,835 | 1,832,276 | 125,191 | 48,516 | 48,516 |
| Contig N50 (bp) | 37,927 | 37,977 | 38,908 | 2,000 | 18,553 | 23,825 | 23,825 |
| Scaffold count (#) | 24,005 | 24,005 | 25,245 | 1,810,484 | 78,408 | 12,400 | 4,807 |
| Scaffold N50 (bp) | 471,583 | 471,583 | 288,817 | 2,289 | 76,013 | 127,745 | 64,117,472 |
| GC content (%) | 32.21 | 32.21 | 32.15 | 32.4 | 32.27 | 32.19 | 32.19 |
| BUSCO complete (%) | 98.1 | 97.5 | 97.3 | 94.9 | 95.3 | 92.8 | 93.2 |
| BUSCO single-copy (%) | 96.2 | 95.5 | 71.7 | 93.6 | 13.4 | 16.0 | 54.2 |
| BUSCO duplicated (%) | 1.9 | 2.0 | 25.6 | 1.3 | 81.9 | 76.8 | 39.0 |
| BUSCO fragmented (%) | 1.1 | 1.1 | 1.1 | 2.9 | 1.6 | 1.4 | 1.3 |
| BUSCO missing (%) | 0.8 | 1.4 | 1.6 | 2.2 | 3.1 | 5.8 | 5.5 |
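Contig and scaffold N50 values like those in the table are conventionally defined as the length L such that sequences of length L or longer together contain at least half of the assembled bases. The Python sketch below implements that textbook definition; the toy lengths are hypothetical, and the study's values were not necessarily computed with code like this.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    half_total = sum(lengths) / 2
    covered = 0
    # Walk from the longest sequence down, accumulating assembled bases.
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= half_total:
            return length
    raise AssertionError("unreachable: the full sum always reaches half the total")


# Hypothetical toy lengths (bp), not RPW data: total 300 bp, half is 150 bp.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 >= 150, so N50 = 70
```

Because N50 is a length-weighted median, a few very long scaffolds can raise it sharply even when the sequence count is large.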
Figure 1. Phase blocks and B-allele frequency (BAF) of single-nucleotide variants (SNVs) in the 10 largest scaffolds of the RPW pseudo-haplotype1 assembly. Phased regions are shown as gray boxes and SNVs as black dots. Regions with a white background are unphased segments of the genome where both pseudo-haplotype assemblies are identical. Heterozygous SNVs in a diploid genome are expected to have BAF values of about 0.5.
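BAF at a SNV is the fraction of reads supporting the B (alternative) allele, so a heterozygous site in a diploid is expected to sit near 0.5. The sketch below computes BAF from allele-specific read counts; the `SiteCounts` record, the toy counts and the ±0.2 tolerance are illustrative assumptions, not values or code from the study.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical record of allele-specific read counts at one SNV."""
    scaffold: str
    position: int
    ref_reads: int  # reads supporting the reference (A) allele
    alt_reads: int  # reads supporting the alternative (B) allele


def b_allele_frequency(site: SiteCounts) -> float:
    """BAF = B-allele reads / (A-allele reads + B-allele reads)."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        raise ValueError(f"no informative reads at {site.scaffold}:{site.position}")
    return site.alt_reads / depth


def near_heterozygous(baf: float, tolerance: float = 0.2) -> bool:
    """True if BAF is within `tolerance` of 0.5 (the tolerance is an assumption)."""
    return abs(baf - 0.5) <= tolerance


sites = [
    SiteCounts("scaffold_1", 1_000, ref_reads=31, alt_reads=29),  # toy counts
    SiteCounts("scaffold_1", 2_500, ref_reads=2, alt_reads=58),
]
for site in sites:
    baf = b_allele_frequency(site)
    print(site.scaffold, site.position, round(baf, 3), near_heterozygous(baf))
```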
Figure 2. Identification of putative sex chromosome scaffolds. Sequencing data were subsampled to 39 Gb and mapped to pseudo-haplotype1. Mean mapped read depth is shown in 100 kb windows across the 10 longest scaffolds of the pseudo-haplotype1 assembly, with terminal windows removed, for the 10x Genomics reads produced in this study (SRX7520800; green shaded area) and for the female (black line with open diamonds) and male (purple line) Illumina reads from Hazzouri et al.[18] (SRX5416728 and SRX5416729, respectively). Phase blocks are shown as gray rectangles. The ratio of male to female mean mapped read depth is given to the right of each scaffold. Scaffolds with a male/female ratio of approximately 0.5 are indicated as putative sex chromosome sequences. The similar mapped read depth of our RPW sample and the female sample from Hazzouri et al.[18], together with the presence of phase blocks on the putative sex chromosome scaffolds, implies heterozygosity due to diploidy. This indicates that these scaffolds are X-linked and that the individual sequenced in this study is female.
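The sex-chromosome test in Figure 2 reduces to a ratio of mean depths: with equal sequencing yield, an X-linked scaffold present once in males and twice in females should give a male/female ratio near 0.5, while autosomal scaffolds should sit near 1. The sketch below assumes per-base depths for each sample are already available and already downsampled to the same yield (the caption describes subsampling to 39 Gb). The 10 bp toy window, the depth values and the 0.1 tolerance are hypothetical; the figure itself uses 100 kb windows.

```python
from statistics import mean


def interior_window_means(depths: list[float], window: int) -> list[float]:
    """Mean depth in consecutive non-overlapping windows, dropping both terminal windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = [mean(depths[i:i + window]) for i in range(0, len(depths), window)]
    if len(means) < 3:
        raise ValueError("need at least three windows to drop the terminal ones")
    return means[1:-1]


def male_female_ratio(male: list[float], female: list[float], window: int) -> float:
    """Ratio of male to female mean depth over the interior windows of one scaffold."""
    return mean(interior_window_means(male, window)) / mean(interior_window_means(female, window))


def classify(ratio: float, tolerance: float = 0.1) -> str:
    """Label a scaffold by its male/female depth ratio (the tolerance is an assumption)."""
    if abs(ratio - 0.5) <= tolerance:
        return "putative X-linked"  # one X copy in males vs two in females
    if abs(ratio - 1.0) <= tolerance:
        return "autosomal-like"
    return "unclear"


# Toy per-base depths on two hypothetical scaffolds.
window = 10
autosome = male_female_ratio([30.0] * 50, [30.0] * 50, window)
x_candidate = male_female_ratio([15.0] * 50, [30.0] * 50, window)
print(classify(autosome), classify(x_candidate))
```

Without equal-yield subsampling, the raw ratio would also reflect differences in total sequencing output, so normalizing yield first is what makes 0.5 a meaningful expectation.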
Figure 3. Mapped read depth of BUSCO genes in the RPW pseudo-haplotype1 assembly. BUSCO genes were categorized as single-copy or duplicated based on their status in the M_pseudochr assembly from Hazzouri et al.[18]. The four DNA-seq datasets analyzed are (from top to bottom): the 10x Genomics library from this study (SRX7520800), the 10x Genomics library from Hazzouri et al.[18] (SRX5416727), the female RPW Illumina PE library from Hazzouri et al.[18] (SRX5416728), and the male RPW Illumina PE library from Hazzouri et al.[18] (SRX5416729). Depth estimates based on all mapped reads and on high-quality mapped reads (MAPQ > 0) are shown in the left and right columns, respectively. The median of each depth distribution is shown as a labeled vertical line.
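Figure 3 compares depth distributions for BUSCO genes grouped by their status in M_pseudochr, once with all mapped reads and once with MAPQ > 0 reads only. The sketch below shows that grouping on toy data, approximating each gene's mean depth as aligned bases divided by gene length. The gene names, lengths and alignments are hypothetical; real depths would normally be derived from BAM alignments (for example with samtools).

```python
from collections import defaultdict
from statistics import median

# Hypothetical inputs (not study data): BUSCO gene lengths in bp, each gene's
# status in the M_pseudochr assembly, and alignments as (gene, aligned_bp, mapq).
GENE_LENGTH = {"busco_a": 100, "busco_b": 100, "busco_c": 200}
STATUS = {"busco_a": "single-copy", "busco_b": "single-copy", "busco_c": "duplicated"}
ALIGNMENTS = [
    ("busco_a", 50, 60), ("busco_a", 50, 60),
    ("busco_b", 50, 60), ("busco_b", 50, 0),
    ("busco_c", 50, 0), ("busco_c", 50, 60), ("busco_c", 50, 60), ("busco_c", 50, 60),
]


def gene_depths(alignments, gene_length, high_quality_only=False):
    """Mean depth per gene = aligned bases / gene length, optionally keeping MAPQ > 0 only."""
    bases = defaultdict(int)
    for gene, aligned_bp, mapq in alignments:
        if high_quality_only and mapq <= 0:
            continue  # drop MAPQ 0 alignments
        bases[gene] += aligned_bp
    return {gene: bases[gene] / length for gene, length in gene_length.items()}


def median_by_status(depths, status):
    """Median depth for each BUSCO category (single-copy vs duplicated)."""
    groups = defaultdict(list)
    for gene, depth in depths.items():
        groups[status[gene]].append(depth)
    return {category: median(values) for category, values in groups.items()}


for hq in (False, True):
    label = "MAPQ > 0" if hq else "all reads"
    print(label, median_by_status(gene_depths(ALIGNMENTS, GENE_LENGTH, hq), STATUS))
```

A MAPQ of 0 usually marks reads the aligner could not place uniquely, so regions with near-identical copies tend to lose depth under the filter; that is one reason the left and right columns of such a figure can differ.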
Statistics and BUSCO scores for RPW and Tribolium annotation gene sets.
| | RPW pseudo-haplotype1 BRAKER | RPW isoseq3 | RPW M_v.1 Funannotate | RPW M_pseudochr BRAKER | Tribolium v5.2 |
|---|---|---|---|---|---|
| Transcripts (#) | 25,382 | 24,009 | 25,394 | 36,491 | 25,229 |
| Loci (#) | 23,413 | 6,222 | 25,394 | 33,422 | 14,244 |
| BUSCO complete, all transcripts (%) | 91.8 | 62.5 | 68.9 | 88.8 | 95.6 |
| BUSCO single-copy, all transcripts (%) | 82.0 | 29.8 | 21.1 | 9.6 | 76.2 |
| BUSCO duplicated, all transcripts (%) | 9.8 | 32.7 | 47.8 | 79.2 | 19.4 |
| BUSCO fragmented, all transcripts (%) | 2.2 | 1.6 | 10.5 | 2.2 | 0.3 |
| BUSCO missing, all transcripts (%) | 6.0 | 35.9 | 20.6 | 9.0 | 4.1 |
| BUSCO complete, one transcript per locus (%) | 91.2 | 60.1 | 68.9 | 88.3 | 91.6 |
| BUSCO single-copy, one transcript per locus (%) | 89.2 | 59.7 | 21.1 | 10.9 | 91.1 |
| BUSCO duplicated, one transcript per locus (%) | 2.0 | 0.4 | 47.8 | 77.4 | 0.5 |
| BUSCO fragmented, one transcript per locus (%) | 2.2 | 2.6 | 10.5 | 2.2 | 0.5 |
| BUSCO missing, one transcript per locus (%) | 6.6 | 37.3 | 20.6 | 9.5 | 7.9 |
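BUSCO reports complete (C), fragmented (F) and missing (M) genes as percentages of the lineage gene set, with complete genes split into single-copy (S) and duplicated (D), so S + D = C and C + F + M = 100 up to rounding. A quick consistency check over values copied from the first set of BUSCO rows in the table above, written as a Python sketch (the 0.15-point tolerance for one-decimal rounding is an assumption):

```python
from typing import NamedTuple


class BuscoScores(NamedTuple):
    """BUSCO percentages, where complete = single-copy + duplicated."""
    complete: float
    single_copy: float
    duplicated: float
    fragmented: float
    missing: float


def is_consistent(scores: BuscoScores, tolerance: float = 0.15) -> bool:
    """Check S + D = C and C + F + M = 100, allowing for one-decimal rounding."""
    split_ok = abs(scores.single_copy + scores.duplicated - scores.complete) <= tolerance
    total_ok = abs(scores.complete + scores.fragmented + scores.missing - 100.0) <= tolerance
    return split_ok and total_ok


rows = {
    "RPW pseudo-haplotype1 BRAKER": BuscoScores(91.8, 82.0, 9.8, 2.2, 6.0),
    "RPW M_pseudochr BRAKER": BuscoScores(88.8, 9.6, 79.2, 2.2, 9.0),
}
for name, scores in rows.items():
    print(name, is_consistent(scores))
```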
Functional annotation of 25,382 predicted proteins in the RPW pseudo-haplotype1 annotation.
| Database | Sequences annotated | % of total |
|---|---|---|
|  | 17,893 | 70.5 |
| InterPro | 21,210 | 83.6 |
|  | 22,001 | 86.7 |
| Gene Ontology | 13,779 | 54.3 |
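Each "% of total" value is the annotated count divided by the 25,382 predicted proteins named in the caption. A minimal Python check, assuming one-decimal rounding as in the table; the rows whose database names are missing are omitted from the loop:

```python
TOTAL_PREDICTED_PROTEINS = 25_382  # from the table caption


def percent_of_total(annotated: int, total: int = TOTAL_PREDICTED_PROTEINS) -> float:
    """Share of predicted proteins annotated by a database, rounded like the table."""
    return round(100 * annotated / total, 1)


for database, count in [("InterPro", 21_210), ("Gene Ontology", 13_779)]:
    print(database, percent_of_total(count))
```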
Presence of functionally-relevant gene sets in the RPW pseudo-haplotype1 assembly and annotation.
| Gene family | Transcripts | Transcripts mapped to assembly | Mapped loci | Mapped loci with StringTie models | StringTie models consistent with BRAKER annotation |
|---|---|---|---|---|---|
| OR | 76 | 76 | 74 | 68 | 68 |
| OBP | 38 | 35 | 35 | 34 | 34 |
| GR | 15 | 15 | 15 | 15 | 13 |
| CSP | 12 | 6 | 6 | 6 | 6 |
| IR | 10 | 10 | 10 | 8 | 8 |
| SNMP | 6 | 6 | 6 | 5 | 4 |
| CYP | 77 | 70 | 64 | 55 | 54 |
| Neuropeptide | 42 | 42 | 40 | 37 | 36 |
| GPCR | 46 | 46 | 46 | 43 | 43 |
| Total | 322 | 306 | 296 | 271 | 266 |
Curated gene sets are from Antony et al.[9], Antony et al.[53] and Zhang et al.[11]. The numbers of transcripts, of transcripts mapped to the pseudo-haplotype1 assembly, and of mapped loci in the pseudo-haplotype1 assembly are based on the original transcripts reported in Antony et al.[9] and Zhang et al.[11]. The numbers of loci consistent with the BRAKER annotation of pseudo-haplotype1 are based on the StringTie transcript models overlapping the transcripts from Antony et al.[9], Antony et al.[53] and Zhang et al.[11]; these models were used to correct for strand-orientation artifacts in the Antony et al.[9] transcriptome and to provide a uniform set of gene structures across the curated gene sets.
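The Total row, and the 95% figure quoted in the abstract, are consistent with the per-family counts: 306 of the 322 curated transcripts map to the pseudo-haplotype1 assembly. A short Python check with the rows copied from the table:

```python
# Rows copied from the table: (transcripts, mapped transcripts, mapped loci,
# mapped loci with StringTie models, StringTie models consistent with BRAKER).
GENE_SETS = {
    "OR": (76, 76, 74, 68, 68),
    "OBP": (38, 35, 35, 34, 34),
    "GR": (15, 15, 15, 15, 13),
    "CSP": (12, 6, 6, 6, 6),
    "IR": (10, 10, 10, 8, 8),
    "SNMP": (6, 6, 6, 5, 4),
    "CYP": (77, 70, 64, 55, 54),
    "Neuropeptide": (42, 42, 40, 37, 36),
    "GPCR": (46, 46, 46, 43, 43),
}

# Column-wise sums reproduce the Total row.
totals = [sum(column) for column in zip(*GENE_SETS.values())]
mapped_fraction = totals[1] / totals[0]
print(totals)                    # [322, 306, 296, 271, 266]
print(f"{mapped_fraction:.1%}")  # 95.0%
```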