Ariel Gershman¹, Tatiana G Romer², Yunfan Fan², Roham Razaghi², Wendy A Smith³, Winston Timp¹,².
Abstract
The tobacco hornworm, Manduca sexta, is a lepidopteran insect that is used extensively as a model system for studying insect biology, development, neuroscience, and immunity. However, current studies rely on the highly fragmented reference genome Msex_1.0, which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. We present a new reference genome for M. sexta, JHU_Msex_v1.0, applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly is 470 Mb and is ∼20× more continuous than the original assembly, with scaffold N50 > 14 Mb. We annotated the assembly by lifting over existing annotations and supplementing with additional supporting RNA-based data for a total of 25,256 genes. The new reference assembly is accessible in annotated form for public use. We demonstrate that improved continuity of the M. sexta genome improves resequencing studies and benefits future research on M. sexta as a model organism.
Keywords: Manduca sexta; Sphingid moth; assembly; genome; sequencing
Year: 2021 PMID: 33561252 PMCID: PMC8022704 DOI: 10.1093/g3journal/jkaa047
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Sequencing summary statistics. Comparison of the sequencing data collected for the JHU_Msex_V1.0 assembly versus the Msex_1.0 assembly.
| Assembly | Library type | Platform | Yield (Gb) | Library size (bp) | Number of reads |
|---|---|---|---|---|---|
| JHU_Msex_V1.0 | Short read | Illumina NovaSeq | 32.83 | 2 × 150 | 219M |
| JHU_Msex_V1.0 | Long read | ONT MinION | 19.47 | Read N50: 9,156 | 4.40M |
| JHU_Msex_V1.0 | Hi-C | Illumina NovaSeq | 32.76 | 2 × 150 | 218M |
| Msex_1.0 | Short read | 454 Titanium fragment | 14 | Fragment | 26.5M |
| Msex_1.0 | Mate pair | 454 mate pair | 7 | 3,000 | 14.7M |
| Msex_1.0 | Mate pair | 454 mate pair | 2.5 | 8,000 | 6.3M |
| Msex_1.0 | BAC | Sanger | na | 165,000 | 7,000 |
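As a rough consistency check on the sequencing table, yield can be converted into sequencing depth (total bases divided by genome size) and mean read length (total bases divided by read count). The sketch below assumes the JHU_Msex_V1.0 total scaffold length (470,051,811 bp) as a stand-in for the true genome size; the dictionary labels are invented for illustration.

```python
# Approximate depth and mean read length implied by the JHU_Msex_V1.0 rows.
# Assumption: total scaffold length stands in for the true genome size.
GENOME_SIZE_BP = 470_051_811

# label -> (yield in Gb, number of reads), copied from the table
libraries = {
    "Short read (Illumina NovaSeq)": (32.83, 219e6),
    "Long read (ONT MinION)": (19.47, 4.40e6),
    "Hi-C (Illumina NovaSeq)": (32.76, 218e6),
}


def depth(yield_gb: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    return yield_gb * 1e9 / genome_size_bp


def mean_read_length(yield_gb: float, n_reads: float) -> float:
    """Average bases per read = total sequenced bases / read count."""
    return yield_gb * 1e9 / n_reads


for name, (gb, reads) in libraries.items():
    print(f"{name}: ~{depth(gb):.1f}x depth, "
          f"mean read length ~{mean_read_length(gb, reads):,.0f} bp")
```

Under these assumptions the Illumina libraries come out near 70× and the ONT data near 41×. The ~150 bp mean for the Illumina libraries suggests that the read counts refer to individual reads rather than read pairs.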
Figure 1. Final assembly metrics. (A) (Left) NGx plot for the final JHU_Msex_v1.0 assembly after scaffolding compared with the original Msex_1.0 assembly. The plot represents the 30 largest scaffolds from each assembly. (Right) Comparison of repeat annotations in JHU_Msex_v1.0 and Msex_1.0. (B) BUSCO results using the insecta_odb10 database, comparing the raw Canu assembly, the Nanopolish-polished Canu assembly, the Racon- and Nanopolish-polished Canu assembly, the final scaffolded and polished assembly (JHU_Msex_v1.0), and the Msex_1.0 assembly.
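Figure 1A plots NGx rather than Nx. NGx is computed like N50, except that the cumulative length of the length-sorted scaffolds is compared with x% of an estimated genome size instead of x% of the assembly's own total length, so assemblies of different total size can share one axis. A minimal sketch, assuming scaffold lengths in bp and a genome-size estimate supplied by the user (the value used for the figure is not given in the caption); the function names are hypothetical:

```python
from typing import Dict, Optional, Sequence


def ngx(lengths: Sequence[int], genome_size: int, x: float) -> Optional[int]:
    """Length L such that sequences of length >= L cover at least x% of
    the estimated genome size. Returns None if that fraction is never reached."""
    target = genome_size * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def ngx_curve(lengths: Sequence[int], genome_size: int,
              step: int = 10) -> Dict[int, Optional[int]]:
    """NGx at x = step, 2*step, ..., 100: the data behind an NGx plot."""
    return {x: ngx(lengths, genome_size, x) for x in range(step, 101, step)}


# Toy scaffold lengths (hypothetical) against a genome-size estimate of 100 bp.
print(ngx_curve([50, 30, 10, 5, 5], genome_size=100))
```

To mimic the figure's restriction to the 30 largest scaffolds, one could pass `sorted(lengths, reverse=True)[:30]` while keeping `genome_size` fixed; x values beyond the fraction those scaffolds cover then return `None`.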
Final assembly statistics
| Statistic | JHU_Msex_V1.0 | Msex_1.0 |
|---|---|---|
| Total contig length (bp) | 468,966,500 | 399,658,702 |
| Number of contigs | 6,517 | 38,553 |
| Contig N50 (bp) | 402,416 | 40,289 |
| Largest contig (bp) | 2,620,325 | 401,033 |
| Total scaffold length (bp) | 470,051,811 | 419,427,777 |
| Number of scaffolds | 4,057 | 20,869 |
| Scaffold N50 (bp) | 14,248,853 | 664,006 |
| Largest scaffold (bp) | 21,114,789 | 3,253,989 |
A comparison of assembly continuity metrics between the JHU_Msex_V1.0 and Msex_1.0 assemblies. Contig and scaffold N50 are length-weighted median statistics: 50% of the total assembly length is contained in contigs or scaffolds equal to or longer than this value, so larger N50 values indicate more continuous assemblies. We note a scaffold N50 that is ~21-fold greater, a contig N50 that is ~10-fold greater, and a largest scaffold ~6.5 times larger in JHU_Msex_V1.0 than in Msex_1.0.
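The N50 definition and the fold differences between the two assemblies can be checked in a few lines. The `n50` helper is a generic sketch of the length-weighted median, not the authors' tool, and the dictionaries simply copy values from the Final assembly statistics table.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Largest L such that sequences of length >= L contain at least half
    of all assembled bases (a length-weighted median)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached 50% of total assembled length
            return length
    raise ValueError("empty assembly")


# Values copied from the 'Final assembly statistics' table (bp).
jhu = {"contig_n50": 402_416, "scaffold_n50": 14_248_853,
       "largest_scaffold": 21_114_789}
old = {"contig_n50": 40_289, "scaffold_n50": 664_006,
       "largest_scaffold": 3_253_989}

for key in jhu:
    print(f"{key}: {jhu[key] / old[key]:.1f}-fold")
# contig_n50: 10.0-fold
# scaffold_n50: 21.5-fold
# largest_scaffold: 6.5-fold
```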
Figure 2. Phylogenetic relationships. (A) UpSet plot illustrating the number of orthogroups shared among the six species. Bars representing fewer than 100 orthogroups were removed. (B) Phylogenetic tree generated from orthogroup comparisons. Divergence ages in Mya (million years ago) were taken from Kawahara.
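Each bar in an UpSet plot typically counts the orthogroups shared by exactly one combination of species (an exclusive intersection). A minimal sketch of that counting and of the caption's rule of dropping bars with fewer than 100 orthogroups, assuming orthogroup membership has already been inferred by some orthology tool; the orthogroup IDs and species labels are hypothetical:

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(
    membership: Dict[str, Set[str]], min_size: int = 100
) -> Dict[FrozenSet[str], int]:
    """Count orthogroups per exact species combination, as UpSet bars do.

    membership maps an orthogroup ID to the set of species with at least one
    gene in it. Each orthogroup is counted once, under exactly the species that
    contain it. Combinations with fewer than min_size orthogroups are dropped.
    """
    counts = Counter(frozenset(species) for species in membership.values())
    return {combo: n for combo, n in counts.items() if n >= min_size}


# Hypothetical toy input (real data would have six species and many orthogroups).
membership = {
    "OG0001": {"M_sexta", "species_B"},
    "OG0002": {"M_sexta", "species_B"},
    "OG0003": {"M_sexta"},
}
print(exclusive_intersections(membership, min_size=2))
```

Using `frozenset` as the key makes each species combination hashable and order-independent, so `{"A", "B"}` and `{"B", "A"}` land in the same bar.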
Figure 3. Gene expression clustering. Top panel: percentage of reads from each RNA-seq library that aligned to either JHU_Msex_v1.0 or Msex_1.0. The expression matrix shows Z-scores of rlog-transformed expression for all highly expressed genes. Genes were grouped into 18 clusters by clustering their Z-scores on Euclidean distance. Libraries are in the order shown in Supplementary Data S1. Statistically significant enriched GO terms for each cluster are listed in Supplementary Data S2. Results can also be shown as a gene expression matrix with libraries ordered by increasing developmental stage (Supplementary Figure S6).
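The caption describes row-wise Z-scores of rlog-transformed expression followed by Euclidean-distance clustering. A minimal pure-Python sketch of those two steps follows. It assumes the rlog values were already computed (for example with DESeq2's `rlog()` in R) and assumes average-linkage hierarchical clustering, since the caption does not name the algorithm or linkage; the toy matrix is hypothetical and uses k=2 rather than 18.

```python
import math
from statistics import mean, stdev
from typing import List


def row_zscores(matrix: List[List[float]]) -> List[List[float]]:
    """Standardize each gene (row) across libraries: (x - row mean) / row SD.
    Uses the sample SD (n - 1), as R's scale() does; zero-variance rows become 0.0.
    Each row needs at least two libraries."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def average_linkage(rows: List[List[float]], k: int) -> List[List[int]]:
    """Naive agglomerative clustering on Euclidean distance with average linkage:
    repeatedly merge the two closest clusters until k remain. Returns row indices."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None  # (distance, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(math.dist(rows[a], rows[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical rlog values: 4 genes x 3 libraries.
rlog = [[1, 2, 3], [10, 20, 30], [3, 2, 1], [30, 20, 10]]
print(average_linkage(row_zscores(rlog), k=2))  # -> [[0, 1], [2, 3]]
```

Because the Z-score removes each gene's overall expression level, genes 0 and 1 (and genes 2 and 3) cluster together despite a 10-fold difference in magnitude. For thousands of genes, a library implementation such as `scipy.cluster.hierarchy` would be the practical choice.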
Figure 4. Gene expression in the midgut. (A) (Left) Heatmap of Z-scores for expression of digestive proteases. (Right) Quantification of midgut digestive protease expression throughout development. Library naming nomenclature was derived from Cao and Jiang (2017). The first part of each library name indicates that the library was made from midgut (G). The second part indicates the major stage of the insect, i.e., embryo (E), 1st to 5th instar larvae (L1–L5), pupae (P), and adults (A). In the third part, “D” stands for day, “h” for hour, “preW” for pre-wandering, and “W” for wandering. A trailing “S” indicates single-end sequencing; names without it indicate paired-end sequencing. The libraries are as follows: midgut (G) (2nd L; 3rd L; 4th L, 0 h; 4th L, 12 h; 4th L, late; 5th L, 1–3 h; 5th L, 24 h; 5th L, preW; 5th L, W; P, D1; P, D15–18; A, D3–5). (B) (Left) Heatmap of expression Z-scores of midgut autophagy genes throughout development. Gene names were assigned based on the NCBI annotation of GCA_000262585.1. Genes not present in this annotation were functionally annotated with InterProScan 5 and assigned gene names. All genes in the heatmap were annotated with the autophagy Gene Ontology term (GO:0006914).
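The library naming scheme in this caption is effectively a small structured code (tissue, stage, optional time point, optional single-end flag). The sketch below parses names into those parts. The dot-separated layout (e.g. "G.L5.preW.S") is a hypothetical assumption for illustration, since the caption lists the codes but not the exact separators used by Cao and Jiang (2017); `parse_library` and `Library` are likewise hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Codes from the caption; the dot-separated layout below is a hypothetical assumption.
TISSUES = {"G": "midgut"}
STAGES = {"E": "embryo", "P": "pupa", "A": "adult",
          "L1": "1st instar larva", "L2": "2nd instar larva",
          "L3": "3rd instar larva", "L4": "4th instar larva",
          "L5": "5th instar larva"}

# tissue . stage [. time point] [. S]   e.g. "G.L5.preW", "G.P.D1.S"
# (?!S$) stops a lone trailing "S" from being read as a time point.
NAME = re.compile(
    r"^(?P<tissue>[A-Z])\.(?P<stage>E|L[1-5]|P|A)"
    r"(?:\.(?!S$)(?P<time>[^.]+))?(?:\.(?P<single>S))?$"
)


@dataclass
class Library:
    tissue: str
    stage: str
    timepoint: Optional[str]  # raw code such as "D1", "0h", "preW", "W"
    paired_end: bool


def parse_library(name: str) -> Library:
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized library name: {name!r}")
    return Library(
        tissue=TISSUES.get(m["tissue"], m["tissue"]),
        stage=STAGES[m["stage"]],
        timepoint=m["time"],
        paired_end=m["single"] is None,  # no trailing "S" means paired-end
    )


print(parse_library("G.L5.preW.S"))
```

Keeping the time point as a raw string avoids guessing how ranges such as "D15–18" should be normalized.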