Lu Zhao, Hang Wang, Ping Li, Kuo Sun, De-Long Guan, Sheng-Quan Xu.
Abstract
Sphingonotus Fieber, 1852 (Orthoptera: Acrididae), is a grasshopper genus comprising approximately 170 species, all of which prefer dry environments such as deserts, steppes, and stony benchlands. In this study, we aimed to examine the adaptation of grasshopper species to arid environments. The genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species was produced. The genome size of S. tsinlingensis is approximately 12.8 Gb. Based on 146.98 Gb of PacBio sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these, 88,693 non-redundant isoforms were identified with an N50 value of 2,726 bp, which is markedly longer than the N50 values of previous grasshopper transcriptome assemblies. In total, 48,502 protein-coding sequences were identified, and 37,569 were annotated using public gene function databases. Moreover, 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were identified. According to gene functions, 61 cytochrome P450 (CYP450) and 66 heat shock protein (HSP) genes, which may be associated with drought adaptation of S. tsinlingensis, were identified. We compared the transcriptome of S. tsinlingensis with those of two other grasshopper species that are less tolerant of drought, namely Mongolotettix japonicus and Gomphocerus licenti. The expression levels of CYP450 and HSP genes were higher in S. tsinlingensis. We produced the first full-length transcriptome of a Sphingonotus species that has an ultra-large genome. The assembly characteristics were better than those of all known grasshopper transcriptomes. This full-length transcriptome may thus be used to understand the genetic background and evolution of grasshoppers.
Keywords: PacBio isoform sequencing; Sphingonotus Fieber; gene functions; genetic background; grasshoppers
Year: 2021 PMID: 34322153 PMCID: PMC8313316 DOI: 10.3389/fgene.2021.678625
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Comparison of transcriptome assemblies and gene numbers with the eight published Acrididae transcriptomes.
| Species name | Total transcript length (bp) | Transcript number | Average length (bp) | N50 (bp) | Genome size (Gb) |
|---|---|---|---|---|---|
| | 86,939,307 | 82,251 | 1,057 | 1,357 | 8.95 |
| | 136,517,140 | 96,643 | 1,412 | 2,371 | ∼9 |
| | 199,205,336 | 126,643 | 1,572 | 2,671 | N.A. |
| | 39,306,387 | 135,320 | 290 | 428 | 8.55–8.96 |
| | 392,472,062 | 607,901 | 646 | N.A. | 5.28–6.44 |
| | 112,816,350 | 70,581 | 1,598.4 | 2,434 | N.A. |
| | 16,884,056 | 27,004 | 625 | 1,031 | N.A. |
| | 212,567,026 | 1,564,070 | 478 | 424 | ∼10 |
| Sphingonotus tsinlingensis | 221,466,421 | 88,693 | 2,497 | 2,726 | 12.81 |
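The average length and N50 columns in the table can be recomputed from a list of transcript lengths. The following is a minimal sketch of the conventional definitions. The function names and the toy lengths are illustrative assumptions and do not come from the study.

```python
def mean_length(lengths):
    """Arithmetic mean of sequence lengths (bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(lengths) / len(lengths)


def n50(lengths):
    """N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the paper.
toy_lengths = [500, 1200, 2000, 2700, 3100, 4000]
toy_mean = mean_length(toy_lengths)
toy_n50 = n50(toy_lengths)
```

For the toy input, half of the 13,500 bp total is reached at the 3,100 bp transcript, so the N50 is 3,100 bp while the mean is 2,250 bp. Because N50 weights long sequences more heavily, it usually exceeds the average length, as it does in most rows of the table.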
FIGURE 1. Statistics on the functional annotation of all Sphingonotus tsinlingensis protein-coding sequences (CDSs) in seven common databases (A), and a Venn diagram showing the functional annotation of S. tsinlingensis transcripts in the five most commonly used databases (B). NR, non-redundant protein sequence; KEGG, Kyoto Encyclopedia of Genes and Genomes; KOG, EuKaryotic Orthologous Groups; GO, gene ontology; NT, non-redundant nucleotide sequences; Pfam, protein family.
FIGURE 2. Expression levels of CYP450 and HSP genes in Sphingonotus tsinlingensis, Mongolotettix japonicus, and Gomphocerus licenti. The y-axis indicates expression in fragments per kilobase of transcript per million mapped fragments (FPKM).
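The FPKM unit on the y-axis of Figure 2 normalizes a fragment count by transcript length and by sequencing depth. The sketch below uses the standard FPKM formula. The function name and the example numbers are hypothetical, and the record does not state which tool the authors used to compute FPKM.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments              -- fragments assigned to this transcript
    transcript_length_bp   -- transcript length in base pairs
    total_mapped_fragments -- all mapped fragments in the library
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
example_fpkm = fpkm(500, 2000, 10_000_000)
```

With these made-up inputs the value is 25 FPKM. Doubling either the transcript length or the library size halves the value, which is what makes FPKM comparable across transcripts within a library.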
FIGURE 3. Venn diagram showing the distribution of long non-coding RNAs (lncRNAs) identified in Sphingonotus tsinlingensis. CNCI, Coding–Non-Coding Index; Pfam, protein family; PLEK, predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme; CPC, coding potential calculator.
FIGURE 4. Gene ontology (GO) enrichment of the genes identified as transcription factors (TFs). The rich factor is the ratio of the number of differentially expressed transcripts to the total number of annotated transcripts in a GO term. The Q-value is the P-value corrected for multiple hypothesis testing and ranges from 0 to 1. The closer the Q-value is to zero, the more significant the enrichment.
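The two quantities defined in the Figure 4 caption can be illustrated with a short sketch. The caption does not name the multiple-testing correction, so the Benjamini-Hochberg procedure used here is an assumption, chosen only because it is a common way to obtain Q-values. All function names and input numbers are hypothetical.

```python
def rich_factor(hits_in_term, annotated_in_term):
    """Ratio of transcripts of interest in a GO term to all
    annotated transcripts in that term."""
    if annotated_in_term <= 0:
        raise ValueError("annotated_in_term must be positive")
    return hits_in_term / annotated_in_term


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (assumed correction;
    the source does not state which method was used)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical inputs, not values from the study.
example_rf = rich_factor(12, 48)
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Each adjusted value is at least as large as its raw P-value and never exceeds 1, which matches the 0 to 1 range given in the caption.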