Hao Yuan¹, Xue Zhang¹, Lina Zhao¹, Huihui Chang¹, Chao Yang¹,², Zhongying Qiu³, Yuan Huang⁴.
Abstract
Acrididae are diverse in size, body shape, behavior, ecology and life history; widely distributed; easy to collect; and important to agriculture. They are promising candidate models for functional genomics, but their extremely large genomes have hindered this research; establishing a reference transcriptome is therefore the primary means of obtaining genetic information for a species. Here, two Acrididae species, Gomphocerus licenti and Mongolotettix japonicus, were selected for full-length (FL) PacBio transcriptome sequencing. For G. licenti and M. japonicus, respectively, 590,112 and 566,165 circular consensus sequences (CCS) were generated, from which 458,131 and 428,979 full-length nonchimeric (FLNC) reads were identified. After isoform-level clustering, error correction with next-generation sequencing (NGS) short reads, and removal of redundant sequences with CD-HIT, 17,970 and 16,766 unigenes were generated for G. licenti and M. japonicus, respectively. In addition, by analyzing the transcriptomes of G. licenti and M. japonicus, we obtained 17,495 and 16,373 coding sequences, 1,082 and 813 transcription factors, 11,840 and 10,814 simple sequence repeats (SSRs), and 905 and 706 long noncoding RNAs (lncRNAs), respectively, and 15,803 and 14,846 unigenes were annotated in eight functional databases. This is the first study to sequence the FL transcriptomes of G. licenti and M. japonicus, and it provides valuable genetic resources for further functional genomics research.
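The redundancy-removal step can be illustrated with a minimal Python sketch of greedy, longest-first clustering, the general strategy CD-HIT is built on. This is not CD-HIT itself: the function names are hypothetical, and a sequence is treated as redundant only when it is an exact substring of a longer representative on either strand, whereas CD-HIT uses short-word filtering and alignment-based identity and coverage thresholds.

```python
# Hypothetical sketch of greedy longest-first clustering (NOT the real CD-HIT).
# Assumption: "redundant" = exact substring of a kept representative, either strand.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_redundant(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Cluster sequences greedily; return representative ID -> member IDs."""
    clusters: dict[str, list[str]] = {}
    # Longest first, so each sequence is only compared with longer representatives.
    for seq_id in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        seq = seqs[seq_id]
        rc = reverse_complement(seq)
        for rep_id in clusters:
            rep = seqs[rep_id]
            if seq in rep or rc in rep:
                clusters[rep_id].append(seq_id)  # absorbed by an existing cluster
                break
        else:
            clusters[seq_id] = [seq_id]  # no match: becomes a new representative
    return clusters


isoforms = {
    "iso1": "ATGGCGTACGTTAGC",
    "iso2": "GCGTACGTT",   # contained in iso1 on the forward strand
    "iso3": "AACGTACGC",   # its reverse complement is contained in iso1
    "iso4": "CCCCCCCCCC",  # unrelated
}
print(remove_redundant(isoforms))
# {'iso1': ['iso1', 'iso2', 'iso3'], 'iso4': ['iso4']}
```

Only the cluster representatives (here iso1 and iso4) would be kept as non-redundant sequences.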
Year: 2020 PMID: 32848169 PMCID: PMC7450073 DOI: 10.1038/s41598-020-71178-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of the full-length transcriptomes of G. licenti and M. japonicus obtained by PacBio sequencing.
| Parameter | G. licenti | M. japonicus |
|---|---|---|
| Clean data (Gb) | 34.16 | 34.55 |
| Number of CCS | 590,112 | 566,165 |
| CCS read bases | 1,833,944,798 | 1,774,198,050 |
| Mean CCS read length (bp) | 3,107 | 3,133 |
| Number of full-length non-chimeric (FLNC) reads | 458,131 | 428,979 |
| Number of non-full-length reads | 131,027 | 136,304 |
| Number of filtered short reads | 954 | 882 |
| FLNC percentage (%) | 77.63 | 75.77 |
| Number of consensus isoforms | 29,340 | 25,379 |
| Mean consensus isoform length (bp) | 2,995 | 2,943 |
| Number of polished high-quality isoforms | 28,736 | 24,831 |
| Number of polished low-quality isoforms | 601 | 544 |
| Polished high-quality isoforms (%) | 97.94 | 97.86 |
| Number of unigenes | 17,932 | 16,739 |
| Mean unigene length (bp) | 3,000 | 2,933 |
| Shortest unigene length (bp) | 200 | 200 |
| Longest unigene length (bp) | 14,592 | 12,710 |
| N50 length (bp) | 3,605 | 3,503 |
| GC content (%) | 42.2 | 42.6 |
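The read accounting in the table can be checked with a short Python sketch: every CCS read falls into exactly one of the FLNC, non-full-length, or filtered-short classes, and the FLNC percentage is FLNC reads divided by CCS reads. The `n50` helper uses the standard N50 definition (the length at which sequences of that length or longer cover at least half of all bases); that the authors computed N50 exactly this way is an assumption, and both helper names are hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` made up by `part`, in percent, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


def n50(lengths: list[int]) -> int:
    """Standard N50: the length L such that sequences >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Read accounting for the G. licenti column of the table.
ccs, flnc, non_fl, filtered_short = 590_112, 458_131, 131_027, 954
assert flnc + non_fl + filtered_short == ccs  # each CCS read is in exactly one class
print(percent(flnc, ccs))  # 77.63, as in the table
print(n50([100, 200, 300, 400]))  # 300: 400 + 300 = 700 bases >= half of 1,000
```

The same check holds for the M. japonicus column (428,979 + 136,304 + 882 = 566,165).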
Figure 1. Comparison of unigene length distributions between PacBio sequencing and Illumina sequencing.
Figure 2. Length distributions of the complete encoded protein sequences of G. licenti and M. japonicus.
Figure 3. Classification of the top 20 transcription factor (TF) families. (a) G. licenti. (b) M. japonicus.
Figure 4. Densities of different types of SSRs and Venn diagrams of the numbers of lncRNAs identified by CPC, CNCI, Pfam and CPAT. (a) Densities of different types of SSRs in G. licenti. (b) Densities of different types of SSRs in M. japonicus. (c) Venn diagram showing the numbers of lncRNAs identified in G. licenti. (d) Venn diagram showing the numbers of lncRNAs identified in M. japonicus.
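The consensus lncRNA set summarized in the Venn diagrams can be sketched as a set intersection, assuming the common practice of keeping only transcripts that all four tools (CPC, CNCI, Pfam and CPAT) classify as noncoding. The SSR helper assumes density is reported as SSRs per megabase of analysed sequence. Function names and toy inputs are hypothetical.

```python
def consensus_lncrnas(noncoding_calls: dict[str, set[str]]) -> set[str]:
    """Transcripts called noncoding by every tool (the centre of the Venn diagram).

    Assumption: for Pfam, "noncoding" means no significant protein-domain hit.
    """
    if not noncoding_calls:
        raise ValueError("need at least one tool's predictions")
    return set.intersection(*noncoding_calls.values())


def ssr_density_per_mb(ssr_counts: dict[str, int], total_bases: int) -> dict[str, float]:
    """SSRs of each motif type per megabase of analysed sequence (assumed definition)."""
    megabases = total_bases / 1_000_000
    return {motif: count / megabases for motif, count in ssr_counts.items()}


calls = {
    "CPC": {"t1", "t2", "t3"},
    "CNCI": {"t1", "t2"},
    "Pfam": {"t1", "t2", "t4"},
    "CPAT": {"t1", "t2", "t5"},
}
print(sorted(consensus_lncrnas(calls)))  # ['t1', 't2']
print(ssr_density_per_mb({"mono": 50, "di": 20}, total_bases=2_000_000))
# {'mono': 25.0, 'di': 10.0}
```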
Annotation of the full-length transcript datasets against public databases.
| Annotated database | G. licenti unigenes | G. licenti (%) | M. japonicus unigenes | M. japonicus (%) |
|---|---|---|---|---|
| COG | 6,108 | 33.99 | 5,724 | 34.14 |
| GO | 8,065 | 44.88 | 7,637 | 45.55 |
| KEGG | 8,115 | 45.16 | 7,338 | 43.77 |
| KOG | 11,921 | 66.34 | 11,246 | 67.08 |
| Pfam | 13,686 | 76.16 | 13,046 | 77.81 |
| Swiss-Prot | 11,179 | 62.21 | 10,717 | 63.92 |
| EggNOG | 15,059 | 83.80 | 14,238 | 84.92 |
| NR | 15,594 | 86.78 | 14,669 | 87.49 |
| All annotated | 15,803 | 87.94 | 14,846 | 88.55 |
| All analysed | 17,970 | 100.00 | 16,766 | 100.00 |
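The percentages in the annotation table are unigene counts divided by all analysed unigenes, and the "All annotated" row is consistent with being the union of the per-database hits, which is why it exceeds every single database, including NR. The Python sketch below reproduces that bookkeeping; the function names and unigene IDs are hypothetical.

```python
def pct(count: int, total: int) -> float:
    """Percentage of all analysed unigenes, rounded to 2 decimals."""
    return round(100 * count / total, 2)


def annotation_summary(
    hits: dict[str, set[str]], n_analysed: int
) -> dict[str, tuple[int, float]]:
    """Per-database (count, %) plus an 'All annotated' row built as the union of hits."""
    summary = {db: (len(ids), pct(len(ids), n_analysed)) for db, ids in hits.items()}
    annotated = set().union(*hits.values())  # a unigene counts once, however many DBs hit it
    summary["All annotated"] = (len(annotated), pct(len(annotated), n_analysed))
    return summary


toy_hits = {"NR": {"u1", "u2", "u3"}, "GO": {"u2", "u4"}, "KEGG": set()}
print(annotation_summary(toy_hits, n_analysed=5))
# {'NR': (3, 60.0), 'GO': (2, 40.0), 'KEGG': (0, 0.0), 'All annotated': (4, 80.0)}
print(pct(15_803, 17_970))  # 87.94, the G. licenti "All annotated" value
```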
Figure 5. Distribution of species with homologous sequences in NR. (a) G. licenti. (b) M. japonicus.
Figure 6. KEGG pathway classifications (Kanehisa, M. & Goto, S., 2000) of all annotated unigenes. (a) G. licenti. (b) M. japonicus.
Figure 7. Distribution of GO terms for all annotated unigenes. (a) G. licenti. (b) M. japonicus.