Jianping Jiang, Juan Huo, Yueyun Zhang, Yongli Xu, Chengjian Zhao, Jianhua Miao.
Abstract
The tokay gecko (Gekko gecko) is a rare and endangered medicinal animal in China. Its dried body has been used as an anti-asthmatic agent for two thousand years. To date, the genome and transcriptome of this species remain poorly understood. Here, we adopted single-molecule real-time (SMRT) sequencing to obtain full-length transcriptome data and characterized the transcriptome structure. We identified 882,273 circular consensus sequence (CCS) reads, including 746,317 full-length nonchimeric (FLNC) reads. The transcript cluster analysis revealed 212,964 consensus sequences, including 203,994 high-quality isoforms. In total, 111,372 of 117,888 transcripts were successfully annotated against eight databases (Nr, eggNOG, Swiss-Prot, GO, COG, KOG, Pfam and KEGG). Furthermore, 23,877 alternative splicing events, 169,128 simple sequence repeats (SSRs), 10,437 lncRNAs and 7,932 transcription factors were predicted across all transcripts. To our knowledge, this report is the first to document the G. gecko transcriptome using SMRT sequencing. The full-length transcript data might accelerate transcriptome research and lay the foundation for further research on G. gecko.
Year: 2022 PMID: 35213661 PMCID: PMC8880673 DOI: 10.1371/journal.pone.0264499
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Description of the BUSCO analysis.
| BUSCO results | Count | Percentage |
|---|---|---|
| Complete BUSCOs (C) | 2,397 | 92.60% |
| Complete and single-copy BUSCOs (S) | 645 | 24.90% |
| Complete and duplicated BUSCOs (D) | 1,752 | 67.70% |
| Fragmented BUSCOs (F) | 63 | 2.40% |
| Missing BUSCOs (M) | 126 | 5.00% |
| Total BUSCO groups searched | 2,586 | |
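
The BUSCO classes are related by simple arithmetic: complete BUSCOs are the sum of the single-copy and duplicated classes, and the complete, fragmented and missing classes together account for every group searched. The following is a minimal sketch of that consistency check using the counts from the table; the variable and function names are illustrative and not part of any BUSCO tooling. Recomputed percentages can differ from the published column in the last digit, which would be consistent with rounding upstream.

```python
# Counts copied from the BUSCO table above.
single_copy = 645
duplicated = 1752
fragmented = 63
missing = 126
total_groups = 2586

# Complete BUSCOs are the sum of the single-copy and duplicated classes.
complete = single_copy + duplicated

# Complete, fragmented and missing classes partition the searched groups.
assert complete + fragmented + missing == total_groups


def percent(count: int, total: int = total_groups) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in [
    ("C", complete),
    ("S", single_copy),
    ("D", duplicated),
    ("F", fragmented),
    ("M", missing),
]:
    print(f"{label}: {count} ({percent(count)}%)")
```
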
Summary of PacBio SMRT sequencing of Gekko gecko.
| Category | Dataset |
|---|---|
| Read bases of CCS | 3,430,277,475 |
| Number of CCS | 882,273 |
| Mean Read Length of CCS | 3,888 |
| Number of undesired primer reads | 79,372 |
| Number of filtered short reads | 0 |
| Number of full-length nonchimeric reads | 746,317 |
| Full-length nonchimeric percentage (FLNC%) | 84.59% |
| Number of consensus isoforms | 212,964 |
| Average consensus isoforms read length | 4,153 |
| Number of polished high-quality isoforms | 203,994 |
| Number of polished low-quality isoforms | 7,917 |
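
Two of the derived values in this summary can be reproduced from the raw counts: the mean CCS read length is the total CCS bases divided by the number of CCS reads, and FLNC% is the share of CCS reads classified as full-length nonchimeric. The sketch below assumes exactly those definitions (they match the reported figures, but the record itself does not spell them out), and the variable names are illustrative.

```python
# Values copied from the SMRT sequencing summary table above.
ccs_bases = 3_430_277_475
ccs_reads = 882_273
flnc_reads = 746_317

# Assumed definition: mean read length = total bases / number of reads.
mean_ccs_length = ccs_bases / ccs_reads

# Assumed definition: FLNC% = FLNC reads as a share of all CCS reads.
flnc_percent = 100 * flnc_reads / ccs_reads

print(f"Mean CCS read length: {mean_ccs_length:.0f} bp")
print(f"FLNC%: {flnc_percent:.2f}%")
```
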
Fig 1. Distribution of the FLNC read length.
Statistics of the annotation results.
| Annotated databases | Isoform number | Percentage |
|---|---|---|
| Nr | 111,001 | 99.67% |
| eggNOG | 109,042 | 97.91% |
| Pfam | 91,887 | 82.50% |
| GO | 84,713 | 76.06% |
| KOG | 83,361 | 74.85% |
| KEGG | 75,001 | 67.34% |
| Swiss-Prot | 73,152 | 65.68% |
| COG | 34,491 | 30.97% |
| All databases | 111,372 | 94.47% |
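
The percentages in this table appear to use two different denominators. Each per-database value matches a share of the 111,372 annotated isoforms, whereas the final row (all databases combined) matches a share of the 117,888 transcripts mentioned in the abstract. This is an inference from the arithmetic, not a statement made in the record. A short sketch that checks it, with illustrative names:

```python
# Isoform counts and reported percentages copied from the table above.
reported = {
    "Nr": (111_001, 99.67),
    "eggNOG": (109_042, 97.91),
    "Pfam": (91_887, 82.50),
    "GO": (84_713, 76.06),
    "KOG": (83_361, 74.85),
    "KEGG": (75_001, 67.34),
    "Swiss-Prot": (73_152, 65.68),
    "COG": (34_491, 30.97),
}
annotated_total = 111_372   # isoforms annotated in at least one database
transcript_total = 117_888  # transcripts submitted for annotation (from the abstract)

# Assumption being tested: per-database rows are relative to annotated_total.
recomputed = {
    name: 100 * count / annotated_total for name, (count, _) in reported.items()
}
for name, (count, reported_pct) in reported.items():
    print(f"{name}: recomputed {recomputed[name]:.2f}%, reported {reported_pct:.2f}%")

# Assumption being tested: the combined row is relative to transcript_total.
overall_percent = 100 * annotated_total / transcript_total
print(f"All databases: recomputed {overall_percent:.2f}%")
```
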
Fig 2. (A) The species identified by a homology search against the Nr database. (B) GO annotation and (C) COG annotation of the G. gecko transcriptome.
The top 20 mapped pathways annotated by the KEGG database.
| Pathways | Pathway ID | Gene number | Percentage |
|---|---|---|---|
| Endocytosis | ko04144 | 2,464 | 3.29% |
| Focal adhesion | ko04510 | 1,564 | 2.09% |
| MAPK signaling pathway | ko04010 | 1,522 | 2.03% |
| Regulation of actin cytoskeleton | ko04810 | 1,497 | 2.00% |
| Tight junction | ko04530 | 1,466 | 1.95% |
| Herpes simplex infection | ko05168 | 1,431 | 1.91% |
| Protein processing in endoplasmic reticulum | ko04141 | 1,307 | 1.74% |
| Phagosome | ko04145 | 1,239 | 1.65% |
| RNA transport | ko03013 | 1,199 | 1.60% |
| Purine metabolism | ko00230 | 1,167 | 1.56% |
| Insulin signaling pathway | ko04910 | 1,116 | 1.49% |
| Ubiquitin mediated proteolysis | ko04120 | 1,104 | 1.47% |
| mTOR signaling pathway | ko04150 | 1,076 | 1.43% |
| Spliceosome | ko03040 | 1,064 | 1.42% |
| Calcium signaling pathway | ko04020 | 1,052 | 1.40% |
| FoxO signaling pathway | ko04068 | 1,013 | 1.35% |
| Apoptosis | ko04210 | 984 | 1.31% |
| Lysosome | ko04142 | 979 | 1.31% |
| Adherens junction | ko04520 | 946 | 1.26% |
| Adrenergic signaling in cardiomyocytes | ko04261 | 941 | 1.25% |
Statistical analysis of SSRs.
| Item | Number |
|---|---|
| Total number of sequences examined | 116,842 |
| Total size of examined sequences (bp) | 517,279,084 |
| Total number of identified SSRs | 169,128 |
| Number of SSR-containing sequences | 72,630 |
| Number of sequences containing more than 1 SSR | 42,163 |
| Mononucleotides | 104,516 |
| Dinucleotides | 33,648 |
| Trinucleotides | 26,224 |
| Tetranucleotides | 4,137 |
| Pentanucleotides | 488 |
| Hexanucleotides | 115 |
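
The motif classes in this table sum to the total number of identified SSRs, and a few summary ratios follow directly from the counts. The sketch below derives the share of sequences containing at least one SSR, the average spacing between SSRs, and the share contributed by each motif class. These derived quantities are not reported in the record and are shown only as an illustration; the names are hypothetical.

```python
# Counts copied from the SSR statistics table above.
sequences_examined = 116_842
bases_examined = 517_279_084
total_ssrs = 169_128
ssr_containing_sequences = 72_630

ssrs_by_motif = {
    "mono": 104_516,
    "di": 33_648,
    "tri": 26_224,
    "tetra": 4_137,
    "penta": 488,
    "hexa": 115,
}

# The six motif classes should account for every identified SSR.
assert sum(ssrs_by_motif.values()) == total_ssrs

# Derived ratios (illustrative, not reported in the record).
ssr_sequence_percent = 100 * ssr_containing_sequences / sequences_examined
bases_per_ssr = bases_examined / total_ssrs
motif_percent = {
    motif: 100 * count / total_ssrs for motif, count in ssrs_by_motif.items()
}

print(f"Sequences with at least one SSR: {ssr_sequence_percent:.1f}%")
print(f"Average spacing: one SSR per {bases_per_ssr:.0f} bp")
for motif, share in motif_percent.items():
    print(f"{motif}: {share:.2f}%")
```
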
Fig 3. Candidate lncRNAs identified by CPC, CNCI, CPAT and Pfam.
Fig 4. (A) Length distribution of CDSs and (B) type distribution of TFs.