Yu Bai (1,2,3), Yonglu Meng (2,3), Jianlin Luo (2,3), Hui Wang (2,3), Guoyong Li (2,3), Can Li (2,3).
Abstract
The Chinese giant salamander, Andrias davidianus, is the largest amphibian species in the world and is economically and ecologically important. The skin of A. davidianus exhibits complex structural and functional adaptations that facilitate survival in both aquatic and terrestrial ecosystems. Here, we report the first full-length amphibian transcriptome, obtained from the dorsal skin of A. davidianus by hybrid sequencing on the PacBio and Illumina platforms. A total of 153,038 transcripts were obtained by hybrid assembly (mean length, 2039 bp; N50, 2172 bp), of which 133,794 were annotated in at least one database (nr, Swiss-Prot, KEGG, KOGs, GO, or nt). In total, 58,732, 68,742, and 115,876 transcripts were classified into 24 KOG categories, 1903 GO term categories, and 46 KEGG pathways (level 2), respectively. We also identified 207,627 protein-coding regions, 785 transcription factors, 27,237 potential long non-coding RNAs, and 8299 simple sequence repeats (SSRs). Compared with transcriptomes assembled by next-generation sequencing in previous studies, the hybrid-assembled transcriptome recovered more full-length transcripts and had a longer N50 contig length and a higher annotation rate of unique genes. The high-quality, full-length reference gene set generated in this study will help elucidate the genetic characteristics of A. davidianus skin and aid the identification of functional skin proteins.
Keywords: Andrias davidianus; Chinese giant salamander; Illumina; PacBio; hybrid sequencing; skin transcriptome
Year: 2021 PMID: 34282833 PMCID: PMC8329649 DOI: 10.1042/BSR20210511
Source DB: PubMed Journal: Biosci Rep ISSN: 0144-8463 Impact factor: 3.840
Summary of transcript assembly information
| Library | 0.5–4 kb | >4 kb | After clustering | After correction |
|---|---|---|---|---|
| Number of sequences | 219,021 | 98,405 | 153,038 | 153,038 |
|  | 202,602 (92.5%) | 77,445 (78.7%) | 147,243 (96.2%) | 147,243 (96.2%) |
|  | 16,419 (7.5%) | 20,960 (21.3%) | 5795 (3.8%) | 5795 (3.8%) |
| GC content | 48.20% | 53.22% | 49.17% | 49.17% |
| Total length (bp) | 446,609,747 | 451,427,874 | 402,093,303 | 402,093,303 |
| Mean length (bp) | 2039.1 | 4587.4 | 2627.4 | 2627.4 |
| N50 (bp) | 2172 | 4739 | 3432 | 3432 |
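Several rows of this table follow standard definitions: mean length is total length divided by the number of sequences (for example, 402,093,303 / 153,038 ≈ 2627.4 bp), and N50 is the length L such that sequences of length ≥ L together contain at least half of all assembled bases. The sketch below computes these statistics from a list of sequences. `assembly_stats` and `AssemblyStats` are hypothetical names, and GC content is assumed to be computed over all bases, including any ambiguous ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AssemblyStats:
    count: int
    total_length: int
    gc_percent: float
    mean_length: float
    n50: int


def n50(lengths: List[int]) -> int:
    """Smallest length L such that sequences >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(seqs: List[str]) -> AssemblyStats:
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    # ASSUMPTION: GC content uses all bases (including N) as the denominator.
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return AssemblyStats(
        count=len(seqs),
        total_length=total,
        gc_percent=100 * gc / total if total else 0.0,
        mean_length=total / len(seqs) if seqs else 0.0,
        n50=n50(lengths),
    )


if __name__ == "__main__":
    print(assembly_stats(["ATGC" * 25, "AT" * 100, "GC" * 150]))
```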
Figure 1. Length distribution of assembled and annotated transcripts
Figure 2. Length distribution of assembled and annotated transcripts
Figure 3. Distribution of CDS length
Figure 4. Top 29 out of the 47 TF families
Figure 5. Distribution of SSR motifs
Figure 6. KOG classification of A. davidianus transcripts
(A) RNA processing and modification; (B) Chromatin structure and dynamics; (C) Energy production and conversion; (D) Cell cycle control, cell division, chromosome partitioning; (E) Amino acid transport and metabolism; (F) Nucleotide transport and metabolism; (G) Carbohydrate transport and metabolism; (H) Coenzyme transport and metabolism; (I) Lipid transport and metabolism; (J) Translation, ribosomal structure and biogenesis; (K) Transcription; (L) Replication, recombination and repair; (M) Cell wall/membrane/envelope biogenesis; (N) Cell motility; (O) Posttranslational modification, protein turnover, chaperones; (P) Inorganic ion transport and metabolism; (Q) Secondary metabolites biosynthesis, transport and catabolism; (R) General function prediction only; (S) Function unknown; (T) Signal transduction mechanisms; (U) Intracellular trafficking, secretion, and vesicular transport; (V) Defense mechanisms; (W) Extracellular structures; (X) Unnamed protein; (Y) Nuclear structure; (Z) Cytoskeleton.
Figure 7. GO classification of unigenes
Figure 8. KEGG classification of the transcript isoforms
Summary of transcript assembly information
| Sequencing technology | Hybrid (This study) | Illumina (This study) | Illumina [ | Illumina [ | Illumina [ | Illumina [ |
|---|---|---|---|---|---|---|
| Number of transcripts/unigenes | 153,038 | 127,992 | 132,912 | 167,064 | 93,366 | 87,297 |
| GC content | 49.17% | 47.96% | 49.85% | NA | 47.96% | NA |
|  | 17,552 | 15,158 | 16,067 | NA | 128,777 | NA |
| Mean length (bp) | 2627.4 | 1176 | 690 | 647.84 | 1326 | 734.80 |
| N50 (bp) | 3432 | 2076 | 1263 | 956 | 2409 | 1216 |
|  | 1559 | 476 | 262 | NA | 502 | NA |
| Annotated rate | 87.43% | 48.50% | 29.85% | 33.93% | 44.85% | 40.06% |
Annotated rate: percentage of unigenes annotated in at least one of the nr, Swiss-Prot, KEGG, KOGs, GO, and nt databases. NA: not available.
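Because a sequence counts as annotated if it has a hit in at least one database, the annotated rate is the size of the union of the per-database hit sets divided by the total number of sequences; for the hybrid assembly, 133,794 / 153,038 ≈ 87.43%. A minimal sketch, assuming each database search has already been reduced to a set of sequence IDs (a hypothetical input format; `annotated_rate` is likewise a hypothetical helper):

```python
from typing import Mapping, Set


def annotated_rate(total: int, hits_by_db: Mapping[str, Set[str]]) -> float:
    """Percentage of sequences with a hit in at least one database (set union)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # A sequence hit in several databases is counted only once.
    annotated = set().union(*hits_by_db.values())
    return 100 * len(annotated) / total


if __name__ == "__main__":
    hits = {"nr": {"t1", "t2"}, "GO": {"t2", "t3"}, "KEGG": set()}
    print(annotated_rate(5, hits))
```

Taking the union rather than summing per-database counts avoids double-counting transcripts annotated in several databases, which is why the combined rate cannot exceed 100%.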
Mapping rate of Illumina short reads
| Transcripts | SRX4453287 (This study) | SRR5344016 [ | SRR4449143 [ | SRR4449153 [ | SRR4449117 [ | SRR1609131 [ |
|---|---|---|---|---|---|---|
|  | 80.32% | 81.4% | 81.41% | 65.4% | 71.72% | 80.73% |
|  | 78.07% | 77.85% | 79.6% | 68.16% | 72.36% | 81.64% |
|  | 77.39% | 80.43% | 82.64% | 79.58% | 81.18% | 81.50% |
SRX4453287 was derived from dorsal skin tissue; SRR5344016 from skin tissue; SRR4449143 from dorsal skin tissue; SRR4449153 from lateral skin tissue; SRR4449117 from abdominal skin tissue; and SRR1609131 from skin tissue. Transcripts were hybrid assembled, whereas unigenes were assembled from Illumina short reads. CGS_All_Unigene_filter.fa, which was assembled from more than 20 tissues, was downloaded from Xiaofang Geng et al. [11] (http://gigadb.org/dataset/100277).
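The excerpt does not name the aligner or define the mapping rate precisely. One common definition is the share of primary read alignments that are mapped; the sketch below computes it from SAM text lines using the FLAG bits defined in the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). `mapping_rate` is a hypothetical helper, and the aligner summaries behind the percentages in the table may count paired reads differently.

```python
from typing import Iterable

# FLAG bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Percentage of primary alignment records that are mapped (ASSUMED definition)."""
    primary = mapped = 0
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip blank lines and header lines
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        primary += 1
        if not (flag & UNMAPPED):
            mapped += 1
    return 100 * mapped / primary if primary else 0.0


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6",
        "r1\t0\tt1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r3\t16\tt1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t256\tt2\t5\t0\t4M\t*\t0\t0\tACGT\tIIII",
        "r4\t0\tt1\t9\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    print(mapping_rate(demo))
```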