Yong Huang, Jian Li Xiong, Xiao Chan Gao, Xi Hong Sun.
Abstract
The Chinese giant salamander (Andrias davidianus) is an economically important animal that is also of academic value. However, genomic information for this species remains limited. In this study, transcripts of A. davidianus were obtained by RNA-seq for transcriptomic analysis. De novo assembly with Trinity generated a total of 132,912 unigenes with an average length of 690 bp and an N50 of 1263 bp. Sequence similarity searches against nine public databases (CDD, KOG, NR, NT, Pfam, Swiss-Prot, TrEMBL, GO and KEGG) annotated 24,049, 18,406, 36,711, 15,858, 20,500, 27,515, 36,705, 28,879 and 10,958 unigenes, respectively. Of these, 6323 unigenes were annotated in all databases and 39,672 unigenes were annotated in at least one database. The 10,958 unigenes annotated against KEGG were grouped into 343 categories according to pathway. In addition, 29,790 simple sequence repeats (SSRs) were identified. This study provides a valuable resource for understanding the transcriptome of A. davidianus and lays a foundation for further research on functional gene cloning, genomics, genetic diversity analysis and molecular marker development in this species.
Keywords: Andrias davidianus; Gene annotation; Transcriptome; Unigenes
Year: 2017 PMID: 29159068 PMCID: PMC5675895 DOI: 10.1016/j.gdata.2017.10.005
Source DB: PubMed Journal: Genom Data ISSN: 2213-5960
Transcriptome assembly statistics in A. davidianus.
| Category | Transcripts | Unigenes |
|---|---|---|
| Total length (bp) | 128,175,999 | 91,713,308 |
| Sequence no. | 158,103 | 132,912 |
| Sequences ≥ 500 bp | 59,715 | 42,327 |
| Sequences ≥ 1000 bp | 34,075 | 21,855 |
| N50 (bp) | 1659 | 1263 |
| Max length (bp) | 16,067 | 16,067 |
| Min length (bp) | 201 | 201 |
| Average length (bp) | 810 | 690 |
The N50 of transcripts or unigenes was calculated by ordering all sequences by length, then summing the lengths from longest to shortest until the running total exceeded 50% of the total length of all sequences; the N50 is the length of the sequence at which this threshold is reached.
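The N50 calculation described in the footnote can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `n50` and the example lengths are hypothetical, and the "exceeds 50%" stopping rule follows the footnote's wording (some tools use "reaches or exceeds" instead).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once the running sum exceeds 50% of the total length
        # (integer comparison avoids floating-point rounding).
        if running * 2 > total:
            return length


# Hypothetical toy assembly, not data from the study.
example_lengths = [10, 8, 6, 4, 2]
print(n50(example_lengths))
```

For the toy lengths above, the total is 30 bp; the running sum passes 15 bp when the 8 bp sequence is added, so the N50 is 8.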
Fig. 1. Length distribution of assembled unigenes in the A. davidianus transcriptome.
Fig. 2. GC content distribution of unigenes.
Fig. 3. Venn diagram showing the shared and database-specific annotations based on NR, KEGG, Swiss-Prot and KOG.
Functional annotation of unigenes against various databases.
| Databases | Number of unigenes | Percentage (%) |
|---|---|---|
| CDD | 24,049 | 18.09 |
| KOG | 18,406 | 13.85 |
| NR | 36,711 | 27.62 |
| NT | 15,858 | 11.93 |
| Pfam | 20,500 | 15.42 |
| Swiss-Prot | 27,515 | 20.70 |
| TrEMBL | 36,705 | 27.62 |
| GO | 28,879 | 21.73 |
| KEGG | 10,958 | 8.24 |
| Annotated in at least one database | 39,672 | 29.85 |
| Annotated in all databases | 6323 | 4.76 |
| All unigenes | 132,912 | 100 |
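The percentage column in the table above is each annotated count divided by the total of 132,912 unigenes. A minimal Python sketch that recomputes it from the reported counts is given below; the variable and function names are illustrative assumptions, and only the numbers come from the table.

```python
TOTAL_UNIGENES = 132_912  # "All unigenes" row of the table

# Annotated unigene counts per database, copied from the table.
annotated = {
    "CDD": 24_049,
    "KOG": 18_406,
    "NR": 36_711,
    "NT": 15_858,
    "Pfam": 20_500,
    "Swiss-Prot": 27_515,
    "TrEMBL": 36_705,
    "GO": 28_879,
    "KEGG": 10_958,
}


def percentage(count, total=TOTAL_UNIGENES):
    """Share of all unigenes, in percent, rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {db: percentage(n) for db, n in annotated.items()}

for db, pct in percentages.items():
    print(f"{db:<10} {annotated[db]:>7,} {pct:6.2f}")
```

Because NR (36,711) and TrEMBL (36,705) differ by only six unigenes, both round to 27.62%.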
Fig. 4. GO functional annotation of unigenes.
Fig. 5. KOG annotation of unigenes.
Fig. 6. KEGG annotation of unigenes.
General statistics of SSRs identified in the transcriptome.
| Item | Number |
|---|---|
| Total number of identified SSRs | 29,790 |
| Number of SSR-containing sequences | 21,470 |
| Number of sequences containing > 1 SSR | 5644 |
| Number of SSRs present in compound formation | 1923 |
| Mononucleotide | 25,100 |
| Dinucleotide | 3276 |
| Trinucleotide | 1237 |
| Tetranucleotide | 168 |
| Pentanucleotide | 8 |
| Hexanucleotide | 1 |
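As a quick consistency check on the table above, the six repeat-type counts should sum to the reported total of 29,790 SSRs. The sketch below verifies this and derives each repeat type's share; the names are illustrative assumptions, and the shares are derived here rather than reported in the source.

```python
TOTAL_SSRS = 29_790  # "Total number of identified SSRs" row

# SSR counts by repeat-unit length, copied from the table.
ssr_by_type = {
    "Mononucleotide": 25_100,
    "Dinucleotide": 3_276,
    "Trinucleotide": 1_237,
    "Tetranucleotide": 168,
    "Pentanucleotide": 8,
    "Hexanucleotide": 1,
}

# The per-type counts should add up to the reported total.
type_sum = sum(ssr_by_type.values())
assert type_sum == TOTAL_SSRS

# Share of each repeat type among all SSRs, in percent.
shares = {t: round(100 * n / TOTAL_SSRS, 2) for t, n in ssr_by_type.items()}

for repeat_type, share in shares.items():
    print(f"{repeat_type:<16} {ssr_by_type[repeat_type]:>6,} {share:6.2f}%")
```

The counts are internally consistent, and mononucleotide repeats account for the large majority of the identified SSRs.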