Xiaofang Geng, Wanshun Li, Haitao Shang, Qiang Gou, Fuchun Zhang, Xiayan Zang, Benhua Zeng, Jiang Li, Ying Wang, Ji Ma, Jianlin Guo, Jianbo Jian, Bing Chen, Zhigang Qiao, Minghui Zhou, Hong Wei, Xiaodong Fang, Cunshuan Xu.
Abstract
BACKGROUND: The Chinese giant salamander (CGS) is the largest extant amphibian species in the world. Owing to its evolutionary position and four peculiar life phenomena (longevity, starvation tolerance, regenerative ability, and hatching without sunshine), it is an invaluable model species for research. However, progress in these fields has been limited by a lack of genomic resources, because its huge genome of ∼50 GB is extremely difficult to assemble.
Keywords: Andrias davidianus; Assembly; Chinese giant salamander; De novo transcriptome
Year: 2017 PMID: 28204480 PMCID: PMC5467019 DOI: 10.1093/gigascience/gix006
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Summary statistics of sequencing data and Q20 percentage
| Samples | Clean reads | Clean data (bp) | Q20% of fq1 | Q20% of fq2 |
|---|---|---|---|---|
| Abdominal skin | 71 388 238 | 6 424 941 420 | 97.86 | 97.15 |
| Blood | 73 523 050 | 6 617 074 500 | 97.83 | 96.23 |
| Brain | 72 150 562 | 6 493 550 580 | 98.10 | 97.24 |
| Cartilage | 72 085 300 | 6 487 677 000 | 98.00 | 97.33 |
| Dorsal skin | 71 852 996 | 6 466 769 640 | 97.82 | 96.91 |
| Eye | 72 360 422 | 6 512 437 980 | 97.96 | 97.44 |
| Fat | 72 747 654 | 6 547 288 860 | 97.01 | 95.82 |
| Fingertip | 71 793 242 | 6 461 391 780 | 98.04 | 97.38 |
| Heart | 71 465 342 | 6 431 880 780 | 98.00 | 96.71 |
| Kidney | 73 287 452 | 6 595 870 680 | 98.00 | 96.31 |
| Lateral skin | 71 640 046 | 6 447 604 140 | 97.95 | 97.30 |
| Liver | 71 772 352 | 6 459 511 680 | 98.18 | 96.90 |
| Long bone | 72 620 898 | 6 535 880 820 | 97.06 | 96.43 |
| Lung | 73 629 864 | 6 626 687 760 | 97.86 | 96.46 |
| Maxillary | 71 368 582 | 6 423 172 380 | 97.94 | 96.73 |
| Muscle | 73 184 476 | 6 586 602 840 | 97.73 | 96.12 |
| Ovary | 73 636 484 | 6 627 283 560 | 97.95 | 96.69 |
| Pancreas | 71 963 574 | 6 476 721 660 | 97.21 | 96.51 |
| Skull | 73 445 086 | 6 610 057 740 | 97.60 | 96.52 |
| Small intestine | 71 451 888 | 6 430 669 920 | 97.40 | 96.63 |
| Spinal cord | 72 208 398 | 6 498 755 820 | 98.14 | 97.30 |
| Spleen | 71 432 332 | 6 428 909 880 | 97.37 | 96.59 |
| Stomach | 73 740 532 | 6 636 647 880 | 97.96 | 96.56 |
| Tail fat | 72 435 894 | 6 519 230 460 | 98.03 | 97.61 |
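The clean data column appears to be the clean read count multiplied by a fixed read length: dividing one by the other gives exactly 90 for the rows checked here. That suggests 90 bp reads with clean data reported in bases, but the read length is inferred from the table rather than stated in the text, so it is labeled as an assumption. A minimal Python sketch of the consistency check, with three rows copied from the table (the function and constant names are illustrative):

```python
# Read length inferred from the table (assumption): clean data / clean reads = 90.
ASSUMED_READ_LENGTH_BP = 90

# (sample, clean reads, clean data in bases), copied from the table above.
ROWS = [
    ("Abdominal skin", 71_388_238, 6_424_941_420),
    ("Blood", 73_523_050, 6_617_074_500),
    ("Brain", 72_150_562, 6_493_550_580),
]


def clean_bases(clean_reads, read_length_bp=ASSUMED_READ_LENGTH_BP):
    """Total bases, assuming every clean read has the same length."""
    return clean_reads * read_length_bp


def check_rows(rows):
    """Return the samples whose reported clean data differs from reads x length."""
    return [name for name, reads, bases in rows if clean_bases(reads) != bases]


if __name__ == "__main__":
    # An empty list means every listed row is consistent with the assumed read length.
    print(check_rows(ROWS))
```

The same check can be extended to the remaining rows to catch transcription errors in either column.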
Figure 1: Huge RNA-seq data assembly. (A) The pipeline for de novo assembly, quality filtering, and gene identification and classification. (B) Mapping-rate statistics before and after transcript filtering. Among the total mapped reads, uniquely mapped reads were >98% and multiply mapped reads were <2% after transcript filtering, except for the sample ‘stomach.’ The total mapping rate decreased slightly compared with the result before filtering.
The statistics of final assembly and coding gene prediction
| Total data (Mb) | Total length (bp) | Total number (≥250 bp) | Total number (≥1 kb) | Average length (bp) | Coding genes | Noncoding genes |
|---|---|---|---|---|---|---|
| 156 347 | 123 835 135 | 93 366 | 34 840 | 1326 | 26 135 | 67 231 |
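The figures in this table can be cross-checked against one another. The sketch below assumes that the average length is the total length divided by the number of transcripts of at least 250 bp, and that coding and noncoding genes together make up that same total. Both relations are inferred from the numbers rather than stated in the source, and the variable names are illustrative.

```python
# Values copied from the table above.
TOTAL_LENGTH_BP = 123_835_135
TOTAL_TRANSCRIPTS = 93_366      # sequences of at least 250 bp
TRANSCRIPTS_GE_1KB = 34_840
CODING_GENES = 26_135
NONCODING_GENES = 67_231


def average_length_bp(total_bp, count):
    """Mean sequence length, rounded to the nearest base."""
    return round(total_bp / count)


def percent(part, whole):
    """Share of `part` in `whole` as a percentage with one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print("average length (bp):", average_length_bp(TOTAL_LENGTH_BP, TOTAL_TRANSCRIPTS))
    print("coding + noncoding == total:",
          CODING_GENES + NONCODING_GENES == TOTAL_TRANSCRIPTS)
    print("share of sequences >= 1 kb (%):",
          percent(TRANSCRIPTS_GE_1KB, TOTAL_TRANSCRIPTS))
```

Under these assumptions the computed average matches the 1326 reported in the table, and the coding and noncoding counts sum to the total.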
Statistics for functional annotation
| Functional database | Number of sequences annotated |
|---|---|
| NR | 41 043 |
| Swiss-Prot | 30 049 |
| KEGG | 32 166 |
| COG | 13 229 |
| GO | 16 369 |
| Total | 41 874 |
Figure 2: Identification and evaluation of the giant salamander gene set. (A) The pipeline for predicting coding genes. PRD represents the Western clawed frog protein set, 947 proteins of CGS, and 554 proteins of newt from NCBI. (B) The results of BUSCO estimation. Asterisk (*) represents the final protein sets; pound (#) represents the primary protein sets. (C) Comparison of the length of the homologous region to X. tropicalis and N. parkeri based on single-copy orthologs. The X-axis is the ratio of similarity length, and the Y-axis is the percentage of gene number at this scale. (D) Comparison of the length of CDS to X. tropicalis and N. parkeri based on single-copy orthologs. The X-axis is log base 2 of the CDS length ratio, and the Y-axis is the percentage of gene number at this scale.
The results of gene family classification
| Species | Total genes | Unclustered genes | Gene families | Unique families | Average genes per family |
|---|---|---|---|---|---|
| | 26 135 (25 965)* | 6341 | 12 188 | 520 | 1.62 |
| | 18 429 | 218 | 13 235 | 21 | 1.38 |
| | 22 972 | 2391 | 13 986 | 306 | 1.47 |
| | 17 767 | 818 | 13 387 | 30 | 1.27 |
| | 18 164 | 638 | 13 548 | 31 | 1.29 |
| | 26 046 | 1453 | 13 832 | 177 | 1.78 |
| | 19 671 | 1461 | 12 437 | 138 | 1.46 |
| | 21 375 | 2062 | 15 542 | 409 | 1.24 |
Asterisk (*) represents gene number after correction.
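The last column of the gene family table is consistent with dividing the clustered genes (total genes minus unclustered genes) by the number of gene families. This formula is inferred from the numbers and is not stated in the source. For the first row, the uncorrected total of 26 135 is the value that reproduces the reported 1.62. A short Python sketch of the check, with rows listed in table order and illustrative names:

```python
# (total genes, unclustered genes, gene families, reported average), in table order.
ROWS = [
    (26_135, 6_341, 12_188, 1.62),
    (18_429, 218, 13_235, 1.38),
    (22_972, 2_391, 13_986, 1.47),
    (17_767, 818, 13_387, 1.27),
    (18_164, 638, 13_548, 1.29),
    (26_046, 1_453, 13_832, 1.78),
    (19_671, 1_461, 12_437, 1.46),
    (21_375, 2_062, 15_542, 1.24),
]


def genes_per_family(total, unclustered, families):
    """Average family size, assuming only clustered genes are counted."""
    return (total - unclustered) / families


def mismatches(rows, tolerance=1e-9):
    """Indices of rows whose reported average differs from the recomputed one."""
    bad = []
    for i, (total, unclustered, families, reported) in enumerate(rows):
        computed = round(genes_per_family(total, unclustered, families), 2)
        if abs(computed - reported) > tolerance:
            bad.append(i)
    return bad


if __name__ == "__main__":
    # An empty list means all eight rows agree with the inferred formula.
    print(mismatches(ROWS))
```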
The statistics of transcripts and coding genes expressed in each sample
| Samples | Expressed transcripts | Coding genes | Samples | Expressed transcripts | Coding genes |
|---|---|---|---|---|---|
| Abdominal skin | 53 324 | 20 193 | Long bone | 56 286 | 19 754 |
| Dorsal skin | 60 446 | 21 580 | Lung | 70 132 | 22 991 |
| Lateral skin | 53 285 | 20 437 | Maxillary | 59 431 | 21 424 |
| Blood | 56 540 | 19 994 | Muscle | 49 582 | 19 968 |
| Brain | 66 923 | 22 715 | Ovary | 53 343 | 21 072 |
| Cartilage | 59 724 | 20 979 | Pancreas | 44 177 | 18 746 |
| Eye | 67 769 | 22 826 | Skull | 59 933 | 22 206 |
| Fat | 65 586 | 21 570 | Small intestine | 59 156 | 21 588 |
| Fingertip | 63 582 | 21 626 | Spinal cord | 64 808 | 22 423 |
| Heart | 62 127 | 21 734 | Spleen | 64 258 | 21 699 |
| Kidney | 66 223 | 22 792 | Stomach | 58 688 | 21 601 |
| Liver | 59 755 | 21 622 | Tail fat | 63 090 | 21 264 |
Figure 3: Hierarchical clustering of gene expression profiles. Coding genes (left); noncoding genes (right). Coding genes had higher expression abundance than noncoding genes.