Chuankun Zhu, Zhengjun Pan, Hui Wang, Guoliang Chang, Nan Wu, Huaiyu Ding.
Abstract
The Chinese lake gudgeon Sarcocheilichthys sinensis is a small cyprinid fish with great aquaculture potential owing to both its edible and ornamental value. Nevertheless, the genomic and transcriptomic information available for this fish is extremely limited. In this study, a normalized cDNA library was constructed from 13 mixed tissues of an adult male S. sinensis and sequenced on the Illumina HiSeq2500 platform. De novo assembly was performed using the 38,911,511 clean reads obtained, yielding a total of 147,282 unigenes with an average length of 900 bp. Of these unigenes, 96.2% were annotated in 9 public databases, and 16 segments of growth-related genes were identified for future studies. In addition, 28,493 unigenes were assigned to 61 subcategories of Gene Ontology (GO), and 10,483 unigenes were assigned to 25 categories of Clusters of Orthologous Groups (COG). Moreover, 14,943 unigenes were classified into 225 pathways of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. A total of 30,666 microsatellites were detected in 17,627 unigenes, with an average distribution density of 1:2405 bp. This transcriptome data set will be valuable for research on the discovery, expression, and evolution of genes of interest. The identified microsatellites should also be useful tools for genetic and genomic studies in S. sinensis.
Year: 2017 PMID: 28196101 PMCID: PMC5308828 DOI: 10.1371/journal.pone.0171966
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Statistical summary of the de novo transcriptome assembly for Sarcocheilichthys sinensis.
| Length range (bp) | Transcript | Unigene |
|---|---|---|
| 300–500 | 74,663 | 66,112 |
| 500–1000 | 57,329 | 47,674 |
| 1000–2000 | 32,546 | 20,098 |
| 2000+ | 30,392 | 13,398 |
| Total number | 194,930 | 147,282 |
| Total length | 224,384,594 | 132,493,261 |
| N50 length | 1,924 | 1,204 |
| Mean length | 1,151 | 900 |
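The N50 and mean length rows in the table above are standard assembly statistics. As a minimal sketch of how such values are usually derived from a list of sequence lengths (the function names and the toy lengths are illustrative assumptions, not taken from the authors' pipeline):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy assembly, not data from the paper.
toy_lengths = [300, 450, 800, 1200, 2500]
print(n50(toy_lengths), mean_length(toy_lengths))

# Mean length recomputed from the reported totals (total length / total number).
print(round(132_493_261 / 147_282))  # unigenes
print(round(224_384_594 / 194_930))  # transcripts
```

Dividing the reported total lengths by the reported sequence counts reproduces the tabulated means of 900 bp for unigenes and 1,151 bp for transcripts. The N50 values cannot be rechecked this way because they depend on the full length distribution.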
Fig 1. Overview of the transcriptome assembly for Sarcocheilichthys sinensis.
(A) Size distribution of unigenes; (B) Size distribution of coding sequences (CDS).
Summary of functional annotations for unigenes of Sarcocheilichthys sinensis.
| Annotated Database | Annotated Number | 300 ≤ length < 1000 (bp) | length ≥ 1000 (bp) |
|---|---|---|---|
| COG Annotation | 10,483 | 3,142 | 7,341 |
| GO Annotation | 28,493 | 11,281 | 17,212 |
| KEGG Annotation | 14,943 | 5,405 | 9,538 |
| KOG Annotation | 26,326 | 9,727 | 16,599 |
| Pfam Annotation | 28,205 | 8,579 | 19,626 |
| Swissprot Annotation | 27,272 | 9,689 | 17,583 |
| TrEMBL Annotation | 47,576 | 23,533 | 24,043 |
| Nr Annotation | 47,248 | 23,222 | 24,026 |
| Nt Annotation | 140,308 | 106,904 | 33,404 |
| All Annotated | 141,669 | 108,207 | 33,462 |
Fig 2. Species distribution of homologies for Sarcocheilichthys sinensis.
(A) Overall species distribution of the top BLAST hits against available public databases; (B) Species distribution of homologies against the Nr database.
Fig 3. Gene Ontology (GO) classification of assembled unigenes.
Fig 4. Functional classification of unigenes.
(A) COG (Clusters of Orthologous Groups) functional classification of unigenes; (B) KOG (Eukaryotic Ortholog Groups) functional classification of unigenes.
Summary of SSRs identified from the transcriptome of Sarcocheilichthys sinensis.
| SSR Type | Number | Percentage | Dominant motif | Number of dominant motif | Percentage of dominant motif |
|---|---|---|---|---|---|
| Mononucleotide | 18,007 | 58.72% | A/T | 17,565 | 57.30% |
| Dinucleotide | 9,295 | 30.31% | AC/GT | 5,591 | 18.20% |
| Trinucleotide | 2,633 | 8.59% | AAT/ATT | 646 | 2.10% |
| Tetranucleotide | 656 | 2.14% | AGAT/ATCT | 98 | 0.30% |
| Pentanucleotide | 55 | 0.18% | AAGAG/CTCTT | 8 | 0.03% |
| Hexanucleotide | 20 | 0.07% | ACACTC/AGTGTG | 6 | 0.02% |
| Total | 30,666 | 100% | - | 23,914 | 77.95% |
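The Percentage column above is each SSR type's share of all 30,666 detected SSRs. A short sketch that recomputes it from the tabulated counts (the dictionary name `ssr_counts` is only an illustrative choice):

```python
# SSR counts per type, copied from the table above.
ssr_counts = {
    "Mononucleotide": 18007,
    "Dinucleotide": 9295,
    "Trinucleotide": 2633,
    "Tetranucleotide": 656,
    "Pentanucleotide": 55,
    "Hexanucleotide": 20,
}

total = sum(ssr_counts.values())

# Share of each type among all SSRs, as a percentage with two decimals.
percentages = {
    ssr_type: round(100 * count / total, 2)
    for ssr_type, count in ssr_counts.items()
}

for ssr_type, pct in percentages.items():
    print(f"{ssr_type}: {pct:.2f}%")
```

The recomputed shares agree with the tabulated ones. The dominant-motif percentages use the same denominator (all SSRs) and appear to have been rounded to one decimal place in most rows.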
Summary of different repeat times for SSRs isolated from the transcriptome of Sarcocheilichthys sinensis.
| SSR Type | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | >15 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mononucleotide | 0 | 0 | 0 | 0 | 0 | 4,666 | 3,003 | 1,905 | 1,333 | 944 | 741 | 5,415 | 18,007 |
| Dinucleotide | 0 | 2,569 | 1,554 | 960 | 756 | 618 | 669 | 542 | 166 | 206 | 143 | 1,112 | 9,295 |
| Trinucleotide | 1,286 | 592 | 330 | 263 | 27 | 44 | 37 | 16 | 14 | 4 | 4 | 16 | 2,633 |
| Tetranucleotide | 271 | 197 | 33 | 23 | 19 | 11 | 11 | 13 | 13 | 6 | 12 | 47 | 656 |
| Pentanucleotide | 22 | 4 | 6 | 3 | 5 | 2 | 1 | 1 | 6 | 3 | 0 | 2 | 55 |
| Hexanucleotide | 5 | 2 | 6 | 1 | 1 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 20 |
| Total | 1,584 | 3,364 | 1,929 | 1,250 | 808 | 5,343 | 3,723 | 2,478 | 1,532 | 1,163 | 900 | 6,592 | 30,666 |
| Percentage | 5.17% | 10.97% | 6.29% | 4.08% | 2.63% | 17.42% | 12.14% | 8.08% | 5.00% | 3.79% | 2.93% | 21.50% | 100% |
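The zero entries in the table above suggest minimum repeat thresholds of 10 for mononucleotide, 6 for dinucleotide, and 5 for tri- to hexanucleotide motifs. The detection tool and its exact settings are not stated in this section, so the following is only a regex-based sketch of perfect-SSR detection under those assumed thresholds. The names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical, and compound or imperfect repeats are not handled.

```python
import re

# Assumed minimum repeat counts per motif length, inferred from the table above.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (e.g. 'AA' or 'ATAT' are not primitive)."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, unit_length, motif, repeat_count) tuples
    for perfect SSRs in seq (0-based start positions)."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. poly-A runs reported again as 'AA' repeats.
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start(), unit_len, motif, repeats))
    return sorted(hits)


# Hypothetical test sequence, not data from the paper.
example = "GC" + "A" * 12 + "CG" + "AC" * 7 + "GC" + "AAT" * 5 + "C"
for hit in find_ssrs(example):
    print(hit)
```

Counting the returned tuples by `unit_length` and `repeat_count` over all unigenes would give a table of the same shape as the one above. A dedicated SSR-mining tool is still preferable for real analyses.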