Jiyeon Seong, Se Won Kang, Bharat Bhusan Patnaik, So Young Park, Hee Ju Hwang, Jong Min Chung, Dae Kwon Song, Mi Young Noh, Seung-Hwan Park, Gwang Joo Jeon, Hong Sik Kong, Soonok Kim, Ui Wook Hwang, Hong Seog Park, Yeon Soo Han, Yong Seok Lee.
Abstract
The tadpole shrimp (Triops longicaudatus) is an aquatic crustacean that helps control pest populations. It inhabits freshwater ponds and pools and has been described as a living fossil. T. longicaudatus was officially declared an endangered species in South Korea in 2005; however, following subsequent protection and conservation management, it was removed from the endangered species list in 2012. The limited genetic resources available for T. longicaudatus make it difficult to obtain valuable genetic information for marker-aided selection programs. In this study, whole-transcriptome sequencing of T. longicaudatus on the Illumina HiSeq 2500 platform generated 39.74 GB of clean data and a total of 269,822 contigs. After clustering, a total of 208,813 unigenes with an N50 length of 1089 bp were generated. A total of 95,105 unigenes were successfully annotated against the Protostome (PANM), Unigene, Eukaryotic Orthologous Groups (KOG), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using BLASTX with a cut-off of 1E-5. A total of 57,731 unigenes were assigned to GO terms, and 7247 unigenes were mapped to 129 KEGG pathways. Furthermore, 1595 simple sequence repeats (SSRs) were detected in the unigenes, yielding 1387 potential SSR markers. This is the first report of high-throughput transcriptome analysis of T. longicaudatus, and it provides valuable insights for genetic research and molecular-assisted breeding of this important species.
Keywords: Illumina sequencing; SSRs (simple sequence repeats); Triops longicaudatus; tadpole shrimp; transcriptome
Year: 2016 PMID: 27918468 PMCID: PMC5192490 DOI: 10.3390/genes7120114
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Summary statistics from Illumina sequencing of T. longicaudatus.
| Parameter | Value |
|---|---|
| Total number of raw reads | |
| Number of sequences | 323,319,608 |
| Number of bases | 40,738,270,608 |
| Total number of clean reads | |
| Number of sequences | 318,610,596 |
| Number of bases | 39,745,513,470 |
| Mean length of read (bp) | 124.7 |
| N50 length of read (bp) | 126 |
| GC % of reads | 48.39 |
| High-quality reads (%) | 98.54 (sequences), 97.56 (bases) |
| Contig information | |
| Total number of contigs | 269,822 |
| Number of bases | 192,327,026 |
| Mean length of contig (bp) | 712.8 |
| N50 length of contig (bp) | 1148 |
| GC % of contig | 46.82 |
| Largest contig (bp) | 40,450 |
| No. of large contigs (≥500 bp) | 89,407 |
| Unigene information | |
| Total number of unigenes | 208,813 |
| Number of bases | 146,173,633 |
| Mean length of unigene (bp) | 700.0 |
| N50 length of unigene (bp) | 1089 |
| GC % of unigene | 46.97 |
| Length ranges (bp) | 224–40,450 |
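
The N50 and mean-length figures in the table above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: `n50` is a hypothetical helper, the toy lengths are invented for illustration, and only the base and sequence totals are taken from the table.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# Toy lengths (invented, not data from the study).
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])

# Mean lengths recomputed from the totals reported in the table.
mean_read = round(39_745_513_470 / 318_610_596, 1)    # clean reads
mean_contig = round(192_327_026 / 269_822, 1)         # contigs
mean_unigene = round(146_173_633 / 208_813, 1)        # unigenes

# Share of raw data retained after cleaning, in percent.
hq_sequences = round(318_610_596 / 323_319_608 * 100, 2)
hq_bases = round(39_745_513_470 / 40_738_270_608 * 100, 2)
```

The recomputed means and percentages agree with the reported values, which also indicates that the 124.7 bp mean length in the clean-reads block describes reads rather than contigs.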
Figure 1. Size distribution of contigs (blue) and unigenes (red) after assembly and clustering of the quality reads from the transcriptome of T. longicaudatus.
Annotation of T. longicaudatus assembled unigene sequences against public databases.
| Databases | All | ≤300 bp | 300–1000 bp | ≥1000 bp |
|---|---|---|---|---|
| PANM-DB | 87,719 | 20,029 | 43,958 | 23,732 |
| UNIGENE | 26,845 | 6231 | 12,885 | 7729 |
| KOG | 63,978 | 12,955 | 30,892 | 20,131 |
| GO | 57,731 | 12,915 | 28,153 | 16,663 |
| KEGG | 7247 | 1735 | 3400 | 2112 |
| ALL | 95,105 | 22,935 | 48,081 | 24,089 |
Values are the numbers of unigenes with hits in BLASTX searches (E-value < 1E−5).
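
As a quick consistency check on the annotation table, the sketch below verifies that the three length classes add up to the "All" column for each database and derives the overall annotation rate. The counts are copied from the table and from the unigene total reported earlier; the variable names are illustrative only.

```python
# (all, <=300 bp, 300-1000 bp, >=1000 bp) per database, copied from the table.
annotation_counts = {
    "PANM-DB": (87_719, 20_029, 43_958, 23_732),
    "UNIGENE": (26_845, 6_231, 12_885, 7_729),
    "KOG": (63_978, 12_955, 30_892, 20_131),
    "GO": (57_731, 12_915, 28_153, 16_663),
    "KEGG": (7_247, 1_735, 3_400, 2_112),
    "ALL": (95_105, 22_935, 48_081, 24_089),
}

TOTAL_UNIGENES = 208_813  # reported total number of unigenes

# Databases whose length classes do not sum to the "All" column.
inconsistent = [
    name
    for name, (total, short, medium, long_) in annotation_counts.items()
    if short + medium + long_ != total
]

# Percentage of unigenes annotated in at least one database.
annotated_pct = annotation_counts["ALL"][0] / TOTAL_UNIGENES * 100
```

Every row is internally consistent, and roughly 45.5% of the unigenes received at least one annotation.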
Figure 2. The sequence annotation profile of T. longicaudatus unigenes against PANM-DB, Unigene DB and KOG DB.
Figure 3. Homology searches of T. longicaudatus unigenes against the PANM-DB. (A) E-value distribution; (B) Top-hit species distribution.
Figure 4. KOG DB based functional analysis of T. longicaudatus unigenes.
Figure 5. GO term classification for T. longicaudatus. (A) Predicted functional interpretation of unigenes into represented biological process, cellular component, and molecular function; (B) Number of unigene sequences annotated with numbers of GO terms per sequence.
Figure 6. GO annotation of unigenes from T. longicaudatus based on biological processes, molecular functions and cellular components.
Figure 7. Identified KEGG pathways of assembled unigenes from T. longicaudatus.
SSRs identified from the unigene sequences of T. longicaudatus.
| SSR parameters | Number |
|---|---|
| Total number of sequences examined | 29,547 |
| Total size of examined sequences (bp) | 75,658,821 |
| Total number of identified SSRs | 1595 |
| Di-nucleotide | 529 |
| Tri-nucleotide | 862 |
| Tetra-nucleotide | 144 |
| Penta-nucleotide | 33 |
| Hexa-nucleotide | 27 |
| Number of SSR-containing sequences | 1432 |
| Number of sequences containing more than 1 SSR | 140 |
| Number of SSRs present in compound formation | 74 |
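
To show how di- to hexa-nucleotide SSRs of the kind counted above can be located in a sequence, here is a minimal regex-based sketch in Python. It is not the tool or the settings used in the study: the `find_ssrs` helper, the `MIN_REPEATS` thresholds and the example sequence are all assumptions chosen for illustration.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative, not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "ATAT" is really the di-nucleotide "AT".
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Invented example: one (AT)7 repeat and one (CAG)5 repeat.
example = "GGC" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TTA"
ssrs = find_ssrs(example)
```

On the example sequence this reports the di-nucleotide repeat starting at index 3 and the tri-nucleotide repeat starting at index 21. Compound SSRs and imperfect repeats, which the table counts separately, are outside the scope of this sketch.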
Figure 8. The number of SSRs discovered in the unigenes from T. longicaudatus based on motif sequence types.