Digang Zeng¹, Xiuli Chen¹, Jinxia Peng¹, Chunling Yang¹, Min Peng¹, Weilin Zhu¹, Daxiang Xie¹, Pingping He¹, Pinyuan Wei¹, Yong Lin¹, Yongzhen Zhao², Xiaohan Chen³.
Abstract
Although shrimp are of great economic importance, few full-length shrimp transcriptomes are available. Here, we used Pacific Biosciences (PacBio) single-molecule real-time (SMRT) long-read sequencing to generate transcripts from the Pacific white shrimp (Litopenaeus vannamei). We obtained 322,600 full-length non-chimeric reads, from which we generated 51,367 high-quality unique full-length transcripts. We corrected errors in the SMRT sequences by comparison with Illumina short reads. We annotated 81.72% of all unique SMRT transcripts against the NCBI non-redundant protein (Nr) database, 58.63% against Swiss-Prot, 45.38% against Gene Ontology (GO), 32.57% against Clusters of Orthologous Groups of proteins (COG), and 47.83% against the Kyoto Encyclopedia of Genes and Genomes (KEGG). Across all transcripts, we identified 3,958 long non-coding RNAs (lncRNAs) and 80,650 simple sequence repeats (SSRs). Our study provides a rich set of full-length cDNA sequences for L. vannamei, which will greatly facilitate shrimp transcriptome research.
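The abstract reports 80,650 SSRs but does not name the detection tool or its thresholds. As a rough illustration of what identifying SSRs involves, the sketch below scans a sequence for perfect (uninterrupted) mono- to hexanucleotide repeats. The minimum repeat counts in `MIN_REPEATS` are an assumption, chosen to resemble thresholds commonly used with MISA-style tools, and `find_perfect_ssrs` is a hypothetical helper, not the authors' pipeline.

```python
from typing import List, Tuple

# Assumed minimum repeat counts per motif length (MISA-style values chosen for
# illustration only; the study's actual settings are not given in this record).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_perfect_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR in seq.

    Scans left to right; at each position tries motif lengths 1..6, keeps the
    shortest motif that meets its threshold, then jumps past the whole tract
    so the same repeat is not reported twice.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in range(1, 7):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # too close to the end for a motif of this length
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            # Count consecutive, non-overlapping copies of the motif.
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= MIN_REPEATS[k]:
                hits.append((i, motif, count))
                i += count * k
                found = True
                break
        if not found:
            i += 1
    return hits


print(find_perfect_ssrs("GG" + "AT" * 7 + "GCGC" + "A" * 11 + "TT"))
```

Compound and imperfect repeats, which dedicated SSR tools usually report separately, are deliberately ignored to keep the example short.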
Year: 2018 PMID: 30446694 PMCID: PMC6240054 DOI: 10.1038/s41598-018-35066-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Read-of-insert (ROI) length distribution. Colors denote SMRT sequencing libraries constructed with different cDNA insert size ranges.
Figure 2. Percentage of L. vannamei transcripts with BlastX hits in various species. Transcripts were searched against the NCBI non-redundant protein database using BlastX, retaining hits with E-value < 10⁻⁵. Only species matching more than 1.8% of the L. vannamei transcripts are shown; species matching fewer are grouped as 'Other'.
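The Figure 2 caption describes two filtering steps: an E-value cutoff on BlastX hits and a 1.8% threshold below which species are pooled into 'Other'. A minimal sketch of that tabulation follows. It assumes each transcript is assigned to the species of its single lowest-E-value hit and that percentages are taken over transcripts with at least one passing hit; the caption states neither choice, and `species_distribution` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

EVALUE_CUTOFF = 1e-5   # from the caption: keep hits with E-value < 10^-5
OTHER_THRESHOLD = 1.8  # percent; species at or below this are pooled as "Other"


def species_distribution(hits: Dict[str, List[Tuple[str, float]]]) -> Dict[str, float]:
    """Percentage of hit-bearing transcripts whose best hit is in each species.

    `hits` maps transcript ID -> list of (species, e_value) BlastX hits.
    Assumption: one species per transcript, taken from its lowest-E-value hit.
    """
    top_species: Counter = Counter()
    for transcript_hits in hits.values():
        passing = [h for h in transcript_hits if h[1] < EVALUE_CUTOFF]
        if passing:
            species, _ = min(passing, key=lambda h: h[1])
            top_species[species] += 1

    total = sum(top_species.values())
    result: Dict[str, float] = {}
    other = 0.0
    for species, count in top_species.most_common():
        pct = 100.0 * count / total
        if pct > OTHER_THRESHOLD:
            result[species] = pct
        else:
            other += pct  # below the display threshold: fold into "Other"
    if other:
        result["Other"] = other
    return result
```

In practice the hits would be parsed from tabular BLAST output; that step is omitted here.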
Figure 3. Gene Ontology (GO) classification of the putative functions of the unique L. vannamei transcripts.
Figure 4. Lengths of candidate protein-coding RNAs.
Figure 5. Candidate lncRNAs identified using CPC[24], CNCI[25], CPAT[26], and Pfam[27]. Non-overlapping areas show the number of lncRNAs identified by only one tool; overlapping areas show the number identified by the corresponding combination of tools.
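The Venn diagram in Figure 5 partitions transcripts by which combination of tools called them non-coding. The sketch below computes those exclusive region counts from per-tool sets of transcript IDs. Because Pfam is a protein-domain database rather than an lncRNA classifier, its set is assumed here to hold transcripts with no significant Pfam domain hit; `venn_regions` and the toy IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def venn_regions(calls: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count transcripts in each exclusive Venn region.

    `calls` maps tool name -> set of transcript IDs the tool classifies as
    non-coding. Each transcript lands in exactly one region, keyed by the set
    of tools that called it; single-tool keys are the non-overlapping areas.
    """
    membership: Dict[str, Set[str]] = {}
    for tool, ids in calls.items():
        for tid in ids:
            membership.setdefault(tid, set()).add(tool)
    return dict(Counter(frozenset(tools) for tools in membership.values()))


calls = {
    "CPC": {"a", "b", "c"},
    "CNCI": {"b", "c", "d"},
    "CPAT": {"c"},
    "Pfam": {"c", "e"},  # assumed: transcripts with no Pfam domain hit
}
for region, n in venn_regions(calls).items():
    print(sorted(region), n)
```

The transcripts called by all four tools are simply `set.intersection(*calls.values())`; whether the study's final 3,958 lncRNAs correspond to that intersection is not stated in this record.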
Figure 6. Lengths of candidate lncRNAs.
Figure 7. Lengths of unique transcripts in transcriptomes generated by SMRT sequencing (this study), 454 pyrosequencing[17], and Illumina sequencing[18].
Figure 8. Unique transcripts successfully annotated in transcriptomes generated by SMRT sequencing (this study), 454 pyrosequencing[17], and Illumina sequencing[18].
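The abstract and Figure 8 express annotation success as the share of unique transcripts with a hit in each database. The sketch below shows that calculation for arbitrary ID sets. The hypothetical `annotation_rates` helper and the database names in the example are illustrative, and it assumes the denominator is the full, non-empty set of unique transcripts.

```python
from typing import Dict, Set


def annotation_rates(transcript_ids: Set[str], annotated: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of transcripts with at least one hit in each database.

    `annotated` maps database name (e.g. "Nr", "KEGG") -> IDs of transcripts
    annotated there. IDs outside `transcript_ids` are ignored so that each
    rate is measured against the same denominator.
    """
    total = len(transcript_ids)  # assumed non-empty
    return {db: 100.0 * len(ids & transcript_ids) / total for db, ids in annotated.items()}


print(annotation_rates({"t1", "t2", "t3", "t4"}, {"Nr": {"t1", "t2", "t3"}, "KEGG": {"t1", "x9"}}))
```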