Ruei-Chi Gan, Ting-Wen Chen, Timothy H Wu, Po-Jung Huang, Chi-Ching Lee, Yuan-Ming Yeh, Cheng-Hsun Chiu, Hsien-Da Huang, Petrus Tang.
Abstract
BACKGROUND: Next-generation sequencing promises de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genome sequences, and even fewer have well-defined or curated annotations. For transcriptome studies of organisms lacking a proper reference genome, the common strategy is de novo assembly followed by functional annotation. The analysis becomes more complicated when multiple transcriptomes are compared.
Keywords: Comparative transcriptome; De novo transcriptome assembly; Non-model transcriptome; Transcriptome quantification; Web service
Year: 2016 PMID: 28155708 PMCID: PMC5260104 DOI: 10.1186/s12859-016-1366-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Analysis strategy for quantifying transcriptomes without a reference genome. All reads from the different transcriptomes are pooled for de novo assembly. The assembled contigs are used to search for homologs. Contigs that match the same homolog are used to construct a virtual transcript for later use in expression quantification. The sequencing reads are then mapped to the virtual transcripts. The expression level of each virtual transcript is reported as mapped read count, estimated RPKM and estimated TPM, based on the mapping results
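The grouping step described in Fig. 1 can be sketched as follows. This is a minimal illustration, not PARRoT's actual code: the function name `group_contigs_by_homolog`, the `best_hits` input format and the example identifiers are all hypothetical, and it assumes the homolog search has already been reduced to one best hit per contig.

```python
from collections import defaultdict


def group_contigs_by_homolog(best_hits):
    """Group contig IDs by the homolog they matched.

    best_hits: dict mapping contig ID -> homolog ID, or None when the
    contig had no hit (hypothetical input format).
    Returns a dict mapping homolog ID -> sorted list of contig IDs; each
    entry corresponds to one virtual transcript.
    """
    virtual = defaultdict(list)
    for contig, homolog in best_hits.items():
        if homolog is not None:  # contigs without a hit are left ungrouped
            virtual[homolog].append(contig)
    return {homolog: sorted(contigs) for homolog, contigs in virtual.items()}


# Hypothetical example: two contigs share the same best hit.
example_hits = {
    "contig_1": "P12345",
    "contig_2": "P12345",
    "contig_3": None,
    "contig_4": "Q99999",
}
virtual_transcripts = group_contigs_by_homolog(example_hits)
```

How the grouped contigs are combined into a single sequence for read mapping is not described in the caption, so the sketch stops at the grouping itself.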
Fig. 2 PARRoT workflow. PARRoT includes the following analysis steps: 1) de novo assembly of pooled RNA-Seq data; 2) functional annotation and homolog search; 3) generation of virtual transcripts; 4) mapping of the sequence reads; 5) quantification of each transcript contig by calculating the RC (number of mapped reads), eRPKM (estimated RPKM) and eTPM (estimated TPM) in each transcriptome dataset; and 6) display of the expression level of each contig or virtual transcript for the two datasets together with their functional annotations
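Step 5 of the workflow reports RC, eRPKM and eTPM. The sketch below uses the standard RPKM and TPM definitions; the record does not state how PARRoT derives the "estimated" variants or the length of a virtual transcript, so the length is treated here as an assumed input, and the function names and example values are hypothetical.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads.

    counts:  dict mapping transcript ID -> mapped read count (RC)
    lengths: dict mapping transcript ID -> length in bases (assumed input)
    """
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: counts[t] * 1e9 / (lengths[t] * total) for t in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates scaled to sum to 1e6."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    denom = sum(rates.values())
    if denom == 0:
        return {t: 0.0 for t in counts}
    return {t: rates[t] * 1e6 / denom for t in counts}


# Hypothetical read counts and lengths for two virtual transcripts.
example_counts = {"vt_a": 100, "vt_b": 300}
example_lengths = {"vt_a": 1000, "vt_b": 3000}
example_rpkm = rpkm(example_counts, example_lengths)
example_tpm = tpm(example_counts, example_lengths)
```

In this example both transcripts have the same reads-per-base rate, so they receive equal RPKM and equal TPM even though their raw read counts differ threefold.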
Fig. 3 Expression level and number of virtual transcriptomes from Cnidaria. a For the contigs generated from the pooled assembly, the expression level in RPKM is calculated for each virtual transcript. The plot shows that the expression levels of most contigs are similar between the two transcriptome datasets. b Number of virtual transcripts in each GO cellular component category in each transcriptome. After searching for the most similar sequence to each contig in the NCBI NR and Swiss-Prot databases, the GO annotations of the best hits are used as functional annotations for the assembled contigs. For all contigs hitting the same transcript in the NR database, PARRoT quantifies their expression together because they are likely to come from the same transcript, which is called a virtual transcript in the plot. PARRoT counts how many virtual transcripts were found for each GO category in each transcriptome
Fig. 4 Cumulative expression level for GO terms. a Users can select different levels of GO terms. PARRoT includes a pre-computed tree structure for GO terms, and users can select the level from the dropdown list. Once the level is changed, both the plot and the table update with data for the selected level. b Number of virtual transcripts belonging to the GO terms. c Number of raw counts mapped to the virtual transcripts belonging to the GO terms. d Sum of the estimated RPKM (eRPKM) for virtual transcripts belonging to the GO terms. e Sum of the estimated TPM (eTPM) for virtual transcripts belonging to the GO terms
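The per-GO-term summaries in panels b to e of Fig. 4 amount to a grouped sum. The following is a rough sketch under stated assumptions: the function `summarize_by_go` and its input dictionaries are hypothetical, and the GO terms are assumed to be already mapped to the level the user selected (the pre-computed GO tree itself is not modeled).

```python
from collections import defaultdict


def summarize_by_go(go_terms, read_counts, erpkm, etpm):
    """Aggregate virtual-transcript statistics per GO term.

    go_terms:    dict mapping virtual transcript ID -> iterable of GO term IDs
                 (assumed to be already resolved to the selected GO level)
    read_counts: dict mapping virtual transcript ID -> raw mapped read count
    erpkm, etpm: dicts mapping virtual transcript ID -> expression estimate
    """
    summary = defaultdict(
        lambda: {"n_transcripts": 0, "read_count": 0, "erpkm": 0.0, "etpm": 0.0}
    )
    for vt, terms in go_terms.items():
        for term in set(terms):  # count each transcript once per term
            row = summary[term]
            row["n_transcripts"] += 1
            row["read_count"] += read_counts.get(vt, 0)
            row["erpkm"] += erpkm.get(vt, 0.0)
            row["etpm"] += etpm.get(vt, 0.0)
    return dict(summary)


# Hypothetical annotations: GO:0005634 (nucleus), GO:0005737 (cytoplasm).
example_summary = summarize_by_go(
    go_terms={"vt_a": ["GO:0005634"], "vt_b": ["GO:0005634", "GO:0005737"]},
    read_counts={"vt_a": 10, "vt_b": 5},
    erpkm={"vt_a": 1.5, "vt_b": 2.5},
    etpm={"vt_a": 100.0, "vt_b": 200.0},
)
```

A virtual transcript annotated with several GO terms contributes to each of them, so the per-term sums can add up to more than the transcriptome-wide total.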