Michiel Van Bel, Sebastian Proost, Christophe Van Neste, Dieter Deforce, Yves Van de Peer, Klaas Vandepoele.
Abstract
Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regard to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection and frameshift correction, and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveal the efficiency and unique features of the TRAPID system. TRAPID is freely available at http://bioinformatics.psb.ugent.be/webtools/trapid/.
Year: 2013 PMID: 24330842 PMCID: PMC4053847 DOI: 10.1186/gb-2013-14-12-r134
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Schematic overview of the TRAPID pipeline. The TRAPID pipeline consists of two separate steps. The first is a non-interactive processing step, during which all transcripts are assigned to gene families using a RAPSearch2 similarity search, followed by functional annotation transfer and meta-annotation assignment. The second step is interactive and is controlled directly through the website interface. Here, users can analyze their data using functional enrichment analyses, multiple sequence alignments, and phylogenetic trees.
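The first, non-interactive step can be illustrated with a minimal Python sketch. It is not TRAPID's code: it assumes the similarity search has already been run and its hits parsed into a dictionary, and it uses a simple "best hit decides the family" rule, which is an assumption made here for illustration. All function names, identifiers, and data structures are hypothetical.

```python
def assign_gene_families(hits, gene_to_family):
    """Assign each transcript to the family of its highest-scoring hit.

    hits: dict mapping transcript ID -> list of (reference_gene, score).
    gene_to_family: dict mapping reference gene -> gene family ID.
    Transcripts without hits, or whose best hit has no family, are skipped.
    """
    assignments = {}
    for transcript, transcript_hits in hits.items():
        if not transcript_hits:
            continue
        best_gene, _ = max(transcript_hits, key=lambda hit: hit[1])
        family = gene_to_family.get(best_gene)
        if family is not None:
            assignments[transcript] = family
    return assignments


def transfer_annotation(assignments, family_to_terms):
    """Copy the functional terms of each family to its assigned transcripts.

    Assumption: every term attached to the family is transferred as-is.
    """
    return {
        transcript: set(family_to_terms.get(family, ()))
        for transcript, family in assignments.items()
    }


def annotated_fraction(annotation, n_transcripts):
    """Fraction of all input transcripts that received at least one term."""
    annotated = sum(1 for terms in annotation.values() if terms)
    return annotated / n_transcripts
```

With three input transcripts of which one ends up in an annotated family, `annotated_fraction` returns one third. That is the kind of coverage figure reported in parentheses in the computation-time table.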
Overview and content of the TRAPID reference databases
| OrthoMCL-DB version 5 | PFAM domains | All | 150 | 1,398,546 | OrthoMCL clustering |
| | | Alveolata | 15 | 98,796 | |
| | | Amoebozoa | 4 | 41,930 | |
| | | Archaea | 16 | 30,233 | |
| | | Bacteria | 36 | 112,059 | |
| | | Euglenozoa | 9 | 107,034 | |
| | | Eukaryota | 98 | 1,256,264 | |
| | | Fungi | 24 | 680,778 | |
| | | Metazoa | 29 | 529,788 | |
| PLAZA 2.5 | Gene Ontology, InterPro domains | Viridiplantae (green plants) | 25 | 780,667 | TribeMCL clustering + integrative orthologs |
| | | Angiosperms | 18 | 671,950 | |
| | | Eudicots | 13 | 480,106 | |
| | | Monocots | 5 | 191,844 | |
Figure 2. Visualization of a phylogenetic tree with associated meta-annotation labels per transcript. Cladogram based on a FastTree2 phylogenetic tree for the alpha/beta-Hydrolases superfamily (based on PLAZA 2.5 gene family HOM002165). Transcripts are marked in black, while colored gene identifiers refer to homologs from reference species: Ostreococcus tauri (ota), Physcomitrella patens (ppa), Populus trichocarpa (ptr), Arabidopsis thaliana (ath), Oryza sativa ssp. japonica (osa), Sorghum bicolor (sbo), and Zea mays (zma). Meta-annotation for the different transcripts is visualized using the colored boxes on the right.
Feature comparison of web-based transcript analysis platforms
| Sequence similarity search | NCBI BLAST | BLAST (bi-directional) | RAPSearch2 |
| ORF finding | No | No | Yes |
| Frameshift correction | No | No | FrameDP |
| Reference database | NCBI non-redundant database | Curated KEGG genes | OrthoMCL-DB version 5, PLAZA 2.5 |
| Functional annotation | Gene Ontology, InterProScan, Enzyme codes, KEGG | KEGG (KEGG Orthology groups) | Gene Ontology, Protein domains (InterPro/PFAM) |
| Enrichment analysis | Yes | No | Yes |
| Protein alignments | No | No | MUSCLE |
| Phylogenetic trees | No | No | FastTree, PhyML |
| Others | advanced stand-alone graphical user interface | graphical pathway maps | ORF length meta-annotation, share experiments with other users |
a. Basic web-start version.
Comparison of computation time and transcript coverage for different web-based transcript analysis platforms
| 50 | 2 (16%) | 29 (44%) | 5 (50%) | 5 (70%) |
| 500 | 4 (20%) | 232 (77%) | 3 (54%) | 3 (65%) |
| 5000 | 11 (23%) | - | 25 (54%) | 25 (66%) |
| 25392 | 32 (24%) | - | 95 (54%) | 95 (66%) |
a. Time is measured in minutes, and numbers in parentheses indicate the fraction of genes that received a GO functional annotation. The GO annotation for TRAPID was obtained through the protocol using the transfer from both the gene family and best-hit annotation. Missing values for BLAST2GO are due to runtimes longer than 24 h. The data are based on the 25,392 transcripts of Panicum hallii.
b. GF indicates the fraction of transcripts assigned to a gene family.