Ting-Wen Chen, Ruei-Chi Gan, Yi-Kai Fang, Kun-Yi Chien, Wei-Chao Liao, Chia-Chun Chen, Timothy H Wu, Ian Yi-Feng Chang, Chi Yang, Po-Jung Huang, Yuan-Ming Yeh, Cheng-Hsun Chiu, Tzu-Wen Huang, Petrus Tang.
Abstract
With continual improvements in high-throughput sequencing technology, an increasing number of transcriptome sequencing projects are being carried out on organisms without a sequenced genome, and even on environmental samples. To study novel transcripts, the first task is to predict their potential functions. We present a web-based annotation tool, FunctionAnnotator, which offers comprehensive annotations, including GO term assignment, enzyme annotation, domain/motif identification and subcellular localization prediction. To accelerate the annotation process, we optimized the computation and used parallel computing for all annotation steps. Moreover, FunctionAnnotator is designed to be versatile and generates a variety of outputs that facilitate downstream analyses. Here, we demonstrate how FunctionAnnotator can help annotate non-model organisms. We further show that FunctionAnnotator can estimate the taxonomic composition of environmental samples and assist in identifying novel proteins by combining RNA-Seq data with proteomics. In summary, FunctionAnnotator annotates transcriptomes efficiently and greatly benefits studies of non-model organisms and metatranscriptomes. FunctionAnnotator, a comprehensive annotation web service, is freely available online at http://fa.cgu.edu.tw/. This new web-based annotator will shed light on field studies involving organisms without a reference genome.
Year: 2017 PMID: 28874813 PMCID: PMC5585236 DOI: 10.1038/s41598-017-10952-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
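Identifying novel proteins by combining RNA-Seq with proteomics generally requires translating assembled contigs into candidate protein sequences that mass spectra can be searched against. The block below is a minimal sketch of one common approach, taking the longest ATG-initiated open reading frame over all six reading frames. It is not necessarily how FunctionAnnotator predicts coding sequences, and the `min_aa` threshold is a hypothetical parameter.

```python
from itertools import product
from typing import Optional

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N maps to N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(contig: str, min_aa: int = 100) -> Optional[str]:
    """Longest ATG-initiated peptide over all six frames, or None if shorter than min_aa.

    A peptide that runs off the 3' end without a stop codon is kept, because
    assembled transcripts are often incomplete.
    """
    contig = contig.upper()
    best = ""
    for strand in (contig, reverse_complement(contig)):
        for frame in range(3):
            # Split the translated frame at stop codons, then trim each piece to its first Met.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_aa else None
```

For example, `longest_orf("ATGAAATTTTAA", min_aa=3)` returns `"MKF"`. Real pipelines usually add codon-usage models or homology evidence rather than relying on length alone.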
Figure 1. Annotation system implemented in FunctionAnnotator. After users upload a FASTA file containing nucleotide sequences and select the desired analysis modules, FunctionAnnotator executes all of the selected annotation processes in parallel. FunctionAnnotator combines in-house scripts with the annotation tools listed in this figure (including LAST, BLAST2GO, PSORT and TMHMM) to assign GO terms, identify enzymes and domains, and predict subcellular localization, lipoproteins, secretory proteins and transmembrane proteins. For each annotation category, FunctionAnnotator annotates the uploaded sequences with the corresponding tools and integrates the output into graphs or tables. All annotation results are also available for download as text files.
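Running the user-selected modules concurrently, as described for Figure 1, can be sketched with Python's `concurrent.futures`. The module names and functions below (`go_terms`, `transmembrane`, `signal_peptides`) are hypothetical stand-ins that only return strings; a real pipeline would launch the external tools and parse their outputs. This is an illustration of the scheduling idea, not FunctionAnnotator's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for annotation modules. A real module would invoke an
# external tool (for example via subprocess) and parse its output file.
def go_terms(fasta: str) -> str:
    return f"GO annotation of {fasta}"


def transmembrane(fasta: str) -> str:
    return f"TM prediction for {fasta}"


def signal_peptides(fasta: str) -> str:
    return f"signal peptides in {fasta}"


MODULES: Dict[str, Callable[[str], str]] = {
    "go": go_terms,
    "tm": transmembrane,
    "signalp": signal_peptides,
}


def annotate(fasta: str, selected: List[str], max_workers: int = 4) -> Dict[str, str]:
    """Run every selected module concurrently and collect results by module name."""
    unknown = set(selected) - set(MODULES)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(MODULES[name], fasta) for name in selected}
        # result() blocks until each module finishes and re-raises its exceptions.
        return {name: fut.result() for name, fut in futures.items()}
```

Threads suit this case because each module mostly waits on an external process; CPU-bound work done in Python itself would be better served by `ProcessPoolExecutor`.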
Benchmarks for FunctionAnnotator performance.
| Organism(s) | # of contigs | Total bp | # of contigs with best hit (%) | # of contigs annotated* (%) | Elapsed Time |
|---|---|---|---|---|---|
| Clam | 101,795 | 38,886,727 | 29,960 (29%) | 35,971 (64%) | 7 h 20 m 38 s |
| Metatranscriptome I | 241 | 85,193 | 225 (93%) | 126 (64%) | 24 m 47 s |
| Metatranscriptome II | 381 | 137,588 | 367 (96%) | 243 (76%) | 29 m 57 s |
| Trichomonas | 19,415 | 24,204,403 | 16,866 (87%) | 13,497 (70%) | 3 h 26 m 56 s |
*Only contigs with predicted coding sequences longer than 66 were counted; subcellular localization prediction results were excluded.
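The best-hit percentages in the benchmark table can be rechecked from the contig counts, and a rough throughput (bases per second of elapsed time) can be derived from the same rows. Throughput is not reported by the authors; it is computed here only as an illustration. The annotated-contig column is left out because its denominator (contigs with predicted coding sequences) is not given.

```python
import re

# Rows copied from the benchmark table: name, contigs, total bp, contigs with best hit, elapsed time.
BENCHMARKS = [
    ("Clam", 101_795, 38_886_727, 29_960, "7 h 20 m 38 s"),
    ("Metatranscriptome I", 241, 85_193, 225, "24 m 47 s"),
    ("Metatranscriptome II", 381, 137_588, 367, "29 m 57 s"),
    ("Trichomonas", 19_415, 24_204_403, 16_866, "3 h 26 m 56 s"),
]

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(elapsed: str) -> int:
    """Convert strings such as '7 h 20 m 38 s' into a number of seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)\s*([hms])", elapsed))


for name, contigs, total_bp, best_hits, elapsed in BENCHMARKS:
    secs = to_seconds(elapsed)
    pct = 100 * best_hits / contigs
    print(f"{name}: best hit {pct:.1f}%, {secs} s, {total_bp / secs:,.0f} bp/s")
```

Rounded to whole percentages, the computed best-hit values (29, 93, 96, 87) agree with the table.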
Figure 2. Partial annotation results for the clam transcriptome. (a) Basic statistics for the uploaded nucleotide sequences, including the number of entries (contigs), total base pairs and upload date, are listed in the table. (b) Basic information on the uploaded contigs, including GC content, N50 and average length, is listed in this table together with a bar chart of the contig length distribution. (c) Distribution of GO annotation results for molecular function. The most abundant molecular function at the third level is ion binding, found in approximately 34% of GO-annotated contigs. Because each contig can be assigned more than one GO term, the percentages in this bar chart sum to more than 100%. (d) Transmembrane (TM) domain prediction results show that 5,480 contigs have one TM domain and 2,891 contigs have multiple TM domains. FunctionAnnotator also plots the predicted topology of transmembrane domains along with their positional information.
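The contig statistics in panels (a) and (b) follow standard definitions. N50 is the length L such that contigs of length L or longer together contain at least half of all assembled bases. The sketch below computes these values from FASTA text. It assumes GC content is taken over unambiguous A/C/G/T bases only; FunctionAnnotator's exact convention is not stated, and the parser is a minimal illustration rather than a full FASTA reader.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Minimal FASTA parser mapping the first word of each header to its sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def n50(lengths: Iterable[int]) -> int:
    """Length at which the running sum of longest-first contigs reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def contig_stats(seqs: Dict[str, str]) -> Dict[str, float]:
    lengths = [len(s) for s in seqs.values()]
    bases = "".join(seqs.values())
    acgt = sum(bases.count(b) for b in "ACGT")  # ambiguous bases such as N are ignored
    gc = bases.count("G") + bases.count("C")
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
        "gc_percent": 100 * gc / acgt if acgt else 0.0,
    }
```

For example, `n50([2, 3, 4, 5, 6])` is 5: the 6 bp and 5 bp contigs together hold 11 of 20 bases.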
Figure 3. Domain and subcellular localization predictions for clam transcripts. (a) The (partial) domain identification result shows that FunctionAnnotator identified 14,037 domains in this transcriptome. The identified domains are shown together with their domain IDs, domain names, domain coverage and RPS-BLAST e-values. (b) Subcellular localization prediction results show that 19,339 transcripts are predicted to be extracellular, followed by 17,362 transcripts predicted to be cytosolic. FunctionAnnotator presents this summary table and a detailed table containing the subcellular localization and a prediction score for each contig.
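The relationship between the detailed per-contig table and the summary table in panel (b) can be sketched as a simple aggregation. The row format `(contig_id, location, score)` and the rule of keeping the highest-scoring location per contig are assumptions for illustration; the source does not specify how FunctionAnnotator resolves multiple candidate locations.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical row format: (contig_id, predicted_location, prediction_score).
Row = Tuple[str, str, float]


def best_location(rows: Iterable[Row]) -> Dict[str, str]:
    """Keep only the highest-scoring location for each contig."""
    best: Dict[str, Tuple[str, float]] = {}
    for contig, location, score in rows:
        if contig not in best or score > best[contig][1]:
            best[contig] = (location, score)
    return {contig: loc for contig, (loc, _) in best.items()}


def localization_summary(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Count contigs per location, most frequent location first."""
    return Counter(best_location(rows).values()).most_common()
```

With rows `("c1", "extracellular", 0.6)`, `("c1", "cytoplasmic", 0.3)`, `("c2", "cytoplasmic", 0.9)` and `("c3", "extracellular", 0.8)`, the summary is `[("extracellular", 2), ("cytoplasmic", 1)]`.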
Figure 4. Taxonomy distribution for two metatranscriptomes from the gut microbiome of the medicinal leech. FunctionAnnotator searches the NR database for a homolog of each transcript and then identifies the species from which the best hits come. The taxonomic information for these species is presented in a bar chart, and the user can select different taxonomic levels. (a) At the species level, for the first dataset, the best hits of 163 of the 241 contigs are from Mucinivorans hirudinis, and the best hits of another 4 contigs are from Aeromonas veronii. Similar results were obtained for the second dataset. (b) At the family level, the most abundant family in both metatranscriptomes is again Rikenellaceae, followed by Aeromonadaceae, Enterobacteriaceae and Bacteroidaceae.
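The per-level bar chart in Figure 4 amounts to counting contigs by the taxon of their best hit at a chosen rank. A minimal sketch follows, assuming each contig has already been mapped to the lineage of its best-hit species as a rank-to-taxon dictionary (or `None` when there was no hit); that input format and the `"no hit"` and `"unclassified"` labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input: contig -> lineage of its best-hit species keyed by rank, or None.
Lineage = Optional[Dict[str, str]]


def taxonomy_distribution(best_hits: Dict[str, Lineage], rank: str) -> List[Tuple[str, int]]:
    """Count contigs per taxon at the requested rank (e.g. 'species' or 'family')."""
    counts: Counter = Counter()
    for lineage in best_hits.values():
        if lineage is None:
            counts["no hit"] += 1
        else:
            counts[lineage.get(rank, "unclassified")] += 1
    return counts.most_common()
```

Changing `rank` from `"species"` to `"family"` corresponds to selecting a different taxonomic level in the chart; contigs whose best-hit species share a family are pooled at the family level.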
Figure 5. Enzyme, lipoprotein and signal peptide identification for metatranscriptomes from the gut microbiome of the medicinal leech. (a) A putative enzyme identified in this metatranscriptome is listed together with its predicted EC number. Clicking the EC number links the user to a website with more detailed information about the chemical reactions the enzyme catalyzes. (b) Putative signal peptides identified by FunctionAnnotator are listed with their predicted cleavage sites and prediction scores. (c) Putative lipoproteins are listed with their prediction scores, cleavage sites and the amino acid at position +2 after the cleavage site.
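Bacterial lipoprotein signal peptides are cleaved by signal peptidase II immediately before a conserved cysteine, which becomes residue +1 of the mature protein; the following +2 residue is commonly reported because it influences sorting (in E. coli, an aspartate at +2 is associated with inner-membrane retention). The sketch below reads that residue from a protein sequence. It assumes the cleavage site is given as the 1-based position of the last signal-peptide residue, a convention that may differ from the one FunctionAnnotator uses.

```python
from typing import Optional


def residue_after_cleavage(protein: str, cleavage_pos: int, offset: int) -> Optional[str]:
    """Residue at position +offset downstream of a signal-peptide cleavage site.

    Assumed convention: cleavage_pos is the 1-based index of the last residue of
    the signal peptide, so cleavage falls between cleavage_pos and cleavage_pos + 1
    and the +1 residue sits at 0-based index cleavage_pos.
    """
    if offset < 1:
        raise ValueError("offset must be >= 1")
    index = cleavage_pos + offset - 1
    return protein[index] if 0 <= index < len(protein) else None


def lipoprotein_plus2(protein: str, cleavage_pos: int) -> Optional[str]:
    """Return the +2 residue, or None if the +1 residue is not the lipobox cysteine."""
    if residue_after_cleavage(protein, cleavage_pos, 1) != "C":
        return None
    return residue_after_cleavage(protein, cleavage_pos, 2)
```

For the toy sequence `"MKKTAIALSGCDSK"` with a cleavage site at position 10, the +1 residue is `C` and the +2 residue is `D`.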