Abstract
Cost-effective, high-throughput sequencing technologies, together with improved computational approaches, now generate vast amounts of sequencing data, and many putative proteins have been discovered after assembly and structural annotation. Putative proteins are typically annotated using a functional annotation system that searches extant databases, but the large size of these databases often creates a bottleneck for rapid functional annotation. We developed SFannotation, a simple and fast functional annotation system that rapidly annotates putative proteins against four extant databases (Swiss-Prot, TIGRFAMs, Pfam, and the non-redundant sequence database) by using a best-hit approach with BLASTP and HMMSEARCH.
Keywords: bioinformatics; gene product; protein annotation
Year: 2014 PMID: 25031571 PMCID: PMC4099352 DOI: 10.5808/GI.2014.12.2.76
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
Fig. 1. Database filtration (A) and workflow of the SFannotation annotation system (B). Black arrows represent putative proteins that are annotated by the best-hit approach, and red arrows represent the conversion of unannotated proteins to query putative proteins to search homologs against other databases.
Fig. 2. Runtime of the SFannotation system (red) and a best-hit approach without the hierarchical SFannotation workflow (black). Randomly selected proteins from Escherichia coli MG1655 (GenBank accession number: U00096) were tested on a 64-bit Linux system (Ubuntu) with 20 CPU threads.
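
The hierarchical workflow in Fig. 1B, where proteins left unannotated by one database become the queries for the next, can be illustrated with a minimal Python sketch. This is not the SFannotation implementation: the `Searcher` interface, the function names, the toy hit tables, and the stage order (taken from the order in which the abstract lists the databases) are all assumptions made for illustration. Each real BLASTP or HMMSEARCH run is replaced here by a dictionary lookup.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: a searcher takes protein IDs and returns
# {protein_id: best_hit_description} for the proteins that had a hit.
Searcher = Callable[[List[str]], Dict[str, str]]


def cascade_annotate(
    proteins: Iterable[str],
    stages: List[Tuple[str, Searcher]],
) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Annotate proteins stage by stage, forwarding only unannotated ones."""
    annotations: Dict[str, Tuple[str, str]] = {}
    remaining = list(proteins)
    for db_name, search in stages:
        if not remaining:
            break  # nothing left to query, so later databases are skipped
        hits = search(remaining)
        for pid in remaining:
            if pid in hits:
                annotations[pid] = (db_name, hits[pid])
        # Only proteins without a best hit move on to the next database.
        remaining = [pid for pid in remaining if pid not in annotations]
    return annotations, remaining


def make_toy_searcher(table: Dict[str, str]) -> Searcher:
    """Stand-in for a BLASTP or HMMSEARCH run: a lookup in an invented table."""
    def search(queries: List[str]) -> Dict[str, str]:
        return {q: table[q] for q in queries if q in table}
    return search


# Invented toy data; the stage order is an assumption, not taken from Fig. 1.
stages = [
    ("Swiss-Prot", make_toy_searcher({"p1": "toy Swiss-Prot hit"})),
    ("TIGRFAMs", make_toy_searcher({"p2": "toy TIGRFAMs hit"})),
    ("Pfam", make_toy_searcher({"p1": "never reached", "p3": "toy Pfam hit"})),
    ("nr", make_toy_searcher({})),
]
annotations, unannotated = cascade_annotate(["p1", "p2", "p3", "p4"], stages)
```

In this toy run, `p1` keeps its Swiss-Prot annotation and is never searched against Pfam, which is the property that lets a cascade shrink the query set before the larger databases are reached. A flat best-hit approach would instead search every protein against every database.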