Anna Vlasova, Toni Hermoso Pulido, Francisco Camara, Julia Ponomarenko, Roderic Guigó.
Abstract
Functional annotation adds biologically relevant information to predicted features in genomic sequences and is therefore an important step in any de novo genome sequencing project. It is also useful for proofreading and improving gene structural annotation. Here, we introduce FA-nf, a pipeline implemented in Nextflow, a versatile computational workflow management engine. The pipeline integrates different annotation approaches, such as NCBI BLAST+, DIAMOND, InterProScan, and KEGG. It starts from a protein sequence FASTA file and, optionally, a structural annotation file in GFF format, and produces several files, such as GO assignments, output summaries of the abovementioned programs, and final annotation reports. The pipeline can easily be split into smaller processes for parallelization and easily deployed in a Linux computational environment thanks to software containerization, which helps to ensure full reproducibility.
Keywords: containerization; functional annotation; pipeline; reproducibility
Year: 2021 PMID: 34681040 PMCID: PMC8535801 DOI: 10.3390/genes12101645
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. (a) A typical functional annotation workflow; (b) simplified flowchart of the FA-nf pipeline.
Non-exhaustive summary table of existing functional annotation programs or pipelines.
| Program/Pipeline | Installation | Used Software | Datasets | Comments |
|---|---|---|---|---|
| Blast2GO | Local installation and web/cloud services | BLAST+, InterProScan, Blast2GO-specific software, etc. | Custom. Normally, NCBI BLAST DBs, InterPro, GO | Subscription tool. Visualization dashboard. Gene structural annotation options. Newer versions integrated into other toolboxes. |
| eggNOG mapper | Web service and local installation | DIAMOND, HMMER | eggNOGdb (from several sources), GO, PFAM, SMART, COG | Command-line tool and REST API available for querying the service. Gene structural annotation options. |
| FA-nf | Local installation | BLAST+, DIAMOND, InterProScan, KOFAM, CDD, SignalP, TargetP, etc. | Custom. Normally, NCBI BLAST DBs, InterPro and UniProt-GOA | Based on the Nextflow pipeline framework and software containers. |
| GenSAS | Web service | BLAST+, DIAMOND, InterProScan, SignalP, TargetP, etc. | SwissProt/TrEMBL, RefSeq, RepBase | No installation needed. Requires web user registration. Includes gene structural annotation and visualization. Resource and usage restrictions may apply. |
| MicrobeAnnotator | Local installation | BLAST+, DIAMOND, KOFAM | SwissProt/TrEMBL, RefSeq, KEGG | Focused on microbiomes. Conda/Python based. |
| PANNZER2 | Web service | SANSparallel | UniProt, UniProt-GOA, GO, KEGG | Command-line tool available for querying the service. |
| Sma3s | Local installation | BLAST+ | Reference datasets generated from UniProt, GO | A Perl script. Simple installation. |
Figure 2. Automatically generated schema from Nextflow output.
Figure 3. FA-nf annotation run results on Apis dorsata. (A) Example of NF report; (B) example of annotation summary; (C) proteins annotated with GO terms by annotation sources; (D) length distributions for annotated and unannotated proteins.
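
The abstract notes that the pipeline starts from a protein FASTA file and can be split into smaller processes for parallelization. The general idea of chunking a FASTA input so that each chunk can be annotated independently can be sketched as follows. This is a minimal illustration in plain Python, not FA-nf's actual code (FA-nf is implemented in Nextflow); the function names `read_fasta` and `chunk_records` and the chunk size are assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", concatenated sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq_parts)
            header = line[1:]
            seq_parts = []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size entries (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


if __name__ == "__main__":
    example = ">p1\nMKV\nLLA\n>p2\nMTT\n>p3\nMAA\n".splitlines()
    for i, chunk in enumerate(chunk_records(read_fasta(example), chunk_size=2)):
        print(i, [header for header, _ in chunk])
```

Each chunk could then be handed to a separate annotation job (for example, one BLAST+ or InterProScan run per chunk) and the per-chunk results merged afterwards; how FA-nf itself schedules and merges these steps is not described in the abstract.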