GO FEAT: a rapid web-based functional annotation tool for genomic and transcriptomic data.

Fabricio Almeida Araujo1, Debmalya Barh2, Artur Silva1, Luis Guimarães1, Rommel Thiago Juca Ramos3.   

Abstract

Downstream analysis of genomic and transcriptomic sequence data often relies on functional annotation, which can be performed with various bioinformatics tools and biological databases. However, a fast, fully integrated tool is not available for such analysis. Moreover, currently available software cannot produce analytic lists of annotations and graphs that help users evaluate the output. We therefore present the Gene Ontology Functional Enrichment Annotation Tool (GO FEAT), a free web platform for functional annotation and enrichment of genomic and transcriptomic data based on sequence homology search. The analysis can be customized and visualized according to users' needs and specifications. GO FEAT is freely available at http://computationalbiology.ufpa.br/gofeat/ and its source code is hosted at https://github.com/fabriciopa/gofeat .

Year:  2018        PMID: 29379090      PMCID: PMC5789007          DOI: 10.1038/s41598-018-20211-9

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Giving biological meaning to genomic and transcriptomic data is laborious and time consuming, especially considering the large amount of data generated by high-throughput technologies[1] and the number of tools, web servers and databases developed for this purpose[2]. Biological interpretation is often obtained through functional annotation with the Gene Ontology (GO) database[3], which is widely used as a dictionary of gene functions. In addition, functional enrichment is commonly performed by integrating several databases, such as UniProt[4], InterPro[5], KEGG[6], Pfam[7], NCBI[8] and SEED[9]. Many tools are available for the annotation process: Blast2GO[10], AmiGO[11], GOrilla[12], REVIGO[13], QuickGO[14] and NaviGO[15]. However, these tools have limitations: a) not all are completely and freely available; b) installation, configuration and command-line use are complex; c) some lack a visual interface; d) capacity or the number of sequences per analysis is limited; and e) results are difficult to share and export. To address these issues, we developed GO FEAT, a free, online, user-friendly platform for functional annotation and enrichment of genomic and transcriptomic data based on sequence homology search. It allows users to export the results to different output formats and to generate reports, tables, GO charts and graphs that help with downstream analysis.

Methods

GO FEAT uses PHP as its back-end programming language. HTML5, CSS3 and JavaScript are used for the front end, and Perl is used for the remote connection scripts. Records from the tool are stored in a MySQL RDBMS. All remote calls are made through public REST APIs (EMBL-EBI's public API for BLAST, UniProt for database integration, QuickGO for ontologies and SEED's public API for SEED). Users can share their data with other users, export data to several formats, and generate Gene Ontology charts (general and by type of ontology). GO FEAT receives a multi-FASTA file (nucleotide or protein) as input once a project is registered or assigned. The pipeline (Fig. 1) searches for homologs using an e-value threshold defined by the user and then annotates the homologs using public databases. After submission, each sequence is placed in the processing queue. Processing starts with remote BLAST[16] through the EMBL-EBI public API[17] or with the local DIAMOND[18] aligner. GO FEAT automatically identifies the type of sequence to be searched (nucleotide or protein) and runs the appropriate program: BLASTx for nucleotide sequences or BLASTp for protein sequences. The next step integrates the alignment results with the UniProt, NCBI Protein, KEGG, InterPro, Pfam and Gene Ontology databases via the UniProt public API, and with the SEED database via the SEED public API. After integration, the results are processed and displayed in graphs, charts and tables to simplify the analysis.
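The input handling described above (reading a multi-FASTA file and choosing BLASTx or BLASTp by sequence type) can be illustrated with a minimal Python sketch. This is not GO FEAT's code, which is written in PHP and Perl; the function names `parse_fasta` and `guess_program` and the 90% nucleotide-letter threshold are assumptions made for illustration, since the text does not say how the sequence type is detected.

```python
from typing import Iterator, Tuple

# Letters treated as nucleotide codes in this sketch (an assumption).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def guess_program(sequence: str, threshold: float = 0.9) -> str:
    """Hypothetical heuristic: mostly A/C/G/T/U/N means nucleotide input.

    Returns "blastx" for nucleotide sequences and "blastp" for proteins,
    mirroring the choice described in the text.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    fraction = sum(ch in NUCLEOTIDE_LETTERS for ch in seq) / len(seq)
    return "blastx" if fraction >= threshold else "blastp"
```

A caller would iterate over `parse_fasta(...)` and pass each sequence to `guess_program` before submitting it to the aligner. A composition threshold can misclassify short or low-complexity proteins, so a production tool may need a stricter rule.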
Figure 1

GO FEAT pipeline steps. (1) A multi-fasta file containing any number of sequences (nucleotide or protein) is used as input. (2) Each sequence is used as query against EBI database through EBI public API or local DIAMOND. (3) The alignment results are mapped to UniProt, NCBI Protein, KEGG, GO databases by UniProt public API and SEED database by SEED public API. Finally, (4) the results are displayed in tables, charts and graphs.

Because the EBI servers limit the number of simultaneous requests to 30, a queue control mechanism was developed to optimize the use of the server's resources. For projects with 100 or fewer sequences, resources are allocated dynamically for a maximum of 10 simultaneous users (3 requests per project). If resources are available, a project can receive more than 3 requests. Projects with more than 100 sequences are queued for local alignment with DIAMOND, which processes batches of 500 sequences at a time. This optimizes the use of the server's resources and allows more sequences to be processed at the same time.

To compare the results from GO FEAT with those of other tools, we performed functional annotation in six scenarios: a random 500 bp sequence from Escherichia coli; the full genome of Escherichia coli K-12 MG1655 (4140 CDS with an average size of 321 bp) [RefSeq NC_000913.3]; the full genome of Drosophila melanogaster BDGP6 (30482 CDS with an average size of 668 bp) [Assembly GCA_000001215.4]; the full genome of Nostoc sp. PCC 7107 (5237 CDS with an average size of 330 bp) [RefSeq NC_019676.1]; transcriptomic data from the E. coli response to five different perturbations (4092 CDS with an average size of 326 bp)[19]; and transcriptomic data from the M. tuberculosis response to macrophages (4076 CDS with an average size of 332 bp)[20]. The results of this comparison are shown in the next section.
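The queue rule described earlier in this section (a cap of 30 remote requests, 3 requests per project for up to 10 projects, and DIAMOND batches of 500 sequences for larger projects) can be sketched in Python as follows. The constants come from the text; the function names and the way spare request slots are shared among projects are assumptions, because the text only states that projects can receive more than 3 requests when resources are available.

```python
from typing import List, Sequence

EBI_REQUEST_CAP = 30         # simultaneous requests allowed by the EBI servers
REMOTE_MAX_SEQUENCES = 100   # projects up to this size use remote BLAST
BASE_SLOTS_PER_PROJECT = 3   # requests per project when 10 projects are active
DIAMOND_BATCH_SIZE = 500     # sequences per local DIAMOND batch


def choose_aligner(n_sequences: int) -> str:
    """Route small projects to remote BLAST and large ones to local DIAMOND."""
    if n_sequences <= REMOTE_MAX_SEQUENCES:
        return "remote_blast"
    return "local_diamond"


def allocate_slots(n_projects: int) -> List[int]:
    """Split the request cap among waiting small projects.

    At most 10 projects are active at once. Spare capacity is shared
    evenly among the active ones (an assumption of this sketch).
    Projects beyond the first 10 get 0 slots and keep waiting.
    """
    if n_projects <= 0:
        return []
    max_active = EBI_REQUEST_CAP // BASE_SLOTS_PER_PROJECT
    active = min(n_projects, max_active)
    base, extra = divmod(EBI_REQUEST_CAP, active)
    slots = [base + (1 if i < extra else 0) for i in range(active)]
    return slots + [0] * (n_projects - active)


def diamond_batches(sequences: Sequence[str],
                    size: int = DIAMOND_BATCH_SIZE) -> List[Sequence[str]]:
    """Cut a large project into consecutive batches for local alignment."""
    return [sequences[i:i + size] for i in range(0, len(sequences), size)]
```

With 10 waiting projects every project gets 3 slots, which matches the figures in the text; with fewer projects each one receives a larger share of the 30-request cap.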

Results

Interface

GO FEAT was developed to run in any modern web browser and has a clean, easy-to-use graphical interface. No installation of any tool or software is required, and users can run projects without prior registration.

Project manager

GO FEAT provides a project manager that helps registered users organize each analysis they perform. In the project manager, users can check a project's progress, export data to several formats and share projects with other users, which avoids running the same project multiple times.

Reports, charts and graphs

GO FEAT offers several ways to visualize results. Spreadsheet reports present the sequences in tables alongside their BLAST results, which are integrated with several databases, and allow users to search and export the results. The results can also be viewed in graphs and charts, which are divided by molecular function, cellular component and biological process. In each one, it is possible to view all GO terms of the category together with the identifiers of the associated sequences. Finally, the user can view each GO term with its acyclic graph, downloaded through the QuickGO API.
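The grouping behind these per-category charts can be illustrated with a short Python sketch. It assumes that each annotation is available as a (sequence identifier, GO term, category) triple; this record layout, the function name `summarize` and the sequence identifiers in the usage example are hypothetical and are not taken from GO FEAT.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

Annotation = Tuple[str, str, str]  # (sequence id, GO id, category)


def summarize(annotations: Iterable[Annotation]):
    """Count GO terms per category and collect the sequences behind each term."""
    per_category: Dict[str, Counter] = defaultdict(Counter)
    sequences_by_term: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, go_id, category in annotations:
        per_category[category][go_id] += 1
        sequences_by_term[go_id].add(seq_id)
    return per_category, sequences_by_term


# Illustrative input only; the sequence identifiers are made up.
example = [
    ("seq1", "GO:0005524", "molecular_function"),   # ATP binding
    ("seq2", "GO:0005524", "molecular_function"),
    ("seq2", "GO:0005737", "cellular_component"),   # cytoplasm
    ("seq3", "GO:0006412", "biological_process"),   # translation
]
counts, sequences = summarize(example)
```

Here `counts["molecular_function"].most_common()` gives the data for one chart, and `sequences["GO:0005524"]` lists the sequences shown next to that term.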

Benchmarking

For a random 500 bp sequence chosen from the Escherichia coli genome, GO FEAT takes around 4 minutes for full functional annotation and enrichment, while Blast2GO takes around 14 minutes for the same sequence. A direct BLAST search on the NCBI website takes around 2 minutes; however, the mapping between the BLAST result and other databases is not made automatically. At UniProt, functional annotation and enrichment take around 2 minutes. Since NCBI's BLAST does not perform a full functional annotation and the UniProt website limits the number of sequences, they are not included in the further analysis. For complete genomes of model organisms, GO FEAT needs around 5 hours for Escherichia coli and 30 hours for Drosophila melanogaster. For transcriptomic data, 5 hours were required to perform the functional annotation of the data described in Jozefczuk's paper and 5 hours for the data described in Rohde's paper. Blast2GO was unable to complete the full annotation and enrichment of any complete genome or transcriptomic dataset in less than 10 days. For a non-model organism such as Nostoc sp. PCC 7107, around 4 hours are required to finish processing in GO FEAT. The time varies depending on the server load of the remote APIs. At full load, around 11 hours were necessary to process 10 projects, each with 1000 different sequences from Drosophila melanogaster and an average CDS size of 603 bp. Regarding functionality, GO FEAT offers useful features compared with other functional annotation tools (Table 1), rapidly processes the input sequences and generates the results.
Table 1

F1) full freely available; F2) online or simple installation; F3) visual interface; F4) unlimited dataset; F5) share project and F6) export data.

Tool       F1    F2    F3    F4    F5    F6
Blast2GO   No    Yes   Yes   Yes   No    Yes
AmiGO      Yes   Yes   Yes   No    No    Yes
GOrilla    Yes   Yes   Yes   No    No    No
QuickGO    Yes   Yes   Yes   No    No    No
NaviGO     Yes   Yes   Yes   No    No    Yes
GO FEAT    Yes   Yes   Yes   Yes   Yes   Yes
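The GO FEAT run times reported in the Benchmarking text can be converted into approximate throughput with simple arithmetic. The Python sketch below does this for the runs whose sequence counts are given in the text; the reported times are themselves approximate ("around"), so the resulting rates are rough indications only, and the function and variable names are illustrative.

```python
# (label, number of sequences, approximate hours), as reported in the text
RUNS = [
    ("E. coli K-12 MG1655 genome", 4140, 5),
    ("D. melanogaster BDGP6 genome", 30482, 30),
    ("Nostoc sp. PCC 7107 genome", 5237, 4),
    ("10 projects x 1000 sequences at full load", 10 * 1000, 11),
]


def sequences_per_hour(n_sequences: int, hours: float) -> float:
    """Average number of sequences annotated per hour of wall-clock time."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return n_sequences / hours


def throughput_table(runs=RUNS):
    """Return (label, rounded sequences per hour) for each reported run."""
    return [(label, round(sequences_per_hour(n, h))) for label, n, h in runs]
```

Calling `throughput_table()` gives rates on the order of 800 to 1300 sequences per hour for these runs.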

Limitations

GO FEAT was developed to perform functional annotation of previously predicted genes, coding DNA sequences (CDS), open reading frames (ORFs) or transcripts. Large sequences, such as full genomes or contigs, are therefore not suitable as input to GO FEAT because of the size limitations of the alignment software.

Conclusions

Functional characterization of biological sequences is a required step in the analysis of biological data. GO FEAT is an annotation platform integrated with several databases that can be used for different datasets, such as coding sequences identified by gene prediction and sequences produced by new assemblies of next-generation sequencing data. Users can share results with collaborators through the graphical interface and export the results to many formats. Because the tool accesses the databases through their APIs, the annotations are based on the most recent data available in those databases. We are committed to maintaining GO FEAT for at least 2 years and expect to improve its performance as our computational infrastructure grows. In future work, we plan to add a gene prediction step before the functional annotation so that users can submit large sequences and export the predicted sequences.
  20 in total

1.  Bioinformatics software for biologists in the genomics era.

Authors:  Sudhir Kumar; Joel Dudley
Journal:  Bioinformatics       Date:  2007-05-07       Impact factor: 6.937

2.  Using EMBL-EBI Services via Web Interface and Programmatically via Web Services.

Authors:  Rodrigo Lopez; Andrew Cowley; Weizhong Li; Hamish McWilliam
Journal:  Curr Protoc Bioinformatics       Date:  2014-12-12

Review 3.  Coming of age: ten years of next-generation sequencing technologies.

Authors:  Sara Goodwin; John D McPherson; W Richard McCombie
Journal:  Nat Rev Genet       Date:  2016-05-17       Impact factor: 53.242

4.  Pfam: clans, web tools and services.

Authors:  Robert D Finn; Jaina Mistry; Benjamin Schuster-Böckler; Sam Griffiths-Jones; Volker Hollich; Timo Lassmann; Simon Moxon; Mhairi Marshall; Ajay Khanna; Richard Durbin; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

5.  NaviGO: interactive tool for visualization and functional similarity and coherence analysis with gene ontology.

Authors:  Qing Wei; Ishita K Khan; Ziyun Ding; Satwica Yerneni; Daisuke Kihara
Journal:  BMC Bioinformatics       Date:  2017-03-20       Impact factor: 3.169

6.  UniProt: the universal protein knowledgebase.

Authors: 
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

7.  QuickGO: a web-based tool for Gene Ontology searching.

Authors:  David Binns; Emily Dimmer; Rachael Huntley; Daniel Barrell; Claire O'Donovan; Rolf Apweiler
Journal:  Bioinformatics       Date:  2009-09-10       Impact factor: 6.937

8.  GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists.

Authors:  Eran Eden; Roy Navon; Israel Steinfeld; Doron Lipson; Zohar Yakhini
Journal:  BMC Bioinformatics       Date:  2009-02-03       Impact factor: 3.169

9.  AmiGO: online access to ontology and annotation data.

Authors:  Seth Carbon; Amelia Ireland; Christopher J Mungall; ShengQiang Shu; Brad Marshall; Suzanna Lewis
Journal:  Bioinformatics       Date:  2008-11-25       Impact factor: 6.937

10.  The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).

Authors:  Ross Overbeek; Robert Olson; Gordon D Pusch; Gary J Olsen; James J Davis; Terry Disz; Robert A Edwards; Svetlana Gerdes; Bruce Parrello; Maulik Shukla; Veronika Vonstein; Alice R Wattam; Fangfang Xia; Rick Stevens
Journal:  Nucleic Acids Res       Date:  2013-11-29       Impact factor: 16.971
