
Operon-mapper: a web server for precise operon identification in bacterial and archaeal genomes.

Blanca Taboada¹, Karel Estrada², Ricardo Ciria², Enrique Merino².

Abstract

Summary: Operon-mapper is a web server that accurately, easily and directly predicts the operons of any bacterial or archaeal genome sequence. The operon predictions are based on the intergenic distance between neighboring genes as well as the functional relationships of their protein products. To this end, Operon-mapper finds all the ORFs within a given nucleotide sequence, along with their genomic coordinates, orthology groups and functional relationships. We believe that Operon-mapper, due to its accuracy, simplicity and speed, as well as the relevant information that it generates, will be a useful tool for annotating and characterizing genomic sequences.

Availability and implementation: http://biocomputo.ibt.unam.mx/operon_mapper/.

Year:  2018        PMID: 29931111      PMCID: PMC6247939          DOI: 10.1093/bioinformatics/bty496

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

In prokaryotes, metabolically or functionally related genes are commonly arranged contiguously in the genome and co-transcribed into the same polycistronic messenger RNA as part of a single operon. Because operons are biologically relevant in the regulation of gene expression, we previously developed one of the most accurate operon prediction algorithms to date (Taboada et al.). Our method is based on an artificial neural network (ANN) whose inputs are the intergenic distance between contiguous genes and a score that reflects the functional relationship between their protein products. When tested on sets of experimentally defined operons in E. coli and B. subtilis, the algorithm reached accuracies of 94.6% and 93.3%, respectively (Taboada et al.). In a recent evaluation, our algorithm showed the highest correlation with experimentally validated operons among the algorithms compared (Zaidi and Zhang, 2017). Predicted operons of model organisms are currently available in several databases (Mao et al.; Pertea et al.), including ours (Taboada et al.). Recent advances in sequencing technologies have made it possible for nearly any research group to determine the complete genome sequence of a bacterium quickly and at low cost. For these newly sequenced or draft genomes, however, there is no easy way to predict the corresponding operons. Therefore, based on our published algorithm (Taboada et al.), we have developed Operon-mapper, a web server that accurately and easily predicts the operons of any bacterial or archaeal genome sequence.
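The core idea of the method, mapping two per-pair inputs to a single score between 0 and 1, can be illustrated with a small feed-forward network. The sketch below is a Python illustration only (the published network is implemented in R). The hidden-layer size, all weights, the function name `pair_confidence` and the assumption that the functional score is scaled to the range 0 to 1 are hypothetical. They are not the trained parameters of the published model.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


# HYPOTHETICAL parameters for a 2-input, 2-hidden-unit, 1-output network.
# Illustrative only; these are NOT the trained weights of the published model.
HIDDEN_WEIGHTS: List[Tuple[float, float]] = [(-0.02, 3.0), (0.01, -2.0)]  # (w_distance, w_score)
HIDDEN_BIASES: List[float] = [0.5, 0.0]
OUTPUT_WEIGHTS: List[float] = [4.0, -3.0]
OUTPUT_BIAS: float = -0.5


def pair_confidence(intergenic_distance: int, functional_score: float) -> float:
    """Forward pass for one pair of contiguous genes.

    intergenic_distance: bases between the two genes (negative if they overlap).
    functional_score: functional-relationship score, assumed scaled to [0, 1].
    Returns a confidence in (0, 1) that both genes belong to the same operon.
    """
    hidden = [
        sigmoid(w_d * intergenic_distance + w_s * functional_score + b)
        for (w_d, w_s), b in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


# With these toy weights, a close, functionally related pair scores above 0.5
# and a distant, weakly related pair scores below it.
print(pair_confidence(10, 0.9))
print(pair_confidence(300, 0.1))
```

In a real model, the weights are learned from experimentally defined operons. The forward pass itself stays this simple.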

2 Overview and implementation of the Operon-mapper web server

Operon-mapper is written in Perl. It generates HTML and JavaScript code ‘on the fly’ and integrates various sequence analysis programs (described in Section 3) in a Linux environment. Operon-mapper runs on a 64-core server with 512 GB of RAM under Ubuntu Linux 16.04 LTS and is available at http://biocomputo.ibt.unam.mx/operon_mapper.

3 Results

The Operon-mapper web server, developed in Perl, consists of three main stages:

1) Data acquisition. Input is collected through a web page written in HTML and JavaScript. The only required input for the operon prediction process is the genomic nucleotide sequence in FASTA format. Optionally, the user can also provide the genomic coordinates of the ORFs in either General Feature Format (GFF) or GenBank format.

2) Sequence analysis. The analysis is divided into five tasks:
2.1) ORF prediction uses Prokka (Seemann, 2014), which relies on the dynamic-programming-based gene finder Prodigal (Hyatt et al.) to predict the 5′ and 3′ ends of all the ORFs in the given nucleotide sequence.
2.2) Homology assignments are determined by profile hidden Markov model (HMM) searches with the hmmsearch program (Eddy, 2011). These searches use a previously constructed set of models representing each of the 4873 COGs (Taboada et al.; Tatusov et al.) and the 8539 Remained Orthologous Groups (ROGs) (Taboada et al.).
2.3) Intergenic distances between adjacent genes are computed from the ORF coordinates by a custom Perl program.
2.4) Operon prediction is performed with an ANN implemented in R. The network inputs are the intergenic distance between two genes and a score that reflects the functional relationship between their protein products. These scores, defined in the STRING database (Jensen et al.) or in our previous work (Taboada et al.), are assigned to pairs of proteins according to their associated COG or ROG. This step is the core of Operon-mapper. For each pair of genes that might be found in the same operon, the network computes a confidence value, normalized between 0 and 1. Pairs with a value greater than 0.5 are predicted to belong to the same operon. Predictions with confidence values near 0 or 1 are the most accurate, whereas those with values close to 0.5 are the least accurate.
2.5) Gene function assignments are based on the most significant DIAMOND hit (Buchfink et al.) against a core set of well-characterized proteins from the UniProt Knowledgebase (Apweiler et al.).

3) Delivery of results. A Perl program builds an HTML page from which the user can select the file or set of files containing the results of the different analyses performed by Operon-mapper, including: i) the predicted operonic gene pairs with their confidence values for belonging to the same operon; ii) a list of operons with their corresponding genes; iii) the coordinates of the predicted ORFs; iv) the DNA sequences of the predicted ORFs; v) the translated protein sequences of the predicted ORFs; vi) the homology assignments (COG or ROG) of the proteins; vii) the functional descriptions of the proteins; viii) all of the above output files at once; and ix) a compressed file containing all of the above output files. The results are displayed on the web page once the analysis is finished and are also sent to the email address specified by the user.

As a benchmark test, Operon-mapper was used to predict the operons of eight genomes of different sizes and GC contents. Table 1 shows the accuracy of the predictions in two scenarios: i) when the genomic sequence is the only input, and ii) when the gene coordinates are provided in addition to the nucleotide sequence. In both cases, accuracy was evaluated by comparing the predictions to experimentally determined operons recently compiled by Zaidi and Zhang (2017).
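Tasks 2.3 and 2.4, together with output ii), can be sketched as three steps. First, compute the distance between each pair of adjacent genes. Second, score the pair. Third, chain consecutive pairs whose confidence exceeds 0.5 into operons. This is a minimal Python illustration, not Operon-mapper's Perl/R code. The following are all assumptions made for the example:
- the `Gene` tuple layout;
- the function names;
- the rule that genes on opposite strands are never joined;
- the distance-only `toy_confidence` scorer.

```python
from typing import Callable, List, Tuple

# A gene as (identifier, start, end, strand). Coordinates are 1-based and inclusive,
# with start <= end on both strands (the GFF convention).
Gene = Tuple[str, int, int, str]
Scorer = Callable[[Gene, Gene, int], float]


def intergenic_distance(left: Gene, right: Gene) -> int:
    """Bases between the end of `left` and the start of `right`; negative if they overlap."""
    return right[1] - left[2] - 1


def group_operons(genes: List[Gene], confidence: Scorer, threshold: float = 0.5) -> List[List[str]]:
    """Chain adjacent same-strand genes whose pair confidence exceeds `threshold`."""
    ordered = sorted(genes, key=lambda g: g[1])
    operons: List[List[str]] = [[ordered[0][0]]] if ordered else []
    for left, right in zip(ordered, ordered[1:]):
        # Opposite-strand pairs short-circuit and are never scored.
        joined = left[3] == right[3] and confidence(
            left, right, intergenic_distance(left, right)
        ) > threshold
        if joined:
            operons[-1].append(right[0])  # extend the current operon
        else:
            operons.append([right[0]])    # start a new transcription unit
    return operons


def toy_confidence(left: Gene, right: Gene, distance: int) -> float:
    """HYPOTHETICAL stand-in for the trained ANN: uses intergenic distance only."""
    return 0.9 if distance < 50 else 0.1


genes = [
    ("g1", 100, 1000, "+"),
    ("g2", 1010, 2000, "+"),
    ("g3", 2300, 3000, "+"),
    ("g4", 3100, 4000, "-"),
]
print(group_operons(genes, toy_confidence))  # [['g1', 'g2'], ['g3'], ['g4']]
```

The scorer is passed in as a parameter, so the trained network's output could replace the toy rule without touching the distance calculation or the operon assembly. Genes are listed in coordinate order, which for minus-strand operons is the reverse of the transcription order.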
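For task 2.5, choosing the most significant hit per protein can be sketched as a single pass over DIAMOND's BLAST-style tabular output (`--outfmt 6`), whose 12 default columns end with the e-value and the bit score. The selection rule used here is an assumption: the lowest e-value wins, with ties broken by the higher bit score. The function name `best_hits` and the query and subject identifiers in the example are made up.

```python
import csv
import io
from typing import Dict, Tuple


def best_hits(tabular: str) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)}, keeping the most significant hit per query.

    Expects the 12 default BLAST-style tabular columns (DIAMOND --outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    Assumed rule: lowest e-value wins; ties go to the higher bit score.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up identifiers and values, for illustration only.
example = (
    "orf_0001\tref_A\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t410.0\n"
    "orf_0001\tref_B\t60.0\t290\t116\t2\t5\t295\t3\t292\t3e-80\t280.0\n"
    "orf_0002\tref_C\t40.0\t150\t90\t4\t1\t150\t10\t160\t2e-20\t95.0\n"
)
print(best_hits(example))  # orf_0001 keeps ref_A; orf_0002 keeps ref_C
```

The functional description reported for each ORF would then be taken from the annotation of the selected subject protein.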
Table 1.

Benchmark test of Operon-mapper using genomic sequences of different sizes and GC contents

Organism          Accession number  Size (kb)  GC (%)  Accuracy (NCBI ORFs)  Accuracy (predicted ORFs)
B. subtilis       NC_000964         4216       43.5    94.1%                 94.3%
C. glutamicum     NC_006958         3283       54.0    87.6%                 85.3%
E. coli           NC_000913         4642       50.8    94.4%                 94.4%
H. pylori         NC_000911         668        38.9    93.1%                 92.4%
L. monocytogenes  NC_003210         2944       38.0    91.6%                 90.9%
L. pneumophila    NC_006368         3635       38.3    90.6%                 90.1%
P. profundum      NC_006370         6403       42.0    92.1%                 92.4%
S. solfataricus   NC_002754         2992       35.8    95.8%                 96.1%
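Table 1 reports accuracy as agreement with experimentally determined operons. Suppose accuracy is taken as the fraction of evaluated adjacent gene pairs whose label (same operon or not) matches the reference, that is (TP + TN) / (TP + TN + FP + FN). The comparison could then be computed as in the sketch below. That definition is an assumption, and the function name `pair_accuracy` and the gene identifiers are hypothetical. The sketch also assumes that predicted genes have already been matched to the reference gene names, which is the extra step needed in the predicted-ORF scenario.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # two adjacent genes, in genomic order


def pair_accuracy(predicted: Dict[Pair, bool], reference: Dict[Pair, bool]) -> float:
    """(TP + TN) / (TP + TN + FP + FN) over pairs present in both dictionaries.

    True means "same operon"; pairs missing from either side are not evaluated.
    """
    shared = [pair for pair in reference if pair in predicted]
    if not shared:
        raise ValueError("no gene pairs in common between prediction and reference")
    correct = sum(predicted[pair] == reference[pair] for pair in shared)
    return correct / len(shared)


# Hypothetical labels: three of the four reference pairs are predicted correctly.
reference = {("g1", "g2"): True, ("g2", "g3"): False, ("g3", "g4"): True, ("g4", "g5"): False}
predicted = {("g1", "g2"): True, ("g2", "g3"): False, ("g3", "g4"): False, ("g4", "g5"): False}
print(pair_accuracy(predicted, reference))  # 0.75
```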

4 Conclusions

Operon-mapper is the first publicly available web-based tool designed to predict operons in bacterial and archaeal genomes with the genomic sequence as the only required input. Its strengths include accuracy, simplicity and speed. In addition to predicting operons, Operon-mapper generates information that is relevant to most bacterial genome annotation projects. This includes the identification of ORFs in a nucleotide sequence, the assignment of COGs or ROGs to the encoded proteins, and the functional annotation of proteins. For these reasons, we hope that Operon-mapper will quickly become a reference tool in the field of bacterial genome annotation.
References

1.  Prokka: rapid prokaryotic genome annotation.

Authors:  Torsten Seemann
Journal:  Bioinformatics       Date:  2014-03-18       Impact factor: 6.937

2.  Computational operon prediction in whole-genomes and metagenomes.

Authors:  Syed Shujaat Ali Zaidi; Xuegong Zhang
Journal:  Brief Funct Genomics       Date:  2017-07-01       Impact factor: 4.241

3.  High accuracy operon prediction method based on STRING database scores.

Authors:  Blanca Taboada; Cristina Verde; Enrique Merino
Journal:  Nucleic Acids Res       Date:  2010-04-12       Impact factor: 16.971

4.  Prodigal: prokaryotic gene recognition and translation initiation site identification.

Authors:  Doug Hyatt; Gwo-Liang Chen; Philip F Locascio; Miriam L Land; Frank W Larimer; Loren J Hauser
Journal:  BMC Bioinformatics       Date:  2010-03-08       Impact factor: 3.169

5.  Accelerated Profile HMM Searches.

Authors:  Sean R Eddy
Journal:  PLoS Comput Biol       Date:  2011-10-20       Impact factor: 4.475

6.  ProOpDB: Prokaryotic Operon DataBase.

Authors:  Blanca Taboada; Ricardo Ciria; Cristian E Martinez-Guerrero; Enrique Merino
Journal:  Nucleic Acids Res       Date:  2011-11-16       Impact factor: 16.971

7.  STRING 8--a global view on proteins and their functional interactions in 630 organisms.

Authors:  Lars J Jensen; Michael Kuhn; Manuel Stark; Samuel Chaffron; Chris Creevey; Jean Muller; Tobias Doerks; Philippe Julien; Alexander Roth; Milan Simonovic; Peer Bork; Christian von Mering
Journal:  Nucleic Acids Res       Date:  2008-10-21       Impact factor: 16.971

8.  DOOR: a database for prokaryotic operons.

Authors:  Fenglou Mao; Phuongan Dam; Jacky Chou; Victor Olman; Ying Xu
Journal:  Nucleic Acids Res       Date:  2008-11-06       Impact factor: 16.971

9.  OperonDB: a comprehensive database of predicted operons in microbial genomes.

Authors:  Mihaela Pertea; Kunmi Ayanbule; Megan Smedinghoff; Steven L Salzberg
Journal:  Nucleic Acids Res       Date:  2008-10-23       Impact factor: 16.971

10.  The COG database: an updated version includes eukaryotes.

Authors:  Roman L Tatusov; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Boris Kiryutin; Eugene V Koonin; Dmitri M Krylov; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal:  BMC Bioinformatics       Date:  2003-09-11       Impact factor: 3.169
