DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes.

Feng Gao, Hao Luo, Chun-Ting Zhang.

Abstract

Replication of chromosomes is one of the central events in the cell cycle. Chromosome replication begins at specific sites, called origins of replication (oriCs), in all three domains of life. However, the origins of replication remain unknown in a considerable number of the bacterial and archaeal genomes completely sequenced so far. The increasing availability of complete bacterial and archaeal genomes has created both challenges and opportunities for identifying their oriCs in silico, as well as in vivo. Based on the Z-curve theory, we have developed a web-based system, Ori-Finder, to predict oriCs in bacterial genomes with high accuracy and reliability by taking advantage of comparative genomics, and the predicted oriC regions have been organized into an online database, DoriC, which has been publicly available at http://tubic.tju.edu.cn/doric/ since 2007. Five years after DoriC was constructed, the number of bacterial genomes it covers has increased about 4-fold. Additionally, oriC regions in archaeal genomes identified by in vivo experiments, as well as in silico analyses, have been added to the database. Consequently, the latest release of DoriC contains oriCs for >1500 bacterial genomes and 81 archaeal genomes.

Year:  2012        PMID: 23093601      PMCID: PMC3531139          DOI: 10.1093/nar/gks990

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The identification of replication origins will help to reveal the regulatory mechanisms of the initiation step in DNA replication (1,2) and to discover new broad-spectrum antibacterial drugs (3). Based on the Z-curve theory (4), we have developed a web-based system, Ori-Finder, for finding oriCs in bacterial genomes with high accuracy and reliability (5), and the predicted oriC regions in bacterial genomes have been organized into an online database, DoriC (6). Based on the database, putative origins of replication in Sorangium cellulosum, Microcystis aeruginosa (7) and Cyanothece 51142 (8), which could not be determined by using standard GC skew, have been identified by taking advantage of comparative genomics. The application of the proposed oriC selection criteria and the comparison of different cyanobacterial strains may also provide insight into the replication origins of other cyanobacteria (9). Since the database was constructed in 2007, the replication origins of Anabaena sp. PCC 7120 (10), Cytophaga hutchinsonii ATCC 33406 (11) and Synechococcus elongatus PCC 7942 (12) have been confirmed by experiments, and all are consistent with our predictions in DoriC. Thanks to continuous updates, our database has been widely used in comparative genomics analyses.
For example, DoriC has served as a data source in a study of the relationship between the functionality of essential genes and gene strand bias in bacterial genomes (13), in an analysis of nucleotide compositional asymmetry between the leading and lagging strands of bacterial genomes (14), in an investigation of the association between growth-related traits and minimal generation times (15), in an algorithm for predicting putative essential and core-essential genes in Mycoplasma genomes (16), in research on the coordination of spatiotemporal gene expression during the bacterial growth cycle (17) and in a study of how the percentage of leading strand genes varies across different bacteria (18). It is expected that the new release of the database, DoriC 5.0, will promote the study of oriCs in both bacteria and archaea.

DATABASE UPDATES

In the current release, the database has been significantly improved compared with the initial release. The main advances are (i) coverage of oriCs in more bacterial genomes, up from 435 to 1528; (ii) inclusion of oriCs in 81 archaeal genomes; (iii) inclusion of detailed information about repeats in oriCs identified by the REPuter program (19); and (iv) addition of URLs that link to NCBI Map Viewer (20) or UCSC Archaeal Genome Browser (21), which are useful for exploring the conserved features around the oriC region. Consequently, the latest release of DoriC contains oriCs for >1500 bacterial genomes and 81 archaeal genomes, and it can be accessed at http://tubic.tju.edu.cn/doric/.

DATABASE DESCRIPTION

Replication origins in bacteria

To identify oriC regions of unannotated bacterial genomes, we have developed a web-based system, Ori-Finder, based on an integrated method comprising gene identification, analysis of base composition asymmetry using the Z-curve method, distribution of DnaA boxes, occurrence of genes frequently close to oriCs and phylogenetic relationships. The predicted oriC regions have been organized into an online database, DoriC. Based on DoriC, the relationships between the conserved features associated with the oriC regions, such as adjacent genes and DnaA boxes, and the taxonomic levels of the corresponding bacteria have been summarized. For example, detailed analyses have shown that the consensus sequence of the DnaA boxes in oriC regions and the distribution of genes around oriCs are strongly conserved among the bacteria in the phylum cyanobacteria (7,8). The adjacency of the oriC to the dnaN gene, which encodes the beta clamp processivity factor, has been found to be universal among the bacteria within the phylum cyanobacteria, and the ‘species-specific’ DnaA box motif for the phylum cyanobacteria is ‘TTTTCCACA’ instead of ‘TTATCCACA’, the DnaA box motif of Escherichia coli. These strongly conserved features indicate that the oriCs identified in silico are reliable, as they are corroborated by comparative genomics approaches. This observation also suggests that if the oriC of one bacterium in the phylum cyanobacteria is confirmed experimentally, the oriCs of the other bacterial genomes in this phylum may be confirmed at the same time. As we expected, the experimentally confirmed replication origins of Anabaena sp. PCC 7120 (10) and S. elongatus PCC 7942 (12) in the phylum cyanobacteria are both adjacent to the dnaN gene. Therefore, the proposed rules may help to predict the oriC regions of some bacteria in the phylum cyanobacteria for which complete genomes are not available.
In addition, applying the rules derived from DoriC would speed up the experimental confirmation and functional analysis of oriCs in bacterial genomes. Because the number of sequenced bacterial genomes is growing rapidly, the replication origins of genomes not yet submitted to GenBank or not yet deposited in DoriC can first be predicted by Ori-Finder, which has now been used to analyze ∼30 newly sequenced bacterial genomes.
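
The DnaA box comparison described above can be illustrated with a short search routine. The following Python sketch scans both strands of a sequence for the E. coli motif ‘TTATCCACA’ or the cyanobacterial motif ‘TTTTCCACA’ quoted in the text, tolerating a small number of mismatches. It is only an illustration and not the Ori-Finder implementation; the function names, the `max_mismatches` parameter and the toy sequence are assumptions introduced here.

```python
from typing import List, Tuple

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_dnaa_boxes(
    seq: str, motif: str = "TTATCCACA", max_mismatches: int = 1
) -> List[Tuple[int, str, str, int]]:
    """Scan both strands for windows within max_mismatches of the motif.

    Returns (0-based start, strand, matched window, mismatch count) tuples.
    The default motif is the E. coli DnaA box quoted in the text; the
    mismatch threshold is a hypothetical choice made for this sketch.
    """
    seq = seq.upper()
    k = len(motif)
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, pattern in patterns:
            mismatches = hamming(window, pattern)
            if mismatches <= max_mismatches:
                hits.append((start, strand, window, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any genome or from DoriC.
    toy = "GGG" + "TTTTCCACA" + "CC" + "TGTGGATAA" + "G"
    for hit in find_dnaa_boxes(toy):
        print(hit)
```

Scanning with the E. coli motif and one allowed mismatch also picks up the cyanobacterial variant, because the two motifs differ at a single position; setting `motif="TTTTCCACA"` and `max_mismatches=0` restricts the search to exact cyanobacterial boxes.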

Replication origins in archaea

Z-curve analysis has been used to identify one replication origin in each of the genomes of Methanocaldococcus jannaschii (22) and Methanosarcina mazei (23), two replication origins in the Halobacterium species NRC-1 genome (24), which have been confirmed by in vivo experiments (25,26), and three replication origins in the Sulfolobus solfataricus P2 genome (24), which were later confirmed experimentally (27,28). Here, we collected the oriC information provided in the literature, such as the oriC sequences, origin recognition box (ORB) motifs and uncharacterized motif sequences, which were identified by in vivo experiments (25–34), as well as by in silico analysis (4,22–24,35). In addition, we predicted some new replication origins by the Z-curve method, with the aid of homologous sequence searches against the known replication origins, analysis of ORB motifs and repeats, cdc6 gene location, etc. Consequently, oriC regions in 81 archaeal genomes identified by in vivo experiments, as well as in silico analyses, have been added to our database. The number of oriCs in archaea is correlated with the phylogeny, as summarized in detail in the ‘Introduction’ section of reference (34). Our results in DoriC show that there is one replication origin in the genomes within the order Methanococcales (11 genomes) and within the class Thermococci (12 genomes), and three replication origins in Sulfolobus species (13 genomes). Our results and the Z-curves also show that the archaea within the phylum Crenarchaeota contain multiple origins, although some origins cannot currently be determined at the sequence level. For example, Pyrobaculum calidifontis has been experimentally characterized to contain four replication origins, which is the highest number detected in a prokaryotic organism (34). However, only one of these origins can be determined at the sequence level (34).
During the course of the prediction, we found that the location of a putative replication initiator gene other than the cdc6 gene can help oriC prediction in some cases. For example, in the genome of M. jannaschii, an ORF (MJ0774) annotated as a hypothetical protein is in fact a distant homolog of the Cdc6 protein (22). The name Mc-pRIP, for the putative replication initiator protein in Methanococcales, is used here for MJ0774 and related proteins to distinguish them from bona fide orthologous Cdc6. We also found that the genes encoding Mc-pRIP in 10 other genomes within the order Methanococcales (Methanococcus aeolicus Nankai-3, Methanocaldococcus fervens AG86, Methanococcus maripaludis C5, M. maripaludis C6, M. maripaludis C7, M. maripaludis S2, M. maripaludis X1, Methanococcus vannielii SB, Methanococcus voltae A3 and Methanocaldococcus vulcanius M7) were annotated as ‘LysR family protein’, ‘regulatory protein ArsR’, ‘MarR family transcriptional regulator’, etc. Based on the locations of these genes, the oriCs in the aforementioned genomes were predicted reliably, and they contain almost all the features of known replication origins in archaeal genomes. URLs that link to NCBI Map Viewer or UCSC Archaeal Genome Browser (if available) are also provided, which will be useful for exploring the conserved features around the oriC region. With the availability of an increasing number of archaeal genomes, the prediction will become more accurate and reliable, as the ORB elements or genes frequently close to oriCs can also be analyzed by comparative genomics, and new rules for replication origins in archaeal genomes will be extracted in the future with the continuous update of DoriC. Here, the motif-based sequence analysis tools of the multiple EM for motif elicitation (MEME) Suite (36) have been used to discover motifs in the replication origins of closely related species, e.g. the archaea from the order Thermococcales.
Consequently, ORB motifs and some new uncharacterized motif sequences found by the MEME Suite are also included in the database.
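
The Z-curve analysis used for the bacterial and archaeal predictions above can be sketched in a few lines. With A_n, C_n, G_n and T_n denoting the cumulative base counts over the first n positions, the standard Z-curve components are x_n = (A_n + G_n) − (C_n + T_n), y_n = (A_n + C_n) − (G_n + T_n) and z_n = (A_n + T_n) − (G_n + C_n), and the GC disparity (x_n − y_n)/2 equals G_n − C_n. The Python sketch below computes these curves and reports the position of the GC disparity minimum. Treating that minimum as a candidate origin is a simplifying assumption made here for illustration; it is not the full procedure described in the text, which also uses gene locations, ORB or DnaA box motifs, repeats and homology searches. All function names are hypothetical.

```python
from typing import List, Tuple


def z_curve(seq: str) -> Tuple[List[int], List[int], List[int]]:
    """Cumulative Z-curve components (x_n, y_n, z_n) for n = 1..len(seq)."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs: List[int] = []
    ys: List[int] = []
    zs: List[int] = []
    for base in seq.upper():
        if base in counts:  # ambiguous symbols such as N leave the counts unchanged
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))  # purines minus pyrimidines
        ys.append((a + c) - (g + t))  # amino bases minus keto bases
        zs.append((a + t) - (g + c))  # weak minus strong hydrogen-bond bases
    return xs, ys, zs


def gc_disparity(seq: str) -> List[int]:
    """GC disparity (x_n - y_n) / 2, which equals G_n - C_n."""
    xs, ys, _ = z_curve(seq)
    # x_n - y_n is always even, so integer division is exact.
    return [(x - y) // 2 for x, y in zip(xs, ys)]


def min_gc_disparity_position(seq: str) -> int:
    """1-based position of the first minimum of the GC disparity curve.

    Assumption of this sketch: the minimum marks a candidate origin region.
    """
    curve = gc_disparity(seq)
    if not curve:
        raise ValueError("sequence must not be empty")
    return curve.index(min(curve)) + 1


if __name__ == "__main__":
    toy = "CCCCGGGG"  # hypothetical toy sequence, not a real genome
    print(gc_disparity(toy))
    print(min_gc_disparity_position(toy))
```

On a real genome the curves would normally be computed over the whole chromosome and inspected together with the positions of cdc6 or dnaA/dnaN genes, since a genome with several origins, as reported above for Sulfolobus species, will not show a single clean extremum.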

CONCLUSION

With the increasing availability of completely sequenced bacterial and archaeal genomes and of experimental evidence, the database will become more useful as it incorporates more information. Applying the rules derived from the database will help to develop new algorithms for predicting replication origins and to speed up the experimental confirmation and functional analysis of oriCs in bacterial and archaeal genomes. Systematic and functional analysis of oriC regions in bacterial and archaeal genomes will also be useful for constructing minimal genomes and for regulating the growth rate and generation time of bacteria and archaea, which play a key role in the emerging field of synthetic biology. DoriC will be updated periodically to include more entries and to integrate more information for each entry. We also welcome any feedback or corrections to help us improve the database.

FUNDING

The National Natural Science Foundation of China [31171238, 30800642 and 10747150]. Funding for open access charge: The National Natural Science Foundation of China [31171238]. Conflict of interest statement. None declared.
  35 in total

1.  Functionality of essential genes drives gene strand-bias in bacterial genomes.

Authors:  Yan Lin; Feng Gao; Chun-Ting Zhang
Journal:  Biochem Biophys Res Commun       Date:  2010-04-24       Impact factor: 3.575

2.  DoriC: a database of oriC regions in bacterial genomes.

Authors:  Feng Gao; Chun-Ting Zhang
Journal:  Bioinformatics       Date:  2007-05-12       Impact factor: 6.937

3.  Origins of replication in Cyanothece 51142.

Authors:  Feng Gao; Chun-Ting Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2008-12-30       Impact factor: 11.205

Review 4.  Regulation of the replication cycle: conserved and diverse regulatory systems for DnaA and oriC.

Authors:  Tsutomu Katayama; Shogo Ozaki; Kenji Keyamura; Kazuyuki Fujimitsu
Journal:  Nat Rev Microbiol       Date:  2010-03       Impact factor: 60.633

5.  Multiple replication origins of Halobacterium sp. strain NRC-1: properties of the conserved orc7-dependent oriC1.

Authors:  James A Coker; Priya DasSarma; Melinda Capes; Tammitia Wallace; Karen McGarrity; Rachael Gessler; Jingfang Liu; Hua Xiang; Roman Tatusov; Brian R Berquist; Shiladitya DasSarma
Journal:  J Bacteriol       Date:  2009-06-05       Impact factor: 3.490

6.  Origins of replication in Sorangium cellulosum and Microcystis aeruginosa.

Authors:  Feng Gao; Chun-Ting Zhang
Journal:  DNA Res       Date:  2008-05-12       Impact factor: 4.458

7.  The systemic imprint of growth and its uses in ecological (meta)genomics.

Authors:  Sara Vieira-Silva; Eduardo P C Rocha
Journal:  PLoS Genet       Date:  2010-01-15       Impact factor: 5.917

8.  MEME SUITE: tools for motif discovery and searching.

Authors:  Timothy L Bailey; Mikael Boden; Fabian A Buske; Martin Frith; Charles E Grant; Luca Clementi; Jingyuan Ren; Wilfred W Li; William S Noble
Journal:  Nucleic Acids Res       Date:  2009-05-20       Impact factor: 16.971

9.  Genetic and physical mapping of DNA replication origins in Haloferax volcanii.

Authors:  Cédric Norais; Michelle Hawkins; Amber L Hartman; Jonathan A Eisen; Hannu Myllykallio; Thorsten Allers
Journal:  PLoS Genet       Date:  2007-04-05       Impact factor: 5.917

10.  Ori-Finder: a web-based system for finding oriCs in unannotated bacterial genomes.

Authors:  Feng Gao; Chun-Ting Zhang
Journal:  BMC Bioinformatics       Date:  2008-02-01       Impact factor: 3.169
