
REBASE--a database for DNA restriction and modification: enzymes, genes and genomes.

Richard J Roberts, Tamas Vincze, Janos Posfai, Dana Macelis.

Abstract

REBASE is a comprehensive database of information about restriction enzymes, DNA methyltransferases and related proteins involved in the biological process of restriction-modification (R-M). It contains fully referenced information about recognition and cleavage sites, isoschizomers, neoschizomers, commercial availability, methylation sensitivity, crystal and sequence data. Experimentally characterized homing endonucleases are also included. The fastest growing segment of REBASE contains the putative R-M systems found in the sequence databases. Comprehensive descriptions of the R-M content of all fully sequenced genomes are available including summary schematics. The contents of REBASE may be browsed from the web (http://rebase.neb.com) and selected compilations can be downloaded by ftp (ftp.neb.com). Additionally, monthly updates can be requested via email.

Year: 2009 PMID: 19846593 PMCID: PMC2808884 DOI: 10.1093/nar/gkp874

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

OVERVIEW

The previous description of REBASE in the 2007 NAR Database Issue (1) reported 3805 biochemically or genetically characterized restriction–modification (R–M) systems and included an analysis of approximately 400 bacterial and archaeal genomes that had been deposited in the RefSeq Database of GenBank (2,3). Analysis of the sequence information then available in GenBank led to the prediction of 2709 restriction enzyme (R) genes and 4485 DNA methyltransferase (M) genes. These numbers have now risen to 4990 R genes and 8080 M genes, of which 3511 R and 5497 M genes come from the 1050 completely sequenced bacterial and archaeal genomes. These putative R–M system genes are given systematic names according to the agreed-upon nomenclature rules (4). The names all carry the suffix ‘P’ to indicate their putative status. In many cases, the recognition specificity of these systems can be assigned with some degree of confidence because of their similarity to biochemically well-characterized enzymes.

The REBASE web site (http://rebase.neb.com) summarizes all information known about every restriction enzyme and any associated proteins. This includes the recognition sequences, cleavage sites, source, commercial availability, sequence data, crystal structure information, isoschizomers and methylation sensitivity. Within the reference section of REBASE, links are maintained to the full text of all papers whenever they are readily available on the web. There is also extensive reciprocal cross-referencing between REBASE and NCBI, including links to GenBank, PubMed and NCBI’s LinkOut utility. Links to other major databases such as UniProt (5), PDB (6) and Pfam (7) are also maintained. There are currently 3945 biochemically or genetically characterized restriction enzymes in REBASE; among the 3834 Type II restriction enzymes, 299 distinct specificities are known. Six hundred and forty-one restriction enzymes are commercially available, covering 235 distinct specificities.

As shown in Figure 1, the rate of discovery of new putative restriction and modification genes is rising rapidly. In contrast, the rate at which candidates are being characterized biochemically has dropped to the level of three decades ago. Nevertheless, because of the large number of sequenced examples of biochemically characterized restriction systems, the putative recognition sequences of predicted restriction enzymes and DNA methyltransferases can be inferred. Currently, all new sequences entering GenBank are checked using data mining techniques for the presence of R–M systems and, following extensive manual checking, the resulting inferences are all included within REBASE, where they are clearly marked as predictions. When analyzing DNA sequence data, the DNA methyltransferase genes are the more reliable indicators of an R–M system, and the presence, proper order and characteristic spacing of well-conserved motifs are used to suggest likely candidates.
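The motif criterion described above (presence, proper order and characteristic spacing) can be illustrated with a minimal Python sketch. The motif patterns, the spacing limits and the function name `find_ordered_motifs` are hypothetical placeholders chosen for illustration; they are not the motifs, thresholds or software used by REBASE.

```python
import re

# Hypothetical placeholder motifs, listed in the order they are expected to
# appear. These are NOT the patterns used by REBASE.
MOTIFS = [
    ("motif_A", re.compile(r"F.G.G")),
    ("motif_B", re.compile(r"[DNS]PP[YFW]")),
]


def find_ordered_motifs(protein, motifs, min_gap, max_gap):
    """Return [(name, start), ...] if every motif is present, in the listed
    order, with each gap between consecutive motifs inside
    [min_gap, max_gap] residues. Return None otherwise."""

    def extend(index, prev_end):
        # All motifs placed: nothing left to add.
        if index == len(motifs):
            return []
        name, pattern = motifs[index]
        # Try every occurrence of this motif, backtracking if a later
        # motif cannot be placed after it.
        for match in pattern.finditer(protein):
            if prev_end is not None:
                gap = match.start() - prev_end
                if gap < min_gap or gap > max_gap:
                    continue
            rest = extend(index + 1, match.end())
            if rest is not None:
                return [(name, match.start())] + rest
        return None

    return extend(0, None)


# Toy sequences: the same two motifs in the expected and in the reversed order.
in_order = "MKT" + "FAGPG" + "A" * 20 + "NPPY" + "LLK"
reversed_order = "MKT" + "NPPY" + "A" * 20 + "FAGPG" + "LLK"

print(find_ordered_motifs(in_order, MOTIFS, min_gap=10, max_gap=50))
print(find_ordered_motifs(reversed_order, MOTIFS, min_gap=10, max_gap=50))
```

A sequence passes only when all three conditions hold together, which is why the reversed toy sequence is rejected even though both motifs are present.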

Figure 1. The graph shows the numbers of R–M systems entering REBASE since its inception in 1975. The open bars show systems that have been characterized either biochemically or genetically. The black bars show the increasing accumulation of potential R–M systems that have been found by bioinformatic analysis of sequences in GenBank. The surge in 2004 represents the addition of metagenomic sequences from the Sargasso Sea collecting expedition (9).

It should be noted that at present it is not possible to distinguish DNA methyltransferases reliably enough to be completely confident in the assignments. Some RNA and protein methyltransferases can be mistaken for DNA methyltransferases, as is widely reflected by the annotations found in GenBank files. In general, REBASE takes a liberal approach and includes all likely candidates; when it becomes clear that non-DNA methyltransferases have been included erroneously, these are culled from the database. The more widely divergent genes that encode the restriction enzymes usually reside close to the genes for their cognate methyltransferases, but often they cannot be recognized directly because they are a rapidly evolving set of genes and frequently lack any sequence similarity to any other genes in GenBank. However, other methods can sometimes be used to infer their presence, such as the analysis of shotgun sequence data, in which missing clones can be inferred to be caused by the presence of active restriction enzyme genes (8).

Given the wealth of experimental data, both published and unpublished, contained within REBASE, it can be an especially valuable resource during the annotation of bacterial and archaeal genomes. With the plethora of restriction systems that occur in all sequenced microbial genomes, annotators are encouraged to use the resources of the REBASE database or to contact the REBASE staff if help is needed. Custom analyses of unpublished genome sequence data are carried out upon request.
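The shotgun-based inference mentioned above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the published method of reference (8): it assumes that clone inserts have already been mapped to genome coordinates and simply flags genes that no clone contains intact. The function name, the coordinates and the gene names are all hypothetical.

```python
def genes_never_cloned_intact(genes, clones):
    """Return the names of genes that are not fully contained in any clone.

    genes  : list of (name, start, end) tuples in genome coordinates
    clones : list of (start, end) tuples for mapped shotgun clone inserts
    """
    candidates = []
    for name, g_start, g_end in genes:
        # A gene counts as intact in a clone when the clone spans it entirely.
        intact = any(c_start <= g_start and g_end <= c_end
                     for c_start, c_end in clones)
        if not intact:
            candidates.append(name)
    return candidates


# Toy data: geneB is only ever partially covered by the clones.
toy_genes = [("geneA", 100, 400), ("geneB", 900, 1500), ("geneC", 2000, 2300)]
toy_clones = [(0, 800), (300, 1200), (1100, 2500)]

print(genes_never_cloned_intact(toy_genes, toy_clones))
```

In real data a gene can be missing from a library for many unrelated reasons, such as low coverage, so a flag of this kind would only mark a candidate for closer inspection.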
The REBASE web site offers a variety of resources that facilitate the analysis of sequence information. The sequence analysis tools (REBASE tools) include NEBcutter, which finds restriction enzyme recognition sites in submitted sequences, and an implementation of BLAST for searching against all sequences in REBASE. Specialty lists of sequence data (REBASE lists), such as all known Type II restriction enzyme genes, all known Type I specificity subunit genes, etc., are available for download. The coming year will see some major additions to REBASE in terms of new sequence acquisitions, such as the inclusion of all metagenomic sequence data (only partially analyzed to date), and a tool to permit users to perform their own analysis of newly sequenced genomes.
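To make the idea of finding recognition sites in a submitted sequence concrete, here is a minimal Python sketch. It is not NEBcutter and does not use any REBASE interface; the function name `find_sites` and the test sequence are hypothetical. It assumes recognition sequences are written with standard IUPAC nucleotide codes and searches the given strand only, which is sufficient for palindromic sites such as GAATTC (EcoRI) or GTYRAC (HincII).

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_sites(sequence, recognition):
    """Return 0-based start positions of a recognition sequence.

    Only the given strand is searched. A non-palindromic site would also
    need a search with its reverse complement, which is omitted here.
    """
    pattern = "".join(IUPAC[base] for base in recognition.upper())
    # The lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


toy_sequence = "TTGAATTCAAGTCGACGGGAATTC"

print(find_sites(toy_sequence, "GAATTC"))  # EcoRI recognition sequence
print(find_sites(toy_sequence, "GTYRAC"))  # HincII recognition sequence
```

The positions returned are where the recognition sequence starts; reporting the cut position would additionally require the cleavage offset for each enzyme.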

FUNDING

National Library of Medicine (LM04971); New England Biolabs, Inc. Funding for open access charge: New England Biolabs; National Institutes of Health grant. Conflict of interest statement. None declared.

REFERENCES

1. REBASE--enzymes and genes for DNA restriction and modification.

Authors: Richard J Roberts; Tamas Vincze; Janos Posfai; Dana Macelis
Journal: Nucleic Acids Res Date: 2007-01 Impact factor: 16.971

2. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins.

Authors: Kim D Pruitt; Tatiana Tatusova; Donna R Maglott
Journal: Nucleic Acids Res Date: 2006-11-27 Impact factor: 16.971

3. GenBank.

Authors: Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; Eric W Sayers
Journal: Nucleic Acids Res Date: 2008-10-21 Impact factor: 16.971

4. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes.

Authors: Richard J Roberts; Marlene Belfort; Timothy Bestor; Ashok S Bhagwat; Thomas A Bickle; Jurate Bitinaite; Robert M Blumenthal; Sergey Kh Degtyarev; David T F Dryden; Kevin Dybvig; Keith Firman; Elizaveta S Gromova; Richard I Gumport; Stephen E Halford; Stanley Hattman; Joseph Heitman; David P Hornby; Arvydas Janulaitis; Albert Jeltsch; Jytte Josephsen; Antal Kiss; Todd R Klaenhammer; Ichizo Kobayashi; Huimin Kong; Detlev H Krüger; Sanford Lacks; Martin G Marinus; Michiko Miyahara; Richard D Morgan; Noreen E Murray; Valakunja Nagaraja; Andrzej Piekarowicz; Alfred Pingoud; Elisabeth Raleigh; Desirazu N Rao; Norbert Reich; Vladimir E Repin; Eric U Selker; Pang-Chui Shaw; Daniel C Stein; Barry L Stoddard; Waclaw Szybalski; Thomas A Trautner; James L Van Etten; Jorge M B Vitor; Geoffrey G Wilson; Shuang-yong Xu
Journal: Nucleic Acids Res Date: 2003-04-01 Impact factor: 16.971

5. The Universal Protein Resource (UniProt) 2009.

Authors: UniProt Consortium
Journal: Nucleic Acids Res Date: 2008-10-04 Impact factor: 16.971

6. The RCSB PDB information portal for structural genomics.

Authors: Andrei Kouranov; Lei Xie; Joanna de la Cruz; Li Chen; John Westbrook; Philip E Bourne; Helen M Berman
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. The Pfam protein families database.

Authors: Robert D Finn; John Tate; Jaina Mistry; Penny C Coggill; Stephen John Sammut; Hans-Rudolf Hotz; Goran Ceric; Kristoffer Forslund; Sean R Eddy; Erik L L Sonnhammer; Alex Bateman
Journal: Nucleic Acids Res Date: 2007-11-26 Impact factor: 16.971

8. Using shotgun sequence data to find active restriction enzyme genes.

Authors: Yu Zheng; Janos Posfai; Richard D Morgan; Tamas Vincze; Richard J Roberts
Journal: Nucleic Acids Res Date: 2008-11-06 Impact factor: 16.971

9. Environmental genome shotgun sequencing of the Sargasso Sea.

Authors: J Craig Venter; Karin Remington; John F Heidelberg; Aaron L Halpern; Doug Rusch; Jonathan A Eisen; Dongying Wu; Ian Paulsen; Karen E Nelson; William Nelson; Derrick E Fouts; Samuel Levy; Anthony H Knap; Michael W Lomas; Ken Nealson; Owen White; Jeremy Peterson; Jeff Hoffman; Rachel Parsons; Holly Baden-Tillson; Cynthia Pfannkoch; Yu-Hui Rogers; Hamilton O Smith
Journal: Science Date: 2004-03-04 Impact factor: 47.728
