
OriDB, the DNA replication origin database updated and extended.

Cheuk C Siow, Sian R Nieduszynska, Carolin A Müller, Conrad A Nieduszynski.

Abstract

OriDB (http://www.oridb.org/) is a database containing collated genome-wide mapping studies of confirmed and predicted replication origin sites. The original database collated and curated Saccharomyces cerevisiae origin mapping studies. Here, we report that the OriDB database and web site have been revamped to improve user accessibility to curated data sets, to greatly increase the number of curated origin mapping studies, and to include the collation of replication origin sites in the fission yeast Schizosaccharomyces pombe. The revised database structure underlies these improvements and will facilitate further expansion in the future. The updated OriDB for S. cerevisiae is available at http://cerevisiae.oridb.org/ and for S. pombe at http://pombe.oridb.org/.

Substances:
DNA, Fungal

Year: 2011 PMID: 22121216 PMCID: PMC3245157 DOI: 10.1093/nar/gkr1091

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Complete, accurate replication of the genome is crucial for life. Chromosomes must be precisely copied exactly once, a process that takes place during S phase. To complete DNA replication within S phase, replication of eukaryotic genomes is initiated at multiple discrete chromosomal sites called replication origins. Appropriate distribution of the origin sites is important to ensure that every sequence is replicated. However, not every origin site is used in every cell cycle; that is, replication origins differ in their efficiency. Furthermore, origins activate at characteristic times during S phase, with some origins activating early in S phase and others later.

Replication origins are best characterized in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. In both organisms, origin sequences have been isolated through their ability to support plasmid replication (called Autonomously Replicating Sequences or ARS) (1,2). Chromosomal origin activity has been assayed using two-dimensional (2D) gel electrophoresis to detect replication intermediates in both S. cerevisiae and S. pombe (3,4). Saccharomyces cerevisiae origins contain an essential sequence element called the ARS consensus sequence (ACS) (5). In contrast, S. pombe origins feature AT-rich sequences, but no specific sequence motif (6). Origin sites in both yeasts are bound by the Origin Recognition Complex (ORC), which in turn recruits Cdc6 and Cdt1 to load Mcm2-7 double hexamers and form a pre-replication complex (pre-RC). Assembly of the pre-RC ‘licenses’ the origin for activation in the subsequent S phase. Saccharomyces cerevisiae ORC binds to the ACS; however, the ACS alone is not sufficient for origin function. Indeed, there are approximately 12 000 matches to the ACS in the genome, but only approximately 500 of these are functional replication origins. Consequently, there must be additional mechanisms to specify replication origin sites. These are thought to include transcription that ablates origin function (7,8), chromatin structure that can aid ORC recruitment (9,10) and secondary sequence motifs (11,12). The S. pombe Orc4 protein contains AT-hook domains that recognize and bind AT-rich origin sequences (13). The high AT content of S. pombe replication origins has allowed their identification, genome wide, as AT-rich islands (14).

Genome-wide approaches to identify and characterize replication origin locations rely on detecting either the origin-associated proteins or the DNA synthesis at active origin sites. Chromatin-immunoprecipitation (ChIP) of ORC and/or MCM proteins has been used to isolate origin sites (15–17). In S. cerevisiae, this has been combined with motif searches or phylogenetic footprinting to predict the location of the ACS (5,9,18). Active replication origins have been identified as local points of the earliest replicating sequence in genome-wide measures of when each sequence in the yeast genome replicates (19–22). Origin sites have also been identified as sites of BrdU incorporation or accumulation of single-stranded DNA when cells are challenged with hydroxyurea (16,23,24).

Previously, we collated the proposed location of S. cerevisiae origin sites from the available genome-wide mapping studies and presented the results in a web-accessible database, OriDB (25). This collated data set has facilitated comparisons with a range of other chromosomal features including transcription (26), genomic rearrangements (27) and fragile sites (28,29). Furthermore, the comprehensive origin data sets and the underlying data have permitted mathematical approaches to investigate genome replication (30–32). Now we present a major update to OriDB. We have completely restructured the underlying database tables to enable the incorporation of many additional data sets, improvements in user access to the raw data and expansion to a second model system, S. pombe.

RESULTS

Revised database structure

The original release of OriDB implemented a simple, but limited table structure. The majority of the data were stored in a single table. This made updating the database time-consuming and risked introducing errors. The rapid growth in replication origin studies necessitated a complete restructuring of the underlying database. We have replaced the original OriDB database tables with a large number of non-redundant tables with defined relationships (Figure 1). Four primary tables define the database and the relationship between all the tables. First, the table ‘sc_ori’ contains the list of confirmed, likely and dubious origin sites collated from published studies as described previously (25). Second, the table ‘sc_ori_studies’ lists the studies that have published lists of origin locations. Third, the table ‘sc_repl_data’ lists the studies for which OriDB has stored the experimental data from which origin predictions have been made. Fourth, the table ‘sc_elements_studies’ lists the studies that have proposed origin sequence elements. Each of these primary tables defines the relationship with further tables that store the data from each of these studies. These tables of published data are from genome-wide studies and are supplemented with additional tables that have been collated by OriDB from the literature: origin sites confirmed by 2D gel electrophoresis (sc_2D_gel), origin sites confirmed by ARS assay (cloned_ori) and confirmed origin sequence elements (sc_confirmed_ACS).
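The relationship between the four primary tables can be illustrated with a minimal sketch. The table names are taken from the text; every column name, the use of SQLite and the helper `build_sketch_db` are assumptions made for illustration only and do not describe the actual OriDB schema.

```python
import sqlite3


def build_sketch_db() -> sqlite3.Connection:
    """Create an in-memory database with the four primary tables named in the text.

    All column names are hypothetical; only the table names come from the article.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        -- Locally stored PubMed records that the other tables link to.
        CREATE TABLE local_pubmed (
            pmid  INTEGER PRIMARY KEY,
            title TEXT
        );
        -- Studies that published lists of origin locations.
        CREATE TABLE sc_ori_studies (
            study_id   INTEGER PRIMARY KEY,
            data_table TEXT NOT NULL,   -- name of the per-study table (assumed)
            pmid       INTEGER REFERENCES local_pubmed(pmid)
        );
        -- Studies for which the experimental data are stored.
        CREATE TABLE sc_repl_data (
            study_id   INTEGER PRIMARY KEY,
            data_table TEXT NOT NULL,
            pmid       INTEGER REFERENCES local_pubmed(pmid)
        );
        -- Studies that proposed origin sequence elements.
        CREATE TABLE sc_elements_studies (
            study_id   INTEGER PRIMARY KEY,
            data_table TEXT NOT NULL,
            pmid       INTEGER REFERENCES local_pubmed(pmid)
        );
        -- Collated list of origin sites with their status.
        CREATE TABLE sc_ori (
            ori_id     INTEGER PRIMARY KEY,
            chromosome INTEGER NOT NULL,
            start_bp   INTEGER NOT NULL,
            end_bp     INTEGER NOT NULL,
            status     TEXT CHECK (status IN ('confirmed', 'likely', 'dubious'))
        );
        """
    )
    return con
```

Keeping one row per study in a primary table, with the per-study data in a separate table, is what would allow a new study to be added without altering existing tables.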

Figure 1.

Four primary tables define the database structure for S. cerevisiae OriDB. (Left-hand side) The table ‘sc_ori_studies’ describes the curated studies that have reported replication origin sites, each of which is represented by a further table. (Middle) The table ‘sc_repl_data’ describes additional tables that contain the experimental data from origin mapping studies. (Right-hand side) The table ‘sc_elements_studies’ describes the curated studies that reported sequence elements at replication origins; each of these studies is represented by a table. (Bottom) The table ‘sc_ori’ contains the collated list of all reported replication origin sites from those studies listed in ‘sc_ori_studies’. Finally, each table is linked to the appropriate PubMed record in a locally stored table (‘local_pubmed’) that retrieves data directly from PubMed.

All collated data sets and chromosomal coordinates are presented relative to the 1 October 2003 release of the S. cerevisiae genome (referred to as sacCer1 at the UCSC genome browser) (33,34). To convert between the various sequence releases, we have used the liftOver tool from UCSC (35) with custom generated parameter (over.chain) files. Members of the yeast community can use this tool through a web interface at: http://www.nieduszynski.org/liftover/.

The restructuring of the OriDB database tables necessitated a complete re-writing of the web pages. The resulting changes offer a number of significant benefits for users. The origin details pages now load all tabs concurrently, but only display the user-selected tab; this allows rapid switching between tabs. Furthermore, the new data structures allow improved user access to the underlying data, making it straightforward to include many additional origin mapping data sets and allow for the expansion of OriDB to include the fission yeast S. pombe. These pages are available at http://cerevisiae.oridb.org/ and http://pombe.oridb.org/ (with backup sites available at http://www.nottingham.ac.uk/plzcnlab/oridb/cerevisiae/ and at http://www.nottingham.ac.uk/plzcnlab/oridb/pombe/).
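For readers who wish to convert coordinates locally, the command-line form of liftOver can be wrapped as in the sketch below. It assumes that the UCSC `liftOver` executable is installed and on the PATH; the file names in the usage example are hypothetical, and the custom over.chain files used by OriDB are not reproduced here.

```python
import subprocess
from typing import List


def liftover_command(bed_in: str, chain: str, bed_out: str, unmapped: str) -> List[str]:
    """Build the argument list for UCSC liftOver.

    Follows the documented usage: liftOver oldFile map.chain newFile unMapped
    """
    return ["liftOver", bed_in, chain, bed_out, unmapped]


def run_liftover(bed_in: str, chain: str, bed_out: str, unmapped: str) -> None:
    """Run liftOver; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(liftover_command(bed_in, chain, bed_out, unmapped), check=True)


# Hypothetical usage (file names are placeholders, not files supplied by OriDB):
# run_liftover("origins_old.bed", "old_to_sacCer1.over.chain",
#              "origins_sacCer1.bed", "origins_unmapped.bed")
```

Intervals that cannot be mapped are written to the fourth file, so it is worth inspecting that file before comparing data sets across releases.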

Improved user access to data

The most frequent user request is to retrieve data from an OriDB curated study in a user-specified format. The new database structure, described above, allows us to implement a straightforward yet powerful route to the underlying data. A ‘download’ link present in the top bar of every OriDB page allows access to all the data sets curated at OriDB, including those tables curated from the literature. Data sets are grouped first by the data type (e.g. predictions of origin location) and then by the original study. Access to the underlying data is also available from links on the pages that summarize the findings of individual studies. The user has a choice of appropriate formats for downloading the data (including the raw data in a tab- or comma-separated format, BED or WIG formats for display in genome browsers, and FASTA for sequence download). These pages and links are generated from the underlying database tables and will therefore update automatically as new studies are added to OriDB.
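As an illustration of one of the download formats, the sketch below writes origin records as four-column BED lines. The `Origin` record and its field names are hypothetical, and the assumption that stored coordinates are 1-based and inclusive is ours rather than a statement about OriDB; BED itself uses a 0-based start and an exclusive end.

```python
from typing import Iterable, NamedTuple


class Origin(NamedTuple):
    """Hypothetical origin record; coordinates assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    name: str


def to_bed(origins: Iterable[Origin]) -> str:
    """Return BED text (chrom, chromStart, chromEnd, name), one origin per line."""
    lines = []
    for ori in origins:
        # BED start is 0-based, so shift the 1-based start down by one;
        # the exclusive BED end equals the 1-based inclusive end.
        lines.append(f"{ori.chrom}\t{ori.start - 1}\t{ori.end}\t{ori.name}\n")
    return "".join(lines)
```

The returned text can be written to a file and loaded as a custom track in a genome browser.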

Expanded data coverage

The original OriDB database collated four genome-wide (microarray) data sets (15,19,20,23) and our phylogenetic footprinting of origin sequence elements (5). The availability of high-resolution microarrays (18) and, more recently, deep-sequencing technologies (9) has led to a large increase in the number of studies proposing origin locations. The new database structure has allowed us to integrate many additional data sets, so that at the time of writing, S. cerevisiae OriDB includes 10 genome-wide data sets and has the capability to include an effectively unlimited number in the future. The data from these studies are presented to the user through the details page for each origin. As in the previous version of OriDB, the details page includes an ‘Origin Location Assignments’ tab which lists all the studies that identify the particular origin [as described previously this is based upon the proposed resolution of the study in question (25)]. The ‘Origin Location Assignments’ tab also has the capability to display additional information from each study for each origin location. For example, a recent study mapped the activity of origins in different mutant cells subjected to the drug hydroxyurea (24); OriDB includes the details of which mutants the origin was reported to be active in.
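One way to picture how a study could be assigned to an origin on the basis of its resolution is sketched below. This is not the published assignment procedure (which is described in reference 25); the overlap rule, the `Site` record and the study names are assumptions chosen purely for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Site(NamedTuple):
    """Hypothetical genomic interval (coordinates in bp)."""
    chrom: str
    start: int
    end: int


def supports(origin: Site, reported: Site, resolution_bp: int) -> bool:
    """True if the reported site, padded by the study resolution, overlaps the origin."""
    if origin.chrom != reported.chrom:
        return False
    return (reported.start - resolution_bp <= origin.end
            and origin.start <= reported.end + resolution_bp)


def assignments(origin: Site, studies: Dict[str, Tuple[int, List[Site]]]) -> List[str]:
    """List the studies with at least one reported site supporting the origin.

    `studies` maps a study name to (resolution in bp, reported sites).
    """
    return [
        name
        for name, (resolution_bp, sites) in studies.items()
        if any(supports(origin, site, resolution_bp) for site in sites)
    ]
```

Under this rule, a lower-resolution study tolerates a larger distance between its reported site and the collated origin than a higher-resolution study does.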

Collation of S. pombe replication origin sites

The mapping of replication origin sites in S. pombe has drawn on a similar range of experimental techniques as used in S. cerevisiae, including ARS assays and 2D gels. Although S. pombe replication origins do not contain a discrete sequence motif for ORC recruitment, the replication origins have a characteristic AT composition, called AT islands. The computational identification of AT islands allowed the accurate prediction of replication origin sites throughout the S. pombe genome (14). More recently, genome-wide studies have employed microarray technologies to identify origins based upon the location of pre-RC proteins, newly synthesized DNA (16), the increase in DNA copy number as a sequence replicates (21) or the single-stranded DNA that accumulates at stalled replication forks (23). Each of these studies produced a genome-wide list of replication origin sites. To facilitate access to these data sets and allow comparison between them, we generated a single collated list of replication origin sites presented through a web-accessible database, which includes text and graphic representations of the data (Figure 2). The independent studies that identified S. pombe replication origins have used a range of naming conventions that have resulted in different names being assigned to the same origin. To consolidate replication origin naming in S. pombe, we have assigned each S. pombe replication origin site a systematic name based upon the chromosome number (in roman numerals) and the chromosomal coordinate. Hence, the origin on chromosome 1 at 3060 kb is named ori-I-3060 [other names for this origin are ars1119 (16), ori1095 (21), AT1098 (14) and ars766 (1)]. Our collated S. pombe replication origin data are presented relative to the current genome sequence, downloaded on 1 October 2011 (36). The S. pombe replication origin database can be accessed at http://pombe.oridb.org/.
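The systematic naming scheme can be expressed as a short function. The sketch assumes that the coordinate is given in base pairs and truncated to whole kilobases; the text specifies only that the name combines the roman-numeral chromosome number with the chromosomal coordinate, so the rounding rule and the function name are assumptions.

```python
# S. pombe has three chromosomes, so three roman numerals suffice.
ROMAN = {1: "I", 2: "II", 3: "III"}


def systematic_name(chromosome: int, position_bp: int) -> str:
    """Return a name of the form ori-<roman chromosome>-<coordinate in kb>.

    Truncating to whole kilobases is an assumption, not a documented rule.
    """
    if chromosome not in ROMAN:
        raise ValueError(f"unknown S. pombe chromosome: {chromosome}")
    position_kb = position_bp // 1000
    return f"ori-{ROMAN[chromosome]}-{position_kb}"


# The example from the text: chromosome 1 at 3060 kb.
print(systematic_name(1, 3_060_000))  # ori-I-3060
```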

Figure 2.

Screenshot from S. pombe OriDB showing the Origin Summary Graphic tab for ori-I-3060. A window of the S. pombe genome is shown centred upon the origin of interest. (Top) The gene structure is shown (‘mouse over’ displays the name of each gene). (Main plot) Vertical bars show the replication origin sites (black for confirmed; dark grey for likely; light grey for dubious). Blue and green bars illustrate the location of signal from ChIP of Mcm6 and Orc1, respectively (16). The red curve gives the increase in DNA content during DNA replication in the presence of hydroxyurea (21). The blue curve shows the accumulation of single-stranded DNA during DNA replication in Δcds1 cells exposed to hydroxyurea (23).

DISCUSSION

In the era of high-throughput genome-wide data generation, it is essential that the scientific community can access the data and the conclusions drawn from these data. For replication origin mapping studies, this means access to microarray (or deep sequencing) data and the inferred origin locations. OriDB aims to provide access to exactly these data types, presenting them through a user-friendly interface. In this update, we improve user access to the underlying data (now available for download), extend the number of studies collated and for the first time collate origin sites from S. pombe.

FUNDING

The Royal Society, The University of Nottingham and the Biotechnology and Biological Sciences Research Council (grant numbers BB/E023754/1, BB/G001596/1); David Phillips Fellowship (to C.A.N.). Funding for open access charge: Biotechnology and Biological Sciences Research Council.

Conflict of interest statement. None declared.

36 in total

1. Genome-wide distribution of DNA replication origins at A+T-rich islands in Schizosaccharomyces pombe.

Authors: Mónica Segurado; Alberto de Luis; Francisco Antequera
Journal: EMBO Rep Date: 2003-10-17 Impact factor: 8.807

2. Isolation and characterisation of a yeast chromosomal replicator.

Authors: D T Stinchcomb; K Struhl; R W Davis
Journal: Nature Date: 1979-11-01 Impact factor: 49.962

3. Replication dynamics of the yeast genome.

Authors: M K Raghuraman; E A Winzeler; D Collingwood; S Hunt; L Wodicka; A Conway; D J Lockhart; R W Davis; B J Brewer; W L Fangman
Journal: Science Date: 2001-10-05 Impact factor: 47.728

4. GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome.

Authors: Matthew D Sekedat; David Fenyö; Richard S Rogers; Alan J Tackett; John D Aitchison; Brian T Chait
Journal: Mol Syst Biol Date: 2010-03-09 Impact factor: 11.429

5. The origin recognition complex interacts with a subset of metabolic genes tightly linked to origins of replication.

Authors: Erika Shor; Christopher L Warren; Joshua Tietjen; Zhonggang Hou; Ulrika Müller; Ilaria Alborelli; Florence H Gohard; Adrian I Yemm; Lev Borisov; James R Broach; Michael Weinreich; Conrad A Nieduszynski; Aseem Z Ansari; Catherine A Fox
Journal: PLoS Genet Date: 2009-12-04 Impact factor: 5.917

6. Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing.

Authors: Scott Cheng-Hsin Yang; Nicholas Rhind; John Bechhoefer
Journal: Mol Syst Biol Date: 2010-08-24 Impact factor: 11.429

7. The UCSC Genome Browser database: update 2011.

Authors: Pauline A Fujita; Brooke Rhead; Ann S Zweig; Angie S Hinrichs; Donna Karolchik; Melissa S Cline; Mary Goldman; Galt P Barber; Hiram Clawson; Antonio Coelho; Mark Diekhans; Timothy R Dreszer; Belinda M Giardine; Rachel A Harte; Jennifer Hillman-Jackson; Fan Hsu; Vanessa Kirkup; Robert M Kuhn; Katrina Learned; Chin H Li; Laurence R Meyer; Andy Pohl; Brian J Raney; Kate R Rosenbloom; Kayla E Smith; David Haussler; W James Kent
Journal: Nucleic Acids Res Date: 2010-10-18 Impact factor: 16.971

8. The requirement of yeast replication origins for pre-replication complex proteins is modulated by transcription.

Authors: Conrad A Nieduszynski; J Julian Blow; Anne D Donaldson
Journal: Nucleic Acids Res Date: 2005-04-28 Impact factor: 16.971

9. Global effects of DNA replication and DNA replication origin activity on eukaryotic gene expression.

Authors: Larsson Omberg; Joel R Meyerson; Kayta Kobayashi; Lucy S Drury; John F X Diffley; Orly Alter
Journal: Mol Syst Biol Date: 2009-10-13 Impact factor: 11.429

10. Genetic analysis of an ARS element from the fission yeast Schizosaccharomyces pombe.

Authors: R K Clyne; T J Kelly
Journal: EMBO J Date: 1995-12-15 Impact factor: 11.598
