
Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

Vinod Kumar Singh¹, Annangarachari Krishnamachari¹.

Abstract

Genome-wide experimental studies in Saccharomyces cerevisiae reveal that an autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies have identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites, and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind the origin recognition complex (ORC), denoted ORC-ACS, and on non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence-dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. The analysis reveals that ORC-ACS show marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within the nucleosome-free region. Profound differences in conformational features, such as DNA helical twist, inclination angle and stacking energy, were observed between ORC-ACS and nrACS. The distribution of ACS motifs in the non-coding segments shows that ORC-ACS are located farther from the adjacent gene start position than nrACS, thereby enabling an accessible environment for ORC proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.


Keywords: Autonomous replicating sequences; Computational methods; Origin of replication; Saccharomyces cerevisiae

Year: 2016 PMID: 27508123 PMCID: PMC4971157 DOI: 10.1016/j.gdata.2016.07.005

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Introduction

In a typical DNA replication process, each parent DNA is duplicated precisely and entirely before cell division takes place. DNA unwinding and loading of the replication machinery happen at specific sites called origins of replication (ORI) or replication sites. ORIs differ in number, architecture, and replication proteins among the three domains of life: the Bacteria, the Eukaryotes and the Archaea [1]. Identification of ORIs plays an important role in understanding replication behavior and its coordinated processes during cell division. Most bacterial genomes are circular and have a single ORI site. Computational methods employed to detect these origin sites exploit the property of asymmetrically biased nucleotide composition, and this strategy has been successful in bacterial genomes [2], [3], [4], [5], [6]. Replication origin firing plays a significant role in cell cycle regulation. Hence locating these sites may be useful in the drug development process to combat diseases caused by bacteria, viruses, and parasites [4]. Eukaryotic chromosomes are linear and have multiple ORIs which fire in a stochastic manner and carry out the complete duplication of the large genome within a limited period [7], [8]. Thus, ORI prediction in eukaryotes is more challenging and complicated than in prokaryotes [9], and the commonly used skew-based methods are not suitable. The structure and location of replication origins and their functions are more or less well characterized in the eukaryotic model organism Saccharomyces cerevisiae [10], [11], [12]. Origin activity in yeast depends on cis-acting replicator sequences called autonomous replicating sequences (ARS). An ARS contains an essential 11 bp ACS (ARS consensus sequence) element and three non-essential elements, i.e., B1, B2, and B3. Additionally, ACS and B1 are recognized as the binding sites for origin recognition complex (ORC) proteins, and their motif is well defined. 
B2 is probably a DNA unwinding element (DUE) or helicase loading site, and B3 is the binding site of the transcription factor ARS-binding factor 1 (Abf1p), which acts as a replication enhancer element [13], [14], [15]. The S. cerevisiae genome comprises > 12,000 potential ACS sites and approximately 400 origins [16]. A match to the ACS sequence pattern is essential for replicating sites, but it is not sufficient for a functional origin. This indicates the presence of additional, as yet unknown, functional sequences, chromatin features and conformational structures [17]. A study on ORC-ACS and nrACS data shows significant differences in nucleosome occupancy status in their flanking regions [12]. Moreover, ORC-ACS are flanked by asymmetric well-positioned nucleosomes on both sides [12]. The nucleosome-free regions (NFR) of ORIs are permissive sites for multiprotein assemblies, such as ORC, that play a critical role in regulating key DNA-templated processes [18]. A study on yeast has also shown that chromatin structure and the nucleotides flanking both sides of ACS + B1 play a critical role in origin efficiency along with ARS activity [19]. Research on the DNA replication process is an active area of investigation and can be divided into three broad categories: 1) replication mechanism and ORC binding sites [19], [20], [21], 2) hidden intrinsic characterization of ARS at the sequence level [22] and their location in the genome [23], and 3) physical properties of DNA such as cleavage intensity and DNA bending [24]. A collective study of all these areas is useful for understanding the regulatory mechanisms of the replication process. The present study focuses on a chosen set of replicating and non-replicating sites using in silico analysis. A recent study on yeast replication sites by Li et al. focused on some nucleotide compositional and DNA conformational properties of ORI regions for their computational prediction experiments [25]. 
Hence, we have considered the sequence context around the ORC-ACS and nrACS motifs and its possible influence on the conformation and stability of DNA. Our present study attempts to address the issue using two well demarcated datasets of replicating and non-replicating sequences having ACS-like motifs. This may shed light on the sequence makeup of these sequences.

Materials and methods

Datasets

Two published datasets of the S. cerevisiae genome were used in this study: 1) ORC-ACS: ACSs to which ORC binds and which show replication activity, and 2) nrACS: ACSs to which ORC does not bind and which are not close to any known replication site [12]. The original dataset consists of 251 ORC-ACS and 251 nrACS coordinates (genomic locations). We excluded sequence segments from the chromosome ends (of size 10 kb), which left 225 ORC-ACS and 230 nrACS coordinates from the original data. For the sequence-based study, 10 kb of flanking DNA sequence on both sides of each ACS was extracted. The majority of ARS sequences range in length from 56 to 2000 bp, and the ACS motif lies within this segment. Hence, we considered a 1000 bp flanking region on both sides as the sequence context for the study. Moreover, the flanking region may contain information and contribute to signal buildup, i.e., the order of the sequence makeup may play a role in the protein binding process. The annotation file, nucleosome occupancy signal and ORC binding data were obtained from the GEO database (GSE16926), and genome data for S. cerevisiae were downloaded from the Saccharomyces Genome Database (SGD) [26]. The values of the dinucleotide properties of interest were taken from DiProDB [27]. The consensus motifs of ORC-ACS and nrACS are more or less the same, but the sequence makeup of their flanking regions is different [12].

Jensen-Shannon divergence (JSD)

Genomic DNA data can be studied using information theoretic measures such as Shannon entropy and relative entropy for finding conserved protein binding sites in DNA and other genomic features [28], [29]. JSD is a symmetric information theoretic measure that captures the divergence between any two given probability distributions [28]. This measure exploits the compositional bias in symbolic DNA data and has been applied to distinguish different genomic regions or segments [28], [30]. We have introduced a novel way of computing the JSD from two multiply aligned sets. The purpose of this measure in the present study is to examine the divergence pattern arising from the two datasets mentioned earlier. The procedure adopted to compute JSD is as follows. The two datasets of ACS sequences, aligned at their centre (ACS), are taken together as one set, and the JSD measure is computed in a sliding overlapping window fashion, but the two required probability distributions are calculated from the ORC-ACS and nrACS datasets independently. Let the two datasets, r and s, consist of sequences of length L centrally aligned w.r.t. ACS, and let the numbers of sequences be denoted by $n_r$ and $n_s$, respectively. For the chosen window of size w, the Shannon entropy of the nucleotide distribution of each column of the ORC-ACS (r) and nrACS (s) sequences is computed separately, averaged over the window size, and subtracted from the combined entropy of all columns of the chosen window from the two datasets together (Fig. 1A). Consistent with this description, JSD for the kth window can be written as

$$\mathrm{JSD}_k = \frac{1}{w}\sum_{j=k}^{k+w-1}\left[H\!\left(\pi_r X_r^{j} + \pi_s X_s^{j}\right) - \pi_r H\!\left(X_r^{j}\right) - \pi_s H\!\left(X_s^{j}\right)\right] \qquad (1)$$

where $X_r^{j}$ and $X_s^{j}$ are the nucleotide distributions of the jth column in r and s, and the weights are $\pi_r = n_r/(n_r+n_s)$ and $\pi_s = n_s/(n_r+n_s)$.

Fig. 1

A. Scheme for JSD calculation

JSD for the kth window of the two datasets r and s is calculated. The window is slid by 1 bp, and the whole process is repeated. Positions are marked as positive or negative with respect to ACS.

B. Jensen-Shannon divergence between ORC-ACS and nrACS

Y1-axis represents the JSD (dark black) between ORC-ACS and nrACS sequences with respect to ACS location (x-axis). Y2-axis represents nucleosome occupancy signal (gray) along ORC-ACS.

Here, H(X) is the Shannon entropy of a probability distribution X,

$$H(X) = -\sum_{i \in \{A, T, G, C\}} p(i)\,\log p(i) \qquad (2)$$

and $p_r^{j}(i)$ and $p_s^{j}(i)$ are the probabilities of event i (a nucleotide, in our case) in the jth column of the kth overlapping window of size w in the centrally aligned sequences of datasets r and s, respectively. JSD at a given location k measures the heterogeneity between the two probability distributions. JSD = 0 if and only if $X_r = X_s$, i.e., the lower bound is attained for identical distributions. JSD is plotted against position to view the divergence embedded in the datasets. A higher value of JSD at a given location k indicates a region of interest, which is taken up for further study.
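The windowed JSD procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the two datasets are weighted by their sequence counts, entropy is taken in bits (log base 2), and sequences are assumed to contain only A, C, G and T.

```python
import math
from collections import Counter

BASES = "ACGT"  # assumption: no ambiguous bases (e.g. N) in the input

def column_distribution(seqs, col):
    """Relative frequency of each base in one alignment column."""
    counts = Counter(s[col] for s in seqs)
    total = sum(counts[b] for b in BASES)
    return [counts[b] / total for b in BASES]

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def windowed_jsd(r_seqs, s_seqs, w):
    """JSD between two aligned sets for every window of w columns (step 1 bp)."""
    n_r, n_s = len(r_seqs), len(s_seqs)
    pi_r, pi_s = n_r / (n_r + n_s), n_s / (n_r + n_s)  # assumed weights
    length = len(r_seqs[0])
    per_column = []
    for j in range(length):
        p_r = column_distribution(r_seqs, j)
        p_s = column_distribution(s_seqs, j)
        mix = [pi_r * a + pi_s * b for a, b in zip(p_r, p_s)]
        per_column.append(shannon_entropy(mix)
                          - pi_r * shannon_entropy(p_r)
                          - pi_s * shannon_entropy(p_s))
    # average the per-column divergence over each sliding window
    return [sum(per_column[k:k + w]) / w for k in range(length - w + 1)]
```

Calling `windowed_jsd(orc_acs_seqs, nr_acs_seqs, 75)` on two lists of equal-length, ACS-aligned strings would return one value per window start; the list names here are placeholders.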

Average sequence measure (ASM)

Measures such as correlation, AT skew or GC skew have been used to capture desired genomic features like ORI, repeats, etc. [2]. Correlation-based methods are also used for sequence alignment, detection of local similarities in DNA and ORI detection in prokaryotes [5], [31], [32]. Based on these studies, we introduced the average sequence measure (ASM). For a sliding overlapping window of size w at the kth position of a given set of n centrally aligned sequences of equal length L, it is calculated as:

$$\mathrm{ASM}_k = \frac{1}{n}\sum_{j=1}^{n} S_{j,k} \qquad (3)$$

where $S_{j,k}$ is the value of the chosen sequence measure for the kth window of size w in the jth sequence. In simple words, a window is selected at the same position in each aligned sequence of a dataset (e.g., ORC-ACS), the chosen measure is computed for each sequence, and the values are averaged. The window is then slid by 1 bp and the computation is repeated until the last possible window. Three types of sequence measure were considered for S in this study, calculated as follows.

Correlation measure: First, the DNA sequence of the chosen window is converted to four numeric strings. For example, base A is replaced with 1, while all other bases, i.e., T, G, and C, are replaced by − 1. Similar numeric strings are generated for the other bases in an identical fashion. Finally, the correlation is calculated for each of these numeric strings as described in Shah et al. [5].

GC-skew measure: The GC-skew of the desired window sequence is defined as [33]

$$\text{GC-skew} = \frac{f_G - f_C}{f_G + f_C} \qquad (4)$$

where $f_G$ and $f_C$ are the frequencies of the bases guanine and cytosine in the chosen sequence.

AT-skew measure: The AT-skew calculation is similar to the GC-skew calculation, i.e., replace $f_G$ by $f_A$ and $f_C$ by $f_T$ in Eq. (4).
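As an illustration of the averaging scheme, the following Python sketch computes a windowed skew for each sequence and averages it across the aligned set. The function names are hypothetical, and returning 0 for a window with no G or C (or no A or T) is an assumption made here to avoid division by zero; the source does not say how such windows were handled.

```python
def gc_skew(window):
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    g, c = window.count("G"), window.count("C")
    return (g - c) / (g + c) if g + c else 0.0

def at_skew(window):
    """(A - T) / (A + T) for one window; 0.0 if the window has no A or T."""
    a, t = window.count("A"), window.count("T")
    return (a - t) / (a + t) if a + t else 0.0

def average_sequence_measure(seqs, w, measure):
    """Average a per-window measure over n aligned, equal-length sequences.

    Returns one value per window start position (step size 1 bp).
    """
    n = len(seqs)
    length = len(seqs[0])
    return [sum(measure(s[k:k + w]) for s in seqs) / n
            for k in range(length - w + 1)]
```

Any function that maps a window string to a number can be passed as `measure`, so the same averaging routine serves GC-skew, AT-skew and a correlation measure.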

Average DNA structure conformation and thermodynamic profile

Protein binding sites in replication and promoter regions of the genome are less flexible (i.e., more rigid), which helps in their recognition by the regulatory proteins [24], [34]. These segments also show higher free energy than other parts of the genome, which favors DNA unwinding [35]. These properties have been widely used for the prediction of protein binding sites. The average profile value (APV) for a sliding overlapping window of size w at location k in a set of n sequences of equal length L bp can be calculated as the mean over the dinucleotide steps of the window and over the sequences:

$$\mathrm{APV}_k = \frac{1}{n}\sum_{j=1}^{n}\frac{1}{w-1}\sum_{i=k}^{k+w-2} PV\!\left(N_i, N_{i+1}\right) \qquad (5)$$

where $PV(N_i, N_{i+1})$ is the value of the given dinucleotide property at the ith location of the jth sequence.
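A minimal Python sketch of such a dinucleotide profile is given below. It assumes that a window of w bases contributes w − 1 overlapping dinucleotide steps and that each window is normalized by that count; the normalization used in the original MATLAB code is not stated. `TOY_PROPERTY` is a made-up indicator table (1 for AA and TT steps, 0 otherwise) used only so the example runs; real values would come from a DiProDB property table.

```python
from itertools import product

# Hypothetical placeholder table, NOT real DiProDB values:
# 1.0 for AA/TT steps, 0.0 for every other dinucleotide.
TOY_PROPERTY = {a + b: (1.0 if a + b in ("AA", "TT") else 0.0)
                for a, b in product("ACGT", repeat=2)}

def average_profile(seqs, w, prop):
    """Mean dinucleotide property per window, averaged over aligned sequences.

    seqs: equal-length strings over A, C, G, T; w: window size (>= 2);
    prop: dict mapping each of the 16 dinucleotides to a number.
    """
    n = len(seqs)
    length = len(seqs[0])
    profile = []
    for k in range(length - w + 1):
        total = 0.0
        for s in seqs:
            # the w - 1 overlapping dinucleotide steps inside this window
            steps = [prop[s[i:i + 2]] for i in range(k, k + w - 1)]
            total += sum(steps) / len(steps)
        profile.append(total / n)
    return profile
```

Swapping `TOY_PROPERTY` for a table of stacking energy, twist or inclination values would give the corresponding averaged profile along the aligned sequences.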

Software tools used

We used the MEME software suite to identify the embedded motifs in the datasets. MEME utilizes the expectation maximization algorithm [36] to discover novel recurring (ungapped and fixed-length) motifs in a given set of sequences [37]. The MAST tool was used to scan for putative sites. The positional conservation of the discovered motif was represented in graphical form [38]. For statistical comparison of the medians of two independent distributions, the non-parametric Wilcoxon rank sum test (Mann-Whitney test) in MATLAB was used. MATLAB code was written to compute JSD, GC-skew and the correlation measure.
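For readers without MATLAB, the rank sum comparison can be sketched in plain Python. This is a simplified illustration, not a reimplementation of MATLAB's routine: it uses average ranks for ties and a plain normal approximation, with no tie or continuity correction and no exact small-sample method, so p-values may differ slightly from those of a statistics package. The function name is hypothetical.

```python
import math

def rank_sum_test(x, y):
    """Mann-Whitney U for sample x, its z-score, and a two-sided p-value.

    Uses the normal approximation without tie or continuity correction.
    """
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided
```

A negative z indicates that values in `x` tend to rank below those in `y`; a one-sided (tail) p-value can be taken as half the two-sided value when the sign of z matches the tested direction.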

Results and discussion

The primary objective of the study was to characterize the sequences of S. cerevisiae that have an ACS-like motif and to understand why some of them are bound by ORC proteins while others are not. This aspect was studied using experimentally verified and published datasets. As mentioned earlier, ORC-ACS and nrACS have similar motifs, but they differ when nucleosome occupancy data are mapped onto them [12]. This work focuses on a computational study of the compositional, thermodynamic and structural properties of these sequences, in order to understand how the replication machinery recognizes ORC-ACS over nrACS. For this study, the sequences around each ACS in both datasets were processed so that all ARS are oriented in the same direction, and were then aligned at their centre (ACS). Note that the end of the ACS is located at position zero. The desired properties were calculated in an overlapping sliding window fashion at each bp step for all given sequences and averaged over the sequences of the respective class.

Divergence in the nucleotide distribution of ORC-ACS and nrACS sequences within nucleosome-free region (NFR)

Despite the similarity in sequences, ORC-ACS and nrACS differ in nucleosome occupancy around ACS matches. Nucleosomes are conserved, well positioned and asymmetric in ORC-ACS [12]. Some regions of DNA sequence have a relatively higher affinity for nucleosomes [39], [40]. Hence, we hypothesized that the distribution of bases in ORC-ACS should differ from that in nrACS sequences at conserved nucleosome positions. With the aim of finding diverged regions at nucleosome positions in the two sequence datasets, JSD was calculated (see Methods). A nucleosome wraps approximately 147 bp of DNA. Hence, a sliding window of about half this size, i.e., 75 bp, was used [12]. No overall divergence in base distribution was observed at nucleosome locations, but a marked divergence between the two datasets was seen around position + 60 (Fig. 1B). Such a pattern of divergence might be due to some conserved signal at this location in ORC-ACS sequences. This prompted us to study a few sequence-based features of this region further.

Autocorrelation values (A and T bases) of ORC-ACS show abrupt rise and fall within NFR

The analysis was extended using a nucleotide-based autocorrelation measure. This measure considers the spatial locations of a particular base in a chosen genomic fragment and provides a clue about the nature of the signals embedded within sequences [5]. A highly correlated sequence gives a value of unity, while a random sequence gives zero. The correlation strengths of all four numeric strings were computed independently for each sequence, using a sliding window of size 75 with a step size of 1 bp, and the ensemble average was considered. For the A and T bases, the correlation strength of ORC-ACS sequences shows an abrupt rise and fall within the NFR, with a sudden change at around position + 60, when compared to nrACS (Fig. 2) (Fig. S1). This indicates that this region has some spatial correlation or hidden pattern of A or T bases in ORC-ACS. This type of abrupt change in correlation strength can be attributed to the T-rich ACS motif and the A-rich B2 elements, which have been reported in prior studies [41]. These results are in conformity with the divergence pattern observed in the previous section.
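The correlation strength of a single window can be illustrated with the Python sketch below. The exact definition used in the study is the one given in Shah et al. [5]; the form shown here (mean absolute autocorrelation of a ±1 indicator string over all lags) is an assumption made for illustration, and the function names are hypothetical. Note that under this form a perfectly periodic window also scores 1.

```python
def indicator_string(seq, base):
    """Encode a sequence as +1 where `base` occurs and -1 elsewhere."""
    return [1 if ch == base else -1 for ch in seq]

def correlation_strength(seq, base):
    """Mean absolute autocorrelation of the indicator string over lags 1..n-1.

    Assumed form: C(lag) = mean of a[j] * a[j + lag]; strength = mean |C(lag)|.
    """
    a = indicator_string(seq, base)
    n = len(a)
    if n < 2:
        return 0.0
    total = 0.0
    for lag in range(1, n):
        c = sum(a[j] * a[j + lag] for j in range(n - lag)) / (n - lag)
        total += abs(c)
    return total / (n - 1)
```

Passing `lambda window: correlation_strength(window, "A")` as the per-window measure to an averaging routine over aligned sequences would give an ensemble-averaged profile of the kind plotted in Fig. 2.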

Fig. 2

Correlation strength of ORC-ACS and nrACS sequences

Y1-axis represents correlation strength of ‘A’ for ORC-ACS (blue) and nrACS (red) sequences using windows of size 75 bp and step size 1 bp. Y2-axis represents nucleosome occupancy signal (green) along ORC-ACS sequences. X-axis represents position relative to ACS.

Asymmetry in distribution of nucleotide bases

GC skew is a well-known computational measure for detecting replication origin sites in bacterial genomes [33]. To find out whether a GC-skew like pattern is present in eukaryotic replication sites, the average GC-skew analysis was done on ORC-ACS sequences and nrACS (using Eqs. (3), (4)). The average GC-skew pattern of ORC-ACS shifts its polarity from positive to negative on moving from upstream of NFR to downstream (Fig. 3A), whereas in the case of nrACS no such distinct change was observed (Fig. 3B).

Fig. 3

Compositional skew plots

Y1-axis represents (A) GC-Skew along ORC-ACS (B) GC-Skew along nrACS (C) AT-Skew along ORC-ACS (D) AT-Skew along nrACS sequences using sliding window of size 75 bp and step size 1 bp. Y2–axis represents nucleosome signal (gray) of corresponding sequences. X-axis represents position relative to ACS.

In bacterial genomes, GC-skew shows a transition in polarity at replication origin or terminus sites due to the preference of G over C in the leading strand of DNA. Thus, the replicating nature of the ORC-ACS region may have caused this asymmetry, and the replication mechanism in S. cerevisiae [2], [33], [42] may resemble that of bacteria at each origin site. This study makes two interesting observations. Both ORC-ACS and nrACS datasets showed a significantly higher GC-skew signal upstream of the NFR than downstream, with p-value < $10^{-247}$ and p-value < $10^{-165}$ (Wilcoxon rank sum test), respectively. In summary, compositional bias on both sides of the NFR was seen in both datasets, but the transition of GC-skew polarity across the NFR was visible only in ORC-ACS. The significantly high value of GC-skew upstream of nrACS suggests that nrACS matches may be evolving into replicating ACS or have a very low chance of ORC binding. Hence, it is possible that nrACS sites could not be detected as ORC-ACS sites in any of the experiments carried out to date. The GC-skew value returns to nearly zero around 1000 bp away from ORC-ACS, whereas the average inter-ORC-ACS distance in yeast is about 40 kb (Fig. S2). The influence of the mutational effect due to replication decreases slowly with distance and need not extend to 20 kb, i.e., half the average inter-origin distance. This observation may be due to the combined effect of variability in the location of fork-convergence sites, the stochastic firing of ARS, the asynchronous departure of the two forks from the ARS, and the difference in rates of fork progression [7], [42], [43]. In the AT-skew plot, only ORC-ACS shows transition behavior within the NFR (Fig. 3C and D). It is interesting to note that the AT-polarity transition was visible only within the NFR. This finding differs slightly from an earlier study by Aiger [42] on the genic and intergenic sequences of the leading and lagging strands. 
They reported a transition in AT-polarity on moving from the upstream replication termination region of an ARS to the downstream replication termination region [42]. However, this study shows that the property exists only within the NFR because of its A- and T-rich regions. This might be due to the consistency in the signal attained by aligning ARS sequences at their ACS in the same orientation. It is clear from the analysis of the correlation and skew measures that the context of the sequence makeup in the vicinity of the NFR is important and may play a significant role in the replication process.

Motif discovery and their localization within NFR

The analysis of sequences in this study indicated the presence of an A-rich pattern around position + 50. Earlier work by Eaton et al. [41] reported that the motif downstream of ACS is A-rich but did not determine the pattern. Hence, the downstream region, i.e., + 20 to + 180 (within the NFR), of ORC-ACS was searched for any conserved motif using the MEME suite [37]. The search discovered an A-rich motif (in 88 out of 225 ORC-ACS sequences) (Fig. S3A). The PSSM of the discovered motif was used to search for similar motifs downstream of nrACS sequences, i.e., between + 20 and + 180, using MAST with p-value < $10^{-4}$ and E-value < 10 [44]. The frequency distribution of the discovered motif shows that the downstream region of ORC-ACS has a higher number of discovered motifs up to + 60 bp when compared to nrACS (Fig. S3B & S3C, where 1 bin represents 5 bp). The frequency plot of the previously discovered B2 motif (5′-ANWWAAAT-3′) [19] (Fig. S4) matches the frequency plot of the motif discovered in this study. The A-rich motifs discovered in both studies are abundant up to + 65 in ORC-ACS. The newly discovered motif is somewhat longer than the previously known B2 motif, and the chance of random occurrence of a long motif in a given DNA sequence is very low. B2 elements are not present in all ARS but play a significant role in enhancing ARS efficiency and DNA unwinding [45] by loading MCM2 helicase, or possibly act as a second site for ORC binding [46], [47].
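The positional frequency plot for a known consensus such as the B2 motif 5′-ANWWAAAT-3′ can be approximated with a simple IUPAC pattern scan. The Python sketch below is a swapped-in, simpler technique than the PSSM-based MEME/MAST search used in the study: it matches the consensus exactly, scans one strand only, and reports the position of the first base of each match. The function names and the default offset of + 20 are assumptions for illustration.

```python
import re

# IUPAC codes needed for the B2 consensus; W = A or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}

def motif_regex(consensus):
    """Compile a lookahead regex so that overlapping matches are all found."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")

def motif_positions(downstream, consensus, offset=20):
    """Positions (relative to ACS) of motif starts in a downstream segment.

    `downstream` is assumed to begin at position +offset relative to ACS.
    """
    pattern = motif_regex(consensus)
    return [m.start() + offset for m in pattern.finditer(downstream)]

def binned_counts(positions, start=20, end=180, bin_size=5):
    """Histogram of motif positions in fixed-width bins over [start, end)."""
    counts = [0] * ((end - start) // bin_size)
    for p in positions:
        if start <= p < end:
            counts[(p - start) // bin_size] += 1
    return counts
```

Summing `binned_counts` over all sequences of a dataset gives a per-bin frequency profile comparable in spirit to the 5 bp bins mentioned for Fig. S3.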

Variability in DNA structural and thermodynamical properties within region of interest in ORC-ACS sequences

The DNA replication process starts with the unwinding of the DNA double helix, whose thermal stability is contributed by Watson-Crick base pairing and base stacking [48]. Base pairing, reflected in GC and AT content, plays a significant role in the helix stability of replication regions [49]. Base stacking is the stacking of one base over another in a DNA single strand, and stacking energy is the energy required to destack two consecutive stacked bases in a single strand. Hence, to examine the effect of base stacking energy at replication sites, the average base stacking energy profile for the above datasets was calculated based on Eq. (5), using dinucleotide stacking values obtained from the DiProDB database [27], [50]. ORC-ACS data show lower base stacking energy than nrACS within the NFR (Fig. 4A & B). This implies that the NFR needs less energy to destack the bases of a single strand. Furthermore, an earlier study reported that the destacking of bases plays a major role in DNA unwinding during the replication process [51], and our findings are in conformity with that study.

Fig. 4

Stacking energy and conformational variation along ORC-ACS and nrACS sequences

Y1-axis (blue curve) represents stacking energy of (A) ORC-ACS and (B) nrACS sequences, helical twist angle of (C) ORC-ACS and (D) nrACS sequences, and inclination angle of (E) ORC-ACS and (F) nrACS sequences. Y2-axis (green curve) represents nucleosome signal. X-axis represents position relative to ACS.

Stacking energy also influences the conformational structure of DNA, e.g., the twist angle between stacked base pairs, which gives rise to helical repeats in DNA. The helical twist angle showed a higher value within the NFR of ORC-ACS than in nrACS data (Methods, Eq. (5); Fig. 4C & D). Such a correlation between base stacking energy and helical twist angle was also observed by Swart and Cooper et al. [52], [53]. ORC-ACS in the NFR shows a higher twist angle at the ORC binding site than in the neighboring region, whereas nrACS within the NFR has a helical twist angle similar to that of its neighboring region. This unique feature, i.e., a higher twist angle, may have a role in DNA-protein interaction during the replication process [49]. The orientation of a base pair relative to either the overall or a local helix axis is known as the base inclination angle, a major factor in determining DNA structure. ORC-ACS data show (Fig. 4E & F) a switch of inclination from + ve to − ve within the NFR. In the DNA structure, this implies a sudden change from right-hand rotation to left-hand rotation about a vector from the helical axis towards the major groove [54] within the ORC-ACS NFR. This points to a role of the A-rich region in shaping DNA structure for ORC binding. In this study, we found that A-rich B2 elements are abundantly present within the NFR of ORC-ACS. Our analysis suggests a new possible role of the B2 element in DNA conformation and thermodynamics that favors protein binding to these sites and DNA unwinding in the NFR of ORC-ACS. Previous studies also showed the effect of the AT-tract of the B2 element on the free energy contribution to DNA unwinding [45]. The junction of poly(A) tracts and mixed base sequence also plays a role in influencing DNA structural properties and nucleosome organization [55], [56].

Distribution of ACS in the non-coding segment of the genome

In the human genome, the replicating regions are surrounded by abundant genes, and replication fork progression is co-oriented with transcription [57]. Most of these replicating sites are not randomly distributed; instead, they overlap with promoter regions [58]. To examine the relationship between transcribed segments and ACS sites, three types of intergenic regions, namely tandem, divergent and convergent, were extracted from the SacCer3 annotation file. The distribution of ORC-ACS and nrACS in these intergenic regions was analyzed and tabulated (see Table 1).

Table 1

Distribution of ACS matches within three types of intergenic regions.

Intergenic region type	Number of intergenic regions	ORC-ACS (225)	nrACS (230)
Tandem (one promoter)	3064 (48.4% of total)	98 (3.2% of tandem)	73 (2.4% of tandem)
Convergent (no promoter)	1671 (26.4% of total)	66 (3.9% of convergent)	60 (3.5% of convergent)
Divergent (two promoters)	1593 (25.2% of total)	45 (2.8% of divergent)	33 (2.1% of divergent)
Total	6328	209/6328 (3.3% of intergenic)	166/6328 (2.6% of intergenic)

Note: For both datasets, convergent intergenic regions have the highest density of ACS, followed by tandem and divergent regions.

The median widths of the divergent, tandem, and convergent intergenic regions are 1305, 753 and 403 bp, respectively, for ORC-ACS data, while for nrACS data the median widths are 749, 469, and 282 bp, respectively (Fig. 5A). The analysis of Table 1 and Fig. 5A gives two observations: 1) the convergent intergenic regions have the highest percentage of ACS (i.e., 3.9% for ORC-ACS and 3.5% for nrACS) and the lowest median intergenic width (p-value < $10^{-5}$ for ORC-ACS and p-value < 0.018 for nrACS, Wilcoxon rank sum right tail test), followed by tandem and divergent; 2) the intergenic regions embedding ORC-ACS are significantly wider than those embedding nrACS sites (p-value < $10^{-3}$, Wilcoxon rank sum test). It seems that the convergent intergenic region is highly preferred, followed by tandem and divergent. This analysis is also consistent with a possible relation between transcription and replication [58].
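The classification of intergenic regions by the orientation of their flanking genes can be sketched as follows. This Python example is illustrative only: it assumes genes on one chromosome are given as hypothetical `(start, end, strand)` tuples with start < end, skips overlapping or directly adjacent genes, and does not handle nested genes; the annotation parsing used in the study is not described in the text.

```python
from statistics import median

def classify_intergenic(genes):
    """Label gaps between neighbouring genes as tandem, divergent or convergent.

    genes: list of (start, end, strand) tuples, strand being "+" or "-".
    Returns a list of (region_start, region_end, kind).
    """
    genes = sorted(genes)
    regions = []
    for (s1, e1, st1), (s2, e2, st2) in zip(genes, genes[1:]):
        if s2 <= e1 + 1:
            continue  # overlapping or adjacent genes leave no intergenic gap
        if st1 == st2:
            kind = "tandem"      # one promoter faces the gap
        elif st1 == "-" and st2 == "+":
            kind = "divergent"   # both promoters face the gap
        else:
            kind = "convergent"  # both gene ends face the gap
        regions.append((e1 + 1, s2 - 1, kind))
    return regions

def widths_by_type(regions, acs_positions):
    """Widths of the intergenic regions that contain at least one ACS."""
    widths = {"tandem": [], "divergent": [], "convergent": []}
    for start, end, kind in regions:
        if any(start <= p <= end for p in acs_positions):
            widths[kind].append(end - start + 1)
    return widths

def median_widths(widths):
    """Median width per region type, omitting types with no ACS."""
    return {kind: median(v) for kind, v in widths.items() if v}
```

Running `median_widths(widths_by_type(regions, acs_positions))` separately for ORC-ACS and nrACS coordinates would yield per-type medians of the kind compared in Fig. 5A.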

Fig. 5

Distribution of intergenic regions containing ACS

(A) Boxplot of width (in base pairs) of the three types of intergenic regions embedding ORC-ACS and nrACS: RTI - ORC-ACS within the tandem intergenic region, nRTI - nrACS within the tandem intergenic region, RCI - ORC-ACS within the convergent intergenic region, nRCI - nrACS within the convergent intergenic region, RDI - ORC-ACS within the divergent intergenic region, nRDI - nrACS within the divergent intergenic region. (B) Distance (in base pairs) of ORC-ACS and nrACS from gene start or end position: RS - Distance from ORC-ACS to nearest gene start position, nRS - Distance from nrACS to nearest gene start position, RE - Distance from ORC-ACS to nearest gene end position, nRE - Distance from nrACS to nearest gene end position.

The nearest gene start site (TSS) lies farther from ORC-ACS sites than from nrACS sites (p-value < $10^{-7}$, Wilcoxon rank sum right tail test, Fig. 5B). In our view, this may provide an accessible environment for replication proteins [59]. Hence, the small distance in the case of nrACS may play a crucial role by limiting DNA accessibility for ORC binding, despite the close similarity of nrACS to ORC-ACS motifs. Thus, the analysis showed that replicating ACSs tend to maintain sufficient distance from promoter regions, where the transcription machinery binds.

Conclusion

In S. cerevisiae, out of a large number of available ACS sites, the replication protein complex binds only to some, as demonstrated by many genome-wide experiments. Hence, there was a need to understand the sequence context in which these motifs occur. Our study focused on content and contextual analysis of ORC-bound ACS and ORC-free ACS sites. We observed profound differences in conformation-related features such as DNA helical twist, inclination angle and stacking energy in ORC-ACS compared to nrACS. Our analysis suggests a new possible role of A-rich B2 elements along with the T-rich ACS, which may provide a structurally and thermodynamically favorable environment for ORC to bind DNA and carry forward the replication process. This study also showed that nrACS are closer to the nearest transcription start site than ORC-ACS, and this may be a limiting factor for DNA accessibility to ORC proteins. Even though both datasets have similar sequences, our study confirms that context features within the nucleosome-free region differ. This may be the reason why ORC-ACS act as replicating sites. Our attempt is novel in considering ARS consensus sequences and their flanking regions together with nucleosome positioning to gain contextual insights into the S. cerevisiae genome. Thus, our context-based computational study is fairly comprehensive in its analysis of the replication origin sequences of S. cerevisiae; it may help biologists and bioinformaticians plan further studies and develop origin prediction tools.

Author contribution

AK conceived and designed the study. VKS performed the analysis. VKS and AK wrote the manuscript. Both the authors have read and approved the manuscript.

Conflict of interest

We declare that there is no conflict of interests for this work.

55 in total

Review 1. Making sense of eukaryotic DNA replication origins.

Authors: D M Gilbert
Journal: Science Date: 2001-10-05 Impact factor: 47.728

2. Nucleotide correlation based measure for identifying origin of replication in genomic sequences.

Authors: Kushal Shah; Annangarachari Krishnamachari
Journal: Biosystems Date: 2011-09-17 Impact factor: 1.973

3. Human gene organization driven by the coordination of replication and transcription.

Authors: Maxime Huvet; Samuel Nicolay; Marie Touchon; Benjamin Audit; Yves d'Aubenton-Carafa; Alain Arneodo; Claude Thermes
Journal: Genome Res Date: 2007-08-03 Impact factor: 9.043

Review 4. The structure and function of yeast ARS elements.

Authors: C S Newlon; J F Theis
Journal: Curr Opin Genet Dev Date: 1993-10 Impact factor: 5.578

Review 5. Time to be versatile: regulation of the replication timing program in budding yeast.

Authors: Kazumasa Yoshida; Ana Poveda; Philippe Pasero
Journal: J Mol Biol Date: 2013-09-25 Impact factor: 5.469

Review 6. Replication timing and its emergence from stochastic processes.

Authors: John Bechhoefer; Nicholas Rhind
Journal: Trends Genet Date: 2012-04-18 Impact factor: 11.639

7. Structural properties of replication origins in yeast DNA sequences.

Authors: Xiao-Qin Cao; Jia Zeng; Hong Yan
Journal: Phys Biol Date: 2008-09-29 Impact factor: 2.583

8. DiProDB: a database for dinucleotide properties.

Authors: Maik Friedel; Swetlana Nikolajewa; Jürgen Sühnel; Thomas Wilhelm
Journal: Nucleic Acids Res Date: 2008-09-19 Impact factor: 16.971

9. Saccharomyces Genome Database: the genomics resource of budding yeast.

Authors: J Michael Cherry; Eurie L Hong; Craig Amundsen; Rama Balakrishnan; Gail Binkley; Esther T Chan; Karen R Christie; Maria C Costanzo; Selina S Dwight; Stacia R Engel; Dianna G Fisk; Jodi E Hirschman; Benjamin C Hitz; Kalpana Karra; Cynthia J Krieger; Stuart R Miyasato; Rob S Nash; Julie Park; Marek S Skrzypek; Matt Simison; Shuai Weng; Edith D Wong
Journal: Nucleic Acids Res Date: 2011-11-21 Impact factor: 16.971

10. A novel method for prokaryotic promoter prediction based on DNA stability.

Authors: Aditi Kanhere; Manju Bansal
Journal: BMC Bioinformatics Date: 2005-01-05 Impact factor: 3.169