Prediction of nucleosome positioning by the incorporation of frequencies and distributions of three different nucleotide segment lengths into a general pseudo k-tuple nucleotide composition.

Abstract

MOTIVATION: Nucleosome positioning plays important roles in many eukaryotic intranuclear processes, such as transcriptional regulation and chromatin structure formation. The investigations of nucleosome positioning rules provide a deeper understanding of these intracellular processes.
RESULTS: Nucleosome positioning prediction was performed using a model consisting of three types of variables characterizing a DNA sequence: the number of five-nucleotide sequences, the number of three-nucleotide combinations in one period of a helix, and mono- and di-nucleotide distributions in DNA fragments. Using recently proposed stringent benchmark datasets with low biases for Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans and Drosophila melanogaster, the present model was shown to have a better prediction performance than the recently proposed predictors. This model was also able to display the common and organism-dependent factors that affect nucleosome forming and inhibiting sequences. Therefore, the predictors developed here can accurately predict nucleosome positioning and help determine the key factors influencing this process.
CONTACT: awa@hiroshima-u.ac.jp
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Substances:
Nucleosomes
DNA

Year: 2016 PMID: 27563027 PMCID: PMC5860184 DOI: 10.1093/bioinformatics/btw562

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Nucleosomes are the basic units of eukaryotic chromatin, and each is formed by a 147-base pair (bp) DNA sequence wrapped tightly around a histone octamer. Precise nucleosome formation and its inhibitory effects on promoters (Choi and Kim, 2009; Jiang and Pugh, 2009; Tirosh and Barkai, 2008), enhancers (Andreu-Vieyra ; He ; Maston ; McPherson ) and insulators (Bi ; Takagi ) play crucial roles in the precise regulation of transcription (West ). Precise nucleosome positioning also facilitates DNA replication, DNA repair, and RNA splicing (Berbenetz ; Chen , 2014a; Schwartz ; Yasuda ). Therefore, elucidating the mechanisms of nucleosome positioning may allow an in-depth understanding of various biological processes. Recently, high-resolution genome-wide nucleosome maps were obtained for several model organisms (Lee ; Mavrich a,b; Schones ; Segal ). In contrast, the factors that determine nucleosome positioning remained unclear. However, with the increasing availability of high-quality experimental datasets, various computational methods and tools for the prediction of nucleosome positioning were proposed (reviewed in Teif, 2015), providing valuable insights into the mechanisms that determine nucleosome positioning. Furthermore, accurate predictors make it possible to analyze the effects of single nucleotide polymorphisms and gene mutations on this process. Many of these predictors were constructed using the frequencies and distributions of combinations of polynucleotide sequences as feature vectors (Field ; Ioshikhes ; Kaplan ; Ogawa ; Peckham ; Segal ; Struhl and Segal, 2013; Yi ; Zhang ).
Sequence-dependent mechanical properties, such as sequence-dependent geometry and DNA fragment flexibility, were also considered for the characterization of nucleosome forming and inhibiting sequences (Chen , 2015; Freeman ; Goñi ; Guo ; Isami ; Nikolaou ; Tahir and Hayat, 2016; Tolstorukov ; Stolz and Bishop, 2010; Yuan and Liu, 2008). Furthermore, a powerful web server called Pse-in-One (Liu ) was developed, which can generate all existing feature vectors for DNA/RNA and protein/peptide sequences (see references cited in Chen and Lin 2015) as well as feature vectors defined by users themselves. For human (Homo sapiens), worm (Caenorhabditis elegans) and fly (Drosophila melanogaster) genomes, Guo et al. (2014) constructed stringent benchmark datasets of nucleosome forming and inhibiting sequences with low similarities, in order to examine the performance of nucleosome position predictors. Additionally, the predictors iNuc-PseKNC and iNuc-PseSTNC (hereafter, iNuc-Pse predictors) were proposed, and they were shown to have better success rates in the prediction of nucleosome positioning than any of the previously developed predictors (Guo ; Tahir and Hayat, 2016). Furthermore, for the yeast (Saccharomyces cerevisiae) genome, Chen et al. (2015) constructed a stringent benchmark dataset using the same methodology as Guo et al. (2014), and predicted nucleosome positioning in the yeast genome based on the deformation energies of DNA fragments. For iNuc-Pse predictors to show their best prediction performance in different organisms, different sets of parameter values must be used (Guo ; Tahir and Hayat, 2016). A sequence predicted as nucleosome forming in one organism may be predicted as nucleosome inhibiting in another. This shows that the function of any given DNA sequence in assisting or inhibiting nucleosome formation depends on the investigated organism.
However, no key factors or criteria affecting this process could be elucidated using these predictors, because iNuc-Pse predictors are based on support vector machines. Additionally, based on these datasets, some common short motifs of nucleosome forming and inhibiting sequences were found (Giancarlo ). However, nucleosome positioning cannot be predicted sufficiently well using only these motifs. In this study, a novel nucleosome positioning predictor was developed based on a linear regression model consisting of three types of variables with different fragment length scales: the number of five-nucleotide sequences, the number of three-nucleotide combinations in one period of a helix, and mono- and di-nucleotide distributions in whole DNA fragments. This predictor exhibited better prediction performance than the recently developed iNuc-Pse predictors for the same benchmark datasets of human and fly genomes, and explicitly displayed common and organism-dependent key factors of nucleosome positioning. A series of recent publications (Jia , 2016; Lin , Liu ; Qiu , 2016a; Xiao ) demonstrated, in compliance with Chou’s five-step rule (Chou, 2011), that in order to establish a useful sequence-based statistical predictor for a biological system, the following five guidelines should be observed: (i) how to construct or select a valid benchmark dataset to train and test the predictor; (ii) how to represent the biological sequence samples by catching their key features associated with the target to be predicted; (iii) how to introduce or develop a powerful algorithm to operate the prediction; (iv) how to properly perform cross-validation tests to objectively evaluate the anticipated accuracy; and (v) how to establish a user-friendly web server for the predictor that is accessible to the public. Below, these steps are further explained.

2 Materials and methods

2.1 Benchmark datasets of nucleosome forming and inhibiting sequences

The stringent, low-bias benchmark datasets of nucleosome forming and inhibiting sequences constructed by Guo et al. (2014) and Chen et al. (2015) were used to evaluate the performance of the proposed predictor. These datasets cover human (H. sapiens: 2273 forming sequences and 2300 inhibiting sequences of 147 bp), worm (C. elegans: 2567 forming sequences and 2608 inhibiting sequences of 147 bp), fly (D. melanogaster: 2900 forming sequences and 2850 inhibiting sequences of 147 bp) (Guo ) and yeast (S. cerevisiae: 1880 forming sequences and 1740 inhibiting sequences of 150 bp) (Chen ). In these datasets, none of the sequences has >80% pairwise sequence identity with any other sequence. Note that the benchmark datasets used in earlier studies were expected to contain many redundant, highly similar sequences. Such biased datasets lack statistical representativeness (Chou, 2011), and predictors trained and tested on them may yield misleading results. Therefore, only the low-bias datasets proposed by Guo et al. (2014) and Chen et al. (2015) were employed in this study.

2.2 Model predicting nucleosome positioning 1: three-length scales model

In order to predict whether a given 147-bp DNA sequence from the human, worm or fly genome forms or inhibits a nucleosome, the model included three types of variables: (i) the number of five-nucleotide sequences, (ii) the number of three-nucleotide combinations in one period of a double helix and (iii) mono- and di-nucleotide distributions in DNA fragments. The model was named the three length scales (3LS) model, and it belongs to the class of general PseKNC-based predictors (Guo ; Liu ). The model assigns a value Qseq to a given sequence. When Qseq > Qc = 0.5, the sequence was considered a nucleosome forming sequence; otherwise, it was predicted as an inhibiting sequence. The variables are defined as follows. The mononucleotide variable gives the number of adenine (A) or thymine (T) nucleotides in the ith region of the given DNA sequence. Here, region 1 occupies the central 11-bp fragment of the given 147-bp DNA, each region i with 1 < i < 8 occupies two 10-bp fragments (20 bp in total) at the (i − 1)th nearest-neighbor positions on either side of the first region, and the eighth region occupies the remaining 16 bp (Fig. 1a).
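The region layout for the A/T counts can be sketched in Python. This is a minimal illustration, assuming 0-based positions with the center of the 147-bp sequence at index 73; the function names `region_of` and `at_counts_by_region` are hypothetical and are not taken from the paper.

```python
def region_of(pos, length=147):
    """Return the region index (1..8) of a 0-based position in a 147-bp sequence."""
    center = length // 2            # index 73 is the central base of 147 bp
    d = abs(pos - center)           # distance from the center
    if d <= 5:                      # region 1: central 11-bp fragment
        return 1
    # Regions 2..7: successive 10-bp fragments on each side of region 1;
    # region 8: whatever remains at both ends (8 + 8 = 16 bp).
    return min(8, 2 + (d - 6) // 10)


def at_counts_by_region(seq):
    """Count A or T nucleotides in each of the eight regions."""
    counts = [0] * 8
    for pos, base in enumerate(seq.upper()):
        if base in "AT":
            counts[region_of(pos, len(seq)) - 1] += 1
    return counts
```

The layout only adds up for 147-bp input (11 + 6 × 20 + 16 = 147), so the sketch is not meant for the 150-bp yeast sequences.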

Fig. 1.

Nucleotide regions and groups analyzed in each 147-bp DNA sequence. (a) Each nucleotide belongs to a specific region. (b) Each dinucleotide pair belongs to a specific group. bn indicates the nth base of the sequence, and dinucleotide pairs are underlined in red.

The dinucleotide variable gives the sum of the number of each type of successive dinucleotide sequence and its complementary sequence, named Di-seq and Di-seq*, in the i’th group of the dinucleotide series of a given DNA sequence. Here, the first group consists of the 10 dinucleotides at the central region of a given 146-dinucleotide series, the i’th groups for 1 < i’ < 8 consist of the 20 dinucleotides from the (5 ± (10 × (i’ − 2) + 1))th to the (5 ± 10 × (i’ − 1))th dinucleotide from the center of the series, and the eighth group contains the remaining 16 dinucleotides (Fig. 1b). The three-nucleotide variable gives the sum of the number of each type of combination of a 3-nucleotide set (3-nuc), which consists of a nucleotide, a second nucleotide located downstream at distance j, and a third nucleotide located downstream at distance k (j < k), together with the number of the complementary nucleotide combinations (3-nuc*) in the given DNA sequence. Here, the cases 5 ≤ k ≤ 10 were considered, in agreement with the range W5 to W10 reported in Table 2. The five-nucleotide variable gives the sum of the number of each type of successive five-nucleotide sequence (5-seq) and that of the complementary sequence (5-seq*) in the given DNA sequence. The coefficients M, D, T and P give the weights of the contributions of these variables to Qseq, and Q0 is a constant. They are organism-dependent values, which reveal the common and organism-specific characteristics of nucleosome forming and inhibiting sequences.
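The 5-seq and 3-nuc counts can be illustrated with a short Python sketch. It assumes that the "complementary sequence" is the reverse complement, which matches the pairs listed in the Discussion (for example GCTTC with GAAGC, and offsets (0, 1, 6) = CTT with (0, 5, 6) = AAG), and that k runs from 5 to 10 as in Table 2. All function names are hypothetical, and input is assumed to be uppercase A, C, G, T only.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


def five_seq_counts(seq):
    """Count each 5-mer together with its reverse complement (5-seq or 5-seq*)."""
    counts = Counter()
    for i in range(len(seq) - 4):
        kmer = seq[i:i + 5]
        counts[min(kmer, revcomp(kmer))] += 1   # one key per complementary pair
    return counts


def canonical_3nuc(j, k, trio):
    """Merge (0, j, k, = 3-nuc) with its partner (0, k - j, k, = 3-nuc*)."""
    return min((j, k, trio), (k - j, k, revcomp(trio)))


def three_nuc_counts(seq, k_min=5, k_max=10):
    """Count gapped three-nucleotide combinations at offsets 0 < j < k."""
    counts = Counter()
    for k in range(k_min, k_max + 1):
        for i in range(len(seq) - k):
            for j in range(1, k):
                trio = seq[i] + seq[i + j] + seq[i + k]
                counts[canonical_3nuc(j, k, trio)] += 1
    return counts
```

Storing one key per complementary pair means each count is already the sum over a combination and its partner, which is how the variables are described in the text.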

2.3 Variable selection in 3LS model

In order to obtain high prediction performance, the 3LS model should contain only the appropriate variables. The coefficients M, D, T and P of the appropriate variables should take finite (nonzero) values, while those of redundant variables should be set to zero. The appropriate variables were chosen by the stepwise forward selection method (Efroymson, 1960). Here, in order to avoid multicollinearity (Farrar and Glauber, 1967), the variance inflation factors of all chosen variables were kept below 10 (O’Brien, 2007); for fly genomes the threshold was 10.5, since the prediction performance of the model increased drastically compared with a threshold of 10.0. The model consists of the linear combination of, instead of that of , since this allows a better prediction performance.
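The combination of forward selection with a variance inflation factor (VIF) cap can be sketched as follows. This is a simplified greedy variant, not Efroymson's procedure itself: it adds the candidate that most reduces the residual sum of squares and stops on a relative-gain threshold, whereas the classical method uses F-tests for entry and removal. The names `forward_select` and `min_gain` are hypothetical; NumPy is assumed to be available.

```python
import numpy as np


def vif(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    if X.shape[1] < 2:
        return np.ones(X.shape[1])
    corr = np.corrcoef(X, rowvar=False)
    try:
        return np.diag(np.linalg.inv(corr))
    except np.linalg.LinAlgError:           # perfectly collinear columns
        return np.full(X.shape[1], np.inf)


def rss(X, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)


def forward_select(X, y, max_vif=10.0, min_gain=1e-3):
    """Greedily add columns while every chosen column keeps VIF < max_vif."""
    chosen = []
    current = float(((y - y.mean()) ** 2).sum())   # intercept-only model
    while True:
        best_j, best_rss = None, current
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            if vif(X[:, cols]).max() >= max_vif:   # multicollinearity guard
                continue
            r = rss(X[:, cols], y)
            if r < best_rss:
                best_j, best_rss = j, r
        if best_j is None or current - best_rss < min_gain * current:
            return chosen
        chosen.append(best_j)
        current = best_rss
```

Passing `max_vif=10.5` would correspond to the looser threshold the text reports for the fly dataset.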

2.4 Model predicting nucleosome positioning 2: tri-nucleotide sequence model

For the prediction of nucleosome positioning, a model simpler than 3LS, named the tri-nucleotide sequence (TNS) model, was also introduced. Its variables are defined as the sum of the number of each type of successive tri-nucleotide sequence and that of the complementary sequence in a given DNA sequence. The coefficients R give the weights of the contributions of these variables to Qseq, and Q0 is a constant. This simple model allows a very high accuracy of nucleosome positioning prediction for the yeast genome.
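A minimal Python sketch of the TNS variables and score follows. It assumes that complementary trinucleotides are paired by reverse complement (which collapses the 64 trinucleotides into 32 variables) and that the counts enter the score linearly; the paper may apply a transformation to the counts that is not recoverable here. The names `TNS_KEYS`, `tns_features` and `tns_score` are hypothetical, and the input is assumed to contain only uppercase A, C, G, T.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]


# One representative per complementary pair: 64 trinucleotides -> 32 keys.
TNS_KEYS = sorted({min(t, revcomp(t))
                   for t in map("".join, product("ACGT", repeat=3))})
_INDEX = {key: n for n, key in enumerate(TNS_KEYS)}


def tns_features(seq):
    """Counts of each successive trinucleotide merged with its complement."""
    x = [0] * len(TNS_KEYS)
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        x[_INDEX[min(tri, revcomp(tri))]] += 1
    return x


def tns_score(seq, q0, r):
    """Q_seq = Q0 + sum of R * count, under the linear-count assumption."""
    return q0 + sum(w * n for w, n in zip(r, tns_features(seq)))
```

A sequence would then be called nucleosome forming when `tns_score(...)` exceeds the threshold Qc = 0.5 given in Section 2.2.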

2.5 Evaluations of the quality of prediction

The prediction quality of the present model was evaluated using the jackknife test (Lachenbruch and Mickey, 1968) and the relative operating characteristic (ROC) curve. These methods were generally employed to evaluate the quality of several previously developed predictors (Chen , 2013; Chen and Li, 2013, Chou ; Esmaeili ; Gupta ; Mei, 2012, Mohabatkar et al., 2011, 2013) and of the iNuc-Pse predictors (Guo ; Tahir and Hayat, 2016). Here, N+, N−, N+−, and N−+ were defined as the total number of nucleosome forming sequences, the total number of nucleosome inhibiting sequences, the number of nucleosome forming sequences incorrectly predicted as nucleosome inhibiting sequences, and the number of nucleosome inhibiting sequences incorrectly predicted as nucleosome forming sequences, respectively. Using the jackknife test, four metrics were obtained: sensitivity (Sn), specificity (Sp), accuracy (Acc) and Matthews correlation coefficient (MCC). Note that Sn and (1 − Sp) represent the true positive rate (TPR) and the false positive rate (FPR), respectively. The conventional formulations of the four metrics are not quite intuitive, and it may be difficult for many experimental scientists to understand them, particularly MCC. Fortunately, the more intuitive expressions used in this paper can be derived using the symbols defined in a signal peptide study (Chou, 2001) and elaborated in other studies (Chen ; Xu ). The ROC curve is obtained as the trajectory in the FPR–TPR plane as Qc is varied. The area under the ROC curve (AUROC), bounded by the curve and the lines TPR = 0 and FPR = 1, was used to estimate the performance of the predictors, where AUROC = 0.5 is equivalent to a random prediction and AUROC = 1 indicates perfect prediction. Note that the following three cross-validation methods are often used to examine the effectiveness of a predictor in practical applications: the independent dataset test, the subsampling test, and the jackknife test (Chou and Zhang, 1995). However, of the three, the jackknife test is deemed the least arbitrary (most objective), since it always yields a unique result for a given benchmark dataset (Chou, 2011), and it has therefore been increasingly used to investigate the accuracy of various predictors (e.g. Dehzangi ; Kabir and Hayat, 2016; and references cited in Chou, 2011). Accordingly, the jackknife test was also adopted here to examine the quality of the present predictor.
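The four metrics and the AUROC can be sketched in Python using the N+, N−, N+−, N−+ notation defined above. The expressions below follow the form commonly used with that notation and should be checked against the published equations; the AUROC is computed by pairwise comparison of scores, which is equivalent to the area under the ROC curve but is not necessarily how the authors computed it. Function and argument names are hypothetical.

```python
from math import sqrt


def chou_metrics(n_pos, n_neg, n_pos_wrong, n_neg_wrong):
    """Sn, Sp, Acc, MCC from N+, N-, N+- (missed forming), N-+ (missed inhibiting)."""
    sn = 1 - n_pos_wrong / n_pos
    sp = 1 - n_neg_wrong / n_neg
    acc = 1 - (n_pos_wrong + n_neg_wrong) / (n_pos + n_neg)
    mcc = (1 - (n_pos_wrong / n_pos + n_neg_wrong / n_neg)) / sqrt(
        (1 + (n_neg_wrong - n_pos_wrong) / n_pos)
        * (1 + (n_pos_wrong - n_neg_wrong) / n_neg)
    )
    return sn, sp, acc, mcc


def auroc(scores, labels):
    """Probability that a forming sequence scores above an inhibiting one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With toy counts N+ = N− = 100, N+− = 10 and N−+ = 20, this form of MCC agrees with the conventional confusion-matrix formula (TP = 90, FN = 10, TN = 80, FP = 20).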

2.6 Construction of nucleosome positioning predictor

Based on the 3LS and TNS models, nucleosome positioning predictors were constructed for each organism. The 3LS predictors for human, worm and fly genomes were assumed to consist of the appropriately chosen variables. The coefficients were determined by multiple regression analysis using the benchmark dataset of each organism: the explanatory variables were the chosen variables for the 3LS model and the tri-nucleotide variables for the TNS model, and the objective variable was set to 1 for nucleosome forming sequences and 0 otherwise.
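The fitting step and its jackknife (leave-one-out) evaluation can be sketched as below, assuming an ordinary least-squares fit with an intercept playing the role of Q0 and the threshold Qc = 0.5 from Section 2.2. The feature matrix `X` is whatever set of variables has been chosen; the function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np


def fit(X, y):
    """Least-squares coefficients; coef[0] plays the role of the constant Q0."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_q(coef, X):
    """Q value of each row of X under the fitted linear model."""
    return coef[0] + X @ coef[1:]


def jackknife_accuracy(X, y, q_c=0.5):
    """Refit with each sample left out, then classify the held-out sample."""
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit(X[keep], y[keep])
        q = predict_q(coef, X[i:i + 1])[0]
        correct += int((q > q_c) == (y[i] == 1))
    return correct / n
```

Refitting n times is simple but slow for thousands of sequences and hundreds of variables; a closed-form leave-one-out residual based on the hat matrix would be a faster alternative.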

3 Results

3.1 Variable selection for 3LS model using human, worm and fly sequences

The variables used to construct the nucleosome positioning predictors in the 3LS model were chosen by the stepwise forward selection method. Here, 403, 392 and 325 variables were chosen for human, worm and fly genomes, respectively (Supplementary Table S1).

3.2 Prediction quality for human, worm and fly genomes

Using jackknife cross-validation tests, the Sn, Sp, ACC and MCC of the 3LS model-based predictor were evaluated for the human, worm, and fly genome benchmark datasets (Table 1). The ACCs obtained for these datasets (≈ 0.9001, ≈ 0.8786 and ≈ 0.8341, respectively) were higher than those obtained by iNuc-PseKNC (Guo ) for all three organisms, and higher than those obtained by iNuc-PseSTNC (Tahir and Hayat, 2016) for human and fly genomes. Higher AUROC values were obtained as well (≈ 0.9588, ≈ 0.9505 and ≈ 0.9147 for human, worm, and fly datasets, respectively) compared with those obtained by iNuc-PseKNC (≈ 0.925, ≈ 0.935 and ≈ 0.874) (Guo ) (Fig. 2). Thus, the 3LS model-based predictor with appropriate coefficients (Supplementary Table S2a) is expected to predict nucleosome positioning more accurately than the recent iNuc-Pse predictors for human and fly genomes.

Table 1.

The prediction quality of 3LS model-based predictor measured using jackknife tests

	Human	Worm	Fly
ACC	0.9001 (0.8627^a, 0.8760^b)	0.8786 (0.8690^a, 0.8862^b)	0.8341 (0.7997^a, 0.8167^b)
Sn	0.9169 (0.8786^a, 0.8931^b)	0.8654 (0.9030^a, 0.9162^b)	0.8407 (0.7831^a, 0.7976^b)
Sp	0.8835 (0.8470^a, 0.8591^b)	0.8921 (0.8355^a, 0.8666^b)	0.8274 (0.8165^a, 0.8361^b)
MCC	0.8006 (0.73^a, 0.75^b)	0.7576 (0.74^a, 0.77^b)	0.6682 (0.60^a, 0.63^b)

Sn, sensitivity; Sp, specificity; ACC, accuracy; MCC, Matthews correlation coefficient.

Values in brackets are those obtained using iNuc-PseKNC (a) and iNuc-PseSTNC (b).

Fig. 2.

ROC curves obtained with the jackknife tests using human, worm, and fly genome datasets (Color version of this figure is available at Bioinformatics online.)

3.3 TNS model for yeast genome

The prediction quality of the TNS model-based predictor was expected to be lower than that of the 3LS model-based predictor. The ACCs of the TNS model were ≈ 0.8167, ≈ 0.8394 and ≈ 0.7082 for human, worm, and fly genomes, respectively. However, the TNS model-based predictor exhibited perfect nucleosome positioning prediction (ACC = 1.0) for the benchmark yeast genome dataset presented in Chen . For the same benchmark dataset, the predictor based on DNA deformation energy (Chen ) had an ACC of ≈ 0.981. Moreover, we confirmed that the predictor based on the nearest neighbor algorithm (Yi ) had an ACC of ≈ 0.9906 for the same benchmark dataset. Although these predictors perform sufficiently well for the yeast genome, the TNS model-based predictor with appropriate coefficients (Supplementary Table S2b) is expected to predict nucleosome positioning more precisely than these recent predictors.

4 Discussion

The 3LS model-based predictor can predict nucleosome positioning in human and fly genomes more accurately than the recently proposed nucleosome position predictors. Additionally, the predictor defined here can display the details of organism-dependent key factors that determine nucleosome forming and inhibiting sequences. The variables chosen in the 3LS model differed greatly between human, worm, and fly genomes (Supplementary Table S1). This indicates that there are many organism-dependent differences in the features contributing to nucleosome formation. The coefficients of these variables, M, D, T and P, and the constant Q0, obtained by multiple regression analysis, clearly showed organism-dependent specificities (Supplementary Table S2). These differences are illustrated by the following examples: (i) For variables common to the 3LS models of these genomes, the signs of the coefficients often differed between the organisms. (ii) Only six variables had coefficients of the same sign in all three organisms: T(0, 1, 6, = CTT or 0, 5, 6, = AAG) > 0, T(0, 1, 10, = TTG or 0, 9, 10, = CAA) > 0, P(TTTTT or AAAAA) < 0, P(GCTTC or GAAGC) > 0, P(GTGTC or GACAC) > 0 and P(GGATC or GATCC) > 0. Poly(dA-dT) sequences, such as the AAAAA sequence, are known to be physically rigid (Brukner et al., 1995; Nelson ; Packer ). Therefore, sequences containing these motifs inhibit nucleosome formation in the genomes of several organisms, as confirmed by experimental evidence and by different nucleosome positioning predictors (Bi ; Giancarlo ; Kunkel and Martinson, 1981; Yi ), which is consistent with the results presented here. Sequences with high GC content were reported to have a nucleosome-forming tendency (Tillo and Hughes, 2009). However, according to recent studies, 30–50% of the nucleotides found in nucleosome forming sequences are A or T nucleotides located at appropriate positions (Giancarlo ; Ioshikhes ; Ogawa ; Ohyama 2001; Satchwell ; Segal ), which seems to agree with the results obtained in this study. (iii) When only the six variables described above were used in the 3LS model-based predictor, the ACCs for human, worm, and fly genomes were ≈ 0.7525, ≈ 0.7716 and ≈ 0.6438, respectively, which is much lower than the values obtained using the model with suitable variables. However, even when these six variables were removed from the 3LS model-based predictor with suitable variables, the decrease in the ACCs for human, worm and fly genomes was not considerable, and the obtained ACC values were ≈ 0.8974, ≈ 0.8730 and ≈ 0.8290, respectively. This indicates that organism-specific sequence patterns contribute dominantly to the determination of nucleosome forming abilities. (iv) The weight of the contribution of the set (0, j, k, = 3-nuc or 0, k − j, k, = 3-nuc*) for each k is defined as Wk = [Number of chosen (0, j, k, = 3-nuc or 0, k − j, k, = 3-nuc*) for that k]/[Number of chosen variables] (Table 2). The obtained Wk values differed between organisms, e.g. W5 ∼ 0.074, 0.112, 0.080 (k = 5 as the smallest k) and W10 ∼ 0.159, 0.115, 0.151 (k = 10 as the largest k) were obtained for human, worm, and fly genomes, respectively. This indicates that the length scale of nucleotide combinations required for the characterization of nucleosome forming sequences depends on the organism analyzed.

Table 2.

Weights of the contributions of (0, j, k, = 3-nuc or 0, k − j, k, = 3-nuc*) for each k (Wk) and (i’ | Di-seq or Di-seq*) for the positions near and far from the dyad position (Wnear and Wfar)

	Human	Worm	Fly
W₅	0.074441687	0.112244898	0.08
W₆	0.094292804	0.068877551	0.089230769
W₇	0.069478908	0.073979592	0.098461538
W₈	0.11662531	0.114795918	0.083076923
W₉	0.1191067	0.135204082	0.12
W₁₀	0.158808933	0.114795918	0.150769231
W_near	0.027295285	0.025510204	0.027692308
W_far	0.027295285	0.015306122	0.006153846
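As a worked check of how the weights in Table 2 relate to the numbers of chosen variables, the following Python sketch back-calculates the integer count implied by each tabulated W5 value. The totals (403, 392 and 325 chosen variables) come from Section 3.1; the counts themselves are not reported in the text and are only inferred here, and the helper name `implied_count` is hypothetical.

```python
def implied_count(weight, n_chosen):
    """Nearest integer count such that count / n_chosen matches the weight."""
    return round(weight * n_chosen)


totals = {"human": 403, "worm": 392, "fly": 325}                  # Section 3.1
w5 = {"human": 0.074441687, "worm": 0.112244898, "fly": 0.08}     # Table 2

for organism, total in totals.items():
    count = implied_count(w5[organism], total)
    # Print the inferred count and the weight it reproduces.
    print(organism, count, count / total)
```

Each tabulated weight is reproduced by an integer count divided by the corresponding total, which is consistent with the definition of Wk as a fraction of the chosen variables.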

(v) The weights of the contribution of the set (i’ | Di-seq or Di-seq*) near and far from the center of the sequence (dyad position) were defined as Wnear = [Number of chosen (i’ | Di-seq or Di-seq*) near the dyad position]/[Number of chosen variables] and Wfar = [Number of chosen (i’ | Di-seq or Di-seq*) far from the dyad position]/[Number of chosen variables] (Table 2). The Wnear values were similar for the three investigated organisms. The Wfar values were ≈ 0.027, 0.015 and 0.006 for human, worm, and fly genomes, respectively, so Wfar for fly was ∼1/2 of that for worm and ∼1/4 of that for human. This suggests that the contribution of the sequences far from the dyad position to nucleosome formation depends on the organism type. Using the TNS model-based predictor, the ACC values of the nucleosome position predictions for human, worm, and fly genomes were much lower than those obtained using the 3LS model-based predictor, while ACC = 1 was obtained for the yeast genome. This clearly demonstrates the organism-dependent characteristics of nucleosome forming and inhibiting sequences, showing that nucleosome positioning is much more easily predicted in yeast than in higher organisms. The predictors developed here can predict nucleosome positioning in human, fly and yeast genomes with higher accuracy than the recently proposed predictors and can determine the key factors influencing this positioning in human, worm, fly and yeast genomes.
In contrast to the recently proposed iNuc-Pse predictors, the 3LS model-based predictor developed in this study is also based on the following sequence properties: (i) combinations of nucleotides located further apart than those considered by iNuc-Pse predictors; (ii) more detailed distributions of A, T and dinucleotide sequences in a DNA fragment than those in iNuc-Pse predictors. These properties most likely contribute to the improved performance of the predictor proposed here in comparison with the iNuc-Pse predictors. However, the variable selection and the formalization of the model can be improved, and further modifications are needed for this predictor to perform better than the recent ones. Recent studies suggested that the sequence-dependent geometry and flexibility of each DNA fragment may play important roles in determining its nucleosome forming ability (Chen , 2015; Freeman ; Goñi ; Guo ; Isami ; Nikolaou ; Stolz and Bishop, 2010; Tolstorukov ; Yuan and Liu, 2008). Furthermore, the nucleosome forming ability of each sequence may change with intracellular and environmental conditions (Andreu-Vieyra ; He ; Maston ; McPherson ; Struhl and Segal, 2013; Zhang ). Because of this, the predictors should be modified in the future to account for these physical and chemical influences. Additionally, as demonstrated in a series of recent publications (e.g. Chen , 2016; Jia ; Lin ; Liu ; Qiu ), user-friendly and publicly accessible web servers can significantly enhance the impact of new prediction methods (Chou, 2015). Therefore, future efforts will include providing a web server for the prediction method presented here.

References (10 of 76)

1. Characterization of enhancer function from genome-wide analyses.

Authors: Glenn A Maston; Stephen G Landt; Michael Snyder; Michael R Green
Journal: Annu Rev Genomics Hum Genet Date: 2012-06-11 Impact factor: 8.929

2. Nucleosomal structure of undamaged DNA regions suppresses the non-specific DNA binding of the XPC complex.

Authors: Takeshi Yasuda; Kaoru Sugasawa; Yuichiro Shimizu; Shigenori Iwai; Tadahiro Shiomi; Fumio Hanaoka
Journal: DNA Repair (Amst) Date: 2005-03-02

3. Two strategies for gene regulation by promoter nucleosomes.

Authors: Itay Tirosh; Naama Barkai
Journal: Genome Res Date: 2008-04-30 Impact factor: 9.043

4. Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses.

Authors: Maryam Esmaeili; Hassan Mohabatkar; Sasan Mohsenzadeh
Journal: J Theor Biol Date: 2009-12-02 Impact factor: 2.691

5. Computational prediction of nucleosome positioning by calculating the relative fragment frequency index of nucleosomal sequences.

Authors: Ryu Ogawa; Noriyuki Kitagawa; Hiroki Ashida; Rintaro Saito; Masaru Tomita
Journal: FEBS Lett Date: 2010-03-03 Impact factor: 4.124

6. Sequence periodicities in chicken nucleosome core DNA.

Authors: S C Satchwell; H R Drew; A A Travers
Journal: J Mol Biol Date: 1986-10-20 Impact factor: 5.469

7. A high-resolution atlas of nucleosome occupancy in yeast.

Authors: William Lee; Desiree Tillo; Nicolas Bray; Randall H Morse; Ronald W Davis; Timothy R Hughes; Corey Nislow
Journal: Nat Genet Date: 2007-09-16 Impact factor: 38.330

8. ICM Web: the interactive chromatin modeling web server.

Authors: Richard C Stolz; Thomas C Bishop
Journal: Nucleic Acids Res Date: 2010-06-11 Impact factor: 16.971

9. Some remarks on protein attribute prediction and pseudo amino acid composition.

Authors: Kuo-Chen Chou
Journal: J Theor Biol Date: 2010-12-17 Impact factor: 2.691

10. Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides.

Authors: I Brukner; R Sánchez; D Suck; S Pongor
Journal: EMBO J Date: 1995-04-18 Impact factor: 11.598
