Literature DB >> 35637888

Dataset of normalized probability distributions of virtual bond lengths, bond angles, and dihedral angles for the coarse-grained single-stranded DNA structures.

Jun-Lin Qian1, Li-Zhen Sun1.   

Abstract

The utility of the coarse-grained (CG) single-stranded DNA (ssDNA) model can drastically reduce the compute time for simulating the ssDNA dynamics. The model-matched CG potentials and the inherent potential constants can be derived by coarse-graining the experimentally measured ssDNA structures. A useful and widespread treatment of the CG model is to use three different pseudo-atoms P, S, and B to represent the atomic groups of phosphate, sugar, and base, respectively, in each nucleotide of the ssDNA structures. The three pseudo-atoms generate nine types of the structural parameters to characterize the unstructured ssDNA conformations, including three (virtual) bond lengths (P-S, S-B, and S-P) between two neighbouring beads, four bond angles (P-S-P, S-P-S, P-S-B, and B-S-P) between three adjacent bonds, and two dihedral angles (P-S-P-S and S-P-S-P) between three successive bonds. This paper mainly presents the data of normalized probability distributions of the bond lengths, bond angles, and dihedral angles for the CG ssDNAs.
© 2022 The Authors.

Entities:  

Keywords:  Coarse-grained potential; Computational simulation; Conformational change; Unstructured ssDNA

Year:  2022        PMID: 35637888      PMCID: PMC9142626          DOI: 10.1016/j.dib.2022.108284

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

These data can provide an insightful view of the conformational changes for the unstructured ssDNAs. These data can be used to support the developments of new CG ssDNA models or the modifications of the existing models. The researchers who are interested in the ssDNA dynamics and simulation models will benefit from the data presented in this paper.

Data Description

The PDB identifications (PDBids) and the basic information (including the number of chains Nc, the number of all nucleotides Nn, and the ssDNA types) of the selected ssDNAs are summarized in Table 1. More detailed information such as the sequences are deposited in the Mendeley Data database (Table 1: sequences of the selected ssDNA structures). Fig. 1 shows the normalized probability distributions of structural information the virtual bond lengths (Fig. 1(a1)–(a3)), the bond angles (Fig. 1(b1)–(b4)), and dihedral angles (Fig. 1(c1) and (c2)) for the CG ssDNAs. The ssDNA backbone-involved structural parameters, such as the bond length P-S, the bond angle P-S-P, and the dihedral angle P-S-P-S, are calculated from all selected ssDNA structures. As we mainly focus on the ssDNA polythymine poly(T), the base-involved structural parameters such as the bond length S-B and bond angle P-S-B, are obtained from the thymine-involved structures. In addition, the base-involved dihedral angles are not calculated as their effects on the conformations of the unstructured polythymine are weak [2]. The data are also deposited in the Mendeley Data database (Tables 2–4).
Table 1

The basic information of the selected ssDNA structures.

PDBidNcaNnbtypecPDBidNcNntype
1eyg270poly(T)4jlj234mixed
1jmc18poly(C)4jrq226poly(A)
1pa6336mixed4js4232poly(A)
1ph1339mixed4js5226poly(T)
1ph2335mixed4kdp234mixed
1ph3338mixed4ki2444mixed
1ph4338mixed4noe130mixed
1ph5336mixed4ou6110poly(T)
1ph6337mixed4ou7110poly(T)
1ph7336mixed4owx112poly(T)
1ph8338mixed4pog4120poly(T)
1ph9338mixed4pso440poly(T)
1phj337mixed4qgu224mixed
1s40111mixed5cd42120mixed
2c62120mixed5eax234poly(T)
2ccz115poly(T)5fhd238mixed
2vw9135poly(T)5h1b19poly(T)
3a5u131poly(C)5odl19poly(T)
3cmu118poly(T)5orq110mixed
3cmw230poly(T)5u8t114poly(T)
3ugo222mixed5usb19mixed
3ugp222mixed5usn19mixed
3ulp270poly(T)5uso19mixed
3vdy5175poly(T)5xrz140mixed
4bhm324poly(T)5xs0327poly(C)
4g0r110mixed5zg9120poly(T)
4gnx2124poly(T)5zva110mixed
4gop264poly(T)5zvb19mixed
4hid19mixed6fwr111poly(T)
4hik19mixed6fws221poly(T)
4him19mixed6i52120poly(T)
4hj519mixed6irq250poly(T)
4hj719mixed6jdg360poly(T)
4hj819mixed6kbs110mixed
4hj9110mixed6pij396mixed
4hja111mixed6qem136poly(T)

aNc denotes the number of chains in a structural file. bNn denotes total number of nucleotides in a structural file. cThe type named “mixed” represents the corresponding ssDNA with different compositions.

Fig. 1

(a1)–(a3) show the normalized probability distributions of the virtual bond lengths P-S, S-P+1, and S-B(T), respectively. (b1)–(b4) show the normalized probability distributions of bond angles P-S-P+1, S-P1-S1, P-S-B(T), and B(T)-S-P+1, respectively. (c1) and (c2) show the normalized probability distributions of the dihedral angles S-1-P-S-P+1 and P-S-P+1-S+1, respectively. Here the subscript represents the nucleotide index.

The basic information of the selected ssDNA structures. aNc denotes the number of chains in a structural file. bNn denotes total number of nucleotides in a structural file. cThe type named “mixed” represents the corresponding ssDNA with different compositions. (a1)–(a3) show the normalized probability distributions of the virtual bond lengths P-S, S-P+1, and S-B(T), respectively. (b1)–(b4) show the normalized probability distributions of bond angles P-S-P+1, S-P1-S1, P-S-B(T), and B(T)-S-P+1, respectively. (c1) and (c2) show the normalized probability distributions of the dihedral angles S-1-P-S-P+1 and P-S-P+1-S+1, respectively. Here the subscript represents the nucleotide index.

Experimental Design, Materials and Methods

The statistical analysis of the structural information for the ssDNAs mainly involves three steps: (1) selecting appropriate all-atom ssDNA structures; (2) coarse-graining the selected structures; and (3) calculating the CG structural parameters. The details are described as following: Selection of the all-atom ssDNA structures. The experimentally measured all-atom ssDNA structures are download from the protein data bank [6]. The selection of the structures based on the following criteria: (1) the ssDNA is bound to proteins to avoid the formation of helical structures [7]; and (2) there are at least 8 consecutive unpaired nucleotides. A total of 72 ssDNA structure files with PDB format (see the PDBids in Table I) are used. Then we use the visualization tool UCSF Chimera [4] to delete unnecessary molecules (such as proteins and waters), ions and hydrogen atoms of the ssDNAs, and use the software package X3DNA [5] to label the unpaired nucleotides. CG structures of the ssDNAs. We calculate the centers of mass for the atomic groups of the phosphate (P), sugar (S), and base (B) in each unpaired nucleotide. In particular, for the nucleotide i, the phosphate includes the atom phosphor and the directly bonded oxygen atoms (here an oxygen named O3’ in PDB files in fact belongs to the nucleotide i-1), the sugar group includes the sugar ring and an atom named C5’, and the base group includes other atoms in this nucleotide except the atom O3’ as it belongs to the phosphate of nucleotide i+1. In the CG structures, the atomic groups are represented by the three types of pseudo-atoms P, S, and B located the corresponding centers of mass. The pseudo-atoms are assumed to be connected by virtual bonds. Calculation of the structural parameters. For all CG ssDNA structures, we calculate the virtual bond lengths between two neighbouring pseudo-atoms (including P-S, S-P, and S-B(T)), the bond angles between two adjacent virtual bonds (including P-S-P, S-P-S, P-S-B(T), and B(T)-S-P), and the dihedral angles formed by three successive bonds (including P-S-P-S, S-P-S-P) in the backbone. Based on the calculation results, the normalized probability distribution for the corresponding structural parameters then can be statistically analyzed (see Fig. 1).

CRediT authorship contribution statement

Jun-Lin Qian: Resources, Investigation, Visualization, Writing – original draft. Li-Zhen Sun: Conceptualization, Methodology, Software, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
SubjectBiophysics
Specific subject areaThe coarse-grained model for simulating the single-stranded DNA dynamics
Type of dataTable and figure
How the data were acquiredThe normalized probabilities of the structural parameters are statistically obtained by coarse-graining the experimentally detected ssDNA structures [1], [2], [3]. A software UCSF Chimera (version 1.11.2) [4] is used to delete the unnecessary atoms in the all-atom structures. A software X3DNA (version 2.4) [5] is used to label the unpaired nucleotides.
Data formatRaw and analysed
Description of data collectionThe normalized probabilities are calculated as following:(1) For each nucleotide, we calculate the centers of mass of the phosphate (P), sugar (S), and base thymine (B(T)).(2) For each ssDNA chain, the structural parameters then can be calculated, including• virtual bond lengths Pi-Si, Si-Bi(T), and Si-Pi+1;• virtual bond angles Pi-Si-Pi+1, Si-Pi+1-Si+1, Pi-Si-Bi(T), and Bi(T)-Si-Pi+1;• and dihedral angles Pi-Si-Pi+1-Si+1 and Si-1-Pi-Si-Pi+1;where the subscript represents the nucleotide index.and (3) the normalized probabilities of the structural parameters can be statistically analyzed from the corresponding parameters obtained in the step above.
Data source locationAll selected 3D ssDNA structures are downloaded from the website of the protein data bank [6] (access to https://www.rcsb.org/).
Data accessibilityRepository name: Mendeley DataData identification number: doi:10.17632/nbd83424kc.2Direct link to the dataset: https://data.mendeley.com/datasets/nbd83424kc/2
Related research articleL.Z Sun, J.L. Qian, P. Cai, H.X. Hu, X. Xu, M.B. Luo, Mg2+ effects on the single-stranded DNA conformations and nanopore translocation dynamics, Polymer, 250 (2022) 124895
  7 in total

1.  The Protein Data Bank.

Authors:  H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures.

Authors:  Xiang-Jun Lu; Wilma K Olson
Journal:  Nucleic Acids Res       Date:  2003-09-01       Impact factor: 16.971

3.  UCSF Chimera--a visualization system for exploratory research and analysis.

Authors:  Eric F Pettersen; Thomas D Goddard; Conrad C Huang; Gregory S Couch; Daniel M Greenblatt; Elaine C Meng; Thomas E Ferrin
Journal:  J Comput Chem       Date:  2004-10       Impact factor: 3.376

4.  An experimentally-informed coarse-grained 3-Site-Per-Nucleotide model of DNA: structure, thermodynamics, and dynamics of hybridization.

Authors:  Daniel M Hinckley; Gordon S Freeman; Jonathan K Whitmer; Juan J de Pablo
Journal:  J Chem Phys       Date:  2013-10-14       Impact factor: 3.488

5.  Introducing improved structural properties and salt dependence into a coarse-grained model of DNA.

Authors:  Benedict E K Snodin; Ferdinando Randisi; Majid Mosayebi; Petr Šulc; John S Schreck; Flavio Romano; Thomas E Ouldridge; Roman Tsukanov; Eyal Nir; Ard A Louis; Jonathan P K Doye
Journal:  J Chem Phys       Date:  2015-06-21       Impact factor: 3.488

6.  Sequence-Dependent Three Interaction Site Model for Single- and Double-Stranded DNA.

Authors:  Debayan Chakraborty; Naoto Hori; D Thirumalai
Journal:  J Chem Theory Comput       Date:  2018-06-26       Impact factor: 6.006

7.  Polyelectrolyte properties of single stranded DNA measured using SAXS and single-molecule FRET: Beyond the wormlike chain model.

Authors:  Steve P Meisburger; Julie L Sutton; Huimin Chen; Suzette A Pabit; Serdal Kirmizialtin; Ron Elber; Lois Pollack
Journal:  Biopolymers       Date:  2013-12       Impact factor: 2.505

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.