Gabor Nagy, Chris Oostenbrink.
Abstract
In an accompanying paper (Nagy, G.; Oostenbrink, C. Dihedral-based segment identification and classification of biopolymers I: Proteins. J. Chem. Inf. Model. 2013, DOI: 10.1021/ci400541d), we introduce a new algorithm for structure classification of biopolymeric structures based on main-chain dihedral angles. The DISICL algorithm (short for DIhedral-based Segment Identification and CLassification) classifies segments of structures containing two central residues. Here, we introduce the DISICL library for polynucleotides, which is based on the dihedral angles ε, ζ, and χ for the two central residues of a three-nucleotide segment of a single strand. Seventeen distinct structural classes are defined for nucleotide structures, some of which, to our knowledge, were not described previously in other structure classification algorithms. In particular, DISICL also classifies noncanonical single-stranded structural elements. DISICL is applied to databases of DNA and RNA structures containing 80,000 and 180,000 segments, respectively. The classifications according to DISICL are compared to those of another popular classification scheme in terms of the number of classified nucleotides, average occurrence and length of structural elements, and pairwise matches of the classifications. While the detailed classification of DISICL adds sensitivity to a structure analysis, it can be readily reduced to eight simplified classes providing a more general overview of the secondary structure in polynucleotides.
Year: 2014 PMID: 24364355 PMCID: PMC3904765 DOI: 10.1021/ci400542n
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Summary of Analyzed Polynucleotide Data Sets, Classification Efficiency of DISICL and X3DNA Algorithms, and Agreement between These Algorithms
| database | DNA | RNA |
|---|---|---|
| file number | 1,871 | 900 |
| model number | 8,044 | 5,109 |
| total data set | 94,080 | 187,602 |
| ave. multiplicity | 5.1 | 5.7 |
| ave. model length | 14.4 | 32.0 |
| base pairing (%) | 52.9 | 54.9 |
Figure 1. Representation of region definitions used for polynucleotide classification (on the left) based on subsequent (ε, ζ, χ) values within a trinucleotide segment (on the right). Colored rectangles show the boundaries of regions marked with Greek letters. Atoms and bonds that define ε1, ζ2, and χ2 are marked in red.
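The angles ε, ζ, and χ are ordinary torsion (dihedral) angles, each defined by four consecutive atoms. The sketch below shows how such an angle can be computed from four Cartesian positions using only the standard library. It is an illustration, not the DISICL implementation: the function name `dihedral` and the test coordinates are hypothetical. The atom quadruplets in the comments follow the usual IUPAC nucleic acid nomenclature, and it is an assumption that DISICL uses exactly these atoms.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], for four 3D points.

    Under the usual nomenclature (assumed here, not taken from the source):
      epsilon: C4'-C3'-O3'-P(i+1)
      zeta:    C3'-O3'-P(i+1)-O5'(i+1)
      chi:     O4'-C1'-N9-C4 (purines) or O4'-C1'-N1-C2 (pyrimidines)
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)   # first bond vector
    b1 = sub(p2, p1)   # central bond (rotation axis)
    b2 = sub(p3, p2)   # last bond vector
    n1 = cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    b1_len = math.sqrt(dot(b1, b1))
    b1_hat = tuple(c / b1_len for c in b1)
    x = dot(n1, n2)                    # proportional to cos(angle)
    y = dot(b1_hat, cross(n1, n2))     # proportional to sin(angle)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a 90 degree twist around the z axis.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Each residue then yields one (ε, ζ, χ) triple, and the region a residue falls into is decided by comparing that triple against the rectangular boundaries shown in the figure, which are not reproduced in this text.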
Detailed DISICL Classes for Polynucleotide Classification, and Their Abbreviations (code), Occurrence (occ.), and Average Structure Element Lengths in the DNA and RNA Data Sets (top)ᵃ
| class | code | DNA occ. (%) | DNA length | RNA occ. (%) | RNA length |
|---|---|---|---|---|---|
| BI-helix | BI | 35.6 | 3.3 | 0.4 | 2.2 |
| BII-helix | BII | 3.4 | 2.3 | 0.1 | 2.0 |
| BIII-helix | BIII | 2.4 | 2.4 | 0.1 | 2.1 |
| B-loop | BL | 15.9 | 2.6 | 0.8 | 2.1 |
| A-helix | AH | 2.2 | 2.9 | 51.3 | 4.6 |
| A-loop | AL | 0.4 | 2.1 | 7.1 | 2.1 |
| Z-helix | ZH | 1.0 | 2.5 | 0.4 | 2.1 |
| quad loop | QL | 3.6 | 2.3 | 0.4 | 2.1 |
| sharp turns | ST | 3.1 | 2.2 | 2.1 | 2.1 |
| tetraloop B | TL | 1.6 | 2.0 | 8.6 | 2.1 |
| AB trans. | AB | 11.1 | 2.2 | 6.6 | 2.3 |
| AB2 trans. | AB2 | 1.4 | 2.1 | 3.2 | 2.1 |
| AZ trans. | AZ | 0.3 | 2.1 | 1.0 | 2.1 |
| ZB trans. | ZB | 0.9 | 2.0 | 0.4 | 2.0 |
| AD trans. | AD | 0.3 | 2.1 | 1.0 | 2.3 |
| BD trans. | BD | 2.3 | 2.3 | 0.4 | 2.1 |
| ZD trans. | ZD | 0.6 | 2.2 | 0.2 | 2.1 |
| unclassified | UC | 14.0 | 3.1 | 16.0 | 3.3 |
The same data is given for the X3DNA classes at the bottom of the table.
Simplified DISICL Classes for Polynucleotide Classification and Detailed Classes of Which They Are Formed, Occurrence (occ.), and Average Structure Element Lengths in DNA and RNA Data Sets
| simplified class | detailed classes (codes) | DNA occ. (%) | DNA length | RNA occ. (%) | RNA length |
|---|---|---|---|---|---|
| B-helix | BI | 35.6 | 3.3 | 0.4 | 2.2 |
| irregular B | BII, BIII, BL | 21.7 | 3.0 | 1.0 | 2.2 |
| A-helix | AH | 2.2 | 2.9 | 51.3 | 4.6 |
| irregular A | AL, TL | 2.1 | 2.1 | 16.0 | 2.8 |
| Z-helix | ZH | 1.0 | 2.5 | 0.4 | 2.1 |
| quad loop | QL | 3.6 | 2.3 | 0.4 | 2.1 |
| AB transition | AB | 11.1 | 2.5 | 6.6 | 2.3 |
| transitory | AB2, ST, AZ, ZB, ZD, AD, BD | 8.8 | 2.2 | 8.0 | 2.3 |
| unclassified | unclassified | 14.0 | 3.1 | 16.0 | 3.3 |
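The reduction from detailed to simplified classes is a plain many-to-one mapping, which can be sketched as a dictionary lookup. The code below is a minimal illustration, not part of DISICL: the names `SIMPLIFIED`, `simplify`, and `pooled_occurrence` are hypothetical, the ZB transition is written with its code from the detailed class table, and the occurrence values are the DNA percentages from that table.

```python
# Simplified class -> detailed class codes, as listed in the table above.
SIMPLIFIED = {
    "B-helix": ["BI"],
    "irregular B": ["BII", "BIII", "BL"],
    "A-helix": ["AH"],
    "irregular A": ["AL", "TL"],
    "Z-helix": ["ZH"],
    "quad loop": ["QL"],
    "AB transition": ["AB"],
    "transitory": ["AB2", "ST", "AZ", "ZB", "ZD", "AD", "BD"],
    "unclassified": ["UC"],
}

# Inverted mapping: detailed code -> simplified class name.
DETAILED_TO_SIMPLE = {
    code: name for name, codes in SIMPLIFIED.items() for code in codes
}

# DNA occurrence (%) of the detailed classes, copied from the detailed table.
DNA_OCC = {
    "BI": 35.6, "BII": 3.4, "BIII": 2.4, "BL": 15.9, "AH": 2.2, "AL": 0.4,
    "ZH": 1.0, "QL": 3.6, "ST": 3.1, "TL": 1.6, "AB": 11.1, "AB2": 1.4,
    "AZ": 0.3, "ZB": 0.9, "AD": 0.3, "BD": 2.3, "ZD": 0.6, "UC": 14.0,
}

def simplify(codes):
    """Map a sequence of detailed class codes to simplified class names."""
    return [DETAILED_TO_SIMPLE.get(code, "unclassified") for code in codes]

def pooled_occurrence(occurrence):
    """Sum detailed-class occurrences within each simplified class."""
    pooled = {name: 0.0 for name in SIMPLIFIED}
    for code, value in occurrence.items():
        pooled[DETAILED_TO_SIMPLE[code]] += value
    return pooled

dna_simple = pooled_occurrence(DNA_OCC)
```

For DNA, pooling BII, BIII, and BL this way reproduces the 21.7% listed for irregular B. Other pooled values can differ from the table in the last digit, because the detailed percentages are themselves rounded.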
Definitions for DISICL Polynucleotide Classificationᵃ
| class | code | segment definition |
|---|---|---|
| BI-helix | BI | β1.β1, β1.ab1 |
| BII-helix | BII | β1.β2, β2.β1 |
| BIII-helix | BIII | β3.β3 |
| B-loop | BL | β1.β3, β3.β1, β2.β2, β3.β2, β2.β3, β3.αβ1, ab1.β3, β2.ab1, ab1.β2 |
| A-helix | AH | α1.α1, α1.ab1 |
| A-loop | AL | ab1.α3, α3.ab1, α3.α3, α1.α3, α1.α2 |
| Z-helix | ZH | ζ1.ζ2, ζ2.ζ1, ζ2.ζ3, ζ3.ζ2 |
| quad loop | QL | ζ1.ζ1, ζ1.ζ3, ζ3.ζ1, δ1.δ1, δ3.δ3, δ1.δ3, δ3.δ1, δ3.δ2, δ2.δ3, ab1.ζ1, ζ1.ab1, ζ1.β1, β1.ζ1, δ3.β1, ζ1.β3, β3.ζ1, β1.ζ2, ζ1.δ1, ζ1.α3, ζ1.β2 |
| sharp turns | ST | ζ2.ζ2, α3.β3, δ2.δ2, δ2.δ1, ζ2.α3, α3.β1, δ2.β2, ζ2.ab1, ζ2.β3, ζ2.α1, ab2.β2, ζ2.α2, ζ2.β1, ζ2.β2 |
| tetraloop B | TL | α2.β2, δ1.δ2, α3.α1, α2.α2, α2.α1, α2.β3, α2.α3, α3.α2, α3.ζ2, α3.β2, α1.ζ2, ab1.ζ2, δ1.ζ1, α2.ab2, α2.ζ2 |
| AB trans. | AB | ab1.ab1, ab1.α1, ab1.β1, α1.β1, α1.β3, β1.α1, α1.ab2, β1.ab2 |
| AB2 trans. | AB2 | β2.α3, β3.α1, β3.α3, ab2.ab2, β3.α2, α1.β2, β2.α1, ab2.α1, ab2.α3, ab2.β1, ab2.δ1, δ1.ab2 |
| AZ trans. | AZ | α1.ζ1, α1.ζ3, ζ1.α1, ζ3.α1, α2.ζ1, α2.ζ2, α2.ζ3, ζ1.α2, ζ3.α2, α3.ζ1, α3.ζ3, ζ3.α3 |
| ZD trans. | ZD | ζ1.δ2, ζ1.δ3, δ2.ζ1, δ3.ζ1, ζ2.δ1, ζ2.δ2, ζ2.δ3, δ1.ζ2, δ2.ζ2, δ3.ζ2, ζ3.δ1, ζ3.δ2, ζ3.δ3, δ1.ζ3, δ2.ζ3, δ3.ζ3 |
| ZB trans. | ZB | β1.ζ3, β2.ζ1, β2.ζ2, β2.ζ3, β3.ζ2, β3.ζ3, ζ3.β1, ζ3.β2, ζ3.β3 |
| BD trans. | BD | β1.δ1, β1.δ2, β1.δ3, δ1.β1, δ2.β1, β2.δ1, β2.δ3, δ1.β2, δ3.β2, β3.δ1, β3.δ2, β3.δ3, δ1.β3, δ2.β3, δ3.β3, δ1.αβ1 |
| AD trans. | AD | α1.δ1, α1.δ2, α1.δ3, α3.δ3, δ1.α1, δ2.α1, δ3.α1, δ3.α3 |
Segments are assigned to a class if their central residues fall into regions separated by a dot in the segment definitions.
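The assignment rule in the footnote amounts to a lookup keyed on the ordered pair of regions occupied by the two central residues. The sketch below illustrates this with a small excerpt of the table (the BI, BII, BIII, AH, and ZH rows, with region labels spelled as they appear there). The names `DEFINITIONS`, `build_lookup`, and `classify` are hypothetical, and the step that turns (ε, ζ, χ) values into region labels is left out because the region boundaries are not given in this text.

```python
# Excerpt of the segment definitions, copied from the table above.
DEFINITIONS = {
    "BI": "β1.β1, β1.ab1",
    "BII": "β1.β2, β2.β1",
    "BIII": "β3.β3",
    "AH": "α1.α1, α1.ab1",
    "ZH": "ζ1.ζ2, ζ2.ζ1, ζ2.ζ3, ζ3.ζ2",
}

def build_lookup(definitions):
    """Turn 'r1.r2, r3.r4' strings into a {(r1, r2): class_code} dict."""
    lookup = {}
    for code, text in definitions.items():
        for item in text.split(","):
            item = item.strip()
            if not item:
                continue  # tolerate a stray trailing comma
            first, second = item.split(".")
            # If a pair is listed under more than one class, the first
            # listed class wins (an arbitrary choice for this sketch).
            lookup.setdefault((first, second), code)
    return lookup

LOOKUP = build_lookup(DEFINITIONS)

def classify(region_i, region_j):
    """Class code for a segment whose central residues lie in the given
    regions; 'UC' (unclassified) if the pair has no definition."""
    return LOOKUP.get((region_i, region_j), "UC")

codes = [classify("β1", "β1"), classify("ζ2", "ζ3"), classify("δ1", "α2")]
```

The pair is ordered, so `("β1", "β2")` and `("β2", "β1")` are separate keys even though both happen to map to BII here. With the full table loaded, pairs absent from every row fall through to UC in the same way.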
Figure 2. Schematic representation of the calculation of groove dimensions in double-stranded DNA helices. Groove dimensions are calculated as distances of phosphorus atoms in the indicated nucleotides. See the corresponding part of the Methods section for further information.
Average Groove Dimensions for Various DNA Double Helices Observed in the DNA Data Setᵃ
| structure | occurrence | MGW mean | MGW rmsf | MGD mean | MGD rmsf | mgW mean | mgW rmsf | mgD mean | mgD rmsf |
|---|---|---|---|---|---|---|---|---|---|
| BI-helix/BI-helix | 2511 | 17.5 | 2.8 | 9.4 | 1.2 | 12.9 | 2.4 | 8.3 | 1.1 |
| BI-helix/BII-helix | 185 | 19.1 | 2.9 | 8.9 | 2.0 | 13.4 | 2.6 | 8.2 | 1.0 |
| BI-helix/BIII-helix | 144 | 18.2 | 3.1 | 9.5 | 1.5 | 12.0 | 2.8 | 8.3 | 1.0 |
| BI-helix/B-loop | 1217 | 18.5 | 3.1 | 9.3 | 1.7 | 13.4 | 3.0 | 7.9 | 1.8 |
| BI-helix/A-helix | 19 | 20.4 | 3.7 | 10.5 | 1.5 | 12.5 | 2.3 | 8.6 | 1.5 |
| BI-helix/Z-helix | 5 | 21.3 | 0.6 | 3.0 | 0.5 | 13.4 | 0.1 | 8.5 | 0.4 |
| BI-helix/AB | 938 | 18.1 | 3.1 | 9.6 | 1.5 | 13.0 | 2.4 | 8.2 | 1.1 |
| BII-helix/BII-helix | 85 | 21.0 | 3.5 | 8.7 | 1.1 | 13.3 | 3.1 | 8.6 | 1.1 |
| BII-helix/BIII-helix | 26 | 17.7 | 3.7 | 8.8 | 1.1 | 12.4 | 3.2 | 8.9 | 1.2 |
| BII-helix/B-loop | 131 | 18.9 | 3.0 | 9.2 | 1.5 | 11.9 | 2.6 | 8.4 | 1.1 |
| BII-helix/A-helix | 3 | 16.3 | 2.4 | 7.7 | 0.7 | 15.6 | 4.7 | 7.9 | 2.4 |
| BII-helix/AB | 132 | 18.4 | 2.0 | 8.8 | 1.5 | 12.0 | 2.6 | 8.5 | 1.1 |
| BIII-helix/BIII-helix | 42 | 20.5 | 3.1 | 8.6 | 1.1 | 11.3 | 3.0 | 8.7 | 0.5 |
| BIII-helix/B-loop | 213 | 18.7 | 2.9 | 9.1 | 1.2 | 12.1 | 2.7 | 8.5 | 0.8 |
| BIII-helix/A-helix | 3 | 18.7 | 2.8 | 9.5 | 1.0 | 13.6 | 0.3 | 7.2 | 0.2 |
| BIII-helix/AB | 47 | 19.9 | 4.5 | 9.0 | 2.0 | 11.8 | 2.6 | 8.4 | 1.1 |
| B-loop/B-loop | 617 | 19.3 | 3.1 | 9.1 | 1.4 | 12.3 | 2.4 | 8.5 | 1.2 |
| B-loop/A-helix | 17 | 23.1 | 5.7 | 10.3 | 2.1 | 12.7 | 1.7 | 8.3 | 2.0 |
| B-loop/AB | 429 | 20.1 | 4.0 | 9.0 | 1.7 | 12.5 | 2.4 | 8.1 | 1.3 |
| A-helix/A-helix | 147 | 15.2 | 2.4 | 10.0 | 0.5 | 17.2 | 1.0 | 6.0 | 0.8 |
| A-helix/AB | 43 | 19.6 | 4.8 | 10.5 | 1.2 | 13.8 | 2.9 | 7.5 | 1.6 |
| Z-helix/Z-helix | 1 | 21.0 | 0.0 | 5.9 | 0.0 | 13.3 | 0.0 | 5.9 | 0.0 |
| AB/AB | 274 | 18.8 | 3.7 | 9.8 | 1.3 | 12.5 | 2.1 | 8.2 | 1.1 |
| overall average | 7229 | 18.3 | 3.2 | 9.4 | 1.5 | 12.9 | 2.6 | 8.2 | 1.3 |
Helices are sorted based on the assigned DISICL classification for the central segment of the helix turn on both strands. Groove dimensions are given as averages (mean) and root-mean-square fluctuation (rmsf) in Å. MGW: major groove width. MGD: major groove depth. mgW: minor groove width. mgD: minor groove depth.
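The mean and rmsf columns can be read as the average of a set of phosphorus-phosphorus distances and the root-mean-square deviation of those distances from that average. The sketch below assumes the population form of the formula (division by n, not n - 1), which is consistent with the Z-helix/Z-helix row, where a single observation gives an rmsf of 0.0. The function names and the sample coordinates are hypothetical, and which phosphorus atoms are paired for each groove dimension is defined in the Methods section, not here.

```python
import math

def p_p_distance(p_a, p_b):
    """Euclidean distance (in the units of the input, e.g. Å) between
    two phosphorus atom positions given as (x, y, z) tuples."""
    return math.dist(p_a, p_b)

def mean_and_rmsf(values):
    """Mean and root-mean-square fluctuation about the mean.

    Assumes the population form: sqrt(sum((v - mean)^2) / n).
    """
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")
    mean = sum(values) / n
    rmsf = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, rmsf

# Hypothetical groove widths in Å, for illustration only.
widths = [17.0, 18.0, 19.0]
mean_width, rmsf_width = mean_and_rmsf(widths)

# A single observation has no spread, so its rmsf is zero.
single_mean, single_rmsf = mean_and_rmsf([21.0])
```

Grouping the helix turns by the DISICL class pair of their central segments and applying `mean_and_rmsf` per group would give one row of such a table per class pair.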
Figure 3. Examples of DNA structures and structure classification by DISICL. For each model, the PDB identification code is given followed by the abbreviation of classes according to Table 3, which are color coded to match the structures they mark.
Figure 4. Examples of RNA structures and structure classification by DISICL. For each model, the PDB identification code is given followed by the abbreviation of classes according to Table 3, which are color coded to match the structures they mark.
Scaled Match Scores for Comparison of Secondary Structure Classifications by DISICL (simple) and X3DNA on the Combined DNA and RNA Data Setᵃ
| DISICL class | DISICL occ. (%) | X3DNA A-helix | X3DNA B-helix | X3DNA TA trans. | X3DNA unclassified |
|---|---|---|---|---|---|
| X3DNA occ. (%) | | 37.5 | 12.19 | 0.9 | 49.3 |
| B-helix | 10.2 | 0.1 | 47.0 | 7.9 | 46.5 |
| irregular B | 6.8 | 1.3 | 38.7 | 2.8 | 48.0 |
| A-helix | 37.6 | 66.3 | 0.3 | 21.7 | 27.5 |
| irregular A | 12.1 | 29.5 | 0.2 | 7.3 | 44.1 |
| Z-helix | 0.6 | 0.8 | 0.0 | 0.0 | 43.9 |
| Quad loop | 1.3 | 1.6 | 3.8 | 0.1 | 74.2 |
| AB transition | 8.2 | 11.1 | 4.2 | 1.8 | 44.4 |
| transitory | 7.9 | 20.5 | 13.6 | 33.7 | 49.0 |
| unclassified | 15.4 | 11.8 | 4.2 | 9.8 | 42.6 |
For both algorithms, the occurrence of each class is displayed in the first row or column, respectively.
Figure 5. Examples of DNA/RNA–protein complexes classified by DISICL and X3DNA. For each model, the PDB identification code is given, followed by the method of classification and the abbreviation of structural classes according to Table 3. Abbreviations are color coded to match the structures they mark.