Dominique Mias-Lucquin, Isaure Chauvot de Beauchene.
Abstract
We explored the Protein Data Bank (PDB) to collect protein-ssDNA structures and create a multi-conformational docking benchmark that includes both bound and unbound protein structures. Because ssDNA is highly flexible when unbound, the benchmark contains no unbound ssDNA structure. For the 91 sequence-identity groups identified as bound-unbound structures of the same protein, we studied the conformational changes induced in the protein by ssDNA binding. Moreover, using the several bound or unbound protein structures available in some groups, we also assessed the intrinsic conformational variability under either bound or unbound conditions and compared it with the putatively binding-induced modifications. To illustrate a use case of this benchmark, we performed docking experiments using the ATTRACT docking software. To our knowledge, this benchmark is the first to survey the available structures of ssDNA-protein interactions this extensively, and it aims to improve computational docking tools dedicated to this kind of molecular interaction.
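The abstract compares intrinsic variability (bound vs. bound, unbound vs. unbound) with bound-unbound differences within a sequence-identity group. Below is a minimal sketch of that comparison. It assumes pairwise RMSDs between superposed chains have already been computed. The function name, input layout, chain IDs, and RMSD values are all hypothetical, and the excerpt does not describe the authors' exact protocol.

```python
from itertools import combinations
from statistics import mean


def compare_variability(rmsd, states):
    """Average the pairwise RMSDs of one sequence-identity group by state pair.

    rmsd:   dict mapping frozenset({chain_a, chain_b}) to an RMSD value,
            assumed to be computed after structural superposition
    states: dict mapping each chain ID to "bound" or "unbound"
    Returns the mean RMSD of bound/bound, unbound/unbound and bound/unbound
    pairs, or None for a category that has no pair.
    """
    groups = {"bound-bound": [], "unbound-unbound": [], "bound-unbound": []}
    for a, b in combinations(sorted(states), 2):
        # Sorting the two state labels makes mixed pairs always "bound-unbound".
        first, second = sorted((states[a], states[b]))
        groups[f"{first}-{second}"].append(rmsd[frozenset((a, b))])
    return {kind: (mean(values) if values else None) for kind, values in groups.items()}


# Hypothetical chain IDs and RMSD values, for illustration only.
states = {"b1": "bound", "u1": "unbound", "u2": "unbound"}
rmsd = {
    frozenset(("b1", "u1")): 1.8,
    frozenset(("b1", "u2")): 2.2,
    frozenset(("u1", "u2")): 0.6,
}
print(compare_variability(rmsd, states))
```

Suppose the mean bound-unbound RMSD is not clearly larger than the within-state means. In that case, the apparent binding-induced change may fall within the protein's intrinsic variability.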
Keywords: benchmark; molecular docking analysis; single-stranded DNA; single-stranded DNA-binding protein
Year: 2021 PMID: 34617336 PMCID: PMC9292434 DOI: 10.1002/prot.26258
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
FIGURE 1. Benchmark building summary
ATTRACT docking results for cluster #4
| Unbound | | 1smy_c | 1smy_c | 1smy_c | 1smy_c | 5tmf_c | 5tmf_c | 5tmf_c | 5tmf_c |
|---|---|---|---|---|---|---|---|---|---|
| Bound | | 4oip_h | 4oiq_h | 4g7h_r | 4q4z_h | 4oip_h | 4oiq_h | 4g7h_r | 4q4z_h |
| GAG (678) | irmsd | 8.195 | 8.333 | 8.176 | 8.164 | 7.673 | 7.722 | 7.764 | 7.756 |
| | lrmsd | 20.775 | 21.011 | 20.950 | 20.934 | 19.774 | 20.041 | 20.035 | 19.962 |
| | fnat | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| AGC (497) | irmsd | 4.408 | 4.633 | 4.600 | 4.601 | 10.436 | 10.741 | 10.654 | 10.542 |
| | lrmsd | 12.023 | 12.080 | 12.230 | 12.120 | 28.732 | 28.898 | 29.216 | 29.028 |
| | fnat | 0.36 | 0.33 | 0.31 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 |
| GCT (490) | irmsd | 4.002 | 4.007 | 3.999 | 4.017 | 4.199 | 4.323 | 4.389 | 4.333 |
| | lrmsd | 11.688 | 11.679 | 11.639 | 11.696 | 11.380 | 11.547 | 11.539 | 11.527 |
| | fnat | 0.25 | 0.19 | 0.17 | 0.17 | 0.27 | 0.33 | 0.30 | 0.25 |
Note: The number between brackets indicates the number of conformations used for the ensemble docking by ATTRACT. The number between parentheses is the size of the tri‐nucleotide library for the corresponding fragment.
Abbreviations: fnat, fraction of native contacts; irmsd, interface RMSD; lrmsd, ligand RMSD.
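The three abbreviated metrics are standard docking-assessment measures. Below is a minimal sketch of them, assuming common conventions:

- A native contact is a receptor-ligand residue pair with any two atoms within 5 Å.
- lrmsd and irmsd are plain RMSDs computed after the structures have been superposed on the receptor or on the interface, respectively.

The 5 Å cutoff, the dictionary layout, and the function names are assumptions for illustration. The excerpt does not give the exact settings used with ATTRACT, and superposition itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain RMSD between two equally ordered coordinate lists (no fitting).

    Assumes the structures are already superposed (on the receptor for
    lrmsd, on the interface for irmsd).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def contacts(receptor, ligand, cutoff=5.0):
    """Residue pairs (receptor_id, ligand_id) with any atom pair within cutoff.

    receptor, ligand: dict mapping a residue ID to a list of (x, y, z) atoms.
    The 5.0 default cutoff is an assumed convention.
    """
    return {
        (r_id, l_id)
        for r_id, r_atoms in receptor.items()
        for l_id, l_atoms in ligand.items()
        if any(math.dist(a, b) <= cutoff for a in r_atoms for b in l_atoms)
    }


def fnat(native_contacts, model_contacts):
    """Fraction of the native contacts that are reproduced in the model."""
    if not native_contacts:
        return 0.0
    return len(native_contacts & model_contacts) / len(native_contacts)


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

With these definitions, an fnat of 0.00 (as in most cells of the tables) means that none of the native residue contacts is reproduced by the docked pose.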
ATTRACT docking results for cluster #8.1
| Unbound | | 3wod_f | 3wod_f | 3wod_f | 3wod_f | 5xj0_f | 5xj0_f | 5xj0_f | 5xj0_f |
|---|---|---|---|---|---|---|---|---|---|
| Bound | | 4g7z_h | 4oir_h | 4q4z_h | 4oio_h | 4g7z_h | 4oir_h | 4q4z_h | 4oio_h |
| AAT (497) | irmsd | 9.814 | 9.816 | 9.758 | 9.800 | 9.868 | 9.707 | 9.727 | 9.809 |
| | lrmsd | 30.346 | 30.315 | 30.256 | 30.328 | 27.727 | 27.702 | 27.637 | 27.712 |
| | fnat | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| ATG (590) | irmsd | 11.584 | 11.575 | 11.512 | 11.523 | 12.128 | 12.159 | 12.120 | 12.213 |
| | lrmsd | 33.453 | 33.474 | 33.461 | 33.662 | 31.797 | 31.802 | 31.788 | 32.010 |
| | fnat | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TGG (612) | irmsd | 12.987 | 12.958 | 12.891 | 12.910 | 14.018 | 14.028 | 14.038 | 13.898 |
| | lrmsd | 34.938 | 34.933 | 34.955 | 35.139 | 34.692 | 34.719 | 34.757 | 34.866 |
| | fnat | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Note: The number between brackets indicates the number of conformations used for the ensemble docking by ATTRACT. The number between parentheses is the size of the tri‐nucleotide library for the corresponding fragment.
Abbreviations: fnat, fraction of native contacts; irmsd, interface RMSD; lrmsd, ligand RMSD.
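The fragment names in both tables are consistent with fragment-based docking of overlapping trinucleotides:

- GAG, AGC, and GCT tile the sequence GAGCT.
- AAT, ATG, and TGG tile the sequence AATGG.

Both parent sequences are inferred here from the fragment names; the tables do not state them. Below is a minimal sketch of that decomposition, using a hypothetical function name:

```python
def trinucleotide_fragments(seq, size=3):
    """Overlapping fragments of `size` nucleotides, read 5' to 3'.

    A sequence of length n yields n - size + 1 fragments; shorter sequences
    yield none.
    """
    if len(seq) < size:
        return []
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


print(trinucleotide_fragments("GAGCT"))  # ['GAG', 'AGC', 'GCT']
print(trinucleotide_fragments("AATGG"))  # ['AAT', 'ATG', 'TGG']
```

Each such fragment is then docked with its own conformational library, whose size is the number given in parentheses in the tables.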
FIGURE 2. (A) Ratio between the number of bound and unbound chains in each cluster; blue lines: #bound = #unbound ± 20%. (B) Mean‐RMSD for the clusters composed of at least three chains (one bound and two unbound or vice versa); blue line: median mean‐RMSD; blue dots (resp. orange): mean‐RMSD lower (resp. higher) than the median mean‐RMSD.
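The band in panel A and the colour coding in panel B amount to two simple rules. The sketch below makes two assumptions:

- "#bound = #unbound ± 20%" means the bound-chain count lies within 20% of the unbound-chain count.
- Each cluster's mean-RMSD has already been computed; the caption does not define it.

The function names and the input layout are hypothetical.

```python
from statistics import median


def is_balanced(n_bound, n_unbound, tolerance=0.20):
    """True if the bound-chain count lies within +/- tolerance of the unbound count."""
    return abs(n_bound - n_unbound) <= tolerance * n_unbound


def classify_by_mean_rmsd(clusters):
    """Place eligible clusters below, at, or above the median mean-RMSD.

    clusters: dict mapping cluster ID -> (n_bound, n_unbound, mean_rmsd)
    Eligible clusters have at least three chains with both states present
    (at minimum one bound and two unbound, or vice versa).
    """
    eligible = {
        cid: mean_rmsd
        for cid, (n_bound, n_unbound, mean_rmsd) in clusters.items()
        if n_bound >= 1 and n_unbound >= 1 and n_bound + n_unbound >= 3
    }
    if not eligible:
        return {}
    cutoff = median(eligible.values())
    labels = {}
    for cid, value in eligible.items():
        if value < cutoff:
            labels[cid] = "below median"  # blue dots in panel B
        elif value > cutoff:
            labels[cid] = "above median"  # orange dots in panel B
        else:
            labels[cid] = "at median"
    return labels
```

Clusters with a single bound and a single unbound chain are excluded before the median is taken. They therefore do not shift the blue median line.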
Occurrences of ssDNA homopolymer of each composition and length in the nonredundant dataset
| Sequence | Count |
|---|---|
| AAAA | 1 |
| CCCC | 2 |
| CCCCCC | 1 |
| CCCCCCCC | 2 |
| TTTT | 16 |
| TTTTT | 10 |
| TTTTTT | 10 |
| TTTTTTT | 2 |
| TTTTTTTT | 1 |
| TTTTTTTTT | 5 |
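The table tallies homopolymers, that is, sequences made of a single repeated nucleotide. Below is a minimal sketch of such a tally, assuming the nonredundant dataset is available as a plain list of sequence strings; the function names are hypothetical. The second part simply sums the published counts. It shows that 44 of the 50 homopolymer occurrences listed are poly-T.

```python
from collections import Counter


def is_homopolymer(seq):
    """True for a non-empty sequence made of a single repeated nucleotide."""
    return len(seq) > 0 and len(set(seq.upper())) == 1


def homopolymer_counts(sequences):
    """Tally homopolymeric ssDNA sequences by exact composition and length."""
    return Counter(s.upper() for s in sequences if is_homopolymer(s))


# Counts copied from the table above.
published = {
    "AAAA": 1, "CCCC": 2, "CCCCCC": 1, "CCCCCCCC": 2,
    "TTTT": 16, "TTTTT": 10, "TTTTTT": 10, "TTTTTTT": 2,
    "TTTTTTTT": 1, "TTTTTTTTT": 5,
}
total = sum(published.values())
poly_t = sum(n for seq, n in published.items() if set(seq) == {"T"})
print(total, poly_t)  # 50 44
```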