Dmitrii M Nikolaev, Andrey A Shtyrov, Maxim S Panov, Adeel Jamal, Oleg B Chakchir, Vladimir A Kochemirovsky, Massimo Olivucci, Mikhail N Ryazantsev.
Abstract
Rhodopsins are seven-α-helical membrane proteins of great importance in chemistry, biology, and modern biotechnology. Any in silico study of rhodopsin properties and function requires a high-quality three-dimensional structure. Because membrane protein structures are particularly difficult to obtain experimentally, in silico prediction of the three-dimensional rhodopsin structure from its primary sequence alone is an especially important task. Over the last few years, significant progress has been made in protein structure prediction, especially for methods based on comparative modeling. However, most of this progress concerns soluble proteins, and further investigation is needed to achieve similar progress for membrane proteins. In this paper, we evaluate the ability of modern protein structure prediction methodologies (implemented in the Medeller, I-TASSER, and Rosetta packages) to predict rhodopsin structures. Three widely used methodologies were considered: two general methodologies that are commonly applied to soluble proteins and one methodology that uses constraints specific to membrane proteins. The test pool consisted of 36 target-template pairs with different sequence similarities, constructed from 24 experimental rhodopsin structures taken from the RCSB database. We show that all three methodologies yield rhodopsin structures of close to crystallographic quality (root-mean-square deviation (RMSD) of the predicted structure from the corresponding X-ray structure of up to 1.5 Å) if the target-template sequence identity is higher than 40%. Moreover, all three methodologies provide structures of average quality (RMSD < 4.0 Å) if the target-template sequence identity is higher than 20%.
Such structures can subsequently be used to investigate the molecular mechanisms of protein function and to develop modern protein-based biotechnologies.
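The RMSD criterion quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that the predicted and experimental Cα coordinates are already matched residue by residue and already superposed; the optimal superposition step that normally precedes an RMSD calculation is not shown. The function names `ca_rmsd` and `expected_rmsd_bound` are hypothetical, and the second one only restates the identity thresholds reported in the abstract.

```python
import math


def ca_rmsd(model, reference):
    """Cα-RMSD (in Å) between two equal-length lists of (x, y, z) tuples.

    Assumption: the model is already superposed onto the reference.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def expected_rmsd_bound(identity_percent):
    """Hypothetical helper: RMSD bound (Å) reported in the abstract.

    Returns None when the identity is not above either reported threshold.
    """
    if identity_percent > 40:
        return 1.5  # close to crystallographic quality
    if identity_percent > 20:
        return 4.0  # average quality
    return None


# Toy coordinates (not taken from the paper): every atom is shifted by 1 Å.
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(ca_rmsd(toy_model, toy_reference))
print(expected_rmsd_bound(55))
```

In practice the coordinates would come from parsed structure files and a superposition routine; the sketch only makes the final distance calculation explicit.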
Year: 2018 PMID: 30087916 PMCID: PMC6068592 DOI: 10.1021/acsomega.8b00721
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Rhodopsin clusters considered in the present work. The connection between names used and RCSB codes is given in the Supporting Information.
Target-Template Rhodopsin Pairs Considered in Our Study
| target rhodopsin | template rhodopsin | sequence identity (%) |
|---|---|---|
| | | 55 |
| archaerhodopsin 1 | archaerhodopsin 2 | 84 |
| archaerhodopsin 2 | | 54 |
| green-light PR | blue-light PR from HOT75 | 77 |
| blue-light PR from Med12 | blue-light PR from HOT75 | 57 |
| thermophilic rhodopsin | xanthorhodopsin | 53 |
| | | 55 |
| Acetabularia rhodopsin I | Acetabularia rhodopsin II | 55 |
| Acetabularia rhodopsin II | | 19 |
| | | 28 |
| Anabaena sensory rhodopsin | | 27 |
| sodium pump KR2 | | 19 |
| thermophilic rhodopsin | | 24 |
| blue-light PR from HOT75 | | 25 |
| | | 26 |
| | | 20 |
| | | 31 |
| channel rhodopsin | | 15 |
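The sequence identity column can be illustrated with a short Python sketch. The source does not state which definition of identity was used, so the denominator here (columns where both sequences have a residue) is an assumption, and the function name `sequence_identity` and the toy sequences are hypothetical.

```python
def sequence_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Assumption: identity = identical residues / columns with no gap in
    either sequence. Other conventions divide by the alignment length or
    by the length of the shorter sequence.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment (not a real rhodopsin sequence): 4 of 5 aligned columns match.
print(sequence_identity("ACDE-G", "ACDQFG"))
```

Because the result depends on the chosen denominator, values computed this way may differ slightly from those listed in the table.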
Average Quality of the 16 Predicted Structures for Cases When Target-Template Sequence Identity Was Higher Than 40% (Intracluster Structures)ᵃ
| homology modeling algorithm | Cα-RMSD, Å | GDT-HA, % | Cα-RMSD, Å (TM part) | GDT-HA, % (TM part) |
|---|---|---|---|---|
| I-TASSER (MP-T align.) | 1.350 (0.584–2.422) | 75.60 (59.26–90.81) | 1.137 (0.551–2.059) | 76.80 (59.73–91.82) |
| I-TASSER (AlignMe align.) | 1.342 (0.779–2.061) | 75.97 (62.11–88.32) | 1.206 (0.568–1.977) | 77.06 (63.25–89.53) |
| I-TASSER (MUSTER align.) | 1.440 (0.958–2.293) | 72.67 (56.47–83.05) | | |
| Medeller (MP-T align.) | 2.234 (0.680–4.000) | 80.52 (70.92–86.75) | 0.938 (0.639–1.702) | 83.24 (72.03–88.49) |
| Medeller (AlignMe align.) | 2.154 (0.769–4.657) | 80.52 (70.12–85.46) | 1.079 (0.674–1.736) | 82.73 (71.21–88.07) |
| RosettaCM (MP-T align.) | 1.901 (0.979–3.048) | 63.52 (52.41–73.61) | 1.571 (0.974–2.783) | 65.78 (52.41–73.95) |
| RosettaCM (AlignMe align.) | 2.054 (1.199–3.787) | 63.87 (54.38–73.03) | 1.752 (1.085–3.702) | 65.04 (53.89–74.89) |
ᵃ The range of the corresponding values is given in parentheses.
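The GDT-HA score reported alongside Cα-RMSD can be sketched in Python as follows. GDT-HA averages the percentage of Cα atoms lying within 0.5, 1, 2, and 4 Å of their reference positions. The sketch uses one fixed superposition for all cutoffs, which is a simplification: standard GDT implementations search for the superposition that maximizes the count at each cutoff, so real scores can be higher. The function name `gdt_ha` is hypothetical.

```python
import math


def gdt_ha(model, reference, cutoffs=(0.5, 1.0, 2.0, 4.0)):
    """Simplified GDT-HA (in %) for pre-superposed Cα coordinates.

    Assumption: a single superposition is reused for every cutoff.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(model, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy coordinates (not from the paper) with deviations of 0, 0.8, 1.5, and 3 Å.
toy_model = [(0.0, 0.0, 0.0)] * 4
toy_reference = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(gdt_ha(toy_model, toy_reference))
```

Unlike RMSD, this score is bounded and less sensitive to a few badly placed residues, which is consistent with the two metrics ranking the algorithms differently in the table above.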
Average Quality of All 36 Predicted Models (Excluding the Failed Ones) Considered in the Present Workᵃ
| homology modeling algorithm | Cα-RMSD, Å | GDT-HA, % | Cα-RMSD, Å (TM part) | GDT-HA, % (TM part) | number of failed structures |
|---|---|---|---|---|---|
| I-TASSER (MP-T align.) | 2.042 (0.584–3.731) | 66.54 (49.89–90.81) | 1.806 (0.551–3.529) | 68.78 (54.30–91.82) | 6 |
| I-TASSER (AlignMe align.) | 1.990 (0.779–3.896) | 68.78 (52.20–88.32) | 1.839 (0.568–3.754) | 70.20 (54.41–89.53) | 7 |
| I-TASSER (MUSTER align.) | 2.055 (0.958–3.816) | 66.07 (49.12–83.05) | | | 5 |
| Medeller (MP-T align.) | 2.483 (0.680–4.000) | 70.20 (50.70–86.75) | 1.663 (0.639–3.688) | 72.85 (53.35–88.49) | 7 |
| Medeller (AlignMe align.) | 2.429 (0.769–4.657) | 70.30 (48.06–85.46) | 1.714 (0.674–3.754) | 72.74 (51.94–88.07) | 8 |
| RosettaCM (MP-T align.) | 2.493 (0.979–3.943) | 60.39 (49.67–73.61) | 2.121 (0.974–3.782) | 62.35 (50.89–73.95) | 10 |
| RosettaCM (AlignMe align.) | 2.586 (1.199–3.952) | 59.72 (48.25–73.03) | 2.292 (1.085–3.783) | 60.82 (45.48–74.89) | 11 |
ᵃ The range of the corresponding values is given in parentheses.
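The "mean (minimum–maximum)" entries and the count of failed structures in such a table can be produced with a small Python sketch. The source does not say how a failed structure was identified, so the sketch simply assumes that failed models are recorded as `None`; the function name `summarize` and the toy values are hypothetical.

```python
def summarize(values, decimals=3):
    """Return ("mean (min–max)", number_of_failed) for per-model scores.

    Assumption: a failed model is represented by None and is excluded
    from the mean and the range.
    """
    kept = [v for v in values if v is not None]
    if not kept:
        raise ValueError("no successful models to summarize")
    mean = sum(kept) / len(kept)
    text = (
        f"{mean:.{decimals}f} "
        f"({min(kept):.{decimals}f}–{max(kept):.{decimals}f})"
    )
    return text, len(values) - len(kept)


# Toy RMSD values in Å (not from the paper); one model is marked as failed.
print(summarize([1.0, None, 2.0, 3.0]))
```

Because failed models are excluded, an algorithm with many failures can show a better average than one that attempted every target, so the last column should be read together with the averages.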
Figure 2. Dependence of the Cα-RMSD of the TM part of the predicted models on the target-template sequence identity for different homology modeling algorithms, with the AlignMe alignment provided.
Figure 3. Dependence of the GDT-HA of the TM part of the predicted models on the target-template sequence identity for different homology modeling algorithms, with the AlignMe alignment provided.
Figure 4. Clustering of the rhodopsins considered in the present work based on the quality of homology modeling predictions. For each pair of proteins, A and B, the three-dimensional model of A was predicted from the crystallographic structure of B (AB value) and vice versa (BA value). The AB/BA values are given along each connecting line. The average prediction quality determines the reported distance between proteins. The panels refer to different method/scoring combinations: (a) Medeller/Cα-RMSD; (b) Medeller/GDT-HA; (c) I-TASSER/Cα-RMSD; (d) I-TASSER/GDT-HA; (e) RosettaCM/Cα-RMSD; and (f) RosettaCM/GDT-HA. In all cases, the AlignMe alignment was provided.
Figure 5. Amino acid side chains forming the active site of archaerhodopsin 2 in models produced by different homology modeling algorithms with the AlignMe alignment: (a) Medeller; (b) I-TASSER; and (c) RosettaCM. The crystallographic structure of H. salinarum BR (sequence identity 55%) was taken as the template. The side chains of the predicted model are blue; the side chains of the crystallographic structure are gray.
Figure 6. Amino acid side chains forming the active site of archaerhodopsin 2 in models produced by different structure building algorithms with the AlignMe alignment, after retinal insertion and a short geometry optimization: (a) Medeller; (b) I-TASSER; and (c) RosettaCM. The crystallographic structure of H. salinarum BR (sequence identity 55%) was taken as the template. The side chains of the predicted model are blue; the side chains of the crystallographic structure are gray.
Results of Structure Prediction with Multiple Templates Performed by the RosettaCM Algorithmᵃ
| structure | templates | alignment method | Cα-RMSD, Å | GDT-HA, % |
|---|---|---|---|---|
| 1m0l, best RosettaCM | 3wqj | MP-T | 1.615 | 71.15 |
| 1m0l, lowest RMSD | 3wqj, I-TASSER | MUSTER | 1.108 | 77.31 |
| 1m0l, highest GDT-HA | 3wqj, Medeller | AlignMe | 2.865 | 85.46 |
| 1m0l, 3 templates | 4qi1, 3wqj, 1uaz | ClustalO | 1.149 | 73.78 |
| 1m0l, 3 templates | 4pxk, 4fbz, 4jr8 | ClustalO | 1.508 | 62.43 |
| 1m0l, 6 templates | BR cluster | ClustalO | 1.249 | 70.59 |
| 1m0l, 9 templates | seq. id. ≥ 29% | ClustalO | 1.375 | 69.60 |
| 1m0l, 3 templates | 4qi1, 3wqj, 1uaz | PralineTM | 1.950 | 67.50 |
| 1m0l, 3 templates | 4pxk, 4fbz, 4jr8 | PralineTM | 1.780 | 68.50 |
| 1m0l, 6 templates | BR cluster | PralineTM | 1.574 | 65.53 |
| 1m0l, 9 templates | seq. id. ≥ 29% | PralineTM | 1.348 | 73.24 |
| 3wqj, best RosettaCM | 1uaz | MP-T | 0.979 | 73.61 |
| 3wqj, lowest RMSD | 1uaz, Medeller | MP-T | 0.680 | 86.75 |
| 3wqj, highest GDT-HA | 1uaz, Medeller | MP-T | 0.680 | 86.75 |
| 3wqj, 3 templates | 1uaz, 1m0l, 4qi1 | ClustalO | 1.235 | 68.80 |
| 3wqj, 6 templates | BR cluster | ClustalO | 1.924 | 64.32 |
| 3wqj, 3 templates | 1uaz, 1m0l, 4qi1 | PralineTM | 1.504 | 60.79 |
| 3wqj, 6 templates | BR cluster | PralineTM | 2.530 | 65.06 |
| 3ug9, best RosettaCM | 1m0l | MP-T | 4.216 | 35.96 |
| 3ug9, lowest RMSD | 1m0l, RosettaCM | MP-T | 4.280 | 35.96 |
| 3ug9, highest GDT-HA | 1m0l, Medeller | AlignMe | 4.731 | 42.23 |
| 3ug9, 5 templates | seq. id. ≥ 20% | ClustalO | 13.514 | 31.28 |
| 3ug9, 5 templates | seq. id. ≥ 20% | PralineTM | 5.357 | 38.19 |
| 1e12, best RosettaCM | 3a7k | AlignMe | 1.466 | 65.48 |
| 1e12, lowest RMSD | 3a7k, Medeller | AlignMe | 1.097 | 83.89 |
| 1e12, highest GDT-HA | 3a7k, Medeller | MP-T | 1.117 | 84.10 |
| 1e12, 6 templates | seq. id. ≥ 30% | ClustalO | 2.260 | 61.40 |
| 1e12, 6 templates | seq. id. ≥ 30% | PralineTM | 2.664 | 59.21 |
ᵃ The results are compared with the best single-template models produced by the RosettaCM algorithm for the same proteins and with the models with the lowest Cα-RMSD and the highest GDT-HA values obtained in the current work.