Tsukasa Nakamura (1,2), Toshiyuki Oda (1), Yoshinori Fukasawa (1), Kentaro Tomii (1,2,3,4)
Abstract
Proteins often function in multimeric forms, so-called biological assemblies, which consist of a specific number and arrangement of protein subunits. Consequently, elucidating biological assemblies is necessary to improve our understanding of protein function. Template-Based Modeling (TBM), which builds models from known protein structures, has been used widely for protein structure prediction. Indeed, TBM has become an increasingly useful approach in recent years because of the growing amount of information on protein amino acid sequences and three-dimensional structures. A similar situation appears to exist for biological assembly structure prediction as the number of protein complex structures in the PDB increases, although inferring biological assemblies is not a trivial task. Many TBM methods, including ours, have been developed for protein structure prediction. Using enhanced profile-profile alignments, we participated in the 12th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP12) as the FONT team (Group # 480). Herein, we present the experimental procedures and the results of retrospective analyses of our approach for the Quaternary Structure Prediction category of CASP12. We performed several types of profile-profile alignments, based on FORTE, our profile-profile alignment algorithm, to identify suitable templates. Results show that these alignments enabled us to find templates in almost all possible cases. Moreover, we found that a model selection method with improved accuracy needs to be developed. Results also demonstrate that, to some extent, finding templates of protein complexes is useful even for MEDIUM and HARD assembly prediction.
Keywords: biological assembly; community wide experiment; heterooligomers; homooligomers; protein complexes
Year: 2017 PMID: 29178285 PMCID: PMC5836938 DOI: 10.1002/prot.25432
Source DB: PubMed Journal: Proteins ISSN: 0887-3585
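The abstract relies on profile-profile alignment to find templates. The sketch below shows the general shape of such an alignment: a Smith-Waterman local alignment in which each aligned pair of profile columns is scored by a column-similarity function. The scoring here is an assumption made for illustration and is not FORTE's exact scheme. Columns are scored by their Pearson correlation minus a hypothetical `offset`, and a hypothetical linear `gap` penalty is used. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Profile = List[List[float]]  # one 20-dimensional vector per sequence position


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation between two profile columns (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0


def align_profiles(query: Profile, template: Profile,
                   gap: float = 0.5, offset: float = 0.1) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman local alignment of two profiles with a linear gap penalty.

    A column pair scores pearson(q, t) - offset; the (hypothetical) offset makes
    unrelated columns score negatively on average. Returns the best local score
    and the aligned (query index, template index) pairs.
    """
    n, m = len(query), len(template)

    def score(i: int, j: int) -> float:
        return pearson(query[i - 1], template[j - 1]) - offset

    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, end = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(i, j),
                          H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    pairs: List[Tuple[int, int]] = []
    i, j = end
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]
```

Correlation-style column scores compare the shapes of the two residue distributions directly, which is why they suit profile-versus-profile comparison better than sequence substitution matrices do.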
Figure 1. Schematic showing our prediction procedure
Summary of methods used for profile construction
| Abbreviations | Query | Library | Profile construction (DB, # iterations) |
|---|---|---|---|
| PSI_PSSM | ○ | (ii) | (TM‐align |
| DB_PSSM | ○ | (i) | DELTA‐BLAST (CDD, 1) |
| SSM‐PSI_PSSM | ○(*) | (i) | SSEARCH (nr) + MAFFT + PSI‐BLASTexB (nr, 1) |
| HH‐PSI_PSSM | ○ | N/A | HHblits (up20, 3) + PSI‐BLASTexB (nr, 1) |
| PSI_PSRP | ○ | (ii) | (TM‐align |
| DB_PSRP | ○ | (i) | DELTA‐BLAST (CDD, 1) |
| SSM‐PSI_PSRP | N/A | (i) | SSEARCH (nr) + MAFFT + PSI‐BLASTexB (nr, 1) |
| HH‐PSI_PSRP | ○ | (i) | HHblits (up20, 3) + PSI‐BLASTexB (nr, 1) |
| HH_PSRP | ○ | (iii) | HHblits (up20, 3) |
The “Profile construction” column shows the methods used in profile construction, with the databases and the number of search iterations given in parentheses. “nr” and “CDD” stand for the NCBI nr database and the Conserved Domain Database, respectively. “up20” stands for HH‐suite's uniprot20 database. In the “Abbreviations” column, PSI = PSI‐BLASTexB, DB = DELTA‐BLAST, SSM = SSEARCH + MAFFT, HH = HHblits, PSSM = position‐specific scoring matrix, and PSRP = position‐specific residue's probability (see the text). In the “Query” column, “○” denotes the procedures used to construct query profiles. (*) SSM‐PSI_PSSM was not used to construct query profiles during the CASP12 experiments. The numbers in the “Library” column ((i) to (iii); see Template libraries in the text) denote the types of template libraries.
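The table distinguishes PSSM profiles from PSRP (position‐specific residue's probability) profiles, and the exact PSRP definition is given in the paper's text. As a rough illustration only, the sketch below converts a set of aligned sequences into per‐column residue probabilities. It assumes equal sequence weights, skips gaps, and adds a flat pseudocount, none of which is claimed to match the authors' procedure. `residue_probability_profile` is a hypothetical name.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_probability_profile(msa: Sequence[str], pseudocount: float = 1.0) -> List[List[float]]:
    """Position-specific residue probabilities from aligned sequences.

    Simplifications (assumptions, not the paper's procedure): every sequence has
    weight 1, gaps and non-standard letters are skipped, and a flat pseudocount
    is added to each of the 20 residues in every column.
    """
    if not msa:
        raise ValueError("need at least one aligned sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = dict.fromkeys(AMINO_ACIDS, pseudocount)
        for seq in msa:
            residue = seq[col].upper()
            if residue in counts:
                counts[residue] += 1.0
        total = sum(counts.values())
        # Each column becomes a probability vector over the 20 residues (sums to 1).
        profile.append([counts[aa] / total for aa in AMINO_ACIDS])
    return profile
```

For example, `residue_probability_profile(["AC", "AC", "A-"])` gives alanine a probability of 4/23 in the first column. In this sketch, a PSSM-style profile would instead store log‐odds scores of such probabilities against background frequencies.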
Figure 2. Numbers of target domains for which “correct” templates were detected. Each row corresponds to a template library, and each column corresponds to a type of query profile that we used. The modified scoring scheme was used for the 20 combinations shown in the four rightmost columns. The number in each cell is the number of target domains for which “correct” templates were detected among the top five hits by that combination. Cell colors also encode these numbers: warmer colors represent larger numbers, and colder colors represent smaller numbers. The color bar is shown on the rightmost side.
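Each cell of Figure 2 is a count over target domains. The sketch below shows one way such a count could be tabulated, assuming the ranked hits are stored per (template library, query profile type) combination and the set of “correct” templates for each domain is already known. How “correct” is defined comes from the paper and is not reproduced here. The data layout, names, and IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: (template library, query profile type) -> domain -> ranked template IDs
Hits = Dict[Tuple[str, str], Dict[str, List[str]]]


def count_detected(hits: Hits, correct: Dict[str, Set[str]],
                   top_n: int = 5) -> Dict[Tuple[str, str], int]:
    """Per combination, count target domains with at least one "correct" template in the top_n hits."""
    return {
        combo: sum(1 for domain, ranked in per_domain.items()
                   if set(ranked[:top_n]) & correct.get(domain, set()))
        for combo, per_domain in hits.items()
    }
```

Because only `ranked[:top_n]` is inspected, a correct template ranked sixth does not count. This mirrors the “top five hits” criterion in the caption.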
Figure 3. Plots of TM‐scores vs. the highest Z‐scores of templates. The horizontal axis shows the Z‐score of an alignment between a target domain sequence and a template sequence in the PDB. When the same template was identified within the top five hits by different profile–profile alignment methods, the highest Z‐score is shown. The vertical axis shows TM‐scores calculated using MM‐align between the target complex and a template complex in the PDB. Red circles represent template complexes with the same stoichiometry as the target. Blue squares represent template structures whose stoichiometry differs from that of the target. A green star with a rectangular label marks the template structure that we used to construct a model in CASP12. The first line of text above each panel shows the multimer target name, target stoichiometry, target symmetry, and target difficulty. The subsequent text shows the target domain name, domain range, domain difficulty classification, target type (Human/Server), the template used to construct our model during CASP12, the Z‐score of that template, and the TM‐score of that template complex. Templates with the highest Z‐score and the highest TM‐score are annotated with a label. The label contains a PDB ID and a number identifying the biological assembly as defined in the PDB, with 0 denoting the asymmetric unit.
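The horizontal axis of Figure 3 merges hits from several alignment methods and keeps, for each template, the highest Z‐score among each method's top five hits. The sketch below implements that merge. It also includes a Z‐score helper that uses a common convention: the raw score standardized against a background score sample, for example scores from shuffled sequences. That convention is an assumption, since the caption does not state how the Z‐scores were derived. The stoichiometry check is a plain string comparison and is likewise a simplification. All names and IDs are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def z_score(raw: float, background: Sequence[float]) -> float:
    """Z-score of a raw alignment score against a background sample (assumed convention)."""
    return (raw - statistics.mean(background)) / statistics.pstdev(background)


def best_z_per_template(results: Dict[str, List[Tuple[str, float]]],
                        top_n: int = 5) -> Dict[str, float]:
    """Keep each template's highest Z-score, considering only each method's top_n hits."""
    best: Dict[str, float] = {}
    for hits in results.values():
        for template, z in sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]:
            if z > best.get(template, float("-inf")):
                best[template] = z
    return best


def marker(target_stoichiometry: str, template_stoichiometry: str) -> str:
    """Plot marker following the Figure 3 legend (plain string comparison)."""
    return "red circle" if template_stoichiometry == target_stoichiometry else "blue square"
```

With methods `m1` scoring template `1abc_1` at 5.0 and `m2` scoring it at 7.5, the merged value plotted for `1abc_1` would be 7.5.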
QS‐scores and TM‐scores of our first models, with baseline QS‐scores, for EASY and MEDIUM targets
| Target ID | Difficulty category | QS‐score: FONT (1st) | QS‐score: Baseline | TM‐score: MM‐align | TM‐score: TM‐score |
|---|---|---|---|---|---|
| T0861‐T0862‐T0870 | MEDIUM | 0.000 | 0.29 | 0.469 | 0.334 |
| T0867 | EASY | 0.928 | 0.70 | 0.982 | 0.986 |
| T0873 | MEDIUM | 0.548 | 0.32 | 0.484 | 0.492 |
| T0880 | MEDIUM | 0.276 | 0.00 | 0.590 | 0.439 |
| T0881 | EASY | 0.557 | 0.34 | 0.809 | 0.733 |
| T0888 | MEDIUM | 0.422 | 0.00 | 0.820 | 0.713 |
| T0893 | EASY | 0.472 | 0.04 | 0.419 | 0.411 |
| T0906 | EASY | 0.815 | 0.73 | ‐ | ‐ |
| T0909 | EASY | 0.391 | 0.02 | 0.764 | 0.359 |
| T0917 | EASY | 0.658 | 0.10 | 0.867 | 0.860 |
| T0921‐T0922 | EASY | 0.065 | 0.02 | 0.655 | 0.553 |
| T0931 | MEDIUM | 0.490 | 0.39 | 0.514 | 0.536 |
QS‐scores of the FONT first models and baseline QS‐scores (A. Lafita, personal communication) are shown for EASY and MEDIUM targets. TM‐scores of our first models, calculated with MM‐align and TM‐score, are also shown. Three targets for which we missed the opportunity to submit models (T0860, T0889, and T0903‐T0904) are not shown. The TM‐scores of our first model for T0906 could not be calculated because the coordinate data of T0906 were unavailable.
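The TM‐score columns are produced by the MM‐align and TM‐score programs. Both programs search for the superposition (and, in MM‐align's case, the chain mapping) that maximizes the score. The sketch below only evaluates the standard TM‐score sum for one fixed superposition, given the Cα–Cα distances of aligned residue pairs. The 0.5 Å floor on d0 is an assumption about the reference implementation. The function names are hypothetical.

```python
from typing import Sequence


def tm_d0(length: int) -> float:
    """Distance scale d0 of the TM-score for a structure of `length` residues."""
    if length <= 15:
        raise ValueError("this sketch assumes more than 15 residues")
    # The 0.5 Å floor follows the commonly used TM-score implementation (assumption).
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition (no optimization).

    `distances` are Ca-Ca distances (Å) of the aligned residue pairs, and the sum
    is normalized by the target length, so unaligned residues contribute zero.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A residue pair at distance d0 contributes exactly 0.5. Normalizing by the full target length means that a model covering only part of the target cannot reach 1.0.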
Figure 4. Comparison of target and template structures. The template structure (PDB ID: 4G6V [34]; green) was superimposed onto the target structure (T0868 (blue) and T0869 (red); PDB ID: 5J4A [35]) using UCSF Chimera [36]. Tentative top (right) and side (left) views are shown. Cα RMSD = 3.12 Å (>90 amino acids) between chain A of 4G6V and chain A of 5J4A.
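For reference, the Cα RMSD quoted in the Figure 4 caption can be read with the standard definition below. Here x_i and y_i are the Cα coordinates of the N matched residue pairs in the template and target chains, and the minimum is taken over rigid-body rotations R and translations t. This assumes the reported value follows the textbook definition. Superposition tools may restrict N to a subset of matched pairs, and that pairing step is not modeled here.

$$
\mathrm{RMSD}_{C\alpha} = \min_{R \in SO(3),\, \mathbf{t} \in \mathbb{R}^{3}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
$$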