Petras J Kundrotas, Ilya A Vakser.
Abstract
The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as little as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD < 5 Å, the accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å < RMSD < 10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, approximately 50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.
Year: 2010 PMID: 20369011 PMCID: PMC2848539 DOI: 10.1371/journal.pcbi.1000727
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Table 1. Interacting chains with known structure used in calculations.
| 1acbEI | 1e96AB | 1h2sAB | 1kxqAH | 1otsAC | 1t9gDS | 1x3wAB | 2ayoAB |
| 1agrAE | 1eaiBD | 1h4lAD | 1kz7AB | 1oxbAB | 1ta3BA | 1x86AB | 2b3tBA |
| 1aroPL | 1ebdBC | 1h59AB | 1kzyCA | 1oyvAI | 1tafAB | 1xb2AB | 2b59AB |
| 1avaAC | 1eerBA | 1h6kAX | 1l4dAB | 1oyvBI | 1tdqAB | 1xd3AB | 2b5iBA |
| 1avgHI | 1efnAB | 1h9hEI | 1l6xAB | 1p5vAB | 1te1AB | 1xdkBA | 2b5iCA |
| 1avwAB | 1ewyAC | 1he1AC | 1l7vAC | 1p8vAC | 1th1AC | 1xdtTR | 2bcjAQ |
| 1axiBA | 1f02IT | 1he8AB | 1ldjAB | 1p9mCB | 1th8AB | 1xg2AB | 2bfxAD |
| 1ay7AB | 1f34AB | 1hl6BA | 1lfdBA | 1p9mAB | 1tmqAB | 1xk4AC | 2bh1AX |
| 1b0nAB | 1f3vBA | 1hx1AB | 1lpbBA | 1pk1AB | 1tnrAR | 1xl3AC | 2bkhAB |
| 1b34AB | 1f5qAB | 1i1rAB | 1ltxAR | 1ppfEI | 1tocBR | 1xouBA | 2bkkAB |
| 1b6cAB | 1f60AB | 1i2mBA | 1m1eAB | 1pqzAB | 1tt5AB | 1xqsAC | 2bkrAB |
| 1blxAB | 1f6fBA | 1i7wAB | 1m27AC | 1pvhAB | 1tueAB | 1xtgAB | 2bo9AB |
| 1bmlCA | 1f6mAC | 1i8lAC | 1m2vBA | 1pxvAC | 1tx4AB | 1xu1AR | 2bseAE |
| 1bndAB | 1f93BE | 1iarBA | 1m9fAD | 1qa9AB | 1tx6AI | 1y4hAC | 2btfAP |
| 1buhAB | 1fbvAC | 1ib1AE | 1ma9AB | 1qavBA | 1txqAB | 1y64AB | 2c1mAB |
| 1buiAC | 1fccAC | 1ibrBA | 1mbxAC | 1qbkBC | 1tygAB | 1y8xAB | 2c5dAC |
| 1bvnPT | 1fleEI | 1iraYX | 1moxAC | 1qo3AC | 1u0sYA | 1ycsAB | 2ckhAB |
| 1bzqAL | 1fm9AD | 1itbBA | 1mq8AB | 1r0rEI | 1u7fAB | 1yvbAI | 2ey4AE |
| 1c1yAB | 1foeAB | 1ixsBA | 1mvfAE | 1r1kAD | 1uadAC | 1z0jAB | 2ey4AC |
| 1c4zAD | 1fqjAB | 1j2jAB | 1mzwAB | 1r4aAE | 1ueaAB | 1z2cBA | 2f9dAP |
| 1c9pAB | 1fqjCA | 1jatAB | 1n0wAB | 1r8sAE | 1ughEI | 1z3eAB | 2fi4EI |
| 1cd9BA | 1fr2BA | 1jdhAB | 1nexBA | 1rp3AB | 1ujwAB | 1z3gHA | 2g45AB |
| 1choEI | 1fs1BA | 1jiwPI | 1nf3AC | 1s1qAB | 1ukvGY | 1z5yED | 2gooAC |
| 1clvAI | 1fyhAB | 1jk9BA | 1nmuAB | 1s3sBH | 1ul1XA | 1z92AB | 2gy7AB |
| 1cseEI | 1g3nAB | 1jmaAB | 1npeAB | 1s4yBA | 1us7AB | 1zbdAB | 2hppHP |
| 1cxzAB | 1g3nAC | 1jowBA | 1nqlAB | 1s6vAB | 1usuAB | 1zbxAB | 2mtaCA |
| 1d2zBA | 1g4uSR | 1jtdAB | 1nt2BA | 1sbbBC | 1uuzAD | 1zc3AD | 2sniEI |
| 1d3bAB | 1g6vAK | 1jtgAB | 1nunBA | 1sgfGB | 1uw4BA | 1zlhAB | 2trcBP |
| 1d4xAG | 1g73AC | 1jtpAL | 1nvuSQ | 1sgpEI | 1uzxAB | 1zm2AB | 3fapAB |
| 1d6rAI | 1gc1GC | 1jw9BD | 1nw9BA | 1shwAB | 1v5iAB | 2a19BA | 3hhrCA |
| 1devAB | 1gcqAC | 1k5dAB | 1o6sAB | 1shyBA | 1v74AB | 2a41AC | 3proAC |
| 1df9AC | 1gh6BA | 1k8rAB | 1o94AC | 1shzAC | 1vetAB | 2a42AB | 3sicEI |
| 1dfjEI | 1ghqAB | 1k90AD | 1oc0AB | 1sppAB | 1vg0AB | 2a5dBA | 3ygsCP |
| 1dhkAB | 1gl0EI | 1kacAB | 1oeyJA | 1sq0AB | 1w1iAF | 2a5tAB | 4htcHI |
| 1dkfBA | 1gl1AI | 1kg0BC | 1ofhAG | 1sq2LN | 1w98AB | 2a5yBA | 4sgbEI |
| 1dkgDA | 1gl4AB | 1kgyAE | 1ofuAX | 1stfEI | 1wmhAB | 2a78BA | |
| 1dmlAB | 1glbFG | 1ki1BA | 1ohzAB | 1sv0AC | 1wmiAB | 2ajfAE | |
| 1dn1AB | 1go4AG | 1kpsAB | 1ol5AB | 1svxBA | 1wpxAB | 2apoAB | |
| 1dowAB | 1gpwAB | 1kshAB | 1oo0AB | 1syxAB | 1wq1RG | 2assBA | |
| 1ds6AB | 1gvnBA | 1ktkEA | 1ophAB | 1t0fAC | 1wr6AE | 2assBC | |
| 1dtdAB | 1gxdAC | 1ktzBA | 1or7AC | 1t6bXY | 1wrdAB | 2auhAB | |
| 1e44BA | 1gzsAB | 1ku6AB | 1oryAB | 1t6gAC | 1wywAB | 2aw2AB |
First four symbols are the PDB code followed by the IDs of interacting chains as in the PDB file.
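The entry format described above can be decoded mechanically. The following is a minimal sketch, assuming every entry is exactly six characters (a four-character PDB code plus two single-character chain IDs), as is the case for all entries listed here; the function names `parse_entry` and `parse_row` are illustrative and not part of the original study.

```python
def parse_entry(entry: str) -> tuple[str, str, str]:
    """Split an entry such as '1acbEI' into (PDB code, first chain, second chain)."""
    entry = entry.strip()
    if len(entry) != 6:
        raise ValueError(f"expected a 6-character entry, got {entry!r}")
    return entry[:4], entry[4], entry[5]


def parse_row(row: str) -> list[tuple[str, str, str]]:
    """Parse one pipe-delimited table row, skipping empty cells."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return [parse_entry(cell) for cell in cells if cell]


if __name__ == "__main__":
    print(parse_entry("1acbEI"))
    print(parse_row("| 1dkgDA | 1gl4AB | 1kgyAE | |"))
```

The order of the two chain IDs is kept exactly as written in the table, since entries such as 1axiBA list the chains in a non-alphabetical order.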
Table 2. Number of structures with full interface coverage alignments, N_FIC, for different types of complexes.
| Complex type | Total number of structures | Total number of BLAST alignments | N_FIC, both monomers | N_FIC, one monomer | N_FIC, none of the monomers |
| --- | --- | --- | --- | --- | --- |
| All | 329 | 66706 | 218 | 99 | 12 |
| Antibody-antigen | 12 | 11657 | 12 | 0 | 0 |
| Enzyme-inhibitor | 63 | 9441 | 42 | 20 | 1 |
| Cytokine | 25 | 5183 | 19 | 6 | 0 |
| Other | 229 | 40425 | 145 | 73 | 11 |
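The counts in this table can be turned into percentages per complex type. The sketch below copies the structure counts from the rows above; the dictionary layout and the function name `fic_percentages` are assumptions made for illustration only.

```python
# (total structures, N_FIC both monomers, N_FIC one monomer, N_FIC none),
# copied from the table rows above.
TABLE_2 = {
    "All": (329, 218, 99, 12),
    "Antibody-antigen": (12, 12, 0, 0),
    "Enzyme-inhibitor": (63, 42, 20, 1),
    "Cytokine": (25, 19, 6, 0),
    "Other": (229, 145, 73, 11),
}


def fic_percentages(total: int, both: int, one: int, none: int) -> dict[str, float]:
    """Return the share (in percent) of structures in each N_FIC category."""
    if both + one + none != total:
        raise ValueError("category counts do not add up to the total")
    return {
        "both": 100.0 * both / total,
        "one": 100.0 * one / total,
        "none": 100.0 * none / total,
    }


if __name__ == "__main__":
    for complex_type, counts in TABLE_2.items():
        shares = fic_percentages(*counts)
        print(complex_type, {k: round(v, 1) for k, v in shares.items()})
```

The consistency check inside `fic_percentages` passes for every row, which confirms that the three N_FIC columns partition the total number of structures.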
Figure 1. Percentage of alignments with full interface coverage (FIC alignments) in the alignment pool produced by PSI-BLAST on the representative set of 329 two-chain complexes at various maximum target sequence coverage q_max.
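The quantity plotted in Figure 1 can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes that an alignment has full interface coverage when every interface residue of the target falls inside the aligned target positions, and that the pool at a given q_max consists of alignments whose target sequence coverage does not exceed q_max. The `Alignment` class, the function names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    coverage: float          # percent of the target sequence covered (assumed definition)
    aligned: frozenset       # target residue numbers included in the alignment


def has_full_interface_coverage(aligned: frozenset, interface: frozenset) -> bool:
    """True if every interface residue is among the aligned target residues."""
    return interface <= aligned


def fic_percentage(alignments, interface: frozenset, q_max: float) -> float:
    """Percentage of FIC alignments among those with coverage <= q_max."""
    pool = [a for a in alignments if a.coverage <= q_max]
    if not pool:
        return 0.0
    n_fic = sum(has_full_interface_coverage(a.aligned, interface) for a in pool)
    return 100.0 * n_fic / len(pool)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    interface = frozenset({10, 11, 12})
    pool = [
        Alignment(30.0, frozenset(range(5, 16))),   # covers the whole interface
        Alignment(35.0, frozenset(range(1, 11))),   # misses residues 11 and 12
        Alignment(80.0, frozenset(range(1, 51))),   # covers the whole interface
    ]
    for q_max in (40.0, 100.0):
        print(q_max, fic_percentage(pool, interface, q_max))
```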
Figure 2. Comparison of distributions of alignment identities and similarities between alignments containing all interface residues and all alignments.
The distributions for alignments containing all interface residues are shown by open bars and those for all alignments by closed bars. Panels A and C show the distributions for the alignments with maximum query sequence coverage of 40%, and panels B and D show the distributions for the whole alignment pool irrespective of query sequence coverage.
Figure 3. Distributions of interface identities and similarities in alignments containing all interface residues.
Panels A and C show the distributions for the alignments with maximum query sequence coverage of 40%, and panels B and D show the distributions for the alignments irrespective of query sequence coverage. For the definitions of interface identity and similarity, see text.
Figure 4. Probability of finding all interface residues inside an alignment as a function of alignment identity and similarity.
Curves are least-squares polynomial fits to the data points obtained from the analysis of PSI-BLAST alignments for the representative set of 329 complexes used in the study.
Table 3. Parameters of the top models produced on the basis of alignments with maximum 40% target sequence coverage and full interface coverage.
| Target PDB and chain ID | Target source organism | Target biological function | Template PDB and chain ID | Template source organism | Template biological function | Log e | q, % | qdom, % | Alignment identity | Alignment similarity | Interface identity | Interface similarity | Interface RMSD, Å |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Cow (M) | Blood coagulation | | | Proteolysis | −5.16 | 35.9 | – | 17.2 | 30.1 | 5.3 | 10.5 | 7.9 |
| | Rat (M) | Detection of light | | Mouse (M) | GTP-binding | −0.23 | 34.4 | 56.6 | 18.0 | 26.5 | 15.8 | 21.1 | 3.7 |
| | | Dephosphorylation | | P.aeruginosa (B) | GTPase | −19.00 | 31.6 | 90.3 | 26.5 | 42.2 | 34.5 | 44.8 | 1.5 |
| | Human (M) | T-cell receptor | | Mouse (M) | T-cell receptor | −0.75 | 39.5 | – | 20.3 | 33.8 | 12.5 | 29.2 | 10.4 |
| | Human (M) | Ca, Zn binding | | Human | Ca, Zn binding | −47.70 | 32.3 | 100.0 | 34.3 | 52.0 | 17.5 | 45.0 | 2.2 |
| | | Ion transport | | | H ion transport | −12.05 | 29.8 | – | 37.3 | 58.2 | 23.8 | 57.1 | 19.4 |
| | Human (M) | T-cell receptor | | Mouse | T-cell receptor | −2.70 | 39.9 | 77.4 | 10.5 | 29.1 | 5.6 | 27.8 | 7.4 |
| | | DNA repair | | | ATP binding | −0.20 | 31.1 | 60.0 | 23.3 | 39.2 | 15.0 | 45.0 | 22.0 |
| | Human (M) | Immune response | | | Mn ion binding | −0.17 | 28.7 | 60.7 | 33.3 | 42.6 | 35.7 | 35.7 | 16.9 |
| | Mouse (M) | GTPase | | | ATP binding | −0.60 | 38.7 | – | 22.2 | 36.1 | 25.0 | 29.2 | 31.1 |
| | Human (M) | Immune response | | Synthetic construct | MHC-I binding | −0.96 | 38.5 | 80.3 | 17.2 | 37.4 | 11.5 | 26.9 | 5.6 |
| | HIV virus I (V) | RNA binding | | | Glucose metabolism | −1.24 | 33.6 | – | 38.9 | 48.2 | 25.0 | 25.0 | 33.4 |
| | Human | Leukocyte migration | | Human (M) | Mn ion binding | −6.00 | 28.2 | 100.0 | 36.6 | 51.2 | 40.0 | 60.0 | 1.3 |
| | Yeast (F) | Protein ubiquitination | | | Ca ion binding | −0.66 | 20.5 | – | 9.3 | 33.0 | 3.2 | 41.9 | 53.4 |
| | | r,tRNA processing | | | Coenzyme binding | −0.75 | 39.1 | – | 12.4 | 32.6 | 6.5 | 19.4 | 8.6 |
| | Mouse (M) | DNA damage repair | | | ATP binding | −10.40 | 37.1 | – | 23.0 | 40.5 | 18.5 | 33.3 | 4.1 |
| | Human (M) | Cell division | | | Oxidation reduction | −1.52 | 36.1 | – | 37.5 | 59.4 | 40.0 | 60.0 | 21.5 |
| | Mouse (M) | - | | Mouse (M) | - | −0.43 | 38.9 | 73.4 | 28.9 | 46.4 | 35.7 | 57.1 | 5.0 |
| | Yeast (F) | RNA binding | | | RNA binding | −4.70 | 38.9 | 68.3 | 15.2 | 38.0 | 11.8 | 29.4 | 3.2 |
| | Human (M) | Cell proliferation | | Yeast (F) | Cytokinesis | −2.70 | 20.8 | – | 25.4 | 46.5 | 36.1 | 52.8 | 6.2 |
| | Cow (M) | Phosphorylation | | Rat (M) | Phosphorylation | −1.68 | 11.0 | – | 9.8 | 31.7 | 21.1 | 31.6 | 48.7 |
| | Rat/Mouse (M) | ADP ribosylation | | Mouse (M) | GTP binding | −0.55 | 38.8 | 65.9 | 17.6 | 31.0 | 14.3 | 23.8 | 8.5 |
| | | | | | | | | | | | | | |
| | | ATP binding | | Human (M) | ATP binding | −0.77 | 35.6 | – | 13.7 | 22.1 | 15.0 | 30.0 | 32.5 |
| | Human (M) | Phosphorylation | | Human | T-cell receptor | −0.35 | 36.4 | 68.9 | 22.2 | 32.1 | 10.4 | 17.2 | 18.5 |
| | | | | | | | | | | | | | |
| | Human (M) | Phosphorylation | | Wheat (P) | Sugar binding | −1.92 | 37.8 | – | 11.0 | 17.2 | 0.0 | 5.3 | 30.8 |
| | | Iron binding | | | FAD binding | −1.23 | 38.1 | – | 14.3 | 26.8 | 16.7 | 33.3 | 22.7 |
First four symbols are the PDB code followed by the ID of the chain as in the PDB file. Asterisk indicates that the protein is a monomer in the PDB file.
As provided in the PDB file. Letters in parentheses stand for higher levels of taxonomy classification (V: viruses; A: archaea; B: bacteria; F: fungi; P: plants; M: mammals).
Extracted from PDB GO terms section.
Logarithm of alignment expectation value (e-value).
Entire target sequence coverage in the alignment of the model, as defined by Eq. 1.
Coverage of the target binding domain (for multi-domain structures) in the alignment of the model.
As defined by Eq. 3.
As defined by Eq. 4.
RMSD between Cα atoms of the interface residues in the model and the native structure.
For some targets the parameters of the model with the smallest interface RMSD are shown if the best and the top models have substantially different interface RMSD values (in bold).
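The interface RMSD described in the notes above (RMSD between Cα atoms of the interface residues in the model and the native structure) can be sketched as below. The sketch assumes the two coordinate lists are already in a common frame and paired residue by residue; how the structures were superimposed is not stated in this section, so no superposition step is included. The function name `interface_rmsd` is illustrative.

```python
import math

Coord = tuple[float, float, float]


def interface_rmsd(model: list[Coord], native: list[Coord]) -> float:
    """RMSD (in the units of the input, e.g. Å) between paired Cα coordinates.

    Assumes both lists hold the same interface residues in the same order
    and that the structures are already superimposed.
    """
    if not model or len(model) != len(native):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Toy coordinates: every atom is displaced by the vector (3, 4, 0).
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
    print(interface_rmsd(model, native))
```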
Figure 5. Examples of partial homology models.
The models (white ribbons) are superimposed on the target native structures (gray ribbons). (A) Good-accuracy model (interface RMSD = 5.0 Å) in the case of target and template proteins from the same organism. The target is malaria transmission-blocking antibody 2A8 from mouse (1z3g, chain H) and the template is mouse BM3.3 T-cell receptor α-chain (1fo0, chain A). (B) Good-accuracy model (interface RMSD = 3.7 Å) in the case of target and template proteins from different organisms. The target is guanine nucleotide-binding protein alpha-1 subunit from bovine (1fqj, chain A) and the template is yeast RAS-related protein RAB-33 (2g77, chain B). (C) Acceptable-accuracy model (interface RMSD = 8.6 Å). The target is fibrillarin-like pre-rRNA processing protein from Archaeoglobus fulgidus (1nt2, chain A) and the template is UDP-N-acetylglucosamine 4-epimerase from Pseudomonas aeruginosa (1sb8, chain A). (D) Incorrect model (interface RMSD = 16.9 Å). The target is human MHC Class II receptor HLA-DR1 (1kg0, chain B) and the template is intron-encoded endonuclease from Desulfurococcus mobilis (1b24, chain A). The arrow indicates an incorrect loop that causes the large interface RMSD in this model. Blue and yellow meshes indicate positions of the backbone atoms of the interface residues in the model and the native structures, respectively. Other parameters of the models are presented in Table 3.
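The accuracy labels used in this caption can be related to the RMSD ranges given in the abstract (below 5 Å, and between 5 Å and 10 Å). The sketch below is an assumption-laden illustration: the label names follow the caption, the 5 Å and 10 Å cutoffs follow the abstract, and the boundaries are treated as inclusive so that the 5.0 Å example in panel A is labeled good; the exact boundary handling used by the authors is not stated here.

```python
GOOD_CUTOFF = 5.0         # Å, from the abstract; inclusive boundary is an assumption
ACCEPTABLE_CUTOFF = 10.0  # Å, from the abstract; inclusive boundary is an assumption


def classify_model(interface_rmsd: float) -> str:
    """Map an interface RMSD (Å) to an accuracy label used in the figure caption."""
    if interface_rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    if interface_rmsd <= GOOD_CUTOFF:
        return "good"
    if interface_rmsd <= ACCEPTABLE_CUTOFF:
        return "acceptable"
    return "incorrect"


if __name__ == "__main__":
    # Interface RMSD values of panels A-D in the caption above.
    for panel, rmsd in (("A", 5.0), ("B", 3.7), ("C", 8.6), ("D", 16.9)):
        print(panel, rmsd, classify_model(rmsd))
```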