Xingjie Pan, Tanja Kortemme.
Abstract
A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein "scaffold", which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites that cannot be matched to naturally existing structures with the same topologies, respectively. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.
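The power-law relationship mentioned in the abstract can be illustrated with a small fitting sketch. Assuming the number of matched binding sites M depends on the number of fold family members n as M = c * n^k, the exponent k is the slope of a straight line in log-log space. The symbols M, n, c and k, the function name `fit_power_law`, and the data points below are all illustrative assumptions; the data are synthetic and are not taken from the paper.

```python
import math

def fit_power_law(family_sizes, matched_counts):
    """Least-squares fit of log(M) = log(c) + k * log(n); returns (c, k)."""
    xs = [math.log(n) for n in family_sizes]
    ys = [math.log(m) for m in matched_counts]
    n_points = len(xs)
    mean_x = sum(xs) / n_points
    mean_y = sum(ys) / n_points
    # Ordinary least-squares slope and intercept in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = math.exp(mean_y - k * mean_x)
    return c, k

# Synthetic, illustrative data only (NOT from the paper): M = 50 * n**0.6
sizes = [10, 30, 100, 300, 1000]
counts = [50 * n ** 0.6 for n in sizes]

c, k = fit_power_law(sizes, counts)
print(f"c = {c:.2f}, k = {k:.2f}")
```

With real data, the inputs would be scaffold subset sizes and the corresponding numbers of matched binding sites; an exponent k below 1 would indicate diminishing returns from adding more scaffolds to a family.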
Year: 2021 PMID: 34807909 PMCID: PMC8648124 DOI: 10.1371/journal.pcbi.1009620
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Number of matched binding sites.
| Binding site library | Match type | Native Rossmann N<sup>b</sup> | Native NTF2 N<sup>b</sup> | De novo Rossmann | De novo NTF2 | Resampled subsets N<sup>e</sup> | Resampled subsets N<sup>e</sup> |
|---|---|---|---|---|---|---|---|
| All binding sites N<sup>a</sup> | fast | 6860 (248) | 8761 (795) | 9034 [2442] | 8909 [943] | | |
| | Rosetta | 5896 (212) | 7450 (580) | 7475 [1791] | 7548 [678] | | |
| 3 protein residue binding sites N<sup>a</sup> | fast | 3556 (324) | 5714 (1306) | 6537 [3305] | 6128 [1720] | 2544 | 3864 |
| | Rosetta | 2142 (199) | 3541 (807) | 3715 [1772] | 3686 [952] | 1500<sup>f</sup> {1677} | 2395<sup>f</sup> {2482} |
a Total number of binding sites in the library.
b Total number of scaffolds in the fold family.
c Numbers in parentheses are binding sites that cannot be matched to de novo scaffolds with the same topology.
d Numbers in square brackets are binding sites that cannot be matched to native scaffolds with the same topology.
e Numbers of scaffolds in randomly resampled subsets of scaffolds.
f Average numbers of matches to 100 randomly resampled subsets of scaffolds.
g Numbers in curly braces are from the best subset of scaffolds in the 100 randomly resampled subsets.
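As a worked reading of the table, the parenthesized and bracketed numbers allow the overlap between native and de novo matches to be computed. The sketch below uses the Rosetta row for the full library and assumes that the third and fourth data columns are the de novo Rossmann and de novo NTF2 families (an inference from footnotes c and d and from the abstract, since those header cells are missing). The dictionary keys and the helper `overlap_summary` are hypothetical names chosen for illustration.

```python
# Rosetta-matched counts for the full library, read from the table above.
# Assumption: data columns 3 and 4 are the de novo Rossmann and NTF2 families.
rosetta = {
    "Rossmann": {"native": 5896, "native_only": 212,
                 "de_novo": 7475, "de_novo_only": 1791},
    "NTF2": {"native": 7450, "native_only": 580,
             "de_novo": 7548, "de_novo_only": 678},
}

def overlap_summary(row):
    """Return the shared and combined (union) numbers of matched binding sites."""
    shared_from_native = row["native"] - row["native_only"]
    shared_from_de_novo = row["de_novo"] - row["de_novo_only"]
    # Both subtractions should give the same set size if the row is consistent.
    if shared_from_native != shared_from_de_novo:
        raise ValueError("native and de novo counts imply different overlaps")
    union = row["native"] + row["de_novo_only"]
    return {"shared": shared_from_native, "union": union}

summary = {fold: overlap_summary(row) for fold, row in rosetta.items()}
for fold, result in summary.items():
    print(fold, result)
# Rossmann {'shared': 5684, 'union': 7687}
# NTF2 {'shared': 6870, 'union': 8128}
```

Under this reading, native and de novo scaffolds together match 7,687 (Rossmann) and 8,128 (NTF2) binding sites with the Rosetta matcher.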
Dependency of matching success on binding site size (number of protein residues).
| Binding site size | Native Rossmann success count | Native Rossmann success rate | Native NTF2 success count | Native NTF2 success rate | De novo Rossmann success count | De novo Rossmann success rate | De novo NTF2 success count | De novo NTF2 success rate |
|---|---|---|---|---|---|---|---|---|
| 2 | 4590 | 80.9% | 5340 | 94.2% | 5328 | 93.8% | 5359 | 94.4% |
| 3 | 1182 | 21.4% | 1792 | 32.5% | 1853 | 33.4% | 1882 | 33.9% |
| 4 | 118 | 2.7% | 272 | 6.3% | 281 | 6.5% | 276 | 6.4% |
| 5 | 6 | 0.2% | 38 | 1.4% | 12 | 0.4% | 27 | 1.0% |
| 6 | 0 | 0 | 6 | 0.4% | 1 | 0.06% | 3 | 0.2% |
| 7 | 0 | 0 | 2 | 0.2% | 0 | 0 | 1 | 0.1% |
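The per-size success counts can be cross-checked against the Rosetta totals for the full library in the first table, and each count and rate pair gives a rough estimate of how many library binding sites have a given size. The sketch below assumes the four count and rate column pairs are, in order, native Rossmann, native NTF2, de novo Rossmann and de novo NTF2; the de novo labels are inferred rather than stated in the header. The variable and function names are hypothetical, and the size estimates are approximate because the published rates are rounded.

```python
# Success counts per binding site size (2 to 7 protein residues), from the table.
# Assumption: column pairs 3 and 4 are the de novo Rossmann and NTF2 families.
success_counts = {
    "Native Rossmann": [4590, 1182, 118, 6, 0, 0],
    "Native NTF2": [5340, 1792, 272, 38, 6, 2],
    "De novo Rossmann": [5328, 1853, 281, 12, 1, 0],
    "De novo NTF2": [5359, 1882, 276, 27, 3, 1],
}

# Rosetta totals for the full library, from the first table.
rosetta_totals = {
    "Native Rossmann": 5896,
    "Native NTF2": 7450,
    "De novo Rossmann": 7475,
    "De novo NTF2": 7548,
}

totals = {family: sum(counts) for family, counts in success_counts.items()}
for family, total in totals.items():
    status = "matches" if total == rosetta_totals[family] else "differs from"
    print(f"{family}: {total} {status} the Rosetta total")

def implied_library_size(success_count, success_rate_percent):
    """Estimate the number of library sites of one size from count and rate."""
    return success_count / (success_rate_percent / 100.0)

# Two-residue binding sites: (success count, success rate in percent) per family.
size_two = [(4590, 80.9), (5340, 94.2), (5328, 93.8), (5359, 94.4)]
estimates = [implied_library_size(count, rate) for count, rate in size_two]
print([round(estimate) for estimate in estimates])
```

The four estimates for two-residue sites should agree closely with one another, since every fold family is matched against the same binding site library.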