Shivangi Gupta¹, Jerome Baudry², Vineetha Menon¹.
Abstract
In living cells, proteins bind small molecules (or "ligands") through a "conformational selection" mechanism, in which a subset of protein structures can bind the small molecules well while most other protein structures cannot. The present work uses machine learning approaches to identify, within a very large number of protein:ligand complexes, which protein properties are associated with the capacity to bind small molecules. To do so, we calculate 40 physicochemical properties for about 1.5 million protein conformations (ligand and protein conformations). This work describes a machine learning approach to identify the physicochemical descriptors of a protein that maximize the prediction rate of potential binding conformations for the test-case proteins ADORA2A (Adenosine A2a Receptor), ADRB2 (Adrenoceptor Beta 2) and OPRK1 (Opioid Receptor Kappa 1). We find that appropriate machine learning techniques can increase the identification of "binding protein conformations" within an otherwise very large ensemble of protein conformations by an order of magnitude compared to random selection of protein conformations. This opens the door to the systematic identification of such "binding conformations" for proteins and provides a big data approach to the conformational selection mechanism.
Keywords: big data; deep learning; drug discovery; feature selection; machine learning; protein conformation selection
Year: 2022 PMID: 35458706 PMCID: PMC9025728 DOI: 10.3390/molecules27082509
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Flowchart of the proposed sampling-based framework with enrichment ratio.
Protein descriptors for the ADORA2A, ADRB2 and OPRK1 datasets. The descriptors were calculated using the protein descriptors of the program MOE [4,5,17].
| Protein Property | Description |
|---|---|
| pro_pI_seq ** | Sequence based pI |
| pro_mass | Protein Mass |
| pro_debye | Debye screening length (characteristic thickness of the electrical double layer) |
| pro_pI_3D | Structure-based pI Prediction |
| pro_coeff_280 | Extinction coefficient at 280 nm |
| pro_coeff_fric | Frictional Coefficient |
| pro_coeff_diff | Diffusion coefficient |
| pro_r_gyr | Radius of Gyration |
| pro_r_solv | Hydrodynamic Radius |
| pro_sed_const | Sedimentation Constant |
| pro_eccen | Protein Eccentricity |
| pro_asa_vdw | Water Accessible Surface Area |
| pro_asa_hyd | Hydrophobic Surface Area |
| pro_asa_hph | Hydrophilic Surface Area |
| pro_volume | Protein Volume |
| pro_mobility | Protein Mobility |
| pro_helicity | Protein Helix Ratio |
| pro_henry | Henry’s Function f(ka) |
| pro_net_charge | Protein Net Charge |
| pro_app_charge | Protein Charge at Debye Length |
| pro_dipole_moment | Protein Dipole Moment |
| pro_hyd_moment | Hydrophobicity moment |
| pro_zeta | Zeta Potential |
| pro_zdipole | Zeta Dipole Moment |
| pro_zquadrupole | Zeta Quadrupole Moment |
| pro_patch_hyd | Area of hydrophobic protein patch(es) |
| pro_patch_hyd_1 | Area of the largest hydrophobic protein patch |
| pro_patch_hyd_2 | Area of the 2 largest hydrophobic protein patches |
| pro_patch_hyd_3 | Area of the 3 largest hydrophobic protein patches |
| pro_patch_hyd_4 | Area of the 4 largest hydrophobic protein patches |
| pro_patch_hyd_5 | Area of the 5 largest hydrophobic protein patches |
| pro_patch_hyd_n | Count of hydrophobic protein patch(es) |
| pro_patch_ion | Area of ionic protein patch(es) |
| pro_patch_ion_1 | Area of the largest ionic protein patch |
| pro_patch_ion_2 | Area of the 2 largest ionic protein patches |
| pro_patch_ion_3 | Area of the 3 largest ionic protein patches |
| pro_patch_ion_4 | Area of the 4 largest ionic protein patches |
| pro_patch_ion_5 | Area of the 5 largest ionic protein patches |
| pro_patch_ion_n | Count of ionic protein patch(es) |
| pro_patch_neg | Area of negative protein patch(es) |
| pro_patch_neg_1 | Area of the largest negative protein patch |
| pro_patch_neg_2 | Area of the 2 largest negative protein patches |
| pro_patch_neg_3 | Area of the 3 largest negative protein patches |
| pro_patch_neg_4 | Area of the 4 largest negative protein patches |
| pro_patch_neg_5 | Area of the 5 largest negative protein patches |
| pro_patch_neg_n | Count of negative protein patch(es) |
| pro_patch_pos | Area of positive protein patch(es) |
| pro_patch_pos_1 | Area of the largest positive protein patch |
| pro_patch_pos_2 | Area of the 2 largest positive protein patches |
| pro_patch_pos_3 | Area of the 3 largest positive protein patches |
| pro_patch_pos_4 | Area of the 4 largest positive protein patches |
| pro_patch_pos_5 | Area of the 5 largest positive protein patches |
| pro_patch_pos_n | Count of positive protein patch(es) |
Note **: ADRB2 has one additional feature, pro_pI_seq.
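These descriptors are computed by MOE, whose documentation gives their exact definitions. To illustrate what one of them measures, the sketch below computes a radius of gyration (pro_r_gyr) directly from atomic coordinates. It assumes the standard definition R_g = sqrt(Σ m_i |r_i − r_c|² / Σ m_i), with r_c the mass-weighted centre; MOE's own atom selection, weighting and units may differ, and `radius_of_gyration` is a hypothetical helper, not part of MOE.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a set of atoms (assumed standard definition).

    coords: (N, 3) array of atomic coordinates, e.g. in Angstrom.
    masses: optional (N,) array of atomic masses; if omitted, every atom
            gets the same weight (geometric radius of gyration).
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)
    # Mass-weighted centre of the atom set
    center = np.average(coords, axis=0, weights=masses)
    # Squared distance of each atom from that centre
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    # Square root of the mass-weighted mean squared distance
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Four unit-mass atoms at the corners of a 2 x 2 square in the xy-plane
square = [[1, 1, 0], [1, -1, 0], [-1, 1, 0], [-1, -1, 0]]
print(radius_of_gyration(square))  # sqrt(2), about 1.414
```

Each corner lies at squared distance 2 from the centre, so R_g = sqrt(2); passing real atomic masses switches the same function to the mass-weighted form.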
Figure 2. Enrichment ratio framework.
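The enrichment ratio (ER) compares how often binding conformations occur among the conformations a classifier ranks highest with how often they occur in the whole ensemble, so an ER of 1 corresponds to random selection and an ER of 10 to an order-of-magnitude gain. The sketch below is a minimal version under stated assumptions: it assumes ER = (binders in the selected subset / size of the subset) / (binders in the ensemble / size of the ensemble), and that "% of Data Used" in the tables means the top-ranked fraction kept. Filters A, B and C are not modelled, and `enrichment_ratio` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def enrichment_ratio(scores, is_binder, fraction):
    """Enrichment ratio (ER) of the top `fraction` of a ranked ensemble.

    scores:    classifier scores, higher = more likely a binding conformation
    is_binder: boolean labels, True for known binding conformations
    fraction:  share of the ranked ensemble to keep, e.g. 0.005 for 0.5%
    """
    scores = np.asarray(scores, dtype=float)
    is_binder = np.asarray(is_binder, dtype=bool)
    n_selected = max(1, int(round(fraction * len(scores))))
    # Keep the n_selected highest-scoring conformations
    top = np.argsort(-scores)[:n_selected]
    hit_rate_selected = is_binder[top].mean()
    # Expected hit rate when conformations are picked at random
    hit_rate_random = is_binder.mean()
    return float(hit_rate_selected / hit_rate_random)

# Toy ensemble: 1000 conformations, 20 of them (2%) binding
labels = np.zeros(1000, dtype=bool)
labels[:20] = True
scores = np.zeros(1000)
scores[:8] = 1.0     # 8 binders ranked at the top...
scores[20:22] = 1.0  # ...together with 2 non-binders
print(enrichment_ratio(scores, labels, fraction=0.01))  # (8/10) / (20/1000), about 40
```

Because the denominator is the hit rate of random picking, the ratio is undefined when the ensemble contains no binders; real data would need that case handled explicitly.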
Enrichment ratios for ADORA2A on the original dataset without feature selection, with a training size of 30%.
| Classifier | Maximum Enrichment Ratio | Filter | % of Data Used | Minimum Enrichment Ratio | Filter | % of Data Used |
|---|---|---|---|---|---|---|
| LR + SMOTE − KNN | 11.0 | Filter C | 0.5% | 10.1 | Filter A | 0.5% |
| LR + SMOTE − GB | 10.7 | Filter B | 0.5% | 10.1 | Filter C | 1.0% |
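The classifiers in these tables use SMOTE (Synthetic Minority Over-sampling Technique), a standard way of rebalancing a dataset in which one class, plausibly the rare binding conformations here, is heavily outnumbered. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The NumPy sketch below implements only that interpolation step; the function name, the default k and the toy data are assumptions, and production code would more typically use `imblearn.over_sampling.SMOTE` from imbalanced-learn.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    X_min: (n, d) array of minority-class samples, n >= 2.
    Each synthetic point lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # For each new point, pick a base sample and one of its neighbours
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])

# Toy data: 6 minority samples described by 40 descriptors each
X_bind = np.random.default_rng(0).normal(size=(6, 40))
X_syn = smote(X_bind, n_new=100, k=3, seed=1)
print(X_syn.shape)  # (100, 40)
```

Oversampling is normally applied to the training split only (30% in these tables), so that synthetic points derived from test conformations cannot leak into training.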
Enrichment ratios for ADRB2 on the original dataset without feature selection, with a training size of 30%.
| Classifier | Maximum Enrichment Ratio | Filter | % of Data Used | Minimum Enrichment Ratio | Filter | % of Data Used |
|---|---|---|---|---|---|---|
| LR + SMOTE − KNN | 21.7 | Filter B | 1.0% | 11.2 | Filter C | 0.5% |
| LR + SMOTE − GB | 8.3 | Filter B | 1.0% | 4.2 | Filter A | 5.0% |
Enrichment ratios for OPRK1 on the original dataset without feature selection, with a training size of 30%.
| Classifier | Maximum Enrichment Ratio | Filter | % of Data Used | Minimum Enrichment Ratio | Filter | % of Data Used |
|---|---|---|---|---|---|---|
| LR + SMOTE − KNN | 20.1 | Filter A | 0.5% | 18.5 | Filter B | 10% |
| LR + SMOTE − GB | 13.3 | Filter C | 0.5% | 3.9 | Filter A | 0.5% |
Selected features common to ADORA2A and OPRK1 with a feature score of 3.
| pro_asa_vdw | pro_hyd_moment | pro_asa_hyd | pro_patch_neg_n | pro_zquadrupole |
Selected features common to ADORA2A and ADRB2 with a feature score of 3.
| pro_hyd_moment |
Selected features common to ADRB2 and OPRK1 with a feature score of 3.
| pro_patch_hyd | pro_patch_hyd_5 | pro_patch_neg_1 |
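The feature score is not defined in this section. One plausible reading, assumed here purely for illustration, is that it counts how many feature-selection methods (three in this sketch) picked a descriptor, so a score of 3 means every method selected it; the common features are then the intersection of two proteins' score-3 sets. The method selections below are toy inputs, not the paper's results, and `feature_scores` and `features_with_score` are hypothetical helpers.

```python
from collections import Counter

def feature_scores(selections):
    """Score = number of selection methods that picked each feature (assumed)."""
    return Counter(f for selected in selections for f in set(selected))

def features_with_score(selections, score):
    """Return the set of features whose score equals `score`."""
    return {f for f, s in feature_scores(selections).items() if s == score}

# Toy outputs of three hypothetical feature-selection methods per protein
adora2a_methods = [
    {"pro_asa_vdw", "pro_hyd_moment", "pro_volume"},
    {"pro_asa_vdw", "pro_hyd_moment", "pro_zeta"},
    {"pro_asa_vdw", "pro_hyd_moment", "pro_volume"},
]
oprk1_methods = [
    {"pro_hyd_moment", "pro_zeta"},
    {"pro_hyd_moment", "pro_zeta", "pro_mass"},
    {"pro_hyd_moment", "pro_zeta"},
]

# Features that every method selected for both proteins
common = features_with_score(adora2a_methods, 3) & features_with_score(oprk1_methods, 3)
print(sorted(common))  # ['pro_hyd_moment']
```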