Valeria V Kleandrova, Alejandro Speck-Planche.
Abstract
Pancreatic cancer (PANC) is a dangerous type of cancer that is a major cause of mortality worldwide and exhibits a remarkably poor prognosis. To date, discovering anti-PANC agents remains a very complex and expensive process. Computational approaches can accelerate the search for anti-PANC agents. We report for the first time two models that combined perturbation theory with machine learning via a multilayer perceptron network (PTML-MLP) to perform the virtual design and prediction of molecules that can simultaneously inhibit multiple PANC cell lines and PANC-related proteins, such as caspase-1, tumor necrosis factor-alpha (TNF-alpha), and the insulin-like growth factor 1 receptor (IGF1R). Both PTML-MLP models exhibited accuracies higher than 78%. Using the interpretation from one of the PTML-MLP models as a guideline, we extracted different molecular fragments desirable for the inhibition of the PANC cell lines and the aforementioned PANC-related proteins and then assembled some of those fragments to form three new molecules. The two PTML-MLP models predicted the designed molecules as potentially versatile anti-PANC agents through inhibition of the three PANC-related proteins and multiple PANC cell lines. Conclusions: This work opens new horizons for the application of the PTML modeling methodology to anticancer research.
Keywords: IGF1R; MLP; TNF-alpha; caspase-1; cell line; fragment; multi-target; pancreatic cancer; virtual design
Year: 2022 PMID: 35203699 PMCID: PMC8962338 DOI: 10.3390/biomedicines10020491
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
Experimental conditions reported in this work.
| Measure a | Cutoff b | Target c | Assay information d |
|---|---|---|---|
| IC50 (nM)p | ≤1100 nM | Caspase-1 | B (assay format) |
| | | Caspase-1 | B (single protein format) |
| | | Caspase-1 | B (cell-based format) |
| | ≤1635 nM | TNF-alpha | B (single protein format) |
| | | TNF-alpha | F (assay format) |
| | | TNF-alpha | B (assay format) |
| | | TNF-alpha | B (cell-based format) |
| | | TNF-alpha | F (cell-based format) |
| | ≤50 nM | IGF1R | B (single protein format) |
| | | IGF1R | B (cell-based format) |
| | | IGF1R | B (assay format) |
| | | IGF1R | F (cell-based format) |
| | | IGF1R | F (assay format) |
| IC50 (nM)c | ≤6449.735 nM | PSN1 | F (cell-based format) |
| | | PANC-03-27 | F (cell-based format) |
| | | HPAC | F (cell-based format) |
| | | MZ1-PC | F (cell-based format) |
| | | KP-4 | F (cell-based format) |
| | | KP-2 | F (cell-based format) |
| | | PA-TU-8988T | F (cell-based format) |
| | | Capan-2 | F (cell-based format) |
| | | MIA-PaCa-2 | F (cell-based format) |
| | | CFPAC-1 | F (cell-based format) |
| | | PANC-10-05 | F (cell-based format) |
| | | BxPC-3 | F (cell-based format) |
| | | SUIT-2 | F (cell-based format) |
| | | KP-1N | F (cell-based format) |
| | | HuP-T4 | F (cell-based format) |
| | | SW1990 | F (cell-based format) |
| | | PL18 | F (cell-based format) |
| | | QGP-1 | F (cell-based format) |
| | | HuP-T3 | F (cell-based format) |
| | | SU8686 | F (cell-based format) |
| | | PL4 | F (cell-based format) |
| | | PA-TU-8902 | F (cell-based format) |
| | | PANC-02-03 | F (cell-based format) |
| | | DAN-G | F (cell-based format) |
| | | CAPAN-1 | F (cell-based format) |
| | | PANC-08-13 | F (cell-based format) |
| | | HPAF-II | F (cell-based format) |
| | | KP-3 | F (cell-based format) |
| | | YAPC | F (cell-based format) |
| | | AsPC-1 | F (cell-based format) |
| | | PANC-04-03 | F (cell-based format) |
a Measure of biological activity; IC50 (nM)p is the concentration required for 50% inhibition of a protein, while IC50 (nM)c is the concentration required for a chemical to inhibit cell viability by 50%. b Value of activity from which a molecule was labeled and considered as active (IAi(cj) = 1). c Refers to the targets (either a protein or a PANC cell line). d Information related to the diverse experimental assays. Here, each annotation is a combination of the columns “assay type” (first letter) and “BioAssay Ontology” (phrase between parentheses), which are reported in any ChEMBL file containing bioactivity data. Each assay involving a PANC cell line was annotated as “F (cell-based format)”.
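The labeling rule in footnote b (a case is active, IAi(cj) = 1, when its IC50 is at or below the cutoff for its target) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the choice to treat every non-protein target as a PANC cell line are assumptions; only the cutoff values come from the table.

```python
# Cutoffs in nM, copied from the table above.
# The dictionary layout and names are illustrative assumptions.
PROTEIN_CUTOFFS_NM = {
    "Caspase-1": 1100.0,
    "TNF-alpha": 1635.0,
    "IGF1R": 50.0,
}
CELL_LINE_CUTOFF_NM = 6449.735  # shared by all PANC cell lines in the table


def activity_label(target: str, ic50_nm: float) -> int:
    """Return 1 (active) if IC50 <= cutoff for the target, else 0 (inactive).

    Assumption: any target that is not one of the three proteins is
    one of the PANC cell lines listed in the table.
    """
    cutoff = PROTEIN_CUTOFFS_NM.get(target, CELL_LINE_CUTOFF_NM)
    return 1 if ic50_nm <= cutoff else 0
```

For example, `activity_label("IGF1R", 40.0)` gives 1, while the same IC50 of 2000 nM would be inactive against TNF-alpha but active against a cell line such as PSN1.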
Molecular descriptors of the type D(GTI)cj present in the PTML-MLP models.
| Model a | Symbology b | Code c | Concept |
|---|---|---|---|
| Model 1 | | | Deviation of the normalized spectral moment of order 3 based on hydrophobicity-weighted bonds. |
| | | | Deviation of the normalized Kier–Hall (valence) connectivity index involving only path-based subgraphs of order 4. |
| | | | Deviation of the normalized edge (bond) connectivity index involving only path-based subgraphs of order 1. |
| | | | Deviation of the normalized edge (bond) connectivity index involving only path-based subgraphs of order 2. |
| | | | Deviation of the normalized edge (bond) connectivity index involving only chain-based subgraphs of order 6. |
| | | | Deviation of the spectral moment of order 7 based on hydrophobicity-weighted bonds. |
| | | | Deviation of the edge (bond) connectivity index involving only chain-based subgraphs of order 5. |
| | | | Deviation of the normalized spectral moment of order 1 based on bonds weighted by the polar surface area. |
| | | | Deviation of the normalized spectral moment of order 3 based on bonds weighted by the Gasteiger–Marsili charges. |
| | | | Deviation of the normalized Kier–Hall (valence) connectivity index involving only path-based subgraphs of order 1. |
| | | | Deviation of the Kier–Hall (valence) connectivity index involving only chain-based subgraphs of order 6. |
| | | | Deviation of the normalized spectral moment of order 1 based on hydrophobicity-weighted bonds. |
| | | | Deviation of the normalized spectral moment of order 1 based on bonds weighted by the molar refractivity. |
| | | | Deviation of the normalized edge (bond) connectivity index involving only path-based subgraphs of order 5. |
| | | | Deviation of the normalized edge (bond) connectivity index involving only path-cluster subgraphs of order 6. |
| Model 2 | | | Deviation of the stochastic atom-based local quadratic index weighted by the hydrophobicity of the halogens and their neighbor atoms located at the topological distance of 4. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the hydrophobicity of the heteroatoms (N, O, S, P, and Se) and their neighbor atoms located at the topological distance of 3. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the hydrophobicity of the heteroatoms (N, O, S, P, and Se) and their neighbor atoms located at the topological distance of 4. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the electronegativity of the heteroatoms (N, O, S, P, and Se) and their neighbor atoms located at the topological distance of 2. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the polar surface area of the heteroatoms (N, O, S, P, and Se) and their neighbor atoms located at the topological distance of 1. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the atomic weight of the aliphatic carbons and their neighbor atoms located at the topological distance of 1. |
| | | | Deviation of the stochastic atom-based local quadratic index (order 0) weighted by the Kupchik’s vertex degree of the halogens in a molecule. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the polar surface area of the heteroatoms (N, O, S, P, and Se) and their neighbor atoms located at the topological distance of 4. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the hydrophobicity of the halogens and their neighbor atoms located at the topological distance of 1. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the hydrophobicity of the halogens and their neighbor atoms located at the topological distance of 2. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the atomic weight of the halogens and their neighbor atoms located at the topological distance of 2. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the hydrophobicity of the aliphatic carbons (only methyl groups) and their neighbor atoms located at the topological distance of 1. |
| | | | Deviation of the stochastic atom-based local quadratic index weighted by the Kupchik’s vertex degree of the aliphatic carbons (only methyl groups) and their neighbor atoms located at the topological distance of 1. |
| | | | Deviation of the stochastic atom-based local quadratic index (order 0) weighted by the hydrophobicity of the heteroatoms (N, O, S, P, and Se) in a molecule. |
a Model 1, first PTML-MLP model, which contains the first 15 D(GTI)cj descriptors shown in this table; Model 2, second PTML-MLP model, which contains the remaining 14 D(GTI)cj descriptors shown in this table. b Molecular descriptors of the type D(GTI)cj ending in “ma” consider both the molecular structure and the measure of inhibitory activity. Those ending in “tg” depend on the molecular structure and the biological target (either a protein or a PANC cell line). Finally, D(GTI)cj descriptors ending in “ei” characterize the molecular structure and information on the diverse experimental assays. c Codes were used to abbreviate the representation of the D(GTI)cj descriptors.
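Each descriptor above is described as a "deviation" of a topological index that depends on one element of the experimental condition (measure, target, or assay information). A minimal sketch of one way to compute such a deviation is given here, assuming it is the index value minus the mean of that index over all cases sharing the same condition element. This record does not state the formula or the reference set used for the average, so the function, the field names (`gti`, `tg`), and the grouping choice are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def deviation_descriptors(records, condition_key):
    """Subtract, from each index value, the mean over its condition group.

    records: list of dicts holding an index value under "gti" and a
             condition element under `condition_key` (hypothetical layout).
    Assumption: the reference average is taken over all records sharing
    the same condition element; the source does not confirm this.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[condition_key]].append(rec["gti"])
    group_means = {cond: mean(values) for cond, values in groups.items()}
    return [rec["gti"] - group_means[rec[condition_key]] for rec in records]
```

With made-up index values 1.0 and 3.0 for two IGF1R cases and 5.0 for a single PSN1 case, grouping by target yields deviations of -1.0, 1.0, and 0.0, so the same molecule can receive different descriptor values under different targets.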
Statistical indices demonstrating the performances of the two PTML-MLP models.
| Symbols a,b | Model 1, Training Set | Model 1, Test Set | Model 2, Training Set | Model 2, Test Set |
|---|---|---|---|---|
| NActive | 3010 | 1001 | 3010 | 1001 |
| CCCActive | 2495 (1293) | 799 (447) | 2486 (1084) | 785 (352) |
| Sn(%) | 82.89% (42.96%) | 79.82% (44.66%) | 82.59% (36.01%) | 78.42% (35.16%) |
| NInactive | 4273 | 1421 | 4273 | 1421 |
| CCCInactive | 3817 (3693) | 1207 (1219) | 3813 (3625) | 1195 (1194) |
| Sp(%) | 89.33% (86.43%) | 84.94% (85.78%) | 89.23% (84.84%) | 84.10% (84.03%) |
| MCC | 0.724 (0.331) | 0.646 (0.338) | 0.721 (0.241) | 0.624 (0.222) |
a NActive, number of chemicals/cases annotated as active; NInactive, number of chemicals/cases designated as inactive; CCCActive, number of chemicals/cases correctly classified/predicted as active; CCCInactive, number of chemicals/cases correctly classified/predicted as inactive; Sn(%), sensitivity (percentage of chemicals/cases correctly classified as active); Sp(%), specificity (percentage of chemicals/cases correctly classified as inactive); MCC, Matthews correlation coefficient. b Values between parentheses correspond to models derived from the technique known as linear discriminant analysis (LDA).
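The indices defined in footnote a follow from the four counts using the standard definitions of sensitivity, specificity, and the Matthews correlation coefficient. The sketch below shows that arithmetic; the function name and argument names are illustrative and do not come from the source.

```python
from math import sqrt


def classification_indices(n_active, n_inactive, ccc_active, ccc_inactive):
    """Return (Sn %, Sp %, MCC) from class sizes and correct classifications."""
    tp, tn = ccc_active, ccc_inactive   # correctly classified actives/inactives
    fn = n_active - tp                  # actives predicted as inactive
    fp = n_inactive - tn                # inactives predicted as active
    sn = 100.0 * tp / n_active
    sp = 100.0 * tn / n_inactive
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc
```

Calling `classification_indices(3010, 4273, 2495, 3817)` with the first PTML-MLP training-set counts reproduces, to rounding, the tabulated 82.89%, 89.33%, and 0.724.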
Molecular descriptors of the type D(GTI)cj present in the first PTML-MLP model (Model 1) and their relative propensities.
| Codes a | Descriptors | Class-based mean b (Active) | Class-based mean b (Inactive) | Propensity c |
|---|---|---|---|---|
| | | −2.3485 × 10⁻² | 1.0631 × 10⁻¹ | Decrease |
| | | 6.6912 × 10⁻³ | 5.5922 × 10⁻³ | Increase |
| | | −3.2309 × 10⁻⁴ | 8.6125 × 10⁻² | Decrease |
| | | −4.7514 × 10⁻² | 1.4992 × 10⁻¹ | Decrease |
| | | −4.1335 × 10⁻² | 1.7654 × 10⁻¹ | Decrease |
| | | 3.1706 × 10⁻² | −2.5668 × 10⁻¹ | Increase |
| | | 5.2858 × 10⁻³ | −4.9427 × 10⁻² | Increase |
| | | 2.2379 × 10⁻² | −1.9481 × 10⁻² | Increase |
| | | 2.8122 × 10⁻² | −1.3897 × 10⁻¹ | Increase |
| | | −1.2919 × 10⁻² | 1.1367 × 10⁻¹ | Decrease |
| | | −4.9125 × 10⁻³ | −7.1198 × 10⁻² | Increase |
| | | −5.5366 × 10⁻² | 2.3959 × 10⁻¹ | Decrease |
| | | −4.2905 × 10⁻² | 2.2042 × 10⁻¹ | Decrease |
| | | −1.2401 × 10⁻² | −5.6792 × 10⁻² | Increase |
| | | 3.4465 × 10⁻² | −2.3273 × 10⁻¹ | Increase |
a Symbols of the different molecular descriptors of the type D(GTI)cj in Model 1 as represented in Table 2. b Average values of each D(GTI)cj descriptor by considering the active and inactive categories. c Relative tendency of a molecular descriptor to vary (increase or decrease) its value, resulting in a simultaneous enhancement of the inhibitory activity against PANC-related proteins (caspase-1, TNF-alpha, and IGF1R) and the PANC cell lines.
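The Propensity entries in the table above are consistent with a direct comparison of the two class-based means: "Increase" when the mean of the active class exceeds that of the inactive class, and "Decrease" otherwise. The short sketch below encodes that reading; it is inferred from the tabulated values rather than quoted from the authors, and the function name is illustrative.

```python
def propensity(active_mean: float, inactive_mean: float) -> str:
    """Direction a descriptor should move to favor the active class.

    Assumption: the direction is read off by comparing class-based means,
    which matches every row of the table but is not stated as the rule.
    """
    return "Increase" if active_mean > inactive_mean else "Decrease"
```

For the first row, `propensity(-2.3485e-2, 1.0631e-1)` returns "Decrease", and for the second row, `propensity(6.6912e-3, 5.5922e-3)` returns "Increase", matching the table.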
Figure 1. The different D(GTI)cj descriptors and their relative significances in Model 1.
Class-based means and relative propensities of the D(GTI)cj descriptors present in the second PTML-MLP model (Model 2).
| Codes a | Descriptors | Class-based mean b (Active) | Class-based mean b (Inactive) | Propensity c |
|---|---|---|---|---|
| | | −5.4750 × 10⁻³ | −4.6447 × 10⁻³ | Decrease |
| | | 2.6348 × 10⁻² | −1.8596 × 10⁻¹ | Increase |
| | | 3.5673 × 10⁻² | −1.3724 × 10⁻¹ | Increase |
| | | 5.0259 × 10⁻² | −2.2849 × 10⁻¹ | Increase |
| | | 3.8557 × 10⁻³ | 6.3789 × 10⁻² | Decrease |
| | | 5.9744 × 10⁻² | −3.4756 × 10⁻¹ | Increase |
| | | 3.9557 × 10⁻³ | 1.1249 × 10⁻¹ | Decrease |
| | | 3.1247 × 10⁻² | −1.2646 × 10⁻¹ | Increase |
| | | −1.1160 × 10⁻² | 4.0920 × 10⁻² | Decrease |
| | | 5.8725 × 10⁻⁴ | 3.3683 × 10⁻² | Decrease |
| | | 9.2750 × 10⁻³ | 1.9038 × 10⁻² | Decrease |
| | | 2.6208 × 10⁻² | −1.7829 × 10⁻¹ | Increase |
| | | 2.8400 × 10⁻² | −3.0151 × 10⁻¹ | Increase |
| | | 1.3478 × 10⁻² | −2.1995 × 10⁻² | Increase |
a Symbols of the different D(GTI)cj descriptors in the second PTML-MLP model according to Figure 2. b Average values of each D(GTI)cj descriptor by considering the active and inactive categories. c Relative tendency of each D(GTI)cj descriptor to vary (increase or decrease) its value, resulting in a simultaneous enhancement of the inhibitory activity against PANC-related proteins (caspase-1, TNF-alpha, and IGF1R) and the PANC cell lines.
Figure 2. Relative importance of the different D(GTI)cj descriptors in Model 2.
Figure 3. Generic molecular fragments directly extracted from the physicochemical and structural interpretation of the descriptors in Model 2. The descriptors are associated with different fragments. The symbols mean A = amino, hydroxyl, alkylamino, or alkoxy; G = halogen; Q = amino, hydroxyl, alkylamino, alkoxy, or a non-substituted secondary carbon; X = O, -NH-, or a secondary carbon; Z = N or aromatic carbon.
Figure 4. New molecules designed from suitable molecular fragments by using the physicochemical and structural interpretations as guidelines.
Physicochemical properties estimated for the designed molecules.
| ID a | nHDon | nHAcc | MW (Da) | MlogP | AlogP | MR (cm³/mol) | nAT | RBN | PSA (Å²) |
|---|---|---|---|---|---|---|---|---|---|
| MPMCI-001 | 4 | 8 | 311.41 | 1.429 | 0.675 | 83.009 | 44 | 6 | 104.37 |
| MPMCI-002 | 4 | 9 | 354.49 | 2.259 | 1.578 | 99.009 | 52 | 8 | 102.41 |
| MPMCI-003 | 2 | 9 | 340.46 | 2.151 | 1.515 | 92.717 | 49 | 8 | 92.27 |
a The physicochemical properties described in this table are as follows: number of hydrogen bond donors (nHDon), number of hydrogen bond acceptors (nHAcc), molecular weight (MW), logarithm of the octanol/water partition coefficient according to the Moriguchi approach (MlogP), logarithm of the octanol/water partition coefficient according to the Ghose–Crippen approach (AlogP), molar refractivity according to the Ghose–Crippen approach (MR), total number of atoms (nAT), number of rotatable bonds (RBN), and polar surface area (PSA).