George Nicolae Daniel Ion, George Mihai Nitulescu.
Abstract
Protein kinases play a pivotal role in signal transduction, protein synthesis, cell growth and proliferation. Their deregulation represents the basis of pathogenesis for numerous diseases such as cancer and pathologies with cardiovascular, nervous and inflammatory components. Protein kinases are an important target in the pharmaceutical industry, with 48 protein kinase inhibitors (PKI) already approved on the market as treatments for different afflictions including several types of cancer. The present work focuses on facilitating the identification of new PKIs with antitumoral potential through the use of data-mining and basic statistics. The National Cancer Institute (NCI) granted access to the results of numerous previously tested compounds on 60 tumoral cell lines (NCI-60 panel). Our approach involved analyzing the NCI database to identify compounds that presented similar growth inhibition (GI) profiles to that of existing PKIs, but different from approved oncologic drugs with other mechanisms of action, using descriptive statistics and statistical outliers. Starting from 34,000 compounds present in the database, we filtered 400 which displayed selective inhibition on certain cancer cell lines similar to that of several already-approved PKIs.
Keywords: NCI-60 cells; anti-proliferative fingerprint; anticancer drug screening; data-mining; drug discovery; drug repurposing; protein kinase inhibitors; targeted therapy
Year: 2020 PMID: 32290461 PMCID: PMC7221881 DOI: 10.3390/molecules25081766
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Table 1. Descriptive statistics for the testing and predictive sets.
| Descriptives | Testing Set | Predictive Set: PKI Group | Predictive Set: AOD Group |
|---|---|---|---|
| No. of compounds | 9137 | 18 | 80 |
| No. of cell lines | 60 | 60 | 60 |
| Total datapoints | 548,220 | 1080 | 4800 |
| Missing values | 33,861 | 49 | 122 |
| Average no. of datapoints/compound | 56.29 | 57.28 | 58.47 |
| Average pGI50 value | 5.10 | 5.74 | 5.46 |
| Total no. of outliers | 36,570 | 69 | 253 |
| No. of outliers/compound | 1–22 | 0–8 | 0–14 |
| Range* | 0.002–6.65 | 0.54–3.98 | 0.12–4.4 |
| Standard deviation | 0.0003–1.6617 | 0.1168–1.3622 | 0.028–0.9999 |
No. = number; Avg* = average value; Range* (for a compound) = difference between the maximum value and minimum value for all of the compound’s datapoints; PKI = protein kinase inhibitors; AOD = approved oncologic drugs; pGI50 = negative log values of 50% growth inhibition concentration.
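The counts in the table above are linked by simple arithmetic: every compound has one potential datapoint per cell line, and the average number of datapoints per compound follows once the missing values are subtracted. The following is a minimal Python sketch of that check, together with the pGI50 definition from the footnote. The base-10 logarithm and molar units in `pgi50` are assumptions, since the footnote only says "negative log values"; the function names are illustrative.

```python
import math


def pgi50(gi50_molar: float) -> float:
    """Negative log of the 50% growth inhibition concentration.

    Assumption: base-10 logarithm and a concentration in mol/L.
    """
    return -math.log10(gi50_molar)


def avg_datapoints(n_compounds: int, n_cell_lines: int, missing: int) -> float:
    """Average number of measured datapoints per compound."""
    total = n_compounds * n_cell_lines  # one potential value per cell line
    return (total - missing) / n_compounds


# (compounds, cell lines, missing values), copied from the table above
groups = {
    "testing": (9137, 60, 33861),
    "PKI": (18, 60, 49),
    "AOD": (80, 60, 122),
}
averages = {name: avg_datapoints(*g) for name, g in groups.items()}
```

Under these assumptions the computed averages agree with the reported 56.29, 57.28 and 58.47 to within rounding.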
Table 2. Cancer cell lines encompassed in the NCI-60 panel with their weights calculated based on lower and upper outlier frequency in each of the predictive set’s groups, sorted by represented tissue.
| Cell No. | Cell Line | Tumoral Tissue Type | Weight Factor (Upper Outliers) | Weight Factor (Lower Outliers) | Cell No. | Cell Line | Tumoral Tissue Type | Weight Factor (Upper Outliers) | Weight Factor (Lower Outliers) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CCRF-CEM | Leukemia | −4.44 | 0.00 | 31 | M14 | Melanoma | 5.56 | −2.50 |
| 2 | HL-60(TB) | Leukemia | −20.00 | 5.56 | 32 | MDA-MB-435 | Melanoma | 3.06 | 0.00 |
| 3 | K-562 | Leukemia | 11.67 | 11.11 | 33 | SK-MEL-2 | Melanoma | 1.81 | −1.25 |
| 4 | MOLT-4 | Leukemia | −6.94 | −1.25 | 34 | SK-MEL-28 | Melanoma | 11.11 | −3.75 |
| 5 | RPMI-8226 | Leukemia | −1.94 | 5.56 | 35 | SK-MEL-5 | Melanoma | 3.06 | 0.00 |
| 6 | SR | Leukemia | −6.39 | 5.56 | 36 | UACC-257 | Melanoma | 1.81 | −2.50 |
| 7 | A549/ATCC | NSCLC | −1.25 | −1.25 | 37 | UACC-62 | Melanoma | 1.81 | −1.25 |
| 8 | EKVX | NSCLC | 27.78 | −6.25 | 38 | IGROV1 | Ovarian | 1.81 | −1.25 |
| 9 | HOP-62 | NSCLC | 0.00 | 4.31 | 39 | OVCAR-3 | Ovarian | 0.00 | 3.06 |
| 10 | HOP-92 | NSCLC | 8.61 | −3.75 | 40 | OVCAR-4 | Ovarian | −1.25 | −7.50 |
| 11 | NCI-H226 | NSCLC | 11.11 | 4.31 | 41 | OVCAR-5 | Ovarian | 5.56 | −2.50 |
| 12 | NCI-H23 | NSCLC | −1.25 | −1.25 | 42 | OVCAR-8 | Ovarian | 0.00 | −1.25 |
| 13 | NCI-H322M | NSCLC | 27.78 | −5.00 | 43 | NCI/ADR-RES | Ovarian | −1.25 | −14.44 |
| 14 | NCI-H460 | NSCLC | −10.00 | −1.25 | 44 | SK-OV-3 | Ovarian | 9.86 | 9.86 |
| 15 | NCI-H522 | NSCLC | 1.81 | 0.00 | 45 | 786-0 | Renal | −3.75 | 4.31 |
| 16 | COLO 205 | Colon | 5.56 | −1.25 | 46 | A498 | Renal | 24.03 | 4.31 |
| 17 | HCC-2998 | Colon | −2.50 | −1.25 | 47 | ACHN | Renal | 15.42 | −5.00 |
| 18 | HCT-116 | Colon | 4.31 | 4.31 | 48 | CAKI-1 | Renal | 17.22 | −7.50 |
| 19 | HCT-15 | Colon | −2.50 | −7.50 | 49 | RXF 393 | Renal | 4.31 | 0.00 |
| 20 | HT29 | Colon | 9.86 | 0.00 | 50 | SN12C | Renal | 0.00 | −1.25 |
| 21 | KM12 | Colon | 16.67 | 4.31 | 51 | TK-10 | Renal | 16.67 | −6.25 |
| 22 | SW-620 | Colon | 0.00 | 5.56 | 52 | UO-31 | Renal | 3.06 | −8.75 |
| 23 | SF-268 | CNS | 0.00 | 0.00 | 53 | PC-3 | Prostate | −1.25 | −2.50 |
| 24 | SF-295 | CNS | −1.25 | 0.00 | 54 | DU-145 | Prostate | −2.50 | −1.25 |
| 25 | SF-539 | CNS | 4.31 | 5.56 | 55 | MCF7 | Breast | −2.50 | 0.00 |
| 26 | SNB-19 | CNS | 0.00 | −0.69 | 56 | MDA-MB-231/ATCC | Breast | −1.25 | −5.00 |
| 27 | SNB-75 | CNS | 6.11 | −1.25 | 57 | MDA-MB-468 | Breast | −3.75 | 0.00 |
| 28 | U251 | CNS | −1.25 | 5.56 | 58 | HS 578T | Breast | 7.36 | −2.50 |
| 29 | LOX IMVI | Melanoma | −3.75 | 0.00 | 59 | BT-549 | Breast | 1.81 | −3.75 |
| 30 | MALME-3M | Melanoma | 5.56 | −1.25 | 60 | T-47D | Breast | −3.75 | 1.81 |
NSCLC = non-small cell lung cancer; CNS = central nervous system; NCI-60 = National Cancer Institute’s previously tested compounds on 60 tumoral cell lines.
Table 3. Descriptive statistics for the 409 compounds identified as potential PKIs.
| Descriptives | Resulted Compounds Set |
|---|---|
| Number of compounds | 409 |
| Total datapoints | 24,540 |
| Missing values | 2207 |
| Average pGI50 | 5.90 |
| Total outliers | 1907 |
| No. outliers/compound | 1–19 (Avg = 4.66) |
| Range | 0.44–6.66 (Avg = 2.03) |
| Standard deviation | 0.0914–7.1358 (Avg = 0.4363) |
| Score values | 10–118.33 (Avg = 19.66) |
| SD of score values | 10.1343 |
| Upper fence values | 5.02–10.19 (Avg = 6.61) |
| SD of upper fence values | 0.7577 |
SD = standard deviation; Avg = average value.
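The upper fence values summarized above are computed per compound from its pGI50 profile. A minimal sketch, assuming the standard Tukey rule (upper fence = Q3 + 1.5 × IQR) and the "inclusive" quartile method of Python's `statistics.quantiles`; the quartile convention actually used is not stated in this section, and the profile below is hypothetical.

```python
import statistics


def upper_fence(values):
    """Tukey upper fence: Q3 + 1.5 * IQR.

    Assumption: quartiles use the 'inclusive' interpolation method.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def upper_outliers(profile):
    """Return (fence, outliers) for one compound.

    `profile` maps cell line name -> pGI50, with None for missing values.
    Missing values are skipped before the fence is computed.
    """
    measured = {k: v for k, v in profile.items() if v is not None}
    fence = upper_fence(list(measured.values()))
    return fence, {k: v for k, v in measured.items() if v > fence}


# Hypothetical pGI50 profile over seven cell lines (one value missing)
profile = {"A": 5.0, "B": 5.1, "C": 5.2, "D": 5.3, "E": 5.4, "F": 8.0, "G": None}
fence, hits = upper_outliers(profile)
```

In this toy profile only cell line `F` lies above the fence, which is the kind of selective sensitivity the outlier approach is meant to flag.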
Table 4. Descriptive statistics for the computed physicochemical properties of the 409 predicted potential PKIs.
| Property | Minimum | Maximum | Mean | Standard Deviation |
|---|---|---|---|---|
| Molecular weight | 119.19 | 1546.61 | 423.55 | 186.92 |
| cLogP | −13.21 | 13.71 | 3.25 | 2.90 |
| No. of H-Acceptors | 0 | 45 | 5.78 | 4.82 |
| No. of H-Donors | 0 | 28 | 1.51 | 2.62 |
| Total surface area | 91.17 | 1084.60 | 295.57 | 134.93 |
| Relative polar surface area | −0.01 | 0.69 | 0.23 | 0.12 |
| Molecular flexibility | 0 | 0.86 | 0.37 | 0.18 |
| Molecular complexity | 0.38 | 1.28 | 0.83 | 0.14 |
| No. of non-C/H atoms | 1 | 47 | 6.88 | 4.93 |
| No. of rotatable bonds | 0 | 40 | 5.89 | 6.12 |
| No. of ring closures | 0 | 20 | 3.40 | 2.00 |
| No. of aromatic rings | 0 | 8 | 2.06 | 1.49 |
Table 5. Top 10 best-scoring compounds.
| Compound | Score | Upper Fence | Upper Outlier Count | Outlier Cell Lines |
|---|---|---|---|---|
| NSC 693255 | 118.33 | 6.278 | 8 | ACHN, CAKI-1, EKVX, IGROV1, NCI-H322M, NCI-H522, SK-OV-3, TK-10 |
| NSC 686288 | 62.5 | 6.435 | 11 | A498, CAKI-1, IGROV1, K-562, MCF7, NCI-H460, OVCAR-5, SW-620, T-47D, TK-10, UACC-257 |
| NSC 665910 | 54.16 | 7.016 | 2 | A549/ATCC, HS 578T, NCI-H226, NCI-H322M, SF-295, SK-OV-3, SNB-19, TK-10 |
| NSC 669364 | 52.36 | 6.225 | 5 | ACHN, DU-145, EKVX, NCI-H522, SK-OV-3 |
| NSC 24112 | 52.08 | 6.120 | 6 | A498, HCT-116, HOP-92, HT29, K-562, SR |
| NSC 22323 | 51.94 | 7.234 | 9 | CCRF-CEM, EKVX, HCT-116, HOP-62, HOP-92, HT29, K-562, MOLT-4, NCI-H23, RPMI-8226 |
| NSC 61805 | 49.30 | 6.276 | 6 | COLO 205, EKVX, MCF7, MOLT-4, PC-3, RXF 393, TK-10 |
| NSC 239072 | 48.75 | 8.003 | 13 | A498, A549/ATCC, ACHN, COLO 205, HCC-2998, HL-60(TB), HOP-92, K-562, MDA-MB-231/ATCC, MDA-MB-435, NCI-H460, SF-295, TK-10 |
| NSC 676469 | 48.61 | 6.093 | 6 | ACHN, HCT-15, K-562, M14, TK-10, UACC-62 |
| NSC 650395 | 47.77 | 8.784 | 9 | A498, BT-549, CAKI-1, HCC-2998, HOP-62, HOP-92, MCF7, MDA-MB-231/ATCC, OVCAR-3, PC-3, RXF 393, SF-295, SK-OV-3, SNB-19, UACC-62 |
Upper outlier = any pGI50 value greater than the upper fence (Q3 + 1.5 × IQR); Q3 = third quartile; IQR = interquartile range; NSC = Cancer Chemotherapy National Service Center number.
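The reported scores appear consistent with summing the upper-outlier weight factors (from the cell line weight table) over a compound's outlier cell lines, at least for rows where the upper outlier count equals the number of listed cell lines. This is an inference from the published numbers, not a formula stated in this section, so the `score` function below is a hypothetical reconstruction.

```python
# Upper-outlier weight factors copied from the cell line weight table
UPPER_WEIGHT = {
    "ACHN": 15.42, "CAKI-1": 17.22, "DU-145": -2.50, "EKVX": 27.78,
    "IGROV1": 1.81, "NCI-H322M": 27.78, "NCI-H522": 1.81,
    "SK-OV-3": 9.86, "TK-10": 16.67,
}


def score(outlier_cell_lines):
    """Hypothetical scoring rule: sum of upper-outlier weight factors."""
    return sum(UPPER_WEIGHT[name] for name in outlier_cell_lines)


# Outlier cell lines as listed in the top 10 table
nsc_693255 = score(["ACHN", "CAKI-1", "EKVX", "IGROV1",
                    "NCI-H322M", "NCI-H522", "SK-OV-3", "TK-10"])
nsc_669364 = score(["ACHN", "DU-145", "EKVX", "NCI-H522", "SK-OV-3"])
```

Both sums land within about 0.02 of the reported scores (118.33 and 52.36); the small gap is plausibly due to rounding of the published weights. Rows that list more cell lines than the upper outlier count may also include lower outliers, which would call for the lower-outlier weights instead.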
Figure 1. (a) Distribution of the pGI50 upper outlier values obtained for the NCI-60 cancer cell lines for the 18 PKIs from the predictive set. (b) Distribution of the pGI50 upper outlier values obtained for the NCI-60 cancer cell lines for the top 10 best-scoring compounds illustrated in Table 5. Cell line numbers correspond to cell line names as initially presented in Table 2 and are separated by tissue types for easier interpretation. The graph shows colored bars corresponding to a certain compound wherever the pGI50 value was identified as being an upper outlier for that compound. By comparison, some similarities between profiles of some of the PKIs from the predictive set and the top 10 predicted potential PKIs can be observed.
Figure 2. Chemical structures of the top 10 best-scoring compounds based on the prediction algorithm.
Figure 3. Receiver operating characteristic (ROC) analysis diagram. Maximum area under the curve = 0.952, with standard error = 0.025.
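For readers unfamiliar with how an area under the ROC curve is obtained, the following is a minimal sketch using the rank-based (Mann-Whitney) formulation. The scores and labels are hypothetical: this section does not specify which compounds were treated as positives or which software produced the reported value.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive has a
    higher score than a randomly chosen negative; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical compound scores; label 1 = known PKI, 0 = other drug
scores = [35.0, 22.5, 18.0, 12.0, 9.0, 4.0]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.952 should be read.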
Table 6. Top 10 predicted potential PKIs correlated with known PKI drugs using the COMPARE algorithm.
| Compound | Correlation Found with at Least One Compound (Pearson > 0.4) | Correlation with PKI (Correlation Score) |
|---|---|---|
| NSC 693255 | Yes | erlotinib (0.76) |
| NSC 686288 | No | - |
| NSC 665910 | No | - |
| NSC 669364 | Yes | erlotinib (0.63) |
| NSC 24112 | Yes | imatinib (0.44) |
| NSC 22323 | Yes | imatinib (0.44) |
| NSC 61805 | No | - |
| NSC 239072 | No | - |
| NSC 676469 | No | - |
| NSC 650395 | No | - |
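The correlation criterion in the table can be illustrated with a plain Pearson coefficient between two pGI50 profiles, compared against the 0.4 threshold. This is a minimal sketch and not the COMPARE implementation itself; the pairwise handling of missing values and the example profiles are assumptions made for illustration.

```python
import math

THRESHOLD = 0.4  # Pearson cutoff used in the table above


def pearson(x, y):
    """Pearson correlation over positions where both profiles have a value.

    Assumption: missing measurements are encoded as None and dropped pairwise.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two shared datapoints")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pGI50 profiles over the same six cell lines
candidate = [5.1, 5.0, 7.2, None, 5.3, 6.8]
reference = [5.4, 5.2, 7.9, 6.0, None, 7.1]
r = pearson(candidate, reference)
is_match = r > THRESHOLD
```

Only the four cell lines measured in both profiles contribute here, so a correlation built on very few shared datapoints should be read with caution.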