Bin Zhou1,2, Qi Sun1,2, De-Xin Kong1,2.
Abstract
In this study, we proposed an improved algorithm for identifying proteins relevant to cancer, named the two-layer molecular similarity ensemble approach (TL-SEA). We applied TL-SEA to analyze the correlation between anticancer compounds (active against the cell lines K562, MCF7 and A549) and compounds active against individual target proteins listed in BindingDB. This chemoinformatics approach revealed several associations between cancer types and related proteins. An analysis of the literature showed that 26 of the 35 predicted proteins were correlated with cancer cell proliferation, apoptosis or differentiation. Interactions between proteins in BindingDB and anticancer chemicals were also predicted. We discuss the roles of the most important predicted proteins in cancer biology and conclude that TL-SEA could be a useful tool for inferring novel proteins involved in cancer and for revealing the underlying molecular mechanisms.
Keywords: cancer; cell line; chemoinformatics; drug development; similarity ensemble approach
Year: 2016 PMID: 27083051 PMCID: PMC5078021 DOI: 10.18632/oncotarget.8716
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
List of the predicted cancer-related proteins
| Protein name | Source | AS score (rank), K562 | AS score (rank), MCF7 | AS score (rank), A549 | Reference | |
|---|---|---|---|---|---|---|
| Melatonin receptor type 1B | Chicken | 0.0021 (1) | 0.0025 (1) | 0.0025 (1) | [ | |
| Tubulin beta-1 chain | Human | 0.0062 (2) | 0.0064 (3) | 0.0083 (4) | [ | |
| Tissue factor | Human | 0.0065 (3) | 0.0077 (6) | 0.0080 (3) | [ | |
| Atypical chemokine receptor 3 | Human | 0.0066 (4) | 0.0067 (4) | 0.0093 (6) | [ | |
| cAMP and cAMP-inhibited cGMP 3′,5′-cyclic phosphodiesterase 10A | Mouse | 0.0083 (5) | 0.0075 (5) | 0.0090 (5) | ||
| Prostaglandin G/H synthase 2 | Rat | 0.0093 (6) | 0.0084 (7) | 0.0103 (7) | [ | |
| Eukaryotic initiation factor 4A-I | Human | 0.0103 (7) | 0.0102 (8) | 0.0137 (9) | [ | |
| Macrophage metalloelastase | Mouse | 0.0125 (8) | 0.0118 (9) | 0.0149 (10) | [ | |
| Tubulin beta-2B chain | Bovine | 0.0139 (9) | 0.0124 (10) | 0.0155 (11) | [ | |
| Substance-K receptor | Mouse | 0.0149 (10) | 0.0143 (13) | 0.0175 (13) | [ | |
| Multidrug resistance protein 1B | Mouse | 0.0154 (11) | 0.0137 (12) | 0.0176 (14) | [ | |
| Aldo-keto reductase family 1 member C2 | Human | 0.0160 (12) | 0.0191 (19) | 0.0187 (15) | [ | |
| Sodium/iodide cotransporter | Rat | 0.0172 (13) | 0.0168 (15) | 0.0208 (19) | [ | |
| Receptor-type tyrosine-protein phosphatase C | Human | 0.0177 (14) | 0.0216 (22) | 0.0189 (16) | ||
| Toll-like receptor 4 | Mouse | 0.0180 (15) | 0.0201 (20) | 0.0286 (28) | [ | |
| Pituitary adenylate cyclase-activating polypeptide type I receptor | Human | 0.0188 (16) | 0.0176 (17) | 0.0171 (12) | [ | |
| Aryl hydrocarbon receptor | Rabbit | 0.0198 (17) | 0.0176 (16) | 0.0202 (17) | [ | |
| Potassium voltage-gated channel subfamily KQT member 2 | Human | 0.0226 (18) | 0.0261 (28) | 0.0267 (24) | [ | |
| Calcium-activated potassium channel subunit alpha-1 | Human | 0.0235 (19) | 0.0298 (35) | 0.0253 (23) | [ | |
| Luciferin 4-monooxygenase | Firefly | 0.0239 (20) | 0.0153 (14) | 0.0241 (20) | ||
| Monoacylglycerol lipase ABHD6 | Mouse | 0.0245 (21) | 0.0272 (29) | - | [ | |
| Melatonin receptor type 1B | Human | 0.0247 (22) | 0.0245 (24) | 0.0247 (22) | [ | |
| Vascular endothelial growth factor receptor 2 | Human | 0.0248 (23) | 0.0179 (18) | 0.0204 (18) | [ | |
| Potassium voltage-gated channel subfamily KQT member 1 | Human | 0.0268 (24) | 0.0216 (21) | 0.0244 (21) | ||
| Collagenase 3 | Rat | 0.0269 (25) | 0.0281 (31) | - | [ | |
| Melatonin receptor type 1A | Human | 0.0270 (26) | 0.0260 (27) | 0.0259 (24) | [ | |
| Prokineticin receptor 1 | Human | 0.0278 (27) | 0.0252 (26) | - | [ | |
| cGMP-inhibited 3′, 5′-cyclic phosphodiesterase B | Human | 0.0292 (28) | 0.0223 (23) | 0.0280 (27) | ||
| Neuropeptides B/W receptor type 1 | Human | 0.0294 (29) | 0.0284 (33) | - | [ | |
| cGMP-specific 3′,5′-cyclic phosphodiesterase | Human | 0.0296 (30) | 0.0282 (32) | - | [ | |
| Vasopressin V1b receptor | Rat | 0.0299 (31) | 0.0293 (34) | - | ||
| Endothelin-1 receptor | Mouse | - | 0.0033 (2) | 0.0043 (2) | ||
| Major prion protein | Sheep | - | 0.0134 (11) | 0.0122 (8) | ||
| Galanin receptor type 2 | Human | - | 0.0251 (25) | 0.0272 (26) | [ | |
| Voltage-dependent L-type calcium channel subunit alpha-1S | Human | - | 0.0275 (30) | - | ||
The list is sorted by K562 significance (AS score), then by MCF7. References for the proteins related to proliferation, apoptosis, or differentiation of cancer cells are listed in the last column.
Protein IDs are UniProt IDs [77].
"-" indicates an AS score larger than 0.03.
Figure 1. Scatter graph of the mean value (top) and standard deviation (bottom) of the random initial score (I) for different sampling lengths (m, horizontal axis)
(A) K562 dataset: fitting with formulae 2 and 3 gave the constant parameters a = 0.0088, b = 0.9950, k = 0.0088. (B) MCF7 dataset: a = 0.0086, b = 0.9952, k = 0.0090. (C) A549 dataset: a = 0.0083, b = 0.9969, k = 0.0089.
Figure 2. Chemical-protein association networks
NCI compounds are represented by triangle nodes and proteins by round nodes; the important proteins are shown as orange squares. (A) Main network of active compounds and proteins for the blastic-phase chronic myelogenous leukemia cell line (K562). Gray nodes denote nodes that do not appear in this system. (B) Main network for the non-small cell lung cancer cell line (A549). (C) Main network for the breast cancer cell line (MCF7). (C′) and (C″) are two sub-networks extracted from the MCF7 network. See text for details.
List of the proteins linked to at least 15 anticancer compounds according to Pz < 0.0001
| Protein_ID | Linked compounds, K562 | Linked compounds, MCF7 | Linked compounds, A549 |
|---|---|---|---|
| Q61614 | - | 59 | 58 |
| Q9H4B7 | 30 | 27 | 32 |
| P41586 | 17 | 20 | 25 |
| P07382 | 31 | 31 | 45 |
| P00378 | 30 | 28 | 42 |
| P11387 | 29 | - | - |
| P00375 | 22 | 25 | 26 |
| P49892 | 21 | 40 | 31 |
| Q6Y1R5 | 21 | 41 | 31 |
| P07900_P08238 | 18 | 15 | - |
| P22102 | 18 | 22 | 24 |
| P34970 | 18 | 37 | 28 |
| Q8TEK3 | 18 | 40 | 28 |
| O02747 | 17 | - | 21 |
| Q05932 | 17 | 23 | 25 |
| P17707 | - | 44 | 35 |
| O02667 | - | 31 | 22 |
| P23526 | - | 26 | 23 |
| P05227 | - | 19 | 22 |
| P15328 | - | 18 | 20 |
| P28647 | - | 18 | - |
| O00142 | - | 15 | - |
| P41148 | - | 15 | - |
| Q62645 | - | 15 | 17 |
| P48544 | - | - | 20 |
| P48549 | - | - | 19 |
| Q01782 | - | - | 15 |
"-" indicates fewer than 15 linked compounds.
predicted as a cancer-related protein.
Figure 3. The overall protocol of this study
Figure 4. Schematic representation of the TL-SEA algorithm
First, the target protein similarity matrix (Mt) is extracted from the overall NCI-BindingDB similarity matrix (M). The matrix is then translated into an initial score vector, which is normalized to a Z score vector through random column sampling. Finally, the association score (AS) is calculated from the Z score vector and a second random sampling of random similarity matrices. Here, n is the number of active compounds for an NCI cell line. S and S′ are similarity values between an NCI compound and a BindingDB compound. I is the sum of the similarity values over 0.15 in the corresponding column. Refer to the text for a detailed description.
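The first two steps of the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: rows of `m` are the n active NCI compounds and columns are BindingDB compounds, "over 0.15" is read as strictly greater than 0.15, and the background mean and standard deviation are estimated empirically from randomly sampled columns rather than from the fitted formulae 2 and 3, which are not reproduced in this excerpt. The function names `initial_scores` and `z_scores` are hypothetical, and the final AS calculation is omitted because its formula is not given here.

```python
import numpy as np

THRESHOLD = 0.15  # similarity cutoff stated in the Figure 4 caption


def initial_scores(mt, threshold=THRESHOLD):
    """Return one initial score I per column of mt.

    Each score is the sum of the similarity values in that column
    that exceed the threshold (assumed to be a strict comparison).
    """
    mt = np.asarray(mt, dtype=float)
    return np.where(mt > threshold, mt, 0.0).sum(axis=0)


def z_scores(m, target_cols, n_samples=1000, seed=0):
    """Normalize the initial scores of the target's columns to Z scores.

    Assumption: the background distribution is estimated empirically
    from columns of m sampled at random with replacement, not from
    the fitted formulae of the paper.
    """
    m = np.asarray(m, dtype=float)
    rng = np.random.default_rng(seed)
    random_cols = rng.integers(0, m.shape[1], size=n_samples)
    background = initial_scores(m[:, random_cols])
    mu = background.mean()
    sigma = background.std(ddof=1)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    # Mt is the sub-matrix of m restricted to the target's ligands.
    return (initial_scores(m[:, target_cols]) - mu) / sigma
```

For example, `z_scores(m, target_cols=[0, 1, 2])` would return three Z scores, one for each BindingDB ligand column assigned to the hypothetical target.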