John C Boik, Robert A Newman.
Abstract
BACKGROUND: Quantitative structure-activity relationship (QSAR) models have become popular tools to help identify promising lead compounds in anticancer drug development. Few QSAR studies have investigated multitask learning, however. Multitask learning is an approach that allows distinct but related data sets to be used in training. In this paper, a suite of three QSAR models is developed to identify compounds that are likely to (a) exhibit cytotoxic behavior against cancer cells, (b) exhibit high rat LD50 values (low systemic toxicity), and (c) exhibit low to modest human oral clearance (favorable pharmacokinetic characteristics). Models were constructed using Kernel Multitask Latent Analysis (KMLA), an approach that can effectively handle a large number of correlated data features, nonlinear relationships between features and responses, and multitask learning. Multitask learning is particularly useful when the number of available training records is small relative to the number of features, as was the case with the oral clearance data.
Year: 2008 PMID: 18554402 PMCID: PMC2442056 DOI: 10.1186/1471-2210-8-12
Source DB: PubMed Journal: BMC Pharmacol ISSN: 1471-2210
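The result tables in this record compare models built with linear and Gaussian kernels. As a rough illustration of what that choice means, the sketch below computes both kernels for a few toy feature vectors. It is not the KMLA algorithm itself; the function names, the `sigma` width parameter, and the toy vectors are all assumptions introduced here for illustration.

```python
import math

def linear_kernel(x, y):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """exp(-||x - y||^2 / (2 * sigma^2)); sigma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(rows, kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(r, s) for s in rows] for r in rows]

# Toy descriptor vectors, invented purely for illustration.
toy = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
K_lin = kernel_matrix(toy, linear_kernel)
K_gauss = kernel_matrix(toy, gaussian_kernel)
```

A linear kernel can only express linear relationships between features and responses, whereas the Gaussian kernel weights pairs of compounds by their distance in feature space, which is one common way to capture nonlinear structure.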
Results for oral clearance (OC) models
| OC.1 | oral clearance | Gaussian | subset | 0.649 (0.060) | 0.569 (0.111) |
| OC.2 | oral clearance | linear | subset | 0.606 (0.071)* | 0.547 (0.107)** |
| OC.3 | oral clearance & bioavailability | Gaussian | subset | 0.636 (0.060) | 0.625 (0.091) |
| OC.4 | oral clearance & bioavailability | Gaussian | all features | 0.626 (0.066) | 0.634 (0.108)*** |
| OC.5 | oral clearance & bioavailability | linear | subset | 0.635 (0.061) | 0.637 (0.093) |
* Significant differences by ANOVA (p < 0.05): OC.1 vs. OC.2
** Significant differences by ANOVA (p < 0.05): OC.2 vs. OC.3, OC.4, and OC.5 when testing OC.1 through OC.5
*** Significant differences by ANOVA (p < 0.05): OC.4 vs. OC.1 when testing OC.1, OC.3, and OC.4
Results for rat LD50 (LD) models
| LD.1 | linear | 0.702 (0.042) | 0.674 (0.021)* |
| LD.2 | Gaussian | 0.717 (0.029) | 0.707 (0.019) |
* Significant differences by ANOVA (p < 0.05): LD.1 vs. LD.2
Results for cytotoxicity (C) models
| C.1 | H460 | LC50 | Gaussian | 0.707 (0.032) | 0.809 (0.008) |
| C.2 | H460 | LC50 & TGI | Gaussian | LC50: 0.711 (0.034) TGI: 0.732 (0.021) | LC50: 0.809 (0.009) TGI: 0.834 (0.009) |
| C.3 | H460 | TGI | Gaussian | 0.729 (0.028) | 0.838 (0.009) |
| C.4 | H460 | LC50 | linear | 0.630 (0.056)* | 0.788 (0.014)* |
| C.5 | H460 | TGI | linear | 0.683 (0.047)** | 0.803 (0.013)** |
| C.6 | MCF7 | LC50 | Gaussian | 0.675 (0.019) | 0.801 (0.011) |
| C.7 | MCF7 | LC50 & TGI | Gaussian | LC50: 0.674 (0.036) TGI: 0.685 (0.027) | LC50: 0.798 (0.016) TGI: 0.820 (0.012) |
| C.8 | SF-268 | LC50 & TGI | Gaussian | LC50: 0.665 (0.056) TGI: 0.698 (0.047) | LC50: 0.826 (0.014) TGI: 0.845 (0.013) |
* Significant differences by ANOVA (p < 0.05): C.4 vs. C.1 and C.2
** Significant differences by ANOVA (p < 0.05): C.5 vs. C.2 and C.3
Summary of screening results
| 1 | Oral clearance, LD50, and all six cytotoxicity models | 0.0036 | 416 |
| 2 | Oral clearance, LD50, and passing in both H460 models and both LC50 and TGI of either MCF7 or SF-268 models | 0.0043 | 498 |
| 3 | LD50 and passing in both H460 models and both LC50 and TGI of either MCF7 or SF-268 models | 0.035 | 4,014 |
| 4 | Oral clearance and passing in both H460 models and both LC50 and TGI of either MCF7 or SF-268 models | 0.017 | 1,981 |
| 5 | Oral clearance, LD50, and passing in both H460 models | 0.0045 | 520 |
| 6 | Oral clearance and passing in both H460 models | 0.039 | 4,458 |
| 7 | LD50 and passing in both H460 models | 0.022 | 2,255 |
| 8 | Passing both H460 models only | 0.24 | 27,608 |
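The two numeric columns of the screening table can be cross-checked with simple arithmetic. The sketch below assumes (the column headers are not present in this record) that the third column is the fraction of screened compounds that passed and the fourth is the number that passed, so that count divided by fraction estimates the size of the screened library. The names `screens` and `implied_total` are illustrative.

```python
# (fraction passing, compounds passing) for screens 1-8, copied from the table.
# Interpreting the columns this way is an assumption; the headers are missing.
screens = {
    1: (0.0036, 416),
    2: (0.0043, 498),
    3: (0.035, 4014),
    4: (0.017, 1981),
    5: (0.0045, 520),
    6: (0.039, 4458),
    7: (0.022, 2255),
    8: (0.24, 27608),
}

def implied_total(fraction, count):
    """Estimate the number of compounds screened from one table row."""
    return count / fraction

totals = {k: implied_total(f, c) for k, (f, c) in screens.items()}
for k, t in totals.items():
    print(f"screen {k}: ~{t:,.0f} compounds screened")
```

Because the fractions are reported to only two significant figures, the per-row estimates are approximate and need not agree exactly.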
Modeling phases and data partitions
| Phase I | training set | 348 of 435 (80%) | 3,095 of 3,869 (80%) | 4,000 of 8,983 (44%), chosen from 70% |
| | testing set | 87 of 435 (20%) | 774 of 3,869 (20%) | 2,695 of 8,983 (30%) |
| Phase II | training set | 435 of 435 (100%) | 3,869 of 3,869 (100%) | 4,500 of 8,983 (50%), includes all positive labels |
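The Phase I record counts in the table above can be reproduced with a simple hold-out split. This is a minimal sketch, assuming records are assigned at random (the table does not say how records were chosen); `phase1_split`, the seeds, and the variable names are hypothetical.

```python
import random

def phase1_split(n_records, test_fraction, seed=0):
    """Shuffle record indices and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_test = round(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]  # (training pool, testing set)

# 435-record data set: 80% training, 20% testing.
train_a, test_a = phase1_split(435, 0.20)

# 3,869-record data set: 80% training, 20% testing.
train_b, test_b = phase1_split(3869, 0.20)

# 8,983-record data set: 30% testing; 4,000 training records are then
# drawn from the remaining 70% (random sampling is an assumption here).
pool_c, test_c = phase1_split(8983, 0.30)
train_c = random.Random(1).sample(pool_c, 4000)

print(len(train_a), len(test_a))  # sizes for the 435-record set
print(len(train_b), len(test_b))  # sizes for the 3,869-record set
print(len(train_c), len(test_c))  # sizes for the 8,983-record set
```

The Phase II selection, which uses all records or a 4,500-record subset that includes every positive label, depends on label information and is not reproduced in this sketch.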