David E Jones, Hamidreza Ghandehari, Julio C Facelli.
Abstract
The use of data mining techniques in the field of nanomedicine has been very limited. In this paper we demonstrate that data mining techniques can be used for the development of predictive models of the cytotoxicity of poly(amido amine) (PAMAM) dendrimers using their chemical and structural properties. We present predictive models developed using 103 PAMAM dendrimer cytotoxicity values that were extracted from twelve cancer nanomedicine journal articles. The results indicate that data mining and machine learning can be effectively used to predict the cytotoxicity of PAMAM dendrimers on Caco-2 cells.
Keywords: data mining; machine learning; molecular descriptors; poly(amido amine) dendrimers (PAMAM)
Year: 2015 PMID: 26665059 PMCID: PMC4660915 DOI: 10.3762/bjnano.6.192
Source DB: PubMed Journal: Beilstein J Nanotechnol ISSN: 2190-4286 Impact factor: 3.649
Results from the 10-fold cross-validation, listed by classifier, for the first analysis, which included all molecular descriptors. See Equations 1–4 for the definitions of precision, recall, F-measure, mean absolute error, and accuracy.
| Classifier | Precision | Recall | F-measure | Mean absolute error | Accuracy |
| Naive Bayes | 0.654 | 0.660 | 0.655 | 0.3370 | 66.0% |
| SMO | 0.738 | 0.738 | 0.725 | 0.2621 | 73.8% |
| J48 | 0.789 | 0.748 | 0.750 | 0.3077 | 74.8% |
| Bagging | 0.746 | 0.738 | 0.740 | 0.3211 | 73.8% |
| Classification via regression | 0.734 | 0.738 | 0.730 | 0.2978 | 73.8% |
| Filtered classifier | 0.789 | 0.748 | 0.750 | 0.3077 | 74.8% |
| LWL | 0.775 | 0.738 | 0.741 | 0.2966 | 73.8% |
| Decision table | 0.678 | 0.660 | 0.664 | 0.3878 | 66.0% |
| DTNB | 0.691 | 0.670 | 0.674 | 0.3490 | 67.0% |
| NBTree | 0.696 | 0.670 | 0.674 | 0.3511 | 67.0% |
| Random forest | 0.736 | 0.718 | 0.722 | 0.3077 | 71.8% |
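The 10-fold cross-validation behind these figures can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the fixed seed, the majority-class stand-in classifier, and the placeholder labels are all assumptions made here, since the real 103 labels and the toolkit settings are not part of this record.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Split range(n_instances) into k disjoint test folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_validate(labels, train_and_predict, k=10, seed=0):
    """Return pooled accuracy; every instance is predicted exactly once."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(train_idx, test_idx)
        correct += sum(predictions[j] == labels[i] for j, i in enumerate(test_idx))
    return correct / len(labels)

def majority_baseline(labels):
    """Hypothetical stand-in classifier: predict the commonest training label."""
    def train_and_predict(train_idx, test_idx):
        train_labels = [labels[i] for i in train_idx]
        majority = max(set(train_labels), key=train_labels.count)
        return [majority] * len(test_idx)
    return train_and_predict

# Placeholder labels only (assumed 60/43 split); not the study's data.
labels = ["toxic"] * 60 + ["nontoxic"] * 43
folds = k_fold_indices(len(labels))
fold_sizes = sorted(len(fold) for fold in folds)
accuracy = cross_validate(labels, majority_baseline(labels))
```

If accuracy is pooled over all 103 held-out predictions in this way, a reported 74.8% would correspond to 77 of 103 instances classified correctly (77/103 ≈ 0.748). Leave-one-out cross-validation is the special case `k = len(labels)`.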
Results from the 10-fold cross-validation, listed by classifier, for the second analysis, which included the automatically feature-selected molecular descriptors. See Equations 1–4 for the definitions of precision, recall, F-measure, mean absolute error, and accuracy.
| Classifier | Precision | Recall | F-measure | Mean absolute error | Accuracy |
| Naive Bayes | 0.654 | 0.660 | 0.655 | 0.3370 | 66.0% |
| SMO | 0.738 | 0.738 | 0.725 | 0.2621 | 73.8% |
| J48 | 0.789 | 0.748 | 0.750 | 0.3077 | 74.8% |
| Bagging | 0.746 | 0.738 | 0.740 | 0.3211 | 73.8% |
| Classification via regression | 0.734 | 0.738 | 0.730 | 0.2978 | 73.8% |
| Filtered classifier | 0.789 | 0.748 | 0.750 | 0.3077 | 74.8% |
| LWL | 0.775 | 0.738 | 0.741 | 0.2966 | 73.8% |
| Decision table | 0.678 | 0.660 | 0.664 | 0.3878 | 66.0% |
| DTNB | 0.691 | 0.670 | 0.674 | 0.3490 | 67.0% |
| NBTree | 0.696 | 0.670 | 0.674 | 0.3572 | 67.0% |
| Random forest | 0.736 | 0.718 | 0.722 | 0.2988 | 71.8% |
Results from the 10-fold cross-validation, listed by classifier, for the third analysis, which included the molecular descriptors selected by experts. See Equations 1–4 for the definitions of precision, recall, F-measure, mean absolute error, and accuracy.
| Classifier | Precision | Recall | F-measure | Mean absolute error | Accuracy |
| Naive Bayes | 0.762 | 0.748 | 0.750 | 0.2822 | 74.8% |
| SMO | 0.738 | 0.738 | 0.725 | 0.2621 | 73.8% |
| J48 | 0.789 | 0.748 | 0.750 | 0.3077 | 74.8% |
| Bagging | 0.731 | 0.718 | 0.721 | 0.3217 | 71.8% |
| Classification via regression | 0.762 | 0.748 | 0.750 | 0.3230 | 74.8% |
| Filtered classifier | 0.804 | 0.757 | 0.760 | 0.3061 | 75.7% |
| LWL | 0.834 | 0.777 | 0.778 | 0.3008 | 77.7% |
| Decision table | 0.658 | 0.650 | 0.653 | 0.3980 | 65.0% |
| DTNB | 0.658 | 0.650 | 0.653 | 0.3969 | 65.0% |
| NBTree | 0.722 | 0.689 | 0.693 | 0.3454 | 68.9% |
| Random forest | 0.758 | 0.748 | 0.750 | 0.2973 | 74.8% |
Figure 1. Decision tree produced by the J48 classifier under both 10-fold and leave-one-out cross-validation for the first, second, and third analyses. The values on the branches represent the rule or decision used for making the classification. The boxes at the bottom represent the classifications, with the number of PAMAM dendrimers classified as such on the left and the number of exceptions (misclassifications) on the right.
Results from the 10-fold cross-validation, listed by classifier, for the fourth analysis, which included the expert-selected molecular descriptors together with cytotoxicity concentration. See Equations 1–4 for the definitions of precision, recall, F-measure, mean absolute error, and accuracy.
| Classifier | Precision | Recall | F-measure | Mean absolute error | Accuracy |
| Naive Bayes | 0.755 | 0.738 | 0.741 | 0.2984 | 73.8% |
| SMO | 0.738 | 0.738 | 0.725 | 0.2621 | 73.8% |
| J48 | 0.838 | 0.835 | 0.836 | 0.2203 | 83.5% |
| Bagging | 0.836 | 0.835 | 0.835 | 0.2618 | 83.5% |
| Classification via regression | 0.742 | 0.738 | 0.739 | 0.3157 | 73.8% |
| Filtered classifier | 0.804 | 0.757 | 0.760 | 0.3061 | 75.7% |
| LWL | 0.834 | 0.777 | 0.778 | 0.2995 | 77.7% |
| Decision table | 0.658 | 0.650 | 0.653 | 0.3980 | 65.0% |
| DTNB | 0.658 | 0.650 | 0.653 | 0.3969 | 65.0% |
| NBTree | 0.716 | 0.689 | 0.693 | 0.3347 | 68.9% |
| Random forest | 0.769 | 0.767 | 0.768 | 0.2483 | 76.7% |
Figure 2. Decision tree produced by the J48 classifier under 10-fold cross-validation for the fourth analysis, which included the expert-selected molecular descriptors together with the concentration of the dendrimers used in the experiments. The values on the branches represent the rule or decision used for making the classification. The boxes at the bottom represent the classifications, with the number of PAMAM dendrimers classified as such on the left and the number of exceptions (misclassifications) on the right.
Results from the external validation test set analysis, listed by classifier, using all molecular descriptors. See Equations 1–4 for the definitions of precision, recall, F-measure, mean absolute error, and accuracy.
| Classifier | Precision | Recall | F-measure | Mean absolute error | Accuracy |
| Naive Bayes | 0.803 | 0.650 | 0.617 | 0.3426 | 65.0% |
| SMO | 0.803 | 0.650 | 0.617 | 0.3500 | 65.0% |
| J48 | 0.803 | 0.650 | 0.617 | 0.2776 | 65.0% |
| Bagging | 0.803 | 0.650 | 0.617 | 0.2953 | 65.0% |
| Classification via regression | 0.803 | 0.650 | 0.617 | 0.3047 | 65.0% |
| Filtered classifier | 0.803 | 0.650 | 0.617 | 0.2776 | 65.0% |
| LWL | 0.955 | 0.950 | 0.950 | 0.2510 | 95.0% |
| Decision table | 0.803 | 0.650 | 0.617 | 0.4206 | 65.0% |
| DTNB | 0.803 | 0.650 | 0.617 | 0.4182 | 65.0% |
| NBTree | 0.803 | 0.650 | 0.617 | 0.2945 | 65.0% |
| Random forest | 0.803 | 0.650 | 0.617 | 0.2784 | 65.0% |
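Ten of the eleven rows above share the same precision, recall, and F-measure, which suggests those classifiers made the same pattern of predictions on the test set. The sketch below shows how such class-weighted figures follow from a confusion matrix. It assumes the reported values are averages over the two classes weighted by class size, a common convention that this record does not confirm. The 20-instance matrix is hypothetical: it was chosen only because it reproduces the 0.803 / 0.650 / 0.617 row, and the study's actual confusion matrix is not given here.

```python
def weighted_metrics(confusion):
    """Class-weighted precision, recall, F-measure, plus accuracy.

    confusion[i][j] = number of instances of actual class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision = recall = f_measure = 0.0
    for c in range(n_classes):
        tp = confusion[c][c]
        actual = sum(confusion[c])                    # instances truly in class c
        predicted = sum(row[c] for row in confusion)  # instances predicted as class c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = actual / total                       # weight each class by its size
        precision += weight * p
        recall += weight * r
        f_measure += weight * f
    accuracy = sum(confusion[c][c] for c in range(n_classes)) / total
    return precision, recall, f_measure, accuracy

# Hypothetical matrix (an assumption, not data from the study):
# 9 instances of class A, all predicted A; 11 of class B, 7 of them predicted A.
hypothetical = [[9, 0],
                [7, 4]]
precision, recall, f_measure, accuracy = weighted_metrics(hypothetical)
print(f"{precision:.3f} {recall:.3f} {f_measure:.3f} {accuracy:.1%}")
# prints: 0.803 0.650 0.617 65.0%
```

Under this weighting, the weighted recall always equals the accuracy, which matches every row of the tables in this record.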
Results from the external validation test set analysis, listed by classifier, using the expert-selected molecular descriptors together with cytotoxicity concentration. See Equations 1–4 for the definitions of precision, recall, F-measure, mean absolute error, and accuracy.
| Classifier | Precision | Recall | F-measure | Mean absolute error | Accuracy |
| Naive Bayes | 0.918 | 0.900 | 0.900 | 0.1868 | 90.0% |
| SMO | 0.803 | 0.650 | 0.617 | 0.3500 | 65.0% |
| J48 | 0.918 | 0.900 | 0.900 | 0.1768 | 90.0% |
| Bagging | 0.888 | 0.850 | 0.849 | 0.2408 | 85.0% |
| Classification via regression | 0.803 | 0.650 | 0.617 | 0.3678 | 65.0% |
| Filtered classifier | 0.803 | 0.650 | 0.617 | 0.2776 | 65.0% |
| LWL | 0.955 | 0.950 | 0.950 | 0.2467 | 95.0% |
| Decision table | 0.803 | 0.650 | 0.617 | 0.4206 | 65.0% |
| DTNB | 0.803 | 0.650 | 0.617 | 0.4182 | 65.0% |
| NBTree | 0.803 | 0.650 | 0.617 | 0.3082 | 65.0% |
| Random forest | 0.888 | 0.850 | 0.849 | 0.2187 | 85.0% |
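In both external validation tables, classifiers with identical accuracy still report different mean absolute errors. One reading that would explain this is that the error is measured on each classifier's predicted class probabilities rather than on its hard labels. That is an assumption made here; the paper's own definition (Equations 1–4) is not reproduced in this record. The sketch below uses made-up probabilities to show the effect.

```python
def mean_absolute_error(probabilities, labels):
    """Average |p_i - y_i|, where p_i is the predicted probability of the
    positive class and y_i is the true label coded as 0 or 1."""
    return sum(abs(p - y) for p, y in zip(probabilities, labels)) / len(labels)

def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of instances whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)

# Made-up illustration data (not from the study).
labels = [1, 1, 0, 0]
confident = [0.9, 0.4, 0.1, 0.2]   # far from 0.5 when correct
hesitant = [0.6, 0.4, 0.4, 0.45]   # same hard decisions, closer to 0.5

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(probs, labels), round(mean_absolute_error(probs, labels), 4))
```

Both hypothetical classifiers make the same three correct decisions out of four, yet the hesitant one has the larger mean absolute error because its probabilities sit closer to the decision threshold.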
Figure 3. Simplified workflow diagram for the method used in this study.