Diana de la Iglesia1, Miguel García-Remesal1, Alberto Anguita1, Miguel Muñoz-Mármol1, Casimir Kulikowski2, Víctor Maojo1.
Abstract
BACKGROUND: Clinical Trials (CTs) are essential for bridging the gap between experimental research on new drugs and their clinical application. Just as CTs for traditional drugs and biologics have helped accelerate the translation of biomedical findings into medical practice, CTs for nanodrugs and nanodevices could advance novel nanomaterials as agents for diagnosis and therapy. Although information about nanomedicine-related CTs is publicly available, it is archived online without criteria that discriminate between studies involving nanomaterials or nanotechnology-based processes (nano) and CTs that do not involve nanotechnology (non-nano). Determining from CT summaries alone whether nanodrugs and nanodevices were involved in a study is a challenging task. At the time of writing, CTs archived in the well-known online registry ClinicalTrials.gov cannot easily be classified as nano or non-nano, even by domain experts, because there is neither a common definition of nanotechnology nor a standard for reporting nanomedical experiments and results.
Year: 2014 PMID: 25347075 PMCID: PMC4210133 DOI: 10.1371/journal.pone.0110331
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Total number of registered clinical studies in EudraCT and ClinicalTrials.gov over the last years.
Statistics related to the number of features found in the body of CT summary documents.
| | Number of unigrams | Number of unique unigrams* | Number of bigrams | Number of unique bigrams* |
| Minimum number of N-grams per document | 111 | 66 | 45 | 31 |
| Maximum number of N-grams per document | 15092 | 1277 | 13124 | 1449 |
| | 732.462 | 268.282 | 450.179 | 240.772 |
| | 1000.856 | 138.014 | 788.709 | 167.367 |
The “Minimum number of N-grams per document” and the “Maximum number of N-grams per document” refer to the number of N-grams (both allowing and not allowing double-count) found in the documents containing the smallest and greatest number of N-grams in the collection, respectively. For instance, regarding unigrams, the document containing the smallest number of unigrams allowing double-count contains 111 unigrams, while the document containing the smallest number of unigrams not allowing double-count contains 66 unigrams.
* No double-count allowed.
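The distinction between counting N-grams with and without double-count can be illustrated with a minimal Python sketch. The tokenizer (lowercased alphanumeric runs), the function name `ngram_stats`, and the sample sentence are assumptions made for illustration; the record does not state how the documents were tokenized.

```python
import re
from collections import Counter


def ngram_stats(text: str, n: int = 1) -> tuple[int, int]:
    """Return (total n-grams, unique n-grams) for a single document.

    The total allows double-count (every occurrence is counted);
    the unique figure counts each distinct n-gram once.
    """
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of width n over the token sequence.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    return len(ngrams), len(counts)


# Hypothetical CT summary fragment, not taken from the collection.
doc = "Liposomal doxorubicin versus doxorubicin in patients"
print("unigrams (total, unique):", ngram_stats(doc, 1))
print("bigrams  (total, unique):", ngram_stats(doc, 2))
```

Applying such a function to every document and taking the minimum and maximum of each figure would give per-document extremes of the kind reported in the table.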
Figure 2. Illustration of the approach followed.
Best evaluation results obtained in 10-fold and Leave-One-Out cross-validation experiments with unigrams for the different transformations of the input set.
| Classifier | TP Rate (10-fold CV) | TP Rate (LOO) | FP Rate (10-fold CV) | FP Rate (LOO) | Precision (10-fold CV) | Precision (LOO) | Recall (10-fold CV) | Recall (LOO) | F-Measure (10-fold CV) | F-Measure (LOO) | MCC (10-fold CV) | MCC (LOO) | AUC (10-fold CV) | AUC (LOO) |
| | | | | | | | | | | | | | | |
| L1-LogReg | 0.909 | 0.916 | 0.091 | 0.084 | 0.91 | 0.917 | 0.909 | 0.916 | 0.909 | 0.916 | 0.819 | 0.833 | 0.909 | 0.916 |
| SEM | 0.0066 | - | 0.0066 | - | 0.0063 | - | 0.0066 | - | 0.0066 | - | 0.0128 | - | 0.0066 | - |
| SVM-Pol | 0.867 | 0.875 | 0.133 | 0.125 | 0.87 | 0.878 | 0.867 | 0.875 | 0.867 | 0.875 | 0.737 | 0.753 | 0.867 | 0.875 |
| SEM | 0.0124 | - | 0.0124 | - | 0.0127 | - | 0.0124 | - | 0.0124 | - | 0.0250 | - | 0.0124 | - |
| | | | | | | | | | | | | | | |
| SVM-Linear | 0.881 | 0.881 | 0.119 | 0.119 | 0.882 | 0.881 | 0.881 | 0.881 | 0.881 | 0.881 | 0.763 | 0.762 | 0.881 | 0.881 |
| SEM | 0.0112 | - | 0.0112 | - | 0.0113 | - | 0.0112 | - | 0.0112 | - | 0.0225 | - | 0.0112 | - |
| SVM-Pol | 0.886 | 0.893 | 0.114 | 0.107 | 0.888 | 0.894 | 0.886 | 0.893 | 0.886 | 0.893 | 0.774 | 0.787 | 0.886 | 0.893 |
| SEM | 0.0090 | - | 0.0090 | - | 0.0091 | - | 0.0090 | - | 0.0090 | - | 0.0179 | - | 0.0090 | - |
| | | | | | | | | | | | | | | |
| L1-LogReg | 0.916 | 0.932 | 0.084 | 0.068 | 0.918 | 0.934 | 0.916 | 0.932 | 0.916 | 0.932 | 0.834 | 0.866 | 0.916 | 0.932 |
| SEM | 0.0086 | - | 0.0086 | - | 0.0083 | - | 0.0086 | - | 0.0086 | - | 0.0169 | - | 0.0086 | - |
| L2-LogReg | 0.888 | 0.902 | 0.112 | 0.098 | 0.888 | 0.903 | 0.888 | 0.902 | 0.888 | 0.902 | 0.776 | 0.805 | 0.888 | 0.902 |
| SEM | 0.0095 | - | 0.0095 | - | 0.0094 | - | 0.0095 | - | 0.0096 | - | 0.0189 | - | 0.0095 | - |
| | | | | | | | | | | | | | | |
| SVM-Linear | 0.882 | 0.886 | 0.118 | 0.114 | 0.882 | 0.886 | 0.882 | 0.886 | 0.882 | 0.886 | 0.764 | 0.772 | 0.882 | 0.886 |
| SEM | 0.0084 | - | 0.0084 | - | 0.0086 | - | 0.0084 | - | 0.0084 | - | 0.0170 | - | 0.0084 | - |
| SVM-Pol | 0.888 | 0.882 | 0.112 | 0.118 | 0.889 | 0.884 | 0.888 | 0.882 | 0.888 | 0.882 | 0.777 | 0.766 | 0.888 | 0.882 |
| SEM | 0.0087 | - | 0.0087 | - | 0.0085 | - | 0.0087 | - | 0.0088 | - | 0.0172 | - | 0.0087 | - |
| | | | | | | | | | | | | | | |
| L1-LogReg | 0.944 | 0.955 | 0.056 | 0.045 | 0.946 | 0.956 | 0.944 | 0.955 | 0.944 | 0.955 | 0.89 | 0.911 | 0.944 | 0.955 |
| SEM | 0.0086 | - | 0.0086 | - | 0.0075 | - | 0.0086 | - | 0.0087 | - | 0.0161 | - | 0.0086 | - |
| SVM-Linear | 0.905 | 0.908 | 0.095 | 0.092 | 0.906 | 0.909 | 0.905 | 0.908 | 0.905 | 0.908 | 0.811 | 0.817 | 0.905 | 0.908 |
| SEM | 0.0109 | - | 0.0109 | - | 0.0110 | - | 0.0109 | - | 0.0109 | - | 0.0218 | - | 0.0109 | - |
| | | | | | | | | | | | | | | |
| L1-LogReg | 0.851 | 0.854 | 0.149 | 0.146 | 0.866 | 0.866 | 0.851 | 0.854 | 0.849 | 0.853 | 0.717 | 0.72 | 0.851 | 0.854 |
| SEM | 0.0102 | - | 0.0102 | - | 0.0082 | - | 0.0102 | - | 0.0107 | - | 0.0182 | - | 0.0102 | - |
| SVM-Linear | 0.896 | 0.899 | 0.104 | 0.101 | 0.897 | 0.899 | 0.896 | 0.899 | 0.896 | 0.899 | 0.793 | 0.798 | 0.896 | 0.899 |
| SEM | 0.0086 | - | 0.0086 | - | 0.0086 | - | 0.0086 | - | 0.0086 | - | 0.0171 | - | 0.0086 | - |
| | | | | | | | | | | | | | | |
| L1-LogReg | 0.913 | 0.912 | 0.087 | 0.088 | 0.917 | 0.917 | 0.913 | 0.912 | 0.913 | 0.912 | 0.83 | 0.829 | 0.913 | 0.912 |
| SEM | 0.0058 | - | 0.0058 | - | 0.0053 | - | 0.0058 | - | 0.0058 | - | 0.0110 | - | 0.0058 | - |
| L2-LogReg | 0.905 | 0.902 | 0.095 | 0.098 | 0.905 | 0.902 | 0.905 | 0.902 | 0.905 | 0.902 | 0.81 | 0.804 | 0.905 | 0.902 |
| SEM | 0.0086 | - | 0.0086 | - | 0.0085 | - | 0.0086 | - | 0.0086 | - | 0.0171 | - | 0.0086 | - |
| | | | | | | | | | | | | | | |
| SVM-Linear | 0.907 | 0.91 | 0.093 | 0.09 | 0.909 | 0.912 | 0.907 | 0.91 | 0.907 | 0.91 | 0.816 | 0.822 | 0.907 | 0.91 |
| SEM | 0.0068 | - | 0.0068 | - | 0.0072 | - | 0.0068 | - | 0.0068 | - | 0.0140 | - | 0.0068 | - |
| SVM-Pol | 0.869 | 0.87 | 0.131 | 0.13 | 0.885 | 0.884 | 0.869 | 0.87 | 0.868 | 0.869 | 0.754 | 0.753 | 0.869 | 0.87 |
| SEM | 0.0084 | - | 0.0084 | - | 0.0078 | - | 0.0084 | - | 0.0087 | - | 0.0160 | - | 0.0084 | - |
As an indication of the variance of the estimates, for 10-fold cross-validation the table reports the standard error of the mean (SEM) alongside the average over the 10 models.
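A minimal sketch of how a fold average and its SEM might be computed is shown here. It assumes the SEM is the sample standard deviation (n - 1 denominator) of the per-fold scores divided by the square root of the number of folds; the record does not say which estimator was used. The function name `mean_and_sem` and the per-fold values are hypothetical and are not results from the study.

```python
import math
import statistics


def mean_and_sem(fold_scores):
    """Return (mean, standard error of the mean) for per-fold scores."""
    mean = statistics.fmean(fold_scores)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sem = statistics.stdev(fold_scores) / math.sqrt(len(fold_scores))
    return mean, sem


# Hypothetical F-measure values from 10 folds, for illustration only.
fold_f_measure = [0.88, 0.92, 0.90, 0.91, 0.89, 0.93, 0.87, 0.90, 0.91, 0.89]
mean, sem = mean_and_sem(fold_f_measure)
print(f"mean = {mean:.3f}, SEM = {sem:.4f}")
```

Leave-one-out validation yields a single pooled estimate rather than an average over fold-level scores, which is consistent with the table showing "-" in the LOO columns of the SEM rows.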
Figure 3. Learning curve for L1-regularized logistic regression in the case of unigrams with IDF transformation.
Figure 4. ROC curve for the best classification models resulting from the LOO validation (ranking based on the AUC obtained for each classifier).
Kappa statistic for the best classification models resulting from the Leave-One-Out validation.
| Classifier | Kappa coefficient |
| L1-Logistic Regression IDF | 0.91 |
| L1-Logistic Regression Frequencies | 0.864 |
| L1-Logistic Regression Binary Occurrences | 0.832 |
| SVM Linear Normalized TFIDF | 0.826 |
| L1-Logistic Regression TFIDF | 0.824 |
| SVM Linear IDF | 0.816 |
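For readers unfamiliar with the Kappa statistic, the following sketch computes Cohen's kappa for a binary classifier from the four cells of a confusion matrix. The function name `cohen_kappa` and the counts in the example are hypothetical; they are not taken from the study.

```python
def cohen_kappa(tp: int, fn: int, fp: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 confusion matrix.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the marginals.
    Undefined (division by zero) when p_e == 1, e.g. when every true
    label and every prediction fall in the same class.
    """
    total = tp + fn + fp + tn
    p_o = (tp + tn) / total
    # Chance agreement: product of marginal totals for each class.
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical balanced test set of 100 documents with 10 errors.
print(round(cohen_kappa(tp=45, fn=5, fp=5, tn=45), 3))
```

A value of 1 indicates perfect agreement between predicted and true labels, while 0 indicates agreement no better than chance.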