Muskan Kukreja, Stephen Albert Johnston, Phillip Stafford.
Abstract
BACKGROUND: High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and other conditions. Typically one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and classifying test samples using a small number of those features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms, and many biological assumptions are built into classifiers designed for these data. One of the more problematic is the assumption of independence, both at the probe level and at the biological level. Probes for RNA transcripts are designed to bind single transcripts; at the biological level, many genes have dependencies across transcriptional pathways, where co-regulation of transcriptional units may make many genes appear completely dependent. Thus, algorithms that perform well on gene expression data may not be suitable for technologies with different binding characteristics. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random-sequence peptides. It relies on many-to-many binding: each peptide can bind multiple antibodies, and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear which classification algorithm is optimal for analyzing this new type of data.
Year: 2012 PMID: 22720696 PMCID: PMC3430557 DOI: 10.1186/1471-2105-13-139
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
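The classification workflow described in the abstract (gather probe-level data, select informative features, classify on the reduced feature set) can be sketched as below. The t-statistic filter and nearest-centroid classifier here are illustrative stand-ins, not the specific algorithms benchmarked in the paper:

```python
import math

def t_statistic(a, b):
    """Welch two-sample t-statistic (absolute) for one feature across two classes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)

def select_features(X, y, k):
    """Rank features by |t| between the two classes; keep the top k."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    n_feat = len(X[0])
    scores = [t_statistic([r[j] for r in pos], [r[j] for r in neg])
              for j in range(n_feat)]
    return sorted(range(n_feat), key=lambda j: -scores[j])[:k]

def nearest_centroid(X, y, feats):
    """Train: per-class mean over the selected features; return a predictor."""
    cents = {}
    for lab in set(y):
        rows = [x for x, l in zip(X, y) if l == lab]
        cents[lab] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    def predict(x):
        v = [x[j] for j in feats]
        return min(cents, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(v, cents[lab])))
    return predict

# Toy probe-level data: 4 samples x 3 probes; probe 0 separates the classes.
X = [[5.0, 1.0, 2.0], [5.2, 0.9, 2.1], [1.0, 1.1, 2.0], [0.8, 1.0, 1.9]]
y = [1, 1, 0, 0]
feats = select_features(X, y, k=1)
clf = nearest_centroid(X, y, feats)
print(feats, clf([5.1, 1.0, 2.0]))  # probe 0 is selected; sample classified as 1
```

Swapping in Naïve Bayes, SVM, or any of the other benchmarked algorithms changes only the training step; the select-then-classify shape of the pipeline stays the same.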
Figure 1. One-to-one correspondence found in gene expression microarrays is not observed for the immunosignaturing arrays. We propose that a single peptide may bind numerous antibodies, and have shown that a single antibody can bind hundreds of different peptides.
Overall performance measures of classification algorithms on all datasets
| 77.7 | |||||||||
| 71.1 | 84.7 | 89.3 | 87.3 | ||||||
| 88.0 | 71.3 | 86.1 | 88.4 | 87.0 | |||||
| 75.5 | 62.6 | 87.7 | 84.9 | ||||||
| 89.8 | 89.7 | 81.3 | 62.3 | 82.0 | 86.6 | 87.8 | 82.8 | ||
| 82.4 | 62.8 | 80.6 | 81.4 | 81.1 | 81.9 | ||||
| 87.7 | 53.9 | 80.2 | 83.2 | 85.1 | 81.8 | ||||
| 88.3 | 80.7 | 59.6 | 77.8 | 83.3 | 83.6 | 80.7 | |||
| 60.4 | 50.7 | 81.5 | 84.8 | 78.9 | |||||
| 71.8 | 72.2 | 65.0 | 68.5 | 84.7 | 77.8 | ||||
| 81.5 | 52.5 | 55.8 | 87.5 | 75.7 | 89.0 | 76.2 | |||
| 81.9 | 89.4 | 53.5 | 64.3 | 68.8 | 70.7 | 74.2 | |||
| 85.1 | 58.7 | 83.2 | 60.0 | 75.2 | 73.4 | 79.6 | 73.6 | ||
| 80.3 | 69.7 | 78.4 | 48.7 | 70.6 | 68.4 | 76.7 | 70.4 | ||
| 83.8 | 71.7 | 76.2 | 52.9 | 69.3 | 60.8 | 75.0 | 70.0 | ||
| 76.8 | 70.0 | 77.9 | 43.1 | 72.0 | 63.1 | 76.7 | 68.5 | ||
| 69.7 | 52.0 | 89.1 | 70.8 | 62.8 | 69.7 | 52.6 | 66.7 |
T1D: Type 1 diabetes dataset, Az: Alzheimer's dataset, Ab: Antibodies dataset. The table shows each algorithm's overall performance on each dataset based on average score. Scores >90% are marked in bold. Naïve Bayes scored the highest overall average of 90.4%.
Performance measures of data mining algorithms at different levels of significance over the Type 1 diabetes dataset
| | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87.5 | 85.0 | 89.7 | |||||||||||||||
| 85.4 | 89.0 | 85.4 | |||||||||||||||
| 88.8 | 82.9 | 0.89 | 82.9 | ||||||||||||||
| 87.5 | 87.8 | 87.2 | 87.8 | 88.8 | 85.4 | ||||||||||||
| 85.4 | 85.0 | 80.5 | 89.7 | ||||||||||||||
| 86.3 | 87.8 | 84.6 | 0.82 | 87.5 | 82.1 | ||||||||||||
| 87.5 | 82.9 | 88.8 | 85.4 | 87.5 | 82.9 | ||||||||||||
| 85.4 | 85.4 | 83.8 | 78.0 | 89.7 | 0.89 | ||||||||||||
| 80.0 | 80.5 | 79.5 | 0.89 | dnf | dnf | dnf | dnf | ||||||||||
| 87.5 | 84.6 | 89.7 | 83.8 | 74.4 | 89.8 | ||||||||||||
| 82.9 | 82.9 | 86.3 | 78.0 | 0.87 | 85.0 | 75.6 | 0.85 | 88.3 | |||||||||
| 88.8 | 85.4 | 85.0 | 80.5 | 89.7 | 81.3 | 78.0 | 84.6 | 0.87 | 78.8 | 73.2 | 84.6 | 0.85 | 85.1 | ||||
| 85.0 | 87.8 | 82.1 | 0.85 | 78.8 | 75.6 | 82.1 | 0.79 | 87.5 | 85.4 | 89.7 | 0.88 | 83.8 | 85.4 | 82.1 | 0.84 | 83.8 | |
| 87.5 | 87.8 | 87.2 | 85.4 | 0.98 | 85.4 | 53.8 | 5.1 | 0.54 | 81.9 | ||||||||
| 86.3 | 85.4 | 87.2 | 0.79 | 81.3 | 82.9 | 79.5 | 0.83 | 78.8 | 82.9 | 74.4 | 0.72 | 80.0 | 85.4 | 74.4 | 0.73 | 80.3 | |
| 86.3 | 85.4 | 87.2 | 0.79 | 80.0 | 82.9 | 76.9 | 0.80 | 80.0 | 87.8 | 71.8 | 0.78 | 66.3 | 80.5 | 51.3 | 0.55 | 76.8 | |
| 88.8 | 82.9 | 85.4 | 0.95 | 40.0 | 15.8 | 0.68 | 21.3 | 0.0 | 0.48 | 69.7 | |||||||
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
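The Acc, Sp and Sn measures tabulated above follow the standard confusion-matrix definitions; a minimal sketch, with made-up counts rather than values from the paper:

```python
def classification_measures(tp, tn, fp, fn):
    """Standard measures from confusion-matrix counts, as percentages."""
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # Accuracy: all correct calls
    sn = 100.0 * tp / (tp + fn)                    # Sensitivity (recall on positives)
    sp = 100.0 * tn / (tn + fp)                    # Specificity (recall on negatives)
    return acc, sp, sn

# Hypothetical counts: 35 true positives, 35 true negatives, 5 each FP/FN.
acc, sp, sn = classification_measures(tp=35, tn=35, fp=5, fn=5)
print(acc, sp, sn)  # 87.5 87.5 87.5
```

The AUC column is the area under the ROC curve, which additionally requires the classifier's ranking scores, not just the hard calls counted here.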
Performance measures of data mining algorithms at different levels of significance over the Alzheimer's dataset
| | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 82.0 | 82.0 | 86.5 | 84.0 | ||||||||||||||||
| 87.0 | 83.3 | ||||||||||||||||||
| 0.87 | 81.8 | 0.89 | 81.8 | ||||||||||||||||
| 86.9 | 83.3 | ||||||||||||||||||
| 82.6 | 66.7 | 0.83 | 81.8 | ||||||||||||||||
| 81.8 | 81.8 | 73.9 | 81.8 | 66.7 | 89.7 | ||||||||||||||
| 87.0 | 83.3 | 0.87 | 82.6 | 81.8 | 83.3 | 0.83 | 87.0 | 81.8 | 0.87 | 88.0 | |||||||||
| 81.8 | 87.0 | 81.8 | 0.86 | 78.3 | 81.8 | 75.0 | 0.84 | 87.7 | |||||||||||
| 86.9 | 81.8 | 82.6 | 81.8 | 83.3 | 73.9 | 72.7 | 75.0 | 0.89 | 72.6 | 81.8 | 75.0 | 0.84 | 82.4 | ||||||
| 78.2 | 81.8 | 75.0 | 0.86 | 56.5 | 18.2 | 0.64 | 81.5 | ||||||||||||
| 86.9 | 81.8 | 73.9 | 72.7 | 75.0 | 0.82 | 60.9 | 63.6 | 58.3 | 0.80 | 52.2 | 54.5 | 50.0 | 0.69 | 71.8 | |||||
| 78.3 | 72.7 | 83.3 | 0.78 | 60.9 | 54.5 | 66.7 | 0.61 | 73.9 | 63.6 | 83.3 | 0.74 | 73.9 | 81.8 | 66.7 | 0.74 | 71.7 | |||
| 73.9 | 63.6 | 83.3 | 0.61 | 68.9 | 63.6 | 58.3 | 0.56 | 73.9 | 81.8 | 66.7 | 0.75 | 78.2 | 63.9 | 0.61 | 70.0 | ||||
| 73.9 | 63.6 | 83.3 | 0.61 | 60.9 | 63.6 | 58.3 | 0.56 | 73.9 | 81.8 | 70.0 | 0.75 | 78.3 | 63.6 | 0.61 | 69.7 | ||||
| 69.5 | 54.5 | 83.3 | 0.80 | 52.2 | 45.5 | 58.3 | 0.73 | 56.5 | 45.5 | 66.7 | 0.43 | 56.5 | 36.4 | 75.0 | 0.44 | 58.7 | |||
| 69.6 | 72.7 | 66.7 | 0.81 | 34.8 | 40.0 | 75.0 | 0.45 | 34.8 | 0.0 | 0.30 | 30.4 | 0.0 | 0.52 | 52.0 | |||||
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
Performance measures of data mining algorithms at different levels of significance over the Antibodies dataset
| | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 88.0 | 88.0 | 88.0 | 88.0 | ||||||||||||||
| 88.0 | 88.0 | 88.0 | 88.0 | 88.0 | 88.0 | 88.0 | 88.0 | ||||||||||
| 80.0 | 86.6 | 80.0 | 0.86 | 86.0 | 89.9 | 86.0 | 0.89 | ||||||||||
| 80.0 | 89.8 | 80.0 | 86.0 | 89.9 | 86.0 | ||||||||||||
| 84.0 | 84.0 | 0.89 | 86.0 | 83.2 | 86.0 | ||||||||||||
| 82.0 | 82.0 | 84.0 | 88.7 | 84.0 | 86.0 | 86.0 | 89.4 | ||||||||||
| 72.0 | 85.3 | 72.0 | 84.0 | 84.0 | 89.1 | ||||||||||||
| 80.0 | 80.0 | 76.0 | 87.4 | 76.0 | 78.0 | 89.4 | 78.0 | 74.0 | 85.4 | 74.0 | 0.89 | 83.2 | |||||
| 64.0 | 83.6 | 64.0 | 72.0 | 84.9 | 72.0 | 80.0 | 87.5 | 80.0 | 87.1 | 80.0 | 81.3 | ||||||
| 88.0 | 88.0 | 82.0 | 20.0 | 20.8 | 0.68 | 80.7 | |||||||||||
| 80.0 | 80.0 | 0.86 | 72.0 | 87.0 | 72.0 | 0.87 | 70.0 | 87.6 | 70.0 | 0.79 | 64.0 | 86.1 | 64.0 | 0.77 | 78.4 | ||
| 82.0 | 82.0 | 0.87 | 72.0 | 82.9 | 72.0 | 0.82 | 70.0 | 87.8 | 70.0 | 0.76 | 64.0 | 88.5 | 64.0 | 0.75 | 77.9 | ||
| 72.0 | 72.0 | 0.81 | 64.0 | 82.1 | 64.0 | 0.73 | 68.0 | 87.7 | 68.0 | 0.78 | 74.0 | 89.7 | 74.0 | 0.82 | 76.2 | ||
| 72.0 | 88.5 | 72.0 | 0.86 | 64.0 | 64.0 | 0.85 | 58.0 | 58.0 | 0.86 | 52.0 | 52.0 | 0.89 | 75.5 | ||||
| 68.0 | 84.5 | 68.0 | 0.88 | 40.0 | 81.1 | 40.0 | 0.71 | 42.0 | 89.7 | 48.8 | 0.54 | 20.0 | 88.4 | 25.0 | 0.58 | 60.4 | |
| 46.0 | 68.7 | 46.0 | 0.57 | 46.0 | 68.7 | 46.0 | 0.57 | 40.0 | 68.1 | 40.0 | 0.54 | 40.0 | 68.1 | 40.0 | 0.54 | 52.5 | |
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
Performance measures of data mining algorithms at different levels of significance over the Asthma dataset (4 classes)
| | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61.7 | 87.2 | 61.7 | 0.82 | 68.1 | 89.3 | 68.1 | 0.86 | 72.3 | 72.3 | 0.87 | 70.2 | 70.2 | 0.86 | 77.7 | |||
| 57.5 | 85.8 | 57.4 | 0.80 | 57.4 | 85.6 | 57.4 | 0.81 | 72.3 | 72.3 | 0.85 | 55.3 | 86.1 | 55.3 | 0.76 | 72.2 | ||
| 55.3 | 86.2 | 55.3 | 0.77 | 55.3 | 86.2 | 55.3 | 0.77 | 61.7 | 87.2 | 61.7 | 0.82 | 66.0 | 87.6 | 66.0 | 0.81 | 71.3 | |
| 55.3 | 86.1 | 55.3 | 0.82 | 53.2 | 84.6 | 53.2 | 0.80 | 63.8 | 87.8 | 63.8 | 0.88 | 71.1* | |||||
| 48.9 | 87.0 | 48.9 | 0.78 | 53.2 | 84.4 | 53.2 | 0.79 | 59.6 | 86.4 | 59.6 | 0.84 | 68.0 | 89.2 | 68.1 | 0.86 | 70.8 | |
| 48.9 | 86.9 | 48.9 | 0.77 | 48.9 | 86.9 | 48.9 | 0.77 | 46.8 | 81.1 | 46.8 | 0.75 | 40.4 | 80.0 | 40.4 | 0.71 | 62.8 | |
| 48.9 | 82.8 | 48.9 | 0.66 | 48.9 | 82.9 | 48.9 | 0.67 | 51.0 | 83.6 | 51.1 | 0.69 | 46.8 | 81.9 | 46.8 | 0.77 | 62.6 | |
| 51.1 | 83.4 | 51.1 | 0.72 | 53.2 | 84.0 | 53.2 | 0.70 | 46.8 | 71.8 | 46.8 | 0.74 | 42.6 | 80.3 | 42.0 | 0.75 | 62.3 | |
| 48.9 | 82.8 | 48.9 | 0.79 | 55.3 | 86.1 | 55.3 | 0.81 | 42.5 | 81.0 | 42.6 | 0.68 | 27.6 | 75.8 | 27.7 | 0.57 | 60.0 | |
| 42.5 | 87.1 | 42.6 | 0.69 | 46.8 | 86.6 | 46.8 | 0.67 | 44.6 | 88.0 | 44.7 | 0.69 | 36.2 | 79.7 | 36.2 | 0.67 | 59.6 | |
| 40.4 | 81.9 | 40.4 | 0.60 | 46.8 | 82.2 | 46.8 | 0.65 | 42.6 | 80.7 | 42.6 | 0.62 | 34.0 | 78.0 | 34.0 | 0.56 | 55.8 | |
| 38.3 | 79.3 | 38.3 | 0.56 | 36.2 | 77.8 | 36.2 | 0.56 | 44.7 | 81.4 | 44.7 | 0.63 | 36.2 | 77.6 | 36.2 | 0.60 | 53.9 | |
| 48.9 | 83.0 | 48.9 | 0.70 | 38.3 | 79.4 | 38.3 | 0.63 | 36.2 | 79.4 | 36.2 | 0.62 | 23.4 | 76.4 | 23.4 | 0.49 | 53.5 | |
| 29.8 | 76.6 | 29.8 | 0.53 | 40.4 | 80.2 | 40.4 | 0.60 | 38.3 | 79.5 | 38.3 | 0.59 | 40.4 | 80.2 | 40.4 | 0.60 | 52.9 | |
| 53.2 | 84.4 | 53.2 | 0.80 | 27.7 | 80.0 | 32.5 | 0.57 | 8.5 | 86.5 | 16.7 | 0.56 | 14.9 | 83.6 | 23.3 | 0.53 | 50.7 | |
| 27.7 | 75.4 | 27.7 | 0.52 | 27.7 | 75.9 | 27.7 | 0.49 | 42.6 | 80.8 | 42.6 | 0.58 | 31.9 | 77.1 | 31.9 | 0.52 | 48.7 | |
| 27.7 | 76.0 | 27.7 | 0.52 | 19.2 | 71.8 | 19.1 | 0.46 | 29.8 | 76.7 | 29.8 | 0.52 | 21.2 | 74.8 | 21.3 | 0.45 | 43.1 | |
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
Performance measures of data mining algorithms at different levels of significance on conditions A & B
| | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87.5 | 83.3 | 0.84 | 83.3 | 83.3 | |||||||||
| 79.2 | 75.0 | 83.3 | 83.3 | 87.5 | 75.0 | 87.7 | |||||||
| 87.5 | 83.3 | 0.88 | 83.3 | 83.3 | 75.0 | 0.83 | 87.5 | ||||||
| 83.3 | 83.3 | 83.3 | 0.83 | 87.5 | 83.3 | 0.87 | 87.5 | 83.3 | 0.88 | 86.1 | |||
| 79.2 | 83.3 | 75.0 | 0.70 | 84.7* | |||||||||
| 83.3 | 75.0 | 83.3 | 83.3 | 83.3 | 70.8 | 83.3 | 58.3 | 0.88 | 82.0 | ||||
| 66.7 | 83.3 | 50.0 | 0.76 | 79.2 | 83.3 | 75.0 | 0.85 | 81.5 | |||||
| 79.2 | 83.3 | 75.0 | 79.2 | 75.0 | 83.3 | 0.86 | 79.2 | 75.0 | 83.3 | 0.78 | 80.6 | ||
| 83.3 | 75.0 | 0.87 | 83.3 | 83.3 | 83.3 | 0.83 | 75.0 | 75.0 | 75.0 | 0.67 | 80.2 | ||
| 75.0 | 83.3 | 66.7 | 0.85 | 75.0 | 58.3 | 75.0 | 58.3 | 0.84 | 77.8 | ||||
| 75.0 | 83.3 | 66.7 | 0.74 | 75.0 | 75.0 | 75.0 | 0.79 | 75.0 | 75.0 | 75.0 | 0.74 | 75.2 | |
| 62.5 | 66.7 | 58.3 | 0.65 | 79.2 | 83.3 | 75.0 | 0.85 | 70.8 | 75.0 | 66.7 | 0.76 | 72.0 | |
| 62.5 | 66.7 | 58.3 | 0.65 | 79.2 | 83.3 | 75.0 | 0.85 | 66.7 | 75.0 | 58.3 | 0.72 | 70.6 | |
| 70.8 | 75.0 | 66.7 | 0.70 | 70.8 | 75.0 | 66.7 | 0.70 | 66.7 | 66.7 | 66.7 | 0.67 | 69.3 | |
| 70.8 | 75.0 | 66.7 | 0.80 | 66.7 | 75.0 | 58.3 | 0.77 | 50.0 | 50.0 | 50.0 | 0.60 | 65.0 | |
| 66.7 | 41.7 | 0.83 | 58.3 | 46.7 | 0.83 | 50.0 | 0.0 | 0.50 | 64.3 | ||||
| 79.2 | 83.3 | 75.0 | 0.84 | 61.2 | 64.5 | 54.5 | 0.52 | 29.2 | 14.3 | 0.56 | 62.8 | ||
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
Performance measures of data mining algorithms at different levels of significance on conditions A & C
| | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 81.8 | |||||||||||||
| 87.0 | 83.3 | ||||||||||||
| 86.9 | 81.8 | ||||||||||||
| 73.9 | 75.0 | 72.7 | 0.74 | 88.4 | |||||||||
| 82.6 | 72.7 | 78.2 | 83.3 | 72.7 | 0.83 | 86.6 | |||||||
| 86.0 | 81.8 | 69.6 | 83.3 | 54.5 | 0.76 | 84.8 | |||||||
| 81.8 | 81.8 | 65.2 | 66.7 | 63.6 | 0.72 | 83.3 | |||||||
| 82.6 | 83.3 | 81.8 | 69.6 | 66.7 | 72.7 | 0.64 | 83.2 | ||||||
| 87.0 | 83.3 | 82.6 | 83.3 | 81.8 | 69.5 | 66.7 | 72.7 | 0.75 | 81.4 | ||||
| 69.6 | 83.3 | 54.5 | 0.69 | 60.9 | 63.6 | 63.6 | 0.63 | 75.7 | |||||
| 0.86 | 65.2 | 58.3 | 72.7 | 0.72 | 65.2 | 58.3 | 72.7 | 0.56 | 73.4 | ||||
| 81.8 | 65.2 | 71.7 | 58.6 | 0.77 | 17.4 | 25.0 | 0.52 | 69.7 | |||||
| 73.9 | 54.5 | 78.2 | 54.5 | 0.82 | 47.8 | 0.0 | 0.50 | 68.8 | |||||
| 87.0 | 83.3 | 0.89 | 73.9 | 75.0 | 72.7 | 0.74 | 43.5 | 41.7 | 45.5 | 0.45 | 68.5 | ||
| 69.6 | 66.7 | 72.7 | 0.76 | 69.6 | 58.3 | 81.8 | 0.77 | 60.9 | 58.3 | 63.6 | 0.66 | 68.4 | |
| 65.6 | 66.7 | 72.7 | 0.76 | 69.6 | 66.7 | 72.7 | 0.76 | 47.8 | 66.7 | 27.3 | 0.49 | 63.1 | |
| 73.9 | 54.5 | 0.73 | 73.9 | 66.7 | 81.8 | 0.74 | 34.8 | 33.3 | 36.4 | 0.35 | 60.8 | ||
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
Performance measures of data mining algorithms at different levels of significance on conditions B & D
| | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 83.3 | |||||||||||||
| 83.3 | |||||||||||||
| 87.5 | 75.0 | 83.3 | |||||||||||
| 79.1 | 83.3 | 75.0 | 87.5 | 83.3 | |||||||||
| 87.5 | 83.3 | 87.5 | 83.3 | 89.3* | |||||||||
| 87.5 | 83.3 | 0.88 | 87.5 | 83.3 | 0.88 | 89.0 | |||||||
| 87.5 | 83.3 | 0.89 | 0.87 | 83.3 | 75.0 | 87.8 | |||||||
| 83.3 | 83.3 | 83.3 | 0.89 | 87.5 | 83.3 | 0.86 | 83.3 | 83.3 | 83.3 | 0.84 | 85.1 | ||
| 83.3 | 83.3 | 83.3 | 0.88 | 79.2 | 66.7 | 87.5 | 75.0 | 0.89 | 84.7 | ||||
| 79.2 | 75.0 | 83.3 | 0.80 | 83.3 | 83.3 | 83.3 | 0.83 | 87.5 | 83.3 | 83.6 | |||
| 83.3 | 83.3 | 83.3 | 0.83 | 79.2 | 83.3 | 75.0 | 0.84 | 79.2 | 83.3 | 75.0 | 0.81 | 81.1 | |
| 87.5 | 83.3 | 0.88 | 79.2 | 83.3 | 75.0 | 0.73 | 75.0 | 83.3 | 66.7 | 0.69 | 79.6 | ||
| 83.3 | 0.83 | 75.0 | 83.3 | 66.7 | 0.61 | 70.8 | 75.0 | 66.7 | 0.64 | 76.7 | |||
| 83.3 | 0.83 | 75.0 | 83.3 | 66.7 | 0.61 | 70.8 | 75.0 | 66.7 | 0.64 | 76.7 | |||
| 83.3 | 75.0 | 0.83 | 70.8 | 66.7 | 75.0 | 0.71 | 70.8 | 66.7 | 75.0 | 0.71 | 75.0 | ||
| 70.8 | 66.7 | 75.0 | 0.83 | 79.2 | 75.0 | 83.3 | 0.82 | 58.3 | 16.7 | 0.58 | 70.7 | ||
| 62.5 | 72.3 | 60.9 | 0.75 | 50.0 | 65.0 | 48.0 | 0.71 | 20.8 | 42.6 | 18.6 | 0.45 | 52.6 | |
Acc: Accuracy, Sp: Specificity, Sn: Sensitivity, AUC: Area under ROC curve, Avg: Average score in % for each algorithm, dnf: “Did Not Finish”, * denotes Avg. from 3 significance levels. Measures >90% are marked in bold.
Worst-case time performance (in ms) of classification algorithms
| 16581 | 11731 | ||||
| 25974 | 6341 | 11555 | |||
| 10496 | 29008 | 14076 | |||
| 50087 | 21452 | 26524 | |||
| 50290 | 23452 | 27435 | |||
| 55672 | 25000 | 29901 | |||
| 85955 | 12405 | 29658 | 42672 | ||
| 632840 | 48215 | 605365 | 428806 | ||
| 658668 | 869523 | 632983 | 720391 | ||
| 1589092 | 1146783 | 1315256 | 1350377 | ||
| 5444533 | 2465021 | 4565896 | 4158483 | ||
Table showing time performance in milliseconds over >1000 peptides for three datasets. Random Tree, KNN, Hyper Pipes and VFI were among the fastest; MLP was among the slowest (dnf: “Did Not Finish”). Time measurements less than 10 seconds are marked in bold.
Time performance (in ms) of classification algorithms on datasets
| | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16581 | 11731 | |||||||||||
| 25974 | ||||||||||||
| 10496 | 18297 | 18372 | 23712 | 29009 | ||||||||
| 11889 | 18254 | 50087 | 13848 | 21452 | ||||||||
| 50290 | 12033 | 23452 | ||||||||||
| 55672 | 25000 | |||||||||||
| 11876 | 85955 | 12405 | 29658 | |||||||||
| 11215 | 26380 | 79308 | 632840 | 22625 | 48215 | 17389 | 20649 | 89107 | 605365 | |||
| 658668 | 35568 | 869523 | 17373 | 632983 | ||||||||
| 24687 | 1589092 | 48659 | 1146783 | 255103 | 1315256 | |||||||
| 32836 | 5444533 | 36849 | 2465021 | 25496 | 4565896 | |||||||
| 23759 | 314076 | 4572305 | 30342 | 2789485 | 22916 | 156905 | 3277395 | |||||
Table showing time performance in milliseconds at all levels of significance for three datasets. MLP was among the slowest (dnf: “Did Not Finish”). Time measurements less than 10 seconds are marked in bold.
Summary of performance and time measures of classification algorithms
| 5 | 1 | 6 | |||
| 0 | 0 | 4 | 7615X | ||
| 0 | 1 | 3 | 11X | ||
| 0 | 2 | 4 | −5.7 | ||
| 0 | 0 | 0 | −7.9 | ||
| 1 | 0 | 2 | −8.8 | 21X | |
| 0 | 1 | 2 | −8.8 | 24X | |
| 0 | 0 | 1 | −9.9 | 34X | |
| 0 | 1 | 3 | −11.8 | 1072X | |
| 1 | 1 | 2 | −12.9 | 340X | |
| 0 | 0 | 1 | −14.4 | ||
| 0 | 0 | 1 | −16.5 | 9X | |
| 0 | 0 | 0 | −17.0 | 22X | |
| 0 | 0 | 0 | −20.2 | 8X | |
| 0 | 0 | 0 | −20.7 | ||
| 0 | 0 | 0 | −22.1 | 3300X | |
| 0 | 0 | 0 | −24.0 | 572X |
#Rank 1, Rank 2: number of times an algorithm ranked 1st or 2nd across the 7 datasets. # >90%: number of times an algorithm's overall average score exceeded 90% across the 7 datasets. Distance: average margin by which an algorithm trails the Rank-1 algorithm across the datasets (distances of 5% or less are marked in bold). Time: slowdown relative to the fastest algorithm (time performance slower than the fastest algorithm by 5-fold or more is marked in bold).
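The Distance and Time summary columns can be derived from per-dataset average scores and runtimes; a sketch of those two calculations, using hypothetical numbers rather than values from the tables above:

```python
def summarize(avg_scores, rank1_scores, time_ms, fastest_ms):
    """Distance: mean shortfall from the Rank-1 score across datasets (negative
    means trailing). Time: fold-slowdown relative to the fastest algorithm."""
    shortfall = sum(r - a for a, r in zip(avg_scores, rank1_scores)) / len(avg_scores)
    slowdown = time_ms / fastest_ms
    return round(-shortfall, 1), round(slowdown)

# Hypothetical algorithm: trails the per-dataset leaders by 6.0 and 5.0 points,
# and its worst-case runtime is ~11x the fastest algorithm's.
dist, folds = summarize(avg_scores=[80.0, 75.0], rank1_scores=[86.0, 80.0],
                        time_ms=69641, fastest_ms=6341)
print(dist, folds)  # -5.5 11
```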