Gregor Stiglic, Simon Kocbek, Igor Pernek, Peter Kokol.
Abstract
PURPOSE: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of the reasoning behind the classification model are possible.
Year: 2012 PMID: 22479449 PMCID: PMC3316502 DOI: 10.1371/journal.pone.0033812
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Comparison of the original J48 decision tree (upper image) and the visually tuned version from VTJ48 (lower image) on the letter dataset.
Basic information on the 40 UCI repository datasets used in this study: number of instances (samples), attributes (total, nominal, and numeric), classes, length of the longest attribute name (LAN), and length of the longest nominal attribute value (LAV).
| Dataset | Samples | Attributes | Nominal | Numeric | Classes | LAN | LAV |
|---|---|---|---|---|---|---|---|
| anneal | 898 | 39 | 33 | 6 | 6 | 22 | 5 |
| anneal.orig | 898 | 39 | 33 | 6 | 6 | 22 | 5 |
| arrhythmia | 452 | 280 | 74 | 206 | 16 | 28 | 2 |
| audiology | 226 | 70 | 70 | 0 | 24 | 23 | 32 |
| autos | 205 | 26 | 11 | 15 | 7 | 17 | 13 |
| balance-scale | 625 | 5 | 1 | 4 | 3 | 14 | 1 |
| breast-cancer | 286 | 10 | 10 | 0 | 2 | 11 | 20 |
| breast-w | 699 | 10 | 1 | 9 | 2 | 21 | 9 |
| colic | 368 | 23 | 16 | 7 | 2 | 27 | 29 |
| colic.orig | 368 | 28 | 21 | 7 | 2 | 27 | 7 |
| credit-a | 690 | 16 | 10 | 6 | 2 | 5 | 2 |
| credit-g | 1000 | 21 | 14 | 7 | 2 | 22 | 30 |
| diabetes | 768 | 9 | 1 | 8 | 2 | 5 | 15 |
| ecoli | 336 | 8 | 1 | 7 | 8 | 5 | 3 |
| glass | 214 | 10 | 1 | 9 | 7 | 4 | 20 |
| heart-c | 303 | 14 | 8 | 6 | 5 | 8 | 21 |
| heart-h | 294 | 14 | 8 | 6 | 5 | 10 | 21 |
| heart-statlog | 270 | 14 | 1 | 13 | 2 | 36 | 7 |
| hepatitis | 155 | 20 | 14 | 6 | 2 | 15 | 6 |
| hypothyroid | 3772 | 30 | 23 | 7 | 4 | 25 | 23 |
| ionosphere | 351 | 35 | 1 | 34 | 2 | 5 | 1 |
| iris | 150 | 5 | 1 | 4 | 3 | 11 | 15 |
| kr-vs-kp | 3196 | 37 | 37 | 0 | 2 | 5 | 5 |
| labor | 57 | 17 | 9 | 8 | 2 | 30 | 13 |
| letter | 20000 | 17 | 1 | 16 | 26 | 5 | 1 |
| lymph | 148 | 19 | 16 | 3 | 4 | 15 | 12 |
| mushroom | 8124 | 23 | 23 | 0 | 2 | 24 | 1 |
| optdigits | 5620 | 65 | 1 | 64 | 10 | 7 | 1 |
| pendigits | 10992 | 17 | 1 | 16 | 10 | 7 | 1 |
| primary-tumor | 339 | 18 | 18 | 0 | 22 | 15 | 17 |
| segment | 2310 | 20 | 1 | 19 | 7 | 20 | 9 |
| sick | 3772 | 30 | 23 | 7 | 2 | 25 | 8 |
| sonar | 208 | 61 | 1 | 60 | 2 | 12 | 4 |
| soybean | 683 | 36 | 36 | 0 | 19 | 15 | 27 |
| splice | 3190 | 62 | 62 | 0 | 3 | 13 | 24 |
| vehicle | 846 | 19 | 1 | 18 | 4 | 25 | 4 |
| vote | 435 | 17 | 17 | 0 | 2 | 38 | 10 |
| vowel | 990 | 14 | 4 | 10 | 11 | 14 | 6 |
| waveform-5000 | 5000 | 41 | 1 | 40 | 3 | 5 | 1 |
| zoo | 101 | 18 | 17 | 1 | 7 | 8 | 12 |
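The LAN and LAV columns can be derived from a dataset's attribute declarations. The sketch below assumes a Weka ARFF header in which each `@attribute` line declares either a numeric type or a `{...}` list of nominal values, and assumes both lengths are plain character counts; the regular expression is a simplified, hypothetical grammar, not the full ARFF specification. On an iris header in the style of Weka's `iris.arff` it reproduces the iris row (LAN 11, LAV 15).

```python
import re

# Simplified (assumed) pattern for "@attribute <name> <type>"; quoted names are allowed.
ATTRIBUTE = re.compile(r"@attribute\s+('[^']*'|\S+)\s+(.+)", re.IGNORECASE)


def lan_lav(arff_header: str) -> tuple[int, int]:
    """Return (LAN, LAV): longest attribute name and longest nominal value, in characters."""
    lan = lav = 0
    for line in arff_header.splitlines():
        match = ATTRIBUTE.match(line.strip())
        if not match:
            continue  # skip @relation, comments and blank lines
        name, spec = match.group(1).strip("'"), match.group(2).strip()
        lan = max(lan, len(name))
        if spec.startswith("{") and spec.endswith("}"):  # nominal attribute
            values = [v.strip().strip("'") for v in spec[1:-1].split(",")]
            lav = max(lav, *(len(v) for v in values))
    return lan, lav


IRIS_HEADER = """@relation iris
@attribute sepallength REAL
@attribute sepalwidth REAL
@attribute petallength REAL
@attribute petalwidth REAL
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}"""

print(lan_lav(IRIS_HEADER))  # (11, 15), matching the iris row
```

Whether the original LAN/LAV values treat quoted names or the class attribute exactly this way is not stated in the table, so other rows may need a different convention.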
Feature datasets used in protein solubility classification.
| # | Name | Size |
|---|---|---|
| 1 | MonomersNatural | 20 |
| 2 | DimersNatural | 13 |
| 3 | TrimersNatural | 24 |
| 4 | MonomersHydro | 5 |
| 5 | TrimersHydro | 12 |
| 6 | MonomersConfSimi | 7 |
| 7 | DimersConfSimi | 20 |
| 8 | TrimersConfSimi | 15 |
| 9 | MonomersBlosum | 8 |
| 10 | DimersBlosum | 25 |
| 11 | MonomersClustEm14 | 14 |
| 12 | DimersClustEm14 | 16 |
| 13 | TrimersClustEm14 | 22 |
| 14 | MonomersClustEm17 | 17 |
| 15 | DimersClustEm17 | 27 |
| 16 | TrimersClustEm17 | 42 |
| 17 | MonomersPhysChem | 7 |
| 18 | DimersPhysChem | 21 |
| 19 | Computed | 4 |
| 20 | eSol | 22 |
| 21 | All Features | 342 |
Comparison of decision tree dimensions on 40 UCI datasets including the number of leaves.
| Dataset | Leaves (J48) | Leaves (VTJ48) | Width (J48) | Width (VTJ48) | Height (J48) | Height (VTJ48) |
|---|---|---|---|---|---|---|
| anneal | 37.69 | 12.98 | 2753.62 | 2555.43 | 670.11 | 677.68 |
| anneal.orig | 46.37 | 11.10 | 3426.41 | 1362.30 | 868.05 | 546.30 |
| arrhythmia | 40.59 | 10.20 | 1679.34 | 1589.04 | 1555.57 | 1462.47 |
| audiology | 30.25 | 9.11 | 3799.18 | 3781.91 | 923.98 | 921.00 |
| autos | 45.25 | 12.77 | 6527.37 | 4199.37 | 654.02 | 637.58 |
| balance-scale | 41.24 | 25.86 | 1986.12 | 1222.98 | 821.96 | 747.91 |
| breast-cancer | 9.60 | 4.04 | 1177.92 | 1518.04 | 348.63 | 354.23 |
| breast-w | 12.08 | 14.23 | 781.99 | 967.01 | 637.84 | 698.75 |
| colic | 6.07 | 8.76 | 546.44 | 1198.39 | 360.41 | 424.61 |
| colic.orig | 1.00 | 6.83 | 1.00 | 480.83 | 1.00 | 372.96 |
| credit-a | 21.40 | 12.01 | 1664.81 | 1098.50 | 669.91 | 619.21 |
| credit-g | 89.05 | 7.07 | 13906.86 | 1077.07 | 877.89 | 335.60 |
| diabetes | 21.87 | 11.97 | 1488.65 | 963.30 | 830.31 | 694.87 |
| ecoli | 18.70 | 17.78 | 1039.47 | 1055.72 | 735.37 | 723.22 |
| glass | 23.73 | 12.23 | 2293.10 | 1838.96 | 827.88 | 754.41 |
| heart-c | 26.05 | 8.85 | 3273.48 | 1399.74 | 618.49 | 476.47 |
| heart-h | 7.21 | 8.17 | 673.28 | 1042.54 | 408.37 | 464.92 |
| heart-statlog | 17.85 | 13.41 | 1577.55 | 1309.66 | 633.84 | 605.13 |
| hepatitis | 9.24 | 12.41 | 522.66 | 754.78 | 571.69 | 659.98 |
| hypothyroid | 14.39 | 13.43 | 1101.64 | 1070.54 | 756.02 | 771.15 |
| ionosphere | 13.85 | 11.59 | 1070.98 | 1019.02 | 775.47 | 734.02 |
| iris | 4.69 | 4.76 | 227.87 | 231.10 | 428.43 | 432.21 |
| kr-vs-kp | 28.98 | 13.16 | 1187.64 | 1104.25 | 1091.28 | 1076.77 |
| labor | 4.00 | 5.20 | 329.00 | 464.17 | 333.06 | 380.56 |
| letter | 1165.00 | 12.65 | 63285.55 | 63344.28 | 1916.69 | 1919.54 |
| lymph | 17.43 | 10.12 | 1863.97 | 1252.03 | 580.12 | 462.35 |
| mushroom | 24.93 | 24.93 | 1022.25 | 1022.25 | 527.00 | 527.00 |
| optdigits | 205.46 | 16.09 | 11154.56 | 11195.65 | 1330.36 | 1334.04 |
| pendigits | 188.13 | 16.04 | 10719.41 | 10784.98 | 1297.69 | 1296.05 |
| primary-tumor | 43.18 | 14.62 | 3794.33 | 1797.86 | 891.43 | 789.16 |
| segment | 41.12 | 11.09 | 3749.02 | 3748.84 | 1084.95 | 1085.78 |
| sick | 27.59 | 14.22 | 1763.57 | 1087.54 | 815.68 | 710.67 |
| sonar | 14.71 | 13.80 | 1107.13 | 1089.68 | 665.59 | 659.63 |
| soybean | 61.28 | 11.04 | 6175.62 | 6180.02 | 913.67 | 920.67 |
| splice | 173.83 | 20.78 | 7537.58 | 6176.44 | 759.48 | 731.51 |
| vehicle | 69.22 | 16.27 | 5069.70 | 4183.99 | 1168.31 | 1065.60 |
| vote | 5.81 | 6.22 | 390.94 | 432.98 | 508.98 | 513.86 |
| vowel | 126.41 | 10.58 | 11046.43 | 11045.28 | 985.60 | 986.01 |
| waveform-5000 | 295.66 | 16.82 | 16325.97 | 13756.92 | 1494.51 | 1386.66 |
| zoo | 8.31 | 8.31 | 436.69 | 436.69 | 567.50 | 567.50 |
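The Leaves columns count the terminal nodes of each induced tree, while Width and Height describe the drawn tree and are not reproduced here. A minimal sketch of counting leaves and depth (a related, but not identical, notion to drawing height) on a generic tree, using a hypothetical `Node` class rather than Weka's internal J48 classes:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a split test (internal node) or a class label (leaf)."""
    label: str
    children: list["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def count_leaves(node: Node) -> int:
    """Number of terminal nodes, i.e. the quantity in the Leaves columns."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def depth(node: Node) -> int:
    """Levels from this node to its deepest leaf; a single leaf has depth 1."""
    if node.is_leaf():
        return 1
    return 1 + max(depth(child) for child in node.children)


# A three-leaf example tree, and a single-leaf tree like the J48 models on colic.orig (1.00 leaves).
tree = Node("x1 <= 0.5", [Node("A"), Node("x2 <= 1.0", [Node("A"), Node("B")])])
stump = Node("majority class")
print(count_leaves(tree), depth(tree))    # 3 3
print(count_leaves(stump), depth(stump))  # 1 1
```

Because the table reports values averaged over many trees, fractional leaf counts such as 37.69 are expected.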
Comparison of decision tree dimensions on the protein feature datasets including the number of leaves.
| Dataset | Leaves (J48) | Leaves (VTJ48) | Width (J48) | Width (VTJ48) | Height (J48) | Height (VTJ48) |
|---|---|---|---|---|---|---|
| MonomersNatural | 91.73 | 13.08 | 6965.05 | 4779.43 | 1296.91 | 1147.77 |
| DimersNatural | 54.05 | 12.93 | 3880.20 | 1807.83 | 1226.37 | 882.51 |
| TrimersNatural | 15.57 | 11.03 | 1403.25 | 1338.10 | 798.08 | 784.64 |
| MonomersHydro | 7.25 | 7.05 | 576.10 | 600.18 | 547.75 | 553.87 |
| TrimersHydro | 41.54 | 11.38 | 3035.91 | 1498.45 | 1068.53 | 816.91 |
| MonomersConfSimi | 16.55 | 11.91 | 1225.22 | 999.07 | 765.66 | 701.04 |
| DimersConfSimi | 85.58 | 13.02 | 6518.23 | 3880.32 | 1256.65 | 1045.51 |
| TrimersConfSimi | 37.49 | 11.02 | 2607.05 | 1251.43 | 1112.98 | 807.38 |
| MonomersBlosum | 29.72 | 13.61 | 2270.26 | 1399.86 | 909.40 | 781.34 |
| DimersBlosum | 94.21 | 13.27 | 7139.47 | 4640.00 | 1297.53 | 1129.44 |
| MonomersClustEm14 | 68.44 | 13.46 | 5272.88 | 3006.28 | 1169.90 | 984.28 |
| DimersClustEm14 | 51.66 | 11.20 | 4115.81 | 1974.13 | 1202.74 | 921.91 |
| TrimersClustEm14 | 35.04 | 10.53 | 2808.01 | 1310.34 | 1245.19 | 845.54 |
| MonomersClustEm17 | 84.87 | 12.90 | 6687.42 | 3637.08 | 1182.30 | 956.30 |
| DimersClustEm17 | 117.36 | 10.26 | 7419.49 | 3430.75 | 1609.62 | 1158.41 |
| TrimersClustEm17 | 88.52 | 10.06 | 6730.81 | 3020.26 | 1912.71 | 1221.96 |
| MonomersPhysChem | 32.92 | 13.67 | 2655.87 | 1674.10 | 919.93 | 831.75 |
| DimersPhysChem | 80.22 | 10.81 | 5927.14 | 2879.09 | 1356.17 | 1047.01 |
| Computed | 7.99 | 8.51 | 724.04 | 779.70 | 516.48 | 534.92 |
| eSol | 89.79 | 14.13 | 7611.00 | 4331.36 | 1124.45 | 976.64 |
| All Features | 111.09 | 13.14 | 6949.88 | 4988.28 | 1744.36 | 1448.88 |
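To read this table as relative change, one can compute the percentage reduction (J48 − VTJ48) / J48 × 100 for each column. A short sketch using two rows copied from the table above:

```python
# Mean values copied from the table: (J48, VTJ48) pairs for leaves, width and height.
ROWS = {
    "All Features": ((111.09, 13.14), (6949.88, 4988.28), (1744.36, 1448.88)),
    "Computed": ((7.99, 8.51), (724.04, 779.70), (516.48, 534.92)),
}


def reduction(j48: float, vtj48: float) -> float:
    """Percentage reduction from J48 to VTJ48; negative if the VTJ48 value is larger."""
    return (j48 - vtj48) / j48 * 100


for name, pairs in ROWS.items():
    leaves, width, height = (reduction(j, v) for j, v in pairs)
    print(f"{name}: leaves {leaves:.1f}%, width {width:.1f}%, height {height:.1f}%")
```

For All Features this gives about 88% fewer leaves but only about 28% less width and 17% less height; for Computed all three values are negative, meaning the VTJ48 tree is slightly larger.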
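Comparison of classification performance (20 runs of 10-fold cross-validation) on the 40 UCI datasets. Δ = J48 − VTJ48, so negative values favor VTJ48; the J48/tie/VTJ48 row counts the datasets on which J48 is better, the two are tied, or VTJ48 is better.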
| Dataset | Accuracy (J48) | Accuracy (VTJ48) | AUC (J48) | AUC (VTJ48) | Δ ACC (J48 − VTJ48) | Δ AUC (J48 − VTJ48) |
|---|---|---|---|---|---|---|
| anneal | 98.64±0.2 | 98.93±0.2 | 99.36±0.3 | 98.85±0.3 | −0.28 | 0.51 |
| anneal.orig | 92.34±0.5 | 81.34±1 | 97.47±0.4 | 83.6±2.6 | 11.00 | 13.87 |
| arrhythmia | 65.88±1.1 | 70.63±1 | 73.58±1.4 | 79.01±1 | −4.75 | −5.43 |
| audiology | 77.3±1.4 | 66.31±3.9 | 92.31±0.6 | 91.78±1 | 11.00 | 0.53 |
| autos | 82.59±2.6 | 64.07±2.4 | 91.45±1.1 | 82.42±2.4 | 18.51 | 9.04 |
| balance-scale | 77.9±0.9 | 77.35±0.7 | 82.36±0.8 | 83.93±1.1 | 0.55 | −1.57 |
| breast-cancer | 74.25±0.8 | 74.48±0.9 | 58.76±1.8 | 59.69±1.5 | −0.23 | −0.93 |
| breast-w | 94.64±0.4 | 94.69±0.4 | 95.21±1 | 95.44±0.6 | −0.06 | −0.23 |
| colic | 85.15±0.4 | 85.03±0.7 | 80.79±0.9 | 81.12±1.2 | 0.12 | −0.33 |
| colic.orig | 66.3±0 | 65.33±1.7 | 48.55±0 | 70.31±1.5 | 0.98 | −21.76 |
| credit-a | 85.83±0.7 | 86.24±0.7 | 88.49±0.7 | 89.18±0.8 | −0.41 | −0.70 |
| credit-g | 71.03±0.8 | 71.85±0.6 | 64.46±1.2 | 70.96±0.6 | −0.82 | −6.50 |
| diabetes | 74.29±1.1 | 74.52±1.1 | 75.31±1.3 | 74.6±1.4 | −0.23 | 0.71 |
| ecoli | 82.96±1.2 | 82.62±1.1 | 90.63±0.8 | 91.03±0.6 | 0.34 | −0.40 |
| glass | 67.17±2.5 | 67.78±2.2 | 80.13±2 | 80.97±1.3 | −0.61 | −0.85 |
| heart-c | 76.85±1.6 | 76.2±1.6 | 77.24±2.4 | 77.73±2 | 0.64 | −0.49 |
| heart-h | 78.33±1.1 | 78.4±1.2 | 75.22±1.5 | 77.53±1.8 | −0.07 | −2.31 |
| heart-statlog | 77.83±1.7 | 78.56±2.1 | 77.49±2.6 | 77.91±2.2 | −0.72 | −0.42 |
| hepatitis | 79.77±1.9 | 79.84±1.7 | 67.57±4.6 | 70.54±4.7 | −0.06 | −2.97 |
| hypothyroid | 99.53±0 | 99.55±0 | 99.27±0.2 | 99.28±0.2 | −0.02 | −0.01 |
| ionosphere | 89.9±1.1 | 89.93±1 | 88.95±1.7 | 88.11±1.4 | −0.03 | 0.83 |
| iris | 94.7±0.9 | 94.7±0.9 | 95.73±0.7 | 95.76±0.8 | 0.00 | −0.03 |
| kr-vs-kp | 99.39±0.1 | 97.35±0.1 | 99.81±0 | 99.41±0.1 | 2.03 | 0.40 |
| labor | 80.09±3.1 | 82.28±3.1 | 72.05±4.7 | 75.89±4.4 | −2.19 | −3.85 |
| letter | 88.02±0.2 | 29.42±0.6 | 95.4±0.1 | 88.77±0.2 | 58.60 | 6.63 |
| lymph | 77.03±1.5 | 76.45±1.9 | 79.39±1.9 | 78.73±3 | 0.57 | 0.66 |
| mushroom | 100±0 | 100±0 | 100±0 | 100±0 | 0.00 | 0.00 |
| optdigits | 90.51±0.2 | 73.94±0.4 | 95.39±0.1 | 93.69±0.1 | 16.57 | 1.70 |
| pendigits | 96.53±0.1 | 80.13±0.6 | 98.44±0.1 | 96.89±0.1 | 16.40 | 1.56 |
| primary-tumor | 42.68±1.5 | 41.83±1 | 71.95±0.8 | 71.52±1.1 | 0.86 | 0.43 |
| segment | 96.93±0.2 | 92±0.3 | 98.66±0.1 | 98.34±0.1 | 4.92 | 0.32 |
| sick | 98.73±0.1 | 98.38±0.1 | 95.51±0.7 | 92.05±1.1 | 0.35 | 3.46 |
| sonar | 72.07±3.1 | 72.26±2.6 | 73.58±3.3 | 73.13±3.2 | −0.19 | 0.44 |
| soybean | 91.96±0.8 | 61.46±0.9 | 98.11±0.3 | 94.87±0.2 | 30.50 | 3.23 |
| splice | 94.13±0.2 | 94.45±0.2 | 96.67±0.1 | 97.92±0.1 | −0.32 | −1.25 |
| vehicle | 72.21±1.2 | 71.64±1 | 85.38±0.7 | 89.31±0.4 | 0.57 | −3.93 |
| vote | 96.41±0.4 | 96.38±0.4 | 96.97±0.4 | 97.03±0.4 | 0.03 | −0.06 |
| vowel | 80.11±1.3 | 43.39±1.1 | 92.34±0.6 | 87.78±0.5 | 36.72 | 4.56 |
| waveform-5000 | 75.36±0.6 | 74.11±0.4 | 82.82±0.5 | 88.72±0.2 | 1.25 | −5.90 |
| zoo | 92.23±0.4 | 92.23±0.4 | 97.67±0.1 | 97.67±0.1 | 0.00 | 0.00 |
| J48/tie/VTJ48 | | | | | (21/3/16) | (17/2/21) |
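Each performance cell has the form mean±spread over the 20 cross-validation runs (the table does not say which spread measure is used), and Δ is J48 minus VTJ48. A small sketch that parses such cells and recomputes Δ from the rounded means; since the published Δ was presumably computed from unrounded means, a recomputed value can differ in the last digit:

```python
def parse_cell(cell: str) -> tuple[float, float]:
    """Split a 'mean±spread' cell such as '98.64±0.2' into (mean, spread)."""
    mean, spread = cell.split("±")
    return float(mean), float(spread)


def delta(j48_cell: str, vtj48_cell: str) -> float:
    """Δ = J48 − VTJ48, recomputed from the rounded means shown in the table."""
    return round(parse_cell(j48_cell)[0] - parse_cell(vtj48_cell)[0], 2)


print(delta("98.64±0.2", "98.93±0.2"))  # anneal accuracy: -0.29 (table: −0.28)
print(delta("88.02±0.2", "29.42±0.6"))  # letter accuracy: 58.6
```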
Comparison of classification performance (20 runs of 10-fold cross-validation) on the protein datasets.
| Dataset | Accuracy (J48) | Accuracy (VTJ48) | AUC (J48) | AUC (VTJ48) | Δ ACC (J48 − VTJ48) | Δ AUC (J48 − VTJ48) |
|---|---|---|---|---|---|---|
| MonomersNatural | 70.76±0.9 | 72.41±0.7 | 70.58±1.3 | 76.41±0.6 | −1.66 | −5.83 |
| DimersNatural | 62.58±1 | 61.94±0.9 | 64.65±1 | 64.45±1 | 0.64 | 0.20 |
| TrimersNatural | 55.44±0.3 | 55.33±0.4 | 53.91±0.6 | 53.97±0.7 | 0.10 | −0.06 |
| MonomersHydro | 64.64±0.9 | 64.58±0.9 | 68.1±0.8 | 68.08±0.8 | 0.06 | 0.02 |
| TrimersHydro | 62.79±0.8 | 63.25±0.6 | 64.43±1 | 64.68±0.7 | −0.46 | −0.25 |
| MonomersConfSimi | 66.75±0.9 | 66.79±0.9 | 71.97±0.9 | 71.96±0.8 | −0.03 | 0.01 |
| DimersConfSimi | 64.68±1 | 66.51±0.6 | 63.35±1.1 | 68.82±0.9 | −1.83 | −5.48 |
| TrimersConfSimi | 63.25±0.8 | 63.69±0.7 | 65.77±1.1 | 66.28±0.8 | −0.44 | −0.51 |
| MonomersBlosum | 66.46±0.7 | 66.62±0.7 | 69.33±0.8 | 69.79±0.7 | −0.16 | −0.46 |
| DimersBlosum | 66.32±1 | 69.27±0.8 | 65.65±1.2 | 73.3±0.8 | −2.95 | −7.66 |
| MonomersClustEm14 | 70.07±1 | 71.13±0.8 | 70.73±1 | 74.19±0.6 | −1.06 | −3.45 |
| DimersClustEm14 | 66.87±0.8 | 67.52±1 | 69.33±1 | 71.23±0.9 | −0.64 | −1.90 |
| TrimersClustEm14 | 73.74±0.7 | 76.31±0.7 | 73.62±1 | 80.43±0.8 | −2.57 | −6.81 |
| MonomersClustEm17 | 72.69±0.8 | 74.22±0.5 | 72.44±1.2 | 77.12±0.6 | −1.53 | −4.68 |
| DimersClustEm17 | 63.88±1 | 65.06±0.9 | 63.68±1 | 67.51±0.9 | −1.18 | −3.83 |
| TrimersClustEm17 | 62.35±1 | 61.37±1.2 | 62.92±1.2 | 62.57±1.3 | 0.98 | 0.35 |
| MonomersPhysChem | 71.64±0.9 | 71.64±0.6 | 75.07±0.8 | 75.19±0.7 | 0.00 | −0.12 |
| DimersPhysChem | 68.93±0.8 | 71.29±0.8 | 68.78±1.3 | 73.44±0.9 | −2.36 | −4.66 |
| Computed | 74.92±0.5 | 74.75±0.6 | 79.2±0.6 | 79.41±0.6 | 0.17 | −0.21 |
| eSol | 61.16±0.8 | 61.47±0.8 | 63.67±0.9 | 63.6±0.9 | −0.31 | 0.07 |
| All Features | 72.19±1 | 75.87±0.8 | 71.63±1.4 | 81.21±0.6 | −3.68 | −9.57 |
| J48/tie/VTJ48 | | | | | (5/1/15) | (5/0/16) |
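The J48/tie/VTJ48 counts follow from the sign of Δ: a positive Δ is a J48 win, zero is a tie, and a negative Δ is a VTJ48 win. A sketch that reproduces the protein counts from the two Δ columns of the table above (ties are judged on the rounded Δ as printed):

```python
def win_tie_loss(deltas: list[float]) -> tuple[int, int, int]:
    """Count J48 wins (Δ > 0), ties (Δ == 0) and VTJ48 wins (Δ < 0)."""
    wins = sum(d > 0 for d in deltas)
    ties = sum(d == 0 for d in deltas)
    return wins, ties, len(deltas) - wins - ties


# Δ ACC and Δ AUC for the 21 protein datasets, in table order.
d_acc = [-1.66, 0.64, 0.10, 0.06, -0.46, -0.03, -1.83, -0.44, -0.16, -2.95, -1.06,
         -0.64, -2.57, -1.53, -1.18, 0.98, 0.00, -2.36, 0.17, -0.31, -3.68]
d_auc = [-5.83, 0.20, -0.06, 0.02, -0.25, 0.01, -5.48, -0.51, -0.46, -7.66, -3.45,
         -1.90, -6.81, -4.68, -3.83, 0.35, -0.12, -4.66, -0.21, 0.07, -9.57]
print(win_tie_loss(d_acc))  # (5, 1, 15)
print(win_tie_loss(d_auc))  # (5, 0, 16)
```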
Comparison of decision tree dimensions on the GEMLeR datasets including the number of leaves.
| Dataset | Leaves (J48) | Leaves (VTJ48) | Width (J48) | Width (VTJ48) | Height (J48) | Height (VTJ48) |
|---|---|---|---|---|---|---|
| OVA_Breast | 21.60 | 13.50 | 1673.00 | 1199.40 | 728.80 | 609.00 |
| OVA_Colon | 16.70 | 12.30 | 1608.30 | 1430.00 | 609.30 | 571.90 |
| OVA_Endometrium | 13.20 | 13.00 | 1129.50 | 1151.40 | 616.80 | 616.80 |
| OVA_Kidney | 11.50 | 11.10 | 1169.50 | 1117.90 | 542.00 | 549.50 |
| OVA_Lung | 12.00 | 13.20 | 1053.40 | 1069.70 | 616.60 | 661.20 |
| OVA_Omentum | 17.70 | 12.70 | 1291.30 | 1326.10 | 802.80 | 802.80 |
| OVA_Ovary | 25.50 | 13.90 | 2148.40 | 1842.00 | 773.20 | 743.40 |
| OVA_Prostate | 2.00 | 3.60 | 191.00 | 249.40 | 224.00 | 345.60 |
| OVA_Uterus | 23.60 | 15.30 | 1883.20 | 1563.80 | 758.50 | 721.50 |
Comparison of classification performance (20 runs of 10-fold cross-validation) on the GEMLeR datasets.
| Dataset | Accuracy (J48) | Accuracy (VTJ48) | AUC (J48) | AUC (VTJ48) | Δ ACC (J48 − VTJ48) | Δ AUC (J48 − VTJ48) |
|---|---|---|---|---|---|---|
| OVA_Breast | 93.53±0.4 | 94.63±0.4 | 89.94±0.8 | 90.02±1 | −1.10 | −0.07 |
| OVA_Colon | 96.31±0.4 | 96.7±0.3 | 92.39±1.2 | 91.76±1.3 | −0.39 | 0.62 |
| OVA_Endometrium | 95.15±0.4 | 95.08±0.5 | 63.57±6.5 | 64.11±5.4 | 0.06 | −0.53 |
| OVA_Kidney | 96.38±0.3 | 96.31±0.3 | 93.03±0.8 | 93.25±0.7 | 0.06 | −0.22 |
| OVA_Lung | 97.35±0.2 | 97.28±0.3 | 90.12±1.7 | 89.87±1.4 | 0.06 | 0.25 |
| OVA_Omentum | 93.98±0.5 | 94.43±0.4 | 54.82±5.9 | 67.99±7.9 | −0.45 | −13.16 |
| OVA_Ovary | 92.23±0.6 | 92.62±0.6 | 79.21±2.2 | 81.84±2.2 | −0.39 | −2.63 |
| OVA_Prostate | 99.68±0.1 | 99.61±0.1 | 97.02±1 | 98.69±0.8 | 0.06 | −1.67 |
| OVA_Uterus | 92.17±0.4 | 92.43±0.3 | 73.16±3.5 | 70.22±3.2 | −0.26 | 2.93 |
| J48/tie/VTJ48 | | | | | (4/0/5) | (3/0/6) |
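All three performance tables rest on 20 runs of 10-fold cross-validation. The sketch below only generates the train/test index splits for such a scheme, reshuffling once per run with a per-run seed; it is a generic, non-stratified illustration, and the seeds and fold construction are assumptions rather than the settings used in the original experiments. A classifier such as J48 would be trained on `train` and scored on `test` for every split.

```python
import random
from collections.abc import Iterator


def repeated_kfold(n: int, k: int = 10, runs: int = 20,
                   seed: int = 0) -> Iterator[tuple[int, int, list[int], list[int]]]:
    """Yield (run, fold, train_idx, test_idx) for `runs` independently shuffled k-fold splits."""
    for run in range(runs):
        order = list(range(n))
        random.Random(seed + run).shuffle(order)  # new permutation for every run
        for fold in range(k):
            test = order[fold::k]                  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield run, fold, train, test


# labor has 57 instances: 20 runs x 10 folds = 200 train/test splits.
splits = list(repeated_kfold(57))
print(len(splits))  # 200
```

Within each run every instance lands in exactly one test fold; a stratified variant would assign folds separately within each class.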
Figure 2. Comparison of the original J48 decision tree and the visually tuned version from VTJ48 on the All Features dataset.
Figure 3. Comparison of durations for different datasets.
Top 5 rules with the highest support on the All Features dataset, extracted from the J48 and VTJ48 decision trees.
| Rule | No. of conditions | Support | Error (%) |
|---|---|---|---|
| J48 | | | |
| IF Length <= 233 AND MonomersClustEm17_34 > 0.136 AND TrimersConfSimi_40 <= 0.002 AND TrimersClustEm17_98 <= 0.005 AND DimersConfSimi_19 <= 0.069 THEN Soluble | 5 | 228 | 1.32 |
| IF Length > 233 AND DimersClustEm17_102 <= 0.069 AND MonomersNatural_0 > 0.047 AND Ip > 5.181 AND TrimersClustEm17_96 <= 0.002 AND Length > 251 AND MonomersBlosum_14 > 0.074 AND TrimersNatural_19 <= 0 AND MonomersNatural_1 > 0.039 THEN Insoluble | 9 | 218 | 0.92 |
| IF Length > 233 AND DimersClustEm17_102 <= 0.069 AND MonomersNatural_0 <= 0.047 AND DimersClustEm14_70 <= 0.002 AND TrimersClustEm17_90 <= 0 AND DimersBlosum_40 <= 0.015 AND MonomersClustEm14_20 > 0.132 AND TrimersClustEm17_85 <= 0.003 THEN Soluble | 8 | 53 | 5.66 |
| IF Length > 233 AND DimersClustEm17_102 > 0.069 AND DimersClustEm17_95 <= 0.0121 AND DimersClustEm14_62 > 0.004 AND MonomersConfSimi_8 > 0.076 AND MonomersBlosum_14 > 0.076 AND DimersClustEm14_65 <= 0.001 AND DimersClustEm14_100 <= 0.002 AND TrimersNatural_6 <= 0 AND TrimersClustEm14_46 <= 0.003 AND DimersNatural_5 <= 0.009 AND DimersClustEm14_71 <= 0.004 THEN Soluble | 11 | 49 | 2.04 |
| IF Length <= 233 AND MonomersClustEm17_34 <= 0.136 AND MonomersBlosum_16 <= 0.173 AND DimersConfSimi_14 > 0.020 AND DimersClustEm14_59 <= 0.040 AND MonomersClustEm17_34 <= 0.113 AND TrimersNatural_0 <= 0.002 AND MonomersNatural_2 > 0.066 AND TrimersClustEm17_80 <= 0.002 AND DimersBlosum_58 <= 0.009 AND DimersBlosum_38 <= 0.032 AND MonomersClustEm14_22 > 0.022 AND DimersPhysChem_118 > 0.002 AND TrimersClustEm14_65 <= 0.005 THEN Insoluble | 14 | 47 | 2.13 |
| VTJ48 | | | |
| IF Length > 233 AND DimersClustEm17_102 <= 0.069 AND MonomersNatural_0 > 0.047 THEN Insoluble | 3 | 593 | 17.54 |
| IF Length <= 233 AND MonomersClustEm17_34 > 0.136 THEN Soluble | 2 | 287 | 5.23 |
| IF Length <= 233 AND MonomersClustEm17_34 <= 0.136 AND MonomersBlosum_16 <= 0.173 AND DimersClustEm14_59 <= 0.040 AND MonomersBlosum_14 > 0.086 AND MonomersHydro_0 > 0.324 THEN Insoluble | 6 | 100 | 30.00 |
| IF Length <= 233 AND MonomersClustEm17_34 <= 0.136 AND MonomersBlosum_16 <= 0.173 AND DimersClustEm14_59 <= 0.040 AND MonomersBlosum_14 <= 0.086 THEN Soluble | 5 | 99 | 21.21 |
| IF Length <= 233 AND MonomersClustEm17_34 <= 0.136 AND MonomersBlosum_16 <= 0.173 AND DimersClustEm14_59 <= 0.040 AND MonomersBlosum_14 > 0.086 AND MonomersHydro_0 <= 0.324 AND MonomersClustEm17_34 > 0.110 THEN Soluble | 7 | 68 | 20.59 |
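The Error values in this table are consistent with misclassified instances divided by support, times 100: for example, 3 of 228 covered instances gives 1.32%, and 104 of 593 gives 17.54%. A minimal sketch of computing support and error for a conjunctive IF-THEN rule; the records and labels below are hypothetical and only illustrate the calculation on the VTJ48 rule "IF Length <= 233 AND MonomersClustEm17_34 > 0.136 THEN Soluble":

```python
import operator

# Comparison operators that appear in the rules.
OPS = {"<=": operator.le, ">": operator.gt}


def rule_stats(conditions, predicted, records, labels):
    """Return (support, error %) of an IF-THEN rule whose conditions are joined by AND.

    conditions: list of (feature, op, threshold) tuples.
    """
    covered = [y for x, y in zip(records, labels)
               if all(OPS[op](x[f], t) for f, op, t in conditions)]
    support = len(covered)
    errors = sum(y != predicted for y in covered)
    return support, (100.0 * errors / support if support else 0.0)


rule = [("Length", "<=", 233), ("MonomersClustEm17_34", ">", 0.136)]
records = [  # hypothetical proteins
    {"Length": 120, "MonomersClustEm17_34": 0.20},  # covered, correctly Soluble
    {"Length": 200, "MonomersClustEm17_34": 0.15},  # covered, but Insoluble (an error)
    {"Length": 300, "MonomersClustEm17_34": 0.30},  # not covered: Length > 233
    {"Length": 150, "MonomersClustEm17_34": 0.10},  # not covered: feature <= 0.136
]
labels = ["Soluble", "Insoluble", "Soluble", "Soluble"]
print(rule_stats(rule, "Soluble", records, labels))  # (2, 50.0)
```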
Figure 4. Pseudocode of decision tree reduction in Visually Tuned J48 (VTJ48).