Chien-Hung Huang1, Huai-Shun Peng1, Ka-Lok Ng2.
Abstract
Many proteins are known to be associated with cancer, yet their precise functional roles in disease pathogenesis often remain unclear. One strategy for better understanding the function of these proteins is to combine different types of proteomics data. In this study, we extended Aragues's method by employing protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, the weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performance was benchmarked in three kinds of experiments: (I) using individual algorithms, (II) combining algorithms, and (III) combining algorithms of the same classification type. Compared with Aragues's method, our proposed methods, that is, machine learning algorithms and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm achieves hit ratios of 89.4% and 72.8% for the lung cancer dataset and the lung cancer microarray study, respectively. It is anticipated that this research could help in understanding disease mechanisms and in diagnosis.
Year: 2015 PMID: 25866773 PMCID: PMC4381656 DOI: 10.1155/2015/312047
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: System flowchart for this study.
Figure 2: Data sources and datasets generation.
Figure 3: The calculation of score CLD for the protein pair (P , P ).
Performance comparison for the individual algorithm sorted by F1 value.
| Type | Algorithm | ACC | SPE | SEN | F1 | MCC | PPV | AUC |
|---|---|---|---|---|---|---|---|---|
| Trees | LMT | 0.772 | 0.802 | 0.748 | 0.774 | 0.548 | 0.821 | 0.774 |
| Trees | SimpleCart | 0.770 | 0.804 | 0.742 | 0.773 | 0.546 | 0.825 | 0.775 |
| Trees | J48 | 0.767 | 0.799 | 0.741 | 0.770 | 0.538 | 0.818 | 0.771 |
| Trees | J48graft | 0.767 | 0.799 | 0.742 | 0.769 | 0.538 | 0.818 | 0.771 |
| Trees | REPTree | 0.766 | 0.796 | 0.741 | 0.767 | 0.536 | 0.818 | 0.767 |
| Trees | FT | 0.763 | 0.798 | 0.735 | 0.766 | 0.528 | 0.821 | 0.766 |
| Rules | DTNB | 0.760 | 0.804 | 0.728 | 0.763 | 0.527 | 0.833 | 0.765 |
| Trees | NBTree | 0.760 | 0.796 | 0.733 | 0.763 | 0.527 | 0.819 | 0.764 |
| Trees | RandomForest | 0.760 | 0.777 | 0.744 | 0.761 | 0.524 | 0.791 | 0.761 |
| Rules | Ridor | 0.718 | 0.910 | 0.650 | 0.758 | 0.492 | 0.954 | 0.780 |
| Rules | JRip | 0.754 | 0.790 | 0.726 | 0.757 | 0.512 | 0.817 | 0.757 |
| Rules | DecisionTable | 0.752 | 0.790 | 0.722 | 0.756 | 0.509 | 0.817 | 0.757 |
| Rules | PART | 0.744 | 0.758 | 0.745 | 0.748 | 0.494 | 0.754 | 0.751 |
| Lazy | KStar | 0.744 | 0.766 | 0.725 | 0.744 | 0.491 | 0.784 | 0.744 |
| Functions | MultilayerPerceptron | 0.724 | 0.792 | 0.683 | 0.734 | 0.462 | 0.835 | 0.739 |
| Trees | LADTree | 0.720 | 0.762 | 0.696 | 0.727 | 0.447 | 0.788 | 0.729 |
| Trees | RandomTree | 0.721 | 0.721 | 0.720 | 0.721 | 0.440 | 0.723 | 0.721 |
| Lazy | LWL | 0.610 | 1.000 | 0.560 | 0.720 | 0.350 | 1.000 | 0.780 |
| Misc | VFI | 0.610 | 1.000 | 0.560 | 0.720 | 0.350 | 1.000 | 0.780 |
| Rules | ConjunctiveRule | 0.610 | 1.000 | 0.560 | 0.720 | 0.350 | 1.000 | 0.780 |
| Trees | DecisionStump | 0.610 | 1.000 | 0.560 | 0.720 | 0.350 | 1.000 | 0.780 |
| Trees | ADTree | 0.709 | 0.762 | 0.685 | 0.719 | 0.433 | 0.787 | 0.724 |
| Bayes | BayesNet | 0.714 | 0.755 | 0.683 | 0.717 | 0.431 | 0.796 | 0.719 |
| Lazy | IB1 | 0.713 | 0.716 | 0.711 | 0.713 | 0.427 | 0.718 | 0.713 |
| Lazy | IBk | 0.713 | 0.716 | 0.711 | 0.713 | 0.427 | 0.718 | 0.713 |
| Functions | Logistic | 0.669 | 0.652 | 0.692 | 0.672 | 0.342 | 0.606 | 0.673 |
| Bayes | BayesianLogisticRegression | 0.670 | 0.655 | 0.687 | 0.671 | 0.342 | 0.622 | 0.673 |
| Functions | SimpleLogistic | 0.669 | 0.652 | 0.691 | 0.670 | 0.342 | 0.605 | 0.672 |
| Functions | SMO | 0.665 | 0.641 | 0.699 | 0.667 | 0.334 | 0.580 | 0.670 |
| Functions | VotedPerceptron | 0.642 | 0.606 | 0.721 | 0.657 | 0.305 | 0.467 | 0.664 |
| Rules | NNge | 0.522 | 0.511 | 0.858 | 0.641 | 0.128 | 0.052 | 0.686 |
| Rules | OneR | 0.635 | 0.629 | 0.641 | 0.635 | 0.269 | 0.610 | 0.634 |
| Functions | RBFNetwork | 0.624 | 0.603 | 0.655 | 0.627 | 0.254 | 0.529 | 0.629 |
| Bayes | NaiveBayes | 0.598 | 0.571 | 0.660 | 0.614 | 0.214 | 0.410 | 0.615 |
| Bayes | NaiveBayesUpdateable | 0.598 | 0.571 | 0.660 | 0.614 | 0.214 | 0.410 | 0.615 |
| Bayes | NaiveBayesSimple | 0.598 | 0.571 | 0.660 | 0.613 | 0.214 | 0.411 | 0.614 |
| Bayes | NaiveBayesMultinomial | 0.576 | 0.554 | 0.634 | 0.590 | 0.168 | 0.366 | 0.594 |
| Misc | HyperPipes | 0.570 | 0.568 | 0.579 | 0.571 | 0.143 | 0.534 | 0.572 |
| Rules | ZeroR | 0.510 | 0.510 | 0.510 | 0.510 | 0.010 | 0.510 | 0.510 |
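Six of the seven measures reported above can be derived from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; this record does not give the study's formulas, so treat them as an assumption. The counts in the example are hypothetical and are not taken from the study. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def _ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    acc = _ratio(tp + tn, tp + fp + tn + fn)   # accuracy
    sen = _ratio(tp, tp + fn)                  # sensitivity (recall)
    spe = _ratio(tn, tn + fp)                  # specificity
    ppv = _ratio(tp, tp + fp)                  # positive predictive value
    f1 = _ratio(2 * ppv * sen, ppv + sen)      # harmonic mean of PPV and SEN
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )                                          # Matthews correlation coefficient
    return {"ACC": acc, "SPE": spe, "SEN": sen, "F1": f1, "MCC": mcc, "PPV": ppv}


# Hypothetical counts for illustration only.
m = confusion_metrics(tp=80, fp=20, tn=70, fn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 for an undefined ratio is a convenience of this sketch; a classifier that never predicts the positive class could equally be reported as NA, as in some rows of the tables in this record.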
Performance comparison by voting with the majority sorted by F1 value.
| Classifiers | ACC | SPE | SEN | F1 | MCC | PPV | AUC |
|---|---|---|---|---|---|---|---|
| TOP: 23 | 0.774 | 0.855 | 0.721 | 0.786 | 0.562 | 0.890 | 0.787 |
| TOP: 11 | 0.781 | 0.829 | 0.744 | 0.785 | 0.569 | 0.855 | 0.787 |
| TOP: 13 | 0.781 | 0.827 | 0.746 | 0.785 | 0.567 | 0.851 | 0.786 |
| TOP: 9 | 0.783 | 0.820 | 0.751 | 0.784 | 0.568 | 0.844 | 0.786 |
| TOP: 19 | 0.776 | 0.841 | 0.734 | 0.784 | 0.566 | 0.872 | 0.787 |
| TOP: 21 | 0.775 | 0.856 | 0.723 | 0.784 | 0.564 | 0.889 | 0.789 |
| TOP: 25 | 0.775 | 0.855 | 0.723 | 0.784 | 0.562 | 0.888 | 0.789 |
| TOP: 27 | 0.774 | 0.847 | 0.727 | 0.784 | 0.562 | 0.879 | 0.787 |
| TOP: 17 | 0.779 | 0.826 | 0.742 | 0.783 | 0.565 | 0.852 | 0.784 |
| TOP: 7 | 0.780 | 0.819 | 0.748 | 0.782 | 0.563 | 0.840 | 0.784 |
| TOP: 15 | 0.779 | 0.824 | 0.742 | 0.782 | 0.564 | 0.852 | 0.785 |
| TOP: 29 | 0.773 | 0.836 | 0.730 | 0.780 | 0.556 | 0.867 | 0.783 |
| TOP: 3 | 0.777 | 0.811 | 0.749 | 0.779 | 0.558 | 0.831 | 0.780 |
| TOP: 5 | 0.776 | 0.811 | 0.748 | 0.779 | 0.556 | 0.830 | 0.780 |
| TOP: 31 | 0.775 | 0.827 | 0.738 | 0.777 | 0.556 | 0.853 | 0.783 |
| TOP: 33 | 0.774 | 0.819 | 0.738 | 0.776 | 0.552 | 0.847 | 0.778 |
| TOP: 35 | 0.771 | 0.811 | 0.738 | 0.773 | 0.545 | 0.837 | 0.775 |
| TOP: 37 | 0.768 | 0.802 | 0.739 | 0.770 | 0.540 | 0.826 | 0.772 |
| TOP: 39 | 0.768 | 0.802 | 0.739 | 0.770 | 0.541 | 0.826 | 0.771 |
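Voting with the majority can be sketched as follows. It is assumed here that "TOP: k" denotes the k best individual classifiers ranked by F1, and that each classifier outputs a binary label per protein; neither detail is spelled out in this record. The prediction vectors are hypothetical, while the F1 values in the example are copied from the individual-algorithm table.

```python
from collections import Counter


def top_k(f1_by_classifier, k):
    """Names of the k classifiers with the highest F1 (assumed ranking rule)."""
    return sorted(f1_by_classifier, key=f1_by_classifier.get, reverse=True)[:k]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label list by majority.

    predictions: list of equal-length label lists, one per classifier.
    An odd number of classifiers guarantees no ties for binary labels.
    """
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# F1 values from the individual-algorithm table.
f1_scores = {"LMT": 0.774, "SimpleCart": 0.773, "J48": 0.770, "NaiveBayes": 0.614}
selected = top_k(f1_scores, 3)

# Hypothetical predictions for four proteins (1 = cancer protein).
preds = {
    "LMT": [1, 0, 1, 1],
    "SimpleCart": [1, 1, 0, 1],
    "J48": [0, 0, 1, 1],
    "NaiveBayes": [0, 0, 0, 0],
}
voted = majority_vote([preds[name] for name in selected])
print(selected, voted)
```

Restricting k to odd values, as the table does, avoids the need for a tie-breaking rule.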
Performance comparison by voting with the majority for five classification types.
| Type | Classifiers | ACC | SPE | SEN | F1 | MCC | PPV | AUC |
|---|---|---|---|---|---|---|---|---|
| Bayes | TOP: 3 | 0.672 | 0.653 | 0.695 | 0.672 | 0.346 | 0.609 | 0.673 |
| Bayes | TOP: 5 | 0.599 | 0.571 | 0.660 | 0.614 | 0.215 | 0.410 | 0.614 |
| Functions | TOP: 5 | 0.673 | 0.653 | 0.700 | 0.676 | 0.349 | 0.600 | 0.676 |
| Functions | TOP: 3 | 0.669 | 0.652 | 0.692 | 0.672 | 0.342 | 0.606 | 0.673 |
| Lazy | TOP: 3 | 0.743 | 0.837 | 0.688 | 0.755 | 0.504 | 0.883 | 0.763 |
| Rules | TOP: 5 | 0.764 | 0.825 | 0.721 | 0.769 | 0.537 | 0.856 | 0.774 |
| Rules | TOP: 7 | 0.764 | 0.826 | 0.721 | 0.769 | 0.537 | 0.857 | 0.774 |
| Rules | TOP: 9 | 0.765 | 0.817 | 0.726 | 0.769 | 0.534 | 0.848 | 0.771 |
| Rules | TOP: 3 | 0.755 | 0.838 | 0.705 | 0.766 | 0.528 | 0.877 | 0.771 |
| Trees | TOP: 11 | 0.781 | 0.831 | 0.744 | 0.786 | 0.567 | 0.859 | 0.788 |
| Trees | TOP: 9 | 0.780 | 0.820 | 0.748 | 0.783 | 0.563 | 0.842 | 0.785 |
| Trees | TOP: 7 | 0.779 | 0.816 | 0.748 | 0.781 | 0.561 | 0.838 | 0.782 |
| Trees | TOP: 3 | 0.777 | 0.811 | 0.749 | 0.779 | 0.558 | 0.831 | 0.780 |
| Trees | TOP: 5 | 0.776 | 0.811 | 0.748 | 0.779 | 0.556 | 0.830 | 0.780 |
Figure 4: The seven performance measures for group voting with the majority.
The AUC value of the four features.
| Feature | AUC | Rank |
|---|---|---|
| DFS_C | 0.677 | 1 |
| CLD | 0.651 | 2 |
| DDI | 0.546 | 3 |
| DFS_X | 0.526 | 4 |
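A per-feature AUC like the one tabulated above can be computed by treating each feature value as a ranking score. The sketch below uses the rank-based (Mann-Whitney) formulation, in which AUC is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. How the study actually obtained its AUC values is not stated in this record, and the scores and labels in the example are hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical feature values and labels (1 = cancer protein).
labels = [1, 0, 1, 0, 0]
features = {
    "feature_a": [0.9, 0.8, 0.4, 0.3, 0.2],
    "feature_b": [0.1, 0.5, 0.2, 0.6, 0.4],
}

# Rank features by their single-feature AUC, best first.
ranking = sorted(features, key=lambda f: auc(features[f], labels), reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(rank, name, round(auc(features[name], labels), 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based implementation would be preferable for large datasets.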
Performance comparison for the individual algorithm using the CLD feature sorted by F1.
| Type | Algorithm | ACC | SPE | SEN | F1 | MCC | PPV | AUC |
|---|---|---|---|---|---|---|---|---|
| Trees | SimpleCart | 0.653 | 0.651 | 0.652 | 0.653 | 0.303 | 0.647 | 0.653 |
| Trees | REPTree | 0.650 | 0.650 | 0.649 | 0.650 | 0.300 | 0.649 | 0.650 |
| Trees | FT | 0.645 | 0.649 | 0.641 | 0.646 | 0.288 | 0.658 | 0.646 |
| Rules | DecisionTable | 0.642 | 0.650 | 0.639 | 0.644 | 0.288 | 0.663 | 0.644 |
| Rules | DTNB | 0.642 | 0.650 | 0.639 | 0.644 | 0.288 | 0.663 | 0.644 |
| Trees | NBTree | 0.643 | 0.648 | 0.639 | 0.644 | 0.287 | 0.661 | 0.644 |
| Bayes | BayesNet | 0.642 | 0.647 | 0.640 | 0.643 | 0.286 | 0.657 | 0.643 |
| Trees | ADTree | 0.642 | 0.644 | 0.641 | 0.643 | 0.283 | 0.646 | 0.643 |
| Rules | Ridor | 0.614 | 0.659 | 0.639 | 0.642 | 0.257 | 0.635 | 0.647 |
| Trees | LADTree | 0.642 | 0.648 | 0.641 | 0.642 | 0.287 | 0.653 | 0.644 |
| Trees | LMT | 0.639 | 0.656 | 0.625 | 0.640 | 0.280 | 0.693 | 0.640 |
| Rules | PART | 0.639 | 0.655 | 0.625 | 0.639 | 0.278 | 0.695 | 0.639 |
| Trees | J48 | 0.639 | 0.655 | 0.627 | 0.639 | 0.280 | 0.689 | 0.641 |
| Trees | J48graft | 0.639 | 0.655 | 0.627 | 0.639 | 0.280 | 0.689 | 0.641 |
| Rules | JRip | 0.637 | 0.639 | 0.638 | 0.638 | 0.277 | 0.634 | 0.638 |
| Rules | OneR | 0.635 | 0.629 | 0.641 | 0.635 | 0.269 | 0.610 | 0.634 |
| Functions | VotedPerceptron | 0.611 | 0.579 | 0.697 | 0.632 | 0.248 | 0.392 | 0.638 |
| Lazy | LWL | 0.612 | 0.582 | 0.683 | 0.629 | 0.246 | 0.431 | 0.632 |
| Trees | DecisionStump | 0.612 | 0.582 | 0.684 | 0.629 | 0.246 | 0.428 | 0.632 |
| Rules | ConjunctiveRule | 0.613 | 0.583 | 0.681 | 0.628 | 0.245 | 0.437 | 0.631 |
| Bayes | NaiveBayes | 0.619 | 0.595 | 0.656 | 0.623 | 0.242 | 0.496 | 0.627 |
| Bayes | NaiveBayesSimple | 0.619 | 0.595 | 0.656 | 0.623 | 0.242 | 0.496 | 0.627 |
| Bayes | NaiveBayesUpdateable | 0.619 | 0.595 | 0.656 | 0.623 | 0.242 | 0.496 | 0.627 |
| Trees | RandomForest | 0.623 | 0.616 | 0.631 | 0.623 | 0.247 | 0.591 | 0.624 |
| Lazy | IBk | 0.622 | 0.613 | 0.632 | 0.622 | 0.242 | 0.586 | 0.622 |
| Trees | RandomTree | 0.622 | 0.613 | 0.632 | 0.622 | 0.242 | 0.586 | 0.622 |
| Bayes | BayesianLogisticRegression | 0.621 | 0.612 | 0.630 | 0.621 | 0.240 | 0.583 | 0.621 |
| Functions | Logistic | 0.621 | 0.612 | 0.631 | 0.621 | 0.241 | 0.582 | 0.621 |
| Functions | SimpleLogistic | 0.621 | 0.612 | 0.633 | 0.621 | 0.241 | 0.578 | 0.621 |
| Functions | SMO | 0.619 | 0.607 | 0.640 | 0.621 | 0.245 | 0.549 | 0.622 |
| Lazy | KStar | 0.619 | 0.606 | 0.640 | 0.620 | 0.242 | 0.547 | 0.623 |
| Functions | MultilayerPerceptron | 0.618 | 0.598 | 0.649 | 0.620 | 0.242 | 0.518 | 0.621 |
| Functions | RBFNetwork | 0.618 | 0.598 | 0.647 | 0.620 | 0.240 | 0.517 | 0.620 |
| Lazy | IB1 | 0.594 | 0.564 | 0.683 | 0.619 | 0.214 | 0.351 | 0.624 |
| Rules | NNge | 0.544 | 0.601 | 0.529 | 0.562 | 0.106 | 0.832 | 0.564 |
| Misc | HyperPipes | 0.510 | 0.510 | 0.510 | 0.510 | 0.010 | 0.510 | 0.510 |
| Rules | ZeroR | 0.510 | 0.510 | 0.510 | 0.510 | 0.010 | 0.510 | 0.510 |
| Bayes | NaiveBayesMultinomial | 0.500 | 0.561 | 0.447 | 0.480 | −0.008 | 0.489 | 0.569 |
| Misc | VFI | 0.500 | NA | 0.500 | NA | NA | 1.000 | NA |
Figure 5: The performance comparison of the individual algorithm for Aragues (blue) and the proposed method (red).
Performance comparison by voting with the majority using the CLD feature sorted by F1.
| Classifiers | ACC | SPE | SEN | F1 | MCC | PPV | AUC |
|---|---|---|---|---|---|---|---|
| TOP: 3 | 0.652 | 0.653 | 0.652 | 0.652 | 0.304 | 0.650 | 0.653 |
| TOP: 27 | 0.648 | 0.646 | 0.650 | 0.648 | 0.296 | 0.643 | 0.648 |
| TOP: 17 | 0.646 | 0.653 | 0.639 | 0.647 | 0.291 | 0.671 | 0.647 |
| TOP: 25 | 0.646 | 0.647 | 0.648 | 0.647 | 0.296 | 0.643 | 0.648 |
| TOP: 5 | 0.646 | 0.651 | 0.639 | 0.646 | 0.293 | 0.666 | 0.646 |
| TOP: 15 | 0.645 | 0.654 | 0.638 | 0.646 | 0.291 | 0.674 | 0.646 |
| TOP: 19 | 0.643 | 0.651 | 0.640 | 0.646 | 0.289 | 0.663 | 0.645 |
| TOP: 23 | 0.644 | 0.648 | 0.645 | 0.646 | 0.294 | 0.648 | 0.647 |
| TOP: 29 | 0.646 | 0.645 | 0.648 | 0.646 | 0.292 | 0.641 | 0.647 |
| TOP: 13 | 0.644 | 0.653 | 0.639 | 0.645 | 0.291 | 0.669 | 0.646 |
| TOP: 31 | 0.644 | 0.642 | 0.648 | 0.645 | 0.292 | 0.634 | 0.644 |
| TOP: 7 | 0.643 | 0.650 | 0.639 | 0.644 | 0.288 | 0.663 | 0.644 |
| TOP: 9 | 0.643 | 0.650 | 0.639 | 0.644 | 0.290 | 0.662 | 0.644 |
| TOP: 11 | 0.644 | 0.651 | 0.639 | 0.644 | 0.290 | 0.665 | 0.644 |
| TOP: 21 | 0.642 | 0.649 | 0.642 | 0.644 | 0.289 | 0.659 | 0.645 |
| TOP: 39 | 0.644 | 0.638 | 0.648 | 0.644 | 0.288 | 0.627 | 0.644 |
| TOP: 37 | 0.642 | 0.635 | 0.649 | 0.643 | 0.285 | 0.620 | 0.644 |
| TOP: 35 | 0.641 | 0.635 | 0.648 | 0.642 | 0.286 | 0.619 | 0.642 |
| TOP: 33 | 0.641 | 0.635 | 0.648 | 0.641 | 0.285 | 0.620 | 0.642 |
Figure 6: The performance comparison of the voting with the majority for Aragues (blue) and the proposed method (red).
Performance evaluation for OMIM and HLungDB datasets.
| Type | Algorithm | Hit number | Hit ratio |
|---|---|---|---|
| Rules | Ridor | 1164 | 0.894 |
| Rules | ZeroR | 1159 | 0.890 |
| Trees | ADTree | 1119 | 0.859 |
| Misc | HyperPipes | 1115 | 0.856 |
| Functions | MultilayerPerceptron | 1076 | 0.826 |
| Trees | LADTree | 1061 | 0.815 |
| Rules | OneR | 1047 | 0.804 |
| Lazy | IB1 | 1023 | 0.786 |
| Lazy | IBk | 1023 | 0.786 |
| Bayes | BayesNet | 1020 | 0.783 |
| Rules | PART | 1019 | 0.783 |
| Trees | J48graft | 1018 | 0.782 |
| Trees | J48 | 1017 | 0.781 |
| Rules | DecisionTable | 1011 | 0.776 |
| Trees | FT | 1004 | 0.771 |
| Lazy | KStar | 999 | 0.767 |
| Bayes | BayesianLogisticRegression | 998 | 0.767 |
| Trees | NBTree | 995 | 0.764 |
| Rules | DTNB | 991 | 0.761 |
| Trees | RandomTree | 988 | 0.759 |
| Trees | LMT | 983 | 0.755 |
| Trees | REPTree | 975 | 0.749 |
| Trees | SimpleCart | 973 | 0.747 |
| Trees | RandomForest | 973 | 0.747 |
| Rules | JRip | 966 | 0.742 |
| Functions | Logistic | 964 | 0.740 |
| Functions | SimpleLogistic | 964 | 0.740 |
| Functions | SMO | 944 | 0.725 |
| Functions | RBFNetwork | 925 | 0.710 |
| Functions | VotedPerceptron | 844 | 0.648 |
| Bayes | NaiveBayesSimple | 827 | 0.635 |
| Bayes | NaiveBayes | 826 | 0.634 |
| Bayes | NaiveBayesUpdateable | 826 | 0.634 |
| Bayes | NaiveBayesMultinomial | 796 | 0.611 |
| Rules | NNge | 94 | 0.072 |
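The hit ratio in the table above is read here as the number of hits divided by the size of the independent test set. The set size is not stated in this record; the tabulated pairs are consistent with a total of 1302 proteins (for example, 1164 / 1302 ≈ 0.894), so that figure is used below as an inferred assumption rather than a reported value.

```python
def hit_ratio(hits, total):
    """Fraction of test-set proteins counted as hits."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


# Inferred from the hit number / hit ratio pairs in the table; an assumption.
ASSUMED_TOTAL = 1302

# Hit numbers copied from the table.
hit_numbers = {"Ridor": 1164, "ZeroR": 1159, "IB1": 1023, "NNge": 94}
ratios = {name: round(hit_ratio(h, ASSUMED_TOTAL), 3) for name, h in hit_numbers.items()}
print(ratios)
```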
Summary of microarray datasets.
| GEO ID | Organization name | Number of DEGs |
|---|---|---|
| GSE7670 | Taipei Veterans General Hospital | 1874 |
| GSE10072 | National Cancer Institute, NIH | 3138 |
| GSE19804 | National Taiwan University | 5398 |
| GSE27262 | National Taiwan Yang Ming University | 8476 |
| Intersection of all four datasets | | 1345 |
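The intersection row corresponds to the genes that are differentially expressed in every dataset. A minimal sketch of that set operation follows; the GEO IDs are those in the table, while the gene identifiers are placeholders and do not come from the study.

```python
def common_degs(deg_sets):
    """Genes present in every dataset's DEG list."""
    sets = [set(genes) for genes in deg_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Placeholder gene identifiers for illustration only.
degs = {
    "GSE7670": {"GENE_A", "GENE_B", "GENE_C"},
    "GSE10072": {"GENE_B", "GENE_C", "GENE_D"},
    "GSE19804": {"GENE_B", "GENE_C", "GENE_E"},
    "GSE27262": {"GENE_B", "GENE_C", "GENE_F"},
}
shared = common_degs(degs.values())
print(len(shared), sorted(shared))
```

In practice the identifiers would need to be mapped to a common namespace (for example, gene symbols) before intersecting, since different studies may annotate genes differently.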