Shahin Boluki, Mohammad Shahrokh Esfahani, Xiaoning Qian, Edward R Dougherty.
Abstract
BACKGROUND: Phenotypic classification is problematic because small samples are ubiquitous, and for small samples the use of prior knowledge is critical. If knowledge concerning the feature-label distribution (for instance, genetic pathways) is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods, in which a classification model is assumed and prior distributions are placed on the model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods.
Keywords: Biological pathways; Optimal Bayesian classification; Prior construction; Probabilistic Boolean networks
Year: 2017 PMID: 29297278 PMCID: PMC5751802 DOI: 10.1186/s12859-017-1893-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. A schematic illustration of the proposed Bayesian prior construction approach for a binary-classification problem. Information contained in the biological signaling pathways and their corresponding regulating functions is transformed to prior probabilities by MKDIP. Previously observed sample points (labeled or unlabeled) are used along with the constructed priors to design a Bayesian classifier to classify a new sample point (patient).
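The workflow in Fig. 1 (a prior plus observed sample points yields a Bayesian classifier) can be illustrated with a minimal sketch. It assumes a discrete feature space with independent Dirichlet priors on the bin probabilities of each class and a known class-0 probability `c`; under that assumption the classifier compares the posterior-mean bin probabilities of the two classes. The function name `obc_discrete`, the hyperparameter arrays, and the toy counts are hypothetical, and the hyperparameters are hand-picked stand-ins rather than priors produced by MKDIP.

```python
import numpy as np

def obc_discrete(alpha0, alpha1, counts0, counts1, c):
    """Label every bin of a discrete feature space (illustrative sketch).

    alpha0, alpha1   : Dirichlet prior hyperparameters per bin (assumed given)
    counts0, counts1 : observed sample counts per bin for class 0 / class 1
    c                : prior probability of class 0 (assumed known)
    """
    # Dirichlet posterior: prior hyperparameters plus observed counts.
    post0 = np.asarray(alpha0, dtype=float) + np.asarray(counts0, dtype=float)
    post1 = np.asarray(alpha1, dtype=float) + np.asarray(counts1, dtype=float)
    # Posterior-mean bin probabilities for each class.
    eff0 = post0 / post0.sum()
    eff1 = post1 / post1.sum()
    # Assign label 1 where class 1 has the larger weighted probability.
    return (c * eff0 < (1.0 - c) * eff1).astype(int)

# Toy example with 4 bins (all numbers are made up for illustration).
labels = obc_discrete(
    alpha0=[2, 1, 1, 1], alpha1=[1, 1, 1, 2],
    counts0=[3, 1, 0, 0], counts1=[0, 0, 1, 2],
    c=0.5,
)
print(labels)
```

With a flat prior (all hyperparameters equal) the decision is driven by the counts alone; informative hyperparameters shift the decision in bins where few or no samples were observed, which is where prior knowledge matters most for small samples.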
Fig. 2. An illustrative example showing the components directly connected to gene 1. In the Boolean function, {AND, OR, NOT} = {∧, ∨, −}. Based on the regulating function of gene 1, it is up-regulated if gene 5 is up-regulated and genes 2 and 3 are down-regulated.
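The condition stated in the Fig. 2 caption can be written as a small Boolean function. This is a sketch that encodes only what the caption says, assuming the stated condition is both necessary and sufficient; the function in the figure itself may contain additional terms. The name `gene1_next_state` is hypothetical.

```python
from itertools import product

def gene1_next_state(x2: bool, x3: bool, x5: bool) -> bool:
    """Gene 1 is up-regulated when gene 5 is up and genes 2 and 3 are down.

    Encodes x5 AND (NOT x2) AND (NOT x3), as described in the caption.
    """
    return x5 and (not x2) and (not x3)

# Enumerate the full truth table over the three regulators of gene 1.
truth_table = {
    (x2, x3, x5): gene1_next_state(x2, x3, x5)
    for x2, x3, x5 in product([False, True], repeat=3)
}

for state, value in truth_table.items():
    print(state, "->", value)
```

Only one of the eight regulator configurations activates gene 1 under this reading, which is the kind of strong constraint that pathway knowledge places on the feature distribution.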
Fig. 3. Signaling pathways corresponding to Tables 1 and 2: (a) the normal mammalian cell cycle (Table 1) and (b) a simplified pathway involving TP53 (Table 2).
Boolean regulating functions of normal mammalian cell cycle [51]. In the Boolean functions {AND, OR, NOT}={∧,∨,−}
| Gene | Node name | Boolean regulating function |
|---|---|---|
| CycD | | Extracellular signal |
| Rb | | |
| E2F | | |
| CycE | | |
| CycA | | |
| p27 | | |
| Cdc20 | | |
| Cdh1 | | |
| UbcH10 | | |
| CycB | | |
Boolean regulating functions corresponding to the pathway in Fig. 3(b) [54]. In the Boolean functions {AND, OR, NOT}={∧,∨,−}
| Gene | Node name | Boolean regulating function |
|---|---|---|
| dna−dsb | | Extracellular signal |
| ATM | | |
| P53 | | |
| Wip1 | | |
| Mdm2 | | |
The set of constraints extracted from the regulating functions and pathways for the TP53 network. Constraints extracted from the Boolean regulating functions in Table 2 corresponding to the pathway in Fig. 3(b) used in MKDIP-E, MKDIP-D, MKDIP-R (left). Constraints extracted based on [36] from the pathway in Fig. 3(b) used in RMEP, RMDIP, REMLP (right)
| (a) MKDIP Constraints | | (b) Constraints in Methods of [ | |
|---|---|---|---|
| Node | Constraint | Node | Constraint |
Expected true error of different classification rules for the mammalian cell-cycle network. The constructed priors are considered using two precision factors: optimal precision factor (left) and estimated precision factor (right), with c=0.5, and c=0.6, where the minimum achievable error (Bayes error) is denoted by Err
| (a) | | | | | | (b) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method / n | 30 | 60 | 90 | 120 | 150 | Method / n | 30 | 60 | 90 | 120 | 150 |
| Hist | 0.3710 | 0.3423 | 0.3255 | 0.3155 | 0.3081 | Hist | 0.3710 | 0.3423 | 0.3255 | 0.3155 | 0.3081 |
| CART | 0.3326 | 0.3195 | 0.3057 | 0.3031 | 0.2975 | CART | 0.3326 | 0.3195 | 0.3057 | 0.3031 | 0.2975 |
| RF | 0.3359 | 0.3160 | 0.3015 | 0.2991 | 0.2933 | RF | 0.3359 | 0.3160 | 0.3015 | 0.2991 | 0.2933 |
| SVM | 0.3359 | 0.3112 | **0.2977** | 0.2959 | 0.2940 | SVM | 0.3359 | 0.3112 | 0.2977 | 0.2959 | 0.2940 |
| Jeffreys’ | 0.3710 | 0.3423 | 0.3255 | 0.3155 | 0.3081 | Jeffreys’ | 0.3710 | 0.3423 | 0.3255 | 0.3155 | 0.3081 |
| RMEP | 0.3236 | 0.3070 | 0.3010 | 0.2946 | 0.2910 | RMEP | 0.3315 | 0.3059 | 0.2985 | 0.2963 | 0.2930 |
| RMDIP | 0.3236 | 0.3070 | 0.3010 | 0.2946 | 0.2910 | RMDIP | 0.3314 | 0.3060 | 0.2986 | 0.2965 | 0.2931 |
| REMLP | 0.3425 | 0.3264 | 0.3146 | 0.3067 | 0.3011 | REMLP | 0.3488 | 0.3352 | 0.3202 | 0.3101 | 0.3048 |
| MKDIP-E | 0.3221 | 0.3070 | 0.3010 | 0.2949 | 0.2910 | MKDIP-E | 0.3313 | 0.3056 | 0.2982 | 0.2962 | 0.2929 |
| MKDIP-D | 0.3232 | 0.3070 | 0.3010 | 0.2952 | 0.2910 | MKDIP-D | 0.3315 | 0.3061 | 0.2986 | 0.2965 | 0.2931 |
| MKDIP-R | | | 0.2985 | | | MKDIP-R | | | | | |
| (c) | | | | | | (d) | | | | | |
| Method / n | 30 | 60 | 90 | 120 | 150 | Method / n | 30 | 60 | 90 | 120 | 150 |
| Hist | 0.3622 | 0.3608 | 0.3624 | 0.3641 | 0.3652 | Hist | 0.3622 | 0.3608 | 0.3624 | 0.3641 | 0.3652 |
| CART | 0.3554 | 0.3556 | 0.3507 | 0.3510 | 0.3447 | CART | 0.3554 | 0.3556 | 0.3507 | 0.3510 | 0.3447 |
| RF | 0.3524 | 0.3514 | 0.3467 | 0.3476 | 0.3420 | RF | 0.3524 | 0.3514 | 0.3467 | 0.3476 | 0.3420 |
| SVM | 0.3735 | 0.3684 | 0.3615 | 0.3602 | 0.3544 | SVM | 0.3735 | 0.3684 | 0.3615 | 0.3602 | 0.3544 |
| Jeffreys’ | 0.3620 | 0.3559 | 0.3519 | 0.3502 | 0.3472 | Jeffreys’ | 0.3620 | 0.3559 | 0.3519 | 0.3502 | 0.3472 |
| RMEP | | 0.3385 | | | | RMEP | 0.3528 | 0.3415 | 0.3407 | 0.3388 | 0.3378 |
| RMDIP | | | | | | RMDIP | 0.3529 | 0.3415 | 0.3408 | 0.3388 | 0.3378 |
| REMLP | 0.3666 | 0.3625 | 0.3587 | 0.3558 | 0.3530 | REMLP | 0.3700 | 0.3650 | 0.3603 | 0.3578 | 0.3546 |
| MKDIP-E | | 0.3384 | | | | MKDIP-E | 0.3525 | | | | |
| MKDIP-D | | 0.3386 | | | | MKDIP-D | 0.3532 | 0.3418 | 0.3409 | 0.3389 | 0.3379 |
| MKDIP-R | 0.3437 | 0.3409 | 0.3404 | 0.3401 | 0.3389 | MKDIP-R | | 0.3416 | 0.3416 | 0.3402 | 0.3387 |
The lowest error for each sample size is written in bold
Expected difference between the true model (for mammalian cell-cycle network) and estimated posterior probability masses. Optimal precision factor (left) and estimated precision factor (right), with c=0.5, and c=0.6
| (a) | | | | | | (b) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method / n | 30 | 60 | 90 | 120 | 150 | Method / n | 30 | 60 | 90 | 120 | 150 |
| Jeffreys’ | 0.2155 | 0.1578 | 0.1300 | 0.1134 | 0.1010 | Jeffreys’ | 0.2155 | 0.1578 | 0.1300 | 0.1134 | 0.1010 |
| RMEP | 0.1591 | 0.1293 | 0.1126 | 0.1020 | 0.0912 | RMEP | 0.1761 | | | 0.1032 | |
| RMDIP | 0.1591 | 0.1294 | 0.1126 | 0.1020 | 0.0912 | RMDIP | 0.1761 | | | 0.1032 | |
| REMLP | 0.1863 | 0.1436 | 0.1225 | 0.1088 | 0.0970 | REMLP | 0.2060 | 0.1607 | 0.1315 | 0.1120 | 0.1019 |
| MKDIP-E | 0.1589 | 0.1293 | 0.1126 | 0.1019 | 0.0911 | MKDIP-E | 0.1760 | | | | |
| MKDIP-D | 0.1591 | 0.1293 | 0.1126 | 0.1020 | 0.0912 | MKDIP-D | 0.1761 | | | 0.1032 | |
| MKDIP-R | | | | | | MKDIP-R | | 0.1392 | 0.1184 | 0.1036 | 0.0949 |
| (c) | | | | | | (d) | | | | | |
| Method / n | 30 | 60 | 90 | 120 | 150 | Method / n | 30 | 60 | 90 | 120 | 150 |
| Jeffreys’ | 0.2183 | 0.1595 | 0.1322 | 0.1146 | 0.1027 | Jeffreys’ | 0.2183 | 0.1595 | 0.1322 | 0.1146 | 0.1027 |
| RMEP | 0.1628 | 0.1332 | 0.1154 | 0.1039 | 0.0946 | RMEP | 0.1805 | | 0.1201 | | |
| RMDIP | 0.1628 | 0.1333 | 0.1154 | 0.1039 | 0.0947 | RMDIP | 0.1805 | | 0.1201 | | |
| REMLP | 0.1867 | 0.1471 | 0.1247 | 0.1101 | 0.0990 | REMLP | 0.2065 | 0.1635 | 0.1346 | 0.1166 | 0.1036 |
| MKDIP-E | 0.1627 | 0.1332 | 0.1154 | 0.1038 | 0.0946 | MKDIP-E | | | | | |
| MKDIP-D | 0.1628 | 0.1332 | 0.1154 | 0.1039 | 0.0946 | MKDIP-D | 0.1805 | | 0.1201 | | |
| MKDIP-R | | | | | | MKDIP-R | 0.1814 | 0.1421 | 0.1207 | 0.1065 | 0.0965 |
The lowest distance for each sample size is written in bold
Expected true error of different classification rules for the TP53 network. The constructed priors are considered using two precision factors: optimal precision factor (left) and estimated precision factor (right), with c=0.5, and c=0.6, where the minimum achievable error (Bayes error) is denoted by Err
| (a) | | | | | | (b) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method / n | 15 | 30 | 45 | 60 | 75 | Method / n | 15 | 30 | 45 | 60 | 75 |
| Hist | 0.3586 | 0.3439 | 0.3337 | 0.3321 | 0.3296 | Hist | 0.3586 | 0.3439 | **0.3337** | 0.3321 | 0.3296 |
| CART | 0.3633 | 0.3492 | 0.3350 | 0.3314 | 0.3295 | CART | 0.3633 | 0.3492 | 0.3350 | **0.3314** | 0.3295 |
| RF | 0.3791 | 0.3574 | 0.3461 | 0.3400 | 0.3362 | RF | 0.3791 | 0.3574 | 0.3461 | 0.3400 | 0.3362 |
| SVM | 0.3902 | 0.3481 | 0.3433 | 0.3324 | 0.3322 | SVM | 0.3902 | 0.3481 | 0.3433 | 0.3324 | 0.3322 |
| Jeffreys’ | 0.3809 | 0.3439 | 0.3457 | 0.3321 | 0.3334 | Jeffreys’ | 0.3809 | 0.3439 | 0.3457 | 0.3321 | 0.3334 |
| RMEP | 0.3399 | 0.3392 | 0.3360 | 0.3315 | 0.3328 | RMEP | 0.3791 | 0.3489 | 0.3377 | 0.3329 | 0.3302 |
| RMDIP | 0.3399 | 0.3392 | 0.3360 | 0.3315 | 0.3328 | RMDIP | 0.3789 | 0.3490 | 0.3378 | 0.3329 | 0.3302 |
| REMLP | 0.3405 | | | | 0.3287 | REMLP | | | 0.3350 | 0.3318 | 0.3292 |
| MKDIP-E | | 0.3398 | 0.3351 | 0.3306 | 0.3297 | MKDIP-E | 0.3675 | 0.3470 | 0.3373 | 0.3326 | 0.3298 |
| MKDIP-D | | 0.3398 | 0.3347 | 0.3306 | 0.3297 | MKDIP-D | 0.3668 | 0.3472 | 0.3374 | 0.3327 | 0.3298 |
| MKDIP-R | 0.3435 | 0.3354 | 0.3321 | 0.3295 | | MKDIP-R | 0.3471 | 0.3402 | 0.3349 | 0.3316 | |
| (c) | | | | | | (d) | | | | | |
| Method / n | 15 | 30 | 45 | 60 | 75 | Method / n | 15 | 30 | 45 | 60 | 75 |
| Hist | 0.3081 | 0.2965 | 0.2906 | 0.2883 | 0.2846 | Hist | 0.3081 | 0.2965 | 0.2906 | 0.2883 | 0.2846 |
| CART | 0.3173 | 0.2988 | 0.2882 | 0.2846 | | CART | 0.3173 | 0.2988 | 0.2882 | 0.2846 | |
| RF | 0.3333 | 0.3035 | 0.2946 | 0.2850 | 0.2842 | RF | 0.3333 | 0.3035 | 0.2946 | 0.2850 | 0.2842 |
| SVM | 0.3322 | 0.3091 | 0.2991 | 0.2926 | 0.2857 | SVM | 0.3322 | 0.3091 | 0.2991 | 0.2926 | 0.2857 |
| Jeffreys’ | 0.3105 | 0.2936 | 0.2860 | | 0.2819 | Jeffreys’ | 0.3105 | 0.2936 | **0.2860** | | 0.2819 |
| RMEP | | 0.2922 | 0.2847 | 0.2843 | 0.2835 | RMEP | 0.3346 | 0.3024 | 0.2894 | 0.2860 | 0.2823 |
| RMDIP | | 0.2922 | 0.2847 | 0.2843 | 0.2835 | RMDIP | 0.3344 | 0.3023 | 0.2895 | 0.2858 | 0.2823 |
| REMLP | 0.3003 | | 0.2869 | 0.2839 | 0.2832 | REMLP | | | 0.2910 | 0.2870 | 0.2850 |
| MKDIP-E | | 0.2909 | | 0.2851 | 0.2837 | MKDIP-E | 0.3341 | 0.3025 | 0.2898 | 0.2864 | 0.2822 |
| MKDIP-D | | 0.2909 | | 0.2851 | 0.2837 | MKDIP-D | 0.3347 | 0.3024 | 0.2898 | 0.2862 | 0.2822 |
| MKDIP-R | 0.3032 | 0.2917 | 0.2868 | 0.2843 | 0.2825 | MKDIP-R | 0.3096 | 0.2981 | 0.2910 | 0.2869 | 0.2849 |
The lowest error for each sample size is written in bold
Expected difference between the true model (for TP53 network) and estimated posterior probability masses. Optimal precision factor (left) and estimated precision factor (right), with c=0.5, and c=0.6
| (a) | | | | | | (b) | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method / n | 15 | 30 | 45 | 60 | 75 | Method / n | 15 | 30 | 45 | 60 | 75 |
| Jeffreys’ | 0.2285 | 0.1716 | 0.1429 | 0.1242 | 0.1114 | Jeffreys’ | 0.2285 | 0.1716 | 0.1429 | 0.1242 | 0.1114 |
| RMEP | 0.1427 | 0.1165 | 0.1051 | 0.0934 | 0.0880 | RMEP | 0.2218 | 0.1578 | | 0.1095 | |
| RMDIP | 0.1424 | 0.1163 | 0.1048 | 0.0932 | | RMDIP | 0.2217 | 0.1575 | 0.1281 | | |
| REMLP | 0.1698 | 0.1337 | 0.1199 | 0.1091 | 0.0985 | REMLP | 0.1845 | 0.1505 | 0.1366 | 0.1235 | 0.1133 |
| MKDIP-E | 0.1412 | 0.1161 | 0.1050 | 0.0933 | 0.0880 | MKDIP-E | 0.2149 | 0.1565 | 0.1282 | 0.1096 | |
| MKDIP-D | | | | | | MKDIP-D | 0.2149 | 0.1564 | 0.1281 | 0.1096 | |
| MKDIP-R | 0.1564 | 0.1247 | 0.1118 | 0.1031 | 0.0930 | MKDIP-R | | | 0.1281 | 0.1171 | 0.1082 |
| (c) | | | | | | (d) | | | | | |
| Method / n | 15 | 30 | 45 | 60 | 75 | Method / n | 15 | 30 | 45 | 60 | 75 |
| Jeffreys’ | 0.2319 | 0.1723 | 0.1438 | 0.1262 | 0.1137 | Jeffreys’ | 0.2319 | 0.1723 | 0.1438 | 0.1262 | 0.1137 |
| RMEP | 0.1476 | 0.1222 | 0.1090 | 0.0987 | 0.0923 | RMEP | 0.2182 | 0.1599 | 0.1304 | | 0.1032 |
| RMDIP | 0.1474 | 0.1220 | 0.1087 | 0.0985 | 0.0921 | RMDIP | 0.2179 | 0.1597 | 0.1303 | | |
| REMLP | 0.1751 | 0.1332 | 0.1192 | 0.1077 | 0.0980 | REMLP | 0.1937 | 0.1522 | 0.1363 | 0.1235 | 0.1144 |
| MKDIP-E | 0.1457 | 0.1215 | 0.1086 | 0.0985 | 0.0922 | MKDIP-E | 0.2165 | 0.1586 | 0.1304 | 0.1147 | 0.1036 |
| MKDIP-D | | | | | | MKDIP-D | 0.2164 | 0.1585 | 0.1303 | 0.1147 | 0.1035 |
| MKDIP-R | 0.1574 | 0.1217 | 0.1093 | 0.1010 | 0.0926 | MKDIP-R | | | | 0.1158 | 0.1086 |
The lowest distance for each sample size is written in bold
Expected errors of different Bayesian classification rules in the mixture model for the mammalian cell-cycle network. Expected true error (left) and expected error on unlabeled training data (right), with c₀=0.6
| Method / n | 30 | 60 | 90 | 120 | 150 | Method / n | 30 | 60 | 90 | 120 | 150 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PDCOTP | | | | 0.3309 | 0.3334 | PDCOTP | | | | 0.3355 | 0.3339 |
| Jeffreys’ | 0.4709 | 0.4743 | 0.4704 | 0.4675 | 0.4654 | Jeffreys’ | 0.4751 | 0.4621 | 0.4681 | 0.4700 | 0.4645 |
| RMEP | 0.3417 | 0.3340 | 0.3307 | 0.3300 | 0.3299 | RMEP | 0.3447 | 0.3409 | 0.3366 | 0.3323 | 0.3316 |
| RMDIP | 0.3408 | 0.3336 | 0.3300 | 0.3305 | 0.3301 | RMDIP | | 0.3404 | 0.3342 | 0.3344 | 0.3343 |
| REMLP | 0.3754 | 0.3835 | 0.3882 | 0.3857 | 0.3844 | REMLP | 0.3748 | 0.3821 | 0.3908 | 0.3826 | 0.3812 |
| MKDIP-E | 0.3411 | 0.3341 | | 0.3297 | 0.3306 | MKDIP-E | 0.3457 | 0.3386 | 0.3351 | 0.3312 | 0.3320 |
| MKDIP-D | | | 0.3306 | 0.3304 | 0.3303 | MKDIP-D | 0.3482 | 0.3387 | 0.3381 | 0.3342 | 0.3334 |
| MKDIP-R | 0.3457 | 0.3342 | 0.3299 | | | MKDIP-R | 0.3449 | | | | |
The lowest error for each sample size and the lowest error among practical methods are written in bold
Expected errors of different Bayesian classification rules in the mixture model for the TP53 network. Expected true error (left) and expected error on unlabeled training data (right), with c₀=0.6
| Method / n | 15 | 30 | 45 | 60 | 75 | Method / n | 15 | 30 | 45 | 60 | 75 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PDCOTP | | | | | | PDCOTP | | | | | |
| Jeffreys’ | 0.4204 | 0.4324 | 0.4335 | 0.4432 | 0.4361 | Jeffreys’ | 0.4220 | 0.4314 | 0.4381 | 0.4419 | 0.4348 |
| RMEP | | | 0.3327 | | 0.3422 | RMEP | | 0.3350 | 0.3487 | 0.3543 | 0.3529 |
| RMDIP | 0.3297 | 0.3260 | 0.3327 | 0.3406 | 0.3432 | RMDIP | 0.3504 | 0.3423 | 0.3496 | 0.3551 | 0.3545 |
| REMLP | 0.3637 | 0.3687 | 0.3706 | 0.3658 | 0.3653 | REMLP | 0.3489 | 0.3579 | 0.3709 | 0.3593 | 0.3556 |
| MKDIP-E | 0.3312 | 0.3246 | 0.3322 | 0.3428 | 0.3386 | MKDIP-E | 0.3502 | 0.3378 | 0.3486 | 0.3585 | 0.3492 |
| MKDIP-D | 0.3321 | | | 0.3436 | | MKDIP-D | 0.3551 | | | 0.3570 | 0.3475 |
| MKDIP-R | 0.3872 | 0.3749 | 0.3667 | 0.3607 | 0.3586 | MKDIP-R | 0.3613 | 0.3583 | 0.3589 | | |
The lowest error for each sample size and the lowest error among practical methods are written in bold
Fig. 4. Signaling pathways corresponding to NSCLC classification. The pathways are collected from KEGG Pathways for NSCLC and PI3K-AKT pathways, and from [63].
Regulating functions corresponding to the signaling pathways in Fig. 4. In the Boolean functions {AND, OR, NOT}={∧,∨,−}
| Gene | Node name | Boolean regulating function |
|---|---|---|
| EGFR | | - |
| PIK3CA | | |
| AKT | | |
| KRAS | | - |
| RAF1 | | |
| BAD | | |
| P53 | | - |
| BCL2 | | |
Expected error of different classification rules calculated on a real dataset. The classification is between LUA (class 0) and LUS (class 1), with c=0.57
| Method / n | 34 | 74 | 114 | 134 | 174 |
|---|---|---|---|---|---|
| Best Non Bayesian | 0.1764 | 0.1574 | 0.1473 | 0.1426 | 0.1371 |
| Jeffreys’ | 0.1766 | 0.1574 | 0.1476 | 0.1425 | 0.1371 |
| Best RM | 0.1426 | 0.1289 | 0.1164 | 0.1083 | 0.1000 |
| Best MKDIP | | | | | |