Yuguang Long1,2, Limin Wang2, Minghui Sun3.
Abstract
Because naive Bayes (NB) is simple yet offers competitive classification performance, researchers have proposed many approaches that improve it by weakening its attribute independence assumption. A theoretical analysis based on Kullback–Leibler divergence shows that NB and its variations differ in the order of the conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption further by generalizing tree-augmented naive Bayes (TAN) from a 1-dependence Bayesian network classifier (BNC) to arbitrary k-dependence. Sub-models of TAN, each built to represent specific conditional dependence relationships, may "best match" the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
Keywords: Kullback–Leibler divergence; attribute independence assumption; probability distribution; tree-augmented naive Bayes
Year: 2019 PMID: 33267435 PMCID: PMC7515236 DOI: 10.3390/e21080721
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
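The abstract refers to conditional mutual information as the quantity carried by the augmenting edges. As a minimal sketch, the empirical estimate of I(Xi; Xj | C) can be computed from co-occurrence counts as shown below. The function name and the plain-list input format are assumptions made for illustration, and the snippet uses unsmoothed frequency estimates, which may differ from the estimator used in the paper.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) in bits.

    xi, xj, c are equal-length sequences of discrete values
    (two attributes and the class label). Hypothetical helper,
    not taken from the paper.
    """
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))  # joint counts of (xi, xj, c)
    n_ic = Counter(zip(xi, c))       # counts of (xi, c)
    n_jc = Counter(zip(xj, c))       # counts of (xj, c)
    n_c = Counter(c)                 # counts of c

    cmi = 0.0
    for (a, b, y), count in n_ijc.items():
        # P(a, b, y) * log2( P(a, b | y) / (P(a | y) * P(b | y)) )
        ratio = count * n_c[y] / (n_ic[(a, y)] * n_jc[(b, y)])
        cmi += (count / n) * log2(ratio)
    return cmi


# Two identical binary attributes within a single class share one full bit.
same = conditional_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0])
# Two attributes that are independent given the class share nothing.
indep = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```

A larger value indicates a stronger dependence between the two attributes once the class is known, which is why such scores are a natural weight for candidate augmenting edges.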
Figure 1. Examples of different BNCs. (a) Naive Bayes, (b) Tree-augmented naive Bayes, (c) k-dependence Bayesian Classifier.
Figure 2. The learning procedure of TAN on Balance-Scale.
Figure 3. Distributions of and on Census-income.
Figure 4. with different class labels on Census-income.
Figure 5. The learning procedure of ETAN with k = 2 on Balance-Scale.
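For orientation, the standard TAN construction builds a maximum weighted spanning tree over the attributes, with edges weighted by conditional mutual information, and then directs the edges away from a chosen root. The sketch below assumes a precomputed symmetric weight matrix; the function name `tan_parents`, the choice of root, and the example weights are all hypothetical, and whether the figures follow exactly these steps is an assumption. The class node, which is a parent of every attribute, is omitted.

```python
def tan_parents(weights, root=0):
    """Maximum weighted spanning tree via Prim's algorithm.

    weights: symmetric n x n list of lists, where weights[u][v] is the
             (assumed precomputed) conditional mutual information
             between attributes u and v.
    Returns a dict {attribute: parent attribute}; the root maps to None.
    """
    n = len(weights)
    in_tree = {root}
    parent = {root: None}
    while len(in_tree) < n:
        best = None  # (weight, tree node, node outside the tree)
        for u in in_tree:
            for v in range(n):
                if v in in_tree:
                    continue
                if best is None or weights[u][v] > best[0]:
                    best = (weights[u][v], u, v)
        _, u, v = best
        parent[v] = u  # direct the edge away from the root
        in_tree.add(v)
    return parent


# Made-up weights for three attributes, for illustration only.
example_weights = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]
example_parents = tan_parents(example_weights, root=0)
```

In this toy case attribute 1 attaches to the root and attribute 2 attaches to attribute 1, so every attribute has at most one attribute parent, which is the 1-dependence restriction the abstract proposes to generalize.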
Data sets.
| No. | Data Set | Inst. | Att. | Class | No. | Data Set | Inst. | Att. | Class |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Labor | 57 | 16 | 2 | 21 | Vowel | 990 | 13 | 11 |
| 2 | Labor-Negotiations | 57 | 16 | 2 | 22 | Led | 1000 | 7 | 10 |
| 3 | Lymphography | 150 | 4 | 3 | 23 | Car | 1728 | 6 | 4 |
| 4 | Iris | 150 | 4 | 3 | 24 | Hypothyroid | 3163 | 25 | 2 |
| 5 | Hungarian | 294 | 13 | 2 | 25 | Dis | 3772 | 29 | 2 |
| 6 | Heart-Disease-C | 303 | 13 | 2 | 26 | Sick | 3772 | 29 | 2 |
| 7 | Soybean-Large | 307 | 35 | 19 | 27 | Abalone | 4177 | 8 | 3 |
| 8 | Ionosphere | 351 | 34 | 2 | 28 | Spambase | 4601 | 57 | 2 |
| 9 | House-Votes-84 | 435 | 16 | 2 | 29 | Waveform-5000 | 5000 | 40 | 3 |
| 10 | Musk1 | 476 | 166 | 2 | 30 | Page-Blocks | 5473 | 10 | 5 |
| 11 | Cylinder-Bands | 540 | 39 | 2 | 31 | Optdigits | 5620 | 64 | 10 |
| 12 | Chess | 551 | 39 | 2 | 32 | Satellite | 6435 | 36 | 6 |
| 13 | Syncon | 600 | 60 | 6 | 33 | Mushrooms | 8124 | 22 | 2 |
| 14 | Balance-Scale | 625 | 4 | 3 | 34 | Thyroid | 9169 | 29 | 20 |
| 15 | Soybean | 683 | 35 | 19 | 35 | Letter-Recog | 20000 | 26 | 2 |
| 16 | Credit-A | 690 | 15 | 2 | 36 | Adult | 48842 | 14 | 2 |
| 17 | Breast-Cancer-W | 699 | 9 | 2 | 37 | Connect-4 | 67557 | 42 | 3 |
| 18 | Pima-Ind-Diabetes | 768 | 8 | 2 | 38 | Waveform | 100000 | 21 | 3 |
| 19 | Vehicle | 846 | 18 | 4 | 39 | Census-Income | 299285 | 41 | 2 |
| 20 | Anneal | 898 | 38 | 6 | 40 | Poker-Hand | 1025010 | 10 | 10 |
Experimental results of zero-one loss.
| Data Set | NB | TAN | KDB | LR | ETAN | AODE | WAODE |
|---|---|---|---|---|---|---|---|
| Labor | 0.0289 | 0.0211 | 0.0279 | 0.0211 | 0.0274 | 0.0205 | 0.0200 |
| Labor-Negotiations | 0.0505 | 0.0763 | 0.0553 | 0.0422 | 0.0237 | 0.0268 | 0.0268 |
| Lymphography | 0.0902 | 0.0976 | 0.1041 | 0.1422 | 0.0814 | 0.0853 | 0.0951 |
| Iris | 0.0590 | 0.0550 | 0.0656 | 0.0343 | 0.0760 | 0.0626 | 0.0624 |
| Hungarian | 0.1646 | 0.1454 | 0.1480 | 0.1057 | 0.1456 | 0.1597 | 0.1611 |
| Heart-Disease-C | 0.1297 | 0.1263 | 0.1299 | 0.1300 | 0.1300 | 0.1171 | 0.1092 |
| Soybean-Large | 0.1070 | 0.1275 | 0.1086 | 0.1104 | 0.0964 | 0.0812 | 0.0655 |
| Ionosphere | 0.1220 | 0.0800 | 0.0855 | 0.1117 | 0.0817 | 0.0903 | 0.0061 |
| House-Votes-84 | 0.0899 | 0.0410 | 0.0258 | 0.0307 | 0.0406 | 0.0518 | 0.0406 |
| Musk1 | 0.1847 | 0.1563 | 0.1535 | 0.1357 | 0.1527 | 0.1670 | 0.1501 |
| Cylinder-Bands | 0.2000 | 0.3242 | 0.1939 | 0.1863 | 0.1746 | 0.1684 | 0.1286 |
| Chess | 0.1413 | 0.1427 | 0.1119 | 0.0832 | 0.1192 | 0.1380 | 0.0180 |
| Syncon | 0.0516 | 0.0203 | 0.0314 | 0.1123 | 0.0339 | 0.0334 | 0.1827 |
| Balance-Scale | 0.1840 | 0.1843 | 0.1902 | 0.0753 | 0.1877 | 0.1905 | 0.0503 |
| Soybean | 0.1015 | 0.0504 | 0.0491 | 0.0656 | 0.0622 | 0.0690 | 0.0900 |
| Credit-A | 0.0912 | 0.1171 | 0.1137 | 0.1279 | 0.1266 | 0.0892 | 0.0953 |
| Breast-Cancer-W | 0.0187 | 0.0315 | 0.0449 | 0.0375 | 0.0263 | 0.0243 | 0.1941 |
| Pima-Ind-Diabetes | 0.1957 | 0.1946 | 0.1944 | 0.1898 | 0.1905 | 0.1935 | 0.2398 |
| Vehicle | 0.3330 | 0.2384 | 0.2494 | 0.1540 | 0.2482 | 0.2435 | 0.0194 |
| Anneal | 0.0354 | 0.0195 | 0.0073 | 0.0788 | 0.0186 | 0.0180 | 0.2104 |
| Vowel | 0.3301 | 0.1931 | 0.1745 | 0.1688 | 0.2135 | 0.2268 | 0.1811 |
| Led | 0.2322 | 0.2247 | 0.2317 | 0.2211 | 0.2348 | 0.2327 | 0.3766 |
| Car | 0.0937 | 0.0478 | 0.0387 | 0.0536 | 0.0524 | 0.0597 | 0.0633 |
| Hypothyroid | 0.0116 | 0.0105 | 0.0096 | 0.0181 | 0.0095 | 0.0094 | 0.0315 |
| Dis | 0.0165 | 0.0194 | 0.0191 | 0.0195 | 0.0194 | 0.0163 | 0.0078 |
| Sick | 0.0246 | 0.0208 | 0.0198 | 0.0277 | 0.0215 | 0.0221 | 0.3212 |
| Abalone | 0.4180 | 0.3134 | 0.3033 | 0.3613 | 0.3089 | 0.3201 | 0.0574 |
| Spambase | 0.0929 | 0.0571 | 0.0497 | 0.0560 | 0.0488 | 0.0631 | 0.1184 |
| Waveform-5000 | 0.1762 | 0.1232 | 0.1157 | 0.1207 | 0.1122 | 0.1233 | 0.2172 |
| Page-Blocks | 0.0451 | 0.0306 | 0.0280 | 0.0282 | 0.0259 | 0.0259 | 0.0224 |
| Optdigits | 0.0685 | 0.0275 | 0.0250 | 0.0382 | 0.0182 | 0.0203 | 0.0902 |
| Satellite | 0.1746 | 0.0948 | 0.0808 | 0.1064 | 0.0834 | 0.0889 | 0.0002 |
| Mushrooms | 0.0237 | 0.0001 | 0.0001 | 0.0000 | 0.0001 | 0.0004 | 0.0561 |
| Thyroid | 0.0994 | 0.0572 | 0.0553 | 0.0999 | 0.0535 | 0.0658 | 0.0561 |
| Letter-Recog | 0.2207 | 0.1032 | 0.1387 | 0.1945 | 0.0569 | 0.0806 | 0.0892 |
| Adult | 0.1649 | 0.1312 | 0.1220 | 0.1394 | 0.1215 | 0.1440 | 0.0006 |
| Connect-4 | 0.2660 | 0.2253 | 0.2022 | 0.2279 | 0.1981 | 0.2279 | 0.0158 |
| Waveform | 0.0219 | 0.0152 | 0.0210 | 0.0267 | 0.0140 | 0.0157 | 0.3068 |
| Census-Income | 0.2303 | 0.0544 | 0.0421 | - | 0.0450 | 0.0859 | 0.2083 |
| Poker-Hand | 0.4979 | 0.2865 | 0.1326 | - | 0.4040 | 0.4217 | 0.1716 |
Experimental results of variance.
| Data Set | NB | TAN | KDB | LR | ETAN | AODE | WAODE |
|---|---|---|---|---|---|---|---|
| Labor | 0.0395 | 0.0632 | 0.0721 | 0.0328 | 0.0779 | 0.0268 | 0.0221 |
| Labor-Negotiations | 0.0653 | 0.1395 | 0.1289 | 0.0655 | 0.0868 | 0.0626 | 0.0626 |
| Lymphography | 0.0343 | 0.1106 | 0.1408 | 0.1212 | 0.0961 | 0.0412 | 0.0478 |
| Iris | 0.0390 | 0.0510 | 0.0364 | 0.0327 | 0.0460 | 0.0394 | 0.0396 |
| Hungarian | 0.0201 | 0.0556 | 0.0561 | 0.0751 | 0.0411 | 0.0270 | 0.0317 |
| Heart-Disease-C | 0.0248 | 0.0479 | 0.0582 | 0.0920 | 0.0591 | 0.0304 | 0.0383 |
| Soybean-Large | 0.0783 | 0.1127 | 0.0982 | 0.1542 | 0.0899 | 0.0747 | 0.0855 |
| Ionosphere | 0.0242 | 0.0414 | 0.0581 | 0.0946 | 0.0448 | 0.0319 | 0.0242 |
| House-Votes-84 | 0.0066 | 0.0170 | 0.0197 | 0.0714 | 0.0083 | 0.0068 | 0.0123 |
| Musk1 | 0.1108 | 0.1191 | 0.1320 | 0.1691 | 0.1157 | 0.1153 | 0.1010 |
| Cylinder-Bands | 0.0656 | 0.0724 | 0.0750 | 0.1437 | 0.0888 | 0.0827 | 0.0364 |
| Chess | 0.0401 | 0.0491 | 0.0531 | 0.0791 | 0.0578 | 0.0385 | 0.0230 |
| Syncon | 0.0204 | 0.0222 | 0.0301 | 0.1764 | 0.0246 | 0.0161 | 0.0913 |
| Balance-Scale | 0.0848 | 0.0941 | 0.0872 | 0.0339 | 0.0863 | 0.0854 | 0.0334 |
| Soybean | 0.0302 | 0.0593 | 0.0439 | 0.0839 | 0.0395 | 0.0288 | 0.0321 |
| Credit-A | 0.0249 | 0.0555 | 0.0768 | 0.0737 | 0.0673 | 0.0269 | 0.0264 |
| Breast-Cancer-W | 0.0010 | 0.0372 | 0.0504 | 0.0395 | 0.0376 | 0.0118 | 0.0700 |
| Pima-Ind-Diabetes | 0.0715 | 0.0663 | 0.0689 | 0.0425 | 0.0697 | 0.0729 | 0.1276 |
| Vehicle | 0.1120 | 0.1297 | 0.1283 | 0.0797 | 0.1330 | 0.1246 | 0.0161 |
| Anneal | 0.0168 | 0.0156 | 0.0152 | 0.0593 | 0.0139 | 0.0118 | 0.0604 |
| Vowel | 0.2542 | 0.2466 | 0.2325 | 0.2239 | 0.2310 | 0.2465 | 0.2489 |
| Led | 0.0333 | 0.0536 | 0.0565 | 0.0640 | 0.0460 | 0.0372 | 0.1106 |
| Car | 0.0520 | 0.0375 | 0.0434 | 0.0385 | 0.0427 | 0.0431 | 0.0509 |
| Hypothyroid | 0.0031 | 0.0029 | 0.0024 | 0.0062 | 0.0039 | 0.0026 | 0.0083 |
| Dis | 0.0069 | 0.0006 | 0.0011 | 0.0038 | 0.0009 | 0.0048 | 0.0056 |
| Sick | 0.0047 | 0.0052 | 0.0043 | 0.0084 | 0.0063 | 0.0038 | 0.1543 |
| Abalone | 0.0682 | 0.1690 | 0.1769 | 0.0746 | 0.1679 | 0.1536 | 0.0111 |
| Spambase | 0.0092 | 0.0157 | 0.0214 | 0.0243 | 0.0177 | 0.0098 | 0.0420 |
| Waveform-5000 | 0.0259 | 0.0687 | 0.0843 | 0.0310 | 0.0693 | 0.0403 | 0.1311 |
| Page-Blocks | 0.0135 | 0.0144 | 0.0177 | 0.0123 | 0.0139 | 0.0111 | 0.0137 |
| Optdigits | 0.0153 | 0.0185 | 0.0254 | 0.0752 | 0.0162 | 0.0133 | 0.0364 |
| Satellite | 0.0139 | 0.0368 | 0.0455 | 0.0517 | 0.0395 | 0.0325 | 0.0001 |
| Mushrooms | 0.0043 | 0.0002 | 0.0002 | 0.0001 | 0.0001 | 0.0001 | 0.0239 |
| Thyroid | 0.0205 | 0.0252 | 0.0272 | 0.0453 | 0.0239 | 0.0202 | 0.0241 |
| Letter-Recog | 0.0471 | 0.0591 | 0.0113 | 0.0422 | 0.0523 | 0.0709 | 0.0417 |
| Adult | 0.0069 | 0.0165 | 0.0285 | 0.0108 | 0.0236 | 0.0104 | 0.0004 |
| Connect-4 | 0.0156 | 0.0149 | 0.0309 | 0.0127 | 0.0373 | 0.0199 | 0.0023 |
| Waveform | 0.0009 | 0.0053 | 0.0037 | 0.0024 | 0.0059 | 0.0023 | 0.0632 |
| Census-Income | 0.0052 | 0.0100 | 0.0110 | - | 0.0144 | 0.0138 | 0.0224 |
| Poker-Hand | 0.0000 | 0.0424 | 0.0633 | - | 0.0440 | 0.0273 | 0.0602 |
The records of win/draw/loss for BNCs and our algorithms.
| Metric | BNC | NB | TAN | KDB | AODE | WAODE |
|---|---|---|---|---|---|---|
| Zero-one loss | TAN | 27/5/8 | - | - | - | - |
| Zero-one loss | KDB | 25/10/5 | 16/13/11 | - | - | - |
| Zero-one loss | AODE | 29/8/3 | 13/15/12 | 13/14/13 | - | - |
| Zero-one loss | WAODE | 28/7/5 | 19/14/7 | 18/13/9 | 14/19/7 | - |
| Zero-one loss | ETAN | 30/6/4 | 21/11/8 | 19/12/9 | 18/13/9 | 15/15/10 |
| Bias | TAN | 28/5/7 | - | - | - | - |
| Bias | KDB | 26/8/6 | 18/14/8 | - | - | - |
| Bias | AODE | 31/7/2 | 14/10/16 | 13/6/21 | - | - |
| Bias | WAODE | 24/2/14 | 19/4/17 | 18/4/18 | 18/4/18 | - |
| Bias | ETAN | 32/4/4 | 18/14/8 | 9/19/12 | 20/13/7 | 19/3/18 |
| Variance | TAN | 6/2/32 | - | - | - | - |
| Variance | KDB | 9/2/29 | 10/7/23 | - | - | - |
| Variance | AODE | 10/11/19 | 30/3/7 | 29/3/8 | - | - |
| Variance | WAODE | 12/5/22 | 21/3/16 | 21/1/18 | 12/4/24 | - |
| Variance | ETAN | 4/5/31 | 15/8/17 | 24/6/10 | 3/6/31 | 18/3/19 |
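A win/draw/loss record of this kind can be tallied by comparing two algorithms data set by data set. The sketch below is one possible way to do it; the function name and the `tol` tie tolerance are hypothetical, since the criterion used to declare a draw is not stated here, so its output should not be expected to reproduce the records above. The sample values are the ETAN and TAN zero-one loss entries for the first three data sets.

```python
def win_draw_loss(errors_a, errors_b, tol=0.0):
    """Count wins/draws/losses of algorithm A against algorithm B.

    errors_a, errors_b: per-data-set error values (lower is better).
    tol: hypothetical tolerance; differences within tol count as a draw.
    """
    wins = draws = losses = 0
    for a, b in zip(errors_a, errors_b):
        if abs(a - b) <= tol:
            draws += 1
        elif a < b:
            wins += 1
        else:
            losses += 1
    return wins, draws, losses


# ETAN vs. TAN on Labor, Labor-Negotiations, Lymphography (zero-one loss).
etan = [0.0274, 0.0237, 0.0814]
tan = [0.0211, 0.0763, 0.0976]
strict = win_draw_loss(etan, tan)             # no tolerance
lenient = win_draw_loss(etan, tan, tol=0.01)  # small gaps count as draws
```

The choice of tolerance changes the record noticeably on close results, which is why a comparison of this kind should state how draws are defined.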
The records of win/draw/loss for LR and our algorithms.
| Algorithm | Metric | LR |
|---|---|---|
| ETAN | Zero-one loss | 26/4/8 |
| ETAN | Bias | 22/5/11 |
| ETAN | Variance | 21/3/14 |
Figure 6. Comparison between LR and ETAN in terms of zero-one loss.
Figure 7. Nemenyi test for all algorithms.
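In a Nemenyi test, two algorithms are considered significantly different when their average ranks differ by at least the critical difference CD = q_alpha * sqrt(k(k+1) / (6N)), where k is the number of algorithms and N the number of data sets. The sketch below evaluates this formula; the settings k = 7, N = 40 and the tabulated value q_alpha = 2.949 (alpha = 0.05, seven algorithms) are assumptions for illustration, and the paper's actual setup may differ, for instance because LR has no result on two data sets.

```python
from math import sqrt


def nemenyi_critical_difference(q_alpha, k, n):
    """Critical difference in average rank for the Nemenyi post-hoc test.

    q_alpha: tabulated critical value for the chosen significance level
             and number of algorithms (supplied by the caller).
    k: number of algorithms compared.
    n: number of data sets.
    """
    return q_alpha * sqrt(k * (k + 1) / (6.0 * n))


# Assumed settings: 7 algorithms, 40 data sets, alpha = 0.05.
cd = nemenyi_critical_difference(q_alpha=2.949, k=7, n=40)
```

Under these assumed settings the critical difference is roughly 1.42 ranks, and it shrinks as the number of data sets grows.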
Experimental results of bias.
| Data Set | NB | TAN | KDB | LR | ETAN | AODE | WAODE |
|---|---|---|---|---|---|---|---|
| Labor | 0.0289 | 0.0211 | 0.0279 | 0.0211 | 0.0274 | 0.0205 | 0.0200 |
| Labor-Negotiations | 0.0505 | 0.0763 | 0.0553 | 0.0422 | 0.0237 | 0.0268 | 0.0268 |
| Lymphography | 0.0902 | 0.0976 | 0.1041 | 0.1422 | 0.0814 | 0.0853 | 0.0951 |
| Iris | 0.0590 | 0.0550 | 0.0656 | 0.0343 | 0.0760 | 0.0626 | 0.0624 |
| Hungarian | 0.1646 | 0.1454 | 0.1480 | 0.1057 | 0.1456 | 0.1597 | 0.1611 |
| Heart-Disease-C | 0.1297 | 0.1263 | 0.1299 | 0.1300 | 0.1300 | 0.1171 | 0.1092 |
| Soybean-Large | 0.1070 | 0.1275 | 0.1086 | 0.1104 | 0.0964 | 0.0812 | 0.0655 |
| Ionosphere | 0.1220 | 0.0800 | 0.0855 | 0.1117 | 0.0817 | 0.0903 | 0.0061 |
| House-Votes-84 | 0.0899 | 0.0410 | 0.0258 | 0.0307 | 0.0406 | 0.0518 | 0.0406 |
| Musk1 | 0.1847 | 0.1563 | 0.1535 | 0.1357 | 0.1527 | 0.1670 | 0.1501 |
| Cylinder-Bands | 0.2000 | 0.3242 | 0.1939 | 0.1863 | 0.1746 | 0.1684 | 0.1286 |
| Chess | 0.1413 | 0.1427 | 0.1119 | 0.0832 | 0.1192 | 0.1380 | 0.0180 |
| Syncon | 0.0516 | 0.0203 | 0.0314 | 0.1123 | 0.0339 | 0.0334 | 0.1827 |
| Balance-Scale | 0.1840 | 0.1843 | 0.1902 | 0.0753 | 0.1877 | 0.1905 | 0.0503 |
| Soybean | 0.1015 | 0.0504 | 0.0491 | 0.0656 | 0.0622 | 0.0690 | 0.0900 |
| Credit-A | 0.0912 | 0.1171 | 0.1137 | 0.1279 | 0.1266 | 0.0892 | 0.0953 |
| Breast-Cancer-W | 0.0187 | 0.0315 | 0.0449 | 0.0375 | 0.0263 | 0.0243 | 0.1941 |
| Pima-Ind-Diabetes | 0.1957 | 0.1946 | 0.1944 | 0.1898 | 0.1905 | 0.1935 | 0.2398 |
| Vehicle | 0.3330 | 0.2384 | 0.2494 | 0.1540 | 0.2482 | 0.2435 | 0.0194 |
| Anneal | 0.0354 | 0.0195 | 0.0073 | 0.0788 | 0.0186 | 0.0180 | 0.2104 |
| Vowel | 0.3301 | 0.1931 | 0.1745 | 0.1688 | 0.2135 | 0.2268 | 0.1811 |
| Led | 0.2322 | 0.2247 | 0.2317 | 0.2211 | 0.2348 | 0.2327 | 0.3766 |
| Car | 0.0937 | 0.0478 | 0.0387 | 0.0536 | 0.0524 | 0.0597 | 0.0633 |
| Hypothyroid | 0.0116 | 0.0105 | 0.0096 | 0.0181 | 0.0095 | 0.0094 | 0.0315 |
| Dis | 0.0165 | 0.0194 | 0.0191 | 0.0195 | 0.0194 | 0.0163 | 0.0078 |
| Sick | 0.0246 | 0.0208 | 0.0198 | 0.0277 | 0.0215 | 0.0221 | 0.3212 |
| Abalone | 0.4180 | 0.3134 | 0.3033 | 0.3613 | 0.3089 | 0.3201 | 0.0574 |
| Spambase | 0.0929 | 0.0571 | 0.0497 | 0.0560 | 0.0488 | 0.0631 | 0.1184 |
| Waveform-5000 | 0.1762 | 0.1232 | 0.1157 | 0.1207 | 0.1122 | 0.1233 | 0.2172 |
| Page-Blocks | 0.0451 | 0.0306 | 0.0280 | 0.0282 | 0.0259 | 0.0259 | 0.0224 |
| Optdigits | 0.0685 | 0.0275 | 0.0250 | 0.0382 | 0.0182 | 0.0203 | 0.0902 |
| Satellite | 0.1746 | 0.0948 | 0.0808 | 0.1064 | 0.0834 | 0.0889 | 0.0002 |
| Mushrooms | 0.0237 | 0.0001 | 0.0001 | 0.0000 | 0.0001 | 0.0004 | 0.0561 |
| Thyroid | 0.0994 | 0.0572 | 0.0553 | 0.0999 | 0.0535 | 0.0658 | 0.0561 |
| Letter-Recog | 0.2207 | 0.1032 | 0.1387 | 0.1945 | 0.0569 | 0.0806 | 0.0892 |
| Adult | 0.1649 | 0.1312 | 0.1220 | 0.1394 | 0.1215 | 0.1440 | 0.0006 |
| Connect-4 | 0.2660 | 0.2253 | 0.2022 | 0.2279 | 0.1981 | 0.2279 | 0.0158 |
| Waveform | 0.0219 | 0.0152 | 0.0210 | 0.0267 | 0.0140 | 0.0157 | 0.3068 |
| Census-Income | 0.2303 | 0.0544 | 0.0421 | - | 0.0450 | 0.0859 | 0.2083 |
| Poker-Hand | 0.4979 | 0.2865 | 0.1326 | - | 0.4040 | 0.4217 | 0.1716 |