Zhigao Guo, Anthony C Constantinou.
Abstract
Score-based algorithms that learn Bayesian Network (BN) structures provide solutions ranging from different levels of approximate learning to exact learning. Approximate solutions exist because exact learning is generally not applicable to networks of moderate or higher complexity. In general, approximate solutions sacrifice accuracy for speed, with the aim of minimising the loss in accuracy while maximising the gain in speed. While some approximate algorithms are optimised to handle thousands of variables, these algorithms may still be unable to learn such high dimensional structures. Some of the most efficient score-based algorithms cast the structure learning problem as a combinatorial optimisation of candidate parent sets (CPSs). This paper explores a strategy for pruning the size of candidate parent sets, which could form part of existing score-based algorithms as an additional pruning phase aimed at high dimensionality problems. The results illustrate how different levels of pruning affect the learning speed relative to the loss in accuracy in terms of model fitting, and show that aggressive pruning may be required to produce approximate solutions for high complexity problems.
Keywords: probabilistic graphical models; pruning; structure learning
Year: 2020 PMID: 33286911 PMCID: PMC7597292 DOI: 10.3390/e22101142
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Sample CPSs of node “0”, ordered by BDeu score with max in-degree 3. The example is based on Audio-train dataset which incorporates 100 variables.
| Child Node | Local BDeu Score | CPS Size | CPS |
|---|---|---|---|
| 0 | −5149.19 | 3 | {9, 85, 95} |
| 0 | −5150.47 | 3 | {9, 94, 95} |
| 0 | −5174.53 | 3 | {85, 94, 95} |
| 0 | −5207.08 | 3 | {80, 85, 95} |
| 0 | −5208.28 | 3 | {9, 80, 95} |
| … | … | … | … |
| 0 | −6886.30 | 2 | {48, 67} |
| 0 | −6886.74 | 1 | {67} |
| 0 | −5174.53 | 1 | {81} |
| 0 | −5174.53 | 1 | {75} |
| 0 | −6889.11 | 0 | {} |
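A ranked list like the one above lends itself to pruning by score. The following is a minimal sketch of one plausible reading of such a pruning phase, not the authors' implementation: it keeps the best-scoring share of a node's CPSs. The function name `prune_cps`, the `pruning_level` parameter, and the rule that the empty parent set is always retained are assumptions made for illustration.

```python
import math


def prune_cps(scored_cps, pruning_level):
    """Keep the best-scoring share of one node's CPSs.

    scored_cps: list of (score, frozenset_of_parents) pairs for one child node.
    pruning_level: fraction of CPSs to discard, e.g. 0.9 for 90% pruning.
    Assumption of this sketch: the empty parent set is always retained,
    so the node can still be parentless in the learnt graph.
    """
    if not 0.0 <= pruning_level <= 1.0:
        raise ValueError("pruning_level must be in [0, 1]")
    # Higher (less negative) BDeu scores come first.
    ranked = sorted(scored_cps, key=lambda item: item[0], reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * (1.0 - pruning_level)))
    kept = ranked[:n_keep]
    empty = frozenset()
    if all(parents != empty for _, parents in kept):
        kept.extend(item for item in ranked if item[1] == empty)
    return kept


# The five best CPSs of node "0" from the table above, plus the empty set.
cps_node0 = [
    (-5149.19, frozenset({9, 85, 95})),
    (-5150.47, frozenset({9, 94, 95})),
    (-5174.53, frozenset({85, 94, 95})),
    (-5207.08, frozenset({80, 85, 95})),
    (-5208.28, frozenset({9, 80, 95})),
    (-6889.11, frozenset()),
]
kept = prune_cps(cps_node0, pruning_level=0.5)
```

With `pruning_level=0.5` the sketch keeps the three best CPSs and then appends the empty set, so four entries survive.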
An example BN with four nodes and the legal CPSs that remain after pruning the CPSs that cannot occur (highlighted in bold), as determined by the BDeu score. The example assumes a maximum in-degree (ID) of 3 and a sample size of 5000.
| Node | ID = 0 | ID = 1 | ID = 1 | ID = 1 | ID = 2 | ID = 2 | ID = 2 | ID = 3 |
|---|---|---|---|---|---|---|---|---|
| 1 | −2288.7 | −2274.6 | −2196.2 | −2240.7 | | | −2171.3 | |
| 2 | −2003.7 | −1989.6 | −1900.7 | −1915.1 | | | −1849.2 | |
| 3 | −2891.5 | −2799.0 | −2788.5 | −2811.3 | −2714.5 | −2741.9 | −2745.5 | −2692.6 |
| 4 | −1951.6 | −1903.6 | −1862.9 | −1871.4 | −1829.5 | −1846.5 | −1819.9 | −1807.6 |
Figure 1. All possible CPSs of node “1” (not shown in the diagram) under the assumption that the maximum in-degree is 3.
Figure 2. The optimal structure learnt from the CPSs presented in Table 2.
The number and rate of legal CPSs relative to all possible CPSs for subsets of the Audio-train data, over varying sample sizes and maximum in-degrees.
| Maximum in-degree | Number of all possible CPSs | Sample size 3000 | Sample size 6000 | Sample size 9000 | Sample size 12,000 | Sample size 15,000 |
|---|---|---|---|---|---|---|
| 1 | 10,000 | 8398 (84.0%) | 8926 (89.3%) | 9163 (91.6%) | 9320 (93.2%) | 9394 (93.9%) |
| 2 | 495,100 | 228,197 (46.1%) | 306,263 (61.9%) | 349,587 (70.6%) | 374,007 (75.5%) | 388,621 (78.5%) |
| 3 | 16,180,000 | 1,200,429 (7.42%) | 3,260,399 (20.2%) | 5,130,502 (31.7%) | 6,405,394 (39.6%) | 7,343,077 (45.4%) |
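The totals of all possible CPSs in the table above (10,000, 495,100 and 16,180,000) follow from counting, for each of the 100 variables, every subset of the other 99 variables up to the maximum in-degree. A short sketch of that arithmetic, with `possible_cps_count` being a hypothetical helper name:

```python
from math import comb


def possible_cps_count(n_nodes, max_in_degree):
    """Count all CPSs over all nodes for a given maximum in-degree.

    Each node can take any subset of the other n_nodes - 1 variables
    of size 0..max_in_degree as its parent set.
    """
    per_node = sum(comb(n_nodes - 1, k) for k in range(max_in_degree + 1))
    return n_nodes * per_node


for max_in_degree in (1, 2, 3):
    print(max_in_degree, possible_cps_count(100, max_in_degree))

# Rate of legal CPSs for maximum in-degree 3 at sample size 3000.
legal_rate = 1_200_429 / possible_cps_count(100, 3)
print(f"{100 * legal_rate:.2f}%")
```

The rate of legal CPSs is the legal count divided by this total, which is how the percentages in the table can be checked.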
Moderate complexity case studies (nodes∣max in-degree in true networks), showing the total number of legal CPSs per network and the average number of CPSs per node in that network, for each network and sample size combination. The numbers of legal CPSs assume a maximum in-degree of 3.
| | Sample Size | Asia | Insurance | Water | Alarm | Hailfinder | Carpo |
|---|---|---|---|---|---|---|---|
| CPSs (graph) | 100 | 41 | 279 | 482 | 907 | 244 | 5068 |
| | 1000 | 107 | 774 | 573 | 1928 | 761 | 3827 |
| | 10,000 | 161 | 3652 | 961 | 6473 | 3768 | 16,391 |
| CPSs (per node) | 100 | 5.13 | 10.33 | 15.06 | 24.51 | 4.36 | 84.47 |
| | 1000 | 13.38 | 28.67 | 17.91 | 52.11 | 13.59 | 63.78 |
| | 10,000 | 20.12 | 135.26 | 30.03 | 174.95 | 67.29 | 273.18 |
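The per-node rows appear to be the per-graph totals divided by the number of nodes in each network. The sketch below checks this for sample size 100. The node counts are not given in the text above; they are assumed here to be the commonly cited sizes of these benchmark networks, and `cps_per_node` is a hypothetical helper. Half-up rounding is used because it reproduces the two-decimal values in the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed node counts for the benchmark networks (not stated in the text).
NODE_COUNTS = {
    "Asia": 8, "Insurance": 27, "Water": 32,
    "Alarm": 37, "Hailfinder": 56, "Carpo": 60,
}

# Legal CPSs per graph at sample size 100, taken from the table above.
LEGAL_CPS_AT_100 = {
    "Asia": 41, "Insurance": 279, "Water": 482,
    "Alarm": 907, "Hailfinder": 244, "Carpo": 5068,
}


def cps_per_node(total_cps, n_nodes):
    """Average number of legal CPSs per node, rounded half-up to 2 decimals."""
    average = Decimal(total_cps) / Decimal(n_nodes)
    return average.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {
    name: cps_per_node(LEGAL_CPS_AT_100[name], NODE_COUNTS[name])
    for name in NODE_COUNTS
}
for name, value in averages.items():
    print(name, value)
```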
Loss in accuracy for different levels of pruning, as a discrepancy in BDeu score from the unpruned score, based on the three different sample sizes for case studies Asia, Insurance and Water.
| Pruning | Asia | Asia | Asia | Insurance | Insurance | Insurance | Water | Water | Water |
|---|---|---|---|---|---|---|---|---|---|
| 90% | −6.70‰ | −1.26‰ | −1.33‰ | −30.74‰ | −62.92‰ | −35.26‰ | −11.84‰ | −28.11‰ | −15.50‰ |
| 80% | −6.70‰ | −1.26‰ | −1.06‰ | −30.74‰ | −37.77‰ | −7.99‰ | −11.15‰ | −19.37‰ | −8.12‰ |
| 70% | −6.70‰ | −1.26‰ | −1.06‰ | −10.50‰ | −13.80‰ | −7.13‰ | −8.21‰ | −2.99‰ | −0.68‰ |
| 60% | −6.70‰ | −1.26‰ | −0.72‰ | −8.32‰ | −6.73‰ | −5.32‰ | −6.70‰ | −2.81‰ | −0.44‰ |
| 50% | −6.68‰ | −1.26‰ | −0.72‰ | −7.94‰ | −4.14‰ | −2.83‰ | −1.24‰ | −1.02‰ | −0.27‰ |
| 40% | −0.04‰ | −1.26‰ | −0.72‰ | −2.33‰ | −1.28‰ | −2.07‰ | −0.64‰ | | −0.18‰ |
| 30% | | −0.9‰ | −0.25‰ | −2.23‰ | | −1.22‰ | −0.32‰ | | −0.02‰ |
| 20% | | | | | | −0.25‰ | −0.32‰ | | |
| 10% | | | | | | | | | |
| 0% | | | | | | | | | |
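The discrepancies in the table above are reported in per mille (‰) relative to the unpruned BDeu score. The exact formula is not given in this text, so the sketch below assumes the natural definition: the score difference divided by the magnitude of the unpruned score, times 1000. The function name and the example scores are hypothetical and are not values from the paper.

```python
def score_discrepancy_per_mille(pruned_score, unpruned_score):
    """Relative change in score, in per mille, against the unpruned score.

    Assumed definition (not confirmed by the source):
        1000 * (pruned - unpruned) / |unpruned|
    A negative result means the graph learnt from pruned CPSs scores lower.
    """
    if unpruned_score == 0:
        raise ValueError("unpruned_score must be non-zero")
    return 1000.0 * (pruned_score - unpruned_score) / abs(unpruned_score)


# Hypothetical scores, chosen only to illustrate the calculation.
example = score_discrepancy_per_mille(pruned_score=-10067.0,
                                      unpruned_score=-10000.0)
print(f"{example:.2f}‰")
```

Under this definition a discrepancy of zero means that pruning did not change the score of the best graph found.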
Loss in accuracy for different levels of pruning, as a discrepancy in BDeu score from the unpruned score, based on the three different sample sizes for case studies Alarm, Hailfinder and Carpo.
| Pruning | Alarm | Alarm | Alarm | Hailfinder | Hailfinder | Hailfinder | Carpo | Carpo | Carpo |
|---|---|---|---|---|---|---|---|---|---|
| 90% | −78.39‰ | −46.86‰ | −23.13‰ | −34.15‰ | −8.21‰ | −8.82‰ | −7.88‰ | −3.84‰ | −2.90‰ |
| 80% | −30.04‰ | −38.71‰ | −14.44‰ | −25.02‰ | −4.17‰ | −6.05‰ | −5.29‰ | −3.13‰ | −1.99‰ |
| 70% | −18.87‰ | −22.93‰ | −3.88‰ | −10.03‰ | −4.17‰ | −4.23‰ | −4.33‰ | −2.02‰ | −1.94‰ |
| 60% | −13.55‰ | −14.33‰ | −1.99‰ | −2.23‰ | −2.16‰ | −1.38‰ | −4.33‰ | −1.78‰ | −1.85‰ |
| 50% | −4.27‰ | −5.23‰ | −1.79‰ | −1.57‰ | −1.60‰ | −0.57‰ | −3.97‰ | −1.73‰ | −1.10‰ |
| 40% | −3.69‰ | −1.82‰ | −0.20‰ | −1.57‰ | −1.03‰ | −0.57‰ | −2.33‰ | −1.54‰ | −1.06‰ |
| 30% | −1.06‰ | −0.30‰ | | −1.27‰ | −0.20‰ | −0.06‰ | −1.51‰ | −1.17‰ | −0.93‰ |
| 20% | −1.06‰ | −0.30‰ | | −0.07‰ | −0.19‰ | −0.06‰ | −1.01‰ | −0.25‰ | −0.35‰ |
| 10% | | −0.15‰ | | | −0.19‰ | −0.06‰ | −1.01‰ | −0.18‰ | −0.02‰ |
| 0% | | | | | | | | | |
Loss in accuracy for different levels of pruning, as a discrepancy in BDeu score from the unpruned score, based on the three different sample sizes for case studies Audio-train and Kosarek-test. Time (secs) represents the time needed by the MINOBS algorithm to first discover the highest scoring graph within the 24 h of search.
| Pruning | Audio-Train: CPSs (graph) | Audio-Train: CPSs (per node) | Audio-Train: BDeu discrepancy | Audio-Train: Time (secs) | Kosarek-Test: CPSs (graph) | Kosarek-Test: CPSs (per node) | Kosarek-Test: BDeu discrepancy | Kosarek-Test: Time (secs) |
|---|---|---|---|---|---|---|---|---|
| 99% | 73,535 | 735 | −4.352‰ | 1473 | 58,249 | 307 | −7.468‰ | 4260 |
| 95% | 367,258 | 3673 | −0.669‰ | 682 | 287,641 | 1514 | −0.271‰ | 3265 |
| 90% | 734,414 | 7344 | −0.002‰ | 1035 | 575,096 | 3027 | | 1803 |
| 80% | 1,468,717 | 14,687 | | 2952 | 1,149,980 | 6053 | | 16,378 |
| 70% | 2,203,033 | 22,030 | | 3908 | 1,724,881 | 9078 | | 16,010 |
| 60% | 2,937,329 | 29,373 | | 5344 | 2,299,767 | 12,104 | | 20,637 |
| 50% | 3,671,663 | 36,717 | | 4334 | 2,874,708 | 15,130 | | 9033 |
| 40% | 4,405,948 | 44,059 | | 4587 | 3,449,544 | 18,155 | | 14,903 |
| 30% | 5,140,257 | 51,403 | | 10,028 | 4,024,450 | 21,181 | | 9288 |
| 20% | 5,874,560 | 58,746 | | 10,442 | 4,599,334 | 24,207 | | 29,603 |
| 10% | 6,608,876 | 66,089 | | 11,385 | 5,174,238 | 27,233 | | 42,493 |
| 0% | 7,343,077 | 73,431 | | 21,643 | 5,748,931 | 30,258 | | 82,758 |
Loss in accuracy for different levels of pruning, as a discrepancy in BDeu score from the unpruned score, based on the three different sample sizes for case studies EachMovie-train and Reuters-52. Time (secs) represents the time needed by the MINOBS algorithm to first discover the highest scoring graph within the 24 h of search.
| Pruning | EachMovie-Train: CPSs (graph) | EachMovie-Train: CPSs (per node) | EachMovie-Train: BDeu discrepancy | EachMovie-Train: Time (secs) | Reuters-52-Train: CPSs (graph) | Reuters-52-Train: CPSs (per node) | Reuters-52-Train: BDeu discrepancy | Reuters-52-Train: Time (secs) |
|---|---|---|---|---|---|---|---|---|
| 99% | 220,378 | 441 | −0.671‰ | 1711 | 375,700 | 423 | −1.269‰ | 3368 |
| 95% | 1,099,782 | 2200 | −0.158‰ | 6471 | 1,874,897 | 2109 | −0.051‰ | 6430 |
| 90% | 2,199,065 | 4398 | | 9049 | 3,748,921 | 4217 | | 10,002 |
| 80% | 4,397,558 | 8795 | | 15,273 | 7,496,843 | 8433 | | 34,537 |
| 70% | 6,596,133 | 13,192 | | 23,133 | 11,244,877 | 12,649 | | 41,554 |
| 60% | 8,795,121 | 17,589 | | 9195 | 14,992,798 | 16,865 | | 12,925 |
| 50% | 10,993,281 | 21,943 | | 15,812 | 18,741,002 | 21,081 | | 27,914 |
| 40% | 13,191,681 | 26,383 | | 37,244 | 22,488,769 | 25,297 | | 17,276 |
| 30% | 15,390,238 | 30,780 | | 74,312 | 26,236,772 | 29,513 | | 72,208 |
| 20% | 17,588,746 | 35,107 | | 24,576 | 29,984,724 | 33,729 | | 16,969 |
| 10% | 19,787,306 | 39,575 | | 35,952 | 33,732,728 | 37,945 | | 69,315 |
| 0% | 21,985,307 | 43,971 | | 82,758 | 37,479,789 | 42,159 | | 48,704 |