Michail Tsagris, Giorgos Borboudakis, Vincenzo Lagani, Ioannis Tsamardinos.
Abstract
We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial, and ordinal variables. We use likelihood-ratio tests based on appropriate regression models and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs, respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data and show that the proposed approach outperforms alternatives in terms of learning accuracy.
Keywords: Bayesian networks; Conditional independence tests; Constraint-based learning; Maximal ancestral graphs; Mixed data
Year: 2018 PMID: 30957008 PMCID: PMC6428307 DOI: 10.1007/s41060-018-0097-y
Source DB: PubMed Journal: Int J Data Sci Anal
Fig. 1 An example where the proposed tests fail to identify the unconditional dependency between X and Z is shown. The correlation between X and Z is 0.008, and the p value of the test equals 0.795, suggesting independence
Fig. 2 The correlation of the two p values and the proportion of decision agreements at the 5% significance level are shown for different pairs of regression models. The correlation of p values for (un)conditional independence increases with sample size, reaching almost perfect positive correlation in most cases. In terms of decision agreements, an agreement of over 90% is reached in all cases even with 200 samples. a Unconditional independence, b conditional independence, c unconditional dependence, and d conditional dependence
Fig. 3 Estimated type I error on the (un)conditional independence cases for each pair of regression models, and three methods for combining dependent p values. The solid horizontal line is at the 5% level, and the two dashed lines are at the 4 and 6% levels. Whenever linear regression models are involved, the MM method and the linear test perform similarly. For the conditional case of binary-ordinal and multinomial-ordinal pairs, the MM method outperforms all other methods
Fig. 4 Estimated power on the (un)conditional dependence cases for each pair of regression models, and three methods for combining dependent p values. In most cases, all methods perform very similarly. For the multinomial-ordinal case, ordinal regression breaks down for small samples, and MM is slightly behind the rest. This is expected, as the other methods also have a larger type I error
Precision and recall for the skeleton estimation
| Method | 50 variables | | | 100 variables | | |
|---|---|---|---|---|---|---|
| Precision | | | | | | |
| 3 neighbors | ||||||
| MM | 0.783 | 0.981 | 0.988 | 0.949 | 0.971 | 0.974 |
| Fast | 0.708* | 0.971* | 0.979* | 0.936* | 0.952* | 0.951* |
| Copula | 0.942* | 0.975 | 0.884* | 0.896 | 0.914* | |
| 5 neighbors | ||||||
| MM | 0.989 | 0.992 | 0.993 | 0.988 | 0.992 | 0.989 |
| Fast | 0.986 | 0.990 | 0.992 | 0.984 | 0.985* | 0.985* |
| Copula | 0.980* | 0.971* | 0.951* | 0.987 | 0.961* | 0.950* |
| Recall | | | | | | |
| 3 neighbors | ||||||
| MM | 0.172 | 0.704 | 0.808 | 0.536 | 0.707* | 0.794 |
| Fast | 0.155* | 0.639* | 0.711* | 0.507* | 0.643* | 0.684* |
| Copula | 0.152* | 0.675 | 0.796 | 0.402* | 0.669* | 0.793 |
| 5 neighbors | ||||||
| MM | 0.445 | 0.617 | 0.717 | 0.460 | 0.624 | 0.725 |
| Fast | 0.436* | 0.575* | 0.649* | 0.457 | 0.582* | 0.660* |
| Copula | 0.374* | 0.600* | 0.700 | 0.341* | 0.595* | 0.725 |
An asterisk (*) indicates that the precision or recall of the Fast or Copula approach is statistically significantly lower than that of MM at the 1% significance level. The italic font indicates that the precision of the Copula approach is statistically significantly higher than that of MM at the 1% significance level
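As a rough illustration of the skeleton metrics reported above, the sketch below computes precision and recall after dropping edge orientations. It uses the standard definitions (correct edges over estimated edges, and correct edges over true edges); the function names, the toy graphs, and the convention of returning 1.0 for an empty edge set are assumptions made for this example, not details taken from the paper.

```python
def skeleton(edges):
    """Drop orientation: each edge becomes an unordered pair of node names."""
    return {frozenset(edge) for edge in edges}


def skeleton_precision_recall(true_edges, estimated_edges):
    """Return (precision, recall) of the estimated skeleton.

    Assumed convention for this sketch: an empty edge set yields 1.0.
    """
    true_skel = skeleton(true_edges)
    est_skel = skeleton(estimated_edges)
    true_positives = len(true_skel & est_skel)
    precision = true_positives / len(est_skel) if est_skel else 1.0
    recall = true_positives / len(true_skel) if true_skel else 1.0
    return precision, recall


# Hypothetical toy graphs: two of the three estimated adjacencies are correct,
# and one true adjacency (B, C) is missed.
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
estimated_edges = [("B", "A"), ("C", "D"), ("A", "D")]

precision, recall = skeleton_precision_recall(true_edges, estimated_edges)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Because orientation is ignored, the reversed edge between A and B still counts as a correct adjacency here.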
Precision and recall for the estimation of the orientations
| Method | 50 variables | | | 100 variables | | |
|---|---|---|---|---|---|---|
| Precision | | | | | | |
| 3 neighbors | ||||||
| MM | 0.686 | 0.979 | 0.988 | 0.943 | 0.965 | 0.974 |
| Fast | 0.608* | 0.969* | 0.978* | 0.928* | 0.942* | 0.948* |
| Copula | 0.940* | 0.928* | 0.932* | 0.913* | ||
| 5 neighbors | ||||||
| MM | 0.987 | 0.992 | 0.993 | 0.986 | 0.992 | 0.989 |
| Fast | 0.984 | 0.989 | 0.992 | 0.982 | 0.985* | 0.984* |
| Copula | 0.975* | 0.970* | 0.950* | 0.984 | 0.959* | 0.949* |
| Recall | | | | | | |
| 3 neighbors | ||||||
| MM | 0.118 | 0.692 | 0.806 | 0.504 | 0.668 | 0.790 |
| Fast | 0.108* | 0.621* | 0.698* | 0.476* | 0.600* | 0.669* |
| Copula | 0.092* | 0.666 | 0.793 | 0.342* | 0.625* | 0.791 |
| 5 neighbors | ||||||
| MM | 0.413 | 0.606 | 0.711 | 0.430 | 0.613 | 0.719 |
| Fast | 0.406* | 0.561* | 0.638* | 0.428 | 0.569* | 0.649* |
| Copula | 0.327* | 0.591 | 0.696 | 0.289* | 0.583* | 0.722 |
An asterisk (*) indicates that the precision or recall of the Fast or Copula approach is statistically significantly lower than that of MM at the 1% significance level. The italic font indicates that the precision of the Copula approach is statistically significantly higher than that of MM at the 1% significance level
Structural Hamming distance (lower is better)
| Method | 50 variables | | | 100 variables | | |
|---|---|---|---|---|---|---|
| 3 neighbors | ||||||
| MM | 71.48 | 34.40 | 25.64 | 97.62 | 69.94 | 55.12 |
| Fast | 73.18* | 38.66* | 33.76* | 101.04* | 80.46* | 74.28* |
| Copula | 70.96 | 37.12* | 30.30* | 115.66* | 76.88* | 62.00* |
| 5 neighbors | ||||||
| MM | 81.46 | 57.28 | 44.54 | 158.60 | 112.35 | 87.10 |
| Fast | 82.12 | 62.62* | 53.56* | 158.50 | 123.55* | 105.30* |
| Copula | 91.60* | 60.42* | 49.84* | 191.40* | 124.95* | 93.15* |
An asterisk (*) indicates that the SHD of the Fast or Copula approach is statistically significantly higher than that of MM at the 1% significance level
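For readers unfamiliar with the metric, the following is a minimal sketch of one common way to compute a structural Hamming distance between two partially directed graphs: every pair of nodes whose edge differs (missing, extra, or differently oriented) adds one to the distance. The record above does not spell out which SHD variant was used, so this convention, the graph encoding (an undirected edge stored as two opposite arcs), and all names below are assumptions for illustration.

```python
from itertools import combinations


def edge_state(graph, a, b):
    """Return which of the two possible arcs between a and b are present."""
    return ((a, b) in graph, (b, a) in graph)


def structural_hamming_distance(nodes, graph_true, graph_est):
    """Count the node pairs whose edge differs between the two graphs.

    Each graph is a set of arcs (tail, head); an undirected edge a - b is
    stored as both (a, b) and (b, a). Assumed convention: every differing
    pair costs 1, whether the edge is missing, extra, or oriented differently.
    """
    return sum(
        edge_state(graph_true, a, b) != edge_state(graph_est, a, b)
        for a, b in combinations(nodes, 2)
    )


# Hypothetical toy graphs.
nodes = ["A", "B", "C", "D"]
# True graph: A -> B, B - C, C -> D
graph_true = {("A", "B"), ("B", "C"), ("C", "B"), ("C", "D")}
# Estimate: B -> A (reversed), B - C (correct), A -> D (extra), C -> D missing
graph_est = {("B", "A"), ("B", "C"), ("C", "B"), ("A", "D")}

shd = structural_hamming_distance(nodes, graph_true, graph_est)
print(shd)  # one reversed, one extra, and one missing edge
```

Under this convention a reversed edge costs the same as a missing or extra one; other variants weight orientation errors differently.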