Abstract
This paper aims to give broad coverage of the central concepts and principles involved in automated causal inference and of emerging approaches to causal discovery from i.i.d. data and from time series. After reviewing concepts including manipulations, causal models, sample predictive modeling, causal predictive modeling, and structural equation models, we present the constraint-based approach to causal discovery, which relies on the conditional independence relationships in the data, and discuss the assumptions underlying its validity. We then focus on causal discovery based on structural equation models, in which a key issue is the identifiability of the causal structure implied by appropriately defined structural equation models: in the two-variable case, under what conditions (and why) is the causal direction between the two variables identifiable? We show that the independence between the error term and the causes, together with appropriate structural constraints on the structural equation, makes it possible. Next, we report some recent advances in causal discovery from time series. Assuming that the causal relations are linear with non-Gaussian noise, we mention two problems that are traditionally difficult to solve, namely causal discovery from subsampled data and causal discovery in the presence of confounding time series. Finally, we list a number of open questions in the field of causal discovery and inference.
Keywords: Causal discovery; Causal inference; Conditional independence; Identifiability; Statistical independence; Structural equation model
Year: 2016 PMID: 27195202 PMCID: PMC4841209 DOI: 10.1186/s40535-016-0018-x
Source DB: PubMed Journal: Appl Inform (Berl) ISSN: 2196-0089
Fig. 1 a Unmanipulated causal graph K; b B manipulated to 5; c A manipulated to 5
Fig. 2 Alternative SEM models
Fig. 3 Illustration of causal asymmetry between two variables with linear relations. The data were generated according to equation 3, with the causal relation being X → Y. From top to bottom: X and the error term both follow the Gaussian distribution (case 1), the uniform distribution (case 2), and a certain type of super-Gaussian distribution (case 3). The two columns on the left show the scatter plot of X and Y and that of X and the residual from regressing Y on X; the two columns on the right show the corresponding plots for regressing X on Y. Here we used 1000 data points. One can see that when X is regressed on Y, in cases 2 and 3 the residual is not independent of the predictor, although the two are uncorrelated by construction
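The asymmetry described in the Fig. 3 caption can be reproduced numerically. The following is a minimal sketch, not the authors' code: it assumes a linear model Y = bX + E with b = 1.0 (an illustrative value, since the coefficient is not given in this record), and it uses the absolute correlation between the squared predictor and the squared residual as a crude stand-in for a proper statistical independence test. The function names `residual_dependence` and `simulate` are hypothetical.

```python
import numpy as np

def residual_dependence(predictor, target):
    """Regress target on predictor (OLS through the centered data) and
    return |corr(predictor^2, residual^2)|.

    The residual is uncorrelated with the predictor by construction, so a
    higher-order statistic is needed to reveal any remaining dependence.
    This score is only a rough proxy for an independence test.
    """
    p = predictor - predictor.mean()
    t = target - target.mean()
    slope = (p @ t) / (p @ p)
    resid = t - slope * p
    return abs(np.corrcoef(p**2, resid**2)[0, 1])

def simulate(sampler, n=1000, b=1.0):
    """Draw X and E from the same sampler, set Y = b*X + E, and return the
    dependence score for both regression directions."""
    x = sampler(n)
    e = sampler(n)
    y = b * x + e  # assumed data-generating model; b is illustrative
    forward = residual_dependence(x, y)   # regress Y on X (causal direction)
    backward = residual_dependence(y, x)  # regress X on Y (anti-causal)
    return forward, backward

rng = np.random.default_rng(0)
scores = {
    "gaussian": simulate(lambda n: rng.normal(0.0, 1.0, n)),
    "uniform": simulate(lambda n: rng.uniform(-1.0, 1.0, n)),
}

for name, (forward, backward) in scores.items():
    print(f"{name}: Y-on-X score = {forward:.3f}, X-on-Y score = {backward:.3f}")
```

In the Gaussian case both scores should stay close to zero, so the two directions cannot be told apart. In the uniform case the score for regressing X on Y should be clearly larger, which matches the caption's observation that the residual is uncorrelated with, but not independent of, the predictor in the wrong direction.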