| Literature DB >> 31214249 |
Clark Glymour, Kun Zhang, Peter Spirtes.
Abstract
A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be identified if interventions are properly applied; however, in many cases interventions are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing the statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to, and a brief review of, the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Keywords: causal discovery; conditional independence; directed graphical causal models; non-Gaussian distribution; non-linear models; statistical independence; structural equation models
Year: 2019 PMID: 31214249 PMCID: PMC6558187 DOI: 10.3389/fgene.2019.00524
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1 Illustration of how the PC algorithm works. (A) Original true causal graph. (B) PC starts with a fully-connected undirected graph. (C) The X − Y edge is removed because X ⫫ Y. (D) The X − W and Y − W edges are removed because X ⫫ W | Z and Y ⫫ W | Z. (E) After finding v-structures. (F) After orientation propagation.
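The edge-removal phase described in the Figure 1 caption can be sketched in code. The following is not from the paper: it is a minimal illustration that generates linear-Gaussian data from the four-variable graph X → Z ← Y, Z → W, then removes an edge whenever the pair is (approximately) independent given some subset of the remaining variables, using partial correlation as a crude conditional-independence test. For simplicity it searches over all conditioning subsets (an SGS-style brute force) rather than using PC's adjacency-restricted search; the threshold value is an arbitrary choice for this sketch.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n = 20000
# Simulate the true graph of Figure 1(A): X -> Z <- Y, Z -> W.
X = rng.normal(size=n)
Y = rng.normal(size=n)
Z = X + Y + 0.5 * rng.normal(size=n)
W = Z + 0.5 * rng.normal(size=n)
data = np.column_stack([X, Y, Z, W])
names = ["X", "Y", "Z", "W"]

def partial_corr(i, j, cond, data):
    """Correlation of variables i and j after regressing out the
    conditioning set (zero iff conditionally independent, in the
    linear-Gaussian case)."""
    if cond:
        A = np.column_stack([data[:, cond], np.ones(len(data))])
        ri = data[:, i] - A @ np.linalg.lstsq(A, data[:, i], rcond=None)[0]
        rj = data[:, j] - A @ np.linalg.lstsq(A, data[:, j], rcond=None)[0]
    else:
        ri, rj = data[:, i], data[:, j]
    return np.corrcoef(ri, rj)[0, 1]

threshold = 0.05  # crude cutoff on |partial correlation|

def separable(i, j):
    """True if i and j look independent given SOME subset of the rest."""
    others = [k for k in range(4) if k not in (i, j)]
    for size in range(len(others) + 1):
        for cond in combinations(others, size):
            if abs(partial_corr(i, j, list(cond), data)) < threshold:
                return True
    return False

# Start fully connected, as in Figure 1(B), and prune, as in (C)-(D).
edges = {e for e in combinations(range(4), 2) if not separable(*e)}
print(sorted((names[i], names[j]) for i, j in edges))
```

The surviving skeleton is X − Z, Y − Z, Z − W, matching Figure 1(D); orienting the v-structure X → Z ← Y (steps E and F) would require the additional orientation rules that the caption alludes to.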
Figure 2 Illustration of how the FCI algorithm is able to determine the existence of latent confounders. (A) Original true causal graph. (B) After edges are removed because of conditional independence relations. (C) The output of FCI, indicating that there is at least one unmeasured confounder of Y and Z.
Figure 3Illustration of causal asymmetry between two variables with linear relations. The causal relation is X → Y. From top to bottom: X and E both follow the Gaussian distribution (case 1), uniform distribution (case 2), and Laplace distribution (case 3). The two columns on the left show the scatter plot of X and Y and that of X and the regression residual for regressing Y on X, and the two columns on the right correspond to regressing X on Y.
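The asymmetry described in the Figure 3 caption can be demonstrated numerically. The sketch below is not from the paper: it simulates the non-Gaussian (uniform-noise) case with the true relation X → Y, regresses in both directions, and scores each direction by the dependence between the regression residual and the regressor. As a crude stand-in for a proper independence test (such as HSIC), it uses the correlation between the squared residual and the squared regressor, which vanishes in the causal direction but not in the reverse direction when the noise is non-Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
X = rng.uniform(-1, 1, n)  # non-Gaussian cause
E = rng.uniform(-1, 1, n)  # non-Gaussian noise
Y = X + E                  # true causal relation: X -> Y

def dependence_score(cause, effect):
    """Regress effect on cause and measure how dependent the residual
    is on the regressor (crude surrogate for an independence test)."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause)
    resid = effect - b * cause
    return abs(np.corrcoef(resid**2, cause**2)[0, 1])

forward = dependence_score(X, Y)  # residual ~ E, independent of X
reverse = dependence_score(Y, X)  # residual remains dependent on Y
direction = "X -> Y" if forward < reverse else "Y -> X"
print(direction, forward, reverse)
```

In the Gaussian case (case 1 of Figure 3) both scores are near zero and the direction is not identifiable; with uniform or Laplace noise (cases 2 and 3) the reverse-direction residual is visibly dependent on the regressor, which is exactly the asymmetry the figure's scatter plots display.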
Figure 4 The “extended expert” model for Sachs's data set. See Sachs et al. (2005) or Ramsey and Bryan (2018b) for the significance of the variables.
Figure 5 The model for the Sachs data set estimated by the FASK algorithm.
Table 1 Comparison of the fundamental causal discovery methods reviewed in this paper.
| | Constraint-based (e.g., PC) | Constraint-based allowing confounders (e.g., FCI) | Score-based (e.g., GES) | Based on functional causal models (e.g., LiNGAM) |
| Faithfulness assumption required? | Yes | Yes | Some weaker condition required (not totally clear yet) | No |
| Specific assumptions on data distributions required? | No | No | Yes (usually assumes linear-Gaussian models or multinomial distributions) | Yes |
| Properly handle confounders? | No | Yes | No | No |
| Output | Markov equivalence class | Partial ancestral graph | Markov equivalence class | DAG as well as causal model (under the respective identifiability conditions) |
| Remark on practical issues | | | | Confounders can be handled in the linear, non-Gaussian case (Hoyer et al.) |