Nicholas J Matiasz, Justin Wood, Wei Wang, Alcino J Silva, William Hsu.
Abstract
Scientists try to design experiments that will yield maximal information. For instance, given the available evidence and a limitation on the number of variables that can be observed simultaneously, it may be more informative to intervene on variable X and observe the response of variable Y than to intervene on X and observe Z; in other situations, the opposite may be true. Scientists must often make these decisions without primary data. To address this problem, in previous work, we created software for annotating aggregate statistics in the literature and deriving consistent causal explanations, expressed as causal graphs. This meta-analytic pipeline is useful not only for synthesizing evidence but also for planning experiments: one can use it strategically to select experiments that could further eliminate causal graphs from consideration. In this paper, we introduce interpretable policies for selecting experiments in the context of piecemeal causal discovery, a common setting in biological sciences in which each experiment can measure not an entire system but rather a strict subset of its variables. The limits of this piecemeal approach are only beginning to be fully characterized, with crucial theoretical work published recently. With simulations, we show that our experiment-selection policies identify causal structures more efficiently than random experiment selection. Unlike methods that require primary data, our meta-analytic approach offers a flexible alternative for those seeking to incorporate qualitative domain knowledge into their search for causal mechanisms. We also present a method that categorizes hypotheses with respect to their utility for identifying a system's causal structure. Although this categorization is usually infeasible to perform manually, it is critical for conducting research efficiently.
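The core idea of the abstract — each new experimental result eliminates candidate causal graphs from an equivalence class — can be illustrated at toy scale. This sketch is an invented illustration (three plain DAG variables, no latent confounders), not the paper's ASP-based pipeline:

```python
from itertools import combinations, product

def is_acyclic(edges, nodes):
    """True iff the directed graph has no cycle (repeatedly peel root nodes)."""
    nodes, edges = set(nodes), set(edges)
    while nodes:
        roots = {n for n in nodes if all(t != n for _, t in edges)}
        if not roots:
            return False  # every remaining node has an incoming edge -> cycle
        nodes -= roots
        edges = {(s, t) for s, t in edges if s not in roots}
    return True

def all_dags(variables):
    """Enumerate every DAG over the variables (toy scale: 3^(n choose 2) cases)."""
    pairs = list(combinations(variables, 2))
    for choice in product((None, "fwd", "rev"), repeat=len(pairs)):
        edges = set()
        for (a, b), c in zip(pairs, choice):
            if c == "fwd":
                edges.add((a, b))
            elif c == "rev":
                edges.add((b, a))
        if is_acyclic(edges, variables):
            yield frozenset(edges)

def has_directed_path(edges, src, dst):
    """True iff there is a directed path src -> ... -> dst."""
    frontier, seen = {src}, set()
    while frontier:
        n = frontier.pop()
        if n == dst:
            return True
        seen.add(n)
        frontier |= {t for s, t in edges if s == n and t not in seen}
    return False

# Start from all 25 DAGs over three variables; a single result
# ("intervening on X changes Y", i.e., a directed path from X to Y)
# eliminates most of them.
equivalence_class = [g for g in all_dags("XYZ")
                     if has_directed_path(g, "X", "Y")]
```

Each additional constraint shrinks `equivalence_class` further; the experiment-selection problem the paper addresses is choosing which constraint to buy next.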
Keywords: Causal discovery; cause effect analysis; computer aided analysis; design of experiments; evidence synthesis; graphical models
Year: 2021 PMID: 34532201 PMCID: PMC8442252 DOI: 10.1109/access.2021.3093524
Source DB: PubMed Journal: IEEE Access ISSN: 2169-3536 Impact factor: 3.476
FIGURE 1. This block diagram provides an overview of the proposed method. Experimental results in the literature are annotated using the research map schema; these results are converted into statistical relations in the form of ASP-encoded causal-structure constraints. An ASP-based causal discovery algorithm then computes the set of causal graphs that maximally accommodate the evidence. Algorithm 1 computes the degrees of freedom for the resulting equivalence class. Algorithm 2 and Algorithm 3 are used to identify informative experiments to perform next. Algorithm 4 categorizes hypotheses with respect to their utility for identifying a system's causal structure.
The experiments that would be most informative with respect to a pair of variables, given their particular degree-of-freedom pattern in an equivalence class. These suggested experiments inform the experiment-selection method given in Algorithms 2 and 3. The set J indicates which variables are intervened on in each experiment; when J = ∅, a passive observation of the two variables is performed.
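A pair's degree-of-freedom pattern can be read directly off an equivalence class: a pairwise feature is still a degree of freedom iff the remaining graphs disagree on it. A minimal sketch (invented representation; it tracks only the two edge orientations, whereas the paper's setting also includes a latent-confounder feature per pair):

```python
from itertools import combinations

def pair_degrees_of_freedom(eq_class, x, y):
    """Count pairwise features on which the equivalence class still disagrees."""
    dof = 0
    for feature in ((x, y), (y, x)):            # the x -> y and y -> x edges
        present = {feature in g for g in eq_class}
        if len(present) == 2:                   # some graphs have it, some don't
            dof += 1
    return dof

def most_informative_pair(eq_class, variables):
    """Suggest the pair with the most unresolved degrees of freedom."""
    return max(combinations(variables, 2),
               key=lambda p: pair_degrees_of_freedom(eq_class, *p))

# Toy equivalence class over X, Y, Z: the X-Y orientation is unresolved,
# while no remaining graph has any edge touching Z.
eq_class = [frozenset({("X", "Y")}), frozenset({("Y", "X")}), frozenset()]
```

Under this reading, an experiment on the X-Y pair is worth performing, while probing X-Z or Y-Z cannot shrink the class.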
FIGURE 2. A comparison of three experiment-selection policies: (1) random, (2) Algorithm 2 (degrees of freedom), and (3) Algorithm 3 (expectation). This plot shows the results of the simulation given in Algorithm 5 for N = 4. The results show the experimental effort that is saved when each experiment is chosen based on the remaining degrees of freedom in the equivalence class.
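The intuition behind an expectation-based policy like the caption's Algorithm 3 can be sketched as a greedy step: for each candidate experiment, partition the current equivalence class by the outcome each graph would produce, and pick the experiment whose expected surviving class is smallest. This toy version assumes deterministic, discrete outcomes and invents all names; the paper's actual procedure is its ASP-based Algorithm 3:

```python
from collections import Counter

def expected_remaining(eq_class, outcome_of):
    """Expected equivalence-class size after observing the experiment's outcome.

    outcome_of maps a candidate graph to the result the experiment would
    produce if that graph were the true one, so the class partitions by outcome.
    """
    sizes = Counter(outcome_of(g) for g in eq_class)
    n = len(eq_class)
    # Outcome o occurs with probability sizes[o]/n and leaves sizes[o] graphs.
    return sum(k * k for k in sizes.values()) / n

def best_experiment(eq_class, experiments):
    """Greedy choice: the experiment with the smallest expected surviving class."""
    return min(experiments,
               key=lambda name: expected_remaining(eq_class, experiments[name]))

# Four candidate graphs; experiment A splits them 2/2, experiment B only 1/3.
graphs = [frozenset(), frozenset({("X", "Y")}),
          frozenset({("Y", "X")}), frozenset({("X", "Y"), ("Y", "Z")})]
experiments = {
    "A": lambda g: ("X", "Y") in g,   # does X drive Y?
    "B": lambda g: ("Y", "Z") in g,   # does Y drive Z?
}
```

An even 2/2 split leaves 2 graphs in expectation, while the 1/3 split leaves 2.5, so the greedy step prefers experiment A.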
Empirical efficiency of experiment-selection policies.
Number of studies needed to reach:

| Policy | < 50 graphs | < 10 graphs | minimum |
|---|---|---|---|
| Algorithm 3 (expectation) | 5 | 9 | 15 |
| Algorithm 2 (degrees of freedom) | 6 | 14 | 23 |
| Random selection | 7 | 19 | 47 |
The number of studies that each experiment-selection policy takes on average to reduce the equivalence class to a given size.
Computational efficiency of experiment-selection policies.
Number of ASP models invoked for:

| Policy | 4 variables | 8 variables | 14 variables |
|---|---|---|---|
| Algorithm 3 (expectation) | 543 | ~10^11 | ~10^36 |
| Algorithm 2 (degrees of freedom) | 18 | 84 | 273 |
| Random selection | 0 | 0 | 0 |
The number of ASP models that each experiment-selection policy requires the solver to invoke in order to suggest an experiment.
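The scaling in the expectation row tracks the number of possible DAGs over n labeled variables: there are exactly 543 on 4 nodes and roughly 7.8 x 10^11 on 8, which Robinson's recurrence (OEIS A003024) reproduces. This is an observation about the numbers, not a claim quoted from the paper; a quick check:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def dag_count(n):
    """Number of DAGs on n labeled nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k), a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * dag_count(n - k)
               for k in range(1, n + 1))

print(dag_count(4))  # 543, matching the 4-variable entry above
```

By contrast, the degrees-of-freedom row grows only quadratically: 18, 84, and 273 equal three features per pair times C(n, 2) pairs for n = 4, 8, 14.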
Runtimes for experiment-selection and hypothesis-categorization simulations.
Average execution time (s) to determine:

| Policy | Graphs/equivalence class | Hypotheses/category |
|---|---|---|
| Algorithm 3 (expectation) | 61.7 | 246.5 |
| Algorithm 2 (degrees of freedom) | 34.0 | 527.7 |
| Random selection | 827.3 | 1001.5 |
The average runtimes required to complete a single run (i.e., for a given true causal graph) of the simulations presented in Fig. 2 and Fig. 3, respectively, for each of the three experiment-selection procedures.
FIGURE 3. The average number of hypotheses that fell into categories 1, 2, and 3 in a run of the simulation given in Algorithm 5, in which Algorithm 2 was used as the experiment-selection procedure. As each experiment's result updates the knowledge base of causal-structure constraints, untested hypotheses may change categories, with important implications for the selection of the next experiment.
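One plausible reading of the three categories (an interpretation assumed for illustration, not a definition quoted from the paper): a hypothesized causal relation is (1) entailed by every graph left in the equivalence class, (2) contradicted by every graph, or (3) still undetermined — and only category-3 hypotheses can yield new structural information. A minimal sketch under that assumption:

```python
def categorize(hypothesis, eq_class):
    """Classify a hypothesis (a predicate over graphs) against the class.

    Assumed interpretation: 1 = holds in every remaining graph,
    2 = fails in every remaining graph, 3 = undetermined.
    """
    votes = {bool(hypothesis(g)) for g in eq_class}
    if votes == {True}:
        return 1
    if votes == {False}:
        return 2
    return 3  # experiments can still discriminate among remaining graphs

# Toy equivalence class: both graphs agree on X -> Y, disagree on Y -> Z.
eq_class = [frozenset({("X", "Y")}), frozenset({("X", "Y"), ("Y", "Z")})]
```

Under this reading, the category shifts shown in Fig. 3 arise naturally: once an experiment removes graphs, a formerly undetermined (category-3) hypothesis can become entailed or contradicted.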