M Cyrus Maher, Ryan D Hernandez.
Abstract
Background. Establishing health-related causal relationships is a central pursuit in biomedical research. Yet the interdependent non-linearity of biological systems renders causal dynamics laborious and at times impractical to disentangle. This pursuit is further impeded by the dearth of time series that are sufficiently long to observe and understand recurrent patterns of flux. However, as data generation costs plummet and technologies like wearable devices democratize data collection, we anticipate a coming surge in the availability of biomedically relevant time series data. Given the life-saving potential of these burgeoning resources, it is critical to invest in the development of open source software tools that are capable of drawing meaningful insight from vast amounts of time series data.
Results. Here we present CauseMap, the first open source implementation of convergent cross mapping (CCM), a method for establishing causality from long time series data (≳25 observations). Compared to existing time series methods, CCM has the advantage of being model-free and robust to unmeasured confounding that could otherwise induce spurious associations. CCM builds on Takens' Theorem, a well-established result from dynamical systems theory that requires only mild assumptions. This theorem allows us to reconstruct high dimensional system dynamics using a time series of only a single variable. These reconstructions can be thought of as shadows of the true causal system. If reconstructed shadows can predict points from opposing time series, we can infer that the corresponding variables are providing views of the same causal system, and so are causally related. Unlike traditional metrics, this test can establish the directionality of causation, even in the presence of feedback loops. Furthermore, since CCM can extract causal relationships from the time series of, e.g., a single individual, it may be a valuable tool for personalized medicine.
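The cross-mapping idea described above can be illustrated with a short sketch. The following is a minimal Python example written for this summary, not CauseMap's Julia implementation or its API. The function names (`delay_embed`, `cross_map_skill`, `simulate_coupled_logistic`), the toy coupled logistic maps and their coefficients, and the exponential nearest-neighbor weighting are all assumptions chosen for illustration.

```python
import math

def simulate_coupled_logistic(n, x0=0.4, y0=0.2):
    """Toy two-variable system with feedback in both directions (hypothetical coefficients)."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys

def delay_embed(series, E, tau):
    """Build lagged vectors [x(t), x(t - tau), ..., x(t - (E-1)*tau)] and their time indices."""
    start = (E - 1) * tau
    times = list(range(start, len(series)))
    vectors = [[series[t - k * tau] for k in range(E)] for t in times]
    return vectors, times

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def cross_map_skill(source, target, E=2, tau=1, lib_size=None):
    """Estimate `target` from the shadow manifold of `source`; return the correlation
    between estimates and observations. Under CCM, skill that rises with library size
    is read as evidence that `target` influences `source`."""
    vectors, times = delay_embed(source, E, tau)
    if lib_size is not None:
        vectors, times = vectors[:lib_size], times[:lib_size]
    predicted, observed = [], []
    for i, v in enumerate(vectors):
        # Leave-one-out search for the E + 1 nearest neighbors on the shadow manifold.
        neighbors = sorted(
            (math.dist(v, w), times[j]) for j, w in enumerate(vectors) if j != i
        )[: E + 1]
        d_min = max(neighbors[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d_min) for d, _ in neighbors]
        estimate = sum(w * target[t] for w, (_, t) in zip(weights, neighbors)) / sum(weights)
        predicted.append(estimate)
        observed.append(target[times[i]])
    return pearson(predicted, observed)

x, y = simulate_coupled_logistic(200)
for lib in (25, 50, 100, 190):
    # First column: x estimated from y's manifold; second: y estimated from x's manifold.
    print(lib, round(cross_map_skill(y, x, lib_size=lib), 3),
          round(cross_map_skill(x, y, lib_size=lib), 3))
```

In this sketch, convergence is checked informally by printing the cross-map correlation at increasing library sizes. A fuller analysis would also scan the embedding dimension E and the lag τ rather than fixing them.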
We implement CCM in Julia, a high-performance programming language designed for facile technical computing. Our software package, CauseMap, is platform-independent and freely available as an official Julia package.
Conclusions. CauseMap is an efficient implementation of a state-of-the-art algorithm for detecting causality from time series data. We believe this tool will be a valuable resource for biomedical research and personalized medicine.
Keywords: Causality; Dynamical systems; Open source software; Personalized medicine; Time series methods
Year: 2015 PMID: 25780776 PMCID: PMC4359046 DOI: 10.7717/peerj.824
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Runtime versus time series length.
Results are presented for one to six catenations of the dataset presented in Fig. 1. Runtime values are for comprehensive parameter optimizations on a single 2.6 GHz Intel Core i7 processor.
| Time series length | Runtime (s) |
|---|---|
| 71 | 10.2 |
| 142 | 40.4 |
| 213 | 116.6 |
| 284 | 317.2 |
| 355 | 534.7 |
| 426 | 1080.5 |
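One way to summarize how runtime grows with series length in the table above is to fit a straight line in log-log space. The Python sketch below does this with ordinary least squares on the six tabulated points. The helper name `loglog_slope` is hypothetical, and the power-law form is an assumption made for illustration, not a claim from the source about the algorithm's complexity.

```python
import math

# Values copied from the runtime table above.
lengths = [71, 142, 213, 284, 355, 426]
runtimes = [10.2, 40.4, 116.6, 317.2, 534.7, 1080.5]

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the exponent k in y ~ c * x**k."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

exponent = loglog_slope(lengths, runtimes)
print(f"estimated scaling exponent: {exponent:.2f}")
```

On these six points the fitted exponent falls between 2 and 3, so runtime grows faster than quadratically with series length over this range. With so few measurements from a single machine, the estimate should be treated as rough.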
Figure 1. An example visualization from CauseMap using abundances of Paramecium aurelia and Didinium nasutum.
See Supplemental Information for more information on this system. (A) For optimal parameter values, the convergence of the cross-map correlation with library size. (B–C) The dependence of the maximum cross-map correlation on assumed dimensionality (measured by E) and the time lag of the causal effect (measured by τ). Note that the second maximum at τ = 5 corresponds to the principal frequency of the P. aurelia and D. nasutum time series, as determined by Fourier transform analysis.
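The caption refers to finding a principal frequency by Fourier transform analysis. The Python sketch below shows one generic way to do this with a direct discrete Fourier transform. It runs on a synthetic sine wave, not the Paramecium and Didinium data, and the function name `dominant_period` is hypothetical. It is not the authors' analysis.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the strongest non-zero frequency in `series`."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the zero-frequency component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic example: a sine wave that repeats every 10 samples.
synthetic = [math.sin(2 * math.pi * t / 10) for t in range(100)]
print(dominant_period(synthetic))
```

The direct sum costs O(n²) operations, which is adequate for series of a few hundred points. A fast Fourier transform routine would be the usual choice for longer data.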
Figure 2. The effect of time series length on ρccm convergence.
Black, blue, and red lines illustrate ρccm for the full, 1/2 thinned, and 1/3 thinned datasets, respectively. For a given color, darker lines show ρccm for the test of whether Didinium abundance influences Paramecium abundance. Lighter lines examine the converse.