Julian D Karch (1, 2), Andreas M Brandmaier (2, 3), Manuel C Voelkle (4).
Abstract
In this article, we extend the Bayesian nonparametric regression method Gaussian Process Regression to the analysis of longitudinal panel data. We call this new approach Gaussian Process Panel Modeling (GPPM). GPPM provides great flexibility because of the large number of models it can represent. It allows classical statistical inference as well as machine learning inspired predictive modeling. GPPM offers frequentist and Bayesian inference without the need to resort to Markov chain Monte Carlo-based approximations, which makes the approach exact and fast. GPPMs are defined using the kernel-language, which can express many traditional modeling approaches for longitudinal data, such as linear structural equation models, multilevel models, or state-space models but also various commonly used machine learning approaches. As a result, GPPM is uniquely able to represent hybrid models combining traditional parametric longitudinal models and nonparametric machine learning models. In the present paper, we introduce GPPM and illustrate its utility through theoretical arguments as well as simulated and empirical data.
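The following is a minimal sketch of why no MCMC is needed, under stated assumptions. Suppose persons are independent, and each person's observations are multivariate normal, with mean and covariance given by a mean function and a kernel evaluated at that person's own measurement times, plus independent measurement noise. The log-likelihood of the whole panel is then a sum of closed-form Gaussian log-densities. The helper names (`se_kernel`, `gaussian_log_density`, `panel_log_likelihood`) and the toy data are hypothetical illustrations, not the interface of the authors' GPPM software.

```python
import numpy as np


def se_kernel(s, t, variance=1.0, length_scale=1.0):
    """Squared exponential kernel on all pairs of time points (one common parameterization)."""
    diff = np.subtract.outer(s, t)
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gaussian_log_density(y, mean, cov):
    """Exact log-density of y under N(mean, cov), computed via a Cholesky factor."""
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol, y - mean)  # L^{-1} (y - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (alpha @ alpha + log_det + len(y) * np.log(2.0 * np.pi))


def panel_log_likelihood(times, values, mean_fn, kernel_fn, noise_var):
    """Sum of per-person Gaussian log-densities; persons are assumed independent."""
    total = 0.0
    for t_i, y_i in zip(times, values):
        t_i = np.asarray(t_i, dtype=float)
        y_i = np.asarray(y_i, dtype=float)
        # Person-specific covariance: kernel at this person's times plus measurement noise.
        cov = kernel_fn(t_i, t_i) + noise_var * np.eye(len(t_i))
        total += gaussian_log_density(y_i, mean_fn(t_i), cov)
    return total


# Toy panel: two persons observed at different, unequally spaced time points.
times = [[0.0, 1.0, 2.5], [0.5, 2.0]]
values = [[1.0, 1.8, 2.4], [1.2, 2.1]]
ll = panel_log_likelihood(times, values, lambda t: np.zeros_like(t), se_kernel, noise_var=0.1)
```

Because each person contributes a covariance matrix built from their own time points, unequally spaced and person-specific measurement schedules need no special handling in this sketch.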
Keywords: Bayesian; continuous-time; longitudinal analysis; machine learning; prediction; statistical learning
Year: 2020 PMID: 32265770 PMCID: PMC7096578 DOI: 10.3389/fpsyg.2020.00351
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Five model-implied trajectories simulated from the “exponential rise to the limit” model. Each line represents the skill trajectory of one person.
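The caption does not state the functional form of the model. For illustration only, the sketch below assumes a common parameterization of an exponential rise to a limit, y(t) = a − (a − b)·exp(−c·t). In this form, b is the initial level, a is the asymptote and c is the rate. The sketch draws hypothetical person-specific parameters and evaluates five trajectories. The parameter distributions are made up and are not the simulation settings used in the paper.

```python
import numpy as np


def exp_rise_to_limit(t, asymptote, initial, rate):
    """Assumed form: equals `initial` at t = 0 and approaches `asymptote` as t grows."""
    return asymptote - (asymptote - initial) * np.exp(-rate * t)


rng = np.random.default_rng(seed=1)
t = np.linspace(0.0, 10.0, 50)
n_persons = 5

# Hypothetical person-specific parameters (illustrative distributions only).
asymptotes = rng.normal(10.0, 1.0, size=n_persons)
initials = rng.normal(2.0, 0.5, size=n_persons)
rates = rng.uniform(0.2, 0.8, size=n_persons)

# One row per person, analogous to the five lines in Figure 1.
trajectories = np.array([
    exp_rise_to_limit(t, a, b, c) for a, b, c in zip(asymptotes, initials, rates)
])
```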
Figure 2. Visualization of the predictive distributions of the three considered models on the “exponential rise to the limit” data. CI = credible interval. (A) Latent growth curve model (LGCM). (B) Squared exponential (SE) model. (C) LGCM+SE.
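The three models in Figure 2 can be read as Gaussian processes that differ in their covariance functions. A linear LGCM with random intercept and random slope implies a mean μ_I + μ_S·t. It also implies a covariance between measurement times s and t of σ_I² + σ_IS·(s + t) + σ_S²·s·t, plus measurement error. Adding an SE kernel to this covariance gives a hybrid LGCM+SE model. The sketch below builds the three covariance matrices. The parameter values and function names are hypothetical, and the SE form exp(−(s − t)²/(2l²)) is one common convention that may differ from the paper's.

```python
import numpy as np


def lgcm_kernel(s, t, var_int, var_slope, cov_int_slope):
    """Covariance implied by a linear random-intercept, random-slope growth model."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    return var_int + cov_int_slope * (s + t) + var_slope * s * t


def se_kernel(s, t, variance, length_scale):
    """Squared exponential kernel (one common parameterization)."""
    diff = np.subtract.outer(np.asarray(s, dtype=float), np.asarray(t, dtype=float))
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


times = np.array([0.0, 1.0, 2.0, 3.0])
k_lgcm = lgcm_kernel(times, times, var_int=1.0, var_slope=0.2, cov_int_slope=0.1)
k_se = se_kernel(times, times, variance=0.5, length_scale=1.0)
# A sum of positive semidefinite kernels is again positive semidefinite,
# so the hybrid LGCM+SE covariance is a valid covariance matrix.
k_hybrid = k_lgcm + k_se
```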
Negative log predictive probabilities (NLPP) of the compared models on the “exponential rise to the limit” data, for each of the three test sets. Smaller values indicate better predictive performance.
| Model | Test set 1 | Test set 2 | Test set 3 |
|---|---|---|---|
| LGCM | 152.1 | 317.77 | 600.2 |
| SE Model | 2.33 | 11.89 | 13.862 |
| LGCM+SE Model | 2.21 | 16.86 | 18.853 |
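The NLPP scores a model by how much probability its predictive distribution assigns to held-out observations. Assume the predictive distribution of a test vector is multivariate normal with mean m and covariance S, as it is for a Gaussian process. The NLPP is then −log N(y_test | m, S). The sketch below is a minimal version under that assumption. The function name is hypothetical, and the sketch simply scores whatever test points are passed in; how the paper aggregates over persons is not specified here.

```python
import numpy as np


def nlpp(y_test, pred_mean, pred_cov):
    """Negative log density of the test vector under a multivariate normal predictive distribution."""
    y_test = np.asarray(y_test, dtype=float)
    resid = y_test - np.asarray(pred_mean, dtype=float)
    chol = np.linalg.cholesky(np.asarray(pred_cov, dtype=float))
    alpha = np.linalg.solve(chol, resid)  # L^{-1} (y - m)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return 0.5 * (alpha @ alpha + log_det + len(y_test) * np.log(2.0 * np.pi))


# A standard normal predictive distribution evaluated at its own mean.
print(round(float(nlpp([0.0], [0.0], [[1.0]])), 4))  # prints 0.9189 (= 0.5 * log(2*pi))
```

An observation three standard deviations from the predictive mean adds 4.5 to this value. This illustrates how a model that misplaces its predictions, like the LGCM on these data, accumulates large NLPP values.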
Negative log predictive probabilities (NLPP) of the compared models on the data generated from the LGCM + unknown deviation distribution, for each of the three test sets. Smaller values indicate better predictive performance.
| Model | Test set 1 | Test set 2 | Test set 3 |
|---|---|---|---|
| LGCM | 74.97 | 299.14 | 375.37 |
| SE Model | 2.85 | 31.641 | 34.278 |
| LGCM+SE Model | 2.87 | 30.275 | 33.038 |
Figure 3. Visualization of the predictive distributions of the three considered models on the data generated from the LGCM + unknown deviation distribution. CI = credible interval. (A) LGCM. (B) SE Model. (C) LGCM+SE.
Figure 4. Graphical illustration of the differences between the squared exponential and the exponential kernel, showing example trajectories implied by each kernel. To generate the data, the variance parameter was set to … and the length scale parameter to l = 1 for the exponential and to l = 0.25 for the squared exponential kernel. (A) Squared exponential. (B) Exponential.
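Figure 4 contrasts the smooth sample paths of the squared exponential kernel with the rough sample paths of the exponential kernel. The exponential kernel's paths are continuous but nowhere differentiable. The sketch below draws example trajectories from a zero-mean Gaussian process with each kernel, using the length scales from the caption. The caption's variance value is not recoverable, so the sketch uses 1.0 as a placeholder. The kernel parameterizations are common conventions assumed here, and the helper names are hypothetical.

```python
import numpy as np


def exponential_kernel(s, t, variance, length_scale):
    """k(s, t) = variance * exp(-|s - t| / l): continuous but rough sample paths."""
    dist = np.abs(np.subtract.outer(s, t))
    return variance * np.exp(-dist / length_scale)


def squared_exponential_kernel(s, t, variance, length_scale):
    """k(s, t) = variance * exp(-(s - t)^2 / (2 l^2)): smooth sample paths."""
    diff = np.subtract.outer(s, t)
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_paths(kernel, times, n_paths, rng, jitter=1e-6, **params):
    """Draw zero-mean GP sample paths via a Cholesky factor of the kernel matrix."""
    # A small jitter on the diagonal keeps the Cholesky factorization numerically stable.
    cov = kernel(times, times, **params) + jitter * np.eye(len(times))
    chol = np.linalg.cholesky(cov)
    return (chol @ rng.standard_normal((len(times), n_paths))).T


rng = np.random.default_rng(0)
times = np.linspace(0.0, 5.0, 100)
VARIANCE = 1.0  # placeholder: the variance used for Figure 4 is not given here
exp_paths = sample_paths(exponential_kernel, times, 3, rng, variance=VARIANCE, length_scale=1.0)
se_paths = sample_paths(squared_exponential_kernel, times, 3, rng, variance=VARIANCE, length_scale=0.25)
```

Comparing second differences of the two sets of paths gives a simple numerical view of the difference in smoothness that the figure shows visually.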
Bayesian information criterion (BIC) and negative log predictive probability (NLPP) for the exponential and the SE model.
| Measure | Exponential model | SE model |
|---|---|---|
| BIC | 10876.14 | |
| NLPP | 10846.54 | |
Bold face marks the model selected on the basis of the corresponding measure. For both measures, the smaller value indicates the model to select.
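BIC balances fit against complexity. With maximized log-likelihood log L̂, k free parameters and n observations, BIC = k·ln(n) − 2·log L̂, and the model with the smaller value is selected. The helper below is a minimal sketch with a hypothetical name and made-up inputs. For panel data, whether n counts persons or total observations is a modeling choice that the sketch leaves to the caller.

```python
import math


def bic(max_log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; smaller values indicate the preferred model."""
    return n_params * math.log(n_obs) - 2.0 * max_log_likelihood


# Hypothetical comparison: model_b fits slightly better but uses two more parameters.
candidates = {"model_a": bic(-100.0, 3, 100), "model_b": bic(-98.0, 5, 100)}
selected = min(candidates, key=candidates.get)
```

Here the gain of 2 in log-likelihood (4 in −2·log L̂) does not offset the penalty of 2·ln(100) ≈ 9.2 for the extra parameters, so `model_a` is selected.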
Maximum likelihood estimates and 95% confidence intervals for the parameters of the exponential and the squared exponential model.
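| Model | Parameter | Lower 95% CI | Estimate | Upper 95% CI |
|---|---|---|---|---|
| Exponential | μ | 2.82 | 2.85 | 2.87 |
| Exponential | | 0.00 | 0.00 | 0.11 |
| Exponential | | 0.37 | 0.47 | 0.50 |
| Exponential | | 13.24 | 13.42 | 15.26 |
| Exponential | | 0.04 | 0.05 | 0.06 |
| SE | μ | 2.82 | 2.85 | 2.87 |
| SE | | 0.21 | 0.26 | 0.30 |
| SE | | 0.16 | 0.19 | 0.23 |
| SE | | 20.95 | 21.39 | 30.54 |
| SE | | 0.07 | 0.08 | 0.08 |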
Figure 5. Person-specific predictions of the squared exponential and the exponential model for one randomly selected person. The bold line indicates the mean of the predictive distribution at every time point. The gray area displays the 95% credible region. Crosses depict observed training data.
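Person-specific predictions like those in Figure 5 come from conditioning a Gaussian process on one person's training observations. Assume a zero-mean prior (a fixed mean function can be subtracted first), a kernel k and Gaussian measurement noise with variance σ²_ε. The predictive mean at new times is then K*ᵀ(K + σ²_ε I)⁻¹y, and the predictive covariance is K** − K*ᵀ(K + σ²_ε I)⁻¹K*. A 95% band is the mean ± 1.96 standard deviations. The sketch below uses an SE kernel and hypothetical function names and data. It returns the band for the noise-free trajectory; whether the figure's gray region includes measurement noise is not stated in the caption.

```python
import numpy as np


def se_kernel(s, t, variance=1.0, length_scale=1.0):
    """Squared exponential kernel (one common parameterization)."""
    diff = np.subtract.outer(s, t)
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_predict(t_train, y_train, t_new, kernel, noise_var):
    """Posterior mean, standard deviation, and 95% band of a zero-mean GP at t_new."""
    t_train = np.asarray(t_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    k_train = kernel(t_train, t_train) + noise_var * np.eye(len(t_train))
    k_cross = kernel(t_train, t_new)  # shape (n_train, n_new)
    k_new = kernel(t_new, t_new)
    chol = np.linalg.cholesky(k_train)
    # alpha = (K + noise * I)^{-1} y, obtained from two solves with the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    cov = k_new - v.T @ v
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # clip tiny negative round-off
    return mean, sd, mean - 1.96 * sd, mean + 1.96 * sd


# Hypothetical training data for one person and a grid of prediction times.
t_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [0.2, 0.9, 1.1, 0.7]
t_grid = np.linspace(0.0, 5.0, 11)
mean, sd, lower, upper = gp_predict(t_obs, y_obs, t_grid, se_kernel, noise_var=0.05)
```

Far from the training times, this sketch's predictive mean reverts to the prior mean and the band widens to the prior variance. This is why the credible region in such plots typically grows outside the observed time range.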