Anja F Ernst, Marieke E Timmerman, Bertus F Jeronimus, Casper J Albers.
Abstract
Studying emotion dynamics through time series models is becoming increasingly popular in the social sciences. Across individuals, dynamics can be rather heterogeneous. To enable comparisons and generalizations of dynamics across groups of individuals, one needs sophisticated tools that express the essential similarities and differences. A way to proceed is to identify subgroups of people who are characterized by qualitatively similar emotion dynamics through dynamic clustering. So far, these methods assume equal generating processes for individuals per cluster. To avoid this overly restrictive assumption, we outline a probabilistic clustering approach based on a mixture model that clusters on individuals' vector autoregressive coefficients. We evaluate the performance of the method and compare it with a nonprobabilistic method in a simulation study. The usefulness of the methods is illustrated using 366 ecological momentary assessment time series with external measures of depression and anxiety.
Keywords: VAR model; ecological momentary assessment; finite mixture model; intensive longitudinal data; interindividual differences
Year: 2019 PMID: 31516030 PMCID: PMC8132011 DOI: 10.1177/1073191119873714
Source DB: PubMed Journal: Assessment ISSN: 1073-1911
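The probabilistic approach described in the abstract can be sketched in symbols. The notation below ($y_{i,t}$, $c_i$, $\Phi_i$, $\pi_k$, $\mu_k$, $\Sigma_k$) is assumed for illustration and does not come from the source, as are the Gaussian form of the mixture components and the lag order of one:

```latex
\begin{aligned}
y_{i,t} &= c_i + \Phi_i\, y_{i,t-1} + \varepsilon_{i,t},
  \qquad \varepsilon_{i,t} \sim \mathcal{N}(0, \Sigma_{\varepsilon,i}), \\
\operatorname{vec}(\Phi_i) &\sim \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mu_k, \Sigma_k),
  \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
\end{aligned}
```

Under this reading, persons in cluster $k$ share a mean coefficient vector $\mu_k$ but keep their own $\Phi_i$, so individuals within a cluster need not follow an identical generating process.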
Overview of Model-Based Time Series Clustering Methods for Psychological Data.
| Paper | Probabilistic | Filtering method | Adaptive method | Variation within clusters |
|---|---|---|---|---|
| | + | − | + | − |
| | + | − | + | − |
| | + | − | + | − |
| | + | − | + | − |
| | + | − | + | − |
| | − | + | − | − |
| | − | + | + | − |
| | − | + | − | ± |
Note. + Denotes a property is present, − a property is absent, and ± a property is present to a limited extent.
Figure 1. Distribution of average misclassification probability in the similar-true-parameters conditions across the three between-cluster distance conditions.
Note. Squares indicate the average misclassification probability within a data set, lines indicate the mean, bands the interquartile range. Average misclassification probability is based on overlap between components of the mixture distribution of individuals’ true VAR(1) slopes.
Mean (SD) ARI Values of the Two Clustering Methods Across Conditions.
| Factor | Levels | Nonprobabilistic method | Probabilistic method |
|---|---|---|---|
| Distance between clusters | Small distance | . | .575 (.381) |
| | Medium distance | .895 (.205) | . |
| | Large distance | .927 (.168) | . |
| Distance within clusters | Identical data | . | .935 (.192) |
| | Similar data | .609 (.379) | . |
| Number of persons | 30 | . | .789 (.338) |
| | 60 | .800 (.331) | . |
| | 120 | .806 (.326) | . |
| Number of observations | 51 | . | .736 (.352) |
| | 101 | .808 (.327) | . |
| | 501 | .820 (.327) | . |
| Cluster size | Equal proportion | .832 (.309) | . |
| | Majority cluster | .804 (.320) | . |
| | Minority cluster | .760 (.357) | . |
| Number of clusters | 2 | .802 (.354) | . |
| | 4 | .796 (.306) | . |
Note. ARI = adjusted Rand index. The largest ARIs in each condition are highlighted in bold.
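For readers unfamiliar with the adjusted Rand index reported in this table, the following is a minimal sketch of the standard Hubert and Arabie formulation in plain Python. The function name and the example labels are illustrative assumptions; the source does not state which implementation was used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two partitions of the same items."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have equal length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        # Fewer than two items: the partitions trivially agree.
        return 1.0

    # Contingency counts: items per (true cluster, predicted cluster) pair.
    pair_counts = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)

    # Number of item pairs placed together in both, in true, and in predicted.
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions use a single cluster.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Hypothetical example: the same partition with swapped cluster names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

An ARI of 1 means the estimated partition matches the true one up to relabeling, and values near 0 indicate agreement no better than chance.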
Results of a RM-ANOVA Evaluating the ARI Values of Both Clustering Methods.
| Effect | df | F | p | Effect size |
|---|---|---|---|---|
| Between-method effects | | | | |
| Distance within clusters | 1 | 4252.000 | .000 | .572 |
| Number of persons | 2 | 31.200 | .000 | .019 |
| Number of observations | 2 | 179.000 | .000 | .101 |
| Cluster size | 2 | 65.400 | .000 | .039 |
| Number of clusters | 1 | 19.600 | .000 | .006 |
| Distance between clusters | 2 | 2729.000 | .000 | .631 |
| Distance within clusters × Number of persons | 2 | 10.500 | .000 | .007 |
| Distance within clusters × Number of observations | 2 | 0.855 | .425 | .001 |
| Distance within clusters × Cluster size | 2 | 18.400 | .000 | .011 |
| Distance within clusters × Number of clusters | 1 | 3.350 | .067 | .001 |
| Distance within clusters × Distance between clusters | 2 | 1413.000 | .000 | .470 |
| Number of persons × Number of observations | 4 | 2.070 | .082 | .003 |
| Number of persons × Cluster size | 4 | 2.540 | .038 | .003 |
| Number of persons × Number of clusters | 2 | 0.254 | .776 | .000 |
| Number of persons × Distance between clusters | 4 | 2.540 | .038 | .003 |
| Number of observations × Cluster size | 4 | 4.010 | .003 | .005 |
| Number of observations × Number of clusters | 2 | 2.420 | .090 | .002 |
| Number of observations × Distance between clusters | 4 | 42.600 | .000 | .051 |
| Cluster size × Number of clusters | 2 | 97.600 | .000 | .058 |
| Cluster size × Distance between clusters | 4 | 1.770 | .132 | .002 |
| Number of clusters × Distance between clusters | 2 | 0.550 | .577 | .000 |
| Residuals | 3188 | | | |
| Within-data effects | | | | |
| Clustering method | 1 | 79.400 | .000 | .024 |
| Distance within clusters × Method | 1 | 698.000 | .000 | .180 |
| Number of persons × Method | 2 | 26.900 | .000 | .017 |
| Number of observations × Method | 2 | 98.900 | .000 | .058 |
| Cluster size × Method | 2 | 6.660 | .001 | .004 |
| Number of clusters × Method | 1 | 20.800 | .000 | .006 |
| Distance between clusters × Method | 2 | 19.800 | .000 | .012 |
| Distance within clusters × Number of persons × Method | 2 | 12.400 | .000 | .008 |
| Distance within clusters × Number of observations × Method | 2 | 5.610 | .004 | .004 |
| Distance within clusters × Cluster size × Method | 2 | 31.900 | .000 | .020 |
| Distance within clusters × Number of clusters × Method | 1 | 5.150 | .023 | .002 |
| Distance within clusters × Distance between clusters × Method | 2 | 157.000 | .000 | .090 |
| Number of persons × Number of observations × Method | 4 | 0.679 | .606 | .001 |
| Number of persons × Cluster size × Method | 4 | 0.438 | .782 | .001 |
| Number of persons × Number of clusters × Method | 2 | 0.082 | .922 | .000 |
| Number of persons × Distance between clusters × Method | 4 | 15.700 | .000 | .019 |
| Number of observations × Cluster size × Method | 4 | 1.000 | .406 | .001 |
| Number of observations × Number of clusters × Method | 2 | 0.051 | .950 | .000 |
| Number of observations × Distance between clusters × Method | 4 | 108.000 | .000 | .120 |
| Cluster size × Number of clusters × Method | 2 | 9.970 | .000 | .006 |
| Cluster size × Distance between clusters × Method | 4 | 20.400 | .000 | .025 |
| Number of clusters × Distance between clusters × Method | 2 | 9.010 | .000 | .006 |
Note. RM-ANOVA = repeated-measures analysis of variance; ARI = adjusted Rand index; df = degrees of freedom.
Mean (SD) Euclidean Distances Between True and Estimated VAR(1) Slopes for the Two Clustering Methods Across Conditions.
| Factor | Levels | Nonprobabilistic method | Probabilistic method |
|---|---|---|---|
| Distance between clusters | Small distance | .300 (.232) | . |
| | Medium distance | .209 (.155) | . |
| | Large distance | .203 (.149) | . |
| Distance within clusters | Identical data | . | .127 (.091) |
| | Similar data | .392 (.143) | . |
| Number of persons | 30 | .265 (.182) | . |
| | 60 | .232 (.185) | . |
| | 120 | .215 (.193) | . |
| Number of observations | 51 | . | .285 (.148) |
| | 101 | .237 (.181) | . |
| | 501 | .214 (.207) | . |
| Cluster size | Equal proportion | .215 (.173) | . |
| | Majority cluster | .241 (.193) | . |
| | Minority cluster | .256 (.194) | . |
| Number of clusters | 2 | .224 (.190) | . |
| | 4 | .251 (.185) | . |
Note. The lowest Euclidean distances in each condition are highlighted in bold.
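The distance measure in this table can be illustrated with a short sketch. It assumes that the distance is computed per person between the flattened true and estimated VAR(1) slope matrices and then averaged over persons; the source does not spell out this aggregation, and the function names and example matrices are hypothetical.

```python
from math import sqrt


def slope_distance(true_slopes, est_slopes):
    """Euclidean distance between two slope matrices given as nested lists."""
    flat_true = [value for row in true_slopes for value in row]
    flat_est = [value for row in est_slopes for value in row]
    if len(flat_true) != len(flat_est):
        raise ValueError("slope matrices must have the same number of entries")
    return sqrt(sum((t - e) ** 2 for t, e in zip(flat_true, flat_est)))


def mean_slope_distance(true_list, est_list):
    """Average the per-person distances (assumed aggregation, see lead-in)."""
    if len(true_list) != len(est_list) or not true_list:
        raise ValueError("need one estimated matrix per true matrix")
    distances = [slope_distance(t, e) for t, e in zip(true_list, est_list)]
    return sum(distances) / len(distances)


# Hypothetical 2-variable example for two persons.
true_slopes = [[[0.3, 0.0], [0.0, 0.3]], [[0.2, 0.1], [0.1, 0.2]]]
est_slopes = [[[0.0, 0.0], [0.0, 0.7]], [[0.2, 0.1], [0.1, 0.2]]]
print(mean_slope_distance(true_slopes, est_slopes))
```

Smaller values mean the estimated slopes lie closer to the true ones, which is why the table note refers to the lowest distances.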
Results of a RM-ANOVA Evaluating the Mean Euclidean Distances of Both Clustering Methods.
| Effect | df | F | p | Effect size |
|---|---|---|---|---|
| Between-method effects | | | | |
| Distance within clusters | 1 | 17672.000 | .000 | .847 |
| Number of persons | 2 | 608.000 | .000 | .276 |
| Number of observations | 2 | 1111.000 | .000 | .411 |
| Cluster size | 2 | 207.000 | .000 | .115 |
| Number of clusters | 1 | 551.000 | .000 | .147 |
| Distance between clusters | 2 | 1472.000 | .000 | .480 |
| Distance within clusters × Number of persons | 2 | 40.100 | .000 | .025 |
| Distance within clusters × Number of observations | 2 | 114.000 | .000 | .066 |
| Distance within clusters × Cluster size | 2 | 44.300 | .000 | .027 |
| Distance within clusters × Number of clusters | 1 | 80.900 | .000 | .025 |
| Distance within clusters × Distance between clusters | 2 | 930.000 | .000 | .368 |
| Number of persons × Number of observations | 4 | 19.900 | .000 | .024 |
| Number of persons × Cluster size | 4 | 8.970 | .000 | .011 |
| Number of persons × Number of clusters | 2 | 32.600 | .000 | .020 |
| Number of persons × Distance between clusters | 4 | 10.200 | .000 | .013 |
| Number of observations × Cluster size | 4 | 5.590 | .000 | .007 |
| Number of observations × Number of clusters | 2 | 28.100 | .000 | .017 |
| Number of observations × Distance between clusters | 4 | 13.700 | .000 | .017 |
| Cluster size × Number of clusters | 2 | 232.000 | .000 | .127 |
| Cluster size × Distance between clusters | 4 | 1.140 | .335 | .001 |
| Number of clusters × Distance between clusters | 2 | 0.543 | .581 | .000 |
| Residuals | 3188 | | | |
| Within-data effects | | | | |
| Clustering method | 1 | 527.000 | .000 | .142 |
| Distance within clusters × Method | 1 | 2786.000 | .000 | .466 |
| Number of persons × Method | 2 | 86.100 | .000 | .051 |
| Number of observations × Method | 2 | 446.000 | .000 | .219 |
| Cluster size × Method | 2 | 2.690 | .068 | .002 |
| Number of clusters × Method | 1 | 88.700 | .000 | .027 |
| Distance between clusters × Method | 2 | 8.010 | .000 | .005 |
| Distance within clusters × Number of persons × Method | 2 | 106.000 | .000 | .062 |
| Distance within clusters × Number of observations × Method | 2 | 10.300 | .000 | .006 |
| Distance within clusters × Cluster size × Method | 2 | 2.230 | .107 | .001 |
| Distance within clusters × Number of clusters × Method | 1 | 71.200 | .000 | .022 |
| Distance within clusters × Distance between clusters × Method | 2 | 75.800 | .000 | .045 |
| Number of persons × Number of observations × Method | 4 | 4.180 | .002 | .005 |
| Number of persons × Cluster size × Method | 4 | 0.910 | .457 | .001 |
| Number of persons × Number of clusters × Method | 2 | 0.251 | .778 | .000 |
| Number of persons × Distance between clusters × Method | 4 | 13.300 | .000 | .016 |
| Number of observations × Cluster size × Method | 4 | 0.452 | .771 | .001 |
| Number of observations × Number of clusters × Method | 2 | 0.938 | .392 | .001 |
| Number of observations × Distance between clusters × Method | 4 | 94.000 | .000 | .105 |
| Cluster size × Number of clusters × Method | 2 | 0.947 | .388 | .001 |
| Cluster size × Distance between clusters × Method | 4 | 14.600 | .000 | .018 |
| Number of clusters × Distance between clusters × Method | 2 | 28.400 | .000 | .018 |
Note. RM-ANOVA = repeated-measures analysis of variance; df = degrees of freedom.
Average Attraction Rates (SD) of the Two Clustering Methods Across Simulation Conditions.
| Factor | Levels | Nonprobabilistic method | Probabilistic method |
|---|---|---|---|
| Distance between clusters | Small distance | . | .077 (.220) |
| | Medium distance | . | .165 (.314) |
| | Large distance | . | .190 (.333) |
| Distance within clusters | Identical data | . | .272 (.375) |
| | Similar data | . | .016 (.056) |
| Number of persons | 30 | . | .172 (.326) |
| | 60 | . | .146 (.298) |
| | 120 | . | .113 (.261) |
| Number of observations | 51 | . | .020 (.070) |
| | 101 | . | .128 (.280) |
| | 501 | . | .283 (.383) |
| Cluster size | Equal proportion | . | .148 (.305) |
| | Majority cluster | . | .125 (.274) |
| | Minority cluster | . | .159 (.311) |
| Number of clusters | 2 | . | .205 (.369) |
| | 4 | . | .082 (.182) |
Note. Highest attraction rates in each condition are highlighted in bold.
Results of a RM-ANOVA Evaluating the Attraction Rates of Both Clustering Methods.
| Effect | df | F | p | Effect size |
|---|---|---|---|---|
| Between-method effects | | | | |
| Distance within clusters | 1 | 3698.000 | .000 | .537 |
| Number of persons | 2 | 18.800 | .000 | .012 |
| Number of observations | 2 | 342.000 | .000 | .177 |
| Cluster size | 2 | 108.000 | .000 | .064 |
| Number of clusters | 1 | 1962.000 | .000 | .381 |
| Distance between clusters | 2 | 411.000 | .000 | .205 |
| Distance within clusters × Number of persons | 2 | 1.870 | .155 | .001 |
| Distance within clusters × Number of observations | 2 | 241.000 | .000 | .131 |
| Distance within clusters × Cluster size | 2 | 60.000 | .000 | .036 |
| Distance within clusters × Number of clusters | 1 | 62.200 | .000 | .019 |
| Distance within clusters × Distance between clusters | 2 | 21.900 | .000 | .014 |
| Number of persons × Number of observations | 4 | 6.300 | .000 | .008 |
| Number of persons × Cluster size | 4 | 0.744 | .562 | .001 |
| Number of persons × Number of clusters | 2 | 0.150 | .860 | .000 |
| Number of persons × Distance between clusters | 4 | 1.020 | .393 | .001 |
| Number of observations × Cluster size | 4 | 1.140 | .335 | .001 |
| Number of observations × Number of clusters | 2 | 22.000 | .000 | .014 |
| Number of observations × Distance between clusters | 4 | 13.400 | .000 | .017 |
| Cluster size × Number of clusters | 2 | 146.000 | .000 | .084 |
| Cluster size × Distance between clusters | 4 | 15.400 | .000 | .019 |
| Number of clusters × Distance between clusters | 2 | 15.100 | .000 | .009 |
| Residuals | 3188 | | | |
| Within-data effects | | | | |
| Clustering method | 1 | 14150.000 | .000 | .816 |
| Distance within clusters × Method | 1 | 46.200 | .000 | .014 |
| Number of persons × Method | 2 | 8.450 | .000 | .005 |
| Number of observations × Method | 2 | 195.000 | .000 | .109 |
| Cluster size × Method | 2 | 80.800 | .000 | .048 |
| Number of clusters × Method | 1 | 339.000 | .000 | .096 |
| Distance between clusters × Method | 2 | 34.100 | .000 | .021 |
| Distance within clusters × Number of persons × Method | 2 | 23.300 | .000 | .014 |
| Distance within clusters × Number of observations × Method | 2 | 312.000 | .000 | .164 |
| Distance within clusters × Cluster size × Method | 2 | 27.100 | .000 | .017 |
| Distance within clusters × Number of clusters × Method | 1 | 256.000 | .000 | .074 |
| Distance within clusters × Distance between clusters × Method | 2 | 398.000 | .000 | .200 |
| Number of persons × Number of observations × Method | 4 | 2.680 | .030 | .003 |
| Number of persons × Cluster size × Method | 4 | 0.919 | .452 | .001 |
| Number of persons × Number of clusters × Method | 2 | 7.920 | .000 | .005 |
| Number of persons × Distance between clusters × Method | 4 | 4.730 | .001 | .006 |
| Number of observations × Cluster size × Method | 4 | 4.100 | .003 | .005 |
| Number of observations × Number of clusters × Method | 2 | 45.200 | .000 | .028 |
| Number of observations × Distance between clusters × Method | 4 | 16.000 | .000 | .020 |
| Cluster size × Number of clusters × Method | 2 | 128.000 | .000 | .074 |
| Cluster size × Distance between clusters × Method | 4 | 24.100 | .000 | .029 |
| Number of clusters × Distance between clusters × Method | 2 | 37.100 | .000 | .023 |
Note. RM-ANOVA = repeated-measures analysis of variance; df = degrees of freedom.
Differences Between the Simulations With Equal and Unequal Intercepts Between Clusters.
| Performance aspect | Intercepts equal: Nonprobabilistic method | Intercepts equal: Probabilistic method | Intercepts unequal: Nonprobabilistic method | Intercepts unequal: Probabilistic method |
|---|---|---|---|---|
| ARI | .574 (.427) | . | .849 (.242) | . |
| Euclidean distance | .300 (.232) | . | .282 (.217) | . |
| Attraction rate | . | .077 (.220) | .677 (.342) | . |
Note. ARI = adjusted Rand index. Comparisons between the simulations are made with the distance between clusters fixed to small distance. The simulations are compared on mean (SD) ARI values, Euclidean distances between true and estimated VAR(1) slopes and attraction rates. The best performance aspects are highlighted in bold.
Figure 2. Clusters are based on the multivariate associations between the following variables: Negative Affect Deactivation (NAD), Positive Affect Activation (PAA), Positive Affect Deactivation (PAD), and Negative Affect Activation (NAA). (a) Mean VAR(1) coefficients per cluster, reached through the probabilistic VAR(1) clustering method. (b) Mean VAR(2) coefficients per cluster, reached through the probabilistic VAR(2) clustering method.
Note. The suffix l1 marks columns that hold the lag-1 coefficients, which give the influence a variable will have on an emotion (listed in the rows) at the next measurement, 6 hours later. The suffix l2 marks columns that hold the lag-2 coefficients, which give the influence a variable will have on an emotion (listed in the rows) at the measurement 12 hours later.
Mean (SD) Scores on Static Variables for the Clusters Reached by the Probabilistic Clustering VAR(1) Method.
| Cluster | % Male | Age | PANAS PA | PANAS NA | Anxiety/panic | QIDS sum | Proportion |
|---|---|---|---|---|---|---|---|
| Cluster 1 | 0.21 (0.41) | 47.00 (16.21) | 35.65 (5.26) | 15.82 (5.14) | 0.40 (0.53) | 4.00 (3.07) | .15 |
| Cluster 2 | 0.17 (0.37) | 45.05 (13.39) | 32.71 (6.83) | 20.11 (6.15) | 0.73 (0.74) | 6.16 (4.01) | .63 |
| Cluster 3 | 0.33 (0.49) | 50.78 (12.74) | 34.67 (5.67) | 15.11 (2.97) | 0.28 (0.46) | 3.67 (3.45) | .05 |
| Cluster 4 | 0.00 (0.00) | 51.50 (7.78) | 28.50 (14.85) | 26.00 (1.41) | 1.50 (0.71) | 15.00 (2.83) | .01 |
| Cluster 5 | 0.19 (0.40) | 42.25 (11.86) | 27.77 (6.67) | 28.07 (7.78) | 1.44 (0.98) | 10.12 (5.02) | .16 |
Note. PANAS = positive and negative affect schedule; QIDS = Quick Inventory of Depressive Symptomatology.
Figure 3. Distribution of scores on personality measurements of the whole sample.
Note. Vertical lines show mean values of clusters reached through the probabilistic VAR(1) clustering method.
Figure 4. Classification uncertainty for all 366 participants in the solution reached by the probabilistic VAR(1) clustering method.
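As a hedged illustration of what classification uncertainty can mean in a mixture model: a common definition is one minus a person's largest posterior cluster membership probability. Whether Figure 4 uses exactly this definition is an assumption, and the function name and posterior values below are hypothetical.

```python
def classify_with_uncertainty(posteriors):
    """Return (hard_label, uncertainty) per person from posterior rows.

    Each row holds one person's posterior probabilities over the clusters.
    Uncertainty is taken as 1 minus the largest posterior (assumed definition).
    """
    results = []
    for row in posteriors:
        if abs(sum(row) - 1.0) > 1e-8:
            raise ValueError("each posterior row must sum to 1")
        # Index of the most probable cluster for this person.
        best = max(range(len(row)), key=row.__getitem__)
        results.append((best, 1.0 - row[best]))
    return results


# Hypothetical posteriors for two persons and two clusters.
for label, uncertainty in classify_with_uncertainty([[0.9, 0.1], [0.4, 0.6]]):
    print(label, round(uncertainty, 3))
```

A person with a posterior of .9 for one cluster is assigned with low uncertainty, whereas a near-even split signals that the hard assignment should be read with caution.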
Mean VAR Coefficients of the Probabilistic Clustering VAR(1) Method.
| | Cluster 1: NAA | Cluster 1: PAD | Cluster 1: PAA | Cluster 1: NAD | Cluster 2: NAA | Cluster 2: PAD | Cluster 2: PAA | Cluster 2: NAD | Cluster 3: NAA | Cluster 3: PAD | Cluster 3: PAA | Cluster 3: NAD | Cluster 4: NAA | Cluster 4: PAD | Cluster 4: PAA | Cluster 4: NAD | Cluster 5: NAA | Cluster 5: PAD | Cluster 5: PAA | Cluster 5: NAD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Int | 44.876 | 135.775 | 118.154 | 65.729 | 42.636 | 143.796 | 123.703 | 76.372 | 28.067 | 148.287 | 133.914 | 67.397 | 1.256 | 42.546 | 50.869 | 184.080 | 102.627 | 118.864 | 106.294 | 111.890 |
| NAAl1 | .096 | .087 | .030 | −.001 | .235 | −.050 | .035 | .034 | .127 | .062 | .050 | −.113 | .334 | −.040 | .026 | .078 | .212 | −.033 | −.002 | .042 |
| PADl1 | −.009 | .183 | .136 | −.046 | −.008 | .171 | .074 | −.038 | −.045 | .191 | .025 | −.005 | .009 | .150 | .012 | .371 | .053 | .106 | .004 | .039 |
| PAAl1 | −.044 | .137 | .162 | −.017 | .004 | .075 | .190 | −.018 | .058 | −.025 | .162 | −.053 | .296 | −.107 | .020 | −.037 | −.065 | .117 | .243 | −.085 |
| NADl1 | .039 | .016 | −.064 | .208 | .067 | −.027 | −.071 | .251 | .068 | −.070 | −.083 | .106 | .097 | .014 | −.092 | .201 | −.019 | .024 | −.002 | .187 |
Note. Rows hold the lag-1 coefficients that give the influence the variable will have on an emotion (listed in the columns) at the next measurement 6 hours later.
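To show how the coefficients in this table are read, the sketch below computes a one-step-ahead prediction from the Cluster 1 mean coefficients, following the note that rows are predictors and columns are outcomes. Treating cluster means as a working model for one person, the function name, and the input scores are assumptions made purely for illustration.

```python
VARIABLES = ["NAA", "PAD", "PAA", "NAD"]

# Cluster 1 intercepts (row "Int" of the table), keyed by outcome variable.
INTERCEPT = {"NAA": 44.876, "PAD": 135.775, "PAA": 118.154, "NAD": 65.729}

# Cluster 1 lag-1 coefficients: LAG1[predictor][outcome].
LAG1 = {
    "NAA": {"NAA": 0.096, "PAD": 0.087, "PAA": 0.030, "NAD": -0.001},
    "PAD": {"NAA": -0.009, "PAD": 0.183, "PAA": 0.136, "NAD": -0.046},
    "PAA": {"NAA": -0.044, "PAD": 0.137, "PAA": 0.162, "NAD": -0.017},
    "NAD": {"NAA": 0.039, "PAD": 0.016, "PAA": -0.064, "NAD": 0.208},
}


def predict_next(current):
    """Predict each variable at the next measurement from current scores."""
    return {
        outcome: INTERCEPT[outcome]
        + sum(LAG1[pred][outcome] * current[pred] for pred in VARIABLES)
        for outcome in VARIABLES
    }


# Hypothetical current scores; only NAA is nonzero to isolate its row.
current = {"NAA": 100.0, "PAD": 0.0, "PAA": 0.0, "NAD": 0.0}
for name, value in predict_next(current).items():
    print(name, round(value, 3))
```

With only NAA set, each prediction equals the intercept plus 100 times the NAAl1 entry in that column, which makes the row-as-predictor reading of the table explicit.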