Anita Brobbey1, Samuel Wiebe1,2, Alberto Nettel-Aguirre3, Colin Bruce Josephson1,2, Tyler Williamson1, Lisa M Lix4, Tolulope T Sajobi1,2.
Abstract
Discriminant analysis procedures that assume parsimonious covariance and/or mean structures have been proposed for distinguishing between two or more populations in multivariate repeated measures designs. However, these procedures rely on the assumption of multivariate normality, which is not tenable in multivariate repeated measures designs characterized by binary, ordinal, or mixed types of response distributions. This study investigates the accuracy of repeated measures discriminant analysis (RMDA) based on the multivariate generalized estimating equations (GEE) framework for classification in multivariate repeated measures designs with the same or different types of responses repeatedly measured over time. Monte Carlo methods were used to compare the accuracy of RMDA procedures based on GEE and RMDA procedures based on maximum likelihood estimators (MLE) under diverse simulation conditions, which included the number of repeated measurement occasions, number of responses, sample size, correlation structure, and type of response distribution. RMDA based on GEE exhibited higher average classification accuracy than RMDA based on MLE, especially for multivariate non-normal distributions. Three repeatedly measured responses, namely severity of epilepsy, current number of anti-epileptic drugs, and parent-reported quality of life in children with epilepsy, were used to demonstrate the application of these procedures.
Keywords: Discriminant analysis; classification; generalized estimating equation; multivariate non-normal distribution; multivariate repeated measures data
Year: 2021 PMID: 34898331 PMCID: PMC8961244 DOI: 10.1177/09622802211032705
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
Configuration of unstructured between-responses correlation matrix given within-response correlation coefficient for the Monte Carlo Study.
| Within-response correlation coefficient | 0.3 | 0.7 |
|---|---|---|
Q: Number of responses.
True parameters for population 1 and population 2 simulated data.
| Population distribution | Number of responses | Population 1 | Population 2 |
|---|---|---|---|
| Normal/mixed-type | 3 | | |
| Normal/mixed-type | 5 | | |
| Poisson | 3 | | |
| Poisson | 5 | | |
Overall mean accuracy (standard error) for repeated measures LDA procedures based on GEE and MLE, by population distribution, number of responses, number of measurement occasions, and correlation structure.
| Population distribution | Number of responses | Number of measurement occasions | GEE (UNAR) | GEE (UNCS) | MLE (UNAR) | MLE (UNCS) |
|---|---|---|---|---|---|---|
| Normal | 3 | 3 | 0.62 (0.04) | 0.64 (0.04) | 0.63 (0.04) | 0.65 (0.04) |
| Normal | 3 | 5 | 0.73 (0.04) | 0.75 (0.04) | 0.69 (0.04) | 0.70 (0.04) |
| Normal | 5 | 3 | 0.68 (0.04) | 0.74 (0.04) | 0.66 (0.04) | 0.66 (0.04) |
| Normal | 5 | 5 | 0.83 (0.03) | 0.89 (0.03) | 0.82 (0.03) | 0.63 (0.03) |
| Poisson | 3 | 3 | 0.88 (0.04) | 0.90 (0.03) | 0.79 (0.04) | 0.81 (0.04) |
| Poisson | 3 | 5 | 0.97 (0.02) | 0.97 (0.03) | 0.84 (0.05) | 0.85 (0.05) |
| Poisson | 5 | 3 | 0.99 (0.01) | 0.99 (0.01) | 0.89 (0.04) | 0.90 (0.04) |
| Poisson | 5 | 5 | 0.99 (0.01) | 0.99 (0.01) | 0.95 (0.02) | 0.95 (0.02) |
| Mixed-type | 3 | 3 | 0.62 (0.04) | 0.63 (0.04) | 0.55 (0.04) | 0.55 (0.04) |
| Mixed-type | 3 | 5 | 0.72 (0.04) | 0.74 (0.04) | 0.58 (0.04) | 0.58 (0.04) |
| Mixed-type | 5 | 3 | 0.68 (0.04) | 0.72 (0.04) | 0.67 (0.04) | 0.57 (0.04) |
| Mixed-type | 5 | 5 | 0.81 (0.03) | 0.87 (0.03) | 0.62 (0.04) | 0.62 (0.04) |
UNAR: unstructured between-responses and autoregressive order 1 within-response correlation matrix; UNCS: unstructured between-responses and compound symmetry within-response correlation matrix; GEE: generalized estimating equation; MLE: maximum likelihood estimation.
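One common way to combine an unstructured between-responses matrix with a structured within-response matrix is a Kronecker product. The sketch below assumes that construction, which this record does not state explicitly. The between-responses values are made up for illustration; only the within-response coefficient 0.3 comes from the simulation settings. All function names are hypothetical.

```python
def ar1(n_occasions, rho):
    """AR(1) within-response correlation: entry (s, t) equals rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_occasions)]
            for s in range(n_occasions)]


def compound_symmetry(n_occasions, rho):
    """Compound symmetry: 1 on the diagonal, rho in every off-diagonal entry."""
    return [[1.0 if s == t else rho for t in range(n_occasions)]
            for s in range(n_occasions)]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


# Illustrative (made-up) unstructured correlation among Q = 3 responses.
between = [[1.0, 0.2, 0.1],
           [0.2, 1.0, 0.3],
           [0.1, 0.3, 1.0]]

# Assumed Kronecker construction: between-responses (x) within-response.
unar = kron(between, ar1(3, 0.3))                # 9 x 9, AR(1) within each response
uncs = kron(between, compound_symmetry(3, 0.3))  # 9 x 9, CS within each response
```

Under AR(1), the correlation between the first and third occasions of the same response decays to 0.3 squared, whereas compound symmetry keeps it at 0.3 for every pair of occasions.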
Overall mean accuracy (standard error) for repeated measures QDA procedures based on GEE and MLE, by population distribution, number of responses, number of measurement occasions, and correlation structure.
| Population distribution | Number of responses | Number of measurement occasions | GEE (UNAR) | GEE (UNCS) | MLE (UNAR) | MLE (UNCS) |
|---|---|---|---|---|---|---|
| Normal | 3 | 3 | 0.77 (0.04) | 0.80 (0.04) | 0.65 (0.04) | 0.66 (0.04) |
| Normal | 3 | 5 | 0.85 (0.04) | 0.88 (0.04) | 0.71 (0.04) | 0.71 (0.04) |
| Normal | 5 | 3 | 0.85 (0.04) | 0.89 (0.04) | 0.66 (0.04) | 0.66 (0.04) |
| Normal | 5 | 5 | 0.90 (0.03) | 0.94 (0.03) | 0.85 (0.03) | 0.90 (0.02) |
| Poisson | 3 | 3 | 0.93 (0.03) | 0.94 (0.03) | 0.78 (0.04) | 0.79 (0.04) |
| Poisson | 3 | 5 | 0.99 (0.01) | 0.98 (0.03) | 0.85 (0.05) | 0.85 (0.05) |
| Poisson | 5 | 3 | 0.99 (0.01) | 0.99 (0.01) | 0.90 (0.04) | 0.92 (0.03) |
| Poisson | 5 | 5 | 0.99 (0.01) | 0.99 (0.01) | 0.95 (0.02) | 0.95 (0.02) |
| Mixed-type | 3 | 3 | 0.74 (0.04) | 0.75 (0.04) | 0.56 (0.04) | 0.55 (0.04) |
| Mixed-type | 3 | 5 | 0.84 (0.04) | 0.85 (0.04) | 0.58 (0.04) | 0.58 (0.06) |
| Mixed-type | 5 | 3 | 0.83 (0.04) | 0.86 (0.04) | 0.58 (0.04) | 0.58 (0.04) |
| Mixed-type | 5 | 5 | 0.91 (0.03) | 0.94 (0.03) | 0.63 (0.04) | 0.63 (0.04) |
UNAR: unstructured between-responses and autoregressive order 1 within-response correlation matrix; UNCS: unstructured between-responses and compound symmetry within-response correlation matrix; GEE: generalized estimating equation; MLE: maximum likelihood estimation.
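The two tables above contrast linear (LDA) and quadratic (QDA) rules. As a rough illustration of that difference only, and not the authors' implementation, the sketch below applies the classical Gaussian plug-in rule with NumPy: LDA passes one pooled covariance to every population, while QDA passes a separate covariance per population. The parameter values are invented, the function names are hypothetical, and in a repeated measures setting `x` would presumably be the stacked vector of all responses at all occasions, with means and covariances supplied by whichever estimator (GEE or MLE) is in use.

```python
import numpy as np


def gaussian_score(x, mean, cov, prior):
    """Log of prior times Gaussian density at x, dropping the shared constant."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return np.log(prior) - 0.5 * logdet - 0.5 * mahalanobis


def classify(x, means, covs, priors):
    """Return the index of the population with the largest score."""
    scores = [gaussian_score(x, m, c, p) for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))


# Made-up parameters: two populations with equal means but different spread.
means = [np.array([0.0, 0.0]), np.array([0.0, 0.0])]
cov_1 = np.eye(2)
cov_2 = 9.0 * np.eye(2)
pooled = 0.5 * (cov_1 + cov_2)
priors = [0.5, 0.5]
x = np.array([3.0, 3.0])

# QDA: each population keeps its own covariance, so spread alone can separate them.
qda_label = classify(x, means, [cov_1, cov_2], priors)
# LDA: one pooled covariance; with equal means the scores tie and index 0 is returned.
lda_label = classify(x, means, [pooled, pooled], priors)
```

In this toy case the quadratic rule assigns the outlying point to the more dispersed population, while the linear rule cannot distinguish the populations at all because their means coincide.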
Figure 1. Observed longitudinal profiles of number of anti-epileptic drugs (AEDs), quality of life, and seizure severity from the Remission group (left column) and the Refractory group (right column). Solid lines show LOESS smoothed profiles for Poisson, normal, and binomial models calculated using data from all patients. Measurement occasions: baseline (0 months), 6 months, 12 months, and 24 months.
GEE group-specific correlation parameter estimates for HERQULES data by the assumed correlation structure.
| | Remission (UNAR) | Remission (UNCS) | Refractory (UNAR) | Refractory (UNCS) |
|---|---|---|---|---|
| | 0.812 | 0.749 | 0.744 | 0.726 |
| Corr(Y2, Y1) | –0.025 | –0.025 | –0.023 | –0.023 |
| Corr(Y3, Y1) | 0.003 | 0.003 | 0.001 | 0.001 |
| Corr(Y3, Y2) | –0.042 | –0.042 | –0.038 | –0.038 |
UNAR: unstructured between-responses and autoregressive order 1 within-response correlation matrix; UNCS: unstructured between-responses and compound symmetry within-response correlation matrix; Y1: number of anti-epileptic drugs (AEDs); Y2: HRQOL; Y3: severe seizure.
Classification accuracy for the generalized estimating equation (GEE) and maximum likelihood estimation (MLE) methods for repeated measures LDA and QDA, by the assumed correlation structure.
| Procedure | Group | GEE (UNAR) | GEE (UNCS) | MLE (UNAR) | MLE (UNCS) |
|---|---|---|---|---|---|
| LDA | Remission | 0.772 | 0.770 | 0.762 | 0.760 |
| LDA | Refractory | 0.651 | 0.640 | 0.570 | 0.558 |
| LDA | Overall | 0.711 | 0.705 | 0.665 | 0.660 |
| QDA | Remission | 0.871 | 0.880 | 0.752 | 0.750 |
| QDA | Refractory | 0.709 | 0.698 | 0.581 | 0.570 |
| QDA | Overall | 0.790 | 0.789 | 0.667 | 0.660 |
LDA: linear discriminant analysis; QDA: quadratic discriminant analysis; GEE: generalized estimating equation; MLE: maximum likelihood estimation; UNAR: unstructured between responses and autoregressive order 1 within response correlation matrix; UNCS: unstructured between responses and compound symmetry within response correlation matrix.
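For reference, group-specific and overall accuracy can be computed from true and predicted labels as in the sketch below. The labels are invented, the function name is hypothetical, and defining overall accuracy as the proportion of all subjects correctly classified is an assumption, since the table does not define it.

```python
def accuracy_by_group(true_labels, predicted_labels):
    """Return ({group: accuracy}, overall accuracy) for paired label sequences."""
    correct, total = {}, {}
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] = total.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + int(truth == predicted)
    per_group = {group: correct[group] / total[group] for group in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_group, overall


# Invented example: four subjects per group.
true_labels = ["remission"] * 4 + ["refractory"] * 4
predicted_labels = (["remission"] * 3 + ["refractory"]        # 3 of 4 correct
                    + ["refractory"] * 2 + ["remission"] * 2)  # 2 of 4 correct

per_group, overall = accuracy_by_group(true_labels, predicted_labels)
```

When the groups are the same size, as in this toy example, the overall accuracy equals the simple average of the group-specific accuracies.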