Anthony Molinaro, Frank DeFalco.
Abstract
BACKGROUND: Seasonality classification is a well-known and important part of time series analysis. Understanding the seasonality of a biological event can contribute to an improved understanding of its causes and help guide appropriate responses. Observational data, however, do not consist of biological events but of timestamped diagnosis codes, combinations of which (along with additional requirements) are used as proxies for biological events. Because different methods exist for determining the seasonality of a time series, it is necessary to know whether these methods exhibit concordance. In this study we seek to determine the concordance of these methods by applying them to time series derived from diagnosis codes in observational data residing in databases that vary in size, type, and provenance.
Keywords: ACHILLES; ARIMA; CASTOR; Classification; Common data model; Cyclical; OHDSI; OMOP CDM; Observational data; R; Seasonality; Time series
Year: 2022 PMID: 35780114 PMCID: PMC9250712 DOI: 10.1186/s12874-022-01652-3
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.612
Databases used in this study
| Database | Time Series | People | Type | Period |
|---|---|---|---|---|
| Premier Healthcare Database (PHD) | 6635 | 264 M | Hospital Charge | 2000–2021 |
| Japan Medical Data Center (JMDC) | 2956 | 13 M | Claims | 2000–2021 |
| Optum Electronic Health Records (EHR) | 12,102 | 101 M | Electronic Health Records | 2007–2021 |
| IBM MarketScan® Commercial Claims and Encounters (CCAE) | 11,051 | 157 M | Claims | 2000–2021 |
| IQVIA Disease Analyzer – France (FRA) | 896 | 4 M | General Practitioner | 2016–2021 |
| IQVIA Disease Analyzer – Germany (GER) | 3208 | 31 M | General Practitioner | 2011–2021 |
| IQVIA Australian Longitudinal Patient Data (AUS) | 408 | 5 M | General Practitioner | 1996–2021 |
| IBM MarketScan® Medicare Supplemental and Coordination of Benefits (MDCR) | 6596 | 10 M | Claims | 2000–2021 |
| IBM MarketScan® Multi-State Medicaid (MDCD) | 6478 | 31 M | Claims | 2006–2021 |
| Optum Clinformatics Extended Data Mart – Date of Death (DOD) | 11,137 | 91 M | Claims | 2000–2021 |
Methods Summary
| METHOD NAME | ABBREVIATION | BRIEF DESCRIPTION |
|---|---|---|
| Edwards’ Test | ED | Hypothesis test of a harmonic model of data using a linear combination of sine and cosine (periodic for 2nπ, thus trend removal is not required). The modeled data are fit using a Poisson generalized linear model. Seasonality is determined by evaluating the peaks and troughs of the modeled curve fit to the observed time series. Implemented in R. |
| Friedman’s Test | FR | Hypothesis test using a non-parametric approach for comparing samples within a population or from populations with identical medians. A rank-based approach is employed to test the hypothesis of no seasonality of the ranked months. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
| ARIMA Hypothesis Test | AR | Hypothesis test to determine if the seasonal component is significant when compared to an identical ARIMA model without a seasonal component. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
| QS Test | QS | Hypothesis test to determine seasonality by examining the autocorrelation of seasonal lags. The observed time series is seasonal if positive autocorrelations at either lag 12 or 24 are significant. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
| ETS Hypothesis Test | ET | Hypothesis test to determine if the seasonal component is significant when compared to an identical ETS model without a seasonal component. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
| Kruskal-Wallis Test | KW | Hypothesis test using a non-parametric approach to compare samples from a population. A rank-based approach is employed to test the hypothesis that the monthly data have the same mean. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
| Welch’s Test | WE | Hypothesis test employing one-way ANOVA, but allowing for unequal variances amongst the groups of months. The series is classified as seasonal if the hypothesis that the monthly means are identical is rejected. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
| Auto ARIMA Test | AA | Test based on minimizing forecast errors across different models. The observed time series is considered seasonal if the optimal ARIMA model chosen (the one that minimizes forecast error) includes a seasonal component. Any linear trend in the data is removed prior to testing for seasonality. Implemented in R. |
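
To make the rank-based rows of the table more concrete, the following is a minimal Python sketch of the general Kruskal-Wallis idea: remove a linear trend, rank all observations, group the ranks by calendar month, and compare the H statistic with a chi-square threshold. It is not the study's R implementation. The function names, the synthetic series, the omission of a tie correction, and the use of a fixed critical value (19.675 for 11 degrees of freedom at the 0.05 level) in place of a p-value are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Chi-square critical value for 11 degrees of freedom at alpha = 0.05
# (12 monthly groups minus 1). Used here instead of computing a p-value.
CHI2_CRIT_DF11_ALPHA05 = 19.675


def detrend_linear(values):
    """Subtract the least-squares line fitted against the time index."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(values)]


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(values, period=12):
    """Kruskal-Wallis H with observations grouped by position in the cycle.

    No tie correction is applied (an assumption of this sketch).
    """
    ranks = average_ranks(values)
    n = len(values)
    groups = defaultdict(list)
    for idx, rank in enumerate(ranks):
        groups[idx % period].append(rank)
    rank_term = sum(sum(g) ** 2 / len(g) for g in groups.values())
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)


# Hypothetical monthly series: ten years of linear trend plus an annual cycle.
series = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / 12) for t in range(120)]
seasonal_h = kruskal_wallis_h(detrend_linear(series))
print("classified seasonal:", seasonal_h > CHI2_CRIT_DF11_ALPHA05)
```

Comparing against a critical value keeps the sketch free of external dependencies; a statistics library would normally return a p-value that can be checked against each of the significance levels used in the study.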
Proportion of time series classified seasonal, p < 0.05, blue indicates min, red indicates max
Proportion of time series classified seasonal, p < 0.1, blue indicates min, red indicates max
Proportion of time series classified seasonal, p < 0.01, blue indicates min, red indicates max
Fig. 1. Stacked bar chart visualizing concordance by database across all significance levels
Fig. 2. UpSetR plot visualizing 40 different method combinations of seasonality classification for OPTUM DOD, p < 0.05
Fig. 3. Nine time series from OPTUM DOD and their binary classification by each method, p < 0.05. Green method abbreviation indicates seasonal. Red method abbreviation indicates non-seasonal. N = The number of times the specified (green-red) combination occurred. M = The number of times any numerically similar (i.e., p seasonal and q non-seasonal) combination occurred
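
The N and M counts in the Fig. 3 caption can be illustrated with a short Python sketch. It assumes each time series has a binary seasonal/non-seasonal label per method, and it reads "numerically similar" as "the same number of methods voting seasonal", following the caption's parenthetical. The function name and the example data are hypothetical and are not taken from the study.

```python
from collections import Counter

# Method abbreviations as listed in the Methods Summary table.
METHODS = ("ED", "FR", "AR", "QS", "ET", "KW", "WE", "AA")


def combination_counts(classifications):
    """Count exact method patterns and patterns grouped by number of seasonal votes.

    classifications: list of dicts mapping method abbreviation -> bool
    (True means the method classified the series as seasonal).
    """
    exact = Counter()    # pattern of True/False per method -> occurrences (N)
    by_size = Counter()  # number of seasonal votes -> occurrences (M)
    for result in classifications:
        pattern = tuple(bool(result[name]) for name in METHODS)
        exact[pattern] += 1
        by_size[sum(pattern)] += 1
    return exact, by_size


# Hypothetical classifications for four time series.
all_seasonal = dict.fromkeys(METHODS, True)
example = [
    dict(all_seasonal),
    dict(all_seasonal),
    {**all_seasonal, "ED": False},
    {**all_seasonal, "AA": False},
]

exact, by_size = combination_counts(example)
pattern = tuple(name != "ED" for name in METHODS)  # every method except ED says seasonal
n_exact = exact[pattern]            # N: this exact combination
m_similar = by_size[sum(pattern)]   # M: any combination with 7 seasonal, 1 non-seasonal
print(n_exact, m_similar)
```

Here the exact pattern occurs once, while two series share the 7-versus-1 split, so M is at least as large as N for any pattern.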