
Use of interrupted time series methods in the evaluation of health system quality improvement interventions: a methodological systematic review.

Celestin Hategeka^1,2, Hinda Ruton^2,3, Mohammad Karamouzian^4,5, Larry D Lynd^6,7, Michael R Law^2.

Abstract

BACKGROUND: When randomisation is not possible, interrupted time series (ITS) design has increasingly been advocated as a more robust design for evaluating health system quality improvement (QI) interventions given its ability to control for common biases in healthcare QI. However, there is a potential risk of producing misleading results when this rather robust design is not used appropriately. We performed a methodological systematic review of the literature to investigate the extent to which the use of ITS has followed best practice standards and recommendations in the evaluation of QI interventions.
METHODS: We searched multiple databases from inception to June 2018 to identify QI intervention studies that were evaluated using ITS. There were no restrictions on date, language or participants. Data were synthesised narratively using appropriate descriptive statistics. The risk of bias for ITS studies was assessed using the Cochrane Effective Practice and Organisation of Care standard criteria. The systematic review protocol was registered in PROSPERO (registration number: CRD42018094427).
RESULTS: Of 4061 potential studies and 2028 unique records screened for inclusion, 120 eligible studies assessed eight QI strategies and were from 25 countries. Most studies were published since 2010 (86.7%), reported data at monthly intervals (71.4%), used ITS without a control (81%) and modelled data using segmented regression (62.5%). Autocorrelation was considered in 55% of studies, seasonality in 20.8% and non-stationarity in 8.3%. Only 49.2% of studies specified the ITS impact model. The risk of bias was high or very high in 72.5% of included studies and did not change significantly over time.
CONCLUSIONS: The use of ITS in the evaluation of health system QI interventions has increased considerably over the past decade. However, variations in methodological considerations and reporting of ITS in QI remain a concern, warranting a need to develop and reinforce formal reporting guidelines to improve its application in the evaluation of health system QI interventions. © Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords: health systems; systematic review


Year: 2020 PMID: 33055094 PMCID: PMC7559052 DOI: 10.1136/bmjgh-2020-003567

Source DB: PubMed Journal: BMJ Glob Health ISSN: 2059-7908

Interrupted time series (ITS) is one of the strongest quasi-experimental designs. ITS has increasingly been advocated for use in the evaluation of health system quality improvement (QI) interventions when randomisation is not possible. Our findings showed variations in methodological considerations and reporting of ITS in QI—for example, only about half of studies specified the ITS impact model and considered autocorrelation in the ITS analyses. The risk of bias was high or very high in about three-quarters of included studies and did not appear to have changed significantly over time. While use of ITS in the evaluation of health system QI interventions has increased considerably over the past decade, the quality of ITS studies still shows room for improvement. Developing and/or reinforcing formal reporting guidelines could improve application of ITS in the evaluation of health system QI interventions.

Introduction

Improving health system quality is vital to ensure that utilisation of healthcare maximises population health outcomes, economic benefit and people’s confidence in health systems.1–3 A key strategy to improving the quality of care is the use of context-appropriate health system quality improvement (QI) interventions. QI interventions employ different approaches targeting patients or users (eg, education of patients and promotion of self-management), healthcare providers (eg, clinical education and reminders) and health systems (eg, organisational change and electronic patient registry) to bolster quality of care.4 5 While evaluating these interventions is key to understanding their impact, many are designed and implemented without predetermined evaluation plans.6–9 In such cases, rigorous quasi-experimental designs can be employed.10–13 Concerns persist, however, about the potential biases in some weaker observational study designs.10 11 As a result, interrupted time series (ITS) analysis has increasingly been advocated as one of the more robust observational designs as it can control for the secular trends present in many health system outcomes.7 8 14 Additionally, ITS can easily be interpreted by lay audiences, especially when graphical displays of the results are employed. 
Similarly, the Cochrane Effective Practice and Organisation of Care (EPOC) recommends ITS along with experiments and controlled before and after designs for inclusion in EPOC reviews.15 There remains a risk, however, of producing misleading or incorrect results when this rather robust design is not used appropriately.16–18 While use of ITS in health research has increased substantially over the last few decades (online supplemental appendix figure 1), previous systematic reviews have shown heterogeneity in the reporting and methodological concerns in the application of ITS.17 19–21 However, none of these reviews have focused on use of ITS in the evaluation of health system QI interventions.17 19 20 Another review examined characteristics and reporting of ITS in healthcare more broadly.21 However, this review was restricted to just 1 year (ie, 2015) and one database and excluded controlled ITS studies.21 Similarly, they did not assess the risk of bias of included studies. Therefore, the extent to which the use of ITS has followed best practice standards and recommendations in the evaluation of health system QI interventions remains unclear.

Methodological considerations with interrupted time series

The ITS design relies on data collected at multiple intervals over time (ie, time series data) before and after an intervention to establish a causal relationship between an intervention (eg, QI) and an outcome of interest (eg, health outcomes).22 The Cochrane EPOC recommends ITS studies with at least three data points before and after an intervention for inclusion in its reviews.15 ITS can be either retrospective or prospective, and can further be classified into two types based on whether one group was used (single ITS) or a control group was added (controlled ITS).10 22 Single ITS (SITS) compares longitudinal changes before and after the intervention in the exposed group, whereas controlled ITS (CITS) compares longitudinal changes before and after the intervention between an exposed group and a control group (figures 1 and 2).7 18 22–25 Single ITS assumes that the level and trend in a given outcome measure in the group exposed to the intervention would have remained the same absent the intervention (figure 1). In contrast, the counterfactual assumption for controlled ITS is that the level and trend in the group exposed to the intervention would have changed in the same way as was observed in the control group (figure 2).7 18 22–25 It is recommended to specify anticipated changes in the level or trend in outcome measures a priori, for example, based on expert knowledge.17 26 27 Graphical figures of results of ITS studies are vital as they allow readers to see pre-existing levels and trends, the implementation of the intervention and the impact of the intervention.18 20 28

Figure 1

Diagrammatic representation of single interrupted time series.

Figure 2

Diagrammatic representation of controlled interrupted time series.
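The single ITS counterfactual described above is often written as a segmented regression. The following is a minimal sketch under one common coding; the symbols $t_0$ (first post-intervention time point), $X_t$ and the $\beta$ coefficients are notation chosen here for illustration and are not taken from the reviewed studies, and other codings of the post-intervention time variable are also in use.

```latex
% Segmented regression for a single ITS (illustrative notation)
Y_t = \beta_0 + \beta_1 t + \beta_2 X_t + \beta_3 (t - t_0) X_t + \varepsilon_t,
\qquad
X_t =
\begin{cases}
0, & t < t_0 \\
1, & t \ge t_0
\end{cases}

% Counterfactual for t >= t_0: the pre-intervention level and trend continue unchanged
\hat{Y}_t^{\mathrm{cf}} = \beta_0 + \beta_1 t
```

Under this coding, $\beta_0$ is the baseline level, $\beta_1$ the pre-existing trend, $\beta_2$ the change in level at $t_0$ and $\beta_3$ the change in trend after the intervention. The estimated effect at time $t \ge t_0$ is the gap between the fitted value and the counterfactual, $\beta_2 + \beta_3 (t - t_0)$.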

CITS is arguably a stronger quasi-experimental design as it has two controls (baseline trend in the intervention group and control group), allowing both within-group and between-group comparisons.10 This enhances its capacity to control for potential threats to internal validity such as history (co-intervention), maturation, instrumentation, regression to the mean and attrition.10 18 22 24 29 However, SITS may be more vulnerable to these threats to internal validity. For example, Baicker and Svoronos recently reanalysed data from the Oregon Health Insurance experiment using SITS and found results that were not consistent with the randomised trial findings.30 Therefore, it is generally recommended to add a control group to ITS wherever possible.10 22 Bernal and colleagues’ study reviewed various types of controls that can be added to strengthen ITS design, including location and characteristic-based controls and control outcome.24 When QI interventions are multisite and implemented at different times, this needs to be taken into account when assigning index time to a control group using techniques such as matching and random assignment of index time to controls.
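To make the within-group and between-group comparisons in CITS concrete, the sketch below builds design-matrix rows for a two-group model with group interaction terms. The function names, column order and coefficient values are hypothetical and chosen only for illustration; they are not estimates from any study, and the sketch assumes both groups share the same index time `t0`.

```python
def cits_row(t, t0, treated):
    """Design-matrix row for a controlled ITS (hypothetical column order).

    Columns: intercept, time, post, time since t0,
             group, group*time, group*post, group*time since t0.
    """
    post = 1 if t >= t0 else 0
    since = (t - t0) * post
    g = 1 if treated else 0
    return [1, t, post, since, g, g * t, g * post, g * since]


def predict(beta, row):
    """Linear predictor for one row."""
    return sum(b * x for b, x in zip(beta, row))


# Illustrative coefficients only (assumed values, equal pre-trends: beta[5] = 0).
beta = [40.0, 0.5, 1.0, 0.1, 3.0, 0.0, -4.0, -0.2]
t0 = 12

# Gap between exposed and control groups just before and at the interruption.
gap_before = predict(beta, cits_row(t0 - 1, t0, True)) - predict(beta, cits_row(t0 - 1, t0, False))
gap_at_t0 = predict(beta, cits_row(t0, t0, True)) - predict(beta, cits_row(t0, t0, False))

# With equal pre-trends, the change in the gap equals the group*post coefficient.
print(gap_before, gap_at_t0, gap_at_t0 - gap_before)
```

In this parametrisation the last two coefficients carry the between-group differences in level change and trend change, which is where the control series enters the effect estimate.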

Statistical analysis of interrupted time series

While there are many statistical approaches used for estimating effects of interventions in ITS studies, segmented regression analysis of ITS is the most commonly used, followed by autoregressive integrated moving average (ARIMA) models.16 20 21 Details on segmented regression analysis can be found in the seminal paper of Wagner and colleagues.18 While most ITS studies use segmented regression analysis on time series data aggregated within periods of time, analysis on individual-level data is feasible using mixed effects models. Saeed and colleagues discuss how to combine generalised mixed effects models with segmented regression to evaluate the effectiveness of healthcare interventions.31 Similarly, Ewusie suggests incorporating patient variability and sample size as weights in a weighted segmented regression model of ITS to account for potential heterogeneity introduced by population variability within and/or between sites.16 Power in ITS depends on several factors, including the number and distribution of data points before and after the intervention, the stability of data points, strength of effect, autocorrelation and seasonality.26 32 Missing observations are common in time series data of health service utilisation and quality of care and should be dealt with using appropriate techniques such as multiple imputation and maximum likelihood estimation.25 33–35 As successive time series observations may be correlated, it is recommended to assess for autocorrelation using appropriate modelling techniques.18 There are several tests for assessing autocorrelation including Durbin-Watson tests and visual plots of the autocorrelation function and partial autocorrelation function.18 36 Moreover, seasonality is common in time series data and should be adjusted for as appropriate. 
There are several techniques for adjusting for seasonality in time series analyses, including deseasonalising time series data prior to fitting statistical models, adding dummy calendar variables (eg, calendar month), using Fourier terms (pairs of sine and cosine functions) and splines.26 37 When using monthly time series data, at least 12 data points before and after the intervention are recommended to be able to adjust for seasonality meaningfully.18 Additionally, non-stationarity (where a time series exhibits an underlying trend that is not related to the intervention) should be assessed and, where significant, the series should be stationarised as appropriate. Non-stationarity can be assessed using the Augmented Dickey-Fuller test.38 When autocorrelation, seasonality and/or non-stationarity are present in time series data and are not adjusted for, the results can be biased. Despite the rich literature on how to approach ITS studies from a methodological and statistical standpoint, we do not have good information on the degree to which evaluations of healthcare QI interventions meet these criteria. Thus, we systematically reviewed the literature to investigate the extent to which these evaluations have followed best practice standards.
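As a rough illustration of the steps discussed in this section, the sketch below fits a segmented regression with one Fourier pair for seasonality to simulated monthly data and computes the Durbin-Watson statistic on the residuals. Everything here is an assumption made for illustration: the data are synthetic, the helper names (`design_row`, `ols`, `durbin_watson`) are hypothetical, and the fit uses plain ordinary least squares solved in pure Python. A real analysis would typically use a statistics package and, where autocorrelation is detected, a method that accounts for it.

```python
import math
import random


def design_row(t, t0, period=12):
    """One row of a segmented regression design matrix (illustrative coding)."""
    post = 1.0 if t >= t0 else 0.0
    return [
        1.0,                                  # intercept: baseline level
        float(t),                             # pre-existing trend
        post,                                 # level change at the interruption
        post * (t - t0),                      # trend change after the interruption
        math.sin(2 * math.pi * t / period),   # Fourier pair for seasonality
        math.cos(2 * math.pi * t / period),
    ]


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest little lag-1 autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


# Synthetic series: 24 monthly points before and 24 after the interruption.
random.seed(0)
t0 = 24
times = list(range(48))
true_beta = [50.0, 0.2, -5.0, -0.3, 2.0, 1.0]   # assumed values for the simulation
X = [design_row(t, t0) for t in times]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 1) for row in X]

beta = ols(X, y)
resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
dw = durbin_watson(resid)

print("level change:", round(beta[2], 2), "trend change:", round(beta[3], 2))
print("Durbin-Watson:", round(dw, 2))
```

The Durbin-Watson statistic only addresses lag-1 autocorrelation, so plots of the autocorrelation and partial autocorrelation functions remain useful alongside it. Ordinary least squares standard errors are not reliable when residuals are autocorrelated, which is why the sketch reports point estimates only.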

Methods

This methodological systematic review was conducted according to a prespecified study protocol registered in PROSPERO (registration number: CRD42018094427) on 11 June 2018.39

Search strategy

Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (online supplemental appendix table 1),40 we searched for healthcare QI intervention studies that were evaluated using ITS and were published in peer-reviewed journals from inception to June 2018. We also searched the grey literature to capture any relevant studies. The search strategy is summarised in table 1 and full details are provided in online supplemental appendix table 2.

Table 1

Summary of the search strategy

Search concepts*	Interrupted time series analysis (eg, interrupted time series analysis OR ITS studies OR interrupted time series OR time series OR trend analysis OR segmented regression OR Piecewise regression OR broken-stick regression) AND healthcare QI interventions (eg, quality improvement OR healthcare quality OR quality control OR quality assurance OR clinical audit).
Databases	MEDLINE, EMBASE, CINAHL, Web of Science, Global Health, Google Scholar, Africa-Wide, Latin American and Caribbean Health Sciences Literature (LILACS), Index Medicus for the South-east Asian Region (IMSEAR), Index Medicus for the Eastern Mediterranean Region (IMEMR), and Western Pacific Rim Region Index Medicus (WPRIM).
Other search strategies	Complementing electronic databases, hand searches of the bibliographies of relevant published works and previous reviews, relevant conference proceedings (eg, International Forum on Quality and Safety in Healthcare, Institute of Healthcare Improvement) were also performed.
Restrictions	No restrictions applied on date of publication, language of study, participants, or the type of QI outcome.

*Search terms were combined using appropriate Boolean operators and included subject heading terms and/or key words for two key themes (interrupted time series analysis and healthcare quality improvement) and were adjusted to fit each database’s requirements.

ITS, interrupted time series; QI, quality improvement.


Inclusion and exclusion criteria

The population, interventions, comparisons, outcomes and study designs considered for review are listed in table 2. Guided by Shojania et al’s taxonomy for QI strategies, we identified interventions targeting health systems, health providers and/or patients for inclusion.4 5 Studies that did not have at least three preintervention and three postintervention data points were excluded, consistent with the Cochrane EPOC recommendation.15 Similarly, given that this review focused on empirical applications of ITS, systematic reviews/meta-analyses, letters, commentaries, methodological contributions, study protocols and conference abstracts were excluded.

Table 2

Population, interventions, comparisons, outcomes and study designs for study inclusion

Criteria	Definition
Population	All types of patients/consumers and healthcare professionals/providers are eligible. All levels of healthcare delivery are eligible.
Intervention	Interventions to improve quality of healthcare4 5: Provider reminder systems; Facilitated relay of clinical data to providers; Audit and feedback; Provider education; Patient education; Patient reminder systems; Promotion of self-management; Organisational change; and Financial incentives, regulation and policy
Comparison	Not relevant given this review is not focused on any particular comparator.
Outcomes	Not relevant given this is a methodological review.
Study designs	Interrupted time series


Data extraction and analysis

The titles and abstracts of unique results from all databases and grey literature were reviewed independently by two researchers (CH and HR) for potential inclusion. The full texts of studies retained at the title and abstract screening stage were retrieved and independently assessed for inclusion by two researchers (CH and HR). Any discrepancies were resolved through discussion and consensus, with a third researcher (MRL) consulted wherever necessary. Data extraction on each included study was conducted using a data extraction tool, developed and piloted a priori drawing on a checklist on the methodological and reporting recommendations for ITS studies adapted from the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (online supplemental appendix table 3).20 Data elements included study characteristics (eg, publication year, country), intervention characteristics (eg, QI strategies, single/multiple interventions) and methodological considerations (eg, ITS model, autocorrelation, use of control). We synthesised extracted data narratively using descriptive statistics and following the review protocol registered in PROSPERO. Data extraction was done by one researcher (CH); however, validation was done by a second researcher (HR) who verified a random subsample consisting of 10% of eligible studies. The agreement between the two researchers (CH and HR) was very high (>95%). Additionally, where necessary, CH consulted with the research team members during data extraction. Lastly, assessment of the risk of bias of included studies was performed by two researchers (CH and HR).

Assessment of risk of bias

The risk of bias assessment was guided by the Cochrane Effective Practice and Organisation of Care (EPOC) standard criteria for assessing risk of bias for ITS studies.17 27 41 Specifically, we assessed the following EPOC standard criteria: (1) Was the intervention independent of other changes? (2) Was the shape of the intervention effect prespecified? (3) Was the intervention unlikely to affect data collection? (4) Was the primary outcome measured objectively? (5) Were incomplete outcome data adequately addressed? (6) Was the study free of selective outcome reporting? (7) Was the study analysed appropriately using interrupted time series techniques? Criteria 1, 3 and 4 assess the threat of history, instrumentation and testing, respectively. Criterion 2 assesses whether the ITS impact model was specified a priori, while criterion 5 assesses whether missing data were dealt with appropriately. Criterion 6 assesses whether all relevant outcomes that were part of the study objectives were reported. Lastly, criterion 7 assesses whether data were analysed appropriately such as using appropriate models (eg, segmented regression and ARIMA) and considering key methodological recommendations (eg, autocorrelation and seasonality). Online supplemental appendix table 4 describes how we performed risk of bias scoring among included studies. Each criterion scored ‘0’ if low risk and ‘1’ otherwise. For each study, we created an aggregate score by combining scores across the seven criteria. Subsequently, the study-level risk of bias aggregate score was categorised as low (score=0), moderate (score=1 or 2), high (score=3 or 4) and very high (score >4). Trends in risk of bias over time were also assessed. The time for each included study was defined as the year of the publication.
Time trend for risk of bias was assessed by fitting a logistic regression model comparing the proportion of studies with a low or moderate risk of bias to the studies with high or very high risk of bias as has previously been done.42 As the literature search end date was June 2018, studies published in 2018 that were captured by our search strategy may not have been representative of all studies published in 2018. As such, for time trend analyses, studies published in 2018 were excluded. All statistical analyses were conducted using R version 4.0.2.
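The aggregate scoring rule described in this section can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the aggregate score is the simple sum of the seven 0/1 criterion scores, which is how "combining scores" is read here.

```python
def risk_of_bias_category(criterion_scores):
    """Map seven 0/1 criterion scores to the aggregate risk-of-bias category.

    Assumes the aggregate is the sum of the criterion scores:
    0 -> low, 1-2 -> moderate, 3-4 -> high, >4 -> very high.
    """
    if len(criterion_scores) != 7 or any(s not in (0, 1) for s in criterion_scores):
        raise ValueError("expected seven scores, each 0 (low risk) or 1 (otherwise)")
    total = sum(criterion_scores)
    if total == 0:
        return "low"
    if total <= 2:
        return "moderate"
    if total <= 4:
        return "high"
    return "very high"


def is_low_or_moderate(criterion_scores):
    """Binary outcome of the kind used when modelling the time trend."""
    return risk_of_bias_category(criterion_scores) in ("low", "moderate")


# Example: a study scoring '1' on criteria 2, 5 and 7 only.
print(risk_of_bias_category([0, 1, 0, 0, 1, 0, 1]))
```

Dichotomising into low/moderate versus high/very high gives the binary outcome that a logistic regression on publication year can then model.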

Patient and public involvement

This research was done without patient involvement. Patients were not invited to comment on the study design and were not consulted to develop patient relevant outcomes or interpret the results. Patients were not invited to contribute to the writing or editing of this document for readability or accuracy.

Results

The search strategy identified 4061 results, of which 2028 unique eligible records were screened for inclusion. Abstract and full-text screening identified 120 studies that met the inclusion criteria.43–162 A summary of this process is presented in the PRISMA flow diagram in figure 3.

Figure 3

Flow diagram of the selection of included studies.

Characteristics of included studies

The 120 studies represented a diversity of QI strategies from different geographical settings. Included studies were from 25 countries, with 106 (88.3%) from high-income countries and 14 (11.7%) from low-income and middle-income countries (LMICs) as defined by the 2019 World Bank Classification (online supplemental appendix figure 2). The number of studies published over time increased considerably, with 104 (86.7%) studies published in 2010 or later (online supplemental appendix figure 3). The included studies assessed eight QI strategies, with provider education (39.2%), audit and feedback (30.0%), organisational change (24.2%) and provider reminder systems (19.2%) being the most reported QI strategies (figure 4). Nearly half of studies were multisite (44.2%).

Figure 4

Type of quality improvement (QI) strategy reported in the included studies.

Reporting and methodological considerations

Interrupted time series methodological consideration of included studies

All included studies reported the use of an ITS design in their titles and/or abstracts and provided clear timing related to the QI implementation (table 3). Nearly all studies provided a background/rationale for using ITS (98.3%) and provided a description of the intervention (96.7%). A fifth of studies reported on evaluation of multiple phases of QI interventions, of which 76.0% adjusted for the multiple phases in their analyses. All included studies reported on the study population, inclusion criteria and outcomes. Nearly all studies (99.2%) used data collected at regular intervals and reported data sources, of which 57% were administrative data including Health Management Information System data, 35% were patient records and 8% were survey data.

Table 3

ITS methodological considerations of included studies

Characteristics	n	%
ITS design reported in the title and/or abstract	120	100
Background/rationale reported	118	98.3
Study objectives reported	120	100
Description of QI intervention	116	96.6
Start (and end) of QI intervention reported	120	100
Multiple QI interventions	25	20.8
2 interventions	22	88
3 interventions	1	4
4 interventions	2	8
Adjusted multiple interventions in the analyses	19	76
Study setting reported	120	100
Multisite	53	44.2
Study period reported	120	100
Study population reported	120	100
Cohort definition reported	120	100
Inclusion criteria reported	120	100
Data sources reported	119	99.2
Data completeness and validity reported	14	11.6
Time point intervals, monthly	85	71.4
Data collected regularly (regular interval)	119	99.2
Time points clearly reported	119	99.2
Rationale for the number and spacing of data points described	51	42.5
Outcome measure(s) reported	120	100
Format of outcome(s) reported	120	100
Unit of analysis
Aggregated	51	42.5
Individual	11	9.2
Not reported	58	48.3
ITS models
Segmented regression	75	62.5
ARIMA	19	15.8
Other models (eg, GEE, linear regression, mixed effect model, spline regression, Poisson regression, Prais-Winsten regression, logistic regression)	17	14.2
Not reported	14	11.6
Autocorrelation
Checked/adjusted as appropriate	66	55
Test(s) used to check for autocorrelation
Durbin-Watson test statistic	24	36.4
Other tests (eg, ACF and PACF, Ljung-Box χ² test, residual plots, Breusch-Godfrey test, Cumby-Huizinga test)	10	15.2
Not reported	33	50
Seasonality
Checked/adjusted	25	20.8
Non-stationarity
Checked/adjusted	10	8.3
Test(s) used to check
Augmented Dickey-Fuller test	7	70
Not reported	2	20
Control group used	22	18.3
Type of control, location-based control	13	59.1
Analyses of controlled ITS
Combined	5	22.7
Separated	12	54.5
Difference	2	9.1
Not reported	3	13.6
Specify ITS impact model (or provided basic ITS model structure)	59	49.2
Use of lag period	34	28.4
Sensitivity analyses	15	12.5
Reported statistical software used	97	80.8

ACF, autocorrelation function; ARIMA, autoregressive integrated moving average; GEE, generalised estimating equation; ITS, interrupted time series; PACF, partial autocorrelation function; QI, quality improvement.

Most studies (71.4%) used monthly data points, with a median preintervention and postintervention period of 18 months (range, 3–120) and 20 months (range, 4–90), respectively (table 3). Less than half (42.5%) of studies provided a rationale for the number and spacing of data points. Just over half reported the unit of analysis clearly (42.5% used data aggregated at the study site level and 9.2% used disaggregated data). Most studies used ITS without a control (81%) and modelled data using segmented regression (62.5%), while about 1 in 10 (11.6%) did not report what model was used. For studies that included a control group, 59.1% used a location-based control and about half (54.5%) modelled the intervention and control groups in separate ITS models. About half (55%) of studies reported checking/adjusting for autocorrelation in their ITS models, with the Durbin-Watson test statistic being the most frequently used test for autocorrelation (36.4%). A fifth (20.8%) of studies reported checking and adjusting for seasonality.
Similarly, only 10 (8.3%) studies reported checking/adjusting for non-stationarity in their ITS models, with the augmented Dickey-Fuller test being the most frequently used (70%). Stratified analysis showed that these items were more likely to be reported with ARIMA models compared with segmented regression (online supplemental appendix table 5). Only about half (49.2%) of studies specified the ITS impact model. Over a tenth (12.5%) of studies considered sensitivity analyses.

Reporting of interrupted time series study results and interpretation

Most studies reported characteristics of study sample (70%) and reported all outcomes examined over the study period (87.5%) (table 4). Similarly, most studies reported the impact of QI using level and trend changes (89.2%), reported CI or SE (81.7%), and reported graphical figures displaying results (92.5%). Of the studies that included graphical figures, 57.6% showed observed trends and 12.6% showed the counterfactual. Further, 29.2% of studies reported absolute and/or relative changes and their significance, for example, clinical and policy. Most studies reported key results and discussed potential limitations of the study. Less than half of studies explicitly discussed potential history threats such as co-interventions. Similarly, 42.1% of studies that mentioned potential limitations also discussed the direction or magnitude of these potential bias(es). About a quarter of studies discussed the stability of outcome data over time.

Table 4

Reporting of ITS study results and interpretation

Results	n	%
Participants
Characteristics in each study group	84	70
Flow diagram of study participant selection	11	9.2
Outcomes
Reported all outcomes examined over the study period	105	87.5
Report the average, minimum and maximum number of outcomes across time intervals	22	18.3
Reported level/trend changes	107	89.2
Report absolute and/or relative changes and their significance, eg, clinical, policy and statistical	35	29.2
Report CI or SE	98	81.7
Graphical figures to display results	111	92.5
Fitted lines (trend)	64	57.6
Counterfactual lines	14	12.6
Used time lag and showed it on figure in results	23	67.6
Results of sensitivity analyses if relevant	3	20
Interpretation
Key results	119	99.2
Context (related to possible confounding)	113	97.4
Relevant co-interventions	56	46.7
Stability of the participant characteristics over time	24	20
Stability of outcome coding over time	28	23.3
Limitations of the study
Discussion of limitations of the study	114	95
Data variability/appropriateness of number data points	31	25.8
Discussion direction/magnitude of any potential bias	48	42.1


Risk of biases of included studies

Risk of bias was low for 3 (2.5%) studies, moderate for 30 (25.0%), high for 64 (53.3%) and very high for 23 (19.2%). The proportion of studies with a low or moderate risk of bias did not significantly change over time (OR 1.15, 95% CI 0.96 to 1.40, p=0.14) (figure 5). Figure 6 shows the risk of bias across the seven EPOC standard criteria for assessing bias in ITS studies. Details on risk of bias assessment for each included study are provided elsewhere.6
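The percentages above follow directly from the reported counts, and the reported odds ratio can be given a rough consistency check. The sketch below is a back-of-envelope calculation only: it assumes the 95% CI is symmetric on the log-odds scale and uses a normal (Wald) approximation, so with rounded published figures it is expected to land near, not exactly on, the reported p value.

```python
import math

# Counts reported in the text for each risk-of-bias category.
counts = {"low": 3, "moderate": 30, "high": 64, "very high": 23}
total = sum(counts.values())
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}
high_or_very_high = round(100 * (counts["high"] + counts["very high"]) / total, 1)

# Approximate Wald check of the reported time-trend estimate.
# Assumption: the 95% CI is symmetric on the log-odds scale.
or_hat, ci_low, ci_high = 1.15, 0.96, 1.40
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
z = math.log(or_hat) / se_log_or
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(total, shares, high_or_very_high)
print(round(se_log_or, 3), round(z, 2), round(p_two_sided, 2))
```

Because the interval includes 1, the data are compatible with no change in the odds of a low or moderate risk of bias over time, which is how the result is interpreted in the text.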

Figure 5

Number and risk of bias of included studies over time. Our literature search end date was June 2018, and as such, studies published in 2018 that were captured by our search strategy may not have been representative of all studies published in 2018.

Figure 6

Summary of quality assessment of included studies. NA, not applicable. In this study, NA refers to studies with complete data.

Discussion

Over the past few decades, ITS has been an increasingly popular method to evaluate QI interventions in healthcare. We investigated whether the use of ITS in such evaluations has followed best practice standards and recommendations. Our findings showed variations in methodological considerations and reporting of ITS in QI. In particular, we found that most of the identified studies used ITS without a control and only about half of studies specified the ITS impact model and considered autocorrelation in the ITS analyses. Of concern, the risk of bias was high or very high in about three-quarters of included studies and did not appear to have changed significantly over time. While ITS is arguably one of the strongest quasi-experimental designs available, ITS studies may be subject to several threats to internal validity.10 18 22 These internal validity threats can often be minimised by adding a control time series group to ITS where feasible.10 18 22 24 However, this review found that a control group was used in less than 20% of included studies, with a location-based control being the most commonly used. When QI interventions were multisite and implemented at different times, the approaches used to assign controls were often not discussed clearly.
Most of the studies included in this review were from high-income countries; however, with expanding routine health information systems in LMICs, there is an opportunity to conduct well-designed ITS studies for evaluating healthcare interventions, including QI, in these settings.25 163
Consistent with previous systematic reviews, this review highlights variations in methodological considerations (or application) and reporting of ITS.19–21 For example, Jandoc et al’s (2015) systematic review of the application of ITS in drug utilisation research found that about three-quarters of studies examined drug utilisation over monthly intervals, and two-thirds used segmented regression techniques.20 They also found that key methodological considerations in ITS, including seasonality and non-stationarity, were considered in less than half of studies.20 However, autocorrelation was considered in two-thirds of studies included in Jandoc et al’s review, unlike in our review, where it was considered in only about half of studies.20 Our findings showed that considerations of autocorrelation, seasonality and non-stationarity were more likely to be reported when ARIMA was used than when segmented regression was used, which mirrors prior reviews in other subject areas.20 Similar to this review, a more recent review by Hudson et al (2019) examined characteristics and reporting of single ITS in healthcare in studies published in 2015 and found that only about half of studies considered autocorrelation, 28% considered seasonality and 10% considered non-stationarity.21 Consistent with previous reviews, this review showed that studies used various statistical approaches that may affect study results differently, highlighting a need for further research on how the various statistical analyses used in ITS can affect intervention estimates.20 21 While the number of ITS studies has increased over time, the conduct and reporting of ITS studies still leave room for improvement.
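To make the methodological considerations discussed above concrete, the following is a minimal sketch of a single-series segmented regression with an explicitly specified impact model (a level change plus a slope change at the intervention), followed by a Durbin-Watson statistic on the residuals as one common check for first-order autocorrelation. It is illustrative only: the data are simulated, all parameter values and function names (`fit_segmented`, `durbin_watson`, `solve`) are hypothetical, and it is not the analysis used by any study in this review. It does not address seasonality or non-stationarity.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_segmented(y, start):
    """Ordinary least squares fit of the impact model
    y_t = b0 + b1*t + b2*post_t + b3*(t - start)*post_t,
    where post_t = 1 from time index `start` onwards.
    Requires observations both before and after `start`.
    """
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, (t - start) * post])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, y)]
    return beta, resid


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest little
    first-order autocorrelation in the residuals."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    return num / sum(e * e for e in resid)


# Simulated monthly series (hypothetical values): 48 months, intervention
# at month 24, with AR(1) noise so that residuals are autocorrelated.
random.seed(0)
n_months, start = 48, 24
noise, y = 0.0, []
for t in range(n_months):
    noise = 0.6 * noise + random.gauss(0, 1)
    post = 1 if t >= start else 0
    y.append(50 + 0.2 * t - 5 * post - 0.3 * (t - start) * post + noise)

beta, resid = fit_segmented(y, start)
print("level change:", round(beta[2], 2), "slope change:", round(beta[3], 2))
print("Durbin-Watson:", round(durbin_watson(resid), 2))
```

In this sketch the ordinary least squares standard errors are deliberately not reported: when the Durbin-Watson statistic departs clearly from 2, those standard errors are unreliable, and a method that accounts for autocorrelation would be needed before drawing conclusions about statistical significance.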
Consistent with previous reviews, variations in methodological considerations and reporting of ITS remain a concern, warranting the development and reinforcement of formal reporting guidelines to improve the application of ITS in the evaluation of the effectiveness of healthcare interventions. Jandoc et al (2015) proposed methodological and reporting recommendations for ITS studies that built on STROBE, and Bernal (2018) proposed a framework for enhanced reporting of ITS that can be adapted to help improve the design and reporting of ITS in the evaluation of QI.20 23 Additionally, the Cochrane EPOC standard criteria for assessing risk of bias for ITS should be used to improve the design of ITS in the evaluation of QI to reduce risk of bias.17 27 Although these EPOC standard criteria do not provide guidance on how to account for key methodological considerations such as autocorrelation, non-stationarity, seasonality and multiple interventions, other resources are accessible to QI evaluators to help conduct ITS analyses appropriately, such as Wagner et al’s seminal paper on segmented regression analysis of ITS, Bernal et al’s tutorial on the use of ITS in the evaluation of public health interventions and Saeed et al’s work on the use of segmented generalised mixed effects models to evaluate health outcomes.18 26 31
This methodological systematic review has a number of limitations that need to be highlighted. First, our results are based on what was reported (or not) as performed in eligible studies; we did not reanalyse reported time series data to check whether the statistical analyses undertaken in these studies were performed appropriately. As such, it is possible that we underestimated the risk of bias, especially as related to whether the study data were analysed appropriately using time series techniques, which is one of the seven EPOC standard criteria for assessing risk of bias for ITS studies. Ramsay and colleagues reanalysed ITS study data from two systematic reviews and found that approximately 50% of studies that had reported a significant difference were not actually significant when ITS methods were applied appropriately.17 Second, although we did not apply any restrictions to our search strategy, we could still have missed some relevant studies. Lastly, while the recommended conventional double screening was employed in this study, data extraction was done by a single researcher.164 However, a validation on a random subsample consisting of 10% of eligible studies found very high agreement, making the data extraction less of a concern.
In conclusion, while the use of ITS in the evaluation of health system QI interventions has increased considerably over the past decade, the quality of ITS studies still shows room for improvement. Moreover, variations in methodological considerations and reporting of ITS in QI research remain a concern, warranting the development and reinforcement of formal reporting guidelines to improve the application of ITS.
