Literature DB >> 33762237

How well can we assess the validity of non-randomised studies of medications? A systematic review of assessment tools.

Elvira D'Andrea1, Lydia Vinals2, Elisabetta Patorno1, Jessica M Franklin1, Dimitri Bennett3,4, Joan A Largent5, Daniela C Moga6, Hongbo Yuan7, Xuerong Wen8, Andrew R Zullo9,10, Thomas P A Debray11,12, Grammati Sarri13.   

Abstract

OBJECTIVE: To determine whether assessment tools for non-randomised studies (NRS) address critical elements that influence the validity of NRS findings for comparative safety and effectiveness of medications.
DESIGN: Systematic review and Delphi survey.
DATA SOURCES: We searched PubMed, Embase, Google, bibliographies of reviews and websites of influential organisations from inception to November 2019. In parallel, we conducted a Delphi survey among the International Society for Pharmacoepidemiology Comparative Effectiveness Research Special Interest Group to identify key methodological challenges for NRS of medications. We created a framework consisting of the reported methodological challenges to evaluate the selected NRS tools.
STUDY SELECTION: Checklists or scales assessing NRS.
DATA EXTRACTION: Two reviewers extracted general information and content data related to the prespecified framework.
RESULTS: Of 44 tools reviewed, 48% (n=21) assessed multiple NRS designs, while other tools specifically addressed only case-control (n=12, 27%) or cohort studies (n=11, 25%). The response rate to the Delphi survey was 73% (35 out of 48 content experts), and consensus was reached in only two rounds. Most tools evaluated methods for selecting study participants (n=43, 98%), although only one addressed selection bias due to depletion of susceptibles (2%). Many tools addressed the measurement of exposure and outcome (n=40, 91%) and the measurement of, and control for, confounders (n=40, 91%). Most tools had at least one item/question on design-specific sources of bias (n=40, 91%), but only a few investigated reverse causation (n=8, 18%), detection bias (n=4, 9%), time-related bias (n=3, 7%), lack of new-user design (n=2, 5%) or active comparator design (n=0). Few tools addressed the appropriateness of statistical analyses (n=15, 34%), methods for assessing internal (n=15, 34%) or external validity (n=11, 25%), and statistical uncertainty in the findings (n=21, 48%). None of the reviewed tools investigated all the methodological domains and subdomains.
CONCLUSIONS: The acknowledgement of major design-specific sources of bias (eg, lack of new-user design, lack of active comparator design, time-related bias, depletion of susceptibles, reverse causation) and the statistical assessment of internal and external validity are currently not sufficiently addressed in most of the existing tools. These critical elements should be integrated to systematically investigate the validity of NRS on the comparative safety and effectiveness of medications.
SYSTEMATIC REVIEW PROTOCOL AND REGISTRATION: https://osf.io/es65q.


Keywords:  clinical pharmacology; epidemiology; public health; qualitative research; statistics & research methods

Year:  2021        PMID: 33762237      PMCID: PMC7993210          DOI: 10.1136/bmjopen-2020-043961

Source DB:  PubMed          Journal:  BMJ Open        ISSN: 2044-6055            Impact factor:   2.692


Strengths and limitations of this study

This is the first systematic review to investigate whether existing tools adequately assess the validity of non-randomised studies evaluating the comparative safety and effectiveness of medications. Assessment tools were identified by searching multiple sources: relevant databases, grey literature, websites of authoritative organisations, bibliographies of previous systematic reviews and experts’ suggestions. The prepiloted framework adopted to evaluate the completeness of the tools included all the main methodological challenges suggested by an interdisciplinary (academia, industry and government agencies) and international team of experts in the field of pharmacoepidemiology and healthcare outcomes research. Tools not published in English or that could not be retrieved were omitted from this systematic review. The search for tools in the grey literature might not be comprehensive, since it was performed through only one search engine.

Introduction

There are high expectations that real-world data (RWD) and the resultant real-world evidence (RWE) will become a key source of information for the development of pharmacological and biological therapies.1–3 The 21st Century Cures Act and the sixth Prescription Drug User Fee Act required the Food and Drug Administration (FDA) to explore the use of RWE and, consequently, of well-designed and well-conducted non-randomised studies (NRS) for expediting drug approvals.4 5 Similarly, one of the goals of the European Medicines Agency (EMA) Adaptive Pathways Initiative is to supplement clinical trial data with RWD and eventually produce RWE as part of the approval process for new medications or indications.6 However, the growing demand for RWD has raised concerns about the reliability of NRS for generating RWE. Because of the inherent limitations of observational analyses, the validity of NRS depends largely on the implementation of complex design and analytic methodologies. In recent reports, both the FDA and the EMA emphasised the need to plan and execute NRS following standards that can ensure the validity and reproducibility of RWE.7 8
Tools that assess the validity of NRS can be useful instruments both for researchers (eg, for authors and reviewers, to prevent publication of poor-quality pharmacoepidemiological research) and for other stakeholders involved in clinical, managerial or economic decision making (eg, to correctly inform guidelines and clinicians or to guide resource allocation). An analysis of the capability of existing tools to assess the validity of NRS of comparative safety and effectiveness of medications is currently lacking. Previously published systematic reviews of assessment tools for NRS were mostly descriptive and did not provide a critical evaluation of the tools' content,9–13 investigated only a specific type of bias14 or focused only on safety outcomes.15 We therefore conducted a systematic review to assess the content of eligible tools for NRS of medications.
There is no agreed assessment framework for NRS of pharmacological interventions. We therefore performed a Delphi survey among international experts in the fields of pharmacoepidemiology and health outcomes research in order to build consensus on the methodological challenges that may threaten the validity of NRS of medications and that should be evaluated by NRS assessment tools. The main objective of this study was to determine whether the retrieved NRS tools sufficiently address the main methodological challenges recommended by the experts. This study is part of a research project to develop a framework for the synthesis of NRS and randomised controlled trials (RCTs),16 led by the Comparative Effectiveness Research Special Interest Group (CER SIG) of the International Society for Pharmacoepidemiology (ISPE).

Methods

The systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement.17 Systematic review protocol and registration are available at https://osf.io/es65q.

Systematic search and eligibility criteria

We searched PubMed and Embase from inception to November 2019 to identify existing tools that investigated the validity of NRS, specifically case–control and cohort design studies. We excluded guidelines or manuals, tools to review study protocols, tools targeting NRS of non-pharmacological interventions (eg, surgery) or assessing only one or a few specific types of bias, and tools not available in English. In parallel, we searched the same electronic databases for systematic reviews of assessment tools of NRS. We then extracted the references of the tools included in the retrieved systematic reviews. We also performed a general search through Google for grey literature and reviewed any additional information from initiatives, programmes or organisations. Full details on the search strategy are reported in the supplement (online supplemental tables S1 and S2). Three reviewers (ED, GS and LV) independently removed duplicates and reviewed titles and abstracts of peer-reviewed publications or documents from the grey literature to select eligible tools. Discrepancies were resolved by consensus.

Delphi survey and prespecified framework

Concurrently, we performed a Delphi survey18 to reach a consensus among content experts about the main methodological challenges (domains) that may threaten the validity of NRS on comparative safety and effectiveness of medications. The survey is available in the online supplemental 2. The panel of experts comprised members of the ISPE CER SIG. Detailed information on the Delphi methods and results is reported in the online supplemental 1. Domains and subdomains indicated by the Delphi respondents as major elements that can impact the validity of NRS of medications were used to develop and pilot a framework to evaluate the identified NRS tools. All domains were considered equally important. A glossary of terms used in the framework is reported in table 1.
Table 1

Glossary of terms

Active comparator design: A study design that compares the effect of the drug of interest with another drug used in clinical practice, instead of with non-use.
Adjustment for causal intermediaries: Adjustment for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome.
Case–control design: A study design in which cases (patients with the outcome) are identified and compared with controls (patients without the outcome) with respect to the exposure of interest.
Cohort design: A study design in which a group of patients (a cohort) is identified and followed to ascertain the occurrence of an outcome.
Confounding: A mixing of effects that arises when patients with different baseline risks are compared; the resulting effect measure is a mix of drug effects and risk factor effects.
Depletion of susceptibles: Selection bias that occurs when the initiation of exposure to a drug is associated with an early increased incidence rate of the study outcome, followed by a decreased incidence rate with longer duration of exposure (eg, users of new drugs are compared with users of older drugs).
Detection or surveillance bias: Bias that occurs when the degree of outcome surveillance (or surveillance of an associated symptom) is related to exposure and is differential among the exposure groups.
Immortal time bias: Time-related bias that derives from including a period of follow-up during which, by design, outcomes cannot occur.
Time-window bias: Time-related bias, in the context of a case–control study nested in a cohort, that derives from the use of time windows of different lengths between cases and controls to define time-dependent exposures.
Incorrect outcome model specification: Misspecification of a statistical model that leads to biased outcome results. Common causes are omission of a relevant variable, inclusion of an unnecessary variable, adoption of the wrong functional form, incorrect specification of the error term, uncertainty about what the true model is and reciprocal causation.
Loss to follow-up bias: Bias that occurs when there are differences in retention during the follow-up period after enrolment that are related to exposure status and outcome.
New-user design: A study design that starts following patients at the time they initiate a new drug (also known as incident-user design).
Non-contemporaneous comparator bias: Bias generated when differences in the timing of selection of comparator group(s) within a study influence exposures and outcomes, resulting in biased estimates.
Reverse causation (or reverse causality): Bias due to a direction of cause and effect contrary to a common presumption, or a two-way causal relationship between exposure and outcome.
Recall bias: Bias that occurs when participants do not remember previous events or experiences accurately or omit details (not relevant for claims-based studies).
Selection bias: Bias that occurs when selection of participants or follow-up time is related to both intervention and outcome (eg, prevalent users of a drug are compared with non-users or incident users). Our framework has a separate subdomain that refers to selection bias due to lack of generalisability, applicability or transferability to patients who were excluded from the study.
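The time-related biases defined in the glossary can be made concrete with a small simulation. The sketch below is illustrative only (it is not from the paper, and all parameter values are arbitrary assumptions): every patient has the same outcome hazard, so the true rate ratio is 1, yet a naive "ever-treated" classification assigns the immortal time before treatment initiation to the exposed group and produces a spuriously protective estimate.

```python
import random

random.seed(1)

# Hypothetical cohort (all parameter values are arbitrary assumptions):
# everyone has the SAME outcome hazard, so the true rate ratio is 1.
n = 20000
follow_up = 5.0   # years of follow-up
hazard = 0.2      # outcome events per person-year, identical in both groups

naive_deaths = {True: 0, False: 0}
naive_time = {True: 0.0, False: 0.0}
true_deaths = {True: 0, False: 0}
true_time = {True: 0.0, False: 0.0}

for _ in range(n):
    death = random.expovariate(hazard)
    rx = random.uniform(0, 2 * follow_up)  # when a first prescription would be filled
    end = min(death, follow_up)
    died = death < follow_up
    treated = rx < end                     # 'ever treated': must survive until rx

    # Naive 'ever-treated' classification: all person-time of a treated patient
    # counts as exposed, including the immortal time before the prescription.
    naive_deaths[treated] += died
    naive_time[treated] += end

    # Correct person-time analysis: time before rx is unexposed.
    if treated:
        true_time[False] += rx
        true_time[True] += end - rx
        true_deaths[True] += died
    else:
        true_time[False] += end
        true_deaths[False] += died

naive_rr = (naive_deaths[True] / naive_time[True]) / (naive_deaths[False] / naive_time[False])
correct_rr = (true_deaths[True] / true_time[True]) / (true_deaths[False] / true_time[False])
print(f"naive rate ratio:   {naive_rr:.2f}")    # spuriously 'protective', well below 1
print(f"correct rate ratio: {correct_rr:.2f}")  # close to the true null of 1
```

The only difference between the two analyses is where the pre-treatment person-time is counted, which is exactly the flaw an assessment tool probing time-related bias would need to detect.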

Data extraction

Two reviewers (ED and LV) independently extracted general information of the identified tools (first author or name of the tool, year of publication or online availability of the most updated version, type of tool, scope of the tool, NRS designs evaluated and number of items) and content data related to the prespecified domains of the framework. Discrepancies were resolved by consensus. We categorised the tools as checklists, defined as itemised instruments (including questionnaires) developed to identify the presence or absence of critical elements, or rating scales, defined as itemised instruments aimed to identify the performance of a study at each critical element described in the tool, using a qualitative or quantitative scale.

Data synthesis

General characteristics of the identified tools were summarised with means and SDs for continuous variables and relative frequencies for categorical variables. The findings from the Delphi survey and the proportion of tools assessing the prespecified elements of the framework were reported as relative frequencies.

Results

Overview of tools

Of 44 tools that met our eligibility criteria,19–52 20 (45%) were identified through the database search of peer-reviewed literature and 24 (55%) through the general online search and other sources (online supplemental figure S1 and table S3). Characteristics of the tools are shown in tables 2 and 3. The number of items across all tools ranged from 5 to 54, with a median of 13.5 (IQR 10.3–22). Only three tools were designed to specifically address studies on the comparative safety and effectiveness of pharmacological interventions: one published in 1994 by Cho and Bero,46 and two published in 2014, the Good ReseArch for Comparative Effectiveness (GRACE) checklist and the International Society for Pharmacoeconomics and Outcomes Research – Academy of Managed Care Pharmacy – National Pharmaceutical Council (ISPOR-AMCP-NPC) tool.25 26 Nine tools from our bibliographic search provided two separate instruments to assess cohort or case–control studies; thus, the overall number of included records is 35, while the number of included assessment tools is 44.

Table 2

Individual characteristics of the tools included in the systematic review

*Tool name, or first author name if the tool does not have an assigned name and was published in a peer-reviewed journal.

†Tool developed to assess NRS on the comparative safety and effectiveness of medications.

CASP, Critical Appraisal Skills Programme; CC, case–control study; COEH, Centre for Occupational and Environmental Health of The University of Manchester; Coh, cohort study; CSS, cross-sectional study; EPHPP, Effective Public Health Practice Project Quality Assessment Tool; GRACE, The Good ReseArch for Comparative Effectiveness; HEBW, Health Evidence Bulletins Wales; ISPOR-AMCP-NPC, International Society for Pharmacoeconomics and Outcomes Research – Academy of Managed Care Pharmacy – National Pharmaceutical Council; JBI, The Joanna Briggs Institute; MMAT, Mixed Methods Appraisal Tool; NIH–NHLBI, National Institutes of Health – National Heart, Lung, and Blood Institute; NRS, non-randomised studies; RAMboMAN, GATE-EPIQ, Recruitment Allocation Maintenance blind objective Measurements Analyses, Graphic Approach To Epidemiology – Effective Practice, Informatics and Quality Improvement; RCTs, randomised controlled trials; RELEVANT, The REal Life EVidence AssessmeNt Tool; RoBANS, Risk of Bias Assessment tool for Non-randomized Studies; ROBINS-I, Risk Of Bias In Non-randomized Studies of Interventions; RTI-Item Bank, Research Triangle Institute Item Bank; SIGN, The Scottish Intercollegiate Guidelines Network; STROBE, STrengthening the Reporting of OBservational studies in Epidemiology; SURE, Specialist Unit for Review Evidence; TREND, Transparent Reporting of Evaluations with Non-randomized designs.

Table 3

General characteristics of the assessment tools included in the systematic review

*Two tools evaluated both cohort and RCTs together; one tool evaluated both cohort and cross-sectional studies together.

†NRS tools refer to a single tool built to evaluate both cohort and case–control studies or a tool built to evaluate additional NRS (eg, cross-sectional studies and before–after studies) together with cohort and case–control studies. Eight NRS tools also included the evaluation of RCTs.

CER, Comparative Effectiveness Research; NRS, non-randomised studies; RCTs, randomised controlled trials.

Tool formats and scopes

Most of the tools were checklists (n=35, 80%), and 13 of these checklists included a final section to elaborate a summary judgement of the study appraisal (37%). The remaining tools were scales (n=9, 20%), six of which provided a section for a summary judgement (67%). Thirty-five tools (80%) were designed as critical appraisal tools for different scopes (eg, assessing the quality of NRS included in a systematic review, screening eligible NRS for inclusion in systematic reviews supporting clinical guidelines, supporting peer-review processes or, more generally, allowing readers to interpret NRS results critically). Four tools (9%) were developed to assess the quality of reporting and were mainly intended for researchers. Five other tools (11%) combined elements of both critical appraisal and reporting quality and were aimed at a more general audience (both researchers and readers) (tables 2 and 3).

Study designs addressed

Twenty-one tools (48%) were developed to assess multiple NRS designs (11 targeted cohort and case–control studies, and 10 others also addressed other NRS designs or did not specify them). The remaining tools specifically addressed case–control (n=12, 27%) or cohort studies (n=11, 25%). Ten tools (23%) were also designed to assess RCTs.

Tool elements

The response rate of the Delphi survey was 73% (35 respondents out of 48 members). Detailed results are reported in the online supplemental figure S2. Domains and subdomains indicated by the respondents as major elements that can impact the validity of NRS of medications are reported in the first column of table 4.
Table 4

Methodological challenges addressed by the included assessment tools

Domains | Cohort tools*, n=11 | Case–control tools, n=12 | NRS tools†, n=21 | Total, n=44
1. Methods for selecting participants, n (%) | 11 (100) | 12 (100) | 20 (95) | 43 (98)
  Sampling strategies to correct selection bias | 4 (36) | 6 (50) | 9 (42) | 19 (43)
  Inclusion and exclusion criteria of target population | 6 (55) | 8 (67) | 13 (61) | 27 (61)
  Depletion of susceptibles | 1 (9) | 0 (0) | 0 (0) | 1 (2)
  External validity of target population | 6 (55) | 6 (50) | 9 (43) | 21 (48)
  Others‡ | 11 (100) | 12 (100) | 18 (86) | 41 (93)
2. Measurement of exposure, outcomes, covariates and follow-up, n (%)§ | 11 (100) | 12 (100) | 19 (90) | 42 (95)
  Measurement of exposure§ | 11 (100) | 11 (92) | 18 (81) | 40 (91)
  Measurement of outcomes§ | 11 (100) | 11 (92) | 18 (81) | 40 (91)
  Measurement of covariates | 4 (36) | 4 (33) | 4 (19) | 12 (27)
  Measurement of follow-up | 9 (82) | 3 (25) | 5 (24) | 17 (39)
3. Design-specific sources of bias, n (%) | 11 (100) | 10 (83) | 19 (90) | 40 (91)
  New-user design | 0 (0) | 0 (0) | 2 (10) | 2 (5)
  Active comparator design | 0 (0) | 0 (0) | 0 (0) | 0 (0)
  Immortal time bias or time-window bias | 0 (0) | 0 (0) | 3 (14) | 3 (7)
  Detection or surveillance bias | 1 (9) | 2 (17) | 1 (5) | 4 (9)
  Loss to follow-up bias | 9 (82) | 1 (8) | 12 (57) | 22 (50)
  Non-contemporaneous comparator bias | 0 (0) | 1 (8) | 5 (24) | 6 (14)
  Reverse causation | 5 (45) | 1 (8) | 2 (10) | 8 (18)
  Recall bias¶ | 1 (9) | 4 (33) | 1 (5) | 6 (14)
  Interviewer or observer bias¶ | 1 (9) | 3 (25) | 7 (35) | 11 (25)
  Ascertainment bias¶ | 0 (0) | 1 (8) | 1 (5) | 2 (5)
  General item/question on bias¶ | 3 (27) | 3 (25) | 3 (14) | 9 (20)
  Other biases** | 0 (0) | 2 (17) | 5 (24) | 7 (16)
4. Confounding, n (%) | 11 (100) | 11 (92) | 18 (86) | 40 (91)
  Study design used to minimise confounding | 6 (55) | 7 (58) | 13 (62) | 26 (59)
  Confounders measured and included in statistical analyses | 10 (91) | 10 (83) | 18 (86) | 38 (86)
  Potential unmeasured confounding addressed in the analysis (eg, proxy analysis and IV analysis) | 1 (9) | 1 (8) | 3 (14) | 5 (11)
5. Lack of appropriateness of statistical analyses (with specific mention of overadjustment and/or incorrect outcome model specification), n (%) | 2 (18) | 3 (25) | 10 (48) | 15 (34)
6. Methods for assessing statistical uncertainty in the findings (eg, CIs reported for each analysis), n (%) | 7 (64) | 6 (50) | 8 (38) | 21 (48)
7. Methods for assessing internal validity (eg, sensitivity analysis addressing potential confounding, measurement errors or other biases), n (%) | 3 (27) | 3 (25) | 9 (43) | 15 (34)
8. Methods for assessing external validity (eg, post hoc subgroup analysis and comparison with other populations), n (%) | 4 (36) | 3 (25) | 4 (19) | 11 (25)

*Two tools evaluated both cohort and RCTs together; one tool evaluated both cohort and cross-sectional studies together.

†NRS tools refer to a single tool built to evaluate both cohort and case–control studies or a tool built to evaluate additional NRS (eg, cross-sectional studies and before–after studies) together with cohort and case–control studies. Eight NRS tools included also the evaluation of RCTs.

‡'Others' refers to items not included in our evaluation framework but included in the reviewed tools to investigate selection bias (eg, population characteristics sufficiently described to determine the applicability of the research question, sample size justification and power description, and ethical considerations).

§Items or questions on exposure misclassification and/or outcome misclassification are counted in this domain and relative subdomains.

¶Design-specific biases not included in the evaluation framework but addressed by the reviewed tools.

**Other design-specific biases not included in the evaluation framework but addressed by a few tools (eg, bias due to missing data, patients' blinding, different length of follow-up between groups, Berkson’s bias and protopathic bias).

IV, instrumental variable; NRS, non-randomised studies; RCTs, randomised controlled trials.


Methods for selecting participants

Nearly all tools assessed methods for selecting study participants to correct selection bias (n=43, 98%). Specifically, almost half of the tools included items related to sampling strategies (n=19, 43%), the definition of inclusion and exclusion criteria (n=27, 61%) and the generalisability of participants (ie, attempts to achieve a sample of participants that represents the target population) (n=21, 48%), while only one tool addressed the depletion of susceptibles (n=1, 2%) (table 4 and online supplemental figure S3).

Measurement of exposure, outcomes, covariates and follow-up

Forty-two tools (95%) had at least one item assessing the definition and measurement of exposure, outcome, covariates and follow-up. Assessment of exposure and outcome was widely covered by the tools (n=40, 91%), while the definition and measurement of covariates (n=12, 27%) or follow-up (n=17, 39%) were less often addressed (except among tools for cohort studies only, which addressed follow-up more often: n=9, 82%) (table 4 and online supplemental figure S4).

Design-specific sources of bias

Design-specific sources of bias (excluding selection bias, which was investigated under 'Methods for selecting participants') were assessed by 91% of the tools (n=40) and generally included loss to follow-up bias (n=22, 50%), observer or interviewer bias (n=11, 25%), reverse causation bias (n=8, 18%), recall bias (n=6, 14%) and non-contemporaneous comparator bias (n=6, 14%). Few or no tools assessed detection or surveillance bias (n=4, 9%), time-related bias such as immortal time bias or time-window bias (n=3, 7%), and biases due to lack of a new-user design (n=2, 5%) or of an active comparator design (n=0). Other tools included only a general item/question on the risk of bias (n=9, 20%), without reference to a specific type of bias. Tools specifically for cohort studies addressed loss to follow-up (n=9, 82%) and reverse causation biases (n=5, 45%) more frequently than the other tools, while tools for case–control studies mostly addressed recall (n=4, 33%) and observer biases (n=3, 25%). Tools for multiple NRS commonly covered loss to follow-up (n=12, 57%) and interviewer or observer biases (n=7, 35%) (table 4 and online supplemental figure S5).

Confounding

Forty tools (91%) included at least one item or question related to confounding. Specifically, 26 tools (59%) examined whether the study design was planned to minimise confounding, 38 (86%) whether confounders were measured and included in the analyses, and only five (11%) whether potential unmeasured confounding was assessed in sensitivity analyses (table 4 and online supplemental figure S6).

Appropriateness of statistical analyses, external and internal validity

One-third of the tools (n=15, 34%) assessed the appropriateness of statistical analyses, although most did not explicitly mention overadjustment for causal intermediaries and/or incorrect outcome model specification. Almost half (n=21, 48%) included methods for measuring uncertainty in the findings. Few tools addressed methods for evaluating internal (n=15, 34%) or external validity (n=11, 25%) (table 4 and online supplemental figure S7). These results were mostly consistent across the three types of design addressed (cohort only, case–control only and multiple NRS), except for the assessment of follow-up (domain 2) and the several design-specific sources of bias (domain 3) already mentioned above (table 4). None of the reviewed tools covered all the main domains and subdomains identified by the CER SIG and listed in table 4. Results for each selected tool, with the proportions of items/questions investigating the prespecified domains, are shown in online supplemental figures S8–S11.

Discussion

In this systematic review, we identified assessment tools evaluating the validity of NRS on comparative safety and effectiveness of medications. Of 44 tools included, only three were specifically designed to assess NRS of pharmacological interventions.25 26 46

Main findings

Overall, we found that existing tools assessed most of the methodological challenges identified by the domains of the CER SIG framework, but critical elements were often insufficiently addressed. For example, although many tools assessed the risk of selection bias, only about half of them explicitly investigated sampling strategies and considered prespecification of inclusion/exclusion criteria. Even more surprising was that only one tool explored the potential for selection bias due to depletion of patients who are susceptible to the outcome. This cohort-based phenomenon can occur when the pool of new users of a medication is progressively depleted of subjects susceptible to the outcome, producing an increased incidence rate of the outcome in the early stage of exposure, followed by a decreased rate with longer duration of exposure.53 Depletion of susceptibles is an important source of bias to account for when evaluating the effects of new medications in incident users and can significantly undermine the validity of the results.53
Similarly, many tools investigated misclassification or information bias of exposure and outcome. However, only about one-quarter assessed the definition and measurement of covariates, and less than one-fourth of the tools for case–control and multiple NRS designs assessed the definition of follow-up. Again, these are common causes of bias and should be integrated in tools that investigate the validity of NRS.
Design-specific sources of bias were a critical domain. Although 91% of the tools overall had at least one item/question investigating biases due to an inappropriate study design, only Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) and the GRACE checklist addressed bias due to lack of a new-user design and time-related bias (ie, immortal time bias or time-window bias), while no tool investigated bias due to lack of an active comparator design.
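As a hedged illustration of the depletion-of-susceptibles pattern described above (not from the paper; the frailty mixture and hazard values below are arbitrary assumptions), a short simulation shows the signature incidence curve among new users of a drug: a high early event rate that falls as susceptible patients leave the cohort, which is why comparing early follow-up of new users with already-depleted long-term users of an older drug can be misleading even under a true null effect.

```python
import random

random.seed(2)

# Hypothetical new-user cohort: a 20% 'susceptible' (frail) subgroup with a high
# outcome hazard mixed with robust patients (all values are arbitrary assumptions).
n = 50000
p_susceptible = 0.2
hazard_susceptible, hazard_robust = 1.0, 0.05  # outcome events per person-year

def event_time():
    susceptible = random.random() < p_susceptible
    return random.expovariate(hazard_susceptible if susceptible else hazard_robust)

times = [event_time() for _ in range(n)]

# Incidence rate among new users in successive intervals since drug initiation.
rates = []
for lo, hi in [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0), (2.0, 4.0)]:
    person_time = sum(min(t, hi) - lo for t in times if t > lo)
    events = sum(1 for t in times if lo < t <= hi)
    rates.append(events / person_time)
    print(f"{lo}-{hi} years since initiation: {rates[-1]:.3f} events/person-year")

# The rate declines steadily as susceptibles are depleted, even though each
# individual's hazard is constant over time.
```

An assessment tool probing this bias would ask whether the comparator group sits at a comparable point on this curve (eg, via a new-user, active comparator design), rather than contrasting early new-user follow-up with depleted prevalent users.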
Since these biases can independently lead to major methodological flaws (defined as elements that by themselves can significantly compromise the validity of the results), their assessment must be included in appraisal tools for NRS of pharmacological interventions. For example, recent evidence on NRS of glucose-lowering medications reported that only one-fourth of the studies adopted a new-user design and less than half used an active comparator.54 In the same example, potential for time-related bias was detected in more than two-thirds of the studies.54 Integrating the evaluation of these major methodological flaws into existing tools, and recommending the use of these tools before publication, can increase awareness in the clinical research community of the main design-specific biases. This can ultimately decrease the number of NRS with invalid findings on the safety or effectiveness of medications.
A high percentage of tools evaluated whether confounders were appropriately measured, controlled for in the analysis and considered in the study design. However, very few tools included at least one item/question on whether potential unmeasured confounding had been considered in the analysis or interpretation of findings. One-third of the tools checked the appropriateness of statistical analyses, but most omitted specific reference to common flaws such as overadjustment or incorrect outcome model specification. Similarly, only one-third of the tools assessed internal validity (eg, through sensitivity analyses addressing potential confounding, measurement errors or other biases), and only one-fourth assessed external validity (eg, post hoc subgroup analyses and comparison with other populations).

Implications for practice and research

While recently published tools such as the Critical Appraisal Skills Programme checklist,21 ISPOR-AMCP-NPC,25 Recruitment Allocation Maintenance blind objective Measurements Analyses,19 GRACE26 and ROBINS-I24 are among the most complete, addressing several of the critical elements underlined by the ISPE CER SIG, they all had limitations in the acknowledgement of two or more major methodological challenges (eg, selection bias due to depletion of susceptibles, immortal time bias or time-window bias, lack of new-user design, lack of active comparator design, reverse causation bias and adjustment for causal intermediaries). Assessment tools can be powerful instruments for researchers, authors, reviewers of scientific journals and readers, helping to identify the main limitations of a study, to correctly interpret the results, to acknowledge major methodological flaws and, ultimately, to prevent publication of studies with invalid findings. Furthermore, other decision makers, such as clinicians, guideline developers and payers or investors, can benefit from instruments that help to ensure the validity of NRS findings.
RCTs can be an insufficient source of evidence for decisions on pharmaceutical interventions.55 56 Although well-designed and adequately powered RCTs are considered the 'gold standard' of the clinical research paradigm, they can often be too time- and resource-intensive.
Trials are often relatively small and focus on short-term efficacy and safety in a controlled clinical environment, using surrogate outcomes or under-representing the high-risk populations that are most likely to be the target of new medications in the real-world setting.55 56 Trials might also not record treatments taken outside the study protocol.47 Additionally, patients who volunteer to participate in a trial are usually highly motivated and therefore more adherent to therapy than the real-world population.56 NRS based on RWD can help to address these issues and can supplement the evidence from RCTs to provide a more complete picture of the effectiveness of pharmaceutical interventions in less controlled environments. NRS have the advantage of investigating large-scale populations, high-risk subpopulations, rare exposures, diseases or outcomes, and long-term outcomes or other delayed health effects rapidly and at low cost.55 56 Moreover, since RWD are often collected for purposes unrelated to research objectives (mainly administrative), biases such as recall bias, interviewer bias, non-response bias and bias from loss to follow-up are reduced or eliminated.55 Thus, since RWE derived from NRS contributes significantly to the evidence base for comparative effectiveness research on medications, our synthesis can help numerous stakeholders evaluate whether the NRS under consideration are valid enough to guide decision making. Although checklists have previously been suggested for reviewing the risk of bias of general NRS,57 we cannot strongly recommend a specific tool for NRS on comparative analyses of medications. As already mentioned, items or questions that address all these methodological flaws must be integrated into the existing tools. Based on our findings, the most recent and comprehensive tools, such as ROBINS-I24 and GRACE26, assessed a higher number of major methodological elements and could therefore be prioritised in this endeavour.

Strengths and limitations

To our knowledge, this is the first systematic review to investigate whether existing tools adequately assess the validity of cohort and case–control studies evaluating the comparative safety and effectiveness of medications. Previously published systematic reviews of assessment tools for NRS were not specifically focused on pharmacological interventions,9 10 included randomised study designs11–13 or investigated only a specific type of bias.14 One systematic review of NRS tools for medications focused only on safety outcomes and, having been published in 2012, is now outdated.15 Our systematic review has multiple strengths: the authors reviewed the search results independently following a predefined protocol; the framework for data extraction was developed based on input from worldwide experts in pharmacoepidemiology and healthcare outcomes research with different backgrounds (academia, industry and governmental agencies) and from different countries; and the review included the most updated versions of the identified tools. This review also has limitations. The search for tools in the grey literature might not be comprehensive, since it was performed through only one search engine. The search was also restricted to tools published in English and excluded identified tools that could not be retrieved.

Conclusion

In this systematic review, we found that available tools for NRS assessment fail to provide a comprehensive evaluation of the major methodological aspects that can affect the validity of NRS on the comparative safety and effectiveness of medications. Specifically, major aspects such as lack of a new-user design, lack of an active comparator design, time-related bias (ie, immortal time bias and time-window bias) and statistical assessment of internal validity remain poorly covered. Incorporating these critical elements into existing tools may provide a more accurate instrument to evaluate NRS of pharmacological interventions and increase awareness in the clinical research community about major addressable flaws in pharmacoepidemiology. This may improve the validity of NRS on the comparative safety and effectiveness of medications and reduce the publication of studies with unreliable findings.
Table 2

Individual characteristics of the tools included in the systematic review

| Tool identified* | Year | Type of tool | Scope of the tool | Study design evaluated | Items |
| --- | --- | --- | --- | --- | --- |
| RELEVANT | 2019 | Checklist | Critical appraisal and reporting | NRS | 21 |
| RAMboMAN - GATE-EPIQ | 2019 | Rating scale+summary judgement | Critical appraisal | Coh (+RCTs), CC | Coh (+RCTs) 21, CC 18 |
| MMAT | 2018 | Checklist | Critical appraisal | NRS | 5 |
| CASP | 2018 | Checklist | Critical appraisal | Coh, CC | Coh 12, CC 11 |
| SURE | 2018 | Checklist+summary judgement | Critical appraisal | Coh, CC | Coh 13, CC 11 |
| JBI | 2017 | Checklist+summary judgement | Critical appraisal | Coh, CC | Coh 11, CC 10 |
| ROBINS-I | 2016 | Checklist+summary judgement | Critical appraisal | NRS | 34 (+8 optional questions) |
| ISPOR-AMCP-NPC† | 2014 | Checklist+summary judgement | Critical appraisal | Coh, CC | 32 |
| GRACE† | 2014 | Checklist+summary judgement | Critical appraisal | Coh, CC | 11 |
| NIH–NHLBI | 2014 | Checklist+summary judgement | Critical appraisal | Coh (+CSS), CC | Coh (+CSS) 14, CC 12 |
| HEBW | 2014 | Checklist+summary judgement | Critical appraisal | Coh | 18 |
| RoBANS | 2013 | Rating scale | Critical appraisal | NRS | 6 |
| RTI-Item Bank | 2013 | Checklist | Critical appraisal | NRS | 13 |
| Newcastle-Ottawa | 2013 | Rating scale+summary judgement | Critical appraisal | Coh, CC | Coh 8, CC 8 |
| SIGN - V.3.0 | 2012 | Checklist+summary judgement | Critical appraisal | Coh, CC | Coh 14, CC 11 |
| Montreal | 2011 | Checklist | Critical appraisal | Coh, CC (+RCTs) | 10 |
| EPHPP | 2011 | Rating scale | Critical appraisal | Coh, CC (+RCTs) | 17 |
| STROBE – V.4 | 2007 | Checklist | Reporting | Coh, CC | Coh 22, CC 22 |
| TREND | 2004 | Checklist | Reporting | NRS | 22 |
| Margetts | 2002 | Checklist | Reporting | Coh, CC | 11 |
| Zaza | 2000 | Checklist | Critical appraisal | Coh, CC | 15 |
| Downs-Black | 1998 | Rating scale | Critical appraisal and reporting | Coh, CC (+RCTs) | 27 |
| Elwood | 1998 | Checklist | Critical appraisal | Coh, CC (+RCTs) | 20 |
| Hadorn | 1996 | Checklist | Critical appraisal | Coh (+RCTs) | 7 |
| London | 1996 | Checklist | Critical appraisal | Coh, CC | 33 |
| Avis | 1994 | Rating scale+summary judgement | Critical appraisal and reporting | Coh, CC (+RCTs) | 24 |
| Durant | 1994 | Checklist | Critical appraisal | CC | 23 |
| Levine | 1994 | Checklist | Critical appraisal | Coh, CC (+RCTs) | 10 |
| Gyorkos | 1994 | Checklist | Critical appraisal | Coh, CC | Coh 6, CC 5 |
| Cho† | 1994 | Rating scale+summary judgement | Critical appraisal | NRS (+RCTs) | 24 |
| COEH | 1991 | Checklist | Critical appraisal | NRS | 54 |
| Fowkes-Fulton | 1991 | Checklist+summary judgement | Critical appraisal and reporting | NRS (+RCTs) | 6 |
| Lichtenstein | 1987 | Checklist | Critical appraisal | CC | 20 |
| Gardner | 1986 | Checklist | Critical appraisal | NRS | 12 |
| Horwitz | 1979 | Checklist | Critical appraisal and reporting | CC | 12 |

Nine tools from our bibliographic search provided two separate instruments to assess cohort or case–control studies. Thus, the overall number of included records is 35, while the number of included assessment tools is 44.

*Tool name, or first author name if the tool does not have an assigned name and was published in a peer-reviewed journal.

†Tool developed to assess NRS on the comparative safety and effectiveness of medications.

CASP, Critical Appraisal Skills Programme; CC, case–control study; COEH, Centre for Occupational and Environmental Health of The University of Manchester; Coh, cohort study; CSS, cross-sectional study; EPHPP, Effective Public Health Practice Project Quality Assessment Tool; GRACE, The Good ReseArch for Comparative Effectiveness; HEBW, Health Evidence Bulletins Wales; ISPOR-AMCP-NPC, International Society for Pharmacoeconomics and Outcomes Research – Academy of Managed Care Pharmacy – National Pharmaceutical Council; JBI, The Joanna Briggs Institute; MMAT, Mixed Methods Appraisal Tool; NIH–NHLBI, The National Institutes of Health – National Heart, Lung, and Blood Institute; NRS, non-randomised studies; RAMboMAN - GATE-EPIQ, Recruitment Allocation Maintenance blind objective Measurements Analyses – Graphic Approach To Epidemiology, Effective Practice, Informatics and Quality Improvement; RCTs, randomised controlled trials; RELEVANT, The REal Life EVidence AssessmeNt Tool; RoBANS, Risk of Bias Assessment tool for Non-randomized Studies; ROBINS-I, Risk Of Bias In Non-randomized Studies of Interventions; RTI-Item Bank, Research Triangle Institute Item Bank; SIGN, The Scottish Intercollegiate Guidelines Network; STROBE, STrengthening the Reporting of OBservational studies in Epidemiology; SURE, Specialist Unit for Review Evidence; TREND, Transparent Reporting of Evaluations with Non-randomized designs.

Table 3

General characteristics of the assessment tools included in the systematic review

| Characteristics | All, n=44 | Cohort*, n=11 | Case–control, n=12 | NRS†, n=21 |
| --- | --- | --- | --- | --- |
| Publication year, n (%) | | | | |
|  1979–1989 | 3 (7) | 0 (0) | 2 (17) | 1 (5) |
|  1990–1999 | 12 (27) | 2 (18) | 2 (17) | 8 (38) |
|  2000–2009 | 5 (11) | 1 (9) | 1 (8) | 3 (14) |
|  2010–2019 | 24 (55) | 8 (73) | 7 (58) | 9 (43) |
| Type of tool, n (%) | | | | |
|  Checklist | 22 (50) | 4 (36) | 6 (50) | 12 (57) |
|  Checklist+summary judgement | 13 (30) | 5 (45) | 4 (33) | 4 (19) |
|  Rating scale | 3 (7) | 0 (0) | 0 (0) | 3 (14) |
|  Rating scale+summary judgement | 6 (14) | 2 (18) | 2 (16) | 2 (9) |
| Scope of the tool, n (%) | | | | |
|  Critical appraisal | 35 (80) | 9 (81) | 10 (83) | 16 (76) |
|  Reporting | 4 (9) | 2 (18) | 1 (8) | 1 (5) |
|  Critical appraisal and reporting | 5 (11) | 0 (0) | 1 (8) | 4 (19) |
| Tools designed for CER, n (%) | 3 (7) | 0 (0) | 0 (0) | 3 (14) |
| Number of items, median (IQR) | 13 (10.3–21.8) | 13 (9.5–16) | 11.5 (10.8–18.5) | 17 (11–24) |

*Two tools evaluated both cohort and RCTs together; one tool evaluated both cohort and cross-sectional studies together.

†NRS tools refer to a single tool built to evaluate both cohort and case–control studies, or a tool built to evaluate additional NRS (eg, cross-sectional studies and before–after studies) together with cohort and case–control studies. Eight NRS tools also included the evaluation of RCTs.

CER, Comparative Effectiveness research; NRS, non-randomised studies; RCTs, randomised controlled trials.

References (35 in total)

1. Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs. 2000.

2. Des Jarlais DC, Lyles C, Crepaz N. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health. 2004.

3. Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998.

4. Kim SY, Park JE, Lee YJ, et al. Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity. J Clin Epidemiol. 2013.

5. Berger ML, Martin BC, Husereau D, et al. A questionnaire to assess the relevance and credibility of observational studies to inform health care decision making: an ISPOR-AMCP-NPC Good Practice Task Force report. Value Health. 2014.

6. Yuan H, Ali MS, Brouwer ES, et al. Real-world evidence: what it is and what it can tell us according to the International Society for Pharmacoepidemiology (ISPE) Comparative Effectiveness Research (CER) Special Interest Group (SIG). Clin Pharmacol Ther. 2018.

7. Quigley JM, Thompson JC, Halfpenny NJ, Scott DA. Critical appraisal of nonrandomized studies: a review of recommended and commonly used tools. J Eval Clin Pract. 2018.

8. Sterne JAC, Hernán MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016.

9. Katrak P, Bialocerkowski AE, Massy-Westropp N, Kumar S, Grimmer KA. A systematic review of the content of critical appraisal tools. BMC Med Res Methodol. 2004.

10. Baumfeld Andre E, Reynolds R, Caubel P, Azoulay L, Dreyer NA. Trial designs using real-world data: the changing landscape of the regulatory approval process. Pharmacoepidemiol Drug Saf. 2019.

