Appraisal tools for clinical practice guidelines: a systematic review.

Ulrich Siering¹, Michaela Eikermann², Elke Hausner¹, Wiebke Hoffmann-Eßer¹, Edmund A Neugebauer².

Abstract

INTRODUCTION: Clinical practice guidelines can improve healthcare processes and patient outcomes, but are often of low quality. Guideline appraisal tools aim to help potential guideline users in assessing guideline quality. We conducted a systematic review of publications describing guideline appraisal tools in order to identify and compare existing tools.
METHODS: We searched MEDLINE, EMBASE and the Cochrane Database of Systematic Reviews, among other databases, from 1995 to May 2011 for relevant primary and secondary publications. We also handsearched the reference lists of relevant publications. On the basis of the available literature we first generated 34 items to be used in the comparison of appraisal tools and grouped them into thirteen quality dimensions. We then extracted formal characteristics as well as questions and statements of the appraisal tools and assigned them to the items.
RESULTS: We identified 40 different appraisal tools. They covered between three and thirteen of the thirteen possible quality dimensions and between three and 29 of the possible 34 items. The main focus of the appraisal tools was on the quality dimensions "evaluation of evidence" (mentioned in 35 tools; 88%), "presentation of guideline content" (34 tools; 85%), "transferability" (33 tools; 83%), "independence" (32 tools; 80%), "scope" (30 tools; 75%), and "information retrieval" (29 tools; 73%). The quality dimensions "consideration of different perspectives" and "dissemination, implementation and evaluation of the guideline" were covered by only twenty (50%) and eighteen tools (45%) respectively.
CONCLUSIONS: Most guideline appraisal tools assess whether the literature search and the evaluation, synthesis and presentation of the evidence in guidelines follow the principles of evidence-based medicine. Although conflicts of interest and norms and values of guideline developers, as well as patient involvement, affect the trustworthiness of guidelines, they are currently insufficiently considered. Greater focus should be placed on these issues in the further development of guideline appraisal tools.

Year: 2013 PMID： 24349397 PMCID： PMC3857289 DOI： 10.1371/journal.pone.0082915

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Clinical practice guidelines (hereafter referred to as “guidelines”) are defined by the Institute of Medicine as “statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options” [1]. Beyond that, guidelines are used for a variety of purposes, for example, as a means to measure and improve the quality of care, to resolve malpractice claims, to contribute to the development of clinical decision aids or to support policy makers in the allocation of healthcare resources [1]. There is evidence to suggest that, when rigorously developed, guidelines have the power to translate the complexity of scientific research findings and other evidence into recommendations for healthcare action [2-5]. Several studies have shown that guidelines can improve healthcare processes and patient outcomes. Grimshaw, Eccles, and Tetroe 2004 conducted a systematic review of the effectiveness and costs of various guideline development, dissemination and implementation strategies. The majority (86.6%) of the 235 studies included in their review reported improvements in health care [6,7]. Two other systematic reviews reported similar results [8,9]. However, all of the authors noted that the studies included were of low methodological quality. The AGREE Collaboration defines guideline quality as “the confidence that the potential biases of guideline development have been addressed adequately and that the recommendations are both internally and externally valid, and are feasible for practice” [10]. This definition has been widely adopted in the scientific literature [11,12]. Studies investigating the methodological quality of guidelines have often reported low quality and no, or only modest, improvement in quality over time [13-17]. 
Potential deficits of guidelines include: conflicting recommendations [18-26]; insufficient consideration of relevant patient characteristics (e.g., multimorbidity or ethnic differences) [27-30]; low quality of the evidence underlying the recommendations [31-35]; lack of transparency of methods applied by guideline developers, especially concerning the derivation of recommendations and the determination of their strength [1]; and inadequate management of potential conflicts of interest [36-41]. Several groups, such as the Guidelines International Network [42], the Institute of Medicine [1], the World Health Organization [43], the National Institute for Health and Clinical Excellence [44], the Scottish Intercollegiate Guidelines Network [45], many medical societies [46-51], as well as individual experts in the field [12,52-55], have proposed manuals defining standards for guideline developers in order to increase guideline quality. Overall, these manuals address the following key elements in the development process: establishment of a multidisciplinary guideline development group, consumer involvement, identification of clinical questions or problems, conduct of systematic searches and appraisal of the evidence retrieved, procedures for drafting recommendations, external consultation, and ongoing reviewing and updating [56]. Parallel to the production of manuals for the development of high-quality guidelines, tools for their appraisal have been developed. These tools aim to help potential guideline users to assess guideline quality. The AGREE II Instrument – the guideline appraisal tool used most often internationally – contains questions covering the areas (1) scope and purpose, (2) stakeholder involvement, (3) rigour of development, (4) clarity of presentation, (5) applicability, and (6) editorial independence [57]. Graham 2000 identified and compared guideline appraisal tools in a systematic review [58], which was updated by Vlayen in 2005 [59]. 
Vlayen identified 24 different tools containing questions that could be grouped into ten quality dimensions with 50 different items. Four of the 24 tools covered all of the guideline dimensions, but only four were validated and none assessed the evidence base of the clinical content of the guidelines. The authors stated that “the results of the search for evidence, the correct use of inclusion and exclusion criteria, and the critical appraisal of the retrieved evidence are not validated. Therefore, a major conclusion of this review is that in order to evaluate the quality of the clinical content and more specifically the evidence base of a clinical practice guideline, verification of the completeness and the quality of the literature search and its analysis has to be added to the process of validation by an appraisal instrument.” The aims of this systematic review were to identify and compare existing guideline appraisal tools to see if the landscape of tools had changed. This comparison can then be used to support decision-making by clinicians, patients and policy makers concerning the selection of the most appropriate tool, as well as to identify potential for improvement.

Methods

We searched for relevant primary and secondary publications (systematic and narrative reviews) in MEDLINE, EMBASE, the Cochrane Database of Systematic Reviews (Cochrane Reviews), the Database of Abstracts of Reviews of Effects (Other Reviews), the Health Technology Assessment Database (Technology Assessments), the NHS Economic Evaluation Database, and the Cochrane Methodology Register. The systematic search was limited to publications in German and English published after 1994. The search in all databases was performed in May 2011. The search strategy included, among others, the search terms “guideline”, “appraisal”, “guideline adherence”, “quality”, “evidence based” and “evaluation”. The full search strategy, which was developed by an information specialist (EH), is attached to this publication as online File S1. In addition, we scrutinized the reference lists of the relevant primary and secondary publications retrieved in the above search to identify further publications. We included articles that met both of the following criteria: (1) the publication described the most recent version of an appraisal tool for clinical guidelines; (2) a full-text document was available (e.g., journal article or internet file). We excluded articles that only described the content of guidelines, the guideline development process or the application of an appraisal tool already identified in another publication. Two reviewers (US, WHE) independently screened titles and abstracts of the retrieved citations to identify potentially eligible primary and secondary publications. The full texts were obtained and independently evaluated by the same two reviewers. Disagreements were resolved by consensus. Since the primary aim of this review was to identify existing guideline appraisal tools and to describe and compare their formal and content characteristics, no risk of bias assessment was conducted for the publications included. The content analysis was a two-stage process. 
The first stage involved the generation of items to be used in the comparison of appraisal tools by compilation of a list of all questions and statements from each of the tools included. These were grouped into common questions and statements and assigned to an item label. The items were then assigned to broader common categories, named quality dimensions, which were largely derived from Cluzeau et al. 1999 [60], Graham 2000 [58] and Vlayen 2005 [59]. The individual steps of the content analysis procedures were always conducted by one person (US) and checked by another (WHE). Disagreements were resolved by consensus. We identified 34 individual items and assigned them to thirteen quality dimensions (see Table 1 for detailed definitions).

Table 1

Quality dimensions and items for guideline appraisal.

Quality dimensions / Item label	Definition
1. Information retrieval
Health questions and outcomes	Description of clinical health questions and relevant outcomes of the guideline
Literature search	Search for literature and other evidence
Literature selection	Criteria used to include and exclude literature and other evidence
2. Evaluation of evidence
Grading of evidence	Grading of the evidence, which may or may not include a statement about the strength of evidence (LoE)
Consistency between evidence and recommendations	Study results are reported correctly in the guideline and support the recommendations
3. Consideration of different perspectives
Norms and values	Discussion of influence of norms and values on guideline development
Expert knowledge	Evaluation of expert opinion and clinical experience
Patient perspectives	Consideration of views and preferences of the target population in the guideline development process
4. Formulation of recommendations
Formulation of recommendations	Methods used in formulating recommendations which may or may not include a statement about the strength of recommendations (GoR)
5. Transferability
Comparability	Patients, interventions and settings in the studies were comparable to those targeted by the recommendations
Costs	Consideration of resource implications of applying the recommendations
Barriers and facilitators	Description of barriers and facilitators to guideline application (compatibility of guideline with local norms and values; professional’s training, skill, and experience; availability of drugs or technology; local adaptation or modification of the guideline)
6. Presentation of guideline content
Benefits and harms	Presentation of health benefits, side effects, and harms of the recommended action
Link to evidence	Explicit link between the recommendations and the supporting evidence
7. Alternatives
Options for management	Presentation of alternative options for management of the condition or health issues
Exceptions	Description of situations in which guidelines may not apply
Patient preferences	Consideration of patient preferences in the application of guideline recommendations
8. Reliability
Independent Review	External peer review before publication
Pilot test	Pilot test of the guideline prior to release
9. Scope
Rationale and objective	Description of the rationale or reason for guideline development and description of the goal or objective of the guideline
Guideline topic	Topic, or health problem, or technology dealt with
Practice setting	Practice setting for which the guideline is intended
Patient population	Patient population for whom the guideline is intended
Provider population	Group of health care providers for whom the guideline is intended
10. Independence
Guideline development group	Individuals and/or disciplines, or occupations represented in the guideline development group and their function in the group
Guideline development organization and funding	Organization or group who developed the guideline and sources of funding
Conflicts of interest	Consideration of (potential) conflicts of interest related to the individuals developing the guideline
11. Clarity and presentation
Clarity	Clear wording of the guideline and the recommendations
Presentation	Easily identifiable recommendations (e.g., summarized in a box, bold text, underlined). Graphical description of the stages and decisions in clinical care (clinical algorithm).
12. Updating
Currentness	Currentness of the evidence of the guideline; date of issue of the guideline and/or date the guideline becomes invalid
Scheduled review	Procedure for updating the guideline
13. Dissemination, Implementation, Evaluation
Dissemination	Distribution of the guideline to intended users
Implementation	Strategies to implement the guideline
Evaluation	Evaluation of the guideline and the adherence to the guideline once it has been implemented

For the second stage of the analysis, we (US, WHE) extracted the following information from each publication: (1) Formal characteristics of the appraisal tool. These included language, the use of existing appraisal tools for tool development, number of items and domains, possible answers, number of appraisers, calculation of domain scores and overall assessment, information on the development and validation of the appraisal tool, as well as publication in a journal. (2) Questions and statements of the appraisal tools. One reviewer (US) then assigned the questions and statements to the items identified during the first stage of the content analysis. A second reviewer (WHE) confirmed this step by once again checking the questions of each appraisal tool and the items to which they had been assigned. Disagreements were resolved by consensus. The numbers of quality dimensions and items covered by each appraisal tool were then compared. The review was not registered in advance, nor has a review protocol been published.
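
The counting of dimensions and items covered by a tool can be illustrated with a short sketch. The Python code below is not part of the original review: it transcribes the quality dimensions and item labels of Table 1 into a dictionary and counts how many dimensions and items a tool covers. The function name `coverage` and the example tool with its three items are hypothetical and serve only as an illustration.

```python
# Quality dimensions and item labels, transcribed from Table 1.
QUALITY_DIMENSIONS = {
    "Information retrieval": [
        "Health questions and outcomes", "Literature search", "Literature selection"],
    "Evaluation of evidence": [
        "Grading of evidence", "Consistency between evidence and recommendations"],
    "Consideration of different perspectives": [
        "Norms and values", "Expert knowledge", "Patient perspectives"],
    "Formulation of recommendations": ["Formulation of recommendations"],
    "Transferability": ["Comparability", "Costs", "Barriers and facilitators"],
    "Presentation of guideline content": ["Benefits and harms", "Link to evidence"],
    "Alternatives": ["Options for management", "Exceptions", "Patient preferences"],
    "Reliability": ["Independent Review", "Pilot test"],
    "Scope": [
        "Rationale and objective", "Guideline topic", "Practice setting",
        "Patient population", "Provider population"],
    "Independence": [
        "Guideline development group",
        "Guideline development organization and funding",
        "Conflicts of interest"],
    "Clarity and presentation": ["Clarity", "Presentation"],
    "Updating": ["Currentness", "Scheduled review"],
    "Dissemination, Implementation, Evaluation": [
        "Dissemination", "Implementation", "Evaluation"],
}


def coverage(covered_items):
    """Return (number of dimensions covered, number of items covered).

    A dimension counts as covered if at least one of its items is covered.
    """
    covered = set(covered_items)
    known = {item for items in QUALITY_DIMENSIONS.values() for item in items}
    unknown = covered - known
    if unknown:
        raise ValueError(f"items not in Table 1: {sorted(unknown)}")
    dimensions = [name for name, items in QUALITY_DIMENSIONS.items()
                  if covered & set(items)]
    return len(dimensions), len(covered)


# Hypothetical tool with one question for each of three items (illustration only).
example_tool = ["Literature search", "Grading of evidence", "Conflicts of interest"]
print(coverage(example_tool))
```

Counting a dimension as covered when at least one of its items is addressed mirrors the wording used in the Results ("covered at least twelve quality dimensions with at least one item").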

Results

Selection of publications

We retrieved 5164 references from bibliographic databases and screened 446 full texts. In addition, we retrieved 62 further publications from the reference lists of the relevant primary and secondary publications. We identified a total of 42 eligible publications describing 40 different guideline appraisal tools (Figure 1). Excluded publications are listed in online File S2. Relevant secondary publications are listed in online File S3.

Figure 1

Flow chart for selection of appraisal tools.

Description of Appraisal Tools

Table 2 shows the main formal characteristics of the 40 appraisal tools considered. 38 were published in English and two in German. 26 named at least one other publication that had influenced their development, and ten named the AGREE Instrument [10]; other publications mentioned included those by Hayward 1995, Wilson 1995 and Field 1992 [61-63].

Table 2

Formal characteristics of guideline appraisal tools.

Appraisal tool	Language	Based on	Additional information on development of appraisal tool	Generic appraisal tool^a	Subject of assessment	Number of questions (Domains)	Explanation of questions	Rating scale / Multiple choice / Additional comments	Number of appraisers	Domain scores / Overall assessment	Validation	Publication in journal
ADAPTE 2009 [64]	EN	n. s.	yes	yes	GL / Rec.	43 (3)	SE	no / yes / no	n. s.	no / no	yes	no
AGREE II 2009 [57]	EN	[10]	yes	yes	GL	23 (6)	SE	yes / no / yes	2, better 4	yes / yes	yes	no
APWCA 2010 [81]	EN	n. s.	no	yes	GL	11 (-)	no	no / no / no	n. s.	no / no	no	yes
APA 2002 [97]	EN	n. s.	no	yes	GL	47 (21)	SE	no / no / no	n. s.	no / no	no	yes
Baxter 2003 [82]	EN	[61,91,113]	no	yes	GL	12 (4)	SE	no / no / no	n. s.	no / no	no	yes
BÄK 1997 [83]	GE	n. s.	no	yes	GL	12 (-)	SE	no / no / no	n. s.	no / no	no	yes
Calder 1997 [107]	EN	[60-62,116,117]	no	yes	GL	26 (5)	no	no / yes / no	2	no / no	no	yes
Chong 2009 [66]	EN	[10,13]	no	yes	GL	11 (2)	no	yes^b / yes^c / no	2	no / no	yes	yes
Chou 2008 [84,85]	EN	[10,14,118]	no	yes	GL	26 (5)	SE	no / no / no	n. s.	no / no	no	yes
Cluzeau 1999 [60]	EN	[63]	no	yes	GL	37 (3)	no	no / yes / no	n. s.	yes / yes	yes	yes
Cook 1998 [119]	EN	[61,62]	no	yes	GL	9 (3)	SE	no / no / no	n. s.	no / no	no	yes
DELBI 2008 [65]	GE	[10]	yes	yes	GL	34 (8)	CE	yes / no / no	2, better 4	yes / no	no	yes
Fields 2000 [86]	EN	[61,62]	no	yes	GL	8 (-)	no	no / no / no	n. s.	no / no	no	yes
Foy 2002 [87]	EN	[120,121]	yes	yes	Rec.	13 (-)	no	yes / no / no	n. s.	no / no	no	yes
Fretheim 2002 [98]	EN	[10,122]	no	yes	GL	8 (-)	CE	no / no / no	2	no / no	no	yes
GLIA 2011 [67]	EN	[10,13,60,63,120,121,123,124]	yes	yes	GL / Rec.	30 (9)	no	no / yes / yes	2	no / no	yes	no
Grilli 2000 [14]	EN	n. s.	no	yes	GL	3 (-)	CE	no / yes / no	2	no / no	yes	yes
Guyatt 2002 [88]	EN	n. s.	no	yes	GL	4 (-)	SE	no / no / no	n. s.	no / no	no	yes
Hargrove 2008 [108]	EN	[10,88,125-128]	no	yes	GL	18 (3)	SE	no / yes / yes	3	no / no	yes	yes
Hart 2002 [99]	EN	[10,13,14]	no	yes	GL	9 (-)	no	yes / no / no	n. s.	no / no	no	yes
Hasenfeld 2003 [100]	EN	[13]	no	yes	GL	30 (-)	no	no / yes / no	2	no / no	no	yes
Hayward 1995 [61,62]	EN	[129,130]	no	yes	Rec.	10 (3)	SE	no / no / no	n. s.	no / no	no	yes
Hindley 2005 [101]	EN	[10]	yes	yes	GL	18 (12)	no	yes / no / no	at least 2	no / yes	yes	yes
Kulig 2003 [102]	EN	[131]	yes	yes	GL	13 (3)	no	yes / no / no	2	yes / yes	yes	yes
Liddle 1996 [89]	EN	n. s.	yes	yes	GL / Rec.	14 (3)	SE for 1 question	for some questions / for 1 question / yes	n. s.	no / no	yes	no
Linskey 2010 [90]	EN	n. s.	no	yes	GL	9 (-)	no	no / no / no	n. s.	no / no	no	yes
Marshall 2000 [91]	EN	[129,132,133]	no	yes	GL	9 (-)	SE	no / no / no	n. s.	no / no	no	yes
Mottur-Pilson 1995 [134]	EN	[63]	yes	yes	GL	18 (-)	no	no / yes / yes	n. s.	no / yes	no	yes
Nonino 2004 [92]	EN	[10,14]	no	yes	GL	6 (-)	no	no / no / no	n. s.	no / no	no	yes
Pentheroudakis 2008 [103]	EN	n. s.	no	(no)^d	GL	24 (4)	SE	no / no / no	n. s.	no / no	no	yes
Sanderlin 2007 [93]	EN	n. s.	no	yes	GL	5 (-)	SE	no / no / no	n. s.	no / no	no	yes
Sanders 2000 [104]	EN	[135]	no	yes	GL	15 (3)	no	yes / no / no	n. s.	yes / yes	no	yes
Savoie 2000 [105]	EN	[113,136]	no	(no)^d	GL	51 (15)	no	no / yes / yes	2	no / no	no	yes
Shaneyfelt 1999 [13]	EN	[116]	yes	yes	GL	25 (3)	no	no / yes / no	2	no / no	yes	yes
Shiffman 2003 [2]	EN	[63,131,137]	yes	yes	GL / Rec.	18 (-)	no	no / no / no	n. s.	no / no	no	yes
Veale 1999 [94]	EN	n. s.	no	yes	GL	7 (-)	no	no / no / no	n. s.	no / no	no	yes
Ward 1996 [106]	EN	[63]	no	yes	GL	18 (8)	no	no / yes / no	n. s.	no / no	no	yes
Warriner 2011 [95]	EN	n. s.	no	yes	GL	11 (9)	SE	no / no / no	n. s.	no / no	no	yes
WHO 2003 [43]	EN	n. s.	no	yes	GL	25 (8)	no	no / no / no	n. s.	no / no	no	no
Woolf 1995 [96]	EN	n. s.	no	yes	GL	10 (-)	no	no / no / no	n. s.	no / no	no	yes

n. s.: not specified; EN: English; GE: German; GL: Guideline; Rec.: Recommendation; SE: Some explanations; CE: Concrete explanations

a: A generic appraisal tool is a tool that can be used to appraise all kinds of clinical practice guidelines. b: For 4 of the 11 questions. c: For 7 of the 11 questions. d: The appraisal tool includes some disease-specific questions.

Eleven appraisal tools provided additional information on their development process. The number of questions in the tools ranged from three to 51. 23 tools grouped their questions into domains. The number of domains ranged from two to 21. Eighteen tools contained at least some explanation of their questions. Twenty tools used no specified scoring system, and twelve used a multiple choice answer, mostly a “yes/no” score, with or without the options ‘not sure’ or ‘not applicable’. Nine tools applied some form of scaling system. Six tools explicitly requested additional comments from guideline appraisers. Thirteen appraisal tools recommended that guidelines should be appraised independently by at least two reviewers. The calculation of a quality score for the domains of an appraisal tool and a qualitative or quantitative overall assessment of the guideline were suggested by five and six tools respectively. Only eleven tools had been subject to any sort of validation studies and only six of these [13,60,64-67] had been validated more thoroughly. All but five appraisal tools were published in peer-reviewed journals.

Content analysis

Figures 2 and 3 compare the quality dimensions and items covered by the appraisal tools analysed.

Figure 2

Percentage (total number) of quality dimensions / items covered by the guideline appraisal tools.

Figure 3

Percentage (total number) of appraisal tools with questions that can be attributed to the respective quality dimension / item.

The tools varied considerably in terms of the number of quality dimensions covered. Ten (25%) covered at least twelve quality dimensions with at least one item; eleven (28%) covered only six or fewer quality dimensions. The appraisal tools also differed in the extent to which each quality dimension was covered. Of the 34 possible items the number covered by each tool varied between three and 29 (Figure 2). The quality dimensions “evaluation of evidence” (mentioned in 35 tools; 88%) and “information retrieval” (29 tools; 73%) were a main focus of the appraisal tools. However, the tools rarely assessed whether the study results were reported correctly in the guidelines and supported the recommendations (item “consistency” mentioned in six tools; 15%). Another focus was the quality dimension “transferability” (33 tools; 83%) with the items “costs” (25 tools; 63%) and “barriers and facilitators” (23 tools; 58%). However, the tools rarely assessed whether patients, interventions and settings in the studies underlying the recommendations were comparable to those targeted by the recommendations (item “comparability” mentioned in eight tools; 20%). Further quality dimensions covered by at least 70% of the appraisal tools were the dimensions “presentation of guideline content” (34 tools; 85%), “independence” (32 tools; 80%), “scope” (30 tools; 75%), “updating” (30 tools; 75%), and “formulation of recommendations” (28 tools; 70%). The item “composition of the guideline development group” in the quality dimension “independence” was covered frequently (32 tools; 80%), whereas few appraisal tools mentioned the item “consideration of (potential) conflicts of interest” related to the guideline development group (eleven tools; 28%). 
The following two quality dimensions were covered by 50% or less of the appraisal tools: firstly, “consideration of different perspectives” (20 tools; 50%) with the items “patient perspectives” (thirteen tools; 33%), “norms and values” (nine tools; 23%), and “expert knowledge” (six tools; 15%), and secondly, “dissemination, implementation and evaluation of the guideline” (eighteen tools; 45%) (Figure 3). A table with the complete content characteristics of the guideline appraisal tools is attached as online File S4.
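
The percentages reported in this section can be reproduced from the tool counts and the total of 40 tools. The following Python sketch is an illustration added here, not part of the original analysis. It assumes that halves were rounded up (for example, 33/40 = 82.5% reported as 83%); this rounding rule is inferred from the reported figures, not stated by the authors. Python's built-in `round` rounds halves to the nearest even integer and would give 82 for 82.5, so the sketch uses integer arithmetic instead. The function name `percent_half_up` is hypothetical.

```python
TOTAL_TOOLS = 40


def percent_half_up(count, total=TOTAL_TOOLS):
    """Whole-number percentage of count/total, with halves rounded up.

    Uses integer arithmetic only, so no floating-point rounding is involved.
    """
    return (200 * count + total) // (2 * total)


# (number of tools, percentage as reported in the text above)
REPORTED = {
    "evaluation of evidence": (35, 88),
    "information retrieval": (29, 73),
    "consistency": (6, 15),
    "transferability": (33, 83),
    "costs": (25, 63),
    "barriers and facilitators": (23, 58),
    "comparability": (8, 20),
    "presentation of guideline content": (34, 85),
    "independence": (32, 80),
    "scope": (30, 75),
    "updating": (30, 75),
    "formulation of recommendations": (28, 70),
    "conflicts of interest": (11, 28),
    "consideration of different perspectives": (20, 50),
    "patient perspectives": (13, 33),
    "norms and values": (9, 23),
    "expert knowledge": (6, 15),
    "dissemination, implementation and evaluation": (18, 45),
}

for name, (count, reported) in REPORTED.items():
    computed = percent_half_up(count)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name}: {count}/{TOTAL_TOOLS} = {computed}% (reported {reported}%) {status}")
```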

Discussion

Main findings

The aim of this systematic review was to identify and compare existing guideline appraisal tools. We identified 40 different tools. Among those were 24 new tools not included in the systematic reviews by Graham 2000 [58] and Vlayen 2005 [59], as well as an additional three updated tools. Most appraisal tools assess whether the literature search, the evaluation and synthesis of the evidence, and the reporting of the evidence in the guidelines are in accordance with the principles of evidence-based medicine. However, the guideline development process comprises more than the systematic compilation of the evidence on a relevant clinical question. Burgers et al. 2002 stated that guideline development is a technical as well as social process [68]. The choice and interpretation of the evidence identified and the formulation of recommendations are affected by norms and values of the guideline development group [53,69-74]. Zuiderent-Jerak et al. 2012 suggest that guidelines should reflect all knowledge, not just clinical trials [75]. However, few appraisal tools assess whether the formulation of recommendations is supported by a formal consensus process or whether the norms and values of the guideline development group are clearly stated. Current standards for guideline development [1,42] point out that patients should be full members of the guideline development group. However, many of the appraisal tools fail to capture consumer involvement, i.e. do not assess whether patients’ views were considered in the guideline development group. Conflicts of interest may influence decisions in the health care system [76,77], including in the development of guidelines [36-38], and new and more stringent policies have been called for [42,55,78-80]. It is therefore surprising that only a few appraisal tools assess whether conflicts of interest of members of the guideline development group have been recorded and addressed.

Selection of an appraisal tool

Most of the appraisal tools included can be assigned to one of three groups: (1) tools with general questions and with no or only a few appraisal criteria to decide whether the requirements of the questions are fulfilled [61,62,81-96]; (2) tools with specific questions or appraisal criteria to decide whether the requirements of the questions are fulfilled [2,13,14,43,65-67,97-106]; and (3) a small group of tools with specific questions and/or appraisal criteria with an additional qualitative appraisal [57,60,64,107,108]. Differing results of guideline appraisals are more likely in cases where the questions of an appraisal tool are imprecise or specific criteria for answering the questions are lacking. This problem is particularly evident in the tools in the first group. For this reason the appraisal tools in the first group cannot be recommended for regular use. It is also important to underline that appraisal tools in the first and second group mainly focus on methodological issues surrounding guideline development and reporting. However, they do not evaluate the quality of the clinical content itself [58,109]. For example, guideline appraisal tools in the first and second group assess whether the search strategy was reported in the guidelines, but they do not assess whether the search strategy was developed correctly or whether it was suited to identify evidence to answer the clinical question of the guideline. While rigorous development and explicit reporting of the guideline development process are necessary, they do not guarantee appropriate recommendations or better health outcomes for patients, as the methodological rigour and quality of the clinical content of a clinical practice guideline are not necessarily correlated [58,110-112]. Only the five tools of the third group are designed to solve this problem, at least to some degree. 
While their main focus is still the appraisal of methodological aspects of guideline development and reporting, they nevertheless require judgments on whether relevant quality aspects have been adequately implemented. For example, they assess not only whether the search strategy was reported but also require a qualitative statement on whether the strategy was appropriate [57,60,64,107,108], whether the evidence identified was appropriately summarized in the recommendations [60,64,107,108] or whether an appropriate formal process was used to arrive at the recommendations [57,60]. Appraisal tools differ in the number of items and quality dimensions covered. If the aim is to conduct a comprehensive guideline appraisal, the AGREE II tool [57] or the German-language DELBI tool [65] may represent the best choice. Both tools cover all thirteen quality dimensions. The AGREE II tool has also been thoroughly evaluated. However, an appraisal tool containing many quality dimensions may not necessarily represent the best choice in all cases. If the primary goal is to learn more about the applicability of a guideline, the GLIA tool [67] may be more suitable. This thoroughly evaluated tool appraises aspects that influence the applicability of a guideline. If the goal is to gain more information on the quality of the clinical content of a guideline, the ADAPTE tool [64] may be more suitable. This tool primarily includes questions that can be assigned to the quality dimensions “information retrieval” and “evaluation of evidence”. It has also been thoroughly evaluated, but demands considerable skill on the part of the guideline appraiser. Moreover, additional information not available in the guideline may be needed to answer the questions in this appraisal tool. Depending on the problem being addressed, a tool containing only a few, but appropriate questions could be adequate. Furthermore, it may sometimes be advisable to omit some domains or items of an extensive appraisal tool. 
Online File S4 provides details of the items and quality dimensions covered by the different appraisal tools.
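
The selection guidance in this section can be summarized as a simple lookup. The Python sketch below is an illustration only and is not part of the original review: the goal labels and the function name `suggest_tools` are hypothetical, and the mapping merely restates the suggestions made in the text above. It is not a substitute for a judgment about the specific appraisal question.

```python
# Keys are hypothetical goal labels; values restate the suggestions in the text.
TOOLS_BY_GOAL = {
    "comprehensive": ["AGREE II", "DELBI"],   # DELBI is a German-language tool
    "applicability": ["GLIA"],
    "clinical content": ["ADAPTE"],
}


def suggest_tools(goal):
    """Return the appraisal tools suggested in the text for a given goal."""
    try:
        return TOOLS_BY_GOAL[goal]
    except KeyError:
        expected = sorted(TOOLS_BY_GOAL)
        raise ValueError(f"unknown goal {goal!r}; expected one of {expected}") from None


print(suggest_tools("applicability"))
```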

Strengths and weaknesses of the review

Our review provides a comprehensive overview of guideline appraisal tools. It nevertheless has a number of limitations. A systematic search for appraisal tools is difficult, as there is no appropriate MeSH or other term for appraisal tools. Because of the large number of appraisal tools used it is possible that not all appraisal tools were identified. Due to the comprehensive search strategy chosen, which included screening the reference lists of relevant primary and secondary publications, it is nevertheless unlikely that important and commonly used tools were not identified. The systematic search for appraisal tools was limited to tools published after 1994. In the late 1980s and early 1990s, the development of clinical practice guidelines became more common. With the definition of clinical practice guidelines by Field and Lohr in 1990 [113], a shared understanding of guidelines and guideline quality emerged that influenced the development of guidelines, as well as the development of appraisal tools. Authors of appraisal tools published before 1995 were probably not able to consider these developments. We used the questions and statements contained in the appraisal tools, as well as the publications by Cluzeau 1999 [60], Graham 2000 [58] and Vlayen 2005 [59], to identify items and quality dimensions. According to this approach, the result of this review is a comparative description of the appraisal tools. There is no “gold standard” for the evaluation of appraisal tools. It is therefore possible that quality dimensions and items exist that were not identified, as they were not part of the publications and appraisal tools analysed, but may nevertheless be relevant for the appraisal of guideline quality. Furthermore, it was not always possible to clearly assign the questions or items of the appraisal tools to only one quality dimension. A further limitation of our review is that no external experts were consulted in the validation of the appraisal framework.

Unanswered Questions and Future Research

The appraisal tools analysed cover several different aspects of guideline quality. All tools allow for the grading of guideline quality. However, it is uncertain whether all items and quality dimensions contribute equally to the quality of a guideline [58]. Further empirical studies are needed to determine which items and quality dimensions are essential for the assessment of guideline quality; for example, whether the external review of guidelines really improves their quality, whether conflicts of interest really lead to inappropriate recommendations, or whether the explicit consideration of patient preferences really improves the patient-centeredness of a guideline. In 2005 Vlayen stated “that in order to evaluate the quality of the clinical content and more specifically the evidence base of a clinical practice guideline, verification of the completeness and the quality of the literature search and its analysis has to be added to the process of validation by an appraisal instrument” [59]. Some appraisal tools have started to address this problem but have not yet solved it. The appraisal of the quality of the clinical content of guidelines is time-consuming, requires highly qualified personnel and may need additional information not available in the guidelines themselves. For example, an information specialist may be needed to appraise the appropriateness of a search strategy, a literature search may have to be repeated to verify the completeness of the search results, or the analysis of the identified literature may have to be repeated to confirm its correctness. Some working groups have started to deal with the appraisal of the clinical content of a guideline [114,115], but it remains unclear whether the assessment of the evidence base can be included in guideline appraisal tools in their current form.
Further research will have to clarify whether and how overall appraisal of the clinical content of a guideline can be included in guideline appraisal tools with a reasonable use of resources.

Conclusions

Appraisal tools differ in the number of items and quality dimensions covered, and some tools cover certain quality dimensions better than others. The most comprehensively validated appraisal tool is the AGREE II instrument, but the final choice of the appropriate tool depends on the research question. Nevertheless, appraisal tools containing non-specific questions and/or lacking criteria for answering the questions should not be applied. When choosing an appraisal tool, it is important to keep in mind that the main focus of these tools is the appraisal of methodological aspects of guideline development and not the evaluation of the evidence base underlying a clinical practice guideline; further research should clarify whether and how an overall appraisal of the clinical content of a guideline can be performed. Although conflicts of interest and norms and values of guideline developers, as well as patient involvement, affect the trustworthiness of guidelines, they are currently insufficiently assessed in guideline appraisal tools. They should thus be considered essential items in the further development of such tools.

Supporting information: PRISMA checklist (PDF); search strategy (PDF); excluded studies, ordered by reasons for exclusion (PDF); relevant secondary publications (PDF); content characteristics of guideline appraisal tools (XLS).
