R Perry, A Whitmarsh, V Leach, P Davies.
Abstract
BACKGROUND: AMSTAR-2 is a 16-item assessment tool for checking the quality of a systematic review and establishing whether the most important elements are reported. ROBIS is another assessment tool, designed to evaluate the level of bias present within a systematic review. Our objective was to compare and contrast the two tools and to establish their inter-rater reliability and usability as part of two overviews of systematic reviews. Strictly speaking, one tool assesses methodological quality (AMSTAR-2) and the other assesses risk of bias (ROBIS), but there is considerable overlap between the tools in terms of the signalling questions.
Keywords: AMSTAR-2; Methodological quality; ROBIS; Risk of bias; Systematic reviews
Year: 2021 PMID: 34696810 PMCID: PMC8543959 DOI: 10.1186/s13643-021-01819-x
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
A comparison of the content of the two tools (AMSTAR-2 and ROBIS)
| AMSTAR-2 | ROBIS |
|---|---|
| 1. Did the research questions and inclusion criteria for the review include the components of PICO? 2. Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review and did the report justify any significant deviations from the protocol? 3. Did the review authors explain their selection of the study designs for inclusion in the review? | 1.1 Did the review adhere to pre-defined objectives and eligibility criteria? 1.2 Were the eligibility criteria appropriate for the review question? 1.3 Were eligibility criteria unambiguous? 1.4 Were all restrictions in eligibility criteria based on study characteristics appropriate (e.g., date, sample size, study quality, outcomes measured)? 1.5 Were any restrictions in eligibility criteria based on sources of information appropriate (e.g., publication status or format, language, availability of data)? |
| 5. Did the review authors perform study selection in duplicate? 6. Did the review authors perform data extraction in duplicate? | 2.5 Were efforts made to minimise error in selection of studies? 3.1 Were efforts made to minimise error in data collection? 3.3 Were all relevant study results collected for use in the synthesis? |
| 4. Did the review authors use a comprehensive literature search strategy? | 2.1 Did the search include an appropriate range of databases/electronic sources for published and unpublished reports? 2.3 Were the terms and structure of the search strategy likely to retrieve as many eligible studies as possible? 2.4 Were restrictions based on date, publication format, or language appropriate? |
| N/A | 2.2 Were methods additional to database searching used to identify relevant reports? |
| 7. Did the review authors provide a list of excluded studies and justify the exclusions? | N/A |
| 8. Did the review authors describe the included studies in adequate detail? | 3.2 Were sufficient study characteristics available for both review authors and readers to be able to interpret the results? |
| 9. Did the review authors use a satisfactory technique for assessing the risk of bias (RoB) in individual studies that were included in the review? | 3.4 Was risk of bias (or methodological quality) formally assessed using appropriate criteria? 3.5 Were efforts made to minimise error in risk of bias assessment? |
| 11. If meta-analysis was performed did the review authors use appropriate methods for statistical combination of results? 12. If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis? | 4.1 Did the synthesis include all studies that it should? 4.2 Were all pre-defined analyses reported or departures explained? 4.3 Was the synthesis appropriate given the nature and similarity in the research questions, study designs and outcomes across included studies? 4.6 Were biases in primary studies minimal or addressed in the synthesis? |
| 14. Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review? 15. If they performed quantitative synthesis did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review? | 4.4 Was between-study variation (heterogeneity) minimal or addressed in the synthesis? 4.5 Were the findings robust, e.g., as demonstrated through funnel plot or sensitivity analyses? |
| 13. Did the review authors account for RoB in individual studies when interpreting/ discussing the results of the review? | A. Did the interpretation of findings address all of the concerns identified in Domains 1 to 4? B. Was the relevance of identified studies to the review’s research question appropriately considered? C. Did the reviewers avoid emphasising results on the basis of their statistical significance? |
| 10. Did the review authors report on the sources of funding for the studies included in the review? 16. Did the review authors report any potential sources of conflict of interest, including any funding they received for conducting the review? | N/A |
Signalling questions have been reordered so that the criteria of the two tools line up. N/A, not assessed.
Agreed results of AMSTAR-2 for fibromyalgia
| Author (date), CAM | 1. Were PICO components listed? | 3. Study design justified? | 5. Was study selection performed in duplicate? | 6. Was data extraction performed in duplicate? | 8. Characteristics of studies provided in detail? |
|---|---|---|---|---|---|
| Holdcraft 2003 | No | No | No | No | No |
| Baronowsky 2009 | No | Yes | No | No | No |
| Terhorst 2011, 2012 | Yes | No | Yes | No | No |
| De Silva 2010 | No | No | Yes | Yes | No |
| Perry 2010 | Yes | No | Yes | Yes | PY |
| Boehm 2014 | No | No | Yes | Yes | Yes |
| Ernst 2009 | Yes | No | No | Yes | No |
| Mayhew and Ernst 2007 | No | No | No | Yes | PY |
| Daya 2007 | No | No | No | No | PY |
| Langhorst 2010 | Yes | No | Yes | Yes | Yes |
| Martin-Sanchez 2009 | Yes | No | No | No | No |
| Cao 2013 | Yes | No | Yes | Yes | Yes |
| Deare 2013 | Yes | No | Yes | Yes | Yes |
| Yang 2014 | Yes | No | Yes | Yes | No |
| de Souza Nascimento 2013 | Yes | Yes | No | Yes | PY |
| Perry 2011 | Yes | No | PY | Yes | Yes |
| Bruyas-Bertholon 2012 | No | No | No | No | PY |
| Harb 2016 | Yes | No | Yes | Yes | PY |
| Gutierrez-Castrellon 2017 | Yes | No | No | No | No |
| Dobson 2012 | Yes | No | Yes | Yes | Yes |
| Gleberzon 2012 | No | No | No | Yes | PY |
| Carnes 2017 | No | No | Yes | Yes | PY |
| Skejeie 2018 | Yes | No | Yes | Yes | Yes |
| Anheyer 2017 | No | No | No | Yes | Yes |
| Sung 2013 | Yes | No | Yes | Yes | PY |
| Anabrees 2013 | Yes | No | No | Yes | Yes |
| Urbanska 2014 | Yes | No | No | No | PY |
| Xu 2015 | No | No | Yes | Yes | Yes |
| Schreck Bird 2017 | Yes | No | Yes | No | Yes |
| Dryl 2018 | Yes | No | No | No | PY |
| Sung 2018 | Yes | No | No | No | No |
CL, critically low; PY, partial yes; MA, meta-analysis; PICO, participants, intervention, comparator, outcomes; RoB, risk of bias
a Too few studies to perform a test of heterogeneity
b Not fully searched; search conducted in December 2014
c A conflict of interest occurred, but there was no indication of how it was dealt with
d All included studies were by the author team, but the review did not indicate how this was dealt with
Italicised columns represent the critical domains (see Appendix, Table 15)
The inter-rater agreement between the three raters for AMSTAR-2
| Question | Number of studies | Gwet’s AC1/Gwet’s AC2 | 95% CI |
|---|---|---|---|
| 1 | 31 | 0.69 | 0.48, 0.91 |
| 3 | 31 | 0.55 | 0.30, 0.80 |
| 5 | 31 | 0.70 | 0.47, 0.94 |
| 6 | 31 | 0.60 | 0.35, 0.86 |
| 8 | 31 | 0.39 | 0.21, 0.56 |
| 10 | 31 | 0.84 | 0.67, 1.00 |
| 12 | 19 | 0.40 | 0.05, 0.75 |
| 14 | 31 | 0.19 | -0.08, 0.47 |
| 16 | 31 | 0.34 | 0.06, 0.63 |
Italicised questions are considered critical by the tool authors
Fig. 1 Gwet’s statistic for the inter-rater agreement for AMSTAR-2 questions
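Gwet’s AC1, reported in the agreement tables, corrects the observed agreement between raters for agreement expected by chance. For q categories it is AC1 = (p_a − p_e)/(1 − p_e), where p_a averages, over reviews rated by at least two raters, the share of rater pairs that agree, and p_e = Σ_k π_k(1 − π_k)/(q − 1), with π_k the average share of each review’s ratings falling in category k. The sketch below follows that formula under stated assumptions: nominal categories, unweighted agreement, and missing ratings marked as `None`. It does not reproduce the weighted AC2 or the confidence intervals reported in the tables, and the function name `gwet_ac1` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def gwet_ac1(ratings: Sequence[Sequence[Optional[Hashable]]],
             categories: Sequence[Hashable]) -> float:
    """Unweighted Gwet's AC1 for two or more raters and nominal categories.

    ratings[i][j] is rater j's category for subject (review) i; None marks a
    missing rating. `categories` lists every possible category, observed or not.
    """
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    allowed = set(categories)
    pi_sums = {k: 0.0 for k in categories}  # running sum of r_ik / r_i
    pa_terms = []  # per-subject agreement, only for subjects with >= 2 ratings
    n_rated = 0    # subjects with at least one rating

    for row in ratings:
        counts = Counter(c for c in row if c is not None)
        unknown = set(counts) - allowed
        if unknown:
            raise ValueError(f"ratings outside the category list: {unknown}")
        r_i = sum(counts.values())
        if r_i == 0:
            continue  # nobody rated this subject, so it is ignored
        n_rated += 1
        for k, r_ik in counts.items():
            pi_sums[k] += r_ik / r_i
        if r_i >= 2:
            # share of ordered rater pairs that put the subject in the same category
            agreeing_pairs = sum(r_ik * (r_ik - 1) for r_ik in counts.values())
            pa_terms.append(agreeing_pairs / (r_i * (r_i - 1)))

    if not pa_terms:
        raise ValueError("no subject was rated by at least two raters")
    p_a = sum(pa_terms) / len(pa_terms)               # observed agreement
    pi = [s / n_rated for s in pi_sums.values()]      # category propensities
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)      # chance agreement (< 1 for q >= 2)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical ratings from three raters on four reviews (not data from this study).
example = [["Yes", "Yes", "Yes"], ["No", "No", "No"],
           ["Yes", "Yes", "No"], ["Yes", "No", "No"]]
print(gwet_ac1(example, ["Yes", "No"]))  # approximately 0.333
```

Here p_a = 2/3 and both categories have π = 0.5, so p_e = 0.5 and AC1 = 1/3. Because p_e depends on how ratings spread across categories rather than on each rater’s marginal totals, AC1 stays stable when one category dominates, which is a common situation for AMSTAR-2 items where most reviews score “No”.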
Tabular presentation of the agreed ROBIS results
| Review | Domain 1 | Domain 2 | Domain 3 | Domain 4 | Overall |
|---|---|---|---|---|---|
| Fibromyalgia reviews | | | | | |
| 1. Perry | Low | Low | Low | Unclear | Low |
| 2. Boehm | High | Low | Low | High | High |
| 3. Mayhew | Low | High | High | Low | Low |
| 4. Daya | Low | High | High | Low | Low |
| 5. Langhorst | Low | High | High | Low | Low |
| 6. Martin-Sanchez | Low | High | High | High | High |
| 7. Cao | Low | High | Low | Low | Low |
| 8. Deare | Low | Low | Low | Low | Low |
| 9. Yang | Low | Low | High | High | High |
| 10. Ernst | High | Unclear | High | Unclear | Unclear |
| 11. Nascimento | Low | Low | Low | High | Low |
| 12. Holdcraft | Low | Low | Low | High | Low |
| 13. Baronowsky | Low | Low | Unclear | High | Low |
| 14. Terhorst | Low | High | Low | High | High |
| 15. De Silva | High | High | High | Unclear | Low |
| Colic reviews | | | | | |
| 1. Perry | Low | Unclear | Low | Low | Low |
| 2. Bruyas-Bertholon | High | High | Unclear | High | High |
| 3. Harb | High | High | Low | High | High |
| 4. Gutierrez-Castrellon | Unclear | High | High | High | High |
| 5. Dobson | Low | Low | Low | Low | Low |
| 6. Gleberzon | High | High | Unclear | Unclear | High |
| 7. Carnes | Low | Low | Low | High | Unclear |
| 8. Skejeie | Low | Low | Low | Low | Unclear |
| 9. Anheyer | Unclear | High | Low | High | High |
| 10. Sung 2013 | Unclear | Low | Low | High | Unclear |
| 11. Anabrees | Low | Low | Low | High | Low |
| 12. Urbanska | Low | High | High | High | High |
| 13. Xu | Unclear | Low | Low | Unclear | Low |
| 14. Schreck Bird | High | High | Low | High | High |
| 15. Dryl | High | High | Unclear | High | High |
| 16. Sung 2018 | High | Unclear | Unclear | Unclear | Unclear |
Inter-rater agreement for ROBIS, both overviews combined
| ROBIS question | No. of studies | Gwet’s AC1/Gwet’s AC2 | 95% CI |
|---|---|---|---|
| 1.1 | 30 | 0.62 | 0.38, 0.85 |
| 1.2 | 31 | 0.70 | 0.56, 0.84 |
| 1.3 | 31 | 0.69 | 0.56, 0.82 |
| 1.4 | 31 | 0.61 | 0.48, 0.74 |
| 1.5 | 31 | 0.56 | 0.37, 0.74 |
| Domain 1 | 31 | 0.45 | 0.22, 0.67 |
| 2.1 | 31 | 0.53 | 0.41, 0.65 |
| 2.2 | 30 | 0.53 | 0.35, 0.71 |
| 2.3 | 31 | 0.62 | 0.47, 0.77 |
| 2.4 | 31 | 0.41 | 0.20, 0.62 |
| 2.5 | 29 | 0.59 | 0.30, 0.88 |
| Domain 2 | 31 | 0.36 | 0.17, 0.55 |
| 3.1 | 29 | 0.88 | 0.68, 1.00 |
| 3.2 | 31 | 0.66 | 0.51, 0.82 |
| 3.3 | 31 | 0.65 | 0.51, 0.78 |
| 3.4 | 31 | 0.77 | 0.61, 0.93 |
| 3.5 | 30 | 0.73 | 0.48, 0.98 |
| Domain 3 | 31 | 0.55 | 0.35, 0.76 |
| 4.1 | 31 | 0.60 | 0.46, 0.74 |
| 4.2 | 29 | 0.48 | 0.28, 0.68 |
| 4.3 | 31 | 0.77 | 0.66, 0.88 |
| 4.4 | 31 | 0.18 | −0.02, 0.37 |
| 4.5 | 30 | 0.22 | 0.02, 0.43 |
| 4.6 | 31 | 0.39 | 0.17, 0.62 |
| Domain 4 | 31 | 0.17 | −0.03, 0.37 |
| A | 31 | 0.28 | 0.09, 0.47 |
| B | 31 | 0.64 | 0.54, 0.75 |
| C | 31 | 0.45 | 0.31, 0.60 |
| Overall | 31 | 0.45 | 0.24, 0.66 |
Fig. 2 Gwet’s statistic for the inter-rater agreement for ROBIS questions and domains
Mean (SD) completion time (in minutes) for the colic overview
| Rater 1: n | Rater 1: mean (SD) | Rater 2: n | Rater 2: mean (SD) | Rater 3: n | Rater 3: mean (SD) |
|---|---|---|---|---|---|
| 14 | 13.0 (5.2) | 15 | 18.7 (6.6) | 16 | 11.1 (4.2) |
| 9 | 14.1 (6.5) | 10 | 15.7 (5.3) | 15 | 43.3 (23.3) |
Inter-rater agreement for AMSTAR-2: CAM for fibromyalgia reviews
| Question | No. of studies | Gwet’s AC1/Gwet’s AC2 | 95% CI | p value |
|---|---|---|---|---|
| 1 | 15 | 0.66 | 0.32, 1.00 | 0.001 |
| 3 | 15 | 0.39 | −0.08, 0.86 | 0.096 |
| 5 | 15 | 0.69 | 0.33, 1.00 | 0.001 |
| 6 | 15 | 0.65 | 0.26, 1.00 | 0.003 |
| 8 | 15 | 0.20 | 0.02, 0.38 | 0.031 |
| 10 | 15 | 1.00 | 0.85, 1.00 | < 0.001 |
| 12 | 7 | 0.52 | −0.11, 1.00 | 0.091 |
| 14 | 15 | 0.20 | −0.17, 0.57 | 0.270 |
| 16 | 15 | 0.55 | 0.14, 0.96 | 0.013 |
Twenty ratings missing. Italicised questions are those considered critical.
Inter-rater agreement for ROBIS: fibromyalgia reviews
| ROBIS question | No. of studies | Gwet’s AC1/Gwet’s AC2 | 95% CI | p value |
|---|---|---|---|---|
| 1.1 | 14 | 0.73 | 0.46, 1.00 | < 0.001 |
| 1.2 | 15 | 0.70 | 0.45, 0.95 | < 0.001 |
| 1.3 | 15 | 0.62 | 0.39, 0.84 | < 0.001 |
| 1.4 | 15 | 0.54 | 0.32, 0.76 | < 0.001 |
| 1.5 | 15 | 0.64 | 0.40, 0.88 | < 0.001 |
| 2.1 | 15 | 0.53 | 0.36, 0.69 | < 0.001 |
| 2.2 | 14 | 0.42 | 0.16, 0.69 | 0.005 |
| 2.3 | 15 | 0.72 | 0.53, 0.92 | < 0.001 |
| 2.4 | 15 | 0.31 | −0.08, 0.70 | 0.110 |
| 2.5 | 15 | 0.56 | 0.14, 0.99 | 0.013 |
| 3.1 | 15 | 0.95 | 0.66, 1.00 | < 0.001 |
| 3.2 | 15 | 0.65 | 0.47, 0.84 | < 0.001 |
| 3.3 | 15 | 0.57 | 0.40, 0.74 | < 0.001 |
| 3.4 | 15 | 0.55 | 0.23, 0.88 | 0.003 |
| 3.5 | 15 | 0.81 | 0.51, 1.00 | < 0.001 |
| 4.1 | 15 | 0.55 | 0.33, 0.77 | < 0.001 |
| 4.2 | 13 | 0.55 | 0.29, 0.81 | 0.001 |
| 4.3 | 15 | 0.80 | 0.62, 0.98 | < 0.001 |
| 4.4 | 15 | 0.13 | −0.19, 0.45 | 0.405 |
| 4.5 | 14 | −0.10 | −0.52, 0.33 | 0.633 |
| 4.6 | 15 | 0.23 | −0.17, 0.64 | 0.235 |
| A | 15 | 0.10 | −0.25, 0.44 | 0.552 |
| B | 15 | 0.61 | 0.40, 0.83 | < 0.001 |
| C | 15 | 0.39 | 0.01, 0.76 | 0.009 |
Six ratings missing
The risk of bias and study quality for each fibromyalgia review
[Table body not available in this copy. Columns: fibromyalgia review, AMSTAR-2, ROBIS. Surviving row labels: Baronowsky 2009, Terhorst 2011, 2012, De Silva 2010, Boehm 2014, Ernst 2009, Martin-Sanchez 2009, Deare 2013, Yang 2014.]
When AMSTAR-2 confidence is low, ROBIS should correspondingly indicate a high risk of bias. Italicised reviews show discrepancies between the overall ratings of quality and bias.
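The note above states an expected correspondence between the two tools’ overall ratings. A minimal sketch of how discrepant reviews could be flagged, and the ratings cross-tabulated, is shown below. It assumes that “Low” and “Critically low” AMSTAR-2 ratings should pair with a “High” ROBIS risk of bias, that the remaining AMSTAR-2 ratings should pair with a non-high ROBIS rating, and that “Unclear” ROBIS judgements are not flagged. These rules, the function names and the example reviews are hypothetical and are not the authors’ procedure.

```python
from collections import Counter
from typing import Dict, Tuple

# Assumption: these AMSTAR-2 overall ratings are expected to pair with a
# "High" ROBIS risk of bias; all other AMSTAR-2 ratings with "Low".
WEAK_AMSTAR2 = {"Low", "Critically low"}


def is_discrepant(amstar2: str, robis: str) -> bool:
    """Flag a review whose AMSTAR-2 and ROBIS overall ratings point different ways."""
    if robis == "Unclear":
        return False  # assumption: unclear ROBIS judgements are not flagged
    return (amstar2 in WEAK_AMSTAR2) != (robis == "High")


def cross_tabulate(ratings: Dict[str, Tuple[str, str]]) -> Counter:
    """Count reviews per (AMSTAR-2 rating, ROBIS rating) pair."""
    return Counter(ratings.values())


# Hypothetical reviews, not data from this study.
example = {
    "Review A": ("Critically low", "High"),
    "Review B": ("Critically low", "Low"),
    "Review C": ("Moderate", "Low"),
}
flagged = [name for name, (a, r) in example.items() if is_discrepant(a, r)]
print(flagged)  # ['Review B']
print(cross_tabulate(example))
```

A Counter keyed on the rating pair is enough for a small cross-tabulation; a spreadsheet or data-frame pivot would do the same job for larger sets.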
To compare the distribution of risk of bias and study quality for the fibromyalgia reviews
Inter-rater agreement for AMSTAR-2: colic reviews
| Question | No. of studies | Gwet’s AC1/Gwet’s AC2 | 95% CI | p value |
|---|---|---|---|---|
| 1 | 16 | 0.73 | 0.43, 1.00 | < 0.001 |
| 3 | 16 | 0.68 | 0.40, 0.96 | < 0.001 |
| 5 | 16 | 0.72 | 0.38, 1.00 | < 0.001 |
| 6 | 16 | 0.56 | 0.18, 0.95 | 0.006 |
| 8 | 16 | 0.61 | 0.35, 0.87 | < 0.001 |
| 10 | 16 | 0.67 | 0.36, 0.97 | < 0.001 |
| 12 | 12 | 0.34 | −0.12, 0.80 | 0.133 |
| 14 | 16 | 0.22 | −0.23, 0.66 | 0.321 |
| 16 | 16 | 0.15 | −0.25, 0.55 | 0.444 |
Fifteen ratings missing. Italicised questions are those considered critical.
Inter-rater agreement for ROBIS: colic reviews
| ROBIS question | No. of studies | Gwet’s AC1/Gwet’s AC2 | 95% CI | p value |
|---|---|---|---|---|
| 1.1 | 16 | 0.57 | 0.17, 0.96 | 0.008 |
| 1.2 | 16 | 0.71 | 0.55, 0.87 | < 0.001 |
| 1.3 | 16 | 0.76 | 0.61, 0.91 | < 0.001 |
| 1.4 | 16 | 0.71 | 0.54, 0.87 | < 0.001 |
| 1.5 | 16 | 0.49 | 0.20, 0.77 | 0.002 |
| 2.1 | 16 | 0.54 | 0.34, 0.73 | < 0.001 |
| 2.2 | 16 | 0.64 | 0.37, 0.92 | < 0.001 |
| 2.3 | 16 | 0.57 | 0.34, 0.81 | < 0.001 |
| 2.4 | 16 | 0.50 | 0.27, 0.73 | < 0.001 |
| 2.5 | 14 | 0.61 | 0.18, 1.00 | < 0.001 |
| 3.1 | 14 | 0.82 | 0.51, 1.00 | < 0.001 |
| 3.2 | 16 | 0.70 | 0.44, 0.96 | < 0.001 |
| 3.3 | 16 | 0.72 | 0.52, 0.92 | < 0.001 |
| 3.4 | 16 | 0.92 | 0.83, 1.00 | < 0.001 |
| 3.5 | 15 | 0.66 | 0.21, 1.00 | 0.007 |
| 4.1 | 16 | 0.65 | 0.45, 0.86 | < 0.001 |
| 4.2 | 16 | 0.42 | 0.11, 0.73 | 0.011 |
| 4.3 | 16 | 0.73 | 0.58, 0.88 | < 0.001 |
| 4.4 | 16 | 0.23 | −0.02, 0.48 | 0.072 |
| 4.5 | 16 | 0.40 | 0.22, 0.57 | < 0.001 |
| 4.6 | 16 | 0.55 | 0.32, 0.77 | < 0.001 |
| A | 16 | 0.47 | 0.28, 0.65 | 0.015 |
| B | 16 | 0.69 | 0.55, 0.82 | < 0.001 |
| C | 16 | 0.54 | 0.37, 0.72 | < 0.001 |
Three ratings missing
The risk of bias and study quality for each colic review
[Table body not available in this copy. Columns: colic review, AMSTAR-2, ROBIS. Surviving row labels: Bruyas-Bertholon 2012, Harb 2016, Gutierrez-Castrellon 2017, Dobson 2012, Gleberzon 2012, Carnes 2017, Skejeie 2018, Anheyer 2017, Sung 2013, Urbanska 2014, Schreck Bird 2017, Dryl 2018, Sung 2018.]
When AMSTAR-2 confidence is low, ROBIS should correspondingly indicate a high risk of bias. Italicised reviews show discrepancies between the overall ratings of quality and bias.
To compare the distribution of risk of bias and study quality for the colic reviews
Criteria for assessing confidence in AMSTAR-2 (Shea et al. [20])
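| Rating overall confidence in the results of the review | Criterion |
|---|---|
| High | No or one non-critical weakness |
| Moderate | More than one non-critical weakness* |
| Low | One critical flaw with or without non-critical weaknesses |
| Critically low | More than one critical flaw with or without non-critical weaknesses |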
*Multiple non-critical weaknesses may diminish confidence in the review and it may be appropriate to move the overall appraisal down from moderate to low confidence
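The criteria of Shea et al. amount to a short decision rule: more than one critical flaw gives critically low confidence, exactly one gives low, more than one non-critical weakness gives moderate, and otherwise confidence is high. A minimal sketch follows, under stated assumptions: the critical items default to the seven domains Shea et al. designate as critical (items 2, 4, 7, 9, 11, 13 and 15), which review teams may adapt; only an answer of “No” counts as a flaw or weakness; and the optional downgrade from moderate to low in the footnote is left to reviewer judgement rather than automated. The names `CRITICAL_ITEMS` and `amstar2_confidence` are hypothetical.

```python
from typing import AbstractSet, Mapping

# Critical domains proposed by Shea et al. for AMSTAR-2; teams may adapt this set.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})


def amstar2_confidence(answers: Mapping[int, str],
                       critical: AbstractSet[int] = CRITICAL_ITEMS) -> str:
    """Map AMSTAR-2 item answers (item number -> answer) to overall confidence.

    Assumption: only an answer of "No" counts as a flaw or weakness; "Yes",
    "Partial yes" and "No meta-analysis conducted" do not.
    """
    flaws = {item for item, ans in answers.items() if ans.strip().lower() == "no"}
    n_critical = len(flaws & critical)        # critical flaws
    n_noncritical = len(flaws - critical)     # non-critical weaknesses
    if n_critical > 1:
        return "Critically low"
    if n_critical == 1:
        return "Low"
    if n_noncritical > 1:
        return "Moderate"  # reviewers may still move this down to "Low"
    return "High"


# Hypothetical review: every item "Yes" except two non-critical items.
answers = {item: "Yes" for item in range(1, 17)}
answers.update({1: "No", 3: "No"})
print(amstar2_confidence(answers))  # Moderate
```

Keeping the critical set as a parameter reflects that the tool authors allow the list of critical domains to be adapted to the review question.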