Literature DB >> 24965222

Relation of completeness of reporting of health research to journals' endorsement of reporting guidelines: systematic review.

Adrienne Stevens1, Larissa Shamseer2, Erica Weinstein3, Fatemeh Yazdi1, Lucy Turner1, Justin Thielman1, Douglas G Altman4, Allison Hirst5, John Hoey6, Anita Palepu7, Kenneth F Schulz8, David Moher9.   

Abstract

OBJECTIVE: To assess whether the completeness of reporting of health research is related to journals' endorsement of reporting guidelines.
DESIGN: Systematic review.
DATA SOURCES: Reporting guidelines from a published systematic review and the EQUATOR Network (October 2011). Studies assessing the completeness of reporting by using an included reporting guideline (termed "evaluations") (1990 to October 2011; addendum searches in January 2012) from searches of either Medline, Embase, and the Cochrane Methodology Register or Scopus, depending on reporting guideline name.
STUDY SELECTION: English language reporting guidelines that provided explicit guidance for reporting, described the guidance development process, and indicated use of a consensus development process were included. The CONSORT statement was excluded, as evaluations of adherence to CONSORT had previously been reviewed. English or French language evaluations of included reporting guidelines were eligible if they assessed the completeness of reporting of studies as a primary intent and those included studies enabled the comparisons of interest (that is, after versus before journal endorsement and/or endorsing versus non-endorsing journals).
DATA EXTRACTION: Potentially eligible evaluations of included guidelines were screened initially by title and abstract and then as full text reports. If eligibility was unclear, authors of evaluations were contacted; journals' websites were consulted for endorsement information where needed. The completeness of reporting of studies was analyzed in relation to endorsement by guideline item and, where consistent with the evaluation authors' analysis, by a mean summed score.
RESULTS: 101 reporting guidelines were included. Of 15,249 records retrieved from the search for evaluations, 26 evaluations that assessed completeness of reporting in relation to endorsement for nine reporting guidelines were identified. Of those, 13 evaluations assessing seven reporting guidelines (BMJ economic checklist, CONSORT for harms, PRISMA, QUOROM, STARD, STRICTA, and STROBE) could be analyzed. Reporting guideline items were assessed by few evaluations.
CONCLUSIONS: The completeness of reporting of only nine of 101 health research reporting guidelines (excluding CONSORT) has been evaluated in relation to journals' endorsement. Items from seven reporting guidelines were quantitatively analyzed, by few evaluations each. Insufficient evidence exists to determine the relation between journals' endorsement of reporting guidelines and the completeness of reporting of published health research reports. Journal editors and researchers should consider collaborative, prospectively designed controlled studies to provide more robust evidence.
SYSTEMATIC REVIEW REGISTRATION: Not registered; no known register currently accepts protocols for methodology systematic reviews. © Stevens et al 2014.


Year:  2014        PMID: 24965222      PMCID: PMC4070413          DOI: 10.1136/bmj.g3804

Source DB:  PubMed          Journal:  BMJ        ISSN: 0959-8138


Introduction

Reporting of health research is, in general, inadequate.1 2 3 4 5 6 7 Complete and transparent reporting facilitates the use of research by a variety of stakeholders: clinicians, patients, and policy decision makers who use research findings; researchers who wish to replicate findings or incorporate them in future research; systematic reviewers; and editors who publish health research. Reporting guidelines are tools developed to improve the reporting of health research. They are intended to help people preparing or reviewing a specific type of research and may include a minimum set of items to be reported (often in the form of a checklist) and possibly also a flow diagram.8 9

An important role for editors is to ensure that research articles published in their journals are clear, complete, transparent, and as free as possible from bias.10 In an effort to uphold high standards, journal editors may feel the need to endorse multiple reporting guidelines without knowledge of their rigor or ability to improve reporting. The CONSORT statement is a well known reporting guideline that has been extensively evaluated.11 12 13 14 15 A 2012 systematic review indicated that, for some items of the CONSORT checklist, trials published in journals that endorse CONSORT were more completely reported than were trials published before the time of endorsement or in non-endorsing journals.16 17 A similar systematic review of other reporting guidelines may provide editors and other end users with the information needed to help them decide which other guidelines to use or endorse.

Our objective was to assess whether the completeness of reporting of health research is related to journals' endorsement of reporting guidelines other than CONSORT, by comparing the completeness of reporting in journals before and after endorsement of a reporting guideline and in endorsing journals compared with non-endorsing journals.
For context, the box provides readers with definitions of terms used throughout this review.

Definitions related to evaluation of reporting guidelines in context of this systematic review

Endorsement—Action taken by a journal to indicate its support for the use of one or more reporting guideline(s) by authors submitting research reports for consideration; typically achieved in a statement in a journal's "Instructions to authors"

Adherence—Action taken by an author to ensure that a manuscript is compliant with items (that is, reports all suggested items) recommended by the appropriate/relevant reporting guideline

Implementation—Action taken by journals to ensure that authors adhere to an endorsed reporting guideline and that published manuscripts are completely reported

Complete reporting—Pertains to the state of reporting of a study report and whether it is compliant with an appropriate reporting guideline

Methods

Our methods are available in a previously published protocol.18 This systematic review is reported according to the PRISMA statement (appendix 1).19 Any changes in methods from those reported in the protocol are found in appendix 2.

Identifying reporting guidelines

We first searched for and selected reporting guidelines. We included reporting guidelines from Moher et al’s 2011 systematic review,9 and we screened guidelines identified through the EQUATOR Network (October 2011; reflects content from PubMed searches to June 2011). We included English language reporting guidelines for health research if they provided explicit text to guide authors in reporting, described how the guidance was developed, and used a consensus process to develop the guideline. After removing any duplicate results from the search yield, we uploaded records and full text reports to Distiller SR. Two people (AS and LS) independently screened reporting guidelines. Disagreements were resolved by consensus or a third person (DM).

Identifying evaluations of reporting guidelines

Many developers of reporting guidelines have devised acronyms for their guidelines for simplicity of naming (for example, CONSORT, PRISMA, STARD). Some acronyms, however, refer to words with other meanings (for example, STROBE). For this reason, we used a dual approach to searching for evaluations of relevant reporting guidelines. We searched for reporting guidelines with unique acronyms cited in bibliographic records in Ovid Medline (1990 to October 2011), Embase (1990 to 2011 week 41), and the Cochrane Methodology Register (2011, issue 4); we searched Scopus (October 2011) for evaluations of all other guidelines (that is, ones with alternate meanings or without an acronym). We did addendum searches in January 2012. Details are provided in appendix 3. In addition, we contacted the corresponding authors of reporting guidelines, scanned bibliographies of related systematic reviews, and consulted with members of our research team for other potential evaluations. We included English or French language evaluations if they assessed the completeness of reporting as a primary intent and included studies enabling the comparisons of interest (after versus before journal endorsement and/or endorsing versus non-endorsing journals). Choice of language for inclusion was based on expertise within our research team; owing to budget constraints, we could not seek translations of potential evaluations in other languages. After removing any duplicate results from the search yield, we uploaded records to Distiller SR. We first screened records by title and abstract (one person to include, two people to exclude a record) and then in two rounds for the full reports (two reviewers, independently) owing to the complexity of assessing screening criteria and using a team of reviewers. Disagreements were resolved by consensus or a third person. Where needed, we contacted authors of evaluations (n=66) or journal editors (n=48) for additional information. 
One person (from among a smaller working group of the team) processed evaluations with responses to queries to authors and journal editors and collated multiple reports for evaluations. We first assessed each published study from within an included evaluation according to the journal in which it was published (fig 1). We collected information on endorsement from evaluations or journal websites. If the journal's "Instructions to authors" section (or similar) specifically listed the guideline, we considered the journal to be an "endorser."

Fig 1 Schematic depicting relation among evaluation of reporting guideline, studies contained within it, and determination of comparison groups according to journal endorsement status


Data extraction and analysis

For included reporting guidelines, one person extracted guidelines' characteristics. For evaluations of reporting guidelines, one person extracted characteristics of the evaluation and outcomes and did validity assessments; a second person verified 20% of the characteristics of studies and 100% of the remaining information. We contacted authors for completeness of reporting data for evaluations, where needed. Variables collected are reflected in the tables, figures, and appendices. As no methods exist for synthesizing validity assessments for methods reviews, we present information in tables and text for readers' interpretation. Our primary outcome was completeness of reporting, defined as complete reporting of all elements within a guidance checklist item. As not all authors evaluated reporting guideline checklist items as stated in the original guideline publications, we excluded any items that were split into two or more separate items or reworded (leading to a change in meaning of the item). Comparisons of interest were endorsing versus non-endorsing journals and after versus before endorsement. The first comparison functions as a cross sectional analysis, and years in which articles from endorsing journals were published dictated the years of comparison with articles from non-endorsing journals. We used the publication date of the reporting guideline as a proxy if the actual date of endorsement was not known. For the second comparison, we included before and after studies from the same journal only if a specific date of endorsement was known. We also examined the publication years of included studies to ensure that years were close enough within a given arm for reasonable comparison. As a result, not all studies included in the evaluations were included in our analysis.
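The grouping logic described above can be sketched in a few lines. This is an illustrative reconstruction only, not the authors' code: the journal names, dates, and function names below are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Study:
    journal: str
    pub_date: date

# Hypothetical endorsement data, for illustration only.
ENDORSERS = {"Journal A"}                            # guideline listed in instructions to authors
ENDORSEMENT_DATE = {"Journal A": date(2005, 6, 1)}   # journals with a known endorsement date

def cross_sectional_arm(study: Study) -> str:
    """Endorsing v non-endorsing comparison: classify by journal status."""
    return "endorsing" if study.journal in ENDORSERS else "non-endorsing"

def before_after_arm(study: Study):
    """After v before comparison: usable only if a specific endorsement date is known."""
    endorsed_on = ENDORSEMENT_DATE.get(study.journal)
    if endorsed_on is None:
        return None  # journal excluded from this comparison
    return "after" if study.pub_date >= endorsed_on else "before"

s1 = Study("Journal A", date(2007, 3, 1))
s2 = Study("Journal B", date(2007, 3, 1))
print(cross_sectional_arm(s1), cross_sectional_arm(s2))  # endorsing non-endorsing
print(before_after_arm(s1), before_after_arm(s2))        # after None
```

As in the review, a study from a journal without a known endorsement date contributes to the cross sectional comparison but not to the before/after comparison.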
We analyzed the completeness of reporting in relation to journals' endorsement of guidelines by item (number of studies within an evaluation completely reporting a given reporting item) and by mean summed score (we calculated a sum of completely reported guideline items for each study included in an evaluation and compared the mean of those sums across studies between comparison groups); we used a mean summed score only when the evaluation's authors also analyzed their data in this manner. We used risk ratios, standardized mean differences, and mean differences with associated 99% confidence intervals for analyses, as calculated using Review Manager software.20 In most cases, we reworked authors' data to form our comparison groups of interest for the analysis. Where possible, we used a random effects model meta-analysis to do a quantitative synthesis across evaluations for a given checklist item or for the mean summed score. We entered evaluations into Review Manager as the "studies," whereas studies included within a given evaluation formed the unit of analysis, just as the number of patients would normally be entered. We entered the pooled effect estimate and confidence interval values from Review Manager for each checklist item into Comprehensive Meta-Analysis to create summary plots depicting a "snapshot" view for each reporting guideline.21 Secondary outcomes were methodological quality and unwanted effects of using a guideline, as reported in evaluations. We present data for these outcomes in narrative form.
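The two units of analysis described above can be illustrated with a minimal sketch. The 0/1 data below are invented, and the helper functions are assumptions for illustration; the review itself used Review Manager for these calculations.

```python
# Rows are studies within one hypothetical evaluation; columns are checklist
# items (1 = item completely reported, 0 = not). All data are made up.

def risk_ratio(events_a, n_a, events_b, n_b):
    """Risk ratio of complete reporting for one item (group A v group B)."""
    return (events_a / n_a) / (events_b / n_b)

def mean_summed_score(item_matrix):
    """Mean, across studies, of each study's count of completely reported items."""
    sums = [sum(study) for study in item_matrix]
    return sum(sums) / len(sums)

endorsing = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 1]]                 # 4 studies
non_endorsing = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]  # 5 studies

# Item-level analysis: the first item is completely reported by 3/4 endorsing
# and 2/5 non-endorsing studies.
print(risk_ratio(3, 4, 2, 5))            # 1.875
print(mean_summed_score(endorsing))      # 2.25
print(mean_summed_score(non_endorsing))  # 1.0
```

The item-level comparison pools counts of studies reporting an item, while the mean summed score compares the average number of completely reported items per study between arms.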

Results

Literature search results

Reporting guidelines

Eighty one reporting guidelines from Moher et al's 2011 systematic review9 and 23 of 98 reporting guidelines identified by the EQUATOR Network were initially eligible for inclusion (fig 2). After removal of the CONSORT guidelines, we included a total of 101 reporting guidelines.19 22-121

Fig 2 PRISMA flow diagram for selecting reporting guidelines for health research. RG=reporting guideline


Evaluations of reporting guidelines

Our literature search included evaluations of the CONSORT guidelines, but we excluded those during the screening process. We located 17 225 records through bibliographic databases and an additional 49 records from other sources (bibliographies, web search for full text reports of conference abstracts, and articles suggested by authors of reporting guidelines and members of the research team). After removing companion (known multiple publications) and duplicate reports, we screened a total of 15 249 title and abstract records. Of those, 1153 were eligible for full text review. After two rounds of full text screening, contacting authors, and seeking journal endorsement information, we included a total of 26 evaluations (fig 3).122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 A list of potential evaluations written in languages other than English or French is provided in appendix 4.

Fig 3 PRISMA flow diagram for selecting evaluations of relevant reporting guidelines. RG=reporting guideline; SR=systematic review

Nine reporting guidelines were assessed among the 26 included evaluations: STARD 2003 for studies of diagnostic accuracy (n=8),131 132 133 134 135 136 137 138 CONSORT extension for harms 2004 (n=5),124 125 126 141 142 PRISMA 2009 for systematic reviews and meta-analyses (n=3),143 144 145 QUOROM 1999 for meta-analyses of randomized trials (n=3),128 129 130 BMJ economics checklist 1996 (n=2 evaluations),122 123 STROBE 2007 for observational studies in epidemiology (n=2),140 147 CONSORT extension for journal and conference abstracts 2008 (n=1),146 CONSORT extension for herbal interventions 2006 (n=1),127 and STRICTA 2002 for controlled trials of acupuncture (n=1).139

Characteristics of included studies

Appendix 5 descriptively summarizes included reporting guidelines according to the focus of the guideline and the content area the guideline covers. Among included guidelines were those covering general health research reports; animal, pre-clinical, and other basic science reports; a variety of health research designs and types of health research; and a variety of content areas. Tables 1 and 2 show characteristics of the included evaluations. The most frequent content focuses of evaluations were diagnostic studies (7/26; 27%), drug therapies (6/26; 23%), and unspecified (5/26; 19%); evaluations spanned a variety of biomedical areas. Funding was most frequently either not reported (13/26; 50%) or provided by a government agency (7/26; 27%), and the role of the funder in the conduct of the evaluation was not reported in most evaluations (22/26; 85%). Two thirds of the evaluations provided a statement regarding competing interests or declared authors’ source(s) of support (17/26; 65%). Corresponding authors of evaluations were located in nine countries; 37% (10/27) of corresponding authors were in the United Kingdom.
Table 1

 Characteristics of included evaluations for BMJ economics, CONSORT extension for abstracts, CONSORT extension for harms, CONSORT extension for herbal interventions, and PRISMA reporting guidelines

Author, year* | Country of corresponding author | Sources of funding; role of funder; authors' source(s) of support | Content focus | Specific medical or scientific specialty† | Extent of guideline assessed‡
BMJ economics guideline, 1996
Herman, 2005122§ | United States | Government agency: grant from National Center for Complementary and Alternative Medicine; not reported; not reported (authors declare no competing interests) | Complementary medicine | Unspecified | All items
Jefferson, 1998123 | United Kingdom | Not reported; not reported; not reported | Unspecified | Unspecified | Subset of items¶
CONSORT extension for abstracts, 2008**
Ghimire, 2014146 | South Korea | Not reported; not reported; not reported (authors declare no competing interests) | Unspecified | Oncology | Subset of items¶
CONSORT extension for harms, 2004**
Haidich, 2011124§ | Greece | Not reported; not reported; not reported | Drug therapies | Several medical specialties†† | All items
Turner, 2011125§ | Canada | Government agency: National Center for Complementary and Alternative Medicine, National Institutes of Health; not reported; authors declare no competing interests | Complementary medicine | Unspecified | Subset of items‡‡
Peron, 2014141 | France | Not reported; not reported; charitable foundation: Nuovo-Soldati Foundation (authors declare no competing interests) | Drug therapies | Oncology | Subset of items¶
Cornelius, 2013142 | United Kingdom | Government agency: National Institute for Health Research Biomedical Research Centre at Guy's and St Thomas' NHS Foundation Trust and King's College London; not reported; not reported (authors declare no competing interests) | Drug therapies | Neurosciences | Subset of items¶
Lee, 2008126 | Canada | Government agency: Canadian Institutes of Health Research Chronic Disease New Emerging Team grant (joint sponsorship from Canadian Diabetes Association, Kidney Foundation of Canada, Heart and Stroke Foundation of Canada, and two other Canadian Institutes of Health Research Institutes); not reported; not reported | Drug therapies | Clinical neurology | Subset of items¶
CONSORT extension for herbal interventions, 2006**
Ernst, 2011127 | United Kingdom | Not reported; not reported; not reported | Complementary medicine | Medicine, general and internal | Subset of items
PRISMA, 2009
Tunis, 2013143§ | Canada | No funding; not applicable; not reported (authors state no competing interests; authors have declared financial activities not related to article) | Unspecified | Radiology, nuclear medicine, and medical imaging | All items
Panic, 2013145§ | Italy | Not reported; funder had no role in work; academic: ERAWEB, charitable: Fondazione Veronesi (authors declare no competing interests) | Unspecified | Gastroenterology and hepatology | All items
Fleming, 2013144§ | United Kingdom | Not reported; not reported; not reported | Unspecified | Dentistry, oral surgery, and medicine | All items

*All included evaluations were published as full reports.

†2011 journal impact factor categories used for classification.

‡If authors of evaluations deemed particular guidance item to be “not applicable” to literature they were assessing, those items were excluded from analysis; for evaluations with zero or one studies in one comparison arm, those evaluations were removed from synthesis because that one arm would determine direction of effect.

§Included in quantitative analysis.

¶As determined by authors of this review when comparing with published guidance.

**Official extension of CONSORT reporting guideline; “official” defined as at least one author from original CONSORT reporting guideline on authorship of extension.

††Cardiac and cardiovascular systems, hematology, immunology, infectious diseases, obstetrics and gynecology, oncology, psychiatry, respiratory system, and rheumatology.

‡‡Evaluation’s authors indicated subset was assessed but authors of this review determined smaller subset was analyzed when comparing with published guidance.

Table 2

 Characteristics of included evaluations for QUOROM, STARD, STRICTA, and STROBE reporting guidelines

Author, year* | Country of corresponding author | Sources of funding; role of funder; authors' source(s) of support | Content focus | Specific medical or scientific specialty† | Extent of guideline assessed‡
QUOROM, 1999
Hind, 2007128§ | United Kingdom | Not reported; not reported; not reported (authors declare they previously worked for UK NHS Health Technology Assessment Programme (source of included reports)) | Therapeutic interventions (generic) | Unspecified | Subset of items
Biondi-Zoccai, 2006129 | Italy | No funding; not applicable; not reported (authors declare no competing interests) | Drug therapies | Urology and nephrology | All items
Poolman, 2007130 | Canada, Netherlands | Not reported; not reported; academic: Canadian Institutes of Health Research Canada Research Chair; industry: Merck Sharp and Dohme Netherlands, Biomet Netherlands, Zimmer Netherlands; other: Stichting Wetenschappelijk Onderzoek Orthopaedische Chirurgie Fellowship, Anna Fonds Foundation, Nederlandse Vereniging voor Orthopedische Traumatologie Fellowship | Surgery | Orthopedics | All items
STARD, 2003
Freeman, 2009131§ | United Kingdom | Government agency: European Commission funds allocated to Safe Activities For Everyone Network of Excellence under 6th Framework; not reported; not reported | Biochemical and laboratory research methods | Obstetrics and gynecology | All items
Mahoney, 2007132§ | United States | Industry: LifeScan Inc; not reported; study funder | Diagnostic (glucose monitoring) | Endocrinology and metabolism | All items
Selman, 2011133§ | United Kingdom | Not reported; not reported; other: charitable foundation (Wellbeing of Women) and Medical Research Council/Royal College of Obstetricians and Gynaecologists Clinical Research Training Fellowship (authors declare no competing interests) | Diagnostic studies | Obstetrics and gynecology | Subset of items¶
Smidt, 2006134§ | Netherlands | Government agency: ZonMW; funder did not play role in study or manuscript**; authors declare no competing interests | Diagnostic studies | Medicine, general and internal | Subset of items¶
Coppus, 2006135 | Netherlands | Government agency: VIDI-program of ZonMW and charitable foundation: Scientific foundation of the Maxima Medical Center; not reported; not reported | Diagnostic studies | Reproductive biology | Subset of items¶
Johnson, 2007136 | United Kingdom | Not reported; not reported; not reported (authors declare no competing interests) | Diagnostic studies | Ophthalmology | Subset of items
Krzych, 2009137 | Poland | Self financed; not applicable; not reported | Diagnostic studies | Cardiac and cardiovascular systems | Subset of items††
Paranjothy, 2007138 | United Kingdom | No funding; not reported; authors state no information to disclose | Diagnostic studies | Ophthalmology | All items
STRICTA, 2002‡‡
Hammerschlag, 2011139§ | United States | Not reported; not reported; personnel support from Oregon College of Oriental Medicine research department and Helfgott Research Institute of National College of Natural Medicine | Complementary medicine | Unspecified | Subset of items¶
STROBE, 2007
Parsons, 2011147§ | United Kingdom | Not reported; not reported; not reported | Surgery | Orthopedics | All items
Delaney, 2010140 | United States | Industry: Biomedical Excellence for Safer Transfusion collaborative (industry sponsored); not reported; authors declare no competing interests | Platelet transfusion | Hematology | Subset of items¶

*All included evaluations were published as full reports.

†2011 journal impact factor categories used for classification.

‡If authors of evaluations deemed particular guidance item to be “not applicable” to literature they were assessing, those items were excluded from analysis; for evaluations with zero or one studies in one comparison arm, those evaluations were removed from synthesis because that one arm would determine direction of effect.

§Included in quantitative analysis.

¶As determined by authors of this review when comparing with published guidance.

**Specifically, funding agency did not play role in design or conduct of study; collection, management, analysis, or interpretation of data; or preparation, review, or approval of manuscript.

††Authors of evaluations indicated subset was assessed, but authors of this review determined smaller subset was analyzed when comparing with published guidance.

‡‡Unofficial extension of CONSORT reporting guideline.

For each included evaluation, tables 3 and 4 show the number of studies relevant to our assessments, their year(s) of publication, and the number of journals publishing the relevant studies. Tables 5 and 6 present information on the extent of journals' endorsement and whether the date of endorsement was provided by evaluation authors, journal websites, or editors.
Table 3

 Validity assessment for evaluations with studies enabling endorsing versus non-endorsing journal comparison

Author, year | Relevant studies for assessment (endorsing v non-endorsing) | Year of publication of assessed studies | Journals that published assessed studies | Two or more assessors for completeness of reporting* | No of items assessed as reported in methods section* | Comprehensive search strategy* | Balance of studies per journal in comparison groups*†
BMJ economic guidelines, 1996
Herman, 2005122 | 2 v 11 | 2003-04 | 1 v 10 | Unclear | High | Low | High
Jefferson, 1998123 | 1 v 5 | 1997-98§ | 1 v 1 | Unclear | Unclear | High | High
CONSORT extension for abstracts, 2008
Ghimire, 2014146 | 74 v 234 | 2010-12 | 2 v 4 | High | Unclear | Low | Low
CONSORT extension for harms, 2004
Haidich, 2011124 | 25 v 77 | 2006 | 2 v 3 | High | High | High | Low
Turner, 2011125 | 5 v 189 | 2009 | 5 v 104 | Low | High | Low | Low
Peron, 2014141 | 43 v 282 | 2007-11 | 2 v 8 | Unclear | High | Low | Low
Cornelius, 2013142 | 1 v 6 | 2009 | 1 v 5 | High | High | High | High
Lee, 2008126 | 1 v 1 | 2005 | 1 v 1 | High | High | High | High
CONSORT extension for herbal interventions, 2006
Ernst, 2011127 | 1 v 4 | 2009 | 1 v 3 | Unclear | High | Low | High
PRISMA, 2009
Tunis, 2013143 | 13 v 48 | 2010-11 | 1 v 8 | High | High | Low | Low
Panic, 2013145 | 30 v 30 | Jan-Oct 2012 | 6 v 10 | High | High | Low | Unclear
Fleming, 2013144 | 20 v 2 | 2009-11 v 2010-11 | 2 v 1 | High | High | Low | Low
QUOROM, 1999
Biondi-Zoccai, 2006129 | 1 v 6 | 2004 | 1 v 6 | High | High | Low | High
Poolman, 2007130 | 1 v 6 | 2006 v 2005 | 1 v 5 | High | Unclear | Low | High
STARD, 2003
Freeman, 2009131 | 3 v 9 | 2004-05 | 2 v 7 | Unclear | High | High | High
Mahoney, 2007132 | 6 v 20 | 2003-05 | 4 v 13 | High | High | Low | High
Selman, 2011133 | 14 v 36 | 2003-06 | 6 v 22 | High | Low | Low | Low
Smidt, 2006134 | 95 v 46 | 2004 | 7 v 5 | High | High | Low | Low
Coppus, 2006135 | 8 v 19 | 2004 | 1 v 1 | Low | High | Unclear | High
Johnson, 2007136 | 1 v 10 | 2005 | 1 v 4 | High | High | Low | High
Krzych, 2009137 | 4 v 21 | 2004-06 | 2 v 16 | Unclear | High | Low | High
Paranjothy, 2007138 | 1 v 8 | 2005-06 | 1 v 4 | High | High | Low | High
STRICTA, 2002
Hammerschlag, 2011139 | 17 v 130 | 2002-05 | 3 v 64 | Low | High | Low | Unclear
STROBE, 2007
Parsons, 2011147 | 9 v 38 | 2008-10 | 2 v 6 | Low | Unclear | Low | Low
Delaney, 2010140 | 1 v 4 | 2008 | 1 v 3 | High | Unclear | Low | High

*High=high validity; low=low validity; unclear=unclear validity.

†Assessed once authors’ data reorganized into comparison groups.

‡Included in quantitative synthesis.

§Estimated based on information provided in article.

Table 4

 Validity assessment for evaluations with studies enabling the after versus before journal comparison

Author, year | Relevant studies for assessment (after v before endorsement) | Year of publication of assessed studies | Journals that published assessed studies | Two or more assessors for completeness of reporting* | No of items assessed as reported in methods section* | Comprehensive search strategy* | Balance of studies per journal in comparison groups*† | Sampling took place in period following publication of reporting guideline*†
BMJ economic guidelines, 1996
Jefferson, 1998123 | 1 v 8 | 1997-98 v 1994-95§ | 1 | Unclear | Unclear | High | High | Low
CONSORT extension for abstracts, 2008
Ghimire, 2014146 | 74 v 16 | 2010-12 v 2005-07 | 2 | High | Unclear | Low | Low | Low
CONSORT extension for harms, 2004
Lee, 2008126 | 1 v 2 | 2005 v 1999-2000 | 1 | High | High | High | High | Low
PRISMA, 2009
Panic, 2013145 | 27 v 26 | 2012 v 2008-11 | 6 | High | High | Low | Low | Unclear
Fleming, 2013144 | 14 v 12 | 2009-11 v 2006-09 | 1 | High | High | Low | High | Low
QUOROM, 1999
Hind, 2007128 | 13 v 15 | 2005 v 2003 | 1 | Low | High | Low | High | High
STARD, 2003
Smidt, 2006134 | 95 v 78 | 2004 v 2000 | 7 | High | High | Low | Unclear | Low
Selman, 2011133 | 3 v 1 | 2005-06 v 2003 | 1 | High | Low | Low | Low | High
STRICTA, 2002
Hammerschlag, 2011139 | 11 v 4 | 2003-05 v 1999-2001 | 2 | Low | High | Low | Unclear | Low
STROBE, 2007
Parsons, 2011147 | 9 v 11 | 2008-10 v 2005-08 | 2 | Low | Unclear | Low | Low | Low

*High=high validity; low=low validity; unclear=unclear validity.

†Assessed once authors’ data reorganized into comparison groups.

‡Included in quantitative synthesis.

§Estimated based on information provided in article.

Table 5

 Journal endorsement information for evaluations assessing BMJ economics, CONSORT extension for abstracts, CONSORT extension for harms, CONSORT extension for herbal interventions, and PRISMA reporting guidelines

Author, year | Endorsing journals that published assessed studies | Extent of endorsement | Date of endorsement provided
BMJ economic guidelines, 1996
Herman, 2005 (122)*† | BMJ | Submit checklist | By journal, email
Jefferson, 1998 (123) | BMJ | Submit checklist | By journal, email
CONSORT extension for abstracts, 2008
Ghimire, 2014 (146) | Lancet | Suggests use | By journal, email
CONSORT extension for harms, 2004
Haidich, 2011 (124)*† | Annals of Internal Medicine | Submit checklist | By journal, email
 | The Lancet | Submit checklist | By journal, email
Turner, 2011 (125)*† | The American Journal of Gastroenterology | Submit checklist | By journal, email
 | American Journal of Kidney Diseases | Suggests use | By journal, email
 | Applied Health Economics and Health Policy | Suggests use | By journal, email
 | JAMA | Submit checklist | Not provided
 | Phytomedicine | Suggests use | Not provided
Peron, 2014 (141) | Lancet | Submit checklist | By journal, email
 | Lancet Oncology | Submit checklist | By journal, email
Cornelius, 2013 (142) | Lancet | Submit checklist | By journal, email
Lee, 2008 (126) | BMJ | Submit checklist | By journal, email
CONSORT extension for herbal interventions, 2006
Ernst, 2011 (127) | Annals of Internal Medicine | Suggests use | Not provided
PRISMA, 2009
Tunis, 2013 (143)*† | Radiology | Suggests use | Unknown based on information given
Panic, 2013 (145)* | Alimentary Pharmacology and Therapeutics | Extent of endorsement at time of author’s analysis unknown (all journals) | Provided by author (all journals)
 | American Journal of Gastroenterology
 | BMC Gastroenterology
 | Colorectal Disease
 | Diseases of the Colon and Rectum
 | Gut
 | Gut Pathogens
 | Hepatitis Monthly
 | HPB
Fleming, 2013 (144)* | American Journal of Orthodontics and Dentofacial Orthopedics | Submit checklist | By journal, email
 | Angle Orthodontist | Suggests use | Not provided
 | European Journal of Orthodontics | Submit checklist | By journal, email
 | Journal of Orthodontics | Suggests use | By journal, email

*Evaluations included in quantitative analysis.

†Endorsing versus non-endorsing journals comparison only.

Table 6

 Journal endorsement information for evaluations assessing QUOROM, STARD, STRICTA, and STROBE reporting guidelines

Author, year | Endorsing journals that published assessed studies | Extent of endorsement | Date of endorsement provided
QUOROM, 1999
Hind, 2007 (128)*† | UK NHS Health Technology Assessment Programme | Submit checklist | By evaluation
Biondi-Zoccai, 2006 (129) | Clinical Cardiology | Unknown based on information given | Unknown based on information given
Poolman, 2007 (130) | BMJ | Suggests use | Not provided
STARD, 2003
Freeman, 2009 (131)*‡ | American Journal of Obstetrics and Gynecology | Submit checklist | Unknown based on information given
 | Molecular Diagnosis§ | Suggests use | Not provided
Mahoney, 2007 (132)*‡ | Archives of Disease in Childhood (including Fetal and Neonatal Edition) | Suggests use | Unknown based on information given
 | Clinical Biochemistry | Suggests use | Not provided
 | Emergency Medicine Journal | Suggests use | Unknown based on information given
 | Journal of the Medical Association of Thailand | Suggests use | Not provided
Selman, 2011 (133) | American Journal of Obstetrics and Gynecology† | Submit checklist | Unknown based on information given
 | Cancer† | Suggests use | Not provided
 | Clinical Radiology† | Suggests use | Not provided
 | Journal of the Medical Association of Thailand† | Suggests use | Not provided
 | Obstetrics and Gynecology | Suggests use | By journal, email
 | Radiology† | Suggests use | By journal website
Smidt, 2006 (134)* | Annals of Internal Medicine | Suggests use | Journal website or by evaluation (all journals)
 | BMJ | Suggests use
 | Clinical Chemistry | Submit checklist
 | JAMA | Suggests use
 | The Lancet | Submit checklist
 | Neurology | Submit checklist
 | Radiology | Suggests use
Coppus, 2006 (135) | Human Reproduction | Journal no longer endorses guideline
Johnson, 2007 (136) | Ophthalmic and Physiologic Optics | Submit checklist | By journal, email
Krzych, 2009 (137) | Clinical Chemistry** | Submit checklist | Reported in another evaluation
 | Heart | Suggests use | Not provided
Paranjothy, 2007 (138) | British Journal of Ophthalmology | Suggests use | Not provided
STRICTA, 2002
Hammerschlag, 2011 (139)* | Acupuncture in Medicine | Suggests use | By journal, email
 | Journal of Alternative and Complementary Medicine | Suggests use | By journal, email
 | Medical Acupuncture† | Suggests use | By journal, email
STROBE, 2007
Parsons, 2011 (147)* | Clinical Orthopaedics and Related Research | Suggests use | By journal, email
 | The Journal of Bone and Joint Surgery (American) | Suggests use | By journal, email
Delaney, 2010 (140) | Annals of Surgery | Suggests use | Not provided

*Evaluations included in quantitative analysis.

†After versus before journal endorsement comparison only.

‡Endorsing versus non-endorsing journals comparison only.

§Now published as Molecular Diagnosis and Therapy.

¶In quantitative analysis for endorsing versus non-endorsing journals only.

**Reported in another included evaluation.


Validity assessment

Tables 3 and 4 show validity assessments for the comparisons; the supporting details for those judgments are in appendix 6. Table 3 covers evaluations for the endorsing versus non-endorsing journal comparison; table 4 covers evaluations with studies pertaining to the after versus before endorsement comparison. More than half of the evaluations (15/26; 58%) used at least two people to assess the completeness of reporting. Selective reporting does not seem to have been a problem, as most evaluations (20/26; 77%) assessed the number of reporting items stipulated in their methods sections. Only a minority of evaluations (5/26; 19%) reported a comprehensive search strategy for locating relevant studies; an evaluation intending to evaluate reports from specific journals in a specified time period was deemed adequately comprehensive. In the endorsing versus non-endorsing journal comparison, just over half of the evaluations (14/25; 56%) had a similar number of studies per journal in the comparison groups; in the after versus before endorsement comparison, fewer than half of the evaluations (4/10; 40%) were balanced for the number of studies per journal in the comparison groups, which matters because of a potential “clustering” problem. Also in the after versus before comparison, most evaluations (7/10; 70%) included studies in the “before” arm that were published before the reporting guideline itself was published, possibly confounding the evaluations.

Relation between journals’ endorsement of guidelines and completeness of reporting

Of the 26 included evaluations, we were able to quantitatively analyze 13; we did not have access to the raw data for the remaining evaluations. The CONSORT extensions for herbal interventions and journal/conference abstracts reporting guidelines were covered by one evaluation each, but raw data were not available for our analysis. Because of the few evaluations with available data, we were unable to do pre-planned subgroup and sensitivity analyses and assessments of funnel plot asymmetry.18 Data described below pertain to overall analyses of checklist items by guideline; individual analyses for each checklist item and mean summed score are provided in appendix 7.

Endorsing versus non-endorsing journals

Analyzed by checklist item, the CONSORT extension for harms (10 items), PRISMA (27 items), STARD (25 items), and STROBE (34 items) reporting guidelines were evaluated on all items; a subset of items was analyzed for the BMJ economics checklist (19/35 items) and STRICTA (18/20 items) guidelines. Most items were assessed by only one evaluation; STARD items were assessed by two to four evaluations and most PRISMA items by two to three (figures 4, 5, 6, 7, 8, and 9). Relatively few relevant studies were included in the assessments (median 85 studies, interquartile range 47-143). Across guidelines, almost all items showed no statistically significant difference in completeness of reporting in relation to journal endorsement (figures 4, 5, 6, 7, 8, and 9).

Fig 4 Completeness of reporting summary plot for BMJ economics checklist, endorsing versus non-endorsing journals. Summary plots in this and other related figures were generated in Comprehensive Meta-Analysis; the summary effect estimates shown for each checklist item were previously calculated in Review Manager. For example, the checklist item “economic importance of question” was assessed in only one evaluation, which included 13 studies (2 from the endorsing journal and 11 from non-endorsing journals; appendix 7) that provided information on whether the study reported that checklist item. Appendix 7 shows the analyses for each checklist item conducted in Review Manager

Fig 5 Completeness of reporting summary plot for CONSORT extension for harms checklist, endorsing versus non-endorsing journals

Fig 6 Completeness of reporting summary plot for PRISMA checklist, endorsing versus non-endorsing journals. Although all evaluations assessed all items, one evaluation was excluded from analysis of two checklist items because of zero or one studies for analysis

Fig 7 Completeness of reporting summary plot for STARD checklist, endorsing versus non-endorsing journals. Effect estimate for checklist item “Test methods: definition of cut-offs of index test and reference standard” was not estimable during quantitative analysis because of zero events in each arm (one evaluation in analysis)

Fig 8 Completeness of reporting summary plot for STRICTA checklist, endorsing versus non-endorsing journals

Fig 9 Completeness of reporting summary plot for STROBE checklist, endorsing versus non-endorsing journals. Effect estimate for checklist item “Methods: missing data” was not estimable during quantitative analysis because of zero events in each arm

The CONSORT extension for harms, PRISMA, STARD, STRICTA, and STROBE guidelines were each analyzed by mean summed score, for which some evaluations used all items and others used a subset of items (table 7). Guidelines were assessed by one to three evaluations each. Relatively few relevant studies were included in the assessments (median 102 studies, interquartile range 88-143). Analyses of mean summed scores for completeness of reporting in relation to journal endorsement were statistically non-significant for all guidelines except PRISMA (table 7).
Table 7

 Analysis by mean summed score of items for reporting guideline checklists, endorsing versus non-endorsing journals*

Reporting guideline† | No of evaluations‡ | No of studies (total) | Effect estimate (99% CI)
CONSORT extension for harms, 2004 | 2 | 25 v 77 (102) | Mean difference 0.04 (–1.50 to 1.58)
PRISMA, 2009 | 3 | 63 v 80 (143) | Standardized mean difference 0.53 (0.02 to 1.03)
STARD, 2003 | 3** | 23 v 65 (88) | Standardized mean difference 0.52 (–0.11 to 1.16)
STRICTA, 2002 | 1†† | 17 v 130 (147) | Mean difference 1.42 (–0.04 to 2.88)
STROBE, 2007 | 1 | 9 v 38 (47) | Mean difference 1.55 (–3.19 to 6.29)

*Individual forest plots depicting these summary data are shown in appendix 7.

†QUOROM (two evaluations) was not estimable because of one study in one comparison arm per assessed evaluation.

‡Only evaluations that calculated summed score for report were included.

§All checklist items summed.

¶Subset of items was summed for one evaluation.

**Subset of items was summed for two of three evaluations.

††Subset of items was summed.

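Tables 7 and 8 report pooled mean differences and standardized mean differences with 99% confidence intervals. As a rough illustration of how such summary statistics are computed from group means and standard deviations, the sketch below implements an unadjusted mean difference and a Cohen's d style standardized mean difference, each with a normal-approximation 99% CI (z = 2.576). This is a generic sketch under those assumptions, not the authors' Review Manager analysis, and the numbers in the usage example are hypothetical.

```python
import math

Z99 = 2.576  # normal quantile for a two-sided 99% CI


def mean_difference_ci(m1, sd1, n1, m2, sd2, n2, z=Z99):
    """Unadjusted mean difference between two groups with a
    normal-approximation CI (illustration only)."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - z * se, md + z * se


def standardized_mean_difference_ci(m1, sd1, n1, m2, sd2, n2, z=Z99):
    """Cohen's d style standardized mean difference: the difference in
    means divided by the pooled standard deviation, with an approximate
    CI from the usual large-sample variance formula."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    se = math.sqrt(var_d)
    return d, d - z * se, d + z * se


# Hypothetical example: endorsing-journal studies score a mean of 10.0
# summed items (SD 2.0, n=30) v 9.0 (SD 2.0, n=30) in non-endorsing journals.
md, md_lo, md_hi = mean_difference_ci(10.0, 2.0, 30, 9.0, 2.0, 30)
smd, smd_lo, smd_hi = standardized_mean_difference_ci(10.0, 2.0, 30, 9.0, 2.0, 30)
```

A CI that crosses zero, as for most guidelines in tables 7 and 8, indicates a statistically non-significant difference at the 1% level.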

After versus before journal endorsement

Analyzed by checklist item, STROBE (34 items) and PRISMA (27 items) were the only reporting guidelines with all items evaluated; the QUOROM (1/17 items), STARD (1/25 items), and STRICTA (17/20 items) guidelines were evaluated on a subset of items. All were assessed by one evaluation each, with the exception of PRISMA. Relatively few relevant studies were included in the assessments (median 20 studies, interquartile range 19-64; figures 10, 11, 12, 13, and 14). Analyses of completeness of reporting in relation to endorsement were statistically non-significant for each checklist item.

Fig 10 Completeness of reporting summary plot for PRISMA checklist, after versus before journal endorsement. Although all evaluations assessed all items, one evaluation was excluded from analysis of one checklist item because of zero and one studies for comparison arms

Fig 11 Completeness of reporting summary plot for QUOROM checklist, after versus before journal endorsement

Fig 12 Completeness of reporting summary plot for STARD checklist, after versus before journal endorsement

Fig 13 Completeness of reporting summary plot for STRICTA checklist, after versus before journal endorsement

Fig 14 Completeness of reporting summary plot for STROBE checklist, after versus before journal endorsement. Effect estimate for checklist item “Methods: missing data” was not estimable during quantitative analysis because of zero events in each arm

The PRISMA (all checklist items), STRICTA (item subset), and STROBE (all checklist items) reporting guidelines were analyzed by mean summed score, by one or two evaluations each. Relatively few relevant studies were included in the assessments (median 20 studies, interquartile range 18-50), and analyses of mean summed scores for completeness of reporting in relation to endorsement were statistically non-significant (table 8).
Table 8

 Analysis by mean summed score for reporting guideline checklists, after versus before journal endorsement*

Reporting guideline | No of evaluations† | No of studies (total) | Effect estimate (99% CI)
PRISMA, 2009 | 2‡ | 41 v 38 (79) | Standardized mean difference 0.49 (–0.10 to 1.08)
STRICTA, 2002 | 1§ | 11 v 4 (15) | Mean difference 1.82 (–2.49 to 6.13)
STROBE, 2007 | 1‡ | 9 v 11 (20) | Mean difference 1.16 (–3.97 to 6.29)

*Individual forest plots depicting these summary data are shown in appendix 7.

†Only evaluations that calculated summed score for report were included.

‡All checklist items were summed.

§Subset of items was summed.


Assessment of study methodological quality within evaluations

Nine of 26 evaluations assessed the methodological quality of included studies (table 9): one economics evaluation,122 one evaluation assessing randomized trials of herbal medicines,127 five systematic review evaluations,129 130 143 144 145 and two evaluations assessing diagnostic studies.131 137 Relatively few studies per evaluation were included in the assessments. The three more recently published systematic review evaluations used AMSTAR, whereas the older two evaluations used the Oxman and Guyatt index. The two diagnostic evaluations used separate, non-overlapping criteria. Given the different methodological areas and tools represented by the evaluations, a meaningful synthesis statement was not possible.
Table 9

 Assessment of methodological quality within evaluations

Author, year | Methodological quality assessment
BMJ economic guidelines, 1996
Herman, 2005 (122) | Evaluated economic evaluations on four criteria: randomization; prospective economic data collection; comparison group was usual care; and study was not blinded or mandatory regarding participation. Both studies in the endorsing arm met all four criteria, compared with 5/11 studies in the non-endorsing arm
CONSORT extension for herbal interventions, 2006
Ernst, 2011 (127) | Assessed studies by using the Cochrane risk of bias tool. The only study from an endorsing journal was assessed as at moderate risk of bias. Studies from non-endorsing journals were assessed at high (n=2) or moderate (n=2) risk of bias
PRISMA, 2009
Tunis, 2013 (143) | Assessed reviews by using AMSTAR. Using data provided by the author, studies (n=13) from the only endorsing journal scored a mean of 9.2 of 11 points, and studies (n=48) from non-endorsing journals scored 7.6 of 11 points
Panic, 2013 (145) | Assessed reviews by using AMSTAR. Data by item are not presented. Endorsing versus non-endorsing journals: using data provided by the author, the mean summed score for studies (n=30) from endorsing journals was 7.2 (range 2-9), and that for studies (n=30) from non-endorsing journals was 6.4 (range 1-9). After versus before journal endorsement: using data provided by the author, the mean summed score was 7.3 (range 3-9, n=27 articles) after journal endorsement and 6.0 (range 0-9, n=26 articles) before endorsement
Fleming, 2013 (144) | Authors assessed reviews by using the AMSTAR tool but analyzed across all included studies.162
QUOROM, 1999
Biondi-Zoccai, 2006 (129) | Assessed studies by using the Oxman and Guyatt index (range 1 (minimal flaws) to 7 (extensive flaws)). The only study from an endorsing journal scored 2 on the index; studies (n=6) from non-endorsing journals scored 1-6 points
Poolman, 2007 (130) | Used the Oxman and Guyatt index (maximum score 7 points). The only study from an endorsing journal scored 7 points. Studies from non-endorsing journals (n=6) scored 1-6 points; the four studies scoring 1 or 2 points are considered to have “major flaws” according to the index
STARD, 2003
Freeman, 2009 (131) | Assessed eight aspects that the authors state address internal and external validity of included studies: selective participant sampling; lack of reporting ethnicity and/or sensitization status of participants; lack of reporting number of replicates, if done, that were used for overall study outcome; lack of reporting failure rate; lack of including reported failure rate in analysis; difference in reported and adjusted accuracy; lack of controlling for presence of fetal DNA; and lack of known genotypes in study as control. Raw data were provided in tabular form without summary in text. Studies (n=3) from endorsing journals had 2 to 4 of 8 flaws. Studies (n=8) from non-endorsing journals had 2 to 6 flaws, and information from one study was not interpretable
Krzych, 2009 (137) | Authors assessed studies by using the QUADAS tool but analyzed across all included studies

Unwanted effects of reporting guideline use

None of the included evaluations reported on unwanted effects of reporting guideline use.

Discussion

We reviewed the evidence on whether endorsement of reporting guidelines by journals is associated with more complete reporting of research. Although we identified a large number of reporting guidelines, we located very few evaluations of them that provided information enabling an examination with respect to endorsement.

Strengths and weaknesses of systematic review

This is the first systematic review to comprehensively examine a broad range of reporting guidelines. We sourced these reporting guidelines from the EQUATOR Network and another systematic review characterizing known, high quality guidelines. We gave careful consideration to the parameters required to enable our comparisons of interest and made a considerable effort to locate evaluations, including the re-analysis of others’ data.

As exemplified by the volume of literature we had to screen, searching is complex for methods reviews. No search filters or established bibliographic database controlled vocabulary terms exist, especially for reporting guidelines, and the particular studies of interest are often embedded in other studies. The time consuming task of screening therefore yields very few eligible records. Although systematic reviews are customarily expected to be current with the literature at publication, the evidence behind that expectation pertains to comparative effectiveness reviews, not to methods reviews such as ours. An updated search would yield more than 6000 records for us to screen, with likely only a few relevant studies. We were aware of additional evaluations published since the date of our literature search and added them to our review; these additional studies did not change our conclusions. Other recently published articles did not meet our criteria.148 149 150 We do not believe that an updated search would identify sufficient additional studies to change our results.

We limited our inclusion to evaluations written in English or French. This may be a limitation of our work, but we are unclear as to how many evaluations might exist in other languages, given that few reporting guidelines are translated into other languages. We did not include the main CONSORT reporting guideline here, a decision made after the initial protocol was written. The volume of evaluations for CONSORT is so large that detailed analysis would have overwhelmed the evidence from other reporting guidelines; furthermore, a systematic review solely evaluating the effect of CONSORT was published as recently as 2012.16 17

Comparison with other reviews

The findings from the 2012 CONSORT systematic review show that, for some CONSORT checklist items, trials published in journals that endorse CONSORT were more completely reported than were trials published before the time of endorsement or in non-endorsing journals.16 17 CONSORT is by far the most extensively evaluated reporting guideline, in contrast to the reporting guidelines covered in this review. At least one other review evaluating CONSORT for harms has been published.151 We examined this review, and studies included in that review but not in ours would not have met our eligibility criteria.

Meaning of review: explanations and implications

Although reporting guidelines might have sufficient face validity to convince some editors to endorse them, we found little evidence to guide this policy. This is in stark contrast, for example, to the evidence required to introduce a new drug to the marketplace, where empirical evidence in the form of pivotal randomized trials would be required. Although reporting guidelines are not drugs, they have become increasingly popular, their number continues to grow rapidly, and journal editors and others are making policy decisions about encouraging their use in hundreds if not thousands of journals. Evidence relating to CONSORT, STARD, MOOSE, QUOROM, and STROBE indicates that no standard way exists in which journals endorse reporting guidelines.152 153 154 155 Furthermore, other than including recommendations in their “Instructions to authors,” little is known about what else individual journals do to ensure adherence to reporting guidelines. This is a question of fidelity; the effect of endorsement is therefore plagued by differing, and not well documented, processes determining the “strength” of endorsement. For example, some journals require a completed reporting guideline checklist as part of manuscript submission, whereas others only suggest the use of reporting guidelines to facilitate writing of manuscripts. In both instances, whether or how journals check that authors adhere to journals’ recommendations or requirements is not known. One strategy would be to encourage peer reviewers to check adherence to the relevant reporting guideline. A 2012 survey of journals’ instructions to peer reviewers showed that references or recommendations to use reporting guidelines during peer review were rare (19 of 116 journals assessed).156 Even when reporting guidelines were mentioned, instructions on how to use them during peer review were entirely absent; most journals pointed to CONSORT and few to other reporting guidelines.
Specifically, surveys of journals’ instructions to authors with respect to endorsement of CONSORT show that guidance is inconsistent and ambiguous and does not give authors a strong indication of what is expected of them in using CONSORT during the manuscript submission process.152 153 157 Evidence from this review and a similar CONSORT systematic review suggests much room for improvement in how journals seek to achieve adherence to reporting guidelines.16 17 Developers of reporting guidelines and editors could work together to agree on the optimal way to endorse and implement reporting guidelines across journals, bringing some standardization to the implementation process. A fundamental outcome used by evaluators was the completeness of reporting according to items from the reporting guideline. Ideally, this means that all concepts in a particular reporting guideline checklist item were reported. For example, one STARD checklist item covers the “technical specifications of material and methods involved, including how and when measurements were taken, and/or cite references for the index tests and reference standard.” For this item, some evaluations separated and tracked reporting information for the index test separately from the reference standard. We had to exclude nine evaluations that did not assess any original, unmodified checklist items (that is, guidance items were split into subcomponents or written with modified interpretation). Furthermore, as noted in tables 1 and 2, more than half of the included evaluations applied modifications to one or more items of the original guidance, precluding the inclusion of those items in our analyses. Evaluating the completeness of reporting in relation to journals’ endorsement might seem straightforward; in reality, it is complex.
One problem in approaching our analysis was that only three evaluations considered endorsement as the “intervention” of interest, of which two could be included in our quantitative analysis. As a result, we had to rework authors’ data to facilitate the comparisons of interest and track down journals’ endorsement information, requiring considerable time and effort. Evaluators of reporting guidelines, in general, have not considered endorsement as an “intervention” with the potential to affect the completeness of reporting. Although the evaluations in this review do not provide conclusive evidence, the CONSORT review provides some evidence that simple endorsement of reporting guidelines has the potential to affect the completeness of reporting.16 17 One design used in the literature is the comparison of completeness of reporting before and after the publication of a reporting guideline. If publication alone is regarded as an intervention, endorsement would likely serve as a “stronger” intervention, given the need for manuscripts to adhere to a journal’s “Instructions to authors” and the subsequent editorial process. However, as mentioned above, the strength of endorsement is crucial and varies across journals. Thus, although not ideal, a journal’s statement about endorsement of a guideline is the best available proxy indicator of a journal’s policy and perhaps of authors’ behavior around use of reporting guidelines. In terms of experimental designs, randomizing journals to endorse a reporting guideline or to continue with usual editorial policy would be difficult, if not impossible. One feasible point of intervention and evaluation is peer review, as mentioned above.
To our knowledge, at least one randomized trial, by Cobo et al in 2011, has examined the use of reporting guidelines in the peer review process within a single journal that did not endorse any reporting guidelines; it found that manuscripts reviewed with the aid of reporting guidelines were of better quality than those reviewed without them.158 Although these findings apply only to a single journal, more trials like this could provide journals with their own evidence on completeness of reporting and better inform editors as to whether efforts on endorsement and, further, implementation are having their intended effects. Beyond simple publication of a guideline, little effort is dedicated to knowledge translation (implementation) activities. As defined by the Canadian Institutes of Health Research, the crux of knowledge translation is moving beyond the simple dissemination of knowledge to the actual use and implementation of knowledge.159 The EQUATOR Network has gone some way towards providing a collated home and network for reporting guidelines and resources. However, knowledge producers and guideline developers are responsible for ensuring appropriate and widespread use of a particular guideline by knowledge users. Developers and interested researchers may wish to study the behaviors of target users (for example, prospective journal authors) and to develop, carry out, and evaluate strategies with the potential to change behavior around guideline use, similar to ongoing work in the implementation of clinical research.160 161

Future research

Future evaluations of reporting guidelines should assess unmodified reporting items. Non-experimental designs based on journal endorsement status can help to supplement the evidence base. However, researchers in this area, such as guideline developers, should consider carrying out prospectively designed, controlled studies, like the study by Cobo et al,158 in the context of the journal’s editorial process to provide more robust evidence.

Conclusions

The completeness of reporting of only nine of 101 rigorously developed reporting guidelines has been evaluated in relation to journal endorsement status. Items from seven reporting guidelines were quantitatively analyzed, by few evaluations each. Insufficient evidence exists to determine the relation between journals’ endorsement of reporting guidelines and the completeness of reporting in published health research reports. Future evaluations of reporting guidelines can take the form of comparisons based on journal endorsement status, but researchers should consider prospectively designed, controlled studies conducted in the context of the journal’s editorial process.

What is already known on this topic

Complete and transparent reporting of research, which is often inadequate and incomplete, enables readers to assess the internal validity and applicability of findings.
The completeness of reporting of the CONSORT guideline in relation to endorsement by journals has been evaluated and was shown to be associated with more complete reporting for several checklist items.
No systematic review has comprehensively reviewed evaluations of other reporting guidelines.

What this study adds

Apart from CONSORT, 101 rigorously developed reporting guidelines exist for reporting health research, only nine of which could be evaluated regarding their journal endorsement status, and with data from only a few evaluations.
Few data are available to help editors regarding endorsement of specific reporting guidelines.
Future evaluations of reporting guidelines based on journal endorsement status can help to supplement the evidence base. However, researchers should consider prospectively designed, controlled studies conducted in the context of the journal’s editorial process to provide more robust evidence.
References (153 in total)

1.  Standardized definitions and clinical endpoints in carotid artery and supra-aortic trunk revascularization trials.

Authors:  Krassen Nedeltchev; Peter M Pattynama; Giancarlo Biamino; Nicolas Diehm; Michael R Jaff; L Nelson Hopkins; Stephen Ramee; Marc van Sambeek; Aly Talen; Frank Vermassen; Alberto Cremonesi
Journal:  Catheter Cardiovasc Interv       Date:  2010-09-01       Impact factor: 2.692

2.  Standardized reporting guidelines for studies evaluating risk stratification of emergency department patients with potential acute coronary syndromes.

Authors:  Judd E Hollander; Andra L Blomkalns; Gerard X Brogan; Deborah B Diercks; John M Field; J Lee Garvey; W Brian Gibler; Timothy D Henry; James W Hoekstra; Brian R Holroyd; Yuling Hong; J Douglas Kirk; Brian J O'Neil; Raymond E Jackson; Tom Aufderheide; Andra L Blomkalns; Gerard X Brogan; James Christenson; Sean Collins; Deborah B Diercks; Francis M Fesmire; J Lee Garvey; Gary B Green; Christopher J Lindsell; W Frank Peacock; Charles V Pollack; Robert Zalenski
Journal:  Ann Emerg Med       Date:  2004-12       Impact factor: 5.721

3.  Epidemiology and reporting of randomised trials published in PubMed journals.

Authors:  An-Wen Chan; Douglas G Altman
Journal:  Lancet       Date:  2005 Mar 26-Apr 1       Impact factor: 79.321

4.  CONSORT for reporting randomised trials in journal and conference abstracts.

Authors:  Sally Hopewell; Mike Clarke; David Moher; Elizabeth Wager; Philippa Middleton; Douglas G Altman; Kenneth F Schulz
Journal:  Lancet       Date:  2008-01-26       Impact factor: 79.321

5.  The quality of reporting of orthopaedic randomized trials with use of a checklist for nonpharmacological therapies.

Authors:  Simon Chan; Mohit Bhandari
Journal:  J Bone Joint Surg Am       Date:  2007-09       Impact factor: 5.284

6.  Association of study quality with completeness of reporting: have completeness of reporting and quality of systematic reviews and meta-analyses in major radiology journals changed since publication of the PRISMA statement?

Authors:  Adam S Tunis; Matthew D F McInnes; Ramez Hanna; Kaisra Esmail
Journal:  Radiology       Date:  2013-07-03       Impact factor: 11.105

Review 7.  Utstein-style guidelines for uniform reporting of laboratory CPR research. A statement for healthcare professionals from a task force of the American Heart Association, the American College of Emergency Physicians, the American College of Cardiology, the European Resuscitation Council, the Heart and Stroke Foundation of Canada, the Institute of Critical Care Medicine, the Safar Center for Resuscitation Research, and the Society for Academic Emergency Medicine. Writing Group.

Authors:  A H Idris; L B Becker; J P Ornato; J R Hedges; N G Bircher; N C Chandra; R O Cummins; W Dick; U Ebmeyer; H R Halperin; M F Hazinski; R E Kerber; K B Kern; P Safar; P A Steen; M M Swindle; J E Tsitlik; I von Planta; M von Planta; R L Wears; M H Weil
Journal:  Circulation       Date:  1996-11-01       Impact factor: 29.690

8.  Strengthening the reporting of Genetic RIsk Prediction Studies: the GRIPS Statement.

Authors:  A Cecile J W Janssens; John P A Ioannidis; Cornelia M van Duijn; Julian Little; Muin J Khoury
Journal:  PLoS Med       Date:  2011-03-15       Impact factor: 11.069

9.  Financial Conflicts of Interest Checklist 2010 for clinical research studies.

Authors:  Paula A Rochon; John Hoey; An-Wen Chan; Lorraine E Ferris; Joel Lexchin; Sunila R Kalkar; Melanie Sekeres; Wei Wu; Marleen Van Laethem; Andrea Gruneir; James Maskalyk; David L Streiner; Jennifer Gold; Nathan Taback; David Moher
Journal:  Open Med       Date:  2010-03-24

Review 10.  Does use of the CONSORT Statement impact the completeness of reporting of randomised controlled trials published in medical journals? A Cochrane review.

Authors:  Lucy Turner; Larissa Shamseer; Douglas G Altman; Kenneth F Schulz; David Moher
Journal:  Syst Rev       Date:  2012-11-29
Cited by (59 in total)

1.  Rigor, Transparency, and Reporting Social Science Research: Why Guidelines Don't Have to Kill Your Story.

Authors:  Tracy Wharton
Journal:  Res Soc Work Pract       Date:  2015-12-31

2.  Impact of the transparent reporting of evaluations with nonrandomized designs reporting guideline: ten years on.

Authors:  Thomas Fuller; Jaime Peters; Mark Pearson; Rob Anderson
Journal:  Am J Public Health       Date:  2014-09-11       Impact factor: 9.308

3.  Endorsing Reporting Guidelines: the Journal of Infection Prevention helps show the way!

Authors:  Sheldon Stone; Barry Cookson
Journal:  J Infect Prev       Date:  2016-10-19

Review 4.  Scoping review on interventions to improve adherence to reporting guidelines in health research.

Authors:  David Blanco; Doug Altman; David Moher; Isabelle Boutron; Jamie J Kirkham; Erik Cobo
Journal:  BMJ Open       Date:  2019-05-09       Impact factor: 2.692

5.  Epidemiology and reporting characteristics of preclinical systematic reviews.

Authors:  Victoria T Hunniford; Joshua Montroy; Dean A Fergusson; Marc T Avey; Kimberley E Wever; Sarah K McCann; Madison Foster; Grace Fox; Mackenzie Lafreniere; Mira Ghaly; Sydney Mannell; Karolina Godwinska; Avonae Gentles; Shehab Selim; Jenna MacNeil; Lindsey Sikora; Emily S Sena; Matthew J Page; Malcolm Macleod; David Moher; Manoj M Lalu
Journal:  PLoS Biol       Date:  2021-05-05       Impact factor: 8.029

Review 6.  How Well Is Quality Improvement Described in the Perioperative Care Literature? A Systematic Review.

Authors:  Emma L Jones; Nicholas Lees; Graham Martin; Mary Dixon-Woods
Journal:  Jt Comm J Qual Patient Saf       Date:  2016-05

7.  A history of the evolution of guidelines for reporting medical research: the long road to the EQUATOR Network.

Authors:  Douglas G Altman; Iveta Simera
Journal:  J R Soc Med       Date:  2016-02       Impact factor: 5.344

8.  Improving the quality of toxicology and environmental health systematic reviews: What journal editors can do.

Authors:  Paul Whaley; Bas J Blaauboer; Jan Brozek; Elaine A Cohen Hubal; Kaitlyn Hair; Sam Kacew; Thomas B Knudsen; Carol F Kwiatkowski; David T Mellor; Andrew F Olshan; Matthew J Page; Andrew A Rooney; Elizabeth G Radke; Larissa Shamseer; Katya Tsaioun; Peter Tugwell; Daniele Wikoff; Tracey J Woodruff
Journal:  ALTEX       Date:  2021-06-22       Impact factor: 6.250

Review 9.  Reporting Quality of Systematic Reviews and Meta-Analyses of Otorhinolaryngologic Articles Based on the PRISMA Statement.

Authors:  Jeroen P M Peters; Lotty Hooft; Wilko Grolman; Inge Stegeman
Journal:  PLoS One       Date:  2015-08-28       Impact factor: 3.240

10.  Completeness of reporting of quality improvement studies in neonatology is inadequate: a systematic literature survey.

Authors:  Catherine Hu; Jie Yi Wang; Zoe El Helou; Muhammad Taaha Hassan; Zheng Jing Hu; Gerhard Fusch; Lawrence Mbuagbaw; Salhab El Helou; Lehana Thabane
Journal:  BMJ Open Qual       Date:  2021-06
