
Prominent medical journals often provide insufficient information to assess the validity of studies with negative results.

Randy S Hebert, Scott M Wright, Robert S Dittus, Tom A Elasy

Abstract

BACKGROUND: Physicians reading the medical literature attempt to determine whether research studies are valid. However, articles with negative results may not provide sufficient information to allow physicians to properly assess validity.
METHODS: We analyzed all original research articles with negative results published in 1997 in the weekly journals BMJ, JAMA, Lancet, and New England Journal of Medicine, as well as those published in the 1997 and 1998 issues of the twice-monthly Annals of Internal Medicine (N = 234). Our primary objective was to quantify the proportion of studies with negative results that comment on power and present confidence intervals. Secondary objectives were to quantify the proportion of these studies with a specified effect size and a defined primary outcome. Stratified analyses by study design were also performed.
RESULTS: Only 30% of the articles with negative results commented on power. The reporting of power (range: 15%–52%) and confidence intervals (range: 55%–81%) varied significantly among journals. Observational studies of etiology/risk factors addressed power less frequently (15%; 95% CI, 8–21%) than did clinical trials (56%; 95% CI, 46–67%; p < 0.001). While 87% of articles with power calculations specified an effect size the authors sought to detect, only a minority gave a rationale for that effect size. Only half of the studies with negative results clearly defined a primary outcome.
CONCLUSION: Prominent medical journals often provide insufficient information to assess the validity of studies with negative results.


Year:  2002        PMID: 12437785      PMCID: PMC131026          DOI: 10.1186/1477-5751-1-1

Source DB:  PubMed          Journal:  J Negat Results Biomed        ISSN: 1477-5751


Background

Physicians are faced with the challenge of assessing whether the conclusions of research studies are valid. Power, the probability that a study will detect an effect of a specified size, is analogous to the sensitivity of a diagnostic test.[1] Just as a negative result does not rule out disease when the test applied has low sensitivity, a negative study with inadequate power cannot disprove a research hypothesis. Power/sample size calculations play an important role in study planning, give readers an idea of the adequacy of the investigation, and help readers assess the validity of studies with negative results.[2-4]

Effect size (delta) is a critical component of power calculations. Investigators choose from a wide range of possible deltas when calculating sample size, and clinicians and investigators often struggle to determine what effect size is reasonable to expect.[2,5-8] Consequently, it is important for investigators to report the effect size they wish to detect; however, this is often neglected.[8]

Sample size calculations alone are insufficient for the interpretation of studies with negative results; power and confidence intervals complement each other and should both be reported.[6,9] Confidence intervals take into account the data actually collected, define the upper and lower range consistent with a study's data, provide an estimate of precision, and can give readers some indication of the clinical significance of the results.[10-13]

Our work adds to the literature in several ways. Several authors have found that many randomized controlled trials were underpowered, that is, they carried an unacceptable risk of missing an important effect because of inadequate sample size.[14-21] Because power calculations are often complicated,[21] many readers are unlikely to have the statistical sophistication necessary to perform a power analysis. We were therefore interested in whether articles provide the information readers need to assess the validity of studies with negative results. We looked for evidence of power/sample size calculations and effect size, and, unlike prior work, we also examined studies for documentation of confidence intervals.[22] Finally, because the calculation of sample size is applicable to all comparative studies, we did not limit our study to randomized controlled trials.[23]

Our primary objective was to quantify the proportion of studies with negative results within prominent general medical journals[24] that comment on power and present confidence intervals. Secondary objectives were to quantify the proportion of these studies with a specified delta and a defined primary outcome.
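To make the power-as-sensitivity analogy concrete, the following Python sketch (illustrative only; the function and all numbers are hypothetical, not from the paper) approximates the power of a two-sided z-test comparing two proportions for a prespecified delta:

```python
# Minimal sketch of the power concept discussed above, assuming a two-sided
# z-test comparing two proportions; function name and numbers are illustrative.
from math import sqrt
from scipy.stats import norm

def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power to detect the difference p1 - p2 (the delta)."""
    z_crit = norm.ppf(1 - alpha / 2)                 # two-sided critical value
    se = sqrt(p1 * (1 - p1) / n_per_group +
              p2 * (1 - p2) / n_per_group)           # SE of the difference
    return norm.cdf(abs(p1 - p2) / se - z_crit)      # one-tail approximation

# A delta of 10 percentage points (30% vs. 20% event rates):
print(power_two_proportions(0.30, 0.20, 50))   # ~0.21: underpowered
print(power_two_proportions(0.30, 0.20, 350))  # ~0.87: adequately powered
```

With 50 patients per arm, a study would miss this delta roughly four times out of five, which is the sense in which a negative, underpowered study parallels a negative result from a test with low sensitivity.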

Methods

All articles from the 1997 issues of the British Medical Journal (BMJ), Journal of the American Medical Association (JAMA), Lancet, and the New England Journal of Medicine (NEJM) were reviewed. Because the Annals of Internal Medicine (Annals) is published twice monthly, all of its articles from 1997 and 1998 were reviewed so as to include a comparable number of articles. One investigator (RSH) manually searched the journals and reviewed all articles for eligibility. Review articles, meta-analyses, modeling studies, decision and cost-effectiveness analyses, case reports, editorials, letters, and studies without inferential statistics (i.e., descriptive studies) were excluded. Equivalence trials (studies designed to show equivalent efficacy of treatments) were included because power analysis, confidence intervals, and delta are particularly important to their design; the methodological issues involved in the design and analysis of these studies have been described elsewhere.[25,26]

Articles were classified as having negative results if 1) the primary outcome(s) was not statistically significant (i.e., the article had an explicit statement that the comparison between two groups did not reach statistical significance) or 2) in those articles with no primary outcome(s), any of the first three outcomes was not statistically significant. Other outcomes were not evaluated. A second author (TAE) reviewed the full text of a simple random sample of 50 articles, and the kappa statistic was calculated to assess the interobserver variability of our classification scheme.

We examined articles to see if the authors named a primary outcome variable. We employed a decision rule, modified from Moher and colleagues, to define the primary outcome in those articles where none was specified.[19] If an article reported a sample size calculation, the outcome on which it was based was assumed to be the primary outcome.[27] If calculations were not performed, up to three outcomes, if present, were examined. In articles with multiple outcomes and none defined as primary, the three outcomes evaluated were the first three listed in the abstract (or in the results section if fewer than three outcomes were listed in the abstract).

The full text of included articles was systematically reviewed. Data were abstracted by a single author (RSH) and recorded in standardized fashion. Information was recorded on whether the article had a primary outcome(s); commented on power, sample size calculations, and confidence intervals pertaining to the outcomes evaluated; specified a projected delta; and gave a reason for this delta. A paper was given credit for addressing power if sample size calculations or comments on power/sample size were present. Power, sample size calculations, and confidence intervals could pertain to any one of the three outcomes evaluated and did not need to be present for all outcomes.

Comparisons were made across journals by chi-square analysis. We also assessed articles for comment on power and/or presentation of confidence intervals while stratifying by study design (clinical trials, observational studies of etiology/risk factors, screening/diagnosis, prognosis, and other). Responses were summarized as proportions with 95% confidence intervals. All data were analyzed using STATA 6.0 (Stata Corp., College Station, TX).
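As an illustration of the agreement statistic used above, here is a minimal Python sketch of Cohen's kappa computed from a 2 × 2 classification table; the counts are hypothetical and chosen only to give a value near the "good agreement" range reported in Results.

```python
# Sketch of Cohen's kappa for two raters classifying articles as negative
# vs. not negative; the agreement-table counts below are hypothetical.
def cohens_kappa(table):
    """Kappa from a square agreement table (rows: rater 1, cols: rater 2)."""
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(len(table))) / n   # raw agreement
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(table[i][j] for i in range(len(table))) / n
                for j in range(len(table[0]))]
    p_exp = sum(r * c for r, c in zip(row_marg, col_marg))    # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# 50 re-reviewed articles: raters agree on 12 negative and 32 non-negative
# classifications, and disagree on 6.
print(round(cohens_kappa([[12, 3], [3, 32]]), 2))  # ~0.71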

Results

One thousand thirty-eight articles were eligible for analysis. Two hundred thirty-four (23%) were classified as negative. There was good agreement between observers in the classification of articles (κ = 0.74). The percentage of negative articles per journal was: Annals 20% (41/203), BMJ 22% (57/256), JAMA 23% (44/191), Lancet 22% (46/205), and NEJM 25% (46/183) (p = 0.857).

Thirty percent (70/234) of studies with negative results had comments on power and/or sample size calculations, 73% (171/234) included confidence intervals, and 22% included both. The reporting of power/sample size calculations (range: 15%–52%) and confidence intervals (range: 55%–81%) varied significantly among journals (Table 1).

Because clinical trials (n = 87) and observational studies of etiology/risk factors (n = 109) were the predominant study designs (84% of the negative studies), articles with other study designs were not examined further. Fifty-six percent (95% CI, 46–67%) of negative clinical trials and 15% (95% CI, 8–21%) of negative observational risk factor/etiology studies addressed power/sample size (p < 0.001). For reporting of confidence intervals, the corresponding percentages were 79% (95% CI, 71–87%) and 75% (95% CI, 65–84%), respectively (p = 0.489).
Table 1. Negative articles addressing power/sample size and confidence intervals. Values are n (%, 95% CI).

Journal | Power/Sample Size* | Confidence Intervals† | Power/Sample Size and Confidence Intervals*
Annals | 6/41 (15, 3–26) | 33/41 (80, 68–93) | 5/41 (12, 2–23)
BMJ | 11/57 (19, 9–30) | 46/57 (81, 70–91) | 10/57 (18, 7–28)
JAMA | 10/44 (23, 10–36) | 24/44 (55, 39–70) | 3/44 (7, 0–15)
Lancet | 24/46 (52, 37–67) | 34/46 (74, 61–87) | 20/46 (43, 29–58)
NEJM | 19/46 (41, 27–56) | 34/46 (74, 61–87) | 13/46 (28, 15–42)
Total | 70/234 (30, 24–36) | 171/234 (73, 67–79) | 51/234 (22, 16–27)

* P < 0.001. † P = 0.038.

Of the negative articles including information about sample size, 87% (61/70) specified a delta, the effect size that the authors sought to detect. A minority, 43% (26/61), explained the rationale behind the delta chosen; of these, 77% (20/26) cited references or pilot studies to support their rationale. Only 52% (122/234) of articles with negative results had a clearly defined primary outcome(s).
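The summary figures above can be recomputed directly from the Table 1 counts. The sketch below assumes a normal-approximation (Wald) confidence interval, since the paper does not state which interval method was used; it recovers the overall 30% (24–36%) estimate and the P < 0.001 across-journal comparison.

```python
# Recomputing two summary figures from the Table 1 counts. The CI method
# (Wald/normal approximation) is an assumption; the paper does not specify it.
from math import sqrt
from scipy.stats import chi2_contingency, norm

def proportion_ci(k: int, n: int, alpha: float = 0.05):
    """Point estimate and normal-approximation CI for a proportion k/n."""
    p = k / n
    half = norm.ppf(1 - alpha / 2) * sqrt(p * (1 - p) / n)
    return p, p - half, p + half

# Overall power/sample size reporting: 70/234 -> 30% (95% CI 24-36%).
print(proportion_ci(70, 234))

# Power reporting (addressed, not addressed) per journal, in Table 1 order:
# Annals, BMJ, JAMA, Lancet, NEJM.
observed = [[6, 35], [11, 46], [10, 34], [24, 22], [19, 27]]
chi2, p, dof, _ = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.5f}")  # p < 0.001, as reported
```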

Discussion

Many articles underreport power/sample size calculations and confidence intervals, and significant variation exists among journals. Our work demonstrates that power was reported more often in clinical trials than in observational studies of etiology/risk factors. Investigators involved in randomized clinical trials may be more familiar with the importance of power and sample size calculation.[28] Also, investigators conducting observational studies often do not have the ability to determine sample size prior to beginning their work. Most articles with sample size calculations reported a projected effect size, but only a minority shared the rationale behind this delta, and even fewer provided empirical evidence to support that rationale.

While this manuscript describes an analysis of a large body of studies with negative results, several limitations must be considered. First, although most negative studies did not list a power/sample size calculation, we cannot be certain that one had not been performed a priori. It is also possible that, for the sake of brevity, authors and/or editors omitted power/sample size calculations from the final text when preparing manuscripts for submission. While these calculations may have been done but not reported, evidence suggests this is often not the case.[29] Second, our definition of a negative study may seem unduly broad. We examined three outcomes in order to classify articles because articles frequently report several outcomes, often with none defined as primary.[30-33] Previous authors who encountered multiple outcomes, having limited their work to randomized controlled trials, defined the primary outcome as "the most clinically important"[19] or the outcome that was the "primary focus of the article".[20] Such outcomes are often not possible to discern in observational studies.

Nonetheless, our results may represent a best-case scenario, given the publication bias against articles with negative results and the fact that we examined the more prominent general medical journals.[34]

Conclusions

In summary, this study demonstrates that prominent medical journals often provide insufficient information to assess the validity of studies with negative results. Authors and journal editors need to include this information so readers can be informed consumers of the medical literature.

Competing interests

1. The research was not supported by external funds.
2. The authors have no competing interests, whether financial (including stocks, honoraria, and speaker's fees), academic, religious, moral, or personal.
3. We have no financial interest in the material contained in the manuscript.
4. The manuscript is neither under review by another publisher nor previously published.
5. All authors have participated in the design, analysis, and writing of the accompanying manuscript.
6. All authors have approved the final manuscript and have taken care to ensure the integrity of the work.

Authors' contributions

1. Randy S Hebert MD MPH: conception and design; acquisition of data; analysis and interpretation of data; drafted and revised the article; gives final approval of the version for publication.
2. Scott M Wright MD: analysis and interpretation of data; revised the article for important intellectual content; gives final approval of the version for publication.
3. Robert S Dittus MD MPH: analysis and interpretation of data; revised the article for important intellectual content; gives final approval of the version for publication.
4. Tom A Elasy MD MPH: conception and design; analysis and interpretation of data; drafted and revised the article; gives final approval of the version for publication.
References (28 in total)

1.  Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation.

Authors:  D Moher; A Jones; L Lepage
Journal:  JAMA       Date:  2001-04-18       Impact factor: 56.272

2.  Negative results of randomized clinical trials published in the surgical literature: equivalency or error?

Authors:  J B Dimick; M Diener-West; P A Lipsett
Journal:  Arch Surg       Date:  2001-07

3.  Sample size and statistical power of randomised, controlled trials in orthopaedics.

Authors:  K B Freedman; S Back; J Bernstein
Journal:  J Bone Joint Surg Br       Date:  2001-04

4.  Clinical biostatistics. XXXIV. The other side of 'statistical significance': alpha, beta, delta, and the calculation of sample size.

Authors:  A R Feinstein
Journal:  Clin Pharmacol Ther       Date:  1975-10       Impact factor: 6.875

5.  Claims of equivalence in randomized controlled trials of the treatment of bacterial meningitis in children.

Authors:  Damian J Krysan; Alex R Kemper
Journal:  Pediatr Infect Dis J       Date:  2002-08       Impact factor: 2.129

6.  The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 "negative" trials.

Authors:  J A Freiman; T C Chalmers; H Smith; R R Kuebler
Journal:  N Engl J Med       Date:  1978-09-28       Impact factor: 91.245

7.  The other side of statistical significance: a review of type II errors in the Australian medical literature.

Authors:  J C Hall
Journal:  Aust N Z J Med       Date:  1982-02

8.  The continuing unethical conduct of underpowered clinical trials.

Authors:  Scott D Halpern; Jason H T Karlawish; Jesse A Berlin
Journal:  JAMA       Date:  2002-07-17       Impact factor: 56.272

9.  A quality assessment of randomized control trials of primary treatment of breast cancer.

Authors:  A Liberati; H N Himel; T C Chalmers
Journal:  J Clin Oncol       Date:  1986-06       Impact factor: 44.544

10.  Beta, or type II error in psychiatric controlled clinical trials.

Authors:  M J Edlund; J E Overall; H M Rhoades
Journal:  J Psychiatr Res       Date:  1985       Impact factor: 4.791

