John L Williams, Hsini Cindy Chu, Marissa K Lown, Joseph Daniel, Renate D Meckl, Darshit Patel, Radwa Ibrahim.
Abstract
Recent evidence indicates that many clinical and preclinical studies are not reproducible. Prominent causes include design and implementation issues, low statistical power, unintentional bias, and incomplete reporting in the published literature. The primary goal of this study was to assess the quality of published research in three prominent cardiovascular research journals by examining statistical power and assessing adherence to augmented ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines. For unpaired t-tests, the average median power for a 20% and 50% change was 0.27 ± 0.06 and 0.88 ± 0.08, respectively. For the analysis of guidelines, 40 categories were assessed on a 0-2 scale. Although many strengths were observed, several key elements needed for reproducibility were inadequate, including differentiation of primary and secondary outcomes, power calculations for group size, allocation methods, use of randomization and blinding, checks for normality, reports of attrition and adverse events of subjects, and assessment of bias. A secondary goal was to examine whether a required checklist improved the quality of reporting; the results indicated that a checklist improved compliance and quality of reporting, but adequacy levels in key categories were still too low. Overall, the findings of this study indicated that the probability of reproducibility for many clinical and preclinical cardiovascular research studies was low because of incomplete reporting, low statistical power, and lack of research practices that decrease experimental bias. Expansion of group sizes to increase power, use of detailed checklists, and closer monitoring for checklist adherence by editors and journals should remediate many of these deficits and increase the likelihood of reproducibility.
Keywords: arrive; bias; cardiovascular research; checklist; irreproducibility; power; preclinical research; reproducibility; statistical power
Year: 2022 PMID: 35155034 PMCID: PMC8825449 DOI: 10.7759/cureus.21086
Source DB: PubMed Journal: Cureus ISSN: 2168-8184
Table 1: Journal volumes reviewed and the types of analysis conducted.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
| Journal | Volume | Dates | Power analysis | Guidelines analysis |
| AJP:HC 2017 | 312 | January-June, 2017 | X | X |
| AJP:HC 2019 | 316 | January-June, 2019 | X | X |
| AJP:HC 2020 | 319 | July-December, 2020 | X | |
| AJP:HC 2021 | 320 | January-March, 2021 | X | |
| Circ Research 2019 | 124 | January-June, 2019 | X | X |
| CV Research 2019 | 115 | January-June, 2019 | X | X |
| CV Research 2020 | 116 | January-June, 2020 | X | |
Table 2: Numbers of unpaired t-tests that were analyzed in the articles.
Total tests, significant tests, and NS tests refer, respectively, to total numbers of unpaired t-tests, those that were reportedly statistically significant, and those that were not statistically significant.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
| Journal | Volume | Total articles | Total tests | Significant tests | NS tests |
| AJP:HC 2017 | 312 | 37 | 386 | 216 | 170 |
| AJP:HC 2019 | 316 | 37 | 491 | 237 | 254 |
| AJP:HC 2020 | 319 | 33 | 380 | 195 | 185 |
| AJP:HC 2021 | 320 | 21 | 198 | 111 | 87 |
| AJP:HC total | | 128 | 1,455 | 759 | 696 |
| Circ Research 2019 | 124 | 26 | 262 | 151 | 111 |
| CV Research 2019 | 115 | 37 | 337 | 149 | 188 |
| CV Research 2020 | 116 | 40 | 399 | 258 | 141 |
| CV Research total | | 77 | 736 | 407 | 329 |
| Totals | | 229 | 2,453 | 1,317 | 1,136 |
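The test counts in the table above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the dictionary name `counts` and the derived `ns_percent` are illustrative, and only the unpaired t-test counts (not the article counts) are used. It verifies that significant and NS tests sum to the total for each volume and computes the share of tests that were NS.

```python
# (total, significant, NS) unpaired t-test counts copied from the table above.
counts = {
    "AJP:HC 2017": (386, 216, 170),
    "AJP:HC 2019": (491, 237, 254),
    "AJP:HC 2020": (380, 195, 185),
    "AJP:HC 2021": (198, 111, 87),
    "Circ Research 2019": (262, 151, 111),
    "CV Research 2019": (337, 149, 188),
    "CV Research 2020": (399, 258, 141),
}

# Each row should satisfy: significant + NS == total.
for label, (total, significant, ns) in counts.items():
    assert significant + ns == total, label

# Column sums across all journal volumes.
grand_totals = tuple(sum(row[i] for row in counts.values()) for i in range(3))

# Percent of all analyzed tests in each volume that were NS.
ns_percent = {
    label: round(100.0 * ns / total, 1)
    for label, (total, _, ns) in counts.items()
}

print(grand_totals)
for label, pct in ns_percent.items():
    print(f"{label}: {pct}% NS")
```

The column sums agree with the reported totals for tests, significant tests, and NS tests.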
Table 3: Questions used for the analysis of adherence to guidelines and recommendations.
Cat refers to the reference letters used for each category in the text. ¹ Original ARRIVE guidelines [13]. ² Revised ARRIVE guidelines [14].
ARRIVE = Animal Research: Reporting of In Vivo Experiments; IACUC = Institutional Animal Care and Use Committee; IRB = Institutional Review Board.
| Cat | Descriptor | Section | 2010¹ | 2020² | Question summary |
| A | Title | Title | 1 | | Was the title as accurate and concise as possible? |
| B | Abstract | Abstract | 2 | 11 | Was the abstract an accurate summary of the background, goals, methods, findings, and conclusions? |
| C | Background | Introduction | 3a | 12a | Did the introduction include adequate background of the context and experimental approach? |
| D | Model | Introduction | 3b | 12b | Did the introduction explain how the experimental model addresses the experimental objectives? |
| E | Objectives | Introduction | 4 | 13 | Did the introduction clearly describe the objectives or hypothesis of the study? |
| F | IACUC/IRB | Methods | 5 | 14 | Did the methods adequately address whether the study conformed to ethical standards or review? |
| G | Groups (IV) | Methods | 6a | 1a | Was the number of experimental and control groups (independent variables) described? |
| H | Randomization | Methods | 6b | 4a | Were steps taken to reduce bias through randomization of treatments or measurements? |
| I | Blinding | Methods | 6b | 5 | Were steps taken to reduce bias through blinding during measurements and analysis of data? |
| J | Units | Methods | 6c | 1b | In the methods, were details given on what units were used in the experiments (animals/subjects, tissues, cells, etc.)? |
| K | How/why | Methods | 7a,d | 9a,d | In the methods, were procedures described in detail and the rationale given for the methods? |
| L | When/where | Methods | 7b,c | 9b,c | Were details given on when (e.g., time of day) and where procedures occurred? |
| M | Animals | Methods | 8 | 8 | Was information given about the species, strain, sex, age, weight, and source of animals used? |
| N | Housing | Methods | 9 | 15 | Was there specific information on the types of facilities and husbandry conditions? |
| O | Group numbers | Methods | 10a | 2a | In the methods, was the total number of subjects or units used and the numbers in each experimental group specified? |
| P | Power | Methods | 10b | 2b | Was there an explanation of how the number of subjects (or other units) needed was determined? |
| Q | Allocation | Methods | 11a | 4b | Were details given on allocation to treatment groups (e.g., randomization or matching)? |
| R | Treatment order | Methods | 11b | 4b | Was information detailed on the order of treatments or assessment of experimental subjects (or units) within the groups? |
| S | Outcomes | Methods | 12 | 6a | In the methods, were the experimental outcomes that were assessed clearly defined? |
| T | 1°/2° outcomes | Methods | 12 | 6b | Were the experimental outcomes that were assessed delineated as primary and secondary outcomes? |
| U | Statistics | Methods | 13a,b | 7a | Were the statistics that were used described clearly in detail, including for the analysis for each set of data? |
| V | Normality | Methods | 13c | 7b | When parametric statistical tests were used, did the analysis include tests for normality of the group data? |
| W | Homogeneity | Methods | 13c | 7b | When parametric statistical tests were used, did the analysis include tests for homogeneity of the group variances? |
| X | Design | Methods | | | Were the statistics used appropriately for the design of the experimental protocols? |
| Y | Euthanasia | Methods | | 16a | How were animals euthanized for experimental purposes or at the end of experiments? |
| Z | Baseline | Results | 14 | | Were baseline characteristics (e.g., weight and heart rate) of subjects prior to experimental treatments reported? |
| AA | Animal numbers | Results | 15a | 3c | Was the number of subjects used for each group for each analysis reported? |
| BB | Vague numbers | Results | 15a | 3c | Was the number of subjects used for each group reported as a range of values? |
| CC | Figures/tables | Results | | 3c | Did the legends of figures and tables report both the numbers of subjects and the statistical tests used for each analysis? |
| DD | Attrition | Results | 15b | 3a,b | Did the paper report subjects or data that were excluded from the analysis? |
| EE | Distribution | Results | | 10a | Were estimates of the group distributions (mean, median, standard deviation, etc.) used appropriately? |
| FF | Precision | Results | 16 | 10b | When appropriate, were the results of statistical tests reported with measures of precision (i.e., SE or CI)? |
| GG | Specific P | Results | | | Were specific P-values reported for statistical tests? |
| HH | Adverse events | Results | 17 | 16b | Did the paper report adverse events that might have resulted from experimental treatments? |
| II | Bias | Discussion | 18b | 17b | Did the discussion address any potential sources of bias in the development, conduct, or analysis of experiments? |
| JJ | Limitations | Discussion | 18b | 17b | Did the discussion address the limitations of the experimental model and possible sources of imprecision of the results? |
| KK | 3 Rs | Discussion | 18c | | Did any aspect of the report address the replacement, refinement, or reduction of the use of animals in the research? |
| LL | Generalization | Discussion | 19 | 18 | Did the discussion address ways that the findings might extrapolate to other species or human biology? |
| MM | Funding | Discussion | 20 | 21b | Did the paper list funding sources and describe the role of the funding agencies? |
| NN | Conflicts | | | 21a | Were conflicts of interest addressed? |
Table 4: Comparison of scores among journal categories.
The descriptor categories are from Table 3. The P-value for each category resulted from the Kruskal-Wallis test. Post hoc significant differences between individual journals are indicated by letters (P < 0.05 with Bonferroni method): (a) AJP:HC 2017 vs. AJP:HC 2019; (b) AJP:HC 2017 vs. Circ Research; (c) AJP:HC 2017 vs. CV Research; (d) AJP:HC 2019 vs. Circ Research; (e) AJP:HC 2019 vs. CV Research; (f) Circ Research vs. CV Research. ¹ If no superscript is present by a descriptor, the analysis included both nonhuman and human subjects: numbers of articles were 95, 48, 30, and 30 for AJP:HC 2017, AJP:HC 2019, Circ Research 2019, and CV Research 2019, respectively. ² Analysis excluded articles with human subjects: numbers of articles were 74, 38, 26, and 28 for AJP:HC 2017, AJP:HC 2019, Circ Research 2019, and CV Research 2019, respectively. ³ Power value was calculated as a single difference of 20% or 50% from an initial value from AJP:HC 2017 in three of the groups. For categories T and II in AJP:HC 2017, all individual values were zero; therefore, prospective power could not be calculated. * The overall P-value with the Kruskal-Wallis test was statistically significant, but after the Bonferroni adjustment, the calculated P-value for this comparison (.0128) was slightly short of the statistical cutoff of 0.008.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
| Cat | Descriptor¹ | P-value | Post hoc | Power³ |
| A | Title | .278 | | .92/1.00 |
| B | Abstract | .013 | d | .79/1.00 |
| C | Background | .165 | | .96/1.00 |
| D | Model² | .019 | b | .54/1.00 |
| E | Objectives | .123 | | .88/1.00 |
| F | IACUC/IRB | .070 | | 1.00/1.00 |
| G | Groups (IV) | .624 | | .67/1.00 |
| H | Randomization | <0.0001 | b, d | .07/.19 |
| I | Blinding | <0.0001 | b, d, f | .07/.15 |
| J | Units | .762 | | 1.00/1.00 |
| K | How/why | .895 | | .96/1.00 |
| L | When/where | <0.001 | b, c | .07/.18 |
| M | Animals² | .005 | a | .59/1.00 |
| N | Housing² | <0.001 | d | .09/.36 |
| O | Group numbers | .344 | | .13/.60 |
| P | Power | <0.0001 | b, d, f | .05/.08 |
| Q | Allocation | .661 | | .17/.80 |
| R | Treatment order | .109 | | .05/.07 |
| S | Outcomes | <0.001 | a | .99/1.00 |
| T | 1°/2° outcomes | .040 | b* | n/a |
| U | Statistics | .440 | | .68/1.00 |
| V | Normality | <0.0001 | b, d, f | .07/.16 |
| W | Homogeneity | .560 | | .05/.08 |
| X | Design | .259 | | .58/1.00 |
| Y | Euthanasia | .006 | c, f | .15/.73 |
| Z | Baseline | .004 | a, e | .10/.40 |
| AA | Subject numbers | .149 | | .72/1.00 |
| BB | Vague numbers | .716 | | .08/.34 |
| CC | Figures/tables | <0.0001 | c, e | .34/.90 |
| DD | Attrition | <0.0001 | b, d, f | .06/.16 |
| EE | Distribution | .320 | | .60/1.00 |
| FF | Precision | .771 | | .41/1.00 |
| GG | Specific P | .010 | e | .09/.34 |
| HH | Adverse events | <0.001 | b | .05/.07 |
| II | Bias | .095 | | n/a |
| JJ | Limitations | .138 | | .49/1.00 |
| KK | 3 Rs² | .282 | | .05/.06 |
| LL | Generalization² | .104 | | .39/1.00 |
| MM | Funding | .383 | | .99/1.00 |
| NN | Conflicts | .768 | | 1.00/1.00 |
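The cutoff of 0.008 quoted in the legend follows from dividing 0.05 by the number of pairwise comparisons among the four journal groups. The sketch below illustrates that arithmetic only; the helper name `significant_after_bonferroni` is hypothetical, and the code is not presented as the procedure the authors ran in their statistical software.

```python
from itertools import combinations

# The four journal groups compared in the table, in the order used by the legend.
groups = ["AJP:HC 2017", "AJP:HC 2019", "Circ Research", "CV Research"]

# All pairwise comparisons; their order matches the letters (a) to (f) in the legend.
pairs = list(combinations(groups, 2))
letters = dict(zip("abcdef", pairs))

alpha = 0.05
threshold = alpha / len(pairs)  # Bonferroni-adjusted cutoff per comparison


def significant_after_bonferroni(p_raw: float) -> bool:
    """Return True if a pairwise P-value passes the adjusted cutoff."""
    return p_raw < threshold


print(len(pairs), round(threshold, 4))
for letter, (first, second) in letters.items():
    print(f"({letter}) {first} vs. {second}")

# The comparison flagged with * in the legend had P = .0128.
print(significant_after_bonferroni(0.0128))
```

With six comparisons the per-comparison cutoff is about 0.0083, so a pairwise P-value of .0128 does not pass it even though the overall Kruskal-Wallis P-value was below 0.05.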
Figure 1: Power of unpaired t-tests for a 20% (A) and 50% (B) change from the initial values.
The central bars in these box-and-whisker plots are the median values (50th percentile). The lower and upper limits of the rectangles represent the 25th and 75th percentiles, respectively. The lower and upper limits of the bars represent the minimum and maximum values, respectively (i.e., the range). The numbers over the maximum values in A indicate n for each journal for both A and B.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
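As a rough illustration of the quantity plotted in Figure 1, the sketch below estimates the power of a two-sided unpaired t-test to detect a given relative change from an initial mean. It uses a normal approximation rather than the noncentral t distribution, so it tends to overstate power for small groups, and it is not presented as the method the authors used. The function name `approx_power_unpaired` and the example inputs (mean 100, SD 20, n = 6 per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_unpaired(mean, sd, n_per_group, rel_change, alpha=0.05):
    """Approximate two-sided power of an unpaired t-test.

    Assumes equal group sizes and a common SD, and replaces the
    noncentral t distribution with a normal approximation.
    """
    z = NormalDist()
    delta = abs(rel_change * mean)        # absolute difference to detect
    se = sd * sqrt(2.0 / n_per_group)     # standard error of the difference
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / se
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical inputs: initial mean 100, SD 20, six subjects per group.
p20 = approx_power_unpaired(100.0, 20.0, 6, 0.20)
p50 = approx_power_unpaired(100.0, 20.0, 6, 0.50)
print(f"power for 20% change: {p20:.2f}")
print(f"power for 50% change: {p50:.2f}")
```

With these illustrative inputs, the 20% change is detected far less reliably than the 50% change, and increasing `n_per_group` raises both values.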
Figure 2: Percent of NS unpaired t-tests that did not achieve adequate power.
The percentages of NS unpaired t-tests that did not achieve a power of at least 0.80 for a difference of 20% or 50% from the initial values are shown by black and striped bars, respectively. The gray bars indicate the percentage of the total tests analyzed for each journal volume that were NS. The numbers above bars indicate n.
NS = not significant; AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
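The black and striped bars in Figure 2 are simple proportions: the share of NS tests whose power fell below 0.80. A minimal sketch of that calculation follows; the function name `percent_underpowered` and the list of power values are hypothetical and are not taken from the study data.

```python
def percent_underpowered(powers, threshold=0.80):
    """Percent of tests whose power is below the adequacy threshold."""
    if not powers:
        raise ValueError("at least one power value is required")
    below = sum(1 for p in powers if p < threshold)
    return 100.0 * below / len(powers)


# Hypothetical power values for five NS tests (illustration only).
example_powers = [0.12, 0.27, 0.45, 0.81, 0.93]
underpowered_pct = percent_underpowered(example_powers)
print(f"{underpowered_pct:.0f}% of NS tests were below 0.80 power")
```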
Figure 3: Adequacy scores (%) for categories of the title, abstract, and introduction.
Symbols: X, AJP:HC 2017; +, AJP:HC 2019; O, Circ Research 2019; △, CV Research 2019. * Analysis excluded articles with human subjects: numbers of articles were 74, 38, 26, and 28 for AJP:HC 2017, AJP:HC 2019, Circ Research 2019, and CV Research 2019, respectively. Otherwise, all subjects were included: numbers of articles were 95, 48, 30, and 30 for AJP:HC 2017, AJP:HC 2019, Circ Research 2019, and CV Research 2019, respectively.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
Figure 4: Adequacy scores (%) for categories of the methods section.
Symbols: X, AJP:HC 2017; +, AJP:HC 2019; O, Circ Research 2019; △, CV Research 2019. For n, see the legend of Figure 3. * Analysis excluded articles with human subjects.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
Figure 5: Adequacy scores (%) for categories of the methods section (continued from Figure 4).
Symbols: X, AJP:HC 2017; +, AJP:HC 2019; O, Circ Research 2019; △, CV Research 2019. For n, see the legend of Figure 3. * Analysis excluded articles with human subjects.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
Figure 6: Adequacy scores (%) for categories of the results section.
Symbols: X, AJP:HC 2017; +, AJP:HC 2019; O, Circ Research 2019; △, CV Research 2019. For n, see the legend of Figure 3.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
Figure 7: Adequacy scores (%) for categories of the discussion and acknowledgements sections.
Symbols: X, AJP:HC 2017; +, AJP:HC 2019; O, Circ Research 2019; △, CV Research 2019. For n, see the legend of Figure 3. * Analysis excluded articles with human subjects.
AJP:HC = American Journal of Physiology: Heart and Circulatory Physiology; Circ Research = Circulation Research; CV Research = Cardiovascular Research.
Table 5: Categories from Table 3 that overlap with those of the checklist in Circ Research 2019 (n = 26).
Refer to Table 3 and the Circ Research checklist [17] for more detailed information on the categories. † Significantly greater adequacy scores (P < 0.05) for Circ Research 2019 (Table 4). Categories in the Circ Research checklist: ¹ study design, ² randomization, ³ blinding, ⁴ sample size and power calculations, ⁵ data reporting, ⁶ statistical methods, and ⁷ experimental details, ethics, and funding statements.
IACUC = Institutional Animal Care and Use Committee; IRB = Institutional Review Board; Circ Research = Circulation Research.
| Category | Descriptor | Score (mean ± SD) | Adequate (%) | Weak (%) | Absent (%) | Mismatch (%) |
| F | IACUC/IRB⁷ | 1.86 ± 0.35 | 86 | 14 | 0 | 0 |
| G | Groups¹ | 1.50 ± 0.51 | 50 | 50 | 0 | 0 |
| H† | Randomization² | 1.07 ± 0.83 | 37 | 33 | 30 | 3 |
| I† | Blinding³ | 1.17 ± 0.65 | 30 | 57 | 15 | 3 |
| M | Animals⁵ | 1.54 ± 0.51 | 54 | 46 | 0 | 0 |
| N | Housing⁵ | 0.19 ± 0.49 | 4 | 11 | 85 | 77 |
| P† | Power⁴ | 1.00 ± 0.95 | 43 | 13 | 43 | 23 |
| T† | 1°/2° outcomes¹ | 0.17 ± 0.53 | 7 | 3 | 90 | 50 |
| U | Statistics⁶ | 1.57 ± 0.57 | 60 | 37 | 3 | 0 |
| V† | Normality⁶ | 1.43 ± 0.82 | 63 | 17 | 20 | 10 |
| AA | Animal numbers¹ | 1.83 ± 0.38 | 83 | 17 | 0 | 0 |
| DD† | Attrition⁵ | 1.20 ± 0.96 | 57 | 7 | 37 | 17 |
| EE | Distribution⁶ | 1.20 ± 0.55 | 27 | 67 | 7 | 3 |
| FF | Precision⁶ | 1.57 ± 0.82 | 77 | 3 | 20 | 13 |
| HH† | Adverse events⁵ | 0.69 ± 0.93 | 31 | 7 | 62 | 14 |
| MM | Funding⁷ | 1.03 ± 0.18 | 3 | 97 | 0 | 0 |
| NN | Conflicts⁷ | 2.00 ± 0.00 | 100 | 0 | 0 | 0 |
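The score and percentage columns above can be related with a short calculation. The sketch below assumes that a score of 2 means adequate, 1 means weak, and 0 means absent, which is consistent with the rows of the table but is stated here as an assumption; it also assumes the SD is the sample standard deviation. The function name `summarize_scores` and the example score list are hypothetical, and the Mismatch column is not modeled.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Summarize 0-2 adequacy scores as mean, sample SD, and percentages.

    Assumed coding: 2 = adequate, 1 = weak, 0 = absent.
    """
    n = len(scores)

    def percent(value):
        return round(100.0 * sum(1 for s in scores if s == value) / n)

    return {
        "mean": round(mean(scores), 2),
        "sd": round(stdev(scores), 2),
        "adequate_pct": percent(2),
        "weak_pct": percent(1),
        "absent_pct": percent(0),
    }


# Hypothetical scores for 26 articles: half adequate, half weak.
example_scores = [2] * 13 + [1] * 13
summary = summarize_scores(example_scores)
print(summary)
```

An even split between adequate and weak scores gives a mean of 1.50 with an SD of 0.51, the same pattern as the Groups row.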