F Daniel Ramirez1, Pouya Motazedian1, Richard G Jung1, Pietro Di Santo1, Zachary D MacDonald1, Robert Moreland1, Trevor Simard1, Aisling A Clancy1, Juan J Russo1, Vivian A Welch1, George A Wells1, Benjamin Hibbert2. 1. From the Division of Cardiology (F.D.R., P.M., R.G.J., P.D.S., T.S., J.J.R., B.H.), CAPITAL Research Group (F.D.R., P.M., R.G.J., P.D.S., Z.D.M.D., R.M., T.S., J.J.R., B.H.), Vascular Biology and Experimental Medicine Laboratory (R.G.J., T.S., B.H.), and Cardiovascular Research Methods Centre (G.A.W.), University of Ottawa Heart Institute, Ontario, Canada; and School of Epidemiology, Public Health and Preventive Medicine (F.D.R., V.A.W., G.A.W.), Department of Cellular and Molecular Medicine (R.G.J., T.S., B.H.), Department of Radiology (R.M.), Department of Obstetrics and Gynecology (A.A.C.), Bruyère Research Institute (V.A.W.), and Centre for Global Health (V.A.W.), University of Ottawa, Ontario, Canada. 2. From the Division of Cardiology (F.D.R., P.M., R.G.J., P.D.S., T.S., J.J.R., B.H.), CAPITAL Research Group (F.D.R., P.M., R.G.J., P.D.S., Z.D.M.D., R.M., T.S., J.J.R., B.H.), Vascular Biology and Experimental Medicine Laboratory (R.G.J., T.S., B.H.), and Cardiovascular Research Methods Centre (G.A.W.), University of Ottawa Heart Institute, Ontario, Canada; and School of Epidemiology, Public Health and Preventive Medicine (F.D.R., V.A.W., G.A.W.), Department of Cellular and Molecular Medicine (R.G.J., T.S., B.H.), Department of Radiology (R.M.), Department of Obstetrics and Gynecology (A.A.C.), Bruyère Research Institute (V.A.W.), and Centre for Global Health (V.A.W.), University of Ottawa, Ontario, Canada. bhibbert@ottawaheart.ca.
Abstract
RATIONALE: Methodological sources of bias and suboptimal reporting contribute to irreproducibility in preclinical science and may negatively affect research translation. Randomization, blinding, sample size estimation, and considering sex as a biological variable are deemed crucial study design elements to maximize the quality and predictive value of preclinical experiments.
OBJECTIVE: To examine the prevalence and temporal patterns of recommended study design element implementation in preclinical cardiovascular research.
METHODS AND RESULTS: All articles published over a 10-year period in 5 leading cardiovascular journals were reviewed. Reports of in vivo experiments in nonhuman mammals describing pathophysiology, genetics, or therapeutic interventions relevant to specific cardiovascular disorders were identified. Data on study design and animal model use were collected. Citations at 60 months were additionally examined as a surrogate measure of research impact in a prespecified subset of studies, stratified by individual and cumulative study design elements. Of 28 636 articles screened, 3396 met inclusion criteria. Randomization was reported in 21.8%, blinding in 32.7%, and sample size estimation in 2.3%. Temporal and disease-specific analyses show that, overall, the implementation of these study design elements has not appreciably increased over the past decade. The exception is preclinical stroke research, which has uniquely demonstrated significant improvements in methodological rigor. In a subset of 1681 preclinical studies, randomization, blinding, sample size estimation, and inclusion of both sexes were not associated with increased citations at 60 months.
CONCLUSIONS: Methodological shortcomings are prevalent in preclinical cardiovascular research and have not substantially improved over the past 10 years. They may also be overlooked when subsequent studies are built on this work. Resultant risks of bias and threats to study validity have the potential to hinder progress in cardiovascular medicine, as preclinical research often precedes and informs clinical trials. Stroke research quality has uniquely improved in recent years, warranting closer examination to identify interventions that could be modeled in other cardiovascular fields.
Preclinical research and clinical therapy development exist in a symbiotic continuum, yet they are guided by different reporting standards, regulatory forces, and reward systems.[1,2] Consequent discrepancies in study designs and analytical methods can compromise scientific validity[3] and produce irreproducible results, particularly in preclinical experiments using animal models.[2] They may also contribute to the high rate of attrition seen in early stages of clinical development.[4,5] Improved methodological rigor and transparent reporting practices have therefore been advocated to improve the predictive value of animal model data.[2,6-10]
The issue of irreproducibility in preclinical research and its impact are well recognized in academic circles[2,11-14] and increasingly so among the lay public[15-17] and the biopharmaceutical industry.[13,18] In 2011, Bayer HealthCare published an analysis of 67 in-house target identification and validation projects in oncology, women’s health, and cardiovascular disease over a 4-year period. It reported that the published data were reproducible in less than one third of cases and that, in nearly two thirds, inconsistencies either prolonged the validation process or led to project termination.[11] In 2012, Amgen similarly published the results of its attempt to reproduce 53 “landmark” studies in hematology and oncology. It succeeded in only 6 cases, even after contacting the original authors for guidance, exchanging reagents, and in certain instances repeating experiments in the laboratory of the original investigators.[12] Some findings could not be reproduced even by the original investigators in their own laboratories when the experiments were repeated in a blinded fashion.[19] Disconcertingly, some of the irreproducible research was deemed to have prompted clinical studies.
Others have likewise described candidate therapies that have advanced to testing in humans despite irreproducible or inconsistent results in animals.[8,20,21]
Since the mid-1990s, pharmaceutical research and development productivity has declined. Fewer medications have been approved, attrition rates have increased at all stages of research, and development times have lengthened despite rising expenditures.[4] This decline has been attributed, in part, to flawed preclinical research methodology.[5,7,11,12,22,23] This attribution highlights the importance of preclinical research for advancing clinical care but equally portends waning confidence in animal model data as a means of identifying promising therapeutic targets. Cardiovascular research and development, in particular, has fared poorly in recent years relative to other disease categories. It has shown the greatest decline as a percentage of total projects[4] and among the lowest success rates and likelihoods of approval at all phases of clinical trials.[24] Moreover, research and development investment patterns increasingly deter incremental innovation, such as improving upon available effective therapies.
Instead, increasing focus is being placed on novel disease targets with higher revenue potential but a higher risk of failure (eg, specific cancers).[4] Poor reproducibility in preclinical research, coupled with this increasingly unfavorable landscape for cardiovascular therapy development, could therefore substantially hinder progress in cardiovascular care.
In response to these systemic issues, and in collaboration with editors from major journals, funding agencies, and scientific leaders, the National Institutes of Health (NIH) proposed a set of reporting guidelines and funding policies to improve the reproducibility and rigor of preclinical research.[25] A core set of standards was adopted that had first been proposed by the NIH’s National Institute of Neurological Disorders and Stroke. These standards identified randomization, blinding, sample size estimation, and data handling as minimum reporting requirements to promote transparency in animal studies.[7] In addition, the NIH announced that it would require applications for preclinical research funding to consider sex as a biological variable.[26] These principles and guidelines have been endorsed by prominent academic societies, associations, and journals, including all American Heart Association (AHA) journals,[25] and there is evidence of editorial commitment to compliance.[27,28] However, there are limited data on the extent of suboptimal methodological rigor or incomplete reporting in preclinical cardiovascular science, and therefore no baseline from which to gauge progress.
A commitment to improving the quality and impact of preclinical research and to maintaining the trust of public and private stakeholders requires transparency on current and future states of scientific practice.[13] We, therefore, reviewed all preclinical cardiovascular studies published in leading AHA journals over the past decade to determine the prevalence of randomization, blinding, sample size estimation, and inclusion of both sexes and trends in these practices over time.
Methods
As previously described,[29] all preclinical cardiovascular studies published in AHA journals with archives spanning at least 10 years were reviewed. Five journals met these criteria: Circulation; Circulation Research; Hypertension; Stroke; and Arteriosclerosis, Thrombosis, and Vascular Biology (ATVB). All reports published during a 10-year period (July 2006 to June 2016) were screened. Studies were included if they met 3 criteria. First, they represented original research published as full manuscripts. Second, they reported results of in vivo experiments in nonhuman mammals. Third, they described pathophysiology, genetics, or therapeutic interventions stated to be directly relevant to a specific cardiovascular disorder (CVD) in humans (see the prespecified list of CVDs below). Studies on physiological and genetic characteristics were included if the article proposed potential therapeutic applications or implications of the findings. These criteria are consistent with previously proposed definitions of preclinical (“confirmatory” or “proof-of-concept”) experiments.[7,30,31] CVDs of interest included atherosclerosis or vascular homeostasis, arterial aneurysms or dissections, myocardial infarction, valvular disease, cardiomyopathy or heart failure, cardiac transplantation, pulmonary hypertension, cardiac arrhythmia, stroke, resuscitative medicine, hypertension, metabolic or endocrine diseases, and hematologic disorders (including thrombosis). Studies deemed to report on clinically relevant cardiovascular conditions outside the prespecified categories could be included at the journal reviewers’ discretion (category “other”). Each journal article was independently reviewed by 2 reviewers, who extracted data using standardized case report forms; this allowed inter-rater agreement to be assessed.
To allow our team to review a large volume of articles, studies could be excluded as soon as it was clear that a manuscript violated any of our inclusion criteria. This was most often because of the model used (eg, zebrafish or humans) or because the work was published in a format other than a full manuscript (eg, conference abstracts). The specific inclusion criteria deemed to have been violated for each excluded article therefore varied according to journal reviewer and were often multiple (eg, a nonmammalian animal model was used and no specific reference to therapeutic implications/applications was made). Discrepancies were resolved by consensus or by an independent adjudicator before a final locked database was built for analysis.
Prespecified data were collected, including:
- date of publication;
- CVD investigated;
- animal model(s) used and their sex;
- whether animals were randomized to treatment groups;
- whether any blinding was implemented (concealed allocation or blinded outcome assessment);
- whether a priori sample size/power estimations were performed.
Subgroup analyses were performed for studies of therapeutic interventions and by CVD studied. Post hoc comparisons of these practices were also made before and after 2 events: the publication of NIH guidelines and policies for reporting preclinical research, and the implementation of a “Basic Science Checklist” by Stroke.[28] This checklist is purported to have improved the quality and design of preclinical studies published in that journal.[32]
Finally, the number of citations has been used as a surrogate measure of research impact and influence.[33-37] Therefore, Scopus (Elsevier) was queried to identify original research articles that cited preclinical cardiovascular studies published between July 2006 and June 2011. This 5-year period was selected because it ensured that each article had ≥5 years of citation data available in June 2016.
Reviews, conference papers, editorials, letters, books/book chapters, and errata were excluded. Scopus was selected because it is the largest database available for citation analysis and retrieves a greater number of citations than other databases.[38] Prespecified analyses of citations at 60 months after the index publication were performed. These were stratified by individual and cumulative study design elements and adjusted for journal of publication, year of publication, and CVD studied. Some publications may have garnered attention in the short term but were later disproven or deemed irreproducible. A sensitivity analysis of citation counts at 36 months was therefore also undertaken, because this time point approximates the contemporary mean time to retraction.[39]
Categorical variables are reported as number (%) and were compared via χ2 tests. Continuous variables are reported as median (interquartile range) and were compared using Wilcoxon signed-rank or Kruskal–Wallis tests. Inter-rater agreement was calculated using the Cohen κ statistic and percent agreement. Temporal patterns in the proportions of studies reporting randomization, blinding, and sample size estimations were evaluated in 2 ways. The first was Cochran–Armitage trend tests. The second was journal-specific logistic regression models adjusting for CVD studied and animal model used, fitted when the number of events per predictor variable was adequate.[40-42] For these adjusted analyses, models were built by backward elimination, retaining specific CVD and animal model predictor variables at a criterion of P<0.20. Associations between study design elements and citation counts were examined via stratification and multivariable linear regression. Non-normally distributed variables were log-transformed when required. All analyses were performed using SAS 9.4 (SAS Institute Inc, Cary, NC), with a 2-tailed α level of 0.05 defining statistical significance.
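The Cochran–Armitage test described above checks for a linear trend in a proportion, such as the share of studies reporting blinding, across ordered groups such as publication years. The pure-Python sketch below uses the common unconditional-variance form with equally spaced scores and a normal approximation. The function name and the example counts are hypothetical, and the authors' SAS analysis may differ in details such as scoring or exact variants.

```python
import math

def cochran_armitage_trend(events, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    events[i]: studies reporting the design element in ordered group i (eg, year i)
    totals[i]: all studies in ordered group i
    scores:    ordinal scores for the groups (default 0, 1, 2, ...)
    Returns (z, p_value) using a normal approximation.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion under H0 (no trend)

    # Score-weighted deviation of observed events from those expected under H0
    t_stat = sum(t * (x - n * p_bar) for t, x, n in zip(scores, events, totals))

    # Null variance of the statistic
    sum_nt = sum(n * t for t, n in zip(scores, totals))
    sum_nt2 = sum(n * t * t for t, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_nt2 - sum_nt ** 2 / n_total)

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return z, p_value


# Hypothetical counts for 5 consecutive years (not data from this study)
z, p = cochran_armitage_trend(events=[1, 2, 3, 4, 5], totals=[10] * 5)
print(f"z = {z:.3f}, two-sided P = {p:.4f}")
```

A positive z indicates a proportion that rises with the group score. Identical proportions in every group give z = 0.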
Results
Study Selection and Characteristics
As previously described,[29] of 28 636 articles screened, 3396 met inclusion criteria and were analyzed (Figure 1). Inter-rater agreement for study inclusion before resolution was 94.5% (κ=0.72; 95% CI, 0.70–0.73). Atherosclerosis, hypertension, stroke, and cardiomyopathy/heart failure were the most commonly studied CVDs, each accounting for 14.2% to 19.5% of studies. Most other CVDs were the focus of <5.0% of studies. Ten studies on resuscitative medicine were identified and included in the “other” category, which comprised 3.0% of studies. Nearly one third of studies examined a therapeutic intervention. Mice or rats were used in 89.8% of studies, whereas guinea pigs, gerbils, or hamsters were used in <0.2%. Multiple animal models were used in 2.8% of studies (Table).
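Raw agreement of 94.5% alongside κ=0.72 reflects the fact that κ discounts agreement expected by chance. Chance agreement is high when most screened articles are excluded, as here. The sketch below computes both statistics for 2 raters' include/exclude decisions. The function name and the counts are hypothetical and are not this study's data.

```python
def agreement_stats(both_include, only_rater1, only_rater2, both_exclude):
    """Percent agreement and Cohen kappa for 2 raters making include/exclude calls."""
    n = both_include + only_rater1 + only_rater2 + both_exclude
    observed = (both_include + both_exclude) / n  # proportion of concordant decisions

    # Marginal "include" rate for each rater
    r1_include = (both_include + only_rater1) / n
    r2_include = (both_include + only_rater2) / n
    # Agreement expected by chance if the raters decided independently
    expected = r1_include * r2_include + (1 - r1_include) * (1 - r2_include)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical screening counts in which inclusion is rare (not this study's data)
pct, kappa = agreement_stats(both_include=30, only_rater1=20, only_rater2=20, both_exclude=930)
print(f"agreement = {pct:.1%}, kappa = {kappa:.2f}")
```

With these hypothetical counts, agreement is 96% but κ falls below 0.6. This illustrates why both measures are reported.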
Figure 1.
Literature search and results.
ATVB indicates Arteriosclerosis, Thrombosis and Vascular Biology; and Circ Res, Circulation Research.
Table. Study Characteristics
Randomization
Randomization of animals was reported in 740 studies (21.8%) overall, but was more frequently noted in the subset examining therapeutic interventions (38.3% versus 15.0%, P<0.0001). When studies were stratified by CVD studied, significant differences in the proportions reporting randomization were noted (P<0.0001, range 5.6%–46.5%) with a lack of randomization predominating in all cases except stroke (Figure 2A). Significant differences in the proportions of studies reporting randomization were also observed when stratified by animal model used (P<0.0001, range 14.6%–45.0%) with a lack of randomization predominating in all cases except when pigs or a combination of animal models were used (Figure 2B).
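The between-group comparisons in this section, such as randomization in therapeutic-intervention studies versus other studies, rely on χ2 tests as described in the Methods. The sketch below shows a Pearson χ2 test on a 2×2 table without continuity correction. The function name and counts are hypothetical. Stratified comparisons across more than 2 categories (by CVD or species) would use more degrees of freedom and a different P-value computation.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table.

    Rows:    group 1 / group 2 (eg, therapeutic vs other studies)
    Columns: design element reported / not reported
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # count expected under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not this study's data): 38 of 100 vs 15 of 100 studies randomized
chi2, p = chi_square_2x2(38, 62, 15, 85)
print(f"chi2 = {chi2:.2f}, P = {p:.5f}")
```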
Figure 2.
Randomization in preclinical cardiovascular research studies published over a 10-y period stratified by (A) disease studied and (B) species used. Dark blue corresponds to the proportion of studies implementing randomization; numbers in bars correspond to absolute numbers of studies. Valvular disease and resuscitative medicine studies were included in the “Other” category because of the small number of relevant publications. “Combination” refers to more than one animal species used within the same publication; animal models in the “Other” category included guinea pig, gerbil, and hamster. *For comparison of studies incorporating randomization vs not.
Blinding
Blinding of treatment allocation or outcome assessment was reported in 1110 studies (32.7%) overall and, like randomization, was more frequently noted in the subset examining therapeutic interventions (41.9% versus 28.9%, P<0.0001). When studies were stratified by CVD studied, significant differences in the proportions reporting any blinding were observed (P<0.0001, range 10.9%–62.6%). Studies on stroke ranked highest among CVDs. Studies on arterial aneurysms/dissections and transplantation medicine had comparable numbers of blinded and nonblinded studies. For all other CVDs, studies with blinding formed the minority (Figure 3A). No significant difference in the proportions of studies reporting blinding was seen when studies were stratified by animal model used (P=0.369, range 20.9%–36.8%; Figure 3B).
Figure 3.
Blinding in preclinical cardiovascular research studies published over a 10-y period stratified by (A) disease studied and (B) species used. Dark blue corresponds to the proportion of studies implementing blinding; numbers in bars correspond to absolute numbers of studies. Valvular disease and resuscitative medicine studies were included in the “Other” category because of the small number of relevant publications. “Combination” refers to more than one animal species used within the same publication; animal models in the “Other” category included guinea pig, gerbil, and hamster. *For comparison of studies incorporating blinding vs not.
Sample Size Estimation
A priori sample size estimations or power calculations were reported in 79 studies (2.3%). This aspect of study design was also more frequently reported in the subset examining therapeutic interventions (4.2% versus 1.6%, P<0.0001). Significant differences in the proportion of studies reporting sample size calculations were observed when stratified by CVD studied (P<0.0001, range 0%–9.5%) with stroke ranking highest (Figure 4A), but not when stratified by animal model used (P=0.270, range 0%–6.1%; Figure 4B).
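A priori sample size estimation was reported in only 2.3% of studies, so a concrete example of such a calculation may be useful. The sketch below uses the standard normal-approximation formula for comparing 2 group means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The function name and planning values are hypothetical, and this is not the method of any study reviewed here. A calculation based on the t distribution gives a slightly larger answer for small groups.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Animals per group to compare 2 means (two-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)  # round up to whole animals


# Hypothetical planning values: detect a difference equal to 1 SD
print(n_per_group(delta=10.0, sd=10.0))  # prints 16
```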
Figure 4.
Sample size estimation in preclinical cardiovascular research studies published over a 10-y period stratified by (A) disease studied and (B) species used. Dark blue corresponds to the proportion of studies reporting sample size estimations/power calculations; numbers in bars correspond to absolute numbers of studies. Valvular disease and resuscitative medicine studies were included in the “Other” category because of the small number of relevant publications. “Combination” refers to more than one animal species used within the same publication; animal models in the “Other” category included guinea pig, gerbil, and hamster. *For comparison of studies incorporating sample size estimation vs not.
Inclusion of Both Sexes
Sex bias prevalence and detailed temporal trends have been described previously.[29] After excluding studies that did not report the sex of the animals used, significant differences were noted in the proportions including animals of both sexes when stratified by CVD of interest (P<0.0001, range 2.4%–34.7%) and by animal model used (P<0.0001, range 6.7%–26.9%; Online Figure).
Temporal Patterns in Study Design Element Implementation
Over the past 10 years, there have been no significant changes in the proportions of studies reporting blinding or randomization (inclusion of both sexes has been reported previously[29]). There has been a significant increase in the proportion reporting a priori sample size estimations (Ptrend<0.0001), although it remains below 7% (Figure 5). There was no substantial difference in the prevalence of any of these study design elements before and after the NIH principles and guidelines for reporting preclinical research were published in 2014 (range of differences 0.7%–3.6%). Post hoc calculations suggest that our sample sizes had ≥80% power to detect a difference of 5.0% for all study design elements.
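The post hoc power statement above can be illustrated with the usual normal approximation for comparing 2 independent proportions. The following are assumptions for illustration only: the function name, the 2400/1000 split of studies before and after 2014, and the use of the overall randomization rate (21.8%) as the baseline. The authors' exact power method and group sizes are not given here.

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided test comparing 2 independent proportions."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5  # unpooled standard error
    shift = abs(p1 - p2) / se
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical before/after split; 21.8% baseline vs a 5-percentage-point increase
print(f"{power_two_proportions(0.218, 0.268, n1=2400, n2=1000):.2f}")
```

When the 2 proportions are equal, the function returns α, as expected for a test with no true difference.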
Figure 5.
Temporal patterns in randomization, blinding, and sample size estimation in preclinical cardiovascular studies. Dashed line indicates the publication of the National Institutes of Health guidelines and policies for reporting preclinical research. Data for inclusion of both sexes have been reported previously.[29]
CVD-specific temporal analyses suggest that preclinical studies on stroke are increasingly incorporating randomization, blinding, sample size estimations, and inclusion of both sexes. This pattern is not seen in studies on atherosclerosis, hypertension, or cardiomyopathy/heart failure (the most commonly studied CVDs). Studies on atherosclerosis have shown a modest increase in the proportion reporting sample size estimation in recent years but an overall significant decrease in the proportion reporting blinding. The proportion of studies on cardiomyopathy/heart failure that include both sexes has significantly decreased. No other significant changes were observed during the 10-year period (Figure 6).
Figure 6.
Temporal patterns in randomization, blinding, sample size estimation, and inclusion of both sexes in preclinical research for the most commonly studied cardiovascular diseases. Note the different scale of y-axis for sample size estimation. CM/HF indicates cardiomyopathy/heart failure.
A disproportionate amount of stroke research is published in Stroke, which has reported improvements in the quality and design of published preclinical research after introducing its “Basic Science Checklist” in 2011.[32] We therefore compared study design implementation before and after this time point for each journal, adjusting for CVD and animal model. Stroke uniquely exhibited significant improvements in all measures of methodological quality (range of adjusted odds ratios 2.4–8.2, P<0.0001 for all study design elements; Online Table). These analyses also identified stroke, as the CVD studied, as an independent positive predictor of one or more study design elements in every journal.
Citations According to Index Study Methodology
At 60 months, 41 441 articles citing the 1681 preclinical studies that were published between July 2006 and June 2011 were identified. The median citation count per preclinical study was 20 (interquartile range 13–31, range 0–131). Studies that implemented randomization, blinding, or sample size estimation had similar numbers of citations as those that did not; however, studies that included both males and females were cited less frequently (median 18 versus 20, P=0.023). The cumulative number of these study design elements was not associated with citation counts (Figure 7). No study implemented all 4 of the above design elements and only 20 studies (1.2%) incorporated 3; therefore, studies with ≥2 were grouped together for this analysis. The above associations (or lack thereof) between study design element(s) and number of citations persisted after adjusting for journal of publication, year of publication, and CVD studied in multivariable regression models and all findings were comparable in sensitivity analyses of citation counts at 36 months.
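The cumulative-score analysis above gives each design element 1 point and pools scores of ≥2, because so few studies had 3 elements and none had 4. It and the box-plot summaries in Figure 7 can be sketched as follows. The function names and study records are hypothetical. Quartile definitions differ across software, and SAS offers several, so this sketch's "inclusive" (linear-interpolation) quartiles would not necessarily match the authors' output exactly.

```python
from statistics import median, quantiles

def cumulative_score(randomized, blinded, sample_size_estimated, both_sexes, cap=2):
    """Count of the 4 design elements present (1 point each); scores >= cap are pooled."""
    score = sum(map(bool, (randomized, blinded, sample_size_estimated, both_sexes)))
    return min(score, cap)

def summarize(citations):
    """Median, quartiles, and Tukey fences (1.5 x IQR) used to flag box-plot outliers."""
    # 'inclusive' = linear interpolation between order statistics; needs >= 2 values
    q1, _, q3 = quantiles(citations, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median(citations),
        "q1": q1,
        "q3": q3,
        "outlier_fences": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
    }

# Hypothetical studies (not this study's data):
# (randomized, blinded, sample size estimated, both sexes, citations at 60 months)
studies = [
    (False, False, False, False, 12),
    (False, False, False, False, 9),
    (True, False, False, False, 25),
    (False, False, False, True, 15),
    (True, True, False, False, 18),
    (False, True, False, True, 40),
    (True, True, True, False, 22),
]

# Group citation counts by cumulative score (0, 1, or >=2)
groups = {}
for *elements, cites in studies:
    groups.setdefault(cumulative_score(*elements), []).append(cites)

for score in sorted(groups):
    label = ">=2" if score == 2 else str(score)
    print(label, summarize(groups[score]))
```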
Figure 7.
Box-whisker plot of the number of citations by methodological rigor of index preclinical study (n=1681). Diamond symbol plotted at mean. Outliers identified as beyond 1.5 interquartile range. Cumulative refers to sum of randomization, blinding, sample size estimation, and inclusion of both sexes (each contributing 1 point). *P<0.05.
Discussion
Preclinical cardiovascular research using animal models plays an integral role in advancing the care of patients afflicted with CVDs. However, its impact is contingent on its scientific validity, reproducibility, and relevance to human physiology and disease. It is understood that inherent limitations of using animals to model human diseases can undermine the predictive value of preclinical findings, rendering it difficult for even the most skilled scientists to make impactful discoveries.[12] However, this difficulty is compounded when methodological sources of bias are introduced. We systematically examined a continuous and large body of preclinical cardiovascular research to determine how often randomization, blinding, and a priori sample size estimation are incorporated and did so over a sufficiently long period to draw meaningful conclusions on the temporal patterns of these practices. We report that, overall, these design elements are rarely implemented in studies published in leading peer-reviewed cardiovascular journals, paralleling the previously reported prevalence of sex bias.[29] Furthermore, apart from a modest increase in the proportion of studies reporting sample size calculations, there has been no improvement in these practices over the past decade, although stroke research may be a notable exception. 
Finally, analyses of citation counts suggest that cardiovascular researchers may overlook crucial methodological aspects of preclinical studies. Potentially biased research may therefore have an influence on subsequent research efforts comparable to that of methodologically more robust studies.
Poorly designed preclinical studies contribute to experimental irreproducibility[2,7] and wasted resources.[43,44] They may also lead to erroneous conclusions about treatment effects,[9,21,23,45,46] which can ultimately spur or deter clinical trials in humans, with consequent risk of harm.[12,20,47] The experimental design elements evaluated in our study are deemed crucial to improving the quality of preclinical research:[1,6,7,10,13,26,48,49]
- randomization addresses selection bias;
- blinding minimizes performance, detection, and attrition bias;
- sample size estimation ensures adequate statistical power and the ethical use of animals;
- inclusion of both sexes promotes research relevance for both men and women.
These elements are routinely implemented in clinical trials because they have been shown to protect against bias and imprecision,[50-53] yet they have conspicuously not permeated preclinical experiments.[1,23,54-56] The high degree of experimental control possible in preclinical research (eg, via genetic and environmental homogeneity) can reduce variation and the required sample sizes relative to clinical trials. However, variations in injury or disease induction and the potential for persistent, unrecognized confounders represent important sources of bias.[6,31,57]
Furthermore, many scientists believe there is a reproducibility “crisis” in research attributable to methodological and reporting shortcomings.[14] Despite this belief, our citation analyses suggest that greater methodological rigor in preclinical cardiovascular research does not translate into greater scientific influence.
Our citation findings echo an observation by Amgen investigators in a study of 53 preclinical cancer research publications: studies they could not adequately reproduce were cited as often as, if not more often than, those they could successfully reproduce, irrespective of the impact factor of the publishing journal.[12] The suggestion that cardiovascular researchers give methodologically robust studies no more consideration than studies at greater risk of bias is troubling, as it undermines the reputed “self-correcting” tenet of science and increases the risk of pursuing unfruitful avenues of research.[2,5,44,48]
Cogent analyses and arguably practical solutions to improve preclinical research quality (and secondarily to enhance research translation) have been proposed. However, evidence of changes in research practice and data on the impact of corrective actions are scarce.[13] Several guidelines and checklists have been developed to improve preclinical methodology[45,58,59] and reporting,[25,45,60-62] yet there is little indication that they are effecting change despite being widely endorsed.[43,58,63] Systematic reviews and meta-analyses have been advocated as safeguards to expose bias in preclinical research before clinical trials are undertaken,[9,47,64,65] but they are limited by the internal validity of the included studies[66] and are not routinely performed.
Changes in research funding, such as those undertaken by the NIH,[2,26] may bolster these efforts; however, our analyses of animal studies in the cardiovascular sciences have not shown signs of substantial changes in research practices or reporting since the NIH initiatives were announced.[29]
Journal editors and reviewers directly influence what is published, often serving as the ultimate gatekeepers of research findings; yet few journals effectively encourage the use of reporting guidelines.[67] At the time of writing, author guidelines for reporting animal studies vary considerably among leading cardiovascular journals, and few require authors to report randomization, blinding, sample size estimation, or the sex of the animals used, although such requirements could increase transparency and reinforce the importance of these study design elements.[13] Uniquely, however, the journal Stroke implemented a “Basic Science Checklist” in 2011 (updated in 2016),[28] which includes all of these elements, forms part of the manuscript submission process, and is evaluated by editors and reviewers. Comparing the 2.5 years after its introduction with the preceding 18 months, Minnerup et al[32] noted improvements in randomization, blinding, and allocation concealment. We observed significant improvements in the quality of preclinical stroke research, 90% of which was published in Stroke, raising the question of whether these improvements were driven by journal editorial policies and culture or by changes in research practices among stroke researchers in general. Our post hoc analyses suggest that both likely contributed: Stroke uniquely showed improvements in all study design elements even after adjusting for the CVD studied and the animal model used, yet stroke as the CVD studied was an independent predictor of at least one study design element in every journal examined.
Our findings, therefore, corroborate those of Minnerup et al[32] and expand on them by demonstrating that these improvements continued into 2016, extend to sample size estimation (and, to a lesser extent, inclusion of both sexes), contrast with an extended preceding period of suboptimal reporting, and have not been paralleled in other prominent cardiovascular journals. Thus, although not conclusive, our data suggest that journal editors and reviewers can exert considerable influence on preclinical research practices and reporting. Yet our findings also suggest that stroke research has independently improved in quality, a finding that may be attributable to its community’s early appreciation of the importance of methodological rigor in preclinical science and its extensive involvement in efforts to improve research translation.[7,20,21,46,58,68-71]
Our study is not without limitations. The study sample comprised articles from 5 cardiovascular journals over 10 years, which may not fully represent preclinical cardiovascular research publications or practices. However, we deliberately selected AHA journals given their prominence in cardiovascular medicine, established reputation, collective focus on a broad range of CVDs, and unanimous endorsement of the NIH guidelines on rigor and reproducibility. Given that none of the American College of Cardiology or European Society of Cardiology journals had endorsed these guidelines at the time of writing,[25] our data may actually overestimate the methodological quality of most preclinical cardiovascular research. Because there is no accepted definition of preclinical studies, we developed criteria for this study; it is therefore possible that not all relevant studies were included in our analysis.
However, the inclusion criteria used, the number of journals reviewed, and the substantial inter-rater agreement for study inclusion support the validity of our results and render our data the best available on methodological rigor in preclinical cardiovascular research. Furthermore, our inclusion criteria are in line with previously proposed distinctions between “exploratory” and “confirmatory” preclinical research.[7,30,31] Our analysis treated blinding as a single dichotomous variable, which limits detailed assessment of this design element, and did not examine all potentially relevant sources of bias. Data handling and publication bias, for instance, are important factors that may contribute to irreproducibility in science.[2,6,7,9-11,13,43] In addition, factors other than experimental methodology can influence the predictive value of preclinical research, including how well an animal model reflects human physiology and disease,[6,9,45,72] the appropriateness of statistical analyses,[73] and a lack of standardized definitions and surrogate markers;[74,75] these were not examined. Although citation count is the most commonly used measure of research impact, it is an imperfect indicator.[33,34] Finally, study design elements may have been implemented but not reported, which could lead us to underestimate their prevalence. However, underreporting of measures of methodological rigor is believed to be low[49,65] and is therefore unlikely to have significantly influenced our results.
Our analysis of preclinical research published in leading cardiovascular journals over the past 10 years demonstrates that methodological sources of potential bias and imprecision are prevalent, have not appreciably improved over time, and may be overlooked by researchers when basing subsequent studies on this work. Concerted efforts to address this problem are urgently needed.
Stroke research has uniquely shown substantial improvement in several measures of quality in recent years, warranting a closer examination to identify drivers of its success.