
Reproducible Science Is Vital for a Stronger Evidence Base During the COVID-19 Pandemic.

Karla Therese L Sy, Laura F White, Brooke E Nichols.

Abstract

Reproducible research becomes even more imperative as we build the evidence base on SARS-CoV-2 epidemiology, diagnosis, prevention, and treatment. In his study, Paez assessed the reproducibility of COVID-19 research during the pandemic, using a case study of population density. He found that most articles assessing the relationship between population density and COVID-19 outcomes do not publicly share data and code, with a few exceptions, including our paper, which he stated "illustrates the importance of good reproducibility practices". Paez recreated our analysis using our code and data, then reanalyzed the data from the perspective of spatial analysis, and his new model came to a different conclusion. The disparity between our findings and Paez's, together with the other existing literature on the topic, gives greater impetus to the need for further research. With near-exponential growth of COVID-19 research across a wide range of scientific disciplines, reproducible science is vital to producing reliable, rigorous, and robust evidence on COVID-19, which will be essential to inform clinical practice and policy in order to effectively eliminate the pandemic.

Year: 2021 PMID: 34898694 PMCID: PMC8652901 DOI: 10.1111/gean.12314

Source DB: PubMed Journal: Geogr Anal ISSN: 0016-7363

Importance of reproducible research

Reproducible science, which refers to the ability to achieve identical findings with the same code and data, is essential to critically examine scientific evidence by verifying published results. This is in contrast with replicability, which is the act of repeating a study without using the original data, but generally using the same methods (Broman et al. 2017). The ability to reproduce research results can be used to assess whether different assumptions and models lead to different conclusions, and to examine inconsistent results across studies. Reproducible science fosters greater collaboration across disciplines, improves scientific discourse, and strengthens scientific evidence. However, in recent years, the scientific community has recognized a “reproducibility crisis” (Sayre and Riegelman 2019). In 2016, a survey of researchers found that more than 70% had tried and failed to reproduce another scientist’s experiments, and more than 50% had failed to reproduce their own (Baker 2016). This lack of scientific rigor and robustness has consequences for the reliability of research findings, and for the extent to which the public should trust research. In particular, during the current COVID‐19 pandemic, there has been near‐exponential growth of COVID‐19 literature and evidence across a wide range of scientific disciplines (Brainard 2020). Reproducible research becomes even more imperative as we build the evidence base on SARS‐CoV‐2 epidemiology, diagnosis, prevention, and treatment. In his study, Paez assessed the reproducibility of COVID‐19 research during the pandemic, using a case study of population density (Paez 2021). He found that most publications that assessed the association between population density and COVID‐19 outcomes fell short of the gold standard of reproducible research.
Currently, the literature on the association between population density and COVID‐19 outcomes remains inconclusive, as several studies in various geographic settings have come to opposing conclusions. Paez found that researchers used a range of statistical techniques, which could account for these varying findings. Reproducing studies to verify findings or reanalyze the data could provide greater insight into why associations differ across studies. Unfortunately, he also found that most articles assessing the relationship between population density and COVID‐19 outcomes do not publicly share data and code, with a few exceptions, including our article (Sy, White, and Nichols 2021), which he stated “illustrates the importance of good reproducibility practices”. He was able to recreate our analysis using our code and data. He also reanalyzed our data from the perspective of spatial analysis, and his new model came to a different conclusion. The perspective that Paez brings to our work exemplifies the importance of reproducibility. He empirically demonstrated that scientific results can lead to different conclusions depending on model choice and specification, and that the discrepancy between his analysis and ours likely stemmed from different modeling choices, which would not have been apparent had we not made our code and data available. It is well known that variation in statistical models and assumptions can vastly change results, and that analytic choices usually depend on the scientists’ field of research. When 29 research groups were asked to answer a research question on the same data set, the effect sizes ranged from OR = 0.89 to OR = 2.93 (Silberzahn et al. 2018). Reproducible science also helps scientists across different fields work together to address knowledge or method gaps, which fosters more rigorous interdisciplinary research. In this case, his expertise in spatial analysis complemented our expertise in infectious disease epidemiology and offered an alternative conclusion to our findings.
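
The definition of reproducibility used in this section (identical findings from the same code and data) can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the toy values, the function names `data_fingerprint` and `bootstrap_mean`, and the choice of a bootstrap are all assumptions made for illustration. The sketch shows two habits that make a shared analysis easier to reproduce: publishing a checksum of the input data, and fixing the random seed of any stochastic step.

```python
import hashlib
import random
import statistics


def data_fingerprint(rows):
    """Return a SHA-256 digest of the data so readers can confirm they hold the same input."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def bootstrap_mean(values, n_resamples=200, seed=0):
    """Bootstrap estimate of the mean; the explicit seed makes the resampling deterministic."""
    rng = random.Random(seed)
    resample_means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]
        resample_means.append(statistics.fmean(resample))
    return statistics.fmean(resample_means)


# Hypothetical toy values, not data from any study.
toy_values = [1.8, 2.1, 2.4, 1.9, 2.7, 3.0, 2.2]

# Same code, same data, same seed: the two runs agree exactly.
first_run = bootstrap_mean(toy_values, seed=42)
second_run = bootstrap_mean(toy_values, seed=42)
reproduced = first_run == second_run

# The fingerprint can be published alongside the code and data.
fingerprint = data_fingerprint(toy_values)
```

A reader who obtains the same fingerprint and the same estimate has reproduced the result; a replication, by contrast, would apply the same method to newly collected data.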

Comparison of Paez’s model, the Sy, White, and Nichols model, and other published literature

In his article, Paez noted that the mixed linear models in our analysis were an appropriate modeling choice, but indicated two potential limitations that he attempted to address with spatial models. He addressed the potential non‐random sample selection with Heckman’s selection model with spatial filtering, and replaced the log‐transformation of population density with a quadratic expansion. In our study, we conducted several sensitivity analyses to assess the robustness of our results to potential limitations in our model; however, we acknowledged that sampling could be an issue: “we had to only include counties that had sufficient case data in order to accurately estimate R”. When Heckman’s selection model was used, Paez found that “the coefficient for population density is still positive, but the magnitude changes: in effect, it appears that the effect of density is more pronounced than what [Sy, White, and Nichols] Model 3 indicated”, which corresponds to what we predicted in our discussion section: “if we included all counties, the true association between population density and R.” His selection model confirmed our prediction about the direction of the potential selection bias. Moreover, decisions about how variables are operationalized could also be a reason for different conclusions. Paez’s transformation of the population density variable, which allows for a non‐monotonic relationship between population density and the basic reproductive number, offered an alternative conclusion in which higher density is not always associated with greater risk of disease spread. The disparity between our findings and Paez’s gives greater impetus to the need for further research on this topic. As noted in our article, “recent literature [on density and COVID‐19] has been conflicting, where some research also suggests a density‐dependence of COVID‐19 transmission (Rocklöv and Sjödin; Rubin et al.) and other measures of the severity of the outbreak (Anand et al.; Wong and Li), while other research suggests that there are other factors that can better explain the pandemic (Hamidi, Ewing, and Sabouri 2020; Hamidi, Sabouri, and Ewing 2020)”.
Aside from differences in model specifications, other potential reasons for the variation in conclusions include (a) unmeasured confounding, (b) data quality issues that cause misclassification, (c) selection bias, or (d) other unknown biases. These sources of bias are ubiquitous in observational studies, and researchers do their best to mitigate the biases that obscure the true association between exposure and outcome. Reproducible research provides an avenue to critically examine published study results for potential sources of bias that were not properly accounted for.
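
To see why the choice between a log‐transformation and a quadratic expansion of density can change the conclusion, consider the following minimal sketch. It does not reproduce Paez’s model or ours: the coefficients, the density values, and the function names are hypothetical and chosen only for illustration. A log term with a positive coefficient always increases with density, whereas a quadratic term with a negative second coefficient rises and then falls, with a turning point at -beta1 / (2 * beta2).

```python
import math


def log_effect(density, beta):
    """Contribution of density under a log specification: beta * log(density)."""
    return beta * math.log(density)


def quadratic_effect(density, beta1, beta2):
    """Contribution of density under a quadratic expansion: beta1 * d + beta2 * d^2."""
    return beta1 * density + beta2 * density ** 2


def quadratic_turning_point(beta1, beta2):
    """Density at which the quadratic effect changes direction (requires beta2 != 0)."""
    return -beta1 / (2.0 * beta2)


# Hypothetical coefficients and densities (arbitrary units), for illustration only.
beta_log = 0.3
beta1, beta2 = 0.8, -0.1
densities = [1.0, 2.0, 4.0, 6.0, 8.0]

# Under the log specification the effect increases at every step.
log_curve = [log_effect(d, beta_log) for d in densities]

# Under the quadratic specification the effect rises, peaks, then declines.
quad_curve = [quadratic_effect(d, beta1, beta2) for d in densities]
peak_density = quadratic_turning_point(beta1, beta2)
```

With these assumed coefficients, the log specification can only ever report "more density, more transmission", while the quadratic specification can report that the association weakens or reverses beyond the turning point. Which form describes the data better is an empirical question that shared code and data make possible to test.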

Conclusion

The COVID‐19 pandemic is an unprecedented time for scientific research, with experts from various fields working together to rapidly improve our understanding of SARS‐CoV‐2. Reproducible science is vital to producing reliable, rigorous, and robust literature on COVID‐19, which will be essential to inform clinical practice and policy in order to effectively eliminate the pandemic.

References

1. 1,500 scientists lift the lid on reproducibility.

Authors: Monya Baker
Journal: Nature Date: 2016-05-26 Impact factor: 49.962

2. High population densities catalyse the spread of COVID-19.

Authors: Joacim Rocklöv; Henrik Sjödin
Journal: J Travel Med Date: 2020-05-18 Impact factor: 8.490

3. Association of Social Distancing, Population Density, and Temperature With the Instantaneous Reproduction Number of SARS-CoV-2 in Counties Across the United States.

Authors: David Rubin; Jing Huang; Brian T Fisher; Antonio Gasparrini; Vicky Tam; Lihai Song; Xi Wang; Jason Kaufman; Kate Fitzpatrick; Arushi Jain; Heather Griffis; Koby Crammer; Jeffrey Morris; Gregory Tasian
Journal: JAMA Netw Open Date: 2020-07-01

4. Spreading of COVID-19: Density matters.

Authors: David W S Wong; Yun Li
Journal: PLoS One Date: 2020-12-23 Impact factor: 3.240

5. Population density and basic reproductive number of COVID-19 across United States counties.

Authors: Karla Therese L Sy; Laura F White; Brooke E Nichols
Journal: PLoS One Date: 2021-04-21 Impact factor: 3.240

6. Reproducibility of Research During COVID-19: Examining the Case of Population Density and the Basic Reproductive Rate from the Perspective of Spatial Analysis.

Authors: Antonio Paez
Journal: Geogr Anal Date: 2021-11-18

7. Longitudinal analyses of the relationship between development density and the COVID-19 morbidity and mortality rates: Early evidence from 1,165 metropolitan counties in the United States.

Authors: Shima Hamidi; Reid Ewing; Sadegh Sabouri
Journal: Health Place Date: 2020-06-25 Impact factor: 4.078

8. Prevalence of SARS-CoV-2 antibodies in a large nationwide sample of patients on dialysis in the USA: a cross-sectional study.

Authors: Shuchi Anand; Maria Montez-Rath; Jialin Han; Julie Bozeman; Russell Kerschmann; Paul Beyer; Julie Parsonnet; Glenn M Chertow
Journal: Lancet Date: 2020-09-25 Impact factor: 202.731
