Weiyi Xia (1,2), Zhiyu Wan (3,2), Zhijun Yin (3,4,2), James Gaupp (1,2), Yongtai Liu (3,2), Ellen Wright Clayton (5,6,7,2), Murat Kantarcioglu (8), Yevgeniy Vorobeychik (1,3,2), Bradley A Malin (1,3,4,2).
1. Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
2. Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA.
3. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA.
4. Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA.
5. Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN, USA.
6. Law School, Vanderbilt University, Nashville, TN, USA.
7. Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA.
8. Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA.
Abstract
Objective: Biomedical science is driven by datasets that are being accumulated at an unprecedented rate, with ever-growing volume and richness. Various initiatives make these datasets more widely available to recipients who sign Data Use Certificate agreements, under which penalties are levied for violations. A particularly popular penalty is the temporary revocation, often for several months, of the recipient's data usage rights. This policy rests on the assumption that the value of biomedical research data depreciates significantly over time; however, no studies have been performed to substantiate this belief. This study investigates whether the assumption holds and what it implies for data science policy.
Methods: This study tests the hypothesis that the value of data for scientific investigators, measured by the impact of the publications based on the data, decreases over time. The hypothesis is tested formally through a linear mixed-effects model using approximately 1200 publications, published between 2007 and 2013, that used datasets from the Database of Genotypes and Phenotypes, a data-sharing initiative of the National Institutes of Health.
Results: The analysis shows that the impact factors for publications based on Database of Genotypes and Phenotypes datasets depreciate in a statistically significant manner. However, the depreciation rate is slow: only ∼10% per year, on average.
Conclusion: The enduring value of data for subsequent studies implies that revoking usage for short periods of time may not sufficiently deter those who would violate Data Use Certificate agreements, and that alternative penalty mechanisms may need to be invoked.
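To make the scale of the reported depreciation concrete, the following is a minimal arithmetic sketch, not the study's mixed-effects model. It assumes the ∼10% per year figure compounds at a constant geometric rate, and the revocation lengths of 3, 6, and 12 months are illustrative choices rather than values taken from the study. The function name `retained_fraction` is hypothetical.

```python
def retained_fraction(years: float, annual_rate: float = 0.10) -> float:
    """Fraction of a dataset's value remaining after `years`.

    Assumption (not from the study): value decays geometrically at a
    constant `annual_rate`, so the retained fraction is (1 - rate) ** years.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if not 0.0 <= annual_rate < 1.0:
        raise ValueError("annual_rate must be in [0, 1)")
    return (1.0 - annual_rate) ** years


if __name__ == "__main__":
    # Illustrative revocation lengths in months (assumed, not from the study).
    for months in (3, 6, 12):
        kept = retained_fraction(months / 12)
        print(f"{months:>2}-month revocation: value retained {kept:.1%}, lost {1 - kept:.1%}")
```

Under these assumptions, a revocation lasting several months removes only a few percent of the data's value, which is the intuition behind the conclusion that short revocations may be a weak deterrent.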