Josip Strcic, Antonia Civljak, Terezija Glozinic, Rafael Leite Pacheco, Tonci Brkovic, Livia Puljak.
Abstract
This study aimed to analyze the content of data availability statements (DAS) and the actual sharing of raw data in preprint articles about COVID-19. The study combined a bibliometric analysis and a cross-sectional survey. We analyzed preprint articles on COVID-19 published on medRxiv and bioRxiv from January 1, 2020 to March 30, 2020. We extracted data sharing statements, tried to locate raw data when authors indicated they were available, and surveyed authors. The authors were surveyed in 2020-2021. We surveyed authors whose articles did not include a DAS, who indicated that data were available on request, or whose manuscripts stated that raw data were available in the manuscript but whose raw data we could not find. Raw data collected in this study are published on Open Science Framework (https://osf.io/6ztec/). We analyzed 897 preprint articles. There were 699 (78%) articles with a Data/Code field present on the website of a preprint server. In 234 (26%) preprints, a data/code sharing statement was reported within the manuscript. Of the 283 preprints that reported that data were accessible, we found raw data/code for 133 (47%; 15% of all analyzed preprint articles). Most commonly, authors indicated that data were available on GitHub or another clearly specified web location, on (reasonable) request, or in the manuscript or its supplementary files. In conclusion, preprint servers should require authors to provide data sharing statements that will be included both on the website and in the manuscript. Education of researchers about the meaning of data sharing is needed. Supplementary Information: The online version contains supplementary material available at 10.1007/s11192-022-04346-1. © Akadémiai Kiadó, Budapest, Hungary 2022.
Keywords: COVID-19; Coronavirus; Data sharing; Open data; Preprint server; SARS-CoV-2
Year: 2022 PMID: 35370324 PMCID: PMC8956135 DOI: 10.1007/s11192-022-04346-1
Source DB: PubMed Journal: Scientometrics ISSN: 0138-9130 Impact factor: 3.801
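As a quick arithmetic check, the percentages quoted in the abstract can be recomputed from the reported counts. The sketch below is illustrative and not part of the original study: the counts come from the abstract, while the constant names and the `pct` helper are assumptions introduced here.

```python
# Counts as reported in the abstract; all names are hypothetical, chosen for illustration.
TOTAL_PREPRINTS = 897          # preprints analyzed
WITH_DATA_CODE_FIELD = 699     # Data/Code field present on the preprint server website
WITH_STATEMENT_IN_TEXT = 234   # data/code sharing statement inside the manuscript
REPORTED_ACCESSIBLE = 283      # preprints reporting that data were accessible
RAW_DATA_FOUND = 133           # preprints for which raw data/code were located


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


field_pct = pct(WITH_DATA_CODE_FIELD, TOTAL_PREPRINTS)
statement_pct = pct(WITH_STATEMENT_IN_TEXT, TOTAL_PREPRINTS)
found_of_accessible_pct = pct(RAW_DATA_FOUND, REPORTED_ACCESSIBLE)
found_of_all_pct = pct(RAW_DATA_FOUND, TOTAL_PREPRINTS)

print(f"Data/Code field present: {field_pct}%")
print(f"Statement in manuscript: {statement_pct}%")
print(f"Raw data found (of those reported accessible): {found_of_accessible_pct}%")
print(f"Raw data found (of all analyzed preprints): {found_of_all_pct}%")
```

Note that the two percentages for the 133 preprints use different denominators: 283 (preprints claiming accessible data) and 897 (all analyzed preprints).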
Categories of data sharing statements reported in the “Data/Code field” on the preprint web site
| Category | N (%) |
|---|---|
| Data available on request | 130 (15) |
| Data generated or used during the study have been presented in the submitted article | 96 (11) |
| Data were posted on a specified website | 106 (12) |
| Data used in the manuscript are available from references, or another | 74 (8.2) |
| Data and script are available in GitHub | 55 (6.1) |
| Data available on | 41 (4.6) |
| Authors described a source of their data, but not exact public location | 15 (1.7) |
| Not available | 13 (1.4) |
| Not applicable | 13 (1.4) |
| Statement that was not about data availability | 13 (1.4) |
Ten most common categories are shown, from the 897 analyzed preprints
Ten most commonly used categories of data sharing statements reported in the full text of the preprint manuscript (N = 234)
| Category | N (%) |
|---|---|
| Data and script are available in GitHub | 47 (19) |
| Data available on request | 42 (18) |
| Data were posted on a specified website other than GitHub | 38 (16) |
| Data generated or used during the study have been presented in the submitted article (in the manuscript and/or supplementary materials) | 33 (14) |
| Data available on | 29 (12) |
| Data used in the manuscript are available from a public/open source | 27 (12) |
| No additional data available | 4 (1.7) |
| Not applicable | 4 (1.7) |
| Data not shared (“Not publicly available”; “Cannot be shared online”; “Data obtained for this study will not be made available to others”) | 3 (1.3) |
| Submitted (“Data are submitted”; “Submitted to databases”) | 2 (0.9) |
Categorized responses from the author survey on reasons for not sharing their data within the manuscript
| Group of authors | Responses | N |
|---|---|---|
| | Author explained where the data are available | 19 |
| | There are no primary data that have been collected as part of the study | 2 |
| | Did not clarify the location of raw data | 2 |
| | Not willing to share their data | 1 |
| | Did not answer the question in the response | 1 |
| | Did not explain what would be a reasonable request for data sharing | 8 |
| | Author explained where the data are available | 6 |
| | Reasonable request would be a scientific project or an institutional/healthcare project | 1 |
| | We share data generated in our study upon request, for instance, upon requests by email | 1 |
| | Author explained where the data are available / raw data will be published later | 1 |
| | The data will be published only when the manuscript is formally published | 1 |
| | Author explained where the data are available / author explained what would be a reasonable request for data sharing | 1 |
| | Would be motivated to share if the request comes from a reputable institution with a clear aim and objectives for the data analysis | 1 |
| | Not willing to share their data | 1 |
| | Did not answer the question in the response | 1 |
| | Author explained where the data are available | 14 |
| | Authors will share their raw data but want to know what exactly is planned to do with the data | 1 |
| | No new data set was used in the paper | 1 |
| | Not sure what the term raw data means | 1 |
| | The data will be published only when the manuscript is formally published | 1 |
| | Willing to share their data but it's difficult concerning the size of data | 1 |