Lisa M Federer, Christopher W Belter, Douglas J Joubert, Alicia Livinski, Ya-Ling Lu, Lissa N Snyders, Holly Thompson.
Abstract
A number of publishers and funders, including PLOS, have recently adopted policies requiring researchers to share the data underlying their results and publications. Such policies help increase the reproducibility of the published literature and make a larger body of data available for reuse and re-analysis. In this study, we evaluate the extent to which authors have complied with the PLOS policy by analyzing Data Availability Statements from 47,593 papers published in PLOS ONE between March 2014 (when the policy went into effect) and May 2016. Our analysis shows that compliance with the policy has increased, with a significant decline over time in papers that did not include a Data Availability Statement. However, only about 20% of statements indicate that data are deposited in a repository, which the PLOS policy states is the preferred method. More commonly, authors state that their data are in the paper itself or in the supplemental information, though it is unclear whether these data meet the level of sharing required by the PLOS policy. These findings suggest that additional review of Data Availability Statements or more stringent policies may be needed to increase data sharing.
Year: 2018 PMID: 29719004 PMCID: PMC5931451 DOI: 10.1371/journal.pone.0194768
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Articles missing a Data Availability Statement over time.
The red line indicates total published articles, while the bars indicate articles with no Data Availability Statement.
Fig 2. Flow diagram of Data Availability Statement coding.
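Fig 2 summarizes how each statement was assigned to one of the categories defined in the table below. As a rough illustration of how such a scheme could be automated, the following Python sketch applies keyword rules in a fixed order. It is not the study's coding procedure: the function name `classify`, the cue lists, the precedence given to restriction wording, and the stripping of the PLOS boilerplate sentence are all assumptions made for illustration.

```python
import re

# PLOS boilerplate that opens many statements (wording taken from the sample
# statements in the table). Removing it first keeps "without restriction"
# from tripping the access-restricted cue "restrict".
BOILERPLATE = ("the authors confirm that all data underlying the findings "
               "are fully available without restriction.")

# Hypothetical keyword cues for each sharing mechanism; illustrative only.
CUES = {
    "repository": ("repository", "deposited", "figshare", "dryad", "genbank",
                   "accession", "http"),
    "upon request": ("request", "contact"),
    "access restricted": ("ethic", "privacy", "confidential", "restrict",
                          "third party"),
    "paper": ("in the paper", "within the paper", "manuscript"),
    "si": ("supporting information", "supplemental", "supplementary"),
}


def classify(statement: str) -> str:
    """Assign one of the ten coding categories using simple keyword rules."""
    text = statement.lower().replace(BOILERPLATE, "").strip()
    if re.search(r"\bn/a\b|not applicable", text):
        return "N/A"
    found = {name for name, cues in CUES.items()
             if any(cue in text for cue in cues)}
    # Assumed precedence: restriction wording wins, because restricted
    # statements usually also name someone to contact.
    if "access restricted" in found:
        return "access restricted"
    in_paper, in_si = "paper" in found, "si" in found
    # Paper and SI together are treated as one "published with the article" route.
    mechanisms = found - {"paper", "si"}
    if in_paper or in_si:
        mechanisms.add("paper/SI")
    if len(mechanisms) > 1:
        return "combination"
    if in_paper and in_si:
        return "in paper and SI"
    if in_paper:
        return "in paper"
    if in_si:
        return "in SI"
    if mechanisms:
        return mechanisms.pop()  # "repository" or "upon request"
    # No locatable mechanism: an availability claim with no place given,
    # otherwise fall back to "other" for human review.
    if "available" in statement.lower() and "no data" not in text:
        return "location not stated"
    return "other"
```

For example, `classify("All data and analysis code have been provided as Supporting Information files.")` returns `"in SI"`. Rules like these are brittle: the sample statement for "combination" in the table refers to "supporting files", which none of these cues match, so the sketch would code it as "in paper". Ambiguous statements would still need a human coder.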
Description of coding categories and example statements.
| Code | Number of Statements (% of total) | Definition | Sample Statement |
|---|---|---|---|
| access restricted | 3,523 (7.4%) | statement mentions ethical, legal, or privacy restrictions, or the data are owned by a third party that restricts access | To protect potentially identifiable information on serious crimes, ethical approval is needed to access data. Data are available from < |
| combination | 2,125 (4.5%) | statement mentions more than one mechanism for sharing | All data not contained within the paper or supporting files are available from < |
| in paper | 11,553 (24.3%) | statement indicates data are reported in the paper, including in tables and/or figures | The minimal data set underlying the findings in our study data is within the paper. |
| in paper and SI | 21,568 (45.3%) | statement indicates data are reported in both paper and Supplemental Information | All relevant data are available from within the manuscript as well as a supplemental information file. |
| in SI | 682 (1.4%) | statement indicates data are reported in the Supplemental Information | All data and analysis code have been provided as Supporting Information files. |
| location not stated | 72 (0.2%) | statement says data are available but does not indicate where or how to locate the data | The authors confirm that all data underlying the findings are fully available without restriction. Data deposition. |
| N/A | 17 (< 0.1%) | statement includes some boilerplate text but also adds N/A or Not Applicable | The authors confirm that all data underlying the findings are fully available without restriction. N/A. |
| other | 31 (< 0.1%) | statement does not fit any of the nine other categories | The authors confirm that all data underlying the findings are fully available without restriction. This paper is a theoretical discussion and therefore no data are involved. |
| repository | 7,334 (15.4%) | statement names a publicly accessible location where the data are available, such as a repository or website | Data sets for all samples are available in < |
| upon request | 688 (1.4%) | statement says that author or other individual or group must be contacted to access data | Data are available from < |
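As a quick consistency check on the table, the Python snippet below recomputes each category's share from the statement counts. The counts are copied from the table; the helper name `share` is illustrative. The ten counts sum to 47,593, the number of papers analyzed, so each percentage is a share of all coded statements.

```python
# Statements per coding category, copied from the table above.
COUNTS = {
    "access restricted": 3523,
    "combination": 2125,
    "in paper": 11553,
    "in paper and SI": 21568,
    "in SI": 682,
    "location not stated": 72,
    "N/A": 17,
    "other": 31,
    "repository": 7334,
    "upon request": 688,
}


def share(count: int, total: int) -> str:
    """Format a category's share of all statements to one decimal place."""
    pct = 100 * count / total
    # The table reports very small categories as "< 0.1%" instead of rounding.
    return "< 0.1%" if pct < 0.1 else f"{pct:.1f}%"


total = sum(COUNTS.values())  # 47,593 statements in all
for category, count in COUNTS.items():
    print(f"{category:20} {count:6,} {share(count, total)}")
```

Run as written, it reports 45.3% for "in paper and SI" and 15.4% for "repository", matching the one-decimal precision used elsewhere in the table.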
Fig 3. Distribution of statements across categories by year.
Twenty most frequently mentioned repositories or sources.
| Rank | Repository | Count of Mentions |
|---|---|---|
| 1 | Figshare | 1,446 |
| 2 | Gene Expression Omnibus (GEO) | 1,001 |
| 3 | GenBank | 999 |
| 4 | Dryad | 987 |
| 5 | Sequence Read Archive (SRA) | 641 |
| 6 | Non-repository website | 329 |
| 7 | Institutional repository | 317 |
| 8 | GitHub | 280 |
| 9 | Dataverse | 217 |
| 10 | Protein Data Bank (PDB) | 172 |
| 11 | National Center for Biotechnology Information (NCBI) | 165 |
| 12 | Open Science Framework | 122 |
| 13 | ArrayExpress | 119 |
| 14 | European Nucleotide Archive (ENA) | 108 |
| 15 | DNA Data Bank of Japan (DDBJ) | 106 |
| 16 | Zenodo | 100 |
| 17 | European Molecular Biology Laboratory (EMBL) | 88 |
| 18 | BioProject | 79 |
| 19 | dbGaP | 64 |
| 20 | Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) | 45 |
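The counts in this table can be summarized with a little arithmetic. The Python snippet below uses data copied from the table (variable names are illustrative) to total the mentions across the twenty sources and compute the share held by the five most-mentioned ones.

```python
# Repository mention counts, copied from the table above in rank order.
MENTIONS = [
    ("Figshare", 1446),
    ("Gene Expression Omnibus (GEO)", 1001),
    ("GenBank", 999),
    ("Dryad", 987),
    ("Sequence Read Archive (SRA)", 641),
    ("Non-repository website", 329),
    ("Institutional repository", 317),
    ("GitHub", 280),
    ("Dataverse", 217),
    ("Protein Data Bank (PDB)", 172),
    ("National Center for Biotechnology Information (NCBI)", 165),
    ("Open Science Framework", 122),
    ("ArrayExpress", 119),
    ("European Nucleotide Archive (ENA)", 108),
    ("DNA Data Bank of Japan (DDBJ)", 106),
    ("Zenodo", 100),
    ("European Molecular Biology Laboratory (EMBL)", 88),
    ("BioProject", 79),
    ("dbGaP", 64),
    ("Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST)", 45),
]

counts = [n for _, n in MENTIONS]
# Sanity check: counts are non-increasing, consistent with the ranks.
assert counts == sorted(counts, reverse=True)

total = sum(counts)     # mentions across all twenty sources
top5 = sum(counts[:5])  # Figshare, GEO, GenBank, Dryad, SRA
print(total, f"{100 * top5 / total:.1f}%")
```

It prints `7385 68.7%`: the five most-mentioned sources account for just over two thirds of the mentions listed in the table.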