Tessa E Pronk, Paulien H Wiersma, Anne van Weerden, Feike Schieving.
Abstract
While reusing research data has evident benefits for the scientific community as a whole, decisions to archive and share these data are primarily made by individual researchers. In this paper we analyse, within a game-theoretical framework, how sharing and reuse of research data affect individuals who share or do not share their datasets. We construct a model in which sharing a dataset carries a cost, whereas reusing one yields a benefit. Our calculations reveal conflicting interests for researchers. An individual researcher is always better off not sharing and thereby avoiding the sharing cost; at the same time, both sharing and non-sharing researchers are better off if (almost) all researchers share, because the more researchers share, the more benefit can be gained from reusing those datasets. We simulated several policy measures to increase benefits for researchers sharing or reusing datasets. The results indicate that, although policies should be able to increase the proportion of researchers who share, and increased discoverability and dataset quality could partly compensate for costs, a better measure would be to directly lower the cost of sharing, or even turn it into a (citation) benefit. Making data available would in that case become the most profitable, and therefore stable, strategy. This means researchers would willingly make their datasets available, and arguably in the best possible way to enable reuse.
Keywords: Citation benefit; Game theory; Impact; Research data; Reuse; Science policy; Share
Year: 2015 PMID: 26401453 PMCID: PMC4579014 DOI: 10.7717/peerj.1242
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Table 1: Parameters, variables, and their values.
Overview of parameters, variables, and their standard values used in the model. Grey rows indicate the parameters that are varied in the model to assess their influence (examples for real-world measures to change these are explained in Table 2).
| Parameter | Meaning | Value | Source | Unit |
|---|---|---|---|---|
| | Time-cost to produce a paper | 0.13 | Derived: | Year/paper |
| | Time-cost to produce a dataset | 0.2 | Derived: | Year/paper |
| | Time-cost to prepare a dataset for sharing | 0.1 | Estimated: 36.5 days | Year/paper |
| | Time-cost to prepare a dataset to reuse | 0.05 | Estimated: 18.25 days | Year/paper |
| | Decay rate of shared datasets | 0.1 | Derived: based on a storage time of 10 years | 1/year |
| b | Citation benefit (sharing researcher) | 0 | Estimated: percent extra citations | Percent |
| f | Probability to find an appropriate dataset | 0.00001 | Fitted | 1/dataset |
| | Citations per paper produced | 3.4 | Derived: approximate from ‘baselines’; average citation rate by year three, Thomson Reuters | Citation/paper |
| | Impact | See formula | Calculated | Citation/year |
| | Number of papers | See formula | Calculated | Paper/year |
| | Time for a publication | See formula | Calculated | Year/paper |
| | Pool of shared datasets | See formula | Calculated | Dataset |
| | Number of researchers | 10000 | Defined | n.a. |
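
The "Estimated" and "Derived" values in the table above appear to follow from simple unit conversions. The sketch below reproduces them under two assumptions that are not stated explicitly in the table: a 365-day year, and a decay rate equal to the reciprocal of the storage time. The helper names are illustrative only.

```python
# Unit conversions behind some of the table values.
# ASSUMPTIONS (not stated in the table): a year has 365 days, and the
# decay rate is taken as 1 / storage time.
DAYS_PER_YEAR = 365.0


def days_to_years(days: float) -> float:
    """Convert a time-cost given in days to a fraction of a year."""
    return days / DAYS_PER_YEAR


def decay_rate(storage_years: float) -> float:
    """Decay rate (1/year) implied by a mean storage time in years."""
    return 1.0 / storage_years


share_cost = days_to_years(36.5)    # time-cost to prepare a dataset for sharing
reuse_cost = days_to_years(18.25)   # time-cost to prepare a dataset to reuse
dataset_decay = decay_rate(10)      # decay rate of shared datasets

if __name__ == "__main__":
    print(f"share cost: {share_cost:.2f} year")
    print(f"reuse cost: {reuse_cost:.2f} year")
    print(f"decay rate: {dataset_decay:.2f} per year")
```

Under these assumptions the computed values agree with the 0.1, 0.05, and 0.1 listed in the Value column.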
Table 2: Changed parameters and associated measures.
Overview of considered parameters determining reuse and sharing habits of researchers, and possible measures to improve these in a realistic setting.
| Parameters investigated in the model | Possible associated measures to improve this |
|---|---|
| Time ‘ | • Improve data quality, for instance by the use of data journals ( |
| Chance ‘ | • Harvest databases through data portals to reduce ‘scattering’ of datasets. • Standardization of metadata and documentation. • Advanced community and project-specific databases. • Library assistance in finding and using appropriate datasets. |
| Time ‘ | • Offer a good storing & sharing IT infrastructure. • Assistance with good data management planning at the early stages of a research project. |
| Benefit in citation per paper ‘b’ | • Provide a permanent link between paper and dataset. • Increase attribution to datasets by citation rules. • Establish impact metrics for datasets. |
| Percentage of scientists sharing their research data | • Promote sharing through a top-down policy from an institute, funder, or journal. • Promote sharing bottom-up by offering education on the benefits of sharing, to change researchers’ mindset. |
Figure 1: Publication distribution.
The sampled (bars) and fitted (line) distribution of published papers per researcher in a given year, in this case 2013. For reasons of visualisation the distribution is shown up to thirty publications, whereas the sampling sporadically included more publications per researcher. The fitted line is used as the published papers’ distribution for the simulated community.
Figure 2: Impact per sharing type.
Citations (‘impact’) per year for researchers sharing and not sharing, at different percentages of sharing researchers. The simulations are done at parameter settings (A) default (see Table 1); (B) default but with f increased threefold; (C) default but with t decreased threefold; (D) default but with t decreased threefold; (E) default but with b set to 0.1; (F) default but with b set to 0.4. The curved light-grey line depicts the impact of the sharing researchers. The curved dark-grey line depicts the impact of the not sharing researchers. The thin dotted curved black line is the averaged community impact. The straight black vertical dotted line depicts the percentage of sharing researchers at which community impact is maximized. The straight horizontal lines depict, respectively, the impact at zero percent sharing researchers (dark-grey line; dots-stripes) and at one hundred percent sharing researchers (light-grey line; stripes).
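
The qualitative pattern described in this caption can be illustrated with a toy calculation. The sketch below is not the paper's model, whose formulas are not reproduced here. It takes the default values from Table 1 and combines them under assumed functional forms: a steady-state dataset pool, a chance of finding at least one suitable dataset that grows with the pool, and impact equal to citations per paper divided by time per paper. All function and constant names are hypothetical.

```python
# Toy sharing model. Numeric defaults come from Table 1; every functional
# form below is an ASSUMPTION made for illustration, not the paper's formula.
T_PAPER = 0.13          # year/paper, time-cost to produce a paper
T_DATASET = 0.2         # year/paper, time-cost to produce a dataset
T_SHARE = 0.1           # year/paper, time-cost to prepare a dataset for sharing
T_REUSE = 0.05          # year/paper, time-cost to prepare a dataset to reuse
DECAY = 0.1             # 1/year, decay rate of shared datasets
FIND_PROB = 0.00001     # 1/dataset, probability to find an appropriate dataset
CITES_PER_PAPER = 3.4   # citation/paper
N_RESEARCHERS = 10000


def find_probability(pool, f=FIND_PROB):
    """Assumed: chance that at least one of `pool` datasets is suitable."""
    return 1.0 - (1.0 - f) ** pool


def time_per_paper(p_find, shares, t_share=T_SHARE):
    """Assumed: reuse replaces dataset production with probability p_find."""
    base = T_PAPER + p_find * T_REUSE + (1.0 - p_find) * T_DATASET
    return base + (t_share if shares else 0.0)


def impacts(share_fraction, benefit=0.0, f=FIND_PROB, t_share=T_SHARE,
            iterations=200):
    """Return (sharer impact, non-sharer impact) in citations per year.

    The pool is iterated to a steady state in which datasets added by
    sharing researchers balance datasets lost to decay (an assumption).
    """
    pool = 0.0
    for _ in range(iterations):
        p_find = find_probability(pool, f)
        papers_per_sharer = 1.0 / time_per_paper(p_find, True, t_share)
        pool = share_fraction * N_RESEARCHERS * papers_per_sharer / DECAY
    p_find = find_probability(pool, f)
    sharer = CITES_PER_PAPER * (1.0 + benefit) / time_per_paper(p_find, True, t_share)
    non_sharer = CITES_PER_PAPER / time_per_paper(p_find, False)
    return sharer, non_sharer


if __name__ == "__main__":
    for pct in (0, 25, 50, 75, 100):
        sharer, non_sharer = impacts(pct / 100)
        print(f"{pct:3d}% sharing: sharer={sharer:.2f}, non-sharer={non_sharer:.2f}")
```

Under these assumptions, with b at its default of 0 a non-sharing researcher has the higher impact at every sharing percentage, yet sharing researchers at one hundred percent sharing still do better than non-sharing researchers at zero percent. This is the conflict of interests the abstract describes. Passing a sufficiently large `benefit` reverses the ordering.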
Figure 3: Individual gains with sharing.
Gains from sharing in number of citations per individual researcher. These gains are calculated for the situation with fifty percent sharing researchers compared to the same situation without sharing researchers. For visualization purposes, the researchers are sorted according to sharing habit: not sharing researchers (dark grey circles) to the left, sharing researchers (light grey circles) to the right. See the legend of Fig. 2 for parameter settings in all subfigures.
Figure 4: Community impact.
Average community impact with varying percentage of sharing researchers and varying sharing benefit b. Figures are calculated at default parameter values (see Table 1), with the exception of b, which is varied, and, for subplot (B), t, whose value was changed from 0.1 to 0.2. The z-axis shows the average community impact. The x and y axes show, respectively, increasing benefits b for sharing from 0 to 0.8 (0 to 80% citation benefit with sharing) and increasing percentage of sharing researchers from 0 to 100%.