Renata Gonçalves Curty, Kevin Crowston, Alison Specht, Bruce W Grant, Elizabeth D Dalton.
Abstract
The value of sharing scientific research data is widely appreciated, but factors that hinder or prompt the reuse of data remain poorly understood. Using the Theory of Reasoned Action, we test the relationship between the beliefs and attitudes of scientists towards data reuse, and their self-reported data reuse behaviour. To do so, we used existing responses to selected questions from a worldwide survey of scientists developed and administered by the DataONE Usability and Assessment Working Group (thus practicing data reuse ourselves). Results show that the perceived efficacy and efficiency of data reuse are strong predictors of reuse behaviour, and that the perceived importance of data reuse corresponds to greater reuse. Contrary to our expectations, expressed lack of trust in existing data and perceived norms against data reuse were not found to be major impediments to reuse. We found that reported use of models and remotely-sensed data was associated with greater reuse. The results suggest that data reuse would be encouraged and normalized by demonstration of its value. We offer some theoretical and practical suggestions that could help to legitimize investment and policies in favor of data sharing.
Year: 2017 PMID: 29281658 PMCID: PMC5744933 DOI: 10.1371/journal.pone.0189288
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Factors determining a person’s behaviour using the Theory of Reasoned Action (adapted from [29]).
Factor loadings for reported modes of obtaining data.
| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Q47(1): make a plan to generate or collect the data I need myself. | 0.4291 | ||
| Q47(2): make a plan to generate or collect the data I need within my research team. | 0.4660 | ||
| Q47(3): ask colleagues if they have data I can use for analysis. | 0.7743 | ||
| Q47(4): search for data to use for analysis | 0.6810 | ||
| Q47(5): ask colleagues if they know of data I can use for analysis | 0.8519 | ||
| Q47(6): talk to a librarian about my data needs | 0.5719 | ||
| Q47(7): consult my data manager | 0.5659 |
Orthogonal varimax rotation. Loadings < 0.3 not shown.
Factor loadings for types of data used.
| Variable | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| Q7(3): Data models | 0.3279 | | |
| Q7(7): Remote-sensed abiotic data (including meteorological data) | 0.6915 | | |
| Q7(8): Remote-sensed biotic data | 0.6448 | | |
| Q7(1): Abiotic surveys (soils, microclimate, hydrology, etc.) | | 0.6011 | |
| Q7(2): Biotic surveys | | 0.6300 | |
| Q7(4): Experimental (involving some degree of manipulation) | | | |
| Q7(6): Observational (no manipulation involved) | | 0.3529 | |
| Q7(5): Interviews | | | 0.6531 |
| Q7(9): Social Science Survey | | | 0.6247 |
Orthogonal varimax rotation. Loadings < 0.3 not shown.
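
The display rule in the table note ("Loadings < 0.3 not shown") can be illustrated with a minimal sketch. The function name, the example loadings, and the choice to compare the absolute value of each loading against the threshold are assumptions made for illustration; they are not taken from the paper.

```python
def format_loadings(loadings, threshold=0.3):
    """Return display strings for one variable's rotated loadings.

    Loadings whose absolute value is below `threshold` are blanked,
    mirroring a "loadings < 0.3 not shown" table convention.
    Comparing on absolute value is an assumption of this sketch.
    """
    return [f"{x:.4f}" if abs(x) >= threshold else "" for x in loadings]


# Hypothetical loadings on three factors (illustrative values only).
row = [0.6915, 0.1204, -0.0531]
print(format_loadings(row))
```

Blanking small loadings only affects presentation; the suppressed values still exist in the underlying rotated solution.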
Relationship between TRA and research constructs.
| TRA | Research Constructs | Factors | Survey Items |
|---|---|---|---|
| Attitudes Toward the Behaviour | Positive Attitudes Toward Data Reuse | Reuse_A_F1 Perceived efficiency of data reuse (positive effect) | Q49(1): Data reuse saves time |
| | | | Q49(2): Data reuse is efficient |
| | | | Q49(3): Data reuse is easier than having to collect all my own data for analysis |
| | | | Q49(8): Data reuse is harder than conducting research using only my own data (reversed) |
| | | | Q49(9): Data reuse takes longer than conducting research with only my own data (reversed) |
| | | Reuse_A_F2 Perceived efficacy of data reuse (positive effect) | Q49(6): Data reuse improves my results |
| | | | Q49(7): Data reuse helps me answer my research questions |
| | Negative Attitudes Toward Data Reuse | Reuse_A_F3 Concern about trustworthiness of data (negative effect) | Q49(5): Data reuse requires too much trust in others’ methods |
| | | | Q50(2): I don’t trust others’ data collection methods |
| | | | Q50(3): Lack of adequate metadata |
| | | | Q50(5): I don’t have enough information about their data to feel confident using it |
| Subjective Norms | Subjective Norms | Reuse_N_F4 Perceived norms against data reuse (negative effect) | Q49(4): Data reuse is hard to explain in methods section |
| | | | Q50(1): I only receive career advancement credit for working with data I collect myself |
| | | | Q50(4): Not knowing how to share credit |
| | | | Q50(6): It’s too hard to explain in my research outputs |
| | | | Q50(8): I feel pressure to collect my own data |
| | | Reuse_N_F5 Perceived importance of data reuse (positive effect) | Q20(1): Lack of access to data generated by other researchers or institutions is a major impediment to progress in science. |
| | | | Q20(2): Lack of access to data generated by other researchers or institutions has restricted my ability to answer scientific questions. |
| Intention | Not Measured | Not Measured | Not Measured |
| Behaviour | Reuse Behaviour | Reuse | Q47(3): ask colleagues if they have data I can use for analysis. |
| | | | Q47(4): search for data to use for analysis |
| | | | Q47(5): ask colleagues if they know of data I can use for analysis |
| | | | Q48: How often do you conduct research in which some or all of the data analyzed was collected by someone besides yourself or members of your immediate research team? |
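
Several items in the table are marked "(reversed)", which means their scores must be flipped before they are combined with the other items of the same factor. The sketch below shows one common way to do this. It assumes a 5-point scale coded 1 to 5 with higher values meaning stronger agreement, treats "not sure" or unanswered items as missing, and takes the factor score as the mean of the available items. These coding choices, the item keys such as `Q49_8`, and the function names are assumptions for illustration; the source does not state how the composite scores were computed.

```python
from statistics import mean

SCALE_MAX = 5  # assumed 5-point agreement scale coded 1..5


def reverse(score):
    """Flip a score on a 1..SCALE_MAX scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MAX + 1 - score


def composite(responses, items, reversed_items=()):
    """Mean of the answered items, reverse-coding those in `reversed_items`.

    `responses` maps item keys to scores; None marks a missing answer
    (for example "not sure"). Returns None if no item was answered.
    """
    values = []
    for item in items:
        score = responses.get(item)
        if score is None:
            continue  # skip missing answers rather than imputing them
        values.append(reverse(score) if item in reversed_items else score)
    return mean(values) if values else None


# Items listed for Reuse_A_F1 (perceived efficiency); keys are hypothetical.
EFFICIENCY_ITEMS = ["Q49_1", "Q49_2", "Q49_3", "Q49_8", "Q49_9"]
EFFICIENCY_REVERSED = {"Q49_8", "Q49_9"}

# One made-up respondent: Q49_9 was left unanswered.
respondent = {"Q49_1": 4, "Q49_2": 5, "Q49_3": 4, "Q49_8": 2, "Q49_9": None}
score = composite(respondent, EFFICIENCY_ITEMS, EFFICIENCY_REVERSED)
print(score)
```

Averaging keeps the composite on the same 1 to 5 range as the individual items, which is consistent with the minimum and maximum values reported for the factor variables.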
Fig 2. Research model.
H1a: Perceived efficiency of data reuse will positively correlate with data reuse; H1b: Perceived efficacy of data reuse will positively correlate with data reuse; H2: Concerns about the trustworthiness of data will negatively correlate with data reuse; H3: Perceived norms against data reuse will negatively correlate with data reuse; and H4: Perceived importance of data reuse will positively correlate with data reuse.
Fig 3. Distribution of self-reported data reuse behaviour.
N = 589.
Fig 4. When I need to analyze data to answer a research question, I … (N = 569/569/570/576/572/557/526).
Descriptive statistics for variables.
| Variable | N | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Reuse—Self-reported reuse behaviour | 589 | 3.527 | 0.995 | 1 | 5 |
| Sharing—Self-reported sharing behaviour | 592 | 3.538 | 1.098 | 1 | 5 |
| Reuse_A_F1—Perceived efficiency of data reuse | 580 | 3.238 | 0.840 | 1 | 5 |
| Reuse_A_F2—Perceived efficacy of data reuse | 572 | 3.568 | 0.827 | 1 | 5 |
| Reuse_A_F3—Concern about trustworthiness of data | 562 | 3.753 | 0.887 | 1 | 5 |
| Reuse_N_F4—Perceived norms against data reuse | 579 | 2.492 | 0.867 | 1 | 5 |
| Reuse_N_F5—Perceived importance of data reuse | 590 | 3.878 | 0.869 | 1 | 5 |
| UseMetadata—Reported use of metadata | 595 | 0.474 | 0.500 | 0 | 1 |
| Data_F1—Use of models and remote sensed data | 582 | 0.038 | 0.793 | –0.813 | 1.692 |
| Data_F2—Use of natural science surveys or observations | 582 | 0.074 | 0.742 | –0.901 | 1.492 |
| Data_F3—Use of social science data | 582 | 0.029 | 0.759 | –0.636 | 1.840 |
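
The N column differs from row to row, which is what one would expect if each statistic is computed over the respondents who answered the relevant items. A minimal sketch of such a summary follows; the sample values are made up, missing answers are represented as `None`, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the source does not say which form was used.

```python
from statistics import mean, stdev


def describe(values):
    """Summarise one variable, ignoring missing (None) responses."""
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "Mean": round(mean(present), 3),
        "SD": round(stdev(present), 3),  # sample SD, an assumption here
        "Min": min(present),
        "Max": max(present),
    }


# Hypothetical scores for six respondents, one of whom did not answer.
sample = [3, 4, None, 5, 2, 4]
print(describe(sample))
```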
Regression results.
| | Model 1 | Model 2 |
|---|---|---|
| N | 555 | 546 |
| R² | 0.28 | 0.36 |
| Reuse_A_F1—Perceived efficiency of data reuse | 0.111 | 0.137 |
| Reuse_A_F2—Perceived efficacy of data reuse | 0.260 | 0.233 |
| Reuse_A_F3—Concern about trustworthiness of data | –0.015 | –0.009 |
| Reuse_N_F4—Perceived norms against data reuse | –0.112 | –0.097 |
| Reuse_N_F5—Perceived importance of data reuse | 0.360 | 0.299 |
| UseMetadata—Reported use of metadata | | 0.191 |
| Data_F1—Use of models and remote sensed data | | 0.283 |
| Data_F2—Use of natural science surveys or observations | | –0.031 |
| Data_F3—Use of social science data | | –0.063 |
*** p ≤ 0.001.
** p ≤ 0.01.
* p ≤ 0.05.
+ p ≤ 0.1.
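
The legend describes a tiered marker scheme for p-values. As a small illustration, the function below maps a p-value to its marker, assuming the single-star tier uses the conventional 0.05 threshold. The function name is hypothetical, and the example p-values are made up: the extracted table does not carry the markers or p-values themselves.

```python
def significance_marker(p):
    """Return the legend marker for a p-value, checking strictest tier first."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:  # conventional threshold, assumed for the single star
        return "*"
    if p <= 0.1:
        return "+"
    return ""


# Illustrative p-values only; none of these come from the paper.
for p in (0.0004, 0.008, 0.03, 0.07, 0.4):
    print(p, repr(significance_marker(p)))
```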
Regression results comparing those who do or do not report using metadata.
| | Don't use metadata | Do use metadata |
|---|---|---|
| N | 278 | 268 |
| R2 | 0.37 | 0.26 |
| Reuse_A_F1—Perceived efficiency of data reuse | 0.074 | 0.198 |
| Reuse_A_F2—Perceived efficacy of data reuse | 0.246 | 0.217 |
| Reuse_A_F3—Concern about trustworthiness of data | –0.002 | 0.007 |
| Reuse_N_F4—Perceived norms against data reuse | –0.132 | –0.070 |
| Reuse_N_F5—Perceived importance of data reuse | 0.368 | 0.208 |
| Data_F1—Use of models and remote sensed data | 0.301 | 0.268 |
| Data_F2—Use of natural science surveys or observations | 0.015 | –0.058 |
| Data_F3—Use of social science data | –0.057 | –0.051 |
*** p ≤ 0.001.
** p ≤ 0.01.
* p ≤ 0.05.
+ p ≤ 0.1.

| Q47: When I need to analyze data to answer a research question, I … | agree strongly | agree somewhat | neither agree nor disagree | disagree somewhat | disagree strongly | not sure |
|---|---|---|---|---|---|---|
| 1) … make a plan to generate or collect the data I need myself. | o | o | o | o | o | o |
| 2) … make a plan to generate or collect the data I need within my research team. | o | o | o | o | o | o |
| 3) … ask colleagues if they have data I can use for analysis. | o | o | o | o | o | o |
| 4) … search for data to use for analysis. | o | o | o | o | o | o |
| 5) … ask colleagues if they know of data I can use for analysis. | o | o | o | o | o | o |
| 6) … talk to a librarian about my data needs. | o | o | o | o | o | o |
| 7) … consult my data manager. | o | o | o | o | o | o |

| Q49: Data reuse … | agree strongly | agree somewhat | neither agree nor disagree | disagree somewhat | disagree strongly | not sure |
|---|---|---|---|---|---|---|
| 1) … saves time. | o | o | o | o | o | o |
| 2) … is efficient. | o | o | o | o | o | o |
| 3) … is easier than having to collect all my own data for analysis. | o | o | o | o | o | o |
| 4) … is hard to explain in methods section. | o | o | o | o | o | o |
| 5) … requires too much trust in others’ methods. | o | o | o | o | o | o |
| 6) … improves my results. | o | o | o | o | o | o |
| 7) … helps me answer my research questions. | o | o | o | o | o | o |
| 8) … is harder than conducting research using only my own data. | o | o | o | o | o | o |
| 9) … takes longer than conducting research with only my own data. | o | o | o | o | o | o |

| Q50 | agree strongly | agree somewhat | neither agree nor disagree | disagree somewhat | disagree strongly | not sure |
|---|---|---|---|---|---|---|
| 1) I only receive career advancement credit for working with data I collect myself. | o | o | o | o | o | o |
| 2) I don’t trust others’ data collection methods. | o | o | o | o | o | o |
| 3) Lack of adequate metadata. | o | o | o | o | o | o |
| 4) Not knowing how to share credit. | o | o | o | o | o | o |
| 5) I don’t have enough information about their data to feel confident using it. | o | o | o | o | o | o |
| 6) It’s too hard to explain in my research outputs. | o | o | o | o | o | o |
| 7) I don’t like working with others. | o | o | o | o | o | o |
| 8) I feel pressure to collect my own data. | o | o | o | o | o | o |
| 9) Other (please specify below). | o | o | o | o | o | o |