Abstract
This paper presents a dataset produced from the largest known survey examining how researchers and support professionals discover, make sense of and reuse secondary research data. 1677 respondents in 105 countries representing a variety of disciplinary domains, professional roles and stages in their academic careers completed the survey. The results represent the data needs, sources and strategies used to locate data, and the criteria employed in data evaluation of these respondents. The data detailed in this paper have the potential to be reused to inform the development of data discovery systems, data repositories, training activities and policies for a variety of general and specific user communities.
Year: 2020 PMID: 32661229 PMCID: PMC7359296 DOI: 10.1038/s41597-020-0569-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Creation of dataset in relation to prior empirical work by the author. Bolded rectangles indicate steps with associated publications, resulting from an analytical literature review[10], semi-structured interviews[11] and an analysis of the survey data[8].
Content and branches of questionnaire items listed according to questionnaire section.
| Content | Branches |
|---|---|
| Role | If research support professional (RSP) - asked all questions plus those marked RSP. If not RSP - asked all questions plus those marked R only; not asked questions marked RSP |
| Discipline* | |
| Years professional experience | |
| Country of employment | |
| Organization type | |
| Perceptions (own, others) about data sharing | |
| Perceptions (own, others) about data reuse | |
| Ever shared data | |
| Describe needed data | |
| Select data type needed* | |
| Purpose for using secondary data* | |
| Need data outside of discipline | If yes - how find these data |
| Need data to support others/own research | If support others - Who do you support*; How do you provide support* |
| Who finds data for you* | |
| Sources used to find data - frequency of use | If academic literature - how use literature?*; If search engine - how successful? |
| Active vs serendipitous data discovery | |
| Social interactions in discovery and access** | |
| Discover literature differently than data | If not no - describe differences |
| Ease of finding data | If not easy - what are the challenges?* |
| Social interactions in evaluating/sensemaking** | |
| Important information about data when evaluating | |
| Differences between self and those supported | |
| Evaluation/sensemaking strategies | |
| Important aspects in establishing trust | |
| Important aspects in establishing quality | |
Questions only asked to research support professionals are marked with RSP; those only asked to other respondents are marked with R. Items that allowed multiple responses are marked with an asterisk. Items marked with a double asterisk correspond to the same multiple response question.
Description of files composing the dataset.
| Data file name | Description | No. cases | No. variables |
|---|---|---|---|
| GREGORY_DATA_DISCOVERY_Readme.txt | Provides guidance to the dataset | ||
| datadiscovery_researchers.csv | Contains data for respondents identifying themselves as researchers, students, managers or other types of professionals | 1630 | 165 |
| datadiscovery_supportprof.csv | Contains data for respondents identifying themselves as librarians, archivists or research/data support providers | 47 | 167 |
| variable_labels_researchers.csv | Contains a description of the variable names in the datadiscovery_researchers.csv file | ||
| variable_labels_supportprof.csv | Contains a description of the variable names in the datadiscovery_supportprof.csv file | ||
| datadiscovery_questionnaire.pdf | Contains the questionnaire for both researchers and research support professionals | ||
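As a minimal sketch of how the two data files might be read and checked against the case and variable counts listed in the table, the following uses only the Python standard library. It assumes (this is not stated in the table) that each CSV file is comma-delimited with a single header row of variable names. The helper names `read_cases`, `check_shape` and `total_expected_cases` are hypothetical, and the expected shapes are simply the numbers given in the table.

```python
import csv
import io

# Expected (cases, variables) per data file, copied from the file description table.
EXPECTED_SHAPES = {
    "datadiscovery_researchers.csv": (1630, 165),
    "datadiscovery_supportprof.csv": (47, 167),
}


def read_cases(handle):
    """Read a CSV stream and return (variable_names, rows).

    ASSUMPTION: the first row of the file holds the variable names.
    """
    reader = csv.DictReader(handle)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def check_shape(filename, variables, rows):
    """Return True when the observed counts match the table entry for this file."""
    return (len(rows), len(variables)) == EXPECTED_SHAPES[filename]


def total_expected_cases():
    """Sum of cases over both data files; should equal the number of respondents."""
    return sum(cases for cases, _ in EXPECTED_SHAPES.values())


# Illustration with a tiny in-memory stand-in (made-up values, not survey data).
# Real use would pass an open file handle for the downloaded CSV instead.
sample = io.StringIO("need_obs,need_exp\n1,0\n0,1\n")
variables, rows = read_cases(sample)
```

The two files together account for 1630 + 47 = 1677 cases, which matches the number of respondents reported in the abstract.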
Description of primary mnemonic codes used to preface variable names.
| Mnemonic code | Associated research aim |
|---|---|
| need_ | Data needs |
| use_ | Purposes for which data are used |
| find_ | Data search and discovery practices |
| strategy_ | Data discovery strategies Related to |
| source_ | Sources used to discover data Related to |
| eval_ | Data evaluation and sense-making |
| accss_ | Data access |
| disc_ | Discipline chosen |
| dem_ | Demographic information |
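Because the mnemonic codes are prefixes, variables can be grouped by research aim with simple string matching. The sketch below is illustrative only: `group_by_mnemonic` is a hypothetical helper, and the `"other"` bucket is an assumption made here to hold variables whose prefix is not among the primary codes in the table (for example `whosupprt_stud`).

```python
from collections import defaultdict

# Primary mnemonic prefixes and their research aims, as listed in the table.
MNEMONICS = {
    "need_": "Data needs",
    "use_": "Purposes for which data are used",
    "find_": "Data search and discovery practices",
    "strategy_": "Data discovery strategies",
    "source_": "Sources used to discover data",
    "eval_": "Data evaluation and sense-making",
    "accss_": "Data access",
    "disc_": "Discipline chosen",
    "dem_": "Demographic information",
}


def group_by_mnemonic(variable_names):
    """Group variable names by their primary mnemonic prefix.

    Names that match none of the primary prefixes are collected under "other"
    (an assumption of this sketch, not a code defined by the dataset).
    """
    groups = defaultdict(list)
    for name in variable_names:
        prefix = next((p for p in MNEMONICS if name.startswith(p)), "other")
        groups[prefix].append(name)
    return dict(groups)


# Variable names taken from the multiple-response table in this paper.
example = group_by_mnemonic(
    ["need_obs", "use_calb", "find_netwk", "accss_conf", "eval_list", "whosupprt_stud"]
)
```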
Fig. 2 (a) Disciplinary domains selected by respondents; multiple responses possible (n = 3431). (b) Respondents’ years of professional experience; percentages denote percent of respondents (n = 1677). (c) Number of respondents by country of employment (n = 1677).
Percentage of recruited participants by geographic location compared to percentage of respondents providing complete responses.
| Country | Percent of recruited sample | Country | Percent of respondents |
|---|---|---|---|
| United States | 19% | United States | 13% |
| China | 15% | Italy | 7% |
| United Kingdom | 5% | Brazil | 7% |
| Germany | 5% | United Kingdom | 4% |
| Japan | 4% | India | 4% |
| France | 4% | South Korea | 4% |
| India | 4% | Netherlands | 4% |
| Italy | 3% | China | 4% |
| Canada | 3% | Mexico | 3% |
| Spain | 3% | Canada | 3% |
| Australia | 3% | Germany | 3% |
| South Korea | 3% | Australia | 2% |
| Brazil | 2% | Portugal | 2% |
| Netherlands | 2% | Iran | 2% |
| Russian Federation | 1% | Argentina | 2% |
| Taiwan | 1% | France | 2% |
| Iran | 1% | Greece | 1% |
| Switzerland | 1% | Japan | 1% |
| Turkey | 1% | Russian Federation | 1% |
| Poland | 1% | South Africa | 1% |
| Other | 19% | Romania | 1% |
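One way to read this comparison is as a difference in percentage points between the share of respondents and the share of the recruited sample for each country that appears in both columns. The sketch below does that arithmetic with the values from the table. `representation_gap` is a hypothetical helper, the "Other" row is left out, and because the published percentages are rounded the differences are only approximate.

```python
# Percentages copied from the table (rounded values as published).
RECRUITED = {
    "United States": 19, "China": 15, "United Kingdom": 5, "Germany": 5,
    "Japan": 4, "France": 4, "India": 4, "Italy": 3, "Canada": 3, "Spain": 3,
    "Australia": 3, "South Korea": 3, "Brazil": 2, "Netherlands": 2,
    "Russian Federation": 1, "Taiwan": 1, "Iran": 1, "Switzerland": 1,
    "Turkey": 1, "Poland": 1,
}
RESPONDENTS = {
    "United States": 13, "Italy": 7, "Brazil": 7, "United Kingdom": 4,
    "India": 4, "South Korea": 4, "Netherlands": 4, "China": 4, "Mexico": 3,
    "Canada": 3, "Germany": 3, "Australia": 2, "Portugal": 2, "Iran": 2,
    "Argentina": 2, "France": 2, "Greece": 1, "Japan": 1,
    "Russian Federation": 1, "South Africa": 1, "Romania": 1,
}


def representation_gap(recruited, respondents):
    """Percentage-point difference (respondents minus recruited).

    Only countries listed in both columns are compared; the result is
    ordered from most under-represented to most over-represented.
    """
    shared = recruited.keys() & respondents.keys()
    gaps = {country: respondents[country] - recruited[country] for country in shared}
    return dict(sorted(gaps.items(), key=lambda item: item[1]))


gaps = representation_gap(RECRUITED, RESPONDENTS)
```

Countries that appear in only one column (for example Spain or Mexico) are skipped, since the table does not give their share on the other side.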
Content of questions which were designed to allow multiple responses and their associated variables.
| Question content | Variable names |
|---|---|
| Discipline | all variables beginning with code disc_ |
| Data type needed | need_obs, need_exp, need_sim, need_deriv, need_oth |
| Purpose for using secondary data | use_calb, use_bmk, use_vrf, use_inpt, use_idea, use_tch, use_nwprj, use_nwmth, use_tnd, use_cmp, use_smvs, use_intg, use_oth |
| Who do you support? (RSP) | whosupprt_stud, whosupprt_res, whosupprt_indus, whosupprt_oth |
| How do you support? (RSP) | supprt_teachdmp, supprt_teachsklls, supprt_finddta, supprt_curate, supprt_findlit, supprt_oth |
| Who finds data for you? (R) | find_whoself, find_whograd, find_whosuppt, find_whonetwk, find_whooth |
| How do you use academic literature? | strategy_goal, strategy_serend, strategy_cit, strategy_extrct, strategy_oth |
| What are challenges? | find_chalaccs, find_chalskill, find_chaldistrb, find_chaldigtl, find_chaltools, find_chalnetwk, find_chaloth |
| Social interactions in finding, accessing, evaluation/sensemaking | find_netwk, find_creatr, find_collab, find_conf, find_list, accss_netwk, accss_creatr, accss_collab, accss_conf, accss_list, eval_netwk, eval_creatr, eval_collab, eval_conf, eval_list |
RSP indicates questions only asked to research support professionals. R indicates questions only present in the researcher data file.
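Since each multiple-response question is spread over several variables, analysing it usually means tallying across those columns. This is also why totals can exceed the number of respondents, as with the 3431 discipline selections from 1677 respondents. The following is a minimal sketch under a loud assumption: it treats the value `"1"` as meaning "option selected", which is not documented in this paper and should be checked against the variable label files and the readme. `is_selected`, `tally` and `selections_per_respondent` are hypothetical helpers, and the demo rows are made up.

```python
# Variables for the "Data type needed" question, as listed in the table.
DATA_TYPE_VARS = ["need_obs", "need_exp", "need_sim", "need_deriv", "need_oth"]


def is_selected(value):
    """ASSUMPTION: a selected option is coded as "1".

    Verify the actual coding in the variable label files before reuse.
    """
    return str(value).strip() == "1"


def tally(rows, variables):
    """Count how many respondents selected each option."""
    return {v: sum(is_selected(row.get(v, "")) for row in rows) for v in variables}


def selections_per_respondent(rows, variables):
    """Count how many options each respondent selected."""
    return [sum(is_selected(row.get(v, "")) for v in variables) for row in rows]


# Made-up rows for illustration only; these are not survey responses.
demo_rows = [
    {"need_obs": "1", "need_exp": "1", "need_sim": "", "need_deriv": "", "need_oth": ""},
    {"need_obs": "1", "need_exp": "", "need_sim": "", "need_deriv": "1", "need_oth": ""},
]
counts = tally(demo_rows, DATA_TYPE_VARS)
per_person = selections_per_respondent(demo_rows, DATA_TYPE_VARS)
```

Keeping the coding rule in one small function means only `is_selected` has to change if the files use a different convention.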
| Measurement(s) | data discovery behaviours • data reuse behaviours • data management |
| Technology Type(s) | Survey • Self-Report |