Amy Pienta, Dharma Akmon, Justin Noble, Lynette Hoelter, Susan Jekielek.
Abstract
Social scientists are producing an ever-expanding volume of data, leading to questions about appraisal and selection of content given finite resources to process data for reuse. We analyze users' search activity in an established social science data repository to better understand demand for data and more effectively guide collection development. By applying a data-driven approach, we aim to ensure curation resources are applied to make the most valuable data findable, understandable, accessible, and usable. We analyze data from a domain repository for the social sciences that includes over 500,000 annual searches in 2014 and 2015 to better understand trends in user search behavior. Using a newly created search-to-study ratio technique, we identified gaps in the domain data repository's holdings and leveraged this analysis to inform our collection and curation practices and policies. The evaluative technique we propose in this paper will serve as a baseline for future studies looking at trends in user demand over time at the domain data repository being studied with broader implications for other data repositories.
Year: 2017 PMID: 30197664 PMCID: PMC6128405 DOI: 10.2218/ijdc.v12i2.500
Source DB: PubMed Journal: Int J Digit Curation ISSN: 1746-8256
Frequency and total volume of site search on ICPSR’s website, 2011-2016.
| Year | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|
| % Non-bounce visits that | 49.37 | 49.17 | 49.99 | 50.03 | 48.5 | 48.52 |
| Total number of unique | 448,350 | 461,639 | 513,824 | 539,786 | 504,015 | 525,876 |
Classification of the top 500 search terms/phrases from ICPSR’s website, 2014 and 2015.
| Classification | 2014 N | 2014 % | 2015 N | 2015 % |
|---|---|---|---|---|
| Keyword | 365 | 73 | 348 | 69.6 |
| Serial Collection | 51 | 10.2 | 58 | 11.6 |
| Study | 79 | 15.8 | 90 | 18 |
| PI/Author | 5 | 1 | 3 | 0.6 |
Top keyword searches and user behavior from Google Analytics from ICPSR’s website, 2014.
| Search Phrase | # Exact | % Search | % Search | Average | Average | # Searches |
|---|---|---|---|---|---|---|
| education | 2,062 | 24.68% | 19.06% | 0:05:22 | 3.58 | 11,446 |
| crime | 1,591 | 23.76% | 16.55% | 0:05:37 | 3.75 | 14,710 |
| health | 1,156 | 23.62% | 17.11% | 0:06:04 | 3.93 | 20,777 |
| china | 1,011 | 41.64% | 10.67% | 0:06:19 | 3.34 | 4,296 |
| income | 971 | 19.26% | 27.57% | 0:05:36 | 3.14 | 5,827 |
| domestic | 924 | 28.79% | 14.75% | 0:06:15 | 3.59 | 2,348 |
| immigration | 833 | 25.57% | 17.73% | 0:06:42 | 4.12 | 2,231 |
| race | 801 | 18.85% | 25.69% | 0:04:38 | 3.15 | 3,816 |
| obesity | 749 | 28.57% | 16% | 0:06:10 | 3.59 | 2,215 |
| happiness | 742 | 14.69% | 19.30% | 0:07:22 | 8.38 | 1,147 |
Top keyword searches and user behavior from Google Analytics from ICPSR’s website, 2015.
| Search Phrase | # Exact | % | % Search | Average | Average | # Searches | 2014 |
|---|---|---|---|---|---|---|---|
| education | 1,952 | 22.69 | 19.57 | 0:05:47 | 4.38 | 11,016 | 1 |
| crime | 1,609 | 24.30 | 16.79 | 0:05:29 | 4.35 | 12,806 | 2 |
| health | 1,149 | 24.28 | 17.41 | 0:05:49 | 4.48 | 20,398 | 3 |
| income | 986 | 20.59 | 25.71 | 0:05:16 | 3.61 | 5,609 | 6 |
| immigration | 904 | 25.44 | 14.99 | 0:05:49 | 3.95 | 2,248 | 8 |
| domestic | 896 | 28.01 | 14.54 | 0:05:50 | 3.74 | 2,195 | 4 |
| mental | 896 | 21.32 | 17.56 | 0:06:13 | 4.69 | 3,505 | 7 |
| race | 826 | 20.22 | 29.16 | 0:04:24 | 2.86 | 4,137 | 10 |
| china | 793 | 39.22 | 10.50 | 0:06:07 | 3.89 | 3,555 | 5 |
| diabetes | 733 | 45.84 | 12.19 | 0:05:11 | 3.88 | 1,245 | 40 |
Top ten keyword searches with highest search:study ratio from ICPSR’s website, 2014.
| Search Phrase | # Exact | # Searches | # | Search: Study | % | % |
|---|---|---|---|---|---|---|
| social media | 336 | 812 | 20 | 40.6 | 26.19 | 26.6 |
| NCAA | 136 | 323 | 11 | 29.4 | 35.29 | 28.17 |
| LGBT | 216 | 658 | 25 | 26.3 | 24.07 | 22.8 |
| justice | 114 | 156 | 7 | 22.3 | 43.86 | 39.1 |
| 2012 | 118 | 396 | 18 | 22 | 17.8 | 15.4 |
| trafficking | 362 | 505 | 25 | 20.2 | 30.39 | 30.5 |
| generation | 255 | 291 | 15 | 19.4 | 74.51 | 68.73 |
| body image | 147 | 401 | 31 | 12.9 | 18.37 | 15.96 |
| stop and frisk | 88 | 149 | 14 | 10.6 | 36.36 | 27.52 |
| demoralization | 323 | 323 | 31 | 10.4 | 96.59 | 95.59 |
Top ten keyword searches with highest search:study ratio from ICPSR’s website, 2015.
| Search Phrase | # Exact | # Searches | # ICPSR | Search: Study | % | % |
|---|---|---|---|---|---|---|
| theatre audience | 108 | 108 | 4 | 27 | 100 | 100 |
| LGBT | 236 | 822 | 33 | 24.91 | 22.03 | 20.07 |
| restorative justice | 136 | 194 | 8 | 24.25 | 33.82 | 28.35 |
| social media | 359 | 963 | 45 | 21.4 | 22.84 | 23.57 |
| 2012 election | 156 | 320 | 23 | 13.98 | 8.33 | 10 |
| microfinance | 91 | 151 | 11 | 13.73 | 35.16 | 30.46 |
| human trafficking | 356 | 517 | 51 | 10.14 | 32.3 | 30.95 |
| sex trafficking | 151 | 291 | 41 | 7.1 | 23.18 | 20.96 |
| body image | 88 | 191 | 31 | 6.16 | 26.14 | 20.94 |
| drug court | 108 | 378 | 107 | 3.53 | 34.26 | 27.51 |
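The search:study ratio in the two tables above appears to be the value in the "# Searches" column divided by the count in the following ("#" / "#ICPSR") column; the tabulated figures are consistent with that reading (for example, 812 / 20 = 40.6 for "social media" in 2014). The following is a minimal sketch under that assumption, not the authors' implementation. The names `KeywordStats`, `search_study_ratio`, and `rank_by_ratio` are hypothetical, and rejecting a zero study count is an illustrative choice rather than something the source specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordStats:
    """One table row (hypothetical container; field meanings are assumed)."""
    phrase: str
    searches: int  # value from the "# Searches" column
    studies: int   # value from the study-count column ("#" / "#ICPSR")


def search_study_ratio(searches: int, studies: int) -> float:
    """Return searches per matching study.

    A high ratio suggests many searches relative to available holdings.
    Raising on a non-positive study count is an assumption of this sketch.
    """
    if studies <= 0:
        raise ValueError("studies must be a positive integer")
    return searches / studies


def rank_by_ratio(rows):
    """Sort rows from highest to lowest search:study ratio."""
    return sorted(
        rows,
        key=lambda r: search_study_ratio(r.searches, r.studies),
        reverse=True,
    )


# Four rows copied from the 2014 table, deliberately out of order.
rows_2014 = [
    KeywordStats("body image", 401, 31),
    KeywordStats("social media", 812, 20),
    KeywordStats("LGBT", 658, 25),
    KeywordStats("trafficking", 505, 25),
]

ranked = rank_by_ratio(rows_2014)
for row in ranked:
    ratio = search_study_ratio(row.searches, row.studies)
    print(f"{row.phrase}: {ratio:.1f}")
```

Sorting by the ratio rather than by raw search volume is what lets lower-volume topics with very few matching studies surface as candidate gaps in the holdings.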