Carly Strasser, John Kunze, Stephen Abrams, Patricia Cruse.
Abstract
Scientific datasets have immeasurable value, but they lose their value over time without proper documentation, long-term storage, and easy discovery and access. Across disciplines as diverse as astronomy, demography, archeology, and ecology, large numbers of small heterogeneous datasets (i.e., the long tail of data) are especially at risk unless they are properly documented, saved, and shared. One unifying factor for many of these at-risk datasets is that they reside in spreadsheets. In response to this need, the California Digital Library (CDL) partnered with Microsoft Research Connections and the Gordon and Betty Moore Foundation to create the DataUp data management tool for Microsoft Excel. Many researchers creating these small, heterogeneous datasets use Excel at some point in their data collection and analysis workflow, so we were interested in developing a data management tool that fits easily into those workflows and minimizes the learning curve for researchers. The DataUp project began in August 2011. We first formally assessed the needs of researchers by conducting surveys and interviews of our target research groups: earth, environmental, and ecological scientists. We found that, on average, researchers had very poor data management practices, were not aware of data centers or metadata standards, and did not understand the benefits of data management or sharing. Based on our survey results, we composed a list of desirable components and requirements and solicited feedback from the community to prioritize potential features of the DataUp tool. These requirements were then relayed to the software developers, and DataUp was successfully launched in October 2012.
Year: 2014 PMID: 25653834 PMCID: PMC4304223 DOI: 10.12688/f1000research.3-6.v2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Locations and events where survey and/or interview data were collected on requirements.
| Venue | Collected |
|---|---|
| Ecological Society of America 2011 Summer | 55 surveys |
| American Fisheries Society 2011 Fall | 36 surveys |
| American Geophysical Union 2011 | 10 surveys |
| Estuarine Research Association 2011 | 2 surveys |
| UCSB | 8 surveys, |
| UC Berkeley | 1 survey, |
| UC Davis | 8 interviews |
| UC Santa Cruz | 11 interviews |
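The counts in the table above can be totaled with a short script. This is a minimal sketch that uses only the values visible in the table: the UCSB and UC Berkeley cells appear truncated after the comma, so any counts that followed are left out rather than guessed. The names `collected` and `tally` are illustrative and do not come from the original article.

```python
# Counts copied from the table above. Truncated cells (UCSB, UC Berkeley)
# contribute only the survey counts that are actually visible.
collected = {
    "Ecological Society of America 2011 Summer": {"surveys": 55},
    "American Fisheries Society 2011 Fall": {"surveys": 36},
    "American Geophysical Union 2011": {"surveys": 10},
    "Estuarine Research Association 2011": {"surveys": 2},
    "UCSB": {"surveys": 8},
    "UC Berkeley": {"surveys": 1},
    "UC Davis": {"interviews": 8},
    "UC Santa Cruz": {"interviews": 11},
}


def tally(rows):
    """Sum the counts per collection method across all venues."""
    totals = {}
    for counts in rows.values():
        for method, n in counts.items():
            totals[method] = totals.get(method, 0) + n
    return totals


if __name__ == "__main__":
    for method, n in tally(collected).items():
        print(f"{method}: {n}")
```

Because the truncated cells are not included, these totals are a lower bound on the responses collected and need not match the sample sizes reported in the figure captions.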
Figure 1. Demographic breakdown of researchers surveyed.
n = 133.
Figure 2. Breakdown of operating systems used by researchers surveyed.
n = 133.
Figure 3. Frequency of Excel use reported by researchers surveyed.
n = 118.
Figure 4. Percent of researchers surveyed who used Excel to perform certain tasks.
n = 119.
Figure 5. Percent of researchers surveyed that reported a given feature as present in their Excel data.
n = 70.
Figure 6. Percent of researchers surveyed that reported using a given program alongside Excel.
n = 131.
Feature comparison for the two versions of the DataUp tool: add-in for Excel and Web-based application.
| Feature | Excel add-in | Web-based |
|---|---|---|
| Platform | Windows only | Any |
| Spreadsheet | Different add-in for | One application covers |
| Download | Yes | No |
| Software | Fixed bugs require | No download/re-install |
| Cloud-based? | No | Yes |
| Offline use? | Yes | No; potential future for |
| Languages | C#.NET C/C++ | HTML/JavaScript |
| Has all the | Yes | No |
Metadata elements chosen for the DataUp metadata schema.
Elements marked with an asterisk (*) are required.
| Element |
|---|
| Abstract* |
| Temporal coverage: Ending date |
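The distinction between required and optional elements lends itself to a simple completeness check. The sketch below is illustrative only and is not DataUp's implementation: it assumes a metadata record is a plain dictionary keyed by element name, and it treats `Abstract` as the only required element because that is the only one visibly marked with an asterisk in the table above. The names `REQUIRED_ELEMENTS` and `missing_required` are hypothetical.

```python
# Assumption: only "Abstract" is marked as required in the surviving table rows.
REQUIRED_ELEMENTS = {"Abstract"}


def missing_required(record, required=REQUIRED_ELEMENTS):
    """Return the required element names that are absent or blank in record."""
    missing = []
    for name in sorted(required):
        value = record.get(name)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    record = {"Temporal coverage: Ending date": "2012-10-01"}  # made-up sample value
    print(missing_required(record))
```

A record that supplies a non-blank abstract yields an empty list, which a caller could treat as "ready to submit".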
Figure 7. Architecture of the DataUp web service, web application, and add-in for Excel, and how they relate to the ONE Share repository.