Benjamin D Morris, Ethan P White.
Abstract
Ecological research relies increasingly on the use of previously collected data. Use of existing datasets allows questions to be addressed more quickly, more generally, and at larger scales than would otherwise be possible. As a result of large-scale data collection efforts, and an increasing emphasis on data publication by journals and funding agencies, a large and ever-increasing amount of ecological data is now publicly available via the internet. Most ecological datasets do not adhere to any agreed-upon standards in format, data structure, or method of access. Some are broken up across multiple files, stored in compressed archives, or violate basic principles of data structure. As a result, acquiring and using available datasets can be a time-consuming and error-prone process. The EcoData Retriever is an extensible software framework that automates the tasks of discovering, downloading, and reformatting ecological data files for storage in a local data file or relational database. The automation of these tasks saves significant time for researchers and substantially reduces the likelihood of errors resulting from manual data manipulation and unfamiliarity with the complexities of individual datasets.
Year: 2013 PMID: 23785456 PMCID: PMC3681786 DOI: 10.1371/journal.pone.0065848
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A sample of datasets available from the EcoData Retriever.
| Dataset | Files and size | Time |
|---|---|---|
| Capellini et al. 2010 | 1 file, 55.3 KB | 1 second |
| Petraitis et al. 2008 | 2 files, 121 KB | 1 second |
| Ernest et al. 2003 | 1 file, 149.6 KB | 1 second |
| Smith et al. 2003 | 1 file, 372 KB | 2 seconds |
| Lislevand et al. 2007 | 1 file, 824.5 KB | 5 seconds |
| Jones et al. 2009 | 1 file, 2.2 MB | 9 seconds |
| USDA Plant Taxonomy | 1 file, 6.9 MB | 16 seconds |
| McGlinn et al. 2010 | 6 files, 1.5 MB | 16 seconds |
| Ramesh et al. 2010 | 4 files, 1.6 MB | 18 seconds |
| North American Breeding Bird Survey | 66 files, 217.2 MB | 18 seconds |
| Ernest et al. 2009 | 3 files, 2.1 MB | 23 seconds |
| Woods 2009 | 6 files, 2.3 MB | 25 seconds |
| Del Moral 2010 | 4 files, 485.6 KB | 28 seconds |
| Zachmann et al. 2010 | 1 file, 10.1 MB | 35 seconds |
| Adler et al. 2007 | 6 files, 10.1 MB | 40 seconds |
| Alwyn H. Gentry Forest Transect Data | 226 files, 9.4 MB | 44 seconds |
| Barnes et al. 2008 | 1 file, 21.5 MB | 1 minute, 13 seconds |
| Forest Inventory and Analysis | 329 files, 6.5 GB | 43 minutes, 31 seconds |
Tested using MySQL on a machine with 4 GB RAM and a 4 × 2.4 GHz processor.
Times include downloading and reformatting the data and importing it into MySQL.
Figure 1. The EcoData Retriever dataset download interface.
Each available dataset includes citation information as well as a link to more information from the dataset homepage.
Figure 2. An EcoData Retriever dataset script file.
An example of a simple EcoData Retriever dataset script file for a dataset containing six tables. For many text-based data formats, the EcoData Retriever will automatically infer column names and data types from the data file itself, so users need only list the data file URLs and metadata such as name and citation.
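To give a rough sense of what inferring column names and data types from a delimited text file can involve, here is a minimal Python sketch. It is not the EcoData Retriever's own code: the function names `infer_type` and `infer_schema`, the three type labels, and the sample rows are all hypothetical, chosen only for illustration.

```python
import csv
import io


def infer_type(values):
    """Return 'int', 'float', or 'text' for a column of string values.

    Empty strings are treated as missing and ignored, so a column with
    no values at all falls through to 'int' in this simple sketch.
    """
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                if value != "":
                    caster(value)
            return name  # every non-empty value parsed with this caster
        except ValueError:
            continue  # try the next, more permissive type
    return "text"


def infer_schema(text, delimiter=","):
    """Read the header row as column names and guess a type per column."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    # Transpose the rows so each tuple holds one column's values.
    columns = list(zip(*body)) if body else [()] * len(header)
    return [(name.strip(), infer_type(col)) for name, col in zip(header, columns)]


# Made-up sample data, not taken from any dataset listed above.
sample = (
    "species,mass_g,count\n"
    "Dipodomys merriami,43.5,12\n"
    "Perognathus flavus,7.9,3\n"
)
schema = infer_schema(sample)
print(schema)
```

A real tool would also need to handle missing-value codes, inconsistent row lengths, and very large files, which this sketch deliberately leaves out.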