Cherian Mathew, Anton Güntsch, Matthias Obst, Saverio Vicario, Robert Haines, Alan R Williams, Yde de Jong, Carole Goble.
Abstract
The compilation and cleaning of data needed for the analysis and prediction of species distributions is a time-consuming process that requires a solid understanding of the data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow that integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system that hides the complexity of the underlying service infrastructures. The workflow can be used freely, both locally and through a web portal that requires no additional software installation by users.
Keywords: biodiversity informatics; data cleaning; e-Science; service oriented architecture; web services; workflows
Year: 2014 PMID: 25535486 PMCID: PMC4267104 DOI: 10.3897/BDJ.2.e4221
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
Figure 1. Taxonomic Data Refinement Workflow. Schematic diagram showing the integrated functions. Intermediate output from each section of the workflow can be stored and re-used as input for subsequent iterations.
Figure 2. Taxonomic Name Resolution. Overview of the Name Resolution function of the Taxonomic Data Refinement Workflow, depicting the aggregation of scientific name responses from the various checklists into a single XML message. This message is then used to display the results within a web interface.
Figure 3. OpenRefine interface with the BioVeL extension. The extension adds biodiversity-specific functionality to OpenRefine for data cleaning, integration, and refinement. The GoogleRefine branding in the screenshot appears because this workflow uses the last stable release (2.5) of OpenRefine, which dates from when the software was still being developed by Google.
Figure 6. Selection of targeted checklists (as controlled taxonomic vocabularies) in the name resolution process.
Figure 4. BioSTIF web interface. The interface allows users to filter species occurrence points based on selected geographical regions and time periods.
Figure 5. Data Refinement Workflow. Bird's-eye overview of the service interactions of the workflow as shown in Taverna Workbench.
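
The data selection step described for Figure 4, filtering occurrence points by geographical region and time period, can be illustrated with a minimal sketch. This is not the BioSTIF implementation or its API. The names `Occurrence`, `BoundingBox`, and `filter_occurrences` are hypothetical, the region is assumed to be a simple latitude/longitude rectangle, and the sample records are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence record: a name, a point, and an observation date."""
    name: str
    lat: float
    lon: float
    observed: date


@dataclass(frozen=True)
class BoundingBox:
    """Assumed region shape: a rectangle that does not cross the antimeridian."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def filter_occurrences(records: Iterable[Occurrence], box: BoundingBox,
                       start: date, end: date) -> List[Occurrence]:
    """Keep records that fall inside the box and within [start, end] inclusive."""
    return [r for r in records
            if box.contains(r.lat, r.lon) and start <= r.observed <= end]


# Made-up sample data, for illustration only.
records = [
    Occurrence("Taxon A", 57.7, 11.9, date(2010, 6, 1)),   # inside box and period
    Occurrence("Taxon B", 57.7, 11.9, date(1995, 6, 1)),   # inside box, too early
    Occurrence("Taxon C", 40.0, -3.7, date(2010, 6, 1)),   # in period, outside box
]
region = BoundingBox(min_lat=55.0, max_lat=60.0, min_lon=10.0, max_lon=15.0)
selected = filter_occurrences(records, region, date(2000, 1, 1), date(2014, 12, 31))
print([r.name for r in selected])
```

A rectangle keeps the example short. Selecting by an arbitrary polygon would need a point-in-polygon test instead of `contains`.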