Stephanie E Hampton, Matthew B Jones, Leah A Wasser, Mark P Schildhauer, Sarah R Supp, Julien Brun, Rebecca R Hernandez, Carl Boettiger, Scott L Collins, Louis J Gross, Denny S Fernández, Amber Budden, Ethan P White, Tracy K Teal, Stephanie G Labou, Juliann E Aukema.
Abstract
The scale and magnitude of complex and pressing environmental issues lend urgency to the need for integrative and reproducible analysis and synthesis, facilitated by data-intensive research approaches. However, the recent pace of technological change has been such that appropriate skills to accomplish data-intensive research are lacking among environmental scientists, who more than ever need greater access to training and mentorship in computational skills. Here, we provide a roadmap for raising data competencies of current and next-generation environmental researchers by describing the concepts and skills needed for effectively engaging with the heterogeneous, distributed, and rapidly growing volumes of available data. We articulate five key skills: (1) data management and processing, (2) analysis, (3) software skills for science, (4) visualization, and (5) communication methods for collaboration and dissemination. We provide an overview of the current suite of training initiatives available to environmental scientists and models for closing the skill-transfer gap.
Keywords: computing; data management; ecology; informatics; workforce development
Year: 2017 PMID: 28584342 PMCID: PMC5451289 DOI: 10.1093/biosci/bix025
Source DB: PubMed Journal: Bioscience ISSN: 0006-3568 Impact factor: 8.589
Examples of existing training resources and events.
| Type | Title | Organization | Skill areas covered (of Data, Analysis, Software, Visualization, Collaboration) | Target audience | License |
|---|---|---|---|---|---|
| Lesson | Learn X in Y minutes, where X=json | Adam Bard/Anna Harren | 1 of 5 | Programmers | Open (CC-BY-SA) |
| Lesson | R for reproducible scientific analysis | Software Carpentry | 2 of 5 | Researchers using R | Open (CC-BY) |
| Unit | NEON Data Skills | National Ecological Observatory Network | 5 of 5 | Researchers, Students, Instructors | Open (CC-BY) |
| Unit | DataONE Data Management Modules | DataONE | 3 of 5 | Researchers, Instructors, Librarians | Open (CC0) |
| Workshop | Data Carpentry Workshops | Data Carpentry | 4 of 5 | Researchers | Open (CC-BY) |
| Workshop | Open Science for Synthesis | National Center for Ecological Analysis and Synthesis | 5 of 5 | Researchers | Open (CC-BY) |
| Course | Data wrangling, exploration, and analysis with R | University of British Columbia | 4 of 5 | Graduate students | Open (CC-BY-NC) |
| Course | Programming for Biologists | Weecology Lab | 4 of 5 | Undergraduate students | Open (CC-BY) |
| Program | Data Science Program (Coursera) | Johns Hopkins University | 3 of 5 | Researchers, Students | Proprietary |
| Program | Berkeley Data Science Education Program | University of California, Berkeley | 5 of 5 | Undergraduate students | Open (CC-BY-NC) |
A taxonomy of skills for data-intensive research.
| Data management and processing | Software skills for science | Analysis | Visualization | Communication for collaboration and results dissemination |
|---|---|---|---|---|
| Fundamentals of data management | Software development practices and engineering mindset | Basic statistical inference | Visual literacy and graphical principles | Reproducible open science |
| Modeling structure and organization of data | Version control | Exploratory analysis | Visualization services and libraries | Collaboration workflows for groups |
| Database management systems and queries (e.g., SQL) | Software testing for reliability | Geospatial information handling | Visualization tools | Collaborative online tools |
| Metadata concepts, standards, and authoring | Software workflows | Spatial analysis | Interactive visualizations | Conflict resolution |
| Data versioning, identification, and citation | Scripted programming (e.g., R and Python) | Time-series analysis | 2D and 3D visualization | Establishing collaboration policies |
| Archiving data in community repositories | Command-line programming | Advanced linear modeling | Web visualization tools and techniques | Composition of collaborative teams |
| Moving large data | Software design for reusability | Nonlinear modeling | | Interdisciplinary thinking |
| Data-preservation best practices | Algorithm design and development | Bayesian techniques | | Discussion facilitation |
| Units and dimensional analysis | Data structures and algorithms | Uncertainty propagation | | Documentation |
| Data transformation | Concepts of cloud and high-performance computing | Meta-analysis and systematic reviews | | Website development |
| Integrating heterogeneous, messy data | Practical cloud computing | Scientific workflows | | Licensing |
| Quality assessment | Code parallelization | Scientific algorithms | | Message development for diverse audiences |
| Quantifying data uncertainty | Numerical stability | Simulation modeling | | Social media |
| Data provenance and reproducibility | Algorithms for handling large data | Analytical modeling | | |
| Data semantics and ontologies | | Machine learning | | |
Note: Many if not most of these elements apply across multiple categories. This taxonomy was initially created in a workshop involving natural and physical scientists, information scientists, and computer scientists, with modest refinements by the authors.
Figure 2. Resources that promote data-intensive research skills, emerging from open education initiatives, can be incorporated into traditional education programs and coordinated either inside or outside of academia to form the basis of a data-intensive curriculum.