Lindsay Barone, Jason Williams, David Micklos.
Abstract
In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC, acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.
Year: 2017 PMID: 29049281 PMCID: PMC5654259 DOI: 10.1371/journal.pcbi.1005755
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Major data types used by National Science Foundation (NSF) Biological Sciences Directorate (BIO) principal investigators (PIs).
Fig 2. Current (grey) and future (blue) data analysis needs of National Science Foundation (NSF) Biological Sciences Directorate (BIO) principal investigators (PIs) (percent responding affirmatively, 387 ≤ n ≤ 551).
Current and future data analysis needs of National Science Foundation (NSF) Biological Sciences Directorate (BIO) principal investigators (PIs): Bioinformaticians versus others, large versus small research groups.
| | Current needs | | | | Needs in 3 years | | | |
|---|---|---|---|---|---|---|---|---|
| | Bioinformatics 151 ≤ | All others 91 ≤ | Large group (>5 people) 245 ≤ | Small group (<5 people) 293 ≤ | Bioinformatics 114 ≤ | All others 263 ≤ | Large group (>5 people) 187 ≤ | Small group (<5 people) 196 ≤ |
| Publish data to the community | 0.97 | 0.90 | 0.94 | 0.90 | 0.97 | 0.98 | ||
| Sufficient data storage | 0.94 | 0.91 | 0.94 | 0.90 | 0.97 | 0.98 | 0.98 | 0.98 |
| Share data with colleagues | 0.93 | 0.90 | 0.97 | 0.98 | 0.98 | 0.97 | ||
| Updated analysis software | 0.93 | 0.90 | 0.92 | 0.90 | 0.96 | 0.96 | 0.96 | 0.95 |
| Training on data management and metadata | 0.83 | 0.77 | ||||||
| Support for bioinformatics and analysis | 0.80 | 0.75 | 0.90 | 0.87 | ||||
| Training on basic computing and scripting | 0.94 | 0.90 | 0.92 | 0.90 | ||||
| Search for data and discover relevant data sets | 0.74 | 0.75 | 0.96 | 0.89 | 0.91 | 0.91 | ||
| Multistep analysis workflows or pipelines | ||||||||
| High performance computing (HPC) | ||||||||
| Training on integration of multiple data types | ||||||||
| Cloud computing | ||||||||
| Training on scaling analysis to cloud/HPC | ||||||||
Percent responding affirmatively. Bold text indicates a statistically significant chi-square result between groups (bioinformaticians versus others; large research groups versus small).
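The group comparisons described in the note above rest on a chi-square test of independence. As a minimal sketch of how such a test works for two groups and a yes/no response, the following computes the Pearson statistic and its p-value for a 2×2 table. The function name `chi_square_2x2` and the counts are hypothetical and are not taken from the survey; the sketch also assumes no continuity correction, which may differ from the authors' procedure.

```python
import math


def chi_square_2x2(yes_a, no_a, yes_b, no_b):
    """Pearson chi-square test of independence for a 2x2 table.

    Rows are the two groups, columns are yes/no responses.
    No continuity correction is applied (an assumption of this sketch).
    """
    observed = [[yes_a, no_a], [yes_b, no_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if group and response were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only (not survey data):
# group A: 120 yes, 30 no; group B: 200 yes, 100 no.
stat, p = chi_square_2x2(120, 30, 200, 100)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A p-value below the chosen threshold (commonly 0.05) would mark the difference between the two groups as statistically significant. Tables with more than two groups need a chi-square distribution with more degrees of freedom, which this sketch does not cover.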
Current and future data analysis needs of National Science Foundation (NSF) Biological Sciences Directorate (BIO) principal investigators (PIs) by the NSF BIO division.
| | Current needs | | | | Needs in 3 years | | | |
|---|---|---|---|---|---|---|---|---|
| | Environmental biology 163 ≤ | Molecular and cellular biosciences 85 ≤ | Biological infrastructure 116 ≤ | Integrative organismal systems 159 ≤ | Environmental biology 124 ≤ | Molecular and cellular biosciences 59 ≤ | Biological infrastructure 85 ≤ | Integrative organismal systems 108 ≤ |
| Publish data to the community | 0.99 | 0.98 | 0.96 | 0.98 | ||||
| Sufficient data storage | 0.93 | 0.91 | 0.90 | 0.94 | 0.99 | 0.95 | 0.97 | 0.98 |
| Share data with colleagues | 0.95 | 0.90 | 0.91 | 0.88 | 0.98 | 0.99 | 0.96 | 0.97 |
| Updated analysis software | 0.92 | 0.88 | 0.91 | 0.91 | 0.95 | 0.93 | 0.95 | 0.99 |
| Training on data management and metadata | 0.94 | 0.91 | 0.95 | 0.92 | ||||
| Support for bioinformatics and analysis | 0.80 | 0.83 | 0.72 | 0.76 | 0.89 | 0.90 | 0.88 | 0.87 |
| Training on basic computing and scripting | 0.94 | 0.89 | 0.85 | 0.92 | ||||
| Search for data and discover relevant data sets | 0.75 | 0.77 | 0.77 | 0.71 | 0.93 | 0.93 | 0.91 | 0.88 |
| Multistep analysis workflows or pipelines | 0.93 | 0.88 | 0.92 | 0.86 | ||||
| High performance computing (HPC) | 0.91 | 0.90 | 0.85 | 0.83 | ||||
| Training on integration of multiple data types | 0.69 | 0.57 | 0.62 | 0.65 | 0.91 | 0.93 | 0.89 | 0.91 |
| Cloud computing | 0.56 | 0.41 | 0.50 | 0.46 | 0.87 | 0.85 | 0.84 | 0.87 |
| Training on scaling analysis to cloud/HPC | 0.55 | 0.46 | 0.50 | 0.41 | 0.86 | 0.78 | 0.79 | 0.80 |
Percent responding affirmatively. Bold text indicates a statistically significant chi-square result between BIO divisions.
Fig 3. Unmet data analysis needs of National Science Foundation (NSF) Biological Sciences Directorate (BIO) principal investigators (PIs) (percent responding negatively, 318 ≤ n ≤ 510).