Anne E Thessen, David J Patterson.
Abstract
We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the "Big New Biology". Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. While the community is motivated to have a shared open infrastructure and data pool, and is pressured by funding agencies to move in this direction, the sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.
Keywords: data issues; escience; incentives; informatics; life science; standards
Year: 2011 PMID: 22207805 PMCID: PMC3234430 DOI: 10.3897/zookeys.150.1766
Source DB: PubMed Journal: Zookeys ISSN: 1313-2970 Impact factor: 1.546
Figure 1. Rogers adoption curve describes the acceptance of a new technology. Life Sciences is still in the Early Adopters phase for accepting principles of data readiness.
List of funding agencies and characteristics of their data policies
| Gordon and Betty Moore Foundation | US | × | × | × | |||||||||||
| Genome Canada | Canada | × | × | × | × | × | × | × | Data must be made available no later than the publication date or the date the patent has been filed (whichever comes first) at the end of the project | ||||||
| National Institutes of Health | US | × | × | Applies to projects requesting > $500,000; data must be released no later than the acceptance of publication of the main findings from the final data set | |||||||||||
| Biotechnology and Biological Sciences Research Council | UK | × | × | × | × | × | Data release no later than publication or within 3 years of generation; researchers are expected to ensure data availability for 10 years after completion of the project | ||||||||
| Natural Environment Research Council | UK | × | × | × | × | × | Data must be made available within 2 years from the end of data collection | ||||||||
| Wellcome Trust | UK | × | × | ||||||||||||
| Department of Energy | US | × | × | × | × | × | × | × | × | Requires deposit of 1) protocols, 2) raw data, and 3) other relevant materials no later than 3 months after publication | |||||
| Chinese Academy of Sciences | China | Requires deposit or no further funding | |||||||||||||
| Australian Research Council | Australia | No policy | |||||||||||||
| National Science Foundation | US | × | |||||||||||||
| Austrian Science Fund | Austria | × | × | Data must be available no more than 2 years after end of project | |||||||||||
| NASA | US | × | × | Data can be embargoed for 2 years | |||||||||||
| NOAA | US | × | × | ||||||||||||
| Council for Scientific and Industrial Research | India | Plan being developed in 2010 | |||||||||||||
| North Pacific Research Board | US | × | × | Data must be transferred to NPRB by the end of the project | |||||||||||
| Japan Science and Technology Agency | Japan | None | |||||||||||||
| National Research Foundation | South Africa | None |
Examples of repositories for Life Sciences data.
| AlgaeBase | algae names and references | |
| ArrayExpress | microarray | |
| Australian National Data Service | general research data | |
| ConceptWiki | concepts | |
| CSIRO | fisheries catch | |
| Data.gov | natural resources data | |
| Dipteran information | ||
| EMAGE | gene expression | |
| ENA | gene sequences | |
| Ensembl | genomes | |
| Euregene | renal genome | |
| Eurexpress | transcriptome | |
| EURODEER | movement of roe deer | |
| FishBase | fish information | |
| GBIF | occurrences | |
| GenBank | gene sequences | |
| GEO | microarray | |
| GNI | names | |
| INBIO | Costa Rican biodiversity | |
| INSPIRE | spatial | |
| KEGG | genes | |
| Life Sciences Data Archive NASA | effects of space on humans | |
| MassBank | mass spectra | |
| MGI | mouse | |
| MorphBank | images | |
| OBIS | occurrences | |
| OMIM | human genes and phenotypes | |
| PDB | molecule structure | |
| PRIDE | proteomics | |
| PubMed | citations | |
| Stanford Microarray Database | microarray | |
| TAIR | Arabidopsis molecular biology | |
| TOPP | animal tagging | |
| TreeBase | phylogenetic trees | |
| TROPICOS | plant specimens | |
| UniProt | protein sequence and function | |
| WILDSPACE | life history information | |
| WRAM | wireless remote animal monitoring |
Figure 2. A Big New Biology can only emerge with a framework that optimizes reuse. Ideally, data should be in forms that can flow from source into a common pool and can flow back out to consumers, be subject to quality control, or be enhanced through analysis to rejoin the pool as processed data.
Examples of standards and their location.
| ABCD | Schema | |
| Bioontology | Ontology Repository | |
| BIRN | ||
| Cardiac Electrophysiology Ontology | Ontology | |
| CMECS | Coastal and marine ecological classification standard | Vocabulary |
| Comparative Data Analysis ontology | Ontology | |
| Darwin Core | Metadata | |
| Dublin Core | Metadata | |
| Ecological Metadata Language | Metadata | |
| Environment Ontology | Ontology | |
| Evolution Ontology | Ontology | |
| Experimental Factor Ontology | Ontology | |
| Federal Geographic Data Committee | Metadata | |
| Fungal Anatomy | Ontology | |
| Gene Ontology | Ontology | |
| Homology Ontology | Ontology | |
| HUPO | Vocabulary | |
| Infectious Disease ontology | Ontology | |
| International Standards Organization | Metadata | |
| Marine Metadata Interoperability | Metadata | |
| MIRIAM | Vocabulary | |
| National Biodiversity Information Infrastructure | Metadata | |
| Ontology of Microbial Phenotypes | Ontology | |
| Open Biological and Biomedical Ontologies | Ontology Repository | |
| Phenotype Quality Ontology | Ontology | |
| Plant Ontology | Ontology | |
| SDD | Schema | |
| Species Profile Model | Schema | |
| Taxonomic Concept Schema | Schema | |
| TDWG | Metadata | |
| Teleost Anatomy Ontology | Ontology |
Figure 3. Technical infrastructure needed for Big New Biology to fully emerge (based on Sinha et al. 2010).