Jessica D Tenenbaum, Susanna-Assunta Sansone, Melissa Haendel.
Abstract
In the era of Big Data, omic-scale technologies, and increasing calls for data sharing, it is generally agreed that the use of community-developed, open data standards is critical. Far less agreed upon is exactly which data standards should be used, the criteria by which one should choose a standard, or even what constitutes a data standard. One cannot simply choose a domain and expect the appropriate data standards to follow naturally in all cases. The 'right' standards to use often depend on the use-case scenarios for a given project. Potential downstream applications for the data, however, may not always be apparent at the time the data are generated. Technology also evolves, adding further complexity. Would-be standards adopters must strike a balance between planning for the future and minimizing the burden of compliance. Better tools and resources are required to help guide this balancing act.
Keywords: Data Sharing; Data Standards; Information dissemination; Terminology
Year: 2013 PMID: 24076747 PMCID: PMC3932466 DOI: 10.1136/amiajnl-2013-002066
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
A sampling of standards related to microarray-based transcriptomics, compiled by non-experts to evaluate their relevance to a project involving microarray-based transcriptomics data
| Standard | Type | Description |
|---|---|---|
| MIAME | Reporting guideline | Minimum Information About a Microarray Experiment |
| ISA-TAB | Exchange format | Generic format for experimental representations; conversion tools to MAGE-TAB, MINiML and other formats exist |
| MAGE-TAB | Exchange format | MicroArray and Gene Expression-Tabular |
| MAGE-ML | Exchange format | MicroArray and Gene Expression-Markup Language. No longer supported |
| SOFT | Exchange format | Simple Omnibus Format in Text. Line-based, plain text format designed for rapid batch submission of data. Used by GEO |
| MINiML | Exchange format | MIAME Notation in Markup Language. Optimized for microarray and other high-throughput molecular abundance data |
| GO | Terminology artifact | Gene Ontology. Controlled vocabulary for annotation of gene function and cellular location. Part of the OBO Foundry |
| EFO | Terminology artifact | Experimental Factor Ontology. Provides a systematic description of many experimental variables. Used by ArrayExpress |
| OBI | Terminology artifact | Ontology for Biomedical Investigations. Broader in scope for representing experiments. Part of the OBO Foundry |
| MGED Ontology | Terminology artifact | Integrated in OBI |
| MAGE-OM | Object model | MicroArray and Gene Expression—Object Model. The object model from which MAGE-ML was derived |
| FuGE | Object model | Generic object model for functional genomics |
| SEND | Exchange format | Standard for Exchange of Nonclinical Data, an implementation of the CDISC (Clinical Data Interchange Standards Consortium) SDTM (Study Data Tabulation Model) |
| GEML | Exchange format | GEML, FUGO and MAML have since been deprecated and/or replaced by other standards, but that progression may not always be clear to novice users |
| FUGO | Terminology artifact | Deprecated and/or replaced (see GEML) |
| MAML | Exchange format | Deprecated and/or replaced (see GEML) |
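One way to make the deprecation progression visible to novices is to keep a table like this as structured data with explicit status fields. The sketch below is a hypothetical illustration: the `Standard` class and the `candidates` and `flagged` functions are invented for this example and are not part of any existing registry. The status flags record only what the table states, so `current=True` means "no deprecation noted", not "actively maintained".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Standard:
    name: str
    kind: str                          # the "Type" column of the table
    current: bool = True               # False only where the table notes deprecation / no support
    merged_into: Optional[str] = None  # set only where the table says it was integrated elsewhere


# Hypothetical catalog transcribed from the table; descriptions omitted for brevity.
CATALOG: List[Standard] = [
    Standard("MIAME", "Reporting guideline"),
    Standard("ISA-TAB", "Exchange format"),
    Standard("MAGE-TAB", "Exchange format"),
    Standard("MAGE-ML", "Exchange format", current=False),
    Standard("SOFT", "Exchange format"),
    Standard("MINiML", "Exchange format"),
    Standard("GO", "Terminology artifact"),
    Standard("EFO", "Terminology artifact"),
    Standard("OBI", "Terminology artifact"),
    Standard("MGED Ontology", "Terminology artifact", merged_into="OBI"),
    Standard("MAGE-OM", "Object model"),
    Standard("FuGE", "Object model"),
    Standard("SEND", "Exchange format"),
    Standard("GEML", "Exchange format", current=False),
    Standard("FUGO", "Terminology artifact", current=False),
    Standard("MAML", "Exchange format", current=False),
]


def candidates(kind: str) -> List[str]:
    """Names of standards of the given type with no deprecation or merger noted."""
    return [s.name for s in CATALOG
            if s.kind == kind and s.current and s.merged_into is None]


def flagged() -> List[str]:
    """Warnings for entries a novice might otherwise adopt by mistake."""
    notes = []
    for s in CATALOG:
        if s.merged_into:
            notes.append(f"{s.name}: integrated into {s.merged_into}")
        elif not s.current:
            notes.append(f"{s.name}: deprecated or no longer supported")
    return notes
```

For example, `candidates("Exchange format")` returns `['ISA-TAB', 'MAGE-TAB', 'SOFT', 'MINiML', 'SEND']`, leaving out MAGE-ML, GEML and MAML. Recording a `merged_into` target rather than simply dropping an entry keeps the lineage visible, which is the part the table notes is often unclear to novices.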
Potential resources to assist in the selection and adoption of appropriate standards
| Resource | Notes |
|---|---|
| Layperson's primer to standards | A plain-language document describing each standard, the problem it helps solve, and how it solves it. Although FAQs address a number of these questions, one must first identify the relevant standard and then find its FAQ. This would be a centralized collection of documentation that requires no prior knowledge |
| ‘Consumer reviews’ | This would be a rating system along the lines of Amazon product reviews. Ontology registries such as the NCBO and the OBO Foundry enable or perform reviews, but the reviews are few in number, not substantive, or infrequent. As discussed above, the utility of a standard depends on the purpose for which it is being used, so information beyond numeric scores is needed |
| Standard-selection wizard | Decision support methods could be used to ask a researcher about the intended goals and make recommendations accordingly. For example, ‘what instrument type was used to generate the data?’ and, ‘will these data be deposited in a public data repository? If so, which one?’ etc. Clearly this would require significant resources and ongoing maintenance |
| Standards-adoption ‘helpdesk’ | This would be a centralized resource of real humans with expertise across a number of standards. Once a standard has been selected, help is often available, since many standards have rich user communities and distribution lists for questions. However, for an individual investigator who wants to be standards-compliant but does not know where to begin, expert advice can save significant time otherwise spent researching options |
| Quality assurance tools | Analogous to syntax validators such as those for RDF, tools that gauge or validate standards compliance would be useful to data submitters as well as reviewers |
NCBO, National Center for Biomedical Ontology (http://www.bioontology.org/); RDF, Resource Description Framework.
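The standard-selection wizard described above could start as a simple rule table: pose each question, then return every standard whose rule matches the answers, together with a rationale. The following is a minimal sketch under stated assumptions. The question keys, the `RULES` entries and the `recommend` and `ask` functions are hypothetical. The three rules merely paraphrase the first table (MIAME for microarray experiments, SOFT used by GEO, EFO used by ArrayExpress) and are not curated recommendations.

```python
from typing import Dict, List, Tuple

# Questions from the wizard description; the dictionary keys are hypothetical identifiers.
QUESTIONS: Dict[str, str] = {
    "instrument": "What instrument type was used to generate the data?",
    "repository": "Will these data be deposited in a public data repository? If so, which one?",
}

# Illustrative rules only: (question key, matching answer, standard, rationale).
# A real wizard would need a curated, maintained rule base.
RULES: List[Tuple[str, str, str, str]] = [
    ("instrument", "microarray", "MIAME", "reporting guideline for microarray experiments"),
    ("repository", "GEO", "SOFT", "plain-text batch submission format used by GEO"),
    ("repository", "ArrayExpress", "EFO", "ontology of experimental variables used by ArrayExpress"),
]


def recommend(answers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (standard, rationale) pairs for every rule matching the answers (case-insensitive)."""
    return [(standard, why)
            for key, expected, standard, why in RULES
            if answers.get(key, "").strip().lower() == expected.lower()]


def ask() -> Dict[str, str]:
    """Console front end: pose each question and collect the researcher's answers."""
    return {key: input(prompt + " ") for key, prompt in QUESTIONS.items()}
```

Usage: `for std, why in recommend(ask()): print(f"{std}: {why}")`. Keeping the rules as data rather than nested conditionals means the maintainers (the "significant resources and ongoing maintenance" the table anticipates) could update recommendations as standards evolve without changing the code.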