Martha O Perez-Arriaga, Susan Wilson, Kelly P Williams, Joseph Schoeniger, Russel L Waymire, Amy Jo Powell.
Abstract
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie 2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate consistent metadata and to perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and to suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a standalone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. AVAILABILITY: The OMMS can be obtained at http://omms.sandia.gov.
Keywords: Bioinformatics; biological curation; integrated workflow; next-generation sequencing; omics; open-source software; relational database management system
Year: 2015 PMID: 26124554 PMCID: PMC4479048 DOI: 10.6026/97320630011165
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. Omics Metadata Management Software. Core functionality resides in three tables, “Specimen Information,” “Sample Processing,” and “Sequence MetaInformation,” whose fields have embedded automation supporting efficient metadata entry and storage, and intuitive entity relationships facilitating data sharing and analysis. These tables are accessed via the “MetaData” portal.
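The three-table layout described in Figure 1 can be pictured as a chain of relations from specimen to sample to sequence run. The following is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names and example values are hypothetical and are not taken from the OMMS schema, which is not reproduced here.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual OMMS database definition.
SCHEMA = """
CREATE TABLE specimen (
    specimen_id TEXT PRIMARY KEY,
    organism    TEXT NOT NULL
);
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id),
    prep_method TEXT
);
CREATE TABLE sequence_run (
    run_id      TEXT PRIMARY KEY,
    sample_id   TEXT NOT NULL REFERENCES sample(sample_id),
    reads_file  TEXT
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one specimen, sample and run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the table relationships
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO specimen VALUES (?, ?)",
                 ("SPEC-0001", "example organism"))
    conn.execute("INSERT INTO sample VALUES (?, ?, ?)",
                 ("SAMP-0001", "SPEC-0001", "DNA extraction"))
    conn.execute("INSERT INTO sequence_run VALUES (?, ?, ?)",
                 ("RUN-0001", "SAMP-0001", "run1.fastq"))
    return conn


def trace_run(conn: sqlite3.Connection, run_id: str):
    """Follow a sequence run back to its sample and specimen."""
    return conn.execute(
        """
        SELECT sp.specimen_id, sa.sample_id, r.run_id
        FROM sequence_run AS r
        JOIN sample   AS sa ON sa.sample_id   = r.sample_id
        JOIN specimen AS sp ON sp.specimen_id = sa.specimen_id
        WHERE r.run_id = ?
        """,
        (run_id,),
    ).fetchone()
```

Calling `trace_run(build_demo_db(), "RUN-0001")` returns the specimen, sample and run identifiers as one row, which is the kind of lookup that linked metadata tables make straightforward.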
Figure 2. Unified framework for metadata management and state-of-the-art analyses. Curation (highlighted in aqua) and analysis (indicated in yellow) tasks are intrinsically related (overlap region) in next-generation sequencing studies, because sample handling and sequence production are multistep processes, and careful metadata tracking and management are required for downstream analyses and publication preparation. The OMMS supports user input of project metadata, automated creation of consistently named and enumerated unique identifiers for specimens, samples and sequence production information, and straightforward integration with bioinformatics utilities. Spreadsheets can be generated for structured data extraction and local download. Standard input and output of executables used here are stored in automatically generated files and directories.
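To illustrate what "consistently named and enumerated unique identifiers" could look like in practice, here is a minimal sketch of a per-project identifier generator. The naming pattern (`PROJECT-KIND-0001`), the class name and the kind labels are assumptions made for illustration; the source does not specify the convention OMMS actually uses.

```python
from itertools import count


class IdGenerator:
    """Issue sequential, zero-padded identifiers per entity kind.

    The PROJECT-KIND-NNNN pattern is a hypothetical convention,
    not the documented OMMS format.
    """

    def __init__(self, project: str, width: int = 4) -> None:
        self.project = project
        self.width = width
        self._counters = {}  # one independent counter per kind

    def next_id(self, kind: str) -> str:
        # Create the counter for this kind on first use, starting at 1.
        counter = self._counters.setdefault(kind, count(1))
        return f"{self.project}-{kind}-{next(counter):0{self.width}d}"
```

With `gen = IdGenerator("PRJ")`, successive calls to `gen.next_id("SPEC")` yield `PRJ-SPEC-0001`, then `PRJ-SPEC-0002`, while `gen.next_id("SAMP")` starts its own series at `PRJ-SAMP-0001`. Zero padding keeps identifiers sortable as plain text, which helps when hundreds or thousands of samples are exported to spreadsheets.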
Figure 3. Omics Metadata Management Software (OMMS) curation and analysis interface. The OMMS was designed to integrate with open-source bioinformatics tools, such as BLAST, Bowtie 2 and TopHat, and/or custom pipelines. These tools are accessed via the “Analysis Portal” (panel A). End users select the identifier (“Sequence Run ID”) of interest, which is referenced to particular sequence files (panel B and inset). Following input selection, the desired program is chosen and parameterized (panel C and inset) to launch a run. Output from a given analysis run can be downloaded via the OMMS “Results” portal (not shown).
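The workflow in Figure 3 (select a Sequence Run ID, resolve it to sequence files, choose and parameterize a program) can be sketched as a function that assembles a command line. This is an illustration only and not the OMMS implementation: the `RUN_FILES` lookup, the function name and the output layout are hypothetical. The Bowtie 2 options used (`-x`, `-U`, `-S`, `-p`) are standard ones for the index prefix, unpaired reads, SAM output and thread count.

```python
from pathlib import Path

# Hypothetical mapping from Sequence Run ID to its reads file; in a real
# deployment this would come from the metadata database.
RUN_FILES = {"RUN-0001": Path("reads") / "run1.fastq"}


def bowtie2_command(run_id: str, index_prefix: str, out_dir: str,
                    threads: int = 1) -> list:
    """Build (but do not execute) a Bowtie 2 command for one sequence run."""
    reads = RUN_FILES[run_id]  # raises KeyError for an unknown run ID
    out_sam = Path(out_dir) / f"{run_id}.sam"  # output named after the run ID
    return [
        "bowtie2",
        "-x", index_prefix,   # prefix of a prebuilt Bowtie 2 index
        "-U", str(reads),     # unpaired reads file
        "-S", str(out_sam),   # SAM output file
        "-p", str(threads),   # number of alignment threads
    ]
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where Bowtie 2 is installed. Naming the output after the run ID keeps results traceable to the metadata record that produced them.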