Vivien G Dugan1, Scott J Emrich2, Gloria I Giraldo-Calderón2, Omar S Harb3, Ruchi M Newman4, Brett E Pickett5, Lynn M Schriml6, Timothy B Stockwell5, Christian J Stoeckert3, Dan E Sullivan7, Indresh Singh5, Doyle V Ward4, Alison Yao8, Jie Zheng3, Tanya Barrett9, Bruce Birren4, Lauren Brinkac5, Vincent M Bruno6, Elizabet Caler5, Sinéad Chapman4, Frank H Collins2, Christina A Cuomo4, Valentina Di Francesco8, Scott Durkin5, Mark Eppinger6, Michael Feldgarden4, Claire Fraser6, W Florian Fricke6, Maria Giovanni8, Matthew R Henn4, Erin Hine6, Julie Dunning Hotopp6, Ilene Karsch-Mizrachi9, Jessica C Kissinger10, Eun Mi Lee8, Punam Mathur8, Emmanuel F Mongodin6, Cheryl I Murphy4, Garry Myers6, Daniel E Neafsey4, Karen E Nelson5, William C Nierman5, Julia Puzak11, David Rasko6, David S Roos3, Lisa Sadzewicz6, Joana C Silva6, Bruno Sobral7, R Burke Squires8, Rick L Stevens12, Luke Tallon6, Herve Tettelin6, David Wentworth5, Owen White6, Rebecca Will7, Jennifer Wortman4, Yun Zhang5, Richard H Scheuermann13.
Abstract
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
Year: 2014 PMID: 24936976 PMCID: PMC4061050 DOI: 10.1371/journal.pone.0099979
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. NIAID GSCID/BRC Project and Sample Application Standard Overview.
Coverage of the twelve major data categories in the five data field collections is shown.
Core Project Attributes.
| FieldID | Field Name | Data Categories | OBO Foundry URL | BioProject Synonyms | MIxS Synonym |
| CP1 | Project Title | Investigation | | Title | project name |
| CP2 | Project ID | Investigation | | | |
| CP3 | Project Description | Investigation | | Description | |
| CP4 | Project Relevance | Investigation | | Relevance | |
| CP5 | Sample Scope | Investigation | | Sample Scope | |
| CP6 | Target Material | Investigation | | Material | |
| CP7 | Target Capture | Investigation | | Capture | |
| CP8 | Project Method | Investigation | | Methodology | |
| CP9 | Project Objectives | Investigation | | Objective | |
| CP10 | Grant Agency | Investigation | | | |
| CP11 | Supporting Grants/Contract ID | Investigation | | Grant ID | |
| CP12 | Publication Citation | Investigation | | PubMed ID; DOI | ref_biomaterial |
| CP13 | Sample Provider Principal Investigator (PI) Name | Investigation | | | |
| CP14 | Sample Provider PI’s Institution | Investigation | | | |
| CP15 | Sample Provider PI’s email | Investigation | | | |
| CP16 | Sequencing Facility | Investigation | | | |
| CP17 | Sequencing Facility Contact Name | Investigation | | | |
| CP18 | Sequencing Facility Contact’s Institution | Investigation | | | |
| CP19 | Sequencing Facility Contact’s email | Investigation | | | |
| CP20 | Bioinformatics Resource Center | Investigation | | | |
| CP21 | Bioinformatics Resource Center Contact Name | Investigation | | | |
| CP22 | Bioinformatics Resource Center Contact’s Institution | Investigation | | | |
| CP23 | Bioinformatics Resource Center Contact’s email | Investigation | | | |
*Mandatory NCBI BioProject attributes.
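The BioProject synonym column can be read as a lookup table from Core Project Field IDs to BioProject attribute names. The following is a minimal sketch in Python, assuming a project record is held as a dictionary keyed by Field ID; that representation and the function name `to_bioproject` are illustrative assumptions, not part of the standard. CP12 is omitted because the table lists two synonyms for it (PubMed ID; DOI).

```python
# Field ID -> BioProject synonym, copied from the Core Project table.
# Only fields with exactly one listed BioProject synonym are included.
CORE_PROJECT_TO_BIOPROJECT = {
    "CP1": "Title",
    "CP3": "Description",
    "CP4": "Relevance",
    "CP5": "Sample Scope",
    "CP6": "Material",
    "CP7": "Capture",
    "CP8": "Methodology",
    "CP9": "Objective",
    "CP11": "Grant ID",
}


def to_bioproject(record):
    """Rename the keys of a {field_id: value} record to BioProject synonyms.

    Returns (mapped, unmapped): fields with no listed synonym are kept
    under their original Field ID in the second dictionary.
    """
    mapped, unmapped = {}, {}
    for field_id, value in record.items():
        synonym = CORE_PROJECT_TO_BIOPROJECT.get(field_id)
        if synonym is None:
            unmapped[field_id] = value
        else:
            mapped[synonym] = value
    return mapped, unmapped
```

Returning the unmapped fields separately keeps project leadership and contact fields (CP13 to CP23), which have no BioProject synonym in the table, available for the BRC-side record.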
Core Sample Attributes.
| FieldID | Field Name | Data Categories | OBO Foundry URL | BioSample Synonym | MIxS Synonym |
| CS1 | Specimen Source ID | Host Characterization | | host_subject_id | host_subject_id |
| CS2 | Specimen Category | Pathogen Detection | | sample_category | |
| CS3 | Specimen Source Species | Host Characterization | | host | host_taxid |
| CS4 | Species Source Common Name | Host Characterization | | Host_common_name | host_common_name |
| CS5 | Specimen Source Gender | Host Characterization | | host_sex | sex |
| CS6 | Specimen Source Age - Value | Host Characterization | | host_age | age |
| CS7 | Specimen Source Age - Unit | Host Characterization | | host_age | |
| CS8 | Specimen Source Health Status | Host Characterization | | host_health_state | health_disease stat |
| CS9 | Specimen Source Disease | Host Characterization | | host_disease | disease status |
| CS10 | Specimen Collection Date | Specimen Isolation | | collection_date | collection date |
| CS11 | Specimen Collection Location - Latitude | Specimen Isolation | | lat_lon | geographic location (latitude and longitude) |
| CS12 | Specimen Collection Location - Longitude | Specimen Isolation | | lat_lon | geographic location (latitude and longitude) |
| CS13 | Specimen Collection Location - Location | Specimen Isolation | | geo_loc_name | |
| CS14 | Specimen Collection Location - Country | Specimen Isolation | | geo_loc_name | geographic location (country and/or sea region) |
| CS15 | Specimen ID | Specimen Isolation | | sample_name | |
| CS16 | Specimen Type | Specimen Isolation | | host_tissue_sampled | body habitat, body site, body product |
| CS17 | Suspected Organism(s) in Specimen - Species | Pathogen Detection | | organism | |
| CS18 | Suspected Organism(s) in Specimen - Subclassification | Pathogen Detection | | strain | subspecific genetic lineage |
| CS19 | Human Pathogenicity of Suspected Organism(s) in Specimen | Pathogen Characteristic | | pathogenicity | phenotype |
| CS20 | Environmental Material | Specimen Isolation | | isolation_source | environment (material) |
| CS21 | Organism Detection Method | Pathogen Detection | | organism_detection_method | sample collection device or method |
| CS22 | Specimen Repository | Specimen Processing | | culture_collection | source material identifiers |
| CS23 | Specimen Repository Sample ID | Specimen Processing | | culture_collection | source material identifiers |
| CS24 | Comments | Specimen Comments | | | |
| CS25 | Specimen Collector Name | Specimen Isolation | | collected_by | |
| CS26 | Specimen Collector’s Institution | Specimen Isolation | | specimen_collector’s_institution | |
| CS27 | Specimen Collector’s email | Specimen Isolation | | specimen_collector’s_institution | |
*Mandatory NCBI BioSample attributes in the “Pathogen: clinical or host-associated” version 1.0 package.
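Several Core Sample fields share one BioSample synonym: age value and unit (CS6, CS7) both map to host_age, and latitude and longitude (CS11, CS12) both map to lat_lon. The sketch below shows one way such pairs might be merged when translating a sample record. The dictionary-based record, the function names, and the combined string formats (for example "38.98 N 77.11 W") are assumptions made for illustration; they are not specified in the table above. Only a subset of the one-to-one fields is included.

```python
# One-to-one Field ID -> BioSample synonym pairs, taken from the Core Sample table.
SIMPLE_FIELDS = {
    "CS1": "host_subject_id",
    "CS3": "host",
    "CS5": "host_sex",
    "CS9": "host_disease",
    "CS10": "collection_date",
    "CS15": "sample_name",
    "CS17": "organism",
    "CS18": "strain",
    "CS20": "isolation_source",
    "CS25": "collected_by",
}


def format_lat_lon(latitude, longitude):
    """Combine signed decimal degrees into one string (format is an assumption)."""
    ns = "N" if latitude >= 0 else "S"
    ew = "E" if longitude >= 0 else "W"
    return f"{abs(latitude)} {ns} {abs(longitude)} {ew}"


def to_biosample(sample):
    """Translate a {field_id: value} Core Sample record to BioSample-style names."""
    out = {name: sample[fid] for fid, name in SIMPLE_FIELDS.items() if fid in sample}
    # CS6 (value) and CS7 (unit) share the single synonym host_age.
    if "CS6" in sample and "CS7" in sample:
        out["host_age"] = f"{sample['CS6']} {sample['CS7']}"
    # CS11 (latitude) and CS12 (longitude) share the single synonym lat_lon.
    if "CS11" in sample and "CS12" in sample:
        out["lat_lon"] = format_lat_lon(sample["CS11"], sample["CS12"])
    return out
```

Merged attributes are emitted only when both halves are present, so a record with an age value but no unit does not produce an ambiguous host_age.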
Figure 2. Semantic Network of the Core Project Data Fields.
A semantic representation of the entities relevant to describe infectious disease projects based on the OBI and other OBO Foundry ontologies is shown. Distinctions are made between material entities (blue outlines), information entities and qualities (black outlines), and processes (red outlines). Entities are connected by standard semantic relations, in italic. The subset of entities selected as Core Project fields are noted with ovals containing the respective Field ID. For example, both the “Project Title” (CP1) and “Project ID” (CP2) denote an OBI:Investigation; the “Project Description” (CP3) is_about the same OBI:Investigation.
Figure 3. Semantic Network of the Core Sample Data Fields.
A semantic representation of the entities relevant to describe infectious disease samples based on the OBI and other OBO Foundry ontologies is shown. Distinctions are made between material entities (blue outlines), information entities and qualities (black outlines), and processes (red outlines). Entities are connected by standard semantic relations, in italic. The subset of entities selected as Core Sample fields are noted with ovals containing the respective Field ID. For example, the OBI:organism has_quality “Specimen Source Gender” (CS5), which is equivalent to the PATO:biological sex, and has_quality PATO:age, and has_quality “Specimen Source Health Status” (CS8), which is equivalent to PATO:organismal status. PATO:age is_quality_measured_as OBI:age since birth measurement datum, which has_measurement_value “Specimen Source Age – Value” (CS6) and has_measurement_unit_label “Specimen Source Age – Unit” (CS7).
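The relations named in the Figure 3 caption can be written as subject-relation-object triples. The sketch below encodes only the statements quoted in that caption, in plain Python tuples; the variable and function names are hypothetical, and a real implementation would more likely use OWL or RDF tooling with the ontologies' actual identifiers.

```python
from collections import deque

# Triples transcribed from the Figure 3 caption. CS5 and CS8 are stated there
# to be equivalent to PATO:biological sex and PATO:organismal status.
TRIPLES = [
    ("OBI:organism", "has_quality", "PATO:biological sex"),
    ("OBI:organism", "has_quality", "PATO:age"),
    ("OBI:organism", "has_quality", "PATO:organismal status"),
    ("PATO:age", "is_quality_measured_as", "OBI:age since birth measurement datum"),
    ("OBI:age since birth measurement datum", "has_measurement_value", "CS6"),
    ("OBI:age since birth measurement datum", "has_measurement_unit_label", "CS7"),
]
FIELD_EQUIVALENTS = {"CS5": "PATO:biological sex", "CS8": "PATO:organismal status"}


def objects_of(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def reachable(start):
    """Return all entities reachable from `start` by following any relation."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, _, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen
```

For instance, `reachable("OBI:organism")` includes both CS6 and CS7, reflecting that the age value and unit fields describe the specimen source organism through the age measurement datum.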