Sveinung Gundersen, Sanjay Boddu, Salvador Capella-Gutierrez, Finn Drabløs, José M Fernández, Radmila Kompova, Kieron Taylor, Dmytro Titov, Daniel Zerbino, Eivind Hovig.
Abstract
Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information.

Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser.
Keywords: FAIR; epigenomics; functional genomics; genomic tracks; genomics; interoperability; metadata; sequence annotations
Year: 2021 PMID: 34249331 PMCID: PMC8226415 DOI: 10.12688/f1000research.28449.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Summary of required attributes for metadata standards related to genomic track files.
| Standard | Object | Required attributes | Allowed values |
|---|---|---|---|
| FAANG | Analysis | Input data | String |
| FAANG | Analysis | Reference data | String |
| FAANG | Analysis | Analysis protocol | String |
| FAANG | Analysis | Total reads | Number |
| FAANG | Analysis | Mapped reads | Number |
| FAANG | Experiment | sample | BioSampleID |
| FAANG | Experiment | assay type | Enum ('ChIP-Seq', 'RNA-Seq of coding ...) |
| FAANG | Experiment | sample storage processing | Enum ('Fresh', 'Formalin fixed'...) |
| FAANG | Experiment | sampling to preparation interval | String with number + unit |
| FAANG | Experiment | extraction protocol | String |
| FAANG | Sample | SampleName | String |
| FAANG | Sample | Material | Enum ('Cell line', 'Organism'...) |
| FAANG | Sample | project | "FAANG" |
| HyperBrowser | AnalysisFile | URI | URI |
| HyperBrowser | AnalysisFile | Genome build | String (UCSC assembly versions) |
| HyperBrowser | AnalysisFile | File suffix | String |
| HyperBrowser | AnalysisFile | Data type/represented concept (...) | Enum |
| HyperBrowser | AnalysisFile | Target (main target of dataset, ...) | String |
| HyperBrowser | AnalysisFile | Cell type | String |
| HyperBrowser | AnalysisFile | Tissue type | String |
| HyperBrowser | AnalysisFile | Experiment type | String |
| IHEC | Experiment | EXPERIMENT_TYPE | Enum ("DNAme", "RNA-Seq"...) |
| IHEC | Experiment | EXPERIMENT_ONTOLOGY_URI | OBI |
| IHEC | Experiment | LIBRARY_STRATEGY | Enum ('RNA-Seq', 'ChIP-Seq' ...) |
| IHEC | Experiment | MOLECULE_ONTOLOGY_URI | SO |
| IHEC | Experiment | MOLECULE | Enum ('Total RNA', 'Genomic DNA', ...) |
| IHEC | Sample | SAMPLE_ONTOLOGY_URI | EFO, CL or UBERON depending on type |
| IHEC | Sample | DISEASE_ONTOLOGY_URI | NCImetathesaurus |
| IHEC | Sample | DISEASE | String |
| IHEC | Sample | BIOMATERIAL_PROVIDER | String |
| IHEC | Sample | BIOMATERIAL_TYPE | Enum ("Cell Line", "Primary tissue"...) |
| INSDC | AnalysisFile | filename | string |
| INSDC | AnalysisFile | filetype | Enum |
| INSDC | AnalysisFile | checksum_method | Enum |
| INSDC | AnalysisFile | checksum | string |
| ISA-tab | Assay | Measurement Type | Ontology Annotation |
| ISA-tab | Assay | Technology Type | Ontology Annotation |
| ISA-tab | Assay | Technology Platform | String |
| ISA-tab | Investigation | Identifier | String |
| ISA-tab | Investigation | Title | String |
| ISA-tab | Investigation | Description | String |
| ISA-tab | Investigation | Submission Date | Representation of an ISO 8601 date |
| ISA-tab | Investigation | Public Release Date | Representation of an ISO 8601 date |
| ISA-tab | Investigation | Publications | A list of Publication |
| ISA-tab | Investigation | Contacts | A list of Contact |
| ISA-tab | Study | Identifier | String |
| ISA-tab | Study | Title | String |
| ISA-tab | Study | Description | String |
| ISA-tab | Study | Submission Date | Representation of an ISO 8601 date |
| ISA-tab | Study | Public Release Date | Representation of an ISO 8601 date |
| ISA-tab | Study | Publications | A list of Publication |
| ISA-tab | Study | Contacts | A list of Contact |
| ISA-tab | Study | Design Type | Ontology Annotation |
| ISA-tab | Study | Factor Name | String |
| ISA-tab | Study | Factor Type | Ontology Annotation |
| Track Hub | Analysis | Contact e-mail address | String |
| Track Hub | AnalysisFile | An assembly identifier | UCSC nomenclature |
| Track Hub | AnalysisFile | A filetype | Enum |
| Track Hub | AnalysisFile | A URL | String |
| Track Hub | AnalysisFile | A short label | String |
| Track Hub | AnalysisFile | A long label | String |
| Zenbu | AnalysisFile | FileFormat | String |
| Zenbu | AnalysisFile | Date | String |
| Zenbu | AnalysisFile | ProtocolREF | String |
| Zenbu | AnalysisFile | ColumnVariable (string descriptions of each column) | String |
| Zenbu | AnalysisFile | ContactName | String |
| Zenbu | AnalysisFile | ContactEmail | String |
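
To illustrate how a list of required attributes like the one above can be applied, the following minimal Python sketch checks a metadata record for missing required fields. The attribute names are copied from the Track Hub and INSDC rows of the table; the function name `missing_attributes`, the dictionary layout, and the example record (including its URL) are hypothetical and not part of any of the listed standards or of FAIRtracks.

```python
# Required attributes per (standard, object), copied from the table above.
# Only two standards are included to keep the sketch short.
REQUIRED = {
    ("Track Hub", "AnalysisFile"): [
        "An assembly identifier",
        "A filetype",
        "A URL",
        "A short label",
        "A long label",
    ],
    ("INSDC", "AnalysisFile"): [
        "filename",
        "filetype",
        "checksum_method",
        "checksum",
    ],
}


def missing_attributes(standard, obj, record):
    """Return the required attributes that are absent or empty in `record`."""
    required = REQUIRED[(standard, obj)]
    return [name for name in required if record.get(name) in (None, "")]


# Hypothetical record, for illustration only.
record = {
    "An assembly identifier": "hg38",
    "A filetype": "bigWig",
    "A URL": "https://example.org/tracks/sample.bw",
    "A short label": "Sample track",
}

print(missing_attributes("Track Hub", "AnalysisFile", record))
```

This check only tests for presence. It does not validate the allowed values (for example, that the filetype belongs to the permitted enumeration), which a schema-based validator would normally cover.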
Figure 1. Overview of the key objects in the proposed data model, and the relationships between them.
Key attributes of FAIRtracks objects.
| Object | Key attributes |
|---|---|
| | Version id, version date, ontology versions, URL to original source |
| | Name, description, URL to original source, contact info |
| | Name, publications, contact info |
| | Species, biospecimen class, sample type (...) |
| | (Sample OR upstream experiment), technique, biological target (...) |
| | Assembly details, file URL, label, description, ID of source collection, IDs of raw files, file format, type of ... |
Figure 2. Important topics where the current state of track data and metadata has potential for improvement, as mapped to the FAIR recommendations.
Mapping of FAIRtracks objects to objects in other metadata standards.
| FAIRtracks | INSDC | ISA | Other | Comments |
|---|---|---|---|---|
| Track collection | SRA: Submission ... | Investigation | Track Hub Registry: ... | Can represent both original repository submissions, ... |
| Study | Study | Study | | |
| Sample | Sample | Sample | | |
| Experiment | Experiment & ... | Assay & Process | | "aggregated_from" attribute allows provenance ... |
| Track | Analysis | Data | Track Hub Registry / GSuite: Track | "raw_file_ids" can link to original data files in case a full ... |
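
As a small illustration of how such an object mapping might be used programmatically, the sketch below encodes the ISA column of the table as a Python dictionary and looks up counterparts in both directions. Only the ISA column is used because its entries are complete for every row. The names `FAIRTRACKS_TO_ISA`, `isa_counterpart` and `fairtracks_counterpart` are hypothetical and do not come from FAIRtracks or ISA tooling.

```python
# FAIRtracks object -> ISA object, copied from the mapping table above.
FAIRTRACKS_TO_ISA = {
    "Track collection": "Investigation",
    "Study": "Study",
    "Sample": "Sample",
    "Experiment": "Assay & Process",
    "Track": "Data",
}

# Reverse lookup; valid because every ISA entry in the table is distinct.
ISA_TO_FAIRTRACKS = {isa: ft for ft, isa in FAIRTRACKS_TO_ISA.items()}


def isa_counterpart(fairtracks_object):
    """Return the ISA object listed for a FAIRtracks object."""
    return FAIRTRACKS_TO_ISA[fairtracks_object]


def fairtracks_counterpart(isa_object):
    """Return the FAIRtracks object listed for an ISA object."""
    return ISA_TO_FAIRTRACKS[isa_object]


print(isa_counterpart("Track"))
print(fairtracks_counterpart("Investigation"))
```

A plain dictionary is enough here because the table lists exactly one ISA entry per FAIRtracks object; a mapping with several candidate targets per object would need a richer structure.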