Joachim Baran, Bibi Sehnaaz Begum Durgahee, Karen Eilbeck, Erick Antezana, Robert Hoehndorf, Michel Dumontier.
Abstract
Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they address slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations.
Availability and implementation: The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology's GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation are provided on a separate web page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and is therefore de facto in the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.
Keywords: Bioinformatics; Genomics; Ontology
Year: 2015 PMID: 26019997 PMCID: PMC4435477 DOI: 10.7717/peerj.933
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1: A GFF3 file example that shows the implicit relationships between data in the file and its genomic locus interpretation.
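The implicit relationships that the figure caption refers to can be made concrete with a small parsing sketch. The example below assumes the standard GFF3 layout of nine tab-separated columns, with the last column holding semicolon-separated `tag=value` attributes, and it reads the `ID` and `Parent` attributes to recover which feature belongs to which. The sample records, the `example` source name and the helper names `parse_gff3_line` and `child_to_parents` are hypothetical illustrations; they are not the contents of Figure 1 and not part of GFVO or BioInterchange.

```python
from urllib.parse import unquote

# Names of the nine tab-separated GFF3 columns.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # An attribute may carry several comma-separated values;
            # reserved characters are percent-encoded in GFF3.
            attributes[key] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record


def child_to_parents(lines):
    """Map each feature ID to the IDs named in its Parent attribute."""
    links = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        record = parse_gff3_line(line)
        feature_id = record["attributes"].get("ID", [None])[0]
        for parent in record["attributes"].get("Parent", []):
            links.setdefault(feature_id, []).append(parent)
    return links


# Hypothetical records: a gene, its transcript, and one exon.
EXAMPLE = [
    "##gff-version 3",
    "chr1\texample\tgene\t1000\t9000\t.\t+\t.\tID=gene0001;Name=EXAMPLE",
    "chr1\texample\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA0001;Parent=gene0001",
    "chr1\texample\texon\t1050\t1500\t.\t+\t.\tID=exon0001;Parent=mRNA0001",
]

links = child_to_parents(EXAMPLE)
print(links)
```

Nothing in the file states that the exon is part of the transcript or that the transcript belongs to the gene; that reading only follows from the `Parent` references, which is the kind of implicit relationship an ontology can state explicitly.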
Overview of classes and properties in GFVO.
Total number of classes and properties in GFVO, number of classes/properties that have equivalences to SIO, number of classes based on GFF3, GTF, GVF and VCF specifications (not mutually exclusive), number of classes with reference to Wikipedia articles, disjointness axioms, and property restrictions.
| | Total number | Equivalence with SIO | Equivalence with SO |
|---|---|---|---|
| Classes | 102 | 40 | 13 |
| …modeled from GFF3 | 23 | 11 | 3 |
| …modeled from GTF | 13 | 11 | 2 |
| …modeled from GVF | 62 | 23 | 12 |
| …modeled from VCF | 42 | 15 | 7 |
| Class metadata | | | |
| …Wikipedia references | 53 | n/a | n/a |
| …pairwise disjoint axioms | 6 | n/a | n/a |
| …disjoint collection axioms | 13 | n/a | n/a |
| …with property restrictions | 32 | n/a | n/a |
| Datatype properties | 1 | 1 | 0 |
| Object properties | 32 | 31 | 0 |
Overview of file-format data structures captured by GFVO.
Overview of the number of data structures (columns, key/value pairs, other) in genomics file formats that are captured by GFVO and other ontologies/frameworks.
| Specification | Fixed columns | Feature attributes and key/value properties | Pragma statements and information fields |
|---|---|---|---|
| Table column description | | | |
| | 8 (+1 SO) | 5 | 5 (+4 RDF schema) |
| | 9 | n/a | 6 |
| | 8 (+1 SO) | 25 | 27 (+4 RDF schema) |
| | 6 | 24 | 15 |
Coverage of class documentation in terms of word- and class-counts.
Coverage of class documentation in genomics related ontologies. Coverage is denoted by total number of words of documentation as well as on a normalized per-class basis.
| Ontology | Total number of words in comments/descriptions | Total number of classes | Average number of words in descriptions per class |
|---|---|---|---|
| | 477 | 18 | 26.5 (6; 25; 72) |
| | 4,478 | 102 | 43.9 (12; 38.5; 133) |
| | 23,412 | 1,414 | 16.56 (5; 16; 73) |
| | 7,296 | 2,254 | 3.24 (1; 13; 97) |
| | 4,612 | 384 | 12.01 (1; 8; 74) |
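The normalized per-class figures in the last column can be checked with simple arithmetic: each leading value is the total word count divided by the number of classes. The short sketch below recomputes them from the totals in the table. The variable names are illustrative, the rows are identified only by position because the ontology labels are not available here, and the values in parentheses are not reproduced because they cannot be derived from the totals.

```python
# (total words, total classes, reported average) for each table row, in order.
ROWS = [
    (477, 18, 26.5),
    (4478, 102, 43.9),
    (23412, 1414, 16.56),
    (7296, 2254, 3.24),
    (4612, 384, 12.01),
]

averages = []
for total_words, total_classes, reported in ROWS:
    # Average words of documentation per class.
    average = total_words / total_classes
    averages.append(average)
    print(f"{total_words:>6} / {total_classes:>5} = {average:.2f} "
          f"(reported: {reported})")
```

Each recomputed value agrees with the reported average after rounding.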