Jesualdo Tomás Fernández-Breis, Hirokazu Chiba, María Del Carmen Legaz-García, Ikuo Uchiyama.
Abstract
BACKGROUND: Computational comparative analysis of multiple genomes provides valuable opportunities for biomedical research. In particular, orthology analysis can play a central role in comparative genomics: it guides the establishment of evolutionary relations among the genes of organisms and allows functional inference of gene products. However, current orthology databases vary widely, so research is needed on how to share content that is generated by different tools and stored in different structures. Exchanging this content with other research communities requires making its meaning explicit. DESCRIPTION: The need for a common ontology has led to the creation of the Orthology Ontology (ORTH), following best practices in ontology construction. Here, we describe our model and the major entities of the ontology, which is implemented in the Web Ontology Language (OWL), followed by an assessment of the quality of the ontology and the application of the ORTH to existing orthology datasets. By standardizing the representation of orthology databases, this shareable ontology makes it possible to develop Linked Orthology Datasets and a meta-predictor of orthology. The ORTH is freely available in OWL format to all users at http://purl.org/net/orth .
Keywords: Comparative genomics; Knowledge representation; Ontology; Orthology; Semantic web
Year: 2016 PMID: 27259657 PMCID: PMC4893294 DOI: 10.1186/s13326-016-0077-x
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 A schematic representation of the evolutionary relations among genes of multiple organisms. The leaf nodes of the tree represent the genes, and the internal nodes correspond to evolutionary events. X1 has two ancestral nodes associated with speciation events: one is its last common ancestor with Y1, and the other is its last common ancestor with Z1. Thus, Y1 and Z1 are orthologs of X1. In contrast, X2 and Y2 are paralogs of X1, since their last common ancestor with X1 is associated with a duplication event. Likewise, all the pairwise orthology/paralogy relations can be defined according to the structure of the given tree
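The rule stated in the caption (the event attached to the last common ancestor decides the relation) can be sketched in Python. The tree below is a hypothetical reconstruction that is only consistent with the caption; the internal node names `D1`, `S1`, `S2`, `S3`, the `Node` class, and the `relation` function are illustrative assumptions and are not part of the ORTH.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    # "speciation" or "duplication" for internal nodes, None for leaves (genes)
    event: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def path_to(node, leaf_name, path=None):
    """Return the list of nodes from `node` down to the named leaf, or None."""
    path = (path or []) + [node]
    if not node.children:
        return path if node.name == leaf_name else None
    for child in node.children:
        found = path_to(child, leaf_name, path)
        if found:
            return found
    return None


def relation(root, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor."""
    path_a, path_b = path_to(root, gene_a), path_to(root, gene_b)
    if path_a is None or path_b is None:
        raise ValueError("gene not found in tree")
    lca = None
    for node_a, node_b in zip(path_a, path_b):
        if node_a is not node_b:
            break
        lca = node_a  # deepest node shared by both root-to-leaf paths
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical tree consistent with the caption (assumed topology).
s2 = Node("S2", "speciation", [Node("X1"), Node("Y1")])
s1 = Node("S1", "speciation", [s2, Node("Z1")])
s3 = Node("S3", "speciation", [Node("X2"), Node("Y2")])
root = Node("D1", "duplication", [s1, s3])
```

With this assumed topology, `relation(root, "X1", "Y1")` and `relation(root, "X1", "Z1")` give `"orthologs"`, while `relation(root, "X1", "X2")` gives `"paralogs"`.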
Fig. 2 The core classes and properties of the Orthology Ontology. The classes are represented as boxes and the properties as arrows. The prefixes cdao, sio, ro, ncbit and void represent entities reused from the corresponding ontologies. The entities without a prefix are defined in the ORTH. Overall, the figure includes three kinds of classes, shown in its left, center, and right parts, respectively: (left) classes for evolutionary changes; (center) classes for groups of biological sequences holding particular evolutionary relations; and (right) classes for biological sequences of interest
Table 1 Scores of the OQuaRE quality characteristics for the ORTH
| ORTH | Structural | Funct. adequacy | Compatibility | Maintainability | Operability | Reliability | Transferability |
|---|---|---|---|---|---|---|---|
| Complete | 4.5 | 4.56 | 3.0 | 3.97 | 4.33 | 3.12 | 4.0 |
| No imports | 4.0 | 4.03 | 4.25 | 4.09 | 3.66 | 3.0 | 4.0 |
The first row shows the scores for the OWL file including the imported ontologies, whereas the second row shows the scores for the ontology without its imports
Table 2 Fragments of the InParanoid OrthoXML file that stores orthology relations between human and mouse
| <species name="Homo sapiens " NCBITaxId="9606" > |
| <database name="UniProt" |
| version="UniProt_Complete_Proteomes_2013_06" |
| protLink=" |
| <genes> |
| <gene id="33162" protId="P0DJI8" geneId="SAA1"/> |
| <gene id="33163" protId="P0DJI9" geneId="SAA2"/> |
| </genes> |
| </species> |
| <species name="Mus musculus " NCBITaxId="10090"> |
| <database name="UniProt" |
| version="UniProt_Complete_Proteomes_2013_06" |
| protLink=" |
| <genes> |
| <gene id="33164" protId="P05366" geneId="Saa1"/> |
| <gene id="33165" protId="P05367" geneId="Saa2"/> |
| </genes> |
| </species> |
| <orthologGroup id="16021"> |
| <geneRef id="33162"/> |
| <geneRef id="33163"/> |
| <geneRef id="33164"/> |
| <geneRef id="33165"/> |
| </orthologGroup> |
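As a minimal sketch of how such a file could be read, the Python snippet below resolves each `geneRef` in an `orthologGroup` to its gene symbol and NCBI taxon. The embedded `SAMPLE` string is a simplified, well-formed stand-in for the fragments above: the enclosing `orthoXML` root is added, and the `database` wrapper and the XML namespace of real OrthoXML files are left out, all of which are assumptions made for brevity. The function name `read_groups` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the fragments above (assumption: no namespace,
# no <database> wrapper, and a bare <orthoXML> root around everything).
SAMPLE = """
<orthoXML>
  <species name="Homo sapiens" NCBITaxId="9606">
    <genes>
      <gene id="33162" protId="P0DJI8" geneId="SAA1"/>
      <gene id="33163" protId="P0DJI9" geneId="SAA2"/>
    </genes>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <genes>
      <gene id="33164" protId="P05366" geneId="Saa1"/>
      <gene id="33165" protId="P05367" geneId="Saa2"/>
    </genes>
  </species>
  <orthologGroup id="16021">
    <geneRef id="33162"/>
    <geneRef id="33163"/>
    <geneRef id="33164"/>
    <geneRef id="33165"/>
  </orthologGroup>
</orthoXML>
"""


def read_groups(xml_text):
    """Map each ortholog group id to a list of (gene symbol, taxon id)."""
    root = ET.fromstring(xml_text.strip())

    # Index every gene by its internal id, remembering its species.
    genes = {}
    for species in root.iter("species"):
        for gene in species.iter("gene"):
            genes[gene.get("id")] = (gene.get("geneId"), species.get("NCBITaxId"))

    # Resolve the gene references of each group (nested groups are flattened).
    groups = {}
    for group in root.iter("orthologGroup"):
        groups[group.get("id")] = [genes[ref.get("id")] for ref in group.iter("geneRef")]
    return groups
```

For the sample, `read_groups(SAMPLE)["16021"]` lists SAA1 and SAA2 for taxon 9606, followed by Saa1 and Saa2 for taxon 10090.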
Fig. 3 Excerpt of the mapping from the OrthoXML schema (left) to the ORTH (right). The dashed lines represent the concrete mappings from OrthoXML entities to ORTH classes or properties
Table 3 RDF triples for the cluster of orthologs 16021
| orth_data:orthologsCluster_16021 rdf:type orth:OrthologsCluster . |
| orth_data:orthologsCluster_16021 void:inDataset orth_data:orthologyDataset_InParanoid . |
| orth_data:orthologsCluster_16021 dcterms:identifier "16021" . |
| orth_data:orthologsCluster_16021 orth:hasHomologous orth_data:gene_33162 . |
| orth_data:orthologsCluster_16021 orth:hasHomologous orth_data:gene_33163 . |
| orth_data:orthologsCluster_16021 orth:hasHomologous orth_data:gene_33164 . |
| orth_data:orthologsCluster_16021 orth:hasHomologous orth_data:gene_33165 . |
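The triples above follow a regular pattern: one type statement, one dataset statement, one identifier, and one `orth:hasHomologous` statement per member gene. The Python sketch below reproduces that pattern as plain strings. It is only an illustration under stated assumptions: the function name `cluster_triples` is hypothetical, the prefixes are written out exactly as in the table without being declared, and no RDF library is used.

```python
def cluster_triples(cluster_id, gene_ids, dataset="InParanoid"):
    """Build the prefixed-name triples for one cluster of orthologs.

    The naming scheme (orthologsCluster_<id>, gene_<id>,
    orthologyDataset_<name>) is copied from the triples shown above.
    """
    subject = f"orth_data:orthologsCluster_{cluster_id}"
    triples = [
        f"{subject} rdf:type orth:OrthologsCluster .",
        f"{subject} void:inDataset orth_data:orthologyDataset_{dataset} .",
        f'{subject} dcterms:identifier "{cluster_id}" .',
    ]
    # One membership statement per gene referenced by the group.
    triples += [
        f"{subject} orth:hasHomologous orth_data:gene_{gene_id} ."
        for gene_id in gene_ids
    ]
    return triples


lines = cluster_triples("16021", ["33162", "33163", "33164", "33165"])
```

Printing `"\n".join(lines)` yields the seven statements listed for cluster 16021.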
Table 4 A sample query for getting the orthologs of a given gene
| SELECT ?gene ?species ?database WHERE { |
| ?common_ancestor a orth:OrthologsCluster. |
| ?common_ancestor orth:hasHomologous ?tree_node1. |
| ?common_ancestor orth:hasHomologous ?tree_node2. |
| ?common_ancestor void:inDataset ?dataset. |
| ?dataset orth:hasSource ?database. |
| ?tree_node1 orth:hasHomologous* ?gene1. |
| ?tree_node2 orth:hasHomologous* ?gene2. |
| ?gene1 a orth:Gene. |
| ?gene2 a orth:Gene. |
| ?gene1 obo:RO_0002162 ?species1. |
| ?gene2 obo:RO_0002162 ?species2. |
| ?gene1 dcterms:identifier ?id. |
| ?gene2 dcterms:identifier ?gene. |
| ?species2 rdfs:label ?species. |
| BIND (str(?id) AS ?str_id) |
| FILTER (?tree_node1 != ?tree_node2 && ?species1 != ?species2) |
| VALUES (?str_id ?species1 ?species2) {("SAA1" ncbit:9606 ncbit:10090)} |
| } |
In this example, the mouse (ncbit:10090) orthologs of the human (ncbit:9606) gene SAA1 are retrieved. obo:RO_0002162 stands for the property in_taxon
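Read procedurally, the query looks for a cluster that contains the queried gene and returns the cluster members that belong to the other species, together with the source database. The Python sketch below mimics that logic over an in-memory list instead of a SPARQL endpoint. It is a simplification under stated assumptions: clusters are flat (the `orth:hasHomologous*` traversal of nested tree nodes is not modeled), the taxon identifier is returned instead of its `rdfs:label`, the only data included is the InParanoid cluster 16021 shown earlier, and the names `Member`, `Cluster`, and `orthologs` are hypothetical.

```python
from typing import List, NamedTuple


class Member(NamedTuple):
    symbol: str  # value of dcterms:identifier in the query
    taxon: str   # value of obo:RO_0002162 (in_taxon)


class Cluster(NamedTuple):
    database: str
    members: List[Member]


# Only the InParanoid cluster 16021 from the OrthoXML fragment is included.
CLUSTERS = [
    Cluster("InParanoid", [
        Member("SAA1", "ncbit:9606"),
        Member("SAA2", "ncbit:9606"),
        Member("Saa1", "ncbit:10090"),
        Member("Saa2", "ncbit:10090"),
    ]),
]


def orthologs(clusters, symbol, source_taxon, target_taxon):
    """Return (gene, taxon, database) for members in the target species."""
    hits = []
    for cluster in clusters:
        # The cluster must contain the queried gene in the source species.
        if not any(m.symbol == symbol and m.taxon == source_taxon
                   for m in cluster.members):
            continue
        for member in cluster.members:
            # Mirrors the FILTER: keep only genes from the other species.
            if member.taxon == target_taxon and member.taxon != source_taxon:
                hits.append((member.symbol, member.taxon, cluster.database))
    return hits
```

Calling `orthologs(CLUSTERS, "SAA1", "ncbit:9606", "ncbit:10090")` returns the mouse genes Saa1 and Saa2 with InParanoid as the database.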
Table 5 Results of the query shown in Table 4 for a repository that integrates InParanoid and OMA
| Gene | Species | Database |
|---|---|---|
| SAA1 | | OMA |
| SAA3 | | OMA |
| SAA2 | | OMA |
| Saa1 | | InParanoid |
| Saa2 | | InParanoid |
Table 6 Summary of sequence sources and identifiers used in three different orthology resources
| Orthology resource | Database source | Protein ID | Gene ID |
|---|---|---|---|
| OMA | Multiple sources | OMA ID | Gene symbol^a |
| InParanoid | UniProt | UniProt AC | Gene symbol^a |
| TreeFam | Multiple sources | Multiple sources | Multiple sources |
^a The gene symbol is used for model organisms, including human and mouse, but this is not the case for other organisms in general