Daniel Greene<sup>1,2</sup>, Sylvia Richardson<sup>1</sup>, Ernest Turro<sup>1,2</sup>.
Abstract
Summary: Ontologies are widely used constructs for encoding and analyzing biomedical data, but the absence of simple and consistent tools has made exploratory and systematic analysis of such data unnecessarily difficult. Here we present three packages that aim to simplify such procedures. The ontologyIndex package enables arbitrary ontologies to be read into R, supports representation of ontological objects by native R types, and provides a parsimonious set of performant functions for querying ontologies. ontologySimilarity and ontologyPlot extend ontologyIndex with functionality for semantic similarity calculations, including statistical routines, and for straightforward visualization, respectively.

Availability and Implementation: ontologyIndex, ontologyPlot and ontologySimilarity are all available from the Comprehensive R Archive Network (CRAN) at <https://cran.r-project.org/web/packages/>.

Contact: Daniel Greene, dg333@cam.ac.uk.

Supplementary information: Supplementary data are available at Bioinformatics online.
Year: 2017 PMID: 28062448 PMCID: PMC5386138 DOI: 10.1093/bioinformatics/btw763
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Mean execution time for retrieving descendants and ancestors for individual terms in the Human Phenotype Ontology
| Package | Descendants (ms) | Ancestors (ms) |
|---|---|---|
| ontoCAT | 11.99 | 12.75 |
| ontologyIndex | 0.38 | 0.14 |
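One way to make ancestor and descendant queries cheap is to precompute both relations once, so that each later query is a lookup rather than a graph traversal. The sketch below illustrates that idea in Python under stated assumptions: the ontology is given as a mapping from each term to its direct parents, the term IDs (`T:root`, `T:a`, ...) and the `build_index` helper are hypothetical, and this is not ontologyIndex's own (R) implementation.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy ontology: each term ID maps to its direct parents (a DAG).
PARENTS: Dict[str, List[str]] = {
    "T:root": [],
    "T:a": ["T:root"],
    "T:b": ["T:root"],
    "T:c": ["T:a", "T:b"],
    "T:d": ["T:c"],
}

Index = Dict[str, FrozenSet[str]]


def build_index(parents: Dict[str, List[str]]) -> Tuple[Index, Index]:
    """Precompute ancestor and descendant sets for every term, once.

    Both relations are reflexive here: a term counts as its own ancestor
    and its own descendant (an assumed convention).
    """
    anc: Dict[str, FrozenSet[str]] = {}

    def visit(term: str) -> FrozenSet[str]:
        # Memoized depth-first walk up the DAG: each term is resolved once.
        if term not in anc:
            closure: Set[str] = {term}
            for parent in parents[term]:
                closure |= visit(parent)
            anc[term] = frozenset(closure)
        return anc[term]

    for term in parents:
        visit(term)

    # Invert the ancestor relation to obtain descendant sets.
    desc: Dict[str, Set[str]] = {term: set() for term in parents}
    for term, term_ancestors in anc.items():
        for a in term_ancestors:
            desc[a].add(term)
    return anc, {term: frozenset(s) for term, s in desc.items()}


# After indexing, each query is a single dictionary lookup.
ANC, DESC = build_index(PARENTS)
print(sorted(ANC["T:d"]))   # ['T:a', 'T:b', 'T:c', 'T:d', 'T:root']
print(sorted(DESC["T:a"]))  # ['T:a', 'T:c', 'T:d']
```

The trade-off in this sketch is memory (one stored set per term) in exchange for query time that no longer depends on the depth of the ontology.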
Fig. 1. Plot, produced with ontologyPlot, of the terms descending from the cellular_component term in the GO (extracted using the exclude_descendants function from ontologyIndex) for the genes QPCTL and CRNN. The left panel shows the full set of ancestral terms used in annotating the genes, while the right panel shows only those that remain after remove_uninformative_terms has been called. Terms annotated to both genes, either implicitly or explicitly, are shown in light blue, while those annotated only to QPCTL or only to CRNN are shown in green and purple, respectively. Node size is proportional to each term's information content (i.e. its negative log frequency) with respect to the gene annotations downloaded from the GO website.
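The caption relies on two ideas: an object is implicitly annotated with every ancestor of its explicit terms, and a term's information content is the negative log of the fraction of annotated objects carrying it. The Python sketch below illustrates both under stated assumptions: the toy ontology, the genes `G1` to `G4` and their annotations, and the helpers `ancestors`, `propagate` and `information_content` are all hypothetical, and the sketch is not the ontologyIndex or ontologyPlot code.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy ontology (term -> direct parents) and gene annotations.
PARENTS: Dict[str, List[str]] = {
    "T:root": [], "T:a": ["T:root"], "T:b": ["T:root"],
    "T:c": ["T:a", "T:b"], "T:d": ["T:c"],
}
ANNOTATIONS: Dict[str, List[str]] = {
    "G1": ["T:d"], "G2": ["T:a"], "G3": ["T:b"], "G4": ["T:c"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term plus all its ancestors by walking parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def propagate(terms: List[str]) -> Set[str]:
    """Explicit annotations plus every implied (ancestral) term."""
    out: Set[str] = set()
    for t in terms:
        out |= ancestors(t)
    return out


def information_content(annotations: Dict[str, List[str]]) -> Dict[str, float]:
    """IC(t) = -log(n_t / N) = log(N / n_t), counting implicit annotations."""
    n = len(annotations)
    counts: Dict[str, int] = {}
    for terms in annotations.values():
        for t in propagate(terms):
            counts[t] = counts.get(t, 0) + 1
    return {t: math.log(n / c) for t, c in counts.items()}


ic = information_content(ANNOTATIONS)
print(round(ic["T:d"], 3))  # 1.386 (1 of 4 genes carries T:d)

# Shared versus gene-specific terms, as coloured in the figure.
g1, g2 = propagate(ANNOTATIONS["G1"]), propagate(ANNOTATIONS["G2"])
print(sorted(g1 & g2))  # ['T:a', 'T:root']
print(sorted(g1 - g2))  # ['T:b', 'T:c', 'T:d']
```

Writing IC as `log(N / n_t)` rather than `-log(n_t / N)` is only a presentational choice: it is the same quantity but avoids printing `-0.0` for the root term.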
Execution times for computing pairwise similarity matrices for 1000 randomly selected GO terms and for 100 randomly selected sets of gene GO annotations, using Lin's expression for term similarity
| Package | Term similarity (s) | Gene similarity (s) |
|---|---|---|
| GOSim | 1075.43 | 298.34 |
| GOSemSim | 1.71 | 116.72 |
| ontologySimilarity | 0.31 | 0.06 |
| ontologySimilarity (indexed) | 0.04 |
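Lin's expression scores two terms as twice the information content of their most informative common ancestor (MICA), divided by the sum of their own information contents: sim(s, t) = 2 IC(MICA) / (IC(s) + IC(t)). The Python sketch below computes it on a hypothetical toy ontology and then combines term scores into a gene-level score. The best-match-average rule, the helper names (`lin`, `best_match_average`), the genes and the zero-IC convention are assumptions made for illustration; they are not taken from ontologySimilarity, GOSim or GOSemSim.

```python
import math
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology (term -> direct parents) and gene annotations.
PARENTS: Dict[str, List[str]] = {
    "T:root": [], "T:a": ["T:root"], "T:b": ["T:root"],
    "T:c": ["T:a", "T:b"], "T:d": ["T:c"],
}
ANNOTATIONS: Dict[str, List[str]] = {
    "G1": ["T:d"], "G2": ["T:a"], "G3": ["T:b"], "G4": ["T:c"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term plus all its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content() -> Dict[str, float]:
    """IC(t) = log(N / n_t) over annotations propagated to ancestors.

    Assumes every term is annotated at least once (true for this toy data).
    """
    counts: Dict[str, int] = {}
    for terms in ANNOTATIONS.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] = counts.get(t, 0) + 1
    n = len(ANNOTATIONS)
    return {t: math.log(n / c) for t, c in counts.items()}


IC = information_content()


def lin(s: str, t: str) -> float:
    """Lin's similarity: 2 * IC(MICA) / (IC(s) + IC(t))."""
    # MICA = the shared ancestor with the highest information content.
    mica_ic = max(IC[a] for a in ancestors(s) & ancestors(t))
    denom = IC[s] + IC[t]
    # Assumed convention: two zero-IC terms (e.g. the root) score 0.
    return 2 * mica_ic / denom if denom > 0 else 0.0


def best_match_average(
    x: List[str], y: List[str], sim: Dict[Tuple[str, str], float]
) -> float:
    """Assumed combining rule: mean of best row matches and best column matches."""
    rows = [max(sim[(s, t)] for t in y) for s in x]
    cols = [max(sim[(s, t)] for s in x) for t in y]
    return (sum(rows) / len(rows) + sum(cols) / len(cols)) / 2


# Precompute the term-by-term matrix once; gene comparisons then only do lookups.
TERMS = sorted(IC)
TERM_SIM = {(s, t): lin(s, t) for s, t in product(TERMS, TERMS)}

print(round(lin("T:d", "T:c"), 3))  # 0.667
print(round(best_match_average(ANNOTATIONS["G1"], ANNOTATIONS["G4"], TERM_SIM), 3))  # 0.667
```

Precomputing `TERM_SIM` separates the cost of term similarity from the cost of comparing annotation sets, which is one plausible reason a gene-level comparison can be much cheaper than recomputing term scores for every pair of genes.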