Rebecca Jackson, Nicolas Matentzoglu, James A Overton, Randi Vita, James P Balhoff, Pier Luigi Buttigieg, Seth Carbon, Melanie Courtot, Alexander D Diehl, Damion M Dooley, William D Duncan, Nomi L Harris, Melissa A Haendel, Suzanna E Lewis, Darren A Natale, David Osumi-Sutherland, Alan Ruttenberg, Lynn M Schriml, Barry Smith, Christian J Stoeckert, Nicole A Vasilevsky, Ramona L Walls, Jie Zheng, Christopher J Mungall, Bjoern Peters.
Abstract
Biological ontologies are used to organize, curate and interpret the vast quantities of data arising from biological experiments. While this works well when using a single ontology, integrating multiple ontologies can be problematic, as they are developed independently, which can lead to incompatibilities. The Open Biological and Biomedical Ontologies (OBO) Foundry was created to address this by facilitating the development, harmonization, application and sharing of ontologies, guided by a set of overarching principles. One challenge in reaching these goals was that the OBO principles were not originally encoded in a precise fashion, and interpretation was subjective. Here, we show how we have addressed this by formally encoding the OBO principles as operational rules and implementing a suite of automated validation checks and a dashboard for objectively evaluating each ontology's compliance with each principle. This entailed a substantial effort to curate metadata across all ontologies and to coordinate with individual stakeholders. We have applied these checks across the full OBO suite of ontologies, revealing areas where individual ontologies require changes to conform to our principles. Our work demonstrates how a sizable, federated community can be organized and evaluated on objective criteria that help improve overall quality and interoperability, which is vital for the sustenance of the OBO project and towards the overall goals of making data Findable, Accessible, Interoperable, and Reusable (FAIR). Database URL: http://obofoundry.org/
Year: 2021 PMID: 34697637 PMCID: PMC8546234 DOI: 10.1093/database/baab069
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Illustration of the principles around which the OBO Foundry was built.
Minimal ontology metadata captured in the OBO registry
| Field | Definition | Example | Automated validation |
|---|---|---|---|
| title | Full name of the ontology | Ontology for Biomedical Investigations | Must be present |
| id | Abbreviation of the ontology’s name used as the exclusive namespace for the ontology | obi | Lowercase, alphanumeric, no spaces |
| homepage | Website where a user can find information about the ontology | | Must be in a URL format |
| contact.label | Name of a person responsible for the ontology | Bjoern Peters | Must not contain ‘@’ and only have one label |
| contact.email | Email address of the person responsible for the ontology | bpeters@lji.org | Must be in email format and only have one email |
| products.id | Name of the canonical ontology file (id.owl). Additional products may include ontology subsets, bridge files, etc. | obi.owl | Format enforced |
| description | Concise free text description of the scope of the ontology | An integrated ontology for the description of life science and clinical investigations | Must be present |
| license.label | Name of license | CC-BY 4.0 | Must correspond to the title of the license.url |
| license.url | URL of license | | Must be in a URL format |
| activity status | Indicates the development status of the ontology | active | Must be one of ‘active’, ‘inactive’ or ‘orphaned’ |
| obsoletion status | Indicates if the ontology has been declared obsolete by the original developers | false | Must be either ‘true’ or ‘false’ |
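
The validation rules in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the Foundry's actual implementation (which, per the principles table, relies on JSON Schema). The nested dictionary layout, the key names `activity_status` and `is_obsolete`, the function names and the placeholder homepage URL are all assumptions made for the example. The check that `license.label` corresponds to `license.url` is omitted because the source does not list the accepted label/URL pairs.

```python
import re
from urllib.parse import urlparse

# Patterns mirroring the "Automated validation" column (simplified assumptions).
ID_PATTERN = re.compile(r"[a-z0-9]+")               # lowercase, alphanumeric, no spaces
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ACTIVITY_STATUSES = {"active", "inactive", "orphaned"}


def is_url(value):
    """Return True if value looks like an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_entry(entry):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for field in ("title", "description"):
        if not entry.get(field):
            problems.append(f"{field}: must be present")
    ont_id = entry.get("id")
    if not isinstance(ont_id, str) or not ID_PATTERN.fullmatch(ont_id):
        problems.append("id: must be lowercase alphanumeric with no spaces")
    if not is_url(entry.get("homepage")):
        problems.append("homepage: must be in a URL format")
    contact = entry.get("contact") or {}
    label = contact.get("label")
    if not isinstance(label, str) or not label or "@" in label:
        problems.append("contact.label: must be a single name without '@'")
    email = contact.get("email")
    if not isinstance(email, str) or not EMAIL_PATTERN.fullmatch(email):
        problems.append("contact.email: must be a single email address")
    products = entry.get("products") or []
    # Assumed reading of "format enforced": the canonical product is <id>.owl.
    if not any(isinstance(p, dict) and p.get("id") == f"{ont_id}.owl" for p in products):
        problems.append("products.id: canonical <id>.owl product is missing")
    license_info = entry.get("license") or {}
    if not license_info.get("label"):
        problems.append("license.label: must be present")
    if not is_url(license_info.get("url")):
        problems.append("license.url: must be in a URL format")
    if entry.get("activity_status") not in ACTIVITY_STATUSES:
        problems.append("activity_status: must be active, inactive or orphaned")
    if not isinstance(entry.get("is_obsolete"), bool):
        problems.append("is_obsolete: must be true or false")
    return problems


# Example values are taken from the table; the homepage is a placeholder.
EXAMPLE_ENTRY = {
    "title": "Ontology for Biomedical Investigations",
    "id": "obi",
    "homepage": "https://example.org/obi",  # hypothetical, not the real homepage
    "contact": {"label": "Bjoern Peters", "email": "bpeters@lji.org"},
    "products": [{"id": "obi.owl"}],
    "description": "An integrated ontology for the description of life science "
                   "and clinical investigations",
    "license": {"label": "CC-BY 4.0",
                "url": "https://creativecommons.org/licenses/by/4.0/"},
    "activity_status": "active",
    "is_obsolete": False,
}

if __name__ == "__main__":
    print(validate_entry(EXAMPLE_ENTRY))
```

Returning a list of problems rather than raising on the first failure suits a dashboard-style report, where every violation for an ontology is shown at once.
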
OBO Foundry principles and their automated checks
| Principle | Summary | Automated check |
|---|---|---|
| Open | The ontology MUST be openly available to be used by all without any constraint other than (a) its origin must be acknowledged and (b) it is not to be altered and subsequently redistributed in an altered form under the original name or with the same identifiers. | The registry data entry is validated with JSON Schema. The license schema ensures that a license entry is present and that the entry has a URL and label. The schema also checks that the license is one of the CC0 or CC-BY licenses. Then, the ‘dcterms:license’ annotation is retrieved from the ontology, if it exists. The script ensures that the correct ‘dcterms:license’ property is used and compares this license to the registry license to ensure that they are the same. Note that many ontologies currently fail this check due to discrepancies between the ontology file and the registry metadata, but we still require an ontology to conform to this principle in order to join the OBO Foundry. |
| Common Format | The ontology is made available in a common formal language in an accepted concrete syntax. | The ontology is loaded using OWLAPI. If the ontology is successfully loaded, it is assumed that it is in a good format. |
| URI/Identifier Space | Each class and relation (property) in the ontology must have a unique URI identifier that follows the format: a base URI + a prefix that is unique within the Foundry + a local numeric identifier. | All entity IRIs are retrieved from the ontology, excluding annotation properties. Annotation properties may use hashtags and words due to legacy OBO conversions for subset properties. All other IRIs are checked if they are in the ontology’s namespace. If the IRI begins with the ontology namespace, the next character must be an underscore. The IRI is also compared to a regex pattern to check if the local ID after the underscore is numeric. |
| Versioning | The ontology provider has documented procedures for versioning the ontology, and different versions of the ontology are marked, stored and officially released. | The version IRI is retrieved from the ontology and, if found, this IRI is compared to a regex pattern to determine if it is in date format. |
| Scope | The scope of an ontology is the extent of the domain or subject matter it intends to cover. The ontology must have a clearly specified scope and content that adheres to that scope. | First, the registry data is checked for a ‘domain’ tag. If it is present, the domain is compared to all other ontology domains. If the ontology shares a domain with one or more other ontologies, we return a list of those ontologies. |
| Textual Definitions | The ontology has textual definitions for the majority of its classes and for top-level terms in particular. | ROBOT ‘report’ is run over the ontology. A count of violations for each of the following checks is retrieved from the report object: duplicate definition, multiple definitions and missing definition. The ROBOT report will warn on any and all missing definitions, so in order to pass this check, all terms in an ontology must have distinct textual definitions. |
| Relations | Relations should be reused from the Relations Ontology (RO). | The object and data properties from the ontology are compared to existing RO properties. |
| Documentation | The owners of the ontology should strive to provide as much documentation as possible. The documentation should detail the different processes specific to an ontology life cycle and target various audiences (users or developers). | The registry data is checked for ‘homepage’ and ‘description’ entries. If the homepage is present, the URL is checked to see if it resolves (does not return an HTTP status of 400 or greater). |
| Documented Plurality of Users | The ontology developers should document that the ontology is used by multiple independent people or organizations. | The registry data is checked for ‘usages’ entries. |
| Commitment to Collaboration | OBO Foundry ontology development, in common with many other standards-oriented scientific activities, should be carried out in a collaborative fashion. | N/A—this cannot be automated at this time. This principle does not appear in any dashboard result. |
| Locus of Authority | There should be one person responsible for communications between the community and the ontology developers, for communicating with the Foundry on all Foundry-related matters, for mediating discussions involving maintenance in the light of scientific advance and for ensuring that all user feedback is addressed. | The registry data entry is validated with JSON Schema to ensure that a contact entry is present and that the entry has a name and email address. |
| Naming Conventions | Each entity within the ontology must have a unique label and must not have more than one label. All labels should be declared using the ‘rdfs:label’ property. | ROBOT ‘report’ is run over the ontology. A count of violations for each of the following checks is retrieved from the report: duplicate label, multiple labels and missing label. |
| Maintenance | The ontology needs to reflect changes in scientific consensus to remain accurate over time. | A version Internationalized Resource Identifier (IRI) is retrieved from the ontology and checked against a regex pattern to determine if it is in date format. If so, the date is retrieved to ensure that the ontology is updated in a timely manner. While regular releases are a good indicator of maintenance, we realize that this does not necessarily mean that the ontology is up to date with scientific consensus. At this time, we do not have the methods to fully validate this principle as it is written. |
| Responsiveness | The ontology developers must offer a channel for community participation in the form of suggestions and requests. | The registry data is checked for a ‘tracker’ entry. |
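
Two of the checks above, URI/Identifier Space and the date-based Versioning and Maintenance checks, reduce to pattern matching and can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the dashboard's code: the function names, the regex patterns, the use of `http://purl.obolibrary.org/obo/` as the base URI, and the three-year freshness threshold are all choices made for the example, since the source does not specify them.

```python
import re
from datetime import date

# Assumed base URI for OBO identifiers.
OBO_BASE = "http://purl.obolibrary.org/obo/"

# Assumed pattern: a YYYY-MM-DD path segment somewhere in the version IRI.
DATE_IN_IRI = re.compile(r"/(\d{4}-\d{2}-\d{2})/")


def check_iri(iri, namespace):
    """Classify an entity IRI against an ontology namespace.

    Returns "external" if the IRI does not begin with the namespace,
    "ok" if the namespace is followed by an underscore and a numeric
    local ID, and "error" otherwise.
    """
    prefix = OBO_BASE + namespace
    if not iri.startswith(prefix):
        return "external"
    rest = iri[len(prefix):]
    return "ok" if re.fullmatch(r"_\d+", rest) else "error"


def version_date(version_iri):
    """Return the release date embedded in a version IRI, or None."""
    match = DATE_IN_IRI.search(version_iri or "")
    if match is None:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:  # digits in date shape but not a real date, e.g. 2021-13-40
        return None


def is_maintained(version_iri, today, max_age_days=3 * 365):
    """Return True if the version date exists and is recent enough.

    The default threshold of about three years is an assumption.
    """
    released = version_date(version_iri)
    return released is not None and (today - released).days <= max_age_days


if __name__ == "__main__":
    print(check_iri(OBO_BASE + "OBI_0000070", "OBI"))
    print(version_date(OBO_BASE + "obi/2021-04-06/obi.owl"))
```

Passing `today` explicitly keeps the maintenance check deterministic, which makes it straightforward to compare dashboard runs made on different dates.
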
OBO Foundry principles and their GitHub issues for discussion of automated validation
| Principle | Automated validation GitHub issue |
|---|---|
| Open | |
| Common Format | |
| URI/Identifier Space | |
| Versioning | |
| Scope | |
| Textual Definitions | |
| Relations | |
| Documentation | |
| Documented Plurality of Users | |
| Commitment to Collaboration | N/A—this principle cannot be automatically validated at this time |
| Locus of Authority | |
| Naming Conventions | |
| Maintenance | |
| Responsiveness | |
Figure 2. The OBO dashboard (truncated). The rows represent OBO ontologies (of which the first 15 in alphabetical order are shown here) and the columns are the OBO principles. The final column, ‘Summary’, shows whether the ontology passed all of the tests. Clicking on the ontology ID in the far left column directs to a detailed report page.
Figure 3. Number of errors reported by the dashboard on 11 November 2019 (blue bars) and 15 July 2020 (gray bars). The final column, ‘Ontologies with Errors’, is the total number of ontologies that had one or more errors, not a count of all errors. While more ontologies joined the OBO Foundry between these two dates, we only included statistics for the 223 ontologies that were present and active in both the first run and the second run. The automated checks remained the same during this time period.
Figure 4. Summary of principle conformance across all active OBO Foundry ontologies in May 2021.