Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier.

Lorena Endara¹, Anne E Thessen², Heather A Cole³, Ramona Walls⁴, Georgios Gkoutos⁵,⁶, Yujie Cao⁷, Steven S Chong⁸,⁹, Hong Cui⁹.

Abstract

Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called "modifiers". With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier (for example, "usually" is more frequent than "rarely") in order to allow data annotated with ontology terms to be classified accordingly.

Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies.

Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using "broader synonym" or "not recommended" annotation properties. These annotations explicitly make the user aware of the semantic ambiguity associated with the terms and of whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement, suggesting that differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies.

Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concept alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.

Keywords: Modifier Ontology; certainty modifiers; coverage modifiers; degree modifiers; frequency modifiers; literary warrant; phenotype modifiers; user consensus; user warrant

Year: 2018 PMID: 30532623 PMCID: PMC6281706 DOI: 10.3897/BDJ.6.e29232

Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828

Introduction

Despite the development and use of sensor technology in biomedical domains and applications, phenotypic character descriptions published in the literature remain an indispensable resource for ecological and systematics research. Anatomical and quality ontologies have been developed to support the curation workflows that aim to convert narrative phenotypical characters to ontological statements for cross-taxon inferences and computation. Uber-anatomy Ontology (UBERON), Hymenoptera Anatomy Ontology (HAO), and the Plant Ontology (PO) are some examples of anatomical ontologies that contain anatomical structure terms and their relationships (Cooper et al. 2013, Yoder et al. 2010, Mungall et al. 2016). The Phenotypic Quality Ontology (PATO) is a taxon-neutral quality ontology that treats character and character value terms (Gkoutos et al. 2017, Gkoutos et al. 2004). These ontologies are often used by EQ-based approaches, where Entity and Quality are post-composed to create an ontological statement for a character (Gkoutos et al. 2009, Dahdul et al. 2018a). Other phenotype ontologies, such as the Flora Phenotype Ontology or FLOPO (Hoehndorf et al. 2016), have also been developed to include complete characters. Modifier terms are used widely in phenotypic character descriptions but have not been treated formally in an ontology. Hagedorn (2007) provided a good definition for phenotype character modifiers: A modifier is a unit of information that adds detail (or constraints) to the statement to which it is applied. When the modifier information is ignored, the original statement must retain a substantial, albeit more general meaning. A modifier may be applied to statements already modified. Modifiers themselves are constrained by a terminology. Further, Hagedorn comprehensively summarized the existing studies and arrived at a modifier taxonomy, consisting of 11 groups of modifiers. 
In this work, we attempt to construct modifier ontologies that treat four groups of modifiers that have general usage across many characters and share an implied order among their terms; for example, “rarely” is less frequent than “often”, and “perhaps” is less certain than “clearly”. This sequential relationship is the key semantics we would like to capture in the modifier ontologies because it allows a computer to determine (1) how to compare modifiers semantically, (2) when to inherit a character from a family level description to a genus level, and (3) how to use modifiers in an identification key application.
We propose two alternative approaches to constructing a modifier ontology and discuss the tradeoffs between the two. Both approaches are grounded in a set of modifier words extracted from 30 volumes of Flora of North America (Flora of North America Editorial Committee 1993), the Flora of China (Flora of China Editorial Committee 1994), and a large number of taxonomic publications (ca. 21,000 treatments) on ants, algal fossils, and other taxon groups.

Related work

While a standard formula for building ontologies is yet to be proposed, the Z39.19 National Standard for Monolingual Controlled Vocabulary Construction (Z39.19-2005; NISO (National Information Standards Organization) 2010) laid out the fundamental principles for controlled vocabularies, which apply equally well to ontology building. These principles are “eliminating ambiguity, controlling synonyms, establishing relationships among terms where appropriate, [and] testing and validation of terms” (NISO (National Information Standards Organization) 2010, p. 12). In addition, the OBO Foundry Principles provide a set of guidelines that OBO Foundry ontologies are expected to follow, covering aspects ranging from ontology content, such as definitions and relations (mostly under development), to ontology management (Smith et al. 2007). The Basic Formal Ontology (BFO, Arp et al. 2015; accessed 4/18/2018) provides a genuinely domain-independent upper ontology that differentiates a number of fundamental concepts that are useful to guide the development of many ontologies. Within the BFO framework, character modifiers would fall under the Specifically Dependent Continuant > Quality class. PATO is a taxon-neutral quality ontology.
Hagedorn’s dissertation (Hagedorn 2007) distinguishes the following groups of modifiers:
1. Spatial modifiers (p. 203, also called “location” or “topological” modifiers) indicate a location where a character appears. For example, “at the base”.
2. Temporal modifiers (p. 204) indicate a time when a character appears. For example, “when old”.
3. Method modifiers (p. 205) indicate the method that is used to generate or observe a character, for example, “in alcohol” and “under hand-lens”.
4. Frequency modifiers (p. 206) indicate the probability of observing a true statement, for example, “usually”, “occasionally”, and “rarely”.
5. Certainty modifiers (p. 207) indicate the probability of a statement being true, for example, “perhaps”, “probably”, “likely”, and “certainly”.
6. Approximation modifiers (p. 209), a kind of certainty modifier, indicate the degree of inaccuracy of a reported value. For example, “ca.”, “approximately”, “about”, and “roughly”.
7. Modifiers hinting misinterpretation (p. 209) indicate that a stated character is the result of misinterpretation. For example, “by misinterpretation”.
8. Negation modifiers (p. 211) indicate a negation of a stated character. For example, “not red”.
9. State modifiers (p. 212) modify the quality.
10. Reliability modifiers (p. 213) indicate the suitability of a character for the purpose of taxon identification.
11. Other modifiers (p. 214).
The modifier taxonomy proposed in Over the course of the past ten years, many ontology design patterns have been proposed (e.g.,

Material and methods

Define the Scope

Ontologies concerning Categories 1-3 in Hagedorn’s taxonomy have been developed or are under development, for example, the Biological Spatial Ontology (BSPO, Dahdul et al. 2014), the Measurement Method Ontology (Shimoyama et al. 2012), and the Experimental Condition Ontology (Shimoyama et al. 2012). Categories 7 and 10 are defined solely for the purpose of taxon identification and consist of a closed set of system-defined terms. These categories are out of the scope of the modifier ontology, which focuses on groups of modifiers that have general usage across many characters and are sequentially related to one another. The negation modifiers (Category 8) were also excluded because negations can be handled with the logical NOT operator. Category 9 derives more specific states from a base state, and most such modifiers are character dependent; for example, “dull” can only modify color characters or the sharpness of some edges. However, a subset of the state modifiers, degree modifiers, does have general applicability. Based on this analysis, the scope of our modifier ontologies covers Frequency, Certainty, Degree, and Coverage modifiers (defined below). Coverage modifiers were added after reviewing the candidate terms extracted from a wide range of taxonomic descriptions.
Frequency modifiers: the probability of observing a quality.
Certainty modifiers: the probability of a quality being true.
Degree modifiers: the measure or intensity of a quality, ranging from the minimal to extremely intense.
Coverage modifiers: the spatial extent or scope of a quality, ranging from very sparse coverage to complete coverage of an entity.

Data Collection

Following the literary warrant principle (ANSI/NISO (National Information Standards Organization) 2010), we intended for the modifier ontology to include modifier terms used in published taxonomic descriptions. CharaParser (Cui 2012), now a part of the Explorer of Taxon Concepts web toolkit (Cui et al. 2016), was used to parse taxonomic descriptions and extract modifiers from a variety of taxonomic publications (https://www.dropbox.com/sh/msnqb0aqjgwlgaw/AAA-jUfSq14vrnM-AgKSjd49a?dl=0), covering ants, diatoms, plants, and fungi. CharaParser marks up biological entities, characters, relationships, and modifiers in taxonomic descriptions. A few thousand unique modifier terms/phrases were extracted and, after a manual review of these extracted phrases, 130 unique, one-word modifiers within the scope defined above were selected. Multiple-word phrases or expressions were not considered in this work to limit its scope.
We observed that the modifier terms were ordinal values. To express the sequential relationships among the terms of each modifier type, two inverse and transitive properties were needed in the ontology: http://www.obofoundry.org/ontology/ro.html, accessed 5/27/2018), but the former not only takes out the possibility of inserting an intermediate node between two existing nodes, it further equalizes the distances between any adjacent nodes to “one level”. Consumers of the ontology may define the level based on their specific needs.
In applying the list pattern to build the modifier ontologies, we have the choice of keeping the list open or making it closed. An ontology was implemented with each of the two approaches. The open list approach does not limit the size of the list. Similar to the open list approach, in the closed list approach, each modifier type is modeled as a list. However, a closed list has a fixed size, where new modifier terms can only be added as synonyms to some existing nodes (terms) in the list. An open list allows new nodes (i.e., classes) to be inserted anywhere in the list, causing a shift of the relative positions of existing nodes; for example, when a new node 5 is inserted, the original node 5 becomes node 6. Both approaches have desirable and undesirable consequences.
An open list is more flexible because not only can new types be easily added as a new list, but new modifier terms can also be added either as a class or a synonym. An open list is not suitable for modeling interval values because when a new term is added as a class, it changes the positions of all the nodes after the insertion point and therefore the relative positions of affected nodes to all other nodes. This changes the semantic distance between affected nodes.
A closed list is a better fit for modeling interval values because the length of the list (the total semantic range) and the position of the nodes in the list are fixed. This fixed structure makes it easy to define the nodes as disjoint classes and to define a list to include only the given classes. This, in effect, creates a “closed world”, making it possible for the machine to classify an unknown entity (e.g., if an unknown entity is one level preceding node 4 and one level following node 2, then it must be node 3). Such classification reasoning cannot be done with an open list due to the “open world” assumption of OWL ontologies: the unknown entity may be node 3 but it could also be a node that has not yet been defined.
We also note that open lists allow the ontology to be loaded with more nuanced terms (classes) in a list. Users need to be very cautious when using this feature. Many modifier terms only have subtle differences in meaning and these subtle differences are also quite subjective. This creates two major difficulties in maintaining the ontology’s stability and usability. First, ontology curators and ontology users may not share the same understanding of these terms (and human-readable definitions for the terms will not solve this problem). Second, it will be very difficult for different users of the ontologies to use these terms consistently or even for the same users to use these terms consistently over time. The same is true for different curators managing the ontologies.
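The trade-off between the two list designs can be illustrated with a minimal Python sketch. It is not the OWL implementation: plain lists stand in for the ontology lists, the helper names are invented for this illustration, and "hypothetical_new_class" is a made-up class name. Only the bin identifiers (frequency_0 to frequency_100) come from the text.

```python
from typing import List, Optional, Sequence

# Bin identifiers follow the naming used for the frequency modifier type.
FIVE_BINS = ["frequency_0", "frequency_25", "frequency_50",
             "frequency_75", "frequency_100"]


def levels_apart(a: str, b: str, ordered: Sequence[str]) -> int:
    """Number of list positions separating two nodes."""
    return abs(ordered.index(a) - ordered.index(b))


def node_between(lower: str, upper: str, closed: Sequence[str]) -> Optional[str]:
    """Closed-list inference: the only node one level above `lower` and one
    level below `upper`, or None when no such node exists."""
    i, j = closed.index(lower), closed.index(upper)
    return closed[i + 1] if j - i == 2 else None


def insert_after(open_list: List[str], existing: str, new_node: str) -> List[str]:
    """Open-list extension: return a copy with `new_node` placed right after
    `existing`; every later node shifts by one position."""
    pos = open_list.index(existing) + 1
    return open_list[:pos] + [new_node] + open_list[pos:]


# Closed list: distances are stable and an unknown middle node can be inferred.
closed_gap = levels_apart("frequency_50", "frequency_75", FIVE_BINS)
inferred = node_between("frequency_25", "frequency_75", FIVE_BINS)

# Open list: inserting a class changes the distance between existing nodes.
# "hypothetical_new_class" is an invented name, used only to show the shift.
extended = insert_after(FIVE_BINS, "frequency_50", "hypothetical_new_class")
open_gap = levels_apart("frequency_50", "frequency_75", extended)

print(closed_gap, inferred, open_gap)
```

In this sketch the distance between frequency_50 and frequency_75 grows once a class is inserted between them, which is the instability that makes an open list a poor fit for interval values.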
We implemented two modifier ontologies, one with each approach, because both flexibility and stronger machine reasoning capability seem to be important. Users should decide which implementation better meets their needs.

Term Categorization Consensus

Both open and closed list ontologies need to start by crystallizing the sequential relationships among the available terms for a modifier type. To reveal experts’ shared understanding of modifier terms, five bins were created for each of the four modifier types. For example, for the frequency modifiers, the five bins are frequency_0, frequency_25, frequency_50, frequency_75, and frequency_100. The number five was selected to strike a balance between the need to differentiate a good number of levels in each type of modifier and the requirement for intuitive and consistent categorization of the terms by the users. The three leading co-authors and the corresponding author categorized the 130 terms into 20 bins (5 bins for each type of modifier) using OTO (Huang et al. 2015). Since the terms are on the ordinal scale, the experts were not given numerical ranges for the bins but were instructed to simply categorize the terms based on their intuition, for example: do you feel “sometimes” is more similar to 50% frequency or 75% frequency? OTO supports multi-user categorization of terms and synonyms and records all user decisions and comments. It also allows the user to put the same term into multiple bins (Fig. 3). After independent categorization of the terms, the experts met virtually and finalized the categorization.

Figure 3.

OTO Group Terms User Interface.

Terms to be categorized are in the Terms panel on the left, and the bins are shown in the Categories panel on the right. The source sentences where terms were used are shown in the Context tab in the lower panel. The user drags and drops a term into a bin. The red circle next to a term indicates that users have made different categorization decisions on the term; clicking on the red circle shows the different decisions in a pop-up window. Synonyms of a term are shown with an indent below their preferred term. If a term is put into multiple bins, a numerical index is attached to the term to create copies of the term. The term set used in this study is "modifiers_cui_11170858" on OTO, accessible to any OTO registered user.

Ontology Construction

After the terms were categorized and the categorization was reviewed and discussed by the experts, Protege was used to implement the ontologies. Following the user warrant principle (NISO (National Information Standards Organization) 2010), expert consensus on term categorization forms the basis for constructing the ontologies (Tables 1, 2, 3, 4). The following scheme was used to construct a base ontology to which different data properties were then added to create the open list and the five-bin ontologies:

Table 1.

Frequency, certainty, degree, and coverage modifiers with complete consensus among four experts. Proposed labels are in bold. Expert contributed terms are in quotation marks.

frequency_0	frequency_25	frequency_50	frequency_75	frequency_100
never	infrequently, occasionally, seldom, uncommonly, rarely	sometimes	frequently, often, regularly, usually	always, consistently
certainty_0	certainty_25	certainty_50	certainty_75	certainty_100
“uncertain”, “unclearly”, “doubtfully”	perhaps, possibly	presumably, seemingly	approximately, nearly	decidedly, definitely, distinctly, effectively, essentially, evidentially, evidently, fundamentally, obviously, patently, readily, truly, undoubtedly, virtually
degree_0	degree_25	degree_50	degree_75	degree_100
inconspicuously, imperceptibly, “unnoticeably”	barely, faintly, feebly, gently, hardly, lightly, merely, obscurely, scarcely, slightly, subtly	moderately, relatively, modestly	appreciably, considerably, greatly, highly, much, particularly, profoundly, significantly, strongly, very, noticeably, visibly	boldly, conspicuously, prominently, extremely, exceedingly, enormously, exceptionally, extraordinarily, grossly
coverage_0	coverage_25	coverage_50	coverage_75	coverage_100
	sparsely, sparingly		“densely”	entirely, throughout, uniformly

Table 2.

Frequency, certainty, degree, and coverage modifiers with type but not bin consensus among four experts.

Terms		Suggested bins
certainty	almost	certainty_100	certainty_75
	apparently	certainty_100	certainty_75
	basically	certainty_100	certainty_75
	practically	certainty_100	certainty_75
	probably	certainty_75	certainty_50	certainty_25
	reportedly	certainty_75	certainty_50
degree	strikingly	degree_100	degree_75
	notably	degree_50	degree_75
	quite	degree_50	degree_75
	rather	degree_50	degree_75
	fairly	degree_50	degree_25
	mildly	degree_50	degree_25
	somewhat	degree_50	degree_25
	sufficiently	degree_50	degree_100
	markedly	degree_100	degree_75

Table 3.

Terms that have bin consensus but not type consensus among four experts.

Term	Frequency	Degree	Certainty	Coverage
chiefly			_75	_75
mainly			_75	_75
primarily			_75	_75
strictly			_100	_100
exclusively			_100	_100
extensively		_75		_75
fully		_100		_100
totally		_100		_100
completely		_100		_100
largely		_75		_75
mostly		_75		_75
partly		_50		_50
partially		_50		_50
indistinctly		_25	_25
vaguely		_25	_25
perfectly		_100	_100	_100
predominantly	_75			_75
prevalently	_75		_75	_75
commonly	_75		_75	_75
typically	_75		_75	_75

Table 4.

Modifier terms with poor consensus on both type and bin, and their treatment in the ontology

Term	Bins the terms were categorized into by different experts					Treatment of the term for the ontology
Term	Frequency	Certainty	Degree	Coverage	Other	Treatment of the term for the ontology
altogether			_100		yes	Colloquial, excluded from ontology. E.g., The black spot altogether absent
casually	_25					State [pattern] modifier, excluded. E.g., Veins regularly or casually anastomosing.
copiously			_75			State [quantity], excluded from ontology. E.g., Petiole copiously glandular when young
dominantly	_75		_75, _100	_75		Included as not recommended. E.g., Cells dominantly solitary, but short chains can be found
eccentrically					yes	Spatial modifier, excluded. E.g., Anthers eccentrically peltate
excessively			_75		yes	Not a character modifier, excluded. E.g., Females excessively rare
generally	_75	_50, _75		_75		Included as not recommended. E.g., head otherwise generally smooth and shining. E.g., branches generally quadrangular
imperfectly			_75	_25		State modifier, excluded. E.g., Rays furcate or imperfectly so. Ovary superior, imperfectly 2-loculed
incompletely			_75			State and other modifier, excluded. E.g., Legumes incompletely 2-locular. E.g., Lamina incompletely 2-pinnate at base. E.g., Scales incompletely cover underlying leaves.
intensely			_75, _100		yes	State [color] modifier, excluded. E.g., Petals intensely violet
intermittently	_50			_25, _50		Included as not recommended. E.g., Sori spreading intermittently along individual veins almost from midrib to margine.
no	_0			_0		Negation, excluded
not	_0		_0			Negation, excluded
powerfully			_100			State [size] modifier, excluded. E.g., Larvae with mandibles powerfully developed for ant larvae
really		_100			yes	Does not modify characters, excluded. E.g., Really 3 convexities exist.
remarkably			_75		yes	Included as not recommended. E.g., Style remarkably exserted.
richly				_100	yes	Coverage and state modifiers, excluded. E.g., Vein richly anastomosing. Stems richly pubescent.
roughly		_50	_50		yes	State and other modifiers. Included as not recommended. E.g., Bark roughly furrowed. Stigma roughly rectangular.
simply					yes	State modifier, excluded. E.g., margin regularly doubly serrate, rarely simply serrate.
unusually			_75		yes	Included as not recommended. E.g., Head unusually small
widely				_100	yes	State modifier, excluded. E.g., Stem leaves widely spaced

Terms with experts’ full agreement on both their type and their bin are considered class label candidates (Table 1). Within the group of terms for each type and bin (e.g., frequency_75, see Table 1), experts selected the one term that best represents the class, and this term becomes the class label. This label is the one that end users are least likely to confuse with other class labels. The rest of the terms become the exact synonyms of the class (oboInOWL#hasExactSynonym). Two exceptions are “throughout” and “uniformly”, categorized under coverage_100. This will be discussed in the Discussion section.
Terms with experts’ full agreement on their type, but not on their bin, are included in the ontology but annotated as “not recommended” (a new annotation), because there is a good chance that the terms will confuse the end users of the ontology. These terms are included in the ontology as “not recommended” to discourage their continued usage in scientific publications (Table 2).
Terms with experts’ full agreement on their bin, but not on their type (Table 3), are included in the ontology as broader synonyms (oboInOWL#hasBroaderSynonym). We follow the best practice of the Plant Ontology Consortium and use broader synonym annotations to indicate that a term is considered a synonym of two or more different classes (Cooper et al. 2013).
Terms without full agreement on either their type or their bin are either included as “not recommended” or excluded from the ontology (Table 4). Informal terms (colloquial terms) are excluded from the ontology. If an ambiguous modifier is deemed to have a high probability of being used, it is included in the ontology as a not recommended term. State modifiers that fell into Category 9 in Hagedorn (2007) were excluded from the ontology as explained in the “Define the Scope” section.
For bins where no terms with full agreement were found, experts contributed terms from their own vocabulary. Descriptive sentences using these terms were then checked in other sources and terms with full expert agreement were included in the ontology. In Table 1, expert-contributed terms are enclosed in quotation marks.
Classes were given a human-readable definition based on their type definition. For example: Frequently (the class label for Frequency_75) is a frequency modifier that indicates around 75% probability of observing a quality.
For the open list ontology, ordinal properties such as more_frequently_than and less_frequently_than were used to indicate the order of the classes in a list. The five-bin implementation of the ontology uses interval properties such as one_level_more_frequently_than and one_level_less_frequently_than. In addition, the five-bin version also uses “only” (as opposed to “some”) existence indicators, disjoint statements, and logical OR operators to make the lists “closed” worlds.
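The consensus-based scheme above can be summarized as a small decision function. The Python sketch below is illustrative only: the function name and return strings are invented, and the input pools the bins reported in Tables 1 to 4 across experts, because the source does not record which expert chose which bin.

```python
from typing import Iterable, Tuple

# One categorization decision: (modifier type, bin level), e.g. ("frequency", 75).
Assignment = Tuple[str, int]


def treatment(assignments: Iterable[Assignment]) -> str:
    """Map the pooled expert decisions for one term to its ontology treatment."""
    pooled = list(assignments)
    types = {modifier_type for modifier_type, _ in pooled}
    levels = {level for _, level in pooled}
    if len(types) == 1 and len(levels) == 1:
        # Full agreement on type and bin (Table 1).
        return "class label or exact synonym"
    if len(types) == 1:
        # Agreement on type only (Table 2).
        return "not recommended"
    if len(levels) == 1:
        # Agreement on bin only (Table 3).
        return "broader synonym"
    # No agreement on type or bin (Table 4): the experts decide case by case.
    return "not recommended or excluded"


examples = {
    "sometimes": [("frequency", 50)],
    "probably": [("certainty", 75), ("certainty", 50), ("certainty", 25)],
    "chiefly": [("certainty", 75), ("coverage", 75)],
    "generally": [("frequency", 75), ("certainty", 50),
                  ("certainty", 75), ("coverage", 75)],
}
decisions = {term: treatment(bins) for term, bins in examples.items()}

for term, decision in decisions.items():
    print(f"{term}: {decision}")
```

The last branch deliberately returns a combined label: for Table 4 terms the choice between inclusion as “not recommended” and exclusion was made by the experts (for example, colloquial terms and state modifiers were excluded), which a rule on bins alone cannot reproduce.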

Results

Modifier terms categorized with full agreement on both modifier type and bin accounted for 57.7% of all categorized terms (Table 1). Phenotype Modifier Ontology (open list) and Phenotype Modifier Ontology (5-bin) were created; each contains 44 classes and 128 terms. The ontologies can be accessed on GitHub. In the current modifier ontologies, a pair of inverse object properties is defined for each type of modifier (e.g., more_frequently_than and less_frequently_than in the open list version, and one_level_more_frequently_than and one_level_less_frequently_than in the five-bin version), as opposed to using one generic object property for all types of modifiers (Fig. 1). We believe this treatment better models reality because one level of frequency can be semantically different from one level of certainty. These object properties are subproperties of the follows/precedes or next item/previous item properties imported from the list pattern.
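The relationship between the one-level (adjacent) properties and the ordinal properties can be pictured with a short Python sketch. It assumes, for illustration only, that more_frequently_than behaves as the transitive closure of the adjacent-bin relation and that less_frequently_than is its inverse; the helper functions are invented here and do not reflect how an OWL reasoner is implemented.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

BINS = ["frequency_0", "frequency_25", "frequency_50",
        "frequency_75", "frequency_100"]

# Adjacent bins only: (x, y) reads "x is one level more frequent than y".
one_level_more: Set[Pair] = {(BINS[i + 1], BINS[i]) for i in range(len(BINS) - 1)}


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def inverse(pairs: Set[Pair]) -> Set[Pair]:
    """Swap subject and object of every pair."""
    return {(b, a) for a, b in pairs}


more_frequently_than = transitive_closure(one_level_more)
less_frequently_than = inverse(more_frequently_than)

print(len(one_level_more), len(more_frequently_than))
print(("frequency_0", "frequency_100") in less_frequently_than)
```

Keeping one such pair of properties per modifier type, rather than one generic pair, means that a frequency ordering and a certainty ordering never mix in the derived relations.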

Figure 1.

List related object properties in Open List and Five-Bin Ontologies

Discussion

An ontology is a conceptual representation of the consensus of a domain. In the modifier domain, we show that there is a level of consensus among the experts: 16 of 20 bins end up holding terms with full agreement. We acknowledge that stronger or weaker consensus might have been obtained had we used a smaller or larger number of bins, respectively. This result suggests that five bins capture a good amount of consensus and a reasonable number of levels most applications need to distinguish within a modifier type. Since the two ontologies share the same set of terms, the consensus gathered from the experts is represented in both. We would like users to decide which ontology works better for their application and it would be interesting to see how the open list ontology evolves with use over time.
In the process of categorizing the terms, Certainty and Degree modifiers were the most difficult to separate among the four types of modifiers. We note that characters that are intense or with great measurements may imply a high certainty of the observation of the character. However, a high certainty does not always correlate with a stronger degree. Based on this observation, terms primarily describing a degree should be categorized as degree and not extended automatically to certainty. For example, authors may have used the words “visibly” and “noticeably” to indicate certainty about characters; however, knowing the ambiguity associated with certainty and degree terms, we need to alert future authors to the difference.
Relatively fewer terms were consistently categorized into Coverage (Table 1). The vast majority (90%) of the terms that had only type disagreement were categorized as Coverage by at least one expert (Table 3). Terms such as “mostly” and “generally” are used frequently in phenotype descriptions, but it was not easy to ascertain what the authors tried to express with the term. 
For example, in “leaves mostly short-petiolate”, was the author trying to say “leaves clearly short-petiolate” (degree), “most leaves short-petiolate” (coverage), or even “leaves usually short-petiolate” (frequency)? Such terms are included in the ontology with an annotation (broader synonym or not recommended) to alert future authors to the ambiguity, in the hope that these terms will not be used. We also considered the term “intermittently” as a potential coverage_50 modifier to fill the empty bin in Table 1, but there was only one usage of the term (Table 4) in over 21,000 descriptions included in this exercise, and the experts could not agree on its meaning. We decided to leave the empty bins for future work.
PATO has a frequency class and also treats degree terms to an extent, but both differ from the modifier ontologies. PATO:frequency (PATO_0000044) is a physical quality of a process, “which inheres in a bearer by virtue of the number of the bearer’s repetitive actions in a particular time”. Based on this definition, PATO:frequency is a quality itself and not a modifier to a quality. Using one example to differentiate the two concepts: a PATO:frequency can be the rate of a heartbeat, say 70 times/min; in contrast, our frequency modifiers describe how often we observe a heartbeat of 70 times/min. Hence, frequency modifiers are conceptually different from PATO:frequency. In our ontologies, we used the label “frequency_modifier” to make the difference clear. PATO employs a consistent pattern of representing the extent of measurable qualities as “decreased”, “increased”, or “normal”, for example, increased degree of illumination, decreased length. This is one way to bring out the degree semantics of a quality by referring to an implied normal value. The treatment of degree modifiers in the modifier ontologies does not refer to any norm and only attempts to represent the ranges of the degree for a quality. 
The concept of modifiers is also used in the Human Phenotype Ontology (Köhler et al. 2016), as reflected in the "Clinical modifier" and "Frequency" classes. The HPO:Frequency class is similar to our Frequency modifiers in that it bins frequency into a number of ranges: Excluded (0% of the cases), Very rare (1-4%), Occasional (5-29%), Frequent (30-79%), Very frequent (80-99%), and Obligate (100%). The HPO:Frequency class is not applicable to our application for several reasons: (1) The class labels (e.g., excluded, obligate) are not terms used by the majority of taxonomists. We believe meaningful class labels are critical to the usability of an ontology. (2) Due to the broad range of taxon groups we need to cover, precise ranges of percentages of the cases are not going to be applicable to all groups. (3) It is very unlikely for various taxon groups to record and compute the percentage of cases for an undefined number of characters they may care about. The HPO:Clinical modifier class holds the subclasses "Aggravated by", "Ameliorated by", "Pain characteristic", "Phenotypic variability", "Position", "Refractory", "Severity", and "Triggered by". All but "Severity" are disjoint from the types of modifiers that we treat in the modifier ontologies. HPO:Severity overlaps with the Degree modifiers, but it holds subclasses that are applicable to clinical settings: Borderline, Mild, Moderate, Severe, and Profound. While these ontologies recognize the need to treat modifiers separately and observe sequential relations among the terms, another key difference between the treatment of modifiers in HPO and our ontology construct is that the two Modifier Ontologies we created have clear logical definitions that order the terms forming a range, while HPO only has human-readable definitions.
The five-bin ontology is currently being used for comparing taxon concepts in the ETC project (Cui et al. 2016). 
The Taxonomy Comparison tool of the ETC project uses the morphological characters extracted from taxonomic descriptions to facilitate taxon concept resolution tasks. The intuition is that the documented character evidence should correlate well with expert-asserted relationships between two taxon concepts: if an expert asserts that one taxon concept is congruent with another, then the characters described for the two concepts should be very similar. ETC Text Capture tools extract characters from text for the Taxonomy Comparison tool to compute the similarity between two characters. For example, are “leaves usually toothed”, “leaves often toothed”, and “leaves rarely toothed” essentially the same or somewhat different? With an interval list that has a fixed number of elements, as implemented in the five-bin ontology, the software can be configured to reliably compute the similarity score without being affected by ontology updates. The two ontologies are being applied in another project entitled “Authors in the driver's seat: fast, consistent, computable phenotype data and ontology production”, recently funded by the US National Science Foundation (Cui et al. 2017). Recognizing that semantic ambiguity in the authors' vocabulary usage at the time of writing results in inconsistent interpretations of documented characters at the time of use (Cui et al. 2015, Endara et al. 2017, Dahdul et al. 2018), the project aims to investigate effective ways to help phenotype authors converge on their term usage and to produce ontology-informed characters for computer algorithms to harvest. These two modifier ontologies will be compared in empirical studies to evaluate their effectiveness for this purpose. For example, the need of authors to add a term as a class vs. a synonym will be examined, in addition to the frequency with which authors adopt a modifier from the given classes and exact synonyms.
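As an illustration of why a fixed-length interval list makes similarity scoring stable, here is a minimal sketch that scores two frequency modifiers by the distance between their bins. It is not the ETC implementation: the term-to-bin assignments in `FREQUENCY_BIN` and the linear scoring formula are assumptions made for this example, and the real assignments are defined in the ontology itself.

```python
NUM_BINS = 5  # the five-bin design fixes the list length

# Hypothetical assignment of modifier terms to bin positions 0..4.
# New terms would be added as synonyms of an existing bin, never as a new bin.
FREQUENCY_BIN = {
    "never": 0,
    "rarely": 1,
    "sometimes": 2,
    "often": 3,
    "usually": 3,
    "always": 4,
}


def modifier_similarity(term_a: str, term_b: str) -> float:
    """Score two modifiers from 0.0 (opposite ends of the list) to 1.0 (same bin)."""
    distance = abs(FREQUENCY_BIN[term_a] - FREQUENCY_BIN[term_b])
    # Because the number of bins never changes, the largest possible distance
    # is constant, so scores stay comparable across ontology updates.
    return 1.0 - distance / (NUM_BINS - 1)


print(modifier_similarity("usually", "often"))   # same bin under these assumptions
print(modifier_similarity("usually", "rarely"))  # two bins apart
```

Under these assumed assignments, “leaves usually toothed” and “leaves often toothed” score as identical, while “leaves rarely toothed” scores lower against both. In an open ordered list, inserting a new class would change the number of positions and could therefore shift previously computed scores.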

Conclusions

The two modifier ontologies were created by following the literary warrant and user warrant principles of the national standard on constructing controlled vocabularies, using the list ontology pattern. The ontologies address four types of modifier terms (frequency, certainty, degree, and coverage) that are used widely in describing phenotype characters but have not been treated by existing ontologies. We have made the ontologies publicly accessible on GitHub. These ontologies can be used to support machine-based character similarity calculations and to increase authors’ awareness of the ambiguities in modifier terms.

Data resources

Included or linked to within the manuscript

14 in total

1. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.

Authors: Barry Smith; Michael Ashburner; Cornelius Rosse; Jonathan Bard; William Bug; Werner Ceusters; Louis J Goldberg; Karen Eilbeck; Amelia Ireland; Christopher J Mungall; Neocles Leontis; Philippe Rocca-Serra; Alan Ruttenberg; Susanna-Assunta Sansone; Richard H Scheuermann; Nigam Shah; Patricia L Whetzel; Suzanna Lewis
Journal: Nat Biotechnol Date: 2007-11 Impact factor: 54.908

2. Entity/quality-based logical definitions for the human skeletal phenome using PATO.

Authors: Georgios V Gkoutos; Chris Mungall; Sandra Dolken; Michael Ashburner; Suzanna Lewis; John Hancock; Paul Schofield; Sebastian Kohler; Peter N Robinson
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2009

3. Three ontologies to define phenotype measurement data.

Authors: Mary Shimoyama; Rajni Nigam; Leslie Sanders McIntosh; Rakesh Nagarajan; Treva Rice; D C Rao; Melinda R Dwinell
Journal: Front Genet Date: 2012-05-28 Impact factor: 4.599

4. Using ontologies to describe mouse phenotypes.

Authors: Georgios V Gkoutos; Eain C J Green; Ann-Marie Mallon; John M Hancock; Duncan Davidson
Journal: Genome Biol Date: 2004-12-20 Impact factor: 13.583

5. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants.

Authors: Robert Hoehndorf; Mona Alshahrani; Georgios V Gkoutos; George Gosline; Quentin Groom; Thomas Hamann; Jens Kattge; Sylvia Mota de Oliveira; Marco Schmidt; Soraya Sierra; Erik Smets; Rutger A Vos; Claus Weiland
Journal: J Biomed Semantics Date: 2016-11-14

6. Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building.

Authors: Hong Cui; Dongfang Xu; Steven S Chong; Martin Ramirez; Thomas Rodenhausen; James A Macklin; Bertram Ludäscher; Robert A Morris; Eduardo M Soto; Nicolás Mongiardino Koch
Journal: BMC Bioinformatics Date: 2016-11-17 Impact factor: 3.169

7. The Human Phenotype Ontology in 2017. (Review)

Authors: Sebastian Köhler; Nicole A Vasilevsky; Mark Engelstad; Erin Foster; Julie McMurry; Ségolène Aymé; Gareth Baynam; Susan M Bello; Cornelius F Boerkoel; Kym M Boycott; Michael Brudno; Orion J Buske; Patrick F Chinnery; Valentina Cipriani; Laureen E Connell; Hugh J S Dawkins; Laura E DeMare; Andrew D Devereau; Bert B A de Vries; Helen V Firth; Kathleen Freson; Daniel Greene; Ada Hamosh; Ingo Helbig; Courtney Hum; Johanna A Jähn; Roger James; Roland Krause; Stanley J F Laulederkind; Hanns Lochmüller; Gholson J Lyon; Soichi Ogishima; Annie Olry; Willem H Ouwehand; Nikolas Pontikos; Ana Rath; Franz Schaefer; Richard H Scott; Michael Segal; Panagiotis I Sergouniotis; Richard Sever; Cynthia L Smith; Volker Straub; Rachel Thompson; Catherine Turner; Ernest Turro; Marijcke W M Veltman; Tom Vulliamy; Jing Yu; Julie von Ziegenweidt; Andreas Zankl; Stephan Züchner; Tomasz Zemojtel; Julius O B Jacobsen; Tudor Groza; Damian Smedley; Christopher J Mungall; Melissa Haendel; Peter N Robinson
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

8. The plant ontology as a tool for comparative plant anatomy and genomic analyses.

Authors: Laurel Cooper; Ramona L Walls; Justin Elser; Maria A Gandolfo; Dennis W Stevenson; Barry Smith; Justin Preece; Balaji Athreya; Christopher J Mungall; Stefan Rensing; Manuel Hiss; Daniel Lang; Ralf Reski; Tanya Z Berardini; Donghui Li; Eva Huala; Mary Schaeffer; Naama Menda; Elizabeth Arnaud; Rosemary Shrestha; Yukiko Yamazaki; Pankaj Jaiswal
Journal: Plant Cell Physiol Date: 2012-12-05 Impact factor: 4.927

9. Ontology Design Patterns for bio-ontologies: a case study on the Cell Cycle Ontology.

Authors: Mikel Egaña Aranguren; Erick Antezana; Martin Kuiper; Robert Stevens
Journal: BMC Bioinformatics Date: 2008-04-29 Impact factor: 3.169

10. The anatomy of phenotype ontologies: principles, properties and applications.

Authors: Georgios V Gkoutos; Paul N Schofield; Robert Hoehndorf
Journal: Brief Bioinform Date: 2018-09-28 Impact factor: 11.622