Tudor Groza, Jane Hunter, Andreas Zankl.
Abstract
Over the last few years, a significant amount of research has been devoted to the ontology-based formalization of phenotype descriptions. The intrinsic value and knowledge captured within such descriptions can only be expressed by taking advantage of their inner structure, which implicitly combines qualities and anatomical entities. We present a meta-model (the Phenotype Fragment Ontology) and a processing pipeline that together enable the automatic decomposition and conceptualization of phenotype descriptions for the human skeletal phenome. We use this approach to showcase the usefulness of the generic concept of phenotype decomposition by performing an experimental study on all skeletal phenotype concepts defined in the Human Phenotype Ontology.
Keywords: human skeletal phenome; ontologies; phenotype decomposition; phenotype segmentation
Year: 2013 PMID: 23440304 PMCID: PMC3572876 DOI: 10.4137/BII.S10729
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Figure 1. The Phenotype Fragment Ontology.
Notes: Concepts introduced by the ontology are depicted with continuous lines, while those imported from other ontologies, such as FMA and PATO, are depicted with dotted lines. Similarly, the relationships introduced by PFO are bolded in the figure, while those imported from the Relation Ontology are not.
Figure 2. Phenotype decomposition and conceptualization pipeline.
Notes: The pipeline has three phases. 1. Segmentation: textual representations of the phenotype descriptions are first segmented into their atomic elements, namely anatomical (A) and quality (Q) entities; the resulting segments are then reordered and further segmented into anatomical parts (AP, A) and coordinates (P), and into qualities and qualifiers, respectively. Both segmentation steps use the BIO format. 2. Alignment: the resulting segments are aligned to FMA and PATO concepts, preserving part-subpart relationships between anatomical entities. 3. Representation: the aligned concepts are used to create PFO entities.
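The BIO format mentioned for the segmentation phase can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `to_bio`, the example phrase, and the choice to tag connectives as `O` are all assumptions made for the example; only the A (anatomy) and Q (quality) labels come from the figure notes.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode labelled spans as BIO tags.

    Each span is (start, end, label) with `end` exclusive. The first token
    of a span gets "B-<label>", the remaining ones "I-<label>", and every
    token outside all spans gets "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical phrase and spans, chosen only to illustrate the encoding.
tokens = ["Stippling", "of", "the", "proximal", "epiphysis"]
spans = [(0, 1, "Q"), (3, 5, "A")]  # one quality span, one anatomy span

for token, tag in zip(tokens, to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

A sequence labeller trained on such tags predicts one tag per token, and contiguous `B-`/`I-` runs are then read back as segments.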
Features used for classification in the segmentation phase. Examples are provided using the token “epiphysis” from Figure 2.
| Feature | Description | Example |
| Token | Current token | epiphysis |
| Token prefix (variable size) | Token prefixes (size in example is 5) | e ep epi epip epiph |
| Token postfix (variable size) | Token postfixes (size in example is 5) | s is sis ysis hysis |
| Token shape | Shape of token by replacing all capitalized letters with ‘A’, all non-capitalized letters with ‘a’ and all digits with ‘0’ | aaaaaaaaa |
| Token brief shape | Compressed version of the token shape where all consecutive equal characters are compressed | a |
| Token lemma | Token lemma (stem) | epiphysi |
| Token POS tag | Part of speech tag of token | NNP |
| Morpho: punctuation | Flag to indicate whether the token ends in a punctuation sign | no |
| Morpho: vowels | Shape of token provided by replacing all consonants with ‘-’ | e-i----i- |
| Morpho: digits | Shape of token by replacing all digits with ‘*’ | no* |
| Context: unigram | Unigram-based surrounding context of token (variable window size). Window size in example is 3 | Stippling of the epiphysis of the proximal |
| Context: bigram | Bigram-based surrounding context of token (variable window size). Window size in example is 3 | Stippling-of of-the the-epiphysis epiphysis-of of-the the-proximal |
| Context: trigram | Trigram-based surrounding context of token (variable window size). Window size in example is 3 | Stippling-of-the of-the-epiphysis the-epiphysis-of epiphysis-of-the of-the-proximal |
| Dictionary: conjunctions | Lexicon comprising conjunctions (and, or) | |
| Dictionary: connectives | Lexicon comprising connective tokens (at, of, the, etc) | |
| Dictionary: ordinals | Lexicon comprising ordinals (1st, 2nd, etc) | |
| Dictionary: coordinates | Lexicon comprising anatomical coordinates (central, left, etc) | |
| Dictionary: anatomy | Gazetteer compiled from unigrams of FMA concepts | |
| Dictionary: quality | Gazetteer compiled from unigrams of PATO concepts | |
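Several of the surface features in the table above are simple string transformations. The sketch below shows one way they could be computed; the function names are hypothetical, and treating "y" as a consonant is an assumption inferred from the vowel-shape example in the table. The lemma, POS tag, and dictionary features are left out because they depend on external tools and lexicons not specified here.

```python
import re
from typing import Dict, List


def affixes(token: str, size: int) -> Dict[str, List[str]]:
    """Return all prefixes and suffixes of length 1..size."""
    n = min(size, len(token))
    return {
        "prefixes": [token[:i] for i in range(1, n + 1)],
        "suffixes": [token[-i:] for i in range(1, n + 1)],
    }


def token_shape(token: str) -> str:
    """Map digits to '0', upper-case letters to 'A', lower-case letters to 'a'."""
    def shape_char(ch: str) -> str:
        if ch.isdigit():
            return "0"
        if ch.isupper():
            return "A"
        if ch.islower():
            return "a"
        return ch  # punctuation and other symbols are kept unchanged
    return "".join(shape_char(ch) for ch in token)


def brief_shape(shape: str) -> str:
    """Collapse every run of identical characters into a single character."""
    return re.sub(r"(.)\1+", r"\1", shape)


def vowel_shape(token: str) -> str:
    """Replace consonants with '-' (assumption: 'y' counts as a consonant)."""
    return "".join(
        "-" if ch.isalpha() and ch.lower() not in "aeiou" else ch
        for ch in token
    )


def context_ngrams(tokens: List[str], index: int, window: int, n: int) -> List[str]:
    """Hyphen-joined n-grams over tokens within `window` positions of `index`."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    context = tokens[lo:hi]
    return ["-".join(context[i:i + n]) for i in range(len(context) - n + 1)]


phrase = "Stippling of the epiphysis of the proximal".split()
print(affixes("epiphysis", 5))
print(token_shape("epiphysis"), brief_shape(token_shape("epiphysis")))
print(vowel_shape("epiphysis"))
print(context_ngrams(phrase, index=3, window=3, n=2))
```

In a sequence labeller such as a CRF or SVM-based chunker, each of these values would typically become one column of the per-token feature vector.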
Figure 3. Similarity matrix and traces computation. (A) Similarity matrix and traces computation between vertebral bodies using the normal order and the FMA concept Spinal_reticular_process; (B) Similarity matrix and traces computation between the inverse order of vertebral bodies and the FMA concept Body_of_vertebra; stop-words are discarded during the traces computation.
Comparative results for the first step of the segmentation phase.
| YamCha1vsAll | 96.94 | 96.94 | 96.94 | 96.70 | 96.70 | 96.70 |
| Set operations | 96.63 | 97.21 | 96.92 | 96.27 | 97.03 | 96.65 |
| Voting (YamCha1vs1) | 97.04 | 97.04 | 97.04 | 96.77 | 96.77 | 96.77 |
Comparative results for the second step of the segmentation phase.
| CRF++/YamCha1vsAll | 97.15 | 97.15 | 97.15 | 92.21 | 92.21 | 92.21 |
| Set operations | 96.74 | 96.74 | 96.74 | 91.32 | 91.32 | 91.32 |
| Voting (CRF++/CRF++) | 97.26 | 97.26 | 97.26 | 92.44 | 92.44 | 92.44 |
Evaluation results of the alignment phase.
| Entity type | Precision (%) | Recall (%) | F1 (%) |
| Anatomy | 88.81 | 85.59 | 87.17 |
| Qualities | 93.05 | 90.13 | 91.56 |
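The third value in each row of the alignment results is consistent with the harmonic mean of the first two, as would be expected if the columns report precision, recall and F1. The short check below makes that arithmetic explicit; the helper name `f1_score` is an assumption for the example, and small differences in the last digit can arise from rounding of the published figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


# (first value, second value, third value) as listed in the alignment table.
reported = {
    "Anatomy": (88.81, 85.59, 87.17),
    "Qualities": (93.05, 90.13, 91.56),
}

for name, (p, r, third) in reported.items():
    computed = f1_score(p, r)
    print(f"{name}: computed={computed:.2f} reported={third:.2f}")
```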
Frequently occurring concepts in the skeletal phenotypes in HPO.
| Anatomical terms (count, share) | Quality terms (count, share) |
| Phalanx (1094, 30.92%) | Abnormality (337, 9.52%) |
| Epiphysis (612, 17.29%) | Hypoplasia (187, 5.28%) |
| Toe (598, 16.9%) | Aplasia (178, 5.03%) |
| Finger (537, 15.17%) | -shaped (174, 4.91%) |
| Proximal (384, 10.85%) | Sclerosis (136, 3.84%) |
| Distal (367, 10.37%) | Duplication (133, 3.75%) |
| Middle (278, 7.85%) | Absent (102, 2.88%) |
Distribution of FMA and PATO concepts in the skeletal phenotypes in HPO.
| | FMA | PATO |
| Existing concepts | 330 (86.16%) | 183 (33.27%) |
| Non-existing concepts | 53 (13.83%) | 165 (30%) |
| Atomic phenotypes | – | 202 (36.72%) |
Coverage of missing FMA and PATO concepts in the skeletal phenotypes in HPO.
| Non-existing FMA concepts | 275 (7.77%) |
| Atomic phenotypes | 683 (19.30%) |
| Non-existing PATO concepts | 592 (16.73%) |