Tudor Groza, Jane Hunter, Andreas Zankl.
Abstract
BACKGROUND: Over the last few years, a significant amount of research has been performed on the ontology-based formalization of phenotype descriptions. To fully capture the intrinsic value and knowledge expressed within them, we need to take advantage of their inner structure, which implicitly combines qualities and anatomical entities. The first step in this process is the segmentation of phenotype descriptions into their atomic elements.
Year: 2012 PMID: 23061930 PMCID: PMC3495645 DOI: 10.1186/1471-2105-13-265
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of segmentation of a phenotype description according to the two phases of our approach. In phase I, tokens are labeled using BIO labeling corresponding to each class: Q (quality), A (anatomy), C (connective). Label O (not present in the figure) is used to denote tokens outside the target classes. Similarly, in phase II, specific classifiers are used to segment quality-qualifier pairs (Q-QF) and anatomical parts (A), sub-parts (AP) and coordinates (P).
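The BIO scheme described in the caption can be illustrated with a minimal sketch that groups labeled tokens back into segments. The function name `bio_to_segments`, the example phrase, and its labels are hypothetical and chosen only for illustration; they are not taken from the paper's corpus or implementation. Labels are written in the class-first form used in the tables (for example `Q-B`, `A-I`).

```python
from typing import List, Optional, Tuple


def bio_to_segments(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labeled tokens into (class, text) segments.

    Labels are assumed to look like "Q-B", "A-I", "AP-B" or plain "O".
    """
    segments: List[Tuple[str, str]] = []
    current_class: Optional[str] = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        if label == "O":
            cls, tag = None, "O"
        else:
            # Split on the last hyphen so multi-letter classes (AP, QF) work.
            cls, tag = label.rsplit("-", 1)

        # Close the open segment on B, on O, or on an I tag of another class.
        if tag != "I" or cls != current_class:
            if current_tokens:
                segments.append((current_class, " ".join(current_tokens)))
            current_class, current_tokens = cls, []

        if cls is not None:
            current_tokens.append(token)

    # Flush the last open segment, if any.
    if current_tokens:
        segments.append((current_class, " ".join(current_tokens)))
    return segments


# Hypothetical phase I labeling of an illustrative phrase.
tokens = ["bowing", "of", "the", "long", "bones"]
labels = ["Q-B", "C-B", "A-B", "A-I", "A-I"]
print(bio_to_segments(tokens, labels))
```

Under the same assumptions, the function could be reused for phase II by passing labels such as `AP-B` or `QF-I`, since only the label strings change.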
Phenotype description corpus statistics
| Label | Count | Percentage |
|---|---|---|
| Anatomy (A-B/A-I) | 13,003 | 70.08% |
| Quality (Q-B/Q-I) | 5,465 | 29.45% |
| Connectives (C-B/C-I) | 43 | 0.23% |
| Outside (O) | 45 | 0.24% |
| TOTAL | 18,556 |
Statistics for the Anatomy corpora used in phase II
| Label | Count | Percentage |
|---|---|---|
| Main anatomy (A-B/A-I) | 2,984 | 41.01% |
| Anatomy part (AP-B/AP-I) | 1,209 | 16.61% |
| Anatomy coordinate (P-B) | 698 | 9.60% |
| Connectives (C-B/C-I) | 2,354 | 32.35% |
| Outside (O) | 31 | 0.43% |
| TOTAL | 7,276 |
Statistics for the Quality corpora used in phase II
| Label | Count | Percentage |
|---|---|---|
| Quality (Q-B/Q-I) | 2,141 | 76.31% |
| Qualifier (QF-B/QF-I) | 590 | 21.03% |
| Connectives (C-B/C-I) | 52 | 1.84% |
| Outside (O) | 23 | 0.82% |
| TOTAL | 2,806 |
Comparative segmentation results for phase I, with domain dictionaries
| Method | Precision | Recall | F1 |
|---|---|---|---|
| Individual (YamCha1vsAll) | 96.98 | 96.98 | 96.98 |
| Set operations | 96.64 | 97.17 | 96.91 |
| Voting (veto: YamCha1vsAll) | 97.05 | 97.05 |
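The three numeric columns in the comparative tables carry no header in this record. A quick consistency check, sketched below, assumes they are precision, recall, and F1 (in that order) and recomputes F1 as the harmonic mean of the first two values for the "Set operations" rows. The function name `f1_score` and the dictionary layout are illustrative; the numbers are copied from the tables.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (assumed precision, assumed recall, reported third column) per table.
set_operation_rows = {
    "phase I, with dictionaries": (96.64, 97.17, 96.91),
    "phase I, without dictionaries": (96.31, 97.11, 96.70),
    "phase II, Anatomy": (96.71, 97.38, 97.04),
    "phase II, Quality": (93.84, 94.64, 94.24),
}

for name, (p, r, reported) in set_operation_rows.items():
    computed = f1_score(p, r)
    print(f"{name}: computed {computed:.3f}, reported {reported:.2f}")
```

The recomputed values agree with the reported third column to within about 0.01, which is consistent with rounding and supports the assumed column order without proving it.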
Comparative segmentation results for phase I, without domain dictionaries
| Method | Precision | Recall | F1 |
|---|---|---|---|
| Individual (YamCha1vsAll) | 96.80 | 96.80 | 96.80 |
| Set operations | 96.31 | 97.11 | 96.70 |
| Voting (veto: YamCha1vs1) | 96.90 | 96.90 |
Comparative segmentation results for phase II on the Anatomy category
| Method | Precision | Recall | F1 |
|---|---|---|---|
| Individual (CRF++) | 97.11 | 97.11 | 97.11 |
| Set operations | 96.71 | 97.38 | 97.04 |
| Voting (veto: CRF++ / MALLET) | 97.16 | 97.16 |
Comparative segmentation results for phase II on the Quality category
| Method | Precision | Recall | F1 |
|---|---|---|---|
| Individual (YamCha1vsAll) | 94.50 | 94.50 | |
| Set operations | 93.84 | 94.64 | 94.24 |
| Voting (YamCha1vsAll) | 94.50 | 94.50 |
Label-based segmentation results for phase I, including the coverage of the label
| Label | Coverage (%) | Score (%) |
|---|---|---|
| Q-B | 20.83 | 96.40 |
| Q-I | 8.62 | 88.93 |
| A-B | 15.17 | 94.91 |
| A-I | 54.91 | 98.79 |
| C-B | 0.22 | 47.73 |
| C-I | 0.01 | 10.00 |
| O | 0.24 | 12.50 |
Label-based segmentation results for phase II, the Anatomy category, including the coverage of the label
| Label | Coverage (%) | Score (%) |
|---|---|---|
| A-B | 24.90 | 96.17 |
| A-I | 16.11 | 95.19 |
| AP-B | 15.67 | 96.10 |
| AP-I | 0.94 | 84.36 |
| P-B | 9.60 | 97.06 |
| C-B | 18.04 | 98.48 |
| C-I | 14.31 | 100.00 |
| O | 0.43 | 0.00 |
Label-based segmentation results for phase II, the Quality category, including the coverage of the label
| Label | Coverage (%) | Score (%) |
|---|---|---|
| Q-B | 68.48 | 96.51 |
| Q-I | 7.83 | 79.06 |
| QF-B | 20.31 | 91.36 |
| QF-I | 0.72 | 55.62 |
| C-B | 1.70 | 95.65 |
| C-I | 0.14 | 28.00 |
| O | 0.82 | 17.99 |