Sarah M Alghamdi, Beth A Sundberg, John P Sundberg, Paul N Schofield, Robert Hoehndorf.
Abstract
Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
Year: 2019 PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Overview of the strains used in our analysis, with the number of mice in each strain.
| Strain | Female | Male | Total (Female + Male) | 12 M | 20 M | LONG | Total (12 M + 20 M + LONG) |
|---|---|---|---|---|---|---|---|
| 129S1/SvImJ | 31 | 40 | 71 | 30 | 28 | 13 | 71 |
| A/J | 40 | 32 | 72 | 27 | 16 | 29 | 72 |
| BALB/cByJ | 41 | 27 | 68 | 28 | 24 | 16 | 68 |
| BTBR T+ tf/J | 28 | 23 | 51 | 32 | 13 | 6 | 51 |
| BUB/BnJ | 25 | 12 | 37 | 23 | 11 | 3 | 37 |
| C3H/HeJ | 28 | 29 | 57 | 23 | 16 | 18 | 57 |
| C57BL/10J | 30 | 35 | 65 | 28 | 26 | 11 | 65 |
| C57BL/6J | 36 | 39 | 75 | 30 | 29 | 16 | 75 |
| C57BLKS/J | 36 | 43 | 79 | 27 | 28 | 24 | 79 |
| C57BR/cdJ | 30 | 34 | 64 | 30 | 26 | 8 | 64 |
| C57L/J | 31 | 31 | 62 | 30 | 29 | 3 | 62 |
| CBA/J | 36 | 32 | 68 | 29 | 24 | 15 | 68 |
| DBA/2J | 24 | 24 | 48 | 23 | 13 | 12 | 48 |
| FVB/NJ | 28 | 26 | 54 | 29 | 13 | 12 | 54 |
| KK/HlJ | 26 | 22 | 48 | 27 | 14 | 7 | 48 |
| LP/J | 30 | 37 | 67 | 30 | 27 | 10 | 67 |
| MRL/MpJ | 48 | 31 | 79 | 30 | 17 | 32 | 79 |
| NOD.B10Sn-H2/J | 25 | 21 | 46 | 24 | 17 | 5 | 46 |
| NON/ShiLtJ | 36 | 28 | 64 | 28 | 25 | 11 | 64 |
| NZO/H1LtJ | 18 | 12 | 30 | 17 | 11 | 2 | 30 |
| NZW/LacJ | 25 | 27 | 52 | 29 | 18 | 5 | 52 |
| P/J | 13 | 9 | 22 | 3 | 16 | 3 | 22 |
| PL/J | 15 | 17 | 32 | 23 | 7 | 2 | 32 |
| PWD/PhJ | 34 | 28 | 62 | 27 | 23 | 12 | 62 |
| RIIIS/J | 36 | 32 | 68 | 29 | 26 | 13 | 68 |
| SM/J | 28 | 30 | 58 | 28 | 24 | 6 | 58 |
| SWR/J | 24 | 19 | 43 | 20 | 12 | 11 | 43 |
| WSB/EiJ | 27 | 26 | 53 | 26 | 22 | 5 | 53 |
| Total | 829 | 766 | 1595 | 730 | 555 | 310 | 1595 |
OQuaRE measures: tangledness (TMOnto) represents the mean number of classes with more than one direct parent; the weighted method count (WMCOnto) is the average depth of leaf classes; and DITOnto is the depth of the subsumption hierarchy.
| Ontology | TMOnto | WMCOnto | DITOnto |
|---|---|---|---|
| MAP | 0.6093 | 4.3008 | 11 |
| MAPT | 0.3704 | 3.5914 | 10 |
| PAM | 0.6251 | 4.6638 | 11 |
| PAMT | 0.5749 | 4.6701 | 11 |
| MA | 0.0381 | 1.7033 | 9 |
| MPATH | 0.1025 | 4.5134 | 8 |
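
The three measures described in the caption can be illustrated on a toy hierarchy. The following is a minimal sketch, not the OQuaRE reference implementation: the class names are hypothetical, tangledness is assumed here to mean the share of classes with more than one direct parent, and depth is assumed to be the longest path to the root when a class has several parents.

```python
# Toy subsumption hierarchy (hypothetical class names, not taken from MA or MPATH).
# Each class maps to the list of its direct parents; the root has none.
TOY_PARENTS = {
    "Thing": [],
    "organ": ["Thing"],
    "lesion": ["Thing"],
    "liver": ["organ"],
    "kidney": ["organ"],
    "neoplasm": ["lesion"],
    "liver neoplasm": ["liver", "neoplasm"],
}


def class_depths(parents):
    """Depth of every class, taken as the longest path (in edges) up to a root."""
    memo = {}

    def depth(cls):
        if cls not in memo:
            direct = parents[cls]
            memo[cls] = 0 if not direct else 1 + max(depth(p) for p in direct)
        return memo[cls]

    return {cls: depth(cls) for cls in parents}


def tangledness(parents):
    """Assumed reading: share of classes with more than one direct parent."""
    multi = sum(1 for direct in parents.values() if len(direct) > 1)
    return multi / len(parents)


def mean_leaf_depth(parents):
    """Average depth over leaf classes (classes that are not a parent of any class)."""
    non_leaves = {p for direct in parents.values() for p in direct}
    leaves = [cls for cls in parents if cls not in non_leaves]
    depths = class_depths(parents)
    return sum(depths[cls] for cls in leaves) / len(leaves)


def hierarchy_depth(parents):
    """Depth of the subsumption hierarchy: the largest class depth."""
    return max(class_depths(parents).values())


if __name__ == "__main__":
    print(f"tangledness     = {tangledness(TOY_PARENTS):.4f}")
    print(f"mean leaf depth = {mean_leaf_depth(TOY_PARENTS):.4f}")
    print(f"hierarchy depth = {hierarchy_depth(TOY_PARENTS)}")
```

In this toy hierarchy only "liver neoplasm" has two direct parents, which mirrors how combining an anatomy class with a pathology class can raise tangledness relative to either source ontology alone.
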
OQuaRE measures: comparison of the asserted axioms with the axioms inferred using the HermiT reasoner.
| Ontology | TMOnto (inferred) | TMOnto (asserted) | WMCOnto (inferred) | WMCOnto (asserted) | DITOnto (inferred) | DITOnto (asserted) |
|---|---|---|---|---|---|---|
| MAP | 0.6093 | 0.0376 | 4.3008 | 1.5413 | 11 | 9 |
| MAPT | 0.3704 | 0.0376 | 3.5914 | 1.5413 | 10 | 9 |
| PAM | 0.6251 | 0.0376 | 4.6638 | 1.5413 | 11 | 9 |
| PAMT | 0.5749 | 0.0376 | 4.6701 | 1.5413 | 11 | 9 |
The area under cluster purity curves, based on four different clustering methods and using annotations to all six ontologies.

Complete Linkage

| Group | MA | MAP | MAPT | PAM | PAMT | MPATH |
|---|---|---|---|---|---|---|
| 12m-F | 0.6324 | 0.6624 | 0.6614 | 0.6776 | 0.6740 | 0.6390 |
| 12m-M | 0.6431 | 0.6563 | 0.6572 | 0.6703 | 0.6700 | 0.6487 |
| 20m-F | 0.6532 | 0.6753 | 0.6707 | 0.6997 | 0.7036 | 0.6359 |
| 20m-M | 0.6497 | 0.6959 | 0.7144 | 0.7044 | 0.7026 | 0.6481 |
| LONG-F | 0.6654 | 0.6902 | 0.7126 | 0.6973 | 0.7049 | 0.6541 |
| LONG-M | 0.6041 | 0.6479 | 0.6329 | 0.6127 | 0.6329 | 0.5782 |
| Weighted average | 0.6521 | 0.6755 | 0.6793 | | 0.6863 | 0.6432 |

UPGMA

| Group | MA | MAP | MAPT | PAM | PAMT | MPATH |
|---|---|---|---|---|---|---|
| 12m-F | 0.6093 | 0.6393 | 0.6469 | 0.6539 | 0.6619 | 0.6358 |
| 12m-M | 0.6234 | 0.6346 | 0.6423 | 0.6594 | 0.6586 | 0.6336 |
| 20m-F | 0.6215 | 0.6607 | 0.6582 | 0.6780 | 0.6699 | 0.6148 |
| 20m-M | 0.6253 | 0.6721 | 0.6903 | 0.6907 | 0.6924 | 0.6320 |
| LONG-F | 0.6496 | 0.6858 | 0.7114 | 0.6733 | 0.6832 | 0.6476 |
| LONG-M | 0.5927 | 0.6127 | 0.6209 | 0.6046 | 0.6078 | 0.5802 |
| Weighted average | 0.6306 | 0.6546 | 0.6641 | 0.6676 | | 0.6329 |

Neighbor-Joining

| Group | MA | MAP | MAPT | PAM | PAMT | MPATH |
|---|---|---|---|---|---|---|
| 12m-F | 0.6618 | 0.6398 | 0.6644 | 0.6474 | 0.6249 | 0.6494 |
| 12m-M | 0.6361 | 0.6386 | 0.6312 | 0.6132 | 0.6202 | 0.6290 |
| 20m-F | 0.6817 | 0.6686 | 0.6210 | 0.6360 | 0.6434 | 0.6442 |
| 20m-M | 0.6490 | 0.6454 | 0.6397 | 0.6350 | 0.6377 | 0.6570 |
| LONG-F | 0.5719 | 0.5927 | 0.5809 | 0.5795 | 0.5832 | 0.5826 |
| LONG-M | 0.5731 | 0.5753 | 0.5641 | 0.5796 | 0.5704 | 0.5829 |
| Weighted average | | 0.6410 | 0.6336 | 0.6287 | 0.6264 | 0.6386 |

K-medoids

| Group | MA | MAP | MAPT | PAM | PAMT | MPATH |
|---|---|---|---|---|---|---|
| 12m-F | 0.6380 | 0.6598 | 0.6523 | 0.6449 | 0.6468 | 0.6254 |
| 12m-M | 0.6401 | 0.6659 | 0.6596 | 0.6489 | 0.6518 | 0.6296 |
| 20m-F | 0.6366 | 0.6870 | 0.6735 | 0.6776 | 0.6834 | 0.6260 |
| 20m-M | 0.6443 | 0.6706 | 0.6838 | 0.7087 | 0.6931 | 0.6358 |
| LONG-F | 0.6780 | 0.7112 | 0.6950 | 0.7001 | 0.6938 | 0.6292 |
| LONG-M | 0.6067 | 0.6312 | 0.6325 | 0.5938 | 0.6091 | 0.5766 |
| Weighted average | 0.6498 | | 0.6708 | 0.6679 | 0.6690 | 0.6319 |

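
To make the quantity behind these tables concrete, here is a minimal sketch of cluster purity with strain as the reference label, plus a trapezoidal area under a purity curve. It uses the standard textbook definition of purity. How the purity curve was actually built (for example, which numbers of clusters were swept) is not stated here, so the normalised trapezoid and the function names below are assumptions for illustration only.

```python
from collections import Counter


def cluster_purity(cluster_ids, labels):
    """Fraction of items whose label is the majority label of their cluster."""
    if len(cluster_ids) != len(labels) or not labels:
        raise ValueError("cluster_ids and labels must be non-empty and equal in length")
    counts = {}
    for cid, label in zip(cluster_ids, labels):
        counts.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def normalised_area(xs, ys):
    """Trapezoidal area under (xs, ys), divided by the x-range (assumed normalisation)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two points of equal length")
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area / (xs[-1] - xs[0])


if __name__ == "__main__":
    # Six hypothetical mice from three strains, assigned to two clusters.
    strains = ["A/J", "A/J", "CBA/J", "CBA/J", "SM/J", "SM/J"]
    clusters = [0, 0, 0, 1, 1, 1]
    print(f"purity = {cluster_purity(clusters, strains):.4f}")
    # Hypothetical purity values at 1, 2 and 3 clusters.
    print(f"area   = {normalised_area([1, 2, 3], [0.5, 0.7, 0.9]):.4f}")
```

A purity of 1.0 means every cluster contains mice from a single strain; values closer to the share of the largest strain indicate that the clustering carries little strain information.
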
Figure 1. ROC curves for identifying mice from the same strain based on phenotypic similarity, and separated by the six groups of mice used in our analysis.
Area under the ROC curve.
| Ontology | 12 m-F | 12 m-M | 20 m-F | 20 m-M | LONG-F | LONG-M | Weighted average |
|---|---|---|---|---|---|---|---|
| MA | 0.6961 | 0.7038 | 0.7412 | 0.7384 | 0.7430 | 0.7202 | 0.6962 |
| MAP | 0.7276 | 0.7287 | 0.7705 | 0.7661 | | | 0.7221 |
| MAPT | | 0.7340 | 0.7701 | 0.7769 | 0.7655 | 0.7345 | |
| PAM | 0.7149 | | | | 0.7465 | 0.7129 | 0.7234 |
| PAMT | 0.7126 | 0.7350 | 0.7789 | 0.7938 | 0.7494 | 0.7129 | 0.7224 |
| MPATH | 0.6950 | 0.7098 | 0.7387 | 0.7416 | 0.7044 | 0.6785 | 0.6899 |
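
An area under the ROC curve of this kind can be read as the probability that a same-strain pair of mice receives a higher similarity score than a different-strain pair. The sketch below computes that pairwise quantity directly. It is an illustration under assumptions, not the evaluation code behind the table: the function name, the toy strain labels, and the similarity matrix are all hypothetical, and ties are counted as half a win.

```python
from itertools import combinations


def same_strain_auc(strains, similarity):
    """AUC for ranking same-strain pairs above different-strain pairs.

    strains:    list of strain labels, one per mouse.
    similarity: symmetric matrix (list of lists) of pairwise similarity scores.
    """
    positives, negatives = [], []
    for i, j in combinations(range(len(strains)), 2):
        target = positives if strains[i] == strains[j] else negatives
        target.append(similarity[i][j])
    if not positives or not negatives:
        raise ValueError("need at least one same-strain and one different-strain pair")
    # Probability that a random positive pair outscores a random negative pair.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Four hypothetical mice from two strains with made-up similarity scores.
    strains = ["A/J", "A/J", "CBA/J", "CBA/J"]
    similarity = [
        [1.00, 0.90, 0.20, 0.40],
        [0.90, 1.00, 0.30, 0.10],
        [0.20, 0.30, 1.00, 0.35],
        [0.40, 0.10, 0.35, 1.00],
    ]
    print(f"AUC = {same_strain_auc(strains, similarity):.4f}")
```

A value of 0.5 corresponds to similarity scores that carry no strain information, so the tabulated values around 0.70 to 0.79 indicate a moderate ability to recover strain from phenotype annotations.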