Francisco Abad-Navarro, Manuel Quesada-Martínez, Astrid Duque-Ramos, Jesualdo Tomás Fernández-Breis.
Abstract
BACKGROUND: The increasing adoption of ontologies in biomedical research and the growing number of ontologies available have made it necessary to assure the quality of these resources. Most of the well-established ontologies, such as the Gene Ontology or SNOMED CT, have their own quality assurance processes. These have demonstrated their usefulness for the maintenance of the resources but are unable to detect all of the modelling flaws in the ontologies. Consequently, the development of efficient and effective quality assurance methods is needed.
Keywords: Ontologies; Quality assurance; Quality metrics; Readability; Structural accuracy
Year: 2020 PMID: 33319711 PMCID: PMC7737250 DOI: 10.1186/s12911-020-01291-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Top 50 most frequently used annotation properties in BioPortal
| Annotation property | Usage per entity |
|---|---|
| 0.52 | |
| 0.44 | |
| 0.42 | |
| 0.42 | |
| 0.37 | |
| 0.35 | |
| 0.16 | |
| 0.10 | |
| 0.10 | |
| 0.08 | |
| 0.08 | |
| 0.08 | |
| 0.08 | |
| 0.07 | |
| 0.07 | |
| 0.07 | |
| 0.07 | |
| 0.04 | |
| 0.04 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.03 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.02 | |
| 0.01 |
Identified annotation properties for describing labels, synonyms, and descriptions
| Name | |
| Synonym | |
| Description | |
Fig. 1 Example of hierarchy formed by concepts in SNOMED CT
Fig. 2 Values of the readability metrics for all of the SNOMED CT versions included in the study: 2011–2019
Fig. 3 Box plots indicating the number of descriptions, names, and synonyms per class and per object property. The Y axis was limited to 5 for readability
Fig. 4 Values of the structural metrics for all of the SNOMED CT versions included in the study: 2011–2019
Top ten LR classes according to the systematic naming metric, sorted by positive cases
| LR class (label) | LR class depth | Positive cases (depth) (distance) | Negative cases (depth) (distance) | Metric value |
|---|---|---|---|---|
| 411123000 (diagnostic allergen extract) | 4 | 336 (5.961) (1.961) | 0 (NA) (NA) | 1 |
| 256259004 (pollen) | 7 | 289 (8.606) (2.346) | 0 (NA) (NA) | 1 |
| 24851008 (deoxyribonucleic acid) | 8 | 243 (10.465) (2.465) | 0 (NA) (NA) | 1 |
| 263490005 (status) | 6 | 38 (7.658) (1.658) | 0 (NA) (NA) | 1 |
| 257351008 (shunt) | 7 | 22 (8) (1) | 0 (NA) (NA) | 1 |
| 87612001 (blood) | 5 | 20 (6.2) (1.2) | 0 (NA) (NA) | 1 |
| 449872003 (powder) | 5 | 19 (6.158) (1.158) | 0 (NA) (NA) | 1 |
| 264193005 (segment) | 7 | 10 (8.4) (1.4) | 0 (NA) (NA) | 1 |
| 255711007 (pattern) | 6 | 9 (7) (1) | 0 (NA) (NA) | 1 |
| 277536004 (serogroup) | 6 | 8 (7) (1) | 0 (NA) (NA) | 1 |
Top ten LR classes according to the systematic naming metric, showing their hierarchy in SNOMED CT
| LR class (label) | Hierarchy |
|---|---|
| 411123000 (diagnostic allergen extract) | Pharmaceutical/biologic product |
| 256259004 (pollen) | Substance |
| 24851008 (deoxyribonucleic acid) | Substance |
| 263490005 (status) | SNOMED CT model component |
| 257351008 (shunt) | Physical object |
| 87612001 (blood) | Substance |
| 449872003 (powder) | Substance |
| 264193005 (segment) | Qualifier value |
| 255711007 (pattern) | SNOMED CT model component |
| 277536004 (serogroup) | Qualifier value |
Bottom ten LR classes according to the systematic naming metric, sorted by negative cases
| LR class (label) | LR class depth | Positive cases (depth) (distance) | Negative cases (depth) (distance) | Metric value |
|---|---|---|---|---|
| 385268001 (oral dose form) | 4 | 0 (NA) (NA) | 61 (5.607) (1.607) | 0 |
| 273248003 (action) | 6 | 0 (NA) (NA) | 27 (7.259) (1.259) | 0 |
| 385287007 (parenteral dose form) | 4 | 0 (NA) (NA) | 25 (5.16) (1.16) | 0 |
| 740596000 (cutaneous dose form) | 4 | 0 (NA) (NA) | 24 (5.167) (1.167) | 0 |
| 10546003 (site) | 6 | 0 (NA) (NA) | 11 (7.545) (1.545) | 0 |
| 133936004 (adult) | 5 | 0 (NA) (NA) | 6 (6.5) (1.5) | 0 |
| 260726005 (part) | 7 | 0 (NA) (NA) | 4 (8) (1) | 0 |
| 246176004 (form) | 7 | 0 (NA) (NA) | 3 (8.667) (1.667) | 0 |
| 738984000 (parenteral) | 4 | 0 (NA) (NA) | 3 (5) (1) | 0 |
| 116154003 (patient) | 5 | 0 (NA) (NA) | 2 (6) (1) | 0 |
Bottom ten LR classes according to the systematic naming metric, showing their hierarchy in SNOMED CT
| LR class (label) | Hierarchy |
|---|---|
| 385268001 (oral dose form) | Qualifier value |
| 273248003 (action) | SNOMED CT model component |
| 385287007 (parenteral dose form) | Qualifier value |
| 740596000 (cutaneous dose form) | Qualifier value |
| 10546003 (site) | SNOMED CT model component |
| 133936004 (adult) | Social context |
| 260726005 (part) | SNOMED CT model component |
| 246176004 (form) | SNOMED CT model component |
| 738984000 (parenteral) | Qualifier value |
| 116154003 (patient) | Social context |
Fig. 5 Boxplots for the LSLD and systematic naming metrics for each LR class in the SNOMED CT July 2019 release
Top ten LR classes according to the LSLD metric, sorted first by metric value and then by positive cases
| LR class (label) | LR class depth | Positive cases (depth) (distance) | Negative cases (depth) (distance) | Metric value |
|---|---|---|---|---|
| 736849007 (conventional release) | 4 | 6071 (7.082) (3.082) | 0 (NA) (NA) | 1 |
| 385268001 (oral dose form) | 4 | 2955 (5.725) (1.725) | 0 (NA) (NA) | 1 |
| 421026006 (conventional release oral tablet) | 5 | 2286 (7.212) (2.212) | 0 (NA) (NA) | 1 |
| 385287007 (parenteral dose form) | 4 | 1613 (5.581) (1.581) | 0 (NA) (NA) | 1 |
| 420692007 (conventional release oral capsule) | 5 | 696 (7.129) (2.129) | 0 (NA) (NA) | 1 |
| 740596000 (cutaneous dose form) | 4 | 566 (5.783) (1.783) | 0 (NA) (NA) | 1 |
| 736847009 (prolonged-release) | 5 | 398 (6.96) (1.965) | 0 (NA) (NA) | 1 |
| 272673000 (bone structure) | 7 | 372 (10.288) (3.293) | 0 (NA) (NA) | 1 |
| 282721001 (fluoroscopic guidance) | 6 | 877 (6.523) (0.875) | 1 (7) (1) | 0.999 |
| 19830006 (blood group antibody) | 9 | 715 (11.292) (2.365) | 2 (5.5) (3.5) | 0.997 |
Top ten LR classes according to the LSLD metric, showing their hierarchy in SNOMED CT
| LR class (label) | Hierarchy |
|---|---|
| 736849007 (conventional release) | Qualifier value |
| 385268001 (oral dose form) | Qualifier value |
| 421026006 (conventional release oral tablet) | Qualifier value |
| 385287007 (parenteral dose form) | Qualifier value |
| 420692007 (conventional release oral capsule) | Qualifier value |
| 740596000 (cutaneous dose form) | Qualifier value |
| 736847009 (prolonged-release) | Qualifier value |
| 272673000 (bone structure) | Body structure |
| 282721001 (fluoroscopic guidance) | Procedure |
| 19830006 (blood group antibody) | Substance |
Bottom ten LR classes according to the LSLD metric, sorted by negative cases
| LR class (label) | LR class depth | Positive cases (depth) (distance) | Negative cases (depth) (distance) | Metric value |
|---|---|---|---|---|
| 42504009 (containing) | 6 | 0 (NA) (NA) | 20803 (6.678) (0.992) | 0 |
| 255503000 (entire) | 6 | 0 (NA) (NA) | 14678 (10.842) (4.844) | 0 |
| 18720000 (in) | 6 | 0 (NA) (NA) | 12671 (6.699) (1.228) | 0 |
| 20401003 (with) | 6 | 0 (NA) (NA) | 10616 (7.495) (1.769) | 0 |
| 260548002 (oral) | 7 | 1 (6) (1) | 7531 (6.59) (0.713) | 0 |
| 246176004 (form) | 7 | 0 (NA) (NA) | 6519 (5.737) (1.351) | 0 |
| 255333006 (conventional) | 5 | 0 (NA) (NA) | 6086 (7.082) (2.083) | 0 |
| 86495002 (for) | 6 | 0 (NA) (NA) | 5715 (6.95) (1.501) | 0 |
| 420862001 (on) | 5 | 0 (NA) (NA) | 4821 (6.873) (1.91) | 0 |
| 733021006 (system) | 5 | 0 (NA) (NA) | 4068 (6.481) (1.545) | 0 |
Bottom ten LR classes according to the LSLD metric, showing their hierarchy in SNOMED CT
| LR class (label) | Hierarchy |
|---|---|
| 42504009 (containing) | Qualifier value |
| 255503000 (entire) | Qualifier value |
| 18720000 (in) | SNOMED CT model component |
| 20401003 (with) | SNOMED CT model component |
| 260548002 (oral) | Qualifier value |
| 246176004 (form) | SNOMED CT model component |
| 255333006 (conventional) | Qualifier value |
| 86495002 (for) | Qualifier value |
| 420862001 (on) | Qualifier value |
| 733021006 (system) | Qualifier value |
Fig. 6 Boxplots for number of synonyms per class (log scale) by hierarchy in the SNOMED CT July 2019 release
Number of LR classes per hierarchy, sorted by number of LR classes
| Hierarchy | Number of LR classes |
|---|---|
| Qualifier value | 177 |
| SNOMED CT model component | 43 |
| Substance | 39 |
| Body structure | 34 |
| Clinical finding | 32 |
| Procedure | 22 |
| Observable entity | 7 |
| Physical object | 5 |
| Social context | 5 |
| Environment or geographical location | 4 |
| Organism | 2 |
| Pharmaceutical/biologic product | 2 |
| Event | 1 |
| Physical force | 1 |
| Record artifact | 1 |
| Special concept | 1 |
| Specimen | 1 |
| Staging and scales | 1 |
| Situation with explicit context | 0 |
Systematic naming metric values for each SNOMED CT hierarchy, including the number of LR classes in the hierarchy, and the counts of both positive and negative cases, sorted by the metric value
| Hierarchy | LR classes | Positive cases | Negative cases | Metric value |
|---|---|---|---|---|
| Record artifact | 1 | 3 | 0 | 1.00 |
| Event | 1 | 75 | 2 | 0.97 |
| Pharmaceutical/biologic product | 2 | 730 | 22 | 0.97 |
| SNOMED CT model component | 43 | 217 | 88 | 0.71 |
| Organism | 2 | 1735 | 1042 | 0.62 |
| Specimen | 1 | 1024 | 676 | 0.60 |
| Qualifier value | 177 | 870 | 719 | 0.55 |
| Environment or geographical location | 4 | 143 | 133 | 0.52 |
| Physical force | 1 | 32 | 38 | 0.46 |
| Observable entity | 7 | 1443 | 1939 | 0.43 |
| Substance | 39 | 8496 | 39131 | 0.18 |
| Procedure | 22 | 15463 | 79584 | 0.16 |
| Clinical finding | 32 | 16556 | 90841 | 0.15 |
| Body structure | 34 | 1680 | 13527 | 0.11 |
| Physical object | 5 | 1452 | 13981 | 0.09 |
| Social context | 5 | 25 | 296 | 0.08 |
| Special concept | 1 | 0 | 0 | |
| Staging and scales | 1 | 0 | 0 | |
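
The metric values in the table above are consistent with the share of positive cases among all cases, that is, positive / (positive + negative), rounded to two decimals. The following Python sketch illustrates that reading under this assumption; the function name `metric_value` and the choice to return `None` when a hierarchy has no cases are hypothetical and not taken from the article.

```python
from typing import Optional


def metric_value(positive: int, negative: int) -> Optional[float]:
    """Share of positive cases among all cases (assumed definition).

    Returns None when there are no cases at all, which matches the
    rows of the table that carry no metric value.
    """
    total = positive + negative
    if total == 0:
        return None
    return positive / total


# (positive cases, negative cases) copied from rows of the table above
rows = {
    "Event": (75, 2),
    "Organism": (1735, 1042),
    "Social context": (25, 296),
    "Special concept": (0, 0),
}

for hierarchy, (pos, neg) in rows.items():
    value = metric_value(pos, neg)
    print(hierarchy, "NA" if value is None else f"{value:.2f}")
```

For the rows used here, the sketch reproduces the tabulated values 0.97, 0.62, and 0.08, and reports NA for Special concept, which has neither positive nor negative cases.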
Fig. 7 Boxplots for the distribution of both LSLD and systematic naming metrics per LR class by hierarchy in the SNOMED CT July 2019 release
LSLD metric values for each SNOMED CT hierarchy, including the number of LR classes in the hierarchy, and the counts of both positive and negative cases, sorted by the metric value
| Hierarchy | LR classes | Positive cases | Negative cases | Metric value |
|---|---|---|---|---|
| Pharmaceutical/biologic product | 2 | 730 | 62 | 0.92 |
| Event | 1 | 475 | 56 | 0.89 |
| Specimen | 1 | 1462 | 389 | 0.79 |
| Body structure | 34 | 21979 | 6975 | 0.76 |
| Organism | 2 | 2203 | 1096 | 0.67 |
| Clinical finding | 32 | 17069 | 10199 | 0.63 |
| Procedure | 22 | 16322 | 11050 | 0.60 |
| Physical force | 1 | 254 | 215 | 0.54 |
| Physical object | 5 | 2287 | 2776 | 0.45 |
| Substance | 39 | 15404 | 22749 | 0.40 |
| Observable entity | 7 | 1902 | 2960 | 0.39 |
| Qualifier value | 177 | 58589 | 180390 | 0.25 |
| Environment or geographical location | 4 | 143 | 1650 | 0.08 |
| Social context | 5 | 87 | 6898 | 0.01 |
| Record artifact | 1 | 3 | 410 | 0.01 |
| SNOMED CT model component | 43 | 217 | 69645 | 0.00 |
| Special concept | 1 | 0 | 452 | 0.00 |
| Staging and scales | 1 | 0 | 354 | 0.00 |