Guangming Xing, Guo-Qiang Zhang, Licong Cui.
Abstract
BACKGROUND: Redundant hierarchical relations are patterns in which there are two paths from one concept to another: one of length one (direct) and the other of length greater than one (indirect). Each redundant relation represents a possibly unintended defect that needs to be corrected during ontology quality assurance. Detecting and eliminating redundant relations would help improve the results of all methods that rely on the relevant ontological systems as a knowledge source, such as computing semantic distance between concepts, and ontology matching and alignment.
Keywords: Dynamic programming; Gene ontology; Redundant relations; SNOMED CT; UMLS
Year: 2016 PMID: 27777627 PMCID: PMC5057496 DOI: 10.1186/s13040-016-0110-8
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Indirect path from concept A (hormone secretion) to concept F (biological process) in GO (2015-05-01 version)
| Concept | GO Id | Relation | Concept | GO Id |
|---|---|---|---|---|
| A | GO:0046879 | is-a | B | GO:0009914 |
| B | GO:0009914 | is-a | C | GO:0010817 |
| C | GO:0010817 | is-a | D | GO:0065008 |
| D | GO:0065008 | is-a | E | GO:0065007 |
| E | GO:0065007 | is-a | F | GO:0008150 |
Fig. 1 Graphical rendering of Table 1 and a direct edge between A and F. Directed edges represent the “is-a” relation.
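The definition in the abstract can be checked edge by edge. Below is a minimal brute-force sketch in Python, not the authors' algorithm. It assumes the is-a graph is acyclic and given as (child, parent) pairs, and the function name `redundant_edges` is hypothetical. It encodes the Table 1 path together with the direct edge between A and F shown in Fig. 1.

```python
from collections import defaultdict

def redundant_edges(edges):
    """Return is-a edges (child, parent) that are also implied by a longer path.

    Hypothetical brute-force check: child -> parent is redundant if parent is
    reachable from one of child's *other* parents, i.e. via a path of length > 1.
    Assumes the is-a graph is acyclic.
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def reachable(start, target):
        # Iterative depth-first search following is-a edges upward.
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for p in parents[node]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return False

    result = []
    for child, parent in edges:
        # Any path that starts at another parent q has length >= 2 overall.
        if any(reachable(q, parent) for q in parents[child] if q != parent):
            result.append((child, parent))
    return result

# Table 1 path A -> B -> C -> D -> E -> F, plus the direct edge A -> F (Fig. 1).
edges = [
    ("GO:0046879", "GO:0009914"),  # A is-a B
    ("GO:0009914", "GO:0010817"),  # B is-a C
    ("GO:0010817", "GO:0065008"),  # C is-a D
    ("GO:0065008", "GO:0065007"),  # D is-a E
    ("GO:0065007", "GO:0008150"),  # E is-a F
    ("GO:0046879", "GO:0008150"),  # A is-a F (direct)
]
print(redundant_edges(edges))  # [('GO:0046879', 'GO:0008150')]
```

Each edge triggers its own searches, so this is only practical for small graphs; it serves to make the definition concrete rather than to scale to SNOMED CT.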
Summary of the results for 5 versions of SNOMED CT
| Version | # Concepts | # is-a Relations | TC | RR | RR% | T(ms) |
|---|---|---|---|---|---|---|
| 2013-09-01 | 300,485 | 447,442 | 5,226,630 | 240 | 0.00459 | 10,472 |
| 2014-03-01 | 300,409 | 446,603 | 5,188,221 | 277 | 0.00534 | 10,335 |
| 2014-09-01 | 302,902 | 449,564 | 5,222,506 | 305 | 0.00584 | 10,074 |
| 2015-03-01 | 315,904 | 467,799 | 5,408,010 | 235 | 0.00435 | 15,264 |
| 2015-09-01 | 320,911 | 476,226 | 5,511,334 | 372 | 0.00675 | 16,077 |
TC: number of transitive closure pairs, RR: number of redundant is-a relations, RR%: percentage of redundant is-a relations among transitive closure pairs, T(ms): time taken in milliseconds
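As a worked check of the RR% column, where RR% = 100 × RR / TC, the short Python snippet below recomputes the SNOMED CT values from the TC and RR columns. The helper `rr_percent` is illustrative only and is not part of the paper.

```python
def rr_percent(rr, tc):
    """Percentage of transitive closure pairs that are redundant is-a relations."""
    return 100 * rr / tc

# (version, TC, RR) copied from the SNOMED CT summary table
snomed = [
    ("2013-09-01", 5_226_630, 240),
    ("2014-03-01", 5_188_221, 277),
    ("2014-09-01", 5_222_506, 305),
    ("2015-03-01", 5_408_010, 235),
    ("2015-09-01", 5_511_334, 372),
]

for version, tc, rr in snomed:
    # Five decimal places matches the precision used in the table.
    print(f"{version}: {rr_percent(rr, tc):.5f}")
```

The printed values (0.00459, 0.00534, 0.00584, 0.00435, 0.00675) match the table, which confirms that RR% is measured against transitive closure pairs rather than against the number of is-a relations.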
Fig. 2 Basic mechanism for updating the D-set and the I-set of a node
Fig. 3 Illustration of Algorithm 1
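Figs. 2 and 3 refer to per-node D-sets and I-sets updated by Algorithm 1, but their exact definitions are not given in this excerpt. Under our own assumption that a node's D-set holds its direct parents and its I-set holds every ancestor reachable by a path of length greater than one, a single dynamic-programming pass in topological order finds every redundant edge: the edge to parent p is redundant exactly when p also lies in the node's I-set. The Python sketch below (hypothetical name `detect_redundant`) also counts transitive closure pairs, the TC column of the result tables. It requires Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def detect_redundant(edges):
    """Assumed D-set / I-set propagation over an acyclic is-a graph.

    d_set[n]: direct parents of n.
    i_set[n]: ancestors of n reachable through a path of length > 1.
    Returns (number of transitive closure pairs, list of redundant edges).
    """
    d_set = defaultdict(set)
    for child, parent in edges:
        d_set[child].add(parent)

    i_set = {}
    closure_pairs = 0
    redundant = []
    # With {child: parents} as input, static_order() yields every parent before
    # its children (and raises graphlib.CycleError if the graph has a cycle).
    for node in TopologicalSorter(d_set).static_order():
        indirect = set()
        for p in d_set[node]:
            # A parent's ancestors are reachable from node via two or more edges.
            indirect |= d_set[p] | i_set[p]
        i_set[node] = indirect
        closure_pairs += len(d_set[node] | indirect)
        redundant.extend((node, p) for p in d_set[node] if p in indirect)
    return closure_pairs, redundant

# The Table 1 chain A -> B -> C -> D -> E -> F plus the direct edge A -> F.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("A", "F")]
print(detect_redundant(edges))  # (15, [('A', 'F')])
```

Keeping full ancestor sets costs memory proportional to TC (about 5.5 million pairs for the 2015-09-01 SNOMED CT release, per the table above), which is still manageable in memory; each node is processed once, after all of its parents.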
Summary of the results for 10 versions of Gene Ontology
| Version | # Concepts | # is-a Relations | TC | RR | RR% | T(ms) |
|---|---|---|---|---|---|---|
| 2014-08-01 | 41,436 | 66,544 | 517,092 | 497 | 0.0961 | 1,372 |
| 2014-09-01 | 41,694 | 66,995 | 522,741 | 502 | 0.0960 | 1,472 |
| 2014-10-01 | 41,867 | 67,536 | 528,821 | 631 | 0.1193 | 1,455 |
| 2014-11-01 | 42,012 | 69,300 | 541,718 | 1,031 | 0.1903 | 1,497 |
| 2014-12-01 | 42,189 | 69,887 | 545,168 | 1,193 | 0.2188 | 1,425 |
| 2015-01-01 | 42,329 | 70,272 | 544,210 | 1,277 | 0.2347 | 1,510 |
| 2015-02-01 | 42,466 | 70,724 | 546,158 | 1,420 | 0.2600 | 1,549 |
| 2015-03-01 | 42,588 | 71,032 | 548,006 | 1,463 | 0.2670 | 1,542 |
| 2015-04-01 | 42,805 | 71,549 | 552,367 | 1,552 | 0.2810 | 1,437 |
| 2015-05-01 | 42,979 | 71,954 | 557,550 | 1,609 | 0.2886 | 1,538 |
TC: number of transitive closure pairs, RR: number of redundant is-a relations, RR%: percentage of redundant is-a relations among transitive closure pairs, T(ms): time taken in milliseconds
Numbers of redundant is-a relations in 5 versions of SNOMED CT by length of the indirect path. lᵢ denotes the number of redundant is-a relations whose indirect path has length i.
| Version | l₂ | l₃ | l₄ | Total |
|---|---|---|---|---|
| 2013-09-01 | 233 | 7 | 0 | 240 |
| 2014-03-01 | 264 | 11 | 2 | 277 |
| 2014-09-01 | 291 | 13 | 1 | 305 |
| 2015-03-01 | 224 | 10 | 1 | 235 |
| 2015-09-01 | 358 | 13 | 1 | 372 |
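The table above groups redundant relations by the length of their indirect path. When several indirect paths exist, this excerpt does not say which length is counted; the Python sketch below assumes the shortest one, found by a breadth-first search that skips the direct edge. The function name `indirect_length_histogram` is hypothetical.

```python
from collections import Counter, defaultdict, deque

def indirect_length_histogram(edges):
    """Count redundant is-a edges by the length of their shortest indirect path.

    Assumption: an acyclic is-a graph given as (child, parent) pairs, and the
    "length of the indirect path" means the shortest path avoiding the direct edge.
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    def shortest_indirect(child, target):
        # Breadth-first search upward from child, skipping the edge child -> target.
        dist = {child: 0}
        queue = deque([child])
        while queue:
            node = queue.popleft()
            for p in parents[node]:
                if node == child and p == target:
                    continue
                if p not in dist:
                    dist[p] = dist[node] + 1
                    if p == target:
                        return dist[p]  # first discovery in BFS is the shortest
                    queue.append(p)
        return None  # the edge is not redundant

    hist = Counter()
    for child, parent in set(edges):
        length = shortest_indirect(child, parent)
        if length is not None:
            hist[length] += 1
    return hist

# Table 1 chain with direct edge A -> F, plus a small triangle X -> Y -> Z, X -> Z.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("A", "F"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z")]
print(sorted(indirect_length_histogram(edges).items()))  # [(2, 1), (5, 1)]
```

Under this assumption the Table 1 example contributes to l₅, and every redundant edge contributes to exactly one lᵢ, so the buckets add up to the Total column.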
Numbers of redundant is-a relations in 10 versions of Gene Ontology by length of the indirect path. lᵢ denotes the number of redundant is-a relations whose indirect path has length i.
| Version | l₂ | l₃ | l₄ | l₅ | l₆ | l₇ | Total |
|---|---|---|---|---|---|---|---|
| 2014-08-01 | 421 | 40 | 23 | 11 | 1 | 1 | 497 |
| 2014-09-01 | 419 | 44 | 24 | 13 | 1 | 1 | 502 |
| 2014-10-01 | 512 | 72 | 29 | 15 | 2 | 1 | 631 |
| 2014-11-01 | 771 | 164 | 64 | 27 | 4 | 1 | 1,031 |
| 2014-12-01 | 921 | 174 | 63 | 27 | 7 | 1 | 1,193 |
| 2015-01-01 | 980 | 202 | 62 | 24 | 8 | 1 | 1,277 |
| 2015-02-01 | 1,098 | 220 | 68 | 24 | 8 | 2 | 1,420 |
| 2015-03-01 | 1,119 | 237 | 72 | 25 | 8 | 2 | 1,463 |
| 2015-04-01 | 1,198 | 238 | 78 | 29 | 7 | 2 | 1,552 |
| 2015-05-01 | 1,238 | 255 | 78 | 29 | 7 | 2 | 1,609 |
Summary of the results for source vocabularies in UMLS (2015AB release) with redundant relations
| Source vocabulary | # AUIs | # is-a Relations | TC | RR | RR% | T(ms) |
|---|---|---|---|---|---|---|
| SNOMEDCT_US | 846,444 | 476,055 | 5,511,334 | 372 | 0.00675 | 28,977 |
| SNOMEDCT_VET | 85,939 | 19,832 | 29,688 | 7 | 0.024 | 954 |
| GO | 148,900 | 71,687 | 554,859 | 1,576 | 0.2840 | 9,128 |
| NCI | 270,618 | 119,707 | 701,986 | 20 | 0.0028 | 5,881 |
| HPO | 18,175 | 14,762 | 117,366 | 101 | 0.0861 | 502 |
| UMD | 34,124 | 10,750 | 37,732 | 20 | 0.0530 | 1,906 |
AUI: Atom Unique Identifier, TC: number of transitive closure pairs, RR: number of redundant is-a relations, RR%: percentage of redundant is-a relations among transitive closure pairs, T(ms): time taken in milliseconds
Summary of the results for randomly generated ontologies
| (N, E, C_min, C_max) | # Layers | TC | RR | RR% | T(ms) |
|---|---|---|---|---|---|
| (500,000, 550,000, 2, 5) | 12 | 5,957,690 | 6 | 0.0001 | 5,847 |
| (500,000, 550,000, 2, 10) | 8 | 4,036,792 | 12 | 0.0003 | 5,212 |
| (500,000, 550,000, 2, 20) | 6 | 3,000,751 | 7 | 0.0002 | 5,637 |
| (500,000, 600,000, 2, 20) | 6 | 3,567,691 | 13 | 0.00036 | 5,449 |
| (500,000, 700,000, 2, 20) | 6 | 4,190,494 | 34 | 0.00081 | 5,332 |
| (500,000, 900,000, 2, 20) | 6 | 7,104,749 | 109 | 0.00153 | 8,183 |
| (500,000, 1,300,000, 2, 20) | 6 | 25,934,499 | 1,404 | 0.0054 | 33,235 |
| (1,000,000, 1,200,000, 2, 20) | 7 | 7,071,813 | 21 | 0.0003 | 18,051 |
| (1,000,000, 1,400,000, 2, 20) | 7 | 8,694,703 | 133 | 0.0015 | 11,582 |
N: number of concepts, E: number of is-a relations, C_min/C_max: minimum/maximum number of children a node can have, TC: number of transitive closure pairs, RR: number of redundant is-a relations, RR%: percentage of redundant is-a relations among transitive closure pairs, T(ms): time taken in milliseconds
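The generator behind these random ontologies is not described in this excerpt. As one possible construction, assumed purely for illustration, the Python sketch below builds a random tree layer by layer, giving each node between C_min and C_max children until N nodes exist, then adds E - (N - 1) extra is-a edges, each from a node to a random node in a strictly higher layer so the graph stays acyclic. `random_layered_dag` is a hypothetical name, and unlike the footnote's definition, the extra edges may give a node more than C_max children.

```python
import random

def random_layered_dag(n, e, c_min, c_max, seed=0):
    """Hypothetical generator of an acyclic is-a graph with n nodes and e edges.

    Returns (edges as (child, parent) pairs, layer list indexed by node id).
    The c_min..c_max bound on children applies only to the spanning-tree phase.
    """
    assert 1 <= c_min <= c_max and n - 1 <= e
    rng = random.Random(seed)
    layer = [0]      # layer[i] = depth of node i; node 0 is the root
    first_id = [0]   # first_id[k] = smallest node id in layer k
    edges = set()
    frontier, next_id = [0], 1

    # Phase 1: spanning tree, built breadth-first so ids increase with depth.
    while next_id < n:
        new_frontier = []
        for parent in frontier:
            for _ in range(rng.randint(c_min, c_max)):
                if next_id == n:
                    break
                if layer[parent] + 1 == len(first_id):
                    first_id.append(next_id)  # first node of a new layer
                layer.append(layer[parent] + 1)
                edges.add((next_id, parent))
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier

    # Phase 2: extra edges from a node to any node in a strictly higher layer.
    # Nodes with ids below first_id[layer[c]] are exactly those above c.
    capacity = sum(first_id[layer[c]] for c in range(1, n))
    assert e <= capacity, "too many edges requested for this layering"
    while len(edges) < e:
        child = rng.randrange(1, n)
        edges.add((child, rng.randrange(first_id[layer[child]])))
    return edges, layer

edges, layer = random_layered_dag(1000, 1200, 2, 5, seed=1)
print(len(edges), "edges in", max(layer) + 1, "layers")
```

Because every edge points to a lower-numbered layer, the output is guaranteed acyclic, and feeding it to a redundancy detector yields counts analogous to the TC and RR columns; the exact numbers depend on the seed.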
Fig. 4 Change rate of redundant is-a relations compared to the change rate of all is-a relations during SNOMED CT evolution
Fig. 5 Change rate of redundant is-a relations compared to the change rate of all is-a relations during Gene Ontology evolution
Fig. 6 A visualized example of a redundant is-a relation in SNOMED CT
Fig. 7 A visualized example of a redundant is-a relation in GO
Numbers of direct and indirect edges that should be removed for 30 redundant is-a relations in SNOMED CT and 50 in Gene Ontology
| Ontology | Remove direct edge | Remove indirect edge |
|---|---|---|
| SNOMED CT | 24 (80%) | 6 (20%) |
| Gene Ontology | 45 (90%) | 5 (10%) |