Xiaofeng Gong¹, Jianping Jiang¹, Zhongqu Duan¹, Hui Lu².
Abstract
BACKGROUND: Although rapidly developing sequencing technologies make it possible to use genotype data in clinical diagnosis, it is still challenging for clinicians to interpret sequencing results and make correct judgements based on them. Before this, diagnosis based on clinical features was the dominant approach. With the establishment of the Human Phenotype Ontology (HPO) and the enrichment of phenotype-disease annotations, much more attention has been paid to improving phenotype-based diagnosis.
Keywords: Diagnosis; Disease; Human phenotype ontology (HPO); Semantic similarity
Year: 2018 PMID: 29745853 PMCID: PMC5998886 DOI: 10.1186/s12859-018-2064-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Example of the structure of HPO. The term Abnormality of finger (HP:0001167) and all of its ancestors are shown. Each term represents a phenotypic abnormality and is related to its parent terms by an “is a” relationship.
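Because a term can have more than one parent, HPO forms a directed acyclic graph rather than a tree, so the ancestor set shown in Fig. 1 is found by following every “is a” edge upward. A minimal sketch, assuming a hypothetical toy graph stored as a child-to-parents dictionary (the term names are placeholders, not real HPO identifiers):

```python
from collections import deque

# Hypothetical toy "is a" graph: child term -> list of parent terms.
# Names are placeholders, not real HPO IDs.
PARENTS = {
    "finger_abn": ["hand_abn", "digit_abn"],  # two parents: a DAG, not a tree
    "hand_abn": ["upper_limb_abn"],
    "digit_abn": ["limb_abn"],
    "upper_limb_abn": ["limb_abn"],
    "limb_abn": ["phenotypic_abn"],
    "phenotypic_abn": [],
}


def ancestors(term, parents):
    """Return every ancestor of `term` (excluding the term itself) via BFS."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:  # a shared ancestor is reached by several paths
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen


print(sorted(ancestors("finger_abn", PARENTS)))
# ['digit_abn', 'hand_abn', 'limb_abn', 'phenotypic_abn', 'upper_limb_abn']
```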
Fig. 2 The workflow of disease diagnosis based on RelativeBestPair
Fig. 3 The workflow of disease diagnosis based on the seven existing methods
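The notes under the results table state that the seven existing measures are run with a one-sided search. A minimal sketch of that kind of pipeline, under stated assumptions: the one-sided score is taken to be the average, over the patient's terms, of each term's best match among a disease's terms; the term similarity is Resnik's measure; and information content (IC) is derived from how many diseases in the toy corpus are annotated with a term or any of its descendants. The ontology, diseases, and function names are all hypothetical, and this is not the authors' implementation.

```python
import math

# Hypothetical toy ontology: term -> list of "is a" parents.
PARENTS = {
    "root": [],
    "limb": ["root"],
    "hand": ["limb"],
    "finger": ["hand"],
    "eye": ["root"],
    "cataract": ["eye"],
}

# Hypothetical disease annotations: disease -> set of annotated terms.
DISEASES = {
    "D1": {"finger", "cataract"},
    "D2": {"hand", "eye"},
    "D3": {"limb"},
    "D4": {"cataract"},
}


def closure(term):
    """Return the term together with all of its ancestors."""
    out, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out


def information_content():
    """IC(t) = log(n / count(t)), i.e. -log of the fraction of diseases
    annotated with t or any descendant of t."""
    n = len(DISEASES)
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        covered = set().union(*(closure(t) for t in terms))
        for t in covered:
            counts[t] += 1
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC.get(t, 0.0) for t in closure(t1) & closure(t2))


def one_sided_score(query, disease_terms):
    """Best match of each patient term among the disease's terms, averaged
    over the patient terms (patient -> disease direction only)."""
    best = [max(resnik(q, d) for d in disease_terms) for q in query]
    return sum(best) / len(best)


def rank_diseases(query):
    """Return disease names sorted from highest to lowest score."""
    scores = {name: one_sided_score(query, terms) for name, terms in DISEASES.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_diseases({"finger", "cataract"}))  # ['D1', 'D2', 'D4', 'D3']
```

Because only the patient-to-disease direction is scored, a disease annotated with many extra terms is not penalized for them; in this sketch, that asymmetry is what makes the search one-sided.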
Summary results of different methods on the four simulated datasets
| Dataset | Rank | Resnik | Lin | JC | Rel | IC | GraphIC | Wang | RBP |
|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 (Noise: -, Imprecision: -) | Top 1 | 1027 | 1016 | 1029 | 1018 | 1021 | 1029 | 1023 | 1031 |
| | Top 5 | 1087 | 1071 | 1082 | 1071 | 1075 | 1079 | 1078 | 1091 |
| | Top 10 | 1089 | 1077 | 1088 | 1077 | 1079 | 1081 | 1081 | 1095 |
| | Top 20 | 1092 | 1078 | 1092 | 1078 | 1080 | 1083 | 1081 | 1096 |
| Dataset 2 (Noise: +, Imprecision: -) | Top 1 | 992 | 997 | 1036 | 996 | 1006 | 1031 | 1001 | 1030 |
| | Top 5 | 1074 | 1059 | 1081 | 1063 | 1070 | 1077 | 1071 | 1089 |
| | Top 10 | 1081 | 1069 | 1086 | 1071 | 1077 | 1080 | 1078 | 1094 |
| | Top 20 | 1087 | 1074 | 1089 | 1076 | 1078 | 1083 | 1079 | 1095 |
| Dataset 3 (Noise: -, Imprecision: +) | Top 1 | 434 | 243 | 104 | 302 | 336 | 120 | 172 | 438 |
| | Top 5 | 767 | 502 | 261 | 583 | 603 | 341 | 446 | 765 |
| | Top 10 | 866 | 613 | 342 | 685 | 707 | 482 | 604 | 863 |
| | Top 20 | 926 | 714 | 440 | 785 | 797 | 620 | 725 | 926 |
| Dataset 4 (Noise: +, Imprecision: +) | Top 1 | 183 | 130 | 97 | 143 | 162 | 73 | 77 | 370 |
| | Top 5 | 453 | 327 | 239 | 383 | 406 | 252 | 263 | 694 |
| | Top 10 | 579 | 452 | 319 | 509 | 533 | 393 | 384 | 786 |
| | Top 20 | 703 | 570 | 420 | 640 | 657 | 540 | 535 | 860 |
Resnik: the Resnik measure; Lin: the Lin measure; JC: the Jiang-Conrath measure; Rel: the Relevance measure; IC: the information coefficient measure; GraphIC: the graph IC measure; Wang: the Wang measure; RBP: the RelativeBestPair method.
The seven existing measures are all implemented with a one-sided search algorithm. Each number is the count of patients, out of 1100 cases, whose true disease is ranked within the top 1, top 5, top 10 or top 20.
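The legend names the term-level measures being compared. For the IC-based ones, the standard formulas can be written as functions of three numbers: the IC of each term and the IC of their most informative common ancestor (MICA). A sketch, assuming those IC values are already computed; the function names are hypothetical, the 1/(1 + distance) conversion for Jiang-Conrath is one common convention and not necessarily the one used in the paper, and Relevance is evaluated at the MICA only, whereas its original definition maximizes the product over all common ancestors. The information coefficient, GraphIC and Wang measures are not sketched.

```python
import math


def resnik(ic_mica):
    """Resnik: the IC of the most informative common ancestor itself."""
    return ic_mica


def lin(ic1, ic2, ic_mica):
    """Lin: 2 * IC(MICA) / (IC(t1) + IC(t2)); 0 when both terms have zero IC."""
    denom = ic1 + ic2
    return 2.0 * ic_mica / denom if denom > 0 else 0.0


def jiang_conrath(ic1, ic2, ic_mica):
    """Jiang-Conrath distance IC(t1) + IC(t2) - 2 * IC(MICA), converted to a
    similarity with 1 / (1 + distance) (an assumed convention)."""
    distance = ic1 + ic2 - 2.0 * ic_mica
    return 1.0 / (1.0 + distance)


def relevance(ic1, ic2, ic_mica):
    """Relevance: Lin similarity weighted by 1 - p(MICA), with p = exp(-IC).
    Evaluated at the MICA only (a simplification)."""
    return lin(ic1, ic2, ic_mica) * (1.0 - math.exp(-ic_mica))


# Example: t1 has IC = ln 4, t2 has IC = ln 2, and t2 is their MICA.
ic_t1, ic_t2, ic_mica = math.log(4), math.log(2), math.log(2)
print(lin(ic_t1, ic_t2, ic_mica))        # ≈ 0.667
print(relevance(ic_t1, ic_t2, ic_mica))  # ≈ 0.333
```

Writing the measures this way makes their relationship explicit: Lin normalizes Resnik by the two terms' own IC, and Relevance further down-weights matches whose common ancestor is frequent (low IC).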
Fig. 4 Cumulative distribution of the ranks of the underlying diseases on the simulated dataset without noise or imprecision. The horizontal axis is the rank threshold; the vertical axis is the proportion of patients whose underlying disease is ranked within that threshold.
Fig. 5 Cumulative distribution of the ranks of the underlying diseases on the simulated dataset with noise and without imprecision. The horizontal axis is the rank threshold; the vertical axis is the proportion of patients whose underlying disease is ranked within that threshold.
Fig. 6 Cumulative distribution of the ranks of the underlying diseases on the simulated dataset without noise and with imprecision. The horizontal axis is the rank threshold; the vertical axis is the proportion of patients whose underlying disease is ranked within that threshold.
Fig. 7 Cumulative distribution of the ranks of the underlying diseases on the simulated dataset with both noise and imprecision. The horizontal axis is the rank threshold; the vertical axis is the proportion of patients whose underlying disease is ranked within that threshold.
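Both the Top-k counts in the results table and the cumulative curves in Figs. 4-7 come from the same per-patient quantity: the rank of the true disease in that patient's sorted list. A minimal sketch, assuming 1-based ranks and hypothetical example ranks (not data from the paper); how ties between equally scored diseases are ranked is not specified here. Multiplying a curve's value at threshold k by the number of cases (1100 per dataset) gives the corresponding Top-k count.

```python
def top_k_counts(ranks, ks=(1, 5, 10, 20)):
    """Number of patients whose true disease is ranked within the top k."""
    return {k: sum(1 for r in ranks if r <= k) for k in ks}


def cumulative_rank_ratio(ranks, max_threshold):
    """Proportion of patients with rank <= k, for every k = 1..max_threshold."""
    n = len(ranks)
    return [sum(1 for r in ranks if r <= k) / n for k in range(1, max_threshold + 1)]


# Hypothetical 1-based ranks of the true disease for eight simulated patients.
ranks = [1, 3, 1, 12, 7, 25, 2, 1]
print(top_k_counts(ranks))              # {1: 3, 5: 5, 10: 6, 20: 7}
print(cumulative_rank_ratio(ranks, 5))  # [0.375, 0.5, 0.625, 0.625, 0.625]
```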