Junning Gao, Lizhi Liu, Shuwei Yao, Xiaodi Huang, Hiroshi Mamitsuka, Shanfeng Zhu.
Abstract
BACKGROUND: As a standardized vocabulary of phenotypic abnormalities associated with human diseases, the Human Phenotype Ontology (HPO) has been widely used by researchers to annotate phenotypes of genes/proteins. To save the cost and time spent on experiments, many computational approaches have been proposed. They alleviate the problem to some extent, but their performance is still far from satisfactory.
Keywords: Hierarchical structure; Human phenotype ontology; Low-rank approximation; Protein-protein interaction networks
Year: 2019 PMID: 31865916 PMCID: PMC6927106 DOI: 10.1186/s12920-019-0625-1
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 An example of the HPO hierarchical tree. All parent-child relationships in HPO represent “is-a” relationships. X-linked inheritance, Abnormality of limbs, Phenotypic variability, and Age of death are example terms from the sub-ontologies Mode of inheritance, Organ abnormality, Clinical modifier, and Mortality/Aging, respectively
Fig. 2 The framework of HPOAnnotator
Statistics of two datasets: Data-201706 and Data-201712
| Dataset | Data-201706 | Data-201712 |
|---|---|---|
| #Proteins | 3,459 | 3,644 |
| #HPO terms | 6,407 | 6,642 |
| #Leaves of HPO | 4,092 | 4,274 |
| #Annotations | 284,621 | 317,443 |
| Ave. #annotations per protein | 82.28 | 87.11 |
| Ave. #annotations per HPO term | 44.42 | 47.79 |
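The two averages in the table follow from the counts above them: total annotations divided by the number of proteins or of HPO terms. A quick check in Python, using only figures from the table (the variable and function names are illustrative):

```python
# Counts copied from the dataset statistics table.
datasets = {
    "Data-201706": {"proteins": 3459, "hpo_terms": 6407, "annotations": 284621},
    "Data-201712": {"proteins": 3644, "hpo_terms": 6642, "annotations": 317443},
}

def averages(stats):
    """Return (annotations per protein, annotations per HPO term), rounded to 2 places."""
    per_protein = round(stats["annotations"] / stats["proteins"], 2)
    per_term = round(stats["annotations"] / stats["hpo_terms"], 2)
    return per_protein, per_term

for name, stats in datasets.items():
    print(name, averages(stats))
```

Both datasets reproduce the tabulated averages to two decimal places.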
Fig. 3 HPO terms are divided into five groups according to the number of proteins they annotate. The number of HPO terms per group (the left-hand side of each group) and the total number of annotations per group (the right-hand side of each group) are shown for Data-201706
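The grouping can be sketched as a simple binning step. The sketch below assumes the five groups are the ranges used as column headers in the macro-AUC and macro-AUPR tables ([1-10], [11-30], [31-100], [101-300], [≥301]); the term identifiers and counts are toy values invented for illustration, and the function names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the first four groups; anything above 300 falls in the last group.
UPPER_BOUNDS = [10, 30, 100, 300]
LABELS = ["[1-10]", "[11-30]", "[31-100]", "[101-300]", "[>=301]"]

def group_label(n_proteins):
    """Map the number of proteins annotated by an HPO term to its group label."""
    if n_proteins < 1:
        raise ValueError("expected a term that annotates at least one protein")
    return LABELS[bisect_left(UPPER_BOUNDS, n_proteins)]

def summarize(term_counts):
    """Return {label: (number of terms, total annotations)} for a {term: count} mapping."""
    summary = {label: (0, 0) for label in LABELS}
    for count in term_counts.values():
        label = group_label(count)
        n_terms, n_annotations = summary[label]
        summary[label] = (n_terms + 1, n_annotations + count)
    return summary

# Toy counts, invented for illustration only (not taken from the datasets).
toy = {"HP:A": 3, "HP:B": 10, "HP:C": 11, "HP:D": 250, "HP:E": 301}
print(summarize(toy))
```

The two numbers per group correspond to the left-hand and right-hand bars described in the caption.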
Statistics of PPNs of Data-201706
| Dataset | #Annotations | #Connect-proteins |
|---|---|---|
| STRING | 214,410 | 3,342 |
| GeneMANIA | 206,900 | 3,385 |
| BioGRID | 10,752 | 2,725 |
| Reactome | 970 | 1,051 |
Fig. 4 Each circle corresponds to the pairs of HPO terms in NHPO that share the same number of proteins, M. The x-axis is M, i.e., the number of shared proteins, and the y-axis is the average similarity score between the two HPO terms over all pairs with that M. The red line is a linear fit
Fig. 5 Each circle corresponds to the pairs of proteins in the STRING PPN that share the same number of HPO terms, K. The x-axis is the average similarity score between the two proteins over all pairs with that K, and the y-axis is K, i.e., the number of shared HPO terms. The red line shows the trend, fitted by a polynomial of degree at most three
Results for the eight criteria obtained by 5 × 5-fold cross-validation over Data-201706 for the nine competing methods
| Method | AUC | AUPR | micro-AUC | micro-AUPR | macro-AUC | macro-AUPR | leaf-AUC | leaf-AUPR |
|---|---|---|---|---|---|---|---|---|
| LR | 0.775 | 0.028 | 0.760 | 0.072 | 0.579 | 0.052 | 0.532 | 0.020 |
| BiRW | 0.875 | 0.066 | 0.826 | 0.096 | 0.732 | 0.056 | 0.597 | 0.031 |
| OGL | 0.785 | 0.051 | 0.776 | 0.078 | 0.603 | 0.034 | 0.536 | 0.014 |
| DLP | 0.902 | 0.073 | 0.875 | 0.100 | 0.736 | 0.094 | 0.659 | 0.055 |
| NMF | 0.961 | 0.496 | 0.900 | 0.273 | 0.753 | 0.139 | 0.701 | 0.089 |
| NMF-PPN | 0.963 | 0.525 | 0.902 | 0.281 | 0.756 | 0.142 | 0.703 | 0.089 |
| NMF-NHPO | 0.965 | 0.541 | 0.903 | 0.290 | 0.756 | 0.144 | 0.702 | 0.094 |
| AiPA | 0.970 | 0.559 | 0.905 | 0.295 | 0.760 | 0.146 | 0.705 | 0.096 |
| HPOAnnotator | 0.760 |
The best-performing value for each evaluation metric is in boldface
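This excerpt does not define the eight criteria, so the following is only a sketch under stated assumptions: it takes "macro" to mean the average of per-term scores (consistent with the per-group macro tables, which are organized by HPO term) and contrasts it with a single score pooled over all protein-term pairs. The function names are hypothetical, the data are toy values, and this is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores share the average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # undefined without both classes
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def pooled_auc(score_matrix, label_matrix):
    """One AUC over all protein-term pairs pooled together."""
    flat_scores = [s for row in score_matrix for s in row]
    flat_labels = [y for row in label_matrix for y in row]
    return auc(flat_scores, flat_labels)

def termwise_auc(score_matrix, label_matrix):
    """Average of per-term (per-column) AUCs, skipping undefined columns."""
    per_term = []
    for col in range(len(score_matrix[0])):
        value = auc([row[col] for row in score_matrix],
                    [row[col] for row in label_matrix])
        if value is not None:
            per_term.append(value)
    return sum(per_term) / len(per_term)

# Toy example: rows are proteins, columns are HPO terms (values invented).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
print(pooled_auc(scores, labels), termwise_auc(scores, labels))
```

The two numbers generally differ: pooling lets frequently annotated terms dominate, while per-term averaging weights every term equally, which is why sparse terms pull the macro figures down in the tables.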
Macro-AUC obtained by 5 × 5-fold cross-validation over Data-201706 for the nine competing methods
| Method | [1-10] | [11-30] | [31-100] | [101-300] | [ ≥301] |
|---|---|---|---|---|---|
| LR | 0.526 | 0.553 | 0.633 | 0.735 | 0.755 |
| BiRW | 0.608 | 0.854 | 0.875 | 0.835 | 0.815 |
| OGL | 0.586 | 0.670 | 0.788 | 0.812 | 0.806 |
| DLP | 0.622 | 0.880 | 0.914 | 0.863 | 0.834 |
| NMF | 0.649 | 0.908 | 0.942 | 0.948 | 0.911 |
| NMF-PPN | 0.651 | 0.911 | 0.943 | 0.951 | 0.916 |
| NMF-NHPO | 0.653 | 0.919 | 0.946 | 0.947 | 0.919 |
| AiPA | 0.654 | 0.922 | 0.943 | 0.957 | 0.931 |
| HPOAnnotator |
The best-performing value for each evaluation metric is in boldface
Macro-AUPR obtained by 5 × 5-fold cross-validation over Data-201706 for the nine competing methods
| Method | [1-10] | [11-30] | [31-100] | [101-300] | [ ≥301] |
|---|---|---|---|---|---|
| LR | 0.003 | 0.022 | 0.047 | 0.064 | 0.077 |
| BiRW | 0.023 | 0.119 | 0.164 | 0.175 | 0.155 |
| OGL | 0.005 | 0.024 | 0.056 | 0.087 | 0.132 |
| DLP | 0.028 | 0.135 | 0.182 | 0.223 | 0.182 |
| NMF | 0.032 | 0.204 | 0.362 | 0.470 | 0.428 |
| NMF-PPN | 0.032 | 0.206 | 0.365 | 0.479 | 0.440 |
| NMF-NHPO | 0.032 | 0.209 | 0.373 | 0.488 | 0.472 |
| AiPA | 0.033 | 0.216 | 0.369 | 0.500 | 0.482 |
| HPOAnnotator |
The best-performing value for each evaluation metric is in boldface
Performance of NMF-PPN with individual PPNs
| Data source | AUPR | micro-AUPR | macro-AUPR |
|---|---|---|---|
| STRING | 0.525 | 0.281 | 0.142 |
| GeneMANIA | 0.523 | 0.280 | 0.143 |
| BioGRID | 0.517 | 0.280 | 0.140 |
| Reactome | 0.505 | 0.278 | 0.139 |
| All |
Results are for each PPN on Data-201706. “All” means all four PPNs are used.
The best-performing value for each evaluation metric is in boldface
Training times of a single run in 5 × 5-fold cross-validation (average over 25 runs)
| Method | Computation time |
|---|---|
| LR | ∼3.5 hours |
| BiRW | ∼1.5 hours |
| OGL, DLP | ≥4 hours |
| NMF, NMF-PPN, NMF-NHPO, HPOAnnotator | ∼30 minutes |
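The 25 runs come from repeating 5-fold cross-validation five times. A minimal sketch of such a split follows; this excerpt does not say what is partitioned (proteins or individual annotations), so the items here are plain integer indices, and the function name, seed, and item count are assumptions for illustration.

```python
import random

def repeated_kfold(n_items, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for each of n_splits * n_repeats runs."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        for fold in range(n_splits):
            test_idx = indices[fold::n_splits]  # every n_splits-th item, offset by fold
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx

runs = list(repeated_kfold(n_items=20))
print(len(runs))  # 5 repeats x 5 folds = 25 runs
```

Averaging a per-run quantity such as training time or AUC over `runs` gives the kind of 25-run average reported in the table.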
AUC obtained by independent test using Data-201712
| Method | AUC |
|---|---|
| BiRW | 0.7971 |
| DLP | 0.8298 |
| OGL | 0.7322 |
| NMF | 0.8527 |
| NMF-PPN | 0.8923 |
| NMF-NHPO | 0.8959 |
| AiPA | 0.9187 |
| HPOAnnotator |
The best-performing value for each evaluation metric is in boldface
Seven true predictions out of the top 30 results (by HPOAnnotator) among all newly added annotations
| Rank | Protein ID | Protein name | Gene name | HPO ID | HPO name |
|---|---|---|---|---|---|
| 2 | Q02388 | Collagen alpha-1(VII) chain (Long-chain collagen) (LC collagen) | COL7A1 | HP:0001072 | Thickened skin |
| 7 | Q9UBX5 | Fibulin-5 | FBLN5 DANCE, UNQ184/PRO210 | HP:0012638 | Abnormality of nervous system physiology |
| 17 | Q9H5I5 | Piezo-type mechanosensitive ion channel component 2 (Protein FAM38B) | PIEZO2 | HP:0000422 | Abnormality of the nasal bridge |
| 19 | O43175 | D-3-phosphoglycerate dehydrogenase (3-PGDH) (EC 1.1.1.95) (2-oxoglutarate reductase) (EC 1.1.1.399) (Malate dehydrogenase) (EC 1.1.1.37) | PHGDH | HP:0000366 | Abnormality of the nose |
| 24 | Q02388 | Collagen alpha-1(VII) chain (Long-chain collagen) (LC collagen) | COL7A1 | HP:0000962 | Hyperkeratosis |
| 26 | Q04656 | Copper-transporting ATPase 1 (EC 3.6.3.54) (Copper pump 1) (Menkes disease-associated protein) | ATP7A | HP:0002650 | Scoliosis |
| 27 | P43026 | Growth/differentiation factor 5 | GDF5 BMP14, CDMP1 | HP:0005622 | Broad long bones |
These seven annotations were not in the training data (Data-201706), but found in the latest release (Data-201712)
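The check behind this table can be sketched as a set difference between two releases followed by a lookup of the top-ranked predictions. The sketch assumes annotations are (protein, HPO term) pairs and that pairs already in the training release are excluded before ranking; the exact ranking procedure is not given in this excerpt, and all names and data below are invented for illustration.

```python
def newly_added(old_annotations, new_annotations):
    """Annotations present in the newer release but absent from the older one."""
    return set(new_annotations) - set(old_annotations)

def top_k_hits(scored_pairs, old_annotations, new_annotations, k=30):
    """Return (rank, protein, term) for top-k predictions confirmed by the newer release.

    scored_pairs: iterable of (protein, term, score). Pairs already in the
    older release are dropped before ranking, since they are not predictions.
    """
    known = set(old_annotations)
    added = newly_added(old_annotations, new_annotations)
    candidates = [(p, t, s) for p, t, s in scored_pairs if (p, t) not in known]
    candidates.sort(key=lambda item: item[2], reverse=True)
    return [(rank, p, t)
            for rank, (p, t, _) in enumerate(candidates[:k], start=1)
            if (p, t) in added]

# Toy releases and scores, invented for illustration only.
old = {("P1", "HP:1")}
new = {("P1", "HP:1"), ("P1", "HP:2"), ("P2", "HP:1")}
scored = [("P1", "HP:1", 0.99), ("P2", "HP:2", 0.9),
          ("P1", "HP:2", 0.8), ("P2", "HP:1", 0.1)]
print(top_k_hits(scored, old, new, k=2))
```

With the real data, `old` and `new` would be the annotation sets of Data-201706 and Data-201712, and the returned ranks would play the role of the Rank column above.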
Validation of false positives in the top 10 ranked predictions
| Gene name | Protein | HPO ID | HPO name | PubMed ID | Disease | Evidence |
|---|---|---|---|---|---|---|
| SH3TC2 | Q8TF17 | HP:0001315 | Reduced tendon reflexes | PMID: 14574644 | Charcot-Marie-Tooth disease 4C (CMT4C) | "Demyelinating neuropathies are characterized by severely reduced nerve conduction velocities (less than 38 m/sec), segmental demyelination and remyelination with onion bulb formations on nerve biopsy, slowly progressive distal muscle atrophy and weakness, |
| FOXG1 | P55316 | HP:0001263 | Global developmental delay | PMID: 19578037 | Rett syndrome congenital variant (RTTCV) | " |
Predicted HPO terms of P23434 (gene name: GCSH) by our four methods based on NMF
| Method | Predicted HPO terms | Correct |
|---|---|---|
| NMF | HP:0002079, HP:0001276, | 2 |
| NMF-NHPO | | 3 |
| NMF-PPN | | 4 |
| HPOAnnotator | | 5 |
| True | HP:0000007, HP:0000711, HP:0000718, HP:0001250, HP:0001298, HP:0001522, HP:0002086, HP:0002795, HP:0100247, HP:0100710 |
Correctly predicted HPO terms are in boldface
Performance results on Data-201706 focusing on the sub-ontology Organ abnormality
| Method | AUC | AUPR | micro-AUC | micro-AUPR | macro-AUC | macro-AUPR | leaf-AUC | leaf-AUPR |
|---|---|---|---|---|---|---|---|---|
| NMF-Organ | 0.955 | 0.507 | 0.883 | 0.250 | 0.745 | 0.127 | 0.682 | 0.077 |
| NMF-PPN-Organ | 0.962 | 0.555 | 0.889 | 0.276 | 0.755 | 0.144 | 0.701 | 0.091 |
| NMF-NHPO-Organ | 0.962 | 0.535 | 0.888 | 0.264 | 0.756 | 0.141 | 0.702 | 0.089 |
| NMF-All | 0.956 | 0.512 | 0.884 | 0.258 | 0.755 | 0.129 | 0.685 | 0.083 |
| NMF-PPN-All | 0.962 | 0.553 | 0.889 | 0.273 | 0.755 | 0.143 | 0.698 | 0.089 |
| NMF-NHPO-All | 0.962 | 0.556 | 0.889 | 0.274 | 0.755 | 0.144 | 0.699 | 0.090 |
| HPOAnnotator-All | 0.702 |
The first three methods, labeled “Organ”, are trained only on HPO terms from the Organ abnormality sub-ontology, while the others, labeled “All”, are trained on all sub-ontologies.
The best-performing value for each evaluation metric is in boldface
Macro-AUC obtained by focusing on Organ abnormality
| Method | [1-10] | [11-30] | [31-100] | [101-300] | [ ≥301] |
|---|---|---|---|---|---|
| NMF-Organ | 0.645 | 0.897 | 0.924 | 0.945 | 0.922 |
| NMF-PPN-Organ | 0.654 | 0.921 | 0.943 | 0.956 | 0.934 |
| NMF-NHPO-Organ | 0.652 | 0.926 | 0.942 | | |
| NMF-All | 0.645 | 0.906 | 0.939 | 0.941 | 0.912 |
| NMF-PPN-All | 0.651 | 0.924 | 0.941 | 0.954 | 0.919 |
| NMF-NHPO-All | 0.650 | 0.928 | 0.940 | 0.953 | 0.935 |
| HPOAnnotator-All | 0.955 |
The three rows labeled “Organ” use only Organ abnormality for training, while the others, labeled “All”, use all sub-ontologies for training.
The best-performing value for each evaluation metric is in boldface
Macro-AUPR obtained by focusing on Organ abnormality
| Method | [1-10] | [11-30] | [31-100] | [101-300] | [ ≥301] |
|---|---|---|---|---|---|
| NMF-Organ | 0.030 | 0.190 | 0.355 | 0.478 | 0.446 |
| NMF-PPN-Organ | 0.033 | 0.205 | 0.371 | | |
| NMF-NHPO-Organ | 0.033 | 0.207 | 0.369 | 0.490 | 0.485 |
| NMF-All | 0.031 | 0.193 | 0.363 | 0.477 | 0.449 |
| NMF-PPN-All | 0.032 | 0.204 | 0.370 | 0.486 | 0.460 |
| NMF-NHPO-All | 0.032 | 0.209 | 0.373 | 0.482 | 0.462 |
| HPOAnnotator-All | 0.493 | 0.485 |
The three rows labeled “Organ” use only Organ abnormality for training, while the other four rows, labeled “All”, use all sub-ontologies for training.
The best-performing value for each evaluation metric is in boldface