Wangshu Zhang1, Marcelo P Coba2,3, Fengzhu Sun4,5.
Abstract
BACKGROUND: Protein domains can be viewed as portable units of biological function that define the functional properties of proteins. Therefore, if a protein is associated with a disease, its protein domains might also be associated with the disease and define disease endophenotypes. However, knowledge about such domain-disease relationships is rarely available. Thus, identifying domains associated with human diseases would greatly improve our understanding of the mechanisms of complex human diseases and further improve the prevention, diagnosis and treatment of these diseases.
Year: 2016 PMID: 26818594 PMCID: PMC4895779 DOI: 10.1186/s12918-015-0247-y
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 The relationships between the different data types. Histograms of the number of (a) proteins with respect to the number of domains the protein contains, (b) disease modules with respect to the number of diseases the module contains, (c) disease modules with respect to the number of proteins associated with the module, (d) domains with respect to the number of proteins associated with the domain, (e) diseases with respect to the number of disease modules associated with the disease, and (f) proteins with respect to the number of disease modules associated with the protein
Fig. 2 Scheme for predicting domain-disease relationships. Nodes represent diseases/traits, modules, proteins and domains. An edge connecting two nodes represents a known association. Steps 1-7 show how the candidate domains for a specific disease are obtained. Step 1: For a given disease T , all module(s) containing this disease (in the figure M ) and all the other diseases/traits contained in M are extracted. Step 2: Module(s) sharing at least one disease with module M (in the figure ) are extracted. Step 3: All the other diseases/traits in are included in the prediction scheme. Step 4: All proteins associated with the set of (in the figure P and P ) are extracted. Step 5: All domains contained in the set of {P , P } are included in the prediction scheme. Step 6: All proteins sharing domains with proteins {P , P } are included in the prediction scheme. Step 7: All the other domains in all proteins produced at Step 6 are included in the prediction scheme, and the resulting set of domains is called the candidate domains
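The seven steps in the Fig. 2 caption amount to a short traversal of the disease-module-protein-domain graph. The following is a minimal sketch of that traversal, not the authors' implementation: the function name `candidate_domains`, the dictionary layout, and the toy identifiers (`T1`, `M1`, `P1`, `D1`, ...) are all assumptions made for illustration.

```python
# Hypothetical toy data: module -> diseases, disease -> proteins, protein -> domains.
MODULE_DISEASES = {"M1": {"T1", "T2"}, "M2": {"T2", "T3"}, "M3": {"T4"}}
DISEASE_PROTEINS = {"T1": {"P1"}, "T3": {"P2"}, "T4": {"P5"}}
PROTEIN_DOMAINS = {
    "P1": {"D1"},
    "P2": {"D2", "D3"},
    "P3": {"D1", "D4"},
    "P4": {"D5"},
    "P5": {"D6"},
}


def candidate_domains(disease, module_diseases, disease_proteins, protein_domains):
    """Collect candidate domains for `disease` following Steps 1-7 of the caption."""
    # Step 1: modules containing the disease, and every disease in those modules.
    own = [ds for ds in module_diseases.values() if disease in ds]
    diseases = {disease}.union(*own)
    # Step 2: modules sharing at least one disease with the modules from Step 1.
    linked = [ds for ds in module_diseases.values() if ds & diseases]
    # Step 3: add all diseases from the linked modules.
    diseases = diseases.union(*linked)
    # Step 4: proteins with a known association to any collected disease.
    proteins = set().union(*(disease_proteins.get(d, set()) for d in diseases))
    # Step 5: domains contained in those proteins.
    seed_domains = set().union(*(protein_domains.get(p, set()) for p in proteins))
    # Step 6: all proteins sharing a domain with the proteins from Step 4.
    expanded = [doms for doms in protein_domains.values() if doms & seed_domains]
    # Step 7: every domain of the expanded protein set is a candidate domain.
    return set().union(*expanded)


print(sorted(candidate_domains("T1", MODULE_DISEASES, DISEASE_PROTEINS, PROTEIN_DOMAINS)))
```

In the toy data, disease `T1` reaches domain `D4` only through Step 6: protein `P3` has no known disease association but shares domain `D1` with `P1`.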
Fig. 3 Receiver Operating Characteristic (ROC) and precision-recall curves of the different approaches. The figure shows ROC curves (Subplot a) and precision-recall curves (Subplot b) of the Association, MLE (fp = 0, fn = 0.9), DPEA, Bayesian (u , u = 0, v , v = 1, and α = 2, β = 2), and PE (r = 100 %, and pw threshold ≤ 0.01) approaches, respectively. On both the ROC and precision-recall curves, the three MLE-based approaches (DPEA, MLE and Bayesian) outperform PE and Association. The Bayesian approach performs slightly better than DPEA and MLE
Performance of the five approaches
| Approach | AUC | Accuracy | Mean rank ratio |
|---|---|---|---|
| Association | 0.7900 | 0.6256 | 0.2432 |
| MLE | 0.8407 | 0.7074 | 0.1914 |
| DPEA | 0.8309 | 0.6513 | 0.2177 |
| Bayesian | 0.8554 | 0.7289 | 0.1872 |
| PE | 0.8262 | 0.6525 | 0.2282 |
The AUC, accuracy and mean rank ratio of the Association, MLE (fp = 0, fn = 0.9), DPEA, Bayesian (u , u = 0, v , v = 1, and α = 2, β = 2), and PE (r = 100 %, and pw threshold ≤ 0.01) approaches for predicting domain-disease associations, respectively
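This record does not spell out how the metrics in the table are computed, so the sketch below uses common definitions as assumptions: AUC as the probability that a known association scores above an unknown one (ties counted as one half), and mean rank ratio as the rank of the held-out true item divided by the number of candidates, averaged over test cases. The function names and toy inputs are hypothetical. Under this definition a lower mean rank ratio is better, which matches the table, where the Bayesian approach has both the highest AUC and the lowest mean rank ratio.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_rank_ratio(cases):
    """Average of rank(true item) / number of candidates (assumed definition).

    `cases` is a list of (scores, true_index) pairs, one per test case.
    """
    ratios = []
    for scores, true_index in cases:
        # Rank 1 is the highest score; count how many candidates score higher.
        rank = 1 + sum(s > scores[true_index] for s in scores)
        ratios.append(rank / len(scores))
    return sum(ratios) / len(ratios)


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
print(mean_rank_ratio([([0.9, 0.2, 0.5, 0.1], 0), ([0.3, 0.8, 0.6, 0.7], 0)]))
```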
Fig. 4 Influences of the free parameters on the performance of the MLE and PE approaches. Horizontally, Subplots a-c illustrate the influences of the false positive rate (fp) and false negative rate (fn) on the AUC, accuracy and mean rank ratio of the MLE approach; Subplots d-f illustrate the influences of the reliable rate (r) and pw threshold on the AUC, accuracy and mean rank ratio of the PE approach. Vertically, Subplots a and d illustrate AUC scores; Subplots b and e illustrate accuracies; Subplots c and f illustrate mean rank ratios, respectively
Fig. 5 Example illustrating the module effect. Nodes represent diseases with OMIM numbers, modules with index numbers, proteins with OMIM numbers and domains with Pfam numbers. Edges connecting two nodes represent a known association. Nodes with the same background colors represent 7 known associations between corresponding diseases and proteins. (i) Disease OMIM numbers and the corresponding disease/trait names: [179800]: RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL DOMINANT. [179830]: RENAL TUBULAR ACIDOSIS, PROXIMAL. [267200]: RENAL TUBULAR ACIDOSIS III. [267300]: RENAL TUBULAR ACIDOSIS, DISTAL, WITH PROGRESSIVE NERVE DEAFNESS. [602722]: RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL RECESSIVE; RTADR. [604278]: RENAL TUBULAR ACIDOSIS, PROXIMAL, WITH OCULAR ABNORMALITIES AND MENTAL RETARDATION. [259730]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 3; OPTB3. [259700]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 1; OPTB1. [259710]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 2; OPTB2. [259720]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 5; OPTB5. [600329]: OSTEOPETROSIS AND INFANTILE NEUROAXONAL DYSTROPHY. [611490]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 4; OPTB4. [611497]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 6; OPTB6. [612301]: OSTEOPETROSIS, AUTOSOMAL RECESSIVE 7; OPTB7. (ii) Protein OMIM numbers and the corresponding gene names: [164360]: ATP5A1; [114815]: CA8; [109270]: SLC4A1; [192132]: ATP6V1B1; [603345]: SLC4A4; [611492]: CA2; [602642]: TNFSF11; [153440]: LTA; [191160]: TNF; [300386]: CD40LG; [146690]: IMPDH1; [602727]: CLCN7; [602743]: PRKAG2; [604592]: TCIRG1; [611716]: ATP6V0A2. (iii) Domain Pfam numbers and the corresponding domain names: [PF00006]: ATP-synt_ab; [PF02874]: ATP-synt_ab_N; [PF00194]: Carb_anhydrase; [PF00955]: HCO3_cotransp; [PF07565]: Band_3_cyto; [PF00306]: ATP-synt_ab_C; [PF00229]: TNF; [PF00478]: IMPDH; [PF00571]: CBS; [PF00654]: Voltage_CLC; [PF01496]: V_ATPase_I
Effect of disease modularization on the performances of different approaches
Columns are grouped by disease (OMIM ID): 179800 is RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL DOMINANT; 259730 is OSTEOPETROSIS, AUTOSOMAL RECESSIVE 3; OPTB3.
| Domain | 179800: Association | 179800: MLE | 179800: DPEA | 179800: Bayesian | 179800: PE | 259730: Association | 259730: MLE | 259730: DPEA | 259730: Bayesian | 259730: PE |
|---|---|---|---|---|---|---|---|---|---|---|
| ATP-synt_ab | 0.5 | 0.0971 | 0.0041 | 0.6256 | 0.0476 | 0.25 | 0.0930 | 0.0024 | 0.6084 | 0.0476 |
| ATP-synt_ab_N | 0.5 | 0.0971 | 0.0041 | 0.6843 | 0.0476 | 0.25 | 0.0930 | 0.0024 | 0.6366 | 0.0476 |
| Carb_anhydrase | 0.5 | 0.1566 | 0.062 | 0.8031 | 0.1646 | 0.5 | 0.1499 | 0.0541 | 0.7741 | 0.1210 |
| HCO3_cotransp | 1 | 0.1300 | 0.0065 | 0.7569 | 0.0714 | 0.5 | 0.1086 | 0.0003 | 0.6438 | 0.0714 |
| Band_3_cyto | 1 | 0.1300 | 0.0065 | 0.7078 | 0.0714 | 0.5 | 0.1086 | 0.0003 | 0.6859 | 0.0714 |
| ATP-synt_ab_C | 1 | 0.1069 | 0.0002 | 0.6879 | 0.0476 | 0.5 | 0.0983 | 0.0001 | 0.6048 | 0.0476 |
| TNF | 0 | 0.0806 | 0.1900 | 0.4526 | 0 | 0.125 | 0.0968 | 0.1906 | 0.5360 | 0.1250 |
| IMPDH | 0 | 0.0874 | 0.007 | 0.4572 | 0 | 0 | 0.0878 | 0.0059 | 0.4301 | 0 |
| CBS | 0 | 0.0851 | 0.0676 | 0.4381 | 0 | 0.1667 | 0.0910 | 0.0646 | 0.6035 | 0.06250 |
| Voltage_CLC | 0 | 0.0874 | 0.0058 | 0.4547 | 0 | 0.5 | 0.1040 | 0.0061 | 0.6864 | 0.06250 |
| V_ATPase_I | 0 | 0.0808 | 0.0823 | 0.4801 | 0 | 0.25 | 0.1103 | 0.0823 | 0.7057 | 0.1250 |
The scores for predicting the associations between two diseases, RENAL TUBULAR ACIDOSIS, DISTAL, AUTOSOMAL DOMINANT (OMIM: 179800) and OSTEOPETROSIS, AUTOSOMAL RECESSIVE 3; OPTB3 (OMIM: 259730), and their candidate domains ATP-synt_ab (PF00006), ATP-synt_ab_N (PF02874), Carb_anhydrase (PF00194), HCO3_cotransp (PF00955), Band_3_cyto (PF07565), ATP-synt_ab_C (PF00306), TNF (PF00229), IMPDH (PF00478), CBS (PF00571), Voltage_CLC (PF00654) and V_ATPase_I (PF01496), under the Association, MLE (fp = 0, fn = 0.9), DPEA, Bayesian (u , u = 0, v , v = 1, and α = 2, β = 2), and PE (r = 100 %, and pw threshold ≤ 0.01) approaches, respectively
Novel predictions of domain-disease and gene-disease associations
| Rank | Disease | OMIMd | Module Index | Domain | Pfam | Gene | OMIMg |
|---|---|---|---|---|---|---|---|
| 1 | APNEA, OBSTRUCTIVE SLEEP | 107650 | 152 | Acetyltransf_1 | PF00583 | NAA10 | 300013 |
| | | | | | | AANAT | 600950 |
| 2 | ARTERIES, ANOMALIES OF | 108000 | 182 | Sugar_tr | PF00083 | SLC2A1 | 138140 |
| | | | | | | SLC2A2 | 138160 |
| | | | | | | SLC2A9 | 606142 |
| | | | | | | SLC2A10 | 606145 |
| 3 | ATRESIA OF EXTERNAL AUDITORY CANAL AND CONDUCTIVE DEAFNESS | 108760 | 133 | zf-C2H2_2 | PF12756 | TSHZ1 | 614427 |
| 4 | CELIAC ARTERY STENOSIS FROM COMPRESSION BY MEDIAN ARCUATE LIGAMENT OF DIAPHRAGM | 116870 | 182 | Sugar_tr | PF00083 | SLC2A1 | 138140 |
| | | | | | | SLC2A2 | 138160 |
| | | | | | | SLC2A9 | 606142 |
| | | | | | | SLC2A10 | 606145 |
| 6 | CORNEAL DYSTROPHY, EPITHELIAL BASEMENT MEMBRANE; EBMD | 121820 | 77 | Fasciclin | PF02469 | TGFBI | |
| 7 | CORNEAL DYSTROPHY, GROENOUW TYPE I; CDGG1 | 121900 | 77 | Fasciclin | PF02469 | TGFBI | |
| 9 | CORNEAL DYSTROPHY, LATTICE TYPE I; LCD1 | 122200 | 77 | Fasciclin | PF02469 | TGFBI | |
| 10 | CORONARY ARTERY DISSECTION, SPONTANEOUS | 122455 | 182 | Sugar_tr | PF00083 | SLC2A1 | 138140 |
| | | | | | | SLC2A2 | 138160 |
| | | | | | | SLC2A9 | 606142 |
| | | | | | | SLC2A10 | 606145 |
| 11 | DEAFNESS, CONDUCTIVE STAPEDIAL, WITH EAR MALFORMATION AND FACIAL PALSY | 124490 | 137 | Endothelin | PF00322 | EDN1 | 131240 |
| | | | | | | EDN3 | 131242 |
| 12 | EAR FOLDING | 128500 | 137 | Endothelin | PF00322 | EDN1 | 131240 |
| | | | | | | EDN3 | 131242 |
| 13 | PREAURICULAR FISTULAE, CONGENITAL | 128700 | 137 | Endothelin | PF00322 | EDN1 | 131240 |
| | | | | | | EDN3 | 131242 |
| 14 | EAR PITS, POSTERIOR HELICAL | 128710 | 137 | Endothelin | PF00322 | EDN1 | 131240 |
| | | | | | | EDN3 | 131242 |
| 16 | EXTERNAL AUDITORY CANAL, BILATERAL ATRESIA OF, WITH CONGENITAL VERTICAL TALUS | 133705 | 133 | zf-C2H2_2 | PF12756 | TSHZ1 | 614427 |
| 17 | FIBROMUSCULAR DYSPLASIA OF ARTERIES | 135580 | 182 | Sugar_tr | PF00083 | SLC2A1 | 138140 |
| | | | | | | SLC2A2 | 138160 |
| | | | | | | SLC2A9 | 606142 |
| | | | | | | SLC2A10 | 606145 |
| 18 | GLAUCOMA AND SLEEP APNEA | 137763 | 152 | Acetyltransf_1 | PF00583 | NAA10 | 300013 |
| | | | | | | AANAT | 600950 |
| 19 | LUBS X-LINKED MENTAL RETARDATION SYNDROME; MRXSL | 300260 | 66 | MBD | PF01429 | MECP2 | |
| 20 | ALPHA-THALASSEMIA | 604131 | 188 | Globin | PF00042 | HBA1 | |
| | | | | | | HBA2 | |
| | | | | | | HBB | 141900 |
| 22 | INTERNAL CAROTID ARTERY, SPONTANEOUS DISSECTION OF | 147820 | 182 | Sugar_tr | PF00083 | SLC2A1 | 138140 |
| | | | | | | SLC2A2 | 138160 |
| | | | | | | SLC2A9 | 606142 |
| | | | | | | SLC2A10 | 606145 |
| 23 | LITHIUM TRANSPORT | 152420 | 180 | SNF | PF00209 | SLC6A3 | 126455 |
| | | | | | | SLC6A19 | 608893 |
| 24 | MACULAR DYSTROPHY, FENESTRATED SHEEN TYPE | 153890 | 77 | Fasciclin | PF02469 | TGFBI | 601692 |
| 25 | MULLERIAN APLASIA AND HYPERANDROGENISM | 158330 | 199 | wnt | PF00110 | WNT5A | 164975 |
| | | | | | | WNT10B | 601906 |
| | | | | | | WNT4 | |
| | | | | | | WNT10A | 606268 |
| 26 | OSSIFIED EAR CARTILAGES | 165670 | 137 | Endothelin | PF00322 | EDN1 | 131240 |
| | | | | | | EDN3 | 131242 |
| 27 | ENCHONDROMATOSIS, MULTIPLE, OLLIER TYPE | 166000 | 166 | Iso_dh | PF00180 | IDH2 | 147650 |
| | | | | | | IDH1 | 147700 |
| | | | | | | IDH3B | 604526 |
| 29 | RADIAL RAY HYPOPLASIA WITH CHOANAL ATRESIA | 179270 | 140 | LMBR1 | PF04791 | LMBR1 | 605522 |
| 31 | THUMB DEFORMITY | 188100 | 140 | LMBR1 | PF04791 | LMBR1 | 605522 |
| 32 | THYROID HORMONE PLASMA MEMBRANE TRANSPORT DEFECT | 188560 | 180 | SNF | PF00209 | SLC6A3 | 126455 |
| | | | | | | SLC6A19 | 608893 |
| 33 | TRACHEOESOPHAGEAL FISTULA WITH OR WITHOUT ESOPHAGEAL ATRESIA | 189960 | 133, 181 | zf-C2H2_2 | PF12756 | TSHZ1 | 614427 |
| 34 | TRIGGER THUMB | 190410 | 140 | LMBR1 | PF04791 | LMBR1 | 605522 |
| 35 | TRIPHALANGEAL THUMB WITH DOUBLE PHALANGES | 190500 | 140 | LMBR1 | PF04791 | LMBR1 | 605522 |
| 36 | TRIPHALANGEAL THUMB, NONOPPOSABLE | 190600 | 140 | LMBR1 | PF04791 | LMBR1 | 605522 |
| 37 | UTERINE ANOMALIES | 192000 | 199 | wnt | PF00110 | WNT5A | 164975 |
| | | | | | | WNT10B | 601906 |
| | | | | | | WNT4 | 603490 |
| | | | | | | WNT10A | 606268 |
| 38 | UTERUS BICORNIS BICOLLIS WITH PARTIAL VAGINAL SEPTUM AND UNILATERAL HEMATOCOLPOS WITH IPSILATERAL RENAL AGENESIS | 192050 | 199 | wnt | PF00110 | WNT5A | 164975 |
| | | | | | | WNT10B | 601906 |
| | | | | | | WNT4 | 603490 |
| | | | | | | WNT10A | 606268 |
| 39 | ACRORENAL-MANDIBULAR SYNDROME | 200980 | 199 | wnt | PF00110 | WNT5A | 164975 |
| | | | | | | WNT10B | 601906 |
| | | | | | | WNT4 | 603490 |
| | | | | | | WNT10A | 606268 |
| 40 | ADDUCTED THUMBS SYNDROME | 201550 | 140 | LMBR1 | PF04791 | LMBR1 | 605522 |
| 41 | APNEA, CENTRAL SLEEP | 207720 | 152 | Acetyltransf_1 | PF00583 | NAA10 | 300013 |
| | | | | | | AANAT | 600950 |
| 42 | ARTERIAL TORTUOSITY SYNDROME; ATS | 208050 | 182 | Sugar_tr | PF00083 | SLC2A1 | 138140 |
| | | | | | | SLC2A2 | 138160 |
| | | | | | | SLC2A9 | 606142 |
| | | | | | | SLC2A10 | |
| 43 | AURAL ATRESIA, MULTIPLE CONGENITAL ANOMALIES, AND MENTAL RETARDATION | 209770 | 133, 181 | zf-C2H2_2 | PF12756 | TSHZ1 | 614427 |
| 44 | BILIARY ATRESIA, EXTRAHEPATIC; EHBA | 210500 | 133, 181 | zf-C2H2_2 | PF12756 | TSHZ1 | 614427 |
| 45 | CITRULLINE TRANSPORT DEFECT | 215720 | 180 | SNF | PF00209 | SLC6A3 | 126455 |
| | | | | | | SLC6A19 | 608893 |
| 46 | CENTRAL CLOUDY DYSTROPHY OF FRANCOIS; CCDF | 217600 | 77 | Fasciclin | PF02469 | TGFBI | 601692 |
| 47 | DEAFNESS, CONDUCTIVE, WITH MALFORMED EXTERNAL EAR | 221300 | 137 | Endothelin | PF00322 | EDN1 | 131240 |
| | | | | | | EDN3 | 131242 |
| 49 | DUODENAL ATRESIA | 223400 | 181 | zf-C2H2_2 | PF12756 | TSHZ1 | 614427 |
| 50 | HARTNUP DISORDER; HND | 234500 | 180 | SNF | PF00209 | SLC6A3 | 126455 |
| | | | | | | SLC6A19 | |
“Rank” is the rank of the predicted domain-disease association by Bayesian score; “Disease” is the name of the disease phenotype; “OMIMd” is the ID of the disease phenotype in the OMIM database; “Module Index” is the index of the module that includes the disease; “Domain” is the name of the domain; “Pfam” is the domain ID in the Pfam database; “Gene” is a gene mapped from the corresponding domain; and “OMIMg” is the gene ID in the OMIM database. The bold rows represent known domain-disease associations compiled by the Ensembl BioMart tool. The italic rows represent domain-disease associations that are unknown in our study but have at least one known gene in the OMIM database. The bold italic elements in the “OMIMg” column indicate predicted genes that are known disease genes in the OMIM database
Strengths and weaknesses of the approaches
| Approach | Strength | Weakness |
|---|---|---|
| Association | • Fast in implementation | • Unsatisfactory in predictive power |
| MLE | • Good in predictive power | • Need to pre-determine parameters |
| DPEA | • Satisfactory in predictive power | • Slow in implementation when the number of candidate domain-disease associations is large |
| Bayesian | • Excellent in predictive power | • Slow in implementation when the number of candidate domain-disease associations is large |
| PE | • Satisfactory in predictive power | • Slow in implementation when the number of candidate domain-disease associations is large |