TaeHyun Hwang, Gowtham Atluri, MaoQiang Xie, Sanjoy Dey, Changjin Hong, Vipin Kumar, Rui Kuang.
Abstract
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype-gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype-gene association matrix using prior knowledge from a phenotype similarity network and a protein-protein interaction network, supervised by label information from known disease classes and biological pathways. In experiments on disease phenotype-gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with Human Phenotype Ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein-protein interaction subnetworks. An extensive literature review also confirmed many new members of the disease classes and pathways, as well as the predicted associations between disease phenotype classes and pathways.
Year: 2012 PMID: 22735708 PMCID: PMC3479160 DOI: 10.1093/nar/gks615
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. NMTF of disease phenotype–gene associations. The phenotype–gene association matrix X is factorized into the product of three matrices: phenotype cluster membership F, phenotype cluster–gene cluster association S and gene cluster membership G, for supervised co-clustering of phenotypes and genes. Label information for the disease classes and the pathways is available for a small number of phenotypes and genes. Prior knowledge is also introduced from the phenotype similarity network and the gene network. For better visualization, different colors are used to distinguish the phenotypes and the genes in different clusters.
Notations
| Notation | Definition |
|---|---|
| | Number of disease phenotypes |
| | Number of genes |
| | Number of phenotype clusters (e.g. classes) |
| | Number of gene clusters (e.g. pathways) |
| X | Disease phenotype–gene association matrix ( |
| F | Phenotype cluster membership ( |
| S | Phenotype cluster–gene cluster association matrix ( |
| G | Gene cluster membership ( |
| | Annotated phenotype cluster membership ( |
| | Annotated gene cluster membership ( |
| | Disease phenotype similarity network ( |
| | Gene interaction network ( |
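The factorization described in Figure 1 can be illustrated with a minimal sketch. It covers only the plain, unregularized NMTF step (X approximated by F S Gᵀ) using standard multiplicative updates for the squared reconstruction error. The network regularizers and label supervision of R-NMTF are not reproduced here, and the function name `nmtf`, its parameters and the toy matrix are assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF sketch: X (n x m) ~ F (n x k1) @ S (k1 x k2) @ G.T (k2 x m).

    Hypothetical helper for illustration; no network or label regularization.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative initialization of the three factors.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    for _ in range(n_iter):
        # Multiplicative updates: each factor is scaled by a ratio of
        # non-negative terms, so it stays non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

# Toy association matrix with two phenotype blocks and two gene blocks.
X = np.kron(np.eye(2), np.ones((3, 4)))
F, S, G = nmtf(X, k1=2, k2=2)
print(np.linalg.norm(X - F @ S @ G.T) / np.linalg.norm(X))
```

In this sketch, the row-wise argmax of F or G gives a hard cluster assignment for each phenotype or gene, and large entries of S link a phenotype cluster to a gene cluster.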
Performance of phenotype classification in leave-one-out cross-validation
| Compared methods | Avg. rank | win/draw/loss (p-value) |
|---|---|---|
| R-NMTF versus NMTF | | 300/154/136 (4.617e−13) |
| R-NMTF versus SVM-linear | versus 6.103 | 308/154/128 (3.693e−12) |
| R-NMTF versus SVM-rbf | versus 5.037 | 268/213/109 (1.497e−4) |
| R-NMTF versus LP | versus 3.700 | 161/388/41 (9.145e−05) |
This table reports the average rank of the target class out of the 20 classes, and the pairwise ‘win/draw/loss’ comparisons of each leave-one-out case between R-NMTF and the baselines: SVMs with linear and rbf kernels, NMTF and LP. The last column reports the statistical significance of the ranking results using the Wilcoxon rank-sum test.
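The quantities in this table can be computed along the following lines. This is a sketch that assumes each method yields one target-class rank per leave-one-out case (lower is better); the helper name `compare_ranks` and the toy ranks are hypothetical, and `scipy.stats.ranksums` stands in for whatever implementation of the rank-sum test the authors used.

```python
import numpy as np
from scipy.stats import ranksums

def compare_ranks(ranks_a, ranks_b):
    """Summarize method A against method B over paired leave-one-out cases."""
    a = np.asarray(ranks_a, dtype=float)
    b = np.asarray(ranks_b, dtype=float)
    win = int((a < b).sum())    # A ranks the target class higher than B
    draw = int((a == b).sum())
    loss = int((a > b).sum())
    _, p_value = ranksums(a, b)  # two-sided Wilcoxon rank-sum test
    return a.mean(), b.mean(), (win, draw, loss), p_value

# Toy ranks for four held-out phenotypes (illustrative values only).
avg_a, avg_b, wdl, p = compare_ranks([1, 2, 3, 5], [2, 2, 1, 9])
print(avg_a, avg_b, wdl, p)
```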
Figure 2. Performance of phenotype classification in leave-one-out cross-validation. In this plot, the x-axis represents the cutoffs of the rank of the target disease class out of the 20 classes. The y-axis represents the fraction of phenotypes with their target disease class ranked within a certain cutoff. For example, R-NMTF ranked the target class of >60% of the phenotypes within Rank 2, while the other methods ranked only around or <50% within the same rank cutoff.
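The curve described in this caption is a cumulative fraction over rank cutoffs. A minimal sketch, assuming a list with one target-class rank per held-out phenotype (the function name and the toy ranks are illustrative only):

```python
def fraction_within_cutoffs(target_ranks, n_classes=20):
    """For each cutoff c = 1..n_classes, the fraction of cases with rank <= c."""
    n = len(target_ranks)
    return [sum(r <= c for r in target_ranks) / n
            for c in range(1, n_classes + 1)]

# Four held-out phenotypes, five classes.
print(fraction_within_cutoffs([1, 2, 2, 5], n_classes=5))
# → [0.25, 0.75, 0.75, 0.75, 1.0]
```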
Disease phenotype classification results by disease classes
| Disease classes (No) | Avg. rank | ||||
|---|---|---|---|---|---|
| R-NMTF | NMTF | SVM-linear | SVM-rbf | LP | |
| Bone (23) | 8.5 | 4.7 | 7.6 | 4.7 | |
| Cancer (53) | 5.0 | 4.2 | 2.0 | 1.9 | |
| Cardiovascular (28) | 10.1 | 10.0 | 6.0 | 4.3 | |
| Connective tissue (16) | 8.9 | 10.6 | 11.4 | 11.1 | |
| Dermatological (32) | 4.4 | 3.0 | 4.0 | 2.5 | |
| Developmental (28) | 5.7 | 9.6 | 9.2 | 6.5 | |
| Ear, Nose, Throat (3) | 20.0 | 20.0 | 15.0 | 16.7 | |
| Endocrine (30) | 5.4 | 13.4 | 5.4 | 4.9 | |
| Gastrointestinal (12) | 9.7 | 7.8 | 9.7 | 11.7 | |
| Hematological (30) | 3.5 | 9.5 | 6.9 | 3.8 | |
| Immunological (31) | 10.0 | 8.1 | 5.2 | 2.8 | |
| Metabolic (84) | 2.2 | 4.1 | 2.2 | ||
| Muscular (18) | 5.7 | 12.2 | 9.1 | 7.3 | |
| Neurological (80) | 6.2 | 5.8 | 2.7 | 1.4 | |
| Nutritional (2) | 16.0 | 3.0 | 19.0 | 20 | |
| Ophthalmological (35) | 4.2 | 2.5 | 2.9 | 2.5 | |
| Psychiatric (9) | 7.9 | 8.0 | 11.4 | 14.8 | |
| Renal (23) | 4.1 | 4.4 | 6.8 | 4.9 | |
| Respiratory (7) | 15.4 | 14.1 | 15.7 | ||
| Skeletal (46) | 3.3 | 4.8 | 5.2 | 1.8 | |
This table reports the ranking performance of R-NMTF, SVM with linear and rbf kernels, NMTF and LP in each disease class in the leave-one-out cross-validation. The number of phenotypes in each disease class is given in parentheses.
Performance of disease gene discovery in leave-one-out cross-validation
| Compared methods | Avg. AUC | win/draw/loss (p-value) |
|---|---|---|
| R-NMTF versus LP | | 526/1/63 (5.4482e−113) |
This table reports the average AUC for disease gene classification, and the pairwise ‘win/draw/loss’ comparisons of each leave-one-out case between R-NMTF and LP. The last column reports the statistical significance of the ranking results using the Wilcoxon rank-sum test.
Figure 3. Performance of disease gene discovery in leave-one-out cross-validation. In the plot, the x-axis represents AUC cutoffs. The y-axis represents the fraction of disease genes with an AUC score above the cutoffs. For example, R-NMTF achieved AUCs above 0.9 for >80% of the genes, while LP achieved the same level of AUC for only 20% of the genes.
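A per-case AUC and the fraction plotted on the y-axis can be sketched as follows. The sketch assumes that each leave-one-out case produces a list of prediction scores with binary ground-truth labels; how the authors formed those lists is not stated here, and the function names and toy values are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fraction_above(aucs, cutoff):
    """Fraction of held-out cases whose AUC exceeds the cutoff."""
    return sum(a > cutoff for a in aucs) / len(aucs)

print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))      # → 0.75
print(fraction_above([0.95, 0.5, 0.92, 0.7], 0.9))  # → 0.5
```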
Figure 4. HPO phenotype similarities by clusters. The HPO similarity matrix of the phenotypes is displayed as a heat map. The phenotypes are grouped into 20 clusters with the disease classes annotated below.
New disease phenotypes in 20 disease classes
| Disease classes | New disease phenotypes | ||||
|---|---|---|---|---|---|
| Bone | Achondrogenesis, Type III ( | Canine Teeth (Omim:114600) | Dens Evaginatus ( | Dental Noneruption ( | Dentin Dysplasia, Type I( |
| Cancer | Fanconi Anemia ( | Juvenile Myelomonocytic Leukemia | Breast Cancer | Proteus Syndrome ( | Bannayan-Riley-Ruvalcaba Syndrome ( |
| Cardiovascular | Cardiomyopathy (Omim:192600) | Atrial Standstill ( | Cardiomyopathy, Dilated, 1E | Long Qt Syndrome 3 ( | Sudden Infant Death Syndrome ( |
| Connective tissue | Arthritis, Sacroiliac ( | Spondyloarthropathy (Omim:183840) | Slipped Femoral Capital Epiphyses ( | Facial Asymmetry ( | Cervical Rib |
| Dermatological | Deafness; Dfna3 ( | Epidermolysis Bullosa (Omim:131800) | Pachyonychia Congenita, Type 1 ( | Epidermolysis Bullosa Herpetiformis ( | Epidermolysis Bullosa Simplex, Koebner Type ( |
| Developmental | Leucine Transport, High | Uterine Anomalies ( | Testes, Rudimentary ( | Oligosynaptic Infertility | Hypospadias, Autosomal ( |
| Ear, Nose, Throat | Otosclerosis 3 ( | Otosclerosis 2 ( | Otosclerosis 5 ( | Periodontitis, Aggressive, 2 | Red Cell Permeability Defect |
| Endocrine | Diabetes Mellitus | Hypoglycemia (Omim:601820) ( | Polycystic Ovary Syndrome 1 ( | Diabetes Mellitus, Transient Neonatal | Goiter, Multinodular 2 ( |
| Gastrointestinal | Cholestasis2 (Omim:605479) ( | Bile Acid, Synthetic Defect Of | Cholestasis; Pfic2 (Omim:601847) ( | Cholestasis; Pfic3 (Omim:602347) ( | Pancreatitis, Hereditary ( |
| Hematological | Anemia ( | Hyperheparinemia | Sideroblastic Anemia, Autosomal ( | Platelet Groups–ko System | Anemia, Familial Pyridoxine-Responsive ( |
| Immunological | Herpesvirus Sensitivity ( | Interleukin (Omim:243110) ( | Panbronchiolitis, Diffuse ( | Immune Deficiency Disease | Allergic Bronchopulmonary Aspergillosis ( |
| Metabolic | Immunoglobulin D Level In Plasma | Magnesium, Elevated Red Cell | Flood Factor Deficiency | Citrulline Transport Defect | Amobarbital, Deficient N-Hydroxylation of |
| Muscular | Palmomental Reflex | Myopathy (Omim:255100) | Muscular Hypoplasia | Pleoconial Myopathy With Salt Craving | Myopathy, Congenital |
| Neurological | Amyotrophic Lateral Sclerosis 1 | Amyotrophic Lateral Sclerosis 2 | Alzheimer Disease 2 | Prion Disease (Omim:603218) | Frontotemporal Dementia (Omim:607485) |
| Nutritional | Bulimia Nervosa | Red Cell Permeability Defect | Labia Minora (Omim:149600) ( | Schizophrenia 9 ( | Amyotrophic Lateral Sclerosis 6 ( |
| Ophthalmological | Cone Dystrophy 3 | Cone-Rod Dystrophy 3 | Leber Congenital Amaurosis | Cone-Rod Dystrophy 6 | Retinitis Pigmentosa 19 |
| Psychiatric | Fg Syndrome 2 ( | Fg Syndrome 3 ( | Schizophrenia 5 | Cerebral Angiopathy, Dysphoric ( | Gambling, Pathologic |
| Renal | Nephrotic Syndrome, Type 2 ( | Hypertensive Nephropathy ( | Enuresis, Nocturnal, 2 ( | Enuresis, Nocturnal, 1 ( | Blue Diaper Syndrome |
| Respiratory | Hemangiomatosis | Respiratory Underresponsiveness | Emphysema (Omim:130700) | Asthma, Short Stature, and Elevated Iga | Asthma-Related Traits, Susceptibility To, 1 |
| Skeletal | Brachydactyly, Mononen Type | Tibial Hemimelia ( | Acropectoral Syndrome | Syndactyly, Type IV | Spondyloepimetaphyseal Dysplasia, Irapa Type |
The 5 most confident predictions of phenotypes in each disease class are reported.
New member genes of KEGG disease pathways
| KEGG disease pathways | New member genes | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Hsa04930: Type II Diabetes Mellitus | KCNJ8 ( | EFHC1 | ADIPOR2 ( | ABCC9 | LDHA | CDH13 ( | ENSA | CRYBB1 | CASR | KCNJ2 |
| Hsa04940: Type I Diabetes Mellitus | CKAP5 | SPTBN4 | PTPRT | SNX19 | CD74 | LILRB1 ( | LILRB2 | GAST | LRRC23 | CTLA4 ( |
| Hsa04950: Maturity Onset Diabetes of the Young | OLIG2 | EN2 | PCSK1 | PNRC1 | PCSK2 | GATA5 | GATA6 | PNRC2 | OTX2 | RAMP2 |
| Hsa05010: Alzheimers Disease | TMED10 ( | BRI3 | PTX3 | APH1B ( | TFCP2 ( | HRG | C1R | FKBP2 | KHSRP | NEDD8 ( |
| Hsa05020: Parkinsons Disease | ARIH1 ( | AMFR | AGXT | TRIM25 | CCNB1IP1 | GAN | TMCC2 | STUB1 | SH2D3C | SLC6A1 |
| Hsa05030: ALS | SSR3 | JUB | ALS2CL ( | APBA1 | MTMR2 | ABL2 | HOXB2 | RAB37 | PKN1 ( | CHML |
| Hsa05040: Huntingtons Disease | HIP1R ( | SNX5 | IFT20 | PICALM | RPS10 | PQBP1 | NECAP1 | ARF1 | KPNA4 | MBTPS1 |
| Hsa05050: Dentatorubropallidoluysian Atrophy | ALG13 | TRIM22 | CLCN5 | ECM1 | MYST3 | NET1 | SYNPO | EFEMP1 | CPSF6 | NDFIP2 |
| Hsa05060: Prion Disease | PRND ( | CHD6 | LAMA2 | RPS21 | EIF2AK3 | KEAP1 | ADAM23 | DPP6 | MOG | OPCML |
| Hsa05110: Cholera Infection | SERP1 | SEC63 | ARFIP2 | APOB | ARFIP1 | PIP5K1A | FLAD1 | TRAM1 | ETHE1 | AP1B1 |
| Hsa05120: Epithelial Cell Signaling in Helicobacter Pylori Infection | GRLF1 | ETHE1 | HBA1 | EFNA2 | TOMM34 | DARC | ADD2 | SH3D19 | PFKM | ANG |
| Hsa05130: Pathogenic Escherichia Coli Infection Ehec | ARPC4 | GRM7 | HS1BP3 | CGN | PLA2G7 | KIAA1543 | LAPTM4A | NOX4 | ACTR2 | SSB |
| Hsa05210: Colorectal Cancer | EXO1 ( | ADIPOR1 ( | MUTYH ( | PMS2 | CDCA8 | ROR2 | PMS1 | MAZ | WNT5A | WNT7A |
| Hsa05211: Renal Cell Carcinoma | HIF3A | OS9 | EGLN2 | ING4 | ARNTL2 | SIM1 | ASB8 | LRRC41 | SENP6 | SIM2 |
| Hsa05212: Pancreatic Cancer | REPS1 | REPS2 | PLCD1 | SHFM1 | EXOC1 | RAD51AP1 | RAD54L | RALGPS1 | EXOC5 | EXOC3 |
| Hsa05213: Endometrial Cancer | MSR1 | BRCA2 ( | NF1 | MXI1 ( | RNASEL | FH | MSH2 | ELAC2 | MAD1L1 | CHEK2 |
| Hsa05214: Glioma | PDAP1 | KIAA1683 | RHBDF1 | RPS18 | ART1 | BRD2 | NKD2 | MYO10 | TFDP2 | SETD8 |
| Hsa05215: Prostate Cancer | KRT27 | MTTP | ATF6 ( | PTHLH ( | SEMG1 | ATF2 ( | G6PC | NFIL3 ( | ASGR1 | MALL |
| Hsa05216: Thyroid Cancer | TSSK2 | TMOD2 | RNF14 | TRIM25 | PPP4C | IFI16 | CNN1 | TMOD1 | S100A2 | NUP98 |
| Hsa05217: Basal Cell Carcinoma | IHH | DHH | ZIC1 | ZIC2 | PORCN | SFRP1 | ROR2 | FRMPD4 | GPC3 | GAS1 |
| Hsa05218: Melanoma | FGFR4 ( | FGFR2 ( | PHEX | FGFR3 ( | SCN8A | EBNA1BP2 | RPS2 | MAPK8IP2 | TFEB | PDAP1 |
| Hsa05219: Bladder Cancer | MLC1 | UNC5B ( | UNC5A | PAWR | AATF | TNXB | CAMK2A | RECK | HIST3H2A | ATF4 ( |
| Hsa05220: Chronic Myeloid Leukemia | APBA3 | MAP4K5 ( | BAZ2B | KLF3 | TDGF1 | MAPK4 | FMOD | RAI2 | ELF2 | SPRY2 ( |
| Hsa05221: Acute Myeloid Leukemia | RPL21 | NDUFB8 ( | FBXO18 | GATA2 ( | CEBPD ( | GFI1 ( | TAF9B | MYST3 ( | CBFA2T3 | NFATC1 |
| Hsa05222: Small Cell Lung Cancer | CKS2 | BCKDK | TBC1D8 | TNFRSF19 | DUSP1 | TNFRSF4 | TNFRSF12A | NGFRAP1 | LTBR | MAP6 |
| Hsa05223: Non Small Cell Lung Cancer | FDXR ( | LATS1 ( | MAP6 | NR1H2 ( | PRKRIR | CSN1S1 | NR1H3 | CNKSR1 | FOXG1 ( | PNRC1 |
The 10 most confident predictions of member genes in KEGG disease pathways are reported.
Figure 5. PPI subnetworks of the extended disease pathways. In each pathway, gray nodes are known member genes in the disease pathways and red nodes are newly predicted member genes. Edges represent PPIs between two genes. Note that a known or newly predicted member gene that does not interact with any other member gene in the pathway is not included. (A) Colorectal cancer pathway. The predicted colorectal cancer genes EXO1 and ADIPOR1 interact with many other genes in the colorectal cancer pathway. (B) Alzheimer pathway. Over-expression of C1R is known to be involved in Alzheimer disease. (C) Melanoma pathway. Mutations and copy number changes in the new member gene FGFR3 were recently discovered in melanoma.
Figure 6. Predicted associations between disease classes and pathways. Each red entry represents a predicted association between 20 disease classes and 200 KEGG pathways.