Carlos Roberto Arias, Hsiang-Yuan Yeh, Von-Wun Soo.
Abstract
Finding a genetic disease-related gene is not a trivial task. Computational methods are therefore needed to give the biomedical community clues about which genes are most likely to be related to a specific disease and could serve as biomarkers. We address the biomarker identification problem with a gene prioritization method called gene prioritization from microarray data based on shortest paths, extended with structural and biological properties and edge flux using a voting scheme (GP-MIDAS-VXEF). The method finds relevant interactions in protein interaction networks, scores the genes using shortest paths and topological analysis, and integrates the results using a voting scheme and a biological boosting step. We ran two experiments: one comparing primary prostate tumor samples with normal samples, and the other comparing primary prostate tumors with and without lymph node metastasis. We used 137 genes known to be truly related to prostate cancer as the benchmark. In the first experiment, GP-MIDAS-VXEF outperformed all the other state-of-the-art methods in the benchmark, retrieving the most true disease-related genes from the candidate set within its top 50 scores. We applied the same technique to infer significant biomarkers of prostate cancer with lymph node metastasis, which are not yet well established.
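The abstract says that the results of several scoring criteria are integrated with a voting scheme, but it does not state the voting rule. The following is a minimal sketch that assumes a Borda count (a gene at position i in a ranked list of n genes earns n - i points) and uses three hypothetical input rankings. It illustrates rank voting in general, not the GP-MIDAS-VXEF implementation, and the biological boosting step is not modeled.

```python
from collections import defaultdict


def borda_vote(rankings):
    """Merge ranked gene lists (best first) into one consensus ranking.

    Borda count (an assumed voting rule): a gene at 0-based position i in a
    list of length n earns n - i points; genes absent from a list earn
    nothing from it. Ties are broken alphabetically for determinism.
    """
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, gene in enumerate(ranking):
            points[gene] += n - i
    return sorted(points, key=lambda gene: (-points[gene], gene))


# Hypothetical rankings from three scoring criteria (illustrative only).
by_shortest_paths = ["TP53", "CAV1", "EGFR", "MYC"]
by_degree = ["CAV1", "TP53", "MYC", "EGFR"]
by_edge_flux = ["CAV1", "EGFR", "TP53", "MYC"]

consensus = borda_vote([by_shortest_paths, by_degree, by_edge_flux])
print(consensus)  # ['CAV1', 'TP53', 'EGFR', 'MYC']
```

A Borda count rewards genes that rank consistently well across criteria. Here CAV1 wins even though it is not first in every list.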
Year: 2012 PMID: 22654636 PMCID: PMC3354662 DOI: 10.1100/2012/842727
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: General gene prioritization overview.
Seed genes of prostate cancer from the OMIM database.
| Gene ID | Gene symbol | Gene name |
|---|---|---|
| 367 | AR | Androgen receptor |
| 675 | BRCA2 | Breast cancer type 2 susceptibility protein |
| 3732 | CD82 | CD82 antigen |
| 11200 | CHEK2 | Serine/threonine-protein kinase Chk2 |
| 60528 | ELAC2 | Zinc phosphodiesterase ELAC protein 2 |
| 2048 | EPHB2 | Ephrin type-B receptor 2 precursor |
| 3092 | HIP1 | Huntingtin-interacting protein 1 |
| 1316 | KLF6 | Kruppel-like factor 6 |
| 8379 | MAD1L1 | Mitotic spindle assembly checkpoint protein MAD |
| 4481 | MSR1 | Macrophage scavenger receptor types I and II |
| 4601 | MXI1 | MAX-interacting protein 1 |
| 7834 | PCAP | Predisposing for prostate cancer |
| 5728 | PTEN/PTENP1 | Phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase, and dual-specificity protein phosphatase PTEN |
| 6041 | RNASEL | 2–5A-dependent ribonuclease |
| 5513 | HPC1 | Hereditary prostate cancer 1 |
Figure 2: GP-MIDAS-VXEF workflow.
Algorithm 1: Overview of the NetWalk phase.
Figure 3: ROC curves comparing the performance of GP-MIDAS-VXEF with existing state-of-the-art network-based prioritization methods.
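ROC comparisons like the one in Figure 3 are usually summarized by the area under the curve (AUC). This record does not describe how the curves were built, so the following is only a hedged sketch. It computes one method's AUC from its gene scores using the Mann-Whitney formulation: the probability that a randomly chosen target gene outscores a randomly chosen non-target gene. The scores are made up.

```python
def roc_auc(scores, is_target):
    """ROC AUC via the Mann-Whitney U statistic (ties count one half)."""
    positives = [s for s, t in zip(scores, is_target) if t]
    negatives = [s for s, t in zip(scores, is_target) if not t]
    if not positives or not negatives:
        raise ValueError("need at least one target and one non-target gene")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up prioritization scores for five candidate genes.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
is_target = [True, False, True, False, False]
print(roc_auc(scores, is_target))  # 5 of 6 target/non-target pairs ordered correctly
```

The pairwise loop is quadratic in the number of genes. For large candidate sets, a rank-sum formulation computes the same value faster.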
Figure 4: Target genes retrieved. Each bar shows the number of target genes retrieved at a given rank cutoff; the average position of the found genes is shown on top of each bar.
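Figure 4 reports two quantities: how many target genes fall within each rank cutoff, and the average position of those genes. Both can be computed directly from a ranked list. In the minimal sketch below, the ranked list is the first 12 genes of the prostate-normal top 50 reported in this record. The four target genes appear in this record's target-gene lists, but this target subset and the cutoffs are illustrative choices, not the paper's 137-gene benchmark.

```python
def retrieval_at_cutoffs(ranked_genes, targets, cutoffs):
    """For each cutoff k: (targets found in the top k, mean 1-based rank of those found)."""
    rank = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    results = {}
    for k in cutoffs:
        found = [rank[g] for g in targets if g in rank and rank[g] <= k]
        mean_rank = sum(found) / len(found) if found else None
        results[k] = (len(found), mean_rank)
    return results


# First 12 genes of the prostate-normal ranking (from the Top 50 genes table).
ranked = ["CAV1", "TP53", "MAGEA11", "CALM1", "EGFR", "UBE2I",
          "CALR", "SMAD3", "FHL2", "HDAC1", "APP", "MYC"]
# Illustrative target subset; the full benchmark has 137 genes.
targets = {"CAV1", "TP53", "EGFR", "MYC"}

summary = retrieval_at_cutoffs(ranked, targets, [3, 5, 12])
for k, (n_found, mean_rank) in summary.items():
    print(k, n_found, mean_rank)
```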
Figure 5: Venn diagram showing how the target genes found are distributed among the different methods tested.
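A Venn comparison like Figure 5 amounts to grouping each target gene by the exact set of methods that retrieved it. The minimal sketch below uses the CIPHER, ENDEAVOUR, and ToppNet lists from this record's table of target genes found across methods. The GP-MIDAS-VXEF list is truncated in that table, so it is left out rather than guessed.

```python
from collections import defaultdict

CIPHER = set("""
ATM BRCA1 CAV1 CCND1 CDKN1A CDKN1B EGFR EGR1 ESR1 ESR2 HIF1A HRAS
MME MSH2 MYC NCOA3 NCOA4 PGR RB1 RNF14 SMARCA4 TP53
""".split())

ENDEAVOUR = set("""
ACPP ANXA7 APC ARMET ATM BCL2 BMP6 BRCA1 BTRC CAV1 CCND1 CD44 CDH1
CDH13 CDKN1A CDKN1B CDKN2A CTCF CTNNA1 CTNNB1 CYP1B1 DAPK1 EDNRB EGFR
EGR1 ERBB2 ERCC5 ESR1 ESR2 FAF1 FHIT GGT1 GSTP1 HIF1A HOXA13 HRAS
IGFBP3 IL12A IL8 KLK10 KLK2 KLK3 MAP2K4 MME MSH2 MYC NAT1 NCOA3 NCOA4
NEFL PGK1 PGR PLAU POLB PTPN13 RARB RASSF1 RB1 RNF14 SLC2A2 SMARCA4
SOX2 STMN1 TCEB1 TMEPAI TNF TP53 TYR VDR
""".split())

TOPPNET = set("""
ACPP ANXA7 APC ATM BCL2 BMP6 BRCA1 BTRC CAV1 CCND1 CD44 CDH1
CDH13 CDKN1A CDKN1B CDKN2A CTCF CTNNA1 CTNNB1 CYP1B1 DAPK1 EDNRB EGFR
EGR1 ERBB2 ERCC5 ESR1 ESR2 FAF1 FHIT GGT1 GSTP1 HIF1A HOXA13 HRAS
IGFBP3 IL12A IL8 KLK10 KLK2 KLK3 MAP2K4 MC1R MME MSH2 MYC NAT1 NCOA3 NCOA4
NEFL NME1 PGK1 PGR PLAU POLB PTPN13 RARB RASSF1 RB1 RNF14 SLC2A2 SMARCA4
SOX2 STMN1 TCEB1 TNF TP53 TYR VDR
""".split())


def venn_regions(gene_sets):
    """Map each exact combination of methods (a frozenset) to the genes found by exactly those methods."""
    regions = defaultdict(set)
    for gene in set().union(*gene_sets.values()):
        members = frozenset(name for name, genes in gene_sets.items() if gene in genes)
        regions[members].add(gene)
    return dict(regions)


regions = venn_regions({"CIPHER": CIPHER, "ENDEAVOUR": ENDEAVOUR, "ToppNet": TOPPNET})
for members, genes in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(members)), len(genes), sorted(genes) if len(genes) < 5 else "")
```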
Figure 6: Distribution of edge flux values.
Figure 7: Network induced by the top 50 genes in the prostate-normal experiment. Red indicates a larger difference in expression between prostate cancer and normal tissue samples; green indicates a smaller difference. Nodes drawn with a circle denote seed genes.
Figure 8: Network induced by the top 50 genes in the prostate-metastasis experiment. Red indicates a larger difference in expression between prostate cancer and lymph node metastasis tissue samples; green indicates a smaller difference. Nodes drawn with a circle denote seed genes.
Target genes found across methods.
| Method | Target genes |
|---|---|
| CIPHER | ATM, BRCA1, CAV1, CCND1, CDKN1A, CDKN1B, EGFR, EGR1, ESR1, ESR2, HIF1A, HRAS, MME, MSH2, MYC, NCOA3, NCOA4, PGR, RB1, RNF14, SMARCA4, TP53 |
| ENDEAVOUR | ACPP, ANXA7, APC, ARMET, ATM, BCL2, BMP6, BRCA1, BTRC, CAV1, CCND1, CD44, CDH1, CDH13, CDKN1A, CDKN1B, CDKN2A, CTCF, CTNNA1, CTNNB1, CYP1B1, DAPK1, EDNRB, EGFR, EGR1, ERBB2, ERCC5, ESR1, ESR2, FAF1, FHIT, GGT1, GSTP1, HIF1A, HOXA13, HRAS, IGFBP3, IL12A, IL8, KLK10, KLK2, KLK3, MAP2K4, MME, MSH2, MYC, NAT1, NCOA3, NCOA4, NEFL, PGK1, PGR, PLAU, POLB, PTPN13, RARB, RASSF1, RB1, RNF14, SLC2A2, SMARCA4, SOX2, STMN1, TCEB1, TMEPAI, TNF, TP53, TYR, VDR |
| ToppNet (K-Step Markov, HITS with Priors, PageRank with Priors) | ACPP, ANXA7, APC, ATM, BCL2, BMP6, BRCA1, BTRC, CAV1, CCND1, CD44, CDH1, CDH13, CDKN1A, CDKN1B, CDKN2A, CTCF, CTNNA1, CTNNB1, CYP1B1, DAPK1, EDNRB, EGFR, EGR1, ERBB2, ERCC5, ESR1, ESR2, FAF1, FHIT, GGT1, GSTP1, HIF1A, HOXA13, HRAS, IGFBP3, IL12A, IL8, KLK10, KLK2, KLK3, MAP2K4, MC1R, MME, MSH2, MYC, NAT1, NCOA3, NCOA4, NEFL, NME1, PGK1, PGR, PLAU, POLB, PTPN13, RARB, RASSF1, RB1, RNF14, SLC2A2, SMARCA4, SOX2, STMN1, TCEB1, TNF, TP53, TYR, VDR |
| GP-MIDAS-VXEF | ACPP, ANXA7, APC, ARMET, ATM, BCL2, BMP6, BRCA1, BTRC, CAV1, CCND1, CD44, CDH1, CDH13, CDKN1A, CDKN1B, CDKN2A, CTCF, CTNNA1, CTNNB1, CYP1B1, DAPK1, EDNRB, EGFR, EGR1, |
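ToppNet's PageRank with Priors ranks genes using a random walk that, at every step, jumps back to the known (seed) genes with some probability. The following is a minimal power-iteration sketch of that idea. The toy path network, its node names, the uniform prior over seeds, and the restart probability alpha = 0.3 are all assumptions for illustration, not ToppNet's settings.

```python
def pagerank_with_priors(adjacency, seeds, alpha=0.3, iterations=100):
    """Random walk with restart to the seed genes, computed by power iteration.

    adjacency: dict mapping each node to its neighbours (undirected graph).
    alpha: probability of jumping back to the prior, which is uniform over seeds.
    """
    nodes = sorted(adjacency)
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(prior)
    for _ in range(iterations):
        nxt = {n: alpha * prior[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if neighbours:
                # Spread the remaining walk mass evenly over the neighbours.
                share = (1 - alpha) * score[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Dangling node: send its walk mass back to the prior.
                for m in nodes:
                    nxt[m] += (1 - alpha) * score[n] * prior[m]
        score = nxt
    return score


# Toy undirected path network S - A - B - C with S as the only seed.
graph = {"S": ["A"], "A": ["S", "B"], "B": ["A", "C"], "C": ["B"]}
ranks = pagerank_with_priors(graph, seeds={"S"})
for gene in sorted(ranks, key=ranks.get, reverse=True):
    print(gene, round(ranks[gene], 4))
```

Scores decay with network distance from the seed. This decay is why priors-based walks favor candidates that are closely connected to known disease genes.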
Top 50 genes.
| Rank | Using prostate and normal tissue | Using prostate and metastasis tissue |
|---|---|---|
| 1 | CAV1 | CAV1 |
| 2 | TP53 | MAGEA11 |
| 3 | MAGEA11 | CALM1 |
| 4 | CALM1 | CALR |
| 5 | EGFR | TP53 |
| 6 | UBE2I | FHL2 |
| 7 | CALR | EGFR |
| 8 | SMAD3 | APP |
| 9 | FHL2 | JUN |
| 10 | HDAC1 | SMAD3 |
| 11 | APP | SMAD2 |
| 12 | MYC | ESR1 |
| 13 | JUN | RB1 |
| 14 | ESR1 | HIPK3 |
| 15 | GNB2L1 | BRCA1 |
| 16 | HIPK3 | SMAD1 |
| 17 | SMAD2 | GNB2L1 |
| 18 | APPBP2 | XRCC6 |
| 19 | CDC2 | UBE2I |
| 20 | BRCA1 | HDAC1 |
| 21 | RB1 | CDC2 |
| 22 | SMAD1 | AES |
| 23 | PXN | STAT3 |
| 24 | XRCC6 | IL6ST |
| 25 | IL6ST | APPBP2 |
| 26 | STAT3 | PCAF |
| 27 | DLG1 | REPS2 |
| 28 | AES | FLNA |
| 29 | TRAF6 | RAF1 |
| 30 | FLNA | MYC |
| 31 | TRIM29 | MAPK1 |
| 32 | PCAF | TRAF6 |
| 33 | REPS2 | CCND1 |
| 34 | AKT1 | SMARCA4 |
| 35 | PRKCA | HLA-B |
| 36 | RAF1 | TRAF2 |
| 37 | HLA-B | RANBP9 |
| 38 | TRAF2 | PIAS4 |
| 39 | SMARCA4 | GSK3B |
| 40 | MAPK1 | TRIM29 |
| 41 | CHGB | FOS |
| 42 | RANBP9 | IDE |
| 43 | CCND1 | SRC |
| 44 | GSK3B | PXN |
| 45 | HSPA1A | SLC25A4 |
| 46 | BCL2 | SP1 |
| 47 | VCL | NR5A1 |
| 48 | RAI17 | YWHAG |
| 49 | TGFBR1 | AKT1 |
| 50 | SELENBP1 | CCNE1 |
Top 50 genes overlap.
| Set | Total genes | Genes in set |
|---|---|---|
| Overlapping genes | 41 | HDAC1 IL6ST SMAD1 RB1 TRAF2 RAF1 BRCA1 APP CDC2 EGFR AKT1 FLNA AES SMAD2 REPS2 GSK3B SMARCA4 GNB2L1 STAT3 UBE2I TRAF6 MAPK1 MYC CAV1 JUN CCND1 RANBP9 HLA-B PCAF FHL2 TP53 TRIM29 CALR APPBP2 SMAD3 CALM1 MAGEA11 HIPK3 ESR1 PXN XRCC6 |
| Genes only in metastasis analysis | 9 | PIAS4 FOS YWHAG SLC25A4 SP1 SRC IDE CCNE1 NR5A1 |
| Genes only in nonmetastasis analysis | 9 | DLG1 VCL TGFBR1 CHGB SELENBP1 BCL2 PRKCA RAI17 HSPA1A |
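The overlap counts in this table follow directly from the two columns of the Top 50 genes table. The short sketch below reproduces them with set operations. It also lists the shared genes whose rank changed most between the two analyses; that rank-shift view is an added illustration, not an analysis reported in the record.

```python
NORMAL_TOP50 = """
CAV1 TP53 MAGEA11 CALM1 EGFR UBE2I CALR SMAD3 FHL2 HDAC1
APP MYC JUN ESR1 GNB2L1 HIPK3 SMAD2 APPBP2 CDC2 BRCA1
RB1 SMAD1 PXN XRCC6 IL6ST STAT3 DLG1 AES TRAF6 FLNA
TRIM29 PCAF REPS2 AKT1 PRKCA RAF1 HLA-B TRAF2 SMARCA4 MAPK1
CHGB RANBP9 CCND1 GSK3B HSPA1A BCL2 VCL RAI17 TGFBR1 SELENBP1
""".split()

METASTASIS_TOP50 = """
CAV1 MAGEA11 CALM1 CALR TP53 FHL2 EGFR APP JUN SMAD3
SMAD2 ESR1 RB1 HIPK3 BRCA1 SMAD1 GNB2L1 XRCC6 UBE2I HDAC1
CDC2 AES STAT3 IL6ST APPBP2 PCAF REPS2 FLNA RAF1 MYC
MAPK1 TRAF6 CCND1 SMARCA4 HLA-B TRAF2 RANBP9 PIAS4 GSK3B TRIM29
FOS IDE SRC PXN SLC25A4 SP1 NR5A1 YWHAG AKT1 CCNE1
""".split()

# 1-based rank of each gene in each analysis.
normal_rank = {g: i + 1 for i, g in enumerate(NORMAL_TOP50)}
metastasis_rank = {g: i + 1 for i, g in enumerate(METASTASIS_TOP50)}

shared = normal_rank.keys() & metastasis_rank.keys()
only_normal = normal_rank.keys() - metastasis_rank.keys()
only_metastasis = metastasis_rank.keys() - normal_rank.keys()

print(len(shared), len(only_metastasis), len(only_normal))  # 41 9 9

# Shared genes whose rank moved the most between the two analyses.
shifts = sorted(shared, key=lambda g: abs(normal_rank[g] - metastasis_rank[g]), reverse=True)
for gene in shifts[:3]:
    print(gene, normal_rank[gene], "->", metastasis_rank[gene])
```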
Available biological network sites.
| Name | Acronym | URL |
|---|---|---|
| Human Protein Reference Database | HPRD | |
| Biomolecular Interaction Network Database | BIND | |
| Biological General Repository for Interaction Datasets | BioGRID | |
| Database of Interacting Proteins | DIP | |
| IntAct Molecular Interaction Database | IntAct | |
| The MIPS Mammalian Protein-Protein Interaction Database | MIPS | |
| Molecular Interaction Database | MINT | |
| Kyoto Encyclopedia of Genes and Genomes | KEGG | |
| National Center for Biotechnology Information | NCBI | |
Data and text mining gene prioritization methods.
| Method | Brief description | Reported results |
|---|---|---|
| Gene seeker | Gathers human and mouse gene expression and phenotypic data from nine databases. Relies on the assumption that disease genes are likely to be expressed in the tissues affected by that disease [ | Offers a web service that finds disease-related genes given an input genetic localisation and phenotypic/expression terms |
| eVOC | Co-occurrence of disease names in PubMed abstracts; selects disease genes according to expression profiles [ | Tested on 417 candidate genes using 17 known disease genes, it retrieved 15 of the 17 known disease genes and shrank the candidate set by 63.3% |
| DPG | Basic sequence information [ | Concluded that disease proteins tend to be long, conserved, phylogenetically extended, and without close paralogues |
| Prospectr | Basic sequence information [ | Enriched the list for disease genes twofold 77% of the time, fivefold 37% of the time, and twentyfold 11% of the time |
| Suspects | Extension of Prospectr that incorporates GO [ | On average, the target gene was ranked within the top 31.23% of the resulting list |
| MedSim | GO enrichment and functional comparison [ | Achieved a performance of up to 0.90 in its ROC analysis |
| Limitations | Generally imposed by the source data, which carries little knowledge about the disease. For instance, GO terms include a brief description of the corresponding biological function of the genes, but only 60% of all human genes have associated | |
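The Prospectr results are stated as fold enrichment of disease genes in a prioritized list. One common definition divides the fraction of disease genes in the selected top list by their fraction in the whole candidate set. The sketch below assumes that definition, and the numbers in its example are hypothetical rather than taken from the Prospectr study.

```python
from fractions import Fraction


def fold_enrichment(hits_in_top, top_size, targets_total, candidates_total):
    """(hits_in_top / top_size) / (targets_total / candidates_total), computed exactly."""
    if top_size <= 0 or targets_total <= 0 or candidates_total <= 0:
        raise ValueError("sizes must be positive")
    return Fraction(hits_in_top, top_size) / Fraction(targets_total, candidates_total)


# Hypothetical case: 10 disease genes among 1000 candidates; 4 of them land in the top 100.
e = fold_enrichment(hits_in_top=4, top_size=100, targets_total=10, candidates_total=1000)
print(float(e))  # 4.0
```

Under this definition, a random ordering has an expected enrichment of 1. A "twofold" result therefore means that the top list holds twice the background density of disease genes.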
Network-based gene prioritization methods.
| Method | Brief description |
|---|---|
| ENDEAVOUR | Machine learning: starts from known disease genes, then ranks candidates using multiple genomic data sources [ |
| HITS with priors | Network-based prioritization using social and web network analysis techniques [ |
| CGI | Combines a protein interaction network and gene expression using Markov random field theory [ |
| CANDID | Uses publications, protein domain descriptions, cross-species conservation measures, gene expression profiles, and protein interaction networks [ |
| IDEA | Uses the interactome and microarray data [ |
| Limitations | Most of these approaches include additional interactions predicted from coexpression, pathway, functional, or literature data, but still fail to incorporate weights expressing the confidence in the evidence for each interaction. Another issue is that previous methods start from the given PIN without filtering its edges to keep the interactions most relevant to the disease |
| GP-MIDAS-VXEF | Our proposed method integrates the protein interaction network with normal and disease microarray data. On this integrated network, it applies all-pairs shortest paths to find the significant networks and calculate the gene scores. Additionally, it filters interactions so that only the most relevant ones are left for analysis |
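The method description states that all-pairs shortest paths over the integrated network are used to find significant networks and score genes, but this record does not give the exact scoring rule. As an illustration only, the sketch below runs Floyd-Warshall and scores each non-seed gene by the number of seed-to-seed shortest paths it lies on. The network, its weights (lower meaning a more relevant interaction), and this scoring rule are assumptions, not the GP-MIDAS-VXEF implementation.

```python
import itertools
import math


def floyd_warshall(nodes, weighted_edges):
    """All-pairs shortest paths on an undirected weighted graph.

    Returns (dist, nxt): dist[u][v] is the shortest distance and nxt[u][v]
    is the first hop on a shortest path from u to v (None if unreachable).
    """
    dist = {u: {v: 0.0 if u == v else math.inf for v in nodes} for u in nodes}
    nxt = {u: {v: v if u == v else None for v in nodes} for u in nodes}
    for u, v, w in weighted_edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
            nxt[u][v], nxt[v][u] = v, u
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def reconstruct(nxt, u, v):
    """Node sequence of the stored shortest path from u to v ([] if unreachable)."""
    if nxt[u][v] is None:
        return []
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path


def shortest_path_scores(nodes, weighted_edges, seeds):
    """Count, for each non-seed gene, the seed-to-seed shortest paths passing through it."""
    _, nxt = floyd_warshall(nodes, weighted_edges)
    scores = {n: 0 for n in nodes if n not in seeds}
    for s, t in itertools.combinations(sorted(seeds), 2):
        for gene in reconstruct(nxt, s, t)[1:-1]:  # intermediate nodes only
            if gene in scores:
                scores[gene] += 1
    return scores


# Hypothetical network: S1-S3 are seeds; a lower weight stands in for a more relevant interaction.
nodes = ["S1", "S2", "S3", "A", "B", "C"]
edges = [("S1", "A", 1.0), ("A", "S2", 1.0), ("A", "B", 1.0),
         ("B", "S3", 1.0), ("S1", "C", 5.0), ("C", "S3", 5.0)]
result = shortest_path_scores(nodes, edges, seeds={"S1", "S2", "S3"})
print(result)  # {'A': 3, 'B': 2, 'C': 0}
```

Floyd-Warshall runs in O(n^3) time, which is acceptable for a filtered subnetwork. For a whole human interactome, repeated Dijkstra searches from the seed genes would be the more practical choice.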