Qi Liu, Jinling Huang, Huiqing Liu, Ping Wan, Xiuzi Ye, Ying Xu.
Abstract
BACKGROUND: Understanding the constituent domains of oncogenes, their origins, and their fusions may shed new light on the initiation and development of cancers.
Year: 2009 PMID: 19292927 PMCID: PMC2679021 DOI: 10.1186/1471-2105-10-88
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Distributions of origins of 105 oncogene domains across cellular organisms. Archaea: 1 (1%); Bacteria: 17 (16%); Archaea_Bacteria: 22 (21%); Eukaryota: 19 (18%); Metazoa: 30 (29%); Chordata: 16 (15%); Mammalia: 0 (0%); Homo sapiens: 0 (0%).
Enrichment analysis of oncogene domain origination distribution compared with background human genome.
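| Origin | Oncogene count | Oncogene total | Background count | Background total | Enrichment ratio | P-value |
|---|---|---|---|---|---|---|
| Archaea_only | 1 | 103 | 1021 | 88025 | 0.8370 | 6.64E-01 |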
| Bacteria_archaea | 20 | 103 | 13052 | 88025 | 1.3095 | 1.13E-01 |
| Lower-level Eukaryote | 21 | 103 | 23391 | 88025 | 0.7673 | 9.26E-02 |
| Chordate | 16 | 103 | 13309 | 88025 | 1.0274 | 4.26E-01 |
| Mammalian | 0 | 103 | 4553 | 88025 | 0 | 0 |
| Homo sapiens | 0 | 103 | 1381 | 88025 | 0 | 0 |
| Prokaryote | 34 | 103 | 29367 | 88025 | 0.9894 | 5.16E-01 |
| Eukaryote | 69 | 103 | 58658 | 88025 | 1.0053 | 4.46E-01 |
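The enrichment ratio in each row above follows directly from its four counts: the share of oncogene domains with a given origin (for example 1 of 103) divided by the share of background domains with that origin (1021 of 88025). The Python sketch below reproduces that arithmetic. It also computes one-sided hypergeometric tail probabilities as one plausible way to obtain a P-value; the source does not name the test it used, so the hypergeometric model and all function names here are assumptions.

```python
from math import comb


def enrichment_ratio(k, n, K, N):
    """Share of the oncogene set in a category divided by the background share.

    k: oncogene domains in the category, n: all oncogene domains,
    K: background domains in the category, N: all background domains.
    """
    return (k / n) / (K / N)


def hypergeom_pmf(i, n, K, N):
    """Probability of drawing exactly i category members in n draws without replacement."""
    # Exact integer arithmetic; only the final division produces a float.
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def lower_tail(k, n, K, N):
    """P(X <= k): probability of seeing k or fewer category members (depletion)."""
    return sum(hypergeom_pmf(i, n, K, N) for i in range(0, k + 1))


def upper_tail(k, n, K, N):
    """P(X >= k): probability of seeing k or more category members (enrichment)."""
    return sum(hypergeom_pmf(i, n, K, N) for i in range(k, min(n, K) + 1))


# Counts copied from the Archaea_only row of the table.
ratio = enrichment_ratio(1, 103, 1021, 88025)
print(f"Archaea_only enrichment ratio: {ratio:.4f}")
```

For the Archaea_only row, `enrichment_ratio(1, 103, 1021, 88025)` evaluates to about 0.837, in line with the tabulated value. Whether the lower or the upper tail is the appropriate P-value depends on whether depletion or enrichment is being tested.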
Main function groups of oncogene domains.
| Function group | Oncogene domains |
|---|---|
| ATP binding | Pkinase, DEAD, Helicase_C, Pkinase_C, MutS_V, MutS_IV, MutS_III, MutS_II, MutS_I, Furin-like, Ephrin_lbd |
| protein binding | zf-C3HC4, Death, LRR_1, PDZ, SAMP, EB1_binding, APC_basic, APC_15aa |
| DNA/RNA binding | Myb_DNA-binding, P53, bZIP_Maf, bZIP_2, RRM_1, zf-C2H2, zf-C4, bZIP_1, Ets, SAM_PNT |
| signal transduction | Cbl_N, wnt, C1_1, Death, RhoGAP, Ras |
| growth factor receptor activity | FGF, PDGF, PDGF_N, IGFBP |
| protein tyrosine kinase activity | Pkinase, Pkinase_Tyr, SH2, SH3, ig, I-set, V-set |
| regulation of transcription | HLH, RHD, Wos2, Hormone_recep, zf-C4, bZIP_1, Ets, IRF, P53, Myc_N, WT1, E2F_TDP, Myc-LZ, bZIP_Maf, bZIP_2 |
| regulation of apoptosis | BH, BH4 |
| receptor factor activity/binding | Recep_L_domain, NCD3G, 7tm_1, 7tm_3, PSI |
| Wnt receptor signalling pathway | wnt |
| zinc ion binding | zf-C2H2, zf-C3HC4, zf-C4, zf-RanBP |
| calcium ion binding | Cadherin, Cadherin_C |
| transcription factor activity | Hormone_recep, zf-C4, bZIP_1, Ets, RHD, IRF, P53, Myc_N, E2F_TDP, Myc-LZ |
38 oncogene domains present in virus dataset (367,752 proteins).
| No. | Pfam accession | Domain | Count | Description |
|---|---|---|---|---|
| 1 | PF00271 | Helicase_C | 1455 | Helicase conserved C-terminal domain |
| 2 | PF00023 | Ank | 459 | Ankyrin repeat |
| 3 | PF00270 | DEAD | 296 | DEAD/DEAH box helicase |
| 4 | PF00097 | zf-C3HC4 | 168 | Zinc finger, C3HC4 type (RING finger) |
| 5 | PF00069 | Pkinase | 162 | Protein kinase domain |
| 6 | PF00001 | 7tm_1 | 115 | 7 transmembrane receptor (rhodopsin family) |
| 7 | PF00047 | ig | 111 | Immunoglobulin domain |
| 8 | PF07686 | V-set | 110 | Immunoglobulin V-set domain |
| 9 | PF00048 | IL8 | 87 | Small cytokines (intercrine/chemokine), interleukin-8 like |
| 10 | PF00170 | bZIP_1 | 50 | bZIP transcription factor |
| 11 | PF01403 | Sema | 47 | Sema domain |
| 12 | PF00167 | FGF | 33 | Fibroblast growth factor |
| 13 | PF07714 | Pkinase_Tyr | 30 | Protein tyrosine kinase |
| 14 | PF00096 | zf-C2H2 | 22 | Zinc finger, C2H2 type |
| 15 | PF00017 | SH2 | 21 | SH2 domain |
| 16 | PF00560 | LRR_1 | 17 | Leucine Rich Repeat |
| 17 | PF00018/PF07653 | SH3 | 16 | SH3 domain |
| 18 | PF00605 | IRF | 16 | Interferon regulatory factor transcription factor |
| 19 | PF00041 | fn3 | 15 | Fibronectin type III domain |
| 20 | PF00341 | PDGF | 14 | Platelet-derived growth factor |
| 21 | PF00452 | Bcl-2 | 12 | Apoptosis regulator proteins, Bcl-2 family |
| 22 | PF00134 | Cyclin_N | 11 | Cyclin, N-terminal domain |
| 23 | PF01056 | Myc_N | 11 | Myc amino-terminal region |
| 24 | PF00010 | HLH | 10 | Helix-loop-helix DNA-binding domain |
| 25 | PF07679 | I-set | 8 | Immunoglobulin I-set domain |
| 26 | PF02344 | Myc-LZ | 6 | Myc leucine zipper domain |
| 27 | PF00076 | RRM_1 | 5 | RNA recognition motif |
| 28 | PF01437 | PSI | 5 | Plexin repeat |
| 29 | PF02201 | SWIB | 5 | SWIB/MDM2 domain |
| 30 | PF00104 | Hormone_recep | 3 | Ligand-binding domain of nuclear hormone receptor |
| 31 | PF00105 | zf-C4 | 3 | Zinc finger, C4 type |
| 32 | PF07716 | bZIP_2 | 2 | Basic region leucine zipper |
| 33 | PF07988 | Wos2 | 2 | Mitotic protein Wos2 |
| 34 | PF00071 | Ras | 1 | Ras family |
| 35 | PF00595 | PDZ | 1 | PDZ domain |
| 36 | PF00611 | FCH | 1 | Fes/CIP4 homology domain |
| 37 | PF02757 | YLP | 1 | YLP motif |
| 38 | PF04692 | PDGF_N | 1 | Platelet-derived growth factor, N terminal |
24 oncogenes whose domain fusion events arose in prokaryotes.
| Oncogene | Domains | Origin | Presence |
|---|---|---|---|
| MOS | {Pkinase} | A_B | P |
| SPINK1 | {Kazal_1} | B | / |
| NRAS | {Ras} | A_B | P |
| HRAS | {Ras} | A_B | P |
| KRAS | {Ras} | A_B | P |
| GLI2 | {zf-C2H2} | A_B | P |
| GLI3 | {zf-C2H2} | A_B | P |
| MYBL2 | {Myb_DNA-binding} | B | / |
| RALA | {Ras} | A_B | P |
| RALB | {Ras} | A_B | P |
| PIM1 | {Pkinase} | A_B | P |
| CDK4 | {Pkinase} | A_B | P |
| FOSL1 | {bZIP_1} | A | P |
| TAL1 | {HLH} | B | P |
| BCL3 | {Ank} | A_B | P |
| DDX6 | {DEAD, Helicase_C} | A_B | P |
| NKTR | {Pro_isomerase} | A_B | / |
| BMI1 | {zf-C3HC4} | B | P |
| TGFBR2 | {Pkinase} | A_B | P |
| MPL | {fn3} | A_B | P |
| MSH2 | {MutS_V, MutS_I, MutS_II, MutS_IV, MutS_III} | A_B | / |
| RAB8A | {Ras} | A_B | P |
| MAX | {HLH} | B | P |
| EVI1 | {zf-C2H2} | A_B | P |
(A_B: archaea and bacteria; A: archaea only; B: bacteria only; P: presence.)
General classification of oncogene origins according to their functions.
| Function class | Origin |
|---|---|
| signal transducers | Prokaryotes |
| non-receptor kinases | Metazoans |
| growth factors | Chordates |
| growth factor receptors | Metazoans (9), Chordates (6) |
| transcription factors | Metazoans (4), Chordates (12) |
| programmed cell death regulators | Metazoans |
Figure 2. Oncogene domain co-occurrence graph consisting of 105 domains. Each node is labelled with a domain name. The weight of each edge represents the co-occurrence frequency across all 124 oncogenes.
Figure 3. Frequency distribution of node degrees in the oncogene domain network. The distribution follows a generalized power law. Parameter values of the fit (solid curve) are a = 1.125, b = -0.887, and r = 0.101.
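A co-occurrence graph of this kind can be built by treating every domain as a node and adding one unit of edge weight for each oncogene in which two domains appear together. The sketch below is a minimal illustration under that reading of the captions, not the authors' implementation. The function names are hypothetical, and the four sample genes (domain sets taken from the prokaryotic-fusion table in this section) stand in for the full set of 124 oncogenes.

```python
from collections import Counter
from itertools import combinations

# Sample domain sets copied from the prokaryotic-fusion table; illustrative only.
SAMPLE_ONCOGENES = {
    "DDX6": {"DEAD", "Helicase_C"},
    "MSH2": {"MutS_I", "MutS_II", "MutS_III", "MutS_IV", "MutS_V"},
    "HRAS": {"Ras"},
    "KRAS": {"Ras"},
}


def cooccurrence_edges(domain_sets):
    """Map each unordered domain pair to the number of genes containing both."""
    edges = Counter()
    for domains in domain_sets.values():
        # Sorting gives every pair a canonical (x, y) order.
        for pair in combinations(sorted(domains), 2):
            edges[pair] += 1
    return edges


def node_degrees(domain_sets, edges):
    """Count distinct neighbours per domain; isolated domains get degree 0."""
    degrees = {d: 0 for domains in domain_sets.values() for d in domains}
    for x, y in edges:
        degrees[x] += 1
        degrees[y] += 1
    return degrees


def degree_frequencies(degrees):
    """Number of nodes observed at each degree, the input to a degree-distribution fit."""
    return Counter(degrees.values())
```

The edge weights returned by `cooccurrence_edges` correspond to the co-occurrence frequencies described for Figure 2, and `degree_frequencies` yields the counts that a node-degree distribution such as Figure 3 would plot.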
Frequent domain pairs XY in the oncogene graph compared with the background genome.
| No. | Domain X | Domain Y | ns | ms | Enrichment ratio | P-value |
|---|---|---|---|---|---|---|
| 1 | SH2 | SH3 | 63 | 16 | 51.25 | 6.84E-19 |
| 2 | SH2 | Pkinase_Tyr | 41 | 11 | 54.15 | 7.83E-17 |
| 3 | SH3 | Pkinase_Tyr | 32 | 9 | 56.76 | 3.42E-14 |
| 4 | SAM_PNT | Ets | 13 | 5 | 77.62 | 3.43E-09 |
| 5 | Furin-like | Pkinase_Tyr | 8 | 4 | 100.91 | 3.96E-08 |
| 6 | Recep_L_domain | Pkinase_Tyr | 8 | 4 | 100.91 | 3.96E-08 |
| 7 | Furin-like | Recep_L_domain | 10 | 4 | 80.73 | 1.18E-07 |
| 8 | Jun | bZIP_1 | 3 | 3 | 201.81 | 1.19E-07 |
| 9 | Pkinase | RBD | 4 | 3 | 151.36 | 4.73E-07 |
| 10 | C1_1 | RBD | 5 | 3 | 121.09 | 1.18E-06 |
| 11 | Myc_N | HLH | 5 | 3 | 121.09 | 1.18E-06 |
| 13 | ig | Pkinase_Tyr | 27 | 4 | 29.9 | 9.23E-06 |
(P-value cutoff is 10^-6. ns: the number of proteins containing the specific domain pair in the background genome. ms: the number of proteins containing the specific domain pair in the oncogene proteins. Background protein set size: 25,025; oncogene protein set size: 124.)
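The sixth column of the table above is consistent with the legend's definitions: the fraction of oncogene proteins containing a pair (ms / 124) divided by the fraction of background proteins containing it (ns / 25,025). A short sketch of that arithmetic follows, with a hypothetical function name and three rows copied from the table.

```python
BACKGROUND_SIZE = 25025  # background protein set size, from the table legend
ONCOGENE_SIZE = 124      # oncogene protein set size, from the table legend


def pair_enrichment(ns, ms):
    """Fold enrichment of a domain pair in oncogene proteins over the background."""
    return (ms / ONCOGENE_SIZE) / (ns / BACKGROUND_SIZE)


# (domain X, domain Y, ns, ms) copied from rows 1, 2 and 8 of the table.
ROWS = [
    ("SH2", "SH3", 63, 16),
    ("SH2", "Pkinase_Tyr", 41, 11),
    ("Jun", "bZIP_1", 3, 3),
]

for x, y, ns, ms in ROWS:
    print(f"{x} & {y}: {pair_enrichment(ns, ms):.2f}")
```

These three rows evaluate to roughly 51.25, 54.15 and 201.81, matching the tabulated ratios.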
Phylogenetic profiling analysis of frequent individual domains and domain pairs through 7 taxa from 495 genomes.
| Pkinase_Tyr | |||||||
| SH2 | |||||||
| SH3 | |||||||
| RhoGEF | |||||||
| fn3 | |||||||
| SH2 & SH3 | |||||||
| SH2 & Pkinase_Tyr | |||||||
| SH3 & Pkinase_Tyr | |||||||
| SAM_PNT & Ets | |||||||
| Furin-like & Pkinase_Tyr | |||||||
| Recep_L_domain & Pkinase_Tyr | |||||||
| Furin-like & Recep_L_domain | |||||||
| Jun & bZIP_1 | |||||||
| Pkinase & RBD | |||||||
| C1_1 & RBD | |||||||
| Myc_N & HLH | |||||||
| ig & Pkinase_Tyr | |||||||
(- for absence and + for presence)
Figure 4. A computational pipeline for prediction of origins of oncogene domains. Different components of the pipeline are colour-coded with yellow for prediction of domain origins, blue for analysis of oncogene domain co-occurrence and red for analysis of evolutionary characteristics of frequent domains and domain pairs.
Figure 5. A simplified taxonomy. For cellular organisms, each ellipse represents a major taxonomic class. Each rectangle represents all organisms covered by its parent class but not covered under its sibling ellipse.