Pora Kim, Xiaobo Zhou.
Abstract
Gene fusion, which arises from chromosomal rearrangements initiated by DNA double-strand breaks, is one of the hallmarks of the cancer genome. To date, many fusion genes (FGs) have been established as important biomarkers and therapeutic targets in multiple cancer types. To better understand the function of FGs in cancer and to promote the discovery of clinically relevant FGs, we built FusionGDB (Fusion Gene annotation DataBase), available at https://ccsm.uth.edu/FusionGDB. We collected 48 117 FGs across pan-cancer from three representative fusion gene resources: the improved database of chimeric transcripts and RNA-seq data (ChiTaRS 3.1), an integrative resource for cancer-associated transcript fusions (TumorFusions), and The Cancer Genome Atlas (TCGA) fusions by Gao et al. For these ∼48K FGs, we performed functional annotations including gene assessment across pan-cancer fusion genes, open reading frame (ORF) assignment, and a retention search of 39 protein features based on the gene structures of multiple isoforms with different breakpoints. We also provide the fusion transcript and amino acid sequences for multiple breakpoints and transcript isoforms. Our analyses identified 331, 303 and 667 in-frame FGs retaining kinase, DNA-binding, and epigenetic factor domains, respectively, as well as 976 FGs that lost protein-protein interactions. FusionGDB provides six categories of annotations: FusionGeneSummary, FusionProtFeature, FusionGeneSequence, FusionGenePPI, RelatedDrug and RelatedDisease.
Year: 2019 PMID: 30407583 PMCID: PMC6323909 DOI: 10.1093/nar/gky1067
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of FusionGDB. (A) Workflow of the functional annotation of FGs. (B) ORF types based on the fusion transcript. (C) Schematic overview of the retention of protein domains or interactors in a fusion protein. The fusion protein in this figure retained the protein domain of the 5′-partner and lost protein-protein interactions (PPIs) with cellular regulators/interactors of the 3′-partner.
Figure 2. Gene assessment across pan-cancer in-frame FGs. (A) 5′-partner genes with high MAII scores. (B) 3′-partner genes with high MAII scores. The y-axis shows the Major Active Iso-fusion Index (MAII) score, calculated as log2(observed frequency/DoF score × 10). The Degree of Frequency (DoF) score is calculated as (# cancer types) × (# partners) × (# breakpoints). Genes with positive, larger MAII values are ‘effective genes in pan-cancer fusion genes (eGinPCFG)’. Genes with negative, smaller MAII values are ‘possible effective genes in pan-cancer fusion genes (peGinPCFG)’.
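The two scores defined in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the expression log2(observed frequency/DoF score × 10) is read left to right as log2((observed frequency / DoF) × 10), and the handling of a score of exactly zero is not specified in the caption. The input values are made up for the example and are not taken from FusionGDB.

```python
import math


def dof_score(n_cancer_types: int, n_partners: int, n_breakpoints: int) -> int:
    """Degree of Frequency: (# cancer types) x (# partners) x (# breakpoints)."""
    return n_cancer_types * n_partners * n_breakpoints


def maii_score(observed_frequency: float, dof: float) -> float:
    """MAII = log2(observed frequency / DoF x 10), read left to right (assumption)."""
    return math.log2(observed_frequency / dof * 10)


def maii_class(maii: float) -> str:
    """Label a gene by the sign of its MAII score, following the caption."""
    if maii > 0:
        return "eGinPCFG"
    if maii < 0:
        return "peGinPCFG"
    return "unclassified"  # assumption: the caption does not cover MAII == 0


# Hypothetical gene: 8 observations across 2 cancer types, 2 partners, 5 breakpoints.
dof = dof_score(2, 2, 5)
maii = maii_score(8, dof)
print(dof, round(maii, 3), maii_class(maii))
```

Under this reading, a gene seen often relative to the diversity of its cancer types, partners and breakpoints gets a positive score, while a gene whose observations are spread thinly over many combinations gets a negative one.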
Statistics of retention status of 39 protein features from UniProt sequence annotation in the in-frame and frame-shift FGs
| Protein feature | | | | | | |
|---|---|---|---|---|---|---|
| Initiator methionine | 1013 | 136 | 1172 | 141 | 808 | 135 |
| Signal peptide | 829 | 145 | 812 | 190 | 1066 | 255 |
| Transit peptide | 145 | 31 | 181 | 24 | 193 | 31 |
| Propeptide | 92 | 139 | 93 | 146 | 139 | 205 |
| Chain | 179 | 6836 | 260 | 1640 | 6343 | 1868 |
| Peptide | 6 | 18 | 22 | 37 | 15 | 48 |
| Topological domain | 554 | 1094 | 543 | 941 | 1048 | 1152 |
| Transmembrane | 744 | 1171 | 746 | 1226 | 867 | 1513 |
| Intramembrane | 18 | 49 | 20 | 45 | 24 | 63 |
| Domain | 1458 | 3264 | 1635 | 2449 | 2235 | 2664 |
| Repeat | 461 | 786 | 563 | 657 | 525 | 713 |
| Calcium binding | 44 | 99 | 58 | 93 | 51 | 94 |
| Zinc finger | 281 | 705 | 293 | 428 | 317 | 550 |
| DNA binding | 97 | 174 | 87 | 143 | 104 | 166 |
| Nucleotide binding | 532 | 633 | 615 | 599 | 523 | 596 |
| Region | 929 | 2029 | 1023 | 1649 | 1183 | 1688 |
| Coiled coil | 373 | 957 | 401 | 773 | 569 | 754 |
| Motif | 403 | 881 | 420 | 629 | 334 | 761 |
| Compositional bias | 1016 | 1723 | 1097 | 1322 | 967 | 1474 |
| Active site | 310 | 679 | 324 | 682 | 257 | 706 |
| Metal binding | 302 | 557 | 331 | 488 | 282 | 531 |
| Binding site | 360 | 718 | 439 | 671 | 350 | 699 |
| Site | 348 | 478 | 316 | 451 | 249 | 389 |
| Non-standard residue | 0 | 2 | 2 | 2 | 0 | 4 |
| Modified residue | 3315 | 4222 | 3683 | 3535 | 2819 | 3678 |
| Lipidation | 82 | 116 | 94 | 120 | 75 | 172 |
| Glycosylation | 731 | 967 | 746 | 1080 | 846 | 1289 |
| Disulfide bond | 413 | 655 | 433 | 739 | 534 | 906 |
| Cross-link | 486 | 795 | 588 | 621 | 374 | 651 |
| Alternative sequence | 2745 | 4267 | 3103 | 3637 | 3405 | 4020 |
| Natural variant | 2764 | 4224 | 2994 | 4155 | 2917 | 4593 |
| Mutagenesis | 1087 | 1945 | 1184 | 1569 | 917 | 1634 |
| Sequence uncertainty | 0 | 0 | 0 | 0 | 0 | 0 |
| Sequence conflict | 2726 | 4108 | 3050 | 3883 | 2655 | 4116 |
| Non-adjacent residues | 0 | 0 | 0 | 0 | 0 | 0 |
| Non-terminal residue | 0 | 0 | 0 | 0 | 0 | 0 |
| Helix | 1836 | 2675 | 1965 | 2324 | 1652 | 2473 |
| Beta strand | 1728 | 2430 | 1824 | 2144 | 1511 | 2260 |
| Turn | 1281 | 2072 | 1387 | 1826 | 1163 | 1892 |
Figure 3. FusionGeneSummary category. This category shows the overall function of the fusion gene and of each partner gene. It also provides information on the impact of each gene in pan-cancer fusion genes and the functional category assigned by multiple functional annotations.
Domain retention annotation of ALK fusion proteins from FusionGDB
| Fusion gene | Partner gene | Transcript | Breakpoint | Strand | | | Feature position | | | Protein feature | Feature note |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1116_1392 | 1057 | 1621 | Domain | Protein kinase |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1197_1199 | 1057 | 1621 | Region | Note = Inhibitor binding |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1060_1620 | 1057 | 1621 | Topological domain | Cytoplasmic |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1116_1392 | 1057 | 1621 | Domain | Protein kinase |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1197_1199 | 1057 | 1621 | Region | Note = Inhibitor binding |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1060_1620 | 1057 | 1621 | Topological domain | Cytoplasmic |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1116_1392 | 1057 | 1621 | Domain | Protein kinase |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1197_1199 | 1057 | 1621 | Region | Note = Inhibitor binding |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1060_1620 | 1057 | 1621 | Topological domain | Cytoplasmic |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1116_1392 | 1057 | 1621 | Domain | Protein kinase |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1197_1199 | 1057 | 1621 | Region | Note = Inhibitor binding |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1060_1620 | 1057 | 1621 | Topological domain | Cytoplasmic |
| | | | | | | | | | | | |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 816_940 | 1057 | 1621 | Compositional bias | Note = Gly-rich |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 264_427 | 1057 | 1621 | Domain | MAM 1 |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 478_636 | 1057 | 1621 | Domain | MAM 2 |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 437_473 | 1057 | 1621 | Domain | Note = LDL-receptor class A |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 19_1038 | 1057 | 1621 | Topological domain | Extracellular |
| EML4-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1039_1059 | 1057 | 1621 | Transmembrane | Helical |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 816_940 | 1057 | 1621 | Compositional bias | Note = Gly-rich |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 264_427 | 1057 | 1621 | Domain | MAM 1 |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 437_473 | 1057 | 1621 | Domain | Note = LDL-receptor class A |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 478_636 | 1057 | 1621 | Domain | MAM 2 |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 19_1038 | 1057 | 1621 | Topological domain | Extracellular |
| NPM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1039_1059 | 1057 | 1621 | Transmembrane | Helical |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 816_940 | 1057 | 1621 | Compositional bias | Note = Gly-rich |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 264_427 | 1057 | 1621 | Domain | MAM 1 |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 437_473 | 1057 | 1621 | Domain | Note = LDL-receptor class A |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 478_636 | 1057 | 1621 | Domain | MAM 2 |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 19_1038 | 1057 | 1621 | Topological domain | Extracellular |
| SQSTM1-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1039_1059 | 1057 | 1621 | Transmembrane | Helical |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 816_940 | 1057 | 1621 | Compositional bias | Note = Gly-rich |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 264_427 | 1057 | 1621 | Domain | MAM 1 |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 437_473 | 1057 | 1621 | Domain | Note = LDL-receptor class A |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 478_636 | 1057 | 1621 | Domain | MAM 2 |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 19_1038 | 1057 | 1621 | Topological domain | Extracellular |
| TFG-ALK | ALK | ENST00000389048 | chr2:29446394 | - | 18 | 29 | 1039_1059 | 1057 | 1621 | Transmembrane | Helical |
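The two groups of rows in the ALK table are consistent with a simple positional rule, sketched below. This is an illustration and not FusionGDB's own code. It assumes that the ninth column (1057) is the breakpoint position in ALK amino-acid coordinates, that the first group of rows lists retained features and the second group lists features that are not retained, and that a feature of the 3′-partner counts as retained only when it starts at or after the breakpoint. The `Feature` type and the function name are hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    start: int  # first residue of the feature (e.g. 1116 in "1116_1392")
    end: int    # last residue of the feature
    kind: str   # feature type, e.g. "Domain"
    note: str   # feature description, e.g. "Protein kinase"


def retained_in_3p_partner(feature: Feature, bp_locus: int) -> bool:
    """A 3'-partner feature is kept only if it lies entirely downstream of the breakpoint."""
    return feature.start >= bp_locus


ALK_BP_LOCUS = 1057  # ninth column of the ALK rows (assumed meaning)

alk_features = [
    Feature(1116, 1392, "Domain", "Protein kinase"),
    Feature(1060, 1620, "Topological domain", "Cytoplasmic"),
    Feature(264, 427, "Domain", "MAM 1"),
    Feature(1039, 1059, "Transmembrane", "Helical"),
]

for f in alk_features:
    status = "retained" if retained_in_3p_partner(f, ALK_BP_LOCUS) else "not retained"
    print(f"{f.kind} ({f.note}) {f.start}_{f.end}: {status}")
```

Under these assumptions the kinase domain and the cytoplasmic region fall in the retained group, while the MAM domain and the transmembrane helix, which straddles the breakpoint, do not. This matches the grouping of the rows above.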
Protein-protein interaction (PPI) retention annotation of KMT2A-MLLT10 fusion protein from FusionGDB
| 5′-partner gene | Interactors | 3′-partner gene | Interactors |
|---|---|---|---|
| KMT2A | PPIE, PPP1R15A, KMT2A, ASH2L, HCFC1, HCFC2, MEN1, RBBP5, WDR5, AVP, INS, OXT, MAP3K5, HDAC1, CTBP1, CBX4, BMI1, CREBBP, SMARCB1, CXXC1, MYB, CTNNB1, SNW1, E2F2, E2F4, E2F6, PSIP1, MLLT4, POLR2A, KAT8, RNF2, TP53, SBF1, MTM1, SET, HIST1H3A, HIST1H4A, KAT6A, ELL, AFF1, AFF4, CDK9, CCNT1, CTR9, LEO1, PAF1, CDC73, WDR61, MLLT3, DOT1L, SKP2, HIST3H3, SVIL, HIST2H3C, SIN3A, MLLT1, RUNX1, CBFB, H3F3A, SIRT7, ASB2, TCEB1, TCEB2, CBX8, TOP1, TAF6, NCL, HECW2, LGR4, CSNK2A2, SENP3, SYMPK, PKN1, PIH1D1, KRAS, TAF1, CHD3, SMARCA2, SMARCC2, SMARCC1, HDAC2, RBBP4, RBBP7, TBP, MBD3, SAP30, RAN, TAF9, TASP1, HIST1H2BG, EWSR1, DYNLT1, KIF11, ING4, ZNF131, ASB7 | MLLT10 | YEATS4, SMARCB1, SS18, MLLT10, DOT1L, MLLT1, MLLT3, AFF1, DISC1, NDEL1, ELAVL1, CENPJ, TMPO, ZNF526, MLLT6, TCP10L, KIF6, PMF1 |
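
| Gene | 5′ breakpoint | 3′ breakpoint | Transcript | Strand | | | Region | | | Interactor |
|---|---|---|---|---|---|---|---|---|---|---|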
| MLLT10 | chr11:118352807 | chr10:22002700 | ENST00000377091 | + | 0 | 5 | 141_233 | −59 | 127 | FSTL3 |
| MLLT10 | chr11:118352807 | chr10:22002700 | ENST00000377100 | + | 0 | 4 | 141_233 | −119 | 180 | FSTL3 |
| MLLT10 | chr11:118355029 | chr10:21959377 | ENST00000377091 | + | 0 | 5 | 141_233 | −59 | 127 | FSTL3 |
| MLLT10 | chr11:118355029 | chr10:21959377 | ENST00000377100 | + | 0 | 4 | 141_233 | −119 | 180 | FSTL3 |
| MLLT10 | chr11:118355690 | chr10:21959377 | ENST00000377091 | + | 0 | 5 | 141_233 | −59 | 127 | FSTL3 |
| MLLT10 | chr11:118355690 | chr10:21959377 | ENST00000377100 | + | 0 | 4 | 141_233 | −119 | 180 | FSTL3 |
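
| Gene | 5′ breakpoint | 3′ breakpoint | Transcript | Strand | | | Region | | | Interactor |
|---|---|---|---|---|---|---|---|---|---|---|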
| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000354520 | + | 7 | 35 | 1584_1600 | 1337 | 3932 | histone H3K4me3 |
| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000389506 | + | 7 | 36 | 1584_1600 | 1337 | 3970 | histone H3K4me3 |
| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000534358 | + | 7 | 36 | 1584_1600 | 1337 | 3973 | histone H3K4me3 |
| KMT2A | chr11:118355029 | chr10:21959377 | ENST00000354520 | + | 9 | 35 | 1584_1600 | 1406 | 3932 | histone H3K4me3 |
| KMT2A | chr11:118355029 | chr10:21959377 | ENST00000389506 | + | 9 | 36 | 1584_1600 | 1406 | 3970 | histone H3K4me3 |
| KMT2A | chr11:118355029 | chr10:21959377 | ENST00000534358 | + | 9 | 36 | 1584_1600 | 1406 | 3973 | histone H3K4me3 |
| KMT2A | chr11:118355690 | chr10:21959377 | ENST00000354520 | + | 1 | 35 | 1584_1600 | 0 | 3932 | histone H3K4me3 |
| KMT2A | chr11:118355690 | chr10:21959377 | ENST00000389506 | + | 10 | 36 | 1584_1600 | 1444 | 3970 | histone H3K4me3 |
| KMT2A | chr11:118355690 | chr10:21959377 | ENST00000534358 | + | 10 | 36 | 1584_1600 | 1444 | 3973 | histone H3K4me3 |
| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000354520 | + | 7 | 35 | 3764_3771 | 1337 | 3932 | WDR5 |
| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000389506 | + | 7 | 36 | 3764_3771 | 1337 | 3970 | WDR5 |
| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000534358 | + | 7 | 36 | 3764_3771 | 1337 | 3973 | WDR5 |
| KMT2A | chr11:118355029 | chr10:21959377 | ENST00000354520 | + | 9 | 35 | 3764_3771 | 1406 | 3932 | WDR5 |
| KMT2A | chr11:118355029 | chr10:21959377 | ENST00000389506 | + | 9 | 36 | 3764_3771 | 1406 | 3970 | WDR5 |
| KMT2A | chr11:118355029 | chr10:21959377 | ENST00000534358 | + | 9 | 36 | 3764_3771 | 1406 | 3973 | WDR5 |
| KMT2A | chr11:118355690 | chr10:21959377 | ENST00000354520 | + | 1 | 35 | 3764_3771 | 0 | 3932 | WDR5 |
| KMT2A | chr11:118355690 | chr10:21959377 | ENST00000389506 | + | 10 | 36 | 3764_3771 | 1444 | 3970 | WDR5 |
| KMT2A | chr11:118355690 | chr10:21959377 | ENST00000534358 | + | 10 | 36 | 3764_3771 | 1444 | 3973 | WDR5 |
| MLLT10 | chr11:118352807 | chr10:22002700 | ENST00000307729 | + | 12 | 23 | 141_233 | 566 | 1069 | FSTL3 |
| MLLT10 | chr11:118352807 | chr10:22002700 | ENST00000377072 | + | 13 | 24 | 141_233 | 582 | 1476 | FSTL3 |
| MLLT10 | chr11:118355029 | chr10:21959377 | ENST00000307729 | + | 8 | 23 | 141_233 | 265 | 1069 | FSTL3 |
| MLLT10 | chr11:118355029 | chr10:21959377 | ENST00000377072 | + | 8 | 24 | 141_233 | 265 | 1476 | FSTL3 |
| MLLT10 | chr11:118355690 | chr10:21959377 | ENST00000307729 | + | 8 | 23 | 141_233 | 265 | 1069 | FSTL3 |
| MLLT10 | chr11:118355690 | chr10:21959377 | ENST00000377072 | + | 8 | 24 | 141_233 | 265 | 1476 | FSTL3 |
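Rows of the KMT2A-MLLT10 table can be read programmatically and checked against a positional rule for either fusion partner, as in the sketch below. It is an illustration under assumptions, not the database's implementation. It takes the eighth column as the interaction region, the ninth as the breakpoint position in protein coordinates, and the last as the interactor. It treats KMT2A as the 5′-partner, which keeps regions ending at or before the breakpoint, and MLLT10 as the 3′-partner, which keeps regions starting at or after it. Negative breakpoint positions such as −59 are written with a Unicode minus sign in the table and are converted before parsing. All function names are hypothetical.

```python
def parse_locus(text: str) -> int:
    """Parse a breakpoint position, accepting the Unicode minus sign (U+2212)."""
    return int(text.strip().replace("\u2212", "-"))


def parse_region(text: str) -> tuple[int, int]:
    """Parse a 'start_end' range such as '3764_3771'."""
    start, end = text.strip().split("_")
    return int(start), int(end)


def region_retained(region: tuple[int, int], bp_locus: int, side: str) -> bool:
    """Positional retention rule (assumed): 5' side keeps upstream, 3' side keeps downstream."""
    start, end = region
    if side == "5p":
        return end <= bp_locus
    if side == "3p":
        return start >= bp_locus
    raise ValueError(f"unknown side: {side!r}")


def annotate(row: str, side_by_gene: dict[str, str]) -> tuple[str, str, bool]:
    """Return (gene, interactor, retained) for one pipe-delimited table row."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    gene, region, bp, interactor = cells[0], cells[7], cells[8], cells[10]
    retained = region_retained(parse_region(region), parse_locus(bp), side_by_gene[gene])
    return gene, interactor, retained


SIDES = {"KMT2A": "5p", "MLLT10": "3p"}  # order of the partners in the fusion name

rows = [
    "| KMT2A | chr11:118352807 | chr10:22002700 | ENST00000354520 | + | 7 | 35 | 3764_3771 | 1337 | 3932 | WDR5 |",
    "| MLLT10 | chr11:118352807 | chr10:22002700 | ENST00000377091 | + | 0 | 5 | 141_233 | −59 | 127 | FSTL3 |",
    "| MLLT10 | chr11:118352807 | chr10:22002700 | ENST00000307729 | + | 12 | 23 | 141_233 | 566 | 1069 | FSTL3 |",
]

for row in rows:
    print(annotate(row, SIDES))
```

With this rule, the WDR5-binding region of KMT2A lies downstream of the breakpoint and is classed as lost. The FSTL3-binding region of MLLT10 is classed as retained or lost depending on the transcript isoform, which is the same split seen between the two sub-tables above.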