Pora Kim, Hua Tan, Jiajia Liu, Haeseung Lee, Hyesoo Jung, Himanshu Kumar, Xiaobo Zhou.
Abstract
A knowledgebase that systematically annotates the functions of fusion genes is critical for understanding the genomic context of breakage and for developing therapeutic strategies. FusionGDB is a unique functional annotation database of human fusion genes that has been widely used in studies with diverse aims. In this study, we report an update of its fusion gene annotations aided by deep learning (FusionGDB 2.0), available at https://compbio.uth.edu/FusionGDB2/. FusionGDB 2.0 substantially updates the content, including: up-to-date human fusion genes; a fusion gene breakage tendency score from the FusionAI deep learning model, based on the 20 kb DNA sequence around the breakpoint (BP); an investigation of the overlap between fusion breakpoints and 44 human genomic features across five categories of cellular roles; transcribed chimeric sequences and the subsequent open reading frame (ORF) analysis, with coding potential predicted by a deep learning approach that uses Ribo-seq read features; and a rigorous investigation of how the protein features of individual fusion partner genes are retained at the protein level. Among ∼102k fusion genes, about 15k kept their ORF in-frame, which is twice the number in the previous version, FusionGDB. FusionGDB 2.0 will be used as the reference knowledgebase of fusion gene annotations. It provides eight categories of annotations and will be helpful for diverse human genomic studies.
Year: 2022 PMID: 34755868 PMCID: PMC8728198 DOI: 10.1093/nar/gkab1056
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of FusionGDB 2.0. The updated FusionGDB provides multiple annotations on fusion genes in eight categories: fusion gene summary, fusion gene genomic feature, fusion gene ORF, fusion protein features, fusion sequence, fusion PPI, related drugs and related diseases.
Statistical comparison of FusionGDB and FusionGDB 2.0
| | # fusion genes | # in-frame fusion genes | # all partner genes | # reviewed UniProt accessions of all partner genes |
|---|---|---|---|---|
| FusionGDB | 43 895 | 9859 | 14 910 | 14 943 |
| FusionGDB 2.0 | 102 645 | 16 146 | 26 688 | 17 300 |
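
The growth between the two releases can be read directly off this table. The short sketch below only divides the counts listed above (FusionGDB 2.0 over FusionGDB); the dictionary layout and variable names are illustrative and are not part of FusionGDB. From these table values, the total number of fusion genes rises roughly 2.3-fold and the number of in-frame fusion genes roughly 1.6-fold.

```python
# Counts copied from the comparison table above: (FusionGDB, FusionGDB 2.0).
counts = {
    "fusion genes": (43_895, 102_645),
    "in-frame fusion genes": (9_859, 16_146),
    "all partner genes": (14_910, 26_688),
    "reviewed UniProt accessions": (14_943, 17_300),
}

# Fold change = FusionGDB 2.0 count divided by FusionGDB count.
fold_change = {name: new / old for name, (old, new) in counts.items()}

for name, fc in fold_change.items():
    print(f"{name}: {fc:.2f}x")
```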
Number of fusion genes per gene group
| Gene group | # in-frame fusion genes | # frame-shift fusion genes |
|---|---|---|
| Cancer gene census | 2747 | 2862 |
| Cell metabolism | 4136 | 4342 |
| Epigenetic factor | 2076 | 2309 |
| Essential gene | 11 628 | 12 708 |
| IUPHAR | 4871 | 5325 |
| Kinase | 1430 | 1582 |
| Transcription factor | 2002 | 2347 |
| Tumor suppressors | 2513 | 2694 |
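
To compare gene groups of very different sizes, it can help to express the in-frame count as a share of in-frame plus frame-shift fusion genes. The sketch below does only that arithmetic on the table values; the names are illustrative. In every group listed, frame-shift fusion genes slightly outnumber in-frame ones, so each share falls a little below one half.

```python
# Counts copied from the gene-group table above: (in-frame, frame-shift).
groups = {
    "Cancer gene census": (2747, 2862),
    "Cell metabolism": (4136, 4342),
    "Epigenetic factor": (2076, 2309),
    "Essential gene": (11_628, 12_708),
    "IUPHAR": (4871, 5325),
    "Kinase": (1430, 1582),
    "Transcription factor": (2002, 2347),
    "Tumor suppressors": (2513, 2694),
}

# Share of in-frame fusion genes among in-frame + frame-shift, per group.
in_frame_share = {
    name: in_frame / (in_frame + frame_shift)
    for name, (in_frame, frame_shift) in groups.items()
}

for name, share in sorted(in_frame_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```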
Figure 2. Feature importance (FI) score distributions across the 20 kbp fusion breakpoint sequences from FusionAI. (A) Distribution of overlaps between the top 10% FI-scored regions and 44 different types of human genomic features. (B) Distribution of overlaps between all regions and the 44 different types of human genomic features. (Each background color corresponds to the category shown in the same font color among the 44 human genomic features.)
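
The comparison in panels A and B can be pictured as an overlap count: select the positions with the highest FI scores, count how many fall inside each feature's intervals, and compare with the same count over all positions. The following is a minimal sketch under stated assumptions, not the FusionAI or FusionGDB implementation. The scores are random placeholders, the feature names and intervals are hypothetical, and per-position scoring over a 20 000 bp window is an assumption made for illustration.

```python
import random

WINDOW = 20_000  # assumed window length in bp, matching the 20 kbp in the caption


def top_fraction_positions(scores, fraction=0.10):
    """Return the set of positions whose score ranks in the top `fraction`."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def count_overlaps(positions, features):
    """Count, per feature, how many positions fall inside any of its intervals.

    `features` maps a feature name to half-open (start, end) intervals
    given in window coordinates.
    """
    return {
        name: sum(1 for p in positions if any(s <= p < e for s, e in intervals))
        for name, intervals in features.items()
    }


random.seed(0)
scores = [random.random() for _ in range(WINDOW)]  # placeholder FI scores

# Hypothetical feature annotations, for illustration only.
features = {
    "feature_A": [(9_000, 11_000)],
    "feature_B": [(0, 500), (19_500, 20_000)],
}

top = top_fraction_positions(scores)
top_counts = count_overlaps(top, features)            # analogous to panel A
all_counts = count_overlaps(range(WINDOW), features)  # analogous to panel B
```

A feature whose share among the top-scored positions is clearly higher than its share among all positions would be a candidate for enrichment near informative regions; a real analysis would also need a statistical test.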
Number of fusion genes per ORF type
| ORF of fusion transcript | # fusion genes |
|---|---|
| 3UTR-3UTR | 3035 |
| 3UTR-5UTR | 2311 |
| 3UTR-CDS | 5499 |
| 3UTR-intron | 8577 |
| 5UTR-3UTR | 2204 |
| 5UTR-5UTR | 5132 |
| 5UTR-CDS | 9865 |
| 5UTR-intron | 8781 |
| CDS-3UTR | 6890 |
| CDS-5UTR | 11 944 |
| CDS-intron | 28 940 |
| Frame-shift | 17 710 |
| In-frame | 16 146 |
| intron-3UTR | 13 411 |
| intron-5UTR | 12 883 |
| intron-CDS | 27 931 |
| intron-intron | 48 685 |
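
As a quick consistency check on this table, the sketch below sums the per-type counts and expresses each as a fraction of the 102 645 fusion genes reported for FusionGDB 2.0. The counts add up to more than the number of fusion genes, which suggests (as an inference, not a statement from the source) that a single fusion gene can be counted under several ORF types, for example when it has multiple breakpoints. Variable names are illustrative.

```python
TOTAL_FUSION_GENES = 102_645  # FusionGDB 2.0 total from the comparison table

# Counts copied from the ORF-type table above.
orf_counts = {
    "3UTR-3UTR": 3035, "3UTR-5UTR": 2311, "3UTR-CDS": 5499, "3UTR-intron": 8577,
    "5UTR-3UTR": 2204, "5UTR-5UTR": 5132, "5UTR-CDS": 9865, "5UTR-intron": 8781,
    "CDS-3UTR": 6890, "CDS-5UTR": 11_944, "CDS-intron": 28_940,
    "Frame-shift": 17_710, "In-frame": 16_146,
    "intron-3UTR": 13_411, "intron-5UTR": 12_883, "intron-CDS": 27_931,
    "intron-intron": 48_685,
}

# Sum over all ORF types; larger than the number of fusion genes.
category_sum = sum(orf_counts.values())

# Fraction of all fusion genes that appear under each ORF type.
fraction_of_genes = {k: v / TOTAL_FUSION_GENES for k, v in orf_counts.items()}

print(f"sum over ORF types: {category_sum}")
for orf_type, frac in sorted(fraction_of_genes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{orf_type}: {frac:.1%}")
```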
Figure 3. Prediction of the coding potential of fusion transcripts with the deep learning approach. (A) Pipeline for creating the training data of deepORF. (B) Performance comparison between deepORF and RNAsamba. (C) Comparison of the distributions of coding potential scores for in-frame fusion transcripts between the deepORF data-based model (blue) and RNAsamba (pink).