Ye Eun Jang1, Insu Jang2, Sunkyu Kim3, Subin Cho1, Daehan Kim3, Keonwoo Kim3, Jaewon Kim1, Jimin Hwang1, Sangok Kim4, Jaesang Kim4, Jaewoo Kang3, Byungwook Lee2, Sanghyuk Lee1,4.
Abstract
Fusion genes represent an important class of biomarkers and therapeutic targets in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data (ChimerSeq) and text mining of publications (ChimerPub) with extensive manual annotations (ChimerKB). In this update, we present all three modules, substantially enhanced by incorporating the recent flood of deep sequencing data and related publications. ChimerSeq now covers all 10 565 patients in the TCGA project, compiling computational results from two reliable programs, STAR-Fusion and FusionScan, together with several public resources. In total, ChimerSeq includes 65 945 fusion candidates, 21 106 of which were predicted by multiple programs (ChimerSeq-Plus). ChimerPub has been upgraded by applying a deep learning method for text mining followed by extensive manual curation, which yielded 1257 fusion genes, including 777 cases with experimental support (ChimerPub-Plus). ChimerKB includes 1597 fusion genes with publication support, experimental evidence and breakpoint information. Importantly, we implemented several new features to aid estimation of functional significance, including a fusion structure viewer with domain information, gene expression plots of fusion-positive versus fusion-negative patients and a STRING network viewer. The user interface was also greatly enhanced by applying responsive web design. ChimerDB 4.0 is available at http://www.kobic.re.kr/chimerdb/.
Year: 2020 PMID: 31680157 PMCID: PMC7145594 DOI: 10.1093/nar/gkz1013
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of ChimerDB 4.0. Each number indicates the number of unique gene pairs from the relevant resources.
Statistics of ChimerDB 4.0
| ChimerKB | | ChimerPub | | ChimerSeq | |
|---|---|---|---|---|---|
| Literature curation | 147 | | | | 49 648 |
| COSMIC | 331 | Translocation | 925 | STAR-Fusion | 28 749 |
| mRNA Sequence | 272 | Disease | 1075 | FusionScan | 12 070 |
| Mitelman, OMIM, GenBank | 459 | Validation method | 1049 | TumorFusions | 18 404 |
| ChimerPub-Plus | 777 | | | TCGA FAWG | 23 978 |
| | | | | TopHat-Fusion | 1624 |
| | | | | | 142 |
| | | | | | 16 270 |
| | | | | Panel of | 2985 |
| ChimerPub supported | 937 | ChimerKB supported | 937 | ChimerKB supported | 240 |
| ChimerSeq supported | 240 | ChimerSeq supported | 205 | ChimerPub supported | 205 |
| | | | | | 21 106 |
| Exon junction | 1063 | | | TCGA | 52 534 |
| | | | | ChiTaRS | 16 152 |
All numbers represent the number of unique fusion genes.
a Transcripts not included in ChimerKB and ChimerPub were classified as novel fusions.
Figure 2. Statistics and contents of ChimerDB 4.0. (A) Venn diagram of unique fusions in three modules. (B) Venn diagram of unique fusions from five prediction pipelines that analyzed the TCGA dataset. (C) Contribution of each prediction pipeline to ChimerSeq-Plus. Dark colors indicate fusion genes that were identified by ≥3 prediction programs, whereas light colors indicate fusion genes predicted by the program of interest and one additional program. (D) Bar plot of TCGA samples for each cancer type. (E) Bar plot of fusion genes in different functional categories for each cancer type.
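The multi-program rule behind ChimerSeq-Plus and the two tiers in panel C can be illustrated with a minimal Python sketch. It is not the ChimerDB implementation: the gene pairs are placeholders, the assignment of pairs to programs is invented for illustration, and the function names (`support_by_fusion`, `multi_program_fusions`, `contribution`) are hypothetical.

```python
from collections import defaultdict

# Toy input (assumption): program name -> set of (5' gene, 3' gene) pairs.
# The pairs are placeholders and do not come from ChimerDB.
predictions = {
    "STAR-Fusion": {("A1", "B1"), ("A2", "B2"), ("A3", "B3")},
    "FusionScan": {("A1", "B1"), ("A2", "B2")},
    "TumorFusions": {("A1", "B1"), ("A4", "B4")},
}


def support_by_fusion(preds):
    """Map each gene pair to the set of programs that predicted it."""
    support = defaultdict(set)
    for program, fusions in preds.items():
        for fusion in fusions:
            support[fusion].add(program)
    return dict(support)


def multi_program_fusions(support, min_programs=2):
    """Keep gene pairs predicted by at least `min_programs` programs."""
    return {f for f, progs in support.items() if len(progs) >= min_programs}


def contribution(support, program):
    """Split one program's multi-program fusions into the two panel C tiers."""
    tiers = {"three_or_more": set(), "one_additional": set()}
    for fusion, progs in support.items():
        if program not in progs or len(progs) < 2:
            continue  # not predicted by this program, or by this program only
        key = "three_or_more" if len(progs) >= 3 else "one_additional"
        tiers[key].add(fusion)
    return tiers


support = support_by_fusion(predictions)
plus = multi_program_fusions(support)
star_tiers = contribution(support, "STAR-Fusion")
```

In this toy input, `plus` holds the two pairs seen by more than one program, and `star_tiers` separates the pair found by all three programs from the pair shared with exactly one other program.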
Figure 3. Statistics and contents of ChimerPub 4.0. (A) Comparison of ChimerPub 4.0 versus 3.0. (B) Curation procedure and resulting numbers at each step. (C) Number of ChimerKB entries with information on breakpoints and/or experimental evidence.
Figure 4. Recurrent fusion genes from the TCGA cohort. (A) Representative known fusion genes in ChimerKB and ChimerPub. (B) Representative novel fusion genes in ChimerSeq-Plus. (C) Recurrent fusion genes for each cancer type. Horizontal axes (A–C) indicate the number of patients with fusion genes identified in ChimerSeq-Plus.
Figure 5. User interface of ChimerDB 4.0. (A) The search and filter window and output table in ChimerKB. (B) Main output form for a ChimerKB entry. Colored blocks are links to detailed information. (C) Example of a PubMed abstract where keywords are highlighted. (D) Example of the fusion structure viewer. (E) STRING network view. (F) Gene expression plots of 5′ and 3′ genes in fusion-positive versus fusion-negative patients in the TCGA dataset. (G) Scatter plots of gene expression versus copy number for 5′ and 3′ genes in the TCGA dataset.