Pora Kim1, Hua Tan1, Jiajia Liu1,2, Himansu Kumar1, Xiaobo Zhou1,3,4.
Abstract
Although many tools have been developed to predict fusion genes from NGS data, the large number of false positives remains a problem. Careful use of the genomic features around fusion gene breakpoints can help identify reliable fusion genes efficiently. To this end, we developed FusionAI, a deep learning pipeline that predicts human fusion gene breakpoints from DNA sequence. FusionAI is freely available at https://compbio.uth.edu/FusionGDB2/FusionAI. For complete details on the use and execution of this protocol, please refer to Kim et al. (2021b).
Keywords: Bioinformatics; Computer sciences; Genomics; Health Sciences; Molecular Biology
Year: 2022 PMID: 35252882 PMCID: PMC8892011 DOI: 10.1016/j.xpro.2022.101185
Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667
Computation resources used in this study
| Operating system | Version |
|---|---|
| CentOS Linux | 7.9.2009 |

| CPU information | Parameter |
|---|---|
| RAM Memory | 93 GB |
| Thread(s) per core | 2 |
| Core(s) per socket | 2 |
| Model | 85 |
| Model name | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10 GHz |
| CPU MHz | 2899.816 |
| CPU(s) | 36 |
Example of fusion gene information predicted by STAR-fusion for the K562 cell line
| Hgene | Hchr | Hbp | Hstrand | Tgene | Tchr | Tbp | Tstrand |
|---|---|---|---|---|---|---|---|
| BCR | chr22 | 23632600 | + | ABL1 | chr9 | 133729450 | + |
| BAG6 | chr6 | 31619433 | - | SLC44A4 | chr6 | 31833561 | - |
| NUP214 | chr9 | 134074402 | + | XKR3 | chr22 | 17288973 | - |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |
Figure 1. Make input data of FusionAI
Example of FusionAI input data produced by running the preprocessing script
| Hgene | Hchr | Hbp | Hstrand | Tgene | Tchr | Tbp | Tstrand | 20 Kb fusion DNA sequence |
|---|---|---|---|---|---|---|---|---|
| BCR | chr22 | 23632600 | + | ABL1 | chr9 | 133729450 | + | TACCAGAGCGGCTGCCAAC… |
| BAG6 | chr6 | 31619433 | - | SLC44A4 | chr6 | 31833561 | - | CAGTGATGCTTCTGCCTCC… |
| NUP214 | chr9 | 134074402 | + | XKR3 | chr22 | 17288973 | - | GATAAAATTTTTTCACTAA… |
| ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ | ⁞ |
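
The last column of the table above holds one concatenated sequence per fusion gene. As a rough illustration only, the sketch below shows one way such a sequence could be assembled from the two breakpoints. It assumes a symmetric window of `flank` bases on each side of each breakpoint (5,000 bases per side gives 20 Kb in total) and reverse-complements minus-strand genes; both choices are assumptions, not details confirmed by this record. The protocol itself extracts sequence with nibFrag from hg19 nib files, whereas here a plain dictionary of strings stands in for the genome, and all function names are hypothetical.

```python
# Hypothetical sketch: not the FusionAI preprocessing script.
# Assumption: each breakpoint contributes `flank` bases on both sides,
# and minus-strand genes are reverse-complemented.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def breakpoint_window(chrom_seq, bp, strand, flank):
    """Return 2 * flank bases around a 1-based breakpoint position."""
    start = bp - flank  # 0-based slice start
    end = bp + flank    # 0-based slice end (exclusive)
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window runs off the end of the chromosome")
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


def fusion_sequence(genome, hchr, hbp, hstrand, tchr, tbp, tstrand, flank=5000):
    """Concatenate the head-gene and tail-gene windows (4 * flank bases)."""
    head = breakpoint_window(genome[hchr], hbp, hstrand, flank)
    tail = breakpoint_window(genome[tchr], tbp, tstrand, flank)
    return head + tail
```

With the default `flank=5000`, each call returns a 20,000-base string, matching the column width described in the table header.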
Figure 2. Diagram of fusion gene breakpoints classification by FusionAI
Fusion genes shared between FusionAI and other tools, selected by FusionAI score; validated fusion genes are marked for the user's information
| Hgene | Hchr | Hbp | Hstrand | Tgene | Tchr | Tbp | Tstrand | STAR-fusion | STAR-fusion & FusionAI >0.5 | STAR-fusion & FusionAI >0.95 | STAR-fusion & arriba | Validated | FusionAI score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BCR | chr22 | 23632600 | + | ABL1 | chr9 | 133729450 | + | X | X | X | X | X | 0.9999999 |
| IMMP2L | chr7 | 111127293 | - | DOCK4 | chr7 | 111409733 | - | X | X | X | X | X | 0.9999999 |
| BAG6 | chr6 | 31619432 | - | SLC44A4 | chr6 | 31833561 | - | X | X | X | X | 0.99999857 | |
| RP11-344E13.3 | chr17 | 20771998 | + | UBBP4 | chr17 | 21730694 | + | X | X | X | X | 0.9999932 | |
| BAG6 | chr6 | 31619432 | - | SLC44A4 | chr6 | 31833378 | - | X | X | X | X | 0.9999831 | |
| C10orf76 | chr10 | 103799769 | - | KCNIP2 | chr10 | 103588956 | - | X | X | X | X | 0.99743265 | |
| RP11-321F6.1 | chr15 | 66874586 | + | SMAD6 | chr15 | 67004005 | + | X | X | X | | | 0.9900406 |
| NUP214 | chr9 | 134074402 | + | XKR3 | chr22 | 17288973 | - | X | X | X | X | X | 0.95663476 |
| RP11-96H19.1 | chr12 | 46781755 | + | RP11-446N19.1 | chr12 | 47046172 | + | X | X | | | | 0.93317753 |
| RP11-96H19.1 | chr12 | 46781755 | + | RP11-446N19.1 | chr12 | 46965038 | + | X | X | | | | 0.9303843 |
| RP5-964N17.1 | chrX | 113181480 | - | LRCH2 | chrX | 114398346 | - | X | X | | | | 0.8816845 |
| UPF3A | chr13 | 115070392 | + | CDC16 | chr13 | 115037658 | + | X | X | | X | X | 0.8794392 |
| CTC-786C10.1 | chr16 | 85205413 | + | RP11-680G10.1 | chr16 | 85391068 | + | X | X | | | | 0.8380846 |
| C16orf87 | chr16 | 46858297 | - | ORC6 | chr16 | 46729473 | + | X | X | | X | | 0.6423692 |
| RP11-680G10.1 | chr16 | 85391249 | + | GSE1 | chr16 | 85667519 | + | X | | | | | 0.30633911 |
| C16orf87 | chr16 | 46858297 | - | ORC6 | chr16 | 46727004 | + | X | | | X | | 0.13516404 |
| RP11-680G10.1 | chr16 | 85391249 | + | GSE1 | chr16 | 85682157 | + | X | | | | | 0.040422514 |
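
The two FusionAI columns in the table above amount to a threshold filter on the FusionAI score. A minimal sketch of that filter follows, using the gene pairs and scores listed in the table; the tuple layout and the function name `select_by_score` are illustrative assumptions, not part of the FusionAI scripts.

```python
# Hypothetical sketch: filter STAR-fusion calls by FusionAI score.
# (Hgene, Tgene, FusionAI score) copied from the table above.
CALLS = [
    ("BCR", "ABL1", 0.9999999),
    ("IMMP2L", "DOCK4", 0.9999999),
    ("BAG6", "SLC44A4", 0.99999857),
    ("RP11-344E13.3", "UBBP4", 0.9999932),
    ("BAG6", "SLC44A4", 0.9999831),
    ("C10orf76", "KCNIP2", 0.99743265),
    ("RP11-321F6.1", "SMAD6", 0.9900406),
    ("NUP214", "XKR3", 0.95663476),
    ("RP11-96H19.1", "RP11-446N19.1", 0.93317753),
    ("RP11-96H19.1", "RP11-446N19.1", 0.9303843),
    ("RP5-964N17.1", "LRCH2", 0.8816845),
    ("UPF3A", "CDC16", 0.8794392),
    ("CTC-786C10.1", "RP11-680G10.1", 0.8380846),
    ("C16orf87", "ORC6", 0.6423692),
    ("RP11-680G10.1", "GSE1", 0.30633911),
    ("C16orf87", "ORC6", 0.13516404),
    ("RP11-680G10.1", "GSE1", 0.040422514),
]


def select_by_score(calls, threshold):
    """Keep calls whose FusionAI score is strictly above the threshold."""
    return [call for call in calls if call[2] > threshold]


lenient = select_by_score(CALLS, 0.5)   # the "FusionAI >0.5" column
strict = select_by_score(CALLS, 0.95)   # the "FusionAI >0.95" column
print(len(CALLS), len(lenient), len(strict))
```

A stricter threshold trades recall for precision, which is the comparison the accuracy table summarizes.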
Accuracies across different comparisons of results for the users’ information
| | STAR-fusion | FusionAI > 0.5 | FusionAI > 0.95 | Arriba | Validated |
|---|---|---|---|---|---|
| TP | 6 | 6 | 5 | 4 | 6 |
| FP | 11 | 8 | 3 | 4 | 0 |
| TN | 0 | 3 | 8 | 9 | 11 |
| FN | 0 | 0 | 1 | 2 | 0 |
| Precision | 0.35 | 0.43 | 0.63 | 0.50 | 1.00 |
| Recall | 1.00 | 1.00 | 0.83 | 0.67 | 1.00 |
| Accuracy | 0.35 | 0.53 | 0.76 | 0.68 | 1.00 |
| F-measure | 0.52 | 0.60 | 0.71 | 0.57 | 1.00 |
| MCC | NA | 0.34 | 0.54 | 0.34 | 1.00 |
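
The measures in the table above follow the standard confusion-matrix definitions. As a minimal sketch (the function name `confusion_metrics` is illustrative and not part of FusionAI), they can be recomputed from the TP, FP, TN, and FN counts. MCC is undefined when any marginal sum is zero, which is consistent with the NA reported for STAR-fusion.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute precision, recall, accuracy, F-measure, and MCC.

    Any measure whose denominator is zero is returned as None.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    accuracy = (tp + tn) / total if total else None
    if precision and recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = None
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Counts taken from the "FusionAI > 0.95" column of the table above.
print(confusion_metrics(tp=5, fp=3, tn=8, fn=1))
```

Matthews correlation coefficient is computed as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)).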
Figure 3. Calculate the feature importance scores across 20 Kb fusion DNA sequence
Figure 4. Left: distribution of 44 human genomic features across 20 Kb fusion DNA sequence.
Right: overlap between the top 1% FIS regions and 44 different types of human genomic features across 20 Kb fusion DNA sequence.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| newdat_newmod_jj.h5 | FusionAI model in this paper. | |
| gencode_hg19v19_.txt | Gene structure information file with UCSC genome browser known gene format of GENCODE version 19. | |
| nib_files_hg19.tar.gz | Nib files of all chromosomes of hg19, which were transformed from fasta files provided from the UCSC genome browser. | |
| chromosome_size.txt | This paper | |
| features_info.txt | This paper | |
| feature.tar.gz | This paper | |
| Python (>=3.0) | Python Software Foundation, 2021: high-level programming language | |
| nibFrag | Converts portions of a .nib file back to fasta format. | |
| TensorFlow | TensorFlow is an end-to-end open source platform for machine learning. | |
| keras | A deep learning framework developed by François Chollet | |
| pandas | A community project for fast and easy data analysis and manipulation | |
| numpy | Community project, 2021: array processing for numbers, strings, records, and objects | |
| argparse | A python module that makes it easy to write user-friendly command-line interfaces | |
| FusionAI_pred.py | This paper | |
| FusionAI_FIS.py | This paper | |
| pre_processing_for_FusionAI_from_tab_delim.py | This paper | |
| bedtools (>=2.26.0) | ( | |
| R (>=3.5) | ( | |
| devtools (>=1.13.6) | ( | |
| bedtoolsr (2.30.0.1) | ( | |
| optparse (>=1.6.0) | ( | |
| doParallel (1.0.16) | ( | |
| iterators (1.0.13) | ( | |
| magrittr (2.0.1) | ( | |
| foreach (1.5.1) | ( | |
| ggplot2 (3.3.5) | ( | |
| gridExtra (2.3) | ( | |
| scales (1.1.1) | ( | |
| cowplot (1.1.1) | ( | |
| ggpubr (>=0.1.7) | ( | |
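
Before running the pipeline, it can help to confirm that the Python modules and command-line tools listed above are present. The following is a minimal sketch of such a check, not part of the FusionAI distribution; the helper names are hypothetical, and the module and executable names are assumed to match the table entries (for example, `tensorflow` as the import name for TensorFlow). It does not cover the R packages.

```python
import importlib.util
import shutil
import sys

# Import names assumed from the resources table above.
REQUIRED_MODULES = ["tensorflow", "keras", "pandas", "numpy", "argparse"]
# Command-line tools assumed to be on PATH.
REQUIRED_EXECUTABLES = ["nibFrag", "bedtools"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 0):
        print("Python >= 3.0 is required")
    print("Missing Python modules:", missing_modules())
    print("Missing executables:", missing_executables())
```

An empty list from both helpers suggests the Python side of the environment is ready; version constraints such as bedtools >=2.26.0 would still need to be checked separately.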