FusionAI, a DNA-sequence-based deep learning protocol reduces the false positives of human fusion gene prediction.

Pora Kim1, Hua Tan1, Jiajia Liu1,2, Himansu Kumar1, Xiaobo Zhou1,3,4.   

Abstract

Although many tools have been developed to predict fusion genes from NGS data, excessive false positives remain an issue. Making good use of the genomic features around fusion gene breakpoints helps identify reliable fusion genes efficiently. To this end, we developed FusionAI, a deep learning pipeline that predicts human fusion gene breakpoints from DNA sequence. FusionAI is freely available via https://compbio.uth.edu/FusionGDB2/FusionAI. For complete details on the use and execution of this protocol, please refer to Kim et al. (2021b).
© 2022 The Author(s).

Keywords:  Bioinformatics; Computer sciences; Genomics; Health Sciences; Molecular Biology

Year:  2022        PMID: 35252882      PMCID: PMC8892011          DOI: 10.1016/j.xpro.2022.101185

Source DB:  PubMed          Journal:  STAR Protoc        ISSN: 2666-1667


Before you begin

With the accelerated accumulation of next-generation sequencing data, many tools have been developed to predict fusion genes from RNA-seq data, such as STAR-Fusion (Haas et al., 2019), Arriba (Uhrig et al., 2021), SOAPfuse (Jia et al., 2013), deFuse (McPherson et al., 2011), and FusionScan (Kim et al., 2019). These tools differ mainly in how they handle RNA sequencing reads that align far apart and reads that map to repeat regions. However, excessive false positives remain the main problem in fusion gene prediction, and researchers have regarded fusion genes predicted by more than two prediction tools as reliable fusions. This selection approach removes some false positives, but its benefit is limited because all of these tools rely on split RNA sequencing reads. Using other types of information, such as genomic sequence features around the breakpoint area, can remove false positives more effectively. To help identify reliable fusion genes efficiently, we developed FusionAI, a deep learning pipeline that predicts human fusion gene breakpoints from DNA sequences. For a given fusion gene breakpoint, FusionAI provides the probability that the position is used as a fusion gene breakpoint, together with the landscape of human genomic features around it. FusionAI is freely available via https://compbio.uth.edu/FusionGDB2/FusionAI. The protocol below describes the specific steps for running FusionAI on the fusion genes predicted in the K562 cell line using STAR-Fusion (Haas et al., 2019). Combining the FusionAI output, which draws on the genomic features around the breakpoints in the fusion DNA sequence, with these predicted fusion genes yields more reliable fusion genes with fewer false positives.

Software prerequisites and data requirements

Our model is installed and run under the Linux system. Before launching our program, preinstalled Python (>= v.3.0), TensorFlow, and Keras modules are required. You should also prepare fusion gene information that was predicted using other existing tools for your cancer sample. The example of prerequisites and input data format can be found on our website: https://compbio.uth.edu/FusionGDB2/FusionAI. All the R packages required to visualize 44 human genome features in a 20 Kb DNA sequence are listed in the key resources table under the “R packages to draw feature landscape image” category. The R package “bedtoolsr” can only be installed using devtools::install_github ("PhanstielLab/bedtoolsr") and other R packages can be installed using install.packages() function.
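Before running anything, it can help to confirm that the Python prerequisites are importable. The following is a minimal sketch, not part of the FusionAI distribution: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the module list is taken from the key resources table.

```python
import importlib.util
import sys

# Python modules named in the key resources table of this protocol.
REQUIRED_MODULES = ["tensorflow", "keras", "pandas", "numpy", "argparse"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 0)):
    """True when the interpreter is at least the version the protocol requires."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python >= 3.0 is required")
    for name in missing_modules(REQUIRED_MODULES):
        print("Missing module:", name)
```

This only checks that each module can be located; it does not verify versions or that TensorFlow can load the model file.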

Key resources table

Materials and equipment

The program in this protocol was written in the Ubuntu Linux system using Python language (>=v.3.0). All experiments were carried out and evaluated under the Ubuntu system with the computational resources listed in Table 1.
Table 1

Computation resources used in this study

Operating system | Version
CentOS Linux | 7.9.2009

CPU information | Parameter
RAM Memory | 93 GB
Thread(s) per core | 2
Core(s) per socket | 2
Model | 85
Model name | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10 GHz
CPU MHz | 2899.816
CPU(s) | 36

CRITICAL: The implementation of the model is lightweight. However, the memory required in practice depends on the size of your own data.
1. Our model can work with fewer CPU cores and less RAM, although it may take longer for a large dataset. While running the example input for FusionAI, it used 17.6% of a CPU and 0.3% of the memory of the server with the computation capacity described in Table 1.
2. If the user does not need to draw the feature images, there is no need to install the software and algorithms for drawing the feature landscape images listed in the key resources table.

Step-by-step method details

Download our package and install the prerequisites

Timing: < 10 min

Download the latest version of FusionAI into your preferred directory. The run will be executed inside this directory (a, b, c, and d are required for running FusionAI; e, f, and g are required to draw feature landscape images for the chosen fusion genes):
a. Download FusionAI_pred.py from https://compbio.uth.edu/FusionGDB2/FusionAI/FusionAI_pred.py
b. Download FusionAI model (newdat_newmod_jj.h5) from https://compbio.uth.edu/FusionGDB2/FusionAI/newdat_newmod_jj.h5
c. Download preprocessing script (pre_processing_for_FusionAI_from_tab_delim.py) from https://compbio.uth.edu/FusionGDB2/FusionAI/pre_processing_for_FusionAI_from_tab_delim.py
d. Download example fusion gene file (k562_starfusion.txt) from https://compbio.uth.edu/FusionGDB2/FusionAI/k562_starfusion.txt
e. Download 44 human genomic feature information files (features.tar.gz, features_info.txt, and chromosome_size.txt) from https://compbio.uth.edu/FusionGDB2/FusionAI/features.tar.gz, https://compbio.uth.edu/FusionGDB2/FusionAI/features_info.txt, and https://compbio.uth.edu/FusionGDB2/FusionAI/chromosome_size.txt
f. Download human gene structure file and nib files (gencode_hg19v19_.txt and nib_files_hg19.tar.gz) from https://compbio.uth.edu/FusionGDB2/FusionAI/gencode_hg19v19_.txt and https://compbio.uth.edu/FusionGDB2/FusionAI/nib_files_hg19.tar.gz
g. Install R packages using the command install.packages(). Put the individual R package name inside the parentheses, like install.packages('devtools'). These

Prepare input data of 20 Kb DNA sequence of fusion genes

Timing: < 1 min

FusionAI takes fusion gene breakpoint information as input, which comes from other fusion gene prediction tools or from known fusion gene information (k562_starfusion.txt and Table 2). The preprocessing script makes a 20 Kb DNA sequence for each fusion gene by combining the +/-5 Kb flanking sequences around the genomic positions of the two breakpoints, one for each fusion partner gene (Figure 1 and Table 3).
Table 2

Fusion gene information example, which were predicted for K562 cell-line from STAR-fusion

Hgene | Hchr | Hbp | Hstrand | Tgene | Tchr | Tbp | Tstrand
BCR | chr22 | 23632600 | + | ABL1 | chr9 | 133729450 | +
BAG6 | chr6 | 31619433 | - | SLC44A4 | chr6 | 31833561 | -
NUP214 | chr9 | 134074402 | + | XKR3 | chr22 | 17288973 | -
Figure 1

Make input data of FusionAI

Table 3

FusionAI input data example, which were made by running preprocessing script

Hgene | Hchr | Hbp | Hstrand | Tgene | Tchr | Tbp | Tstrand | 20 Kb fusion DNA sequence
BCR | chr22 | 23632600 | + | ABL1 | chr9 | 133729450 | + | TACCAGAGCGGCTGCCAAC…
BAG6 | chr6 | 31619433 | - | SLC44A4 | chr6 | 31833561 | - | CAGTGATGCTTCTGCCTCC…
NUP214 | chr9 | 134074402 | + | XKR3 | chr22 | 17288973 | - | GATAAAATTTTTTCACTAA…

Run the preprocessing script to make a 20 Kb DNA sequence from the given fusion gene information. The fusion gene information should include the following fields in tab-delimited format: Hgene, Hchr, Hbp, Hstrand, Tgene, Tchr, Tbp, Tstrand. The command is shown below. Here the $INPUT_FILE is the output file after checking the junction position of the fusion breakpoints in step 2.
> python pre_processing_for_FusionAI_from_tab_delim.py [INPUT_FILE]
> python pre_processing_for_FusionAI_from_tab_delim.py k562_starfusion.txt
CRITICAL: The timing depends on the number of fusion genes in the input file.
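
To make the +/-5 Kb flanking logic concrete, the sketch below parses one tab-delimited fusion record and computes the two 10 Kb windows that together give 20 Kb. It is an illustration only, not the code of pre_processing_for_FusionAI_from_tab_delim.py: the names `Fusion`, `parse_fusion_line`, and `flank_window` are hypothetical, and the half-open coordinate convention is an assumption. Strand handling and sequence extraction from the nib files are not shown.

```python
from typing import NamedTuple, Tuple

FLANK = 5_000  # +/- 5 Kb around each breakpoint, as described in the protocol


class Fusion(NamedTuple):
    hgene: str
    hchr: str
    hbp: int
    hstrand: str
    tgene: str
    tchr: str
    tbp: int
    tstrand: str


def parse_fusion_line(line: str) -> Fusion:
    """Parse one tab-delimited record: Hgene, Hchr, Hbp, Hstrand, Tgene, Tchr, Tbp, Tstrand."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 tab-delimited fields, got {len(fields)}")
    hgene, hchr, hbp, hstrand, tgene, tchr, tbp, tstrand = fields
    for strand in (hstrand, tstrand):
        if strand not in ("+", "-"):
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return Fusion(hgene, hchr, int(hbp), hstrand, tgene, tchr, int(tbp), tstrand)


def flank_window(bp: int, flank: int = FLANK) -> Tuple[int, int]:
    """Half-open window [bp - flank, bp + flank); this convention is an assumption."""
    return max(0, bp - flank), bp + flank


record = parse_fusion_line("BCR\tchr22\t23632600\t+\tABL1\tchr9\t133729450\t+")
head_start, head_end = flank_window(record.hbp)
tail_start, tail_end = flank_window(record.tbp)
total_length = (head_end - head_start) + (tail_end - tail_start)
```

With the BCR-ABL1 record from Table 2, each window spans 10,000 bp, so `total_length` is 20,000.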

Run FusionAI

Timing: < 2 s (depending on your data)

FusionAI takes the 20 Kb DNA sequences of fusion genes from the previous step and outputs two probabilities for each: that the position is not used, and that it is used, as a fusion gene breakpoint (Figure 2).
Figure 2

Diagram of fusion gene breakpoints classification by FusionAI

Run the FusionAI prediction script to predict the fusion breakpoint tendency from the FusionAI model. Here the $INPUT_FILE is the output file after making the 20 Kb DNA sequence in the previous step. $COLA and $COLB are the DNA sequences of the 5′ and 3′ fusion partner genes that were created in the previous step. To run for one specific fusion gene, set $INDEX_OF_FUSION to the row index of the line of interest in the input file.
> python FusionAI_pred.py [-h] -f [INPUT_FILE] -m [MODEL, default: newdat_newmod_jj.h5] -o [OUTPUT_FILE] -A [COLA] -B [COLB] -I [INDEX_OF_FUSION]
> python FusionAI_pred.py -f k562_starfusion.FusionAI.input -o k562_starfusion.FusionAI.output -m newdat_newmod_jj.h5

Select high scored fusion genes (or interested fusion genes) from FusionAI output

Timing: < 5 s

From the FusionAI output scores for the fusion candidates predicted by other tools, users can select high-scoring fusion genes or fusion genes of interest. This can be done in a text editor or another appropriate tool of choice. Users can stop the pipeline at this step if they do not need further analyses, such as feature importance analysis or drawing a landscape image of human genomic features in fusion genes, which take relatively long. The FusionAI output scores alone still allow users to reduce false positives. For better understanding, Table 4 compares the results across different FusionAI score cutoffs, other prediction tools, and experimentally validated fusion genes. Table 5 shows the accuracy comparisons. When we used a higher threshold on the FusionAI output scores, we could reduce the false positives efficiently.
Table 4

Selection of common fusion genes between FusionAI and other tools based on the FusionAI score including validated fusion genes for the user’s information

Hgene | Hchr | Hbp | Hstrand | Tgene | Tchr | Tbp | Tstrand | X marks (columns: STAR-fusion; STAR-fusion & FusionAI >0.5; STAR-fusion & FusionAI >0.95; STAR-fusion & arriba; Validated) | FusionAI score
BCR | chr22 | 23632600 | + | ABL1 | chr9 | 133729450 | + | XXXXX | 0.9999999
IMMP2L | chr7 | 111127293 | - | DOCK4 | chr7 | 111409733 | - | XXXXX | 0.9999999
BAG6 | chr6 | 31619432 | - | SLC44A4 | chr6 | 31833561 | - | XXXX | 0.99999857
RP11-344E13.3 | chr17 | 20771998 | + | UBBP4 | chr17 | 21730694 | + | XXXX | 0.9999932
BAG6 | chr6 | 31619432 | - | SLC44A4 | chr6 | 31833378 | - | XXXX | 0.9999831
C10orf76 | chr10 | 103799769 | - | KCNIP2 | chr10 | 103588956 | - | XXXX | 0.99743265
RP11-321F6.1 | chr15 | 66874586 | + | SMAD6 | chr15 | 67004005 | + | XXX | 0.9900406
NUP214 | chr9 | 134074402 | + | XKR3 | chr22 | 17288973 | - | XXXXX | 0.95663476
RP11-96H19.1 | chr12 | 46781755 | + | RP11-446N19.1 | chr12 | 47046172 | + | XX | 0.93317753
RP11-96H19.1 | chr12 | 46781755 | + | RP11-446N19.1 | chr12 | 46965038 | + | XX | 0.9303843
RP5-964N17.1 | chrX | 113181480 | - | LRCH2 | chrX | 114398346 | - | XX | 0.8816845
UPF3A | chr13 | 115070392 | + | CDC16 | chr13 | 115037658 | + | XXXX | 0.8794392
CTC-786C10.1 | chr16 | 85205413 | + | RP11-680G10.1 | chr16 | 85391068 | + | XX | 0.8380846
C16orf87 | chr16 | 46858297 | - | ORC6 | chr16 | 46729473 | + | XXX | 0.6423692
RP11-680G10.1 | chr16 | 85391249 | + | GSE1 | chr16 | 85667519 | + | X | 0.30633911
C16orf87 | chr16 | 46858297 | - | ORC6 | chr16 | 46727004 | + | XX | 0.13516404
RP11-680G10.1 | chr16 | 85391249 | + | GSE1 | chr16 | 85682157 | + | X | 0.040422514
Table 5

Accuracies across different comparisons of results for the users’ information

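Metric | STAR-fusion | FusionAI > 0.5 | FusionAI > 0.95 | Arriba | Validated
TP | 6 | 6 | 5 | 4 | 6
FP | 11 | 8 | 3 | 4 | 0
TN | 0 | 3 | 8 | 9 | 11
FN | 0 | 0 | 1 | 2 | 0
Precision | 0.35 | 0.43 | 0.63 | 0.50 | 1.00
Recall | 1.00 | 1.00 | 0.83 | 0.67 | 1.00
Accuracy | 0.35 | 0.53 | 0.76 | 0.68 | 1.00
F-measure | 0.52 | 0.60 | 0.71 | 0.57 | 1.00
MCC | NA | 0.34 | 0.54 | 0.34 | 1.00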
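
The derived rows of Table 5 follow from the TP, FP, TN, and FN counts using the standard definitions of precision, recall, accuracy, F-measure, and the Matthews correlation coefficient (MCC). The sketch below recomputes them; the function name `classification_metrics` is hypothetical, and returning `None` for an undefined ratio is a choice made here to mirror the "NA" entry in the table.

```python
import math
from typing import Dict, Optional


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Standard confusion-matrix metrics; None marks an undefined (zero-denominator) value."""

    def ratio(numerator: float, denominator: float) -> Optional[float]:
        return numerator / denominator if denominator else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)

    f_measure = None
    if precision is not None and recall is not None and precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)

    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Counts from the "FusionAI > 0.95" and "STAR-fusion" columns of Table 5.
fusionai_095 = classification_metrics(tp=5, fp=3, tn=8, fn=1)
star_fusion = classification_metrics(tp=6, fp=11, tn=0, fn=0)
```

For the STAR-fusion column the MCC denominator is zero because TN and FN are both zero, which is why that cell is undefined.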

Sort the FusionAI prediction output by the FusionAI scores of individual fusion genes and select high-scoring fusion genes. Users can choose the cutoff score, which should be larger than 0.5. Table 4 shows examples chosen with different cutoffs, such as 0.5 or 0.95. The selected fusion genes are then used for further analyses in the following steps, such as screening the feature importance scores and drawing the landscape of human genomic features across the 20 Kb fusion DNA sequence.
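
The sorting and cutoff selection can also be scripted instead of done by hand. The sketch below works on in-memory tuples with a few gene pairs and scores copied from Table 4; it does not read the real FusionAI output file, whose exact column layout is not described here, and the name `select_fusions` is hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, str, float]  # (Hgene, Tgene, FusionAI score)

# A few rows from Table 4, deliberately left unsorted.
CANDIDATES: List[Candidate] = [
    ("UPF3A", "CDC16", 0.8794392),
    ("RP11-680G10.1", "GSE1", 0.30633911),
    ("BCR", "ABL1", 0.9999999),
    ("C16orf87", "ORC6", 0.6423692),
    ("NUP214", "XKR3", 0.95663476),
]


def select_fusions(candidates: List[Candidate], cutoff: float = 0.5) -> List[Candidate]:
    """Keep candidates scoring above the cutoff, highest score first."""
    kept = [candidate for candidate in candidates if candidate[2] > cutoff]
    return sorted(kept, key=lambda candidate: candidate[2], reverse=True)


for hgene, tgene, score in select_fusions(CANDIDATES, cutoff=0.95):
    print(f"{hgene}-{tgene}\t{score}")
```

Raising the cutoff from 0.5 to 0.95 trades recall for precision, as the Table 5 comparison shows.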

Calculate the feature importance scores across 20 Kb DNA sequence

Timing: < 33 min

After selecting the reliable fusion gene candidates, users can check the distribution of the feature importance scores of individual fusion genes across the 20 Kb fusion DNA sequence. To calculate the feature importance score (FIS), we masked 20 bp at a time by setting all 20 values to zero and measured the change in the prediction outcome upon this masking. We slid this 20 bp window 20 nucleotides at a time along the whole 20 Kb input sequence and repeated the procedure to obtain the FIS for every 20 bp segment. In this way, we obtained 20,000/20 = 1,000 FIS values for each input sequence.

Run the FusionAI feature importance score script to get the feature importance scores across the 20 Kb fusion DNA sequence. Here the $INPUT_FILE is the output file after making the 20 Kb DNA sequence in step 3. $COLA and $COLB are the DNA sequences of the 5′ and 3′ fusion partner genes that were created in step 3. To run for specific fusion genes, set $INDEX_OF_FUSION to the row indexes of the lines of interest in the input file. If multiple GPUs are available, the number of GPUs can be controlled with the NGPUS parameter. However, a GPU is not necessary (Figure 3).
Figure 3

Calculate the feature importance scores across 20 Kb fusion DNA sequence

> python FusionAI_FIS.py [-h] -f FILENAME [-m MODEL, default: newdat_newmod_jj.h5] [-o OUTPUT] [-A COLA] [-B COLB] [-I ROWI] [-N NGPUS]
> python FusionAI_FIS.py -f k562_starfusion.FusionAI.output -o k562_starfusion.FusionAI.output.FIS
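
The masking procedure behind the FIS is a form of occlusion analysis: zero out one non-overlapping 20 bp window, re-run the prediction, and record how much the output changes. The sketch below illustrates the idea in plain Python. It is not the code of FusionAI_FIS.py: `toy_predict` is a stand-in for the trained model, the four-channel one-hot layout with N encoded as all zeros is an assumption, and so is the sign convention (baseline minus masked prediction).

```python
from typing import Callable, List

Encoded = List[List[float]]  # one row per base, one column per channel (A, C, G, T)

CHANNELS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumption: N becomes an all-zero row


def one_hot(seq: str) -> Encoded:
    """One-hot encode a DNA string into rows of four floats."""
    rows = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        if base in CHANNELS:
            row[CHANNELS[base]] = 1.0
        rows.append(row)
    return rows


def feature_importance_scores(
    encoded: Encoded, predict: Callable[[Encoded], float], window: int = 20
) -> List[float]:
    """Mask each non-overlapping window with zeros and record the drop in prediction."""
    baseline = predict(encoded)
    scores = []
    for start in range(0, len(encoded), window):
        end = min(start + window, len(encoded))
        original_rows = encoded[start:end]
        encoded[start:end] = [[0.0] * len(row) for row in original_rows]
        scores.append(baseline - predict(encoded))
        encoded[start:end] = original_rows  # restore before moving to the next window
    return scores


def toy_predict(encoded: Encoded) -> float:
    """Stand-in for the trained model: the fraction of positions encoded as G."""
    return sum(row[2] for row in encoded) / len(encoded)


example = one_hot("ACGT" * 10)  # 40 bp, so two 20 bp windows
example_scores = feature_importance_scores(example, toy_predict)
```

A 20,000 bp input yields 20,000/20 = 1,000 scores, matching the count given in the protocol. With a real model, each score costs one forward pass, which is consistent with this step taking much longer than prediction.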

Visualize 44 human genomic features across 20 Kb DNA sequence

Timing: < 1 h 10 min and < 20 min for step 6 and 7, respectively

After obtaining reliable fusion gene candidates and feature importance scores, it is important to interpret them in terms of human genomic features. In our original work, we integrated 44 human genomic features across five important cellular mechanism categories: integration sites of 6 viruses, 13 types of repeats, 5 types of structural variants, 15 types of chromatin states, and 5 gene expression regulatory features (Kim et al., 2021a, 2021b). In this step, users can create two figures showing the landscape of the fusion gene breakpoint-related genomic features across the 20 Kb fusion DNA sequence. Each script creates separate figures for the individual fusion genes that have FIS values from the previous step. All figures are created under the user-defined directory. The first figure shows the overlap between the 20 Kb fusion DNA sequence and the 44 genomic features, and the second figure shows the overlap between the top 1% FIS regions and the 44 genomic features (Figure 4).
Figure 4

Left - distribution of 44 human genomic features across 20 Kb fusion DNA sequence

Right - overlap between the top 1% FIS regions and 44 different types of human genomic features across 20 Kb fusion DNA sequence.

Visualize 44 human genomic features across a 20 Kb DNA sequence. Run the FusionAI genomic feature analysis script to make a landscape image of the overlap between the fusion breakpoint area (+/- 5 Kb) and 44 human genomic features.
> Rscript FusionAI_genomic_features.R -g [FUSION_GENE_FILE] -f [FEATURE_PATH] -s [CHROMOSOME_SIZE_FILE] -i [FEATURE_INFO_FILE] -o [OUTPUT_FILE_PATH]
> Rscript FusionAI_genomic_features.r -g K562_STARfusion.FusionAI.output.FIS -f ./features/ -s chromosome_size.txt -i features_info.txt -o ./K562/whole_features/

Visualize the overlaps between the top 1% FIS regions and 44 human genomic features across the 20 Kb DNA sequence. Run the FusionAI genomic feature analysis script to obtain the landscape of overlap between high-FIS regions of fusion genes and 44 human genomic features.
> Rscript FusionAI_genomic_features2.R -g [FUSION_GENE_FILE] -f [FEATURE_PATH] -s [CHROMOSOME_SIZE_FILE] -i [FEATURE_INFO_FILE] -o [OUTPUT_FILE_PATH]
> Rscript FusionAI_genomic_features2.r -g K562_STARfusion.FusionAI.output.FIS -f ./features/ -s chromosome_size.txt -i features_info.txt -o ./K562/top1pct_features/
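
To see what "top 1% FIS regions" means in coordinates, the sketch below picks the highest-scoring 1% of the 20 bp windows and converts their indexes to positions within the 20 Kb sequence. This is an illustration under assumptions, not the logic of the R scripts: ranking by raw score (rather than absolute value) and half-open intervals are choices made here, and `top_fis_windows` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def top_fis_windows(
    scores: Sequence[float], top_percent: int = 1, window: int = 20
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offsets of the highest-scoring windows, in sequence order."""
    n_top = max(1, len(scores) * top_percent // 100)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_top]
    return [(i * window, (i + 1) * window) for i in sorted(ranked)]


# Toy scores that increase along the sequence, so the last windows rank highest.
toy_scores = [i / 1000 for i in range(1000)]
top_regions = top_fis_windows(toy_scores)
```

With 1,000 scores per fusion gene, the top 1% corresponds to 10 windows, or 200 bp of the 20 Kb sequence.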

Expected outcomes

The above commands will generate the following results from your fusion gene candidates’ information:
FusionAI output scores: the FusionAI result will be saved in the current working directory under your preferred output file name.
Feature importance scores: 1,000 feature importance scores for each fusion gene predicted as a potential fusion breakpoint will be saved in the current working directory under your preferred output file name.
Genomic feature landscape images: the distribution of 44 human genomic features across the 20 Kb fusion DNA sequence will be saved in the current working directory under your preferred output file name.

Limitations

For prediction tasks, since our program provides an option of taking one fusion at a time, there is no problem running it on a CPU.

Troubleshooting

Problem 1

Installation of FusionAI fails due to uninstalled prerequisites (step 1).

Potential solution

Please install the required dependencies manually through the links we provided in the key resources table, and then try installing FusionAI again.

Problem 2

Installation of FusionAI fails due to an old version of Python (step 1).

Potential solution

Please install a recent version of Python (at least v3.0), and then try installing FusionAI again.

Problem 3

The preprocessing script fails to read the fusion gene information (step 2).

Potential solution

Please prepare the fusion gene information following the format described in step 2.

Problem 4

FusionAI fails to read the input file or parse it correctly.

Potential solution

Currently, FusionAI can only parse tab- and space-separated files. Please check the format of the input file and make sure that each column is properly separated and that each row has the same number of columns.
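
A quick way to find malformed rows is to compare each row's column count with that of the first row. The sketch below is a hypothetical helper, not part of FusionAI; it splits on any run of whitespace, so it assumes no field is empty or contains spaces.

```python
from typing import Iterable, List, Tuple


def column_count_problems(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, n_columns) for rows whose column count differs from the first row."""
    counts = [
        (number, len(line.split()))
        for number, line in enumerate(lines, start=1)
        if line.strip()  # ignore blank lines
    ]
    if not counts:
        return []
    expected = counts[0][1]
    return [(number, n) for number, n in counts if n != expected]


rows = ["BCR\tchr22\t23632600", "BAG6 chr6 31619433", "NUP214\tchr9"]
problems = column_count_problems(rows)
```

To check a file, pass an open file handle, for example `column_count_problems(open("k562_starfusion.FusionAI.input"))`.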

Problem 5

FusionAI fails at the one-hot encoding step.

Potential solution

Make sure the input DNA sequences contain only five letters: A, C, G, T, and N.
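
Unexpected characters can be located before running the model with a small check like the one below. This is a hypothetical helper; treating only uppercase letters as valid is an assumption, since the protocol does not say whether lowercase (soft-masked) bases are accepted.

```python
from typing import List, Tuple

ALLOWED_BASES = frozenset("ACGTN")  # the five letters named in the protocol


def invalid_bases(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position, character) for every character outside A, C, G, T, N."""
    return [(i, ch) for i, ch in enumerate(seq) if ch not in ALLOWED_BASES]


offenders = invalid_bases("ACXGT-")
```

An empty result means the sequence passes the check.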

Problem 6

Running FusionAI fails due to missing parameters (step 3).

Potential solution

Please provide the essential parameters to run FusionAI, such as the input and output file names, and then run FusionAI again.

Problem 7

Creating the genomic feature landscape image fails because the human genomic feature information files were not downloaded (step 3).

Potential solution

Please download the human genomic feature information files from the link we provided in the key resources table, and then run the script again.

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Pora Kim (pora.kim@uth.tmc.edu).

Materials availability

This study did not generate new unique reagents.
REAGENT or RESOURCE | SOURCE | IDENTIFIER

Deposited data

newdat_newmod_jj.h5 | FusionAI model in this paper. | https://compbio.uth.edu/FusionGDB2/FusionAI/newdat_newmod_jj.h5
gencode_hg19v19_.txt | Gene structure information file with UCSC genome browser known gene format of GENCODE version 19. | https://compbio.uth.edu/FusionGDB2/FusionAI/gencode_hg19v19_.txt
nib_files_hg19.tar.gz | Nib files of all chromosomes of hg19, which were transformed from fasta files provided from the UCSC genome browser. | https://compbio.uth.edu/FusionGDB2/FusionAI/nib_files_hg19.tar.gz
chromosome_size.txt | This paper | https://compbio.uth.edu/FusionGDB2/FusionAI/chromosome_size.txt
features_info.txt | This paper | https://compbio.uth.edu/FusionGDB2/FusionAI/features_info.txt
feature.tar.gz | This paper | https://compbio.uth.edu/FusionGDB2/FusionAI/feature.tar.gz

Software and algorithms

Python (>=3.0) | Python Software Foundation, 2021: high-level programming language | https://www.python.org/downloads/
nibFrag | Converts portions of a .nib file back to fasta format. | http://hgdownload.soe.ucsc.edu/admin/jksrc.zip
TensorFlow | TensorFlow is an end-to-end open source platform for machine learning. | https://anaconda.org/conda-forge/tensorflow
keras | A deep learning framework developed by François Chollet | https://github.com/keras-team/keras
pandas | A community project for fast and easy data analysis and manipulation | https://pandas.pydata.org/about/
numpy | Community project, 2021: array processing for numbers, strings, records, and objects | https://numpy.org/
argparse | A python module that makes it easy to write user-friendly command-line interfaces | https://docs.python.org/3/library/argparse.html
FusionAI_pred.py | This paper | https://compbio.uth.edu/FusionGDB2/FusionAI/FusionAI_pred.py
FusionAI_FIS.py | This paper | https://compbio.uth.edu/FusionGDB2/FusionAI/FusionAI_FIS.py
pre_processing_for_FusionAI_from_tab_delim.py | This paper | https://compbio.uth.edu/FusionGDB2/FusionAI/pre_processing_for_FusionAI_from_tab_delim.py
bedtools (>=2.26.0) | (Quinlan and Hall, 2010): a powerful toolset for genome arithmetic | https://bedtools.readthedocs.io/en/latest/content/installation.html
R (>=3.5) | (Team, 2019): software environment for statistical computing and graphics | https://www.r-project.org/
devtools (>=1.13.6) | (Wickham et al., 2018): developing R Packages tool | https://cran.r-project.org/web/packages/devtools/index.html
bedtoolsr (2.30.0.1) | (Patwardhan et al., 2019): genomic data analysis and manipulation | http://phanstiel-lab.med.unc.edu/bedtoolsr-install.html
optparse (>=1.6.0) | (Davis, 2018): Command Line Option Parser | https://cran.r-project.org/web/packages/optparse/index.html
doParallel (1.0.16) | (Corporation and Weston, 2020): parallel backend | https://cran.r-project.org/web/packages/doParallel/index.html
iterators (1.0.13) | (Analytics and Weston, 2020): a package to allow a programmer to traverse through all the elements of a vector, list, or other collection of data | https://cran.r-project.org/web/packages/iterators/index.html
magrittr (2.0.1) | (Bache and Wickham, 2020): A Forward-Pipe Operator for R | https://cran.r-project.org/web/packages/magrittr/index.html
foreach (1.5.1) | (Microsoft and Weston, 2020): an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. | https://cran.r-project.org/web/packages/foreach/index.html
ggplot2 (3.3.5) | (Wickham, 2016): Elegant Graphics for Data Analysis | https://cran.r-project.org/web/packages/ggplot2/index.html
gridExtra (2.3) | (Auguie, 2017): a package to arrange multiple grid-based plots on a page | https://cran.r-project.org/web/packages/gridExtra/index.html
scales (1.1.1) | (Wickham and Seidel, 2020): Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends. | https://cran.r-project.org/web/packages/scales/index.html
cowplot (1.1.1) | (Wilke, 2020): a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and or mix plots with images. | https://cran.r-project.org/web/packages/cowplot/index.html
ggpubr (>=0.1.7) | (Kassambara, 2018): 'ggplot2' Based Publication Ready Plots | https://cran.r-project.org/web/packages/ggpubr/index.html
  9 in total

1.  Bedtoolsr: An R package for genomic data analysis and manipulation.

Authors:  Mayura N Patwardhan; Craig D Wenger; Eric S Davis; Douglas H Phanstiel
Journal:  J Open Source Softw       Date:  2019-12-06

2.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

3.  deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data.

Authors:  Andrew McPherson; Fereydoun Hormozdiari; Abdalnasser Zayed; Ryan Giuliany; Gavin Ha; Mark G F Sun; Malachi Griffith; Alireza Heravi Moussavi; Janine Senz; Nataliya Melnyk; Marina Pacheco; Marco A Marra; Martin Hirst; Torsten O Nielsen; S Cenk Sahinalp; David Huntsman; Sohrab P Shah
Journal:  PLoS Comput Biol       Date:  2011-05-19       Impact factor: 4.475

4.  Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods.

Authors:  Brian J Haas; Alexander Dobin; Bo Li; Nicolas Stransky; Nathalie Pochet; Aviv Regev
Journal:  Genome Biol       Date:  2019-10-21       Impact factor: 13.583

5.  FusionScan: accurate prediction of fusion genes from RNA-Seq data.

Authors:  Pora Kim; Ye Eun Jang; Sanghyuk Lee
Journal:  Genomics Inform       Date:  2019-07-23

6.  Accurate and efficient detection of gene fusions from RNA sequencing data.

Authors:  Sebastian Uhrig; Julia Ellermann; Tatjana Walther; Pauline Burkhardt; Martina Fröhlich; Barbara Hutter; Umut H Toprak; Olaf Neumann; Albrecht Stenzinger; Claudia Scholl; Stefan Fröhling; Benedikt Brors
Journal:  Genome Res       Date:  2021-01-13       Impact factor: 9.043

7.  FusionGDB 2.0: fusion gene annotation updates aided by deep learning.

Authors:  Pora Kim; Hua Tan; Jiajia Liu; Haeseung Lee; Hyesoo Jung; Himanshu Kumar; Xiaobo Zhou
Journal:  Nucleic Acids Res       Date:  2022-01-07       Impact factor: 16.971

8.  FusionAI: Predicting fusion breakpoint from DNA sequence with deep learning.

Authors:  Pora Kim; Hua Tan; Jiajia Liu; Mengyuan Yang; Xiaobo Zhou
Journal:  iScience       Date:  2021-09-25

9.  SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data.

Authors:  Wenlong Jia; Kunlong Qiu; Minghui He; Pengfei Song; Quan Zhou; Feng Zhou; Yuan Yu; Dandan Zhu; Michael L Nickerson; Shengqing Wan; Xiangke Liao; Xiaoqian Zhu; Shaoliang Peng; Yingrui Li; Jun Wang; Guangwu Guo
Journal:  Genome Biol       Date:  2013-02-14       Impact factor: 13.583
