Integrated protocol for exitron and exitron-derived neoantigen identification using human RNA-seq data with ScanExitron and ScanNeo.

Abstract

Exitron splicing (EIS) events in cancers can disrupt functional protein domains to cause cancer driver effects. EIS has been recognized as a new source of tumor neoantigens. Here, we describe an integrated protocol for EIS and EIS-derived neoantigen identification using RNA-seq data. The protocol constitutes a step-by-step guide from data collection to neoantigen prediction. For complete details on the use and execution of this protocol, please refer to Wang et al. (2021).

Keywords: Bioinformatics; Cancer; Genetics; Genomics; Immunology; RNAseq

Substances：
Antigens, Neoplasm

Year: 2021 PMID: 34522901 PMCID: PMC8424586 DOI: 10.1016/j.xpro.2021.100788

Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667

Before you begin

Data collection

This integrated protocol for analyzing RNA sequencing (RNA-seq) data has two components: ScanExitron (Wang et al., 2021) and ScanNeo (Wang et al., 2019) (Figure 1). ScanExitron was designed to detect exitron splicing events from short-read RNA-seq data, such as the Illumina data produced by The Cancer Genome Atlas (TCGA) study (Wang et al., 2021). ScanNeo was originally developed to detect neoantigens derived from insertions and deletions (indels). Because deletions and EIS events alter protein sequences in similar ways, ScanNeo can detect exitron-derived neoantigens directly. By definition, exitrons are cryptic introns with both splice sites inside an annotated protein-coding exon. Therefore, a human reference gene annotation is needed to identify bona fide exitrons. We recommend using the GRCh38 gene annotation GTF file from the GENCODE project (Frankish et al., 2019).

Figure 1

Flow chart showing exitron and exitron-derived neoantigens detection with ScanExitron and ScanNeo

The protocol below describes ScanExitron applied to a toy example data set and to a real data set from the TCGA prostate cancer (PRAD) cohort. Example data can be found at https://github.com/ylab-hi/ScanExitron/tree/master/example_data. The RNA-seq alignment files in BAM format for the TCGA PRAD cohort can be downloaded from the NCI Genomic Data Commons (https://portal.gdc.cancer.gov/). A single representative aliquot was selected per participant in cases where more than one aliquot was available. Thus, 496 PRAD primary tumor samples and 52 normal samples were kept. HLA class I four-digit types for 495 of the 496 TCGA PRAD samples were obtained from Thorsson et al. (2018) (https://gdc.cancer.gov/about-data/publications/panimmune). For the one remaining sample used in this study, ScanNeo was employed for HLA class I typing.

Optional reads alignment

If you are working with in-house RNA-seq data in raw FASTQ format, an alignment step is needed before running the protocol. ScanExitron requires a BAM file as input, produced by a splice-aware aligner such as HISAT2 (Kim et al., 2019). We recommend aligning the raw FASTQ reads using HISAT2 with a Hierarchical Graph Ferragina-Manzini (HGFM) index built with known transcript annotations. Users can build the HGFM index on their own (http://daehwankimlab.github.io/hisat2/howto/#build-hgfm-index-with-transcripts) or download the HGFM index (genome_tran) directly (http://daehwankimlab.github.io/hisat2/download/).
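The alignment step described above could be scripted roughly as follows. This is a minimal sketch, not part of the published protocol: the helper name `build_alignment_commands`, the index directory `grch38_tran`, and the FASTQ and BAM file names are hypothetical, and paired-end reads are assumed. It only builds the command strings (HISAT2 alignment piped into `samtools sort`, then `samtools index`) so they can be inspected before being run.

```python
import shlex


def build_alignment_commands(index_prefix, fastq_r1, fastq_r2, out_bam, threads=4):
    """Return two shell command strings: a HISAT2 alignment piped into a
    coordinate sort, and the indexing of the resulting BAM file.

    All paths are placeholders chosen for illustration.
    """
    q = shlex.quote  # protect paths that contain spaces or shell characters
    align_and_sort = (
        f"hisat2 -p {threads} -x {q(index_prefix)} -1 {q(fastq_r1)} -2 {q(fastq_r2)}"
        f" | samtools sort -@ {threads} -o {q(out_bam)} -"
    )
    index_bam = f"samtools index {q(out_bam)}"
    return [align_and_sort, index_bam]


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own data.
    for command in build_alignment_commands(
        "grch38_tran/genome_tran", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample.bam"
    ):
        print(command)
```

Each printed command can be pasted into a terminal, or executed from Python with `subprocess.run(command, shell=True, check=True)`.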

Key resources table

Materials and equipment

Data (RNA-seq alignment files in BAM format; see "Data collection" under "Before you begin")

Software

ScanExitron and its dependencies. ScanExitron is implemented in Python 3. While different versions of Python and the associated packages may work correctly with ScanExitron, the authors used Python 3.7 and the following packages at the indicated versions when writing this protocol:
pyfaidx (v0.5.9.2)
SamTools (v1.12)
BEDTools (v2.26.0)
RegTools (v0.4.2)
ScanExitron is not compatible with RegTools (v0.5 or above) in its current design.

ScanNeo and its dependencies. ScanNeo is also implemented in Python 3. When writing this protocol, the authors used Python 3.7 and the following packages at the indicated versions:
transIndel (v2.0)
IEDB MHC class I peptide binding prediction tools (v3.1)
optitype (v1.3.5)
BWA (v0.7.17)
Sambamba (v0.8.0)
BEDTools (v2.26.0)
Variant Effect Predictor (v102.0)
coincbc (v2.10.5)
razers3 (v3.5.8)
Picard (v2.24.0)
Yara (v1.0.2)
pyomo (v5.7.3)
PyVCF (v0.6.8)
HDF5 (v1.10.4)
tabix (v1.12)
pyfaidx (v0.5.9.2)

Step-by-step method details

Step 1: Installing ScanExitron and ScanNeo

Timing: 60 min

Full installation of ScanExitron and ScanNeo includes downloading the ScanExitron and ScanNeo packages from GitHub. An example of how to perform all steps of this protocol using example data is available on the project GitHub at https://github.com/ylab-hi/ScanExitron/wiki/Exitron-and-exitron-derived-neoantigen-identification-with-ScanExitron-and-ScanNeo

1. Installing ScanExitron
a. Install ScanExitron dependencies.
Install RegTools v0.4.2.
$ git clone --depth 1 --branch 0.4.2 https://github.com/griffithlab/regtools.git
Install other dependent packages via conda.
$ conda install -c bioconda samtools bedtools pyfaidx
b. Install ScanExitron by running the following code:
$ git clone https://github.com/ylab-hi/ScanExitron.git

CRITICAL: Check that all required dependencies are downloaded and installed correctly. Ordinarily, installing packages via conda automatically checks for and installs the required dependencies. However, errors can occur during installation in some computational environments (Troubleshooting 1 and Troubleshooting 2).

2. Installing ScanNeo
a. Install ScanNeo dependencies.
Install transIndel v2.0.
$ git clone https://github.com/cauyrd/transIndel
Add the directory of transIndel_build_RNA.py and transIndel.py to the $PATH environment variable.
Install the IEDB HLA class I binding prediction tools (https://downloads.iedb.org/tools/mhci/3.1/IEDB_MHC_I-3.1.tar.gz).
Install other dependent packages via conda.
$ conda install -c bioconda optitype ensembl-vep sambamba bedtools picard bwa yara razers3 pyfaidx pyvcf
$ conda install -c conda-forge coincbc
$ conda install -c anaconda hdf5
b. Install VEP annotations and plugins.
Install VEP annotations using the following command.
$ vep_install -a cf -s homo_sapiens -y GRCh38 --CONVERT
Before installing VEP annotations, make sure the directory of the executable file vep_install is in the $PATH environment variable (Troubleshooting 3).
Install two VEP plugins for ScanNeo.
$ git clone https://github.com/ylab-hi/ScanNeo.git
$ cd ScanNeo/VEP_plugins
$ cp Downstream.pm ~/.vep/Plugins
$ cp Wildtype.pm ~/.vep/Plugins
c. Configure optitype and the yara index according to the ScanNeo manual (https://github.com/ylab-hi/ScanNeo).
d. Install ScanNeo by running the following code:
$ git clone https://github.com/ylab-hi/ScanNeo.git

Make sure the directories of all the executable files are in the $PATH environment variable.

Step 2: Preparing the reference genome sequences and gene annotation files

Timing: 15 min

ScanExitron uses annotated coding sequence (CDS) regions to probe for exitrons and extracts splice sites from the reference genome sequences. The human reference genome sequences and gene annotation are therefore required.

3. Prepare the human reference genome sequences in FASTA format. Download the hg38 human reference genome FASTA file from the UCSC Genome Browser (https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz) and unzip it.
4. Prepare the reference gene annotation in GTF format.
a. Download the hg38 annotation file from the GENCODE project (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_37/gencode.v37.annotation.gtf.gz) and unzip it.
b. Extract the protein-coding CDS regions. On Unix/Linux systems, the CDS regions can be extracted using the “cat”, “awk”, and “tr” commands as follows:
$ cat gencode.v37.annotation.gtf | awk 'OFS="\t" {if ($3=="CDS") {print $1,$4-1,$5,$10,$16,$7}}' | tr -d '";' > gencode.hg38.CDS.bed

Make sure the input RNA-seq BAM files use the same coordinate system as the reference genome and the reference annotation files. Otherwise, you have to remap the RNA-seq reads to the corresponding reference genome.
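To make the coordinate handling in the awk one-liner explicit, the same CDS extraction can be sketched in Python. This is an illustrative equivalent, not a script shipped with ScanExitron: the function name `cds_gtf_line_to_bed` and the toy GTF record are hypothetical. It looks up `gene_id` and `gene_name` by key rather than by field position, and it shifts only the start coordinate because GTF intervals are 1-based and inclusive whereas BED intervals are 0-based and half-open.

```python
import re

# Matches quoted GTF attributes such as: gene_name "NEFH";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def cds_gtf_line_to_bed(line):
    """Convert one GTF line to a 6-column BED tuple.

    Returns None for comments, malformed lines, and non-CDS features.
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "CDS":
        return None
    chrom, strand = fields[0], fields[6]
    start, end = int(fields[3]), int(fields[4])
    attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
    # GTF start is 1-based; BED start is 0-based. The end stays unchanged.
    return (
        chrom,
        start - 1,
        end,
        attributes.get("gene_id", "."),
        attributes.get("gene_name", "."),
        strand,
    )


if __name__ == "__main__":
    # Toy record with made-up coordinates and identifier, for illustration only.
    toy = (
        "chr22\tHAVANA\tCDS\t101\t200\t.\t+\t0\t"
        'gene_id "ENSG_TOY.1"; transcript_id "ENST_TOY.1"; gene_name "NEFH";'
    )
    print("\t".join(str(value) for value in cds_gtf_line_to_bed(toy)))
```

Applying the function to every line of the unzipped GTF file and writing the non-None results as tab-separated text should produce the same columns as the awk command.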

Step 3: Running ScanExitron

Timing: 5 min

After installing all of the dependencies and preparing the reference genome sequences and annotation files, you can run ScanExitron. ScanExitron currently runs only on UNIX/Linux systems. Additional details for running ScanExitron and updates to the parameters can be found at the project GitHub repository (https://github.com/ylab-hi/ScanExitron). The timing given here is for the toy example dataset, which contains three exitrons. The actual running time for a real sample depends on the number of junction reads and the number of exitrons it contains.

5. Make the necessary modifications to the ScanExitron configuration file. Replace the items in config.ini with the reference genome sequences and annotation files prepared in step 2: preparing the reference genome sequences and gene annotation files. An example config.ini file can be found at https://github.com/ylab-hi/ScanExitron/blob/master/config.ini.example (Troubleshooting 4).
6. Run ScanExitron with the following command:
$ ScanExitron.py -i example.bam --ao 3 --pso 0.05 -m 50 -r hg38

CRITICAL: Make sure the input RNA-seq BAM files use the same coordinate system as the reference genome and the reference annotation files (Troubleshooting 5).

In practice, different parameter settings yield different numbers of identified exitrons. For example, setting higher thresholds for alternate allele observation (AO) and percent spliced out (PSO) (Wang et al., 2021) yields fewer exitrons. These two metrics are described in quantification and statistical analysis.

Multiple files are generated in this step, including “example.hq.bam”, “example.hq.bam.bai”, “example.hq.janno” and “example.exitron”. The identified exitrons are stored in the example.exitron file (Table 1). Figure 2 illustrates these detected EIS events using Integrative Genomics Viewer (IGV) (Robinson et al., 2011).

Table 1

The identified exitrons in the example data set

chrm:start-end	ao	strand	gene_symbol	Length	splice_site	pso	psi	dp
chr22:29489329–29489390	169	+	NEFH	60	GC-AG	0.261	0.739	648
chr22:29489371–29489432	80	+	NEFH	60	GT-AG	0.115	0.885	696
chr22:29489593–29489618	36	+	NEFH	24	GC-AG	0.0848	0.915	424
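As an illustration of how the AO and PSO cutoffs change the number of reported exitrons, the rows of Table 1 can be filtered in a few lines of Python. This is a sketch under stated assumptions: it treats the thresholds as inclusive minimums, it is not ScanExitron's own filtering code, and the column layout is copied from Table 1, so the actual example.exitron file may differ. The helper name `filter_exitrons` is hypothetical.

```python
import csv
import io

# Rows copied from Table 1 (tab-separated); the coordinate column is not parsed.
TABLE_1 = (
    "chrm:start-end\tao\tstrand\tgene_symbol\tLength\tsplice_site\tpso\tpsi\tdp\n"
    "chr22:29489329-29489390\t169\t+\tNEFH\t60\tGC-AG\t0.261\t0.739\t648\n"
    "chr22:29489371-29489432\t80\t+\tNEFH\t60\tGT-AG\t0.115\t0.885\t696\n"
    "chr22:29489593-29489618\t36\t+\tNEFH\t24\tGC-AG\t0.0848\t0.915\t424\n"
)


def filter_exitrons(tsv_text, min_ao=3, min_pso=0.05):
    """Keep rows whose AO and PSO both reach the given minimums.

    Assumption: thresholds are inclusive (>=).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if int(row["ao"]) >= min_ao and float(row["pso"]) >= min_pso
    ]


if __name__ == "__main__":
    for min_ao, min_pso in [(3, 0.05), (50, 0.1), (100, 0.2)]:
        kept = filter_exitrons(TABLE_1, min_ao=min_ao, min_pso=min_pso)
        print(f"ao>={min_ao}, pso>={min_pso}: {len(kept)} exitron(s) kept")
```

With the cutoffs used in this protocol (AO 3, PSO 0.05) all three NEFH exitrons pass, while stricter cutoffs retain fewer events.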

Figure 2

Three exitron splicing (EIS) events identified in NEFH gene loci by ScanExitron from the example RNA-seq data

Differential analysis is possible if researchers have groups of samples of interest (Troubleshooting 6).

To feed the ScanExitron results to ScanNeo, convert the ScanExitron output to VCF format with the utility script exitron2vcf.py, found in the ScanExitron utils folder, using the following command:
$ exitron2vcf.py -i example.exitron -o example.vcf
The directory containing exitron2vcf.py should be in the $PATH environment variable.

Step 4: Running ScanNeo

Timing: 15 min

After running ScanExitron on the sample dataset, we have a list of exitron splicing events in the example.vcf file. In practice, you must also run ScanExitron on the corresponding normal samples to obtain exitrons that are tumor specific. Here, we assume all the exitrons identified in the sample dataset are tumor-specific exitrons (TSEs). The next step is to run ScanNeo to generate exitron-derived neoantigens. ScanNeo currently runs only on UNIX/Linux systems. Additional details for running ScanNeo and updates to the parameters can be found at the project GitHub repository (https://github.com/ylab-hi/ScanNeo).

7. Make the necessary modifications to the ScanNeo configuration file. Replace the items in config.ini with the reference genome sequences and gene annotation files prepared in step 2: preparing the reference genome sequences and gene annotation files. An example config.ini file can be found at https://github.com/ylab-hi/ScanNeo/blob/master/config.ini.example (Troubleshooting 4).
a. The reference genome sequences field is mandatory for this protocol.
b. The gene annotation field is necessary when calling indels using ScanNeo.
c. The Yara HLA index field is necessary when performing HLA typing using ScanNeo.
8. Run ScanNeo.
a. ScanNeo first adds the corresponding reference and alternate allele sequences to each EIS event. These events are then annotated with Variant Effect Predictor (VEP) (McLaren et al., 2016). Run this annotation step using the following command:
$ ScanNeo.py anno -i example.vcf -o example.vep.vcf
b. The neoantigen prediction step takes the VEP-annotated VCF file as input. Run it using the following command:
$ ScanNeo.py hla -i example.vep.vcf --alleles HLA-A*68:02,HLA-A*23:01,HLA-B*07:02,HLA-B*53:01,HLA-C*07:02,HLA-C*04:01 -t 16 --af PSO -e 9 -p /path/to/iedb/ -o example.tsv

The putative exitron-derived neoantigens are stored in the example.tsv file (Table 2).

Table 2

The predicted exitron-derived neoantigens in the example data set

Chrom	Start	Stop	Gene name	HLA allele	Peptide length	MT epitope seq	WT epitope seq	Best MT score method	Best MT score	Corresponding WT score
chr22	29489329	29489389	NEFH	HLA-B*07:02	9	SPPEAKSPA	SPPEAKSPE	NetMHCpan	399.52	7247.45

This is a good time to compare your output files with the example files provided in the ScanExitron GitHub repository (https://github.com/ylab-hi/ScanExitron/tree/master/example_data) to ensure that you have run the protocol correctly.

Pause point: Once you know the parameters you wish to use and have successfully run ScanExitron and ScanNeo, this is a good place to pause and evaluate the results before proceeding with the optional steps.

Optional step 5: Running this protocol for TCGA PRAD cohort

Timing: 15 h

In practice, neoantigen prediction must use exitrons that are tumor specific. We used the TCGA PRAD cohort, which includes 496 tumor and 52 tumor-adjacent normal samples, to demonstrate how to use this protocol.

9. For every sample in the TCGA PRAD cohort, we identified EIS events in the PRAD tumor and normal samples following the instructions in step 3: running ScanExitron. We then generated a list of tumor-specific exitrons (TSEs) by excluding the EIS events in tumor samples that were also found in more than three normal samples. We performed this filtering using in-house Python scripts, which are available at https://github.com/ylab-hi/ScanExitron/wiki/Exitron-and-exitron-derived-neoantigen-identification-with-ScanExitron-and-ScanNeo. A summary of the identified exitrons and TSEs in PRAD is shown in Figure 3.
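The tumor-specific filtering rule described above can be sketched as follows. This is not the authors' in-house script (which is available at the wiki link above); it is a minimal illustration in which the function name `tumor_specific_exitrons`, the sample identifiers, and the choice of a (chromosome, start, end, strand) tuple as the exitron key are all assumptions.

```python
from collections import Counter


def tumor_specific_exitrons(tumor_exitrons, normal_samples, max_normal_hits=3):
    """Return the tumor exitrons seen in at most `max_normal_hits` normal samples.

    tumor_exitrons: iterable of exitron keys found in one tumor sample.
    normal_samples: dict mapping a normal sample ID to its exitron keys.
    An exitron found in MORE than `max_normal_hits` normals is excluded.
    """
    normal_counts = Counter()
    for exitrons in normal_samples.values():
        # set() ensures each normal sample is counted once per exitron
        normal_counts.update(set(exitrons))
    return [e for e in tumor_exitrons if normal_counts[e] <= max_normal_hits]


if __name__ == "__main__":
    # Hypothetical exitron keys: (chromosome, start, end, strand)
    a = ("chr1", 100, 160, "+")
    b = ("chr2", 500, 524, "-")
    c = ("chr3", 900, 990, "+")
    normals = {
        "normal_1": [a, b],
        "normal_2": [a, b],
        "normal_3": [a, b],
        "normal_4": [a],
    }
    # a is in four normals (excluded); b is in three (kept); c is in none (kept)
    print(tumor_specific_exitrons([a, b, c], normals))
```

Keying each exitron on its coordinates and strand means the same event is matched across samples regardless of its AO or PSO values.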

Figure 3

Tumor-specific exitron (TSE) splicing events detection in PRAD cohort

(A) The proportion of frameshift and inframe TSEs in PRAD tumors.

(B) The proportion of genes with and without exitrons in PRAD tumors.

(C) Exitron size distribution of TSEs identified in PRAD tumors.

(D) PSO distribution of TSEs identified in PRAD tumors.

10. Running step 8 with the same parameters on the TSEs of every sample, in VCF format, we identified exitron-derived neoantigens for the PRAD cohort (Figure 4).

Figure 4

The loads of TSEs, frameshift TSEs, inframe TSEs, neoantigen-yielding TSEs, neoantigen-yielding frameshift TSEs, neoantigen-yielding inframe TSEs, and putative TSE neoantigens in PRAD tumors

The timing does not include downloading the PRAD BAM files. In step 9, we submitted 16 jobs to the Slurm queue system; every job required only one CPU core. In step 10, we used 20 jobs, each requiring 16 CPU cores. Because ScanNeo implements a parallel computing architecture, we highly recommend that users allocate more CPU cores to it.

Expected outcomes

At the end of the process for the example dataset, you will have two main text files: (1) the identified exitron splicing events (data shown in Table 1 and Figure 2) and (2) the predicted exitron-derived neoantigens (data shown in Table 2). At the end of the process for the PRAD RNA-seq dataset, you will have TSE events for 496 PRAD patients and the corresponding predicted neoantigens (data plotted in Figures 3 and 4).

Quantification and statistical analysis

For every exitron splicing event identified, we used two measurements to quantify the event: AO and PSO (Wang et al., 2021). AO is the number of splice junction reads supporting exitron splicing. PSO measures the percentage of transcripts in which a given exitron is spliced out. Generally speaking, higher AO and PSO values indicate higher-confidence exitron splicing events. Besides AO and PSO, the ScanExitron output also reports percent spliced-in (PSI) (Schafer et al., 2015), the counterpart of PSO, and the average depth of the identified exitron splicing event. Additional details for ScanExitron results can be found at the project GitHub repository (https://github.com/ylab-hi/ScanExitron).
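The relationship between these metrics can be illustrated numerically with the values in Table 1. The sketch below assumes that PSO is approximated by AO divided by the average depth (the dp column) and that PSI is 1 − PSO. This ratio is an assumption that agrees with Table 1 to about three decimal places; it is not taken from the ScanExitron documentation, and the function name `pso_psi` is hypothetical.

```python
def pso_psi(ao, dp):
    """Approximate PSO as ao / dp and PSI as its complement.

    Assumption: dp is the average read depth over the exitron, as
    reported in the dp column of Table 1.
    """
    if dp <= 0:
        raise ValueError("depth must be positive")
    pso = ao / dp
    return pso, 1.0 - pso


if __name__ == "__main__":
    # (ao, dp, reported pso, reported psi) copied from Table 1
    table_1 = [
        (169, 648, 0.261, 0.739),
        (80, 696, 0.115, 0.885),
        (36, 424, 0.0848, 0.915),
    ]
    for ao, dp, reported_pso, reported_psi in table_1:
        pso, psi = pso_psi(ao, dp)
        print(f"ao={ao} dp={dp} pso={pso:.4f} (reported {reported_pso}) "
              f"psi={psi:.4f} (reported {reported_psi})")
```

Small differences in the last digit are expected if the reported depth is itself a rounded average.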

Limitations

The accuracy of exitron identification with ScanExitron depends on the accuracy of the splice junctions in the RNA-seq BAM file and on the completeness of the CDS annotations. First, because of the complexity of alternative splicing within a gene and the short read length, splice-aware aligners can produce large numbers of false-positive junctions (Engstrom et al., 2013). There is no optimal solution so far, but the problem can be mitigated in two ways. One is to make aligners prefer known splice sites by using an index built with known transcript annotations, as suggested in the optional reads alignment section. The other is to increase the read length when possible. Second, even for well-studied organisms such as human, the reference annotations are incomplete, so genuine exitrons with supporting junctions may be missed because no annotated CDS region overlaps them. In practice, we therefore strongly suggest using the latest gene annotations when possible. Currently, the neoantigen prediction workhorse of this protocol, ScanNeo, supports only two well-established and popular MHC class I prediction algorithms, namely NetMHC (Lundegaard et al., 2008) and NetMHCpan (Nielsen and Andreatta, 2016). Other prediction algorithms could also be valuable for neoantigen prediction, so we plan to update ScanNeo to incorporate more MHC class I prediction approaches.

Troubleshooting

Problem 1

Installing the software dependencies (Steps 1 and 2).

Potential solution

When possible, use Anaconda (https://www.anaconda.com/) to install Python 3 and its dependent packages. To avoid potential conflicts with already installed Python packages, you can create a new conda environment and install all the necessary packages in it using the “conda create” command.

Problem 2

Specific software version requirements (Steps 1 and 2).

Potential solution

Make sure that the versions of Python and the other dependencies are appropriate. You can use Anaconda to specify the version of an installed package using the following command:
$ conda install <package>=<version>
Or use a GitHub tag to specify the package version:
$ git clone --depth 1 --branch <tag> https://github.com/<user>/<repository>.git

Problem 3

You are receiving a “command not found” error message (Step 2) when you try to install VEP annotations using vep_install or run other conda-installed executables such as bedtools and sambamba. This indicates that the executable files are not in the $PATH environment variable.

Potential solution

Add the Anaconda bin directory to the $PATH environment variable in the file ~/.bashrc:
export PATH="/path/to/Anaconda3/Python3/bin:$PATH"
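Before running the protocol, it can help to check which executables are visible on $PATH. The sketch below uses Python's standard `shutil.which`; the helper name `find_missing_tools` is hypothetical, and the tool list simply repeats executables mentioned in this protocol, so extend it as needed.

```python
import shutil


def find_missing_tools(tools):
    """Return the names from `tools` that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Executables named in this protocol; adjust to your installation.
    required = ["vep_install", "bedtools", "sambamba", "samtools"]
    missing = find_missing_tools(required)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All listed executables were found on $PATH.")
```

Any name reported as missing would trigger the “command not found” error described in this problem.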

Problem 4

You are receiving a “configparser.NoSectionError” error message (Step 5 and 7).

Potential solution

Place the config.ini file in the directory where ScanExitron or ScanNeo is installed.
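This error typically arises because Python's `configparser` silently ignores a file it cannot open, so the failure only surfaces later when a section is requested. The sketch below checks a config.ini file up front. The helper name `missing_sections` and the section names in the usage example are hypothetical; consult the config.ini.example files in the repositories for the real section names.

```python
import configparser
import os


def missing_sections(config_path, required_sections):
    """Return the required sections that are absent from the config file.

    Raises FileNotFoundError if the file itself is missing, which is the
    situation that otherwise surfaces later as configparser.NoSectionError.
    """
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"config file not found: {config_path}")
    parser = configparser.ConfigParser()
    parser.read(config_path)
    return [name for name in required_sections if not parser.has_section(name)]


if __name__ == "__main__":
    # "fasta" and "annotation" are placeholder section names for illustration.
    try:
        print(missing_sections("config.ini", ["fasta", "annotation"]))
    except FileNotFoundError as error:
        print(error)
```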

Problem 5

You are receiving an “Errors in BED line” error message (Step 6). This indicates that the input RNA-seq BAM file uses GRCh37/GRCh38 contig names, such as ‘1’, ‘2’, instead of hg19/hg38 contig names, such as ‘chr1’, ‘chr2’.

Potential solution

If you have the raw RNA-seq reads in FASTQ format, realign the reads using the hg38/hg19 reference genome sequences. Otherwise, extract the reads from the RNA-seq BAM file using Picard SamToFastq (https://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq), then realign them.
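To confirm which naming convention a BAM file uses before running ScanExitron, its header (for example, the text printed by `samtools view -H`) can be inspected. The sketch below is illustrative only: the function names `contigs_from_sam_header` and `contig_style` and the toy header are hypothetical. It reads the SN fields of the @SQ header lines and reports whether the contig names carry the “chr” prefix.

```python
def contigs_from_sam_header(header_text):
    """Extract contig names (SN fields) from the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def contig_style(names):
    """Classify contig names as 'chr-prefixed', 'unprefixed', 'mixed', or 'unknown'."""
    names = list(names)
    if not names:
        return "unknown"
    prefixed = sum(name.startswith("chr") for name in names)
    if prefixed == len(names):
        return "chr-prefixed"
    if prefixed == 0:
        return "unprefixed"
    return "mixed"


if __name__ == "__main__":
    # Toy header with made-up contig lengths.
    toy_header = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:1000\n@SQ\tSN:2\tLN:1000\n"
    print(contig_style(contigs_from_sam_header(toy_header)))
```

A result other than “chr-prefixed” suggests that the BAM file will not match an hg19/hg38 reference and annotation.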

Problem 6

How to perform a differential analysis of exitrons between two groups of samples (Step 6 and Table 1).

Potential solution

First, following steps 1–6, detect a list of exitrons for every sample. Second, organize the exitron results of all samples into a table of PSO values, with one row per exitron splicing event and one column per sample. Because you have two groups of samples, you can use a linear model or a statistical test (e.g., t-test) to calculate the statistical significance (p-value) for each exitron. If there are multiple exitrons in the table, multiple testing correction is needed to adjust the p-values.
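Two pieces of this workflow can be sketched without external packages: assembling the PSO table and adjusting the p-values. The sketch makes several assumptions that are not part of the protocol: the function names are hypothetical, an exitron not detected in a sample is given a PSO of 0.0, and Benjamini-Hochberg is used as the multiple testing correction. The per-exitron p-values themselves would come from a test of your choice (for example, `scipy.stats.ttest_ind` applied to each row).

```python
def build_pso_matrix(per_sample_pso, fill=0.0):
    """Arrange PSO values as rows (exitrons) by columns (samples).

    per_sample_pso: dict mapping sample ID -> {exitron key: PSO}.
    Assumption: exitrons not detected in a sample receive `fill`.
    """
    samples = sorted(per_sample_pso)
    exitrons = sorted({e for values in per_sample_pso.values() for e in values})
    rows = [[per_sample_pso[s].get(e, fill) for s in samples] for e in exitrons]
    return exitrons, samples, rows


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical PSO values for two samples and two exitrons.
    data = {
        "tumor_1": {"chr22:100-160": 0.30, "chr1:500-524": 0.10},
        "normal_1": {"chr22:100-160": 0.05},
    }
    print(build_pso_matrix(data))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```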

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Rendong Yang (yang4414@umn.edu).

Materials availability

This study did not generate new unique reagents.

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Deposited data

Example data (example.bam file)	This paper	https://github.com/ylab-hi/ScanExitron/blob/master/example_data/
RNA-Seq data from TCGA PRAD cohort	NCI Genomic Data Commons	https://portal.gdc.cancer.gov
HLA types for TCGA cohort	Thorsson et al., 2018	https://gdc.cancer.gov/about-data/publications/panimmune
GENCODE human gene annotations	Frankish et al., 2019	https://www.gencodegenes.org/human/
Human reference genome NCBI build 38, GRCh38	Genome Reference Consortium	http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/

Software and algorithms

HISAT2	Kim et al., 2019	RRID:SCR_015530; http://daehwankimlab.github.io/hisat2/
ScanExitron	Wang et al., 2021	https://github.com/ylab-hi/ScanExitron
Pyfaidx v0.5.9.2	Shirley et al., 2015	https://github.com/mdshw5/pyfaidx
SamTools v1.12	Li et al., 2009	RRID:SCR_00210; http://www.htslib.org/
BEDTools v2.26.0	Quinlan, 2014	RRID:SCR_006646; https://github.com/arq5x/bedtools2
RegTools v0.4.2	Feng et al., 2018	https://github.com/griffithlab/regtools
ScanNeo	Wang et al., 2019	RRID:SCR_019253; https://github.com/ylab-hi/ScanNeo
transIndel v2.0	Yang et al., 2018	https://github.com/cauyrd/transIndel
OptiType v1.2	Szolek et al., 2014	https://github.com/FRED-2/OptiType
Yara aligner v1.0.2	Siragusa et al., 2013	https://github.com/seqan/seqan/tree/master/apps/yara
Variant Effect Predictor v102.0	McLaren et al., 2016	RRID:SCR_007931; https://useast.ensembl.org/info/docs/tools/vep/script/index.html
Sambamba v0.8.0	Tarasov et al., 2015	https://lomereiter.github.io/sambamba/
IEDB MHC class I peptide binding prediction tools v3.1	Vita et al., 2019	https://downloads.iedb.org/tools/mhci/3.1/
BWA v0.7.17	Li and Durbin, 2009	RRID:SCR_010910; http://bio-bwa.sourceforge.net/
PyVCF v0.6.8	N/A	https://github.com/jamescasbon/PyVCF/
Picard v2.24.0	Broad Institute	https://broadinstitute.github.io/picard/
HDF5 v1.10.4	The HDF Group	http://www.hdfgroup.org/HDF5/
Tabix v1.12	Li et al., 2009	RRID:SCR_00210; http://www.htslib.org/

Other

PC with 4 CPU cores and 16GB RAM	AMD	N/A
HPC system with 16 CPU cores and 64GB RAM	AMD	N/A

References

1. Alternative Splicing Signatures in RNA-seq Data: Percent Spliced in (PSI).

Authors: Sebastian Schafer; Kui Miao; Craig C Benson; Matthias Heinig; Stuart A Cook; Norbert Hubner
Journal: Curr Protoc Hum Genet Date: 2015-10-06

2. ScanNeo: identifying indel-derived neoantigens using RNA-Seq data.

Authors: Ting-You Wang; Li Wang; Sk Kayum Alam; Luke H Hoeppner; Rendong Yang
Journal: Bioinformatics Date: 2019-10-15

3. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.

Authors: Daehwan Kim; Joseph M Paggi; Chanhee Park; Christopher Bennett; Steven L Salzberg
Journal: Nat Biotechnol Date: 2019-08-02

4. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08

5. BEDTools: The Swiss-Army Tool for Genome Feature Analysis.

Authors: Aaron R Quinlan
Journal: Curr Protoc Bioinformatics Date: 2014-09-08

6. The Immune Landscape of Cancer.

Authors: Vésteinn Thorsson; David L Gibbs; Scott D Brown; Denise Wolf; Dante S Bortone; Tai-Hsien Ou Yang; Eduard Porta-Pardo; Galen F Gao; Christopher L Plaisier; James A Eddy; Elad Ziv; Aedin C Culhane; Evan O Paull; I K Ashok Sivakumar; Andrew J Gentles; Raunaq Malhotra; Farshad Farshidfar; Antonio Colaprico; Joel S Parker; Lisle E Mose; Nam Sy Vo; Jianfang Liu; Yuexin Liu; Janet Rader; Varsha Dhankani; Sheila M Reynolds; Reanne Bowlby; Andrea Califano; Andrew D Cherniack; Dimitris Anastassiou; Davide Bedognetti; Younes Mokrab; Aaron M Newman; Arvind Rao; Ken Chen; Alexander Krasnitz; Hai Hu; Tathiane M Malta; Houtan Noushmehr; Chandra Sekhar Pedamallu; Susan Bullman; Akinyemi I Ojesina; Andrew Lamb; Wanding Zhou; Hui Shen; Toni K Choueiri; John N Weinstein; Justin Guinney; Joel Saltz; Robert A Holt; Charles S Rabkin; Alexander J Lazar; Jonathan S Serody; Elizabeth G Demicco; Mary L Disis; Benjamin G Vincent; Ilya Shmulevich
Journal: Immunity Date: 2018-04-05

7. Indel detection from DNA and RNA sequencing data with transIndel.

Authors: Rendong Yang; Jamie L Van Etten; Scott M Dehm
Journal: BMC Genomics Date: 2018-04-19

8. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18

9. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets.

Authors: Morten Nielsen; Massimo Andreatta
Journal: Genome Med Date: 2016-03-30

10. The Immune Epitope Database (IEDB): 2018 update.

Authors: Randi Vita; Swapnil Mahajan; James A Overton; Sandeep Kumar Dhanda; Sheridan Martini; Jason R Cantrell; Daniel K Wheeler; Alessandro Sette; Bjoern Peters
Journal: Nucleic Acids Res Date: 2019-01-08