Shengjie Chai, Christof C Smith, Tavleen K Kochar, Sally A Hunsucker, Wolfgang Beck, Kelly S Olsen, Steven Vensko, Gary L Glish, Paul M Armistead, Jan F Prins, Benjamin G Vincent.
Abstract
Motivation: Splice variant neoantigens are a potential source of tumor-specific antigens (TSAs) that are shared between patients in a variety of cancers, including acute myeloid leukemia. Current tools for genomic prediction of splice variant neoantigens demonstrate promise. However, many tools have not been well validated with simulated and/or wet lab approaches, and no published studies have presented a targeted immunopeptidome mass spectrometry approach designed specifically for identification of predicted splice variant neoantigens.
Year: 2022 PMID: 35669345 PMCID: PMC9154024 DOI: 10.1093/bioadv/vbac032
Source DB: PubMed Journal: Bioinform Adv ISSN: 2635-0041
Fig. 1. Overview of the NeoSplice method. BAM files from tumor and matched-normal samples, generated by a splice-aware aligner, are input into the NeoSplice algorithm. Step 1: The multi-string BWT tool, based on a variant of the Burrows-Wheeler transform (BWT), builds the multi-string BWT data structure for the tumor and normal RNA-seq BAM files. Step 2: Splice graphs are constructed from the tumor RNA-seq BAM files. A depth-first search operating in lockstep on the tumor and normal BWT data structures identifies all tumor-specific k-mers. The tumor-specific k-mers are mapped onto the splice graphs using CIGAR strings. Step 3: Graph traversal infers partial transcript isoforms that contain tumor-specific splice junctions within an open reading frame, using paired-end read information and annotated transcript information. Sequences spanning tumor-specific splice junctions are translated into amino acid space and filtered against a reference peptidome. The remaining peptide sequences are run through MHC binding prediction software, with predicted binders representing putative splice variant neoantigens.
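The data flow in Fig. 1 can be illustrated with a deliberately simplified sketch. It is not the NeoSplice implementation: a plain set difference stands in for the lockstep depth-first search over the two BWT structures, no splice graph is built, and MHC binding prediction is omitted. All function names (`tumor_specific_kmers`, `translate`, `novel_peptides`) and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tumor_specific_kmers(tumor_reads, normal_reads, k):
    """k-mers seen in tumor reads but in no normal read.

    Assumption: a set difference replaces the BWT-based search in the paper.
    """
    tumor = set().union(*(kmers(r, k) for r in tumor_reads))
    normal = set().union(*(kmers(r, k) for r in normal_reads))
    return tumor - normal


def translate(seq):
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def novel_peptides(protein, reference_peptidome, length=9):
    """Sliding-window peptides of a given length absent from the reference set."""
    windows = (protein[i:i + length] for i in range(len(protein) - length + 1))
    return [p for p in windows if p not in reference_peptidome]


# Toy inputs (illustrative only, not real data).
specific = tumor_specific_kmers(["ATGGCCAAGT"], ["ATGGCCTTGT"], k=4)
protein = translate("ATGGCCAAGTAA")
candidates = novel_peptides("MAKLV", {"MAK"}, length=3)
```

In a real pipeline the surviving peptides would then be passed to an MHC binding predictor; that step is left out here because it depends on external software.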
Fig. 2. Performance and resource usage of NeoSplice on simulated data. (A) Sensitivity and precision for simulated splice variant transcripts, including those derived from exon skipping, exon loss, or intron gain. Values represent the mean performance across 22 chromosomes (up to 5 splice variant junctions per chromosome) for each of 20 total simulated samples. (B) Average runtime (left) and max resident set size (RSS; right), with error bars representing the maximum and minimum values across three simulated samples. Runtime is defined as the elapsed time and RSS is the amount of memory requested by NeoSplice from the operating system as reported by the ‘sacct’ command, as measured on an Intel Xeon ES-E5620 2.4 GHz CPU or ES-E5520 2.27 GHz CPU.
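For reference, sensitivity and precision as reported in panel (A) follow the usual definitions: sensitivity is the fraction of simulated junctions that were recovered, and precision is the fraction of predicted junctions that were truly simulated. A minimal sketch, assuming junctions are represented as hypothetical `(chromosome, donor, acceptor)` tuples with made-up coordinates:

```python
def sensitivity_precision(predicted, truth):
    """Return (sensitivity, precision) for two collections of junctions.

    sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    NaN is returned for a metric whose denominator is zero.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    precision = true_positives / len(predicted) if predicted else float("nan")
    return sensitivity, precision


# Toy example (coordinates are illustrative, not from the paper).
truth = {("chr1", 100, 200), ("chr1", 300, 400),
         ("chr1", 500, 600), ("chr1", 700, 800)}
predicted = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)}
sens, prec = sensitivity_precision(predicted, truth)
```

Averaging these per-chromosome values over chromosomes and samples would give summary figures of the kind described in the caption.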
Fig. 3. Comparison of single-nucleotide variant (SNV) and splice variant (SV) antigen calls in the TCGA LAML dataset. (A) Number of HLA-I SNV and SV antigen calls per TCGA LAML sample, defined as NetMHCpan 4.0 predicted binding affinity <500 nM. (B) Histogram showing the proportion of public SNV or SV antigens shared among TCGA LAML samples.
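The two quantities in Fig. 3 can be sketched as follows: an antigen call is a peptide whose predicted affinity is below 500 nM, and a "public" antigen is one called in more than one sample. The sample names, peptides, and affinity values below are invented for illustration, and the predictor itself is not invoked; the input is assumed to be a per-sample mapping of peptide to predicted affinity in nM.

```python
from collections import Counter

BINDING_THRESHOLD_NM = 500.0  # cutoff stated in the caption


def binders(predictions, threshold_nm=BINDING_THRESHOLD_NM):
    """Peptides whose predicted affinity (nM) is strictly below the threshold."""
    return {pep for pep, affinity in predictions.items() if affinity < threshold_nm}


def sharing_counts(per_sample_binders):
    """Map each peptide to the number of samples in which it is a binder."""
    counts = Counter()
    for peptides in per_sample_binders.values():
        counts.update(set(peptides))
    return counts


# Hypothetical predictions for three samples (values are made up).
predictions = {
    "sample_A": {"SLLQHLIGL": 35.0, "KVAELVHFL": 820.0},
    "sample_B": {"SLLQHLIGL": 41.0, "GLYDGMEHL": 120.0},
    "sample_C": {"GLYDGMEHL": 500.0},
}
called = {sample: binders(p) for sample, p in predictions.items()}
counts = sharing_counts(called)
public = {pep for pep, n in counts.items() if n > 1}
```

Here `len(called[sample])` corresponds to the per-sample count in panel (A), and the distribution of values in `counts` is what a sharing histogram like panel (B) would summarize.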
Fig. 4. Mass spectrometric analysis of the K562.A2 cell line HLA-I immunopeptidome versus NeoSplice predicted peptides. MS spectra from K562.A2 anti-HLA-A*02-based immunoprecipitation and column purification/acid elution of MHC-bound peptides (left) alongside the matching peptide standard from NeoSplice predicted antigens from the K562.A2 cell line (right). Fit (F) score for each peptide is shown (right).
Comparison of splice variant prediction tools from the literature with NeoSplice
| Splice variant antigen caller | Input | Predicts neoantigens | Splicing event identified | Required packages | In silico performance | Wet lab validation | Wet lab performance |
|---|---|---|---|---|---|---|---|
| ASNEO | RNA-seq | Yes | Filters reads against GTEx and hg19 reference, translating novel isoforms into proteins for antigen prediction. | Python: sj2psi; R: survival, survminer, MCPcounter | Not reported | Mass spectrometry (external dataset) | 2/407 peptides confirmed from 14-patient cohort |
| JuncBase | RNA-seq | No | (1) Identifies annotated and novel splice junctions, (2) quantifies each junction and (3) calculates differential expression between groups. | Python 2.6+, Biopython 1.5+, Pysam, R v2.14+, Rpy2, MySQL/sqlite | Sens. ∼50–80%; Prec. ∼10–95% (Compared in Kahles | RT-PCR | 16/16 splicing events confirmed |
| MiSplice | RNA-seq + WGS | No | Jointly analyzes WGS and RNA-seq data, scanning the transcriptome for statistically significant non-canonical sequence junctions supported by expression evidence. | SamTools, MaxEntScan | Sens. 74–97%; Spec. ∼77% | Splicing reporter minigene functional assay | 10/11 splicing alterations |
| MutPred Splice | DNAseq | No | Uses human disease alleles for training a machine learning model to predict exonic nucleotide substitutions that disrupt pre-mRNA splicing. | None (web interface) | FPR = 7.0%; Sens. 64.7%; Spec. 93.0%; Acc. 78.8%; AUC 83.5% | RT-PCR | Amplicon changes from ATM mutation-containing vs WT cell line confirmed by RT-PCR |
| NeoSplice | RNA-seq | Yes | (1) Identify differentially expressed k-mers, (2) map tumor-specific k-mers to splice graph and (3) ORF inference, translation, and MHC binding prediction. | Python 2.7, MSBWT, MSBWT-IS, NetMHCpan 4.0, NetMHCIIpan 3.2, networkx 1.11, pyahocorasick 1.4.0, bcbio-gff 0.6.4, pyfaidx 0.5.3.1, pysam 0.14.1, biopython 1.70, scipy 1.2.0 | Sens. >80%; Recall >80% | Internal mass spectrometry validation against synthetic peptide reference | 4/37 peptides confirmed, corresponding to 3/17 novel splice junctions |
| RI neoantigen pipeline | RNA-seq | Yes | (1) Pseudoaligns RNA-seq reads to hg19 with exon and intron transcripts, (2) quantification, (3) KMA algorithm to identify expressed introns and (4) predict MHC binding. | Kallisto, KMA suite of Python and R packages, POLYSOLVER, NetMHCpan v3.1 | Not reported | Mass spectrometry (external dataset) | Confirmed 1–2 per each of six cell lines tested (mean total splice variant neoantigen load of 1515) |
| rMATS | RNA-seq | No | Detection of differentially expressed splice variants between two sets of RNA-seq data. | Python 2.7/3.6, BLAS, LAPACK, GSL 2.5, GCC (5.4.0), Fortran 77, CMake (3.15.4) | Sens. ∼30%; Prec. >95% (Compared in Kahles | RT-PCR | 32/34 exon skipping candidates confirmed |
| SplAdder | RNA-seq | No | (1) Integrating annotation and RNA-seq data, (2) generating an augmented splicing graph, (3) extraction of splicing events, (4) quantifying the events, and optionally (5) differential analysis between samples. | LIMIX, GATK, STAR, Samtools | Sens. ∼30–80%; Prec. ∼20–90% | None | NA |
| SpliceGrapher | RNA-seq +/- EST | No | (1) Alignment of RNA-seq to the reference genome, (2) spliced alignment of reads that did not align in the first step, (3) initial splice graph construction, (4) assembly of exons from the ungapped short-read alignments and (5) insertion of the new exons into the splice graph using spliced alignments. | PyML 0.7.9+, matplotlib 1.1.0+, pysam 0.5+ | Sens. ∼30–60%; Prec. ∼20–90% (Compared in Kahles | None | NA |