Literature DB >> 28484631

TSNAD: an integrated software for cancer somatic mutation and tumour-specific neoantigen detection.

Zhan Zhou1, Xingzheng Lyu2, Jingcheng Wu1, Xiaoyue Yang1, Shanshan Wu1, Jie Zhou1, Xun Gu3, Zhixi Su4, Shuqing Chen1.   

Abstract

Tumour antigens have attracted much attention because of their importance to cancer diagnosis, prognosis and targeted therapy. With the development of cancer genomics, the identification of tumour-specific neoantigens became possible, which is a crucial step for cancer immunotherapy. In this study, we developed software called the tumour-specific neoantigen detector for detecting cancer somatic mutations following the best practices of the genome analysis toolkit and predicting potential tumour-specific neoantigens, which could be either extracellular mutations of membrane proteins or mutated peptides presented by class I major histocompatibility complex molecules. This pipeline was beneficial to the biologist with little programmatic background. We also applied the software to the somatic mutations from the International Cancer Genome Consortium database to predict numerous potential tumour-specific neoantigens. This software is freely available from https://github.com/jiujiezz/tsnad.

Entities:  

Keywords:  cancer somatic mutation; major histocompatibility complex; membrane protein; neoantigen; tumour antigen

Year:  2017        PMID: 28484631      PMCID: PMC5414268          DOI: 10.1098/rsos.170050

Source DB:  PubMed          Journal:  R Soc Open Sci        ISSN: 2054-5703            Impact factor:   2.963


Introduction

Tumour antigens have attracted much attention for their importance in cancer diagnosis, prognosis and targeted therapy, as they are crucial tumour biomarkers for identifying tumour cells and are potential targets for cancer therapy [1-3]. Tumour antigens can be broadly classified into two categories based on their specificity: tumour-specific antigens, which are only present in tumour cells; and tumour-associated antigens, which are overexpressed or aberrantly expressed in tumour cells and are also expressed in some normal cells [1]. In addition to abnormal expression patterns, tumour cells also contain a range of cancer somatic mutations and mutations in protein-coding regions might produce tumour-specific mutant proteins [4,5]. Tumour antigens derived from these tumour-specific mutant proteins are unparalleled tumour biomarkers, as they are only produced by tumour cells and are potential tumour-specific mutant antigens or neoantigens [3]. Tumour antigens recognized by T cells or antibodies should present on the surface of tumour cells [6,7]. A major part of tumour antigens used as drug targets are membrane proteins, such as HER2 and CD19, which are targets of the antibody trastuzumab [8] and chimaeric antigen receptor T-cell immunotherapy (CAR-T) for B-cell cancer [9,10], respectively. Additionally, tumour antigens presented by class I major histocompatibility complex (MHC) molecules for recognition by T cells (i.e. tumour-specific neoantigens) could also be used as drug targets [2,11,12]. On the other hand, in the immune checkpoint blockade therapy, the neoantigen load is associated with the therapy efficacy (i.e. PD-1, CTLA-4 blockade), which indicates that the neoantigen load is a great biomarker in cancer immunotherapy [13]. Because of their potential application to be targets and biomarkers in cancer immunotherapy [1,12,14,15], tumour-specific neoantigens have attracted much attention in biomedical research. Several prediction tools have been developed to predict tumour-specific neoantigens from cancer somatic mutations, such as pVAC-seq [16] and INTEGRATE-neo [17], which can predict neoantigens produced by non-synonymous somatic mutations and gene fusions, respectively. However, these tools only predict neoantigens presented by class I MHC molecules that can be recognized by T cells, they do not consider the mutations in the extracellular regions of membrane proteins that can be recognized by mutation-specific antibodies [18,19]. In this study, we developed integrated software with a graphical user interface (GUI), called the tumour-specific neoantigen detector (TSNAD), which can identify cancer somatic mutations following the best practices of the genome analysis toolkit (GATK v. 3.5) [20] from the genome/exome sequencing data of tumour-normal pairs. We also provided a filter for calling tumour-specific mutant proteins. Then, we conducted two strategies to predict neoantigens. First, we extracted the extracellular mutations of membrane proteins according to the protein topology. Second, we invoked NetMHCpan (v. 2.8) [21] to predict the binding information of mutant peptides to class I MHC molecules. Finally, we applied TSNAD on the cancer somatic mutations collected in the International Cancer Genome Consortium (ICGC) database to predict potential neoantigens.

Material and methods

Tools

Standard sequencing data processing consists of preprocessing, alignment, variants calling, annotation and further analysis. Given that the existing software or tools are designed for specific functions, it was necessary to develop an automated and user-friendly framework that calls a series of software. This section summarizes the required software and its main features.

Data filtering software

Trimmomatic (v. 0.35) [22]. Original raw sequences have random lengths and contain adaptors that will be harmful to the subsequent data processing. This software can trim and crop raw reads and remove artefacts.

Genome mapping software

Burrows-Wheeler Aligner (BWA, v. 0.7.12) [23,24]. This alignment toolkit is used for mapping short sequences to a reference genome. This software is based on the Burrows-Wheeler transformation and is highly efficient at finding locations of low-divergent sequences on a large genome.

Alignment manipulating tool

Samtools (v. 1.3) [25]. Its view and sort functions transform sequencing data format from SAM (sequence alignment/map) to BAM (binary alignment/map), which will save an enormous amount of storage space. Moreover, it can manage duplicate reads and index alignments.

Data processing tool

Picard tools (v. 1.140) [26]. This program consists of a set of Java command lines to handle with different sequencing data format (such as SAM, BAM and VCF). Given redundancy data may influence further processing, Picard MarkDuplicates tool can thus be applied to remove repeat sequences.

Variant calling software

Genome Analysis Toolkit (GATK v. 3.5) [20], Mutect2 [27]. The main function of GATK is variant discovery in high-throughput sequencing data. Mutect2 is a package in GATK to identify somatic SNVs and INDELs.

Mutation annotation software

Annovar (14 December 2015) [28,29]. We use it to functional annotate somatic mutations, including position, change of nucleotide, change of amino acid for protein-coding region, and other functions. We can then extract tumour-specific mutant proteins.

Human leucocyte antigen typing software

SOAP-HLA (v. 2.2) [30]. This software detects the human leucocyte antigen (HLA—the MHC in humans) types for each sample. The program takes sorted aligned sequencing data (BAM format) as the input and outputs HLA types. The HLA types are critical for the MHC-binding predictions.

Protein topology indicating software

TMHMM (v. 2.0) [31]. This tool is used to predict the topology of membrane proteins based on a hidden Markov model (HMM). The prediction of transmembrane helices and membrane proteins is highly accurate [32].

Major histocompatibility complex-binding predicting software

NetMHCpan (v. 2.8) [21]. This software can forecast peptides that can bind to MHC class I molecules using artificial neural networks.

Datasets

The somatic mutations were collected from the whole-genome/exome sequencing data of 9155 tumour-normal pairs in the ICGC database (Release 20, http://icgc.org). This dataset has compiled over 1.5 million sample somatic mutations in coding regions, among which 828 129 missense variants have caused amino acid changes with a frequency range from 1 to 476 out of 9155 tumour samples. The HLA types were extracted from the 1000 Genome Project. We choose 16 HLA alleles with frequencies of more than 5% in the population collected in the 1000 Genome Project [33], which includes five HLA-A (HLA-A*01:01, HLA-A*02:01, HLA-A*03:01, HLA-A*11:01 and HLA-A*24:02), four HLA-B (HLA-B*07:02, HLA-B*35:01, HLA-B*40:01 and HLA-B*51:01) and seven HLA-C (HLA-C*01:02, HLA-C*03:03, HLA-C*03:04, HLA-C*04:01, HLA-C*06:02, HLA-C*07:01 and HLA-C*07:02) alleles.

Identification of extracellular region of membrane proteins

The list of human membrane proteins was extracted from the human protein atlas [34]. The amino acid sequences of these membrane proteins were downloaded from Ensembl (GRCh37 v. 75) [35]. TMHMM (v. 2.0) was used to identify the transmembrane topology and extracellular region of each membrane protein [31].

Prediction of class I major histocompatibility complex binding

After we obtained the list of the tumour-specific mutant proteins, we extracted the peptide sequences around the mutation sites. As MHC molecules always bind to peptides 8–11 amino acids in length, we extracted peptides 21 amino acids in length, with 10 amino acids upstream and 10 amino acids downstream of mutation sites for NetMHCpan prediction (figure 1). Wild-type peptides with the same length as the mutant peptides were extracted as references. These wild-type and mutant peptides were measured for their binding affinities (50% inhibitory concentration [IC50], nM) to each class I HLA allele. The binding was considered strong if the IC50 value was less than 150 nM, and a weak binding had an IC50 value between 150 and 500 nM. Non-binding occurred if the IC50 value was more than 500 nM [11].
Figure 1.

Mutant peptides with 21 amino acids and corresponding 8–11 mer peptides. MHC molecules always bind to 8–11 mer peptides, so we extracted peptides 21 amino acids in length, with 10 amino acids upstream and 10 amino acids downstream of mutation sites for NetMHCpan prediction. The number 11 in red indicates the mutated site, and the peptides in yellow represent all the possible peptides which may bind to MHC molecules.

Mutant peptides with 21 amino acids and corresponding 8–11 mer peptides. MHC molecules always bind to 8–11 mer peptides, so we extracted peptides 21 amino acids in length, with 10 amino acids upstream and 10 amino acids downstream of mutation sites for NetMHCpan prediction. The number 11 in red indicates the mutated site, and the peptides in yellow represent all the possible peptides which may bind to MHC molecules.

Experimental validation of peptide biding to class I major histocompatibility complex molecular

Peptides were obtained lyophilized (more than 95% purity) from Bankpeptide Biological Technology Co., Ltd (Hefei, China), dissolved in 10% DMSO in sterile water and tested for sterility, purity, endotoxin and residual organics. Peptide binding to HLA-A*02:01 was determined by T2 assay [36]. T2 cells were washed in phosphate buffered saline (PBS) and RPMI-1640 without serum. In total, 5 × 105 cell ml–1 were incubated with 5 µg ml−1 peptide and 10 µg ml−1 human beta-2-microglobulin in serum-free RPMI-1640 for 4 h or overnight at 37°C. The pulsed cells were pelleted and followed by 3 × 1 ml rinses in PBS with centrifugation at 500g for 5 min at 4°C. Cells were resuspended in 200 µl PBS and stained with 1 µl of w6/32 (Thermo Fisher) for 30 min on ice, followed by three rinses with 1 ml PBS at 4°C. Cells were then resuspended in 200 µl PBS and 1 µl of goat anti-mouse antibody-FITC (Beyotime Biotechnology) for 30 min on ice, followed by three rinses at 4°C. Then, cells were resuspended in 500 µl PBS. Stained T2 cells were analysed using a FACSCalibur.

Results

Software overview

We developed integrated software, called TSNAD, under the Linux operation system through a GUI. The platform is completely automated and is mainly designed for users who have little programming experience. There are several neoantigen prediction pipelines such as pVAC-seq, INTEGRATE-neo: pVAC-seq combined the tumour mutation and expression data to predict neoantigens by invoking NetMHC v. 3.4; INTEGRATE-neo was designed to predict neoantigens from fusion genes based on the pipeline INTERGRATE and NetMHC v. 4.0. Similar with these pipelines, TSNAD also used widely approved software NetMHCpan v. 2.8 to predict neoantigens. Compared with other neoantigen prediction pipelines, TSNAD has lists of advantages: first, TSNAD offered a pipeline for mutation calling from sequencing data; second, TSNAD not only considered the neoantigens presented by class I MHC molecules, but also took mutations in membrane proteins into consideration; third, unlike other pipelines that performed through command lines, TSNAD provided a GUI for biologists without programming background to analyse their data easily. The software consists of two toolkits: mutation detection and neoantigen prediction. Each toolkit is a two-step process as follows: configure the parameters and run the corresponding toolkit. The first step is to configure the software paths and parameters. This step is of great significance, and users are expected to ensure the appropriateness and correctness of the configurations. Users can find the detailed instructions about how to set paths and parameters in the user's manual. For the software paths, the users do not need to change these parameters once they are set because TSNAD will import the existing configuration files by default. Users can also edit partial parameters by GUI or by manually modifying the configuration files. It is worth noting that TSNAD requires its own naming convention for the input files. The users can choose to either manually or use the tool we provided to rename the names of sequencing files to suit the criteria of TSNAD. After setting the configurations, non-expert users can run the pipeline by just clicking on the appropriate toolbar. In the processing monitoring window, the users can observe the pipeline progression. The pipeline, which was written in Python programming language (v2.7), calls for standard third-party software and applies multiprocessing strategy to speed up the data processing. When the pipeline is finished, all of the results will be stored in a user-specified folder. The mutation detection pipeline returns the list of somatic mutations with annotations. The neoantigen prediction pipeline returns extracellular mutations of the membrane proteins and the MHC-binding information (all in TXT format).

Detection of cancer somatic mutations

The software can detect single-nucleotide variants (SNVs) and small insertions or deletions (INDELs) according to the pipeline as depicted in figure 2. The raw paired-end sequence data were in FastQ format from the whole-genome sequencing, the whole-exome sequencing or the targeted gene panel sequencing using the Illumina platform. The raw data were cleaned using Trimmomatic [22]. BWA-MEM was used to map the reads to the reference genome sequences [23,24]. Samtools [25] and Picard [26] were used to address files in SAM or BAM formats, including transform, sort, merge and mark duplicates. GATK [20] was used to pre-process the BAM files, such as realigning the INDELs and recalibrating the bases. Mutect2 [27] in GATK was used to call the somatic SNVs and INDELs between tumour and normal samples. Annovar [28,29] was used to annotate the detailed mutation information.
Figure 2.

The software pipeline of TSNAD. The pipeline performs best practices for somatic SNVs and INDELs in whole-genome/exome sequence with GATK. Then, we extracted the extracellular mutations of membrane proteins according to the protein topology, and invoked NetMHCpan to predict the binding information of mutant peptides to class I MHC molecules.

The software pipeline of TSNAD. The pipeline performs best practices for somatic SNVs and INDELs in whole-genome/exome sequence with GATK. Then, we extracted the extracellular mutations of membrane proteins according to the protein topology, and invoked NetMHCpan to predict the binding information of mutant peptides to class I MHC molecules. We further provide a filter to detect the somatic mutations in the protein-coding regions and the somatic missense variants which fit the cut-off (tumour reads > 10, normal reads > 6, tumour alteration reads > 5, variant allele frequency (VAF) in tumour DNA > 0.05 and VAF in normal DNA = 0).

Prediction of neoantigens

When peptides differ by only one amino acid change, specific antibodies can be generated [19,37]. Therefore, missense mutations that are present on the surfaces of tumour cells are important targets for antibody-based immunotherapy. We performed two strategies to predict the neoantigens that would present on the surfaces of tumour cells [1,2]. First, we extracted the somatic mutations in the extracellular regions of the membrane proteins. Second, we predicted the neoantigens that would present on the cell surface by evaluating the binding affinity between the peptides and class I MHC molecules. According to the Human Protein Atlas, there were 5462 predicted membrane proteins [34]. We identified the transmembrane topologies and the extracellular regions of these proteins using TMHMM [31]. To identify the extracellular mutations of membrane proteins, the filtered cancer somatic missense variants were mapped to the extracellular regions of membrane proteins. We further verified the characteristics of the mutant amino acids. Mutations that change the polarity of the amino acids have gained more attention, as they may be more likely to cause differences in binding features to antibodies between wild-type and mutant proteins. In addition to the membrane proteins, peptides could be present on the cell surface because of the antigen presenting system, which is mediated by MHC molecules. SOAP-HLA was used for the HLA typing of each sample [30]. NetMHCpan was used to predict the binding affinity between the class I MHC and wild-type/mutant peptides [21]. We further compared the binding information of the HLA molecules to the wild-type and mutant peptides. The mutant peptides that can bind to the HLA-A/B/C molecules were extracted for further analysis; the specific bindings of the HLA proteins to the mutant peptides were preferred for their potential to be drug targets without affecting normal tissues.

Prediction of neoantigens based on the somatic mutation data from the International Cancer Genome Consortium database

In previous study, we performed oncogene targeted depth sequencing on a malignant peritoneal mesothelioma [38]. Applying the TSNAD to analyse the sequence data of the tumour sample and the paired peripheral blood sample, we detected 2897 somatic SNVs and 218 somatic INDELs. Four SNVs of NOTCH2, PDE4DIP, ATP10B and NSD1 and one frameshift INDEL of BAP1 were validated by Sanger sequencing on tumour RNA. We also predicted the neoantigens on these mutated proteins, and found specific-binding of neo-peptide generated by BAP1 frameshift INDEL to HLA-B*35:42 of the patient. A polyclonal antibody of the neo-peptide of BAP1 were produced in rabbits and showed a good antibody-neoantigen specificity, which indicates that the neo-peptide of BAP1 could be a potential tumour-specific neoantigen [38]. In addition to handling original sequencing data, TSNAD could also analyse exiting mutations data to predict potential neoantigens. We applied TSNAD to the simple somatic mutations of 9155 samples from the ICGC database and predicted numerous neoantigens, including extracellular mutations of membrane proteins and peptides presented by the class I MHC molecules.

Prediction of neoantigens from membrane proteins

To identify the extracellular mutations of membrane proteins, we mapped all of the missense mutations to the extracellular regions of membrane proteins. A dataset containing 88 354 extracellular mutations was obtained. A majority of these extracellular mutations (89.6%, 79 198 out of 88 354) occurs only once in the 9155 donors (electronic supplementary material, table S1 and figure S1), which illustrates the high heterogeneity in tumour samples. However, membrane proteins with mutations that occur in more samples are also ideal drug targets for antibody-based immunotherapy. The top 20 frequent extracellular mutations are listed in table 1 and MUC4:H4205Q is the most frequent extracellular mutation (44 out of 9155).
Table 1.

Top 20 most frequent extracellular mutations in 9155 donors.

ChrPosIDgeneDNA mutationprotein mutationmutation frequency
3195505836MU10935MUC412615C>GH4205Q44 out of 9155
129138975MU68226OPRD180G>TC27F25 out of 9155
1120611960MU869951NOTCH261G>AA21T23 out of 9155
191065018MU68245ABCA76133G>TA2045S18 out of 9155
1522369378MU4380351OR4M2803C>TS268F16 out of 9155
1737868208MU85975ERBB2929C>TS310F15 out of 9155
3195509676MU586249MUC48775G>CQ2925H15 out of 9155
755233043MU589341EGFR1793G>TG598 V15 out of 9155
3195515449MU605883MUC43002T>AV1001E14 out of 9155
199072091MU4382243MUC1615355C>TP5119S13 out of 9155
2017639816MU4585427RRBP11337A>CQ446P12 out of 9155
5179071958MU4110168C5orf6064G>CD22H12 out of 9155
7146829338MU4413315CNTNAP21085G>AG362E12 out of 9155
1158261127MU4408485CD1C265C>TR89C11 out of 9155
115345040MU4383907OR51B2488C>TS163 L11 out of 9155
1721319519MU613603KCNJ12865G>CE289Q11 out of 9155
246707884MU70561TMEM247458A>GQ153R11 out of 9155
2137814319MU4440003THSD7B469G>AE157 K11 out of 9155
3195511286MU4617526MUC47165G>AD2389N11 out of 9155
7139167934MU66261KLRG2455A>CK152T11 out of 9155
Top 20 most frequent extracellular mutations in 9155 donors.

Prediction of neoantigens through major histocompatibility complex-binding information

Peptides could also present on the cell surface via the antigen presenting system, mediated by MHC class I molecules. In this manner, mutant peptides that are present exclusively in tumour cells are the potential neoantigens, and the MHC–peptide complexes are called neoantigens. Based on the missense mutations of the 9155 tumour samples from the ICGC, we extracted peptides 21 amino acids length, with 10 amino acids upstream and 10 amino acids downstream of the mutation sites. Both the mutant and reference peptides were extracted. Combined with the 16 HLA alleles whose frequencies were more than 5% in the population collected in the 1000 Genome Project, we used our software, invoking NetMHCpan (v2.8) [21] to predict the binding affinity between HLA and the collected peptides. Then, we compared the binding information of the HLA proteins to wild-type and mutant peptides, and the specific bindings of the HLA proteins to mutant peptides were collected. These mutant peptides are seen as potential neoantigens. Finally, we obtained a dataset containing 1 420 785 records. We also analysed the distribution of the dataset (electronic supplementary material, table S2 and figure S2). The results showed a similar phenomenon with that in membrane proteins. Mutations with more frequencies in the samples may play important roles in tumorigenesis. There are 65 potential common neoantigens whose corresponding mutations appear in at least 20 out of the 9155 donors from the ICGC database and had an IC50 of less than 500. The 65 neoantigens are related to the 23 somatic mutations of 12 genes (table 2; electronic supplementary material, table S3). KRAS, PIK3CA and TP53 occupy more potential neoantigens than other genes, indicating that these genes play more important roles in tumour immunotherapy, corresponding to former research results that KRAS and PIK3CA are oncogenes and that TP53 is a tumour suppressor gene [39]. Moreover, we also found some genes that have not been identified as tumour-associated genes by Cancer Gene Census also encode potential neoantigens, such as MUC4, FAM194B, OPRD1 and FRG1.
Table 2.

Sixty five potential common neoantigens and their corresponding genes and mutation frequency.

generole in tumourno. mutationno. neoantigen
KRASoncogene511
PIK3CAoncogene521
TP53tumour suppressor gene38
SF3B1tumour-related gene12
MUC411
CHEK2tumour suppressor gene12
PTENtumour suppressor gene23
FAM194B12
OPRD115
CTNNB1oncogene15
FRG114
GNAStumour-related gene11
Sixty five potential common neoantigens and their corresponding genes and mutation frequency. We found that the most frequent potential neoantigens are encoded by gene KRAS, which has been identified as an oncogene in vivo. There are six potential neoantigens related to the KRAS gene in the top 10 potential neoantigens, with two different mutations: G12D and G12 V. Among the six peptides, three of them (KLVVVGADGV, KLVVVGAVGV and KLVVVGAV) are presented by HLA-A*02:01, one (TEYKLVVVGAV) is presented by HLA-A*40:01, one (GAVGVGKSAL) is presented by HLA-A*03:04 and one (GAVGVGKSAL) is presented by HLA-C*03:03 (table 3).
Table 3.

Top 10 neoantigens with the highest mutation frequency in 9155 donors.

geneHLA allelepositionpeptidemutationaffinity (nM)mutation frequency
KRASHLA-A*02:018KLVVVGADGVG12D214322 out of 9155
KRASHLA-A*02:018KLVVVGAVGVG12 V112239 out of 9155
KRASHLA-A*02:018KLVVVGAVG12 V163239 out of 9155
KRASHLA-B*40:0111TEYKLVVVGAVG12 V90239 out of 9155
KRASHLA-C*03:043GAVGVGKSALG12 V172239 out of 9155
KRASHLA-C*03:033GAVGVGKSALG12 V172239 out of 9155
PIK3CAHLA-C*07:022ARHGGWTTKMH1047R218200 out of 9155
PIK3CAHLA-C*06:023ARHGGWTTKMH1047R457200 out of 9155
PIK3CAHLA-C*07:012ARHGGWTTKMH1047R249200 out of 9155
PIK3CAHLA-A*11:0111STRDPLSEITKE545 K81182 out of 9155
Top 10 neoantigens with the highest mutation frequency in 9155 donors. To study the distribution of the neoantigens across different HLA type, we classified the 1 420 785 records into 16 parts according to the HLA type we used (figure 3a). It was found that approximately 10 mutant peptides could bind to each HLA type in each sample, which means that we can find about 60 neoantigens in each tumour sample on average.
Figure 3.

The distribution of tumour-specific neoantigens across 16 HLA types and 20 tumour types. (a) The number of tumour-specific neoantigens with each HLA type is shown in decreasing order. The dashed line indicates the average number of neoantigens. (b) Distribution of tumour-specific neoantigens across 20 tumour types. The width of each violin indicates the proportion of donors sharing a certain number of neoantigens in each tumour type. Upper limit and lower limit of white bar and the black line in it denote upper quartile, lower quartile and median number for each type.

The distribution of tumour-specific neoantigens across 16 HLA types and 20 tumour types. (a) The number of tumour-specific neoantigens with each HLA type is shown in decreasing order. The dashed line indicates the average number of neoantigens. (b) Distribution of tumour-specific neoantigens across 20 tumour types. The width of each violin indicates the proportion of donors sharing a certain number of neoantigens in each tumour type. Upper limit and lower limit of white bar and the black line in it denote upper quartile, lower quartile and median number for each type. Because of the highly heterogeneity of tumours, we further investigated the distribution of neoantigens in each tumour type (based on the tissue origin; figure 3b). The results showed that the neoantigen load is related to the somatic mutation burden. The cancer types have more mutation load, such as skin and lung cancer, have more neoantigens in average. Interestingly, uterus cancer has the largest number of neoantigens on average (715.98, electronic supplementary material, table S4), but the median number of neoantigens of uterus cancer ranks 10th among the 20 cancer types (figure 3b). The reason may be that the number of neoantigens varies greatly among different patients of uterus cancer, several uterus tumours have large numbers of neoantigens. The nervous system cancer possesses the least neoantigens (2.39) on average. The results indicated that the neoantigen load is not only quite different between different cancer types, but also quite different between different tumours from the same tissue.

Specific binding of the TP53 mutant peptide to HLA-A*02:01

We choose one of the 65 potential common neoantigens, which was generated by the TP53 R248 W mutation, to experimentally confirm the specific-binding of neoantigen to HLA-A*02:01 using T2 assay [36]. We predicted that the wild-type (WT) peptide (GMNRRPILTII) could not bind to HLA-A*02:01, while the mutant peptide (GMNWRPILTII) could weakly bind to HLA-A*02:01 with the IC50 value = 350 nM (electronic supplementary material, table S3). The T2 cell line was widely used to confirm the binding of the peptides to HLA-A*02:01 as its HLA levels can be stabilized by the addition of exogenous HLA-binding peptides but unable to present the endogenous HLA-associated peptides [15,36]. To assess binding strength, we first incubated T2 cells with the WT and mutant peptides, respectively, and then used the W6/32 antibody that targets HLA molecules stabilized by any HLA-binding peptides. The strength of peptide binding between WT and mutant peptides were comparable as suggested by W6/32 staining. Analysis of the pulsed cells by flow cytometry showed that binding of the TP53 (R248 W) mutant peptide to T2 cells was more significant than the background levels of staining to the WT peptide or negative control cells (figure 4), which confirms the specific binding of the TP53 mutant peptide to HLA-A*02:01. Therefore, the R248 W mutation of TP53 can generate a potential tumour-specific neoantigen in the patient with HLA-A*02:01, which can be an ideal target for neoantigen-specific cancer immunotherapy.
Figure 4.

Specific binding of mutant peptide of TP53 to HLA-A*02:01. Blank control: FITC-goat anti-mouse IgG + T2 cells; negative control: human beta-2-microglobulin were incubated with T2 cells overnight at 37°C + W6/32 + FITC-goat anti-mouse IgG; wide-type (WT) peptide binding affinity analysis: WT peptide (GMNRRPILTII) and human beta-2-microglobulin were incubated with T2 cells overnight at 37°C + W6/32 + FITC-goat anti-mouse IgG; mutated peptide binding affinity analysis: mutated peptide (GMNWRPILTII) and human beta-2-microglobulin were incubated with T2 cells overnight at 37°C + W6/32 + FITC-goat anti-mouse IgG.

Specific binding of mutant peptide of TP53 to HLA-A*02:01. Blank control: FITC-goat anti-mouse IgG + T2 cells; negative control: human beta-2-microglobulin were incubated with T2 cells overnight at 37°C + W6/32 + FITC-goat anti-mouse IgG; wide-type (WT) peptide binding affinity analysis: WT peptide (GMNRRPILTII) and human beta-2-microglobulin were incubated with T2 cells overnight at 37°C + W6/32 + FITC-goat anti-mouse IgG; mutated peptide binding affinity analysis: mutated peptide (GMNWRPILTII) and human beta-2-microglobulin were incubated with T2 cells overnight at 37°C + W6/32 + FITC-goat anti-mouse IgG.

Discussion

TSNAD is a tool for detecting cancer somatic mutations following the best practices of GATK [20]. TSNAD can also provide potential neoantigens [1], which can be either extracellular mutations of membrane proteins or mutant peptides presented by class I MHC molecules. It is critical for biologists without programming background. We applied the antigen-predicting tool of TSNAD to predict neoantigens, including extracellular mutations of membrane proteins and neoantigens presented by MHC class I molecules. And we experimentally verified the specific-binding of the mutated peptide of TP53 we predicted (R248 W, wild-type: GMNRRPILTII, mutant: GMNWRPILTII) to HLA-A*02:01. The predicted neoantigens in our study were important sources for selecting suitable drug targets. In further study, these predicted neoantigens would need more experimental validation for their potential to be employed as drug targets of T cell or antibody-based immunotherapy.
  39 in total

1.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors:  A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-01-19       Impact factor: 5.469

2.  SOAP2: an improved ultrafast tool for short read alignment.

Authors:  Ruiqiang Li; Chang Yu; Yingrui Li; Tak-Wah Lam; Siu-Ming Yiu; Karsten Kristiansen; Jun Wang
Journal:  Bioinformatics       Date:  2009-06-03       Impact factor: 6.937

3.  A hidden Markov model for predicting transmembrane helices in protein sequences.

Authors:  E L Sonnhammer; G von Heijne; A Krogh
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1998

4.  Proteomics. Tissue-based map of the human proteome.

Authors:  Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal:  Science       Date:  2015-01-23       Impact factor: 47.728

Review 5.  Monoclonal antibody-defined antigens on tumor cells.

Authors:  R W Baldwin; M J Embleton; M R Price
Journal:  Biomembranes       Date:  1983

Review 6.  T cell defined tumor antigens.

Authors:  B J Van den Eynde; P van der Bruggen
Journal:  Curr Opin Immunol       Date:  1997-10       Impact factor: 7.486

Review 7.  The cancer genome.

Authors:  Michael R Stratton; Peter J Campbell; P Andrew Futreal
Journal:  Nature       Date:  2009-04-09       Impact factor: 49.962

Review 8.  Landscape of Tumor Antigens in T Cell Immunotherapy.

Authors:  Sadia Ilyas; James C Yang
Journal:  J Immunol       Date:  2015-12-01       Impact factor: 5.422

9.  Ensembl 2016.

Authors:  Andrew Yates; Wasiu Akanni; M Ridwan Amode; Daniel Barrell; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Nathan Johnson; Thomas Juettemann; Stephen Keenan; Ilias Lavidas; Fergal J Martin; Thomas Maurel; William McLaren; Daniel N Murphy; Rishi Nag; Michael Nuhn; Anne Parker; Mateus Patricio; Miguel Pignatelli; Matthew Rahtz; Harpreet Singh Riat; Daniel Sheppard; Kieron Taylor; Anja Thormann; Alessandro Vullo; Steven P Wilder; Amonida Zadissa; Ewan Birney; Jennifer Harrow; Matthieu Muffato; Emily Perry; Magali Ruffier; Giulietta Spudich; Stephen J Trevanion; Fiona Cunningham; Bronwen L Aken; Daniel R Zerbino; Paul Flicek
Journal:  Nucleic Acids Res       Date:  2015-12-19       Impact factor: 16.971

10.  pVAC-Seq: A genome-guided in silico approach to identifying tumor neoantigens.

Authors:  Jasreet Hundal; Beatriz M Carreno; Allegra A Petti; Gerald P Linette; Obi L Griffith; Elaine R Mardis; Malachi Griffith
Journal:  Genome Med       Date:  2016-01-29       Impact factor: 11.117

View more
  23 in total

1.  Semi-supervised learning for somatic variant calling and peptide identification in personalized cancer immunotherapy.

Authors:  Elham Sherafat; Jordan Force; Ion I Măndoiu
Journal:  BMC Bioinformatics       Date:  2020-12-30       Impact factor: 3.169

Review 2.  Computational cancer neoantigen prediction: current status and recent advances.

Authors:  G Fotakis; Z Trajanoski; D Rieder
Journal:  Immunooncol Technol       Date:  2021-11-20

Review 3.  Alternative tumour-specific antigens.

Authors:  Christof C Smith; Sara R Selitsky; Shengjie Chai; Paul M Armistead; Benjamin G Vincent; Jonathan S Serody
Journal:  Nat Rev Cancer       Date:  2019-07-05       Impact factor: 60.716

4.  Population-level distribution and putative immunogenicity of cancer neoepitopes.

Authors:  Mary A Wood; Mayur Paralkar; Mihir P Paralkar; Austin Nguyen; Adam J Struck; Kyle Ellrott; Adam Margolin; Abhinav Nellore; Reid F Thompson
Journal:  BMC Cancer       Date:  2018-04-13       Impact factor: 4.430

5.  Identification of trunk mutations in gastric carcinoma: a case study.

Authors:  Zhan Zhou; Shanshan Wu; Jun Lai; Yuan Shi; Chixiao Qiu; Zhe Chen; Yufeng Wang; Xun Gu; Jie Zhou; Shuqing Chen
Journal:  BMC Med Genomics       Date:  2017-07-17       Impact factor: 3.063

Review 6.  Immunopharmacogenomics towards personalized cancer immunotherapy targeting neoantigens.

Authors:  Kazuma Kiyotani; Hiu Ting Chan; Yusuke Nakamura
Journal:  Cancer Sci       Date:  2018-02-14       Impact factor: 6.716

Review 7.  The perfect personalized cancer therapy: cancer vaccines against neoantigens.

Authors:  Luigi Aurisicchio; Matteo Pallocca; Gennaro Ciliberto; Fabio Palombo
Journal:  J Exp Clin Cancer Res       Date:  2018-04-20

8.  TP53 Targeted Deep Sequencing of Cell-Free DNA in Esophageal Squamous Cell Carcinoma Using Low-Quality Serum: Concordance with Tumor Mutation.

Authors:  Dariush Nasrollahzadeh; Gholamreza Roshandel; Tiffany Myriam Delhomme; Patrice Hodonou Avogbe; Matthieu Foll; Farrokh Saidi; Hossein Poustchi; Masoud Sotoudeh; Reza Malekzadeh; Paul Brennan; James Mckay; Pierre Hainaut; Behnoush Abedi-Ardekani
Journal:  Int J Mol Sci       Date:  2021-05-26       Impact factor: 5.923

9.  Challenges targeting cancer neoantigens in 2021: a systematic literature review.

Authors:  Ina Chen; Michael Y Chen; S Peter Goedegebuure; William E Gillanders
Journal:  Expert Rev Vaccines       Date:  2021-06-09       Impact factor: 5.683

10.  TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis.

Authors:  Jingcheng Wu; Wenyi Zhao; Binbin Zhou; Zhixi Su; Xun Gu; Zhan Zhou; Shuqing Chen
Journal:  Genomics Proteomics Bioinformatics       Date:  2018-09-15       Impact factor: 7.691

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.