Carlos Wert-Carvajal1,2,3, Rubén Sánchez-García1, José R Macías1, Rebeca Sanz-Pamplona4,5, Almudena Méndez Pérez1, Ramon Alemany6, Esteban Veiga1, Carlos Óscar S Sorzano1, Arrate Muñoz-Barrutia7,8.
Abstract
The lack of a dedicated, integrated pipeline for neoantigen discovery in mice hinders cancer immunotherapy research. Novel sequential approaches based on recurrent neural networks can improve the accuracy of T-cell epitope binding affinity predictions in mice, and a simplified variant selection process can reduce operational requirements. We have developed a web server tool (NAP-CNB) that provides a full, automatic pipeline based on recurrent neural networks to predict putative neoantigens from tumor RNA sequencing reads. The software can estimate H-2 peptide ligands directly from tumor samples, with an AUC comparable or superior to that of state-of-the-art methods. As a proof of concept, we used the B16 melanoma model to test the system's predictive capabilities, and we report its putative neoantigens. The NAP-CNB web server is freely available at http://biocomp.cnb.csic.es/NeoantigensApp/, with scripts and datasets accessible through the download section.
Year: 2021 PMID: 34031450 PMCID: PMC8144223 DOI: 10.1038/s41598-021-89927-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow for the integrated pipeline. (a) The NAP-CNB user interface with the fields required for NGS analysis. Users can set filters for GATK base quality score recalibration (BQSR) of RNA-Seq reads, minimum depth coverage (DP), and allele frequency (AF). Users may also submit peptide sequences for affinity prediction. Each submission is haplotype-specific, and results are sent to an email address. (b) Workflow for the integrated pipeline. First, the sample is preprocessed before variant calling: quality control with FastQC and alignment to the reference genome with STAR are followed by the GATK Best Practices protocols. Known variants are supplied as known polymorphisms or, if requested and sufficient non-tumor RNA-Seq reads are provided, as a panel of normals. MuTect2 is used for variant calling, and plausible single nucleotide variant (SNV) mutations are translated into peptide sequences for prediction with the RNN model. Gene expression is quantified with Cuffquant from Cufflinks.
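The DP and AF fields in panel (a) act as hard thresholds on the called variants. The sketch below shows that filtering step in isolation, assuming each call has already been reduced to a read depth and an allele frequency. The `VariantCall` record, the threshold values, and the gene names are hypothetical; NAP-CNB's own implementation, which works on GATK/MuTect2 output, is not shown in this record.

```python
from dataclasses import dataclass

@dataclass
class VariantCall:
    """Hypothetical minimal SNV record; a real pipeline would parse these from a VCF."""
    gene: str
    depth: int               # DP: number of reads covering the site
    allele_frequency: float  # AF: fraction of reads carrying the alternate allele

def filter_variants(calls, min_dp: int, min_af: float):
    """Keep only calls that meet both the minimum depth and the minimum allele frequency."""
    return [c for c in calls if c.depth >= min_dp and c.allele_frequency >= min_af]

# Toy calls with placeholder gene names; thresholds are illustrative, not NAP-CNB defaults.
calls = [
    VariantCall("geneA", depth=35, allele_frequency=0.42),
    VariantCall("geneB", depth=6, allele_frequency=0.50),   # fails the DP threshold
    VariantCall("geneC", depth=50, allele_frequency=0.04),  # fails the AF threshold
]
kept = filter_variants(calls, min_dp=10, min_af=0.10)
print([c.gene for c in kept])  # ['geneA']
```

Applying both thresholds jointly means a well-covered but low-frequency call is discarded just like a high-frequency call with too few reads.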
Figure 2. Neural network model for H-2Kb binding affinity prediction. The input is a one-hot encoding of a 12-mer peptide sequence extracted by the preprocessing workflow. In each of the three consecutive LSTM layers, the number of units equals the overall length of the input sequence. The RNN is followed by two hidden dense layers, alternated with dropout, that output a binding affinity probability.
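The input layer of Figure 2 can be illustrated with a one-hot encoder. This sketch assumes the 20 standard amino acids in alphabetical one-letter order; that ordering, and how NAP-CNB treats nonstandard residues, are assumptions rather than details given here. Each residue becomes one time step holding a 20-dimensional indicator vector, so a 12-mer yields a 12 × 20 matrix that an LSTM reads one row at a time.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(peptide: str) -> list:
    """Encode a peptide as a len(peptide) x 20 matrix with a single 1 per row."""
    matrix = []
    for aa in peptide.upper():
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1
        matrix.append(row)
    return matrix

encoded = one_hot("NKVVMEYENLEK")  # a 12-mer taken from the ranking table below
print(len(encoded), len(encoded[0]))  # 12 20
```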
Binary classification metrics for the final fivefold cross-validated algorithm for the H-2Kb typing.
| AUC ROC (±SD) | ACC (±SD) | PPV (±SD) | Sensitivity (±SD) | Specificity (±SD) | F1 (±SD) |
|---|---|---|---|---|---|
The reported mean estimators are AUC ROC, accuracy (ACC), precision or positive predictive value (PPV), sensitivity, specificity, and F1 (the harmonic mean of precision and sensitivity). The prevalence of positive samples was around 1:40.
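Because positives are rare (about 1:40), accuracy alone is dominated by the negative class. The sketch below computes the reported threshold-based metrics from confusion-matrix counts; the counts in the example are invented to match a 1:40 prevalence and are not results from the paper.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp) if tp + fp else 0.0          # precision
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall / true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    # 2*tp / (2*tp + fp + fn) is algebraically equal to the harmonic mean of PPV and sensitivity
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"ACC": acc, "PPV": ppv, "Sensitivity": sensitivity,
            "Specificity": specificity, "F1": f1}

# Hypothetical test set: 25 positives vs 1000 negatives (1:40 prevalence)
m = binary_metrics(tp=15, fp=20, tn=980, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```

Here accuracy is about 0.97 even though only 60% of positives are recovered and fewer than half of the positive calls are correct, which is why PPV, sensitivity, and F1 are reported alongside it.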
Figure 3. ROC and precision-recall curves for the final model trained on H-2Kb samples. (a) ROC curve for the 10% test partition, with an AUC of 86.5%; the dashed line shows the chance level. (b) Precision-recall curve, with the prevalence of around 3% shown as the chance level. The precision-recall AUC is 41.97%, whereas a random guess yields an AUC of 2.64% for the same class imbalance.
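The two chance levels in Figure 3 differ because ROC AUC does not depend on class balance (a random scorer gives 0.5), while the precision of a random scorer equals the positive prevalence. The sketch computes ROC AUC through its rank interpretation (the Mann-Whitney statistic: the probability that a random positive outscores a random negative) together with the precision-recall chance level. Scores and labels are toy values, and the pairwise loop is quadratic, so it is meant for illustration only.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def pr_chance_level(labels):
    """Precision of a no-skill classifier: the fraction of positive samples."""
    return sum(labels) / len(labels)

scores = [0.9, 0.8, 0.4, 0.3, 0.2]  # toy predicted probabilities
labels = [1, 0, 1, 0, 0]            # toy ground truth
print(roc_auc(scores, labels), pr_chance_level(labels))
```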
Figure 4. Cross-validation of peptide window sizes for H-2Kb. ROC AUC for 8-mers, 9-mers, and 12-mers, obtained through fivefold cross-validation under different conditions. The windows are taken from the mutated peptide sequence, centered at the location of the SNV. Significant differences between means (Student's t-test, p ) are shown.
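The windows in Figure 4 are cut from the mutated protein around the SNV. The helper below is a sketch of that extraction under two assumptions the caption does not state: for even lengths the extra residue falls after the mutation, and windows near a protein terminus are shifted inward rather than padded. The sequence and mutation position are toy values.

```python
def centered_window(protein: str, snv_pos: int, length: int) -> str:
    """Return a `length`-residue window of `protein` that contains the mutated
    residue at 0-based `snv_pos`, as close to centered as the termini allow."""
    if not 0 <= snv_pos < len(protein):
        raise IndexError("snv_pos is outside the protein")
    if length > len(protein):
        raise ValueError("window is longer than the protein")
    start = snv_pos - (length - 1) // 2                # even lengths: extra residue goes after the SNV
    start = max(0, min(start, len(protein) - length))  # shift inward at either terminus
    return protein[start:start + length]

toy_protein = "ACDEFGHIKLMNPQRSTVWY"  # toy sequence; mutated residue assumed at index 10 ('M')
for k in (8, 9, 12):
    print(k, centered_window(toy_protein, 10, k))
```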
AUC ROC scores and minimum required peptide lengths of haplotypes implemented in NAP-CNB.
| Haplotype | AUC ROC (±SD) | Peptide length (mer) |
|---|---|---|
| H-2Db | | 12 |
| H-2Dd | | 12 |
| H-2Dq | | 12 |
| H-2Kk | | 8 |
| H-2Kq | | 12 |
| H-2Ld | | 12 |
| H-2Lq | | 8 |
The AUC ROC is the fivefold cross-validation average for the best configuration found by grid search. For every haplotype, 128 models were initially generated for peptide lengths of 8, 10, and 12 amino acids, with additional fine-tuning in some cases.
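The selection procedure in this footnote (train many configurations, keep the one with the best mean fivefold cross-validated AUC) follows the usual grid-search pattern. The sketch below is generic: the hyperparameter names and values are hypothetical because the record does not list NAP-CNB's actual grid, and `toy_evaluate` is a placeholder for training and scoring one model on one fold. The folds are contiguous and unstratified for brevity; with a 1:40 class imbalance, stratified folds are generally preferable so that each fold keeps a similar share of positives.

```python
from itertools import product
from statistics import mean

def kfold_indices(n: int, k: int = 5):
    """Yield (train, test) index lists for k contiguous, near-equal folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def grid_search(param_grid: dict, evaluate, n_samples: int, k: int = 5):
    """Return (best_params, best_mean_score) over every combination in param_grid.
    `evaluate(params, train_idx, test_idx)` trains one model and returns its held-out score."""
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = mean(evaluate(params, tr, te) for tr, te in kfold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid and a deterministic stand-in for "train, then compute AUC on the test fold".
grid = {"peptide_length": [8, 10, 12], "lstm_layers": [1, 2, 3], "dropout": [0.2, 0.5]}

def toy_evaluate(params, train_idx, test_idx):
    return (0.8 + 0.01 * params["lstm_layers"]
            - 0.05 * abs(params["dropout"] - 0.2)
            - 0.001 * abs(params["peptide_length"] - 12))

best, score = grid_search(grid, toy_evaluate, n_samples=100)
print(best, round(score, 3))
```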
Putative neoantigens, shown by sequence and gene symbol, ranked by score for the H-2Kb-restricted B16 melanoma model.
| Rank | Sequence | Gene | Probability | FPKM | Castle et al. | NetH2pan | MHCflurry 2.0 |
|---|---|---|---|---|---|---|---|
| 1 | NKVVMEYENLEK | Pnp | 1.00 | 3.04 | – | 24 | 22 |
| 2 | KASGFRYNVLSC | Nr1h2 | 1.00 | 0.00 | – | 1 | 17 |
| 3 | SQAWTHPPGVVN | Adar | 1.00 | 0.00 | – | 88 | 128 |
| 4 | TFVYPTIFPLRE | Lrrc28 | 1.00 | 0.94 | – | 10 | 14 |
| 5 | DKSYTLPSSLRK | Zic2 | 1.00 | 1.83 | – | 27 | 28 |
| 6 | TLAQLTWPLWLE | Hjurp | 0.43 | 0.00 | – | 26 | 72 |
| 7 | VDTNMMGHEHIR | Safb2 | 0.26 | 24.20 | – | 140 | 150 |
| 8 | AKTAVNDYFQCN | Stox2 | 0.25 | 0.00 | – | 126 | 179 |
| 9 | FIAIYHHASRAI | Tm9sf3 | 0.21 | 24.29 | ** | 8 | 40 |
| 10 | SGASNTTPHLGF | Tab2 | 0.20 | 29.21 | – | 103 | 58 |
| 11 | YSSMRMMKEALQ | Herc6 | 0.18 | 10.93 | – | 38 | 102 |
| 12 | TRASVTNFQIVH | Tulp2 | 0.16 | 0.00 | – | 43 | 16 |
| 13 | AWGVDGTLAQLE | Pkdcc | 0.16 | 5.50 | – | 118 | 134 |
| 14 | VVLLMDALYLLR | Sirpa | 0.14 | 51.24 | – | 13 | 49 |
| 15 | NVTISNLYEGMM | Hjurp | 0.13 | 0.00 | – | 6 | 20 |
| 16 | ARALWFWAFSLQ | Sfi1 | 0.09 | 0.00 | – | 5 | 47 |
| 17 | GASSFREAMRIG | Eno3 | 0.09 | 29.01 | – | 21 | 112 |
| 18 | LAAIVGKQVLLG | Rpl13a | 0.09 | 1203.49 | * | 67 | 5 |
| 19 | AYSAHTSENLED | Zfp638 | 0.09 | 0.00 | – | 142 | 181 |
| 20 | TVAVLGFILSSA | Commd4 | 0.09 | 41.28 | – | 52 | 30 |
| 21 | FQYCLFKICRDV | Pla2g12a | 0.08 | 7.05 | – | 63 | 101 |
| 22 | AISAPCIGSPGC | Hjurp | 0.08 | 0.00 | – | 227 | 297 |
| 23 | HKHLMPTQIIPG | Jmjd1c | 0.08 | 3.42 | – | 144 | 106 |
| 24 | MFGIDGFAAVIN | Pdhx | 0.07 | 10.26 | – | 56 | 59 |
| 25 | YQPRQSVSYEDV | Tasor2 | 0.06 | 5.16 | – | 188 | 220 |
| 26 | LCPLESRVPHTL | Hjurp | 0.06 | 0.00 | – | 218 | 127 |
| 27 | QMIVFYLIELLK | Jak2 | 0.05 | 6.03 | – | 2 | 6 |
| 28 | AHMYEAVALIKD | Dennd5a | 0.05 | 64.21 | – | 17 | 9 |
| 29 | DRIVHALNTTVP | Ccdc58 | 0.05 | 0.00 | – | 70 | 108 |
| 30 | NEVDVQEVTHSA | Dlg4 | 0.04 | 9.45 | – | 289 | 138 |
| 31 | LAAIVGKQVLLV | Rpl13a | 0.04 | 1203.49 | * | 48 | 2 |
| 32 | QRNRKLDYSSSE | Bod1l | 0.04 | 3.65 | – | 282 | 328 |
| 33 | HLGCIKKKFLQR | Sfi1 | 0.04 | 0.00 | – | 177 | 225 |
| 34 | PPTARMMFSGLA | Wiz | 0.03 | 16.70 | – | 18 | 167 |
| 35 | QEEVFAKHVSNA | Smarcc2 | 0.03 | 0.00 | – | 167 | 104 |
Gene expression is quantified as fragments per kilobase of transcript per million mapped reads (FPKM). Neoantigens examined by Castle et al.[44] are marked as selected for validation (*) or reactive (**). For NetH2pan and MHCflurry 2.0, each complete 12-mer is ranked by the average score of its candidate epitopes of 8 to 12 residues; the NetH2pan ranking is based on binding affinity and the MHCflurry 2.0 ranking on presentation scores.
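The footnote describes collapsing per-epitope predictions into one score per 12-mer. A minimal sketch, assuming every contiguous sub-peptide of 8 to 12 residues is scored and weighted equally: `score` stands in for a predictor such as NetH2pan or MHCflurry 2.0 (neither is called here), and `toy_score` is a purely illustrative placeholder. When the predictor returns an affinity in nM, where lower means stronger binding, the sort direction must be reversed, which the `higher_is_better` flag handles.

```python
from statistics import mean
from typing import Callable, Iterable, List

def subpeptides(seq: str, min_len: int = 8, max_len: int = 12) -> List[str]:
    """All contiguous sub-sequences of `seq` whose length is in [min_len, max_len]."""
    return [seq[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(seq) - n + 1)]

def rank_by_average(sequences: Iterable[str], score: Callable[[str], float],
                    higher_is_better: bool = True) -> List[str]:
    """Rank full-length peptides by the mean score of their sub-peptides, best first."""
    averaged = {s: mean(score(p) for p in subpeptides(s)) for s in sequences}
    return sorted(averaged, key=averaged.get, reverse=higher_is_better)

def toy_score(peptide: str) -> float:
    """Hypothetical stand-in for a predictor: fraction of leucine residues."""
    return peptide.count("L") / len(peptide)

print(len(subpeptides("NKVVMEYENLEK")))  # 15 sub-peptides of length 8-12
print(rank_by_average(["NKVVMEYENLEK", "VVLLMDALYLLR"], toy_score))
```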