NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11.

Claus Lundegaard, Kasper Lamberth, Mikkel Harndahl, Søren Buus, Ole Lund, Morten Nielsen.

Abstract

NetMHC-3.0 is trained on a large set of quantitative peptide data using both affinity data from the Immune Epitope Database and Analysis Resource (IEDB) and elution data from SYFPEITHI. The method generates high-accuracy predictions of major histocompatibility complex (MHC):peptide binding. The predictions are based on artificial neural networks trained on data from 55 MHC alleles (43 human and 12 non-human), and position-specific scoring matrices (PSSMs) for an additional 67 HLA alleles. As only the MHC class I prediction server is available, predictions are possible for peptides of length 8-11 for all 122 alleles. Artificial neural network predictions are given as actual IC50 values, whereas PSSM predictions are given as log-odds likelihood scores. The output is optionally available as a download for easy post-processing. The training method underlying the server has been benchmarked as the best among available methods, and has been used to predict possible MHC-binding peptides in a series of pathogenic viral proteomes including SARS, influenza and HIV, resulting in an average of 75-80% confirmed MHC binders. Here, the performance is further validated and benchmarked using a large set of newly published affinity data, non-redundant to the training set. The server is free to use and available at: http://www.cbs.dtu.dk/services/NetMHC.

Year: 2008 PMID: 18463140 PMCID: PMC2447772 DOI: 10.1093/nar/gkn202

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Intracellular infections with pathogens such as viruses and certain bacteria are defeated by cytotoxic T lymphocytes (CTL). The CTL T-cell receptor (TCR) recognizes foreign peptides in complex with major histocompatibility complex (MHC) class I molecules on the surface of the infected cells. MHC class I molecules preferably bind and present peptides nine amino acids long, which mainly originate from proteins expressed in the cytosol of the presenting cell. In most vertebrates, MHCs exist in a number of different allelic variants that each bind a specific and very limited set of peptides. For a number of years, prediction methods have been developed to identify which peptides will bind a given MHC (1), and such predictions can be highly valuable in a broad range of applications, including rational vaccine design and disease diagnostics. The artificial neural network (ANN) training method behind NetMHC (2,3) has been benchmarked to be the best among available methods (4). Preliminary versions of the algorithm have been used to predict possible MHC-binding peptides in a large set of pathogenic viral proteomes, resulting in an average of >75% confirmed MHC binders (5). Most MHC prediction algorithms (a list of other servers is included in the Supplementary Material) are trained on peptides of the same length as those they predict, but since data for peptide lengths other than nine are much scarcer, the breadth of MHC binding predictions for these peptide lengths is correspondingly limited. In this server, however, a method is implemented that makes it possible to predict 8-, 10- and 11-mer peptide binding using 9-mer trained predictors, which extends the MHC coverage for these peptide lengths significantly compared to other available MHC:peptide-binding servers.

METHODS

The server is trained on the largest number of quantitative peptide:MHC affinity measurements ever published, using affinity data from the Immune Epitope Database and Analysis Resource (IEDB) (6), eluted peptide data from the SYFPEITHI database (7) and proprietary affinity data. The ANN-based predictors are trained essentially as described in (3) on data from 55 MHC alleles (43 human and 12 non-human), and the predictors based on position-specific scoring matrices (PSSMs) are trained as described in (2) for an additional 67 HLA alleles. A large number of 9-mer MHC affinity data have become available from the IEDB database since the training of the ANNs used in NetMHC-3.0, and all peptides not used in the training (6452 9-mer peptide affinity data points, covering 32 HLA alleles) were used to evaluate the server performance. These data are available at the server. In this dataset, 3104 peptides were measured to be binders (IC50 < 500 nM), and 76% of these were correctly predicted as such. In total, 3030 peptides were predicted to bind to a given HLA, and 78% of these had a measured IC50 < 500 nM. The average Pearson correlation coefficient (PCC) and area under a ROC curve (AUC) value using a 500 nM classification threshold were 0.71 and 0.86, respectively. For the full per-allele results, see the Supplementary Material (Supplementary Table 1 and Supplementary Figure 1). NetMHC-3.0 uses a new approximation algorithm that reliably predicts the affinity of peptides of lengths 8, 10 and 11, for which affinity data for training are rare (8). The method uses predictors trained on peptides of length 9 to extrapolate to other lengths. In short, the method approximates each peptide of another length by a number of 9-mers, by inserting X (for 8-mers) or deleting amino acid(s) (for 10- and 11-mers), and sets the final prediction to an average of the 9-mer predictions.
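The approximation described above can be sketched in a few lines of Python. This is a minimal illustration, not the server's implementation: the function names and the `predict_9mer` callable are hypothetical, and the text does not specify which positions are eligible for insertion or deletion, whether the two residues removed from an 11-mer must be adjacent, or which kind of average is taken. The sketch assumes every position is eligible, contiguous deletion, and a plain arithmetic mean.

```python
from statistics import mean
from typing import Callable, List


def nine_mer_variants(peptide: str) -> List[str]:
    """Return the 9-mers used to approximate a peptide of length 8-11.

    Assumption: every position is eligible for insertion/deletion.
    """
    n = len(peptide)
    if n == 9:
        return [peptide]
    if n == 8:
        # Insert the wildcard X at each of the 9 possible positions.
        return [peptide[:i] + "X" + peptide[i:] for i in range(n + 1)]
    if n in (10, 11):
        # Assumption: delete one contiguous stretch of n - 9 residues.
        k = n - 9
        return [peptide[:i] + peptide[i + k:] for i in range(n - k + 1)]
    raise ValueError("peptide length must be between 8 and 11")


def approximate_prediction(peptide: str,
                           predict_9mer: Callable[[str], float]) -> float:
    """Average a 9-mer predictor over all 9-mer variants of the peptide."""
    return mean(predict_9mer(v) for v in nine_mer_variants(peptide))
```

Any 9-mer predictor that accepts the wildcard X can be passed as `predict_9mer`; the result is on whatever scale that predictor returns.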
We had previously trained ANN predictors directly on 10-mer affinity data, and since this training more than 2000 10-mer peptide:MHC affinities have become available from the IEDB database (6). Area under a ROC curve (AUC) values were calculated for each allele using either ANNs trained on 10-mers or the approximation method. For 12 of the 16 alleles, the approximation method performed better than the 10-mer trained ANNs (P < 0.01); see Supplementary Material Figure 2. For the remaining four HLA alleles, however, this evaluation showed better performance for ANNs trained on 10-mer peptides, and these 10-mer trained ANNs are used for predictions by the server. For 8-mers, 2002 affinity data points were extracted covering 35 MHC alleles. The overall PCC and AUC were 0.68 and 0.86, respectively. For 8-mer per-allele performance, see the Supplementary Material Figure 4. For 8-mers, predictors trained on actual 8-mers seem to perform better than the approximation method otherwise used, so for the alleles with available 8-mer affinity data, 8-mer trained ANNs are used for the predictions. In general, it is not possible to estimate how reliable a single prediction is. However, the stronger the predicted affinity, the higher the chance that the actual affinity is stronger than the generally accepted binding threshold of 500 nM.
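To make the AUC evaluation concrete, the following sketch computes the area under the ROC curve for one allele when measured affinities are split into binders and non-binders at 500 nM. It is an illustration under stated assumptions, not the evaluation code used for the paper: the function name is hypothetical, predictions are assumed to be given in nM (lower means stronger binding), and the AUC is computed as the fraction of binder/non-binder pairs that are ranked correctly, with ties counted as one half.

```python
from typing import Sequence


def auc_at_threshold(measured_nm: Sequence[float],
                     predicted_nm: Sequence[float],
                     threshold_nm: float = 500.0) -> float:
    """AUC of predicted affinities against binder labels (measured < threshold)."""
    binders = [p for m, p in zip(measured_nm, predicted_nm) if m < threshold_nm]
    others = [p for m, p in zip(measured_nm, predicted_nm) if m >= threshold_nm]
    if not binders or not others:
        raise ValueError("need at least one binder and one non-binder")
    correct = 0.0
    for b in binders:
        for o in others:
            if b < o:        # binder predicted stronger than non-binder
                correct += 1.0
            elif b == o:     # tie counts as half a correct pair
                correct += 0.5
    return correct / (len(binders) * len(others))
```

The pairwise loop is quadratic in the number of peptides, which is acceptable for a few thousand data points per allele; a rank-based formulation would be preferable for larger sets.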

SERVER

NetMHC-3.0 predicts the binding affinity of either a list of peptides of a defined length (8–11 residues) or all possible sub-peptides contained within full-length proteins. The input must be in FASTA format, or given as peptides all of equal length, one peptide per line. The server will accept a maximum of 5000 sequences per submission, each no longer than 20 000 amino acids and with a minimum length corresponding to the selected prediction length (see below). Input data can be pasted into a text field or uploaded from a local file on the user's computer. If the input is in peptide format, the corresponding tick-box must be selected. One or more MHCs must be selected, as well as the desired peptide length. Only one prediction length can be used at a time. The output can optionally be sorted according to the predicted affinity by selecting a tick-box. The predictions start when the Submit button is clicked. An example input in FASTA format is shown in Figure 1.
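As an illustration of the input handling described above, the sketch below parses FASTA text, checks the stated submission limits, and enumerates all sub-peptides of the selected length together with their start positions. The function names are hypothetical and the server's own parser may behave differently; positions are numbered from 0, following the description of the output columns.

```python
from typing import Iterator, List, Tuple

MAX_SEQUENCES = 5000   # submission limit stated in the text
MAX_RESIDUES = 20_000  # per-sequence limit stated in the text


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs; lines before the first header are ignored."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_submission(records: List[Tuple[str, str]], length: int) -> None:
    """Raise ValueError if the submission violates the stated limits."""
    if not 8 <= length <= 11:
        raise ValueError("peptide length must be between 8 and 11")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("too many sequences")
    for name, seq in records:
        if not length <= len(seq) <= MAX_RESIDUES:
            raise ValueError(f"sequence {name!r} has an unsupported length")


def sub_peptides(sequence: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start position, peptide) for every window of the given length."""
    for pos in range(len(sequence) - length + 1):
        yield pos, sequence[pos:pos + length]
```

A protein of N residues yields N - length + 1 sub-peptides, so a sequence exactly as long as the selected length yields a single peptide at position 0.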

Figure 1.

Example input in FASTA format.

The output is displayed as raw text with a header indicating the server name, the type of prediction (PSSM, ANN or ANN-approximation), the first selected allele and the date (Figure 2), followed by the prediction output in a column format. The columns are named in the first line of the prediction output. Column [pos] is the position of the first amino acid of the predicted peptide within the possibly longer sequence, with numbering starting at 0. Column [peptide] is the primary sequence of the (sub-)peptide. Column [logscore] is the raw prediction output, which for ANNs is 1 - log(affinity)/log(50 000), that is, one minus the base-50 000 logarithm of the affinity in nanomolar units. For PSSM predictions the raw prediction score is a log-odds likelihood score. For ANN predictions an additional column, [affinity (nM)], gives the predicted affinity in nanomolar units. Column [Bind Level] indicates whether the peptide is predicted to bind more strongly than a certain threshold: for ANN predictions, stronger than 50 nM (SB) or stronger than 500 nM (WB); for PSSM predictions, strong-binding peptides (SB) have a prediction score above the 0.1% percentile score of 1 000 000 random natural peptides, and weak-binding peptides (WB) a score above the 1% percentile score of the same 1 000 000 random natural peptides. Predicted affinities weaker than 500 nM, or scores below the 1% percentile score, are given no indication. Column [Protein Name] gives the name of the protein the peptide was taken from. If peptide input was used, the name will always be ‘Sequence’. Column [Allele] gives the name of the MHC allele chosen. The output contains all the sub-peptides for each protein for a given allele, either in the order they appear in the sequence or sorted by predicted affinity within each protein (if chosen). If more than one protein sequence was entered, a dashed line will separate the peptides from each protein. If more than one allele was chosen, the output for each further allele starts with a header similar to the first, placed immediately after the preceding predictions, all in the same web output page.
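The relationship between the ANN logscore, the affinity in nM and the bind level can be written out as a short Python sketch. The function names are hypothetical; the transform and the 50 nM and 500 nM thresholds are those given in the column description above, and an empty string stands in for "no indication".

```python
import math

LOG_BASE = 50_000.0  # affinities are scaled by the base-50 000 logarithm


def affinity_to_logscore(affinity_nm: float) -> float:
    """Map an affinity in nM to the ANN logscore, 1 - log(aff) / log(50 000)."""
    return 1.0 - math.log(affinity_nm) / math.log(LOG_BASE)


def logscore_to_affinity(logscore: float) -> float:
    """Invert the transform: affinity in nM = 50 000 ** (1 - logscore)."""
    return LOG_BASE ** (1.0 - logscore)


def bind_level(affinity_nm: float) -> str:
    """Return 'SB' below 50 nM, 'WB' below 500 nM, and '' otherwise."""
    if affinity_nm < 50.0:
        return "SB"
    if affinity_nm < 500.0:
        return "WB"
    return ""
```

Under this transform an affinity of 1 nM maps to a logscore of 1 and 50 000 nM maps to 0; the 500 nM and 50 nM thresholds correspond to logscores of roughly 0.426 and 0.638.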

Figure 2.

Raw text output using the input in Figure 1 and selecting the alleles HLA-A0201 and HLA-A0301. 10-mer peptide predictions were chosen. Affinity sorting was chosen.

In each header, there is a link to a file with the output in tab-separated format; the filename ends in .xls, so the file is easily imported into spreadsheet programs. This file always contains the predicted peptides in the order they appeared in the input file. The output data for each peptide are displayed on a single line, with predictions for each of the selected alleles in different columns (Figure 3).

Figure 3.

Downloaded output sheet opened in Microsoft® Excel with adjusted column widths. The output was generated using the input in Figure 1 and selecting the alleles HLA-A0201 and HLA-A0301. 10-mer peptide predictions were chosen.

FINAL REMARKS

This server is developed to aid research and limit the resources needed for rational and effective CTL epitope discovery and will be continuously updated as new data become available. All comments and suggestions for usability improvements are most welcome.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

REFERENCES

1. SYFPEITHI: database for MHC ligands and peptide motifs.

Authors: H Rammensee; J Bachmann; N P Emmerich; O A Bachor; S Stevanović
Journal: Immunogenetics Date: 1999-11 Impact factor: 2.846

2. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations.

Authors: Morten Nielsen; Claus Lundegaard; Peder Worning; Sanne Lise Lauemøller; Kasper Lamberth; Søren Buus; Søren Brunak; Ole Lund
Journal: Protein Sci Date: 2003-05 Impact factor: 6.725

3. Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach.

Authors: Morten Nielsen; Claus Lundegaard; Peder Worning; Christina Sylvester Hvid; Kasper Lamberth; Søren Buus; Søren Brunak; Ole Lund
Journal: Bioinformatics Date: 2004-02-12 Impact factor: 6.937

4. A roadmap for the immunomics of category A-C pathogens.

Authors: Alessandro Sette; Ward Fleri; Bjoern Peters; Muthuraman Sathiamurthy; Huynh-Hoa Bui; Stephen Wilson
Journal: Immunity Date: 2005-02 Impact factor: 31.745

5. Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers.

Authors: Claus Lundegaard; Ole Lund; Morten Nielsen
Journal: Bioinformatics Date: 2008-04-14 Impact factor: 6.937

6. A community resource benchmarking predictions of peptide binding to MHC-I molecules.

Authors: Bjoern Peters; Huynh-Hoa Bui; Sune Frankild; Morten Nielsen; Claus Lundegaard; Emrah Kostem; Derek Basch; Kasper Lamberth; Mikkel Harndahl; Ward Fleri; Stephen S Wilson; John Sidney; Ole Lund; Søren Buus; Alessandro Sette
Journal: PLoS Comput Biol Date: 2006-06-09 Impact factor: 4.475

7. Modeling the adaptive immune system: predictions and simulations.

Authors: Claus Lundegaard; Ole Lund; Can Kesmir; Søren Brunak; Morten Nielsen
Journal: Bioinformatics Date: 2007-11-28 Impact factor: 6.937

8. SARS CTL vaccine candidates; HLA supertype-, genome-wide scanning and biochemical validation.

Authors: C Sylvester-Hvid; M Nielsen; K Lamberth; G Røder; S Justesen; C Lundegaard; P Worning; H Thomadsen; O Lund; S Brunak; S Buus
Journal: Tissue Antigens Date: 2004-05
