Harm Nijveen, Michel G D Kester, Chopie Hassan, Aurélie Viars, Arnoud H de Ru, Machiel de Jager, J H Fred Falkenburg, Jack A M Leunissen, Peter A van Veelen.
Abstract
T cell epitopes derived from polymorphic proteins or from proteins encoded by alternative reading frames (ARFs) play an important role in (tumor) immunology. These peptides can be identified successfully with mass spectrometry. In a mass spectrometry-based approach, the recorded tandem mass spectra are matched against hypothetical spectra generated from known protein sequence databases. Commonly used protein databases contain a minimal level of redundancy and are therefore not suitable data sources for searching polymorphic T cell epitopes, either in normal reading frames or in ARFs. At the same time, these databases contain much non-polymorphic sequence information, which complicates the matching of recorded and theoretical spectra and increases the potential for false positives. Therefore, we created a database with peptides from ARFs and peptide variation arising from single nucleotide polymorphisms (SNPs). It is based on the human mRNA sequences from the well-annotated reference sequence (RefSeq) database and associated variation information derived from the Single Nucleotide Polymorphism Database (dbSNP). In this process, we removed all non-polymorphic information. Investigation of the frequency of SNPs in dbSNP revealed that many entries are non-polymorphic "SNPs". We removed those from our dedicated database, which resulted in a comprehensive, high-quality database that we named the Human Short Peptide Variation Database (HSPVdb). The value of HSPVdb is shown by the identification of the majority of published polymorphic SNP- and/or ARF-derived epitopes in a mass spectrometry-based proteomics workflow, and by a large variety of polymorphic peptides identified as potential T cell epitopes in the HLA-ligandome presented by the Epstein-Barr virus cells.
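The idea of deriving peptide variation from mRNA sequences and SNPs in all forward reading frames can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy mRNA, and the SNP position are hypothetical, and only the standard genetic code is assumed.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna, frame):
    """Translate from offset `frame` (0, 1, or 2); '*' marks a stop codon."""
    seq = mrna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


def apply_snp(mrna, pos, alt):
    """Return the mRNA with the base at 0-based position `pos` replaced by `alt`."""
    return mrna[:pos] + alt + mrna[pos + 1:]


def variant_translations(mrna, pos, alt):
    """Map each forward frame to (reference translation, variant translation)."""
    variant = apply_snp(mrna, pos, alt)
    return {
        frame: (translate(mrna, frame), translate(variant, frame))
        for frame in range(3)
    }


# Toy example (hypothetical sequence): an A>G change at position 7.
for frame, (ref, var) in variant_translations("ATGGCCCATTGA", 7, "G").items():
    print(frame, ref, var, "polymorphic" if ref != var else "unchanged")
```

In this toy case the same nucleotide change alters the peptide in two frames and is silent in the third, which is why a frame-aware database can contain variation that a conventional protein database does not.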
Year: 2010 PMID: 21125265 PMCID: PMC3035791 DOI: 10.1007/s00251-010-0497-1
Source DB: PubMed Journal: Immunogenetics ISSN: 0093-7711 Impact factor: 2.846
Overview of known MiHA used as a test set in this study. The table lists each epitope name with the HLA molecule that presents it, its immunogenicity, the gene name, and the polymorphic residues. (a) Names according to http://www.lumc.nl/dbminor
| Epitope name (a) / HLA | Sequence | Remarks | Gene | Polymorphic AA | dbSNP entry |
|---|---|---|---|---|---|
| HA1 / A2 | VLHDDLLEA | immunogenic | HMHA1 | VL[R/H]DDLLEA | rs1801284 |
| HA2 / A2 | YIGEVLVSV | immunogenic | MYO1G | YIGEVLVS[V/M] | rs61739531 |
| HA3 / A1 | VTEPGTAQY | immunogenic | AKAP13 | V[M/T]EPGTAQY | rs2061821 |
| HA8 / A2 | RTLDKVLEV | immunogenic | KIAA0020 | [R/P]TLDKVLE[V/I] | rs2270891 |
| HA1 / B60 | KECVLHDDL | immunogenic | HMHA1 | KECVL[R/H]DDL | rs1801284 |
| LB-ADIR-1 F / A2 | SVAPALALFPA | immunogenic; ARF in 5’UTR | TOR3A (ADIR) | SVAPALAL[F/S]PA | rs2296377 |
| LB-ADIR-1 S / A2 | SVAPALALSPA | allelic counterpart | | | |
| CTSHr / A31 | ATLPLLCAR | immunogenic | CTSH | ATLPLLCA[G/R] | rs2289702 |
| CTSHr / A33 | WATLPLLCAR | immunogenic | CTSH | WATLPLLCA[G/R] | rs2289702 |
| ACC1y / A24 | DYLQYVLQI | immunogenic | BCL2A1 | DYLQ[C/Y]VLQI | rs1138357 |
| ACC1c / A24 | DYLQCVLQI | immunogenic | | | |
| ACC1c+ cystinylated | DYLQCVLQI | immunogenic | | | |
| HB1h / B44 | EEKRGSLHVW | immunogenic | HMHB1 | EEKRGSL[H/Y]VW | rs161557 |
| ACC2d / B44 | KEFEDDIINW | immunogenic | BCL2A1 | KEFED[G/D]IINW | rs3826007 |
| ACC2g / B44 | KEFEDGIINW | allelic counterpart | | | |
| LB-ECGF1-1 H/B7# | RPHAIRRPLAL | immunogenic; ARF | TYMP (ECGF1) | RP[H/R]AI[R/C]RPLAL | no entry; rs1061205 |
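The bracket notation in the "Polymorphic AA" column, such as VL[R/H]DDLLEA, encodes several allelic peptides in one string. As a minimal sketch (the helper name `expand_variants` is hypothetical and not part of the original work), the notation can be expanded into the individual peptide sequences like this:

```python
import re
from itertools import product


def expand_variants(pattern):
    """Expand 'VL[R/H]DDLLEA'-style notation into all allelic peptides."""
    # Split into fixed stretches and bracketed alternatives, keeping the brackets.
    parts = re.split(r"(\[[^\]]+\])", pattern)
    options = [
        part[1:-1].split("/") if part.startswith("[") else [part]
        for part in parts
    ]
    # Combine one choice per polymorphic position.
    return ["".join(choice) for choice in product(*options)]


print(expand_variants("VL[R/H]DDLLEA"))      # two alleles of HA1 / A2
print(expand_variants("[R/P]TLDKVLE[V/I]"))  # two positions, four combinations
```

A peptide with two polymorphic positions yields four candidate sequences, which shows how quickly annotated variation multiplies the number of entries to search.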
Summary of the searches with the test set of known MiHA against the IPI, MSIPI, PepHum, and HSPV databases. The peptide names and sequences are given together with the charge of the precursor submitted to tandem mass spectrometry. For each database, three columns are displayed: (1) whether the peptide is present in the database (Pr?); (2) the Mascot ion score assigned to the tandem mass spectrum (black filling if the score is above the threshold of the search); and (3) the evaluation, i.e., whether the tandem mass spectrum was matched to the correct peptide (black filling and "Y" if correct and above the Mascot threshold (cut-off score); gray filling and "ye" if correct but below the Mascot threshold). In short, the blacker the better. The HSPVdb scores very well because of its reduced format in combination with a high density of relevant SNP information. Wr, wrong interpretation of the MS2 spectrum; np, no matching/no proposal from the Mascot search. (a) Names according to http://www.lumc.nl/dbminor. # Charge state 4+ was the most abundant in the charge distribution of peptide LB-ECGF-1H, but its MS2 spectrum was of such poor quality that it was not included for database searching. LB-ADIR peptides are from an ARF. ACC1+ Cys represents a special case in which the cysteine residue in the epitope can be modified by formation of an S–S bridge with free cysteines. This is relevant for both in vivo recognition and mass spectrometric interpretation.
Overview of the databases used in this study, listing the number of entries and the number of amino acid residues present in each database. In addition, the presence of ARFs and the (type of) SNP information in the various databases is indicated. The number of residues of each database relative to the IPI database and the relative size of the HSPV subsets are given. The number of SNPs in MSIPI 3.67 is 170,242; the number of SNPs in HSPVdb (subsets 1 and 5) is 380,182.
| Database | Number of sequences | Number of residues | Size relative to IPI 3.69 | ARFs? | 0/1? | Unk? |
|---|---|---|---|---|---|---|
| IPI (HUMAN v3.69) | 87130 | 35200044 | 1.00 | – | – | |
| MSIPI (HUMAN v3.67) | 87040 | 42553286 | 1.21 | ✓ | ✓ | |
| PepHum | 75237 | 176019757 | 5.00 | ✓ | ✓ | ✓ |
| HSPV | 2634086 | 45422884 | 1.29 | ✓ | ✓ | ✓ |
| | | | Rel. to set 5 | | | |
| HSPV subset 1 | 423015 | 8344552 | 0.18 | ✓ | ✓ | |
| HSPV subset 2 | 377269 | 7440614 | 0.16 | ✓ | ||
| HSPV subset 3 | 106379 | 2108989 | 0.05 | |||
| HSPV subset 4 | 152125 | 3012927 | 0.07 | ✓ | ||
| HSPV subset 5 | 2634086 | 45422884 | 1.00 | ✓ | ✓ | ✓ |
| HSPV subset 6 | 2378073 | 41106669 | 0.90 | ✓ | ✓ | |
| HSPV subset 7 | 729721 | 12444311 | 0.27 | ✓ | ||
| HSPV subset 8 | 985734 | 16760526 | 0.37 | ✓ | ✓ |
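The relative sizes in the table follow directly from the residue counts. The short check below recomputes them from the numbers listed above; the dictionary keys and the helper name `relative_size` are illustrative choices, not part of the original work.

```python
# Residue counts copied from the table above.
RESIDUES = {
    "IPI": 35200044,
    "MSIPI": 42553286,
    "PepHum": 176019757,
    "HSPV": 45422884,
    "HSPV subset 1": 8344552,
    "HSPV subset 3": 2108989,
    "HSPV subset 6": 41106669,
}


def relative_size(name, reference):
    """Residue count of `name` divided by that of `reference`, rounded to 2 decimals."""
    return round(RESIDUES[name] / RESIDUES[reference], 2)


# Full databases are compared with IPI; HSPV subsets with the full HSPV (subset 5).
for name in ("MSIPI", "PepHum", "HSPV"):
    print(name, relative_size(name, "IPI"))
for name in ("HSPV subset 1", "HSPV subset 3", "HSPV subset 6"):
    print(name, relative_size(name, "HSPV"))
```

The check confirms that the subset ratios are computed on residue counts rather than on the number of sequences.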
Fig. 1 a Summary of the searches with 1-ppm accuracy against the IPI, MSIPI, PepHum, and HSPV databases. The color coding is as follows: black, correct hit and above the MASCOT significance threshold; gray, correct hit but below the significance threshold. b Summary of the searches against the HSPV database with various mass measurement accuracies: 1, 2, 5, 10, and 50 ppm. The color coding is as above
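A mass accuracy given in ppm translates into an absolute mass window that scales with the precursor mass. The following sketch shows that arithmetic for the accuracies used in the figure; the function names and the example mass of 1000.0 are hypothetical, and this is not how the search engine itself is implemented.

```python
def ppm_window(mass, ppm):
    """Return (lower, upper) bounds of the mass window for a tolerance in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def within_tolerance(observed, theoretical, ppm):
    """True if the observed mass lies within `ppm` of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * ppm / 1e6


# Window width for the accuracies used in the searches, at an example mass.
for ppm in (1, 2, 5, 10, 50):
    low, high = ppm_window(1000.0, ppm)
    print(f"{ppm:>2} ppm: {low:.4f} to {high:.4f}")
```

A wider window admits more candidate peptides per spectrum, which is one reason the number of confident identifications can change with the chosen accuracy.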
Fig. 2 Number of incorporated SNPs per release of RefSeq (a) and of MSIPI (b)
Exclusive peptides with selected information from the HSPVdb. Peptides are either in frame (y) or in an ARF (n). The position of the SNP is indicated in the column SNP. In addition, the heterozygosity (Het) and the NetMHC score are given
| Peptide | mRNA | Gene | Protein | rel2cds | In frame | dbSNP | SNP | Het | NetMHC |
|---|---|---|---|---|---|---|---|---|---|
| FLIPKTLVGV | NM_017700 | FLJ20184 | NP_060170.1 | downstream | y | rs2121558 | FLIPKTLVG[E/V] | 0.47 | 9 |
| SLSDLIYAL | NM_001080837 | SEBOX | NP_001074306.2 | inside | y | rs9910163 | SLSDLIYA[L/S] | 0.13 | 7 |
| GLWEQENHL | NM_024713 | C15orf29 | NP_078989.1 | inside | y | rs34998154 | GLW[E/K]QENHL | 0.05 | 41 |
| FIVTVIHTI | NM_024607 | PPP1R3B | NP_078883.2 | downstream | n | rs330915 | FIVTVIHT[I/F] | 0.49 | 30 |
| FLSEHPNVTL | NM_145298 | APOBEC3F | NP_660341.2 | inside | y | rs17000697 | FL[A/S]EHPNVTL | 0.28 | 19 |
| FLNQRSIML | NM_030956 | TLR10 | NP_112218.2 | upstream | n | rs9998678 | FLNQ[R/W]SIML | 0.05 | 29 |
| LLQSLVSI | NM_198889 | ANKRD17 | NP_942592.1 | inside | n | rs6855349 | LLQS[S/L]VSI | 0.46 | 46 |
| TLLDPNEKYLL | NM_016243 | CYB5R1 | NP_057327.2 | inside | y | rs2232842 | TLLDP[N/S]EKYLL | 0.31 | 31 |
Fig. 3 Screenshots showing the output of a query for the peptides SVAPALALFPA (upper panel) and TLSELHCD (lower panel). They clearly illustrate the effect of the large number of annotated variations at the amino acid level