Michele Mishto, Juliane Liepe, Gerd Specht, Hanna P Roetschke, Artem Mansurkhodzhaev, Petra Henklein, Kathrin Textoris-Taube, Henning Urlaub.
Abstract
Proteasomes are the main producers of antigenic peptides presented to CD8+ T cells. They can cut proteins and release the resulting fragments, or recombine non-contiguous fragments and thereby generate novel sequences, i.e. spliced peptides. Understanding the driving forces and sequence preferences of both reactions can streamline target discovery in immunotherapies against cancer, infection and autoimmunity. Here, we present a large database of spliced and non-spliced peptides generated by proteasomes in vitro, which is available as a simple CSV file and as a MySQL database. To generate the database, we performed in vitro digestions of 55 unique synthetic polypeptide substrates with different proteasome isoforms and experimental conditions. We measured the samples using three mass spectrometers, filtered and validated putative peptides, and identified 22,333 peptide product sequences (15,028 spliced and 7,305 non-spliced product sequences). Our database and datasets have been deposited in the Mendeley (doi:10.17632/nr7cs764rc.1) and PRIDE (PXD016782) repositories. We anticipate that this unique database can be a valuable source for predictors of proteasome-catalyzed peptide hydrolysis and splicing, with various future translational applications.
Year: 2020 PMID: 32415162 PMCID: PMC7228940 DOI: 10.1038/s41597-020-0487-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. Proteasome-catalyzed peptide hydrolysis and peptide splicing. Proteasomes form peptide fragments by: (a) peptide hydrolysis; (b,c) cis peptide splicing, when the two splice-reactants derive from the same polypeptide molecule, in which case peptide fragment ligation can occur in normal order, i.e. following the orientation from N- to C-terminus of the parental protein (normal cis peptide splicing; b), or in reverse order (reverse cis peptide splicing; c); and (d) trans peptide splicing, when the two splice-reactants originate from two distinct molecules of the same protein or from two distinct proteins. The two fragments that are bound together during the peptide splicing reaction are named splice-reactants. The portion between two splice-reactants is called the intervening sequence. This schematic summarizes the residue positions (from nsP4 to nsP4’ for non-spliced peptides and from sP4 to sP4’ for spliced peptides) that were examined for the position frequency matrices and that seem to be relevant in proteasome-catalyzed peptide hydrolysis and peptide splicing reactions. Arrows represent the substrate cleavage sites used by the proteasome catalytic Thr1.
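The distinctions drawn in this caption can be sketched in code. The snippet below is only an illustration, not the authors' pipeline: the `Reactant` type, the `classify_splice` function and the 1-based inclusive position convention are all assumptions introduced here. It also assumes that two overlapping splice-reactants cannot come from the same substrate molecule and therefore labels them as trans, which is an inference from the caption rather than a rule stated in it.

```python
from typing import NamedTuple, Optional, Tuple


class Reactant(NamedTuple):
    """Hypothetical splice-reactant, as 1-based inclusive substrate positions."""
    start: int
    end: int


def classify_splice(r1: Reactant, r2: Reactant) -> Tuple[str, Optional[int]]:
    """Classify a product built from r1 (N-terminal) followed by r2 (C-terminal).

    Returns a label and the intervening sequence length (None when undefined).
    """
    if r2.start == r1.end + 1:
        # The two parts are adjacent in the substrate: a plain hydrolysis product.
        return "non-spliced", 0
    if r1.end < r2.start:
        # r1 lies upstream of r2: ligation follows the substrate orientation.
        return "normal cis", r2.start - r1.end - 1
    if r2.end < r1.start:
        # r2 lies upstream of r1: ligation in reverse order.
        return "reverse cis", r1.start - r2.end - 1
    # Overlapping reactants (assumption: they must come from two molecules).
    return "trans", None


print(classify_splice(Reactant(2, 5), Reactant(9, 12)))
print(classify_splice(Reactant(9, 12), Reactant(2, 5)))
```

In the first call, residues 6 to 8 form the intervening sequence, so the function reports a normal cis product with an intervening length of 3; swapping the two reactants gives the reverse cis case with the same length.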
Fig. 2. Database content and construction. (a) Database content. The database consists of peptide products identified in digestions of 55 unique substrates with 20S and 26S s- and i-proteasomes at 4 h and 20/24 h digestion time. Shown is the main part, which represents all digestions with 20S proteasomes. Values refer to product sequences (identified per sample). Several product sequences are identified in multiple experimental conditions using the same substrate. Therefore, the number of unique peptide sequences, i.e. peptide sequences identified per substrate, is smaller than that of product sequences. (b) Length distribution of synthetic polypeptide substrates included in the database. (c) Matrices of the amino acid frequency of the synthetic polypeptide substrates included in our database and of the human proteome database. (d) Overview of the data processing pipeline to construct the database. (e) Identification of spliced and non-spliced peptides from MS datasets, which was applied to assign peptides to each MS/MS spectrum.
Table 1. Database description.
| Column name | Description |
|---|---|
| sampleID | Unique identifier for every sample |
| sampleName | Sample name used during the experiment |
| runID | Technical replicate number |
| protIsotype | Proteasome isoform used for digestion |
| digestTime | Elapsed digestion time (hours) at time of measurement |
| species | Species of origin of the proteasome used |
| sampleDate | Sample date |
| instrument | Instrument used for measurement |
| fragmentation | Fragmentation method used for measurement |
| substrateOrigin | Biological origin of substrate |
| location | Measurement location |
| substrateSeq | Amino acid sequence of substrate |
| substrateID | Unique identifier for a substrate sequence |
| pepSeq | Amino acid sequence of peptide products |
| scanNum | Scan Number listed in the RAW file |
| rankMS | Peptide rank assigned by Mascot Server |
| ionScore | Ion Score assigned by Mascot Server |
| qValue | Q-Value assigned by Mascot Server |
| productType | PCP: non-spliced peptide; PSP: spliced peptide |
| spliceType | PSP: |
| positions | Peptide sequence described as amino acid positions in the substrate sequence |
| charge | Ion charge |
| PTM | Post-translational modifications |
Listed are the column names (i.e. attributes) in the database and their corresponding explanations.
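As a rough illustration of how these columns might be used, the sketch below counts product sequences (identified per sample) and unique peptide sequences (identified per substrate) for each `productType`. It assumes the CSV header uses exactly the column names listed above. The embedded rows are invented for the example and cover only a few columns, and `summarize` is a hypothetical helper, not part of the deposited database.

```python
import csv
import io
from collections import Counter

# Invented example rows; a real export would contain all listed columns.
SAMPLE_CSV = """sampleID,substrateID,digestTime,pepSeq,productType
S1,sub01,4,AAKLV,PCP
S1,sub01,4,AAKRV,PSP
S2,sub01,20,AAKLV,PCP
S2,sub01,20,KLVGG,PSP
"""


def summarize(handle):
    """Count sequences per productType, per sample and per substrate."""
    per_sample = set()     # distinct (sampleID, pepSeq, productType)
    per_substrate = set()  # distinct (substrateID, pepSeq, productType)
    for row in csv.DictReader(handle):
        per_sample.add((row["sampleID"], row["pepSeq"], row["productType"]))
        per_substrate.add((row["substrateID"], row["pepSeq"], row["productType"]))
    product_counts = Counter(ptype for _, _, ptype in per_sample)
    unique_counts = Counter(ptype for _, _, ptype in per_substrate)
    return product_counts, unique_counts


product_counts, unique_counts = summarize(io.StringIO(SAMPLE_CSV))
print("product sequences:", dict(product_counts))
print("unique sequences:", dict(unique_counts))
```

To run this on the deposited CSV, pass an opened file handle to `summarize` instead of the `io.StringIO` object. The non-spliced peptide found in both example samples counts twice as a product sequence but only once as a unique sequence.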
Fig. 3. Database validations and characteristics of spliced and non-spliced peptide products. (a-c) Comparison of measured and predicted retention time of non-spliced, cis spliced and trans spliced peptides identified in our database. Non-spliced peptides were used to train a retention time model (a), which was then used to predict the retention times of identified cis spliced (b) and trans spliced peptides (c). (d,e) Relative frequency (d) and length distribution (e) of non-spliced, cis spliced and trans spliced unique peptides generated after 20/24 h digestion by 20S s-proteasomes. This analysis was done on unique peptide sequences (i.e. unique sequences identified per substrate). For database validation, the length distributions of synthesis artifacts (d,e) and of the random control dataset (e) are shown. (f,g) Length distribution of N-terminal (splice-reactant 1) and C-terminal (splice-reactant 2) splice-reactants (f) and intervening sequence length distribution (g) of spliced peptide products detected in 20/24 h in vitro digestions with 20S s-proteasomes. For comparison, the length distributions of synthesis artifacts and of the random control dataset are shown. Statistically significant comparisons are labeled with * and the related p-values are reported in Table 2. (h,i) Matrices of the amino acid frequency, at the positions enumerated in Fig. 1, of non-spliced and spliced peptide products generated by 20S s-proteasomes after 20/24 h (h) and of synthesis artifacts identified in control samples (i).
Table 2. Tests for statistical differences between characteristics of identified peptides.
| Group 1 | Group 2 | p-value |
|---|---|---|
| Non-spliced products | Non-spliced synthesis artifacts | <2e-16 |
| Non-spliced products | Non-spliced random control | <2e-16 |
| Non-spliced synthesis artifacts | Non-spliced random control | <2e-16 |
| <2e-16 | ||
| <2e-16 | ||
| <2e-16 | ||
| <2e-16 | ||
| <2e-16 | ||
| <2e-16 | ||
| Non-spliced products | <2e-16 | |
| <2e-16 | ||
| Non-spliced products | <2e-16 | |
| Normal | Normal | <2e-16 |
| Normal | Normal | <2e-16 |
| Normal | Normal | <2e-16 |
| Reverse | Reverse | <2e-16 |
| Reverse | Reverse | 6.3e-16 |
| Reverse | Reverse | 7.9e-13 |
| <2e-16 | ||
| <2e-16 | ||
| 1.8e-11 | ||
| Normal | Normal | <2e-16 |
| Normal | Normal | 0.055 |
| Normal | Normal | <2e-16 |
| Reverse | Reverse | <2e-16 |
| Reverse | Reverse | <2e-16 |
| Reverse | Reverse | <2e-16 |
| <2e-16 | ||
| <2e-16 | ||
| <2e-16 | ||
| Normal | Normal | 0.013 |
| Normal | Normal | <2e-16 |
| Normal | Normal | <2e-16 |
| Reverse | Reverse | 0.042 |
| Reverse | Reverse | 0.014 |
| Reverse | Reverse | 0.14 |
Group 1 is compared to group 2 using the Kolmogorov-Smirnov test. The resulting p-values are listed.
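For readers unfamiliar with this comparison, the following is a minimal standard-library sketch of a two-sample Kolmogorov-Smirnov test as it could be applied to two sets of peptide lengths. The source does not say which software produced the p-values above, so this is not the authors' implementation. The function name `ks_two_sample` is hypothetical, and the p-value uses the asymptotic Kolmogorov series, which is only approximate for small samples or heavily tied data such as integer lengths.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Return the two-sample KS statistic D and an asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # D is the largest gap between the two empirical CDFs,
    # evaluated at every observed value.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in set(xs) | set(ys)
    )
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return d, 1.0
    # Asymptotic tail probability: 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


# Invented length samples, for illustration only.
d, p = ks_two_sample([8, 9, 9, 10, 11, 12], [14, 15, 15, 17, 18, 20])
print(d, p)
```

A small p-value indicates that the two length distributions differ, which is how the entries in the table above are read.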
| Measurement(s) | peptide |
| Technology Type(s) | mass spectrometry |
| Factor Type(s) | spliced/non-spliced • instrument • synthetic polypeptide • proteasome isoform • time of reaction |
| Sample Characteristic - Organism | Homo sapiens |