Ilanit Gutman1, Ron Gutman1, John Sidney1, Leila Chihab1, Michele Mishto2,3, Juliane Liepe4, Anthony Chiem5, Jason Greenbaum6, Zhen Yan6, Alessandro Sette1,7, Zeynep Koşaloğlu-Yalçın1, Bjoern Peters1,7.
Abstract
Synthetic peptides are commonly used in biomedical science for many applications in basic and translational research. While peptide synthesis is generally easy and reliable, the chemical nature of some amino acids as well as the many steps and chemical compounds involved can render the synthesis of some peptide sequences difficult. Identification of these problematic sequences and mitigation of issues they may present can be important for the reliable use of peptide reagents in several contexts. Here, we assembled a large dataset of peptides that were synthesized using standard Fmoc chemistry and whose identity was validated using mass spectrometry. We analyzed the mass spectra to identify errors in peptide syntheses and sought to develop a computational tool to predict the likelihood that any given peptide sequence would be synthesized accurately. Our model, named Peptide Synthesis Score (PepSySco), is able to predict the likelihood that a peptide will be successfully synthesized based on its amino acid sequence.
Year: 2022 PMID: 35847273 PMCID: PMC9280948 DOI: 10.1021/acsomega.2c02425
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Examples of MS1 spectra of failed and successful peptide syntheses. (A) MS1 spectrum of a successful synthesis. A single peak at 100% relative abundance appears at a molecular weight of 1644.6 Da, which matches the molecular weight of the expected synthesized peptide. (B) MS1 spectrum of a lower quality synthesis. The peak at 100% relative abundance, at a molecular weight of 1264.9 Da, matches the molecular weight of the expected synthesized peptide. There are, however, additional peaks at other molecular weights with lower relative abundance values. (C) MS1 spectrum of a problematic synthesis. The molecular weight of the expected synthesized peptide is 1566.1 Da, which corresponds to the right-most peak. This peak is only fourth in relative abundance, and there are several other peaks at higher and lower relative abundance values. Those peaks are associated with lower molecular weights, indicating the presence of shorter peptides.
Figure 2. Abundance ratio histogram. We calculated the “abundance ratio” as the sum of the relative abundance values of the m/z peaks matching the ordered peptide divided by the sum of the relative abundance values of all m/z peaks observed in the MS1 spectrum. We considered MS1 spectra with an abundance ratio <50% (red line) as indicative of problematic (or failed) syntheses and MS1 spectra with an abundance ratio >50% as successful. The MS1 spectra for 87% of the analyzed peptides met the abundance ratio threshold of >50% and were considered as successful syntheses.
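The abundance ratio described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the `(m/z, relative abundance)` peak representation, and the matching tolerance `tol` are all assumptions, and the treatment of a ratio of exactly 50% (counted here as not successful) is also an assumption, since the caption only covers values below and above the threshold.

```python
def abundance_ratio(peaks, expected_mz, tol=0.5):
    """Fraction of the total relative abundance carried by expected peaks.

    peaks       -- list of (mz, relative_abundance) tuples from one MS1 spectrum
    expected_mz -- m/z values expected for the ordered peptide
    tol         -- matching tolerance (hypothetical value, not from the source)
    """
    total = sum(abundance for _, abundance in peaks)
    if total == 0:
        raise ValueError("spectrum has no signal")
    matched = sum(
        abundance
        for mz, abundance in peaks
        if any(abs(mz - expected) <= tol for expected in expected_mz)
    )
    return matched / total


def is_successful(peaks, expected_mz, threshold=0.5, tol=0.5):
    """Classify a synthesis as successful when the ratio exceeds the threshold."""
    return abundance_ratio(peaks, expected_mz, tol) > threshold
```

For a spectrum with the expected peak at 100% relative abundance and two side peaks at 30% and 20%, the ratio is 100 / 150, which is above the 50% cut-off.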
Figure 3. Histogram of discrepancies between expected and measured peptide molecular weight. The histogram shows the differences between the molecular weights of the peaks present in a representative spectrum and the molecular weight of the desired peptide. The values are rounded to the nearest 2. A difference of 0, which indicates an exact match, was removed from this chart for visualization purposes. Difference values higher than 0 indicate molecular weight gained by the peptide. Difference values lower than 0 indicate molecular weight lost by the peptide. Possible explanations for various gains and losses are provided.
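The binning behind such a histogram might look like the following sketch. It assumes peak masses are already available as plain numbers; the function name and the `bin_width` parameter are hypothetical. Note that Python's built-in `round` resolves exact ties to the nearest even integer, which may differ from the rounding rule used for the published figure.

```python
from collections import Counter


def mass_difference_histogram(peak_masses, expected_mass, bin_width=2):
    """Count peaks by their mass difference from the expected peptide.

    Differences are rounded to the nearest multiple of bin_width, and the
    zero bin (exact matches) is dropped, mirroring the figure description.
    """
    counts = Counter()
    for mass in peak_masses:
        diff = mass - expected_mass
        binned = bin_width * round(diff / bin_width)
        if binned != 0:
            counts[binned] += 1
    return dict(counts)
```

Positive keys in the result correspond to mass gained relative to the desired peptide, and negative keys to mass lost.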
Figure 4. (A) ML process. Shows the process of finding the best features and the best ML algorithm and validating the trained model against other data sources. (B) 10-fold cross-validation of different predictors. This set of runs shows that naive Bayes consistently performed best. (C) ROC of the final prediction model. Best performance was achieved using a naive Bayes classifier considering the peptide length and amino acid properties according to the Janin index.
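To make the idea of a naive Bayes classifier on sequence-derived features concrete, here is a small pure-Python sketch. It is not the PepSySco implementation: the Gaussian variant, the two-feature layout (peptide length plus one averaged per-residue property value), and the toy numbers in the usage note are all assumptions chosen for illustration, and no real Janin index values are used.

```python
import math


def fit_gaussian_nb(X, y):
    """Fit a Gaussian naive Bayes model.

    X -- list of feature vectors, e.g. [length, mean_property_value]
    y -- list of class labels (1 = successful synthesis, 0 = failed)
    Returns {label: (prior, feature_means, feature_variances)}.
    """
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_proba(model, x, positive=1):
    """Posterior probability of the positive class for one feature vector."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, var in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_post[label] = ll
    top = max(log_post.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    return weights[positive] / sum(weights.values())
```

With toy training rows such as `[[9, -0.2], [9, -0.1], [10, -0.3]]` labeled 1 and `[[15, 0.4], [15, 0.5], [14, 0.3]]` labeled 0, `predict_proba` returns a value close to 1 for `[9, -0.2]` and close to 0 for `[15, 0.4]`. These rows are invented and carry no empirical meaning.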
Validation Dataset, Success Rate Threshold per Peptide Length
| peptide length (rows) / success rate threshold (columns) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 100.0% | 99.9% | 99.6% | 99.4% | 98.4% | 97.2% | 94.6% | 89.0% | 79.8% | 70.9% |
| 10 | 99.6% | 99.2% | 98.8% | 97.7% | 95.4% | 91.9% | 85.8% | 76.0% | 65.2% | 53.1% |
| 15 | 96.3% | 91.3% | 85.0% | 77.0% | 69.9% | 62.5% | 55.2% | 47.7% | 41.1% | 33.9% |
| total | 99.2% | 98.3% | 97.0% | 95.1% | 92.6% | 89.5% | 84.7% | 77.1% | 67.6% | 57.7% |
For each peptide length and each success rate threshold, the fraction of peptides passing the threshold is summarized.
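A table of this shape can be produced from per-peptide success rates with a short routine like the one below. This is a sketch under stated assumptions: the function name and input layout are hypothetical, and the comparison is taken to be inclusive (`>=`) because the column for threshold 1 is nonzero, which would be impossible with a strict comparison on rates capped at 1.

```python
def fraction_passing(success_rates_by_length, thresholds):
    """Fraction of peptides whose success rate meets each threshold.

    success_rates_by_length -- {peptide_length: [success_rate, ...]}
    thresholds              -- iterable of threshold values in [0, 1]
    Returns {length_or_"total": {threshold: fraction}}.
    """
    groups = dict(success_rates_by_length)
    # Pool all peptides for the "total" row.
    groups["total"] = [r for rates in success_rates_by_length.values() for r in rates]
    return {
        key: {t: sum(r >= t for r in rates) / len(rates) for t in thresholds}
        for key, rates in groups.items()
    }
```

Each row of the result corresponds to one peptide length, plus a pooled `"total"` row, matching the layout of the table.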
Figure 5. ROC analysis of peptide synthesis success predictions on the independent MS2-based dataset. We used PepSySco to predict the likelihood of a successful synthesis for all peptides in the validation dataset and performed an ROC analysis considering the different MS2 success rate thresholds (shown here in different colors).
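An ROC analysis repeated over several success rate thresholds can be sketched as follows. The code assumes one predicted score and one measured MS2 success rate per peptide; the function names are hypothetical, and the area under the curve is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative) rather than by tracing the curve.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_per_threshold(scores, success_rates, thresholds):
    """Binarize success rates at each threshold and compute one AUC per threshold."""
    return {
        t: roc_auc(scores, [rate >= t for rate in success_rates])
        for t in thresholds
    }
```

The pairwise computation is quadratic in the number of peptides, which is acceptable for a sketch; a sort-based rank computation would be preferable for large datasets.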
Figure 6. ROC analysis for PepSySco and the ThermoFisher Scientific tool for (A) our training dataset and (B) the independent MS2-based validation dataset.