Literature DB >> 35609248

Accurate Prediction of y Ions in Beam-Type Collision-Induced Dissociation Using Deep Learning.

HyeonSeok Shin1, Youngmin Park1, Kyunggeun Ahn1, Sungsoo Kim1.   

Abstract

Peptide fragmentation spectra contain critical information for the identification of peptides by mass spectrometry. In this study, we developed an algorithm that more accurately predicts the high-intensity peaks among the peptide spectra. The training data are composed of 180,833 peptides from the National Institute of Standards and Technology and Proteomics Identification database, which were fragmented by either quadrupole time-of-flight or triple-quadrupole collision-induced dissociation methods. Exploratory analysis of the peptide fragmentation pattern was focused on the highest intensity peaks that showed proline, peptide length, and a sliding window of four amino acid combination that can be exploited as key features. The amino acid sequence of each peptide and each of the key features were allocated to different layers of the model, where recurrent neural network, convolutional neural network, and fully connected neural network were used. The trained model, PrAI-frag, accurately predicts the fragmentation spectra compared to previous machine learning-based prediction algorithms. The model excels at high-intensity peak prediction, which is advantageous to selective/multiple reaction monitoring application. PrAI-frag is provided via a Web server which can be used for peptides of length 6-15.

Entities:  

Mesh:

Substances:

Year:  2022        PMID: 35609248      PMCID: PMC9178553          DOI: 10.1021/acs.analchem.1c03184

Source DB:  PubMed          Journal:  Anal Chem        ISSN: 0003-2700            Impact factor:   8.008


Proteomics is a growing field of research that has greatly benefited from the advances in tandem mass spectrometry (LC–MS/MS) technology. Increased throughput, accuracy, and resolution of MS/MS have enabled the rapid transition from peptide discovery to industrial and clinical applications.[1] One of the breakthroughs in MS/MS technology lies in the process of peptide fragmentation and isolation of its fragments. Historically used collision-induced dissociation (CID) with ion trap CID (IT-CID), also known as resonance-type CID, uses relatively low energy for fragmentation. Beam-type CID, a fragmentation method that uses relatively higher collision energy (CE), such as triple quadrupole (QqQ), quadrupole time of flight (QTOF), and higher energy collision dissociation (HCD) were developed later.[2−4] Among the different fragmentation methods, QqQ, QTOF, and HCD methods have gained attention as higher CE increased the detection of y ions which enabled the detection and quantification of low-abundant proteins.[2,3,5,6] The differences in fragmentation methods, instruments, number of collisions, and CE strength lead to different patterns of fragmentation. The IT-CID method leads to different fragmentation patterns compared to beam-type CIDs at high CE settings, whereas similar spectra are obtained on QqQ and QTOF.[3] To identify peptides from different fragmentation methods, database generation for each fragmentation method would be required. Unless sufficient databases have been made, the database search method may result in a large fraction of unidentified spectra, which led to increased development of more powerful prediction algorithms.[7,8] In particular, HCD method has recently been widely used for scanning complex data in a high-throughput manner, leading to an explosive increase in available data. Along with the subtle but unignorably different fragmentation pattern of HCD compared to other types of CID, deep learning has recently been employed in several studies to predict the fragmentation spectra. Deep learning and machine learning have been utilized to predict fragment patterns in several studies.[9,10] As peptides are composed of amino acids, the amino acid combination is approached in a similar manner to the time series or natural language data, which is highly applicable for recurrent neural network (RNN) models.[11−14] Furthermore, sequence data are converted into image or vector data with additional information, as features have been utilized for applications to convolutional neural networks and existing machine learning algorithms.[12,15] Regardless of the trend in the research community, clinical industries are showing more interest toward selective/multiple reaction monitoring (SRM/MRM) MS/MS technology which uses QqQ CID as the fragmentation method.[16] The relatively lower resolution of MRM is compensated by its reproducibility and the ability to quantitate multiple biomarkers in a single run, which makes MRM a favorable instrument for clinical application. To this end, we developed a deep learning algorithm that is more suitable for such applications. The model is composed of RNN, convolutional neural network (CNN), and fully connected neural network (FCN), each allocated to appropriate features. The algorithm was more specifically fitted to QTOF CID fragmentation and trained to focus on the prediction of y ions (peptide fragments on the C-terminus) that are expected to be measured at higher intensities by MS/MS.[17]

Methods

Data Preparation

The data used for training in this study are composed of the National Institute of Standards and Technology (NIST) collision cell library quadrupole time-of-flight of human data (2012-04-20), NIST yeast data (2012-04-17), data from the Proteomics Identification (PRIDE) study of human proteins by SWATH-MS (PXD000954), and 647 laboratory-synthesized peptides that are part of Pep-Quant library (Bertis).[18,19] The NIST rat data (2013-06-05) were used for overfitting validation and model evaluation.[18] Additional evaluation of the models was performed on the PRIDE data PXD001587 and PXD008651.[20,21] All obtained data were first parsed to the NIST database format. For peptides that are found in more than one database, a single database with the highest number of peaks in the spectrum was used. Modifications, such as carbamidomethylation on cysteine, were ignored and encoded as unmodified sequences. Peptides with the same sequence but different charged states were considered unique peptides. The amino acid sequence of each peptide was one-hot encoded to 20 numbers, each representing an amino acid. The b ions were removed. The peak intensities for a peptide were transformed to a size of (1,42) tensor, where each y ion was allocated three times to annotate the charge state of 1 to 3, from Y1 to Y14. The missing values that were highly unlikely or impossible to exist, such as Y14 ion for peptide length 8, were filled with −1. Otherwise, the missing values were filled with 0.

Training Environment

The deep learning model was generated using Python ver. 3.8.3 environment with Python libraries, Torch ver. 1.7.1,[22] numpy, 1.18.5,[23] pandas ver. 1.2.2, pyteomics ver.4.3.2,[24] sklearn ver.0.0,[25] pyYAML ver. 5.4.1, and easydict ver. 1.9. Unless otherwise mentioned, the training was performed via PyTorch model pipelines with MSEloss (mean squared error loss, reduction = “mean”), Adam optimizer (LR = 0.01),[26] ReduceLROnPlateau scheduler of mode “min”, gamma value of 0.1, and patience of 7. The training was performed using GeForce RTX 3090 (NVIDIA) with CUDA ver. 11.2.

Training Model Structure

The training model initially used the peptide amino acid sequence, charge, and CE as input values (Supporting Information Data S1). Using the input values, the length of the peptide and the counted number of proline residues in the peptide were calculated to be added as an extra feature in the model. In addition, the sliding window of the 4-mers were generated as vectors and added as another feature. The structure of the training model is a combination of RNN, FCN, and CNN with conventional layers as follows; the one-hot encoded sequence was weighted by a GRU bidirectional layer with a hidden size of 128. Part of the hidden information was stored in memory, and part of the information was passed through dot product attention layers that focus on the important features obtained from GRU, which are merged to the products of the FCN and CNN. The information from CE, charge, peptide length, and proline residue count were inputted through a FCN layer. The initial linear vector form of the sliding window of 4-mers is transformed to a 2D matrix of 4 × 12. The transformed 2D data were convoluted through three CNN layers. The results from the FCN and CNN are concatenated first and merged to the results from the first GRU layer by matrix multiplication, which was decoded by the second GRU layer. The weights were then processed into a second dot product attention layer and linear layers that form the prediction (Supporting Information Data S2).

K-Fold Validation

To validate the model’s accuracy and check for overfitting, the training data were divided to 10-fold, where 1 out of 10 was used for validation and 9 out of 10 were used for training. During k-fold cross-validation training and validation, the data change for each fold, enabling each data to be used as the validation data at least once. Each fold generated its final model after training, which was evaluated by the Pearson correlation coefficient (PCC) calculation on the NIST rat data.

Model Comparison

Prosit_2020_intensity_hcd, MS2PIP_QTOF and MS2PIP_HCD were used in comparison with our model (Supporting Information Data S3). For our model, from the 10 models obtained from k-fold cross-validation, the fold 2 model was used in PCC comparison, highest peak accuracy comparison, and in the inference model provided by the Web site. PCC was calculated using two methods. The first method, which will be called “without-zero” throughout the manuscript, uses input values that exist in the database (Supporting Information Figure S1a,b). For example, for a peptide “A” of length 9 and charge 2, the spectrum information for Y8, Y7, Y6, and Y5 is in the database. Then, the predicted values for Y8, Y7, Y6, and Y5 are compared. The second method, which will be called “with-zero” throughout the manuscript, uses input values with zero filled in for fragments without detected intensity. For peptide “A”, the intensity prediction was compared for Y8+2, Y8, Y7+2, Y7, Y6+2, Y6, 25+2, Y5, Y4+2, Y4, Y3+2, Y3, Y2+2, Y2, and Y1.

Inference

The inference model is available at www.prai.co.kr and https://github.com/bertis-prai/prai-test. The only required inputs are the peptide sequences, and the enabled inputs are charge and CE value, which are filled in if missing (Supporting Information Data S1).

Results and Discussion

Input Data Preparation

To build a predictive model for peptide fragmentation, we assembled a database from several QTOF data which show high similarity to QqQ-CID. The human and yeast data from the NIST peptide library, data from a study of human proteins by SWATH-MS (PXD000954), and data from laboratory-synthesized peptides were assembled to construct the initial training database.[18,19] The initial database was parsed to the NIST database format and further parsed to remove redundant peptides. The removal criteria were dependent on the number of identified fragments per peptide, as more y ion peaks per peptide would greatly benefit the training algorithm (Figure a). Accordingly, we further removed peptides that had less than three spectra per peptide. The generated database for training consisted of 180,833 unique peptides with an average of 5.427 peaks per peptide (Figure b).
Figure 1

Training data preparation. (a) Schematic diagram of the training data preparation. (b) Density plot showing the distribution of the number of peaks per peptide spectrum before and after filtering the input data. The blue and light-yellow filled graphs indicate the distribution before and after filtering, respectively.

Training data preparation. (a) Schematic diagram of the training data preparation. (b) Density plot showing the distribution of the number of peaks per peptide spectrum before and after filtering the input data. The blue and light-yellow filled graphs indicate the distribution before and after filtering, respectively.

Model Development and Peptide Spectra Characterization Using the Highest Peak

The initial model for peptide spectra prediction was developed with a simple-structured RNN model, with two RNN layers acting as an encoder and decoder. During model development, several editions, such as adding attention layers, testing different dropout probabilities, and changing hidden and batch size, have been performed until the model reached a point where its accuracy reached a plateau (Supporting Information Data S2). Theoretically, bidirectional RNNs should be able to learn the important features of the data and remember their weights in one or some of the nodes. However, in many cases, models improve depending on the type of features and weights that have been appropriately placed in the algorithm. To this end, we attempted to characterize the peptide fragmentation pattern by investigating the affiliation between the peptide and its highest peak (production ion with the most abundance per spectrum) after fragmentation. First, the amino acid sequence patterns before and after the site of fragmentation were investigated (Figure a). Heatmap of the amino acid residues show proline residue enrichment at fragment +1 site, regardless of the sequence at the fragment −1 site (Figure b). To investigate the effect of proline at the fragment +1 site, more analysis was performed on the peptides that contain proline residues. Proline residues at the +1 site were found to be more abundant for peptides with a high difference between the highest and second highest fragment intensities (Supporting Information Figure S2a,b). Furthermore, proline residues were highly enriched for product ions that retained the same charge state as the precursor ion. More than 50% of the product ions that retained their charge were composed of proline residues (Figure c). Although the mechanism behind these results is unclear, the existence of proline residues in a peptide showed sufficient anomalies to be taken as an additional feature to the prediction model.
Figure 2

Peptide feature characterization. (a) Schematic diagram of the words describing peptides, peptide fragments, and their location in reference to the fragment with the highest intensity (I.E., y ion with the highest intensity value for the peptide precursor). (b) Heatmap showing the abundance of amino acids for fragment site +1 and fragment site −1 at the vertical and horizontal axes, respectively. (c) Plot of WebLogo showing the probability of amino acid abundance in relation to the fragment site for product ions that retained their charge after fragmentation. (d) Heatmap showing the abundance of the precursor length and highest fragment length at vertical and horizontal axes, respectively. (e) Scatterplot showing the correlation between the amino acid count at fragment +1 sites, for all fragment sites, except for the precursor length −2 and the amino acid count at fragment +1 sites that have the length of precursor length −2.

Peptide feature characterization. (a) Schematic diagram of the words describing peptides, peptide fragments, and their location in reference to the fragment with the highest intensity (I.E., y ion with the highest intensity value for the peptide precursor). (b) Heatmap showing the abundance of amino acids for fragment site +1 and fragment site −1 at the vertical and horizontal axes, respectively. (c) Plot of WebLogo showing the probability of amino acid abundance in relation to the fragment site for product ions that retained their charge after fragmentation. (d) Heatmap showing the abundance of the precursor length and highest fragment length at vertical and horizontal axes, respectively. (e) Scatterplot showing the correlation between the amino acid count at fragment +1 sites, for all fragment sites, except for the precursor length −2 and the amino acid count at fragment +1 sites that have the length of precursor length −2. The fragment −1 site, on the other hand, showed enriched frequency to aliphatic and acidic amino acids. This pattern along with the highly enriched proline residue at the fragment +1 site agrees with previous studies that examined the amino acid sequence of peptide fragmentation.[8] Moreover, if a single sequence pattern shows such patterns, we suspect that the combination of fragment +1 and fragment +2 sites with fragment −1 and fragment −2 sites would also greatly affect the algorithm. Thus, we implemented a sliding window pattern as one of the key features in the developing algorithm. Next, we investigated whether the precursor length asserts any effect on the highest fragment length (Figure d). Interestingly, we found that the length of the highest fragment was considerably more abundant at precursor length −2 and slightly more abundant at precursor length −4. The length-dependent pattern was especially clear for peptides with precursor lengths between 7 and 12. For longer peptides (>12), the enrichment pattern was weakened, and fragment sites were more evenly distributed at the middle region of the peptide. To check whether any amino acid residues are enriched at the precursor length −2, the abundance of amino acids at the precursor length −2 sites and all other sites were compared (Figure e). The correlation graph shows that most amino acid abundances correlate except for proline, which is expected as proline shows an exceptionally high frequency of fragmentation compared to other amino acids. These data indicate that the abundance of amino acids at a given location was similar except for proline, which was enriched indifferentto the precursor length. Thus, the strong influence of the precursor peptide length was added as another feature for the fragment prediction model. The generated model takes in the peptide sequence, charge, and CE as inputs to calculate the characterized features such as length, proline counts, and sliding windows (Figure a). CE, charge, length, and the number of prolines in each peptide were used as feature set 1, where its data flow through a series of linear layers and are concatenated. The feature set 2 is composed of the peptide’s sliding window data, which was shaped linearly at first and then transformed into a two-dimensional layer similar to one-channel image data and trained via CNN (Supporting Information Figure S3). The obtained inputs of peptide sequence and features thus flow through a series of RNNs, CNN, and linear layers to finally predict 42 fragment intensities for a peptide (Supporting Information Table S1). The added features obtained from investigating the highest peak of each peptide enabled better performance of the model (Figure b,c). The initial and final loss values for both training and validation datasets were decreased with the representative features, and the number of epochs required for training was reduced.
Figure 3

Model structure. (a) Schematic representation of the deep learning model structure. From the input data, the amino acid sequences were one-hot encoded and trained via a bidirectional RNN layer. The information from the feature group was fed to the model by a fully connected layer. The amino acid sliding window information was fed to the CNN layer and represented by the colored circles on the top right side of the figure. The line graph showing the change in loss per epoch with and without features for (b) training data and (c) validation data.

Model structure. (a) Schematic representation of the deep learning model structure. From the input data, the amino acid sequences were one-hot encoded and trained via a bidirectional RNN layer. The information from the feature group was fed to the model by a fully connected layer. The amino acid sliding window information was fed to the CNN layer and represented by the colored circles on the top right side of the figure. The line graph showing the change in loss per epoch with and without features for (b) training data and (c) validation data.

Model Evaluation

To evaluate the model, PCC was calculated for each peptide by comparing the predicted spectrum values to the database values. The rat QTOF data from NIST (2013-06-05) was used as the evaluation dataset for 10-fold models (Supporting Information Table S2). NIST rat data were parsed in a similar format to the training data (Supporting Information Data S3). Peptides redundant to the training and validation data were removed, which left 3709 unique peptides with 27,121 spectra after parsing. The average median PCC value obtained from 10-fold cross-validation is 0.944, and the standard deviation of the median PCC value was approximately 0.000129 (Supporting Information Table S2). The low deviation value between the 10-fold models indicates the uniform performance of the trained models, whereas a model with overfitting problem would show high variance from cross-validation. These data indicate that the generated model performs at a similar level of accuracy. The final model (hereinafter PrAI-frag) was evaluated further by comparing the PCC distributions calculated from the prediction of Prosit 2020 HCD, MS2PIP QTOF, and HCD models to the rat QTOF data (Supporting Information Data S3).[9,11,19,27] To apply the Prosit HCD model to the evaluation data, we tested all possible normalized CE (NCE) values for CE calibration (Supporting Information Figure S4). The Prosit HCD model with NCE values of 27 and 28 showed the highest correlation to the NIST rat QTOF data for the without-zero and with-zero calculation methods, respectively. The PCC values of approximately 0.945 and 0.905 for the without-zero and with-zero methods, respectively, indicate the Prosit HCD model’s compatibility to the QTOF data (Figure a,b). The median PCC of PrAI-frag showed the highest PCC among the compared models of MS2PIP and Prosit for both with-zero and without-zero comparisons (Figure a,b). While the overall values of PCC decreased for all models by the with-zero method calculation, PrAI-frag showed the least reduction in PCC values (Figure b). To further analyze the models’ accuracy, we also calculated the PCC for different peptide precursor charge states. All models showed a slightly increased prediction accuracy for charge 2 and decreased prediction accuracy for charge 3 (Supporting Information Table S3). However, the magnitude of accuracy changes differed for the compared models. PrAI-frag and Prosit models showed 0.1–0.2 decrease in PCC, while the MS2PIP model showed a greater decrease in PCC for peptides with charge 3. Nonetheless, these results suggest that PrAI-frag shows robust prediction accuracy.
Figure 4

Model comparison. Violin plots showing the PCC of the prediction results from PrAI-frag, Prosit_2020_intensity_HCD (NCE = 27), Prosit_2020_intensity_HCD (NCE = 28), MS2PIP QTOF, and MS2PIP HCD for the rat QTOF data from the NIST database. (a) PCC calculation performed with “without-zero” data, which uses the input values that only exist in the database. (b) PCC calculation performed with “with-zero” data where zero values are filled in for positions where intensity detection is possible. Only values above zero are shown in the graph.

Model comparison. Violin plots showing the PCC of the prediction results from PrAI-frag, Prosit_2020_intensity_HCD (NCE = 27), Prosit_2020_intensity_HCD (NCE = 28), MS2PIP QTOF, and MS2PIP HCD for the rat QTOF data from the NIST database. (a) PCC calculation performed with “without-zero” data, which uses the input values that only exist in the database. (b) PCC calculation performed with “with-zero” data where zero values are filled in for positions where intensity detection is possible. Only values above zero are shown in the graph. To assess the effect of the prediction accuracy of different models on the downstream process, we designed two tests. The first test simulated an MRM analysis where a few discriminative transitions, that is, the intense peaks such as the highest peaks need to be selected. To this end, we counted the instances where the predicted highest peak for a peptide spectrum was equal to the highest peak from the database, using the NIST rat data. In agreement to the violin plot, PrAI-frag showed the highest accuracy for the prediction of top three highest intensity peaks (Table ).
Table 1

Prediction Accuracy of PrAI-Frag, MS2PIP, and Prosit for High-Intensity Peaksa

 highest peaka (%)top 1 from top 3 (%)top 2 from top 3 (%)
PrAI-frag67.83584.33568.563
PrAI-fragb68.02383.77064.896
Prosit_HCD_2764.08781.01963.629
MS2PIP_HCD51.76679.40154.354
MS2PIP_QTOF55.05579.40157.482

Highest peak, the number of instances where the predicted highest intensity peak is equal to the highest intensity peak from the database; top 1 from top 3, the predicted highest intensity peak is among the top three highest intensity peaks from the database; top 2 from top 3, the number of instances where the predicted top two highest intensity peaks are among the top three highest intensity peaks from the database.

Altered version of PrAI-frag with higher weight implemented to higher intensity peaks.

Highest peak, the number of instances where the predicted highest intensity peak is equal to the highest intensity peak from the database; top 1 from top 3, the predicted highest intensity peak is among the top three highest intensity peaks from the database; top 2 from top 3, the number of instances where the predicted top two highest intensity peaks are among the top three highest intensity peaks from the database. Altered version of PrAI-frag with higher weight implemented to higher intensity peaks. The second test simulated a simplified peptide spectrum match analysis, where peptides with similar precursor ion m/z (±0.5) that also contains at least three product ions with similar m/z (±0.5) are used to generate a candidate group of peptides (Supporting Information Data S3). For complex samples, such as serum, the peptides that belong to the same group would be potential noise peaks for one another during the identification process. For the NIST rat data, the grouping resulted in 1822 groups, with 2.548 peptides per group. We next counted the instances in which the model’s predicted peptide spectrum accurately matches the target peptide, against other peptides in the group. Match was performed by selecting the highest PCC and MSE values calculated from comparing all the predicted peptide spectrums and all peptide spectrums in the database within the group. The test showed that PrAI-frag shows the highest match accuracy among the models (Supporting Information Table S4). For both downstream tests, the accuracy difference between the PrAI-frag and Prosit HCD models (NCE = 27) was in close competition. To confirm the PrAI-frag performance, the first downstream test on the highest peak prediction was performed on two additional QTOF databases (PXD001587 and PXD008651) (Supporting Information Data S3). The highest peak prediction accuracy for the additional database showed that PrAI-frag reproduced the highest prediction accuracy among the compared models (Table S5). It is also noteworthy that the best prediction model for Prosit differed for each database. For PXD001587 and PXD 008651, the Prosit model with NCE values 23 and 25, respectively, showed the best results. Overall, these results suggest that PrAI-frag outperforms other models for high-intensity peak prediction. This is advantageous for applicability to MRM, where fewer and higher intensity peaks are used per peptide. During model development, we also tested an altered version of PrAI-frag to increase the accuracy on specific target fragments. One example was the prediction of the highest peak (Supporting Information Data S4). By generating a second loss function that calculates the MSE for the fragments with the top three intensities and combining the loss function to the original MSE, the model showed slightly increased accuracy for the highest peak prediction. However, the overall correlation suffered more greatly than that expected and was thus unused in the final model (Table ).

Conclusions

Advances in MS/MS technologies and deep learning algorithms have led to the generation of a number of peptide spectrum prediction algorithms. However, to our knowledge, many of the studies used the data obtained from HCD fragmentation, followed by different trapping methods. In this study, we develop a more accurate and QTOF CID-specific peptide fragmentation prediction algorithm. The k-fold cross-validation results showed reproducible training results without overfitting, which also indicated the applicability of the algorithm with additional or different types of data. PrAI-frag was compared with Prosit and MS2PIP’s due to their applicability to beam-type fragmentation methods. The comparison of the predicted accuracies of the peptide spectrum and downstream tests indicated that PrAI-frag is highly robust and accurate. The relationship between the length of the precursor peptide and the fragment length of the highest peak led to unexpected observations. The fragment with the highest intensity was highly enriched at the site of precursor length −2 (Figure d). Similar patterns have been reported for doubly-protonated short peptides (5–7 amino acids) fragmented with IT-CID.[28] The reported protonated oxazolone structure of b2 ions indicates that the precursor peptide in the gas phase leads to a structure, which may increase the chance of fragmentation caused by the electrical potential. On the other hand, the fragment with the highest intensity was also enriched at the sites of precursor lengths −4 and −5, for peptides of lengths 8 to 13.[29] One reason behind the effect of precursor length distribution may be the different protonation distribution to N-term and C-term which may increase the chance of peptide structure formation during acceleration caused by the electric potential difference.[30] Although it is a speculation, such information applied to deep learning models may improve the accuracy of prediction. The model building in the present study was designed to map important features from both the peptide fragmentation pattern and the RNN model itself. Unlike the hidden and automatic feature finding from training, feature exploration from the fragment pattern was controllable. We hypothesized that by utilizing peptide fragments with the highest intensity as the representative feature, extra weight would be enforced on the prediction accuracy of peaks with relatively higher intensity. This hypothesis proved correct, which is evident in the comparison of the actual accuracy of prediction of the top three fragments (Table ). PrAI-frag was able to predict the fragment with the highest intensity for each peptide with higher accuracy compared to MS2PIP and Prosit. The accuracy of the highest peak prediction from validation data during training reached over 75%. These data demonstrate the applicability of our algorithm to MRM transition selection. It is also noteworthy that PrAI-frag used 0.98 million spectra for training, while the Prosit 2020 model used approximately 30 million spectra. The substantially smaller training data led to peptide length limitations of 15 amino acids. Nevertheless, we expect our model to continuously improve with more data and tests, which would be updated via the Web.
  27 in total

1.  Improved peptide identification by targeted fragmentation using CID, HCD and ETD on an LTQ-Orbitrap Velos.

Authors:  Christian K Frese; A F Maarten Altelaar; Marco L Hennrich; Dirk Nolting; Martin Zeller; Jens Griep-Raming; Albert J R Heck; Shabaz Mohammed
Journal:  J Proteome Res       Date:  2011-04-01       Impact factor: 4.466

2.  High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis.

Authors:  Shivani Tiwary; Roie Levy; Petra Gutenbrunner; Favio Salinas Soto; Krishnan K Palaniappan; Laura Deming; Marc Berndl; Arthur Brant; Peter Cimermancic; Jürgen Cox
Journal:  Nat Methods       Date:  2019-05-27       Impact factor: 28.547

3.  Investigation of gas phase ion structure for proline-containing b(2) ion.

Authors:  Lori L Smith; Kristin A Herrmann; Vicki H Wysocki
Journal:  J Am Soc Mass Spectrom       Date:  2005-12-09       Impact factor: 3.109

4.  On the accuracy and limits of peptide fragmentation spectrum prediction.

Authors:  Sujun Li; Randy J Arnold; Haixu Tang; Predrag Radivojac
Journal:  Anal Chem       Date:  2010-12-22       Impact factor: 6.986

Review 5.  Clinical potential of mass spectrometry-based proteogenomics.

Authors:  Bing Zhang; Jeffrey R Whiteaker; Andrew N Hoofnagle; Geoffrey S Baird; Karin D Rodland; Amanda G Paulovich
Journal:  Nat Rev Clin Oncol       Date:  2019-04       Impact factor: 66.675

6.  Predicting intensity ranks of peptide fragment ions.

Authors:  Ari M Frank
Journal:  J Proteome Res       Date:  2009-05       Impact factor: 4.466

7.  A repository of assays to quantify 10,000 human proteins by SWATH-MS.

Authors:  George Rosenberger; Ching Chiek Koh; Tiannan Guo; Hannes L Röst; Petri Kouvonen; Ben C Collins; Moritz Heusel; Yansheng Liu; Etienne Caron; Anton Vichalkovski; Marco Faini; Olga T Schubert; Pouya Faridi; H Alexander Ebhardt; Mariette Matondo; Henry Lam; Samuel L Bader; David S Campbell; Eric W Deutsch; Robert L Moritz; Stephen Tate; Ruedi Aebersold
Journal:  Sci Data       Date:  2014-09-16       Impact factor: 6.444

8.  In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics.

Authors:  Yi Yang; Xiaohui Liu; Chengpin Shen; Yu Lin; Pengyuan Yang; Liang Qiao
Journal:  Nat Commun       Date:  2020-01-09       Impact factor: 14.919

Review 9.  Array programming with NumPy.

Authors:  Charles R Harris; K Jarrod Millman; Stéfan J van der Walt; Ralf Gommers; Pauli Virtanen; David Cournapeau; Eric Wieser; Julian Taylor; Sebastian Berg; Nathaniel J Smith; Robert Kern; Matti Picus; Stephan Hoyer; Marten H van Kerkwijk; Matthew Brett; Allan Haldane; Jaime Fernández Del Río; Mark Wiebe; Pearu Peterson; Pierre Gérard-Marchant; Kevin Sheppard; Tyler Reddy; Warren Weckesser; Hameer Abbasi; Christoph Gohlke; Travis E Oliphant
Journal:  Nature       Date:  2020-09-16       Impact factor: 49.962

Review 10.  Recent advances in mass spectrometry based clinical proteomics: applications to cancer research.

Authors:  Andrew Macklin; Shahbaz Khan; Thomas Kislinger
Journal:  Clin Proteomics       Date:  2020-05-24       Impact factor: 3.988

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.