Yang-Ming Lin, Ching-Tai Chen, Jia-Ming Chang.
Abstract
BACKGROUND: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search.
Keywords: Deep convolutional neural networks; Deep learning; Machine learning; Mass spectrum; Peptide; Protein identification; Spectral library search; Tandem mass spectrometry
Year: 2019 PMID: 31874640 PMCID: PMC6929458 DOI: 10.1186/s12864-019-6297-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Bar chart of MS2CNN COS on charge 2+ (blue), 3+ (orange), and mix (gray) models. Blue and orange dashed lines indicate the peptide number of charge 2+ and 3+ data sets, respectively
Average cosine similarity (COS) and Pearson correlation coefficient (PCC) of spectra from the same peptide in training and independent test sets with charge 2+ and charge 3+
| Length | Charge 2+ COS | Charge 2+ PCC | Charge 3+ COS | Charge 3+ PCC |
|---|---|---|---|---|
| 9 | 0.800 | 0.757 | 0.617 | 0.553 |
| 10 | 0.770 | 0.724 | 0.781 | 0.734 |
| 11 | 0.760 | 0.713 | 0.771 | 0.721 |
| 12 | 0.735 | 0.688 | 0.735 | 0.682 |
| 13 | 0.704 | 0.655 | 0.732 | 0.681 |
| 14 | 0.703 | 0.658 | 0.703 | 0.650 |
| 15 | 0.687 | 0.643 | 0.672 | 0.617 |
| 16 | 0.694 | 0.652 | 0.691 | 0.641 |
| 17 | 0.645 | 0.601 | 0.690 | 0.641 |
| 18 | 0.646 | 0.606 | 0.660 | 0.612 |
| 19 | 0.636 | 0.595 | 0.668 | 0.622 |
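
The two metrics reported in the table above can be illustrated with a minimal sketch. It assumes each spectrum has already been reduced to an equal-length vector of fragment ion intensities; the function names and the example vectors are hypothetical and are not taken from the MS2CNN code.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Convention chosen for this sketch: an all-zero spectrum scores 0.
        return 0.0
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson's r, computed as the cosine similarity of mean-centred vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine_similarity(centred_x, centred_y)


# Made-up intensity vectors, used only to show the call pattern.
observed = [0.0, 0.8, 0.1, 1.0, 0.3]
predicted = [0.1, 0.7, 0.0, 0.9, 0.4]
print("COS:", cosine_similarity(observed, predicted))
print("PCC:", pearson_correlation(observed, predicted))
```

Because intensities are non-negative, COS between two spectra lies in [0, 1], while PCC subtracts the mean first and can be negative. This is consistent with the PCC values in the table being lower than the corresponding COS values.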
Fig. 2 (a) COS (cosine similarity) and (b) PCC (Pearson’s correlation coefficient) of MS2CNN 2+ (blue bar), MS2CNN_mix (blue bar with white dots), MS2PIP (white bar with blue dashes), and pDeep (black bar) on the charge 2+ peptides from the independent test set
Fig. 3 (a) COS and (b) PCC of MS2CNN 3+ (blue bar), MS2CNN_mix (blue bar with white dots), MS2PIP (white bar with blue dashes), and pDeep (black bar) on the charge 3+ peptides from the independent test set
Features used to encode a peptide sequence and its fragment ion sequences
| Feature | Description | Package: function name |
|---|---|---|
| m/z (original) | m/z of the original sequence | Pyteomics v3.4.2: calculate_mass |
| m/z (fragment ion) | m/z of the fragment ion sequence | Pyteomics v3.4.2: calculate_mass |
| Isoelectric point | isoelectric point of the sequence | Biopython 1.7: isoelectric_point |
| Instability index | instability index of the sequence | Biopython 1.7: instability_index |
| Aromaticity | aromaticity of the sequence | Biopython 1.7: aromaticity |
| Secondary structure fraction | α-helix, β-strand and coil fraction of the sequence | Biopython 1.7: secondary_structure_fraction |
| Helicity | helicity of the sequence | in-house program |
| Hydrophobicity | hydrophobicity of the sequence | in-house program |
| Basicity | basicity of the sequence | in-house program |
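
As a rough illustration of how a few of the features in the table above could be derived, the sketch below splits a peptide into prefix and suffix fragment sequences and computes an m/z value and the aromaticity for a sequence. It is a dependency-free stand-in, not the authors' pipeline: the table cites Pyteomics `calculate_mass` and Biopython for these values, whereas here the standard monoisotopic residue masses are hard-coded, aromaticity is taken as the fraction of F, W and Y residues, and every sequence is treated as an intact peptide (no b-ion or y-ion specific mass corrections). All function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565   # H2O added for the free N- and C-termini
PROTON_MASS = 1.007276   # mass of one charge-carrying proton


def sequence_mz(sequence, charge=1):
    """m/z of a sequence treated as an intact, unmodified peptide.

    Assumption of this sketch: ion-type specific corrections are ignored.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS
    return (neutral_mass + charge * PROTON_MASS) / charge


def aromaticity(sequence):
    """Fraction of aromatic residues (F, W, Y) in the sequence."""
    return sum(sequence.count(aa) for aa in "FWY") / len(sequence)


def fragment_sequences(peptide):
    """All (prefix, suffix) pairs obtained by cutting at each peptide bond."""
    return [(peptide[:i], peptide[i:]) for i in range(1, len(peptide))]


peptide = "PEPTIDE"
print("precursor m/z (2+):", round(sequence_mz(peptide, charge=2), 4))
for prefix, suffix in fragment_sequences(peptide):
    print(prefix, round(sequence_mz(prefix), 4), suffix, round(sequence_mz(suffix), 4))
```

The remaining features in the table (isoelectric point, instability index, secondary structure fraction, helicity, hydrophobicity, basicity) are not covered here, because they depend on library or in-house scales that the text does not specify.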
Fig. 4 MS2CNN model architecture