Delphine Vincent, Keith Savin, Simone Rochfort, German Spangenberg.
Abstract
Cannabis research has taken off since the relaxation of legislation, yet proteomics is still lagging. In 2019, we published three proteomics methods aimed at optimizing protein extraction, protein digestion for bottom-up and middle-down proteomics, as well as the analysis of intact proteins for top-down proteomics. The database of Cannabis sativa proteins used in these studies was retrieved from UniProt, the reference repository for proteins, which is incomplete and therefore underrepresents the genetic diversity of this non-model species. In this fourth study, we remedy this shortcoming by searching larger databases from various sources. We also compare two search engines, the oldest, SEQUEST, and the most popular, Mascot. This shotgun proteomics experiment also utilizes the power of parallel digestions with orthogonal proteases of increasing selectivity, namely chymotrypsin, trypsin/Lys-C and Asp-N. Our results show that the larger the database, the longer the list of accessions identified, but also the longer the search takes. Using orthogonal proteases and different search algorithms increases the total number of proteins identified; most are common across proteases and algorithms, but many are unique.
Keywords: Asp-N; LC-MS; Mascot; SEQUEST; bottom-up and middle-down proteomics; cannabis sativa; chymotrypsin; missed cleavages; post-translational modification; trypsin/Lys-C
Year: 2020 PMID: 32549361 PMCID: PMC7356525 DOI: 10.3390/proteomes8020013
Source DB: PubMed Journal: Proteomes ISSN: 2227-7382
Figure 1. Experimental design.
Description of the five FASTA databases used in this study.
| DB Name | Source | Number of Entries | Annotation | Date | Algorithm | Taxonomy |
|---|---|---|---|---|---|---|
| SP21 | | 19 from SwissProt + CBCAS (patent WO2015/196275) + GOT (patent WO2011/017798) = 21 | Yes | Feb 2020 | SEQUEST | C. sativa |
| Uniprot515 | | 513 from UniProt + | Yes | Feb 2020 | SEQUEST | C. sativa |
| JO29k | https://www.cannabisdraftmap.org/ | 29,057 | Yes | Dec 2019 | SEQUEST 1 | C. sativa |
| Homemade95k | | Uniprot515 + | Yes | Feb 2020 | SEQUEST | C. sativa |
| SPGP40k | | 39,800 | Yes | Feb 2020 | SEQUEST | Green plants |
1 JO29k FASTA file could not be parsed in Mascot due to duplicate rows. SEQUEST could handle the duplicates.
Figure 2. LC-MS pattern and statistical results. (A) 2-D nLC-MS maps along m/z 300–1800 on the X-axis and 9–39 min retention time on the Y-axis. (B) Box plots of cluster intensities. (C) Violin plots of cluster volumes. (D) Principal component analysis (PCA) plots of PC1 × PC2 of the nine samples. Buds 1–3 are the biological triplicates. Protease A, rAsp-N; protease C, chymotrypsin; protease TL, trypsin/Lys-C.
Number of MS and MS/MS scans and clusters per sample.
| Sample | MS Scans | MS/MS Scans | MS Clusters |
|---|---|---|---|
| bud1_A | 12,582 | 10,990 | 91,784 |
| bud2_A | 11,820 | 10,174 | 85,566 |
| bud3_A | 11,686 | 10,079 | 85,388 |
| bud1_C | 11,345 | 9532 | 89,030 |
| bud2_C | 10,391 | 8458 | 82,091 |
| bud3_C | 11,562 | 9597 | 83,440 |
| bud1_TL | 13,423 | 11,828 | 91,320 |
| bud2_TL | 12,858 | 11,242 | 87,335 |
| bud3_TL | 12,330 | 10,665 | 84,845 |
| mean A | 12,029 | 10,414 | 87,579 |
| SD A | 483 | 501 | 3642 |
| CV A | 4 | 5 | 4 |
| mean C | 11,099 | 9196 | 84,854 |
| SD C | 623 | 640 | 3679 |
| CV C | 6 | 7 | 4 |
| mean TL | 12,870 | 11,245 | 87,833 |
| SD TL | 547 | 582 | 3266 |
| CV TL | 4 | 5 | 4 |
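
The mean, SD and CV rows of this table can be reproduced from the three biological replicates of each protease. The sketch below does so for the MS scan column, assuming the SD is the sample standard deviation (n - 1 denominator) and the CV is expressed in percent; both assumptions agree with the tabulated values. The names `ms_scans` and `summarize` are illustrative only.

```python
from statistics import mean, stdev

# MS scan counts per sample, copied from the table above.
ms_scans = {
    "A": [12582, 11820, 11686],
    "C": [11345, 10391, 11562],
    "TL": [13423, 12858, 12330],
}

def summarize(values):
    """Return (mean, sample SD, CV in percent) for replicate values."""
    m = mean(values)
    sd = stdev(values)  # sample SD, n - 1 denominator (assumption)
    return m, sd, 100 * sd / m

summary = {protease: summarize(v) for protease, v in ms_scans.items()}
for protease, (m, sd, cv) in summary.items():
    print(f"{protease}: mean={m:.0f} SD={sd:.0f} CV={cv:.0f}%")
```
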
Number of identities for each sample across the five databases and the two algorithms.
| Database | # Proteins in Database | Sample | # Proteins with SEQUEST | # Proteins with Mascot | % Proteins with SEQUEST | % Proteins with Mascot |
|---|---|---|---|---|---|---|
| SP21 | 21 | bud1_A | 15 | 9 | 71.4 | 42.9 |
| SP21 | 21 | bud2_A | 15 | 9 | 71.4 | 42.9 |
| SP21 | 21 | bud3_A | 15 | 9 | 71.4 | 42.9 |
| SP21 | 21 | bud1_C | 15 | 12 | 71.4 | 57.1 |
| SP21 | 21 | bud2_C | 15 | 12 | 71.4 | 57.1 |
| SP21 | 21 | bud3_C | 15 | 11 | 71.4 | 52.4 |
| SP21 | 21 | bud1_TL | 16 | 15 | 76.2 | 71.4 |
| SP21 | 21 | bud2_TL | 15 | 14 | 71.4 | 66.7 |
| SP21 | 21 | bud3_TL | 16 | 16 | 76.2 | 76.2 |
| Uniprot515 | 515 | bud1_A | 65 | 40 | 12.6 | 7.8 |
| Uniprot515 | 515 | bud2_A | 63 | 35 | 12.2 | 6.8 |
| Uniprot515 | 515 | bud3_A | 67 | 36 | 13.0 | 7.0 |
| Uniprot515 | 515 | bud1_C | 67 | 46 | 13.0 | 8.9 |
| Uniprot515 | 515 | bud2_C | 70 | 39 | 13.6 | 7.6 |
| Uniprot515 | 515 | bud3_C | 70 | 38 | 13.6 | 7.4 |
| Uniprot515 | 515 | bud1_TL | 70 | 48 | 13.6 | 9.3 |
| Uniprot515 | 515 | bud2_TL | 69 | 39 | 13.4 | 7.6 |
| Uniprot515 | 515 | bud3_TL | 69 | 48 | 13.4 | 9.3 |
| JO29k | 29,057 | bud1_A | 1071 | n.a. | 3.7 | n.a. |
| JO29k | 29,057 | bud2_A | 1037 | n.a. | 3.6 | n.a. |
| JO29k | 29,057 | bud3_A | 1034 | n.a. | 3.6 | n.a. |
| JO29k | 29,057 | bud1_C | 748 | n.a. | 2.6 | n.a. |
| JO29k | 29,057 | bud2_C | 766 | n.a. | 2.6 | n.a. |
| JO29k | 29,057 | bud3_C | 807 | n.a. | 2.8 | n.a. |
| JO29k | 29,057 | bud1_TL | 1244 | n.a. | 4.3 | n.a. |
| JO29k | 29,057 | bud2_TL | 1162 | n.a. | 4.0 | n.a. |
| JO29k | 29,057 | bud3_TL | 1188 | n.a. | 4.1 | n.a. |
| Homemade95k | 95,069 | bud1_A | 1130 | 792 | 1.2 | 0.8 |
| Homemade95k | 95,069 | bud2_A | 1115 | 741 | 1.2 | 0.8 |
| Homemade95k | 95,069 | bud3_A | 1085 | 699 | 1.1 | 0.7 |
| Homemade95k | 95,069 | bud1_C | 981 | 552 | 1.0 | 0.6 |
| Homemade95k | 95,069 | bud2_C | 988 | 555 | 1.0 | 0.6 |
| Homemade95k | 95,069 | bud3_C | 1002 | 549 | 1.1 | 0.6 |
| Homemade95k | 95,069 | bud1_TL | 1322 | 1126 | 1.4 | 1.2 |
| Homemade95k | 95,069 | bud2_TL | 1192 | 922 | 1.3 | 1.0 |
| Homemade95k | 95,069 | bud3_TL | 1237 | 1009 | 1.3 | 1.1 |
| SPGP40k | 39,800 | bud1_A | 627 | 439 | 1.6 | 1.1 |
| SPGP40k | 39,800 | bud2_A | 620 | 415 | 1.6 | 1.0 |
| SPGP40k | 39,800 | bud3_A | 605 | 394 | 1.5 | 1.0 |
| SPGP40k | 39,800 | bud1_C | 604 | 443 | 1.5 | 1.1 |
| SPGP40k | 39,800 | bud2_C | 605 | 395 | 1.5 | 1.0 |
| SPGP40k | 39,800 | bud3_C | 621 | 416 | 1.6 | 1.0 |
| SPGP40k | 39,800 | bud1_TL | 756 | 688 | 1.9 | 1.7 |
| SPGP40k | 39,800 | bud2_TL | 706 | 562 | 1.8 | 1.4 |
| SPGP40k | 39,800 | bud3_TL | 730 | 624 | 1.8 | 1.6 |
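
The two percentage columns follow from dividing the number of identified proteins by the number of entries in the database. As a minimal sketch, the snippet below recomputes them for the bud1_TL rows of the table; `percent_identified` and `rows` are hypothetical names, and `None` stands in for the "n.a." Mascot entries of JO29k.

```python
def percent_identified(n_identified, db_size):
    """Percentage of database entries identified; None when no result exists."""
    if n_identified is None:
        return None
    return round(100 * n_identified / db_size, 1)

# (database, entries, SEQUEST hits, Mascot hits) for sample bud1_TL,
# copied from the table above; None marks "n.a.".
rows = [
    ("SP21", 21, 16, 15),
    ("Uniprot515", 515, 70, 48),
    ("JO29k", 29057, 1244, None),
    ("Homemade95k", 95069, 1322, 1126),
    ("SPGP40k", 39800, 756, 688),
]

percentages = {
    db: (percent_identified(seq, size), percent_identified(mas, size))
    for db, size, seq, mas in rows
}
for db, (pct_sequest, pct_mascot) in percentages.items():
    print(db, pct_sequest, pct_mascot)
```

The absolute number of identifications grows with database size while the percentage of entries identified shrinks, which is why both views are reported side by side.
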
Search times across the five databases and two algorithms for each sample.
| Database | Sample | Total Search Duration 1 | SEQUEST/Decoy 2 Search Duration | Mascot/Decoy 2 Search Duration |
|---|---|---|---|---|
| SP21 | bud1_A | 11 min 0 s | 2 min 0 s | 6 min 43 s |
| SP21 | bud2_A | 10 min 0 s | 1 min 30 s | 6 min 44 s |
| SP21 | bud3_A | 10 min 0 s | 1 min 31 s | 6 min 25 s |
| SP21 | bud1_C | 10 min 0 s | 2 min 35 s | 4 min 52 s |
| SP21 | bud2_C | 8 min 0 s | 1 min 54 s | 4 min 4 s |
| SP21 | bud3_C | 10 min 0 s | 2 min 21 s | 5 min 12 s |
| SP21 | bud1_TL | 12 min 0 s | 2 min 28 s | 6 min 42 s |
| SP21 | bud2_TL | 11 min 0 s | 2 min 18 s | 6 min 28 s |
| SP21 | bud3_TL | 11 min 0 s | 2 min 12 s | 6 min 1 s |
| Uniprot515 | bud1_A | 20 min 0 s | 5 min 30 s | 10 min 12 s |
| Uniprot515 | bud2_A | 19 min 0 s | 5 min 10 s | 10 min 53 s |
| Uniprot515 | bud3_A | 21 min 0 s | 5 min 15 s | 11 min 42 s |
| Uniprot515 | bud1_C | 18 min 0 s | 8 min 28 s | 5 min 12 s |
| Uniprot515 | bud2_C | 16 min 0 s | 7 min 1 s | 4 min 22 s |
| Uniprot515 | bud3_C | 19 min 0 s | 8 min 53 s | 5 min 4 s |
| Uniprot515 | bud1_TL | 26 min 0 s | 11 min 33 s | 8 min 25 s |
| Uniprot515 | bud2_TL | 20 min 0 s | 8 min 55 s | 6 min 4 s |
| Uniprot515 | bud3_TL | 21 min 0 s | 8 min 49 s | 7 min 22 s |
| JO29k | bud1_A | 1 h 14 min 0 s | 1 h 9 min | n.a. |
| JO29k | bud2_A | 1 h 17 min 0 s | 1 h 13 min | n.a. |
| JO29k | bud3_A | 1 h 22 min 0 s | 1 h 18 min | n.a. |
| JO29k | bud1_C | 28 min 0 s | 24 min 3 s | n.a. |
| JO29k | bud2_C | 19 min 0 s | 16 min 14 s | n.a. |
| JO29k | bud3_C | 25 min 0 s | 21 min 4 s | n.a. |
| JO29k | bud1_TL | 56 min 0 s | 51 min 50 s | n.a. |
| JO29k | bud2_TL | 45 min 0 s | 40 min 29 s | n.a. |
| JO29k | bud3_TL | 49 min 0 s | 44 min 30 s | n.a. |
| Homemade95k | bud1_A | 19 h 13 min 0 s | 4 h 47 min | 14 h 17 min |
| Homemade95k | bud2_A | 22 h 16 min 0 s | 5 h 14 min | 16 h 54 min |
| Homemade95k | bud3_A | 25 h 28 min 0 s | 5 h 56 min | 19 h 24 min |
| Homemade95k | bud1_C | 8 h 31 min 0 s | 2 h 53 min | 5 h 31 min |
| Homemade95k | bud2_C | 5 h 21 min 0 s | 1 h 31 min | 3 h 43 min |
| Homemade95k | bud3_C | 5 h 29 min 0 s | 1 h 57 min | 3 h 25 min |
| Homemade95k | bud1_TL | 9 h 20 min 0 s | 2 h 50 min | 6 h 22 min |
| Homemade95k | bud2_TL | 5 h 29 min 0 s | 1 h 49 min | 3 h 30 min |
| Homemade95k | bud3_TL | 8 h 10 min 0 s | 2 h 19 min | 5 h 43 min |
| SPGP40k | bud1_A | 6 h 48 min 0 s | 3 h 33 min | 3 h 8 min |
| SPGP40k | bud2_A | 7 h 41 min 0 s | 3 h 50 min | 3 h 45 min |
| SPGP40k | bud3_A | 8 h 39 min 0 s | 4 h 17 min | 4 h 15 min |
| SPGP40k | bud1_C | 3 h 35 min 0 s | 2 h 3 min | 1 h 26 min |
| SPGP40k | bud2_C | 2 h 18 min 0 s | 1 h 14 min | 59 min 41 s |
| SPGP40k | bud3_C | 2 h 42 min 0 s | 1 h 39 min | 57 min 18 s |
| SPGP40k | bud1_TL | 4 h 22 min 0 s | 2 h 27 min | 1 h 48 min |
| SPGP40k | bud2_TL | 2 h 43 min 0 s | 1 h 34 min | 1 h 2 min |
| SPGP40k | bud3_TL | 3 h 42 min 0 s | 1 h 59 min | 1 h 36 min |
1 The total search duration is the time PD 1.4 takes to completely process one LC-MS/MS file, as detailed in the workflow supplied in Supplementary Materials Figure S1. Besides the database/decoy searches using SEQUEST and Mascot, the workflow includes a spectrum file reading step, a spectrum selector step and a target decoy PSM validator step.

2 Decoy searches are performed during the search engine steps using a decoy reversed database; false positives are eliminated during the target decoy PSM validator step. We exemplify this in Supplementary Materials File F1.txt using the Homemade95k database.
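
To illustrate the target-decoy idea mentioned in the footnote, the sketch below builds a reversed decoy database and keeps target peptide-spectrum matches (PSMs) down to the lowest score cutoff at which the estimated false discovery rate (decoy hits divided by target hits) stays within a limit. This is a generic illustration under stated assumptions, not the algorithm implemented in the PD 1.4 validator node; the sequences, scores, the `REV_` prefix and the function names are all hypothetical.

```python
def reversed_decoys(targets, prefix="REV_"):
    """Build a decoy database by reversing every target sequence."""
    return {prefix + acc: seq[::-1] for acc, seq in targets.items()}

def filter_by_fdr(psms, max_fdr=0.01):
    """Return target PSMs above the lowest cutoff with estimated FDR <= max_fdr.

    psms is a list of (score, is_decoy) pairs; FDR is estimated as
    decoys / targets among the PSMs ranked at or above the cutoff.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p[1]]

# Hypothetical toy data, not taken from the study.
targets = {"P1": "MKTAYIAK", "P2": "MDEFGHIK"}
decoys = reversed_decoys(targets)
psms = [(95.0, False), (90.0, False), (80.0, False),
        (60.0, True), (55.0, False), (40.0, True)]
accepted = filter_by_fdr(psms, max_fdr=0.25)
print(decoys)
print(accepted)
```

Searching the decoy entries alongside the targets doubles the search space, which is one reason the durations above grow quickly with database size.
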
Number of missed cleavages per database.
| # Miscleavage | SP21 | Uniprot515 | JO29k | Homemade95k | SPGP40k |
|---|---|---|---|---|---|
| 0 | 116 | 433 | 2822 | 5818 | 2060 |
| 1 | 33 | 95 | 282 | 1091 | 403 |
| 2 | 20 | 51 | 32 | 339 | 140 |
| 3 | 7 | 16 | 13 | 158 | 60 |
| 4 | 8 | 9 | 5 | 54 | 28 |
| 5 | 1 | 1 | 6 | 22 | 7 |
| 6 | 4 | 3 | 4 | 8 | 5 |
| 7 | 2 | 3 | 1 | 8 | 4 |
| 8 | 1 | 0 | 3 | 5 | 1 |
| 10 | 1 | 0 | 1 | 1 | 1 |
| TOTAL | 193 | 611 | 3169 | 7504 | 2709 |
| TOTAL miscleavage = 0 | 116 | 433 | 2822 | 5818 | 2060 |
| TOTAL miscleavage > 0 | 77 | 178 | 347 | 1686 | 649 |
| % miscleavage > 0 | 39.9 | 29.1 | 10.9 | 22.5 | 24.0 |
| ELDP a | 39 | 255 | 2475 | 4132 | 1411 |
a ELDP, excess of limit-digested peptides.
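
The summary rows of this table can be derived from the per-count rows. The sketch below recomputes the totals, the percentage of peptides with at least one missed cleavage, and the ELDP, assuming the ELDP is the number of peptides without missed cleavages minus the number with at least one; that assumption reproduces every tabulated ELDP value. The names `counts` and `summarize_miscleavages` are illustrative.

```python
# Peptide counts for 0, 1, 2, 3, 4, 5, 6, 7, 8 and 10 missed cleavages,
# copied column by column from the table above.
counts = {
    "SP21": [116, 33, 20, 7, 8, 1, 4, 2, 1, 1],
    "Uniprot515": [433, 95, 51, 16, 9, 1, 3, 3, 0, 0],
    "JO29k": [2822, 282, 32, 13, 5, 6, 4, 1, 3, 1],
    "Homemade95k": [5818, 1091, 339, 158, 54, 22, 8, 8, 5, 1],
    "SPGP40k": [2060, 403, 140, 60, 28, 7, 5, 4, 1, 1],
}

def summarize_miscleavages(per_count):
    """Return total, fully cleaved, miscleaved, % miscleaved and ELDP."""
    total = sum(per_count)
    fully_cleaved = per_count[0]        # zero missed cleavages
    miscleaved = total - fully_cleaved  # one or more missed cleavages
    return {
        "total": total,
        "fully_cleaved": fully_cleaved,
        "miscleaved": miscleaved,
        "pct_miscleaved": round(100 * miscleaved / total, 1),
        "eldp": fully_cleaved - miscleaved,  # assumed definition
    }

miscleavage_summary = {db: summarize_miscleavages(c) for db, c in counts.items()}
for db, s in miscleavage_summary.items():
    print(db, s)
```
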
Masses of identified peptides across all five databases (A) and for each protease (B).
| A. Peptide Mass | SP21 | Uniprot515 | JO29k | Homemade95k | SPGP40k |
|---|---|---|---|---|---|
| min | 626.4 | 626.4 | 969.5 | 604.3 | 604.3 |
| max | 7600.9 | 6385.2 | 6724.5 | 6993.1 | 6448.6 |
| average | 2123.2 | 2023.2 | 2173.6 | 1975.8 | 1866.0 |
| SD | 1099.7 | 1048.9 | 791.1 | 830.3 | 776.8 |

| B. Protease | Database | min | max | average | SD |
|---|---|---|---|---|---|
| A | SP21 | 1006.6 | 7600.9 | 2475.2 | 1166.7 |
| A | Uniprot515 | 631.3 | 5994.1 | 2363.4 | 1192.1 |
| A | JO29k | 969.5 | 6724.5 | 2280.9 | 905.8 |
| A | Homemade95k | 653.4 | 6375.2 | 2147.2 | 939.1 |
| A | SPGP40k | 653.4 | 6448.6 | 2028.9 | 929.2 |
| C | SP21 | 774.4 | 5520.9 | 1807.1 | 927.0 |
| C | Uniprot515 | 704.4 | 5520.9 | 1779.1 | 793.0 |
| C | JO29k | 1034.6 | 6061.9 | 2108.9 | 776.2 |
| C | Homemade95k | 789.5 | 6954.3 | 1901.9 | 724.2 |
| C | SPGP40k | 789.5 | 5121.4 | 1832.0 | 581.4 |
| TL | SP21 | 626.4 | 5303.5 | 2007.0 | 1058.9 |
| TL | Uniprot515 | 626.4 | 6385.2 | 1926.4 | 1015.7 |
| TL | JO29k | 1055.5 | 6369.2 | 2112.1 | 705.8 |
| TL | Homemade95k | 604.3 | 6369.2 | 1922.4 | 789.4 |
| TL | SPGP40k | 604.3 | 6369.2 | 1795.0 | 706.0 |
Figure 3. Comparison of the protein coverage results obtained using the five databases. (A) Histogram of cumulative sequence coverage for the 18 proteins identified using the SP21 database. The secondary Y-axis represents the number of AAs. (B) Histogram of cumulative sequence coverage for the 72 accessions identified using the Uniprot515 database. The secondary Y-axis represents the number of AAs. (C) Scatterplot of the coverage of the 1343 accessions identified using the JO29k database plotted against their MWs (kDa) for each digestion. (D) Scatterplot of the coverage of the 1442 accessions identified using the Homemade95k database plotted against their MWs (kDa) for each digestion. (E) Scatterplot of the coverage of the 819 accessions identified using the SPGP40k database plotted against their MWs (kDa) for each digestion.
Number of post-translational modifications (PTMs) per database.
| PTM | SP21 | Uniprot515 | JO29k | Homemade95k | SPGP40k |
|---|---|---|---|---|---|
| Carbamidomethyl (C) | 34 | 94 | 493 | 602 | 226 |
| N-term acetyl (K) | 21 | 16 | 27 | 91 | 44 |
| Acetyl (K) | 47 | 32 | 47 | 132 | 71 |
| Methyl (K) | 61 | 49 | 114 | 163 | 158 |
| NAG (N) | 10 | 5 | 9 | 17 | 7 |
| Oxidation (M) | 18 | 24 | 43 | 66 | 90 |
| Phospho (STY) | 86 | 57 | 100 | 201 | 71 |
| TOTAL PTMs | 277 | 277 | 833 | 1272 | 667 |
| # identified peptides | 344 | 611 | 3169 | 7504 | 2709 |
| # unmodified peptides | 192 | 450 | 2255 | 5593 | 1834 |
| # modified peptides | 152 | 161 | 914 | 1911 | 875 |
| % modified peptides | 44.2 | 26.4 | 28.8 | 25.5 | 32.3 |
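
As a quick consistency check on this table, the sketch below sums the per-PTM counts and recomputes the percentage of modified peptides from the unmodified and modified counts. The dictionary and function names are illustrative; the numbers are copied from the table.

```python
# Counts per PTM, in table order: carbamidomethyl, N-term acetyl, acetyl,
# methyl, NAG, oxidation, phospho.
ptm_counts = {
    "SP21": [34, 21, 47, 61, 10, 18, 86],
    "Uniprot515": [94, 16, 32, 49, 5, 24, 57],
    "JO29k": [493, 27, 47, 114, 9, 43, 100],
    "Homemade95k": [602, 91, 132, 163, 17, 66, 201],
    "SPGP40k": [226, 44, 71, 158, 7, 90, 71],
}

# (unmodified peptides, modified peptides) per database.
peptides = {
    "SP21": (192, 152),
    "Uniprot515": (450, 161),
    "JO29k": (2255, 914),
    "Homemade95k": (5593, 1911),
    "SPGP40k": (1834, 875),
}

def percent_modified(unmodified, modified):
    """Share of identified peptides carrying at least one modification."""
    return round(100 * modified / (unmodified + modified), 1)

total_ptms = {db: sum(c) for db, c in ptm_counts.items()}
pct_modified = {db: percent_modified(u, m) for db, (u, m) in peptides.items()}
for db in ptm_counts:
    print(db, total_ptms[db], pct_modified[db])
```
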