Cary Pirone-Davies, Melinda A McFarland, Christine H Parker, Yoko Adachi, Timothy R Croley.
Abstract
As the apparent incidence of tree nut allergies rises, the development of mass spectrometry (MS) methods that accurately identify tree nuts in food is critical. However, analyses are limited by the small number of available tree nut protein sequences. We assess the utility of translated genomic and transcriptomic data for library construction with Juglans regia, walnut, as a model. Extracted walnuts were subjected to nano-liquid chromatography-mass spectrometry (n-LC-MS/MS), and spectra were searched against databases made from a six-frame translation of the genome (6FT), a transcriptome, and three proteomes. Searches against proteomic databases yielded a variable number of peptides (1156-1275), and only ten additional unique peptides were identified in the 6FT database. Searches against a transcriptomic database yielded results similar to those of the National Center for Biotechnology Information (NCBI) proteome (1200 and 1275 peptides, respectively). Performance of the transcriptomic database was improved by adjusting RNA-Seq read processing methods, which increased the number of identified peptides that align to seed allergen proteins by ~20%. Together, these findings establish a path towards the construction of robust proxy protein databases for tree nut species and other non-model organisms.
Keywords: Juglans regia; database; de-novo transcriptome; nut allergen; pecan; proteomics; walnut
Year: 2020 PMID: 32438695 PMCID: PMC7284556 DOI: 10.3390/biology9050104
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Figure 1. Histogram of the sequence lengths of the NCBI proteome and the transcriptome. The transcriptome is represented in blue, the proteome in red.
The number of residues, sequences, and identified peptides in each sized database. All database sizes include the contaminants database, 125 sequences, 40,028 residues.
| Database Type | Number of Sequences (unsized) | Number of Residues (unsized) | Number of Sequences (sized) | Number of Residues (sized) | Number of Identified Peptides |
|---|---|---|---|---|---|
| NCBI Proteome | 55,751 | 24,750,578 | 61,756 | 26,627,674 | 1275 |
| Maker Proteome | 32,621 | 13,113,315 | 76,087 | 26,627,669 | 1183 |
| Braker Proteome | 30,306 | 13,599,899 | 72,425 | 26,627,687 | 1156 |
| Translated transcriptome | 194,436 | 26,627,682 | 194,436 | 26,627,682 | 1200 |
| Six-frame Translation Genome | 172,954 | 679,657,178 | NA | NA | 719 |
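
The six-frame translation (6FT) database listed in the table is built by translating a nucleotide sequence in all three reading frames on both strands. The following is a minimal sketch of that idea, not the pipeline used in the study: it assumes the standard genetic code, plain A/C/G/T input, and hypothetical helper names (`six_frame_translation`, `open_segments`) chosen only for illustration. The minimum segment length is likewise an assumed parameter.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def open_segments(protein, min_len=1):
    """Split a translated frame at stop codons, keeping segments >= min_len."""
    return [seg for seg in protein.split("*") if len(seg) >= min_len]


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein, open_segments(protein))
```

Because every frame is translated regardless of whether it encodes a real protein, a database built this way contains far more residues than a curated proteome, which is consistent with the residue counts in the table.
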
Figure 2. Venn diagrams comparing the total number of peptides identified in five databases. The number of unique peptides is listed, along with the number of peptides shared by all databases. (A) NCBI, Maker, and Braker databases; (B) NCBI and 6FT databases; (C) NCBI and transcriptomic databases.
Figure 3. Plot of the mean number of peptides identified across eight transcriptomic databases constructed using different read processing conditions in walnut (n = 4). Bars represent standard deviations; sd = max_pct_stdev.
Figure 4. Alignment of sequences from a cluster of related sulfur-rich seed storage sequences from a parsimonious comparison. (A) Comparison of the NCBI proteome (XP_018824007.1) and the transcriptome assembled under published conditions. (B) Comparison of the NCBI proteome and the improved transcriptome assembled using Rcorrector and max_pct_stdev = 100.