J Michael Proffitt, Jeremy Glenn, Anthony J Cesnik, Avinash Jadhav, Michael R Shortreed, Lloyd M Smith, Kylie Kavanagh, Laura A Cox, Michael Olivier.
Abstract
BACKGROUND: Shotgun proteomics utilizes a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease where genomic annotation is incomplete, such as non-human primates, proteogenomic methods can improve the detection of proteins by incorporating transcriptional data from RNA-Seq to improve proteomics search databases used for peptide spectral matching. Customized search databases derived from RNA-Seq data are capable of identifying unannotated genetic and splice variants while simultaneously reducing the number of comparisons to only those transcripts actively expressed in the tissue.
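The database-reduction idea in the abstract can be illustrated with a minimal Python sketch. It covers only the filtering step (keeping reference entries for genes with RNA-Seq support) and does not attempt the variant or splice-junction discovery the abstract also mentions. The gene identifiers, sequences, read counts, the `min_reads` threshold, and the function names are all hypothetical and are not taken from the paper or from any of the tools it names.

```python
from typing import Dict, List, Tuple

FastaRecord = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[FastaRecord]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[FastaRecord] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_sample_specific_db(
    reference: List[FastaRecord],
    read_counts: Dict[str, int],
    min_reads: int = 1,  # hypothetical expression threshold
) -> List[FastaRecord]:
    """Keep only reference entries whose gene has RNA-Seq read support."""
    kept: List[FastaRecord] = []
    for header, sequence in reference:
        gene_id = header.split()[0]  # assume the first header token is the gene ID
        if read_counts.get(gene_id, 0) >= min_reads:
            kept.append((header, sequence))
    return kept


# Hypothetical toy inputs, for illustration only.
REFERENCE_FASTA = """\
>GENE_A hypothetical protein A
MKTAYIAKQR
QISFVKSHFS
>GENE_B hypothetical protein B
MADEEKLPPG
>GENE_C hypothetical protein C
MSTNPKPQRK
"""
READ_COUNTS = {"GENE_A": 120, "GENE_B": 0, "GENE_C": 7}

reference_db = parse_fasta(REFERENCE_FASTA)
sample_db = build_sample_specific_db(reference_db, READ_COUNTS)
sample_gene_ids = [header.split()[0] for header, _ in sample_db]
print(sample_gene_ids)
```

In this toy case the unexpressed gene is dropped, so the search engine would compare spectra against fewer candidate sequences. A real pipeline would also append sample-specific variant and splice-junction sequences, which this sketch leaves out.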
Keywords: Galaxy-P; Liver; Morpheus; Non-human primate; Proteogenomics; Proteomics; RNA-Seq; Vervet
Year: 2017 PMID: 29132314 PMCID: PMC5683380 DOI: 10.1186/s12864-017-4279-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Descriptive statistics for the RNA-Seq and mass spectrometry analyses utilizing the Vervet reference search database (REFdb, 19,255 gene entries) and the sample-specific databases (SSdb)
| Sample | RNA-Seq reads | % reads aligned | SSdb entries: genes | SSdb entries: novel splice junctions (SJs) | Mass spectra | PSMs (REFdb) | PSMs (SSdb) | Peptide IDs (REFdb) | Peptide IDs (SSdb) |
|---|---|---|---|---|---|---|---|---|---|
| 1030 | 7,040,525 | 55.5 | 13,804 | 4069 | 80,003 | 26,525 | 26,680 | 9765 | 9702 |
| 1211 | 6,585,341 | 68.8 | 15,782 | 7171 | 79,381 | 27,288 | 27,673 | 10,532 | 10,527 |
| 1238 | 6,594,936 | 67.1 | 15,659 | 6595 | 78,444 | 19,600 | 19,898 | 9349 | 9354 |
| 1245 | 6,730,432 | 64.0 | 13,901 | 4089 | 80,281 | 29,143 | 29,193 | 10,503 | 10,463 |
| 1248 | 10,504,974 | 69.4 | 15,513 | 7429 | 80,221 | 22,205 | 22,479 | 9120 | 9162 |
| 1254 | 9,127,588 | 62.5 | 15,936 | 7641 | 79,675 | 23,655 | 23,738 | 9334 | 9385 |
| 1291 | 6,575,182 | 67.9 | 13,354 | 3653 | 79,960 | 30,623 | 30,722 | 11,478 | 11,652 |
| 1347 | 6,637,842 | 56.6 | 13,284 | 3147 | 78,791 | 17,284 | 17,037 | 8633 | 8575 |
| 1448 | 8,019,158 | 65.0 | 15,668 | 6419 | 71,853 | 15,612 | 15,561 | 7582 | 7593 |
| 1467 | 9,983,615 | 66.2 | 16,176 | 7305 | 78,781 | 20,101 | 20,162 | 9177 | 9223 |
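To make the REFdb versus SSdb comparison easier to read, the following Python sketch computes the per-sample difference (SSdb minus REFdb) in PSMs and peptide identifications. The counts are copied from the table above; the variable names and the choice of a simple difference as the summary are illustrative assumptions, not the authors' analysis.

```python
# Per-sample counts copied from the table above:
# sample -> (PSMs REFdb, PSMs SSdb, peptide IDs REFdb, peptide IDs SSdb)
COUNTS = {
    "1030": (26525, 26680, 9765, 9702),
    "1211": (27288, 27673, 10532, 10527),
    "1238": (19600, 19898, 9349, 9354),
    "1245": (29143, 29193, 10503, 10463),
    "1248": (22205, 22479, 9120, 9162),
    "1254": (23655, 23738, 9334, 9385),
    "1291": (30623, 30722, 11478, 11652),
    "1347": (17284, 17037, 8633, 8575),
    "1448": (15612, 15561, 7582, 7593),
    "1467": (20101, 20162, 9177, 9223),
}

# Positive values mean the sample-specific database yielded more identifications.
psm_gain = {s: c[1] - c[0] for s, c in COUNTS.items()}
peptide_gain = {s: c[3] - c[2] for s, c in COUNTS.items()}

samples_with_more_psms = sorted(s for s, d in psm_gain.items() if d > 0)
samples_with_more_peptides = sorted(s for s, d in peptide_gain.items() if d > 0)

print(f"{'Sample':<8}{'dPSMs':>8}{'dPeptides':>12}")
for sample in COUNTS:
    print(f"{sample:<8}{psm_gain[sample]:>+8d}{peptide_gain[sample]:>+12d}")

print("Total PSM difference:", sum(psm_gain.values()))
print("Total peptide ID difference:", sum(peptide_gain.values()))
print("Samples with more PSMs using SSdb:", len(samples_with_more_psms))
print("Samples with more peptide IDs using SSdb:", len(samples_with_more_peptides))
```

A plain difference ignores run-to-run variation in the number of acquired spectra, so it should be read as a descriptive summary rather than a statistical test.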
Fig. 1 Sample-specific database vs reference database proteins identified
Gene Set Enrichment Analysis for proteins identified by reference but not sample-specific databases
| GO Annotation | Description | p-value | FDR q-value |
|---|---|---|---|
| GO: STRUCTURAL MOLECULE ACTIVITY | The action of a molecule that contributes to the structural integrity of a complex or assembly within or outside a cell. | 3.03 × 10⁻¹⁵ | 3.42 × 10⁻¹¹ |
| GO: OXIDATION REDUCTION PROCESS | A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. | 9.8 × 10⁻¹⁴ | 5.52 × 10⁻¹⁰ |
| GO: DNA PACKAGING COMPLEX | A protein complex that plays a role in the process of DNA packaging. | 7.61 × 10⁻¹³ | 2.86 × 10⁻⁹ |
| GO: EXTRACELLULAR SPACE | That part of a multicellular organism outside the cells proper, usually taken to be outside the plasma membranes, and occupied by fluid. | 1.25 × 10⁻¹¹ | 2.81 × 10⁻⁸ |
| GO: PROTEIN DNA COMPLEX | A macromolecular complex containing both protein and DNA molecules. | 6.04 × 10⁻¹¹ | 1.13 × 10⁻⁷ |