B Blank-Landeshammer, I Teichert, R Märker, M Nowrousian, U Kück, A Sickmann.
Abstract
Proteogenomics combines proteomics, genomics, and transcriptomics and has considerably improved genome annotation in poorly investigated phylogenetic groups for which homology information is lacking. Furthermore, it can be advantageous when reinvestigating well-annotated genomes. Here, we applied an advanced proteogenomics approach, combining standard proteogenomics with peptide de novo sequencing, to refine annotation of the well-studied model fungus Sordaria macrospora. We investigated samples from different developmental and physiological conditions, resulting in the detection of 104 so-far hidden proteins and annotation changes in 575 genes, including 389 splice site refinements. Significantly, our approach provides peptide-level evidence for 113 single-amino-acid variations and 15 C-terminal protein elongations originating from A-to-I RNA editing.
Keywords: RNA editing; alternative splice sites; alternative splicing; fungal genome; gene ontology; genomics; peptide de novo sequencing; phylostratigraphy; proteogenomics; proteomics
Year: 2019; PMID: 31615963; PMCID: PMC6794485; DOI: 10.1128/mBio.02367-19
Source DB: PubMed; Journal: mBio; Impact factor: 7.867
FIG 1 Schematic representation of the data analysis pipeline. Generated MS/MS spectra were subjected to subsequent database searches against the known S. macrospora protein sequences and a 6-frame translation of the S. macrospora genome as well as an independent de novo peptide sequencing method. Putative novel peptide identifications were clustered, filtered, converted to a genome browser readable format, and analyzed in conjunction with RNA-Seq data. Final curation of the genome annotation was performed manually.
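The 6-frame translation mentioned in the pipeline can be illustrated with a minimal sketch. It assumes the standard genetic code and plain uppercase or lowercase DNA input; the function names (`reverse_complement`, `translate`, `six_frame_translate`) are illustrative and are not taken from the study's software.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(dna):
    """Translate both strands in all three reading frames.

    Keys are '+1', '+2', '+3' for the forward strand and '-1', '-2', '-3'
    for the reverse complement; '*' marks a stop codon.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translate("ATGGCCTAA").items():
        print(name, protein)
```

In a real search database the translated frames would typically be split at stop codons into open reading frames before digestion; that step is omitted here.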
FIG 2 Evaluation of peptide identifications, classified as known and novel, in the 2-day data set. All peptide spectrum matches (PSMs) belonging to the respective class were compared to known false-positive decoy hits. (A) Normalized density plot of observed precursor mass deviation indicates no difference in distribution of identified known and novel PSMs. (B) Distribution of posterior error probability (PEP) shows clear distinction between decoy PSMs and known PSMs but almost overlapping distribution of known and novel hits. (C) Peptide length distributions of both classes differ from that of decoy identifications but not from each other. AA, amino acids. (D) Observed retention time shows tight correlation to predicted hydrophobicity index (HI) by SSRCalc for both classes, while retention times of known false positives only weakly correlate. In all cases, all decoy PSMs with a q value of <0.01 were plotted as a reference. See Fig. S1 and S2 for additional data sets.
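The q values referred to in this caption come from target-decoy statistics. The following is a minimal sketch of the generic target-decoy calculation, not the scoring used by the authors' search engines: it assumes each PSM carries a single score where higher is better and a flag marking decoy hits, and the function name `q_values` is hypothetical.

```python
def q_values(psms):
    """Compute target-decoy q values.

    psms: list of (score, is_decoy) tuples; a higher score is assumed
    to mean a better match (an assumption of this sketch).
    Returns a list of (score, is_decoy, q_value), best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q value: the smallest FDR obtainable at this rank or any lower rank.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


if __name__ == "__main__":
    demo = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    for row in q_values(demo):
        print(row)
```

Decoy PSMs that still pass a q value cutoff such as 0.01 are known false positives, which is why they serve as a reference population in the plots.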
Classification of all annotation refinements performed in this study
| Type of refinement | Total no. of refinements | No. of refinements with two variants |
|---|---|---|
| Splice site (3’ or 5’ splice site) | 389 | 45 |
| Translation initiation site (TIS) | 283 | 101 |
| Annotation extension (up- or downstream of annotation) | 237 | |
| Frame shift | 116 | |
| Annotation fission | 13 | |
| Annotation fusion | 7 | |
| Novel annotation | 104 | |
FIG 3 Alternative splicing in the pheromone pathway-specific kinase gene mek2. (A) Graphical representation of the canonical gene structure, including observed peptides (green bars), covering three splice junctions. The mek2 gene comprises 5 exons and was identified with a total of 50 peptides, covering 75% of the sequence. (B) Graphical representation of the alternatively spliced variant. Retention of intron 4 leads to translation into an alternative protein C terminus, identified by 6 novel peptides (orange bars). (C) Label-free quantification of both MEK2 isoforms throughout six growth conditions of S. macrospora reveals downregulation of the newly identified variant (orange, SMAC_06526.3_t2) during sexual development (BMM_5d and BMM_7d). (D) Sashimi plot visualizing the RNA-Seq coverage of both splice variants in vegetative and sexual mycelium.
Classification of identified SAAVs by type
| SAAV | Mass shift (Da) | No. of observations | Edited and nonedited detected (%) | Median PROVEAN score | Predicted deleterious (%) |
|---|---|---|---|---|---|
| K→E | 0.94763 | 32 | 53 | −1.2565 | 21.9 |
| I→V | −14.01565 | 19 | 95 | −0.689 | 0 |
| S→G | −30.01057 | 14 | 100 | −1.8475 | 28.6 |
| R→G | −99.07965 | 11 | 90 | −3.349 | 63.6 |
| K→R | 28.00615 | 10 | 70 | −1.1875 | 10.0 |
| T→A | −30.01057 | 7 | 100 | −0.792 | 42.9 |
| Q→R | 28.04253 | 6 | 100 | −0.797 | 16.7 |
| Y→C | −60.05414 | 5 | 100 | −6.16 | 80.0 |
| M→V | −31.97208 | 5 | 75 | −1.047 | 20.0 |
| E→G | −72.02113 | 2 | 100 | −2.833 | 50.0 |
| N→S | −27.0109 | 1 | 100 | −2.053 | 0 |
| I→M | 17.95643 | 1 | 100 | −2.867 | 100 |
| Total | | 113 | 81 | −1.220 | 26.8 |
Total number of observed single-amino-acid variations (SAAVs) putatively caused by mRNA editing events in the 7-day sample. The theoretical mass shift of each amino acid exchange is given, together with the percentage of sites for which both the edited and the nonedited variant were detected. The PROVEAN prediction score was retrieved for each individual SAAV via the PROVEAN web interface (70), and the median score for each class of SAAV was calculated. A default threshold of −2.5 was used to estimate the extent of potentially deleterious variants.
N→D events were not considered, as they can be caused by RNA editing as well as by asparagine deamidation on the protein/peptide level.
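The theoretical mass shifts in the table can be reproduced from monoisotopic residue masses. The sketch below uses standard residue masses rounded to five decimals; these values and the helper name `mass_shift` are supplied here for illustration and are not taken from the article.

```python
# Monoisotopic amino acid residue masses in Da (standard values, rounded).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def mass_shift(ref, alt):
    """Mass change (Da) when residue `ref` is replaced by residue `alt`."""
    return round(MONO[alt] - MONO[ref], 5)


if __name__ == "__main__":
    for ref, alt in [("K", "E"), ("I", "V"), ("S", "G"), ("T", "A"), ("N", "D")]:
        print(f"{ref}->{alt}: {mass_shift(ref, alt):+.5f} Da")
```

Two points follow from the arithmetic. S→G and T→A produce the same shift, so the exchange type has to be read from the peptide sequence rather than from the mass alone. N→D gives about +0.984 Da, which is exactly the shift of asparagine deamidation, since deamidation chemically converts Asn to Asp; this is why N→D events cannot be attributed to editing from peptide data.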
FIG 4 Peptide-level evidence for a recoding RNA editing event. (A) Parallel reaction monitoring (PRM) transitions of peptide EDDAVFFNYR originating from a putative RNA editing event in SMAC_03693, leading to the exchange of Thr to Ala at position 255. (B) PRM transitions of the genome-encoded peptide EDDTVFFNYR. (C) Peptides were monitored in fungal cells grown for 2, 3, 5, and 7 days; the peptide originating from edited RNA was identified only at the latter two time points.
FIG 5 Validation of a stop-loss editing site in the transcript for the white collar 1 protein. (A) Primary amino acid sequence of the white collar 1 protein. Editing results in the conversion of a stop codon into a tryptophan codon and extends the amino acid sequence by 131 amino acids. As a consequence, the protein carries a C-terminal histone deacetylase domain (HDAC). Peptides encoded by the canonical gene are shown in red, while the blue peptide is unique to the novel C-terminal sequence. (B) Annotated MS/MS spectrum of the editing-specific peptide observed in the 7-day data set. (C) Domain structure of the white collar 1 protein, including the HDAC domain (green box) in the extended C terminus (blue box). PAS, Per-Arnt-Sim domain; GATA, GATA-type zinc finger transcription factor domain.
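Because inosine is read as guanosine during translation, an A-to-I edit behaves like an A-to-G substitution in the codon. The sketch below, which assumes the standard genetic code and considers only single edits per codon, enumerates every amino acid change reachable this way; the function name `a_to_g_recodings` is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def a_to_g_recodings():
    """Return the set of (original, edited) amino acid pairs that a single
    A-to-G change within a codon can produce. '*' denotes a stop codon."""
    changes = set()
    for codon, aa in CODON_TABLE.items():
        for i, base in enumerate(codon):
            if base != "A":
                continue
            edited = codon[:i] + "G" + codon[i + 1:]
            new_aa = CODON_TABLE[edited]
            if new_aa != aa:  # skip synonymous edits
                changes.add((aa, new_aa))
    return changes


if __name__ == "__main__":
    for ref, alt in sorted(a_to_g_recodings()):
        print(f"{ref}->{alt}")
```

Under these assumptions the only change that removes a stop codon is stop to tryptophan (TAG or TGA to TGG), matching the stop-loss event shown here, and all twelve exchange types listed in the SAAV table are among the reachable pairs.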
FIG 6 Phylostratigraphic map of all S. macrospora proteins. (A) Division into known proteins detected in this study, novel proteins identified by the proteogenomics analysis, and known proteins not detected in this study. Relative protein occurrence is the share of proteins assigned to a given phylostratum (PS) among all proteins of the respective class. (B) Detailed phylostratigraphic map of all novel, completely annotated class I proteins, displaying the BLAST E value of the top 5 hits in every PS. Proteins are hierarchically clustered (Ward’s method) to show similarities in PS distribution.