Theodore G Smith, Anuli C Uzozie, Siyuan Chen, Philipp F Lange.
Abstract
The local sequence context is the most fundamental feature determining the post-translational modification (PTM) of proteins. Recent technological improvements allow for the detection of new and less prevalent modifications. We found that established state-of-the-art algorithms for the detection of PTM motifs in complex datasets failed to keep up with this technological development and are no longer robust. To overcome this limitation, we developed RoLiM, a new linear motif deconvolution algorithm and webserver that enables robust and unbiased identification of local amino acid sequence determinants in complex biological systems, demonstrated here by the analysis of 68 modifications found across 30 tissues in the human draft proteome map. Furthermore, RoLiM analysis of a large-scale phosphorylation dataset comprising 30 kinase inhibitors of 10 protein kinases in the EGF signalling pathway identified prospective substrate motifs for PI3K and EGFR.
Year: 2021 PMID: 34795380 PMCID: PMC8602328 DOI: 10.1038/s41598-021-01971-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Positional bias and performance comparison between algorithms. (a) Overlap between motifs identified by rmotif-X, MoMo, and RoLiM in original and reverse sequence datasets. (b) Overlap between motifs identified by rmotif-X, MoMo, and RoLiM in original and reordered sequence datasets. The proportion of overlap shown in the lollipop graph was calculated as the number of identical motifs found in both the forward and reverse foregrounds, divided by the total number of unique motifs found across both foregrounds. See online Methods for details. (c) Compound residue classification implemented in RoLiM. (d) Top ten enriched phospho-serine motifs identified by RoLiM in the dataset from Zadora et al. with and without compound residues enabled.
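The overlap proportion described in the Figure 1 caption can be read as a Jaccard index over two sets of motifs. The sketch below is an illustration under that assumption, not the authors' code: the function name `motif_overlap` and the example motif strings are hypothetical, and it assumes the motifs from both runs have already been written in a common orientation so that identical strings mean identical motifs.

```python
def motif_overlap(forward_motifs, reverse_motifs):
    """Shared motifs divided by all unique motifs across both runs (Jaccard index)."""
    forward, reverse = set(forward_motifs), set(reverse_motifs)
    union = forward | reverse
    if not union:
        # Neither run produced a motif; define the overlap as zero.
        return 0.0
    return len(forward & reverse) / len(union)


# Hypothetical motif strings, for illustration only.
forward = ["xxxSPxx", "RxxSxxx", "xxxSxxE"]
reverse = ["xxxSPxx", "RxxSxxx", "xxxSDxx"]
overlap = motif_overlap(forward, reverse)  # 2 shared / 4 unique = 0.5
```

A value of 1.0 would mean that an algorithm returns the same motif set regardless of sequence direction, and lower values indicate stronger positional bias.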
Figure 2. Proteome-wide pan-modification. (a) Hierarchical clustering of all modified sequences matching all extracted motifs. Motifs are identified by RoLiM for each modification separately. The fraction of sequences in each modification matching each motif is displayed and clustered using the Euclidean distance. Only significantly enriched modification-to-motif matches are considered. Column grouping indicates the modification type and row grouping indicates the three motif complexity classes. (b) Number of modifications matching patterns of different complexity classes. (c) Percentage of patterns in each complexity class matched by enzymatic (E), non-enzymatic (N) or undefined (U) modifications. (d) t-SNE embedding of the data represented in (a). Shaded areas are added manually to highlight undefined modifications and modifications matching motifs with the indicated central residues. Overlapping areas indicate modifications matching motifs containing either central residue. Numbers denote mass shifts of select undefined modifications. (e) Positional load of modification-associated motifs. The frequency at which a position is identified as overrepresented in the motifs is plotted and rows are clustered using the Euclidean distance. (f) Number of modifications for a specific modification type matching Class I patterns. (g) Tissue specificity of phosphorylation motifs. Ln(x + 1) transformed, scaled and centered summed counts of sequences identified in each tissue and matching each pattern are clustered and plotted.
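The transformation named in panel (g), Ln(x + 1) followed by centering and scaling, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `log1p_zscore` and the example counts are hypothetical, and it assumes that centering and scaling are applied per pattern across tissues using the sample standard deviation, which the caption does not specify.

```python
import math
import statistics


def log1p_zscore(counts):
    """Apply ln(x + 1), then center to mean 0 and scale to unit sample SD."""
    logged = [math.log1p(c) for c in counts]
    if len(logged) < 2:
        # A sample standard deviation needs at least two values.
        return [0.0 for _ in logged]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        # All counts identical: no variation to scale.
        return [0.0 for _ in logged]
    return [(value - mean) / sd for value in logged]


# Hypothetical summed counts for one pattern across four tissues.
counts = [0, 3, 12, 150]
z = log1p_zscore(counts)
```

The log step keeps a few very abundant tissues from dominating the clustering, and the +1 offset keeps zero counts defined.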
Figure 3. EGF pathway inhibition analysis. (a) Enriched phosphorylation patterns identified by RoLiM. Phosphosites analysed in RoLiM lacked motifs matched to curated kinases in Olsen et al. data (see “Methods”). The abundance of each pattern (calculated as the average abundance of all phosphosites) is shown for the different experimental conditions: RPE1 cells without EGF stimulation (No serum), with EGF stimulation (EGF), and treated with 1 µM of a panel of inhibitors targeting 10 kinases in the EGF pathway. (b) A scheme of the EGF pathway showing EGFR and downstream kinases targeted with specific inhibitors in Olsen et al. Broken lines denote a series of steps involving other proteins, and phosphosite abundance changes are depicted in blue (downregulation) and red (upregulation).