Lazaro Hiram Betancourt1, Jeovanis Gil1, Aniel Sanchez2, Viktória Doma3,4, Magdalena Kuras2, Jimmy Rodriguez Murillo5, Erika Velasquez2, Uğur Çakır4, Yonghyo Kim1, Yutaka Sugihara1, Indira Pla Parada2, Beáta Szeitz6, Roger Appelqvist1, Elisabet Wieslander1, Charlotte Welinder1, Natália Pinto de Almeida7,8, Nicole Woldmar7,8, Matilda Marko-Varga1, Jonatan Eriksson1, Krzysztof Pawłowski4,9,10, Bo Baldetorp1, Christian Ingvar11,12, Håkan Olsson1,11, Lotta Lundgren1,11, Henrik Lindberg1, Henriett Oskolas1, Boram Lee1, Ethan Berge1, Marie Sjögren1, Carina Eriksson1, Dasol Kim13, Ho Jeong Kwon13, Beatrice Knudsen14, Melinda Rezeli8, Johan Malm2, Runyu Hong15, Peter Horvath16, A Marcell Szász17,6, József Tímár3, Sarolta Kárpáti4, Peter Horvatovich18, Tasso Miliotis19, Toshihide Nishimura20, Harubumi Kato21, Erik Steinfelder22, Madalina Oppermann22, Ken Miller22, Francesco Florindi23, Quimin Zhou24, Gilberto B Domont7, Luciana Pizzatti7, Fábio C S Nogueira7, Leticia Szadai25, István Balázs Németh25, Henrik Ekedahl1,11, David Fenyö15, György Marko-Varga8,13,21.
Abstract
The MM500 meta-study aims to establish a knowledge basis of the tumor proteome to serve as a complement to genome and transcriptome studies. Somatic mutations and their effect on the transcriptome have been extensively characterized in melanoma. However, the effects of these genetic changes on the proteomic landscape and their impact on cellular processes in melanoma remain poorly understood. In this study, quantitative mass-spectrometry-based proteomic analysis is interfaced with pathological tumor characterization and associated with clinical data. The melanoma proteome landscape, obtained by the analysis of 505 well-annotated melanoma tumor samples, is defined based on almost 16 000 proteins, including mutated proteoforms of driver genes. More than 50 million MS/MS spectra were analyzed, resulting in approximately 13.6 million peptide spectrum matches (PSMs). Altogether, 13 176 protein-coding genes, represented by 366 172 peptides, in addition to 52 000 phosphorylation sites and 4 400 acetylation sites, were successfully annotated. These data cover 65% and 74% of the predicted and identified human proteome, respectively. A high degree of correlation (Pearson, up to 0.54) with the melanoma transcriptome of the TCGA repository, with an overlap of 12 751 gene products, was found. The mapping of expressed proteins with quantitation, spatiotemporal localization, mutations, splice isoforms, and PTM variants was shown not to be predictable from genome sequencing alone. The melanoma tumor molecular map was complemented by analysis of blood protein expression, including data on proteins regulated after immunotherapy. By adding these key proteomic pillars, the MM500 study expands the knowledge on melanoma disease.
Keywords: BRAF; acetylation stoichiometry; driver mutations; histopathology; metastatic melanoma; phosphorylation; posttranslational-modification; proteogenomics
Year: 2021 PMID: 34323402 PMCID: PMC8299047 DOI: 10.1002/ctm2.451
Source DB: PubMed Journal: Clin Transl Med ISSN: 2001-1326
FIGURE 1 Comprehensive view of proteomic workflows used in the MM500 study. (Upper panel) 505 melanoma tissue samples and four cultured cell lines were analyzed. 1549 LC‐MS/MS experiments produced a proteomic signature of melanoma based on the quantification of 15 973 protein groups representing more than 360 000 nonredundant peptides. (Sample preparation) Several protocols were used, which included protein extraction in the presence of urea or SDS with the aid of a Sonifier or a Bioruptor, followed by manual or automatic enzymatic digestion. (Global proteomics) This was performed using both DDA and DIA. DDA data were generated by TMT 11‐plex technology combined with high pH RP‐HPLC fractionation, by SCX stepwise separation of peptide mixtures, by the analysis of fractions derived from the MED‐FASP method, and also by shotgun proteomics. (Acetylomics) DIA‐MS was used to determine naturally occurring protein acetylation sites. This was achieved by modifying the free lysine ε‐amino groups of proteins with deuterium‐labeled acetyl groups, which, upon MS peptide identification and quantitation, allowed chemically introduced acetylation to be distinguished from endogenous acetylation. (Phosphoproteomics) Enrichment of phosphopeptides was performed in the Bravo AssayMap robot, and isolated phosphopeptides were directly analyzed by DDA or DIA. (Spectral Libraries of DIA‐MS) MS/MS spectral libraries for DIA‐MS global proteomics, acetylomics, and phosphoproteomics were built from DDA‐LC‐MS/MS data. This included shotgun analysis of the very same samples submitted to DIA‐MS, of other samples from melanoma tissues and cultured cells used in this meta‐study, as well as the analysis of a mixture of these samples previously fractionated by high pH RP‐HPLC. (Shotgun analysis) Individual samples were submitted to LC‐MS/MS analysis in either DDA or DIA mode. (Data analysis) The programs Proteome Discoverer and Spectronaut were used throughout all the experiments for protein identification and quantitation
FIGURE 2 The Melanoma Protein Abundance Map. LC‐MS/MS data were first normalized across batches of analysis in the MM500 study. (A) Violin plots showing the distribution of intrabatch coefficients of variation for the 45 proteins identified in 100% of the samples and with less than 60% variation in all batches. (B) Box plots of the relative abundance of the 45 least variable proteins in each batch. The median abundance in each batch was used for inter‐batch abundance correction of the melanoma proteome. (C) Box plots of protein relative abundance across all samples of the study, before (top panel) and after (bottom panel) intra‐ and interbatch abundance normalization using the 45 proteins with the lowest variability. (D) Distribution of the malignant melanoma proteome ranked according to protein abundance across all samples (left y‐axis) and the number of samples where the protein was identified (right y‐axis). Proteins are represented by their gene names. The lines point to WT protein products of genes with driver mutations in melanoma. Proteins involved in pathways commonly dysregulated in melanoma, proteins with known driver mutations, and proteins linked to melanoma therapy are marked in different colors as indicated. The number in parentheses specifies the designated isoform of the protein. A typical example is the protein Transforming acidic coiled‐coil‐containing protein 1, where the canonical protein TACC1, the isoform 2 TACC1(2), and isoform 4 TACC1(4) were quantified. A more complex example is the gene CDKN2A, which codes for the canonical proteins p16‐INK4a and p14ARF, both of which were quantified, together with isoform 4 of the former (CDKN2A (4) p16‐INK4a) and the mutated protein p16‐INK4a P114L. Enriched pathways for high‐ (red) and low‐ (blue) abundance proteins are highlighted at the edges of the plot
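The inter‐batch correction described in panel B can be illustrated with a minimal sketch. It assumes log‐scale abundances and that each batch is shifted so that the median of its reference proteins matches the median of the batch medians; the function name, the data layout, and the additive shift are assumptions made for illustration, not the procedure documented by the authors.

```python
from statistics import median

def interbatch_correct(batches, reference_proteins):
    """Align batches using the median abundance of low-variability proteins.

    batches: dict mapping batch id -> dict mapping protein -> list of
             log-scale abundances (one value per sample). Hypothetical layout.
    reference_proteins: proteins assumed stable across batches
                        (45 such proteins are used in the study).
    """
    # Median abundance of the reference proteins within each batch
    batch_medians = {}
    for batch_id, proteins in batches.items():
        values = [v for p in reference_proteins for v in proteins[p]]
        batch_medians[batch_id] = median(values)

    # Assumed common target: the median of the per-batch medians
    target = median(batch_medians.values())

    # Shift every protein in a batch by the same offset
    corrected = {}
    for batch_id, proteins in batches.items():
        shift = target - batch_medians[batch_id]
        corrected[batch_id] = {
            p: [v + shift for v in vals] for p, vals in proteins.items()
        }
    return corrected

# Toy example with one reference protein and one protein of interest
toy = {
    "A": {"REF1": [10, 10], "X": [5, 6]},
    "B": {"REF1": [12, 12], "X": [7, 8]},
}
corrected = interbatch_correct(toy, ["REF1"])
```

In the toy data, batch B reads two units higher than batch A for the reference protein, so after correction the protein of interest takes the same values in both batches.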
FIGURE 3 Comparison of the MM500 melanoma proteome, the TCGA melanoma transcriptome, and the human proteome. (A) Overlap of transcripts (TCGA), identified melanoma protein‐coding genes (canonical proteins), and the human proteome (NextProt). In NextProt, proteins are categorized in a PE1‐PE5 structure, accepted within the scientific community (https://www.uniprot.org/help/protein_existence), with five types of evidence for the existence of a protein: (1) experimental evidence at protein level; (2) experimental evidence at transcript level; (3) protein inferred from homology; (4) protein predicted; (5) protein is uncertain. (B) Correlation between mRNA and mean protein expression. Scatter plot of the median intensity of the proteins identified in this study versus the median intensity of transcripts from RNA sequencing data of 443 melanoma tumors downloaded from the TCGA repositories. RNA sequencing data were classified according to the number of samples where the transcript was detected. The Pearson correlation and best‐fitting curve are provided for the whole dataset and for those transcripts quantified in more than 99% of the samples. Both datasets were scaled to the range between 10 and 35. (C) Representation of the 1D KEGG annotation enrichment of the differences between the median intensity in all samples of the transcripts and the proteins. Bars indicate the level of enrichment according to a Benjamini‐Hochberg FDR truncation strategy. Blue bars correspond to pathways overrepresented for proteins relatively more abundant than their transcripts, and red bars correspond to pathways overrepresented for transcripts relatively more abundant than their corresponding proteins. Pathways were sorted based on their KEGG classification. The 1D annotation enrichment analysis was performed in the Perseus platform
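The comparison in panel B rests on two simple operations: rescaling both datasets to a common range and computing a Pearson correlation. The sketch below assumes a min‐max rescaling to the interval 10 to 35 (the caption does not state which scaling method was used) and runs on invented toy values; the function names are illustrative only.

```python
from math import sqrt

def rescale(values, lo=10.0, hi=35.0):
    """Min-max rescale values to [lo, hi] (assumed scaling method)."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Invented median intensities for five genes (not data from the study)
protein_medians = [21.0, 24.5, 19.0, 27.0, 23.0]
transcript_medians = [8.0, 11.0, 9.5, 12.0, 9.0]

r_raw = pearson(protein_medians, transcript_medians)
r_scaled = pearson(rescale(protein_medians), rescale(transcript_medians))
```

Because the Pearson coefficient is invariant under positive linear transformations, `r_raw` and `r_scaled` agree up to rounding; under this assumption the rescaling only places both datasets on comparable axes for plotting and does not change the reported correlation.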
Summary of mutations identified in this study
| Gene | Mutation | Identified peptide | # PSMs |
|---|---|---|---|
| BRAF | V600E | IGDFGLAT | 8 |
| NRAS | Q61K | QVVIDGETCLLDILDTAG | 12 |
| NRAS | Q61R | QVVIDGETCLLDILDTAG | 3 |
| NRAS | G12A | LVVVGA | 1 |
|  | G13D | LVVVGAG | 1 |
|  | N566D | VVEEING | 1 |
| CDKN2A | P114L | L | 1 |
|  | N266K | SSVILFLN | 3 |
Mutation identification supported by previous genomic studies on the samples.
Substituted amino acid is highlighted in red.
Peptide Spectrum Matches indicates the number of MS/MS spectra that were assigned to the mutated peptide.
FIGURE 4 Identification of mutated variants of NRAS and WT NRAS by mass spectrometry. (A) Assigned MS/MS spectrum of the TMT‐labeled peptide QVVIDGETCLLDILDTAGK corresponding to the mutation NRAS Q61K. (B) Assigned MS/MS spectrum of the TMT‐labeled peptide QVVIDGETCLLDILDTAGR corresponding to the mutation NRAS Q61R. (C) Assigned MS/MS spectrum of the TMT‐labeled peptide QVVIDGETCLLDILDTAGQEEYSAMR of WT NRAS. The Q61K/R mutations introduced an additional trypsin cleavage in the sequence of the WT protein, rendering shorter mutated peptides lacking the C‐terminal part (‐EEYSAMR) of the WT peptide sequence
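The effect of the Q61K/R substitutions on tryptic peptides can be reproduced with a short in silico digest. The sketch uses the common simplified trypsin rule (cleavage C‐terminal to K or R, except before P) and only the peptide sequences given in the caption; the function name is illustrative, and missed cleavages and TMT labeling are ignored.

```python
def tryptic_digest(sequence):
    """Split a sequence after K or R unless the next residue is P.

    Simplified trypsin rule; missed cleavages are not modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

# WT NRAS tryptic peptide from the caption
wt_peptide = "QVVIDGETCLLDILDTAGQEEYSAMR"

# Index of the substituted Q: the residue directly preceding EEYSAMR
q_index = wt_peptide.index("EEYSAMR") - 1

q61k = wt_peptide[:q_index] + "K" + wt_peptide[q_index + 1:]
q61r = wt_peptide[:q_index] + "R" + wt_peptide[q_index + 1:]

wt_fragments = tryptic_digest(wt_peptide)
k_fragments = tryptic_digest(q61k)
r_fragments = tryptic_digest(q61r)
```

The WT sequence stays intact as a single peptide, whereas each mutant sequence is split into the shorter mutated peptide shown in panels A and B plus the C‐terminal fragment EEYSAMR.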
FIGURE 5 Melanoma phosphoproteome and kinome analysis. (A) Number of identified mono‐, di‐, and multiphosphorylated peptides. (B) Abundance distribution of the melanoma proteome, phosphoproteome, and acetylome. The relative abundance of the proteins was calculated based on the quantitative proteomic data, with the exception of the 439 proteins that were only detected after phosphopeptide enrichment. In that case, the abundance was calculated from the phosphopeptides identified. (C) Distribution of the melanoma phosphoproteome based on enriched KEGG pathway analysis. (D) First 20 phosphorylation motifs in the output list of the motifeR software. (E) Venn diagram of the melanoma kinome comprising the kinases directly identified in this meta‐study and kinases predicted based on detection of phosphorylation motifs of identified phosphosites, both covering a comprehensive part of the human kinome. (F) Kinome network mapping based on direct identification of kinases and computational kinase‐specific phosphorylation site prediction. Kinases that were identified, predicted, identified/predicted, or not found are shown as nodes of different colors and are clustered in different categories based on the branch color
FIGURE 6 Distribution of the acetylome identified in melanoma tumors. (A) Violin plots showing the distribution of the site‐specific acetylation occupancy (acetylation stoichiometry [%], left axis) of peptides in the 60 samples submitted to acetylome analysis. The samples were grouped according to their origin: primary tumors (red), lymph node (blue), and other metastases (green). The number of acetylated peptides identified in each sample is represented with red dots and connected lines within origin‐based groups (no. of acetylated peptides, right axis). (B) KEGG pathways significantly enriched in the melanoma acetylome. Bars correspond to the number of acetylated proteins involved in the annotated pathway. The enrichment –log(P value), represented as red dots, was plotted for each pathway annotation (right axis)
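Site‐specific occupancy in panel A can be read as the share of a lysine site that carries endogenous acetylation. The sketch below assumes that occupancy is the endogenous (light) acetyl signal divided by the sum of the light signal and the chemically introduced deuterium‐labeled (heavy) signal; the caption does not give the formula, and the function name and intensities are hypothetical.

```python
def acetylation_occupancy(light_intensity, heavy_intensity):
    """Percent occupancy of a site, assuming light / (light + heavy).

    light_intensity: signal of the endogenously acetylated peptide form
    heavy_intensity: signal of the deuterium-labeled (chemically
                     acetylated) form of the same peptide
    """
    total = light_intensity + heavy_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 100.0 * light_intensity / total

# Hypothetical intensities for one lysine site
occupancy = acetylation_occupancy(1.0, 3.0)
```

With these toy intensities, one quarter of the total signal comes from the endogenous form, giving an occupancy of 25%.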
FIGURE 7 Melanoma therapy‐associated proteins and blood plasma protein profiles from pooled patient samples. (A) Distribution of 35 therapy‐associated proteins identified in our samples (top panel). The bar length indicates the number of MM500 tumor samples where the proteins were identified. The bottom panel shows a functional clustering of these proteins. The analysis was performed including nine proteins previously described by our lab as responders to several drug treatments, two well‐known treatment‐targeted proteins (CD274, PDCD1), and a signature of 24 proteins described by Harel et al (2019) as markers of response to immunotherapy. (B) Box plot of quantified proteins in plasma, related to the protein classes and functions. The abundances were calculated according to NSAF criteria. (C) Pie chart representation of the 1000 proteins identified in a pool of blood plasma from melanoma patients. The figure singles out specific fractions that have been identified in proteomic or transcriptomic studies on MM tissues as well as those related to exosomal expression. (D) Representation of the distribution of plasma proteins among tissue samples. The x‐axis represents the percentage of plasma proteins categorized as proteins originating from blood plasma (in red “only in Proteomics of MM tissues”) or proteins identified in blood plasma and also expressed in MM tissues (in blue “Proteomics and RNASeq of MM tissues”). The y‐axis represents the percentage of MM tissue samples where the plasma proteins were identified. The intersection points marked in red represent the percentage of samples (30%, 50%, and 80%) where the plasma proteins were identified. (E) Distribution of proteins originating from blood plasma across MM tissue samples. The x‐axis represents the tissue samples and the y‐axis represents the percentage of proteins originating from blood plasma that were identified in MM samples (100 × (# proteins originating from blood/total number of proteins in MM tissue))
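Two quantities in this caption are simple ratios and can be written out explicitly. The sketch below uses the standard definition of the normalized spectral abundance factor (spectral count divided by protein length, normalized to the sum over all proteins) and the percentage formula quoted in panel E; the function names and the toy values are illustrative and are not taken from the study.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    spectral_counts: dict protein -> number of assigned spectra
    lengths: dict protein -> protein length in residues
    """
    # Spectral abundance factor: counts corrected for protein length
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}

def blood_protein_percentage(n_blood_proteins, n_total_proteins):
    """100 x (# proteins originating from blood / total proteins in tissue)."""
    return 100.0 * n_blood_proteins / n_total_proteins

# Toy values: two proteins of equal length, one with twice the spectra
abundances = nsaf({"P1": 10, "P2": 20}, {"P1": 100, "P2": 100})

# Toy values: 50 blood-derived proteins among 1000 identified in a sample
percentage = blood_protein_percentage(50, 1000)
```

NSAF values sum to one within a sample, so in the toy example the second protein accounts for two thirds of the normalized abundance.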
FIGURE 8 Proposed impact of the MM500 study data within the clinical treatment cycle, in which the patient moves from a healthy state into a progressive disease process. The first indication of early melanoma onset is discovered by image capture, for example, CT (high resolution) and/or MR, accompanied by a genomic mutation signature, along with protein localization and expression. Assignments of posttranslational modifications, as well as pathway biology activations and histopathology characterizations, add to the disease status. Based on the diagnosis output, a dedicated and optimized drug treatment is presented to the patient. The melanoma disease cycle is reinforced upon metastasis development, which entails a build‐up of resistance