Iulia M Lazar, Arba Karcini, Shreya Ahuja, Carly Estrada-Palma.
Abstract
Cancer evolves through an accumulation of mutations and chromosomal aberrations. Developments in sequencing technologies have enabled the discovery and cataloguing of millions of such mutations. The identification of protein-level alterations, typically by reversed-phase protein arrays or mass spectrometry, has, however, lagged behind gene- and transcript-level observations. In this study, we report the use of mass spectrometry to detect mutations (missense, indels, and frameshifts) in MCF7 and SKBR3 breast cancer cells and in non-tumorigenic MCF10A cells. The mutations were identified by expanding the database search of the raw mass spectrometry files to include an in-house built database of mutated peptides (XMAn-v1), which complemented a minimally redundant, canonical database of Homo sapiens proteins. The work resulted in the identification of nearly 300 mutated peptide sequences, of which ~50 were characterized by good-quality tandem mass spectra. We describe the criteria used to select the mutated peptide sequences, evaluate the parameters that characterized these peptides, and assess the artifacts that could have led to false peptide identifications. Further, we discuss the functional domains and biological processes that may be impacted by the observed peptide alterations, and how protein-level detection can support efforts to identify cancer-driving mutations and genes. Mass spectrometry data are available via ProteomeXchange with identifier PXD014458.
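A database of mutated peptides is, in essence, the set of proteolytic peptides that would change if a catalogued mutation were present. As a minimal sketch, assuming one-letter missense notation and fully tryptic cleavage with no missed cleavages, the code below applies a substitution to a protein sequence, digests the mutated sequence, and keeps the peptide(s) spanning the changed residue. The helper names (`apply_missense`, `tryptic_digest`, `mutated_peptides`) and the toy sequence are hypothetical; this is not the procedure used to build XMAn-v1.

```python
import re

# Hypothetical helpers for illustration; not the XMAn-v1 build procedure.
MISSENSE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_missense(protein: str, mutation: str) -> tuple[str, int]:
    """Apply a one-letter missense change such as 'A31V' (1-based position).

    Returns the mutated sequence and the 0-based index of the changed residue.
    """
    match = MISSENSE.match(mutation)
    if match is None:
        raise ValueError(f"not a simple missense mutation: {mutation}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    idx = pos - 1
    if idx >= len(protein) or protein[idx] != ref:
        raise ValueError(f"reference residue {ref} not found at position {pos}")
    return protein[:idx] + alt + protein[idx + 1:], idx


def tryptic_digest(protein: str) -> list[tuple[int, str]]:
    """Cleave after K or R unless followed by P; no missed cleavages.

    Returns (0-based start index, peptide) pairs.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides


def mutated_peptides(protein: str, mutation: str) -> list[str]:
    """Digest the mutated protein and keep the peptide(s) spanning the changed residue."""
    mutated, idx = apply_missense(protein, mutation)
    return [pep for start, pep in tryptic_digest(mutated)
            if start <= idx < start + len(pep)]


print(mutated_peptides("MAKGLSEAVRPTQKLL", "A8G"))  # ['GLSEGVRPTQK']
```

Digesting the mutated sequence, rather than the canonical one, matters because a substitution to or from K, R, or P can create or remove a cleavage site: with the same toy sequence, `S6K` yields the short peptide `GLK` instead of a longer one.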
Year: 2019 PMID: 31316139 PMCID: PMC6637242 DOI: 10.1038/s41598-019-46897-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Stacked column charts depicting the relative frequency of amino acid mutations. (a) Number of mutations from each amino acid in the original protein sequence to a mutated residue; (b) number of mutations to each amino acid from the original residues. The mutation counts were normalized to the frequency of the respective amino acids in the human proteome.
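The normalization in Figure 1 divides each residue's mutation count by that residue's background frequency, so that common amino acids are not over-represented simply because they are common. A minimal sketch, assuming one-letter missense notation and a plain dictionary of background frequencies, is shown below; the function name and the background values are hypothetical toy numbers, not real human proteome frequencies.

```python
import re
from collections import Counter

MISSENSE = re.compile(r"^([A-Z])\d+([A-Z])$")


def normalized_counts(mutations, background, direction="from"):
    """Count missense substitutions by original ('from') or new ('to') residue
    and divide each count by that residue's background frequency.

    Entries that are not simple missense changes (indels, frameshifts,
    incomplete notations) are skipped.
    """
    group = 1 if direction == "from" else 2
    residues = [m.group(group) for m in map(MISSENSE.match, mutations) if m]
    return {aa: n / background[aa] for aa, n in Counter(residues).items()}


# Toy background frequencies, NOT real human proteome values.
toy_background = {"A": 0.25, "L": 0.5, "V": 0.125, "D": 0.25, "M": 0.125}
muts = ["A31V", "A90D", "L146M", "K388fs*39"]
print(normalized_counts(muts, toy_background, "from"))  # {'A': 8.0, 'L': 2.0}
print(normalized_counts(muts, toy_background, "to"))    # {'V': 8.0, 'D': 4.0, 'M': 8.0}
```

Panel (a) corresponds to `direction="from"` and panel (b) to `direction="to"`; the frameshift entry is ignored because it has no single replacement residue.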
Selected peptides with mutated sequences.
| Sequence | UniProt ID | Protein name | Mutation | Cell line presence | Pfam domain |
|---|---|---|---|---|---|
| AFQHLSEAVQAAEEEAQPPSWSCGLAAGVIDAYMTLADFCDQQLR | P78527 | DNA-dependent protein kinase catalytic subunit | P3405L | MCF10 | FAT |
| | P23381 | Tryptophan–tRNA ligase, cytoplasmic | A31V | MCF10 | WHEP-TRS |
| AINQSQSVQESLESLLQSIGEVEQNLEGK | Q9UPN3-5 | Isoform 4 of Microtubule-actin cross-linking factor 1 | A2547? | MCF10 | |
| ALMLQGVDLLDDAVAVTMGPK | C9JL25 | 60 kDa heat shock protein, mitochondrial (Fragment) | A90D | MCF7, MCF10, SKBR3 | Cpn60 TCP1 |
| AMQAAGQIPATALLPTMTPDGLSVTPTPVPVVGSQMTR | P26368 | Splicing factor U2AF 65 kDa subunit | A131S | MCF7 | |
| ASTVKSVLELIPELNEKGEAYNSLMK | P42704 | Leu-rich PPR motif-containing protein, mitochondrial | E1318G | MCF7, MCF10 | |
| CAPLFSGTEHHASLIDSLLHTVYRLSK | Q92736 | Ryanodine receptor 2 | A2536S | MCF7 | |
| CQEKWDKLLLTSTEK | Q7L0Y3 | Mitochondrial ribonuclease P protein 1 | Y262C | MCF7, MCF10 | tRNA m1G MT |
| DDLQFLADLEELITKFQVFRISQR | Q9Y6X0 | SET-binding protein | H971Q | MCF7, MCF10, SKBR3 | |
| DIMTYVSSFYHAFSGAQK | O43707 | Alpha-actinin-4 | A256D | MCF7, MCF10 | CH |
| DQEGQDLLLFIDNIFR | P06576 | ATP synthase subunit beta, mitochondrial | V301L | SKBR3 | ATP-synt ab |
| | P21333 | Filamin-A | L261M | MCF7, MCF10 | CH |
| EAVMSFSITETEKIK | Q5T4T6 | Synaptonemal complex protein 2-like | N353S | MCF10 | |
| EFADSLGIPFLETSAKNAMNVEQSFMTMAAEIK | P62820 | Ras-related protein Rab-1A | T159M | SKBR3, MCF7 | Ras |
| EFDTLSGKVEESPDK | Q9BXX2 | Ankyrin repeat domain-containing protein 30B | L760V | SKBR3 | |
| EIYPYVIQERRPTLNELGISTPEELGLDKV | P20674 | Cytochrome c oxidase subunit 5A, mitochondrial | L130R | SKBR3, MCF7 | COX5A |
| EQLQQEQALLEEIER | Q15149 | Plectin | R1386Q | SKBR3, MCF7 | |
| FGLAHLMALGLGPWMAVEIPDLIQK | Q9Y3C8 | Ubiquitin-fold modifier-conjugating enzyme 1 | L146M | MCF10 | UFC1 |
| FQSSAVMALQEGCEAYLVGLFEDTNLCAIHAK | P68431 | Histone H3.1 | A96G | SKBR3, MCF10 | Histone |
| | P35268 | 60S ribosomal protein L22 | 39Y | SKBR3, MCF10 | Ribosomal L22e |
| GCIEKLSEDVEQLKK | P61266 | Syntaxin-1B | V53L | MCF7 | Syntaxin |
| GFDFVTFESPADAK | P38159-3 | Isoform 3 of RNA-binding motif protein, X chromosome | A52D | MCF10 | RRM 1 |
| GFGFITFTNPEHASDAMR | P98179 | Putative RNA-binding protein 3 | V62D | MCF7 | RRM 1 |
| GMLDLLEVHLLDFPNIVIK | Q6P2Q9 | Pre-mRNA-processing-splicing factor 8 | P1871L | MCF7 | PRP8 domainIV |
| | P50914 | 60S ribosomal protein L14 | A159_K160insAAA | MCF7, MCF10 | |
| HQGVMVGMCQKDSYVGDEAQSK | P68133 | Actin, alpha skeletal muscle | G50C | MCF7, MCF10, SKBR3 | Actin |
| HRILPEKYPPPTELLDLQPLPVSALR | O75643 | U5 small nuclear ribonucleoprotein 200 kDa helicase | L1289R | MCF7 | |
| IMSLVDPNHCGLVTFQAFIDFMSR | O43707 | Alpha-actinin-4 | S823C | SKBR3, MCF7 | |
| | P31040 | Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial | Y629F | MCF7 | Succ DH flav C |
| KSQESLTENPSETLKPATSISSTSQTKGINVK | F5GXV7 | Neurobeachin | I1735T | MCF10 | |
| LCGLLVLGSWCISVMGFLLETLTILR | O60412 | Olfactory receptor 7C2 | S156F | MCF10 | 7tm 4 |
| LCYVALYFEQEMATAASSSSLEK | Q6S8J3 | POTE ankyrin domain family member E | D922Y | MCF7 | Actin |
| LDTNSDGQLDYSEFLNLIGGLAMACHDSFLK | P31949 | Protein S100-A11 | F77Y | MCF10 | EF-hand 1 |
| LFDHLESPTPNPTEPLFLAQAEVYK | P49327 | Fatty acid synthase | P972L | SKBR3 | PS-DH |
| LFLASLAAAGSGTDAQVALENEVK | Q7Z6Z7 | E3 ubiquitin-protein ligase HUWE1 | V2153E | SKBR3 | |
| LTENLSALQR | Q8TD16-2 | Isoform 2 of Protein bicaudal D homolog 2 | R398Q | MCF7 | BicD |
| LTQAQIFDYSEIPNFPR | P00491 | Purine nucleoside phosphorylase | G51S | MCF7 | PNP UDP 1 |
| MDATFIGNSTAIQELFK | P04350 | Tubulin beta-4A chain | A364D | MCF7, MCF10, SKBR3 | Tubulin C |
| MHDMNTDQENLVGTHDAPIR | O43684 | Mitotic checkpoint protein BUB3 | L84M | MCF7 | |
| MLVVLRQGTREEDDVVSEDLVQQDVQDLYEAGELK | P08133 | Annexin A6 | L162R | SKBR3 | |
| QVHPDTGISSKVMGIMNSFVNDIFER | P23527 | Histone H2B type 1-O | A59V | SKBR3, MCF7 | Histone |
| QVYPDTGISSKAMGIMNSFVNDIFER | O60814 | Histone H2B type 1-K | H50Y | SKBR3, MCF7 | Histone |
| SDASSGQSGSRSASRTTR | P20930 | Filaggrin | R884S | SKBR3 | |
| SLGQNPTEAELQDMINEVDADGNGTIDFPEFFTMMAR | P62158 | Calmodulin | L70F | MCF7, MCF10, SKBR3 | EF-hand 7 |
| SMGIMNSFVNDIFER | Q8N257 | Histone H2B type 3-B | A59S | MCF7, MCF10, SKBR3 | Histone |
| SVIVVLRLNVDLQAVVIFELVY (del/frame shift) | Q8TC27 | Disintegrin and metalloproteinase domain protein 32 | K388fs*39 | SKBR3, MCF7 | |
| SYELPDGQVITIGKER | P68133 | Actin, alpha skeletal muscle | N254K | MCF7, MCF10, SKBR3 | Actin |
| | Q13469 | Nuclear factor of activated T-cells, cytoplasmic 2 | S330L | MCF7 | |
| TTGIVMDSGDGVTHTVPIYEAYALPHAILR | P63261 | Actin, cytoplasmic 2 | G168A | MCF7, MCF10, SKBR3 | Actin |
| | P06748-3 | Isoform 3 of Nucleophosmin | A53S | SKBR3, MCF10 | Nucleoplasmin |
| | P53675 | Clathrin heavy chain 2 | D1188E | MCF10 | Clathrin |
| VTNGAFTGEISLGMIK | P60174-1 | Isoform 2 of Triosephosphate isomerase | P81L | SKBR3, MCF7 | TIM |
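The Mutation column mixes three one-letter notations: missense changes (e.g. P3405L), an in-frame insertion between two flanking residues (A159_K160insAAA), and a frameshift (K388fs*39). A minimal parser, assuming these are the only three patterns, is sketched below; the `Mutation` class and `parse_mutation` function are hypothetical names. Entries that are incomplete in this record, such as A2547? and 39Y, are returned as `None` rather than guessed.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One-letter notations seen in the table (assumed to be the only ones needed).
PATTERNS = [
    ("missense", re.compile(r"^(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")),
    ("insertion", re.compile(r"^(?P<ref>[A-Z])(?P<pos>\d+)_[A-Z]\d+ins(?P<alt>[A-Z]+)$")),
    ("frameshift", re.compile(r"^(?P<ref>[A-Z])(?P<pos>\d+)fs\*\d+$")),
]


@dataclass
class Mutation:
    kind: str
    ref: str                   # reference residue (first flanking residue for insertions)
    position: int              # 1-based position of that residue
    alt: Optional[str] = None  # new residue(s); None for frameshifts


def parse_mutation(text: str) -> Optional[Mutation]:
    """Classify a protein-level mutation string; return None if unrecognized."""
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Mutation(kind, match["ref"], int(match["pos"]),
                            match.groupdict().get("alt"))
    return None


for entry in ["P3405L", "A159_K160insAAA", "K388fs*39", "39Y"]:
    print(entry, parse_mutation(entry))
```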
Figure 2. Comparison of the chromosomal distributions of missense mutations in various datasets with that of the human genome. The left panels (blue) represent mutations in the human genome (18,864 genes), and the right panels (orange) represent mutations in the various protein or gene datasets. Comparisons to the human genome are provided for: (a–c) an aggregate list of 9,046 genes coding for proteins detected in the proteome profiles of MCF7, MCF10 and SKBR3 cells; (d–f) the list of 980 genes in the OncoKB database[33]; (g–i) the list of 137 oncogenes and tumor suppressors proposed in ref.[1]; and (j–l) the list of 194 mutated proteins detected in SKBR3 and MCF7 cells. The 1st column of histograms represents the % distribution of genes or proteins per chromosome in the human genome and in the various datasets; the 2nd column represents the % distribution of missense mutations per chromosome in the human genome and in the various datasets; and the 3rd column represents the distribution of the total number of mutations per gene or protein and per chromosome. The number of genes or proteins in each dataset is given in each panel, and the 2nd column also gives the total mutation count per dataset. All mutation counts represent the values catalogued for the existing genes in the mutation database used in this study.
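The three histogram columns of Figure 2 are per-chromosome summaries of a gene list annotated with catalogued missense counts. A minimal sketch of how such summaries can be computed is shown below; the input format, the function name, and the gene records are hypothetical, and the 3rd-column metric is read here as the mean number of mutations per gene on each chromosome, which is an assumption.

```python
from collections import Counter, defaultdict


def chromosome_distributions(genes):
    """genes: list of (gene, chromosome, missense_count) tuples.

    Returns, for each chromosome, the % of genes, the % of missense mutations,
    and the mean number of mutations per gene (one possible reading of the
    3rd-column metric).
    """
    gene_counts = Counter(chrom for _, chrom, _ in genes)
    mutation_counts = defaultdict(int)
    for _, chrom, n in genes:
        mutation_counts[chrom] += n
    total_genes = len(genes)
    total_mutations = sum(mutation_counts.values())
    return {
        chrom: {
            "pct_genes": 100 * gene_counts[chrom] / total_genes,
            "pct_missense": 100 * mutation_counts[chrom] / total_mutations,
            "mutations_per_gene": mutation_counts[chrom] / gene_counts[chrom],
        }
        for chrom in gene_counts
    }


# Hypothetical genes and counts, for illustration only.
toy = [("GENE1", "1", 10), ("GENE2", "1", 30), ("GENE3", "X", 20), ("GENE4", "17", 40)]
for chrom, stats in chromosome_distributions(toy).items():
    print(chrom, stats)
```

Running the same function on the whole-genome list and on a dataset list gives the paired left (blue) and right (orange) distributions that the figure compares.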
Hypothetical spectral count scenarios that reflect the detectability of mutated peptides and proteins in a subpopulation of cancer cells that carry the mutation.
| | Small cell number/Low abundance protein | Large cell number/Low abundance protein | Small cell number/High abundance protein | Large cell number/High abundance protein |
|---|---|---|---|---|
| Spectral counts of the mutated peptides | Zero-to-low | Low | Low | High |
| Spectral counts of the non-mutated counter-peptides | Low | Zero-to-low | High | Low |
| Spectral counts of the mutated protein | Low | Low | High | High |
Two levels (low and high) are considered for each of two variables: the number of cancer cells that carry a specific mutation within the larger cancer cell population, and the abundance of the protein that carries the mutation.
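The qualitative pattern in the table follows from a simple proportional argument: mutated peptides come only from the cells that carry the mutation, counter-peptides only from the remaining cells, and the protein from all of them. The sketch below encodes that argument as a toy model; the linear scaling, the function name, and the numeric levels are assumptions chosen for illustration, not values or a model from the study.

```python
from itertools import product


def expected_counts(mutated_cells, total_cells, copies_per_cell):
    """Toy proportional model (an assumption, not the study's model).

    Every cell contributes `copies_per_cell` copies of the protein, and spectral
    counts are taken to scale linearly with the number of peptide copies.
    """
    mutated = mutated_cells * copies_per_cell
    counter = (total_cells - mutated_cells) * copies_per_cell
    return {"mutated peptide": mutated, "counter-peptide": counter, "protein": mutated + counter}


TOTAL_CELLS = 100
CELL_LEVELS = {"small cell number": 10, "large cell number": 90}   # arbitrary
ABUNDANCE_LEVELS = {"low abundance": 1, "high abundance": 100}     # arbitrary units

for (cells_label, cells), (abundance_label, copies) in product(
        CELL_LEVELS.items(), ABUNDANCE_LEVELS.items()):
    print(f"{cells_label}/{abundance_label}:", expected_counts(cells, TOTAL_CELLS, copies))
```

Under this model, mutated-peptide counts rise with both variables, counter-peptide counts rise with abundance but fall as the mutated fraction grows, and protein-level counts depend on abundance only, matching the table row by row. Real spectral counts also depend on peptide detectability and instrument sampling, which the sketch ignores.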
Figure 3. Column chart representing the BLOSUM62 scores associated with the observed amino acid substitutions. The scores were assigned from the BLOSUM62 substitution matrix, whose log-odds values compare the observed frequency of each substitution in conserved regions of aligned protein families with the frequency expected by chance. More negative scores reflect less likely substitutions, while more positive scores reflect more likely substitutions. For scores close to zero, the observed frequency of a substitution approaches the expected value.
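Assigning a BLOSUM62 score to an observed substitution is a matrix lookup keyed by the original and the new residue. The dependency-free sketch below hardcodes the standard BLOSUM62 matrix and scores one-letter missense strings; the helper name `substitution_score` is hypothetical, and indels or frameshifts are outside its scope.

```python
# Standard BLOSUM62 matrix; each row starts with its residue letter and the
# columns follow ORDER.
ORDER = "ARNDCQEGHILKMFPSTWYV"
ROWS = """
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

BLOSUM62 = {}
for line in ROWS.strip().splitlines():
    row_aa, *scores = line.split()
    for col_aa, score in zip(ORDER, scores):
        BLOSUM62[(row_aa, col_aa)] = int(score)


def substitution_score(mutation: str) -> int:
    """Score a one-letter missense change such as 'Y629F' (first and last characters)."""
    return BLOSUM62[(mutation[0], mutation[-1])]


print([substitution_score(m) for m in ("P3405L", "Y629F", "D1188E")])  # [-3, 3, 2]
```

For example, P3405L (Pro to Leu) is an unlikely substitution (-3), whereas Y629F (Tyr to Phe) is a favored one (3). The same matrix can also be loaded with Biopython through `Bio.Align.substitution_matrices.load("BLOSUM62")`.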
Figure 4. Column chart representing selected GO biological processes relevant to cancer. Proteins were assigned to categories with DAVID tools. Biological processes were selected based on their relevance and on the number of protein components per category (>4).
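The selection rule for Figure 4 keeps only relevant categories annotated to more than four proteins. A minimal sketch of that filter is shown below, assuming the annotations have been exported as (protein, biological process) pairs; the export format, function name, protein IDs, and term assignments are all hypothetical.

```python
from collections import defaultdict


def select_processes(annotations, relevant_terms, more_than=4):
    """annotations: iterable of (protein, GO biological process) pairs, e.g.
    exported from a functional annotation tool (export format assumed).

    Keeps relevant terms annotated to more than `more_than` distinct proteins.
    """
    proteins_per_term = defaultdict(set)
    for protein, term in annotations:
        proteins_per_term[term].add(protein)
    return {
        term: len(proteins)
        for term, proteins in proteins_per_term.items()
        if term in relevant_terms and len(proteins) > more_than
    }


# Hypothetical protein IDs and term assignments.
pairs = ([(f"P{i}", "cell division") for i in range(6)]
         + [(f"P{i}", "apoptotic process") for i in range(3)])
print(select_processes(pairs, {"cell division", "apoptotic process"}))  # {'cell division': 6}
```

Counting distinct proteins through a set keeps a protein that appears twice under the same term from inflating the category size.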