P S Hari¹, Lavanya Balakrishnan¹, Chaithanya Kotyada¹, Arivusudar Everad John¹, Shivani Tiwary², Nameeta Shah³, Ravi Sirdeshmukh⁴.
Abstract
We carried out a proteogenomic analysis of the breast cancer transcriptomic and proteomic data available from The Clinical Proteomic Tumor Analysis Consortium (CPTAC) resource to identify novel peptides arising from alternative splicing events and other noncanonical expression. We used a pipeline that combined de novo transcript assembly, a six-frame-translated custom database, and multiple search engines to identify novel peptides. The 4,387 novel peptide sequences initially identified were further screened with the PepQuery validation tool (CPTAC), which yielded 1,558 novel peptides. We used these 1,558 PepQuery-validated peptides to assess functional and clinical significance, leaving the rest to be verified with other validation tools and approaches. The novel peptides mapped both to known gene sequences and to genomic regions not yet defined as translated: 580 mapped to known protein-coding genes, 147 to non-protein-coding genes, and 831 to novel translated sequences. The novel peptides belonging to protein-coding genes represented alternative splicing events or 5' or 3' extensions, whereas the others represented translation from pseudogenes or long noncoding RNAs, or originated from uncharacterized protein-coding sequences, mostly in the intronic regions of known genes. Seventy-six of the novel peptides from protein-coding genes mapped to cancer hallmark genes, which included key oncogenes, transcription factors, kinases, and cell surface receptors. Survival association analysis of these 76 novel peptide sequences showed 10 of them to be significant, and we present a panel of six novel peptides whose high expression was strongly associated with poor survival in patients with the human epidermal growth factor receptor 2 (HER2)-enriched subtype.
Our analysis provides a landscape of the different types of novel peptides that may be expressed in breast cancer tissues; whether they are present in full-length functional proteins requires further investigation.
Keywords: CPTAC; alternative splicing; breast cancer; de novo transcript assembly; proteogenomics
Year: 2022 PMID: 35227895 PMCID: PMC9020135 DOI: 10.1016/j.mcpro.2022.100220
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 7.381
List of proteogenomics tools/pipelines and their salient features
| Name of pipeline/tool | Input data (source) | Custom database creation | Search engines | Splice-type interpretation | Differential expression information | Reference |
|---|---|---|---|---|---|---|
| IPAW | RNA-Seq, MS/MS raw data (A431 cells) | Genomic sequence database or transcript assembly aligned with genomic sequence. Six-frame translation of genomic sequences, no integrated transcript assembler | MSGF+ | Present | Not present | ( |
| JUMPg | RNA-Seq, MS/MS raw data (Alzheimer’s disease postmortem brain, multiple myeloma cell line [ANBL6]) | Reference genomic sequence used for custom protein database through three-frame translation and six-frame translation | Single tag–based in-house built multistage search engine | Present | Not present | ( |
| PGMiner | RNA or complementary DNA sequence, MS/MS raw data ( | User-defined custom database creation with three-frame translation with an option of six-frame translation based on reference genome sequence | MSGF+, OMSSA, and X!Tandem | Present | Not present | ( |
| Splicify | RNA-Seq fastq files, MS/MS raw data (colorectal cancer cell line SW480) | Reference genome sequence based on three-frame translation and custom protein database creation | MaxQuant | Present | Present | ( |
| ASV-ID | RNA-Seq fastq files, MS/MS raw data (human embryonic kidney 293, HepG2, HeLa, and MCF7 cell lines) | Reference genome sequence based on three-frame translation and custom protein database creation | Comet | Present | Present | ( |
| PGA | vcf, bed, and gtf files from RNA-Seq, MS/MS raw data. (Jurkat cell line) | Reference genome sequence based on three-frame translation or six-frame translation of | One default search engine | Present | Not present | ( |
| ProteomeGenerator | RNA-Seq, MS/MS raw data (K052 leukemia cells) | Reference genome-based sequence alignment, Reference or | MaxQuant | Present | Not present | ( |
PubMed was searched for literature from the last 5 years using "proteogenomics tools and pipelines" as the keyword, and some of the major tools/pipelines were examined for their analytical details. The list in the table is not exhaustive but represents the major tools/pipelines in use. The custom pipeline used in the CPTAC analysis is not listed in this table but is discussed under the Results section.
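Several of the tools above, like the pipeline used in this study, build their custom database by three- or six-frame translation. As a minimal sketch of six-frame translation, assuming the standard genetic code (NCBI translation table 1), the Python below translates a nucleotide sequence in the three forward and three reverse-complement frames and keeps stop-free segments above a minimum length. The function names and the 7-residue cutoff are illustrative assumptions, not the authors' settings.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translate(seq):
    """Return the three forward (+1..+3) and three reverse (-1..-3) translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def stop_free_segments(protein, min_len=7):
    """Split a translated frame at stop codons ('*'); min_len is a hypothetical cutoff."""
    return [segment for segment in protein.split("*") if len(segment) >= min_len]
```

For example, `six_frame_translate("ATGGCCTAA")` returns `"MA*"` for frame +1; in a real pipeline the input would be the de novo assembled transcripts, and the resulting segments would be written to a FASTA file for the search engines.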
Fig. 1. Proteogenomic analysis and identification of novel peptides. A, schematic view of the proteogenomic pipeline. Breast cancer transcriptomic and proteomic data from the CPTAC resource were used for the analysis. The pipeline includes de novo assembly of RNA-Seq reads followed by six-frame translation to create a custom database, which is searched with the MS/MS files from the proteomic analysis. The custom database generated for each sample was searched against the respective MGF files using the search engines X!Tandem, MSGF+, and Tide. PeptideShaker was used for integrated identification of the candidate peptides and their corresponding proteins. Known (RefSeq) peptides were then filtered out of the total identifications to obtain the list of novel peptides, which were subjected to ACTG analysis and then sorted into peptide categories using custom scripts, as described under Experimental Procedures. The novel peptides were then validated using PepQuery, and the validated peptides were categorized into those that map to protein-coding genes, noncoding genes, and uncharacterized ORFs. Numbers in brackets represent the number of novel peptides in each group. The different types of novel peptides obtained after ACTG categorization are also shown. The novel peptides mapping to known protein-coding genes were mapped to cancer hallmark genes and further assessed for clinical relevance in breast cancer by survival analysis. Validation with DeepMass:Prism is briefly discussed in the Results section. B, chromosome-wise distribution of the PepQuery-validated peptides as revealed by ACTG, with their categories indicated by the color code. The categories are explained under Experimental Procedures.
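The step in which known (RefSeq) peptides are filtered out can be pictured as an exact substring check against the reference proteome. The sketch below assumes an in-memory list of reference protein sequences and treats isoleucine and leucine as equivalent, since they have identical mass and are not distinguished by standard MS/MS; the function names are hypothetical, and the linear scan is only meant for small illustrative inputs, not for a full proteome.

```python
def normalize_il(seq):
    """Collapse isobaric isoleucine/leucine into a single symbol."""
    return seq.replace("I", "L")


def filter_novel(peptides, reference_proteins):
    """Keep peptides that do not occur as substrings of any reference protein.

    Assumption: 'known' means an exact match (I/L-insensitive) somewhere in
    the reference proteome; this is an illustration, not the authors' filter.
    """
    reference = [normalize_il(protein) for protein in reference_proteins]
    novel = []
    for peptide in peptides:
        key = normalize_il(peptide)
        if not any(key in protein for protein in reference):
            novel.append(peptide)
    return novel
```

For a large reference database, an index such as a suffix array or an Aho-Corasick automaton would replace the linear scan; the I/L normalization would stay the same.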
Fig. 2. Density distribution plot (…). Numbers of peptides identified in our analysis compared with the CPTAC analysis: CPTAC, 422; our analysis, 1,558; overlap, 38. The basis and details of these numbers are given in the Results section.
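The three counts in the Fig. 2 caption determine the rest of the comparison by inclusion-exclusion. The short sketch below works this out; the Jaccard index is an added summary statistic, not one reported by the authors, and the function name is hypothetical.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Set sizes implied by two totals and their intersection (inclusion-exclusion)."""
    union = n_a + n_b - n_shared
    return {
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "union": union,
        "jaccard": n_shared / union,
    }


# a = CPTAC novel peptides, b = peptides from this analysis
summary = overlap_summary(n_a=422, n_b=1558, n_shared=38)
```

With these counts, 384 peptides are unique to CPTAC, 1,520 are unique to this analysis, and the union holds 1,942 peptides, so the shared set is about 2% of the union.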
Fig. 3. Schematic representation of novel peptide categories used to understand their functional and clinical significance. Of the 1,558 peptides validated by PepQuery (Fig. 1), 580 mapped to known protein-coding genes, 147 to noncoding genes, and 831 to uncharacterized ORFs. The types of peptides in each category, with their respective numbers, are depicted in the pie chart. The 580 peptides corresponding to 501 protein-coding genes were further mapped to cancer hallmarks to identify their tumor relevance. Seventy-six of them mapped to cancer hallmarks, and the corresponding novel peptide sequences were subjected to survival analysis as described under Experimental Procedures and Results. Survival association plots for the significant peptide sequences are given in Figure 4.
Fig. 4. Survival analysis of novel peptide sequences mapping to protein-coding genes. Survival plots, with the respective peptide sequences, are shown for the novel peptides from eight genes (FADD, FLT1, ALDOA, CXCL16, FGFR1, PLCB3, PPP2R2A, and RPA1) that showed significant survival association. The novel peptides were quantified at the transcript level using breast cancer RNA-Seq data from TCGA. The red line represents the high-expression group of patients and the blue line the low-expression group. The numbers of patients at risk in the high- and low-expression groups are also shown. A, peptides showing survival association in the luminal (FADD) and basal (FLT1) subtypes. B, peptides showing survival association in the HER2-enriched subtype (ALDOA, CXCL16, FGFR1, PLCB3, PPP2R2A, and RPA1). ALDOA, aldolase A; CXCL16, C-X-C motif chemokine ligand 16; FADD, Fas-associated death domain; FGFR1, fibroblast growth factor receptor 1; FLT1, Fms-related receptor tyrosine kinase 1; HER2, human epidermal growth factor receptor 2; PLCB3, phospholipase C beta 3; PPP2R2A, protein phosphatase 2 regulatory subunit B alpha; RPA1, replication protein A1.
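Fig. 4 compares high- and low-expression groups of patients. The caption does not state how the groups were formed or which test was applied; a common approach, assumed here, is a median split on expression followed by Kaplan-Meier curves and a two-group log-rank test. The pure-Python sketch below illustrates that approach; it is not the authors' code, and the function names are hypothetical.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each patient True (high) or False (low); the median cutoff is an assumption."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each distinct event time.

    events[i] is 1 if patient i died at times[i], 0 if censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the p value from chi-square with 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)                  # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)     # at risk, group a
        d = sum(1 for tt, e, _ in pooled if tt == t and e)            # deaths, both groups
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

To use it, split patients with `split_by_median`, then pass the survival times and event indicators of the high and low groups to `kaplan_meier` (one curve per group) and to `log_rank_p`. Established libraries such as lifelines provide tested versions of both estimators.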
List of novel peptides corresponding to protein-coding genes significantly associated with survival
| Serial number | Novel peptide sequence | Gene symbol; gene description | Molecular function | Cancer hallmark gene (CHG) class | Survival association (subtype) | Survival outcome | P value, novel peptide | P value, parent gene |
|---|---|---|---|---|---|---|---|---|
| 1 | ISSEAPELATTSTMPYQYPALTPEQK | ALDOA; aldolase, fructose-bisphosphate A | Lyase activity | Reprogramming energy metabolism | HER2-enriched | High expression associated with poor survival | 0.05 | 0.31 |
| 2 | TGQAGGLLNR | CXCL16; C-X-C motif chemokine ligand 16 | Chemokine activity | Inducing angiogenesis; evading immune destruction; resisting cell death; and tumor-promoting inflammation | HER2-enriched | High expression associated with poor survival | 0.05 | 0.44 |
| 3 | RAGAGDAGTRPL | FGFR1; fibroblast growth factor receptor 1 | Transmembrane receptor protein tyrosine kinase activity | Inducing angiogenesis; sustaining proliferative signaling; resisting cell death; activating invasion and metastasis; enabling replicative immortality; evading growth suppressors; and reprogramming energy metabolism | HER2-enriched | High expression associated with poor survival | 0.05 | 0.88 |
| 4 | AESASMTER | HSPB1; heat shock protein family B (small) member 1 | Chaperone activity | Resisting cell death; activating invasion and metastasis; inducing angiogenesis; and sustaining proliferative signaling | HER2-enriched | High expression associated with poor survival | 0.03 | 0.12 |
| 5 | QGSVGPRPAPGR | PLCB3; phospholipase C beta 3 | Phospholipase activity | Enabling replicative immortality; activating invasion and metastasis; resisting cell death; sustaining proliferative signaling; tumor-promoting inflammation; reprogramming energy metabolism; and evading immune destruction | HER2-enriched | High expression associated with poor survival | 0.04 | 0.35 |
| 6 | GAVDDDVAEDIISTVEFNHSGELLATGDK | PPP2R2A; protein phosphatase 2 regulatory subunit B alpha | Protein serine/threonine phosphatase activity | Activating invasion and metastasis; evading growth suppressors; reprogramming energy metabolism; sustaining proliferative signaling; and resisting cell death | HER2-enriched | High expression associated with poor survival | 0.02 | 0.40 |
| 7 | PSHQQPPSATMATAPYNYSYIFK | RAB14; Ras-related protein Rab-14 | GTPase-binding activity | Reprogramming energy metabolism | HER2-enriched | High expression associated with poor survival | 0.01 | 0.08 |
| 8 | GSLGGGAMVGQLSEGAIAAIMQ | RPA1; replication protein A1 | DNA binding | Genome instability and mutation | HER2-enriched | High expression associated with poor survival | 0.05 | 0.41 |
| 9 | PLADPAMDPFLVLLHSVSSSLSSSELTELK | FADD; Fas-associated death domain | Receptor signaling complex scaffold activity | Evading immune destruction; resisting cell death; and tumor-promoting inflammation | Luminal | Low expression associated with poor survival | 0.04 | 0.20 |
| 10 | GSLAAGSAGTGRAGR | FLT1; Fms-related receptor tyrosine kinase 1 | Transmembrane receptor protein tyrosine kinase activity | Inducing angiogenesis; activating invasion and metastasis; sustaining proliferative signaling; evading growth suppressors; and resisting cell death | Basal | Low expression associated with poor survival | 0.05 | 0.21 |
The parent genes HSPB1 and RAB14 showed near-significant survival association (see supplemental Fig. S2).
Fig. 5. MS/MS spectra of the novel peptides of the eight protein-coding genes with survival association shown in Figure 4. Details of the peptides and their corresponding genes are given in supplemental Table S5.
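Annotating MS/MS spectra such as those in Fig. 5 typically involves matching observed peaks to theoretical b- and y-ion m/z values. The sketch below assumes singly charged fragments, unmodified residues (no carbamidomethyl cysteine or oxidized methionine), and standard monoisotopic masses; it is an illustration of the fragment-ion arithmetic, not how PepQuery or PeptideShaker score matches, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b ion: N-terminal residues plus a proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ion: C-terminal residues plus water and a proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("AESASMTER")  # novel HSPB1 peptide from the table above
```

A useful sanity check is that each complementary pair b_i + y_(n-i) equals the neutral peptide mass plus two protons.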