Ioanna Ntai, Richard D LeDuc, Ryan T Fellers, Petra Erdmann-Gilmore, Sherri R Davies, Jeanne Rumsey, Bryan P Early, Paul M Thomas, Shunqiang Li, Philip D Compton, Matthew J C Ellis, Kelly V Ruggles, David Fenyö, Emily S Boja, Henry Rodriguez, R Reid Townsend, Neil L Kelleher.
Abstract
Bottom-up proteomics relies on the use of proteases and is the method of choice for identifying thousands of protein groups in complex samples. Top-down proteomics has been shown to be robust for direct analysis of small proteins and offers a solution to the "peptide-to-protein" inference problem inherent in bottom-up approaches. Here, we describe the first large-scale integration of genomic, bottom-up and top-down proteomic data for the comparative analysis of patient-derived mouse xenograft models of basal and luminal B human breast cancer, WHIM2 and WHIM16, respectively. Using these well-characterized xenograft models established by the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium, we compared and contrasted the performance of bottom-up and top-down proteomics to detect cancer-specific aberrations at the peptide and proteoform levels and to measure differential expression of proteins and proteoforms. Bottom-up proteomic analysis of the tumor xenografts detected almost 10 times as many coding nucleotide polymorphisms and peptides resulting from novel splice junctions as top-down. For proteins in the range of 0-30 kDa, where quantitation was performed using both approaches, bottom-up proteomics quantified 3,519 protein groups from 49,185 peptides, while top-down proteomics quantified 982 proteoforms mapping to 358 proteins. Examples of both concordant and discordant quantitation were found in a ∼60:40 ratio, providing a unique opportunity for top-down to fill in missing information. The two techniques showed complementary performance, with bottom-up yielding eight times more identifications of 0-30 kDa proteins in xenograft proteomes, but failing to detect differences in certain posttranslational modifications (PTMs), such as phosphorylation pattern changes of alpha-endosulfine.
This work illustrates the potency of a combined bottom-up and top-down proteomics approach to deepen our knowledge of cancer biology, especially when genomic data are available.
Year: 2015 PMID: 26503891 PMCID: PMC4762530 DOI: 10.1074/mcp.M114.047480
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911
Summary of experiments comparing the performance of top-down (TD) and bottom-up (BU) proteomics to detect and quantify cancer-specific aberrations
| Study | Description | Bottom-up | Top-down |
|---|---|---|---|
| 1 | Qualitative comparison of WHIM2 and WHIM16 (BU/TD), protein MW range 0–100 kDa | 10,453 proteins | 2,006 proteoforms (370 proteins) |
| 2 | Label-free TD quantitation of WHIM2 vs WHIM16, protein MW range 0–30 kDa | N/P | 1,334 proteoforms (218 proteins) |
| 3 | Quantitative comparison of WHIM2 and WHIM16, protein MW range 0–30 kDa | 3,367 proteins | 3,125 proteoforms (438 proteins) |
a Proteins were fractionated using GELFrEE. Representative fractionations for each study are illustrated in Supplemental Fig. S1.
b The term proteins corresponds to protein groups as defined by PEAKS Studio, ver. 7.
c The term proteins corresponds to a single RefSeq identifier.
d Identification required a spectrum count of 3 within a single LC/MS run.
e N/P: not performed.
Fig. 1. Workflow for the qualitative and quantitative analysis of CompRef tumor xenografts by bottom-up and top-down.
Coding polymorphisms (cSNPs) detected and genotyped by TD proteomics
| RefSeq | Uniprot accession | Protein description | cSNP | WHIM2 | WHIM16 |
|---|---|---|---|---|---|
| NP_000995 | P05387 | 60S acidic RP P2 | S64I | S64 and I64 | S64 |
| NP_001093162 | Q6IS14 | eIF-5A1-like | V137L | V137 | V137 and L137 |
| NP_001120865 | P56378 | 6.8kDa mitochondrial proteolipid | I26V | I26 and V26 | I26 |
| NP_003078 | O76070 | γ-synuclein | E110V | E110 | E110 and V110 |
| NP_003854 | O94777 | DPM synthase subunit 2 | T76S | N/D | S76 |
| NP_005013 | P07737 | Profilin-1 | N10S | N10 | N10 and S10 |
| NP_006734 | P98179 | Putative RNA-binding protein 3 | Y117D | Y117 | Y117 and D117 |
| NP_009140 | P42766 | RP L35 | N101H | N101 | N101 and H101 |
| NP_037519 | Q9UDW1 | Cytochrome b-c1 complex subunit 9 | I47V | I47 and V47 | I47 |
| NP_543011 | Q96KR6 | Protein FAM210B | P126S | P126 and S126 | P126 |
a N/D: not detected.
Fig. 2. Protein identifications in WHIM2 and WHIM16. (A) TD spectrum of gamma-synuclein displaying the distinctive pattern of a heterozygote genotype at this locus, and sequence of gamma-synuclein including fragment ions (flags) detected by TD and peptide sequences (underlined) detected by BU. The highlighted N-terminal amino acid indicates an N-terminal acetylation. The cSNP E110V is circled. Both technologies provided evidence of the cSNP. (B) TD spectrum of ribosomal protein L35 displaying the distinctive pattern of a heterozygote genotype at this locus, and sequence of ribosomal protein L35 including fragment ions (flags) detected by TD and peptide sequences (underlined) detected by BU. The cSNP N101H is circled. Only TD provided evidence of the cSNP.
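The heterozygote pattern described in this caption arises because the two alleles yield intact proteoforms that differ in mass by the difference between the two residues. As a rough illustration only, and not the authors' genotyping pipeline, the sketch below computes that expected monoisotopic shift for a cSNP label such as those in the table. The function name `csnp_mass_shift` and the lookup table are hypothetical helpers; the residue masses are standard monoisotopic values rounded to five decimals.

```python
import re

# Standard monoisotopic residue masses (Da) for the amino acids that
# appear in the cSNP table; rounded values, listed here for illustration.
RESIDUE_MASS = {
    "S": 87.03203, "I": 113.08406, "V": 99.06841, "L": 113.08406,
    "E": 129.04259, "T": 101.04768, "N": 114.04293, "Y": 163.06333,
    "D": 115.02694, "H": 137.05891, "P": 97.05276,
}


def csnp_mass_shift(csnp: str) -> float:
    """Return the expected mass shift (Da) for a label such as 'E110V'.

    Hypothetical helper: the shift is the variant residue mass minus the
    reference residue mass; the position is parsed but not used.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", csnp)
    if match is None:
        raise ValueError(f"unrecognised cSNP label: {csnp!r}")
    ref, _position, alt = match.groups()
    return RESIDUE_MASS[alt] - RESIDUE_MASS[ref]


if __name__ == "__main__":
    for label in ("E110V", "N101H"):
        print(f"{label}: {csnp_mass_shift(label):+.4f} Da")
```

Under these assumptions, E110V in gamma-synuclein corresponds to a shift of roughly −29.97 Da and N101H in ribosomal protein L35 to roughly +23.02 Da, so a heterozygous sample would be expected to show two intact-mass species separated by about that amount.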
Fig. 3. Summary of quantitative results from Study 3. (A) Volcano plot obtained using label-free TD quantitative analysis from comparison of 0–30 kDa proteins in WHIM2 and WHIM16 (Study 2). (B) Volcano plot obtained using label-free TD quantitative analysis from comparison of 0–30 kDa proteins in WHIM2 and WHIM16 (Study 3). (C) Volcano plot obtained using label-free BU quantitative analysis from comparison of 0–30 kDa proteins in WHIM2 and WHIM16 (Study 3). (D) Correlation of BU and TD fold change estimates for significantly different entities.
Overall quantitative results for Study 3 reveal a prevalence of concordant examples where proteoform-level changes differ substantially from those determined by BU
| | Differentially expressed by TD | Not differentially expressed by TD | Not detected by TD |
|---|---|---|---|
| Differentially expressed by BU | 12 proteins | 14 proteins | 314 proteins |
| | 27 proteoforms | 18 proteoforms | N/A |
| Not differentially expressed by BU | 152 proteins | 232 proteins | 2,795 proteins |
| | 233 proteoforms | 584 proteoforms | N/A |
| Not detected by BU | 0 proteins | 64 proteins | |
| | 0 proteoforms | 99 proteoforms | |
The top number in each cell is the number of RefSeq IDs detected, and the bottom number is the number of proteoforms detected.
TD often has more than one proteoform per RefSeq ID, and so the same ID may be in two or more boxes (as some proteoforms are differentially expressed, and others are not).
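The source does not state how the ∼60:40 concordant-to-discordant ratio quoted in the abstract was derived. One plausible reading, sketched below as an assumption, is to take the protein-level counts for entries seen by both techniques and treat the two agreeing cells as concordant. The dictionary keys and the `concordance` function are hypothetical names, and because one RefSeq ID can fall into more than one box, the counts are not strictly disjoint.

```python
# Protein-level (RefSeq ID) counts from the table above, restricted to
# the four cells where a protein was detected by both BU and TD.
counts = {
    ("BU_diff", "TD_diff"): 12,     # differentially expressed by both
    ("BU_diff", "TD_same"): 14,     # differential by BU only
    ("BU_same", "TD_diff"): 152,    # differential by TD only
    ("BU_same", "TD_same"): 232,    # not differential by either
}


def concordance(table):
    """Return (concordant, discordant) fractions for a 2x2 count table.

    Assumption: 'concordant' means BU and TD agree on whether the
    protein is differentially expressed.
    """
    total = sum(table.values())
    concordant = table[("BU_diff", "TD_diff")] + table[("BU_same", "TD_same")]
    return concordant / total, (total - concordant) / total


concordant_frac, discordant_frac = concordance(counts)
print(f"concordant: {concordant_frac:.1%}, discordant: {discordant_frac:.1%}")
```

With these counts, 244 of 410 proteins agree between the two techniques and 166 do not, which is close to the ratio given in the abstract.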
Fig. 4. Differential expression of alpha-endosulfine. (A) Bottom-up heatmap illustrating the number of alpha-endosulfine peptides identified in each replicate. Each row represents a separate peptide reporting uniquely on alpha-endosulfine, while columns in the map represent separate LC-MS/MS runs. Red represents one spectral count in the run, yellow two, light green three, and dark green four or more spectral counts. (B) Sequence of alpha-endosulfine including fragment ions (flags) detected by TD and peptide sequences (underlined) detected by BU. Two phosphorylation sites detected by TD are circled. The highlighted N-terminal amino acid indicates an N-terminal acetylation. (C) Boxplots illustrating abundance differences of alpha-endosulfine in WHIM2 (blue) and WHIM16 (orange) samples. Each box shows the median and the first and third quartiles of all MS1 intensities detected for the protein. The bars show the range of the observed data. (D) Mass spectrum of alpha-endosulfine showing changes in its phosphorylation pattern between the two WHIM samples.
Fig. 5. Discordant examples of differential expression profiles as measured by BU and TD. Panels A–D show heatmaps generated from BU spectral count data, while panels E–G contain the corresponding boxplots from TD MS1 intensity data. Each row of the BU heatmaps represents a separate peptide reporting uniquely on the corresponding protein, while columns in the map represent separate LC-MS/MS runs. Red represents one spectral count in the run, yellow two, light green three, and dark green four or more spectral counts. Each box in the boxplots shows the median and the first and third quartiles of all MS1 intensities detected for the protein. The bars show the range of the observed data. Panels A and E represent d-dopachrome decarboxylase (NP_001346) by BU and TD, respectively; C and F represent cytochrome b-c1 complex subunit 8 (NP_055217), while D and G represent protein phosphatase 1 regulatory subunit 1B (NP_115568). Panel B shows BU data for androgen-induced gene 1 (NP_057192).