Yasushi Ishihama, Thorsten Schmidt, Juri Rappsilber, Matthias Mann, F Ulrich Hartl, Michael J Kerner, Dmitrij Frishman.
Abstract
BACKGROUND: Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible.
Year: 2008 PMID: 18304323 PMCID: PMC2292177 DOI: 10.1186/1471-2164-9-102
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Protein fractionation, peptide separation and mass spectrometric identification strategies for enhancement of proteome identification coverage explored in this study.
| (A) Protein fractionation | (1) SDS-PAGE slicing |
| (B) Tryptic digestion | (1) In-solution digestion |
| (C) Peptide chromatography | (1) Strong cation exchange chromatography |
| (D) Parent ion selection in LC-MS | (1) Simple repetition |
| (E) CID for MS/MS | (1) Quadrupole-TOF |
Abbreviations: SDS-PAGE, sodium dodecylsulfate polyacrylamide gel electrophoresis; PSDVB, poly(styrene-divinylbenzene) copolymer; TOF, time-of-flight.
Figure 1. Comparison of protein abundances in the . Protein abundance as derived from emPAI values. 454 proteins with more than two identified peptides were evaluated in samples from minimal and rich medium. The dashed lines indicate the positions equivalent to concentration ratios of 0.1 and 10. The emPAI values in minimal and rich medium correlate significantly, with a Pearson correlation coefficient of 0.7 (p-value < 10^-54) and 0.77 (p-value < 10^-80) for logarithmized variables.
Figure 2. Correlation between observed emPAI values and independently measured protein copy numbers per cell. Protein abundances in the E. coli cytosol as measured by the emPAI approach correlate well with protein copy numbers per cell measured independently by isotope dilution using spiked E. coli BW25113 cells containing 40 proteins with known amounts [33]. A dynamic range of approximately 4 orders of magnitude of protein copy numbers per cell is covered. The Pearson correlation coefficient is 0.84 (p-value < 10^-10) for logarithmized and 0.52 (p-value < 10^-4) for non-logarithmized variables.
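The gap between the logarithmized and non-logarithmized coefficients can be reproduced on toy data. The sketch below assumes the commonly cited emPAI definition (emPAI = 10^PAI - 1, with PAI the ratio of observed to observable peptides), which this section does not restate. The function names and all numbers are illustrative assumptions, not data from the study.

```python
import math

def empai(n_observed, n_observable):
    """emPAI = 10**(observed / observable) - 1 (assumed standard definition)."""
    return 10 ** (n_observed / n_observable) - 1

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (observed, observable) peptide counts and reference copy numbers.
peptides = [(2, 20), (5, 20), (10, 20), (18, 20)]
copies = [150, 900, 8000, 60000]

empai_values = [empai(obs, total) for obs, total in peptides]

# Correlation on the raw scale and on the log10 scale.
r_raw = pearson(empai_values, copies)
r_log = pearson([math.log10(v) for v in empai_values],
                [math.log10(c) for c in copies])
```

Because copy numbers span several orders of magnitude, a few very abundant proteins dominate the raw-scale coefficient, which is one reason to report the log-scale value alongside it.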
Figure 3. Observed concentration and protein detection frequencies. Correlation between the observed protein copy numbers (based on emPAI) and the detection frequency of the identified proteins. Detection frequency is defined as the average ratio of detection of the observed parent ions of a given protein in all performed LC-MS experiments. Red dots indicate reference proteins (introduced in Figure 2), black dots indicate ribosomal proteins.
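One possible reading of the detection frequency definition is sketched below: for each parent ion of a protein, take the fraction of LC-MS runs in which it was detected, then average over the ions. This interpretation, the function name, and the toy data are assumptions made for illustration.

```python
def detection_frequency(ion_detections, n_runs):
    """Mean, over a protein's parent ions, of the fraction of runs detecting each ion.

    ion_detections maps an ion label to the set of run indices in which it was seen.
    """
    if not ion_detections:
        raise ValueError("protein has no observed parent ions")
    per_ion = [len(runs) / n_runs for runs in ion_detections.values()]
    return sum(per_ion) / len(per_ion)

# Hypothetical protein with three parent ions observed across 4 LC-MS runs.
ions = {"ion_a": {0, 1, 2, 3}, "ion_b": {0, 2}, "ion_c": {1}}
freq = detection_frequency(ions, n_runs=4)  # (1.0 + 0.5 + 0.25) / 3
```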
Comparison of the experimental cytosolic sample with the complete predicted E. coli proteome with respect to the number of predicted transmembrane segments (TMS), cellular localization from the PSORT database and experimental localization data (EXP). Shown are the number of unique proteins and their share of the measured number of molecules in the cell.
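| Attribute a | Predicted proteome: Proteins b | Predicted proteome: % Proteins c | Cytosolic sample: Proteins b | Cytosolic sample: % Proteins c | Cytosolic sample: % Abundance d |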
| TMS = 0 | 3202 | 75.66 | 940 | 89.5 | 97.6 |
| TMS = 1 | 265 | 6.26 | 50 | 4.8 | 1.7 |
| TMS = 2 | 117 | 2.76 | 10 | 1.0 | 0.2 |
| TMS = 3 | 54 | 1.28 | 7 | 0.7 | 0.1 |
| TMS = 4 | 82 | 1.94 | 7 | 0.7 | 6.2E-02 |
| TMS = 5 | 61 | 1.44 | 5 | 0.5 | 2.9E-02 |
| TMS = 6 | 81 | 1.91 | 5 | 0.5 | 4.0E-02 |
| TMS = 7 | 30 | 0.71 | 1 | 0.1 | 1.1E-02 |
| TMS = 8 | 52 | 1.23 | 3 | 0.3 | 2.6E-02 |
| PSORT = Cytoplasmic (C) | 1574 | 36.51 | 554 | 52.8 | 65.3 |
| PSORT = CytoplasmicMembrane (CM) | 851 | 19.74 | 93 | 8.9 | 1.2 |
| PSORT = Periplasmic (P) | 142 | 3.29 | 61 | 5.8 | 1.6 |
| PSORT = OuterMembrane (OM) | 91 | 2.11 | 25 | 2.4 | 2.3 |
| PSORT = Extracellular (E) | 20 | 0.46 | 0 | 0.0 | |
| PSORT = Unknown (U) | 1577 | 36.58 | 288 | 27.4 | 29.0 |
| PSORT = Unknown (multiple sites) (UM) | 56 | 1.30 | 14 | 1.3 | 0.4 |
| PSORT = C | CM | U | UM | 4058 | 94.13 | 949 | 90.4 | 95.9 |
| PSORT = C | U | 3054 | 71.21 | 842 | 80.2 | 94.3 |
| TMS = 0 & PSORT = C | 1253 | 29.21 | 548 | 52.2 | 65.1 |
| TMS = 0 & PSORT = C | CM | 1903 | 44.37 | 580 | 55.3 | 65.7 |
| TMS = 0 & PSORT = C | CM | U | 3111 | 72.53 | 843 | 80.3 | 94.3 |
| TMS <= 1 & PSORT = C | 1335 | 31.13 | 553 | 52.7 | 65.3 |
| TMS <= 1 & PSORT = C | CM | 2033 | 47.40 | 592 | 56.4 | 65.8 |
| TMS <= 1 & PSORT = C | CM | U | 3334 | 77.73 | 877 | 83.5 | 94.8 |
| TMS <= 1 & PSORT = C | U | 2636 | 61.46 | 838 | 79.8 | 94.3 |
| EXP = C | 370 | 18.57 | 279 | 26.6 | 63.0 |
| EXP = IM | 76 | 3.82 | 46 | 4.4 | 4.7 |
| EXP = OM | 62 | 3.11 | 40 | 3.8 | 2.1 |
| EXP = P | 60 | 3.01 | 43 | 4.1 | 1.7 |
| TMS <= 1 & EXP = C | 281 | 6.55 | 279 | 26.6 | 63.0 |
| TMS <= 1 & EXP = IM | 62 | 1.45 | 42 | 4.0 | 4.6 |
| TMS <= 1 & EXP = OM | 44 | 1.03 | 36 | 3.4 | 2.0 |
| TMS <= 1 & EXP = P | 48 | 1.12 | 43 | 4.1 | 1.7 |
| TMS <= 1 & (PSORT = C | U | EXP = C) | 2655 | 61.90 | 853 | 81.2 | 94.6 |
| (TMS <= 1 & PSORT = C | U) | EXP = C | 2680 | 62.49 | 853 | 81.2 | 94.6 |
a Annotated attributes of the proteins depicted as logical statements. An ampersand (&) indicates that both conditions must be fulfilled ('and'), a vertical line (|) indicates 'or'. The following abbreviations are used:
TMS – number of predicted transmembrane segments
PSORT – localization annotation from the PSORT database (C Cytoplasmic, CM Cytoplasmic Membrane, E Extracellular, OM Outer Membrane, P Periplasmic, U Unknown, UM Unknown – this protein may have multiple localization sites)
EXP – experimental localization data from [71] (C Cytoplasmic, IM Inner membrane, OM Outer Membrane, P Periplasmic)
b Number of unique proteins with the given attributes annotated
c Percentage of the unique proteins relative to the sum of unique proteins in the predicted E. coli proteome or in the experimental cytosolic sample, respectively
d Percentage of the actual number of protein copies found in the experimental sample, i.e. fraction of the total protein copy number sum.
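The three quantities described in the footnotes (protein count, percentage of proteins, percentage of abundance) can be derived for any logical attribute filter. The following is a minimal sketch under assumed data structures; the `Protein` class, the `summarize` helper, and the four toy entries are hypothetical and do not come from the study.

```python
from dataclasses import dataclass

@dataclass
class Protein:
    name: str
    tms: int        # number of predicted transmembrane segments
    psort: str      # PSORT localization code, e.g. "C", "CM", "U"
    copies: float   # measured copies per cell

def summarize(proteins, predicate):
    """Count, % of proteins, and % of total copy number for proteins matching predicate."""
    selected = [p for p in proteins if predicate(p)]
    total_copies = sum(p.copies for p in proteins)
    return {
        "proteins": len(selected),
        "pct_proteins": 100 * len(selected) / len(proteins),
        "pct_abundance": 100 * sum(p.copies for p in selected) / total_copies,
    }

# Hypothetical sample of four proteins.
sample = [
    Protein("p1", 0, "C", 9000),
    Protein("p2", 0, "U", 500),
    Protein("p3", 1, "CM", 300),
    Protein("p4", 6, "CM", 200),
]

# Equivalent of the table row "TMS <= 1 & PSORT = C | U".
row = summarize(sample, lambda p: p.tms <= 1 and p.psort in {"C", "U"})
```

In this toy sample the filter keeps half of the proteins but 95% of the copies, the same kind of contrast the table shows between the % Proteins and % Abundance columns.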
Figure 4. Abundance distribution of all identified proteins. Distributions are shown for the group of highly abundant proteins and the remaining low-abundance protein group. Circles show distribution outliers as defined in Methods. The lower hinge represents the first quartile (25%) and the upper hinge the third quartile (75%). The high and low groups were separated by clustering at a copy number cutoff of 2050 proteins per cell as described in Methods.
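The box plot quantities named in the caption can be computed as sketched below. The outlier rule used here (Tukey's 1.5 x IQR fences) is an assumption, since the Methods definition is not reproduced in this section. Assigning a protein exactly at the cutoff to the high group is also an arbitrary choice, and the copy numbers are invented.

```python
import statistics

CUTOFF = 2050  # copies per cell, the cluster boundary quoted in the caption

def box_stats(values):
    """Hinges (quartiles), median, and outliers under an assumed 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

# Hypothetical copy numbers per cell.
copies = [40, 120, 300, 450, 800, 1500, 2600, 4000, 9000, 70000]

low_group = [c for c in copies if c < CUTOFF]
high_group = [c for c in copies if c >= CUTOFF]

low_stats = box_stats(low_group)
high_stats = box_stats(high_group)
```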
The most abundant functional groups in the E. coli cytosol.
| 05.01.01 | ribosomal proteins | 55 | 1 |
| 05.01 | ribosome biogenesis | 62 | 2 |
| 63.03.03 | RNA binding | 83 | 3 |
| 05 | Protein synthesis | 107 | 4 |
| 63.03 | nucleic acid binding | 144 | 5 |
| 40.03 | cytoplasm | 275 | 6 |
| 63 | Protein with binding function or cofactor requirement (structural or catalytic) | 483 | 7 |
| 63.07 | structural protein | 6 | 8 |
| 05.04 | translation | 34 | 9 |
| 63.01 | protein binding | 113 | 10 |
| 06.01 | protein folding and stabilization | 70 | 11 |
| 04.01.99 | other rRNA-transcription activities | 6 | 12 |
Figure 5. Abundance functional profile. Shown is the fraction of proteins involved in different functional categories in different abundance ranges. The first data point shows the functional breakdown of the 50 most abundant proteins, the second data point corresponds to the 100 most abundant proteins, and so on. Note that the fractions relative to the number of proteins (e.g. 50, 100...) do not sum to 1, since a protein can be assigned multiple functions, such as protein synthesis and binding function. The functional categories shown in the legend are the FunCat top-level classifications as outlined in the Methods section. All 1103 proteins, including the 53 ribosomal proteins, are shown in this plot. Since the plot is based on relative ranking, it is robust with respect to the observed copy number variability of these most abundant proteins.
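The cumulative profile described in the caption can be sketched as follows, assuming proteins are already sorted by decreasing abundance and each carries a set of category labels. The function name, the step of 2 used in the toy example (the figure uses 50), and the labels are illustrative assumptions.

```python
def functional_profile(ranked_categories, step=50):
    """Category fractions among the top step, 2*step, ... most abundant proteins.

    ranked_categories: list of sets of category labels, most abundant protein first.
    A trailing group smaller than `step` is not reported.
    """
    profile = []
    for top_n in range(step, len(ranked_categories) + 1, step):
        counts = {}
        for categories in ranked_categories[:top_n]:
            for category in categories:
                counts[category] = counts.get(category, 0) + 1
        fractions = {c: k / top_n for c, k in counts.items()}
        profile.append((top_n, fractions))
    return profile

# Hypothetical ranking of four proteins; the first has two functions.
ranked = [
    {"protein synthesis", "binding"},
    {"protein synthesis"},
    {"metabolism"},
    {"binding"},
]
profile = functional_profile(ranked, step=2)
```

For the top two proteins the fractions are 1.0 and 0.5, which add up to 1.5: the multi-function effect that the caption warns about.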
The most abundant protein folds in the E. coli cytosol.
| Fold a | Number of proteins | Rank |
| Barrel-sandwich hybrid | 10 | 1 |
| Ribonuclease H-like motif | 11 | 2 |
| OB-fold | 27 | 3 |
| Thioredoxin fold | 15 | 4 |
| NAD(P)-binding Rossmann-fold domains | 41 | 5 |
| Transmembrane beta-barrels | 12 | 6 |
| Ferredoxin-like | 22 | 7 |
| TIM beta/alpha-barrel | 47 | 8 |
| Flavodoxin-like | 28 | 9 |
| DNA/RNA-binding 3-helical bundle | 20 | 10 |
| P-loop containing nucleoside triphosphate hydrolases | 57 | 11 |
| FAD/NAD(P)-binding domain | 14 | 12 |
| PLP-dependent transferases | 14 | 13 |
| Class II aaRS and biotin synthetases | 13 | 14 |
| Adenine nucleotide alpha hydrolase-like | 17 | 15 |
| Periplasmic binding protein-like II | 22 | 16 |
| ATP-grasp | 10 | 17 |
| S-adenosyl-L-methionine-dependent methyltransferases | 12 | 18 |
a All folds with 10 or more proteins were considered to avoid single outliers influencing the general trend.
Comparison of features associated with protein aggregation between highly abundant proteins and the remaining detected proteins. The highly abundant group is defined as described in Material and Methods.
| Protein length (in amino acids) | 386 (327) | 309 (252) | 10^-6, 10^-7 |
| Number of alternating hydrophobic/hydrophilic stretches (>= 5 aa) | 11.7 (9.0) | 9.5 (8.0) | 0.03, 10^-4 |
| pI distance from neutrality | 1.52 (1.50) | 1.69 (1.84) | 0.003, 0.01 |
| Hydrophobicity (Kyte-Doolittle scale) | -0.20 (-0.21) | -0.25 (-0.24) | 0.17, 0.08 |
Figure 6. Abundance and essentiality. The abundance distributions of essential and non-essential proteins are shown: essential proteins are more abundant than non-essential proteins. The median of each group is shown as a thick black bar; the median of the essential group is clearly higher (613 copies per cell vs. 432). In addition, proteins in the essential group reach higher abundance ranges than non-essential proteins (as can be seen from the difference in the upper whisker and upper hinge). A Mann-Whitney test and a Kolmogorov-Smirnov test indicate that the abundance distributions of essential and non-essential proteins are significantly different, with p-values of 0.0002 and 0.0001, respectively.
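The test statistics behind the two named tests can be written in a few lines, as in the sketch below. It computes only the Mann-Whitney U statistic and the two-sample Kolmogorov-Smirnov distance, not the p-values; in practice a statistics library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.ks_2samp`) would supply those. The copy numbers are invented for illustration.

```python
from bisect import bisect_right

def mann_whitney_u(a, b):
    """U statistic for sample a: number of pairs (x, y) with x > y, ties counted as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of the two samples."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, v) / len(sa) - bisect_right(sb, v) / len(sb))
        for v in sa + sb
    )

# Hypothetical copies per cell for two small groups.
essential = [310, 620, 900, 5000, 20000]
non_essential = [50, 200, 430, 700, 1500]

u = mann_whitney_u(essential, non_essential)
d = ks_statistic(essential, non_essential)
```

A U value close to the product of the sample sizes (here 25) means that almost every essential protein outranks every non-essential one. The KS distance responds to any difference in distribution shape, not only to a shift in location.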
Figure 7. Abundance versus codon adaptation index (CAI). Each point on the plot corresponds to a protein characterized by two values: abundance and CAI. The Spearman rank correlation coefficient r_s between log(copy number) and CAI is 0.5 and the Pearson correlation coefficient is 0.57, indicating a good non-random correlation (both p-values < 10^-16) with some variance. The dotted line is a linear regression between log(copy number) and CAI; the solid line is a loess local fitting curve.
Figure 8. Karlin's predicted gene expression and measured protein abundance. The dotted line is a linear regression and the solid line a loess local fitting curve. The Pearson correlation coefficient between log(copy number) and Karlin's expression value is 0.52 (p-value < 10^-12) and Spearman's rho is 0.53 (p-value < 10^-12).
Figure 9. Variance of abundance within known operons. Only the 33 operons for which we have abundance data for 3 or more proteins are considered. The variance of all 1050 proteins is 0.35 and is shown as a dashed line. Low variance within an operon shows that the abundances of its proteins are similar. In 91% (30 of 33) of these operons, the variance is lower than the variance of all proteins (left of the vertical bar). Copy number values are distributed according to the extreme value distribution and were therefore logarithmized for better representation.
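The operon comparison can be sketched as below: compute the variance of logarithmized copy numbers within each operon that has at least three measured proteins, and compare it with the variance over all proteins. Base-10 logarithms and the sample (n - 1) variance are assumptions, as the caption specifies neither. The operon names and values are hypothetical.

```python
import math
from statistics import variance

def operon_variances(copies_by_operon, min_members=3):
    """Sample variance of log10 copy numbers for operons with enough measured proteins."""
    return {
        operon: variance([math.log10(c) for c in copies])
        for operon, copies in copies_by_operon.items()
        if len(copies) >= min_members
    }

# Hypothetical operons with measured copies per cell.
operons = {
    "opA": [1000, 1200, 900],
    "opB": [50, 5000, 400, 70000],
    "opC": [300, 350],  # skipped: fewer than 3 measured proteins
}

all_copies = [c for copies in operons.values() for c in copies]
global_variance = variance([math.log10(c) for c in all_copies])

per_operon = operon_variances(operons)
below_global = [op for op, v in per_operon.items() if v < global_variance]
```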