W Andrew Lancaster, Jeremy L Praissman, Farris L Poole, Aleksandar Cvetkovic, Angeli Lal Menon, Joseph W Scott, Francis E Jenney, Michael P Thorgersen, Ewa Kalisiak, Junefredo V Apon, Sunia A Trauger, Gary Siuzdak, John A Tainer, Michael W W Adams.
Abstract
BACKGROUND: Metal-containing proteins comprise a diverse and sizable category within the proteomes of organisms, ranging from proteins that use metals to catalyze reactions to proteins in which metals play key structural roles. Unfortunately, reliably predicting from its amino acid sequence that a protein will contain a specific metal is not currently possible. We recently developed a generally applicable experimental technique for finding metalloproteins on a genome-wide scale. Applying this metal-directed protein purification approach (ICP-MS and MS/MS based) to the prototypical microbe Pyrococcus furiosus conclusively demonstrated the extent and diversity of the uncharacterized portion of microbial metalloproteomes, since a majority of the observed metal peaks could not be assigned to known or predicted metalloproteins. However, even using this technique, it is not technically feasible to purify to homogeneity all metalloproteins in an organism. To address these limitations and complement the metal-directed protein purification, we developed a computational infrastructure and statistical methodology to aid in the pursuit and identification of novel metalloproteins.
Year: 2011 PMID: 21356119 PMCID: PMC3058030 DOI: 10.1186/1471-2105-12-64
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Experimental and computational overview. Below the dashed line is a schematic overview of the framework: all of the tools, databases and methods developed and utilized in this effort. Subsequent figures focus on the methods and calculations in the bioinformatics category.
Figure 2. GMPA (Global Metal Protein Association) score calculation. Panels A, B and C illustrate the calculation with data for p = PF1587 and m = Mo; arrows represent generic steps in the calculation. A) Peptide counts for each protein (per fraction) are reduced to Boolean values (present/not present, shown as blue/white cells respectively). Whether a fraction is part of a metal peak is already a Boolean value (green/red cells). B) The "present/not present" values are counted across all fractions in the data set. C) The GMPA score is calculated from these counts using the hypergeometric distribution and is roughly a p-value: the probability that a given protein would be seen in metal peak fractions at least as many times as it was, assuming the protein is equally likely to be observed in any fraction, given the numbers of metal peak fractions, protein fractions, and total fractions.
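The calculation described in Figure 2 can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration, not the authors' implementation: the function name `gmpa_score`, its arguments, and the toy peptide counts are assumptions made for the example.

```python
from math import comb


def gmpa_score(protein_fractions, metal_peak_fractions, total_fractions):
    """Upper-tail hypergeometric probability (hypothetical helper).

    Probability of seeing the protein in at least as many metal peak
    fractions as observed, if it were equally likely to occur in any
    fraction.
    """
    protein = set(protein_fractions)
    peak = set(metal_peak_fractions)
    n_protein = len(protein)        # fractions containing the protein
    n_peak = len(peak)              # fractions inside the metal peak
    overlap = len(protein & peak)   # fractions with both

    # Sum P(X = k) for k = overlap .. largest possible overlap.
    tail = sum(
        comb(n_peak, k) * comb(total_fractions - n_peak, n_protein - k)
        for k in range(overlap, min(n_protein, n_peak) + 1)
    )
    return tail / comb(total_fractions, n_protein)


# Toy data (not from the paper): peptide counts per fraction for one protein.
peptide_counts = [2, 1, 0, 0, 0, 3, 0, 0, 0, 0]
# Step A: reduce counts to Boolean presence.
present = {i for i, count in enumerate(peptide_counts) if count > 0}
metal_peak = {0, 1, 2}  # fractions belonging to the metal peak
score = gmpa_score(present, metal_peak, total_fractions=len(peptide_counts))
print(score)
```

Smaller scores indicate that the overlap between the protein's fractions and the metal peak fractions is less likely to have arisen by chance.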
Figure 3. Protein selection using the GMPA score significance curve criterion. Scatterplots (per metal) of GMPA scores versus number of fraction occurrences for all proteins. The green and orange lines are exponential "significance" curves (plotted on a logarithmic scale); proteins below them are considered significant and selected for clustering (green regions and cyan points). The Ni significance curve (green) was based on occurrences of known Ni-proteins (red points). The Mo significance curve (orange) was extrapolated from the relationship, averaged across all metals with known metalloproteins, between the significance curves and the exponential regression curves (black lines, fitted through all points). Typically, this step removes an additional 20-35 proteins beyond what would be removed using the regression curves themselves.
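The selection step in Figure 3 can be illustrated with a small sketch, under stated assumptions. The caption does not give the curve parameters or the exact form of the extrapolation, so the multiplicative `scale` factor, the function names, and the data below are all hypothetical. The fit is an ordinary least-squares regression on log-transformed scores.

```python
from math import exp, log


def fit_exponential(points):
    """Least-squares fit of score = a * exp(b * n) on a log scale.

    points: list of (n_occurrences, score) pairs with score > 0.
    """
    xs = [n for n, _ in points]
    ys = [log(score) for _, score in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = exp(mean_y - b * mean_x)
    return a, b


def select_significant(proteins, a, b, scale=1.0):
    """Keep proteins whose score lies below scale * a * exp(b * n).

    scale is a hypothetical stand-in for the offset between a
    regression curve and a stricter significance curve.
    """
    return [
        name
        for name, (n, score) in proteins.items()
        if score < scale * a * exp(b * n)
    ]


# Toy data (not from the paper): points lying on 0.5 * exp(-0.3 * n).
points = [(n, 0.5 * exp(-0.3 * n)) for n in range(1, 6)]
a, b = fit_exponential(points)
candidates = {"protA": (2, 0.1), "protB": (2, 0.4)}
print(select_significant(candidates, a, b))
print(select_significant(candidates, a, b, scale=0.3))
```

A `scale` below 1 lowers the curve, so fewer proteins pass, which mirrors the caption's remark that the significance curves remove more proteins than the regression curves alone.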
Figure 4. Hierarchical clustering of nickel-associated proteins that met the GMPA score significance criterion. Potential nickel proteins were clustered based on co-occurrence in fractions using Ward's method. The grid at top contains excerpts from the data set selected to give a rough sense of protein co-occurrence within fractions and its effect on the resulting clustering. The numbered boxes at the bottom indicate the partitioning of the overall clustering into self-contained clusters as determined by cutreeHybrid (indicated by colors in supplementary material clusterings). Clusters containing known Ni-proteins are highlighted at the bottom, illustrating that subunits of known nickel proteins cluster together and that distinct Ni-proteins cluster apart (SHI: soluble hydrogenase I PF0891-PF0894, SHII: soluble hydrogenase II PF1329-PF1332).
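The idea behind Figure 4 can be sketched with a naive agglomerative clustering that uses Ward's merging criterion on Boolean presence profiles. This is only an illustration. Stopping at a fixed number of clusters `k` is a simplification that stands in for the dynamic partitioning done by cutreeHybrid, and the function name, protein names, and profiles are hypothetical.

```python
def ward_clusters(profiles, k):
    """Merge clusters by Ward's criterion until k clusters remain.

    profiles: dict mapping protein name -> list of 0/1 presence values
    (one value per fraction). Returns a list of sorted name lists.
    """
    k = max(k, 1)
    # Each cluster is (member names, centroid vector).
    clusters = [([name], [float(v) for v in vec]) for name, vec in profiles.items()]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                dist2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
                # Ward: increase in within-cluster sum of squares on merging.
                cost = ni * nj / (ni + nj) * dist2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [(mi + mj, centroid)]
    return [sorted(members) for members, _ in clusters]


# Toy presence profiles (not from the paper): two groups that co-occur.
profiles = {
    "protA1": [1, 1, 0, 0, 0, 0],
    "protA2": [1, 1, 1, 0, 0, 0],
    "protB1": [0, 0, 0, 0, 1, 1],
    "protB2": [0, 0, 0, 1, 1, 1],
}
result = ward_clusters(profiles, k=2)
print(result)
```

Proteins that appear in the same fractions have nearby profiles and merge early, which is why subunits of one complex would be expected to land in the same cluster.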
Figure 5. IPM-predicted metalloproteins identified by MS/MS. Of the 870 proteins identified with two or more peptides, 221 were predicted to be metalloproteins. The majority were predicted to contain Fe or Zn, with fewer predicted to contain Mn, Mo, Co, W and Ni.
GMPA Analysis Clustering and Coverage of Known Metalloproteins
| Metal | Known metalloprotein subunits: total | Known metalloprotein subunits: observed | Known metalloprotein subunits: met GMPA | Proteins | Clusters: total | Clusters: with known |
|---|---|---|---|---|---|---|
| Co | 5 | 5 | 3 | 139 | 16 | 3 |
| | 35 (20) | 35 | 19 | 148 | 15 | 8 |
| | 0 | 0 | 0 | 73 | 7 | 0 |
| | 2 | 2 | 1 | 119 | 10 | 1 |
| | 12 (5) | 12 | 9 | 153 | 13 | 4 |
| | 0 | 0 | 0 | 90 | 9 | 0 |
| | 0 | 0 | 0 | 76 | 7 | 0 |
| | 0 | 0 | 0 | 45 | 5 | 0 |
| | 5 | 5 | 4 | 136 | 11 | 4 |
| | 5 | 5 | 1 | 116 | 9 | 1 |
Some proteins consist of multiple subunits; the number of holoenzymes is given in parentheses. Some proteins contain multiple metals and are counted in multiple rows. For full clusters, refer to the supplementary material.
Manually Evaluated Nickel Protein Candidates
| Cluster | ORF | Annotation | Crystal | Metal | IPM |
|---|---|---|---|---|---|
| 2 | PF0144 | Aldolase-type TIM barrel | Fe | | |
| 2 | PF1881 | Alba archaeal DNA/RNA-binding protein | | | |
| 3 | PF0038 | Beta-lactamase-like glyoxalase II family member | Zn | | |
| 4 | PF1916 | Glycosyl transferase, family 2 | | | |
| 4 | PF1987 | Conserved hypothetical protein | | | |
| 5 | PF0138 | Uncharacterized rubrerythrin domain protein | Fe | | |
| 6 | PF1664 | Phosphoribosyl-AMP cyclohydrolase | Cd | Zn | |
| 6 | PF2038 | Adenosylcobalamin biosynthesis | Mg | Co | |
| 8 | PF1500 | PRC-barrel-like | | | |
| 8 | PF1529 | Pyridoxine biosynthesis protein | | | |
| 12 | PF1401 | Peptidyl-prolyl cis-trans isomerase | | | |
| 13 | PF0615 | Hydrogenase expression/formation protein A | Zn | Ni | |
| 13 | PF1272 | LamB/YcsF | | | |
| 13 | PF1684 | Acetylglutamate kinase | | | |
| 13 | PF1861 | Lysyl aminopeptidase | Zn | | |
Cluster number refers to hierarchical clustering with dynamic hybrid partitioning; see Figure 4 for an explanation and supplementary tables 1-8 in Additional File 9 for complete cluster tables. Crystal structures were obtained from PDB entries with sequence similarity >50%. ORF numbers in bold have been previously characterized in P. furiosus; those in italics were characterized by metal-targeted purification.
Manually Evaluated Molybdenum Protein Candidates
| Cluster | ORF | Annotation | Crystal | Metal in | IPM |
|---|---|---|---|---|---|
| 1 | PF0009 | ThiF family protein | Zn | | |
| 1 | PF0187 | Putative cofactor synthesis protein | Fe,Mo,W | | |
| 1 | PF0668 | YjgF-like protein | | | |
| 1 | PF1718 | Wyosine base formation, Radical SAM | Fe | | |
| 1 | PF1766 | Cell division transporter FtsY | | | |
| 2 | PF1956 | Fructose-1,6-bisphosphate aldolase class I | | | |
| 3 | PF1828 | Protein of unknown function DUF1621 | | | |
| 3 | PF1886 | Carbohydrate/purine kinase | | | |
| 4 | PF0098 | NAD+ synthase | | | |
| 4 | PF0236 | Phosphoribosyl pyrophosphokinase | Mg | | |
| 4 | PF1401 | Peptidyl-prolyl cis-trans isomerase | | | |
| 4 | PF1675 | Asp/Glu/hydantoin racemase | | | |
| 4 | PF1731 | Signal recognition particle 54 | | | |
| 5 | PF1538 | Amidohydrolase 1 | Ni | | |
| 7 | PF0523 | Protein of unknown function DUF509 | Mg | | |
| 7 | PF1222 | Protein of unknown function DUF217 | | | |
| 9 | PF0212 | DNA polymerase, family B | Mn | | |
| 9 | PF0306 | Translation factor, SUA5 type | | | |
| 9 | PF0463 | Phosphoglycolate phosphatase | | | |
Cluster number refers to hierarchical clustering with dynamic hybrid partitioning; see Figure 4 for an explanation and supplementary tables 1-8 in Additional File 9 for complete cluster tables. Crystal structures were obtained from PDB entries with sequence similarity >50%. ORF numbers in bold have been previously characterized in P. furiosus; those in italics were characterized by metal-targeted purification.