Yan Zhang, Junge Zheng.
Abstract
Trace metals are inorganic elements that are required for all organisms in very low quantities. They serve as cofactors and activators of metalloproteins involved in a variety of key cellular processes. While substantial effort has been made in experimental characterization of metalloproteins and their functions, the application of bioinformatics in the research of metalloproteins and metalloproteomes is still limited. In the last few years, computational prediction and comparative genomics of metalloprotein genes have arisen, which provide significant insights into their distribution, function, and evolution in nature. This review aims to offer an overview of recent advances in bioinformatic analysis of metalloproteins, mainly focusing on metalloprotein prediction and the use of different metals across the tree of life. We describe current computational approaches for the identification of metalloprotein genes and metal-binding sites/patterns in proteins, and then introduce a set of related databases. Furthermore, we discuss the latest research progress in comparative genomics of several important metals in both prokaryotes and eukaryotes, which demonstrates divergent and dynamic evolutionary patterns of different metalloprotein families and metalloproteomes. Overall, bioinformatic studies of metalloproteins provide a foundation for systematic understanding of trace metal utilization in all three domains of life.
Keywords: bioinformatics; comparative genomics; evolution; metal; metalloprotein; metalloproteome
Year: 2020 PMID: 32722260 PMCID: PMC7435645 DOI: 10.3390/molecules25153366
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Computational tools for metal-binding site and metalloprotein prediction.
| Name | Website | Related Metals | Main Algorithm | Reported Performance | Ref. |
|---|---|---|---|---|---|
| RDGB | | Zn, Cu, Fe and other metals | Integration of tools for retrieval of protein domains and genome analysis | Accuracy: 89.6%, precision: 85.9% | |
| Zincfinder | | Zn | SVM | AURPC: 0.590 (local predictor) and 0.633 (gating network) | |
| Zincpred | | Zn | SVM- and homology-based algorithm | AURPC: 0.723 (local predictor) and 0.701 (gating network) | |
| TEMSP | | Zn | Structure-based algorithm with a range of geometric criteria | AUC: 0.945 | |
| Zincidentifier | | Zn | A two-step feature selection method based on random forest algorithm | AUC: 0.955, AURPC: 0.829 | |
| ZincExplorer | | Zn | A combination of SVM-, cluster- and template-based predictors | AURPC: 0.907 | |
| ZincBinder | | Zn | SVM model trained on PSSM-based input feature | AUC: 0.91 | |
| ZINCCLUSTER | | Zn | SVM-based Ligand Finder and Cluster Finder algorithms | MCC: 0.798, F1-score: 0.801 | |
| ZnMachine | | Zn | A combination of several intensively-trained machine learning models | AUC: 0.933 (SVM) and 0.910 (neural network) | |
| HemeBIND | | Fe (heme) | SVM | MCC: 0.504, F1-score: 56.87% | |
| SCMHBP | | Fe (heme) | Based on a newly-developed scoring card method for predicting heme-binding proteins | Accuracy: 85.90% | |
| Isph | | Fe (Fe-S) | A penalized linear model based on machine learning approach | Precision: 87.9%, recall: 80.1% (extended model) | |
| MetalPredator | | Fe (Fe-S) | Integration of existing domain-based methodology with a new approach for discovering metal-binding motifs | Precision: 85.2%, recall: 88.6% | |
| MetSite | N/A | Fe, Zn, Cu, Mn, Ca, Mg | Artificial neural network | Mean accuracy: 94.5% | |
| FINDSITE-metal | | Fe, Zn, Cu, Mn, Ni, Co, Ca, Mg | Integration of structure/evolutionary information and machine learning approach (SVM) | Overall accuracy: 70–90% | |
| SeqCHED | | Fe, Zn, Cu, Mn, Ni, Co, Ca, Mg | A modification of the CHED algorithm and machine learning filters (decision tree classifier and SVM) | Sensitivity: 84–85%, selectivity: 82–93% (stringent filtration) | |
| MetalDetector | | Transition metals that use cysteine and histidine as ligands | A combination of different machine learning algorithms (SVM-HMM, structured-output SVM) | Precision: 60–79%, recall: 71–88% | |
| MIB | | Ca, Cu, Fe, Mg, Mn, Zn, Cd, Ni, Hg, Co | Fragment transformation method | Overall accuracy: 92.9–95.1% | |
| SECISearch3 and Seblastian | | Se | Homology-based RNA motif finding and selenoprotein gene detection approach | Precision: 81.48–100%, recall: 33.33–100% | |
| SelGenAmic | N/A | Se | Selenoprotein gene assembly algorithm based on the GenAmic approach used by geneid | N/A | |
| bSECISearch | | Se | An algorithm for prediction of bacterial selenoprotein genes based on a consensus RNA structural model | True positive rate: 96.5% | |

Abbreviations: SVM, support vector machine; AURPC, area under the recall-precision curve; AUC, area under the curve; MCC, Matthews correlation coefficient.
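Several of the performance measures reported in the table above (accuracy, precision, recall, F1-score, and MCC) are derived from the four counts of a binary confusion matrix. The following is a minimal Python sketch of those standard formulas. The function name `binary_metrics` and the example counts are hypothetical and are not taken from any of the listed tools.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives (e.g. residues predicted as
    metal-binding or not). Ratios with a zero denominator are returned as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical, imbalanced example: 100 binding residues among 1000.
metrics = binary_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.96 while MCC is about 0.78. Gaps of this kind are one reason class-imbalanced tasks such as binding-site prediction are often reported with MCC or AURPC rather than accuracy alone.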
Metalloprotein databases.
| Name | Website | Main Content | Ref. |
|---|---|---|---|
| MDB | | Metalloproteins and metal-binding sites in protein structures | |
| Metal-MACiE | | All metalloenzymes annotated in the MACiE database | |
| dbTEU | | Transporters and metalloproteins for Cu, Mo, Co, Ni, and Se in more than 700 organisms | |
| Mespeus | | Experimentally established geometry of metal and protein interactions | |
| MetalPDB | | Metal-binding sites detected in the 3D structures of biological macromolecules | |
| SelenoDB | | Selenoprotein genes in at least 58 animal genomes | |
| ZincBind | | All known Zn-binding sites from PDB | |
Figure 1. A general diagram for comparative genomic analysis of metal utilization.
List of known metalloprotein families for several metals in prokaryotes and eukaryotes.
| Metal | Prokaryotes | Eukaryotes |
|---|---|---|
| Cu | Cytochrome c oxidase subunit I | Cytochrome c oxidase subunit I |
| | Cytochrome c oxidase subunit II | Cytochrome c oxidase subunit II |
| | Plastocyanin family | Plastocyanin family |
| | Cu amine oxidase | Cu amine oxidase |
| | Cu-Zn superoxide dismutase | Cu-Zn superoxide dismutase |
| | Multicopper oxidase family | Multicopper oxidase family |
| | Tyrosinase | Tyrosinase |
| | Azurin | Galactose oxidase |
| | Rusticyanin | Hemocyanin |
| | Nitrosocyanin | Plantacyanin family |
| | Nitrous oxide reductase | Peptidylglycine α-hydroxylating monooxygenase |
| | Nitrite reductase | Dopamine β-monooxygenase |
| | NADH dehydrogenase 2 | Cnx1G |
| | Particulate methane monooxygenase | |
| Mo | Sulfite oxidase | Sulfite oxidase |
| | Xanthine oxidase | Xanthine oxidase |
| | Dimethylsulfoxide reductase | MOSC-containing protein (mARC) |
| | MOSC-containing protein | |
| | Fe-Mo-containing nitrogenase | |
| W | Aldehyde:ferredoxin oxidoreductase | N/A |
| | Certain members of dimethylsulfoxide reductase: | |
| | Formate dehydrogenase and acetylene hydratase (obligately anaerobic bacteria) | |
| | Formylmethanofuran dehydrogenase (methanogenic archaea) | |
| Ni | Urease | Urease |
| | Ni-Fe hydrogenase | |
| | Carbon monoxide dehydrogenase | |
| | Superoxide dismutase SodN | |
| | Acetyl-coenzyme A synthase/decarbonylase | |
| | Methyl-coenzyme M reductase | |
| | Lactate racemase | |
| Co | Methylmalonyl-CoA mutase | Methylmalonyl-CoA mutase |
| | Isobutyryl-CoA mutase | B12-dependent ribonucleotide reductase class II |
| | Ethylmalonyl-CoA mutase | Methionine synthase |
| | Glutamate mutase | |
| | Methyleneglutarate mutase | |
| | D-lysine 5,6-aminomutase | |
| | Diol dehydratase | |
| | Glycerol dehydratase | |
| | Ethanolamine ammonia lyase | |
| | B12-dependent ribonucleotide reductase class II | |
| | Methionine synthase | |
| | Methyltetrahydromethanopterin:coenzyme M methyltransferase subunit A | |
| | Other methyltransferases | |
| | B12-dependent reductive dehalogenase PceA/CprA | |
| | LitR/CarH/CarA | |
| | PpaA | |
| | Epoxyqueuosine reductase | |
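A family list like the one above can serve as a lookup for a first-pass guess at which metals a genome may use, given the protein families already annotated in it. The Python sketch below is an illustration under stated assumptions, not the procedure used in the review. `METAL_MARKERS` holds only a small subset of the families listed above, and `infer_metal_use` and the example genome are hypothetical. A real analysis would also need to consider transporters, cofactor biosynthesis genes, and families that can bind more than one metal.

```python
# Subset of marker families copied from the table above (illustrative only).
METAL_MARKERS = {
    "Cu": {
        "Cytochrome c oxidase subunit I",
        "Plastocyanin family",
        "Multicopper oxidase family",
        "Tyrosinase",
    },
    "Mo": {"Sulfite oxidase", "Xanthine oxidase", "Dimethylsulfoxide reductase"},
    "W": {"Aldehyde:ferredoxin oxidoreductase"},
    "Ni": {"Urease", "Ni-Fe hydrogenase", "Carbon monoxide dehydrogenase"},
    "Co": {"Methylmalonyl-CoA mutase", "Glutamate mutase", "Diol dehydratase"},
}


def infer_metal_use(annotated_families):
    """Map annotated family names to the metals they suggest.

    Returns {metal: sorted list of matching marker families}, containing
    only metals for which at least one marker family was found.
    """
    found = set(annotated_families)
    hits = {}
    for metal, markers in METAL_MARKERS.items():
        matched = found & markers
        if matched:
            hits[metal] = sorted(matched)
    return hits


# Hypothetical annotation of a single genome.
genome = ["Urease", "Sulfite oxidase", "Cytochrome c oxidase subunit I"]
print(infer_metal_use(genome))
```

The lookup uses exact name matching for simplicity. In practice, family membership would come from homology or domain searches rather than string comparison.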
Figure 2. Distribution of metalloprotein families in the three domains of life. (A) Distribution of cuproproteins in Cu-utilizing organisms; (B) Distribution of molybdoproteins in Mo-utilizing organisms; (C) Distribution of tungstoproteins in W-utilizing organisms; (D) Distribution of Ni-dependent proteins in Ni-utilizing organisms; (E) Distribution of B12-dependent proteins in Co-utilizing organisms. COX I, cytochrome c oxidase subunit I; COX II, cytochrome c oxidase subunit II; MCO, multicopper oxidase; Cu-Zn SOD, Cu-Zn superoxide dismutase; Ndh2, NADH dehydrogenase 2; NiR, nitrite reductase; N2OR, nitrous oxide reductase; CuAO, Cu amine oxidase; pMMO, particulate methane monooxygenase; PHM, peptidylglycine α-hydroxylating monooxygenase; DBM, dopamine β-monooxygenase; GAO, galactose oxidase; DMSOR, dimethylsulfoxide reductase; XO, xanthine oxidase; SO, sulfite oxidase; AOR, aldehyde:ferredoxin oxidoreductase; CODH, carbon monoxide dehydrogenase; SodN, Ni-containing superoxide dismutase; CODH/ACS, acetyl-coenzyme A synthase/decarbonylase; MCR, methyl-coenzyme M reductase; MetH, methionine synthase; RNR II, B12-dependent ribonucleotide reductase class II; MCM, methylmalonyl-CoA mutase; ICM, isobutyryl-CoA mutase; ECM, ethylmalonyl-CoA mutase; EAL, ethanolamine ammonia lyase; DDH/GDH, diol/glycerol dehydratase; 5,6-LAM, D-lysine 5,6-aminomutase; GM, glutamate mutase; PceA/CprA, B12-dependent reductive dehalogenase; MtrA, methyltetrahydromethanopterin:coenzyme M methyltransferase subunit A. Data used to generate this figure are available in the supplementary information of [81,83].
List of known selenoprotein families.
| Prokaryotes | Eukaryotes |
|---|---|
| Formate dehydrogenase alpha subunit | Deiodinase (DIO) family: DIO1, DIO2, and DIO3 |
| Selenophosphate synthetase | Glutathione peroxidase (GPX) family: GPX1, GPX2, GPX3, GPX4, and GPX6 |
| Coenzyme F420-reducing hydrogenase alpha subunit | Thioredoxin reductase (TXNRD) family: TXNRD1, TXNRD2, and TXNRD3 |
| Coenzyme F420-reducing hydrogenase delta subunit | Methionine sulfoxide reductase B1 |
| Methylviologen-reducing hydrogenase alpha subunit | Selenoprotein F |
| Glycine reductase selenoprotein A | Selenoprotein H |
| Glycine reductase selenoprotein B | Selenoprotein I |
| Proline reductase | Selenoprotein K |
| Heterodisulfide reductase alpha subunit | Selenoprotein M |
| Methionine-S-sulfoxide reductase | Selenoprotein N |
| Peroxiredoxin (Prx); Thioredoxin (Trx) | Selenoprotein O |
| Glutaredoxin (Grx) | Selenoprotein P |
| Arsenite S-adenosylmethyltransferase | Selenoprotein S |
| | Selenoprotein T |
| | Selenoprotein V |
| Thiol:disulfide isomerase-like protein | Selenoprotein W |
| Thiol:disulfide interchange protein | Selenophosphate synthetase 2 |
| HesB-like | |
| Deiodinase-like | |
| Glutathione peroxidase-like | Methionine-S-sulfoxide reductase |
| Selenoprotein W-like | Protein disulfide isomerase |
| Fe-S oxidoreductase | Selenoprotein J |
| DsbA-like | Selenoprotein L |
| DsrE-like | Selenoprotein U |
| DsbG-like | Selenoprotein E |
| AhpD-like | SAM-dependent methyltransferase |
| Arsenate reductase | |
| Molybdopterin biosynthesis protein MoeB | |
| Glutathione S-transferase | Prx-like protein |
| COG0737 UshA | Trx-fold protein |
| OsmC-like | Membrane selenoprotein MSP |
| Rhodanase-related protein | SelTryp |
| Sulfurtransferase COG2897 | Other hypothetical proteins |
| Cation-transporting ATPase, E1-E2 family | |
| Methylated-DNA-protein-cysteine methyltransferase | |
| UGSC-containing protein | |
| CMD domain containing protein | |
| Organic mercuric lyase MerB2 | |
| Predicted redox-active disulfide protein 2 | |
| Prx-like/Trx-like/Grx-like and Trx-fold proteins | |
| Other hypothetical selenoproteins | |
Figure 3. Distribution of the top 20 selenoproteins in Sec-utilizing bacteria. Data used to generate this figure can be found in [114].