InterMetalDB: A Database and Browser of Intermolecular Metal Binding Sites in Macromolecules with Structural Information.

Abstract

InterMetalDB is a free database and browser of intermolecular metal binding sites located at the interfaces of macromolecules that form larger assemblies, based on structural information deposited in the Protein Data Bank (PDB). It is freely available at https://intermetaldb.biotech.uni.wroc.pl/. InterMetalDB collects interfacial binding sites involving metal ions and clusters them on the basis of 50% sequence identity and the nearest metal environment (5 Å radius). The data are available through a web interface where they can be queried, viewed, and downloaded. The complexity of a query is up to the user, because all query conditions are combined with a logical AND. InterMetalDB offers several useful options for filtering records, including searching for structures by parameters such as structure resolution, structure description, and date of deposition. Records can be filtered by coordinated metal ion, number of bound amino acid residues, coordination sphere, and other features. InterMetalDB is regularly updated with new content and will continue to be in the future. InterMetalDB is a useful tool for all researchers interested in metalloproteins, protein engineering, and metal-driven oligomerization.

Keywords: interfacial metal; interprotein site; metalloprotein; protein assembly; protein−protein interaction

Year: 2021 PMID: 33502860 PMCID: PMC8023803 DOI: 10.1021/acs.jproteome.0c00906

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

Nearly every macromolecule in living organisms must interact, either transiently or permanently, with another macromolecule to fulfill its function. Given that metal ions are associated with an estimated 30–40% of all proteins,[1,2] where they often play essential structural or functional roles, it is no wonder that macromolecule–macromolecule interactions and metal–macromolecule interactions overlap.[3] More surprising is that this area of research remains almost unexplored and our knowledge of it is only fragmentary. As the number of identified metal-containing macromolecules grew, efforts began to identify and differentiate the specific characteristics of binding sites that determine the affinity of a metal ion for the site and its function there. Among the first features described were the metal-binding ligands and whether the bound metal ion has a catalytic or a structural function.[4−6] For the most part, metal ion binding at interfaces has escaped researchers' attention. Although it was described in an extensive review in 2014,[3] only a few earlier reviews mentioned intermolecular zinc binding.[7,8] The presence of metal ions at macromolecular interfaces may have attracted little attention because such interactions are rare or unstable, but also because intermolecularly bound metal ions, especially transient ones, are very difficult to test and investigate. Beyond developing our knowledge of intermolecular metal ion binding, the tool we provide can be used to construct new models, or improve existing ones, that predict metal ion binding by macromolecules.
Aggregating intermolecular metal binding sites in a database, combined with coordination chemistry and statistical models, may facilitate the engineering of artificial macromolecular interfaces involving metal binding.[9] We believe that our recent contribution to the field of interfacial metal binding, together with the resource presented here, will help researchers expand their knowledge of the factors that determine interfacial metal binding and its role in biological systems.[10] So far, the best-explored and best-described d-block metal ion found in macromolecules appears to be the zinc ion (formally Zn2+). This is fully understandable given the prevalence of Zn2+ in the living world (Zn2+ is estimated to occur in about 10% of all human proteins), so it will be used as the background for comparing intermolecular ion binding.[11] It is important to note, however, that the estimated number of zinc proteins is based on already known fingerprints found in proteins encoded by the human genome; this number does not account for interprotein sites because bioinformatic tools that facilitate the identification of such sites are lacking.[5,10] The first systematic attempt to describe all Zn2+-binding sites in protein structures appeared at the end of the 1990s.
This description of the structures deposited in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB)[12] was repeated several times, usually without making the data available in electronic form.[13−16] In comparison, the first review of biological intermolecular metal binding sites appeared only in 2014,[3] although two important contributions on interprotein zinc sites were published earlier, and ours appeared just recently.[7,8,10] Although several electronic resources (MESPEUS,[17] ZifBase,[18] MetalPDB,[19,20] ZincBind[21]) have made it possible to search PDB entries of metal-containing proteins, none of them allows filtering of intermolecular metal binding sites. To allow the scientific community to investigate this obscure area while efficiently exploring the vast amount of structural information, we have aggregated the intermolecular binding sites in the entire RCSB PDB and stored the results in our freely and publicly accessible InterMetalDB database. InterMetalDB is also a browser for deposited structures and offers several useful options for filtering records, such as searching for structures by structure resolution, structure description, and date of deposition. Identified intermolecular binding sites can be filtered by coordinated metal ion, number of coordinating amino acids, coordination sphere, and other features. Nevertheless, records stored in InterMetalDB should be treated with caution. As discussed in our recent review article,[10] nonphysiological interprotein Zn2+-binding sites are quite common.
Out of around 600 structures containing interfacial Zn2+ (after redundancy removal and preselection with a Python script), we manually selected around 170 structural complexes that we believe contain intermolecular Zn2+ of physiological importance.[10] Because no algorithm or tool currently allows precise prediction of such artifacts, we do not filter metal binding sites in any way and leave it to the user's experience to judge whether a bound metal has a physiological function. The goal of InterMetalDB is to collect and present all intermolecular metal binding sites in the RCSB PDB and to let the user easily filter and access useful information about them. To this end, it contains the most up-to-date data set of all known intermolecular metal binding sites deposited in the RCSB PDB.[22] InterMetalDB has a user-friendly query interface and is automatically and regularly updated at https://intermetaldb.biotech.uni.wroc.pl/. The source code for InterMetalDB is available at https://github.com/jzftran/InterMetalDB/, where it can be viewed, downloaded, and modified under the MIT license.

Methods

Acquiring Intermolecular Metal Binding Sites

The RCSB PDB Search application programming interface (API) allows one to run queries across the RCSB PDB Search Services and retrieve a list of matching identifiers (e.g., PDB IDs) (https://search.rcsb.org/). Structures containing metal elements were acquired from the RCSB PDB[22] by querying for each metal element through the RCSB PDB Search API using the chemical component identifier rcsb_chem_comp_container_identifiers.comp_id. This is an exact-search attribute and returns only structures that contain a standalone metal ion, not a metal bound by another molecule; i.e., structures containing iron in heme or iron–sulfur clusters are not returned. In the future we hope to extend our search to such molecules as well. No constraints on structure resolution were applied during database construction. After the corresponding structure identifiers were acquired, each file was retrieved and processed with the Python parser library atomium, which handles structural files deposited in the RCSB PDB.[22] In PDB files, the coordinates of the biological assembly and of the asymmetric unit are often the same. For some files, however, they differ, and symmetry operations are needed to build the biological assembly for analysis. The asymmetric unit is the smallest (nonreducible) part of the crystal structure which, when duplicated and moved by the crystal symmetry operations, produces the unit cell, i.e., the part of the crystal that is repeated (https://dictionary.iucr.org). The asymmetric unit should not be confused with the biological functional unit, which is the tertiary or quaternary protein structure believed to be the functional macromolecule in an organism.
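As a minimal sketch of the acquisition step, the Python snippet below builds an RCSB PDB Search API request for one chemical component identifier and posts it using only the standard library. The endpoint path, the `text_chem` service name, and the `return_all_hits` option reflect our understanding of the public Search API and are assumptions to verify against https://search.rcsb.org/; the function names are hypothetical and not part of InterMetalDB.

```python
import json
import urllib.request

# Search endpoint; the "/v2/" path is an assumption about the current API
# version (check https://search.rcsb.org/ before use).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_metal_query(comp_id: str) -> dict:
    """Request body asking for all PDB entries that contain the chemical
    component `comp_id` (e.g. "ZN", "CA", "MG") as an exact match."""
    return {
        "query": {
            "type": "terminal",
            # Assumed service name for chemical-component attributes;
            # earlier API versions accepted "text" here.
            "service": "text_chem",
            "parameters": {
                "attribute": "rcsb_chem_comp_container_identifiers.comp_id",
                "operator": "exact_match",
                "value": comp_id,
            },
        },
        "return_type": "entry",
        # Ask for every hit instead of only the default first page.
        "request_options": {"return_all_hits": True},
    }


def fetch_pdb_ids(comp_id: str) -> list:
    """POST the query and return the matching PDB IDs (needs network access)."""
    body = json.dumps(build_metal_query(comp_id)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    if not raw:  # the service may answer with an empty body when nothing matches
        return []
    return [hit["identifier"] for hit in json.loads(raw)["result_set"]]
```

Calling `fetch_pdb_ids("ZN")` would then yield the entry IDs to download and parse. Because the search matches the standalone component only, heme or iron–sulfur iron is not returned, consistent with the limitation stated above.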
A biological assembly is constructed from the asymmetric unit either by selecting a subset of the deposited coordinates (the biological assembly is then a portion of the asymmetric unit) or by selecting a subset of the deposited coordinates and duplicating them or applying symmetry operations (e.g., translation, rotation, or their combination). To handle biological assemblies, the assembly instructions given in the structural file were used, and the biological assembly that contains the metal element of interest and has the lowest energy (if given in the assembly instructions) was chosen for further examination. If no macromolecular binding energy was given in the structural file, the first assembly containing the metal was selected. Structural files sometimes contain duplicated atoms, especially atoms lying on a rotational symmetry axis. To deal with this redundancy, duplicated atoms are removed, where atoms lying within 1 Å of the original atom are considered duplicates. The environment of each metal ion in the biological assembly is examined within a radius of 3 Å (center to center), and the coordination environment is assumed to include all non-carbon, non-hydrogen atoms. A PDB structure is considered to contain an intermolecular metal binding site if the metal ion is bound by at least two amino acid or nucleotide residues from at least two different macromolecular chains. For example, if a metal ion is coordinated by three amino acid residues from chain A and by a chloride ion assigned to chain B, the site is not considered intermolecular. For each coordinating atom, the one-letter abbreviation of the corresponding residue is used to construct a coordination identifier (e.g., a metal ion coordinated by three cysteinyl residues and one histidinyl residue has the coordination identifier C3H1).
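The site-detection rules above (1 Å duplicate removal, a 3 Å environment of non-carbon, non-hydrogen atoms, the two-residue and two-chain criterion, and the coordination identifier) can be sketched in plain Python. The `Atom` record and helper names are hypothetical, and several details are assumptions of this illustration: one count per coordinating residue, alphabetical ordering of the identifier, and comparing only atoms of the same element during duplicate removal (so that bonded hydrogens are not discarded). Parsing and assembly generation, done with atomium in InterMetalDB, are left out.

```python
import math
from collections import Counter
from dataclasses import dataclass

# One-letter codes for standard amino acids; other polymer residues
# (e.g. nucleotides such as "DG") keep their residue name.
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass(frozen=True)
class Atom:
    element: str   # e.g. "ZN", "S", "N", "O", "C"
    chain: str     # chain ID in the biological assembly
    res_name: str  # e.g. "CYS", "HIS", "DG", "CL"
    res_id: int    # residue number
    xyz: tuple     # (x, y, z) in Å
    polymer: bool  # True for amino acid or nucleotide residues


def remove_duplicates(atoms, cutoff=1.0):
    """Drop atoms within `cutoff` Å of an already kept atom of the same element."""
    kept = []
    for atom in atoms:
        if all(atom.element != other.element
               or math.dist(atom.xyz, other.xyz) > cutoff for other in kept):
            kept.append(atom)
    return kept


def coordinating_atoms(metal, atoms, radius=3.0):
    """Non-carbon, non-hydrogen atoms within `radius` Å (center to center)."""
    return [a for a in atoms
            if a is not metal and a.element not in ("C", "H")
            and math.dist(a.xyz, metal.xyz) <= radius]


def is_intermolecular(ligands):
    """True if at least two polymer residues come from at least two chains."""
    residues = {(a.chain, a.res_id) for a in ligands if a.polymer}
    chains = {chain for chain, _ in residues}
    return len(residues) >= 2 and len(chains) >= 2


def coordination_identifier(ligands):
    """E.g. 'C3H1': count of coordinating polymer residues per one-letter code."""
    residues = {(a.chain, a.res_id, a.res_name) for a in ligands if a.polymer}
    counts = Counter(ONE_LETTER.get(name, name) for _, _, name in residues)
    return "".join(f"{code}{n}" for code, n in sorted(counts.items()))
```

The pairwise duplicate check scales quadratically with the number of atoms; for large assemblies a spatial index such as a k-d tree would be the usual replacement.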
A group identifier is constructed in a similar way, but for a radius of 5 Å and without restrictions on atom type. The coordination identifier describes the coordination environment of a metal ion, whereas the group identifier describes all amino acid residues located within 5 Å of the metal ion. The former allows the user to query for a specific coordination environment, while the latter is used for clustering. If a metal ion is coordinated by two or more chains as described above, the metal site record is added to the SQLite database (https://www.sqlite.com), together with the oxidation state, coordination identifier, group identifier, number of coordinating amino acid residues, number of all ligands, number of coordinating chains, and other information. The metal oxidation state is read directly from the file; no additional steps are taken to determine it. Oxidation states should be treated with caution, as there is no separate identifier for metals with an uncertain oxidation state.
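A similarly hedged sketch of the group identifier (5 Å, any atom type) and of storing one site with Python's built-in `sqlite3` module follows. The table name and columns are hypothetical stand-ins for the fields listed above, not the actual InterMetalDB schema, and the input is assumed to contain only amino acid residue atoms already labeled with one-letter codes.

```python
import math
import sqlite3
from collections import Counter


def group_identifier(metal_xyz, residue_atoms, radius=5.0):
    """Residue composition within `radius` Å of the metal, any atom type.

    `residue_atoms` holds (chain, res_id, one_letter_code, xyz) tuples;
    each residue is counted once even if several of its atoms are close."""
    nearby = {
        (chain, res_id, code)
        for chain, res_id, code, xyz in residue_atoms
        if math.dist(xyz, metal_xyz) <= radius
    }
    counts = Counter(code for _, _, code in nearby)
    return "".join(f"{code}{n}" for code, n in sorted(counts.items()))


# Hypothetical table layout; the real InterMetalDB schema may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metal_site (
    id INTEGER PRIMARY KEY,
    pdb_id TEXT NOT NULL,
    metal TEXT NOT NULL,
    oxidation_state INTEGER,      -- taken as-is from the structure file
    coordination_id TEXT NOT NULL,
    group_id TEXT NOT NULL,
    n_residues INTEGER,
    n_ligands INTEGER,
    n_chains INTEGER
)
"""


def store_site(conn, site):
    """Insert one intermolecular site given as a dict of column values."""
    conn.execute(
        "INSERT INTO metal_site (pdb_id, metal, oxidation_state, coordination_id,"
        " group_id, n_residues, n_ligands, n_chains) VALUES (:pdb_id, :metal,"
        " :oxidation_state, :coordination_id, :group_id, :n_residues,"
        " :n_ligands, :n_chains)",
        site,
    )
```

Named placeholders keep the insert safe from quoting problems; in practice all sites of one structure would be inserted inside a single transaction.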

Redundancy Removal, Representative Sites

The RCSB PDB, as a worldwide repository of macromolecular structures, contains many structures of identical or highly similar macromolecules. This redundancy arises from different variants of the same macromolecule (with various bound ligands or small mutations) and from highly homologous macromolecules. Because of this considerable redundancy, the next step of database construction was to identify it and select representative intermolecular metal binding sites. To this end, an approach similar to that of MetalPDB[19] was used: chains were clustered with MMseqs2[23] at 50% sequence identity for both query and target, ensuring that the clusters share the same fold.[24] Metal binding sites need not be unique within a structure and may appear many times; an extreme example is the structure of the rotavirus inner capsid particle (PDB ID: 3KZ4), which contains 240 Zn2+-binding sites.[25] To group similar binding sites and deal with this redundancy, the binding sites within each sequence cluster are themselves clustered based on the group identifier (described above). The first unique metal site of the best-resolution structure is chosen as the representative metal site.
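The selection of representatives can be illustrated as follows. The sketch assumes that chain-to-cluster assignments were already produced by MMseqs2 and written as a two-column, tab-separated table (representative, member), which is the layout we understand `mmseqs easy-cluster` to emit. Sites are then grouped by (sequence cluster, group identifier) and the best-resolution site of each group is kept. Field names and chain labels are hypothetical, and how a site spanning several chains is assigned to one sequence cluster is not specified in the text, so it is simply taken as given here.

```python
from collections import defaultdict


def read_cluster_tsv(lines):
    """Map member chain -> cluster representative from a two-column,
    tab-separated cluster table (representative, member)."""
    membership = {}
    for line in lines:
        if line.strip():
            representative, member = line.rstrip("\n").split("\t")
            membership[member] = representative
    return membership


def pick_representatives(sites):
    """Return {(sequence cluster, group identifier): representative site_id}.

    Each site is a dict with 'site_id', 'seq_cluster', 'group_id' and
    'resolution' (Å, or None when not reported)."""
    groups = defaultdict(list)
    for site in sites:  # input order defines which site counts as "first"
        groups[(site["seq_cluster"], site["group_id"])].append(site)
    representatives = {}
    for key, members in groups.items():
        # min() returns the first of equally good candidates, so ties keep
        # the earliest site; sites without a resolution sort last.
        best = min(members, key=lambda s: s["resolution"]
                   if s["resolution"] is not None else float("inf"))
        representatives[key] = best["site_id"]
    return representatives
```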

Web Interface

The InterMetalDB database is integrated into a Django-based web application (https://www.djangoproject.com). Metal binding sites and structures are visualized with the web-based molecular viewer NGL Viewer.[26] The user can filter results by various parameters: PDB ID, structure title, keywords, etc. Additionally, interfacial metal binding sites can be searched by coordinating residues, number of coordinating chains, and other parameters. Filtered results can be exported as a CSV or JSON file for further analysis. Statistics for the whole database and for a specific metal are displayed with the JavaScript data visualization library Chart.js (www.chartjs.org).
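Exporting filtered results to CSV or JSON needs nothing beyond the Python standard library. The sketch below assumes results are flat dictionaries with hypothetical field names; in a Django view, the returned text could be wrapped in an `HttpResponse` with a suitable `content_type`. It illustrates the idea and is not taken from the InterMetalDB code base.

```python
import csv
import io
import json


def export_records(records, fmt):
    """Serialize a list of flat dicts as CSV or JSON text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        # Column order follows the keys of the first record.
        fieldnames = list(records[0]) if records else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt!r}")
```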

Results and Discussion

User Interface

InterMetalDB can be queried through the web interface at https://intermetaldb.biotech.uni.wroc.pl/, which runs as a Django web application. It allows the user to search the data by multiple criteria from the PDB search and metal site search pages. PDB files can be queried by title, keywords, resolution, source organism, etc. (Figure 1). The database contains all PDB files with a metal-involved macromolecule–macromolecule interface published in the RCSB PDB so far. Metal sites can be queried by coordinated metal element, types of bound residues, number of bound residues and chains, etc. (Figure 2). The complexity of a query is up to the user, as the query conditions are combined with a logical AND operator. In both cases, the search returns a list of records that can be downloaded as a CSV or JSON file for further analysis. From the list, the user can select a PDB entry or a metal site to view more details (Figures 3 and 4). PDB records and metal site records are linked by their IDs, so either can be browsed from the other. After selecting a record, the user can view the structure in NGL Viewer.[26]
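Because every filled-in field narrows the result set, a query behaves like a conjunction of predicates. The sketch below illustrates that AND logic over in-memory records; the field names and the `build_filters`/`search` helpers are hypothetical, and the web application presumably translates the form into a database query rather than filtering in Python.

```python
def build_filters(title=None, metal=None, max_resolution=None, min_chains=None):
    """Turn optional form fields into predicates; empty fields are skipped."""
    filters = []
    if title:
        filters.append(lambda r: title.lower() in r["title"].lower())
    if metal:
        filters.append(lambda r: r["metal"] == metal)
    if max_resolution is not None:
        filters.append(lambda r: r["resolution"] is not None
                       and r["resolution"] <= max_resolution)
    if min_chains is not None:
        filters.append(lambda r: r["n_chains"] >= min_chains)
    return filters


def search(records, filters):
    """Keep records that satisfy every filter (conditions joined by AND)."""
    return [r for r in records if all(f(r) for f in filters)]
```

With no fields filled in, `all()` over an empty list is true, so every record is returned, which matches the behavior of an empty search form.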

Figure 1

Querying InterMetalDB for Protein Data Bank deposited structures of macromolecules. Query fields are connected with logical AND. Every field in the query contains a placeholder that helps the user to fill in the appropriate term. In this case InterMetalDB is queried for PDB title containing “insulin”, gene source organism “Homo sapiens”, PDB classification “hormone”, resolution better than 2.0 Å, and deposition date between 2015-01-01 and 2020-06-29. Results can be sorted by clicking a table header and downloaded to a file.

Figure 2

Querying InterMetalDB for metal binding sites. Query fields are connected with logical AND. Every field in the query contains a placeholder that helps the user to fill in the appropriate term. Each result can be viewed separately by clicking on its metal site ID. Records can be sorted by clicking a table header and downloaded to a file.

Figure 3

PDB structure details are shown in the top-left card, and links to the interfacial metal binding sites in the structure are in the top-right card. The structure is visualized (bottom) with NGL Viewer.[26] From this page the user can choose one of the metal binding sites to view in detail.

Figure 4

Interfacial metal site details are shown in the top-left card. The representative site and similar sites (if available) are listed in the top-right card, from which the user can open another metal binding site. The metal site is visualized with NGL Viewer.[26]

Statistics

The Statistics page contains basic information about bond lengths between metal ions and donor heteroatoms in metal sites, the residues forming metal sites, the number of records, the types of proteins in the database, the gene sources of the macromolecules, and other information (Figure 5). Clicking the Statistics panel in the header opens statistics for the nonrepresentative (whole) data set. Whether statistics for the representative data set or for the whole data set are displayed can be changed at the very top of the page by choosing the preferred option. Below are two drop-down panels. From the first, the user can choose a metal element for which statistics are displayed. The second lists the most common coordination identifiers in the database; selecting one launches a search for the corresponding metal sites. The first graph on the nonrepresentative statistics page shows how many of all structures deposited in the RCSB PDB contain the metal and how many of them contain an intermolecular metal binding site. These data do not currently include metals bound in compounds (e.g., iron in heme or iron–sulfur clusters); in the future, the database will be extended to contain such structures as well. On the statistics page for the representative data set, this graph instead shows the number of representative versus nonrepresentative structures in the database. When statistics are viewed for a single metal instead of all metals, a pie chart shows the number of binding sites of that metal versus the number of other binding sites. The remaining charts are of the same type.
Next to the pie chart is a graph showing the number of structures containing intermolecular metal binding sites published per year. It shows an upward trend that mirrors the growth in the number of structures published in the RCSB PDB. Below on the left is a histogram of bond lengths between metals and donor heteroatoms in binding sites. This type of bond-length evaluation is best done for the whole data set, because a representative data set is not important for the precise determination of geometric parameters, whereas a large number of observations and high resolution are.[27] Only structures with a resolution better than 3 Å were used for this graph. The graph shows that nitrogen and sulfur donors form distinct groups with clearly defined medians and narrow distributions, whereas the bond-length distribution for oxygen donors is less compact. This is due to the large variety of metal-binding oxygen donors. Groups that coordinate metal ions through oxygen include carboxylates of aspartyl or glutamyl residues and the carboxylate of the protein C-terminus, but also various low-molecular-weight ligands such as water, organic acids, etc. An additional factor increasing the variation of metal–oxygen bond lengths is the type of metal: different metals show different bond lengths to the same donor atom. This effect is less visible for sulfur and nitrogen donors because they bind a narrower group of metals. Next to the bond-length distribution is a chart of the most frequent coordination identifiers. In both the representative and the nonrepresentative data, records with a small number (two or three) of bound amino acid or nucleotide residues are frequent.
In some cases, the coordination sphere in the structure is completed by low-molecular-weight ligands, while in other cases these ligands are not described in the PDB structure for various reasons, including low resolution. Note that such sites are quite likely to be nonphysiological. The last graph describing metal binding sites shows the occurrence of individual residues in metal binding sites. Generally, the occurrence of residues in metal binding sites follows the HSAB (hard and soft acids and bases) concept; thus residues that coordinate metals via carboxylates are most common in sites containing Ca2+, Mg2+, Na+, and K+. The higher occurrence of histidyl and acidic residues in Zn2+ coordination may reflect the moderate binding affinity and stability of intermolecular binding sites. Next to the chart of residues in metal binding sites is a bar graph showing the classification of the PDB files. Because PDB classification labels vary enormously, which makes consistent classification and presentation almost impossible, and because enzymes are the most common group among the gathered records, we decided to classify PDB files by enzyme classification. The next graph shows the gene sources of structures containing intermolecular metal binding sites. It roughly reflects the gene source distribution in the RCSB PDB, meaning that such structures are not overrepresented in any specific group of organisms but follow the general trend of the RCSB PDB. The last graph on the Statistics page shows the techniques used to obtain the structural models, which is again consistent with the trend in the RCSB PDB.
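The per-donor summary behind the bond-length histogram described above can be approximated as follows: distances are grouped by donor element, restricted to structures with a resolution better than 3 Å, and summarized by their median and spread. The input format, the function name, and the use of the population standard deviation as the spread measure are assumptions of this sketch.

```python
import statistics
from collections import defaultdict


def bond_length_summary(bonds, max_resolution=3.0):
    """Median and population standard deviation of metal-donor distances,
    grouped by donor element.

    `bonds` yields (donor_element, distance_in_A, resolution_in_A); only
    bonds from structures with resolution better than `max_resolution`
    are used."""
    by_donor = defaultdict(list)
    for donor, distance, resolution in bonds:
        if resolution is not None and resolution < max_resolution:
            by_donor[donor].append(distance)
    return {
        donor: (statistics.median(values), statistics.pstdev(values))
        for donor, values in sorted(by_donor.items())
    }
```

On real data, a larger spread for oxygen than for nitrogen or sulfur would correspond to the broader oxygen distribution discussed in the text.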

Figure 5

General statistics page for the whole MPPI database. The database statistics can be displayed either for the representative data set or for the whole data set; this can be chosen at the top of the website. Below the data set selection are two drop-down panels, which allow one to select the metal for which data are displayed and to select a coordination identifier for a given data set. Clicking a specific coordination identifier redirects the user to the search option. Below them is a set of graphs described in more detail in the text.

Prevalence of Interfacial Metal Binding Sites

Of the 227 854 structures deposited in the RCSB PDB at the time of the last database update (October 30, 2020), 50 565 contain a metal as a standalone ion, and 7854 of these were found to contain a metal-involved interface (counting nonrepresentative sites). Among the 6345 representative metal binding sites gathered in InterMetalDB, Ca2+ binding sites are the most common, represented by 1403 sites, followed by Zn2+ (1357 sites) and Mg2+ (1110 sites) (Table 1). These three elements are also the most common metal ions in the entire RCSB PDB, so it is not surprising that they are strongly represented in InterMetalDB. A lower, but still high, number of protein complex structures contain monovalent Na+ and K+ at the interfaces. In most cases their role is linked to charge compensation of proteins or nucleic acids and to structure stabilization, which results in the formation of metal-mediated macromolecule–macromolecule complexes. Interestingly, the high content of iron ions (both Fe2+ and Fe3+) in the RCSB PDB is not mirrored in InterMetalDB. Fe3+ is present in only 118 representative sites, and Fe2+ was found at 23 unique interfaces. This is probably caused by the increased likelihood of oxidation at interfaces, but also by the fact that iron ions usually play a catalytic rather than a structural role in proteins.[28] Another reason why iron is underrepresented at macromolecular interfaces relative to its abundance in the RCSB PDB may be that only the chemical component identifier was queried, so only structures containing standalone metal ions are returned; i.e., structures containing iron–sulfur clusters, heme, or other similar iron-containing groups are not analyzed.
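Expressed as proportions, the structure counts in the paragraph above work out as follows (plain arithmetic on the quoted numbers):

```latex
\frac{50\,565}{227\,854} \approx 22.2\%, \qquad
\frac{7\,854}{50\,565} \approx 15.5\%, \qquad
\frac{7\,854}{227\,854} \approx 3.4\%
```

In other words, roughly a fifth of the PDB entries contain a standalone metal ion, and about one in six of those has a metal-involved interface.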

Table 1

Number of Metal Binding Sites in InterMetalDB for Particular Elements (a)

metal ion	representative	nonrepresentative
Ca²⁺	1434	13991
Zn²⁺	1350	6128
Mg²⁺	1104	4248
Na⁺	774	4148
K⁺	413	2703
Cd²⁺	220	2565
Mn²⁺	302	1625
Cu²⁺	168	1193
Fe²⁺	58	1035
Fe³⁺	112	965
Ni²⁺	173	682
Co²⁺	85	519
Au⁺	5	271
Ba²⁺	23	120
Pd²⁺	4	99
Ag⁺	38	82
Hg²⁺	31	54
Rb⁺	5	53
Tl⁺	11	52
Cs⁺	11	48
Cu⁺	22	45
Pt²⁺	12	41
Sr²⁺	18	28
Tb³⁺	1	22
La³⁺	8	20
Li⁺	9	18
Sm³⁺	11	16
Pb²⁺	6	13
Mn³⁺	0	12
Gd	2	6
Ho	2	4
Lu³⁺	3	3
Au³⁺	0	2
Cr³⁺	1	2
Eu³⁺	1	2
Re	1	2
Pr³⁺	2	2
Yb³⁺	1	2
Eu²⁺	1	1
Gd³⁺	1	1
total	6423	40823

(a) The most common interfacial metal binding sites contain Ca2+, Zn2+, and Mg2+.

Similarly to Fe2+, Cu+ (22 representative sites) is rather rare at protein interfaces because of its susceptibility to oxidation and its lack of a structural role. It is worth underlining, however, that Cu+ is transported in cells between chaperone proteins through the formation of interfacial sites, so the list of interfacial copper sites contains such transport-active complexes.[29] Manganese is present in metalloproteins as Mn2+ and Mn3+, where it plays catalytic and structural roles, but representative interfacial sites contain only Mn2+, and this state is recognized as a structural one. Metal ions such as Cd2+, Hg2+, Co2+, Ni2+, and Ag+ are frequently used as probes for Zn2+ or Cu+ and are therefore frequently investigated by structural methods. How faithfully such interfacial sites mimic the native ones must be assessed case by case and requires solution studies. For example, interfacial Hg2+ or Cd2+ in the Rad50 homodimer was shown to mimic the Zn2+ complex very well and has been used to characterize the complex.[30,31] The presence of other metal ions in structurally characterized macromolecule–macromolecule complexes is more likely linked to a particular research interest and can be explored individually in the original reports, a list of which can easily be downloaded from InterMetalDB.

Ligands of Interprotein Metal Binding Sites

The most common residue coordinating metal ions at the interfaces identified by InterMetalDB is aspartyl, followed by histidyl (Table 2). The former is usually found in sites containing Ca2+ and Mg2+ but also Zn2+, Na+, and Mn2+, while the latter is more common in zinc sites. Acidic residues together with glutaminyl, asparaginyl, histidinyl, and threonyl residues account for 72.5% of all amino acid residues in all metal binding sites and for 66.4% in the nonredundant data set, in which only representative metal binding sites are analyzed to remove bias toward more frequently studied macromolecules. This means that binding sites in the nonrepresentative data set show somewhat less variation than those in the representative set. Although cysteinyl residues are found in many physiologically confirmed Zn2+-involved protein–protein complexes, they account for only 3.73% of residues in the whole database. One reason why acidic and histidyl residues are frequent in interprotein metal sites is that, according to the HSAB concept, Ca2+, Mg2+, Na+, and K+ are hard acids, whereas Zn2+ is borderline; the former therefore prefer oxygen donors and the latter nitrogen donors.[32] Another explanation is that these residues are large and flexible, which allows the main chains of interacting protein subunits to remain farther apart with no or minimal conformational change of the protein molecules. Moreover, in the case of Zn2+, these residues provide the moderate stability required for transient sites.[10] This is in contrast to cysteinyl residues, which lie closer to each other at metal interfaces and require a more significant change of protein structure upon metal binding, increasing the thermodynamic stability of such a site.[33,34]
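Shares such as the 72.5% quoted above are ratios of residue counts. A minimal sketch, with toy input and hypothetical helper names, tallies coordinating residues over sites and computes the percentage contributed by a chosen set of residues:

```python
from collections import Counter


def tally_residues(sites):
    """Count coordinating residues; each site is a list of residue names."""
    counts = Counter()
    for residues in sites:
        counts.update(residues)
    return counts


def residue_share(counts, selected):
    """Percentage of all counted residues whose name is in `selected`."""
    total = sum(counts.values())
    chosen = sum(counts.get(name, 0) for name in selected)
    return 100.0 * chosen / total if total else 0.0
```

For instance, three toy sites `[["ASP", "ASP", "HIS"], ["CYS", "CYS", "CYS", "HIS"], ["GLU", "ASN", "GLY"]]` contain ten residues, six of which are Asp, Glu, Asn, or His, giving a 60% share for that group.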

Table 2

Most Common Amino Acid Residues Found in the Metal Sites Located at Macromolecular Interfaces (a)

representative residue	count	nonrepresentative residue	count
Asp	4576	Asp	33468
His	3558	His	25297
Glu	3442	Glu	23250
Asn	922	Gln	7685
Gly	891	Asn	6587
dG	819	Thr	5755
other	7183	other	36877

(a) A residue is considered bound to a metal if any of its heteroatoms (e.g., oxygen, nitrogen) lies within 3 Å of the metal. For the detailed distribution of residues by bound metal, please visit the Statistics page of InterMetalDB.

In the nonrepresentative set, the most common number of bound ligand donors is three, followed by four and two. In the representative set, the most common number of bound ligand donors is two, followed by four and three, corresponding to 30.1, 29.4, and 29.0% of all sites (Table 3). Interestingly, higher coordination numbers are relatively rare in intermolecular sites, accounting for 5.3, 4.8, 0.1, and 1.2% of sites with five, six, seven, and eight donors, respectively. Ca2+ forms the largest number of sites with six donors, while K+ shows the strongest tendency to form sites with eight donors. Detailed information on the number of ligand donors for each metal ion is presented in Figure S1 and Figure 6 for the nonrepresentative and representative data sets, respectively. It is worth underlining that the number of donors bound to a metal ion depends on its chemical features according to bioinorganic rules, but metal sites in X-ray structures may differ from those present in solution.[35] The high proportion of incompletely filled coordination spheres can be explained by low-molecular-weight ligands, such as water molecules, that are not resolved in crystal structures. Although most of these sites are probably not physiological, or their metal binding affinities are extremely weak, we decided not to remove them from InterMetalDB, since some of them may be physiologically important. An example is the structure of the P. furiosus Rad50 zinc hook domain (PDB ID: 6ZFF), whose Zn2+ coordination sphere is not fully resolved.[10,36]
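The percentages per number of donors are the fraction of sites having each donor count. The sketch below computes that distribution from a list of per-site donor counts; the input and function name are hypothetical.

```python
from collections import Counter


def donor_distribution(donor_counts):
    """Percentage of sites for each number of coordinating donors."""
    tally = Counter(donor_counts)
    total = sum(tally.values())
    return {n: 100.0 * c / total for n, c in sorted(tally.items())}
```

For example, `donor_distribution([2, 2, 3, 4])` gives `{2: 50.0, 3: 25.0, 4: 25.0}`.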

Table 3

Number of Metal Binding Sites Containing a Specific Number of Residues Coordinating the Metal

no. of donors	representative	nonrepresentative
2	1883	9560
3	1865	13794
4	1924	12266
5	349	3166
6	309	1358
7	9	57
8	84	622

The precise distribution of donors for each metal ion can be found in Figure S3.

Figure 6

Number of donors in a metal site depending on the metal ion, plotted for the representative data set. The most common sites have between two and four donors. Data for lanthanides are presented in Supporting Information (Figure S1). The precise distribution of donors for each metal ion can be found in Figure S3.

The most common assembly in InterMetalDB is the association of the smallest possible number of macromolecules at a metal interface, that is, two, accounting for 86% of all representative interfaces. The next numbers of metal-bound chains, three and four, correspond to 9.2% and 4.4% of all representative interfaces, respectively (Table 4). Detailed information on the number of chain ligands for each metal ion is presented in Figure S2 and Figure 7 for the nonrepresentative and representative data sets, respectively. The metal-binding macromolecular interfaces gathered in InterMetalDB occur between nucleic acid molecules, between proteins, and between nucleic acids and proteins (e.g., the binding of catalytic Ca2+ by the HincII restriction endonuclease, PDB ID 1TW8).[37] The largest number of chains bound at a single site is found in the complex of K+ with nucleic acid strands forming an i-motif (PDB ID 1V3P).[38] In nucleic acids, metal ions participate in the stabilization of G-quadruplex and i-motif DNA structures. Many types of cations promote the formation of G-quadruplex structures, including divalent ions such as Ba2+ (PDB ID: 4U92)[39] and Pb2+ (PDB ID: 6A85),[40] but the physiologically relevant G-quadruplex-stabilizing ions are Na+ and K+.[41] Although the coordination number correlates to some degree with the number of chain ligands, the main factor determining how many macromolecules meet at the interface is the number of donors that each ligand (chain) contributes to the metal ion.
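
Counting the chains at a site, and expressing each count as a share of all sites, can be illustrated as follows. This is a sketch under assumptions: the site representation (a list of the chain IDs of a site's donor atoms) and the example sites are hypothetical, while the counts in `TABLE4_REPRESENTATIVE` are copied from Table 4.

```python
from collections import Counter


def chains_at_site(donor_chain_ids):
    """Number of distinct chains contributing at least one donor to a metal site.

    A value of 2 or more marks the site as intermolecular.
    """
    return len(set(donor_chain_ids))


def share_percent(counts, key):
    """Percentage of all sites falling into category `key` of a {category: count} mapping."""
    return 100.0 * counts[key] / sum(counts.values())


# Representative-set counts from Table 4: number of chains -> number of metal sites.
TABLE4_REPRESENTATIVE = {2: 5501, 3: 605, 4: 294, 5: 9, 6: 12, 8: 2}

# Hypothetical sites, each given as the chain IDs of its donor atoms.
example_sites = [["A", "B"], ["A", "A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]]
chains_per_site = Counter(chains_at_site(site) for site in example_sites)
```

For the representative set, `share_percent(TABLE4_REPRESENTATIVE, 2)` gives about 85.6, which rounds to the 86% quoted above.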

Table 4

Number of Metal Sites Formed by a Given Number of Chains

no. of chains	representative	nonrepresentative
2	5501	31549
3	605	5789
4	294	3290
5	9	174
6	12	15
8	2	6
total	6423	40823

Most binding sites are formed by two chains.

Figure 7

Number of chains forming metal sites, depending on the bound metal ion, plotted for the representative data set. Most intermolecular metal binding sites are formed between two macromolecular chains. Data for lanthanides are presented in Supporting Information (Figure S2).

Comparison with Other Databases

Integrating structural information about metalloproteins provides the basis for understanding how proteins use metal ions and what roles these ions play. Not surprisingly, several databases aggregating metalloprotein structures have been released in recent years. However, some of them are no longer maintained or even accessible, e.g., MDB (Metalloprotein Database and Browser),[42] Mespeus,[17] or dbTEU.[43] The regularly updated resources (MetalPDB,[19,20] ZincBind[21]), although they provide user-friendly interfaces, do not allow filtering for metal ions bound at macromolecular interfaces. ZincBind provides information only on zinc-binding proteins; it appears to be updated weekly or monthly, whereas MetalPDB is updated less often (at the time of writing, its last update was on 2019-09-18). Both are valuable sources of knowledge about metalloproteins. MetalPDB contains structures with intermolecularly bound metals but offers no function to query for such records. A further obstacle to finding intermolecularly bound metals in MetalPDB is that its records are based mostly on asymmetric units. This matters greatly when examining intermolecularly bound metal ions, because such ions are often bound on the surfaces of chains that form an interface only once the biological assembly is constructed. An example is the structure of the human rhinovirus 16 coat protein (PDB ID: 1AYM),[44] in which the zinc ion lies at an interface formed by five chains that is visible only in the biological assembly. ZincBind overcomes this obstacle by aggregating data based on biological assemblies. 
In addition, ZincBind offers a much friendlier record search interface and a GraphQL application programming interface that allows programmatic access to the aggregated data. MetalPDB allows only partial information to be downloaded from its database, together with a 5 Å-radius cut-out of the structure around the metal ion. Currently, InterMetalDB data can be downloaded from the website after filtering. All actively maintained resources allow the structure of a selected record to be viewed directly, although with different front-end viewers: JSmol in MetalPDB and NGL Viewer in both ZincBind and InterMetalDB. InterMetalDB and the other resources also provide web pages giving quick insight into general statistics of their records. All of these statistics concern the interaction of metal ions with proteins and nucleic acids, but each database covers a slightly different part of this field: ZincBind focuses only on the interaction of macromolecules with zinc ions, MetalPDB aggregates all records containing a metal, and InterMetalDB focuses only on structures containing intermolecularly bound metals, so the statistics provided may differ. The differences are seen mainly in which residues are involved in metal ion binding and in the coordination identifiers, which seems to reflect the slightly different properties required of amino acid residues that form intermolecular metal binding sites. As discussed previously, in the case of Zn2+, residues forming an intermolecular binding site provide the moderate stability needed for transient binding, enabling association and dissociation.[10] InterMetalDB allows advanced searching of structures and metal binding sites in much the same way as the databases discussed here, with one exception: searching for structures by sequence, which is not yet implemented but will be added in the future. Rather than replacing existing databases, InterMetalDB therefore aims to complement them by enabling advanced searches of intermolecular interfaces.
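
The point about asymmetric units versus biological assemblies can be illustrated with a toy example. PDB biological assemblies are generated by applying rotation-plus-translation operators (x' = Rx + t, the form used in the BIOMT records) to the deposited chains; the coordinates, the two-fold operator, and the `A_2` chain label below are hypothetical. A metal lying on a symmetry axis appears to be bound by a single chain in the asymmetric unit, but by two chains once the symmetry mate is generated.

```python
import math

# Two-fold rotation about the z axis with no translation (toy operator).
TWOFOLD_Z = ((-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0))
NO_SHIFT = (0.0, 0.0, 0.0)


def apply_operator(rotation, translation, xyz):
    """Return R*xyz + t for a 3x3 rotation matrix R and a translation vector t."""
    return tuple(
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def chains_near_metal(metal_xyz, donors, cutoff=3.0):
    """Set of chain IDs that have a donor atom within `cutoff` Å of the metal."""
    return {chain for chain, xyz in donors if math.dist(metal_xyz, xyz) <= cutoff}


zn = (0.0, 0.0, 0.0)                  # metal on the two-fold axis, so it maps onto itself
asym_unit = [("A", (2.0, 0.0, 0.0))]  # one donor atom of chain A (toy coordinates)

# Biological assembly = deposited chain + its symmetry mate (hypothetical label "A_2").
assembly = asym_unit + [
    ("A_2", apply_operator(TWOFOLD_Z, NO_SHIFT, xyz)) for _, xyz in asym_unit
]

in_asym_unit = chains_near_metal(zn, asym_unit)  # only chain A: looks intramolecular
in_assembly = chains_near_metal(zn, assembly)    # chains A and A_2: intermolecular
```

Real assemblies may combine several operators and translations, but the principle is the same: the interface, and hence the intermolecular site, exists only after the operators have been applied.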

Conclusions

InterMetalDB was created to provide a resource that identifies and aggregates all metal ions involved in macromolecular interfaces in structures from the RCSB PDB. Although other databases also contain this type of interaction, none of them allows such records to be filtered. InterMetalDB is the first database focused strictly on aggregating and searching for this type of metal binding site. The database is updated regularly and allows search results to be retrieved in different formats. InterMetalDB clusters intermolecular metal binding sites according to 50% sequence similarity of the given molecule and the nearest metal environment, and then selects the representative site of each cluster on the basis of the best resolution of the examined structure. No restraints on structure resolution were applied during data acquisition, and the structures included in InterMetalDB are based on biological assemblies (as described in the PDB files). The web interface allows the data to be searched, browsed, and downloaded. Query filters cover structure quality and deposition date, as well as other parameters such as the number of ligands and the number of coordinating chains. InterMetalDB gives insight into interfacial metal binding and can additionally serve as a resource for researchers who wish to develop machine learning models predicting macromolecular interactions and the involvement of metal ions in such processes. We believe that the data set contained in InterMetalDB will be helpful to researchers interested in interfacial metal binding, metal-induced protein polymerization, aggregation, nanoparticle formation, and metalloprotein engineering, and will boost research in those fields. In the future, the resource and its web interface will be expanded as needed. InterMetalDB can be accessed at https://intermetaldb.biotech.uni.wroc.pl and its source code can be viewed and downloaded at https://github.com/jzftran/InterMetalDB.
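
The clustering-then-selection step and the AND-combined query filters can be sketched as below. This is a hedged illustration, not the InterMetalDB implementation: cluster membership is assumed to be precomputed (by 50% sequence similarity and the 5 Å metal environment), and the `SiteRecord` fields and placeholder IDs are hypothetical. Ties in resolution keep the first record seen.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SiteRecord:
    """Hypothetical site record; the fields are illustrative, not InterMetalDB's schema."""
    pdb_id: str
    metal: str
    cluster_id: int    # assumed precomputed (50% sequence similarity + 5 Å metal environment)
    resolution: float  # in Å; a lower value means a better-resolved structure
    n_chains: int
    deposited: date


def representatives(records):
    """For each cluster, keep the site from the structure with the best (lowest) resolution."""
    best = {}
    for rec in records:
        current = best.get(rec.cluster_id)
        if current is None or rec.resolution < current.resolution:
            best[rec.cluster_id] = rec
    return list(best.values())


def filter_sites(records, *predicates):
    """Keep records that satisfy every predicate, i.e. filters joined by logical AND."""
    return [rec for rec in records if all(pred(rec) for pred in predicates)]


records = [
    SiteRecord("id1", "Zn", 1, 2.4, 2, date(2005, 3, 1)),
    SiteRecord("id2", "Zn", 1, 1.8, 2, date(2012, 6, 1)),
    SiteRecord("id3", "Ca", 2, 2.0, 3, date(2019, 1, 1)),
]

reps = representatives(records)  # id2 replaces id1 as the cluster-1 representative
zinc_recent = filter_sites(
    reps,
    lambda r: r.metal == "Zn",
    lambda r: r.resolution <= 2.0,
    lambda r: r.deposited >= date(2010, 1, 1),
)
```

Passing no predicates returns every record, mirroring an unfiltered query; each added predicate can only narrow the result.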

References (39 in total)

1. Zinc coordination spheres in protein structures.

Authors: Mikko Laitaoja; Jarkko Valjakka; Janne Jänis
Journal: Inorg Chem Date: 2013-09-23 Impact factor: 5.165

2. Relationship between the architecture of zinc coordination and zinc binding affinity in proteins--insights into zinc regulation.

Authors: Tomasz Kochańczyk; Agnieszka Drozd; Artur Krężel
Journal: Metallomics Date: 2015-02 Impact factor: 4.526

3. The human iron-proteome.

Authors: Claudia Andreini; Valeria Putignano; Antonio Rosato; Lucia Banci
Journal: Metallomics Date: 2018-09-19 Impact factor: 4.526

4. Galvanization of Protein-Protein Interactions in a Dynamic Zinc Interactome.

Authors: Anna Kocyła; Józef Ba Tran; Artur Krężel
Journal: Trends Biochem Sci Date: 2020-09-18 Impact factor: 13.807

5. Interfacial metal coordination in engineered protein and peptide assemblies.

Authors: Pamela A Sontz; Woon Ju Song; F Akif Tezcan
Journal: Curr Opin Chem Biol Date: 2014-01-07 Impact factor: 8.822

6. The refined structure of human rhinovirus 16 at 2.15 Å resolution: implications for the viral life cycle.

Authors: A T Hadfield; W M Lee; R Zhao; M A Oliveira; I Minor; R R Rueckert; M G Rossmann
Journal: Structure Date: 1997-03-15 Impact factor: 5.006

7. Zinc coordination sphere in biochemical zinc sites.

Authors: D S Auld
Journal: Biometals Date: 2001 Sep-Dec Impact factor: 2.949

8. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data.

Authors: Helen Berman; Kim Henrick; Haruki Nakamura; John L Markley
Journal: Nucleic Acids Res Date: 2006-11-16 Impact factor: 16.971

9. MetalPDB: a database of metal sites in biological macromolecular structures.

Authors: Claudia Andreini; Gabriele Cavallaro; Serena Lorenzini; Antonio Rosato
Journal: Nucleic Acids Res Date: 2012-11-15 Impact factor: 16.971

10. MetalPDB in 2018: a database of metal sites in biological macromolecular structures.

Authors: Valeria Putignano; Antonio Rosato; Lucia Banci; Claudia Andreini
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971
