
Distinguishing Metal-Organic Frameworks.

Senja Barthel¹, Eugeny V Alexandrov²,³, Davide M Proserpio²,⁴, Berend Smit¹.

Abstract

We consider two metal-organic frameworks as identical if they share the same bond network respecting the atom types. An algorithm is presented that decides whether two metal-organic frameworks are the same. It is based on distinguishing structures by comparing a set of descriptors that is obtained from the bond network. We demonstrate our algorithm by analyzing the CoRe MOF database of DFT optimized structures with DDEC partial atomic charges using the program package ToposPro.


Year: 2018 PMID: 29541002 PMCID: PMC5843951 DOI: 10.1021/acs.cgd.7b01663

Source DB: PubMed Journal: Cryst Growth Des ISSN: 1528-7483 Impact factor: 4.076

Introduction

A primary concern of materials science is the discovery of new materials and the prediction and understanding of their properties. With steadily increasing computer power, computational studies have become an indispensable tool for both analysis and prediction of materials. Large databases contain not only naturally occurring[1,2] and synthesized materials but also thousands upon thousands of structures that are generated in silico.[3−10] These databases provide the ground for computational studies, in particular screening studies to identify interesting materials for different applications.[3,11−15] Less well known is that these databases, as we will demonstrate below, can contain many variations of the same structure. Clearly, one would like to avoid spending valuable resources on studying similar structures but, more importantly, having an unspecified number of duplicated structures will make the statistics of any screening study unreliable. Therefore, developing a systematic methodology to identify whether two deposited structures are duplicates is not only an important fundamental question but also of practical importance. This is in particular the case if the number of structures is so large that manual inspection is out of the question.

To illustrate our approach of comparing structures, we focus on a popular class of materials called metal–organic frameworks (MOFs).[16] These are potentially porous 3D, 2D, and 1D crystalline materials, which consist of metal nodes connected by organic ligands.[17−19] MOFs have gained much attention during the past decade due to their huge variety. By changing a metal type or substituting the functional group of an organic linker, one can in principle systematically change the properties of a known MOF. This not only makes MOFs and related nanoporous materials such as COFs, ZIFs, PPNs, etc. intriguing material classes for basic research but also suggests them for many potential applications, ranging from gas separation and storage to sensing and catalysis.[16,20−25]

For complex compounds such as MOFs, we have to be careful how to define two materials as being equivalent, since similarities exist on different levels. For example, if two crystals do not have the same space groups or similar lattice parameters, they are considered different materials from a strict crystallographic viewpoint and are listed as two separate entries in most databases. However, from a MOF point of view two structures are considered identical if they share the same bond network, with respect to the atom types and their embedding: i.e., if two structures can in principle be deformed into each other without breaking or forming bonds. We do not consider a particular MOF as a new material after, for example, rotating a ligand. However, such a small change can change the space group and hence can be reported as a new material in these databases. There exist several algorithms to compare crystals, but either they are restricted to structures with the same space group[26−28] or they evaluate the differences between atomic positions,[29,30] which is useful to detect small differences between crystals due to slightly different experimental conditions. However, while the traditional crystallographic approaches are important for solid-state chemistry, the unit cells of porous MOFs and related materials are much larger and are filled with solvents. This often causes substantial deviations of the crystal parameters for the activated evacuated material and the representatives with guest molecules,[31] and a different method is required to compare MOFs.

Most synthesized MOFs are deposited in the Cambridge Structural Database (CSD).[32] These materials often contain remaining solvent molecules, as do their structural files in the CSD.
If a material is experimentally obtained under changed conditions, the remaining solvent molecules can differ, ligands can be differently aligned, and the unit cells can be distorted with respect to each other. All these versions of a material are stored independently in the CSD, and different versions of one MOF can have different chemical and physical behavior such as the narrow- and large-pore versions of the highly flexible MIL-53. However, from a fundamental point of view, one is often interested in understanding the properties of the underlying framework: i.e., the material without solvent molecules that are not believed to be part of the true framework. Before computational studies are performed, structures are usually “cleaned”: i.e., solvents are artificially removed and disorders often neglected. This leads to duplications in the resulting databases since many materials, in particular those on which considerable experimental efforts have been spent, are reported in numerous variations: the CSD contains for example more than 50 structures that all describe the famous CuBTC.[33] Clearly, if the number of duplicates is this large, it will bias these databases. Another postprocess that can cause multiple entries is relaxation: both experimentally known and hypothetical structures are often relaxed to obtain well-defined and energetically most stable representations of the materials before they are studied by simulations. Since it is impossible to ensure that an energetic minimum is global, it is possible that different relaxations find varying local minima that lead to multiple entries in a database. In this article, we show how to systematically find topological duplicates in these material databases. We demonstrate how to compare frameworks of MOFs, but a small variation of the algorithm can also consider other classes of materials such as molecular crystals by considering the patterns of hydrogen bonds and van der Waals interactions. 
Similarly, it is possible to distinguish different versions of flexible MOFs by including van der Waals interactions in the bond network. In a representative study, we analyze a subset of the so-called “computationally ready” MOFs of experimentally known structures (CoRe MOF database): namely, the database of 502 frameworks[33] (502 CoRe MOF database) that contains the structures that are relaxed using density functional theory (DFT) and to which density derived electrostatic and chemical (DDEC) partial atomic charges are assigned. The files stored in the CoRe MOF database are mainly derived from the CSD by removing solvents and sometimes adding missing hydrogens. The results are of interest in their own right, since this database is frequently used for screening studies. Alternative databases of cleaned MOFs can be obtained by applying the MOF detection and the user-adopted solvent-removal algorithms that have been made available by the CSD.[34] The prospective generation of databases of existing and new MOFs made the development of a tool for removing duplicates relevant and urgent. The issue of duplication in databases is well-known. For example, the authors of the CoRe MOF databases already eliminated some duplicates: two cleaned CSD structures were considered equivalent if they share the numbers and types of atoms and if the root-mean-square deviation of the atomic positions of their Niggli cells is smaller than 0.1 Å.[35] While this approach is intuitive, it is neither necessary nor sufficient to determine duplicates. Clearly, all duplicates have the same number of atoms and atom types. However, the atomic positions can vary largely between different representations of the same material. Indeed, we still find many duplicates in the CoRe MOF databases. The fundamental problem is that allowing a larger root-mean-square deviation does not address the problem of detecting duplicates correctly. 
Increasing the limit allows us to find more duplicates but also falsely identifies more nonidentical structures as duplicates. We present a systematic and rigorous way of distinguishing structures that describe different materials, by introducing a set of descriptors that each give the same value for identical structures (invariants). Therefore, two structures with different descriptors are necessarily different. According to our notion of equivalence, atom types and atom numbers as well as all properties that are derived from the graph describing the bond network are invariants. We consider the following invariants: atom types, ligand graph, ligand coordination mode, and properties derived from the bond network and from several of its simplified versions, such as the dimensionality of the net, its topological indices, and possible interpenetration. In contrast to atomic positions, symmetries, cell parameters, volumes, or surface areas, our methodology is independent from distortion, which makes it very robust and reliable. Our set of invariants does not provide a complete invariant, meaning that there might exist different structures that cannot be distinguished by the set of descriptors. Such an example would be a pair of structures whose bond networks were practically indistinguishable by their topological indices (e.g., net topology, vertex symbol, point symbol[36]), which is the case for stereoisomers. Excluding one couple of enantiomers, we have not come across an example in the 502 CoRe MOF database where our invariants wrongly identify two structures as identical. All analyses have been performed using the software package ToposPro.[37] We found that the 502 CoRe MOF database of 502 relaxed structures with DDEC partial atomic charges contains 48 structures with duplicates, some of them being reported several times, leading to 78 redundant entries. MOF-5 is the most often listed structure with 17 entries.

Similarities of Reported Materials

Given the large number of deposited structures, an algorithm that automatically detects similar structures and duplicates is indispensable. The results of our representative study of the 502 CoRe MOF database, the CSD refcodes of each structure (with all bibliographic references), all chemical data, the analyses of the nets, and a list of all duplicates are given in the Supporting Information. At present, we often rely on visual inspection to determine whether a newly reported crystal structure is similar to one of the existing materials, which is, given the ever-increasing number of reported MOF structures, close to impossible. Interestingly, even for a single pair of frameworks visual inspection might not be sufficient to determine with confidence whether they are identical or not. To illustrate this point, we consider three structures with the simple composition [Li(isonicotinate)], XUNGOD, XUNHAQ, and XUNGUJ, which contain only C, H, Li, N, and O atomic species and which all form three-dimensional porous networks. The experimental structures contain different solvent molecules (morpholine, N-methylpyrrolidinone, and dimethylformamide, respectively) and have different space groups (P1, P2₁, and P2₁/n). They are reported in the same publication[38] and are correctly stored as different structures in the CSD. Figure 1 shows a striking similarity, and one could easily conclude that the frameworks have the same topology.

Figure 1

XUNGOD (left) and XUNGUJ (right) in [100] projection (top) and [010] projection (bottom). The two frameworks have different topologies, as can be seen by simplifying the adjacency matrix.

Indeed, the authors assigned to all frameworks the same topological type sra (with Li₂O₂ dimers as 4-c node), without naming it. (We use the RCSR three letter names[39] for net topologies, when available, or else ToposPro TTD names.[40]) However, the ligands of XUNGOD have a connectivity to the metals different from that of the ligands of XUNHAQ and XUNGUJ. All three frameworks contain infinite rod-shaped structural units aligned parallel to each other, which is clearly seen after removing dangling atoms (1-c vertices) and suppressing 2-coordinated atoms (2-c vertices) (Figure 2).

Figure 2

Underlying nets after simplification of 1-c and 2-c vertices, grown up to the fifth coordination sphere around the O atom for XUNGOD (left) (CS: 3,7,14,26,40) and XUNGUJ (right) (CS: 3,7,14,26,42). The central O atom is marked in red, yellow balls are vertices belonging to the second to fourth coordination spheres, and green balls denote vertices of the fifth coordination sphere.

The analysis of the coordination sequence (CS) of atoms in the simplified net shows that XUNHAQ and XUNGUJ have the same CS for all atoms and share the net topology, while the CS of XUNGOD is different. For example, the CS of the O atom differs for the fifth coordination sphere (Figure 2). The frameworks of XUNHAQ and XUNGUJ are therefore duplicates of each other but differ from XUNGOD. These subtleties cannot be found by visual inspection but are only detectable by a more sensitive graph analysis using the simplified adjacency matrix and topological indices such as the coordination sequences (CS).

An example of two structures that have identical frameworks is the pair AMILUE and AMIMEP,[41] two versions of [Zn₄(urotropin)₂(2,6-naphthalenedicarboxylato)₄]. They arise from a study of different framework–host interactions: AMIMEP contains guest ferrocene molecules that are not present in AMILUE. However, the frameworks (Figure 3) are too complicated to be reliably identified as identical by visual inspection, which is additionally hindered by the difference in the cell parameters and a shift of the unit cells.
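The coordination sequences that separate XUNGOD from XUNGUJ above count, for a chosen vertex, how many vertices lie at graph distance 1, 2, 3, ... from it. The following is a minimal sketch of that idea using a breadth-first search on a finite patch of a net; it is not the ToposPro implementation, and the names `coordination_sequence` and `square_grid_patch` are hypothetical. It assumes the patch is large enough that every vertex within the requested distance is present.

```python
from collections import deque


def coordination_sequence(adj, start, depth):
    """Return [n_1, ..., n_depth]: number of vertices at graph distance k from start."""
    dist = {start: 0}
    counts = [0] * depth
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == depth:
            continue  # do not grow beyond the requested coordination sphere
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                counts[dist[w] - 1] += 1
                queue.append(w)
    return counts


def square_grid_patch(radius):
    """Toy net (assumption for illustration): finite patch of the square lattice."""
    adj = {}
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            neighbours = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if abs(nx) <= radius and abs(ny) <= radius:
                    neighbours.append((nx, ny))
            adj[(x, y)] = neighbours
    return adj


patch = square_grid_patch(5)
cs = coordination_sequence(patch, (0, 0), 3)
```

Two nets whose sequences differ at any sphere, as in the fifth sphere of the O atom in Figure 2, cannot be the same net; equal sequences alone do not prove identity.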

Figure 3

AMILUE (left) and AMIMEP (right) in [001] (top) and [100] (middle, bottom) projection. The cleaned frameworks are identical.

Two identical frameworks of [Zn₃(bpdc)₃bpy] (bpdc²⁻ = biphenyldicarboxylate dianion, bpy = 4,4′-bipyridine), which were originally reported as two different structures, are HEGJUZ[42] and XUVHEB.[43] The two publications do not refer to each other. This is not surprising, since HEGJUZ has space group P2₁/n and some disorder on the solvated dimethylformamide, while XUVHEB has space group Pbcn and no disorder on the solvate molecules but instead contains two additional uncoordinated waters (Figure 4).

Figure 4

Frameworks of HEGJUZ (left) and XUVHEB (right) in [010] projection. HEGJUZ and XUVHEB only differ in water clathrates and a disorder of HEGJUZ. The cleaned frameworks are identical.

Finally, we illustrate that an analysis of the net topology alone is also not sufficient in general to distinguish frameworks, since frameworks with different composition can share their net topologies. Clearly, substituting one atom type with another will change the structure but not the net. An example is IBICED[44] (or its analogue IBIDAA[44]), which differs from IBICAZ[44] only by the type of halogen atom in the [Zn(Hal)(mpmab)] framework (Figure 5). A more complex reason for two different structures to share the same net can be that they are formed from enantiomeric ligands. An example is IBICON, which is the enantiomeric isomer of IBICED and IBIDAA. While IBICED and IBIDAA are constructed with the chiral L ligand and belong to the chiral space group P6₁, IBICON has space group P6₅ using the D ligand. Comparing the space groups of chiral structures (e.g., P6₁ and P6₅) will tell enantiomeric pairs apart, but this is a difficult task for frameworks taken from the CoRe MOF databases, since all relaxed structures are stored in the space group P1 and the original information on the space group is lost.

Figure 5

Identical frameworks of IBICED (top left) and IBIDAA (bottom left). IBICON (top right) is their mirror image. IBICAZ (bottom right) is only distinguished from IBICED and IBIDAA by the atom types: Br (orange balls) is substituted by Cl (green balls).

Methods

To automatically search for duplicates, we first compare the atom types of networks and the composition and the graph of linkers and subsequently analyze topological properties of the bond network and its simplifications as described below. This analysis is very robust in distinguishing networks of different topologies as well as in detecting skeleton isomers. In principle, it is also possible to find stereoisomers (enantiomers, cis/trans isomers, conformers) using information about crystal symmetry and geometrical fingerprints.[45,46]

The bond network of a structure is the graph whose vertices correspond to the atoms and whose edges correspond to interatomic bonds. A network, net, or graph is a particular combinatorial structure that consists of vertices and edges attached to the vertices. The degree of a vertex is the number of end points of edges connected to it. The degree of a vertex corresponds to the coordination of an atom. The bond network is equivalent to the adjacency matrix of a structure: i.e., the matrix that lists all atoms and the bonds between them. An underlying net of a structure is a simplified version of the bond network. It is constructed by adding a vertex for each structural group and connecting a pair of vertices with an edge if the corresponding structural groups have a bond between them.[40,47] We perform three different simplifications on the bond network, which we further analyze. They are illustrated for MOF-5 in Figure 6a.

Figure 6

Simplifications of MOF-5 (SAHYOG[48]): (a) original MOF-5; (b) simplified adjacency matrix, net topology mof; (c) standard simplification, net topology fff; (d) clusters of the cluster simplification; (e) cluster simplification, net topology pcu; (f) 2-fold interpenetrated version of MOF-5 (HIFTOG[49]).

The simplified adjacency matrix is obtained by deleting isolated and dangling atoms and suppressing atoms that have only two bonds. Every vertex of the underlying net with degree 1 is removed together with its adjacent edge, and edges with an end point of degree 2 are contracted iteratively until the minimal degree of the graph is 3 (the resulting graph is independent of the order in which the deletions and edge contractions are performed) (Figure 6b).

The standard simplification considers metal atoms and organic ligands of a MOF as its structural units and substitutes the atoms of each ligand by one dummy atom, usually placed at the center of mass. In more general terms, anything that is not a metal is contracted to its center of mass. That applies not only to organic ligands but also to single nonmetal atoms, such as oxygen, halogen, or multiatomic noncoordinated species (anion, cation, solvent) (Figure 6c).

The motivation of the cluster simplification is to recognize clusters of atoms by decomposing the structure into pieces with high connectivity. For each bond, the smallest ring of bonds is found that contains the bond. The ring sizes are sorted by increasing values into the sequence $a_1 \le a_2 \le \dots \le a_N$, where $N$ is the number of bonds in the structure. If the sequence contains a pair $a_i$, $a_{i+1}$ such that $a_{i+1} - a_i > 2$, bonds whose smallest rings are formed with fewer than $a_{i+1}$ bonds (i.e., with at most $a_i$ bonds) are considered to belong to a cluster while the other bonds connect two clusters (Figure 6d). The cluster simplification for $i$ is obtained by substituting each cluster with a dummy atom and keeping the bonds between clusters (Figure 6e). If there exist several gaps in the sequence $a_1, \dots, a_N$, the structure permits several different cluster simplifications and one cluster simplification is obtained for each index. Note that identical structures have the same sets of cluster simplifications. 
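The simplification of the adjacency matrix described above (strip 1-c vertices, then suppress 2-c vertices) can be sketched on a finite graph as follows. This is an illustrative sketch under stated assumptions, not the ToposPro routine: the function name `simplify_bond_graph` and the toy graph are hypothetical, the graph is finite rather than periodic, and it is stored as an edge list so that the parallel edges and loops created by contractions are kept.

```python
from collections import Counter


def simplify_bond_graph(edges):
    """Remove 1-c vertices and contract 2-c vertices of a finite multigraph.

    `edges` is a list of (u, v) pairs. Isolated vertices never appear in an
    edge list, so they are dropped implicitly.
    """
    edges = list(edges)
    while True:
        degree = Counter()
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1  # a loop (u == v) contributes 2 to the degree
        dangling = [x for x, d in degree.items() if d == 1]
        if dangling:
            x = dangling[0]
            edges = [e for e in edges if x not in e]
            continue
        # 2-c vertices carrying a loop are left alone (nothing to contract)
        two_c = [x for x, d in degree.items() if d == 2 and (x, x) not in edges]
        if not two_c:
            return edges
        x = two_c[0]
        incident = [e for e in edges if x in e]
        ends = [u if v == x else v for u, v in incident]
        # replace the two bonds at x by one bond between its neighbours
        edges = [e for e in edges if x not in e] + [(ends[0], ends[1])]


# Toy input: a 4-vertex complete graph with every edge subdivided by a
# 2-c vertex, plus a dangling two-atom chain attached to vertex 0.
toy = []
for i in range(4):
    for j in range(i + 1, 4):
        mid = ("m", i, j)
        toy += [(i, mid), (mid, j)]
toy += [(0, "h1"), ("h1", "h2")]

core = sorted(tuple(sorted(e)) for e in simplify_bond_graph(toy))
```

On this toy input the dangling chain is stripped first and every subdivided edge collapses back to a single bond, leaving a graph whose minimal degree is 3.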
An unlimited number of simplifications can be performed on top of each other, and it clearly matters in which order the simplifications are performed. However, only finitely many nonidentical simplified nets of a given structure can be obtained, as at some point it is impossible to further simplify a net. To facilitate the analysis of the network topology, we perform an adjacency matrix simplification on top of both the standard simplification and the cluster simplification. While the topology of the net obtained from simplifying the adjacency matrix is often too specific to match one of the common three-letter topologies, the net obtained by the cluster simplification is the most simplified one and usually carries the topology that is commonly assigned to a structure. For example, the topology of the net obtained by simplifying the adjacency matrix of MOF-5 got its own name mof only because it is such a famous structure. However, one would usually consider MOF-5 to be of primitive cubic topology pcu, which indeed is the topology of the net obtained by performing a cluster simplification and subsequently simplifying the adjacency matrix. Simplifying the adjacency matrix of the standard simplified MOF-5 yields a net with topological type fff. The standard and cluster descriptions coincide in many cases (239 of 488 structures in the 502 CoRe MOF database: 49%): namely, if the structure building unit is a single metal atom and the ligand is not branched, which prevents the underlying net from splitting into several vertices with degree greater than 2. For example, both simplified versions of [Cd(isonicotinate)₂] AVAQIX[50] have dia (diamond) topology. 
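The gap criterion behind the cluster simplification can be illustrated on a small finite graph. The sketch below is an assumption-laden toy, not the ToposPro algorithm: `smallest_ring_sizes` and `cluster_cuts` are hypothetical names, the graph is a simple finite graph given as an adjacency dictionary, and a returned threshold `t` is read as "bonds whose smallest ring has at most `t` bonds belong to a cluster".

```python
from collections import deque


def smallest_ring_sizes(adj):
    """Map each bond {u, v} to the size of the smallest ring containing it.

    The ring size is the shortest u-v path that avoids the bond itself, plus
    one. Bonds that lie on no ring map to None.
    """
    sizes = {}
    for u in adj:
        for v in adj[u]:
            bond = frozenset((u, v))
            if bond in sizes:
                continue
            dist = {u: 0}
            queue = deque([u])
            while queue and v not in dist:
                x = queue.popleft()
                for y in adj[x]:
                    if x == u and y == v:
                        continue  # do not walk along the bond under test
                    if y not in dist:
                        dist[y] = dist[x] + 1
                        queue.append(y)
            sizes[bond] = dist[v] + 1 if v in dist else None
    return sizes


def cluster_cuts(ring_sizes, min_gap=2):
    """Return every ring size a_i that is followed by a gap a_{i+1} - a_i > min_gap."""
    values = sorted({s for s in ring_sizes if s is not None})
    return [lo for lo, hi in zip(values, values[1:]) if hi - lo > min_gap]


# Toy graph: four triangles (the "clusters") joined into a large ring.
adj = {}
def add_bond(a, b):
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

for k in range(4):
    add_bond((k, 0), (k, 1))
    add_bond((k, 1), (k, 2))
    add_bond((k, 2), (k, 0))
    add_bond((k, 1), ((k + 1) % 4, 0))

sizes = smallest_ring_sizes(adj)
cuts = cluster_cuts(sizes.values())
```

Here the intra-triangle bonds sit on 3-rings and the linking bonds on an 8-ring, so the single gap marks the triangles as clusters; replacing each by a dummy atom would leave a 4-membered ring of clusters.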
The topological type of a net is an invariant, as are (extended) point and vertex symbols. These are weaker notions than the net topology,[36] but the combination of the extended point symbol and the vertex symbol is in practice able to distinguish different topologies. If a topology is not identified because it is not contained in the ToposPro database of topologies, the point symbol and vertex symbol can still be used to compare two structures. However, two nets with the same net topology might have different structural building units. For example, KAYBIX and KAYBUJ[51] have the same composition C₇CaH₃NO₄, and their standard simplified nets both have 5,5T7 topology. However, they are not duplicates since their ligands are isomers: pyridine-2,5-dicarboxylate anion and pyridine-2,4-dicarboxylate anion, respectively. Such a difference can be detected by comparing the graphs of ligands, which were analyzed by computing the coordination modes of ligands and metals following the approach of Serezhkin et al.[52] A difference in one of the obtained graph descriptors, namely the coordination mode of the ligand (in brackets) and an identifier for their composition (in braces), is sufficient to conclude that two structures are chemically different. Examples of coordination isomers are given in Table 2 and illustrated in Figures SI1 and SI2.

Table 2

Compounds with the Same Stoichiometric Compositions and Ligands but Different Modes of Ligand Coordination

compound	refcode	ligand
[Cd₃(μ₆-biphenyl-3,4′,5-tricarboxylato)₂]	HEKTUO	C₁₅H₇O₆[G42]{196}
	QEKLID, QEKLID01	C₁₅H₇O₆[G51]{196}
[Y(benzene-1,3,5-tricarboxylato)]	SEHTEF	C₉H₃O₆[G22]{158}
	LAVSUY	C₉H₃O₆[G42]{158}
	NADZEZ	C₉H₃O₆[G6]{158}
[Y₂(terephthalate)₃]	LAGNOY	C₈H₄O₄[K22]{78}
		C₈H₄O₄[K4]{78}
	LAGNUE	C₈H₄O₄[K4]{78}
[Y₂(pyridine-3,5-dicarboxylato)₃]	SERJUV	C₇H₃NO₄[K22]{290}
		C₇H₃NO₄[K31]{290}
		C₇H₃NO₄[K4]{290}
	SERKEG	C₇H₃NO₄[K22]{290}

The topological type of a framework contains no information on interpenetration, but ToposPro is able to determine the degree of interpenetration. We add this check to our analysis and distinguish differently interpenetrated versions of a structure. For example, HIFTOG[49] is a 2-fold interpenetrated version of MOF-5 (Figure 6f). It is also possible to detect rare cases of entanglement isomers by using the extended ring net.[53,54] To analyze the 502 CoRe MOF database, we performed the steps given below. They turned out to give a test, which is not only sufficient but also necessary to distinguish MOFs up to enantiomers. We did not compare the exact number of atoms, since the CoRe MOF database contains structures given in multiples of the unit cell (e.g., Figure 7a,b), but the ratios between elements and between central atoms and ligands were determined. At each step, uniquely determined structures were filtered out and sets of indistinguishable structures compared during the following steps.
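Comparing element ratios instead of absolute atom counts, as described above, makes the composition check insensitive to how many unit cells a file contains. A minimal sketch, assuming each structure is given as a dictionary of element counts (the function name `reduced_composition` is hypothetical, and the 8-fold cell is an arbitrary illustration):

```python
from functools import reduce
from math import gcd


def reduced_composition(counts):
    """Divide all element counts by their greatest common divisor."""
    divisor = reduce(gcd, counts.values())
    return tuple(sorted((element, n // divisor) for element, n in counts.items()))


# One formula unit of [Zn4O(benzene-1,4-dicarboxylato)3], i.e. C24H12O13Zn4,
# and a hypothetical file that stores eight formula units.
one_unit = {"C": 24, "H": 12, "O": 13, "Zn": 4}
eight_units = {element: 8 * n for element, n in one_unit.items()}

same_ratio = reduced_composition(one_unit) == reduced_composition(eight_units)
```

The reduced tuple can serve directly as a dictionary key when grouping entries by empirical formula.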

Figure 7

(a) SAKRED and SEFBOV, (b) KAXQOR and ZERQOE, and (c) GOMRAC and GOMREG are duplicates in the 502 CoRe MOF database since their physical structures differ only by some disorder.

1. composition (atom types and stoichiometry), i.e., empirical formula
2. central atom type: ligand graph, composition, and coordination
3. topological type of the net obtained by standard simplification
4. topological type of the net obtained by simplifying the adjacency matrix
5. topological type of the net obtained by cluster simplification
6. degree of interpenetration

Clearly, the order of the steps can be interchanged. In particular, the cost of computing the net type of a more complicated net competes with the cost of highly simplifying a net. Therefore, interchanging steps 3 and 5 will require more effort to compute the simplifications but less effort to compute the net topologies.
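The stepwise filtering described above, where structures that become unique at one step are set aside and the remaining groups are passed to the next descriptor, can be sketched as follows. This is a generic illustration rather than the authors' ToposPro workflow; `split_by_invariants`, the record fields, and all toy values are hypothetical.

```python
from collections import defaultdict


def split_by_invariants(structures, descriptors):
    """Refine groups of candidate duplicates one invariant at a time.

    Returns (unique, groups): structures that some descriptor told apart from
    all others, and groups that agree on every descriptor.
    """
    unique = []
    pending = [list(structures)]
    for describe in descriptors:
        refined = []
        for group in pending:
            buckets = defaultdict(list)
            for s in group:
                buckets[describe(s)].append(s)
            for bucket in buckets.values():
                if len(bucket) == 1:
                    unique.append(bucket[0])   # distinguished at this step
                else:
                    refined.append(bucket)     # still indistinguishable
        pending = refined
    return unique, pending


# Toy records with made-up descriptor values (not taken from any database).
records = [
    {"name": "A", "formula": "F1", "net": "n1", "interpenetration": 1},
    {"name": "B", "formula": "F1", "net": "n1", "interpenetration": 1},
    {"name": "C", "formula": "F1", "net": "n2", "interpenetration": 1},
    {"name": "D", "formula": "F2", "net": "n1", "interpenetration": 1},
    {"name": "E", "formula": "F1", "net": "n1", "interpenetration": 2},
]
steps = [
    lambda s: s["formula"],
    lambda s: s["net"],
    lambda s: s["interpenetration"],
]

unique, groups = split_by_invariants(records, steps)
unique_names = [s["name"] for s in unique]
group_names = [[s["name"] for s in g] for g in groups]
```

Because every descriptor is an invariant, reordering `steps` changes how much work each step does but not the final groups, which mirrors the remark about interchanging steps 3 and 5.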

Results and Discussion

We investigated the 502 CoRe MOF database with 502 DFT relaxed structures with assigned DDEC partial atomic charges as an example. Of these, 488 were considered to be reliable for comparative analysis. While searching for duplicates, we performed some simple tests on the integrity of the 502 CoRe MOF database, such as searching for too short interatomic contacts and wrongly coordinated atoms. That flagged 66 entries with potential problems. Before we performed our analysis, we replaced in 46 structures erroneous atom coordinates by their positions before relaxation to maintain the net. We furthermore detected errors in 14 structures that were mainly caused by the removal of solvents that are structural building blocks or attached to the structure and chemically important or by the removal of charged anions without balancing the charges. In these cases, it is not surprising that the DFT optimization dramatically changes the network by breaking and rejoining valence bonds. We excluded 14 structures, for which hydrogens (CISMAT01, CUNXIS, CUNXIS10, GIHBII, XUWVEG), anions (AVEMOE, BICDAU, SENWAL, SENWIT, SENWOZ), or cations (VAHSIH, MODNIC) were missed or excess atoms were present (IJIROY, YIWMIA). Among the removed structures is AVEMOE,[55] from which a bridging coordinate sulfate anion was removed together with a terminal water ligand. As a result, the removed charge is not balanced and the DFT relaxed structure has not only a very different cell but even uncoordinated Ag atoms and the underlying net consequently differs from the original one. The atomic charges are also incorrect for BICDAU,[56] where terminal acetate ligands were excluded from the structure and thus could not be taken into account in the DFT calculations. Details and the list of problematic structures are given in the Supporting Information.

Duplicates

As can be expected from the generation of the CoRe MOF database, most of its duplicates originate from structures in the CSD that differ only by their clathrate solvents. The CSD refcode of each structure, all chemical data, the results from the analyses of the nets, and a list of all duplicates are contained in the Supporting Information. In addition, a list of structures that should be removed to obtain a duplicate-free version of the database is given in the Supporting Information. Here, whenever one representative was correct and the other erroneous, the correct one is kept, and if all representatives were correct but were reported with different multiples of the unit cell, the representative with larger cell volume is removed. Examples are discussed below. We followed the procedure outlined in Methods. In the first step, we found 325 materials uniquely determined by their composition and detected a further 163 structures distributed among 59 unique empirical formulas. We then examined the structures with the same empirical formula separately by comparing them in the next step. The second step found a further 28 uniquely determined structures from the 163, and the resulting 135 structures with duplicate ligand sets were distributed among 47 representatives. Among the 28 unique compounds are six pairs of structures with isomeric ligands (see Table 1 and Methods for an explanation).

Table 1

Compounds with Isomeric Ligandsᵃ

refcode	compound	refcode	compound
MIMVEJ	[Zn(nicotinato)₂]	VACFUB01	[Zn(isonicotinato)₂]
WIDZOA	[Cd(succinato)(3,3′-(hydrazine-1,2-diylidenedieth-1-yl-1-ylidene)dipyridine)]	WIFBAQ	[Cd(succinato)(4,4′-(hydrazine-1,2-diylidenedieth-1-yl-1-ylidene)dipyridine)]
UBACOR	[Zn₂(1,1′-biphenyl-2,2′,6,6′-tetracarboxylato)(4,4′-bipyridyl)]	XUYXAR	[Zn₂(1,1′-biphenyl-2,2′,4,4′-tetracarboxylato)(2,2′-bipyridyl)]
BERGAI	[Zn₂(3-amino-1,2,4-triazolato)₂(terephthalato)]	QIFLIC	[Zn₂(3-amino-1,2,4-triazolato)₂(isophthalato)]
KAYBIX	[Ca(pyridine-2,5-dicarboxylato)]	KAYBUJ	[Ca(pyridine-2,4-dicarboxylato)]
ESEVIH	[Zn₂(OH)(benzene-1,3,5-tricarboxylato)]	FAGREM	[Zn₂(OH)(benzene-1,2,4-tricarboxylato)]

ᵃ The differences are highlighted in boldface.

In the same set of 28 structures, 10 coordination isomers are found, which differ in the coordination mode of the ligand (in brackets) in complexes of the same composition (identified by the same number in braces) (see Table 2). For example, there are two types of [Cd₃(μ₆-biphenyl-3,4′,5-tricarboxylato)₂] complexes, in which the hexadentate ligand is either coordinated in G42 mode (HEKTUO[57]) or in G51 mode (QEKLID,[58] QEKLID01;[59] see Figure SI1). The difference in the coordination mode also leads to different underlying topologies of the standard simplified nets, 4,4,6T38 and 4,4,6T24, respectively. The original structures (in the CSD) differ in addition by terminal ligands, namely dimethylacetamide (HEKTUO) and dimethylformamide (QEKLID, QEKLID01), and water solvates contained in QEKLID and QEKLID01 but not in HEKTUO. Other examples are the three different clathrate structures of [Y(benzene-1,3,5-tricarboxylato)] that are distinguished by the coordination mode of the tricarboxylate: G22 (SEHTEF;[60] dimethylformamide and dimethyl sulfoxide), G42 (LAVSUY;[60] dimethylformamide), and G6 (NADZEZ;[61] dimethylformamide and water) (see Figure SI2). The topologies of the underlying nets obtained by standard simplification are also different: namely 4-c sra, 6,6T2, and 6-c htp, respectively. One more striking example is the pair SERJUV[62] and SERKEG,[62] which can be distinguished by the coordination modes of their ligands as well as by the topologies of their simplified adjacency matrices, while the topological type of their nets obtained from standard simplification is stp for both. Examination of the remaining 135 structures in step 3, i.e. comparison of their nets obtained from standard simplification, identifies an additional 7 structures as unique (see Table 3). 
The remaining 128 structures, which may still contain duplicates, occur in 45 unique combinations of composition, ligand symbol, and topology of the net obtained from standard simplification.
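The comparison described above amounts to grouping structures by a tuple of descriptors: entries that share every descriptor remain potential duplicates, while an entry that is alone in its group is unique. A minimal sketch in Python, assuming a hypothetical record layout with the fields `refcode`, `composition`, and `net` (the ligand symbol is omitted for brevity), and using a few rows from Table 3 as input:

```python
from collections import defaultdict

def group_by_descriptors(records, keys):
    """Group refcodes that agree on every descriptor named in `keys`."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["refcode"])
    return dict(groups)

# Illustrative subset of Table 3; field names are assumptions of this sketch.
records = [
    {"refcode": "LOFZUB", "composition": "[AlPO4]", "net": "SAV"},
    {"refcode": "GOMRAC", "composition": "[AlPO4]", "net": "LAU"},
    {"refcode": "GOMREG", "composition": "[AlPO4]", "net": "LAU"},
    {"refcode": "MOYYEF", "composition": "[Cu(3,4'-biphenyldicarboxylato)]", "net": "4,4T69"},
    {"refcode": "MOYYIJ", "composition": "[Cu(3,4'-biphenyldicarboxylato)]", "net": "4,4T74"},
]

groups = group_by_descriptors(records, ["composition", "net"])

# A structure is unique if no other entry shares all of its descriptors.
unique = sorted(codes[0] for codes in groups.values() if len(codes) == 1)
# Groups with more than one entry remain potential duplicates.
candidates = [codes for codes in groups.values() if len(codes) > 1]

print(unique)
print(candidates)
```

Adding a further descriptor to `keys` can only split groups, never merge them, which is why the order of the steps does not change the final partition.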

Table 3

Skeleton Isomers Revealed at the Third Step of the Analysis (Comparing the Topologies of the Nets Obtained from Standard Simplification)^a

compound	refcode	net	refcode	isomeric net
Different Clathrates
[AlPO₄]	LOFZUB	SAV	GOMRAC, GOMREG	LAU
[Zn(imidazol-1,3-diyl)₂]	HIFVOI	dft	GIZJOP, VEJYUF01, VEJYUF02	cag
[Zn(HCO₂)₂]	KAVROQ	3,3,6,6,6T15	RATDAS02, TESGOO, TESGUU, TESHAB, TEVZEA, TEVZIE, TEVZOK, TEVZUQ	3,6,6T1
[Cu(3,4′-biphenyldicarboxylato)]	MOYYEF	4,4T69	MOYYIJ	4,4T74
Different (Removed) Terminal Ligands
[Zn(4-(tetrazol-5-yl)benzoato)]	WENDIE	4,4,4,4T59	FECWOB01	gis

The seven unique structures are underlined.

Comparing the topological types of the nets obtained from the matrix simplification in step 4 detects two more unique structures (NUTQEZ and XUNGOD), and one quartet of [Ca(4,4′-sulfonyldibenzoato)] structures (ZERQOE,[63] KAXQOR,[64] KAXQIL,[64] KAXQOR01[65]), originally containing different clathrates, is split into two pairs of isomers (ZERQOE-KAXQOR, KAXQIL-KAXQOR01). KAXQOR and KAXQOR01 are the only examples of real polymorphs in the 502 CoRe MOF database. Therefore, the number of possible duplicated structures reduces to 126 and the number of unique representatives increases to 46. Comparing the topological types of the nets obtained from cluster simplification in step 5 does not distinguish any additional structures, which can be explained by this commonly used representation being the simplest notion of underlying nets.[47] Even on analyzing the large CoRE MOF database with more than 4700 structures, we did not find any structures that step 5 distinguishes but which were not already differentiated by the previous steps. However, we include the cluster representation in our analysis because it captures the net topology that is usually used to classify and describe the topology of a MOF. Furthermore, the order in which the steps of the algorithm can be performed is a matter of choice as described in Methods. Counting the degree of interpenetration in step 6 allows differentiation of four isomers. The [Zn2(2,2′-bithiophene-5,5′-dicarboxylato)2(4,4′-bipyridyl)] framework is twice listed as 2-fold interpenetrated (GUYLOC, GUYMAP) and the 3-fold interpenetrated analogue is given two times as well (GUYLUI, GUYLUI01). The well-known MOF-5 framework of the composition [Zn4O(benzene-1,4-dicarboxylato)3] is found twice in its 2-fold interpenetrated version (HIFTOG, HIFTOG02), and is listed 17 times as a single framework.
Consequently, the number of possibly distinct structures in the previous list of 126 structures is now 46 + 2 = 48. The remaining 126 structures cannot be uniquely described by applying our set of invariants. Indeed, all of the indistinguishable structures have multiple entries: we find 39 pairs, 5 triples, 2 quadruples, one structure that is deposited 8 times, and MOF-5 with 17 entries. Most duplicates are caused by the removal of different clathrates/solvent molecules from the original structure. For example, KAXQOR[64] and ZERQOE[63] (Figure b) only differ in the CSD by the CO2 adsorbed in ZERQOE. Similarly, WOWGEU[66] and GUXLIU[67] are independently listed in the CSD only because they contain different numbers of clathrate water molecules in the pores of the framework [Al2F2(ethylenediphosphonato)]. Two structures, JAVNIE[68] and FUSWIA,[69] differ by water coordinated to the copper atoms of the framework [Cu3Cl2(5-(4-pyridyl)tetrazolato)4], which is present in JAVNIE but absent in FUSWIA, as well as by the clathrate molecules dimethylformamide and methanol in FUSWIA and dimethylformamide and water in JAVNIE. More examples of duplicates caused by different solvent molecules are the pairs AMIMEP[41] and AMILUE,[41] and HEGJUZ[42] and XUVHEB,[43] which are discussed in Methods (Figures and 4), SAKRED[70] and SEFBOV,[71] and KAXQOR[64] and ZERQOE[63] (Figure a,b). However, the pair GOMRAC[72] and GOMREG[72] of AlPO4 is a duplicate due to neglect of the metal disorder (Figure c). In both materials, a third of the aluminum sites are substituted, but while GOMRAC contains zinc, GOMREG contains manganese. Although the two materials are different, they are both stored with full occupation of aluminum in the 502 CoRe MOF database and must therefore be counted as duplicates. 
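The group sizes quoted above can be checked against the totals of 126 structures and 48 groups. The following sketch uses only the multiplicities stated in the text; the variable names are illustrative:

```python
# Group size -> number of groups of that size, as reported in the text:
# 39 pairs, 5 triples, 2 quadruples, one 8-fold entry, and MOF-5 (17 entries).
multiplicities = {2: 39, 3: 5, 4: 2, 8: 1, 17: 1}

n_groups = sum(multiplicities.values())
n_structures = sum(size * count for size, count in multiplicities.items())

# Keeping one representative per group makes the rest redundant.
n_redundant = n_structures - n_groups

print(n_groups, n_structures, n_redundant)  # 48 126 78
```

Keeping one representative from each of the 48 groups therefore leaves 78 redundant entries.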
In Chart we summarize the structures that are distinguished during the six steps of our algorithm applied to the 502 CoRe MOF database, leaving 126 structures with duplicates in 48 groups (unique structures: 325 + 28 + 7 + 2 + 48 = 410); that is, 78 of the 488 structures (16%) are redundant.

Chart 1

Detecting Duplicates in the 502 CoRe MOF Database

Statistical Errors Caused by Multiple Entries in a Database

We close with an example of why a database should be cleaned of duplicates before statistical conclusions are drawn. The following examination of interpenetration shows how multiple entries can mislead a statistical analysis: if we consider all 502 structures of the 502 CoRe MOF database, we find 58 2-fold, 16 3-fold, 9 4-fold, 3 5-fold, 1 6-fold, 3 7-fold, and 2 8-fold interpenetrated structures. However, if duplicates are removed, we find that there is only one 7-fold interpenetrated structure of [Zn(4-(2-(pyridin-4-yl)vinyl)benzoato)2]: namely, UVARIT = UVAROZ = UVASAM[73] (dia). Similarly, the 3-fold interpenetrated structures contain the double GUYLUI[74] and GUYLUI01,[75] and the 2-fold interpenetrated structures contain 7 doubles. The numbers of interpenetrated structures should instead read as 51 2-fold, 15 3-fold, 9 4-fold, 3 5-fold, 1 6-fold, 1 7-fold, and 2 8-fold interpenetrated structures (see Chart ). The degree of interpenetration is given in the file dealing with duplicates in the Supporting Information.
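The correction can be reproduced by subtracting the redundant entries from the raw counts. This is a minimal sketch that uses only the numbers given in the paragraph above; the dictionary layout is an assumption made for illustration:

```python
# Degree of interpenetration -> number of entries in the full database.
raw = {2: 58, 3: 16, 4: 9, 5: 3, 6: 1, 7: 3, 8: 2}

# Redundant entries per degree, from the text: seven doubles (2-fold),
# one double (3-fold), and one triple (7-fold, UVARIT = UVAROZ = UVASAM).
redundant = {2: 7, 3: 1, 7: 2}

cleaned = {degree: n - redundant.get(degree, 0) for degree, n in raw.items()}

for degree in sorted(cleaned):
    print(f"{degree}-fold: {raw[degree]} -> {cleaned[degree]}")
```

The 7-fold class shrinks the most in relative terms, from three entries to one, which is the kind of distortion that duplicates introduce into small categories.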

Chart 2

Statistics of the Interpenetration

Conclusions

We have presented a rigorous method to distinguish MOFs that is based on an analysis of the bond network. In contrast to approaches that rely on comparing atom numbers and cell parameters or properties such as atom positions, pore volume, and surface area, we are able to reliably distinguish structures and, correspondingly, detect duplicates, even when frameworks are distorted. Although superimposable duplicates would be found by purely geometrical descriptors, even large differences in any of them do not allow the conclusion that two structures are different. Conversely, nonidentical structures can be more similar than two different relaxations of one structure with respect to purely geometrical descriptors. For example, if a symmetry is broken by relaxation or if different clathrates induce distinct symmetries, multiples of the original unit cell can be needed to describe the relaxed cleaned structure, which makes it useless to compare the number of atoms or cell parameters. In contrast, the properties that we obtain from the bond network such as its atom types, topology, dimensionality, interpenetration, and point and vertex symbols remain unchanged for all representations of a structure. It immediately follows that in order to distinguish two structures, it is sufficient that they differ in one of these properties. As an example, the 502 CoRe MOF database of 502 DFT relaxed MOFs with assigned DDEC partial atomic charges was investigated, showing that 15.5% (78) of the structures are redundant duplicates. A total of 9.2% (46) of the structural files contain incorrect atomic coordinates that affect the network topology and were replaced before the study, and 2.8% (14) of the structures have wrong framework compositions. In all, 502 – 78 – 46 – 14 = 364 structures are reliable, which is 72.5% of the database. The analysis was performed using ToposPro.
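The percentages quoted above follow from the raw counts. The sketch below assumes that the three categories (duplicates, incorrect coordinates, wrong compositions) do not overlap, which is what the reported 72.5% implies; the variable names are illustrative:

```python
total = 502
duplicates = 78        # redundant entries
bad_coordinates = 46   # files with incorrect atomic coordinates
bad_composition = 14   # entries with wrong framework composition

# Assumption of this sketch: the three categories are disjoint.
reliable = total - duplicates - bad_coordinates - bad_composition

def percent(n):
    """Share of the full database, rounded to one decimal place."""
    return round(100 * n / total, 1)

for label, n in [("duplicates", duplicates),
                 ("incorrect coordinates", bad_coordinates),
                 ("wrong composition", bad_composition),
                 ("reliable", reliable)]:
    print(f"{label}: {n} ({percent(n)}%)")
```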

35 in total

1. Control of interpenetration for tuning structural flexibility influences sorption properties.

Authors: Sareeya Bureekaew; Hiroshi Sato; Ryotaro Matsuda; Yoshiki Kubota; Raita Hirose; Jungeun Kim; Kenichi Kato; Masaki Takata; Susumu Kitagawa
Journal: Angew Chem Int Ed Engl Date: 2010-10-11 Impact factor: 15.336

2. A fingerprint based metric for measuring similarities of crystalline structures.

Authors: Li Zhu; Maximilian Amsler; Tobias Fuhrer; Bastian Schaefer; Somayeh Faraji; Samare Rostami; S Alireza Ghasemi; Ali Sadeghi; Migle Grauzinyte; Chris Wolverton; Stefan Goedecker
Journal: J Chem Phys Date: 2016-01-21 Impact factor: 3.488

3. Crystal engineering: toward intersecting channels from a neutral network with a bcu-type topology.

Authors: Tzuoo-Tsair Luo; Hui-Lien Tsai; Shang-Li Yang; Yen-Hsiang Liu; R Dayal Yadav; Chan-Cheng Su; Chuen-Her Ueng; Lee-Gin Lin; Kuang-Lieh Lu
Journal: Angew Chem Int Ed Engl Date: 2005-09-19 Impact factor: 15.336

4. The inconsistency in adsorption properties and powder XRD data of MOF-5 is rationalized by framework interpenetration and the presence of organic and inorganic species in the nanocavities.

Authors: Jasmina Hafizovic; Morten Bjørgen; Unni Olsbye; Pascal D C Dietzel; Silvia Bordiga; Carmelo Prestipino; Carlo Lamberti; Karl Petter Lillerud
Journal: J Am Chem Soc Date: 2007-03-07 Impact factor: 15.419

5. A database of new zeolite-like materials.

Authors: Ramdas Pophale; Phillip A Cheeseman; Michael W Deem
Journal: Phys Chem Chem Phys Date: 2011-03-18 Impact factor: 3.676

6. In silico design of porous polymer networks: high-throughput screening for methane storage materials.

Authors: Richard L Martin; Cory M Simon; Berend Smit; Maciej Haranczyk
Journal: J Am Chem Soc Date: 2014-03-24 Impact factor: 15.419

7. Carbon dioxide capture: prospects for new materials.

Authors: Deanna M D'Alessandro; Berend Smit; Jeffrey R Long
Journal: Angew Chem Int Ed Engl Date: 2010-08-16 Impact factor: 15.336

8. In silico screening of carbon-capture materials.

Authors: Li-Chiang Lin; Adam H Berger; Richard L Martin; Jihan Kim; Joseph A Swisher; Kuldeep Jariwala; Chris H Rycroft; Abhoyjit S Bhown; Michael W Deem; Maciej Haranczyk; Berend Smit
Journal: Nat Mater Date: 2012-05-27 Impact factor: 43.841

9. Microporous sensor: gas sorption, guest exchange and guest-dependant luminescence of metal-organic framework.

Authors: Sergey A Sapchenko; Denis G Samsonenko; Danil N Dybtsev; Maxim S Melgunov; Vladimir P Fedin
Journal: Dalton Trans Date: 2010-11-22 Impact factor: 4.390

10. Solid-state coordination chemistry of copper(II) tetrazolates: anion control of frameworks constructed from trinuclear copper(II) building blocks.

Authors: Wayne Ouellette; Hongxue Liu; Charles J O'Connor; Jon Zubieta
Journal: Inorg Chem Date: 2009-06-01 Impact factor: 5.165
