Abstract
Databases of experimentally-derived metal-organic framework (MOF) crystal structures are useful for large-scale computational screening to identify which MOFs are best-suited for particular applications. However, these crystal structures must be cleaned to identify and/or correct various artifacts. The recently published 2019 CoRE MOF database (Chung et al., J. Chem. Eng. Data, 2019, 64, 5985-5998) reported thousands of experimentally-derived crystal structures that were partially cleaned to remove solvent molecules, to identify hundreds of disordered structures (approximately thirty of those were corrected), and to manually correct approximately 100 structures (e.g., adding missing hydrogen atoms). Herein, further cleaning of the 2019 CoRE MOF database is performed to identify structures with misbonded or isolated atoms: (i) structures containing an isolated atom, (ii) structures containing atoms too close together (i.e., overlapping atoms), (iii) structures containing a misplaced hydrogen atom, (iv) structures containing an under-bonded carbon atom (which might be caused by missing hydrogen atoms), and (v) structures containing an over-bonded carbon atom. This study should not be viewed as the final cleaning of this database, but rather as progress along the way towards the goal of someday achieving a completely cleaned set of experimentally-derived MOF crystal structures. We performed atom typing for all of the accepted structures to identify those structures that can be parameterized by previously reported forcefield precursors (Chen and Manz, RSC Adv., 2019, 9, 36492-36507). We report several forcefield precursors (e.g., net atomic charges, atom-in-material polarizabilities, atom-in-material dispersion coefficients, electron cloud parameters, etc.) for more than five thousand MOFs in the 2019 CoRE MOF database. This journal is © The Royal Society of Chemistry.
Year: 2020 PMID: 35515793 PMCID: PMC9055497 DOI: 10.1039/d0ra02498h
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Fig. 1 Flow diagram for the construction of the CoRE MOF 2019 database.
Fig. 2 Examples of artifacts being screened in this paper. Panel (A) is an example of isolated atoms in the data that are likewise isolated in the real physical specimen (the circled atoms are F− ions). Panel (B) is an example of isolated atoms in the data that are likely not isolated in the real physical specimen (the circled atoms are oxygen atoms which likely belong to water molecules in the physical specimen for which hydrogen atoms were omitted in the reported crystal structure). Panel (C) is an example of overlapping atoms. Panel (D) is an example of misplaced hydrogens. Panel (E) is an example of under-bonded carbons. Panel (F) is an example of over-bonded carbons.
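As a rough illustration of artifact types (i) and (ii) from the abstract, the sketch below flags isolated and overlapping atoms from pairwise distances alone. It is not the screening procedure used in this work: the cutoff values `OVERLAP_CUTOFF` and `NEIGHBOR_CUTOFF`, the function name `flag_atoms`, the toy coordinates, and the neglect of periodic images are all assumptions made to keep the example short.

```python
from itertools import combinations
from math import dist

# Hypothetical cutoffs in angstroms; illustrative only, not values from the paper.
OVERLAP_CUTOFF = 0.5   # pairs closer than this are treated as overlapping
NEIGHBOR_CUTOFF = 2.0  # atoms with no neighbor within this are treated as isolated


def flag_atoms(atoms):
    """Return (overlapping_pairs, isolated_indices) for a list of (element, xyz).

    Periodic boundary conditions are ignored (an assumption); a real crystal
    structure would also need distances to periodic images.
    """
    overlapping = []
    has_neighbor = [False] * len(atoms)
    for i, j in combinations(range(len(atoms)), 2):
        d = dist(atoms[i][1], atoms[j][1])
        if d < OVERLAP_CUTOFF:
            overlapping.append((i, j))
        if d < NEIGHBOR_CUTOFF:
            has_neighbor[i] = has_neighbor[j] = True
    isolated = [i for i, seen in enumerate(has_neighbor) if not seen]
    return overlapping, isolated


# Toy input: two bonded carbons, a third carbon nearly on top of the second,
# and a distant oxygen with no neighbors.
toy = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.4, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (10.0, 10.0, 10.0)),
]
overlapping, isolated = flag_atoms(toy)
print(overlapping, isolated)
```

A distance test of this kind cannot separate the two cases shown in panels (A) and (B): both would be flagged as isolated, and telling them apart requires chemical context.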
Coefficients for eqn (1) for fitted C-atom bond orders
| Atom | | | |
|---|---|---|---|
| H | −0.6093 | 0.5927 | 0.7584 |
| B | −2.2011 | 3.4380 | 0.9638 |
| C | −1.2685 | 1.8855 | 0.9233 |
| N | −1.2680 | 1.8401 | 0.9255 |
| O | −1.0525 | 1.5189 | 0.9477 |
| Cl | −0.7621 | 1.3723 | 0.9350 |
| Br | −0.8003 | 1.5272 | 0.9776 |
Breakdown of flagged MOFs by major artifact type for each subset. The number of structures containing only that artifact type is listed in parentheses.
| Subset | Isolated atoms | Misbonded hydrogens | Overlapping atoms | Under-bonded carbons | Over-bonded carbons | Total flagged | Accepted |
|---|---|---|---|---|---|---|---|
| ASR_CSD | 88 (72) | 20 (16) | 100 (33) | 201 (154) | 137 (70) | 441 | 1204 |
| ASR_public | 819 (718) | 132 (107) | 127 (93) | 1041 (922) | 91 (51) | 2046 | 8100 |
| FSR_CSD | 218 (149) | 44 (28) | 445 (101) | 433 (281) | 481 (127) | 1119 | 1779 |
| FSR_public | 485 (405) | 82 (63) | 70 (46) | 727 (629) | 63 (29) | 1295 | 4713 |
Number of structures not containing hydrogen or carbon atoms
| Subset | Total structures | No hydrogens | No carbons |
|---|---|---|---|
| ASR_CSD | 1645 | 48 | 9 |
| ASR_public | 10 146 | 859 | 463 |
| FSR_CSD | 2898 | 74 | 10 |
| FSR_public | 6008 | 473 | 300 |
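The two tables above can be cross-checked: for each subset, the total number of structures should equal the flagged count plus the accepted count. The short sketch below performs that check and prints the flagged fraction; the dictionary and function names are illustrative choices, and the numbers are transcribed from the tables.

```python
# (total flagged, accepted) per subset, transcribed from the artifact breakdown table.
FLAGGED_ACCEPTED = {
    "ASR_CSD": (441, 1204),
    "ASR_public": (2046, 8100),
    "FSR_CSD": (1119, 1779),
    "FSR_public": (1295, 4713),
}

# Total structures per subset, transcribed from the hydrogen/carbon table.
TOTAL_STRUCTURES = {
    "ASR_CSD": 1645,
    "ASR_public": 10146,
    "FSR_CSD": 2898,
    "FSR_public": 6008,
}


def flagged_fraction(subset):
    """Fraction of a subset's structures that were flagged."""
    flagged, _accepted = FLAGGED_ACCEPTED[subset]
    return flagged / TOTAL_STRUCTURES[subset]


for subset, (flagged, accepted) in FLAGGED_ACCEPTED.items():
    total = TOTAL_STRUCTURES[subset]
    # Every structure is either flagged or accepted.
    assert flagged + accepted == total, subset
    print(f"{subset}: {flagged}/{total} flagged ({flagged_fraction(subset):.1%})")
```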
Fig. 3 Flow diagram of this project.
| Subset | IA/MH | IA/OA | IA/UC | IA/OC | MH/OA | MH/UC | MH/OC |
|---|---|---|---|---|---|---|---|
| ASR_CSD | 1 | 5 | 7 | 0 | 0 | 1 | 1 |
| ASR_public | 5 | 12 | 78 | 1 | 3 | 6 | 7 |
| FSR_CSD | 2 | 11 | 30 | 1 | 4 | 3 | 1 |
| FSR_public | 2 | 8 | 62 | 1 | 3 | 5 | 7 |
| Subset | OA/UC | OA/OC | UC/OC | IA/MH/OA | IA/MH/UC | IA/MH/OC | IA/OA/UC |
|---|---|---|---|---|---|---|---|
| ASR_CSD | 12 | 41 | 19 | 0 | 0 | 0 | 2 |
| ASR_public | 7 | 6 | 21 | 1 | 0 | 1 | 1 |
| FSR_CSD | 21 | 243 | 40 | 0 | 0 | 0 | 5 |
| FSR_public | 5 | 3 | 18 | 1 | 1 | 0 | 2 |
| Subset | IA/OA/OC | IA/UC/OC | MH/OA/UC | MH/OA/OC | MH/UC/OC | OA/UC/OC | IA/MH/OA/UC |
|---|---|---|---|---|---|---|---|
| ASR_CSD | 1 | 0 | 1 | 0 | 0 | 5 | 0 |
| ASR_public | 0 | 2 | 2 | 0 | 0 | 2 | 0 |
| FSR_CSD | 16 | 1 | 3 | 3 | 0 | 46 | 0 |
| FSR_public | 0 | 3 | 0 | 0 | 0 | 2 | 0 |
| Subset | IA/MH/OA/OC | IA/MH/UC/OC | IA/OA/UC/OC | MH/OA/UC/OC | All 5 |
|---|---|---|---|---|---|
| ASR_CSD | 0 | 0 | 0 | 0 | 0 |
| ASR_public | 0 | 0 | 0 | 0 | 0 |
| FSR_CSD | 0 | 0 | 3 | 0 | 0 |
| FSR_public | 0 | 0 | 0 | 0 | 0 |
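The combination tables can be reconciled with the artifact breakdown table if each combination column is read as an exclusive count, that is, structures showing exactly that set of artifacts. That reading is an interpretation, as are the abbreviations (IA = isolated atoms, MH = misbonded hydrogens, OA = overlapping atoms, UC = under-bonded carbons, OC = over-bonded carbons), which are inferred from the breakdown table's column headers. The sketch below sums the exclusive counts and compares them with the tabulated totals; all names are illustrative.

```python
# Structures flagged with exactly the listed artifact set; zero entries omitted.
# Single-artifact values are the parenthesized counts from the breakdown table.
EXCLUSIVE = {
    "ASR_CSD": {
        "IA": 72, "MH": 16, "OA": 33, "UC": 154, "OC": 70,
        "IA/MH": 1, "IA/OA": 5, "IA/UC": 7, "MH/UC": 1, "MH/OC": 1,
        "OA/UC": 12, "OA/OC": 41, "UC/OC": 19,
        "IA/OA/UC": 2, "IA/OA/OC": 1, "MH/OA/UC": 1, "OA/UC/OC": 5,
    },
    "ASR_public": {
        "IA": 718, "MH": 107, "OA": 93, "UC": 922, "OC": 51,
        "IA/MH": 5, "IA/OA": 12, "IA/UC": 78, "IA/OC": 1,
        "MH/OA": 3, "MH/UC": 6, "MH/OC": 7,
        "OA/UC": 7, "OA/OC": 6, "UC/OC": 21,
        "IA/MH/OA": 1, "IA/MH/OC": 1, "IA/OA/UC": 1,
        "IA/UC/OC": 2, "MH/OA/UC": 2, "OA/UC/OC": 2,
    },
    "FSR_CSD": {
        "IA": 149, "MH": 28, "OA": 101, "UC": 281, "OC": 127,
        "IA/MH": 2, "IA/OA": 11, "IA/UC": 30, "IA/OC": 1,
        "MH/OA": 4, "MH/UC": 3, "MH/OC": 1,
        "OA/UC": 21, "OA/OC": 243, "UC/OC": 40,
        "IA/OA/UC": 5, "IA/OA/OC": 16, "IA/UC/OC": 1,
        "MH/OA/UC": 3, "MH/OA/OC": 3, "OA/UC/OC": 46,
        "IA/OA/UC/OC": 3,
    },
    "FSR_public": {
        "IA": 405, "MH": 63, "OA": 46, "UC": 629, "OC": 29,
        "IA/MH": 2, "IA/OA": 8, "IA/UC": 62, "IA/OC": 1,
        "MH/OA": 3, "MH/UC": 5, "MH/OC": 7,
        "OA/UC": 5, "OA/OC": 3, "UC/OC": 18,
        "IA/MH/OA": 1, "IA/MH/UC": 1, "IA/OA/UC": 2,
        "IA/UC/OC": 3, "OA/UC/OC": 2,
    },
}

# Per-artifact totals and total flagged, as tabulated in the breakdown table.
TABULATED = {
    "ASR_CSD": {"IA": 88, "MH": 20, "OA": 100, "UC": 201, "OC": 137, "total": 441},
    "ASR_public": {"IA": 819, "MH": 132, "OA": 127, "UC": 1041, "OC": 91, "total": 2046},
    "FSR_CSD": {"IA": 218, "MH": 44, "OA": 445, "UC": 433, "OC": 481, "total": 1119},
    "FSR_public": {"IA": 485, "MH": 82, "OA": 70, "UC": 727, "OC": 63, "total": 1295},
}


def artifact_total(counts, artifact):
    """Sum every exclusive count whose combination includes the artifact."""
    return sum(n for combo, n in counts.items() if artifact in combo.split("/"))


def find_mismatches():
    """List (subset, key, computed, tabulated) wherever the two disagree."""
    found = []
    for subset, expected in TABULATED.items():
        counts = EXCLUSIVE[subset]
        for key, value in expected.items():
            if key == "total":
                computed = sum(counts.values())
            else:
                computed = artifact_total(counts, key)
            if computed != value:
                found.append((subset, key, computed, value))
    return found


print(find_mismatches())
```

With the values transcribed above, the total flagged count is reproduced for all four subsets, and every per-artifact total is reproduced except overlapping atoms in FSR_CSD, where the combination counts sum to 456 against a tabulated 445. The tables alone do not indicate which figure is in error.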