Daniele Ongari1, Leopold Talirz1,2, Berend Smit1. 1. Laboratory of Molecular Simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), Rue de l'Industrie 17, Sion, CH-1951 Valais, Switzerland. 2. Theory and Simulation of Materials (THEOS), Faculté des Sciences et Techniques de l'Ingénieur, École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland.
Abstract
Finding the best material for a specific application is the ultimate goal of materials discovery. However, there is also the reverse problem: when experimental groups discover a new material, they would like to know all the possible applications this material would be promising for. Computational modeling can aim to fulfill this expectation, thanks to the sustained growth of computing power and the collective engagement of the scientific community in developing more efficient and accurate workflows for predicting materials' performances. We discuss the impact that reproducibility and automation of the modeling protocols have on the field of gas adsorption in nanoporous crystals. We envision a platform that combines these tools and enables effective matching between promising materials and industrial applications.
In this Outlook, we are indulging in a
luxury problem: too many
materials with too many possible applications. In the field of nanoporous
materials, more than 100 000 metal–organic frameworks
(MOFs) have been reported (Figure 1), with large design spaces also guaranteed for covalent
organic frameworks (COFs), zeolites, porous organic cages, etc. At
the same time, the range of applications of these materials is expanding
from gas separation[1] and gas storage[2] to fields such as catalysis[3,4] and
sensing.[5,6] A new material is often designed and tested
for one specific application; testing it for a wide range of applications
may exceed the expertise, time resources, and/or infrastructure of
the research group synthesizing the material. Conversely, research
groups focused on the application side can no longer afford to test
all materials of possible interest.
Figure 1
(a) Papers mentioning “Zeolite”,
“Metal Organic
Framework”, and “Covalent Organic Framework”
in the title or the abstract, as parsed from Scopus in July 2020.[7] The right column collects histograms for the
deposition of materials in publicly available databases. (b) Zeolite
code types by year of assignment, from the database of the International
Zeolite Association (IZA).[8] (c) MOF-subset
of the Cambridge Structural Database (CSD, May 20 update) by year
of publication (orange).[9] MOFs in the CoRE-2019
“All solvent Removed” (ASR) subset (purple) are selected
from the CSD release of November 2017 with criteria such as three-dimensionality
of the framework and permeability to small molecules.[10] (d) COFs in the CURATED-COFs database (June 20 update),
by year of publication.[11,12]
Let us illustrate this point with a few examples. Al-PMOF was first
synthesized for its photocatalytic activity but was later discovered
to be promising for the separation of CO2 from wet flue
gases.[13] MOF SBMOF-1 was synthesized to
capture CO2 but turned out to be an excellent material
for separating Xe from Kr.[14] UMCM-152 was
first reported in 2010 and tested for H2 adsorption[15] but was rediscovered as a record-breaking material
for oxygen storage eight years later.[16] In these and further examples,[17] computational
screening studies discovered the potential of existing materials for
new applications. Since the number of available materials is so large
(and becomes even larger when including materials generated in silico[18]), computational modeling
is at present the only feasible screening method.

A typical computational screening study aims to rank a set of materials
for a given application: the first step is to determine key performance
indicators (KPIs) and the ranking criteria for the comparison of different
materials. KPIs are typically related to material properties, such
as the electronic band structure (for optical or electronic KPIs),
or adsorption isotherms (for KPIs related to gas storage or separations).
While some properties, such as stability, remain difficult to predict
from first-principles, density functional theory (DFT) calculations
provide access to a wide range of properties of MOFs, including the
band gap and band structure,[19] mechanical
properties,[20,21] and catalytic properties.[22,23] DFT also allows us to make accurate predictions of the interactions
of guest molecules inside the pores.[24,25] In addition,
classical molecular simulations enable the computation of thermodynamic
and transport properties of these guest molecules.[26,27]

We can envision taking this approach even
one step further: What
if the workflow infrastructure were made available for experimental
groups to use? They could upload the crystal structure of a new nanoporous
material (MOF, COF, or zeolite) and—with close to zero effort—obtain
calculations of thermodynamic and transport properties for a range
of guest molecules, as well as predictions of how this new material
would perform in screening studies previously published in the literature
(Figure 2). Likewise, if an
experimental group develops a novel separation process, a database
of thermodynamic data would allow this group to identify top-performing
materials for their application. Finally, if a computational group
improves an existing force field or protocol for a specific class
of materials, the updated workflow could be available whenever a material
of that class is uploaded.
Figure 2
Scheme of exemplary workflow. The user starts
by uploading the
atomic structure of a crystalline material in the CIF format, which
triggers the refinement of the atomic positions, the computation of
pore geometry, and thermodynamic and transport properties. Finally,
its performance for specific applications is evaluated, and the material
is ranked versus other candidates.
Databases of Nanoporous Materials
Curated Structures from Experimental Syntheses
Computational screening studies rely on large databases of materials. A first step
is to collect all the reported structures in a consistent format:
today, the Crystallographic Information File (CIF) format is the most
common one. The minimal information needed to proceed with a computational
screening study is the dimensions of the unit cell and the elements
and coordinates of the atoms that compose the framework. This information
is obtained ideally from single crystal X-ray diffraction studies.
When obtaining single crystals is not possible, one can rely on powder
X-ray diffraction or other indirect measurements instead.[28]

In the MOF community, it is standard practice
to publish new structures in the Cambridge Structural Database (CSD)[29] and to report the reference code assigned to
the entry in the article. With its more than one million entries (102 508
of which were recognized as MOFs until May 2020),[9] the CSD is the oldest and largest data set of organic and
metal–organic crystals.[30]

Unfortunately, many of the reported structures in the CSD are not
suitable for a computational screening study out of the box. From
X-ray diffraction data, it is difficult to locate hydrogen. If the
material is charged, locating the counterions can also be challenging.
For porous MOFs, the crystal structure is often determined with solvent
molecules present inside the framework, while many practical applications
require activating the material by removing the solvent
molecules. In addition, disorder in the material is often indicated
via partial occupancies, which need to be resolved to unique structures
(in most of the cases manually) before using the structures as input
for simulations.

The group around D. Sholl pioneered the extraction
of MOFs from
the CSD for a computational screening purpose. In two studies from
2012 (seeking materials for kinetic separation of noble gases[31] and CO2/N2[32]) they distilled a set of 3432 and 1163 MOFs,
respectively, from which they discarded entries with atomic disorder,
and they algorithmically removed solvent molecules. One year later,
Siegel et al.[33] targeted hydrogen storage
and identified ca. 38 800 crystals from the CSD as MOFs but
had to exclude ca. 16 000 problematic structures due to missing
H, disorder, etc. For these first studies, the final database of filtered
and curated CIFs was only accessible upon request to the authors.
In 2014, Chung et al. created a set of 5109 “Computation-Ready
Experimental” (CoRE) structures, selected to be three-dimensional,
porous (i.e., with a pore limiting diameter >2.4 Å), and fully
desolvated, and made the database openly available for download.[34] Recently, this database was updated to include
14 142 MOFs, 546 of which were collected from sources other
than the CSD.[10] This update also included
a version of the database where solvent molecules coordinated to metal
sites were not removed (i.e., the MOFs were not “computationally
activated”). In two separate projects, Nazarian et al. used
DFT to provide partial charges[35] and geometry-optimized
structures[36] for the first set of CoRE-MOFs.
Only 502 (i.e., ca. 10%) passed both refinements. The database with
partial charges was used by other groups in several screening studies,[16,37−39] demonstrating the impact of providing an open-access,
curated, and extensive database of structures to the computational community.

Ideally, for every new MOF structure deposited in
the CSD, a computation-ready
structure would be generated as well. At present, however, there is
no standardized protocol for these steps—e.g., removal of solvents,
addition of missing hydrogen atoms, resolution of partial occupancies,
correction of atomic overlaps, and structural distortion—and
as a result, each group may generate slightly different structures
that make it difficult to compare predicted properties.[40,41] This cleaning procedure can be seen as a continuous process where
more and more checks and fixes are added in a collaborative effort.[42,43] For the platform we envision, it would be highly desirable to eventually
arrive at an internally consistent and extensive database of nanoporous
structures that are “ready” for molecular simulations
or electronic structure calculations, and made available in a way
that satisfies the FAIR data principles: Findable, Accessible, Interoperable,
and Reusable.[44,45]

Extending this curation
to different classes of materials, such
as COFs, inevitably results in new considerations and challenges.
For MOFs, the CSD imposes quality controls on the accuracy of the
crystal structure. COFs typically have short-range crystallinity but
long-range disorder, preventing the refinement of the crystal structure
directly from X-ray diffraction data[46] and
thus inclusion in the CSD. As a consequence, experimental groups develop
their own protocols to generate the crystal structures reported with
their publications. These difficulties motivated us to create a database
that combines the advantages of both the CoRE and CSD protocols and
provides a high level of transparency and consistency in the refinements
including cell optimization and the calculation of partial charges.
Branching off from the second version of the CoRE-COF database by
Tong et al.,[47] we extended the database
to 574 COFs in the June 2020 update. The relevant literature is monitored
by the @COF_Papers Twitter bot, and
structures in CIF format are collected in a public repository, where
researchers can suggest new additions or report errors.[12] All modifications, corrections, and additions
are tracked by the Git version control system. Moreover, an automated
routine, orchestrated by the AiiDA workflow manager,[48,49] computes the DFT-optimized structures and partial charges following
a published protocol.[11] The results are
made available periodically in the CURATED COF database hosted on
the Materials Cloud open science platform.[50] A recent independent study compared the gas-separation performances
as computed from the structures of this database and the original COF structures, highlighting the importance of the curation process.[51]
Hypothetical Structures
In addition
to the databases
of experimentally determined structures, there is an even larger number
of structures generated in silico, which are further
candidates for computational screenings. Replacing the experimental
synthesis of new materials with computational algorithms increases
the number of atomic structures that can be assembled (zeolites, MOFs,
and COFs) by orders of magnitude. To give an idea of how large these
databases can get, we just mention two very recent works, which reported
325 000 hypothetical MOFs[13] and
471 990 hypothetical COFs.[52]

As the growth of the number of hypothetical structures even outpaces
the continued increase of computational power, brute force screening
will become increasingly unfeasible. A promising alternative is to
select only a modest subset of the most diverse structures
to perform accurate calculations (comparable to those on experimental
structures), train machine learning methods to capture the structure–performance
relations, and use them to extend the screening to the remaining materials.[53] Indeed, the key to the success of this approach
is to find effective metrics for the “diversity” between
structures in the context of a particular application.[54]
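One common way to pick a diverse subset is greedy farthest-point (max-min) sampling in a descriptor space. The sketch below is an illustration of that generic technique, not the method of the cited studies; it assumes that each structure is already represented by a numeric descriptor vector and that Euclidean distance is a meaningful diversity metric, which, as noted above, depends on the application.

```python
import math


def farthest_point_subset(descriptors, k, start=0):
    """Greedy max-min selection: return indices of k mutually distant structures."""
    if not 0 < k <= len(descriptors):
        raise ValueError("k must be between 1 and the number of structures")
    selected = [start]
    # Distance from every structure to its nearest already-selected structure.
    nearest = [math.dist(d, descriptors[start]) for d in descriptors]
    while len(selected) < k:
        # Pick the structure that is farthest from everything selected so far.
        candidate = max(range(len(descriptors)), key=nearest.__getitem__)
        selected.append(candidate)
        nearest = [
            min(old, math.dist(d, descriptors[candidate]))
            for old, d in zip(nearest, descriptors)
        ]
    return selected


# Hypothetical two-dimensional descriptors (e.g., two pore-geometry features).
descriptors = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_point_subset(descriptors, k=3))
```

The near-duplicate of the first structure (index 1) is selected last, which is the desired behavior when accurate calculations are expensive.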
Computation of Materials’ Properties
Gas Adsorption Properties
Once one has a set of computation-ready
structures, one can start computing the properties that are relevant
for the application(s) of interest—here, gas adsorption in
nanoporous materials. If interested in comparing properties and performances
among many materials (and updating this ranking over time), one needs
to pay particular attention to the consistency of results. Consistency
means, for example, applying the same protocol to curate the crystals’
atomic structure, to estimate partial charges, to exclude inaccessible
pores, etc. in order to enable the comparison of the final results.
This includes both choices of the model, such as the DFT functional
or the force field (UFF, DREIDING, TraPPE, etc.), as well as secondary
parameters, such as the choice of the DFT basis set or whether to
use tail-corrections in grand canonical Monte Carlo (GCMC) adsorption
calculations.[55]

A first step may
involve a relaxation of the atomic positions using force fields, semiempirical
methods, or DFT in order to ensure that the atomic structure is consistent
with the computational method employed. This step can also help identify
mistakes in input structures and take the effect of solvent removal
on the framework into account.

Gas–framework interactions
are often evaluated using GCMC
insertion techniques with classical force fields.[17] In this approach, the Coulomb interaction is modeled by
partial charges, which are tabulated for popular gas molecules but
need to be computed for the framework. If DFT was used for the geometry
relaxation, partial charges can be computed at negligible extra cost
from the electrostatic potential, usually preferring protocols that
aim at reproducing the electrostatic potential (e.g., REPEAT, DDEC)
over the others (e.g., Mulliken, Hirshfeld, Bader).[56,57] Alternatively, cheaper charge equilibration methods can be used,
but with extreme care (see ref (37)). The partial charges on the framework atoms need to be
combined with a force field to describe the interaction between the
gas molecule and the framework. Many studies opt for off-the-shelf
parameters for the dispersion interaction, such as DREIDING[58] or UFF[59] for the
framework and TraPPE[60] for the adsorbate.
Steps outside the original design space of existing force fields or
modifications of their parameters need to be carefully validated in
order to ensure that the behavior of the gas molecules in the pores
is reproduced with sufficient accuracy (e.g., adsorption isotherms,
heats of adsorption, etc.).

As one moves to larger numbers of
structures and more complex workflows,
it becomes increasingly challenging to manage the calculations and
to provide all information required to reproduce a particular result.
This is where workflow managers can come to the rescue, and numerous
open-source infrastructures are available for orchestrating computational chemistry codes with advanced logic,[61] such
as AiiDA,[48,49] FireWorks,[62] AFLOW,[63] or signac.[64]
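The consistency requirement stated at the beginning of this section can be made machine-checkable by attaching a fingerprint of the protocol to every result, so that only results with identical settings are compared. The following is a minimal sketch with hypothetical names (`Protocol`, `group_by_protocol`) and illustrative settings; it is not part of any of the workflow managers listed above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Protocol:
    """Model choices and secondary parameters that must match for a fair comparison."""
    framework_force_field: str
    adsorbate_force_field: str
    charge_method: str
    tail_corrections: bool
    cutoff_angstrom: float

    def fingerprint(self):
        """Short, deterministic hash of all settings."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def group_by_protocol(results):
    """Group (material, protocol, value) tuples by protocol fingerprint."""
    groups = {}
    for material, protocol, value in results:
        groups.setdefault(protocol.fingerprint(), []).append((material, value))
    return groups


# Illustrative settings and values, not recommendations or real data.
base = Protocol("UFF", "TraPPE", "DDEC", tail_corrections=True, cutoff_angstrom=12.0)
no_tail = replace(base, tail_corrections=False)
results = [("COF-A", base, 1.2), ("COF-B", base, 0.9), ("COF-A", no_tail, 1.1)]
for fingerprint, entries in group_by_protocol(results).items():
    print(fingerprint, entries)
```

Rankings would then be built only within one group, and a change to any secondary parameter automatically places new results in a separate group.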
Open Challenges
Once we obtain a reliable force field
for molecule–framework interactions, we are still left with
a number of open challenges. One challenge is the modeling of defects:
as high-throughput computations typically assume perfect crystals,
they will not capture properties that are dominated by crystal defects
present in the experimental material. Another challenge is the modeling
of framework mechanics upon adsorption: most screening studies assume
a rigid framework. For many structures, this approximation is reasonable,
but some materials are known to display structural changes upon gas
adsorption, which can affect performance in relevant applications.[65] Assuming the structure to be rigid may lead
to incorrect identification of pore accessibility for gas molecules.
Algorithms based on geometrical assessment of channel diameters can
easily recognize nonaccessible pores and exclude them from the adsorption
calculation.[66] However, it is less trivial
to routinely identify those cases where a small rotation of the ligands
can allow the gas molecule to permeate (such as in the well-studied
case of ZIF-8[67,68]).

There are other material
properties relevant to the process modeling of gas-related applications
that can be evaluated from the unit cell, such as gas diffusion,[69] heat capacity,[70,71] mechanical
stability,[72,73] and chemical stability. The studies
cited above propose computational protocols for investigating these
properties, which might be extended for high-throughput screenings.
Combining all these properties in the same screening platform would
allow the filtering out of structures that are unstable or show poor
thermal or molecular diffusion and provide more information to the
process model.

Moving beyond the field of gas adsorption brings
yet more properties
into focus, such as more accurate electronic properties for applications
in sensing, semiconductors, and photocatalysis,[19,74] which put increased emphasis on the choice of the electronic structure
method and are beyond the scope of this Outlook.
Ranking Materials
Accurately predicting material properties is the aim of molecular
simulations, but it represents only half of the story: our ultimate
goal is to rank materials for a given application, based on key performance
indicators (KPIs). In the following, we discuss the KPIs for two important
applications of nanoporous materials: hydrogen storage[33,75−79] and CO2 separation from nitrogen.[11,40,80−83]

For H2 storage, the main KPI is the deliverable
(or
“working”) capacity, i.e., the difference in gas uptake
between the loading conditions at higher pressure
and/or lower temperature, and the discharging conditions at lower pressure and/or higher temperature. Therefore, the evaluation
of hydrogen storage performance requires calculations at only these
two conditions of temperature and pressure, and molecular simulations
have been shown to be feasible for screening more than half a million
structures.[78,79] Similar considerations are also
valid for the evaluation of natural gas deliverable capacity.[34,47,84,85] Other important KPIs may focus on the diffusion of gas and heat
inside the framework, in order to enable fast loading/discharge and
heat dissipation.

For CO2 capture, finding KPIs for
the ranking is more
complex. In 2012, our group developed a simplified thermodynamic model
for carbon capture and sequestration (CCS) involving a temperature–pressure-swing
adsorption process, considering inlet gases from a coal-fired power
plant (14:86 ratio of CO2:N2), a natural-gas-fired
power plant (4% CO2), or air (400 ppm of CO2). This model was used to evaluate different classes of nanoporous
materials,[82,86] and we recently expanded the
study to COFs.[11] Two KPIs were identified:
The “parasitic energy” is defined as the energy needed
to separate 1 kg of CO2 and compress the purified gas to
150 bar for underground storage. The parasitic energy can be taken
as a measure of the operating cost of the separation (OPEX). The working
capacity, on the other hand, determines the amount of adsorbent material
needed and thus the capital cost (CAPEX). The study showed that the
minimal parasitic energy can be related to an optimal value for the
Henry coefficient of CO2 around 10⁻³ mol/(kg
Pa) (at 300 K) for power plants; a stronger affinity between CO2 and the framework would result in higher energy needed for
the regeneration of the adsorbent. For direct-air capture, the optimal
value lies above 10⁻¹ mol/(kg Pa), and chemisorption
appears to be a more promising solution. Recently, simulations have
been coupled with more advanced models of the pressure-swing process,[87−89] indicating subtle relations between the properties of the material
and their performance in the process. In particular, the often overlooked
nitrogen isotherm was identified as a key indicator.

The case
of CO2 separation highlights the importance
of connecting materials’ properties to process modeling: on
one hand, the process modelers need to be aware of the uncertainties
in material property predictions and how they propagate through their
model. If small perturbations in the inputs alter the outcome significantly,
this “butterfly effect” will compromise the reliability
of the final ranking. On the other hand, the molecular simulation
community should focus its efforts on improving the predictive accuracy
for those properties that are shown to have the largest influence
on the process models. In this context, modular workflows and automated
provenance tracking can simplify investigations of individual workflow
components and help trace their impact on the final rankings.
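To make the ranking step and its sensitivity to input uncertainties concrete, the following minimal sketch ranks materials by deliverable capacity (uptake at loading minus uptake at discharge) and then estimates how often the top-ranked material stays the same when the uptakes are perturbed. The material names, uptake values, and the uniform relative-error model are assumptions made purely for illustration; real process models are considerably more involved.

```python
import random


def deliverable_capacity(uptake_loading, uptake_discharge):
    """Difference in gas uptake between loading and discharge conditions."""
    return uptake_loading - uptake_discharge


def rank(materials):
    """Return material names sorted from highest to lowest deliverable capacity."""
    return sorted(
        materials,
        key=lambda name: deliverable_capacity(*materials[name]),
        reverse=True,
    )


def top_rank_stability(materials, rel_error, n_trials=1000, seed=0):
    """Fraction of perturbed trials in which the top-ranked material is unchanged."""
    rng = random.Random(seed)
    reference = rank(materials)[0]
    hits = 0
    for _ in range(n_trials):
        perturbed = {
            name: tuple(u * (1 + rng.uniform(-rel_error, rel_error)) for u in uptakes)
            for name, uptakes in materials.items()
        }
        hits += rank(perturbed)[0] == reference
    return hits / n_trials


# Hypothetical uptakes (loading, discharge) in arbitrary but consistent units.
materials = {"A": (10.0, 2.0), "B": (9.0, 1.5), "C": (5.0, 1.0)}
print(rank(materials))
print(top_rank_stability(materials, rel_error=0.10))
```

A stability well below 1 for realistic error bars would indicate that the ranking between the leading candidates is not resolved by the accuracy of the underlying simulations.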
Toward Best Practices
While the vision of a common platform with easily
extendable and
fully interoperable workflows may appear somewhat utopian today, there
are a number of concrete practices that researchers can adopt to move
closer toward this goal.
Reproducibility
In order for a reader
of a scientific
publication to be able to reproduce its results, the study should
report, alongside the analysis and discussion of its scientific
results, all the information needed to reproduce them.
However, for screening studies that involve large numbers of materials
and/or multistep workflows, achieving this “radical transparency”
is easier said than done: manually collecting all necessary input
files, postprocessing scripts, software versions, etc. can be time-consuming,
and completeness is difficult to ensure. Here, workflow managers can
help by tracking the provenance automatically and providing ways to
export and share this information with peers.

For example, our
recent work on parsing COFs from the literature and assessing their
performance for carbon capture tries to follow this approach, publishing
both the full provenance graph of the study and the source code of
the workflow used to orchestrate the calculations.[11,50,90] The provenance graph gives any interested
researcher the ability to click on a data point and to inspect every
step of the workflow that was used to compute it, try to reproduce
an individual calculation themselves, or report mistakes they encounter.
Sharing the source code of the corresponding workflows on collaborative
platforms like GitHub further enables direct suggestions of bug fixes
or improvements to the protocols, both in code and in narrative form.
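The idea of a provenance graph can be illustrated with a toy data structure in which every data and process node remembers the nodes it was derived from, so that any result can be traced back to its inputs. This is a conceptual sketch only: the `Node` class and `lineage` function are hypothetical and do not reproduce the data model or API of AiiDA or any other workflow manager.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data or process node that keeps links to the nodes it was derived from."""
    label: str
    kind: str  # "data" or "process"
    parents: list = field(default_factory=list)


def lineage(node):
    """Return the labels of all ancestors of a node, nearest ancestors first."""
    seen, order, stack = set(), [], list(node.parents)
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        order.append(current.label)
        stack.extend(current.parents)
    return order


# A linear toy workflow: CIF -> optimization -> optimized structure -> GCMC -> isotherm.
cif = Node("structure.cif", "data")
optimization = Node("geometry optimization", "process", [cif])
optimized = Node("optimized structure", "data", [optimization])
gcmc = Node("GCMC calculation", "process", [optimized])
isotherm = Node("CO2 isotherm", "data", [gcmc])

print(lineage(isotherm))
```

In a real workflow manager, the nodes would also store the full inputs, code versions, and outputs, which is what makes individual calculations inspectable and reproducible.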
Automation
Moving from the study
of a few materials
to hundreds or thousands of them puts an emphasis on automation. One
needs an effective way not only to manage the sequence of calculations
but also to handle common errors and perform preliminary data analysis.
We illustrate this using simple, practical examples from gas adsorption
in nanoporous materials: Before submitting a crystal structure to
GCMC simulations one has to detect and block the inaccessible pores
and expand the simulation cell to include twice the cutoff used for
the potential. These operations may need some external packages or ad hoc scripts and can be fully automated using a workflow
manager. Another notable case is the handling of DFT calculations
in which the self-consistent field cycle fails to converge. Depending
on the system under study, remedies can include automatic resubmission
with slower, more conservative minimization schemes (e.g., switching
from orbital transformation to diagonalization methods) or turning
on electronic smearing.[11]

When considering
one’s own use cases, and realizing the many intermediate steps
that would need to be automated to go from an input structure to the
final result, one inevitably arrives at the question whether the effort
of full automation is worth the time investment. While this determination
needs to be made case by case, it is easy to forget that each manual
operation makes results harder to reproduce and entails substantial
time investments when others (or even our future self) go on to extend
and build upon the work. At the same time, modern workflow managers
provide time-saving convenience features, such as automatic translations
of job parameters to the language of various queuing systems, automatic
file transfers between the local workstation and the cluster, and
automatic record keeping.

The development of a robust workflow
can be challenging, but is
a valuable outcome from a computational study on its own: it ensures
that when new sets of materials are released, they can be included
in the screening with minimal additional effort.
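As an example of a step that is easy to automate, consider the expansion of the simulation cell mentioned above: the cell is replicated until its perpendicular width along each lattice direction is at least twice the potential cutoff. The sketch below assumes that the cell is given as three Cartesian lattice vectors in Å; the function names and the 12 Å cutoff are illustrative choices, not values taken from a specific protocol.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def norm(u):
    return math.sqrt(sum(x * x for x in u))


def perpendicular_widths(cell):
    """Distance between opposite faces of the cell along a, b, and c."""
    a, b, c = cell
    volume = abs(sum(x * y for x, y in zip(a, cross(b, c))))
    return tuple(volume / norm(n) for n in (cross(b, c), cross(c, a), cross(a, b)))


def supercell_multipliers(cell, cutoff):
    """Smallest replication (na, nb, nc) giving widths of at least 2 * cutoff."""
    return tuple(math.ceil(2 * cutoff / w) for w in perpendicular_widths(cell))


# Illustrative orthorhombic cell (Å) and cutoff (Å).
cell = ((30.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 25.0))
print(supercell_multipliers(cell, cutoff=12.0))
```

Using perpendicular widths rather than lattice-vector lengths keeps the criterion valid for strongly tilted (triclinic) cells, where the two can differ substantially.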
Open Source
The idea of open science does not stop
at access to papers and data but extends naturally to the software
used to obtain the data: the use of free and open-source software
(FOSS) lowers the barriers to reusing, reproducing, and building upon
prior work. This is particularly true for materials screening studies,
where readers may want to compare a new material to the set of the
already screened ones. When data, software, and workflows are made
openly accessible, the barrier for such checks reduces to the marginal computational cost of screening just one more material.

Despite
these obvious benefits, the demand for FOSS has been notably missing
from declarations in the open science context.[91] One of the reasons may be that developing and maintaining
high-quality scientific software takes years of teamwork, and commercial
licenses have proven to be a successful model for funding such efforts
in the past.[92] Today, however, FOSS alternatives
exist for most applications in computational materials science, and
we do seem to observe a trend of increasing adoption of these codes
vs their commercial competitors over the course of the past decade:[92,93] CASTEP[94,95] (restricted to academic use) and OpenMolcas[96] being two recent examples of codes that have
decided to switch to a more open licensing model.

We believe
that the question of sustainable software development
for open science needs to be on the table and discussed by all stakeholders.
Setting the Stage for Machine Learning
In recent years,
machine learning (ML) has been rapidly mixing with molecular simulations,[97] and we expect the advent of automated workflows
in the field of nanoporous materials modeling to amplify this trend.[53] Among the first applications of ML methods is
the prediction of the Henry coefficient or a full isotherm in a fraction
of a second, from conventional geometric properties of the crystal
structure (such as pore volume and atoms’ connectivity) and/or
more advanced descriptors.[98−101] This massive speed up enables the screening
of even millions of materials at affordable computational cost, shifting
the role of molecular simulation to providing sufficient training
data for the ML. The main question for a new material then becomes:
is this structure different enough from the others already included
in the training set to justify the use of expensive molecular simulations
over ML predictions?[54]

However, at
present, ML studies trained on data published by other groups are
rare in the field. In many cases, data are recomputed specifically
for the training, even when similar (but not identical and consistent!)
data are available. In this context, moving as a community from delivering
just a final set of results to including also the infrastructure needed
to obtain them could allow ML experts to easily extend the training
set with new consistent data. Reusing the same data sets in multiple
ML studies would also enable effective assessments of the models themselves,
which is less trivial when the training data differ. Finally, the
training routines should be automated and made reproducible as well.[53]
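The question raised above, whether a new structure is different enough from the training set to justify an explicit simulation, can be phrased as a simple distance test in descriptor space. The sketch below is one hypothetical way of doing this; the descriptor vectors, the Euclidean metric, and the threshold are all assumptions that would need to be validated for a given application.

```python
import math


def distance_to_training_set(candidate, training_descriptors):
    """Distance from a candidate structure to its nearest training-set structure."""
    return min(math.dist(candidate, t) for t in training_descriptors)


def needs_simulation(candidate, training_descriptors, threshold):
    """True if the candidate lies farther than `threshold` from all training data."""
    return distance_to_training_set(candidate, training_descriptors) > threshold


# Hypothetical descriptors of structures already in the training set.
training = [(0.0, 0.0), (1.0, 1.0)]
print(needs_simulation((0.1, 0.0), training, threshold=0.5))  # close to known data
print(needs_simulation((5.0, 5.0), training, threshold=0.5))  # far from known data
```

Structures flagged in this way could be simulated explicitly and then added to a shared training set, provided they are computed with the same protocol as the existing data.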
Toward a Prototype of a Materials Matching Platform
As a first step toward realizing the ideas put forward
in this
Outlook, we have extended our work of curating COFs and screening
these materials for carbon capture[11] to
six new applications and 250 new COFs.[50] The new structures include mostly COFs published after the original
work (as tracked by the @COF_Papers Twitter bot), and the applications
are based on previous screening studies focused on gas storage (methane,[85] hydrogen,[78,79,102] and oxygen[16]) and gas separations (Xe/Kr,[14] H2S removal in wet gases).

Provided that the input structure
is chemically sound (e.g., no
missing hydrogens or overlapping atoms) and is charge-neutral (no
counterbalancing ions), the workflow curated 99% of the structures
without human intervention. For all curated structures, we automatically
computed the adsorption isotherms and/or Henry coefficients of CO2, N2, H2, CH4, O2, H2S, H2O, Xe, and Kr. From these isotherms,
the KPIs were computed automatically and used to rank the materials
as shown in Figure 3 for the extension of our previous study on CO2 capture[11] and in Figure 4 for the other new applications included. The full
workflow typically takes 2–5 days from start to finish, using
≈1000 core hours. In other words, it costs about the price
of three cups of coffee: two cups for the curation of the structure
and one for all KPIs.[103] The full provenance
graph of each workflow, shown in Figure 5, is tracked automatically by the AiiDA workflow
manager.
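To make the step from isotherms to a ranking concrete, the sketch below computes two elementary KPIs and sorts materials by one of them. It is an illustration under stated assumptions rather than the workflow's actual code: the isotherm points, the material names, and the adsorption and desorption pressures are invented, the working capacity is taken as the uptake difference between the two pressures, and the selectivity is taken as the ratio of two Henry coefficients.

```python
def interpolate_uptake(isotherm, pressure):
    """Linearly interpolate the uptake at `pressure`.

    `isotherm` is a list of (pressure, uptake) points sorted by increasing pressure.
    """
    for (p0, q0), (p1, q1) in zip(isotherm, isotherm[1:]):
        if p0 <= pressure <= p1:
            return q0 + (q1 - q0) * (pressure - p0) / (p1 - p0)
    raise ValueError("pressure outside the tabulated isotherm range")


def working_capacity(isotherm, p_ads, p_des):
    """Uptake at the adsorption pressure minus uptake at the desorption pressure."""
    return interpolate_uptake(isotherm, p_ads) - interpolate_uptake(isotherm, p_des)


def henry_selectivity(k_a, k_b):
    """Selectivity of gas A over gas B as the ratio of their Henry coefficients."""
    return k_a / k_b


# Invented isotherms: (pressure in bar, uptake in mol/kg). Names are hypothetical.
isotherms = {
    "COF-A": [(1.0, 2.0), (5.0, 6.0), (65.0, 12.0)],
    "COF-B": [(1.0, 1.0), (5.0, 3.0), (65.0, 15.0)],
}
P_ADS, P_DES = 65.0, 5.0  # assumed storage and release pressures

capacities = {name: working_capacity(iso, P_ADS, P_DES) for name, iso in isotherms.items()}

# Rank from the highest to the lowest working capacity.
ranking = sorted(capacities, key=capacities.get, reverse=True)
print(ranking, capacities)
```

A ranking that combines several KPIs, such as parasitic energy and working capacity for CO2 capture, would need an explicit rule for weighing them, which this sketch does not attempt.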
Figure 3
Performance of COF structures for CO2 capture: parasitic
energy required for the process versus gravimetric working capacity.
Markers of the 250 new COFs are color-coded based on their ranking
from high performance (low parasitic energy and high working capacity,
green) to low performance (red). Markers of materials already included
in ref (11) are shown
in light gray.
Figure 4
Performance of CURATED-COFs for H2 storage at (a) cryogenic
and (b) near-ambient conditions, (c) methane storage, (d) oxygen storage,
(e) Xe/Kr separation, and (f) (H2S)/water separation. The
ranking is color-coded from high performance (green) to low performance
(red). Selectivities are computed as the ratio of the Henry coefficients
of the two gases at 300 K. The coordinates of the markers for T-COF-2
and JUC-509 are highlighted by dashed and solid lines, respectively.
Figure 5
AiiDA provenance graph of the workflow tracing the entire path
from the initial CIF file to the properties and performance computed
for it. The graph shows processes and data as nodes, together with their connections;[49] in an interactive visualization, each node can
be browsed to explore the input parameters of the calculation, its
output results, and the details of the processes.[104] Colors distinguish different modules of the workflow, whose
source code is available online.[90] The
modules make use of other popular open-source tools, such as CP2K,[105] Raspa,[106] Zeo++,[66] and chargemol.[107]
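The idea of tracing a computed property back to the initial CIF file can be sketched with a plain directed graph. The example below does not use the AiiDA API; the node names and the chain of steps are hypothetical, and a dictionary stands in for the provenance database.

```python
# Each node maps to the nodes it was directly derived from (its inputs).
# Data and process nodes alternate, as in a provenance graph. Names are hypothetical.
provenance = {
    "structure.cif": [],
    "geometry_optimization": ["structure.cif"],
    "optimized_structure": ["geometry_optimization"],
    "charge_assignment": ["optimized_structure"],
    "charged_structure": ["charge_assignment"],
    "isotherm_calculation": ["charged_structure"],
    "isotherm": ["isotherm_calculation"],
    "kpi_evaluation": ["isotherm"],
    "kpi": ["kpi_evaluation"],
}


def ancestors(node, graph):
    """Return every node that `node` depends on, nearest first, via depth-first search."""
    seen = []
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph[current])
    return seen


lineage = ancestors("kpi", provenance)
print(" <- ".join(["kpi"] + lineage))
```

Walking the graph in this way is what allows any reported KPI to be traced back to the exact structure and calculations that produced it.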
It is interesting to discuss two examples of recently reported
structures that were included in the update. T-COF-2 (Figure 6a) was synthesized in 2020
and tested for a photocatalysis application.[108] The simulations do not predict this material to be among the top
performers for any of the gas adsorption applications implemented
so far. While this is the most likely outcome, as the screening studies
are extended to more applications, the probability of discovering
unexpected hits should increase.
Figure 6
Crystal structures of (a) T-COF-2 and (b) JUC-509. Elements: H
(white), C (gray), N (blue), O (red), S (yellow), Cl (green).
The other COF, JUC-509 (Figure 6b), seems more promising. This material was synthesized
in 2019 for catalysis.[109] Based on the
atomic structure of the material, our screening predicts it to be
among the top-performing materials for the storage of H2, CH4, and O2 (Figure 4). We acknowledge that there are other factors
that may play a role in deciding whether it is worth testing JUC-509
for these applications, but it is our hope that cases like this one
will lead to interesting, unexpected discoveries going forward.
Many things remain to be done in order to transform this prototype
into a platform of broad impact. We have limited ourselves to COFs,
and it is essential to extend the platform to MOFs and other porous materials.
We have used relatively elementary KPIs to illustrate the concept,
which we hope to replace with more advanced ones, and so far, the
predictions rely on generic force fields, which may not always be the
optimal choice. Finally, we would like to extend the range of applications
beyond the current scope of gas storage and separation. The results
of this screening are updated periodically and can be accessed from materialscloud.org/discover/curated-cofs. Both the data and the source code of the underlying workflows are
made available online.[110]
Over time,
we hope to inspire other research groups to build upon
the existing open infrastructure and develop their own modules for
new applications, resulting in “living” screening studies
that are regularly updated with new materials. Infrastructure projects
like these require long-term commitments, which are notoriously difficult
to make in today’s research funding landscape. Thanks to support
from the MARVEL National Centre of Competence for Research, we feel
ready to accept the challenge and, given the enormous potential impact
for the field, hope to be able to convince other funding agencies
and possibly commercial partners to join.
In summary, the idea
of this Outlook is to illustrate a dormant
potential in the computational materials science community that can
be unlocked by moving toward a more open, collaborative way of doing
science—not necessarily inventing something spectacularly new,
but simply putting together the pieces of a large puzzle. While we
have made the case for the field of nanoporous materials for gas adsorption
applications, the basic concept would seem extensible to further classes
of materials and applications.