A comparative study is presented. The method via a chemical variational autoencoder (VAE) and the method via similarity search are compared, focusing on their ability to generate new functional molecular designs. Using natural porphyra-334 as a model molecule, we generated three groups: molecules of mycosporine-like amino acids (MAAs) as seeds (G_SEEDS), molecules generated via chemical VAE (G_VAE), and molecules gathered via similarity search (G_SIM). The numbers of molecules that satisfy the light absorption condition of porphyra-334 in G_SEEDS, G_VAE, and G_SIM are 52, 138, and 6, respectively. The chemical VAE method shows promising potential for future molecular design. By using quantum chemistry wave function properties with chemical VAE, we find new molecules that are comparable to porphyra-334, including some with unexpected geometries. Finally, we show a group of molecules found with this method.
UV radiation (UVR) has become a subject of environmental and green chemistry because of the thinning of the ozone layer, which hinders the transmission of UVR from the sun to the Earth’s surface. Sunlight is the primary energy source of living organisms; however, UVR damages human skin and may cause skin cancers. Therefore, efficient sunscreens without side effects are needed. Porphyra-334 is a naturally occurring UV-resistant molecule. Mycosporine-like amino acids (MAAs), including porphyra-334, are chemicals that prevent UVR-induced damage, and they have attracted attention because of their strong anti-UV effect.[1−4]

We previously reported a study of the molecular-level mechanism of energy transformation from sunlight to heat in porphyra-334 using first-principles molecular dynamics simulations and quantum chemistry.[5,6] It revealed that UV-excited porphyra-334 releases its kinetic energy via vibrational modes to the surrounding water molecules. The structure of porphyra-334, which contains many hydrophilic functional groups, favors effective hydrogen bond formation with the surrounding water molecules. Thus, the vibrational modes of the water molecules absorb the energy from the excited molecule. That study explained why a natural molecule, porphyra-334, performs so well. An ambitious extension in molecular science is the design of such molecules. Therefore, we explore a design principle in an attempt to advance toward such natural products.

The design and selection of environmentally
friendly and harmless materials and molecules are critical to establishing a sustainable society. They are essential for the development of functional molecules, drugs, and a wide range of materials. Achieving sustainability in practice requires many expensive experiments. Considering the time and cost to society, however, we must provide, in parallel, computational support for the design and selection of these molecules and materials. Historically, the methodology has been based on analogies of geometrical appearance (shape) among molecules and materials, starting from a lead molecule that was found more or less by chance. If such a methodology were sufficient, we would not be suffering from the current environmental problems.

Chemical space consists of all possible compounds. While the number of feasible compounds is extremely large, estimated at 10^60 possible structures, only a small fraction can be processed and analyzed at the same time.[7] Exploring new regions of chemical space is a challenge for cheminformatics and computational molecular design. An alternative approach that does not depend on the appearance or similarity of molecular shapes is necessary. We conducted a comparative study to find search criteria other than shapes and appearances, and we report the results here.

One of the promising design approaches is learning from nature-made
molecules such as porphyra-334. Porphyra-334, a molecule that survived the long process of evolution, can be regarded as the goal for UV-resistant natural products. As routes toward this goal, we compare two approaches: one based on shapes and appearances, and one based on different criteria. In effect, we are comparing different lead-optimization processes. We compared the molecules generated via the chemical variational autoencoder (ChemVAE)[8] with the molecules gathered via similarity search (SimSearch).[9] Chemical VAE is a recently proposed, promising approach based on machine learning. It provides great opportunities to generate new molecules and to explore search methods in chemical space. In contrast, similarity search is a powerful, conventionally applied method. Winter et al. proposed the application of chemical VAE in drug discovery,[10] and Gao et al. reported the use of chemical VAE to generate novel alternative drug candidates for eight existing marketed drugs.[11] We compare lead-optimization processes starting from the natural product porphyra-334.

The
group obtained via SimSearch is based on fingerprints from a chemical database. This cheminformatics method is a conventional search within an existing chemical space. Molecular generation via ChemVAE is based on machine-learned structural recognition; it transforms the input SMILES into a vector representation. There is no need to specify mutation rules manually. As a result, unexpected jumps (toward desired properties) in chemical space are possible. In future work, gradient-based optimization will be combined with Bayesian statistics.[8]

Figure 1 presents the scheme of the current study. The design approaches begin from the seeds, which are derivatives of porphyra-334; hereafter, we refer to them as G_SEEDS (in green in Figure 1). The first molecular group was gathered via SimSearch, and the second was generated via ChemVAE; hereafter, we call them G_SIM (in blue) and G_VAE (in orange), respectively.
Figure 1
Scheme of the comparative study.

For each group of molecules, SMILES data, 3D MOL data (i.e., (x, y, z) coordinates), and properties from quantum chemical calculations were obtained. The data for each molecule are represented as vector elements. Then, the following three data mapping methods for G_SEEDS, G_VAE, and G_SIM were compared: (I) a machine learning (ML)-based comparison, (II) a cheminformatics comparison from 3D MOL, and (III) a comparison of quantum chemistry properties from DFT calculations (see the right side of Figure 1, light blue). In this paper, we demonstrate that the new lead-optimization process via ChemVAE produces promising results, especially in connection with quantum chemical calculations. We believe that the current study provides an example of machine learning applications in the search for desired molecules in the vast chemical space.
Materials and Methods
Preparation of Three Molecular Groups
Seed Structures (370) from MAA Molecules (19) (G_SEEDS)
A variety of UV-absorbing molecules, termed mycosporine-like amino acids (MAAs), have been reviewed by several researchers.[1−3,12−14] The MAAs from marine organisms are imine derivatives of mycosporines, as shown in Figure 2a. The MAA motif contains an aminocyclohexene imine ring linked to an amino acid, an amino alcohol, or an amino group; it absorbs UV light from 320 to 362 nm[12] and shows photoprotective and antioxidant functions.
Figure 2
(a) Structures of 19 natural MAA molecules. (b) Examples of protonated MAA motifs of porphyra-334 (see the SI for others).

As an extension of a previous study on porphyra-334,[6] we study here the same family of molecules with stable structures. Taking into account that these are ubiquitous photosensitive components of marine algae in a liquid water environment, we systematically and exhaustively obtained all possible structural isomers and tautomers that exist in the aqueous phase. Thus, from the 19 molecules shown in Figure 2a as porphyra-334 derivatives, 370 seed structures (G_SEEDS) were generated, taking into account the equilibria in water. From the G_SEEDS thus prepared, two groups of molecules, G_VAE and G_SIM, were obtained via the ChemVAE method and the SimSearch method, respectively.

Given the excellent properties of porphyra-334 in UV energy absorption
and its dispersion mechanism,[1,6,14−18] we must include the protonated MAA motifs. Typical examples of protonated MAA motifs are shown in Figure 2b (see the SI for others). We therefore added structures reflecting protonated and zwitterionic molecules (99 structures, which are included in the total of 370 in G_SEEDS; see the SI).
Molecular Generation via ChemVAE (G_VAE)
Gómez et al. reported a deep neural network model consisting of three coupled functions: an encoder, a decoder, and a predictor; it provides a machine-learning-based de novo molecular design method.[8] The code and full training data sets are available on their GitHub page.[19] The model was trained on hundreds of thousands of existing chemical structures, which allows novel chemical structures to be generated automatically. This system made the current study possible, namely the generation of the molecular group G_VAE via ChemVAE.

The autoencoder architecture is illustrated in Figure 3. The notation follows that of Gómez et al.[8] The trained autoencoder system has three latent representations: an embedding vector (X_1), a latent vector (z_1), and an embedding vector (X_r). During training, canonical SMILES strings were used as input to avoid confusion among chemically equivalent string representations. The encoder and the decoder shown in Figure 3 are recurrent neural networks (RNNs).
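The canonical-SMILES input can be illustrated with a minimal sketch, assuming the character-level one-hot encoding of fixed-length, padded strings used in chemical VAEs such as that of Gómez et al.[8] The character set `CHARSET`, the space padding character, and `MAX_LEN = 120` are illustrative assumptions, not the exact vocabulary of the trained model. RDKit's `Chem.MolToSmiles` supplies the canonical form, so chemically equivalent strings map to the same input matrix.

```python
import numpy as np
from rdkit import Chem

# Hypothetical character set; the trained ChemVAE model defines its own vocabulary.
CHARSET = [" ", "C", "N", "O", "S", "c", "n", "o", "(", ")", "=", "#",
           "1", "2", "3", "[", "]", "+", "-", "H", "@"]
MAX_LEN = 120  # assumed fixed length; shorter strings are padded with " "
INDEX = {ch: i for i, ch in enumerate(CHARSET)}


def canonical_smiles(smiles: str) -> str:
    """Return RDKit's canonical SMILES so equivalent strings share one input."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles}")
    return Chem.MolToSmiles(mol)


def one_hot(smiles: str) -> np.ndarray:
    """Encode a SMILES string as a (MAX_LEN, len(CHARSET)) one-hot matrix."""
    if len(smiles) > MAX_LEN:
        raise ValueError("SMILES string longer than MAX_LEN")
    padded = smiles.ljust(MAX_LEN)
    x = np.zeros((MAX_LEN, len(CHARSET)), dtype=np.float32)
    for pos, ch in enumerate(padded):
        x[pos, INDEX[ch]] = 1.0  # raises KeyError for characters outside CHARSET
    return x


x = one_hot(canonical_smiles("OC(=O)CN"))  # glycine, written non-canonically
print(x.shape)  # (120, 21)
```

Canonicalizing before encoding is what makes "OC(=O)CN" and "NCC(=O)O" produce identical matrices; without it, the encoder would see two different inputs for one molecule.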
Figure 3
Scheme of the current ChemVAE.

The encoder RNN, which processes a given SMILES string, and the decoder RNN, which processes a given X_r, are stochastic operations. As a result, the same input (smi) may be decoded into different outputs (smi_r), reflecting the different intermediates (X_1, z_1, or X_r). The decoder RNN (from X_r to SMILES) may also produce chemically invalid strings. We iteratively collected the generated molecules, 2000 per SMILES decoding attempt, for the ChemVAE method. After removing duplicated strings, we obtained 550784 strings, for which we employed RDkit[20] to validate the chemical structures of the output molecules and discard invalid ones. This left 2454 valid SMILES strings. Meaningless structures were then ruled out for the following reasons: having fewer than four heavy atoms, failing to generate a 3D structure for quantum chemistry calculations, terminating abnormally during quantum chemistry calculations, or being an unstable radical species. In total, 1572 molecules were excluded. Finally, 882 molecules (G_VAE) were generated via ChemVAE (in orange, left in Figure 1).
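The structure-based part of this clean-up can be sketched with RDKit as follows. This is a minimal sketch under stated assumptions: the function name `filter_generated` and the example strings are hypothetical, duplicates are collapsed on canonical SMILES (stricter than removing identical strings), and the criteria that depend on 3D embedding or on the quantum chemistry runs are not covered.

```python
from rdkit import Chem, RDLogger
from rdkit.Chem import Descriptors

RDLogger.DisableLog("rdApp.*")  # silence RDKit parse errors for invalid strings


def filter_generated(smiles_list, min_heavy_atoms=4):
    """Keep unique, parseable, closed-shell molecules with enough heavy atoms.

    Returns canonical SMILES in first-seen order (hypothetical helper).
    """
    kept, seen = [], set()
    for smi in dict.fromkeys(smiles_list):  # drop identical strings first
        mol = Chem.MolFromSmiles(smi)
        if mol is None:  # chemically invalid string
            continue
        if mol.GetNumHeavyAtoms() < min_heavy_atoms:
            continue
        if Descriptors.NumRadicalElectrons(mol) > 0:  # radical species
            continue
        canonical = Chem.MolToSmiles(mol)
        if canonical not in seen:  # equivalent strings collapse to one molecule
            seen.add(canonical)
            kept.append(canonical)
    return kept


# Hypothetical decoder output: duplicates, an unclosed ring, a too-small
# molecule, and a radical are all removed.
decoded = ["CCCO", "OCCC", "C1CC", "CC", "[CH2]CCCC", "NCC(=O)O"]
print(filter_generated(decoded))
```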
Similarity Search by Fingerprint (G_SIM)
The SimSearch procedure in chemical databases is a well-known and widely used process.[9,21,22] We downloaded the “Annotated” subset of 1 458 577 582 molecules from ZINC15 (as shown in Figure S23).[23,24] It includes compounds that are in catalogs (but not for sale). We did not apply any other specific standardization to the molecular database. We gathered SMILES strings according to Tanimoto similarity using MACCS, ECFP, and FCFP fingerprints (see the SI for details). ZINC15 is a research tool for investigators searching chemical and biological targets. Note that fingerprints can be used not only for applications such as the current SimSearch but also for molecular characterization, molecular diversity analysis, and chemical database clustering. The MACCS keys are 166-bit structural key descriptors (a vector with 166 elements) in which each bit is associated with a SMARTS pattern.[25,26] Extended-connectivity fingerprints (ECFPs) are circular topological fingerprints designed for a wide range of molecular studies and structure–activity modeling.[27,28] The ECFP encodes the substructure patterns of a molecule into a bit string of length 1024 (the length can be varied). ECFPs capture precise atom-environment substructural features, whereas FCFPs, a variant of ECFPs, are intended to capture more abstract, role-based substructural features.

These fingerprints are implemented in the open-source cheminformatics software package RDkit. We gathered 1125 compounds from the database based on G_SEEDS (in green in Figure 1). We removed some chemicals because 3D structures for quantum chemistry calculations could not be prepared. In the end, we obtained 1094 chemicals (G_SIM) to be considered in the chemical space exploration (in blue, left in Figure 1).
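The fingerprint-based search can be sketched with RDKit as below. This is an illustrative sketch, not the study's protocol: the Morgan radius of 2 (ECFP4-like), the `threshold=0.5` cutoff, the max-over-queries ranking, and the helper names `fingerprints` and `similarity_search` are assumptions, since the text does not state the radius, cutoff, or ranking rule. The FCFP variant is obtained by swapping in RDKit's feature-based atom invariants.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys, rdFingerprintGenerator

# Morgan radius 2 with 1024 bits: an ECFP4-like fingerprint (assumed radius).
ECFP_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=1024)
# Feature (role-based) atom invariants turn the same algorithm into FCFP-like bits.
FCFP_GEN = rdFingerprintGenerator.GetMorganGenerator(
    radius=2, fpSize=1024,
    atomInvariantsGenerator=rdFingerprintGenerator.GetMorganFeatureAtomInvGen())


def fingerprints(smiles):
    """Return MACCS, ECFP-like, and FCFP-like bit vectors for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        "MACCS": MACCSkeys.GenMACCSKeys(mol),  # 167 bits; bit 0 unused, 166 keys
        "ECFP": ECFP_GEN.GetFingerprint(mol),
        "FCFP": FCFP_GEN.GetFingerprint(mol),
    }


def similarity_search(queries, database, kind="ECFP", threshold=0.5):
    """Rank database SMILES by their best Tanimoto similarity to any query."""
    query_fps = [fingerprints(q)[kind] for q in queries]
    hits = []
    for smi in database:
        fp = fingerprints(smi)[kind]
        best = max(DataStructs.BulkTanimotoSimilarity(fp, query_fps))
        if best >= threshold:
            hits.append((smi, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For a database of ZINC15's size, one would precompute and store the database fingerprints rather than recomputing them per query as this sketch does.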
Quantum Chemistry Properties
To prepare geometric data for quantum chemistry calculations, the MMFF94 force field implemented in RDkit was applied to construct 3D structures for G_SEEDS, G_VAE, and G_SIM. We then performed calculations for the ground and excited states using density functional theory (DFT) with the B3LYP hybrid functional and the 6-31G(d) basis set. The solvent effect of water was taken into account by the integral equation formalism of the polarizable continuum model (IEFPCM). We used the Gaussian 16 program package.[29] We first carried out geometry optimizations of the ground states, starting from the structures generated by RDkit. We then performed single-point calculations of the excited states using time-dependent density functional theory (TD-DFT).

As shown in Table 1, we extracted 23 properties from the calculated results, such as total energies, the HOMO (highest occupied molecular orbital)–LUMO (lowest unoccupied molecular orbital) gap energies, three orbital energies around each of the HOMO and the LUMO, virial coefficients, dipole moments, quadrupole moments, the degrees of freedom of the structures, the trace of the quadrupole moment, and the coordinate invariants of the quadrupole moment. Ground-state properties were selected for their versatility. In total, each vector has 84 elements. We then carried out PCA, as described below.
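The preparation steps can be sketched as follows: an RDKit MMFF94 geometry is written into a two-step Gaussian 16 input (ground-state optimization, then a TD-DFT single point on the optimized geometry). This is a sketch under assumptions: the helper names, checkpoint file names, random seed, closed-shell multiplicity of 1, and 20 excited states (matching the 20-element blocks of Table 1) are ours, and the route lines are one plausible encoding of the stated B3LYP/6-31G(d)/IEFPCM(water) settings, not the study's actual input files.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def mmff94_geometry(smiles, seed=42):
    """Embed a molecule in 3D and relax it with MMFF94; None on failure."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    if AllChem.EmbedMolecule(mol, randomSeed=seed) != 0:
        return None  # 3D embedding failed; such molecules were excluded
    if not AllChem.MMFFHasAllMoleculeParams(mol):
        return None  # MMFF94 lacks parameters for some atom type
    AllChem.MMFFOptimizeMolecule(mol, mmffVariant="MMFF94")
    return mol


def gaussian_input(mol, name, nstates=20):
    """Hypothetical two-link Gaussian 16 input: DFT optimization, then TD-DFT."""
    charge = Chem.GetFormalCharge(mol)  # e.g. +1 for protonated forms
    conf = mol.GetConformer()
    coords = []
    for atom in mol.GetAtoms():
        p = conf.GetAtomPosition(atom.GetIdx())
        coords.append(f"{atom.GetSymbol():<2} {p.x:12.6f} {p.y:12.6f} {p.z:12.6f}")
    solvent = "SCRF=(IEFPCM,Solvent=Water)"
    link1 = (f"%chk={name}.chk\n#p B3LYP/6-31G(d) Opt {solvent}\n\n"
             f"{name} ground-state optimization\n\n{charge} 1\n"
             + "\n".join(coords) + "\n\n")
    # Second link reads the optimized geometry and orbitals from the checkpoint.
    link2 = (f"--Link1--\n%chk={name}.chk\n"
             f"#p TD(NStates={nstates}) B3LYP/6-31G(d) {solvent} Geom=Check Guess=Read\n\n"
             f"{name} TD-DFT single point\n\n{charge} 1\n\n")
    return link1 + link2


mol = mmff94_geometry("NCC(=O)O")  # glycine as a placeholder molecule
print(gaussian_input(mol, "gly"))
```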
Table 1
Quantum Chemistry Properties Obtained from DFT Calculations and Some Physical Chemical Properties

| detail | number of elements |
|---|---|
| estimated molecular volume[30] | 1 |
| difference of the orbital energies (eigenvalues) of the HOMO and LUMO | 1 |
| quadrupole moment | 3 |
| total dipole moment | 1 |
| total energy and the virial coefficient | 1 |
| electronic spatial extent | 1 |
| absorption wavelength (nm) of the nth excited state | 20 |
| absorption energy (eV) of the nth excited state | 20 |
| oscillator strength of the nth excited state | 20 |
| number of electrons | 1 |
| orbital energies (eigenvalues) of the first through third highest occupied molecular orbitals | 3 |
| orbital energies (eigenvalues) of the first through third lowest unoccupied molecular orbitals | 3 |
| rotational constants | 3 |
| degrees of freedom | 1 |
| number of H, C, N, O, and S atoms | 5 |
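The element counts in Table 1 add up to the 84-element vector described above. The sketch below assembles such a vector in a fixed order; the dictionary keys and the `property_vector` helper are hypothetical names, and the order simply follows the rows of Table 1.

```python
# Sizes of each property block, in the row order of Table 1 (keys are hypothetical).
FEATURE_SIZES = {
    "molecular_volume": 1,
    "homo_lumo_gap": 1,
    "quadrupole_moment": 3,
    "total_dipole_moment": 1,
    "total_energy_virial": 1,
    "electronic_spatial_extent": 1,
    "excitation_wavelength_nm": 20,
    "excitation_energy_ev": 20,
    "oscillator_strength": 20,
    "n_electrons": 1,
    "occupied_orbital_energies": 3,
    "virtual_orbital_energies": 3,
    "rotational_constants": 3,
    "degrees_of_freedom": 1,
    "atom_counts_HCNOS": 5,
}


def property_vector(props):
    """Concatenate one molecule's properties into a flat list in Table 1 order."""
    vector = []
    for name, size in FEATURE_SIZES.items():
        values = props[name]
        values = list(values) if isinstance(values, (list, tuple)) else [values]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(float(v) for v in values)
    return vector


print(sum(FEATURE_SIZES.values()))  # 84
```

A fixed ordering matters because PCA compares molecules element by element; a molecule whose properties were concatenated in a different order would be mapped incorrectly.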
Mapping
Vector representations in chemical space[7,31] were used to compare the groups and to explore their internal relations. This requires mapping high-dimensional, complex information onto a low-dimensional space. One typical mapping method is principal component analysis (PCA),[32] which is used for exploratory data analysis and for building predictive models. It is commonly used for dimensionality reduction by projecting each data point onto only the first few principal components. We show the first two principal components; the cumulative contribution rates are shown in the SI.
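A two-component PCA mapping with cumulative contribution rates can be sketched in NumPy via the singular value decomposition. Column standardization is an assumption on our part (the text does not say whether the properties were scaled), motivated by the mixed units of the Table 1 properties (nm, eV, atom counts); `pca_2d` is a hypothetical helper name.

```python
import numpy as np


def pca_2d(X, standardize=True):
    """Project the rows of X onto the first two principal components via SVD.

    Returns (scores of shape (n_samples, 2), cumulative contribution rates).
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # PCA requires mean-centered columns
    if standardize:
        std = X.std(axis=0)
        X = X / np.where(std > 0, std, 1.0)  # constant columns stay at zero
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # contribution rate of each component
    scores = X @ Vt[:2].T
    return scores, np.cumsum(explained)
```

For a comparison like the one here, the three groups would be stacked into one matrix before calling `pca_2d`, so that all molecules share the same principal axes, and the resulting scores would then be colored by group.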
Results and Discussion
Representation of the Three Groups: G_SEEDS, G_VAE, and G_SIM
We present here the results obtained via ChemVAE generation and SimSearch mining. The three groups (G_SEEDS, G_VAE, and G_SIM) were compared by mapping them from three different viewpoints: (I) ML-based, (II) cheminformatics, and (III) quantum chemistry (right in Figure 1). Note that we used the ChemVAE method again in the mapping process. That is, in (I), the ML-based process (Figure 3), we used the SMILES strings of the groups as input to the ChemVAE procedure a second time and obtained the output vectors X_1, z_1, and X_r, with which we carried out the PCA mapping. The results for X_1 and z_1 from the ChemVAE vectorization are shown below; for the X_r results, see the SI.
Mapping (I): ML-Based Comparison
Two chemical space representations were mapped by PCA via ChemVAE vectorization, as shown in Figure 4a and b (see the SI for the X_r results). First, the mapping results for the vectors X_1 are shown in Figure 4a, where G is distributed slightly closer to G than G. Second, the mapping for the vectors z_1 is shown in Figure 4b. Here, we observe that G is distributed distinctly closer to G than G. PCA is only one of various possible mapping methods; we use it because of its well-known versatility.[33,34] We also show the results from t-SNE in Figures S15–21 in the SI; the main arguments are the same.
Figure 4
Principal component analysis (PCA) mappings for the three groups, namely G_SEEDS (green), G_VAE (orange, via ChemVAE), and G_SIM (blue, via SimSearch), using (a) ML-based vector X_1, (b) ML-based vector z_1, (c) cheminformatics (ECFP), and (d) (III) quantum chemistry.
Mapping (II): Cheminformatics Comparison
Chemical space is usually described by molecular descriptors, spanning a so-called descriptor space. We adopted the ECFP fingerprint for the three groups G_SEEDS, G_VAE, and G_SIM. The PCA mapping results are shown in Figure 4c (see the SI for the results with MACCS and FCFP). The results show that G is closer to G than G. Interestingly, the groups G_VAE and G_SIM are located in different areas of the chemical space. This result shows that the two methods, ChemVAE and SimSearch, provide two distinct groups of molecules, suggesting the high potential of ChemVAE as a method for searching, through criteria other than similarity, toward new areas of chemical space.
Mapping (III): Quantum Chemistry Properties
The chemical space spanned by vectors consisting of quantum chemistry properties is expressed by PCA in Figure 4d. This result shows that the distribution of G is located closer to G than G. In contrast to the other mappings shown above, the distributions of the two groups G and G scarcely overlap in Figure 4d. Therefore, we infer that the molecules in G are well differentiated from those in G when the vector elements consist of quantum chemistry properties.

The results in Figure 4a–d show how critical it is to adopt a relevant vector for each molecule. As shown in Figure 4d, we arrived at a mapping that enables us to distinguish among the three groups of our samples. By adopting a vector whose elements consist of quantum chemical properties, reflecting the wave function of each molecule, we can differentiate the groups well. The results suggest that we can obtain molecules (in orange) that might be comparable to porphyra-334. These differentiated molecules may potentially be new molecules.

We note that other choices of vector are possible. A relevant vector led us to the best mapping in molecular space for finding molecules comparable to porphyra-334. What is a rational procedure for finding such an optimal vector? To the best of our knowledge, there is no established methodology, and this remains an important issue for the future. Recently, several ML-based fingerprints have been published. Among them is the promising fingerprint Mol2vec,[35] which has been applied to drug discovery,[36,37] solvation free energy prediction,[38] the prediction of pKa values of CH acids,[39] and other material designs. Other ML-based fingerprints include one that uses graph-convolution models[40] and another based on the evolution of the embedding step[41] (including an application to SAR/SPR). Establishing a rational link between classical fingerprints and ML-based fingerprints will be a future subject.
Differences among the Three Groups from a Quantum Chemistry Point of View
The purpose of the current study is to find molecules with properties as excellent as those of porphyra-334. Therefore, we examine the molecules obtained in the three groups from a physical chemistry point of view. The MAAs are known to be highly stable even under relatively strong UV irradiation.[42] The absorbed energy is expected to be dissipated very efficiently to the surrounding water environment.[5,7,29,31,42−45] This is the typical mechanism underlying the UV resistance of porphyra-334 and its nondestructive release of energy.

Among the many properties of porphyra-334, we must consider the critical ones: its hydrophilicity (log P), absorption wavelength (λmax), and oscillator strength (f). Although log P is widely used, we focus here on quantum chemistry properties and did not include log P. Including log P did not change the conclusions described below; the details of the results and arguments for log P are given in the SI. Since the excitation wavelength (λmax) in the UV–visible range and the oscillator strength (f) are indispensable for the optical properties of porphyra-334, we employed the TD-DFT method to calculate the excitation energies and oscillator strengths of the three groups G_SEEDS, G_VAE, and G_SIM.

Among the UV regions, namely UVB (280–315 nm), UVA2 (315–340 nm), and UVA1 (340–400 nm), we filtered molecules whose calculated spectral characteristics were in the 300–350 nm range, reflecting the absorption range of porphyra-334. We paid special attention to the zwitterionic isomers, since the protonated MAA motifs are critical isomers for photoprotective and antioxidant functions, as reported in our previous study.[6] We extracted the charge-neutral and zwitterionic forms of G_SIM via SimSearch and of G_VAE via ChemVAE. The histogram of the calculated oscillator strengths is shown in Figure 5. The numbers of molecules that satisfied the spectral thresholds f > 0.1 and 300 < λ < 350 nm for G_SEEDS, G_VAE, and G_SIM are 52, 138, and 6, respectively. These molecules were then filtered and scrutinized as described below.
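The spectral criterion can be sketched as follows, converting each TD-DFT excitation energy to a wavelength with λ(nm) = hc/E ≈ 1239.84/E(eV). We assume that a molecule passes if any one of its computed excited states satisfies both f > 0.1 and 300 < λ < 350 nm; the text does not state whether only the brightest or lowest state was tested. The function name and the example energies are hypothetical.

```python
HC_EV_NM = 1239.84198  # h*c in eV*nm, so wavelength_nm = HC_EV_NM / energy_eV


def passes_spectral_filter(excitations, f_min=0.1, lam_min=300.0, lam_max=350.0):
    """True if any excited state has f > f_min and lam_min < wavelength < lam_max.

    excitations: iterable of (excitation energy in eV, oscillator strength f).
    """
    for energy_ev, f in excitations:
        wavelength_nm = HC_EV_NM / energy_ev
        if f > f_min and lam_min < wavelength_nm < lam_max:
            return True
    return False


# Hypothetical TD-DFT output for one molecule: (eV, f) for three excited states.
states = [(3.20, 0.02), (3.87, 0.65), (4.60, 0.10)]
print(passes_spectral_filter(states))  # True: 3.87 eV is about 320 nm, f = 0.65
```

Counting the molecules in a group that pass is then `sum(passes_spectral_filter(s) for s in group_states)`, where `group_states` (hypothetical) holds one list of excitations per molecule.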
Figure 5
Histogram of calculated oscillator strengths in the 300 < λ < 350 nm range for the three groups, namely G_SEEDS (green), G_VAE (orange, via ChemVAE), and G_SIM (blue, via SimSearch).
Mapping of the Final Selected Molecules
The mappings shown in Figure 4 (ML-based, cheminformatics-based, and quantum chemistry-based) were filtered by the criteria f > 0.1 and 300 < λ < 350 nm, and the results are shown in Figure 6. We then focused on these selected molecules and examined their features.
Figure 6
Filtered molecules (f > 0.1 and 300 < λ < 350 nm) from those shown in Figure 4 for (a) X_1, (b) z_1, (c) ECFP (fingerprint), and (d) quantum chemistry.

All the plots in Figure 6 satisfy the conditions f > 0.1 and 300 < λ < 350 nm. As shown in Figure 6a and b, the data points (each point corresponds to one molecule expressed by its X_1 or z_1 vector from the ChemVAE vectorization) cannot be clearly divided into clusters. This is natural, because X_1 and z_1 are obtained by machine learning from SMILES strings alone and therefore reflect only structural information.

By contrast, the data in Figure 6c show relatively separated features in two clusters: one is the G_VAE group (orange), and the other consists of the G_SEEDS (green) and G_SIM (blue) groups. The latter two groups (G_SEEDS and G_SIM) mostly overlap. These results suggest that we can explore new chemical space using vectors generated via ChemVAE, even though at this stage the elements consist only of structural information and do not yet include quantum chemistry information.

At the final stage, as shown in Figure 6d, the plots show a promising feature. These data were generated from vectors whose elements consist of quantum chemical properties. The G_VAE (orange) data show a highly diverse distribution, whereas the other two, G_SIM (blue) and G_SEEDS (green), lie within the G_VAE (orange) zone; they stay in one region and do not spread, suggesting that their properties are less diverse. As Figures 4d and 6d show, many molecules belonging to G were in fact rejected by the filtering criteria (f and λ). When we take quantum chemical properties into account, we can explore chemical space more widely via ChemVAE than via SimSearch.

It may be relevant here to cite the arguments given by Gómez et al.[8] and other researchers,[46−49] as well as reported studies in which quantum chemical properties were predicted by machine learning.[50,51] Moreover, some studies using transfer learning have been published.[47,52] A remaining future subject is finding new molecular string representations beyond SMILES. We currently use only SMILES strings, so the performance of machine learning on chemical information is still limited. Notably, some research beyond SMILES has recently appeared, such as approaches based on graph theory[51] and on linear strings.[46]

The current mapping in Figure 6d shows that quantum chemical properties open up a new search area. Methodologies based on molecular machine learning (ChemVAE) are thus promising when quantum chemical properties are added.

The excellence of porphyra-334 may not be limited to its intramolecular properties; it may also lie in its ability to form intermolecular interactions such as subtle hydrogen bond networks. If we can include molecular information from other dimensions, such as wave functions and responses to the environment, rather than structures alone, the potential of machine learning will be further realized. The inclusion of such properties will be a future subject.
De Novo Molecules Generated via ChemVAE
According to the calculated spectral properties and the mappings after filtering, we have demonstrated the promising performance of the ChemVAE method. Representative examples of the filtered and selected final structures from G_VAE are shown in Figure 7.

Figure 7
Selected molecular structures from G_VAE.

To show the promising features of ChemVAE molecular generation combined with quantum chemistry properties, we display eight representative molecules in Figure 7. Among the filtered (selected) molecules shown in Figure 6d, these eight representative molecules are located in the vicinity of the G plots. The other G_VAE molecules are shown in the SI. By contrast, only six molecules from the G_SIM group satisfied the calculated spectral requirements (see the SI).

As shown in Figure 7, the presence of molecules with a five-membered ring is noteworthy. In their paper, Losantos et al.[17,18] reported the protonated MAA motifs and also proposed protonated five-membered-ring motifs. Since natural bioactive MAAs have six-membered-ring motifs, their rational design is significant. Indeed, these proposed five-membered-ring photoactive molecules are not yet registered in the ZINC15 database. Even among the molecules in the G_SIM group obtained via SimSearch, we could not find the molecules they designed. By contrast, we generated molecules with five-membered rings in the G_VAE group via ChemVAE, as shown in Figure 7.
Conclusions
This study reports a comparison between the ChemVAE method and the SimSearch method, focusing on their ability to generate new functional molecular designs. Defining natural porphyra-334 as a model molecule, we generated three groups: molecules of MAAs as seeds, molecules generated via ChemVAE, and molecules gathered via SimSearch (G_SEEDS, G_VAE, and G_SIM, respectively). There were 52, 138, and 6 molecules that satisfied the light absorption condition of porphyra-334, f > 0.1 and 300 < λ < 350 nm, in G_SEEDS, G_VAE, and G_SIM, respectively. The ChemVAE method shows promising potential for future molecular design. When we use quantum chemistry properties with the ChemVAE method, we can obtain molecules comparable to porphyra-334, including unexpected ones (with five-membered rings).
Data and Software Availability
We used the Gaussian 16 program package[29] for the quantum chemistry calculations. We used RDkit[20] for 3D structure construction (MMFF94 force field), the fingerprints (MACCS, ECFP, and FCFP), and the Tanimoto similarity of the fingerprints. We used the OpenBabel toolkit[53] for data I/O. The programs used for multivariate analysis and mapping are proprietary, but the analyses are not restricted to our programs.