
OMG: Open Molecule Generator.

Julio E Peironcely, Miguel Rojas-Chertó, Davide Fichera, Theo Reijmers, Leon Coulier, Jean-Loup Faulon, Thomas Hankemeier.

Abstract

Computer Assisted Structure Elucidation has been used for decades to discover the chemical structure of unknown compounds. In this work we introduce the first open source structure generator, Open Molecule Generator (OMG), which for a given elemental composition produces all non-isomorphic chemical structures that match that elemental composition. Furthermore, this structure generator can accept as additional input one or multiple non-overlapping prescribed substructures to drastically reduce the number of possible chemical structures. Being open source allows for customization and future extension of its functionality. OMG relies on a modified version of the Canonical Augmentation Path, which grows intermediate chemical structures by adding bonds and checks that at each step only unique molecules are produced. In order to benchmark the tool, we generated chemical structures for the elemental formulas and substructures of different metabolites and compared the results with a commercially available structure generator. The results obtained, i.e. the number of molecules generated, were identical for elemental compositions having only C, O and H. For elemental compositions containing C, O, H, N, P and S, OMG produces all the chemically valid molecules while the other generator produces more, yet chemically impossible, molecules. The chemical completeness of the OMG results comes at the expense of being slower than the commercial generator. In addition to being open source, OMG clearly showed the added value of constraining the solution space by using multiple prescribed substructures as input. We expect this structure generator to be useful in many fields, but to be especially of great importance for metabolomics, where identifying unknown metabolites is still a major bottleneck.


Year:  2012        PMID: 22985496      PMCID: PMC3558358          DOI: 10.1186/1758-2946-4-21

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514


Background

Computer Assisted Structure Elucidation (CASE) of chemical compounds is one of the classical problems positioned at the intersection of informatics, chemistry, and mathematics. CASE tools have been employed for decades to elucidate the chemical structure of small organic molecules. In its most general definition, a structure elucidation system receives experimental chemistry data of an unknown molecule as input, and outputs a list of possible chemical structures. The input can be the elemental composition of the elusive molecule, nuclear magnetic resonance (NMR) and/or mass spectrometry (MS) spectra (provided the generator can simulate spectra and match them to the experimental ones), or information on prescribed substructures. The output is a list of candidate structures matching these conditions, ideally containing all possible structures without duplications. The size of the candidate list depends on the number of constraints derived from experimental data; the more constraints we use, the smaller the candidate list will be. The ultimate goal of a fully automated system that returns only the one correct molecule is not yet within our reach, despite decades of research [1]. The DENDRAL [2] project is widely regarded as the initiator of the use of these methods to provide a system for Computer Assisted Structure Elucidation (CASE). It involved the development of artificial intelligence algorithms that would extract heuristics from MS and NMR data and use them to constrain the output of a structure generator. CONGEN was the structure generator developed within DENDRAL, which preceded a more advanced generator known as GENOA [3].
Many commercial structure generators were developed later, the most renowned being CHEMICS [4], ASSEMBLE [5], SMOG [6], and the most widely used of all of them, the general purpose structure generator MOLGEN [7]. These closed source software tools work like a black box: the user can neither understand the functioning of the software nor customize the tool to their needs. These drawbacks of closed source software (where the source code is not provided) can be circumvented by open source tools. Two open source structure generators have been developed that work with NMR data, the deterministic LSD [8] and the stochastic SENECA [9]. Implementations of open source stochastic and deterministic structure generators have been explored within the Chemistry Development Kit (CDK) [10,11]. Unfortunately, these generators failed to generate all possible chemical structures and were discontinued in recent releases of CDK. Despite these efforts, no general purpose deterministic structure generator has been developed in an open source format so far. The advance of “omics” sciences in the last decade, in particular of metabolomics [12], has renewed the interest of researchers in developing better structure generators. Metabolomics aims at detecting and identifying metabolites in an organism and has resulted in a large list of potential biomarkers for which the chemical structure is unknown [1,13]. When trying to identify the structure of unknown molecules, scientists first perform an identity search by querying reference databases using their experimental information [1,14-16]. In such cases, they use the elemental composition of the metabolite derived from mass spectrometry (MS) or the nuclear magnetic resonance (NMR) spectra. When the metabolite is a real unknown, it is not present in any database, and therefore the query returns no results.
This forces scientists to propose candidate structures using a different approach. One such approach is to use a structure generator [17,18], which produces all possible molecules given an elemental composition and, optionally, other constraints. Examples of constraints are prescribed substructures that each output molecule should contain and that are derived from experimental NMR, MS2, or MSn data. Hence, the need for deterministic and flexible structure generators in the field of metabolomics should be met with new algorithms [1]. The majority of structure generators rely on graph theory to produce their desired output. Compounds can be represented as molecular graphs where atoms and bonds are translated into vertices and edges, respectively, to which theorems and algorithms proposed by graph theory can be applied. This ensures that the output is correct, exhaustive, and free of isomorphs. Such methods include the orderly enumeration proposed by Read [19] and Faradzev [20], a stochastic generator [21], the homomorphism principle [22] used by MOLGEN, and the “canonical augmentation path” proposed by McKay [23]. This last method, originally intended to generate simple graphs by adding vertices, has been applied to the generation of some families of graphs and also to generate the chemical universe of molecules up to 11 atoms [24] and recently up to 13 atoms [25]. Although the goal was to generate molecules, these two approaches initially employed canonical path augmentation to generate all possible simple graphs up to 11 and 13 vertices, respectively. Subsequent topological and ring-system filters were used to remove unwanted graphs.
Lastly, the vertices were colored with chemical elements and the edges with a bond order, which turned the graphs into molecules. Simple chemical constraints like connectivity and atom valence were applied to reduce the list of final molecules. This process, which relies on generating simple graphs, is necessarily limited in the size of the molecules that can be generated, because a linear increase in the number of atoms produces an exponential increase in both the number of graphs and the number of molecules. Here we present the Open Molecule Generator (OMG), a structure generator that is also based on McKay's augmentation algorithm; however, rather than first generating graphs and then transforming these graphs into molecules, our implementation of McKay's technique directly constructs molecules. In this way we can generate chemical structures with many more than 13 atoms. Essential concepts of graph theory will be introduced in the methods section. Chen mentioned two future challenges facing CASE systems [26]. The first challenge for elucidating structures is to have a knowledge system of previously identified compounds, as well as mining tools for such data. In this direction, Rojas-Chertó et al. [27] developed a system to store spectral data and mine the database to extract substructure information that can be used as prescribed substructures in our structure generator. The second challenge is the need for filtering and selecting candidate structures. This is often performed by predicting a property of the candidate structures that is related to the field of research, for instance, predicting the spectra in analytical chemistry, the bioactivity in ligand design, or the Metabolite-Likeness [28] in metabolomics studies, to name a few.
Furthermore, the need for a structure generator tool that can be adapted to the requirements of the field in which it is going to be applied demonstrates the usefulness of open source tools compared to commercial "black box" generators. In this paper we present the first general purpose open source structure generator, Open Molecule Generator. OMG adapts methodologies from the field of graph theory and deterministic graph enumeration to the classical problem of chemical structure generation. In this sense, we have used the approach of “canonical path augmentation” to ensure that we exhaustively generate non-isomorphic chemical structures for a given elemental composition. This generation tool has been implemented using CDK [10,11], a widely used open source library for the development of chemoinformatics software. It allowed the representation of entities such as molecules, atoms, and bonds in our program and the use of functions like removing hydrogen atoms, checking the saturation of a molecule, removing a bond, and many more. The resulting tool generates all possible non-duplicate chemical structures for a given elemental composition, with the option to generate only those that contain one or multiple non-overlapping substructures, which is the most important constraint to reduce the number of resulting candidate structures when a knowledge system is not available [18]. We have used OMG to generate molecules for the elemental composition of well known metabolites, also including one or more prescribed substructures as input. These results are compared to those obtained by MOLGEN.

Materials and methods

Chemical elements and atom types

We first describe some concepts related to atoms that are necessary to understand the theory and algorithm behind OMG and the use of CDK to handle chemistry. In nature, atoms of different chemical elements (carbon, nitrogen, oxygen, and others) are connected to each other by bonds in order to form molecules. The valence of these chemical elements, to which we will also refer as degree, determines how many bonds each element can have. Carbon has a valence of 4, oxygen of 2, nitrogen of 3 or 5, sulfur of 2, 4 or 6, and phosphorus of 3 or 5. Thus a carbon atom becomes saturated when it has 4 bonds, where a single bond counts as one bond, a double as two bonds, and a triple as three bonds. Regarding molecules, we consider a molecule to be saturated when all its atoms are saturated. On some special occasions atoms are charged, which gives them a different valence. In the case of OMG, we only use neutral atoms and as a consequence only neutral molecules are produced; therefore all finished molecules will contain atoms with the valences mentioned before. A chemical element can have multiple atom types, even for the same valence of an element, as defined by the dictionary of atom types in CDK. This dictionary defines for each atom the number of neighbors, pi bonds, charges, lone electron pairs, and hybridizations, in order to accommodate the different states a chemical element can have due to different bonds, number of neighboring atoms, charges and hybridizations. These atom types are based on the chemical elements that have been observed in nature for saturated molecules. This is why we use the CDK atom dictionary to validate the atoms of our finished molecules. OMG will output only molecules that are saturated and that contain the atoms specified in the elemental composition.
Apart from finished molecules, OMG has to represent, during the generation process, intermediate chemical structures that are not finished yet. These might contain disconnected fragments and atoms that are not saturated. CDK atom types are not designed to represent atom types of unsaturated chemical elements; therefore we opted for implementing a simple atom dictionary. For each chemical element, this dictionary defines its valence, in other words, the maximum degree. Hence for intermediate chemical structures we only check that the current degree of each atom does not exceed the maximum degree. MOLGEN can also produce molecules with multiple valences, but it handles them in a different way. While with OMG only the elemental composition needs to be provided to generate molecules with multiple valences, MOLGEN requires knowing a priori which one of the multiple valences has to be used. It uses by default the lowest valence, that is, N valence 3, P valence 3, and S valence 2, unless a different valence is specified. In Table 1 the atom types produced by OMG and MOLGEN for non-default valences are presented. Using sulfur as an example, OMG will output molecules containing sulfur of valence 2, 4 and 6. For the same chemical element, MOLGEN will produce by default molecules with sulfur valence 2. If one sets the valence of sulfur to 6, it will only produce sulfur valence 6 and not valence 2 or valence 4. MOLGEN cannot generate molecules with atoms of different valences for the same chemical element; that is, if a molecule has two sulfur atoms, one will not be of valence 4 and the other of valence 6: both will have the same valence, either 2, 4 or 6.
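The two levels of checking described above (a maximum-degree test for intermediate structures and a stricter test for finished molecules) can be sketched as follows. This is a minimal Python illustration, not OMG's Java/CDK code: the names `VALENCES`, `is_valid_intermediate` and `is_saturated` are hypothetical, and the finished-molecule check only compares degrees with the valences listed in the text, whereas OMG additionally validates atom types against the CDK dictionary.

```python
# Allowed valences per element, as listed in the text (assumption: neutral atoms only).
VALENCES = {"H": (1,), "C": (4,), "O": (2,), "N": (3, 5), "P": (3, 5), "S": (2, 4, 6)}


def max_degree(element):
    """Maximum degree used for intermediate structures: the largest valence."""
    return max(VALENCES[element])


def is_valid_intermediate(atoms):
    """atoms is a list of (element, current_degree) pairs.

    An intermediate structure is acceptable as long as no atom
    exceeds the maximum degree of its chemical element.
    """
    return all(deg <= max_degree(el) for el, deg in atoms)


def is_saturated(atoms):
    """A finished molecule needs every atom at one of its allowed valences.

    Simplification: OMG also checks CDK atom types, which is stricter.
    """
    return all(deg in VALENCES[el] for el, deg in atoms)
```

Under these assumptions a sulfur atom with degree 5 is still a valid intermediate (it can grow to valence 6) but is not saturated, and a molecule may mix sulfur atoms of valence 4 and 6, which is the behavior the text attributes to OMG.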
Table 1

Atom types produced by OMG and MOLGEN for non-default valences of N (5), P (5) and S (4 and 6)

[Table rows: N valence 5, P valence 5, S valence 4, S valence 6. Columns: Valence, MOLGEN, OMG. The atom-type drawings in the MOLGEN and OMG columns are images and are not recoverable as text.]

By default OMG outputs molecules with valences N (3 and 5), P (3 and 5), and S (2, 4 and 6). By default MOLGEN outputs molecules with valences N (3), P (3), and S (2).

The principle followed by CDK to build its atom dictionary is to allow atom types with valences for which there is a consensus on their existence, that is, for which known molecules exist with such valences. Conversely, MOLGEN produces all theoretically possible combinations of bond orders for a given valence, as can be observed in Table 1. For example, for P valence 5 OMG only produces one atom type, with one double bond and three single bonds. In comparison, MOLGEN produces all the combinations of single, double, and triple bonds that add up to 5. As a consequence, when the desired valence is unknown, which is usually the case in metabolite identification, molecules need to be generated with all possible valences. As a result, the number of molecules output by the two generators is different for elemental compositions that contain chemical elements with multiple valences. This deterministic generation of valences in MOLGEN comes at the expense of generating molecules having unrealistic structures.

Graph theory and chemistry

The chemical structure of molecules can be represented as a graph, where atoms and bonds in molecules correspond to vertices and edges, respectively, in graphs. In molecules, bonds connecting two atoms can have a degree depending on the number of electrons they share. Such a degree can also be assigned to the edges of a graph, which is then called a multigraph. The different chemical elements present in the periodic table are represented in graphs as colors assigned to the vertices. We define a non-directed colored multigraph G = (V, E), where V is a set of vertices and E is a multiset of edges, each edge being an unordered pair of vertices, together with a function Col : V → colors. In this multigraph, we say that a, b ∈ V are n-connected if there are exactly n edges (a, b) ∈ E. Apart from the color function, a multigraph is characterized by the function d : V × V → N, which returns the degree of the edge connecting each pair of vertices. From now on we will refer to graphs and multigraphs interchangeably. In chemistry, the valence rule determines the maximum number of bonds each chemical element has. In order to take this into account, we define d : V → N, which returns the number of edges of a given vertex, and a max-degree function md : V → N, which returns the maximum number of edges of a given vertex. We say that a multigraph is under-saturated if ∀ v ∈ V, d(v) ≤ md(v) and there is at least one vertex v′ such that d(v′) < md(v′). A multigraph is saturated if the equality d(v) = md(v) holds for every vertex. In chemistry, molecules correspond to saturated colored multigraphs, and the max-degree depends on the color, which is the chemical element. For instance, for a carbon element, md(C) = 4, and for an oxygen element, md(O) = 2.
We consider a multigraph to be connected if ∀ v, w ∈ V, ∃ S = {v_1, ⋯, v_m} such that v, v_1 and v_m, w are connected and for each i < m, v_i is connected to v_{i+1}. In other words, a multigraph is connected if for every pair of vertices there exists at least one path S connecting both vertices. This condition is necessary for chemistry: since intermediate chemical structures in the generation process can be composed of disconnected fragments, it ensures that the generated molecules are one fully connected structure and not made of disconnected substructures. Notice that hydrogen atoms (the most frequently found chemical elements with degree 1) are not considered in the generation process, since they are terminal elements of the molecule and cannot connect two disconnected elements of the molecule. Hydrogen atoms are only used to validate the completeness of finished molecules. Halogen atoms like fluorine, chlorine, and iodine, also of degree 1, are considered during the generation process.
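As an illustration of these definitions, the following Python sketch represents a colored multigraph as a list of colors plus a dictionary of bond orders, and implements the degree, saturation, under-saturation and connectivity tests. The data layout and function names are assumptions made for this example, and hydrogen atoms are written out implicitly by choosing a hydrogen-free molecule (carbon dioxide, O=C=O).

```python
MAX_DEGREE = {"C": 4, "O": 2}  # md(C) = 4 and md(O) = 2, as in the text


def degree(bonds, v):
    """d(v): sum of the bond orders of all edges touching vertex v."""
    return sum(order for pair, order in bonds.items() if v in pair)


def is_saturated(colors, bonds):
    """d(v) = md(v) for every vertex."""
    return all(degree(bonds, v) == MAX_DEGREE[c] for v, c in enumerate(colors))


def is_under_saturated(colors, bonds):
    """d(v) <= md(v) everywhere, with strict inequality for at least one vertex."""
    pairs = [(degree(bonds, v), MAX_DEGREE[c]) for v, c in enumerate(colors)]
    return all(d <= m for d, m in pairs) and any(d < m for d, m in pairs)


def is_connected(n, bonds):
    """True if every vertex can be reached from vertex 0 along edges (n >= 1)."""
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for pair in bonds:
            if v in pair:
                (w,) = pair - {v}
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == n


# Carbon dioxide: vertices 0 (O), 1 (C), 2 (O), two double bonds.
colors = ["O", "C", "O"]
co2 = {frozenset((0, 1)): 2, frozenset((1, 2)): 2}
# An intermediate structure: only one of the two double bonds has been built.
partial = {frozenset((0, 1)): 2}
```

Here `co2` is saturated and connected, while `partial` is under-saturated and disconnected, which is the kind of intermediate structure the generation process has to tolerate.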

Graph labeling

An isomorphism π : V → V′ is a bijective function such that for each vertex v ∈ V, Col(π(v)) = Col(v), and for each pair of vertices v, v′ ∈ V, d(π(v), π(v′)) = d(v, v′). A labeling function σ : V → {1, ⋯, n} is a bijective map from the vertices of a colored multigraph to an ordered list of labels with a cardinality equal to the number of vertices. Put simply, σ assigns a label to each vertex. Let σ−1 be the inverse function of σ, which returns the vertex corresponding to a label. We say a labeling function is canonical if, given any two isomorphic colored multigraphs G = (V, E) and G′ = (V′, E′) with labelings σ and σ′, the bijective function π : V → V′ defined as π(a) = σ′−1(σ(a)) is an isomorphism from G to G′. Therefore, a canonically labeled multigraph is a multigraph whose vertices are associated to an ordered list through a canonical labeling function. Furthermore, a canonical hash of the labeling is a bijective function between the space of the canonically labeled multigraphs and the value space, and it is represented as a string of integers. Note that two isomorphic graphs have the same canonical hash, a fact that will be used to remove duplicated molecules during the generation process.
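The property that isomorphic multigraphs share one canonical hash can be demonstrated with a deliberately naive canonizer. The Python sketch below tries every relabeling of the vertices and keeps the lexicographically smallest description; it is an assumption-laden stand-in for Nauty (factorial cost, only usable for a handful of atoms), and the name `canonical_hash` is hypothetical.

```python
from itertools import permutations


def canonical_hash(colors, bonds):
    """Smallest (colors, bond orders) description over all vertex relabelings.

    colors: list of element symbols, one per vertex.
    bonds:  dict mapping frozenset({a, b}) to the bond order between a and b.
    Brute force over n! permutations, so only suitable for very small graphs.
    """
    n = len(colors)
    best = None
    for perm in permutations(range(n)):  # perm[i] is the vertex given label i
        key = (
            tuple(colors[v] for v in perm),
            tuple(
                bonds.get(frozenset((perm[i], perm[j])), 0)
                for i in range(n)
                for j in range(i + 1, n)
            ),
        )
        if best is None or key < best:
            best = key
    return best


# The same C-C-O skeleton (ethanol) numbered in two different ways ...
ethanol_a = (["C", "C", "O"], {frozenset((0, 1)): 1, frozenset((1, 2)): 1})
ethanol_b = (["O", "C", "C"], {frozenset((0, 1)): 1, frozenset((1, 2)): 1})
# ... and the non-isomorphic C-O-C skeleton (dimethyl ether).
ether = (["C", "O", "C"], {frozenset((0, 1)): 1, frozenset((1, 2)): 1})
```

The two numberings of the ethanol skeleton produce identical hashes, while the ether skeleton produces a different one, so storing hashes in a set is enough to discard duplicates.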

Using fragments

A fragment or substructure of a molecule is equivalent to a fragment or subgraph of a graph. We define a fragment as a subset of a graph, characterized by its own function d_f : V × V → N, which returns the number of edges connecting each pair of vertices in the subgraph. Such a d_f has to fulfill the condition d_f(a, b) ≤ d(a, b), ∀ a, b ∈ V, with d_f < d for at least one edge; that is, the fragment should have fewer edges than the graph.

Canonical augmentation

An augmentation of a multigraph G = (V, E) is a multigraph G′ = (V′, E′), defined on the same set of vertices, such that d′(a, b) = d(a, b) for every pair of vertices, except for one and only one pair where d′(a, b) = d(a, b) + 1. Let e′ ∈ E′ be the edge whose degree has been increased, d′(e′) = d(e) + 1. Let e be the last edge of σ(G′) and a, b the vertices of e. Consider σ−1(a, b) = a′′, b′′, two vertices of G′′, a copy of G′, on which a bond order decrease is performed: d′′(a′′, b′′) = d′(a′′, b′′) − 1. The resulting multigraph G′′ after this decrease in bond order can be seen as the result of a canonical deletion on G′, the reverse operation of a canonical augmentation. In our definition of canonical augmentation we consider a multigraph G′ = (V′, E′) to be canonically augmented from G = (V, E) if it is an augmentation and σ(G′′) = σ(G). In other words, we consider G′ to be a canonical augmentation of G if a canonical deletion in G′ results in G.

Description of the algorithm

The generation of structures can be seen as a tree of intermediate chemical structures that our tool explores. At the root of the tree we find a collection of fully isolated/disconnected atoms. One bond is added at each level of the tree, resulting in fully connected/finished molecules at the leaves. The canonical augmentation path is a depth-first backtracking algorithm, where the recursive function generate, described in Algorithm 1, implements the addition of one bond in all possible ways for a given intermediate chemical structure, and checks for each extended molecule that this extension has been performed in a canonical way, as described before. Here adding one bond means increasing the degree of the bond between two atoms; hence a single bond becomes a double bond and a double bond becomes a triple bond. If there is no bond between two atoms, a single bond is created. Between lines 2 and 9 of Algorithm 1, the molecule is stored if it is finished, which occurs when the molecule is saturated, all the atoms of the elemental composition, including the hydrogen atoms, have been used, all the atoms are validated by the CDK atom dictionary, and all the atoms are connected forming one single structure and not multiple disconnected fragments. If the molecule is not finished, it is extended in all possible ways by adding one bond. If a bond already exists between a pair of atoms, the function extend, in line 12 of Algorithm 1, will increase its multiplicity. The generation of new bonds is controlled by the OMG atom type definitions for intermediate chemical structures, which guarantee that the degree of the atoms does not exceed the maximum degree allowed for their chemical element. The function canonize, in line 15 of Algorithm 1, returns the canonical version of the molecule.
We modified the graph canonizer Nauty [23,29] in order to allow multigraphs and not only simple graphs. Other canonizers for graphs exist like MOLGEN-CID [30] or the Signature Canonizer [31], but Nauty has been the most widely used for graphs as well as for chemistry problems, like InChI [32] codes. Nauty is the canonizer of choice because it is the fastest of all available canonizers for bounded valence graphs below 100 vertices [33] (molecules are examples of this class of graphs). Firstly, the function canonize translates the molecule into a colored multigraph. Secondly, it utilizes Nauty to calculate the canonical labeling of the multigraph. Thirdly, this canonical labeling is used to construct the canonical version of the input molecule. Lastly, the canonical hash string of each augmented molecule is stored in a hash map, lines 16 and 17, in order to remove duplicated extensions at each level of the tree. Each unique extension is checked for canonical augmentation, line 18, using Algorithm 2, or Algorithm 3 in case prescribed substructures were provided. If this extension is successful, the function generate is called, line 19 of Algorithm 1, and the molecule we want to continue extending is passed as a parameter. When a molecule cannot be extended any further, the recursion is terminated and the program backtracks in the search tree.
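The overall recursion (add one bond, remove duplicated extensions by canonical hash, keep only canonical augmentations, recurse) can be illustrated with a toy generator. The Python sketch below is not the authors' Algorithm 1: it assumes only the single-valence elements C and O, treats hydrogens implicitly through their count, replaces Nauty with a brute-force canonizer, skips CDK atom-type validation and prescribed substructures, and all function names are hypothetical.

```python
from itertools import permutations

VALENCE = {"C": 4, "O": 2}  # assumption: single-valence elements only


def canonical_form(colors, bonds):
    """Return (key, perm) for the smallest relabeled description (brute force)."""
    n, best = len(colors), None
    for perm in permutations(range(n)):
        key = (tuple(colors[v] for v in perm),
               tuple(bonds.get(frozenset((perm[i], perm[j])), 0)
                     for i in range(n) for j in range(i + 1, n)))
        if best is None or key < best[0]:
            best = (key, perm)
    return best


def degree(bonds, v):
    return sum(order for pair, order in bonds.items() if v in pair)


def is_connected(n, bonds):
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for pair in bonds:
            if v in pair:
                (w,) = pair - {v}
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return len(seen) == n


def is_canonical_augmentation(colors, parent, child):
    """Canonical deletion: lower the last bond of the canonically labeled child
    by one and test whether the result is isomorphic to the parent."""
    n = len(colors)
    _, perm = canonical_form(colors, child)
    i, j = max((i, j) for i in range(n) for j in range(i + 1, n)
               if child.get(frozenset((perm[i], perm[j])), 0) > 0)
    last = frozenset((perm[i], perm[j]))
    reduced = dict(child)
    reduced[last] -= 1
    if reduced[last] == 0:
        del reduced[last]
    return canonical_form(colors, reduced)[0] == canonical_form(colors, parent)[0]


def generate(colors, n_hydrogens):
    """Canonical hashes of all finished heavy-atom skeletons for the composition."""
    n = len(colors)
    free = sum(VALENCE[c] for c in colors) - n_hydrogens
    if free < 0 or free % 2:
        return []
    target, results = free // 2, []  # total bond order of a finished molecule

    def recurse(bonds, depth):
        if depth == target:
            if is_connected(n, bonds):
                results.append(canonical_form(colors, bonds)[0])
            return
        seen = set()  # canonical hashes of the extensions at this tree node
        for a in range(n):
            for b in range(a + 1, n):
                pair = frozenset((a, b))
                if (bonds.get(pair, 0) >= 3
                        or degree(bonds, a) >= VALENCE[colors[a]]
                        or degree(bonds, b) >= VALENCE[colors[b]]):
                    continue
                child = dict(bonds)
                child[pair] = child.get(pair, 0) + 1
                key = canonical_form(colors, child)[0]
                if key not in seen:
                    seen.add(key)
                    if is_canonical_augmentation(colors, bonds, child):
                        recurse(child, depth + 1)

    recurse({}, 0)
    return results
```

For example, `generate(["C", "C", "O"], 6)` (composition C2H6O) returns two skeletons, corresponding to ethanol and dimethyl ether. Because the canonizer is factorial in the number of heavy atoms, this sketch is only practical for very small compositions.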

Input and output

The minimum input required is the elemental composition of the structures that have to be generated. Optionally, a structure-data file (SDF) can be provided containing one or more prescribed substructures that we want our output molecules to contain. Since OMG does not take hydrogen atoms into account during the generation of intermediate chemical structures, the hydrogen atoms present in the substructures will be removed before the generation process begins. These substructures should be non-overlapping, i.e. they should not share any atoms. This limitation is due to the fact that our algorithm grows molecules by adding bonds and, if two atoms in different fragments were in fact the same atom, our algorithm would create bonds between those atoms, which would clearly lead to incorrect results. In practice, multiple substructures can be available, but the user does not know if they overlap. This limitation can be circumvented by using the largest substructure as a constraint for the generation and the remaining substructures as a posterior filter, only keeping the molecules with those substructures. By default, the structure generator returns the count of molecules it generated. Optionally, it can store all the molecules in an SDF file. If prescribed fragments are provided, OMG outputs only the molecules containing such fragments. We have opted to use SDF as our input and output format, but via CDK, other formats can easily be implemented in OMG.

Data

As mentioned in the introduction, the identification of the chemical structure of metabolites is one of the current bottlenecks of metabolomics. In this sense, a structure generator can contribute to overcoming this bottleneck, since it can provide candidate structures for an unknown metabolite. Therefore, metabolites appear to be a relevant family of compounds to test our structure generator. A list of metabolites was selected and their elemental composition was compiled to evaluate the performance of our structure generator on different inputs. The source of the compounds employed was the Human Metabolome Database (HMDB) [34], which contains almost 8,000 metabolites and is the most comprehensive database of human metabolites. A study of the human metabolite space and the properties of the metabolites that occupy it has been previously reported [28]. The selection criteria were to include cyclic and acyclic compounds, of different molecular weights, and containing different chemical elements like C, O, N, P, and S. A first test set included metabolites with C, O, and H, chemical elements with one valence. A second test set included metabolites with C, O, H and also chemical elements with multiple valences, like N, P, and S. Furthermore, for some of these metabolites, several substructures were drawn and provided to the structure generator as additional input. These substructures are easily identified by an expert from direct inspection of MS2 or MSn experimental data. The aim was to assess the importance of having fragment information to reduce the list of generated structures.

Results and discussion

Structure generation from elemental formula

The algorithm presented in this work, the Open Molecule Generator, was tested and compared with the commercial structure generator, MOLGEN. Both generators take resonance into account, producing all the contributing structures. As a result, the two resonant forms of benzene will be considered as different molecules. Neither OMG nor MOLGEN is limited to acyclic structures [35,36]; thus the two structure generators tested can generate molecules with rings. Furthermore, both tools generate molecules containing common chemical elements present in metabolites, like C, O, N, H, P, and S, and are not limited to only 4 chemical elements [36]. Both structure generators generate molecules for a given elemental composition by exhaustively producing all non-redundant chemical structures. The number of molecules produced using the elemental compositions of a diverse selection of metabolites containing only C, O and H is presented in Table 2. For all these metabolites, the same number of molecules is generated by both generators. While both generators produce complete results, MOLGEN does it in less time. The time between initialization and finalization was measured using time functions in Java for OMG and equivalent functions in Python for MOLGEN. Table 2 shows the time in seconds to generate all the candidate structures and the time in milliseconds to generate each molecule. If we look at time per molecule, MOLGEN is 4 times faster than OMG for small molecules like pyruvic acid. For larger molecules MOLGEN obtains a constant time per molecule between 0.008 and 0.009 milliseconds, while OMG ranges from roughly 15 to 45 milliseconds depending on the elemental composition.
Lightweight profiling of OMG was performed using VisualVM (version 1.3.4), in order to have an understanding of the limiting points in the performance of OMG. The most relevant finding was that the canonization process, which uses Nauty, took half of the total running time.
Table 2

Number of chemical structures generated by OMG and MOLGEN using as input only the elemental compositions of metabolites containing C,O and H elements

| Structure name | HMDB ID | Elemental composition | MOLGEN: # candidate structures | MOLGEN: time (s) | MOLGEN: time per molecule (ms) | OMG: # candidate structures | OMG: time (s) | OMG: time per molecule (ms) |
|---|---|---|---|---|---|---|---|---|
| Pyruvic acid | HMDB00243 | C3H4O3 | 152 | 0.129 | 0.849 | 152 | 0.509 | 3.349 |
| Malic acid | HMDB00156 | C4H6O5 | 8,070 | 0.222 | 0.028 | 8,070 | 27.074 | 3.355 |
| D-Xylose | HMDB00098 | C5H10O5 | 18,092 | 0.332 | 0.018 | 18,092 | 125.783 | 6.952 |
| D-Fructose | HMDB00660 | C6H12O6 | 267,258 | 2.381 | 0.009 | 267,258 | 5,035.371 | 18.841 |
| Sedoheptulose | HMDB03219 | C7H14O7 | 4,106,823 | 38.945 | 0.009 | 4,106,823 | 186,248.085 | 45.351 |
| Pectin | HMDB03402 | C6H10O7 | 3,183,337 | 26.512 | 0.008 | 3,183,337 | 46,320.522 | 14.551 |
| Galactonic acid | HMDB00565 | C6H12O7 | 767,569 | 6.957 | 0.009 | 767,569 | 22,475.987 | 29.282 |
| Galactaric acid | HMDB00639 | C6H10O8 | 8,568,129 | 78.354 | 0.009 | 8,568,129 | 186,730.365 | 21.794 |
| Cholic acid | HMDB00619 | C24H40O5 | * More than 2,147,483,646 | * not available | * not available | * More than 2,147,483,646 | * not available | * not available |
| Phenyllactic acid | HMDB00779 | C9H10O3 | 48,496,265 | 404.052 | 0.008 | ** More than 48,496,265 | ** not available | ** not available |

*Results were not generated due to excessive computational time needed to generate all the candidate structures. However, we expect OMG to generate more molecules than MOLGEN, due to the larger amount of atom types produced by OMG.

**Results were not generated due to excessive computational time needed to generate all the candidate structures.

We observed that MOLGEN stops the generation of molecules after two billion molecules, as can be seen for a large molecule like cholic acid (Table 2). Since both generators produce the same molecules for elemental compositions with C, O and H, we can only assume that more than two billion molecules could be generated. In the case of phenyllactic acid, MOLGEN produces more than 48 million molecules in 404 seconds. Due to excessive computational time, no results for this elemental composition are reported for OMG, though the same number of molecules is expected (if executed for enough time), as is the case for all the other elemental compositions in this subset.

As stated in Methods, the two generators treat atoms having multiple valences in different ways, which is why we used a second set of molecules that also contain N, P and S. Unless stated otherwise, the default valences used by MOLGEN are 3 for N, 3 for P, and 2 for S. The results for these molecules are presented in Table 3. As expected, the number of candidate structures differs between the two generators. For the elemental composition of glycine, MOLGEN produces 84 molecules with N valence 3 only and 162 molecules with N valence 5 only. For the same elemental composition, OMG produces 97 molecules with valence 3 and 5 for N: the 84 produced by MOLGEN with N valence 3 plus 13 additional molecules with valence 5, containing N with the atom types depicted in Table 1 for OMG-CDK. The difference in the number of candidate structures is larger for elemental compositions containing many atoms with multiple valences, as is the case for creatinine. For this metabolite, MOLGEN generates 93,323 candidate structures with the default valence 3 for N. In contrast, OMG produces 303,601 candidate structures, containing N with valence 3 and 5.
Table 3

Number of chemical structures generated by OMG and MOLGEN using as input only the elemental compositions of metabolites containing C, O, H, N, P and S elements

| Structure name | HMDB ID | Elemental composition | MOLGEN: # candidate structures | MOLGEN: time (s) | MOLGEN: time per molecule (ms) | OMG: # candidate structures | OMG: time (s) | OMG: time per molecule (ms) |
|---|---|---|---|---|---|---|---|---|
| Glycine | HMDB00123 | C2H5NO2 | N_3 84 | 0.118 | 1.405 | 97 | 0.452 | 4.660 |
| | | | N_5 162 | 0.120 | 0.741 | | | |
| Acetyl- | HMDB00532 | C4H7NO3 | 18,469 | 0.282 | 0.015 | 26,530 | 126.117 | 4.754 |
| Phenylalanine | HMDB00159 | C9H11NO2 | 277,810,163 | 2227.796 | 0.008 | * More than 277,810,163 | * not available | * not available |
| Glutamic acid | HMDB00148 | C5H9NO4 | 440,821 | 2.945 | 0.007 | 685,392 | 12,348.456 | 18.017 |
| Phosphoenolpyruvic acid | HMDB00263 | C3H5O6P | P_3 51,323 | 0.562 | 0.011 | 83,977 | 761.378 | 9.067 |
| | | | P_5 129,421 | 1.398 | 0.011 | | | |
| Creatinine | HMDB00562 | C4H7N3O | 93,323 | 0.933 | 0.010 | 303,601 | 3,921.157 | 12.915 |
| Guanidinoacetic acid | HMDB00128 | C3H7N3O2 | 45,626 | 0.585 | 0.013 | 124,808 | 1,962.532 | 15.724 |
| Cytosine | HMDB00630 | C4H5N3O | 108,769 | 1.149 | 0.011 | 491,299 | 3,952.098 | 8.044 |
| Uric acid | HMDB00289 | C5H4N4O3 | 464,899,034 | 3488.097 | 0.008 | * More than 464,899,034 | * not available | * not available |
| Histamine | HMDB00870 | C5H9N3 | 46,125 | 0.631 | 0.014 | 134,278 | 3,566.544 | 26.561 |
| D-Cysteine | HMDB03417 | C3H7NO2S | 3,838 | 0.156 | 0.041 | 15,978 | 131.004 | 8.199 |
| p-Cresol sulfate | HMDB11635 | C7H8O4S | S_6 592,625,133 | 5078.132 | 0.009 | * More than 82,000,000 | * not available | * not available |

* Results were not generated due to excessive computational time needed to generate all the candidate structures. We expect OMG to generate more molecules than MOLGEN, due to the larger amount of atom types produced by OMG.

In the case of phosphoenolpyruvic acid, we require P valence 5 to be considered. On the one hand, running MOLGEN with the default valence for P yields 51,323 candidate structures, but the correct molecule is missing. On the other hand, forcing the valence of P to be 5 returns 129,421 candidate structures, which include the correct molecule but also an excessive quantity of unrealistic molecules due to unrealistic atom types for P. Alternatively, OMG generates 83,977 candidate structures with P valence 3 and 5, including the desired molecule, all of which are valid molecules as defined by the CDK atom dictionary.

We observe in Table 3 that the running time per generated molecule for MOLGEN now ranges between 0.008 and 0.041 milliseconds, while OMG requires between 4.8 and 26.6 milliseconds. Because of this difference in execution speed between MOLGEN and OMG, for some large elemental compositions results are reported only for MOLGEN. This is the case for phenylalanine, uric acid and p-cresol sulfate. However, for these metabolites, we assume that the number of candidate structures would have been higher with OMG than the number reported by MOLGEN using the default valences.

Structure generation from elemental formula and prescribed substructures

Structure generation is a combinatorial problem where the number of output molecules grows exponentially with the number of input atoms. When one or more prescribed substructures are given as input to the generators in addition to the elemental composition, fewer candidate structures are obtained (Table 4). Whereas MOLGEN can only accept one substructure, OMG can accept multiple substructures as input, with the constraint that these do not overlap, i.e., they should not share any atom. Phenylalanine is a good example of how the number of generated structures can be reduced by using more prescribed substructures, as will be discussed below in more detail.
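The non-overlap constraint on prescribed substructures can be illustrated with a short check. This is only a sketch under an assumed representation, in which each substructure is given as a set of atom indices of the target molecule; the function name `fragments_overlap` is hypothetical and does not come from the OMG source code.

```python
def fragments_overlap(fragments):
    """Return True if any two prescribed substructures share an atom.

    Assumption: each fragment is an iterable of atom indices.
    """
    used_atoms = set()
    for fragment in fragments:
        atoms = set(fragment)
        if used_atoms & atoms:
            # At least one atom is already claimed by an earlier fragment.
            return True
        used_atoms |= atoms
    return False


# Two fragments on disjoint atoms are acceptable as joint input ...
acceptable = [{0, 1, 2, 3, 4, 5}, {6, 7}]
# ... while fragments sharing atom 5 violate the constraint.
rejected = [{0, 1, 2, 3, 4, 5}, {5, 6}]
```

Under this representation, a set of fragments would be usable as joint input only when `fragments_overlap` returns `False`.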
Table 4

Number of chemical structures generated by OMG and MOLGEN using as input an elemental composition and one or more prescribed and non-overlapping fragments

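| Structure name | HMDB ID | Elemental composition | Prescribed substructure(s) | MOLGEN: # candidate structures | MOLGEN: time (s) | MOLGEN: time per molecule (ms) | OMG: # candidate structures | OMG: time (s) | OMG: time per molecule (ms) |
|---|---|---|---|---|---|---|---|---|---|
| Glycine | HMDB00123 | C2H5NO2 | [structure image] | 6 | 0.167 | 27.833 | 6 | 0.539 | 89.833 |
| D-Cysteine | HMDB03417 | C3H7NO2S | [structure image] | 100 | 0.193 | 1.930 | 210 | 3.177 | 15.129 |
| Phenylalanine | HMDB00159 | C9H11NO2 | benzene | 76,247 | 52.774 | 0.692 | 107,155 | 19386.019 | 180.916 |
| | | | two fragments | * not possible | * not possible | * not possible | 595 | 271.809 | 456.822 |
| | | | three fragments | * not possible | * not possible | * not possible | 289 | 172.655 | 597.422 |
| | | | two large fragments | * not possible | * not possible | * not possible | 26 | 25.147 | 967.192 |
| Cholic acid | HMDB00619 | C24H40O5 | one large substructure | ** not possible | ** not possible | ** not possible | 334 | 120.519 | 360.835 |
| | | | two substructures | * not possible | * not possible | * not possible | 2,505 | 119.418 | 47.672 |
| Malic acid | HMDB00156 | C4H6O5 | [structure image] | 1,436 | 0.229 | 0.159 | 1,436 | 4.688 | 3.265 |
| Uric acid | HMDB00289 | C5H4N4O3 | [structure image] | 150,114 | 962.016 | 6.409 | 6,069,863 | 155828.437 | 25.672 |
| Phenyllactic acid | HMDB00779 | C9H10O3 | [structure image] | 21,040 | 15.674 | 0.745 | 26,164 | 163.904 | 6.264 |
| | | | [structure images] | * not possible | * not possible | * not possible | 525 | 3.973 | 7.568 |
| p-Cresol sulfate | HMDB11635 | C7H8O4S | sulfate group | S_6 13,177 | 65.667 | 4.983 | 13,177 | 63.047 | 4.785 |
| | | | benzene | S_6 70,330 | 94.898 | 1.349 | 17,232 | 1204.357 | 69.891 |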

*MOLGEN can only accept one prescribed substructure, while OMG accepts multiple substructures, provided that these do not overlap, that is, they do not share any atom.

**MOLGEN is not able to generate molecules using this large substructure as input. The reason could not be found.

Substructure information is of great relevance for metabolomics experiments involving MSn data, where often the only information available for an unknown metabolite that needs to be identified is the elemental composition and, in some cases, substructures. Provided that no database entries exist for this experimental information, one is forced to generate the structures via CASE. The inclusion of substructure information brings the list of candidate structures to a manageable size. For p-cresol sulfate, using the sulfate group as the prescribed substructure produces 13,177 molecules with both generators. When benzene is the prescribed substructure, OMG generates 17,232 candidate structures and MOLGEN 70,330, all containing sulfur with valence 6, hence the difference between both generators.

Whereas the elemental composition of phenylalanine alone generates 277 million structures with MOLGEN, and an even higher number of candidate structures is expected for OMG because nitrogen valences of both 3 and 5 are taken into account (Table 3), using benzene as a substructure yields only 107,155 (OMG) and 76,247 (MOLGEN) candidate structures (Table 4). The number of generated molecules for the elemental composition of phenylalanine is reduced even further by prescribing multiple fragments as input: OMG outputs 595 molecules when provided with two fragments and 289 molecules with three fragments (Table 4). The use of large fragments yields the largest reduction in output molecules, as can be seen in the last example for phenylalanine, where two big fragments describe most of its structure and return only 26 chemical structures.

For larger molecules containing ten or more carbon atoms, which is a common situation in chemistry, it is not practical for the identification of metabolites to exhaustively generate candidate structures without substructure constraints, with either MOLGEN or OMG, because of the large number of results. Using the elemental composition of a large metabolite like cholic acid, neither structure generator can produce all possible candidate structures, which are expected to number in the billions. This was only possible using substructure information to reduce the size of the search tree: when provided with a substructure that describes a large part of the molecule, OMG generates only 334 structures (Table 4). When using two substructures, OMG returned 2,505 candidate structures. However, MOLGEN was unable to return results using the same large substructure or two substructures as input, and we could not determine the reason.

The use of prescribed substructures affected the running time of both generators. For MOLGEN, the time per molecule ranged between 0.16 and 27.8 milliseconds, which represents in some cases a 10,000-fold increase in computation time compared to using only elemental compositions. For OMG, the time per molecule ranged between 3.3 and 967.2 milliseconds, a 100-fold increase in running time. Despite this deterioration of execution time, the advantage of using one or, ideally, multiple prescribed substructures is clear: the number of candidate structures is significantly reduced, and the total time to calculate candidate structures is also reduced compared to not using any substructure.

The results presented here show that if we want MOLGEN to generate the correct molecule when the valence of some atoms is not the default one, as for phosphoenolpyruvic acid or p-cresol sulfate, we need to know the valence in advance. Otherwise, MOLGEN must be executed using all possible valences for all atoms. This limitation is not present in OMG, which can produce different valences in the same execution. Unfortunately, the atom dictionary provided by CDK is not comprehensive concerning non-standard valences. On the positive side, the dynamic open source community of CDK keeps adding new atom types with each release of the library, and we expect that this will improve the capabilities of OMG. The open source nature of CDK also allows users to suggest or implement new atom types.

The generation of the molecules in the Open Molecule Generator has the shape of a tree. As stated by McKay [23], the check for canonical augmentation is branch-independent, which would allow the branches of the generation tree to be processed in parallel. In theory the algorithm allows for parallelization; in practice this has not been implemented, but it is one future extension of this work.

However, we have observed that OMG is in most cases slower than MOLGEN, and this was more noticeable when generating millions of candidate molecules. The speed of OMG could be improved, and we see several possibilities to achieve this; for example, the use of a different canonizer, or a less computationally demanding canonicity test for intermediate chemical structures, could significantly speed up the execution. In any case, obtaining millions of molecules as a result, quickly or slowly, is not desirable; the goal of metabolite identification is to obtain a list of candidate structures short enough to be examined in order to find the structure belonging to the unknown metabolite. Exhaustive profiling, covering both execution time and memory use, would help to discover improvement points for OMG. Fortunately, OMG allows multiple prescribed substructures and can handle large fragments, which reduced the number of generated molecules significantly. Handling multiple substructures allows OMG to provide a short list of candidate structures and, additionally, its open source nature permits users to implement specific constraints to further reduce the candidate list, both during and after the generation process. Such constraints could, for example, reject intermediate chemical structures based on high steric energy values or other physicochemical properties. Therefore we expect OMG to be useful in different application areas and its functionality to be extended in the near future.
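To make concrete what executing a fixed-valence generator "using all possible valences for all atoms" implies, the sketch below enumerates one valence assignment per required run. The valence options listed are only those mentioned in this section (N: 3 or 5, P: 3 or 5, S: 2 or 6); the table `VALENCE_OPTIONS` and the function `valence_runs` are hypothetical illustrations and not part of MOLGEN or OMG.

```python
from itertools import product

# Assumption: only the valences discussed in the text are considered.
VALENCE_OPTIONS = {"N": (3, 5), "P": (3, 5), "S": (2, 6)}


def valence_runs(formula_elements, valence_options=VALENCE_OPTIONS):
    """List one {element: valence} assignment per required generator run."""
    multi = sorted(e for e in set(formula_elements) if e in valence_options)
    choices = [valence_options[e] for e in multi]
    # product() over no choices yields a single empty assignment,
    # i.e. one run with default valences only.
    return [dict(zip(multi, combo)) for combo in product(*choices)]


# D-Cysteine (C3H7NO2S) contains both N and S, so a fixed-valence
# generator would need one run per (N valence, S valence) pair.
cysteine_runs = valence_runs(["C", "H", "N", "O", "S"])
```

The number of runs is the product of the number of options per multivalent element, so it grows quickly as more such elements appear in the formula; a generator that considers several valences in one execution avoids this enumeration.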

Conclusion

In this work we have presented the Open Molecule Generator, which is, to the best of our knowledge, the first application to chemical structure generation of the Canonical Path Augmentation approach, originally designed for the enumeration of simple graphs by adding vertices. We have adapted it to generate organic chemical structures and extended it so that (i) it grows molecules by adding bonds, (ii) it can handle multigraphs, and (iii) it accepts one or multiple non-overlapping prescribed substructures. In addition, this is the first open source implementation of a deterministic structure generator. This will enable future developments like parallelization or the inclusion of constraints that are specific to the class of compounds being generated.

Our results show that the implementation of our algorithm generates all possible and valid chemical structures for a given elemental composition and, optionally, prescribed substructures. It is as complete as the best commercially available generator. Moreover, the current implementation of the OMG program presents an extra advantage over existing generators when large or multiple fragments are available to be used as constraints: we have demonstrated the benefit of incorporating constraints to reduce the number of output molecules significantly. The ability of OMG to generate multiple valences for an atom has proven to be useful, as often no prior information is known about the desired chemical elements, and multiple valences of an element can be present in a molecule. When compared to MOLGEN, the only disadvantage of OMG is its speed; the gap is larger when using only elemental compositions and smaller when including prescribed substructures. This issue will be addressed in future improvements of the program.
We expect this tool to be used in various fields, one of them being metabolomics, where there is a clear need for flexible structure generators. We have successfully used OMG to propose candidate structures using prescribed substructures, in several on-going metabolite identification projects in our lab.

Availability and requirements

Project name: openmg
Project home page: http://sourceforge.net/p/openmg
Operating system: Linux 64bits, Linux 32bits, Mac OS X
Programming languages: Java, C
Other requirements (if compiling):
License: GNU AGPL v3
Any restrictions to use by non-academics: None other than those specified by the license

Algorithms

Algorithm 1

Generate(M)
  If saturated(M) AND are_all_H_used(M)
    If connected_fragments(M) == 1
      store_to_file(M)
      Nmols = Nmols + 1
      If degree(M) < max_degree(M)
        generate(M)
      Endif
    Endif
  Else
    New Map
    list_of_bonds = extend(M)
    Foreach bond in list_of_bonds
      M' = add_bond(bond, M)
      canonM' = canonize(M')
      If not is_present(map, canonM')
        add(map, canonM')
        If is_canonical_augmentation(canonM', M', M)
          generate(M')
        EndIf
      EndIf
    End
  EndIf
End

Algorithm 2

Is_canonical_augmentation(canonM', M', M)
  last_bond = get_last_bond(canonM')
  M'' = remove_bond(M', last_bond)
  return are_the_same(M'', M)
End

Algorithm 3

Is_canonical_augmentation_fragments(canonM', M', M)
  last_bond = get_last_bond(canonM')
  While bond_belongs_to_fragment(last_bond, canonM')
    last_bond = get_previous_bond(canonM')
  Endwhile
  M'' = remove_bond(M', last_bond)
  return are_the_same(M'', M)
End
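The interplay of Algorithms 1 and 2 can be tried out on very small formulas with the toy sketch below. It is not the OMG implementation: canonization is done by brute force over atom permutations instead of Nauty, each element has a single assumed valence (`VALENCE`), bond orders are capped at three (`MAX_ORDER`), hydrogens are implicit, and prescribed substructures (Algorithm 3) are not handled. All function and variable names are chosen for illustration.

```python
from itertools import permutations

VALENCE = {"C": 4, "N": 3, "O": 2}  # assumption: one fixed valence per element
MAX_ORDER = 3                        # assumption: bonds up to triple


def canonize(elements, bonds):
    """Brute-force canonical form: the smallest sorted bond list over all
    relabellings that map every atom onto an atom of the same element."""
    n = len(elements)
    best = None
    for perm in permutations(range(n)):
        if any(elements[perm[i]] != elements[i] for i in range(n)):
            continue
        key = tuple(sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for (i, j), order in bonds.items()))
        if best is None or key < best:
            best = key
    return best


def is_canonical_augmentation(elements, canon_child, canon_parent):
    """Remove one unit of the last bond of the canonical child and test
    whether the result is isomorphic to the parent (cf. Algorithm 2)."""
    i, j, order = canon_child[-1]
    reduced = {(a, b): o for a, b, o in canon_child[:-1]}
    if order > 1:
        reduced[(i, j)] = order - 1
    return canonize(elements, reduced) == canon_parent


def is_connected(n, bonds):
    """Depth-first search over the heavy-atom bond graph."""
    reached, stack = {0}, [0]
    while stack:
        atom = stack.pop()
        for i, j in bonds:
            for x, y in ((i, j), (j, i)):
                if x == atom and y not in reached:
                    reached.add(y)
                    stack.append(y)
    return len(reached) == n


def generate(elements, n_hydrogens, bonds=None, found=None):
    """Grow structures bond by bond (cf. Algorithm 1); return canonical forms."""
    bonds = {} if bonds is None else bonds
    found = [] if found is None else found
    free = [VALENCE[e] for e in elements]
    for (i, j), order in bonds.items():
        free[i] -= order
        free[j] -= order
    if sum(free) <= n_hydrogens:
        # Saturated once the remaining free valences equal the hydrogen count.
        if sum(free) == n_hydrogens and is_connected(len(elements), bonds):
            found.append(canonize(elements, bonds))
        return found
    canon_parent = canonize(elements, bonds)
    seen = set()  # plays the role of the per-node map in Algorithm 1
    n = len(elements)
    for i in range(n):
        for j in range(i + 1, n):
            if free[i] == 0 or free[j] == 0 or bonds.get((i, j), 0) >= MAX_ORDER:
                continue
            child = dict(bonds)
            child[(i, j)] = child.get((i, j), 0) + 1
            canon_child = canonize(elements, child)
            if canon_child in seen:
                continue
            seen.add(canon_child)
            if is_canonical_augmentation(elements, canon_child, canon_parent):
                generate(elements, n_hydrogens, child, found)
    return found


c2h6o = generate(["C", "C", "O"], 6)  # ethanol and dimethyl ether
```

For C2H6O the sketch returns two structures, corresponding to ethanol and dimethyl ether. Because canonization here enumerates all atom permutations, the sketch is only practical for a handful of heavy atoms.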

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JEP designed and implemented the software, and drafted most of the manuscript. MRC and DF contributed to the implementation of the software. JLF contributed to the design of the algorithm and supervised specific parts of the project. TR, LC and TH supervised specific parts of the project, fed it with new ideas, and participated in testing the software. All authors approved the final manuscript.
  21 in total

1.  SENECA: A platform-independent, distributed, and parallel system for computer-assisted structure elucidation in organic chemistry.

Authors:  C Steinbeck
Journal:  J Chem Inf Comput Sci       Date:  2001 Nov-Dec

Review 2.  The next wave in metabolome analysis.

Authors:  Jens Nielsen; Stephen Oliver
Journal:  Trends Biotechnol       Date:  2005-09-12       Impact factor: 19.536

Review 3.  Chemoinformatics: past, present, and future.

Authors:  William Lingran Chen
Journal:  J Chem Inf Model       Date:  2006 Nov-Dec       Impact factor: 4.956

4.  970 million druglike small molecules for virtual screening in the chemical universe database GDB-13.

Authors:  Lorenz C Blum; Jean-Louis Reymond
Journal:  J Am Chem Soc       Date:  2009-07-01       Impact factor: 15.419

5.  Enumerating treelike chemical graphs with given path frequency.

Authors:  Hiroki Fujiwara; Jiexun Wang; Liang Zhao; Hiroshi Nagamochi; Tatsuya Akutsu
Journal:  J Chem Inf Model       Date:  2008-06-28       Impact factor: 4.956

6.  Comprehensive analytical strategy for biomarker identification based on liquid chromatography coupled to mass spectrometry and new candidate confirmation tools.

Authors:  Rayane Mohamed; Emmanuel Varesio; Gordana Ivosev; Lyle Burton; Ron Bonner; Gérard Hopfgartner
Journal:  Anal Chem       Date:  2009-09-15       Impact factor: 6.986

7.  The use of MS classifiers and structure generation to assist in the identification of unknowns in effect-directed analysis.

Authors:  E L Schymanski; C Meinert; M Meringer; W Brack
Journal:  Anal Chim Acta       Date:  2008-04-04       Impact factor: 6.558

8.  Efficient enumeration of stereoisomers of outerplanar chemical graphs using dynamic programming.

Authors:  Tomoki Imada; Shunsuke Ota; Hiroshi Nagamochi; Tatsuya Akutsu
Journal:  J Chem Inf Model       Date:  2011-10-25       Impact factor: 4.956

9.  Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry.

Authors:  Warwick B Dunn; David Broadhurst; Paul Begley; Eva Zelena; Sue Francis-McIntyre; Nadine Anderson; Marie Brown; Joshau D Knowles; Antony Halsall; John N Haselden; Andrew W Nicholls; Ian D Wilson; Douglas B Kell; Royston Goodacre
Journal:  Nat Protoc       Date:  2011-06-30       Impact factor: 13.491

10.  The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics.

Authors:  Christoph Steinbeck; Yongquan Han; Stefan Kuhn; Oliver Horlacher; Edgar Luttmann; Egon Willighagen
Journal:  J Chem Inf Comput Sci       Date:  2003 Mar-Apr
