Abstract
We are developing 3DMET, a three-dimensional structure database of natural metabolites. Two major impediments hinder the creation of 3D chemical structures from a set of planar structure drawings: the limited accuracy of computer programs and insufficient human resources for manual curation. We tested several 2D–3D converters on 2D structure files from external databases. These automatic conversions yielded an excessive number of improper results. To assess conversion quality, we compared the IUPAC International Chemical Identifier (InChI) and canonical SMILES notations before and after conversion. In the present work, a conversion was regarded as correct when the notations before and after matched. We found that chiral inversion was the most serious factor in improper conversions. At the current stage of database construction, published books and articles are the sources for new entries, and in these the chemicals are usually drawn as pictures on paper. To save human resources, we introduced an optical structure reader. The program was quite useful, but some characteristic errors were observed during operation. We hope that our trials in producing correct 3D structures will help other developers of chemical programs and curators of chemical databases.
Keywords: 2D–3D conversion; 3DMET; CLiDE; InChI; canonical SMILES; chemical database; natural products
Year: 2015 PMID: 26075200 PMCID: PMC4443773 DOI: 10.3389/fbioe.2015.00066
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Operation flow to develop 3D structures. Arrows (A–C) indicate possible starting points, depending on the data source.
Figure 2. A schematic example of converting an image to a 3D structure for octadehydro-beta-carotene. The panels show (A) the initial 2D drawing, (B) the 2D structure after CLiDE, (C) the 3D structure minimized with the chiral constraint option, and (D) the structure directly edited and minimized in the MOE window. Green numbers show the dihedral angles of the bonds.
Examples of typical errors made by the translator.
| Query | Errors |
|---|---|
| 1. Canavanine | Some elemental descriptions drawn as characters on the paper were not translated |
| 2. Ryanodine | Bond direction and connectivity were not recognized in complex structures |
| 3. Sparteine | “N” was recognized as a part of bonds. Some kinds of chiral information were lost |
| 4. Terfairine | “Br” was not translated as a halogen atom |
| 5. 3(R)-Millonol-B | Description of “resonance structure” cannot be converted |
| 6. Veratridine | Some parts of the molecule were missing among relatively larger molecules |
| 7. 1,7,9,15-Heptadecatetraene-11,13-diyne | No structures were developed when all atoms were clearly described by elements |
The structures in the “Query” column are 2D structures of the queries drawn with ChemDraw Pro.
Frequency of errors regarding chirality and cis/trans stereochemistry during the conversion processes by MOE 2011.
| | cpd – wash | wash – mm | cpd – mm |
|---|---|---|---|
| Same | 8,964 | 11,296 | 8,735 |
| Chiral inversion | 268 | 119 | 201 |
| Chiral missing | 2,152 | 521 | 2,417 |
| Cis/trans inversion | 1 | 29 | 30 |
| Cis/trans missing | 0 | 6 | 0 |
| Else unmatched | 589 | 3 | 621 |
All transferred structures were compared, categorized, and counted by each step of database minimization as shown in Table .
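To put the counts above on a common scale, the following minimal Python sketch converts each column into percentages. The numbers are copied from the table; the names `CATEGORIES`, `COUNTS`, and `shares` are hypothetical and chosen only for this illustration. Each column's own total is used as the denominator, since the source does not state how the columns were normalized.

```python
# Counts copied from the table above (MOE 2011, database minimization steps).
CATEGORIES = [
    "Same", "Chiral inversion", "Chiral missing",
    "Cis/trans inversion", "Cis/trans missing", "Else unmatched",
]
COUNTS = {
    "cpd-wash": [8964, 268, 2152, 1, 0, 589],
    "wash-mm": [11296, 119, 521, 29, 6, 3],
    "cpd-mm": [8735, 201, 2417, 30, 0, 621],
}


def shares(counts):
    """Return each category's percentage of the column's own total."""
    total = sum(counts)
    return {cat: 100.0 * n / total for cat, n in zip(CATEGORIES, counts)}


for step, counts in COUNTS.items():
    pct = shares(counts)
    print(
        f"{step}: total={sum(counts)}, "
        f"same={pct['Same']:.1f}%, "
        f"chiral missing={pct['Chiral missing']:.1f}%"
    )
```

Printing the column totals alongside the percentages makes it easy to check each column against the 11,974-compound dataset size.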
Figure 3. An example of 2D–3D conversion by MOE. The C09519 entry of KEGG COMPOUND was converted by MOE. The panels show (A) the molecular drawing provided in KEGG COMPOUND, (B) the 2D mol file viewed in MOE, (C) the structure after adding hydrogens, and (D) the structure after minimization. Structures (B–D) are shown with the corresponding SMILES notation strings. Red characters indicate the position of a chiral inversion.
Accuracy of 3D-structure construction by MOE 2011.
| | Command: InChI | Command: SMILES | Database: InChI | Database: SMILES |
|---|---|---|---|---|
| Whole notation strings | 11,974 | 11,974 | 11,974 | 11,974 |
| Completely same | 6,725 | 6,866 | 8,522 | 8,735 |
| Different | 5,249 | 5,108 | 3,452 | 3,239 |
| Chiral errors | | | | |
| Undefined chiral atoms | 1,337 | 2,513 | 1,863 | 2,417 |
| Mismatch about chirality | 3,586 | 2,454 | 147 | 201 |
| Mismatch of cis/trans (including undefined bond stereochemistry) | 109 | 33 | 474 | 30 |
| Correspondence of InChI layer | | | | |
| Formula | 11,925 | | 9,550 | |
| c (Connection) | 11,925 | | 9,569 | |
| h (Hydrogen) | 11,900 | | 9,539 | |
| b (Cis/trans stereochemistry) | 11,860 | | 9,217 | |
| t (Chirality) | 6,973 | | 7,140 | |
The initial 2D structure dataset comprised the KEGG COMPOUND entries without “R,” “n,” or “X” (11,974 compounds). The wash and add-hydrogen operations were performed either as commands (Command) or with the molecular database functions (Database) of MOE. The initial COMPOUND 2D mol files were compared with the structures after minimization, and the descriptions mean the following: same formula, the formula of the InChI main layer corresponds; same string, the two InChI or SMILES strings are completely identical; chiral difference, the strings do not match but have the same connectivity and hydrogens; undefined chiral atom, at least one of the two strings lacks atomic chiral information; and chiral mismatch, the atomic chirality does not correspond.
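The layer-by-layer comparison described in this note can be illustrated with a short Python sketch. It is not the authors' procedure: `split_inchi` and `classify` are hypothetical helpers, the category labels only approximate those in the table, and the parser assumes a standard InChI whose first layer after the version is the formula. It treats a missing `/t` layer, or a `?` in it, as undefined chirality, and compares the `/t` and `/m` layers together to detect a chirality mismatch.

```python
def split_inchi(inchi):
    """Split an InChI string into {layer prefix: content}.

    The formula layer has no prefix letter, so it is stored under "formula".
    Only the first occurrence of each prefix is kept (a simplification).
    """
    parts = inchi.split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return layers


def lacks_chirality(t_layer):
    """True if the tetrahedral layer is absent or has undefined centers."""
    return t_layer is None or "?" in t_layer


def classify(before, after):
    """Assign a before/after InChI pair to one illustrative category."""
    if before == after:
        return "same"
    a, b = split_inchi(before), split_inchi(after)
    # Formula, connection, and hydrogen layers must agree first.
    for key in ("formula", "c", "h"):
        if a.get(key) != b.get(key):
            return "else unmatched"
    if (a.get("t"), a.get("m")) != (b.get("t"), b.get("m")):
        if lacks_chirality(a.get("t")) or lacks_chirality(b.get("t")):
            return "chiral missing"
        return "chiral mismatch"
    if a.get("b") != b.get("b"):
        if a.get("b") is None or b.get("b") is None:
            return "cis/trans missing"
        return "cis/trans mismatch"
    return "else unmatched"


# Example strings: L-alanine, D-alanine, and alanine without stereo layers.
L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
FLAT_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"

print(classify(L_ALA, D_ALA))     # chiral mismatch
print(classify(L_ALA, FLAT_ALA))  # chiral missing
```

Checking the formula, connection, and hydrogen layers before the stereo layers mirrors the definition of a chiral difference in the note: strings that differ while connectivity and hydrogens still agree.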
Chiral detection by InChI and canonical SMILES.
| | Chiral tags: InChI | Chiral tags: canonical SMILES | Cis/trans tags: InChI | Cis/trans tags: canonical SMILES | Undefined chiral atoms: InChI |
|---|---|---|---|---|---|
| Initial 2D structures | 6,618 | 6,704 | 1,895 | 1,564 | 1,316 |
| After wash | 8,010 | 7,803 | 1,942 | 1,555 | 3,045 |
| After minimization | 7,268 | 8,034 | 1,906 | 1,560 | 162 |
Values are the numbers of structures carrying each kind of tag. The calculation was based on 11,974 compounds from KEGG COMPOUND.
Figure 4. Several SMILES outputs. The C00125 structure of COMPOUND (A) was converted to three SMILES notations: (B) isomeric SMILES from SYBYL, (C) unique SMILES from MOE, and (D) canonical SMILES from Daylight.