
Data on generation of Kekulé structures for graphenes, graphynes, nanotubes and fullerenes and their aza-analogs.

Sergey Trepalin, Sasha Gurke, Mikhail Akhukov, Andrey Knizhnik, Boris Potapkin.

Abstract

Two new features are added to existing algorithms for kekulization of chemical structures: handling of triple and cumulene bonds in cycles, and use of random atom sorting to remove unmatched atoms. Handling of triple and cumulene bonds enables kekulization of graphynes and graphdiynes. Random sorting shortens the calculation time: kekulization of large chemical structures containing about 10^7 atoms takes ≤1 min on a typical PC. Source code (Pascal, GNU GPL license) is included along with a compiled application (Windows 64). Calculation times and unmatched atom statistics are provided for graphenes, graphynes, nanotubes, graphyne nanotubes and fullerenes. Benchmark comparisons are made for some data.


Year:  2018        PMID: 30510977      PMCID: PMC6258249          DOI: 10.1016/j.dib.2018.10.128

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

This data proves the viability of a fast algorithm for modeling state-of-the-art materials such as graphenes in scientific and commercial applications.

The data is provided for very large chemical structures (10-10 atoms) and was obtained on ordinary hardware with short calculation times.

This data has been benchmarked against existing algorithms and can be used for benchmarking future improvements to the algorithm.

Data

A detailed description of the algorithm for fast generation of Kekulé structures from a list of atomic valences and connectivity matrices is given in [1]. The source code based on this algorithm and used for the calculations is provided with this publication. Model calculations were performed for graphenes, nanotubes, fullerenes and their aza-analogs, polycyclopentadienes and porphine on a Windows 2012 server with a 2.8 GHz i7 processor and 16 GB RAM. The procedures were run in a single thread, not in parallel. Elapsed time was measured with the Win API function GetTickCount(), which reports elapsed time in milliseconds.

For compounds containing fewer than 1,000,000 atoms, the computing time was determined as the average of 1000 calculations; for fewer than 10,000,000 atoms, as the average of 100 calculations; and for more than 10,000,000 atoms, as the time of a single calculation. For repeated calculations, the bonds were sorted in random order. For an aromatic bond alternation algorithm, this is equivalent to randomly selecting which bond at a node becomes double. Over a large number of calculations, this procedure allows estimation of the statistical distribution of unmatched atoms after alternation of aromatic bonds.

The compiled application (Windows 64) used for the calculations is in Attachment1.zip. Computer-readable chemical structures are in Attachment2.zip.
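The measurement protocol described above can be illustrated with a minimal Python sketch. It is not the Pascal code distributed with the article. It assumes that every atom needs exactly one double bond, that a greedy pass over randomly ordered bonds stands in for the alternation step, and that `time.perf_counter()` replaces GetTickCount(). The names `runs_for`, `greedy_alternation` and `benchmark` are hypothetical, and the handling of the exact boundaries (1,000,000 and 10,000,000 atoms) is an assumption.

```python
import random
import time
from collections import Counter


def runs_for(n_atoms):
    """Number of repeated calculations for a structure of a given size.

    Follows the counts quoted in the text; boundary handling is an assumption.
    """
    if n_atoms < 1_000_000:
        return 1000
    if n_atoms < 10_000_000:
        return 100
    return 1


def greedy_alternation(n_atoms, bonds, rng):
    """Assign double bonds greedily in a random bond order.

    Returns the number of atoms left without a double bond (unmatched atoms).
    """
    order = list(bonds)
    rng.shuffle(order)  # random sorting of the bond sequence
    has_double = [False] * n_atoms
    for a, b in order:
        if not has_double[a] and not has_double[b]:
            has_double[a] = has_double[b] = True
    return sum(1 for flag in has_double if not flag)


def benchmark(n_atoms, bonds, seed=0):
    """Return (mean time per calculation in ms, unmatched-atom statistics)."""
    rng = random.Random(seed)
    runs = runs_for(n_atoms)
    stats = Counter()
    start = time.perf_counter()
    for _ in range(runs):
        stats[greedy_alternation(n_atoms, bonds, rng)] += 1
    mean_ms = (time.perf_counter() - start) * 1000.0 / runs
    return mean_ms, stats


# Carbon skeleton of naphthalene: two fused six-membered rings sharing bond 4-5.
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]

mean_ms, stats = benchmark(10, NAPHTHALENE)
for unmatched in sorted(stats):
    print(unmatched, stats[unmatched])
```

Because double bonds pair atoms two at a time, the unmatched counts in `stats` always share the parity of the atom count, which is why the statistics columns in the tables list only even values.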

Experimental design, materials and methods

Nanotubes

Carbon nanotubes of various lengths were generated using the pattern shown in Fig. 1. Generated nanotubes contain two acyclic methylene groups at the beginning and at the end of a nanotube.
Fig. 1

The pattern for generation of C[12,12] nanotubes.

In Fig. 1, the arrows show the points of attachment used to generate a polymer molecule. The points of attachment of the outer rings connect with the points of attachment of the inner rings. The order of an aromatic bond resulting from connecting a pair of attachment points is unknown. For the end moieties of the polymer, the points of attachment were replaced with single bonds to hydrogen atoms. The results of the calculations are given in Table 1 and discussed below.
Table 1

Time required for generation of Kekulé structures and unmatched atom statistics for nanotubes and graphenes.

Columns: Chain length, Formula, Time (ms), Unmatched atoms statistics (one column for each even number of unmatched atoms from 2 to 42)
Nanotubes
1C144 H480.3332351510107
10C1440 H482.421289311114623221511393
100C14400 H4828.15121536119270
1000C144000 H48407.71111291281332227513
10000C1440000 H488123.363034237
100000C14400000 H481084371
Graphenes
1C144H380.16313404421825
10C1440 H1462.09942971401661991671075815
100C14400 H122628.561412202477165255248138374
1000C144000 H12026451.822215736168421288653
10000C1440000 H1200268497.811137429
100000C14400000 H12000261182971



Data for some compounds from [2] obtained with our algorithm
tube-980C980H261.361775164312246143394
sheet-1800C1800H1782.1900000000
sheet-19602C19602H59228.900000000

Where:

Chain length – number of monomeric units in a polymer,

Formula – molecular formula,

Time (ms) – time required for kekulization of a single chemical structure in milliseconds,

Unmatched atoms statistics – number of calculations that ended with a given number of unmatched atoms (2 to 42).


Graphene

The pattern for generation of graphene is shown in Fig. 2.
Fig. 2

The pattern for generation of graphenes.

To generate polymers, the points of attachment on the left of one graphene block were connected with the points of attachment on the right of another block, in a way similar to that used for nanotubes. The points of attachment of the first and the last blocks were capped with hydrogen atoms via single bonds. For acyclic carbon atoms, required for the generation of repeating aromatic cycles, two hydrogen atoms were added to the point of attachment, resulting in a methylene group. The results of the calculations are given in Table 1.

Kekulization of various compounds, including nanotubes and graphenes, is described in [2], [3], where computing times are also provided. For comparison, we performed calculations for some of the same compounds; the results are included in Table 1. Computing times were dramatically shorter than those reported in [2]. A detailed discussion of this data can be found in [1].

Fullerenes and porphine

Alternation of aromatic bonds was studied for fullerenes C20, C60, C70, C80, C82, their random aza-analogs, as well as for porphine (Fig. 3).
Fig. 3

Fullerenes and porphine used for model calculations.

Polymeric analogs of these compounds do not exist, so all calculations were performed for monomers. To validate the efficiency of the algorithm for five-membered cycles, model calculations were performed for azafullerenes generated by randomly replacing 2, 4 or 8 carbon atoms with nitrogen. Each nitrogen atom has a valence of 3 and forms three single bonds at its node, so it takes no part in bond alternation. The substitution can therefore be made only for an even number of atoms: with an odd number of nitrogen atoms, the number of carbon atoms left to pair is odd, the bonds cannot be alternated, and the number of unmatched atoms is odd. The results of the calculations are given in Table 2.
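The parity argument for aza substitution can be checked with a short Python sketch. It assumes that an atom needs one double bond exactly when its valence exceeds its number of single (sigma) bonds by one. The helper names `parity_allows_kekule` and `azafullerene` are hypothetical and do not come from the published source code. An even count is a necessary condition only; it does not guarantee that a Kekulé structure exists.

```python
def needs_double_bond(valence, n_sigma_bonds):
    """An atom with one unused valence must take part in one double bond."""
    return valence - n_sigma_bonds == 1


def parity_allows_kekule(atoms):
    """atoms: iterable of (valence, number of sigma bonds) pairs.

    Double bonds pair atoms two at a time, so the number of atoms that
    need a double bond has to be even. Necessary, not sufficient.
    """
    deficient = sum(1 for v, s in atoms if needs_double_bond(v, s))
    return deficient % 2 == 0


def azafullerene(n_sites, n_nitrogen):
    """Fullerene cage with n_sites three-connected atoms, n_nitrogen of them N.

    Carbon: valence 4, three sigma bonds. Nitrogen: valence 3, three sigma bonds.
    """
    return [(4, 3)] * (n_sites - n_nitrogen) + [(3, 3)] * n_nitrogen


for n_nitrogen in (0, 1, 2, 3, 4, 8):
    print(n_nitrogen, parity_allows_kekule(azafullerene(60, n_nitrogen)))
```

For a C60 cage, the check passes for 0, 2, 4 and 8 nitrogen atoms and fails for 1 and 3, which matches the even substitution counts used in Table 2.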
Table 2

Time required for generation of Kekulé structures and unmatched atoms statistics before their removal for fullerenes, their aza-analogs and porphine. The data is for 1000 transactions.

Columns: Compound, Formula, Time (ms), No. non-existent, Avg. no. iterations, Max. no. iterations, Unmatched atoms statistic (0, 2, 4, 6, 8, >8)
C20C200.07801.001699301
Diaza-C20C18N20.06201.0334095838
Tetraaza-C20C16N40.43833 + 5914.224994938
Octaaza-C20C12N80.953541 + 21870.617692229
C60C600.15601.001329600683
Diaza-C60C58N20.17101.0121246811821111
Tetraaza-C60C56N40.2344 + 63.72378556337281
Octaaza-C60C52N81.25063 + 8324.431464923065411
C70C700.18701.0032264656764
Diaza-C70C68N20.18801.01310226924
Tetraaza-C70C66N40.4072 + 32.54361498382572
Octaaza-C70C62N80.89049 + 5214.863794193901039
C80C800.18701.093329161594
Diaza-C80C78N20.20301.06539057731221
Tetraaza-C80C76N40.3441 + 51.69450421440845
Octaaza-C80C72N80.86031 + 4212.8455303461162181
C82C820.20301.014222362214951
Diaza-C82C80N20.20301.020274547347302
Tetraaza-C82C78N40.3282 + 42.84242450425785
Octaaza-C82C74N80.98428 + 3713.97346288468175212
PorphineC20H14N40.09301.001441559

Where:

Compound – chemical structure in Fig. 3,

Formula – molecular formula,

Time (ms) – time required for kekulization of a single chemical structure in milliseconds,

No. non-existent – number of compounds from a 1000 set for which Kekulé structure was not generated,

Avg. no. iterations – average number of shuffles. This average includes the runs that reached the maximum number of shuffles (300), after which a Kekulé structure is judged not to exist,

Max. no. iterations – maximum number of shuffles required for successful kekulization,

Unmatched atoms statistic – number of calculations that ended with a given number of unmatched atoms (0 to >8).

A detailed discussion of this data can be found in [1]. In addition, model calculations were performed for 2488 fullerenes from a library by Yoshida [4]. This data is provided in Table S3. The legend for the column headers in that table is the same as for Table 2, except that the column No. non-existent is omitted because every compound from the set of 1000 had a Kekulé structure.
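The shuffle limit mentioned in the Table 2 legend can be sketched as a retry loop. This is only an illustration under stated assumptions: the procedure that the published algorithm uses to remove unmatched atoms is described in [1] and is not reproduced here, so the sketch simply repeats a whole greedy pass with a new random bond order. The names `greedy_pass` and `kekulize_with_retries` are hypothetical. Every atom is assumed to need exactly one double bond.

```python
import random

MAX_SHUFFLES = 300  # limit quoted in the Table 2 legend


def greedy_pass(n_atoms, bonds, rng):
    """One greedy alternation over a randomly ordered bond list."""
    order = list(bonds)
    rng.shuffle(order)
    used = [False] * n_atoms
    doubles = []
    for a, b in order:
        if not used[a] and not used[b]:
            used[a] = used[b] = True
            doubles.append((a, b))
    return doubles, used.count(False)


def kekulize_with_retries(n_atoms, bonds, seed=0):
    """Reshuffle until no atom is unmatched or the shuffle limit is reached.

    Returns (double bonds, shuffles used), or (None, MAX_SHUFFLES) when the
    structure is judged to have no Kekulé structure.
    """
    rng = random.Random(seed)
    for shuffle in range(1, MAX_SHUFFLES + 1):
        doubles, unmatched = greedy_pass(n_atoms, bonds, rng)
        if unmatched == 0:
            return doubles, shuffle
    return None, MAX_SHUFFLES


# Six-membered ring: a Kekulé structure exists.
BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
# Abstract graph with one centre and three leaves: no full pairing exists.
STAR = [(0, 1), (0, 2), (0, 3)]

print(kekulize_with_retries(6, BENZENE))
print(kekulize_with_retries(4, STAR))
```

In this sketch a failure after 300 shuffles is a probabilistic verdict, which is consistent with Table 4 listing separately the structures for which a backtrack algorithm later found a Kekulé representation.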

Graphynes and graphyne nanotubes

We studied graphynes GY1 and GY7 (Fig. 4) and graphyne nanotubes of various degrees of polymerization. Graphyne nanotubes were generated by replacing hydrogen atoms in GY7 with carbon atoms and adding a bond between these atoms in a vertical position.
Fig. 4

Monomer units of graphynes GY1 (A) and GY7 (B). Only the upper half of the GY7 monomer unit is shown. The entire monomer can be visualized by connecting the upper part with its reflection relative to the zig-zag line, using two aromatic bonds.

The times required for kekulization of graphynes and graphyne nanotubes of various degrees of polymerization are shown in Table 3.
Table 3

Kekulization of graphynes and graphyne nanotubes of various degrees of polymerization.

| Chain length | Tube formula | Tube time (ms) | GY1 formula | GY1 time (ms) | GY7 formula | GY7 time (ms) |
|---|---|---|---|---|---|---|
| 1 | C192 H72 | 0.5 | C136 H74 | 0.45 | C188 H86 | 0.39 |
| 10 | C1920 H72 | 2.06 | C1360 H146 | 1.36 | C1880 H122 | 2.44 |
| 100 | C19200 H72 | 29.8 | C13600 H866 | 13.7 | C18800 H482 | 25.3 |
| 1000 | C192000 H72 | 332 | C136000 H8066 | 176 | C188000 H4082 | 358 |
| 10,000 | C1920000 H72 | 7201 | C1360000 H80066 | 4811 | C1880000 H40082 | 6414 |
| 100,000 | C19200000 H72 | 59,515 | C13600000 H800066 | 75,938 | C18800000 H400082 | 82,032 |
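A quick arithmetic check on the graphyne nanotube row of Table 3 shows how the time per atom changes with chain length. The Python sketch below takes the atom counts and times from the table as transcribed here; the function name `microseconds_per_atom` is hypothetical.

```python
# (chain length, carbon atoms, hydrogen atoms, time in ms) for the Tube row of Table 3
TUBE_ROWS = [
    (1, 192, 72, 0.5),
    (10, 1920, 72, 2.06),
    (100, 19200, 72, 29.8),
    (1000, 192000, 72, 332.0),
    (10000, 1920000, 72, 7201.0),
    (100000, 19200000, 72, 59515.0),
]


def microseconds_per_atom(n_carbon, n_hydrogen, time_ms):
    """Kekulization time divided by the total number of atoms, in microseconds."""
    return time_ms * 1000.0 / (n_carbon + n_hydrogen)


for chain_length, n_c, n_h, time_ms in TUBE_ROWS:
    per_atom = microseconds_per_atom(n_c, n_h, time_ms)
    print(f"{chain_length:>7} {n_c + n_h:>10} atoms {per_atom:.2f} us/atom")
```

Under this reading of the table, the cost stays within a few microseconds per atom across five orders of magnitude in size, and the largest tube (about 1.9 × 10^7 atoms) finishes in just under one minute, in line with the claim in the abstract.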

Polycyclopentadienes

Polycyclopentadienes (Fig. 5) are remarkable because they contain odd-sized cycles and can be easily generated as long-chain polymers.
Fig. 5

Structural repeating unit of polycyclopentadiene.

We studied three types of polycyclopentadienes. In the first type, after generation of the structure, the free valences of the end-group carbon atoms marked with arrows were replaced with hydrogens, resulting in a methylene end group. In the second type, the free valences in the left-hand end group were combined with those in the right-hand end group to form a cycle. In the third type, the end groups were combined in a crisscross fashion to form a Möbius loop. The results of the calculations are shown in Table 4.
Table 4

Results of generation of Kekulé structures for polycyclopentadienes.

Columns: Chain length, Formula, Time (ms), No. non-existent, Avg. no. iterations, Max. no. iterations, Unmatched atoms statistic (0, 2, 4, 6)
Linear
1C150 H540.530 + 01.9313372607210
10C1500H5044.080 + 04.632574725030
10C1498H504N235.20 + 10(1)22.9264477431811
10C1496H504N465.754 + 91(71)68.930026239321031
10C1492H504N818987 + 348(261)16329950183269247
100C15000H500467.70 + 05.603685214620
1000C150000H5000411920 + 05.614785015000
10000C1500000 H500004473540 + 07.5733891100
100000C15000000 H50000045054010 + 06.5227300
Cyclic
1C150H500.340 + 01.235289681300
10C1500H5002.770 + 02.8917624332440
100C15000H500033.60 + 02.9028622331470
1000C150000H500005850 + 02.8014634335310
10000C1500000 H500000163680 + 03.0912692920
100000C15000000 H50000001938590 + 02.267210
Möbius
1C150H500.360 + 01.2641896871213
10C1500H5003.670 + 03.6629544382731
100C15000H500052.20 + 04.4127616332511
1000C150000H5000013810 + 04.4229602350480
10000C1500000 H500000241470 + 03.7526524260
100000C15000000 H50000006043590 + 06.9288020
The legend for the column headers is the same as for Table 2. Parenthetical values in the column No. non-existent are the counts of structures for which Kekulé representations were found using a backtrack algorithm [5]. The calculation statistics for polycyclopentadienes differ from those for the other compounds studied. Specifically, 100 calculations were performed for structures with 10^6 to 10^7 atoms and 10 calculations for structures with more than 10^7 atoms. The number of calculations was increased because of the probabilistic nature of the algorithm for polycyclopentadienes, which requires multiple initial approximations to generate Kekulé structures.
Specifications table

Subject area: Chemistry (General)
More specific subject area: Kekulization of large chemical structures
Type of data: Tables, text file, figures, computer-readable chemical structures
How data was acquired: Calculation using Notebook MSI GE 60 2PG Apache
Data format: *.mol, *.xyz and *.cc1 files for chemical structures
Experimental factors: 2.8 GHz i7 processor, 16 GB RAM, Windows 2012 server
Experimental features: Single-thread application; Win API method GetTickCount() used for time measurements
Data source location: Article text for source code and calculation statistics, Attachment1.zip for compiled application, Attachment2.zip for chemical structures
Data accessibility: Downloadable .zip
Related research article: Trepalin S., Gurke S., Akhukov M., Knizhnik A., Potapkin B., A fast approximate algorithm for determining bond orders in large polycyclic structures, Journal of Molecular Graphics and Modelling, vol. 86 (2019), pp. 52-65
  2 in total

1.  A Java chemical structure editor supporting the Modular Chemical Descriptor Language (MCDL).

Authors:  Sergei V Trepalin; Alexander V Yarkov; Igor V Pletnev; Andrei A Gakh
Journal:  Molecules       Date:  2006-03-29       Impact factor: 4.411

2.  A fast approximate algorithm for determining bond orders in large polycyclic structures.

Authors:  Sergey Trepalin; Sasha Gurke; Mikhail Akhukov; Andrey Knizhnik; Boris Potapkin
Journal:  J Mol Graph Model       Date:  2018-10-11       Impact factor: 2.518
