Sergey Trepalin¹, Sasha Gurke², Mikhail Akhukov³, Andrey Knizhnik³, Boris Potapkin³
Abstract
Two new features are added to existing algorithms for the kekulization of chemical structures: handling of triple and cumulene bonds in cycles, and the use of random atom sorting to remove unmatched atoms. Handling of triple and cumulene bonds enables the kekulization of graphynes and graphdiynes. Random sorting reduces the calculation time, so that the kekulization of large chemical structures containing about 10⁷ atoms takes ≤1 min on a typical PC. The source code (Pascal, GNU GPL license) is included, along with a compiled application (Windows, 64-bit). Calculation times and unmatched atom statistics are provided for graphenes, graphynes, nanotubes, graphyne nanotubes and fullerenes. Benchmark comparisons are made for some of the data.
Year: 2018 PMID: 30510977 PMCID: PMC6258249 DOI: 10.1016/j.dib.2018.10.128
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
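The handling of triple and cumulene bonds mentioned in the abstract can be pictured with a minimal Python sketch. It assumes a hypothetical input format: an adjacency dict of the σ skeleton plus, for every atom, the number of extra bond orders it still needs (assumed 1 for an sp² carbon, 2 for an sp carbon of a triple bond or cumulene, 0 for a saturated atom). Extra orders are placed greedily over atoms taken in random order, at most two per bond, and the attempt is repeated with a new order on failure. The name `assign_extra_bonds` and these conventions are assumptions of the sketch; it is not the authors' Pascal implementation and it omits their step for removing unmatched atoms.

```python
import random
from collections import defaultdict


def assign_extra_bonds(neighbors, need, max_shuffles=300, seed=None):
    """Greedily distribute extra bond orders over a sigma skeleton (sketch).

    neighbors: dict atom -> list of adjacent atoms (every atom must appear)
    need:      dict atom -> extra bond orders still required (assumed: 1 for
               an sp2 carbon, 2 for an sp carbon of a triple bond or
               cumulene, 0 for a saturated atom); must cover every atom
    Returns {(a, b): extra_order} with a < b, or None if no complete
    assignment was found within max_shuffles random atom orders.
    """
    rng = random.Random(seed)
    atoms = list(neighbors)
    for _ in range(max_shuffles):
        left = dict(need)            # remaining deficit per atom
        extra = defaultdict(int)     # extra order already placed on each bond
        rng.shuffle(atoms)           # random atom sorting for this attempt
        for a in atoms:
            while left[a] > 0:
                # Partners that still need a bond; at most 2 extra orders
                # per bond, i.e. nothing beyond a triple bond.
                cand = [b for b in neighbors[a]
                        if left[b] > 0 and extra[tuple(sorted((a, b)))] < 2]
                if not cand:
                    break            # atom a stays (partly) unmatched
                b = rng.choice(cand)
                extra[tuple(sorted((a, b)))] += 1
                left[a] -= 1
                left[b] -= 1
        if all(v == 0 for v in left.values()):
            return {bond: k for bond, k in extra.items() if k > 0}
    return None


# Ethyne skeleton HC≡CH (hydrogens omitted): both sp carbons need 2 extra orders.
print(assign_extra_bonds({0: [1], 1: [0]}, {0: 2, 1: 2}, seed=1))  # {(0, 1): 2}
# Propadiene H2C=C=CH2: the central cumulene carbon splits its 2 orders over 2 bonds.
print(assign_extra_bonds({0: [1], 1: [0, 2], 2: [1]}, {0: 1, 1: 2, 2: 1}, seed=1))
```

In this representation a triple bond is simply a bond carrying two extra orders, while a cumulene carbon spreads its two extra orders over two different bonds, so both cases fit the same greedy loop.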
Fig. 1. The pattern for generation of C[12,12] nanotubes.
Time required for generation of Kekulé structures and unmatched atom statistics for nanotubes and graphenes.
| Chain length | Formula | Time (ms) | Unmatched atoms statistics | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30 | 32 | 34 | 36 | 38 | 40 | 42 |
| Nanotubes | |||||||||||||||||||||||
| 1 | C144 H48 | 0.33 | 32 | 351 | 510 | 107 | |||||||||||||||||
| 10 | C1440 H48 | 2.42 | 1 | 28 | 93 | 111 | 146 | 232 | 215 | 113 | 9 | 3 | |||||||||||
| 100 | C14400 H48 | 28.1 | 5 | 12 | 15 | 36 | 119 | 270 | |||||||||||||||
| 1000 | C144000 H48 | 407.7 | 1 | 1 | 1 | 12 | 91 | 281 | 332 | 227 | 51 | 3 | |||||||||||
| 10000 | C1440000 H48 | 8123.3 | 6 | 30 | 34 | 23 | 7 | ||||||||||||||||
| 100000 | C14400000 H48 | 108437 | 1 | ||||||||||||||||||||
| Graphenes | |||||||||||||||||||||||
| 1 | C144H38 | 0.16 | 31 | 340 | 442 | 182 | 5 | ||||||||||||||||
| 10 | C1440 H146 | 2.09 | 9 | 42 | 97 | 140 | 166 | 199 | 167 | 107 | 58 | 15 | |||||||||||
| 100 | C14400 H1226 | 28.5 | 6 | 14 | 12 | 20 | 24 | 77 | 165 | 255 | 248 | 138 | 37 | 4 | |||||||||
| 1000 | C144000 H12026 | 451.8 | 2 | 2 | 2 | 1 | 5 | 7 | 36 | 168 | 421 | 288 | 65 | 3 | |||||||||
| 10000 | C1440000 H120026 | 8497.8 | 1 | 11 | 37 | 42 | 9 | ||||||||||||||||
| 100000 | C14400000 H1200026 | 118297 | 1 | ||||||||||||||||||||
| Data for some compounds from | |||||||||||||||||||||||
| tube-980 | C980H26 | 1.36 | 17 | 75 | 164 | 312 | 246 | 143 | 39 | 4 | |||||||||||||
| sheet-1800 | C1800H178 | 2.19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||||||
| sheet-19602 | C19602H592 | 28.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||||||
Where:
Chain length – number of monomeric units in a polymer,
Formula – molecular formula,
Time (ms) – time required for kekulization of a single chemical structure in milliseconds,
Unmatched atoms statistics – number of calculations with a given number of unmatched atoms (2–42).
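The unmatched atoms statistics can be illustrated with a short Python sketch that runs one greedy matching pass over the atoms in random order and tallies how many atoms are left without a double-bond partner. It assumes every atom needs exactly one double bond and uses plain greedy matching; the function names are hypothetical, and the counts it produces are not expected to reproduce the table, since the published algorithm differs in detail.

```python
import random
from collections import Counter


def unmatched_after_greedy(neighbors, rng):
    """Run one greedy matching pass over the atoms in random order.

    Assumes every atom in `neighbors` needs exactly one double bond.
    Returns the number of atoms left without a partner.
    """
    order = list(neighbors)
    rng.shuffle(order)
    partner = {}
    for a in order:
        if a in partner:
            continue
        free = [b for b in neighbors[a] if b not in partner]
        if free:
            b = rng.choice(free)
            partner[a] = b
            partner[b] = a
    return len(neighbors) - len(partner)


def unmatched_statistics(neighbors, runs=1000, seed=0):
    """Histogram {number of unmatched atoms: number of calculations}."""
    rng = random.Random(seed)
    return Counter(unmatched_after_greedy(neighbors, rng) for _ in range(runs))


# Naphthalene carbon skeleton: 10-membered perimeter plus the 4-9 fusion bond.
naphthalene = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
naphthalene[4].append(9)
naphthalene[9].append(4)
print(sorted(unmatched_statistics(naphthalene).items()))
```

In this sketch matched atoms always come in pairs, so a structure with an even number of atoms that all need a double bond can only leave an even number of atoms unmatched, which is consistent with the even-numbered statistics columns.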
Fig. 2. The pattern for generation of graphenes.
Fig. 3. Fullerenes and porphine used for model calculations.
Time required for generation of Kekulé structures, and statistics of unmatched atoms before their removal, for fullerenes, their aza-analogs and porphine. The data are for 1000 calculations.
| Compound | Formula | Time (ms) | No. non-existent | Avg. no. iterations | Max. no. iterations | Unmatched atoms statistic | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | | 0 | 2 | 4 | 6 | 8 | >8 |
| C20 | C20 | 0.078 | 0 | 1.00 | 1 | 699 | 301 | ||||
| Diaza-C20 | C18N2 | 0.062 | 0 | 1.03 | 3 | 409 | 583 | 8 | |||
| Tetraaza-C20 | C16N4 | 0.438 | 33 + 59 | 14.2 | 2 | 499 | 493 | 8 | |||
| Octaaza-C20 | C12N8 | 0.953 | 541 + 218 | 70.6 | 1 | 769 | 222 | 9 | |||
| C60 | C60 | 0.156 | 0 | 1.00 | 1 | 329 | 600 | 68 | 3 | ||
| Diaza-C60 | C58N2 | 0.171 | 0 | 1.01 | 2 | 124 | 681 | 182 | 11 | 1 | 1 |
| Tetraaza-C60 | C56N4 | 0.234 | 4 + 6 | 3.72 | 3 | 78 | 556 | 337 | 28 | 1 | |
| Octaaza-C60 | C52N8 | 1.250 | 63 + 83 | 24.4 | 3 | 146 | 492 | 306 | 54 | 1 | 1 |
| C70 | C70 | 0.187 | 0 | 1.003 | 2 | 264 | 656 | 76 | 4 | ||
| Diaza-C70 | C68N2 | 0.188 | 0 | 1.01 | 3 | 102 | 269 | 24 | |||
| Tetraaza-C70 | C66N4 | 0.407 | 2 + 3 | 2.54 | 3 | 61 | 498 | 382 | 57 | 2 | |
| Octaaza-C70 | C62N8 | 0.890 | 49 + 52 | 14.86 | 3 | 79 | 419 | 390 | 103 | 9 | |
| C80 | C80 | 0.187 | 0 | 1.093 | 3 | 291 | 615 | 94 | |||
| Diaza-C80 | C78N2 | 0.203 | 0 | 1.065 | 3 | 90 | 577 | 312 | 21 | ||
| Tetraaza-C80 | C76N4 | 0.344 | 1 + 5 | 1.69 | 4 | 50 | 421 | 440 | 84 | 5 | |
| Octaaza-C80 | C72N8 | 0.860 | 31 + 42 | 12.8 | 4 | 55 | 303 | 461 | 162 | 18 | 1 |
| C82 | C82 | 0.203 | 0 | 1.014 | 2 | 223 | 622 | 149 | 5 | 1 | |
| Diaza-C82 | C80N2 | 0.203 | 0 | 1.020 | 2 | 74 | 547 | 347 | 30 | 2 | |
| Tetraaza-C82 | C78N4 | 0.328 | 2 + 4 | 2.84 | 2 | 42 | 450 | 425 | 78 | 5 | |
| Octaaza-C82 | C74N8 | 0.984 | 28 + 37 | 13.97 | 3 | 46 | 288 | 468 | 175 | 21 | 2 |
| Porphine | C20H14N4 | 0.093 | 0 | 1.00 | 1 | 441 | 559 | ||||
Where:
Compound – chemical structure in Fig. 3,
Formula – molecular formula,
Time (ms) – time required for kekulization of a single chemical structure in milliseconds,
No. non-existent – number of compounds in the set of 1000 for which a Kekulé structure was not generated,
Avg. no. iterations – average number of shuffles; calculations that end without a Kekulé structure contribute the maximum number of shuffles (300), after which a Kekulé structure is assumed not to exist,
Max. no. iterations – maximum number of shuffles required for successful kekulization,
Unmatched atom statistics – number of calculations with a given number of unmatched atoms (0 to >8).
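The iteration columns can be illustrated with a simplified Python sketch: each shuffle is a fresh greedy matching pass in a new random atom order, a calculation stops at the first complete matching, and after 300 shuffles the structure is declared to have no Kekulé structure, with failed calculations counting 300 toward the average as the legend describes. This is a simplification under stated assumptions: the published algorithm also removes unmatched atoms without reshuffling (C20, for example, shows 301 calculations with two unmatched atoms yet an average of 1.00 iterations), the sketch reruns one fixed structure rather than a set of 1000 compounds, and the two-part non-existent counts (e.g., 33 + 59) are not modelled. All function names are hypothetical.

```python
import random


def greedy_pass(neighbors, rng):
    """One random-order greedy matching pass; the result may be incomplete."""
    order = list(neighbors)
    rng.shuffle(order)
    partner = {}
    for a in order:
        if a not in partner:
            free = [b for b in neighbors[a] if b not in partner]
            if free:
                b = rng.choice(free)
                partner[a], partner[b] = b, a
    return partner


def kekulize(neighbors, rng, max_shuffles=300):
    """Reshuffle until all atoms are matched or max_shuffles is reached.

    Returns (partner map or None, number of shuffles used).
    """
    for shuffle_no in range(1, max_shuffles + 1):
        partner = greedy_pass(neighbors, rng)
        if len(partner) == len(neighbors):
            return partner, shuffle_no
    return None, max_shuffles       # Kekulé structure assumed not to exist


def run_statistics(neighbors, runs=1000, max_shuffles=300, seed=0):
    """Return (failures, average shuffles, max shuffles of successful runs).

    As in the legend, failed runs count max_shuffles toward the average.
    """
    rng = random.Random(seed)
    failures, total, max_success = 0, 0, 0
    for _ in range(runs):
        partner, used = kekulize(neighbors, rng, max_shuffles)
        total += used
        if partner is None:
            failures += 1
        else:
            max_success = max(max_success, used)
    return failures, total / runs, max_success


benzene = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(run_statistics(benzene))
```

An odd-membered ring in which every atom needs a double bond always fails, so every calculation contributes 300 to the average while the maximum over successful calculations stays at 0.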
Fig. 4. Monomer units of graphynes GY1 (A) and GY7 (B). Only the upper half of the GY7 monomer unit is shown; the entire monomer can be visualized by connecting the upper half to its mirror image across the zig-zag line with two aromatic bonds.
Kekulization of graphynes and graphyne nanotubes of various degrees of polymerization.
| Chain length | 1 | | 10 | | 100 | | 1000 | | 10,000 | | 100,000 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Formula | Time (ms) | Formula | Time (ms) | Formula | Time (ms) | Formula | Time (ms) | Formula | Time (ms) | Formula | Time (ms) |
| Tube | C192 H72 | 0.5 | C1920 H72 | 2.06 | C19200 H72 | 29.8 | C192000H72 | 332 | C1920000 H72 | 7201 | C19200000 H72 | 59,515 |
| GY1 | C136 H74 | 0.45 | C1360 H146 | 1.36 | C13600 H866 | 13.7 | C136000 H8066 | 176 | C1360000 H80066 | 4811 | C13600000 H800066 | 75,938 |
| GY7 | C188 H86 | 0.39 | C1880 H122 | 2.44 | C18800 H482 | 25.3 | C188000 H4082 | 358 | C1880000 H40082 | 6414 | C18800000 H400082 | 82,032 |
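To compare the graphyne timings on a common scale, each time can be divided by the number of carbon atoms in the corresponding formula (192, 136 and 188 carbons per unit for the tube, GY1 and GY7, multiplied by the chain length). The short Python sketch below performs only this arithmetic on the values in the table; counting carbons and ignoring hydrogens is an assumption made for simplicity. Across chain lengths from 1 to 100,000 the resulting cost stays between roughly 1 and 6 µs per carbon atom.

```python
# Times (ms) copied from the graphyne table; index i is chain length 10**i.
times_ms = {
    "Tube": [0.5, 2.06, 29.8, 332, 7201, 59515],
    "GY1": [0.45, 1.36, 13.7, 176, 4811, 75938],
    "GY7": [0.39, 2.44, 25.3, 358, 6414, 82032],
}
# Carbon atoms per monomer unit, read off the formulas for chain length 1
# (hydrogen atoms are ignored in this sketch).
carbons_per_unit = {"Tube": 192, "GY1": 136, "GY7": 188}


def microseconds_per_carbon(name):
    """Time per carbon atom (µs) for chain lengths 1, 10, ..., 100000."""
    unit = carbons_per_unit[name]
    return [t * 1000 / (unit * 10 ** i) for i, t in enumerate(times_ms[name])]


for name in times_ms:
    print(name, [round(x, 2) for x in microseconds_per_carbon(name)])
```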
Fig. 5. Structural repeating unit of polycyclopentadiene.
Results of generation of Kekulé structures for polycyclopentadienes.
| Chain length | Formula | Time (ms) | No. non-existent | Avg. no. iterations | Max. no. iterations | Unmatched atoms statistic | |||
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | 0 | 2 | 4 | 6 |
| Linear | |||||||||
| 1 | C150 H54 | 0.53 | 0 + 0 | 1.93 | 13 | 372 | 607 | 21 | 0 |
| 10 | C1500H504 | 4.08 | 0 + 0 | 4.63 | 25 | 747 | 250 | 3 | 0 |
| 10 | C1498H504N2 | 35.2 | 0 + 10(1) | 22.9 | 264 | 477 | 431 | 81 | 1 |
| 10 | C1496H504N4 | 65.75 | 4 + 91(71) | 68.9 | 300 | 262 | 393 | 210 | 31 |
| 10 | C1492H504N8 | 189 | 87 + 348(261) | 163 | 299 | 50 | 183 | 269 | 247 |
| 100 | C15000H5004 | 67.7 | 0 + 0 | 5.60 | 36 | 852 | 146 | 2 | 0 |
| 1000 | C150000H50004 | 1192 | 0 + 0 | 5.61 | 47 | 850 | 150 | 0 | 0 |
| 10000 | C1500000 H500004 | 47354 | 0 + 0 | 7.57 | 33 | 89 | 11 | 0 | 0 |
| 100000 | C15000000 H5000004 | 505401 | 0 + 0 | 6.5 | 22 | 7 | 3 | 0 | 0 |
| Cyclic | |||||||||
| 1 | C150H50 | 0.34 | 0 + 0 | 1.23 | 5 | 289 | 681 | 30 | 0 |
| 10 | C1500H500 | 2.77 | 0 + 0 | 2.89 | 17 | 624 | 332 | 44 | 0 |
| 100 | C15000H5000 | 33.6 | 0 + 0 | 2.90 | 28 | 622 | 331 | 47 | 0 |
| 1000 | C150000H50000 | 585 | 0 + 0 | 2.80 | 14 | 634 | 335 | 31 | 0 |
| 10000 | C1500000 H500000 | 16368 | 0 + 0 | 3.09 | 12 | 69 | 29 | 2 | 0 |
| 100000 | C15000000 H5000000 | 193859 | 0 + 0 | 2.2 | 6 | 7 | 2 | 1 | 0 |
| Möbius | |||||||||
| 1 | C150H50 | 0.36 | 0 + 0 | 1.26 | 4 | 189 | 687 | 121 | 3 |
| 10 | C1500H500 | 3.67 | 0 + 0 | 3.66 | 29 | 544 | 382 | 73 | 1 |
| 100 | C15000H5000 | 52.2 | 0 + 0 | 4.41 | 27 | 616 | 332 | 51 | 1 |
| 1000 | C150000H50000 | 1381 | 0 + 0 | 4.42 | 29 | 602 | 350 | 48 | 0 |
| 10000 | C1500000 H500000 | 24147 | 0 + 0 | 3.75 | 26 | 52 | 42 | 6 | 0 |
| 100000 | C15000000 H5000000 | 604359 | 0 + 0 | 6.9 | 28 | 8 | 0 | 2 | 0 |