Masaaki Shimizu, Hiroshi Nagamochi, Tatsuya Akutsu.
Abstract
BACKGROUND: Enumeration of chemical graphs satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics, since it leads to a variety of useful applications, including structure determination of novel chemical compounds and drug design.
Year: 2011; PMID: 22373441; PMCID: PMC3287468; DOI: 10.1186/1471-2105-12-S14-S3
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Figure 1. A chemical compound and its feature vector. An illustration of a (Σ, val)-labeled multitree G and its feature vector f1(G). Notice that multiple edges with the same end-vertices are treated as one edge, where #OC = #CO = 2.
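The counting convention in the Figure 1 caption can be sketched in code. The sketch assumes that a level-K feature vector counts the label sequences of simple paths with at most K edges, with each path read in both directions (which is what the remark #OC = #CO suggests), and that bond multiplicity is ignored as the caption states. The function name `feature_vector` and the three-atom toy skeleton are hypothetical and are not taken from the paper or its figure.

```python
from collections import Counter


def feature_vector(labels, edges, K):
    """Count label sequences of simple paths with at most K edges.

    labels: dict mapping vertex id -> atom label (e.g. "C", "O")
    edges:  iterable of (u, v, multiplicity); the multiplicity is ignored,
            so a double bond counts as one edge (assumption from the caption)
    """
    adj = {v: set() for v in labels}
    for u, v, _multiplicity in edges:
        adj[u].add(v)
        adj[v].add(u)

    counts = Counter()

    def extend(path):
        # Record the label sequence of the current path.
        counts[tuple(labels[v] for v in path)] += 1
        # Grow the path while it has fewer than K edges.
        if len(path) - 1 < K:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    # Start a path at every vertex, so each path is seen in both directions.
    for start in labels:
        extend([start])
    return counts


# Hypothetical toy skeleton: one carbon bonded to two oxygens,
# one of them by a double bond.
toy_labels = {0: "C", 1: "O", 2: "O"}
toy_edges = [(0, 1, 2), (0, 2, 1)]
fv = feature_vector(toy_labels, toy_edges, K=1)
```

For this toy input, `fv[("O", "C")]` and `fv[("C", "O")]` are both 2, because each of the two C-O adjacencies is counted once per direction regardless of bond multiplicity.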
Figure 2. An instance of ETULF. An instance of ETULF with upper and lower feature vectors, which admits two different solutions.
Figure 3. A multigraph and a ρ-detachment. A multigraph G and a ρ-detachment H of G.
Figure 4. Detachment-cut. Bounding operation by detachment-cut, where vectors g(ℓ, ℓ′), g(ℓ, ℓ′), , and are defined for unordered pairs {ℓ, ℓ′}, and entries with value 0 are omitted from the tables.
Figure 5. Multiplicity-cut. An illustration of the multiplicity-cut procedure, where M = 1.
Comparison of previous method and our method
| Entry | Formula | n | K | w | f | SimEnum time (s) | SimEnum nodes | SimEnum solutions | RepEnum time (s) | RepEnum nodes | RepEnum solutions | RepEnum solved |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C00062 | C6H14N2O4 | 26 | 1 | 1 | 3^6 | 1037.04 | 177,074,686 | 414,890 | 163.32 | 44,340,488 | 414,890 | 729 |
| | | | 2 | 1 | 3^18 | 2.97 | 392,246 | 44 | T.O. | 2,381,360,000 | N.F. | 65,909,572 |
| | | | 3 | 1 | 3^34 | 1.22 | 145,213 | 2 | T.O. | 3,293,260,000 | N.F. | 96,860,588 |
| | | | 4 | 1 | 3^53 | 0.33 | 34,539 | 1 | T.O. | 2,780,050,000 | N.F. | 81,766,176 |
| | | | 5 | 1 | 3^71 | 0.24 | 20,361 | 1 | T.O. | 1,561,230,000 | N.F. | 45,918,529 |
| | | | 6 | 1 | 3^85 | 0.25 | 15,166 | 1 | T.O. | 569,590,000 | N.F. | 16,752,647 |
| | | | 7 | 1 | 3^96 | 0.18 | 14,547 | 1 | T.O. | 79,870,000 | N.F. | 2,349,117 |
| C03343 | C16H22O4 | 37 | 1 | 1 | 3^6 | T.O. | 377,260,000 | N.F. | T.O. | 413,000,000 | N.F. | 460 |
| | | | 2 | 1 | 3^18 | 7.24 | 845,760 | 25 | T.O. | 1,442,760,000 | N.F. | 70,175,902 |
| | | | 3 | 1 | 3^31 | 2.81 | 307,151 | 7 | T.O. | 3,316,970,000 | N.F. | 195,115,882 |
| | | | 4 | 1 | 3^47 | 1.03 | 99,945 | 1 | T.O. | 2,494,780,000 | N.F. | 146,751,764 |
| | | | 5 | 1 | 3^64 | 0.98 | 87,600 | 1 | T.O. | 1,050,480,000 | N.F. | 61,792,941 |
| | | | 6 | 1 | 3^82 | 0.76 | 60,194 | 1 | T.O. | 315,820,000 | N.F. | 18,577,647 |
| | | | 7 | 1 | 3^99 | 0.57 | 42,538 | 1 | T.O. | 41,450,000 | N.F. | 2,438,235 |
| C07178 | C21H28N2O5 | 46 | 1 | 1 | 3^8 | T.O. | 157,320,000 | N.F. | T.O. | 200,490,000 | N.F. | 1,388 |
| | | | 2 | 1 | 3^26 | 37.59 | 1,940,295 | 238 | T.O. | 2,911,390,000 | N.F. | 66,167,954 |
| | | | 3 | 1 | 3^48 | 1.71 | 60,792 | 3 | T.O. | 2,673,940,000 | N.F. | 60,771,363 |
| | | | 4 | 1 | 3^71 | 0.35 | 14,248 | 1 | T.O. | 1,925,490,000 | N.F. | 43,761,136 |
| | | | 5 | 1 | 3^92 | 0.27 | 10,866 | 1 | T.O. | 743,940,000 | N.F. | 16,907,727 |
| | | | 6 | 1 | 3^110 | 0.27 | 10,680 | 1 | T.O. | 93,880,000 | N.F. | 2,133,636 |
| | | | 7 | 1 | 3^125 | 0.24 | 9,276 | 1 | T.O. | 19,270,000 | N.F. | 437,954 |
| C03690 | C24H38O4 | 61 | 1 | 1 | 3^5 | T.O. | 382,470,000 | N.F. | T.O. | 552,290,000 | N.F. | 61 |
| | | | 2 | 1 | 3^16 | T.O. | 211,800,000 | N.F. | T.O. | 530,930,000 | N.F. | 10,451,912 |
| | | | 3 | 1 | 3^27 | 1395.13 | 144,244,042 | 206 | T.O. | 3,314,260,000 | N.F. | 194,956,470 |
| | | | 4 | 1 | 3^41 | 121.36 | 11,332,363 | 4 | T.O. | 2,392,530,000 | N.F. | 140,737,058 |
| | | | 5 | 1 | 3^57 | 83.70 | 6,978,557 | 2 | T.O. | 958,650,000 | N.F. | 56,391,176 |
| | | | 6 | 1 | 3^75 | 40.11 | 2,923,819 | 1 | T.O. | 298,600,000 | N.F. | 17,564,705 |
| | | | 7 | 1 | 3^92 | 16.50 | 1,096,128 | 1 | T.O. | 38,670,000 | N.F. | 2,274,705 |
Comparison of SimEnum and RepEnum for the problem ETULF.
Note:
(1) C00062, C03343, C07178, and C03690 are entries for chemical compounds in the KEGG LIGAND database;
(2) n is the number of vertices in an instance preprocessed by replacing each benzene ring with a new atom having six valences;
(3) K is the level of given feature vectors;
(4) w is the width for constructing upper and lower feature vectors;
(5) f is the number of feature vectors in a given set;
(6) “time (s)” is the CPU time in seconds;
(7) T.O. means “time over” (the time limit is 1,800 seconds);
(8) “nodes” is (the sum of) the number of nodes of family trees that are traversed;
(9) “solutions” is the number of all possible solutions;
(10) “solved” is the number of feature vectors which the algorithm RepEnum solved within the time limit; and
(11) N.F. means “not found.”
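Notes (4) and (5) can be illustrated with a small sketch. This record does not spell out how the upper and lower feature vectors are built, so the code below assumes that they are obtained by adding and subtracting w from every entry of a given feature vector (clipping at zero), and that f counts the integer vectors lying between them. The helper names and the six-entry vector `g` are hypothetical.

```python
from math import prod


def bounds_from_width(g, w):
    """Return (lower, upper) vectors around g for width w.

    Assumption: each entry is widened by w on both sides,
    and the lower bound is clipped at 0.
    """
    lower = [max(0, x - w) for x in g]
    upper = [x + w for x in g]
    return lower, upper


def count_feature_vectors(lower, upper):
    """Number of integer vectors v with lower <= v <= upper componentwise."""
    return prod(u - l + 1 for l, u in zip(lower, upper))


g = [6, 2, 2, 5, 3, 4]  # hypothetical feature vector with six entries
lower, upper = bounds_from_width(g, w=1)
f = count_feature_vectors(lower, upper)
```

Under these assumptions, a vector with d entries that are all at least w yields f = (2w + 1)^d. For w = 1 and d = 6 this gives 729, which would be consistent with the 729 feature vectors reported as solved by RepEnum in the first row for C00062.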
Comparison of varying width
| Entry | Formula | n | K | w | SimEnum time (s) | SimEnum nodes | SimEnum solutions |
|---|---|---|---|---|---|---|---|
| C00062 | C6H14N2O4 | 26 | 2 | 0 | 0.51 | 55,196 | 6 |
| | | | 2 | 1 | 3.58 | 400,501 | 44 |
| | | | 2 | 2 | 7.58 | 835,509 | 503 |
| | | | 2 | 3 | 10.84 | 1,163,548 | 2,351 |
| | | | 2 | 4 | 12.55 | 1,349,057 | 5,430 |
| | | | 2 | 5 | 13.29 | 1,431,075 | 9,852 |
| | | | 2 | 50 | 14.31 | 1,537,496 | 25,425 |
| C03343 | C16H22O4 | 37 | 2 | 0 | 0.34 | 35,952 | 9 |
| | | | 2 | 1 | 8.39 | 845,760 | 25 |
| | | | 2 | 2 | 48.27 | 4,815,369 | 41 |
| | | | 2 | 3 | 149.83 | 14,781,738 | 305 |
| | | | 2 | 4 | 377.01 | 37,435,878 | 40,732 |
| | | | 2 | 5 | 639.68 | 63,459,180 | 106,870 |
| | | | 2 | 50 | 1118.75 | 110,703,034 | 510,079 |
| C07178 | C21H28N2O5 | 46 | 2 | 0 | 2.33 | 111,781 | 16 |
| | | | 2 | 1 | 46.81 | 2,246,578 | 238 |
| | | | 2 | 2 | 96.52 | 4,715,072 | 1,375 |
| | | | 2 | 3 | 152.18 | 7,420,060 | 6,824 |
| | | | 2 | 4 | 179.42 | 8,744,563 | 19,180 |
| | | | 2 | 5 | 199.66 | 9,677,513 | 29,891 |
| | | | 2 | 50 | 255.01 | 12,292,587 | 54,861 |
| C03690 | C24H38O4 | 61 | 5 | 0 | 19.50 | 1,482,017 | 2 |
| | | | 5 | 1 | 220.14 | 16,063,569 | 5 |
| | | | 5 | 2 | 439.12 | 33,037,741 | 32 |
| | | | 5 | 3 | 684.88 | 52,207,745 | 178 |
| | | | 5 | 4 | 1024.96 | 78,509,554 | 349 |
| | | | 5 | 5 | 1285.55 | 98,762,291 | 615 |
| | | | 5 | 50 | T.O. | 136,835,134 | N.F. |
Comparison of the performance for varying w for the problem ETULF.