Abstract
BACKGROUND: Although programming in a type-safe and referentially transparent style offers several advantages over working with mutable data structures and side effects, this style of programming has not seen much use in chemistry-related software. Since functional programming languages were designed with referential transparency in mind, they offer strong support for writing immutable data structures and side-effect-free code. We therefore started implementing our own toolkit based on these programming paradigms in a modern, versatile programming language.
Year: 2012 PMID: 23253942 PMCID: PMC3660204 DOI: 10.1186/1758-2946-4-38
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Output when parsing valid SMILES strings
| LGraph: | LGraph: | LGraph: |
| 0: CH3 | 0: CH(+1) | 0: NH3(+1) |
| 1: CH2 | 1: CH | 1: CH(@) |
| 2: OH | 2: CH | 2: CH3 |
| | | 3: C |
| 0 - 1: - | 0 - 1: : | 4: O |
| 1 - 2: - | 0 - 2: : | 5: O(-1) |
| | 1 - 2: : | |
| | | 0 - 1: - |
| | | 1 - 2: - |
| | | 1 - 3: - |
| | | 3 - 4: = |
| | | 3 - 5: - |
Molecular graphs are displayed as lists of atoms followed by lists of bonds. Each bond shows the indices of the connected atoms followed by the SMILES symbol representing the type of the bond.
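The output format above (atoms with implicit hydrogens, charge and chirality, then bonds as index pairs plus a SMILES bond symbol) maps naturally onto an immutable data structure. The following is a minimal sketch under that assumption; the names `Atom`, `Bond`, `MolGraph`, `label`, `render` and `addBond` are hypothetical illustrations and not the toolkit's actual API.

```scala
// Hypothetical, simplified types for illustration; not the toolkit's actual API.
final case class Atom(symbol: String, hydrogens: Int = 0, charge: Int = 0, chirality: String = "") {
  // Builds labels such as "CH3", "NH3(+1)", "CH(@)" or "O(-1)".
  def label: String = {
    val h = hydrogens match {
      case 0 => ""
      case 1 => "H"
      case n => s"H$n"
    }
    val chargeTag = if (charge != 0) f"($charge%+d)" else ""
    val chiralTag = if (chirality.nonEmpty) s"($chirality)" else ""
    symbol + h + chargeTag + chiralTag
  }
}

// A bond stores the indices of its two atoms and the SMILES bond symbol ("-", "=", "#", ":").
final case class Bond(a: Int, b: Int, symbol: String)

// Immutable graph: "modifying" it returns a new MolGraph via copy; the original is untouched.
final case class MolGraph(atoms: Vector[Atom], bonds: Vector[Bond]) {
  def addBond(bond: Bond): MolGraph = copy(bonds = bonds :+ bond)

  // Atoms first, then a blank line, then bonds, mimicking the layout of the table.
  def render: String = {
    val atomLines = atoms.zipWithIndex.map { case (at, i) => s"$i: ${at.label}" }
    val bondLines = bonds.map(b => s"${b.a} - ${b.b}: ${b.symbol}")
    (("LGraph:" +: atomLines) ++ ("" +: bondLines)).mkString("\n")
  }
}

object GraphDemo {
  // Ethanol (CCO), corresponding to the first column of the table.
  val ethanol = MolGraph(
    Vector(Atom("C", 3), Atom("C", 2), Atom("O", 1)),
    Vector(Bond(0, 1, "-"), Bond(1, 2, "-")))

  def main(args: Array[String]): Unit = println(ethanol.render)
}
```

Because the graph is a case class over `Vector`s, sharing it between threads needs no locking; every change yields a fresh value.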
Output when parsing invalid SMILES strings
| SMILES | Error message |
| CCuCC | Pos. 3 in CCuCC: Unknown character in SMILES-String: u |
| C%12CCCC%1 | Pos. 11 in C%12CCCC%1: % is not followed by two digits |
| C#OC | Invalid bond set for element O: Triple,Single |
In the last example no position is given, since this failure did not happen during parsing but during implicit hydrogen detection, which is a separate algorithm and therefore has no notion of a ‘position in a string’.
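The distinction between positioned lexing errors and position-free errors from a later stage can be sketched with `Either` values instead of exceptions. This is a minimal, hypothetical illustration, not the toolkit's parser: `tokenize` only understands `C`, `N`, `O`, the bond symbols `-`, `=`, `#`, single-digit ring closures and `%nn` ring closures, and `implicitHydrogens` uses an assumed valence table (C 4, N 3, O 2) that is far simpler than real implicit-hydrogen rules. It is written to reproduce the three messages in the table, using 1-based positions.

```scala
object SmilesErrorsSketch {
  // Hypothetical token type covering only a tiny subset of SMILES.
  sealed trait Token
  final case class Element(symbol: String) extends Token
  final case class BondSym(c: Char) extends Token
  final case class RingClosure(n: Int) extends Token

  // Lexing knows where it is in the string, so its errors carry a 1-based position.
  def tokenize(s: String): Either[String, List[Token]] = {
    def fail(i: Int, msg: String) = Left(s"Pos. ${i + 1} in $s: $msg")

    @annotation.tailrec
    def go(i: Int, acc: List[Token]): Either[String, List[Token]] =
      if (i >= s.length) Right(acc.reverse)
      else s(i) match {
        case c @ ('C' | 'N' | 'O') => go(i + 1, Element(c.toString) :: acc)
        case c @ ('-' | '=' | '#') => go(i + 1, BondSym(c) :: acc)
        case d if d.isDigit        => go(i + 1, RingClosure(d.asDigit) :: acc)
        case '%' =>
          // '%' must be followed by two digits; report the first offending position
          // (which may be one past the end of the string).
          (i + 1 to i + 2).find(j => j >= s.length || !s(j).isDigit) match {
            case Some(j) => fail(j, "% is not followed by two digits")
            case None    => go(i + 3, RingClosure(s.substring(i + 1, i + 3).toInt) :: acc)
          }
        case c => fail(i, s"Unknown character in SMILES-String: $c")
      }

    go(0, Nil)
  }

  // A later stage works on atoms and bonds, not characters, so its errors carry no position.
  // Assumed valence table and bond orders, for illustration only.
  private val maxValence: Map[String, Int] = Map("C" -> 4, "N" -> 3, "O" -> 2)
  private val order: Map[Char, Int] = Map('-' -> 1, '=' -> 2, '#' -> 3)
  private val bondName: Map[Char, String] = Map('-' -> "Single", '=' -> "Double", '#' -> "Triple")

  // Returns the number of implicit hydrogens; `bonds` must only contain '-', '=' or '#'.
  def implicitHydrogens(element: String, bonds: List[Char]): Either[String, Int] =
    maxValence.get(element) match {
      case None => Left(s"Unknown element: $element")
      case Some(max) =>
        val free = max - bonds.map(order).sum
        if (free >= 0) Right(free)
        else Left(s"Invalid bond set for element $element: ${bonds.map(bondName).mkString(",")}")
    }
}
```

Here `tokenize("C#OC")` succeeds; the failure only appears when the oxygen's bond list (triple, then single) is checked, mirroring the last row of the table.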
Performance of SMILES parsing
| 1 | 17’149 | 10’295 | 7608 |
| 2 | 15’659 | 7123 | 5488 |
| 3 | 15’663 | 7248 | 5433 |
| 4 | 15’880 | 7508 | 5425 |
| 5 | 15’809 | 7669 | 5534 |
| 6 | 15’720 | 7197 | 5471 |
| 7 | 15’665 | 7174 | 5448 |
| 8 | 15’390 | 7296 | 5513 |
| 9 | 15’423 | 7687 | 5696 |
| 10 | 15’523 | 7564 | 5491 |
| Mean | 15’788 | 7676 | 5711 |
Time (in milliseconds) taken to parse a subset of the ZINC database containing about 350’000 structures on a quad-core laptop. The multi-threaded runs used all four cores, without further tuning of the settings of Scala’s parallel collections. The last row gives the mean over the ten runs.
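Parsing many independent SMILES strings parallelizes easily because each parse is a pure function of its input: evaluating the calls on several threads must give the same result as evaluating them sequentially. The caption refers to Scala's parallel collections; since those live in a separate module from Scala 2.13 on, this minimal sketch swaps in standard-library `Future`s to show the same idea. `parse` is a hypothetical placeholder for a real SMILES parser, and `parseAll` with its `chunks` parameter is likewise an assumption for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelParseSketch {
  // Placeholder for a real SMILES parser (hypothetical): rejects empty strings and
  // otherwise returns the string length. What matters here is that it is pure.
  def parse(smiles: String): Either[String, Int] =
    if (smiles.isEmpty) Left("empty SMILES string") else Right(smiles.length)

  // Split the input into roughly `chunks` slices, parse each slice in its own Future
  // on the global thread pool, and concatenate the results in input order.
  def parseAll(input: Vector[String], chunks: Int): Vector[Either[String, Int]] = {
    require(chunks > 0, "chunks must be positive")
    val size = math.max(1, (input.size + chunks - 1) / chunks)
    val work = input.grouped(size).toVector.map(slice => Future(slice.map(parse)))
    Await.result(Future.sequence(work), Duration.Inf).flatten
  }
}
```

With the parallel-collections module on the classpath, `input.par.map(parse)` expresses the same computation more compactly; either way, no locks are needed because `parse` shares no mutable state.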
Speedup and efficiency of parallelized SMILES parsing
| Threads | Time (ms) | Speedup | Efficiency |
| 1 | 9493 | 1.00 | 1.00 |
| 2 | 5309 | 1.79 | 0.89 |
| 3 | 4134 | 2.30 | 0.77 |
| 4 | 2988 | 3.18 | 0.79 |
| 5 | 2785 | 3.41 | 0.68 |
| 6 | 2647 | 3.59 | 0.60 |
| 7 | 2273 | 4.18 | 0.60 |
| 8 | 2091 | 4.54 | 0.57 |
| 9 | 2037 | 4.66 | 0.52 |
| 10 | 2061 | 4.61 | 0.46 |
| 11 | 2061 | 4.61 | 0.42 |
| 12 | 2096 | 4.53 | 0.38 |
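The test runs were performed on a hexacore processor with hyperthreading. As before, our excerpt of the ZINC database was parsed using Scala’s parallel collections. Speedup is the time in the first row divided by the time in the given row, and efficiency is the speedup divided by the value in the first column.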
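The speedup and efficiency columns follow from the measured times as S(n) = T(1) / T(n) and E(n) = S(n) / n. The short sketch below recomputes them from the times copied out of the table; `SpeedupSketch` and its methods are illustrative names only.

```scala
object SpeedupSketch {
  // Measured times from the table, indexed by n = 1..12.
  val times: Vector[Double] =
    Vector(9493, 5309, 4134, 2988, 2785, 2647, 2273, 2091, 2037, 2061, 2061, 2096).map(_.toDouble)

  // Speedup S(n) = T(1) / T(n).
  def speedup(n: Int): Double = times(0) / times(n - 1)

  // Efficiency E(n) = S(n) / n.
  def efficiency(n: Int): Double = speedup(n) / n

  def main(args: Array[String]): Unit =
    for (n <- 1 to times.size)
      println(f"$n%2d  ${times(n - 1)}%6.0f  ${speedup(n)}%.2f  ${efficiency(n)}%.2f")
}
```

Rounded to two decimals, the printed values agree with the table. They also show the speedup levelling off at about 4.6 from n = 9 onward while efficiency keeps falling.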