José R Valdés-Martiní, Yovani Marrero-Ponce, César R García-Jacas, Karina Martinez-Mayorga, Stephen J Barigye, Yasser Silveira Vaz d'Almeida, Hai Pham-The, Facundo Pérez-Giménez, Carlos A Morell.
Abstract
BACKGROUND: In previous reports, Marrero-Ponce et al. proposed algebraic formalisms for characterizing topological (2D) and chiral (2.5D) molecular features through atom- and bond-based ToMoCoMD-CARDD (acronym for Topological Molecular Computational Design-Computer Aided Rational Drug Design) molecular descriptors (MDs). These MDs codify molecular information using bilinear, quadratic and linear algebraic forms together with the graph-theoretical electronic-density and edge-adjacency matrices, which account for atom- and bond-based relations, respectively. They have been applied successfully in the screening of chemical compounds for several therapeutic applications, including antimalarials, antibacterials and tyrosinase inhibitors. A computational program of the same name was initially developed to compute these MDs. However, this in-house software offered few of the functionalities required in contemporary molecular modeling tasks, and its inherent limitations made it impractical to use. The present manuscript therefore introduces the QuBiLS-MAS (acronym for Quadratic, Bilinear and N-Linear mapS based on graph-theoretic electronic-density Matrices and Atomic weightingS) software, designed to compute topological (0-2.5D) molecular descriptors based on bilinear, quadratic and linear algebraic forms for atom- and bond-based relations.
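As a rough illustration of what "linear, quadratic and bilinear algebraic forms" over a molecular graph matrix can look like, the sketch below evaluates the three forms on the k-th power of an adjacency matrix. It is a minimal sketch under stated assumptions, not the software's actual definition: the 4-atom chain, the weighting vectors `x` and `y`, and the function names are hypothetical, and the normalisation, local-fragment and aggregation steps of QuBiLS-MAS are not reproduced.

```python
import numpy as np

# Hypothetical 4-atom chain: A[i, j] = 1 if atoms i and j are bonded.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Two illustrative atom-weighting vectors (placeholder numbers,
# not values taken from the software's property tables).
x = np.array([1.0, 2.0, 2.0, 1.0])
y = np.array([0.5, 1.0, 1.0, 0.5])


def linear_form(M, x, k):
    """u^T M^k x, with u a vector of ones (sum of the entries of M^k x)."""
    u = np.ones(len(x))
    return float(u @ np.linalg.matrix_power(M, k) @ x)


def quadratic_form(M, x, k):
    """x^T M^k x: the same weighting vector on both sides."""
    return float(x @ np.linalg.matrix_power(M, k) @ x)


def bilinear_form(M, x, y, k):
    """x^T M^k y: two different weighting vectors."""
    return float(x @ np.linalg.matrix_power(M, k) @ y)


for k in range(3):
    print(k, linear_form(A, x, k), quadratic_form(A, x, k), bilinear_form(A, x, y, k))
```

With k = 0 the matrix power is the identity, so the forms reduce to plain sums and dot products of the weighting vectors; higher k brings in contributions from atoms connected by walks of length k.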
Keywords: Atom/bond-based molecular descriptor; Bilinear and quadratic indices; Double stochastic; Free and open source software; Linear; Mutual probability matrices; Non-stochastic; QSAR; QuBiLS-MAS; Simple stochastic; ToMoCoMD-CARDD
Year: 2017 PMID: 29086120 PMCID: PMC5462671 DOI: 10.1186/s13321-017-0211-5
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
The molecular structure of isonicotinic acid and its atom adjacency non-stochastic (ns) and stochastic (ss) matrices for k = 0, 1, 2
Schema 1. The stages involved in the computation of the NS-, SS-, DS-, and MP-pseudograph-theoretical electronic-density matrices
The molecular structure of isonicotinic acid considering lone-pair electrons (n), and the first and second powers of the molecular pseudograph’s atom adjacency non-stochastic (ns), simple stochastic (ss), double stochastic (ds) and mutual probability (mp) matrices
The zero, first and second powers of the molecular pseudograph’s atom adjacency double stochastic and mutual probability matrices for isonicotinic acid
First-, second- and third-order NS matrices for isonicotinic acid, obtained by applying three types of topological constraints (cut-offs): Self-Returning Walks (SRW), Non-Self-Returning Walks (NSRW) and a topological path cut-off distance from 2 to 5 (LAG [2–5])
Schema 2. Workflow followed in the computation of the ToMoCoMD-CARDD QuBiLS-MAS MDs
Fig. 1. Main graphical user interface of the QuBiLS-MAS software (a) and dialog windows to configure the following parameters: invariants or aggregation operators (b), atom properties (c) and local-fragment chemical groups (d)
Comparison between the old software (TOMOCOMD) and the new one proposed in this report (QuBiLS-MAS)
| Features | TOMOCOMD | QuBiLS-MAS |
|---|---|---|
| Theoretical | | |
| Algebraic form maps | 3 (quadratic, bilinear and linear) | |
| Atom and Bond level | Yes | Yes |
| Matrices | 2 (NS, SS) | 4 (NS, SS, DS, MP) |
| Atom Weightings | 4 (M, V, P, E) | 10 (M, V, P, E, A, C, PSA, R, H, S) |
| Local-fragments | 3 (D, G, X) | 7 (A, C, D, G, M, P, X) |
| Chirality | YES, | YES, extended to |
| Lone-pair electrons | – | Yes |
| Topological constraints | – | Yes, three cut-off types (SRW, NSRW, Lag P) |
| H-atoms consideration | – | Yes, permits inclusion or removal |
| Invariants or aggregation operators | – | Yes, 21 aggregation operators classified in four major groups |
| Computational | | |
| Open source | – | Yes, under LGPL |
| Availability | Shareware | Freeware |
| Programming language | Borland Delphi | Java |
| Clear Object-oriented source code design | – | Yes |
| Canonical namespace packages structure | – | Yes, under |
| Target operating system (OS) | Microsoft Windows | Platform-independent |
| Graphical user interface | Yes | Yes |
| Command line | – | Yes |
| Portable MDs library | – | Yes, as pre-compiled Java |
| Supported input format | In-house file format | MDL MOL/SDF |
| Output format | Text File (TSV) | Text File (TSV, SSV, CSV), Weka (ARFF) |
| Structure curation and cleaning | – | Yes, available under |
| Built-in example data | – | Yes, six chemical datasets |
| Unique MD header | – | Yes, identifying the codification scheme |
| Batch Processing mode | – | Yes |
| Parallelized computing | – | Yes, using the Fork/Join framework |
| Configurable projects | – | Yes |
| Import/export configuration | – | Yes, using an XML file format |
| Calculation progress | – | Yes, for descriptors and molecules |
| Real-time memory monitor | – | Yes, with garbage collection option when desired |
| Events logging | – | Yes, accessible through the |
| Calculation report | – | Yes |
| Runtime help accessibility | – | Yes |
| User's manual | – | Yes |
| Online webpage | – | Yes |
Matrices: non-stochastic (NS), simple stochastic (SS), double stochastic (DS) and mutual probability (MP). Atom weightings (atomic properties): (1) atomic mass (M), (2) van der Waals volume (V), (3) atomic polarizability (P), (4) atomic electronegativity according to the Pauling scale (E), (5) atomic Ghose–Crippen LogP (A), (6) atomic Gasteiger–Marsili charge (C), (7) atomic polar surface area (PSA), (8) atomic refractivity (R), (9) atomic hardness (H), and (10) atomic softness (S). Local-fragments (atom-type and/or group-type): H-bond acceptors (A), carbon atoms in aliphatic chains (C), H-bond donors (D), halogens (G), terminal methyl groups (M), carbon atoms in an aromatic portion (P) and heteroatoms (X). Chirality: trigonometric 3D-chirality correction factor (). Topological constraints (cut-offs): (1) keeping only the diagonal elements of the matrix, denoted as “Self-Returning Walks” (SRW), (2) keeping only the off-diagonal elements of the matrix, denoted as “Non-Self-Returning Walks” (NSRW), and (3) keeping only the elements within a given interval, based on the topological distance for a path cut-off, denoted as Lag p
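The matrix formalisms and cut-offs listed in this footnote can be sketched in a few lines. The three cut-offs follow the descriptions above directly. The SS, DS and MP normalisations are assumptions based on the usual meaning of the terms (rows summing to one, rows and columns summing to one via Sinkhorn-Knopp scaling, and all entries summing to one), since the record does not spell them out. All function names and the 4-atom chain are hypothetical.

```python
import numpy as np


def non_stochastic(A, k):
    """k-th power of the adjacency matrix (NS); k = 0 gives the identity."""
    return np.linalg.matrix_power(np.asarray(A, dtype=float), k)


def simple_stochastic(M):
    """Assumed SS scheme: divide each row by its sum (all-zero rows stay zero)."""
    M = np.asarray(M, dtype=float)
    row_sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums != 0)


def double_stochastic(M, iterations=200):
    """Assumed DS scheme: alternate row and column scaling (Sinkhorn-Knopp)."""
    D = np.asarray(M, dtype=float)
    for _ in range(iterations):
        D = simple_stochastic(D)      # rows sum to 1
        D = simple_stochastic(D.T).T  # columns sum to 1
    return D


def mutual_probability(M):
    """Assumed MP scheme: divide every entry by the sum of all entries."""
    M = np.asarray(M, dtype=float)
    total = M.sum()
    return M / total if total != 0 else np.zeros_like(M)


def topological_distances(A):
    """Shortest-path length (in bonds) between all atom pairs (Floyd-Warshall)."""
    A = np.asarray(A)
    n = len(A)
    dist = np.full((n, n), np.inf)
    dist[A > 0] = 1
    np.fill_diagonal(dist, 0)
    for m in range(n):
        dist = np.minimum(dist, dist[:, [m]] + dist[[m], :])
    return dist


def srw_cutoff(M):
    """Self-Returning Walks: keep only the diagonal elements."""
    return np.diag(np.diag(M))


def nsrw_cutoff(M):
    """Non-Self-Returning Walks: keep only the off-diagonal elements."""
    return M - np.diag(np.diag(M))


def lag_cutoff(M, dist, p_min, p_max):
    """Lag p: keep elements whose topological distance lies in [p_min, p_max]."""
    keep = (dist >= p_min) & (dist <= p_max)
    return np.where(keep, M, 0.0)


# Hypothetical 4-atom chain used only to exercise the functions.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
A2 = non_stochastic(A, 2)
dist = topological_distances(A)
print(srw_cutoff(A2))
print(lag_cutoff(A2, dist, 2, 2))
```

The Sinkhorn-Knopp iteration only converges quickly when the matrix admits a doubly stochastic scaling, so the fixed iteration count here is a simplification.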
Main features of commonly used tools for molecular descriptors (MDs) calculations
| Software | Number of types of MDs | Configuration of MDs parameters | Advantages | Disadvantages | Additional remarks and online reference |
|---|---|---|---|---|---|
| QuBiLS-MAS v1.0 | 2080 (linear, quadratic and bilinear) | 1. Atom- or Bond-Based | 1. Computes MDs based on algebraic maps | 1. Only accepts MDL files (MOL or SDF) as input formats | 1. Uses CDK to read molecular files and calculate atomic properties |
| | | 2. Atomic properties | 2. 10 atom weighting schemes | | 2. Requires Java JRE 1.7 or above |
| | | 3. Local-fragments | 3. Graphic user-friendly interface and command-line interface | | |
| | | 4. Matrix approaches | 4. Platform independence | | |
| | | 5. Aggregation operators | 5. Supports any organic molecules | | |
| | | 6. Add (or remove) hydrogen atoms | 6. Free download and support | | |
| | | 7. Consider lone-pair electrons | 7. Batch mode processing | | |
| | | | 8. Data cleaning module | | |
| | | | 9. Parallel processing | | |
| PaDEL-Descriptor v2.0 | 43 | None | 1. Graphic user interface | 1. Only one data cleaning functionality (salt removal) | 1. Uses CDK to read molecular files and calculate most of the descriptors and fingerprints |
| | | | 2. Fully cross-platform | 2. No MDs batch processing | 2. Employs Java Web Start technology |
| | | | 3. Command line interface | | |
| | | | 4. Free and Open Source | | |
| | | | 5. Accepts multiple file formats (>90 formats) | | |
| | | | 6. Parallel processing | | |
| DRAGON v6.0 | 29 | 1. Predefined atom weighting schemes | 1. Graphic user-friendly interface | 1. Only Windows and Linux platforms | Academic permanent license: 900 euros (to be installed on 3 PCs) |
| | | 2. Selection of single molecular descriptors included in the different blocks | 2. Command line interface | 2. No parallel processing | |
| | | | 3. Batch mode processing | 3. No data cleaning functionalities | |
| | | | 4. Supports any organic molecules | 4. Does not allow selection of local-fragments | |
| | | | 5. Accepts the formats: MDL, Sybyl, HyperChem, Macromodel, SMILES and CML | 5. Commercial cost | |
| CDK Descriptor Calculator v1.3.9 | 48 | 1. Add (or remove) hydrogen atoms | 1. Graphic user interface | 1. Only accepts MDL files (MOL or SDF) as input formats | Uses the CDK library and requires JRE 1.6 |
| | | | 2. Command line execution | 2. No data cleaning functionalities | |
| | | | 3. Fully cross-platform | 3. Does not allow selection of local-fragments | |
| | | | 4. Free software | 4. Does not allow selection of atom weighting schemes | |
| | | | 5. Batch mode processing | | |
| BlueDesc | 36 | None | 1. Free and Open Source | 1. No graphic user interface | Uses the CDK and JOELib2 libraries and requires Java JRE 1.6 |
| | | | 2. Fully cross-platform | 2. Only accepts MDL files (MOL or SDF) as input formats | |
| | | | | 3. No parallel processing | |
| | | | | 4. No data cleaning functionalities | |
| | | | | 5. Does not allow selection of local-fragments | |
| | | | | 6. Does not allow selection of atom weighting schemes | |
| MODEL | 98 | None | 1. Web-based graphic user interface | 1. No parallel processing | Use of MODEL for commercial purposes is not allowed |
| | | | 2. Accepts the formats: PDB, MDL, MOL2, COR | 2. No data cleaning tasks | |
| | | | | 3. Does not allow selection of local-fragments | |
| | | | | 4. Does not allow selection of atom weighting schemes | |
| | | | | 5. For academic purposes only | |
| Mol2 | 20 | None | 1. Command line interface | 1. No graphic user interface | |
| | | | 2. Free of charge download request | 2. Only Windows platform | |
| | | | | 3. Only accepts SDfile format | |
| | | | | 4. No parallel processing | |
| | | | | 5. No data cleaning functionalities | |
| | | | | 6. Does not allow selection of local-fragments | |
| | | | | 7. Does not allow selection of atom weighting schemes | |
| MOE | – | None | 1. Graphic user interface | 1. Only accepts SDfile format | |
| | | | 2. Command line interface | 2. No parallel processing | |
| | | | 3. Data cleaning tasks | 3. Does not allow selection of local-fragments | |
| | | | 4. Fully cross-platform | 4. Does not allow selection of atom weighting schemes | |
| VolSurf | 22 | None | 1. Graphic user interface | 1. Commercial | |
| | | | 2. Command line interface | 2. Only Linux platform | |
| | | | 3. Accepts several formats: MDL SDF, Sybyl, Mol2, Multi Mol2, GRID kout | 3. Only computes 2D MDs | |
| | | | | 4. No parallel processing | |
| | | | | 5. Does not allow selection of local-fragments | |
| | | | | 6. Does not allow selection of atom weighting schemes | |
| ADRIANA.Code | 5 | None | 1. Graphic user interface | 1. Commercial | A demo version is available on request free of charge |
| | | | 2. Command line interface | 2. Only Windows and Linux platforms | |
| | | | 3. Batch mode processing | 3. No parallel processing | |
| | | | 4. Accepts any organic molecule | 4. No data cleaning functionalities | |
| | | | 5. Several input and output formats | 5. Does not allow selection of local-fragments | |
| | | | | 6. Does not allow selection of atom weighting schemes | |
| CODESSA PRO | 8 | None | 1. Graphic user interface | 1. Commercial | |
| | | | | 2. Only for Windows platform | |
| | | | | 3. No parallel processing | |
| | | | | 4. No batch mode processing | |
| | | | | 5. Does not allow selection of local-fragments | |
| | | | | 6. Does not allow selection of atom weighting schemes | |
| PowerMV | – | None | 1. Graphic user interface | 1. Only for Windows platform | Requires Microsoft .NET 1.1 or above |
| | | | | 2. No parallel processing | |
| | | | | 3. No batch mode processing | |
| | | | | 4. Does not allow selection of local-fragments | |
| | | | | 5. Does not allow selection of atom weighting schemes | |
| Molconn-Z v4.10 | 79 | | Multi-platform: SGI Irix, Linux, Solaris, Mac OS-X and Windows. 12 months free support | No GUI. Commercial | Minimum price US$750 for a Single Educational Node/User license |
| PreADMET Descriptor | 34 | | GUI. Free web-based limited application and commercial PC version. Maintenance and upgrade free of charge | Commercial. Runs on Windows. Only accepts MDL files (MOL or SDF) as input formats | Requires Microsoft .NET Framework 2.0; minimum price is US$1 000 for a 1-year academic license |
| Toxicity Estimation Software Tool (T.E.S.T.) v4.1 | 13 (628) | | GUI, open source and multi-platform | Platform-specific distributions. Only accepts MOL or SMILES as input formats | Based on the CDK library. Requires Java JRE 1.6 |
| ADAPT | 27 | | Non-commercial | Runs on Unix. Heavy-atom limitation of up to 255 atoms. Only accepts MOL as input format | Written in Fortran and installed on a DEC Alpha workstation |
| ChemAxon Calculator Plugins | 12 | 27 | Free for non-commercial, freely accessible web pages | s |
|
| GUI, Batch execution from command line | |||||
| Multi-platform Windows, HP, MacOS X, Solaris and Linux | |||||
| JOELib2 | 40 | Free, Open Source, Redistributable. Multi-platform |
| ||
|
| Several (mainly edge-based) topological indices | GUI | Runs on Windows |
| |
| Non-Commercial | No Batch execution |
Fig. 2. In-house comparison of Shannon’s entropy distribution for the QuBiLS-MAS 2D-Indices considering the non-stochastic, simple stochastic, double-stochastic and mutual probability matrix formalisms
Fig. 3. In-house comparison of Shannon’s entropy distribution for the QuBiLS-MAS 2D-Indices considering the norms, the statistical operators of central tendency and the operators for dispersion and form
Fig. 4. Shannon’s entropy distribution for DRAGON MDs families versus bilinear, linear and quadratic QuBiLS-MAS 2D-Indices
Fig. 5. Shannon’s entropy distribution for QuBiLS-MAS topological indices and other descriptors computed by well-known software used in cheminformatics studies
Fig. 6. Comparison of the performance of some inner features of the QuBiLS-MAS software in QSAR modeling: (a) the matrix formalisms, (b) the aggregation operators and (c) the classical algorithms
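Figs. 2 to 5 compare descriptors through the distribution of their Shannon's entropy. As a minimal sketch of how such a value can be obtained for a single descriptor, the code below discretises the descriptor values over a set of compounds into equal-width bins and computes the entropy of the bin frequencies. The equal-width binning and the choice of bin count are assumptions for illustration, since the record does not state the discretisation actually used, and the function name is hypothetical.

```python
import numpy as np


def shannon_entropy(values, n_bins):
    """Entropy (in bits) of one descriptor's values after equal-width binning."""
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()  # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())


# A descriptor that is constant over the dataset carries no information,
# while one that spreads the compounds evenly over the bins is maximally informative.
constant = [3.2] * 8
spread = [0, 1, 2, 3, 4, 5, 6, 7]
print(shannon_entropy(constant, 8), shannon_entropy(spread, 8))
```

Under this scheme the maximum attainable entropy is log2 of the number of bins, so values are only comparable between descriptors binned the same way.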
Statistical parameters for the best models for 2–6 variables for the physicochemical property log K, considering the 31 structures as the training set
| Size | | Q²loo | | a ( | F | Models | Equations |
|---|---|---|---|---|---|---|---|
| 2 | 0.778 | 0.734 | 0.738 | −0.208 | 49.16 | log K = 1.596 (±0.885) + 3.809 (±0.582) TS[1]_MX_B_AB_nCi_2_SS12_T_KA_a-h − 0.118 (±0.011) KH[1]_MX_F_AB_nCi_2_MP2_T_KA_h | (19) |
| 3 | 0.863 | 0.826 | 0.820 | −0.259 | 57.14 | log K = −32.132 (±3.841) − 75.624 (±9.789) TS[1]_RA_F_AB_nCi_2_MP2_T_KA_h + 135.484 (±13.179) TS[4]_PN_Q_AB_nCi_2_MP0_T_KA_h + 1782.101 (±257.835) KH[2]_PN_B_AB_nCi_2_SS8_T_KA_v-h | (20) |
| 4 | 0.915 | 0.887 | 0.879 | −0.324 | 70.59 | log K = −66.472 (±6.939) − 0.223 (±0.021) AC[2]_MX_B_AB_nCi_2_SS7_T_KA_r-h + 0.407 (±0.089) TS[5]_HM_B_AB_nCi_2_SS8_T_KA_v-h + 131.848 (±10.928) TS[4]_PN_Q_AB_nCi_2_MP0_T_KA_h + 3323.451 (±355.509) KH[2]_PN_B_AB_nCi_2_SS8_T_KA_v-h | (21) |
| 5 | 0.932 | 0.902 | 0.890 | −0.376 | 68.53 | log K = −70.522 (±6.342) − 0.246 (±0.020) AC[2]_MX_B_AB_nCi_2_SS7_T_KA_r-h + 0.422 (±0.081) TS[5]_HM_B_AB_nCi_2_SS8_T_KA_v-h + 144.507 (±9.991) TS[4]_PN_Q_AB_nCi_2_MP0_T_KA_h + 4616.536 (±15.439) GV[2]_MX_Q_AB_nCi_2_MP3_X_KA_h + 3536.215 (±324.863) KH[2]_PN_B_AB_nCi_2_SS8_T_KA_v-h | (22) |
| 6 | 0.942 (0.960)a | 0.914 (0.937)a | 0.898 (0.925)a | −0.414 (−0.465)a | 65.26 (91.74)a | log K = −81.005 (±6.216) − 0.233 (±0.020) AC[2]_MX_B_AB_nCi_2_SS7_T_KA_r-h − 39,144.250 (±4.757) AC[2]_MN_B_AB_nCi_2_MP2_A_KA_c-h + 0.572 (±17.485) TS[5]_HM_B_AB_nCi_2_SS8_T_KA_v-h + 120.683 (±1.681) TS[4]_PN_Q_AB_nCi_2_MP0_T_KA_h + 0.804 (±0.354) TS[6]_HM_Q_AB_nCi_2_SS0_A_KA_h + 3979.089 (±310.376) KH[2]_PN_B_AB_nCi_2_SS8_T_KA_v-h | (23) |
a Compound 31, taken as an outlier, was excluded from the training set
Comparison of Q²loo statistics of nD-QSAR methods for the property log K (CBG)† for 31 (or 30)
| Method | PCs/var. | Statistical method | Q²loo | Equations/references |
|---|---|---|---|---|
| Combined electrostatic and shape similarity matrix | 6 | Genetic NN | 0.941 | [ |
| QuBiLS-MASc | 6 | MLR and GA | *0.937* | Equation 23 |
| QuBiLS-MAS | 6 | MLR and GA | *0.914* | Equation 23 |
| Hodgkin SM | 6 | Genetic NN | 0.903 | [ |
| QuBiLS-MAS | 5 | MLR and GA | *0.902* | Equation 22 |
| QuBiLS-MAS | 4 | MLR and GA | *0.887* | Equation 21 |
| Fragment QS-SM | 4 | PLS | 0.886 | [ |
| MEDV-13 | 5 | MLR and GA | 0.882 | [ |
| MiDSASA—“template” | 2 “compounds” | – | 0.88 | [ |
| SOMa | 3 | – | R2 0.85 | [ |
| Tuned-QSAR | 6 | MLR and PCA | 0.842 | [ |
| Autocorrelation vector 30 | – | – | 0.84 | [ |
| CoMMA | 3 | PLS | 0.828 | [ |
| QuBiLS-MAS | 3 | MLR and GA | *0.826* | Equation 20 |
| Similarity Indices (ESP MC matrix 30) | 1 | PLS | 0.820 | [ |
| SOMFA/esp + ALPHA | – | SOR | 0.82 | [ |
| Combined electrostatic and shape similarity matrix | 6 | MLR and GA | 0.819 | [ |
| EEVA | 4 | PLS | 0.81 | [ |
| SOM-4D-QSAR | 4 | SOM neural network | 0.80 | [ |
| Charges and Properties from MEPS-AM1 | 5 | MLR | 0.80 | [ |
| HE State/E-Statea,b | 3 | – | 0.80 | [ |
| E-Statea,b | 3 | – | 0.79 | [ |
| CoSA | 3 “Bins” | PLS | 0.78 | [ |
| QSAR/E-State | 3 “atoms” | – | 0.78 | [ |
| TQSI | 4 | MLR | 0.775 | [ |
| EVA | 5 | PLS | 0.77 | [ |
| CoMSA | 1 | PLS | 0.76 | [ |
| MQSM | 5 | MLR and PCA | 0.759 | [ |
| EVA + ALPHA | – | SOR | 0.75 | [ |
| GRIND | – | PLS | 0.75 | [ |
| SEAL | 3 | PLS | 0.748 | [ |
| SOMFA/esp | 6 | PLS | 0.74 | [ |
| CoSCoSAa | 3 | – | 0.74 | [ |
| CoSASA | 3 “atoms” | PLS | 0.73 | [ |
| E-State and kappa shape index | 4 | MLR | 0.72 | [ |
| TARIS | 2 | – | 0.71 | [ |
| MQSM | 3 | MLR | 0.705 | [ |
| Combined electrostatic and shape similarity matrix | 5 | PLS | 0.70 | [ |
| SAMFA-RF | – | RF | 0.69 | [ |
| SAMFA-PLS | 4–5 | PLS | 0.69 | [ |
| 4D-QSAR | 2 | PLS | 0.69 | [ |
| CoMMA (ab initio) | 6 | PLS | 0.689 | [ |
| QSARa | 3 | – | 0.68 | [ |
| SOM-4D-QSAR | 4 | SOM Neural Network | 0.68 | [ |
| Wagener’s (AMSP Method) | – | k-NN and FNN | 0.630 | [ |
| SAMFA-SVM | – | SVM | 0.60 | [ |
| ALPHA | 2 | PLS | 0.57 | [ |
Italic values indicate the results of the QuBiLS-MAS approach
a When applicable, specifies the number of components (PCs)
b 1.0 A models
c Compound 31, taken as an outlier, was excluded from the training set
† Logarithm of the binding affinity to the corticosteroid-binding globulin (CBG)
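The leave-one-out statistic compared in the table above can be sketched for a multiple linear regression (MLR) model as follows. This is a minimal sketch assuming the common definition Q²loo = 1 − PRESS / SS_tot, with SS_tot taken about the mean of the full training set. The record does not state which variant the authors used, the function name is hypothetical, and the genetic-algorithm variable selection (GA) is not reproduced.

```python
import numpy as np


def q2_loo(X, y):
    """Leave-one-out cross-validated Q^2 for an ordinary least-squares model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # add the intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # refit the model without compound i
        coef, *_ = np.linalg.lstsq(X1[keep], y[keep], rcond=None)
        press += (y[i] - X1[i] @ coef) ** 2  # squared error on the left-out compound
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - press / ss_tot


# Toy data (hypothetical): one descriptor, six compounds.
x = np.arange(6.0).reshape(-1, 1)
y_exact = 2.0 * x[:, 0] + 1.0
y_noisy = y_exact + np.array([0.0, 0.5, -0.5, 0.0, 0.5, -0.5])
print(q2_loo(x, y_exact), q2_loo(x, y_noisy))
```

Because each compound is predicted by a model that never saw it, Q²loo is at most 1 and falls as the model depends more heavily on individual compounds.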