Matteo Floris¹, Alberto Manganaro², Orazio Nicolotti³, Ricardo Medda⁴, Giuseppe Felice Mangiatordi³, Emilio Benfenati².
Abstract
BACKGROUND: Methods that provide a measure of chemical similarity are highly relevant in several fields of chemoinformatics, as they make it possible to predict the molecular behavior and fate of structurally close compounds. One common application of chemical similarity measurements, based on the principle that similar molecules have similar properties, is the read-across approach, in which a specific endpoint for a chemical is estimated from the experimental data available for highly similar compounds.
Keywords: Applicability domain; Chemical similarity; QSAR; Read-across
Year: 2014 PMID: 25383097 PMCID: PMC4212147 DOI: 10.1186/s13321-014-0039-1
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
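The read-across principle described in the abstract can be illustrated with a minimal sketch. It assumes that the similarities between the target chemical and a set of source compounds have already been computed, and it estimates the endpoint as a similarity-weighted mean over the k most similar compounds. The function name `read_across_estimate`, the choice of k and the weighting scheme are illustrative assumptions, not the procedure used in the article.

```python
from typing import Sequence


def read_across_estimate(
    similarities: Sequence[float],
    endpoint_values: Sequence[float],
    k: int = 3,
) -> float:
    """Similarity-weighted mean of an endpoint over the k most similar source compounds.

    similarities[i] is the similarity (0..1) between the target chemical and
    source compound i; endpoint_values[i] is the experimental value measured
    for source compound i. (Hypothetical helper, for illustration only.)
    """
    if len(similarities) != len(endpoint_values):
        raise ValueError("similarities and endpoint_values must have the same length")
    if not similarities or k < 1:
        raise ValueError("need at least one source compound and k >= 1")
    # Rank source compounds by decreasing similarity and keep the top k.
    pairs = sorted(zip(similarities, endpoint_values), key=lambda p: p[0], reverse=True)
    top = pairs[:k]
    total_weight = sum(s for s, _ in top)
    if total_weight == 0:
        # All selected neighbours have zero similarity: fall back to a plain mean.
        return sum(v for _, v in top) / len(top)
    return sum(s * v for s, v in top) / total_weight


# Hypothetical data: similarities of four source compounds and their measured values.
estimate = read_across_estimate([0.95, 0.90, 0.40, 0.85], [2.1, 2.3, 4.0, 1.9], k=3)
print(round(estimate, 3))
```

Weighting by similarity lets the closest analogues dominate the estimate, while the cut-off k keeps weakly similar compounds (here the one at 0.40) out of the prediction entirely.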
Descriptors in the constitutional descriptors (CD) key
| Name | Description |
|---|---|
| MW | Molecular weight |
| AMW | Average molecular weight |
| Sv | Sum of atomic van der Waals volumes |
| Mv | Mean atomic van der Waals volume |
| Sp | Sum of atomic polarizabilities |
| Mp | Mean atomic polarizability |
| Se | Sum of atomic Sanderson electronegativities |
| Me | Mean atomic Sanderson electronegativity |
| nAt | Number of atoms |
| nSk | Number of non-H atoms |
| nBt | Number of bonds |
| nBo | Number of non-H bonds |
| nBm | Number of multiple bonds |
| nDblBo | Number of double bonds |
| nTrpBo | Number of triple bonds |
| nArBo | Number of aromatic bonds |
| SCBO | Sum of conventional bond orders (H-depleted) |
| nH | Number of Hydrogen atoms |
| nC | Number of Carbon atoms |
| nN | Number of Nitrogen atoms |
| nO | Number of Oxygen atoms |
| nP | Number of Phosphorus atoms |
| nS | Number of Sulfur atoms |
| nF | Number of Fluorine atoms |
| nCl | Number of Chlorine atoms |
| nBr | Number of Bromine atoms |
| nI | Number of Iodine atoms |
| nB | Number of Boron atoms |
| HPerc | Percentage of H atoms |
| CPerc | Percentage of C atoms |
| NPerc | Percentage of N atoms |
| OPerc | Percentage of O atoms |
| XPerc | Percentage of halogen atoms |
| nHet | Number of heteroatoms |
| nX | Number of halogen atoms |
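Many of the counts in the CD key follow directly from the element composition of a molecule. The hypothetical helper below is a minimal sketch that covers only a subset of the descriptors listed above. It assumes the molecule is already available as a list of element symbols with explicit hydrogens (parsing a structure into atoms is out of scope) and that heteroatoms are all atoms other than carbon and hydrogen.

```python
from collections import Counter

HALOGENS = ("F", "Cl", "Br", "I")


def constitutional_counts(atoms: list) -> dict:
    """Compute a subset of CD-key descriptors from element symbols (explicit H).

    Hypothetical helper; descriptor names follow the CD table above.
    """
    if not atoms:
        raise ValueError("empty molecule")
    counts = Counter(atoms)                        # missing elements count as 0
    n_at = len(atoms)                              # nAt: number of atoms
    n_h = counts["H"]                              # nH: number of hydrogen atoms
    n_x = sum(counts[x] for x in HALOGENS)         # nX: number of halogen atoms
    # Assumption: heteroatoms are all atoms other than C and H.
    n_het = n_at - counts["C"] - n_h               # nHet
    return {
        "nAt": n_at,
        "nSk": n_at - n_h,                         # nSk: number of non-H atoms
        "nH": n_h,
        "nC": counts["C"],
        "nN": counts["N"],
        "nO": counts["O"],
        "nCl": counts["Cl"],
        "nX": n_x,
        "nHet": n_het,
        "HPerc": 100.0 * n_h / n_at,               # percentage of H atoms
        "CPerc": 100.0 * counts["C"] / n_at,       # percentage of C atoms
        "XPerc": 100.0 * n_x / n_at,               # percentage of halogen atoms
    }


# Ethanol, C2H6O, written with explicit hydrogens.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(constitutional_counts(ethanol)["nSk"])
```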
Descriptors in the hetero-atoms descriptors (HD) key
| Name | Description |
|---|---|
| nN | Number of Nitrogen atoms |
| nO | Number of Oxygen atoms |
| nP | Number of Phosphorus atoms |
| nS | Number of Sulfur atoms |
| nF | Number of Fluorine atoms |
| nCl | Number of Chlorine atoms |
| nBr | Number of Bromine atoms |
| nI | Number of Iodine atoms |
| nB | Number of Boron atoms |
| nHet | Number of heteroatoms |
| nX | Number of halogen atoms |
Binary similarity coefficients
| No. | Name | No. | Name |
|---|---|---|---|
| 1 | Simple matching | 23 | Dennis |
| 2 | Rogers/Tanimoto | 24 | Cole 1 |
| 3 | Jaccard/Tanimoto | 25 | Cole 2 |
| 4 | Gleason/Dice/Sorensen/Nei-Li | 26 | Dispersion |
| 5 | Russel-Rao | 27 | Goodman-Kruskal |
| 6 | Forbes | 28 | Sokal-Sneath 3 |
| 7 | Simpson | 29 | Sokal-Sneath 4 |
| 8 | Braun-Blanquet | 30 | Phi |
| 9 | Driver-Kroeber/Ochiai | 31 | Dice 1 |
| 10 | Baroni-Urbani 1 | 32 | Dice 2 |
| 11 | Kulczynski 1 | 33 | Sorgenfrei |
| 12 | Sokal-Sneath 1 | 34 | Cohen |
| 13 | Sokal-Sneath 2 | 35 | Peirce 1 |
| 14 | Jaccard 2 | 36 | Peirce 2 |
| 15 | Faith | 37 | Maxwell-Pilliner |
| 16 | Mountford | 38 | Harris-Lahey |
| 17 | Michael | 39 | CT1 |
| 18 | Rogot-Goldberg | 40 | CT2 |
| 19 | Hawkins-Dotson | 41 | CT3 |
| 20 | Yule 1 | 42 | CT4 |
| 21 | Yule 2 | 43 | CT5 |
| 22 | Fossum | 44 | Austin-Colwell angular coeff. |
The number of each coefficient is the same as in the paper by Todeschini et al.
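All the binary coefficients listed above are functions of the four counts of a 2×2 contingency table between two fingerprints: a (bits set in both), b and c (bits set in only one of them) and d (bits set in neither). The sketch below computes a few of them with their textbook formulas; the numbers in the comments follow the table above. The handling of empty fingerprints, and the absence of any rescaling of correlation-type coefficients such as Cohen's to [0, 1], are assumptions that may differ from Todeschini et al.

```python
import math


def contingency(fp1: set, fp2: set, n_bits: int) -> tuple:
    """Return (a, b, c, d) for two fingerprints given as sets of on-bit indices."""
    a = len(fp1 & fp2)        # bits set in both fingerprints
    b = len(fp1 - fp2)        # bits set only in the first
    c = len(fp2 - fp1)        # bits set only in the second
    d = n_bits - a - b - c    # bits set in neither
    if d < 0:
        raise ValueError("on-bit indices must lie in range(n_bits)")
    return a, b, c, d


def binary_similarities(fp1: set, fp2: set, n_bits: int) -> dict:
    a, b, c, d = contingency(fp1, fp2, n_bits)
    n = a + b + c + d
    cohen_den = (a + b) * (b + d) + (a + c) * (c + d)
    return {
        "simple_matching": (a + d) / n,                              # no. 1
        "rogers_tanimoto": (a + d) / (a + 2 * (b + c) + d),          # no. 2
        # Convention (assumption): two empty fingerprints score 0.0.
        "jaccard_tanimoto": a / (a + b + c) if a + b + c else 0.0,   # no. 3
        "dice": 2 * a / (2 * a + b + c) if a + b + c else 0.0,       # no. 4
        "russel_rao": a / n,                                         # no. 5
        "ochiai": a / math.sqrt((a + b) * (a + c)) if a else 0.0,    # no. 9
        # Cohen's kappa, range [-1, 1]; no rescaling applied here.
        "cohen": 2 * (a * d - b * c) / cohen_den if cohen_den else 0.0,  # no. 34
    }


s = binary_similarities({0, 1, 2}, {1, 2, 3}, n_bits=8)
print(s["jaccard_tanimoto"], s["dice"])  # prints 0.5 0.6666666666666666
```

Coefficients that use d (such as simple matching or Rogers/Tanimoto) reward shared absent bits, so they behave differently from Jaccard/Tanimoto on sparse fingerprints, which is one reason the choice of metric matters.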
Non-binary similarity coefficients
| No. | Name | Code |
|---|---|---|
| 1 | Mean Canberra | MC |
| 2 | Divergence | Div |
| 3 | Bray/Curtis | BC |
| 4 | Dice | Dice |
| 5 | Sokal/Sneath | SS1 |
| 6 | Cosine/Ochiai | Cos |
The code of each coefficient is the same as in the paper by Holliday et al.
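The non-binary coefficients operate on count or real-valued vectors, such as the CD and HD descriptor keys. The minimal sketch below implements three of them (Cosine/Ochiai, Dice and Bray/Curtis) with their common textbook definitions. Expressing Bray/Curtis as a similarity (1 minus the distance) and the values returned for all-zero vectors are assumptions, not details taken from Holliday et al.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine/Ochiai: sum(x*y) / sqrt(sum(x^2) * sum(y^2)); 0.0 if undefined."""
    _check(x, y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def dice_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Continuous Dice: 2 * sum(x*y) / (sum(x^2) + sum(y^2)); 0.0 if undefined."""
    _check(x, y)
    sxy = sum(a * b for a, b in zip(x, y))
    den = sum(a * a for a in x) + sum(b * b for b in y)
    return 2 * sxy / den if den else 0.0


def bray_curtis_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - sum|x_i - y_i| / sum(x_i + y_i), for non-negative vectors."""
    _check(x, y)
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        return 1.0  # two all-zero vectors are treated as identical (assumption)
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / den


x, y = [1, 2, 0], [1, 1, 1]
print(cosine_similarity(x, y), dice_similarity(x, y), bray_curtis_similarity(x, y))
```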
Best ten results for fingerprints/similarity metrics combinations
| FP | Metrics | BCF R2 | BCF RMSE | LogP R2 | LogP RMSE | DES | UTI |
|---|---|---|---|---|---|---|---|
| Extended | 37 | 0.546 | 0.917 | 0.775 | 0.872 | 0.970 | 0.971 |
| Extended | 34 | 0.546 | 0.919 | 0.776 | 0.870 | 0.970 | 0.970 |
| Extended | 18 | 0.542 | 0.922 | 0.777 | 0.869 | 0.965 | 0.965 |
| PubChem | 28 | 0.541 | 0.906 | 0.772 | 0.870 | 0.963 | 0.963 |
| Extended | 42 | 0.534 | 0.919 | 0.780 | 0.858 | 0.961 | 0.962 |
| Default | 18 | 0.549 | 0.913 | 0.766 | 0.890 | 0.954 | 0.955 |
| Extended | 13 | 0.541 | 0.913 | 0.770 | 0.875 | 0.954 | 0.954 |
| Default | 34 | 0.549 | 0.913 | 0.765 | 0.891 | 0.953 | 0.953 |
| Extended | 1 | 0.540 | 0.917 | 0.770 | 0.876 | 0.950 | 0.950 |
| Default | 37 | 0.549 | 0.913 | 0.764 | 0.893 | 0.950 | 0.950 |
FP stands for the fingerprint type, Metrics for the number (id) of the binary similarity coefficient (as reported in Table 3), R2 for the squared correlation coefficient, RMSE for the root mean square error, DES for the desirability function and UTI for the utility function. BCF and LogP denote the two endpoints (bioconcentration factor and octanol/water partition coefficient).
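The DES and UTI columns combine the four error measures (R2 and RMSE for each endpoint) into a single score per combination. The sketch below shows one common way to do this, under several assumptions: R2 is taken as the coefficient of determination, each criterion is mapped to a linear desirability in [0, 1] between chosen bounds, DES is the geometric mean of the individual desirabilities and UTI their arithmetic mean. The article may define the bounds or transforms differently, so the numbers it produces are not expected to reproduce the table.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def desirability(value: float, low: float, high: float, maximize: bool) -> float:
    """Linear desirability in [0, 1]: 1 at the best bound, 0 at the worst (assumption)."""
    if high == low:
        return 1.0
    frac = (value - low) / (high - low)
    if not maximize:  # e.g. RMSE, where lower is better
        frac = 1.0 - frac
    return min(1.0, max(0.0, frac))


def overall_desirability(ds: Sequence[float]) -> float:
    """DES: geometric mean of the individual desirabilities."""
    return math.prod(ds) ** (1.0 / len(ds))


def utility(ds: Sequence[float]) -> float:
    """UTI: arithmetic mean of the individual desirabilities."""
    return sum(ds) / len(ds)


print(overall_desirability([1.0, 0.25]), utility([1.0, 0.25]))  # prints 0.5 0.625
```

The geometric mean drops to zero as soon as any single criterion is fully undesirable, whereas the arithmetic mean lets a strong criterion compensate for a weak one; when all criteria are similar, the two scores nearly coincide, as the DES and UTI columns above do.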
Best ten results for keys weights/similarity metrics combinations
| Wfp | Whd | Wcd | Wfg | Metrics | BCF R2 | BCF RMSE | LogP R2 | LogP RMSE | DES | UTI |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.1 | 0.35 | 0.15 | 3 | 0.63 | 0.83 | 0.87 | 0.68 | 0.996 | 0.996 |
| 0.3 | 0.15 | 0.35 | 0.2 | 3 | 0.62 | 0.84 | 0.87 | 0.67 | 0.996 | 0.996 |
| 0.3 | 0.15 | 0.3 | 0.25 | 3 | 0.62 | 0.84 | 0.87 | 0.68 | 0.993 | 0.993 |
| 0.3 | 0.1 | 0.35 | 0.25 | 3 | 0.62 | 0.84 | 0.87 | 0.67 | 0.992 | 0.992 |
| 0.3 | 0.2 | 0.35 | 0.15 | 3 | 0.62 | 0.84 | 0.87 | 0.67 | 0.992 | 0.992 |
| 0.3 | 0.2 | 0.3 | 0.2 | 3 | 0.62 | 0.84 | 0.87 | 0.68 | 0.991 | 0.991 |
| 0.3 | 0.2 | 0.25 | 0.25 | 3 | 0.62 | 0.84 | 0.87 | 0.69 | 0.989 | 0.989 |
| 0.4 | 0.15 | 0.3 | 0.15 | 3 | 0.62 | 0.83 | 0.86 | 0.70 | 0.989 | 0.989 |
| 0.4 | 0.1 | 0.3 | 0.2 | 3 | 0.62 | 0.84 | 0.86 | 0.69 | 0.988 | 0.988 |
| 0.3 | 0.05 | 0.35 | 0.3 | 3 | 0.61 | 0.85 | 0.87 | 0.67 | 0.988 | 0.988 |
Wfp, Whd, Wcd and Wfg stand for the weights of the contributions of the different keys (FP, HD, CD, FG, as defined in the article), Metrics for the number (id) of the non-binary similarity coefficient (as reported in Table 4), R2 for the squared correlation coefficient, RMSE for the root mean square error, DES for the desirability function and UTI for the utility function.
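In every row of the table the four weights sum to 1, which suggests that the overall similarity is assembled from one similarity value per key. A minimal sketch, assuming a weighted sum of per-key Bray/Curtis similarities (metric 3 in the table); the helper `combined_similarity`, the key names and the descriptor vectors are hypothetical.

```python
from typing import Mapping, Sequence


def bray_curtis_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - sum|x_i - y_i| / sum(x_i + y_i), for non-negative vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        return 1.0  # two all-zero vectors are treated as identical (assumption)
    return 1.0 - sum(abs(a - b) for a, b in zip(x, y)) / den


def combined_similarity(
    keys_a: Mapping[str, Sequence[float]],
    keys_b: Mapping[str, Sequence[float]],
    weights: Mapping[str, float],
) -> float:
    """Weighted sum of per-key similarities; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    return sum(w * bray_curtis_similarity(keys_a[k], keys_b[k]) for k, w in weights.items())


# Weights from the first row of the table above (Wfp, Whd, Wcd, Wfg).
WEIGHTS = {"FP": 0.4, "HD": 0.1, "CD": 0.35, "FG": 0.15}

# Hypothetical key vectors for two molecules.
mol_a = {"FP": [1, 0, 1, 1], "HD": [1, 0, 2], "CD": [9.0, 3.0, 1.0], "FG": [1, 0]}
mol_b = {"FP": [1, 0, 1, 0], "HD": [1, 0, 1], "CD": [9.0, 3.0, 2.0], "FG": [1, 0]}
print(round(combined_similarity(mol_a, mol_b, WEIGHTS), 3))
```

For 0/1 fingerprint vectors the Bray/Curtis similarity reduces to 2a/(2a + b + c), the binary Dice coefficient, so the same function can be applied to the FP key without a separate binary metric.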