Andrey A Toropov, Alla P Toropova, Giuseppa Raitano, Emilio Benfenati.
Abstract
A high level of chromosomal aberrations in peripheral blood lymphocytes may be an early marker of cancer risk, but data on the risk of specific cancers and on types of chromosomal aberrations are limited. Consequently, the development of predictive models for the chromosomal aberration test is an important task. The majority of models for the chromosomal aberration test are so-called knowledge-based rule systems. The CORAL software (http://www.insilico.eu/coral, an abbreviation of "CORrelation And Logic") is an alternative to knowledge-based rule systems. In contrast to those systems, CORAL makes it possible to estimate how different molecular alerts, as well as different splits into the training set and validation set, influence the predictive potential of a model. This possibility is not available in approaches based on knowledge-based rule systems. Quantitative structure-activity relationships (QSAR) for the chromosome aberration test are established for five random splits into training, calibration, and validation sets. The QSAR approach represents the molecular structure by the simplified molecular input-line entry system (SMILES), without data on physicochemical and/or biochemical parameters. In spite of this limitation, the statistical quality of these models is quite good.
Keywords: CORAL software; Chromosome aberration; Monte Carlo method; QSAR; SMILES
Year: 2018 PMID: 31516335 PMCID: PMC6734133 DOI: 10.1016/j.sjbs.2018.05.013
Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 1319-562X Impact factor: 4.219
Examples of the Sk, SSk, and HARD for the molecular structure represented by the following SMILES: O=[N+]([O-])c1ccc(cc1)Cl.
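To make the idea of SMILES-derived attributes concrete, the sketch below splits the SMILES above into single symbols and adjacent pairs, which is roughly what Sk and SSk are understood to denote. The regular expression and the function names `smiles_atoms` and `smiles_pairs` are hypothetical; the tokenisation and ordering rules actually used by the CORAL software may differ, and HARD is not covered here.

```python
import re

# Assumed, simplified token pattern: bracket groups, two-letter halogens,
# single letters, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()/\\@+\-.%]")


def smiles_atoms(smiles):
    """Split a SMILES string into single symbols (an approximation of Sk)."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against silently dropping characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def smiles_pairs(tokens):
    """Pair each symbol with its successor (an approximation of SSk)."""
    return list(zip(tokens, tokens[1:]))


tokens = smiles_atoms("O=[N+]([O-])c1ccc(cc1)Cl")
pairs = smiles_pairs(tokens)
print(tokens)
print(pairs[:3])
```

Under these assumptions the bracketed groups `[N+]` and `[O-]` and the two-letter symbol `Cl` are each kept as one token rather than being split into characters.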
Fig. 1. Interpretations for traditional correlation and semi-correlation.
Fig. 2. Graphical representation of semi-correlations for split 2 ("lucky split") and statistical characteristics of this model for the chromosome aberration test. TP = true positive; TN = true negative; FP = false positive; and FN = false negative.
The statistical quality of models for chromosome aberration test.
| Split | Set | n | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|---|---|
| 1 | Training | 399 | 0.7592 | 0.7981 | 0.7794 | 0.5578 |
| | Calibration | 39 | 0.8333 | 0.8667 | 0.8462 | 0.6868 |
| | Validation | 39 | 0.8750 | 0.8387 | 0.8462 | 0.6244 |
| 2 | Training | 407 | 0.7016 | 0.8009 | 0.7543 | 0.5059 |
| | Calibration | 35 | 0.9375 | 0.9471 | 0.9429 | 0.8849 |
| | Validation | 35 | 0.8750 | 1.000 | 0.9429 | 0.8898 |
| 3 | Training | 380 | 0.7348 | 0.7889 | 0.7632 | 0.5248 |
| | Calibration | 49 | 0.9333 | 0.8235 | 0.8571 | 0.7097 |
| | Validation | 48 | 0.8148 | 1.000 | 0.8958 | 0.8112 |
| 4 | Training | 398 | 0.7513 | 0.7707 | 0.7613 | 0.5221 |
| | Calibration | 40 | 0.9412 | 0.9565 | 0.9500 | 0.8977 |
| | Validation | 39 | 1.000 | 0.6923 | 0.7949 | 0.6574 |
| 5 | Training | 399 | 0.6742 | 0.8326 | 0.7619 | 0.5156 |
| | Calibration | 39 | 0.7600 | 1.000 | 0.8462 | 0.7294 |
| | Validation | 39 | 0.8500 | 0.9474 | 0.8974 | 0.7995 |
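
The four criteria in the table follow from the confusion-matrix counts (TP, TN, FP, FN) by their standard definitions. The sketch below is a minimal illustration, not the CORAL implementation; the function name `classification_metrics` is hypothetical, and the example counts (TP = 14, FN = 2, TN = 19, FP = 0) are not given in the source but were back-calculated from the values reported for the validation set of split 2.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification criteria from confusion-matrix counts.

    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)                # share of actives found
    specificity = tn / (tn + fp)                # share of inactives found
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of correct calls
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Counts inferred (assumption) from the split 2 validation row, n = 35.
m = classification_metrics(tp=14, tn=19, fp=0, fn=2)
print({name: round(value, 4) for name, value in m.items()})
```

With these inferred counts the function gives sensitivity 0.875, specificity 1.0, accuracy 0.9429, and MCC 0.8898, matching the reported row.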
The statistical quality of models for chromosome aberration test suggested in the literature.
| Reference | Set | n | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|
| Multicase methodology | Training | 537 | 0.528 | 0.75 | 0.649 |
| | Internal Validation | 53 | 0.568 | 0.717 | 0.651 |
| Machine learning | Training | 521 | 0.751 | 0.768 | 0.76 |
| | Validation | 58 | 0.708 | 0.714 | 0.716 |
| | Dataset in 9 cross-validation folds | 190 | 0.54 | 0.70 | 0.62 |
| (KNN) | Training | 346 | 0.693 | 0.861 | 0.812 |
| | Validation | 37 | 0.727 | 0.923 | 0.865 |
| (SVM) | Training | 308 | 0.989 | 1 | 0.997 |
| | Cross-validation | 38 | 0.727 | 1 | 0.921 |
| | Validation | 37 | 0.727 | 0.885 | 0.838 |
| | Training | 216 | 0.849 | 0.869 | 0.86 |
| | Validation | 156 | 0.818 | 0.829 | 0.828 |
Mean value of 10 independent validations.
Values represent mean ± standard deviation of 20 independent validations.