Irene Luque Ruiz, Miguel Ángel Gómez-Nieto.
Abstract
Knowing early in the construction of quantitative structure-activity relationship (QSAR) prediction models whether a data set can be modeled is important, because it can reduce the effort and time needed to select or reject data sets and to refine their composition. The modelability index (MODI) is based on counting the first nearest neighbors of the molecules in the data set and is a standard measure accepted in the QSAR community. In this paper, we revisit the calculation of the modelability index and propose a more formal formulation that extends the calculation to the first nearest neighbors belonging to each class present in the data set. This new formulation also allows the calculation of the rivality index, a measure of the presence of correctly classifiable molecules and activity cliffs. When the rivality index is weighted by the cardinality of the neighborhood of each molecule in the data set, the resulting weighted modelability index is highly correlated with the correct classification rate (QSAR_CCR) obtained when building QSAR models with different classification algorithms. The weighted modelability index yields correlations with r² higher than 0.9, slopes close to 1, and bias close to zero for different algorithms.
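As a rough illustration of the nearest-neighbor counting that the abstract refers to, the sketch below computes the classical MODI as the class-averaged fraction of molecules whose first nearest neighbor belongs to the same class. It is not the weighted modelability index or the rivality index proposed in the paper. The function name `modi`, the descriptor matrix `X`, the label vector `y`, and the use of Euclidean distance are assumptions made for this example only.

```python
import numpy as np


def modi(X, y):
    """Class-averaged fraction of molecules whose first nearest
    neighbor carries the same class label (classical MODI sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances between descriptor vectors.
    # Assumption: the abstract does not specify the distance metric.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # A molecule must not be selected as its own nearest neighbor.
    np.fill_diagonal(dist, np.inf)

    # Index of the first nearest neighbor of every molecule.
    nearest = dist.argmin(axis=1)

    # True where the nearest neighbor shares the molecule's class.
    same_class = y[nearest] == y

    # Average the per-class fractions so that each class counts equally.
    return float(np.mean([same_class[y == c].mean() for c in np.unique(y)]))
```

For example, with four hypothetical one-descriptor molecules `X = [[0.0], [1.0], [2.5], [10.0]]` and labels `y = [0, 0, 1, 1]`, both class-0 molecules have a class-0 nearest neighbor, while only one of the two class-1 molecules has a class-1 nearest neighbor, so the per-class fractions are 1.0 and 0.5.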
Year: 2018; PMID: 30149700; DOI: 10.1021/acs.jcim.8b00188
Source DB: PubMed; Journal: J Chem Inf Model; ISSN: 1549-9596; Impact factor: 4.956