Classification of Biodegradable Substances Using Balanced Random Trees and Boosted C5.0 Decision Trees.

Alaa M Elsayad, Ahmed M Nassef, Mujahed Al-Dhaifallah, Khaled A Elsayad.

Abstract

Substances that do not degrade over time are harmful to the environment and dangerous to living organisms. Predicting the biodegradability of substances without costly experiments is therefore useful. Recently, quantitative structure-activity relationship (QSAR) models have offered effective solutions to this problem. However, molecular descriptor datasets usually suffer from unbalanced class distributions, which adversely affect the efficiency and generalization of the derived models. Accordingly, this study evaluates the performance of balanced random trees (RTs) and boosted C5.0 decision trees (DTs) in constructing QSAR models that classify the ready biodegradability of substances, and their ability to deal with unbalanced data. The balanced RTs algorithm builds individual trees using balanced bootstrap samples, while the boosted C5.0 DT is modeled using cost-sensitive learning. We employed the two-dimensional molecular descriptor dataset that is publicly available through the University of California, Irvine (UCI) machine learning repository. The molecular descriptors were ranked according to their contributions to the balanced RTs classification process. The performance of the proposed models was compared with previously reported results. The experimental results showed that the proposed models outperform the previously reported classification results of the support vector machine (SVM), K-nearest neighbors (KNN), and discrimination analysis (DA). Classification performance was analyzed in terms of accuracy, sensitivity, specificity, precision, false positive rate, false negative rate, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUROC).
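The two ideas in the abstract that are easiest to illustrate are the balanced bootstrap sample and the threshold-based classification measures. The following is a minimal Python sketch, not the authors' implementation. The function names `balanced_bootstrap` and `classification_measures`, the toy labels `RB`/`NRB`, and the choice to resample every class down to the size of the smallest class are all assumptions made for illustration; the abstract does not specify the sampling scheme.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return indices of a bootstrap sample with equal counts per class.

    Assumption: every class is drawn with replacement, and the draw size
    is the size of the smallest class. The paper may use another scheme.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    n_per_class = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        # random.Random.choices samples with replacement.
        sample.extend(rng.choices(indices, k=n_per_class))
    return sample


def classification_measures(tp, fp, tn, fn):
    """Compute measures from confusion-matrix counts.

    Assumes all denominators are non-zero (both classes present and at
    least one positive prediction).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Toy, made-up data: 3 minority-class and 7 majority-class labels.
toy_labels = ["RB"] * 3 + ["NRB"] * 7
toy_sample = balanced_bootstrap(toy_labels, random.Random(0))
print(sorted(toy_labels[i] for i in toy_sample))

# Made-up confusion-matrix counts, for illustration only.
print(classification_measures(tp=8, fp=2, tn=18, fn=2))
```

In a balanced random trees setup, one such sample would be drawn per tree so that each tree sees both classes equally often. The ROC curve and AUROC are left out here because they need predicted scores rather than a single confusion matrix.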

Keywords:  C5.0 decision tree; K-nearest neighbors; QSAR; biodegradable substances; discrimination analysis; machine learning; random trees; support vector machine

Year: 2020  PMID: 33322123  PMCID: PMC7763457  DOI: 10.3390/ijerph17249322

Source DB: PubMed  Journal: Int J Environ Res Public Health  ISSN: 1660-4601  Impact factor: 3.390


