
Warmr: a data mining tool for chemical data.

R D King, A Srinivasan, L Dehaspe.

Abstract

Data mining techniques are becoming increasingly important in chemistry as databases become too large to examine manually. Data mining methods from the field of Inductive Logic Programming (ILP) have potential advantages for structural chemical data. In this paper we present Warmr, the first ILP data mining algorithm to be applied to chemoinformatic data. We illustrate the value of Warmr by applying it to a well-studied database of chemical compounds tested for carcinogenicity in rodents. Data mining was used to find all frequent substructures in the database, and knowledge of these frequent substructures is shown to add value to the database. One use of the frequent substructures was to convert them into probabilistic prediction rules relating compound description to carcinogenesis. These rules were found to be accurate on test data, and to give some insight into the relationship between structure and activity in carcinogenesis. The substructures were also used to prove that there existed no accurate rule, based purely on atom-bond substructure with fewer than seven conditions, that could predict carcinogenicity. This result puts a lower bound on the complexity of the relationship between chemical structure and carcinogenicity. Only by using a data mining algorithm, and by doing a complete search, is it possible to prove such a result. Finally, the frequent substructures were shown to add value by increasing the accuracy of statistical and machine learning programs that were trained to predict chemical carcinogenicity. We conclude that Warmr, and ILP data mining methods generally, are an important new tool for analysing chemical databases.
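The idea of a complete search for frequent substructures, followed by conversion into probabilistic rules, can be illustrated with a minimal sketch. This is not Warmr itself: Warmr searches first-order (relational) atom-bond patterns, whereas the code below is a propositional simplification that treats each compound as a flat set of features and runs a level-wise frequent-pattern search. The function names, the feature names, and the toy compounds and activity labels are all hypothetical and chosen only for illustration; none of them come from the paper or its database.

```python
from itertools import combinations


def frequent_patterns(compounds, min_support, max_size):
    """Level-wise complete search: return every feature set of at most
    max_size features that occurs in at least min_support compounds."""
    items = sorted({f for c in compounds for f in c})
    level = [frozenset([f]) for f in items]
    frequent = {}
    size = 1
    while level and size <= max_size:
        # Count how many compounds contain each candidate pattern.
        counts = {p: sum(1 for c in compounds if p <= c) for p in level}
        kept = {p: k for p, k in counts.items() if k >= min_support}
        frequent.update(kept)
        # Build candidates one feature larger; keep a candidate only if
        # all of its sub-patterns of the current size are frequent.
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == size + 1 and all(
                frozenset(s) in kept for s in combinations(union, size)
            ):
                candidates.add(union)
        level = list(candidates)
        size += 1
    return frequent


def rule_confidence(pattern, compounds, labels):
    """Estimate P(active | pattern present) from labelled compounds."""
    covered = [lab for c, lab in zip(compounds, labels) if pattern <= c]
    return sum(covered) / len(covered) if covered else None


# Hypothetical toy data: feature sets and made-up activity labels.
compounds = [
    {"nitro", "aromatic_ring"},
    {"nitro", "aromatic_ring", "chlorine"},
    {"aromatic_ring", "chlorine"},
    {"hydroxyl"},
]
labels = [1, 1, 0, 0]

freq = frequent_patterns(compounds, min_support=2, max_size=2)
for pattern, support in freq.items():
    conf = rule_confidence(pattern, compounds, labels)
    print(sorted(pattern), "support:", support, "confidence:", conf)
```

Because the search enumerates every pattern above the support threshold up to the size limit, the absence of an accurate rule among the results is a statement about all such patterns, which is the kind of argument the abstract relies on for its lower-bound result.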


Year: 2001   PMID: 11272703   DOI: 10.1023/a:1008171016861

Source DB: PubMed   Journal: J Comput Aided Mol Des   ISSN: 0920-654X   Impact factor: 3.686


