
Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

Ola Spjuth, Egon L Willighagen, Rajarshi Guha, Martin Eklund, Jarl E S Wikberg.

Abstract

BACKGROUND: QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, which makes it virtually impossible to reproduce and validate analyses and drastically constrains collaboration and re-use of data.
RESULTS: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services.
CONCLUSIONS: Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problem of defining which software components were used and in which versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets and hence work collectively, and also allows for analyzing the effect descriptors have on the statistical model's performance. The presented Bioclipse plugins equip scientists with graphical tools that make QSAR-ML easily accessible for the community.
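
ILLUSTRATION: The idea of a dataset setup that records, for each descriptor, an ontology identifier together with the implementing software and its version can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions and does not reproduce the QSAR-ML schema or the Bioclipse plugins: the identifiers (`example:carbonCount`, `example:smilesLength`), the provider name `toy-lib`, the version strings, and the two toy descriptor functions are all hypothetical and were chosen for illustration. The sketch builds a descriptor table from a set of structures and exports it as CSV, with each column header carrying the descriptor identity and implementation version.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DescriptorImpl:
    """One versioned implementation of a uniquely identified descriptor."""
    ontology_id: str                  # hypothetical identifier, not a real ontology term
    provider: str                     # hypothetical software name
    version: str                      # version of that software
    compute: Callable[[str], float]   # maps a SMILES string to a value


def count_carbons(smiles: str) -> float:
    # Toy stand-in for a real descriptor: counts 'C' and 'c' characters.
    # Not chemically rigorous (for example, "Cl" would be miscounted).
    return float(sum(ch in "Cc" for ch in smiles))


def smiles_length(smiles: str) -> float:
    # Toy stand-in: length of the SMILES string.
    return float(len(smiles))


def build_table(structures: Dict[str, str],
                descriptors: List[DescriptorImpl]) -> List[Dict[str, object]]:
    """Compute every descriptor for every structure.

    Column names encode descriptor id, provider, and version, so the
    table itself documents how each value was produced.
    """
    rows = []
    for name, smiles in structures.items():
        row: Dict[str, object] = {"structure": name}
        for d in descriptors:
            column = f"{d.ontology_id}@{d.provider}-{d.version}"
            row[column] = d.compute(smiles)
        rows.append(row)
    return rows


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize the table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    structures = {"ethanol": "CCO", "benzene": "c1ccccc1"}
    descriptors = [
        DescriptorImpl("example:carbonCount", "toy-lib", "1.0.0", count_carbons),
        DescriptorImpl("example:smilesLength", "toy-lib", "1.0.0", smiles_length),
    ]
    print(to_csv(build_table(structures, descriptors)))
```

Keeping the identifier, provider, and version next to each computed column is the design point here: two tables built with different implementations of the same descriptor remain distinguishable, so they can be compared or merged without ambiguity.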


Year:  2010        PMID: 20591161      PMCID: PMC2909924          DOI: 10.1186/1758-2946-2-5

Source DB:  PubMed          Journal:  J Cheminform        ISSN: 1758-2946            Impact factor:   5.514


