
molBLOCKS: decomposing small molecule sets and uncovering enriched fragments.

Abstract

The chemical structures of biomolecules, whether naturally occurring or synthetic, are composed of functionally important building blocks. Given a set of small molecules (for example, those known to bind a particular protein), computationally decomposing them into chemically meaningful fragments can help elucidate their functional properties and may be useful for designing novel compounds with similar properties. Here we introduce molBLOCKS, a suite of programs for breaking down sets of small molecules into fragments according to a predefined set of chemical rules, clustering the resulting fragments and uncovering statistically enriched fragments. Among other applications, our software should aid large-scale chemical analysis of ligands that bind specific targets of interest.
AVAILABILITY AND IMPLEMENTATION: molBLOCKS is available as GPL C++ source code at http://compbio.cs.princeton.edu/molblocks.


Year: 2014; PMID: 24681908; PMCID: PMC4080744; DOI: 10.1093/bioinformatics/btu173

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803

1 INTRODUCTION

Endogenous small molecules are synthesized in the cell in a modular fashion, using building blocks or fragments that are often conserved across organisms (Muto et al.). Fragment-based drug discovery has also emerged as an important paradigm for navigating the diversity of the chemical landscape and for profiling protein druggability (Hajduk and Greer, 2007). Further, it has been shown that the toxicity of certain drugs can be explained by the presence in their structure of fragments shared with toxic compounds (Ahmed et al.). Although many programs are available to assemble small molecules from fragments (Schneider and Baringhaus, 2013), the reverse problem of breaking down small molecules and analyzing the resulting fragment sets has been studied less extensively. An implementation of the RECAP algorithm (Lewell et al.) for fragmenting small molecules is available in a commercial program (fragmenter, www.chemaxon.com) and in the RDKit library (http://www.rdkit.org), which also implements the BRICS fragmentation algorithm (Degen et al.). However, given a diverse set of small molecules that share a property of interest, there is no automated tool to identify statistically enriched fragments that might explain their activity. Here we introduce the molBLOCKS suite, which allows users to break down small molecules into chemically meaningful fragments and to analyze the resulting fragment distribution (Fig. 1). The software consists of two command-line programs: fragment and analyze. The fragment program reads user-defined rules that specify the bonds to break, or uses one of the default rule sets: RECAP (Lewell et al.), CCQ (www.chemaxon.com) or BRICS (Degen et al.). The program then applies these rules to fragment the molecules and generates all fragments that are no smaller than a user-defined minimum size, measured in heavy atoms.

Fig. 1.

The fragment program takes as input a set of small molecules and user-defined rules that specify the bonds to break, and then applies these rules to fragment the molecules. As an optional second step, carried out by the analyze program, the user can cluster the fragments and/or determine whether the frequency of any fragment is enriched compared with a background set of fragments.

The analyze program collects statistics on how often each fragment occurs, clusters fragments using a user-defined similarity threshold based on a fingerprint representation (O’Boyle et al.) of the fragments, and selects a representative fragment for each cluster. It can also perform enrichment analysis at the level of either fragments or clusters. A typical scenario for fragment and enrichment analyses is a library of small molecules, a subset of which has a specific property of interest. In such cases, molBLOCKS can be used to fragment the whole library and determine which fragments, if any, are significantly enriched in the subset with the property of interest. Fragmentation and enrichment analysis of small molecules may also be useful in analyzing proteins. For example, the ligands bound by proteins that share a common property, such as a specific function, can be analyzed in this manner, complementing the functional enrichment analyses routinely performed with Gene Ontology terms (Huang da et al.). Extensive fragmentation of the entire DrugBank (Wishart et al.) collection of 6460 small molecules with the default rules took 53 s on an iMac with a 2.66 GHz processor. A user’s guide with implementation details and further tests is provided with the suite.

2 METHODS

2.1 fragment

Small molecules and bond-breaking rules are specified in SMILES (Weininger, 1988) and SMARTS (Daylight Inc.) notation, respectively. The open-source Open Babel C++ API (O’Boyle et al.) is used to process the SMILES and SMARTS notation. To ensure that all possible fragments of a given minimum size are generated (extensive fragmentation, enabled with the -e flag), the program uses the following strategy. Cleavable bonds are represented as nodes in an undirected graph, with an edge between two nodes if both bonds can be cut at the same time. Not all bonds that match the rules can be cleaved simultaneously, because doing so would yield fragments smaller than the minimum size. The Bron–Kerbosch algorithm (Bron and Kerbosch, 1973) is then used to identify all maximal cliques of this graph (i.e. all maximal sets of bonds that can be cleaved simultaneously). Finally, all possible fragments are generated by cutting the bonds within each maximal clique, one clique at a time. Without extensive fragmentation, the program returns only a single set of fragments.
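The clique-based strategy can be illustrated with a small Python sketch. It is not the molBLOCKS implementation, which is written in C++ on top of Open Babel. To stay self-contained, it assumes a hypothetical molecule that is a linear chain of heavy atoms, where bond i joins atom i and atom i + 1, and it represents the bonds matched by the rules simply as a list of bond indices. Bonds that cannot be cut even on their own are discarded first; two remaining bonds are joined by an edge when cutting both leaves every piece with at least `min_size` heavy atoms. On a chain, pairwise compatibility guarantees that all bonds in a clique can be cut together; in branched or ring-containing molecules this need not hold, and the sketch does not handle that case.

```python
from itertools import combinations


def pieces(n_atoms, cuts):
    """Split a linear chain of atoms 0..n_atoms-1 at the given bonds.

    Bond i joins atom i and atom i + 1 (a simplifying assumption of this sketch).
    Returns the fragments as (first_atom, last_atom) pairs.
    """
    result, start = [], 0
    for bond in sorted(cuts):
        result.append((start, bond))
        start = bond + 1
    result.append((start, n_atoms - 1))
    return result


def smallest_piece(n_atoms, cuts):
    """Heavy-atom count of the smallest fragment produced by the cuts."""
    return min(last - first + 1 for first, last in pieces(n_atoms, cuts))


def compatibility_graph(n_atoms, bonds, min_size):
    """Nodes are cleavable bonds; an edge means both bonds can be cut together."""
    nodes = [b for b in bonds if smallest_piece(n_atoms, [b]) >= min_size]
    adj = {b: set() for b in nodes}
    for a, b in combinations(nodes, 2):
        if smallest_piece(n_atoms, [a, b]) >= min_size:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bron_kerbosch(r, p, x, adj, out):
    """Bron-Kerbosch with pivoting: append every maximal clique to out."""
    if not p and not x:
        out.append(tuple(sorted(r)))
        return
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}
        x = x | {v}


def extensive_fragmentation(n_atoms, bonds, min_size):
    """Map each maximal clique of co-cleavable bonds to the fragments it yields."""
    adj = compatibility_graph(n_atoms, bonds, min_size)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return {clique: pieces(n_atoms, clique) for clique in cliques}


if __name__ == "__main__":
    # Hypothetical 12-atom chain, rule matches at bonds 2, 3, 5 and 8, minimum size 3
    for clique, frags in extensive_fragmentation(12, [2, 3, 5, 8], 3).items():
        print(clique, frags)
```

With these inputs, bond 3 cannot be cut together with bond 2 or bond 5 (each pair would leave a piece of one or two atoms), so the sketch finds two maximal cliques, (2, 5, 8) and (3, 8), and cuts each one separately.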

2.2 analyze

2.2.1 Fragment frequency

The program returns a frequency distribution giving, for each fragment, the number of molecules that contain it. Multiple instances of the same fragment within a molecule are counted only once.
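A minimal Python sketch of this counting rule follows. It assumes the fragments of each molecule are already available as strings keyed by a molecule ID; this data layout is hypothetical and is not the molBLOCKS output format. Comparing fragments as plain strings also assumes they are written as canonical SMILES, so that identical fragments get identical strings.

```python
from collections import Counter


def fragment_frequency(fragments_by_molecule):
    """Number of molecules containing each fragment.

    fragments_by_molecule maps a molecule ID to the list of fragments generated
    from it. Fragments are compared as strings, which assumes canonical SMILES.
    """
    counts = Counter()
    for fragments in fragments_by_molecule.values():
        # set() collapses repeated instances within one molecule to a single count
        counts.update(set(fragments))
    return counts


if __name__ == "__main__":
    frags = {
        "mol1": ["c1ccccc1", "c1ccccc1", "CCN"],
        "mol2": ["c1ccccc1", "CCO"],
        "mol3": ["CCO"],
    }
    for fragment, n in fragment_frequency(frags).most_common():
        print(fragment, n)
```

In this toy input the benzene ring appears twice in mol1 but contributes only 1 for that molecule, so its total is 2 rather than 3.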

2.2.2 Fragment clustering

Fragments are first converted to the Open Babel (O’Boyle et al.) default FP2 fingerprint representation, which is based on linear segments of up to seven atoms in length. The Tanimoto coefficient between the fingerprints of two fragments is used as their similarity. For a given similarity threshold, a graph is built with a node for each fragment and an edge between any two nodes whose fragments are considered similar. The analyze program then extracts the connected components of this graph as clusters and selects as the representative of each cluster the fragment with the highest average similarity to all other fragments in the cluster.
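The clustering step can be sketched in Python as follows. This is an illustration, not the analyze source: fingerprints are modeled as sets of on-bit positions rather than real FP2 fingerprints, and treating "similar" as a Tanimoto coefficient greater than or equal to the threshold is an assumption of the sketch.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # 0.0 for two empty sets is a convention here


def cluster_fragments(fps, threshold):
    """Connected components of the graph linking fragments with similarity >= threshold."""
    names = list(fps)
    adj = {n: [] for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(fps[a], fps[b]) >= threshold:
                adj[a].append(b)
                adj[b].append(a)
    seen, clusters = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters


def representative(cluster, fps):
    """Fragment with the highest average similarity to the other cluster members."""
    if len(cluster) == 1:
        return cluster[0]

    def mean_similarity(name):
        others = [m for m in cluster if m != name]
        return sum(tanimoto(fps[name], fps[m]) for m in others) / len(others)

    return max(cluster, key=mean_similarity)


if __name__ == "__main__":
    # Hypothetical bit sets standing in for FP2 fingerprints
    fps = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 3, 4, 5}, "D": {10, 11}}
    for c in cluster_fragments(fps, 0.8):
        print(sorted(c), "representative:", representative(c, fps))
```

At a threshold of 0.8, fragments A and B (Tanimoto coefficient 0.6) are not directly linked, yet they fall in the same cluster because each is linked to C (0.8). Taking connected components therefore behaves like single-linkage clustering; C, with the highest average similarity to the others, is chosen as the representative.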

2.2.3 Enrichment analysis

Enrichment analysis can be carried out to identify whether specific fragments (or clusters of fragments) appear in a set of molecules more frequently than expected by chance, compared with a background set of fragments. The hypergeometric distribution is used to model the probability of observing, by chance alone, a count of a fragment (or cluster of fragments) equal to or greater than the observed count, in analogy to Gene Ontology enrichment analyses (Rivals et al.). To account for multiple hypothesis testing, the analyze program returns both uncorrected P-values and false discovery rates (FDRs) obtained with the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995).
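The two statistical steps can be sketched in Python. The sketch assumes that all four counts are numbers of molecules, consistent with the frequency definition in Section 2.2.1, although the text above does not pin this down. For a background of N molecules, K of which contain the fragment, and a set of n molecules, k of which contain it, the one-sided P-value is P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K, i) C(N − K, n − i) / C(N, n). The Benjamini–Hochberg adjustment multiplies the P-value of rank r (out of m) by m / r and then enforces monotonicity from the largest P-value down.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size, K: background molecules carrying the fragment,
    n: size of the set of interest, k: molecules in that set carrying it.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDRs), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy counts: 3 of 10 background molecules contain the fragment,
    # and all 3 fall in a set of 4 molecules with the property of interest
    print(hypergeom_pvalue(3, 4, 3, 10))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

The exact integer sum from `math.comb` avoids the rounding problems of multiplying many small probabilities; for very large libraries a log-space or library implementation of the hypergeometric survival function may be preferable.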

3 USAGE

As an example of how to use the molBLOCKS suite, we fragmented a set of antineoplastic drugs extracted from KEGG (Kanehisa et al.) with the following command:

    fragment -i antineoplastic.smi -r RECAP.txt -n 4 -o antineoplastic.frag -e

where antineoplastic.smi is a text file containing the small molecules to fragment in SMILES format, and RECAP.txt defines the cleavable bonds as SMARTS patterns. The -e flag enables extensive fragmentation, and the -n parameter sets the minimum fragment size, defined as the total number of heavy atoms. The output of the fragmentation is written to antineoplastic.frag. We then used the analyze program to identify fragments enriched relative to a background dataset of KEGG drugs:

    analyze -i antineoplastic.frag -c 0.8 -e background.frag -o distr.txt

With the optional -c parameter, analyze clusters the fragments at the specified Tanimoto coefficient threshold. The optional -e parameter specifies the background set for enrichment analysis; for the results to be meaningful, this set must contain the fragments in the input set. Figure 2 shows an example of an enriched fragment and its parent molecules in the antineoplastic set. See the Supplementary Materials for further details.

Fig. 2.

Antineoplastic (i.e. tumor inhibitor) drugs were fragmented and analyzed with molBLOCKS. Four clusters of fragments were found to be enriched in this set of 165 drugs. The representative fragment of the first cluster is shown in the left panel, and the drugs that contain a fragment in this cluster are shown in the right panel. These compounds are alkylating agents, which damage DNA by attaching an alkyl group to the guanine base. The enriched fragment comes from nitrosourea, the parent molecule of these compounds. Molecules are visualized with Marvin Sketch (http://www.chemaxon.com/products/marvin/marvinsketch/). The remaining enriched clusters are given in the Supplementary Materials.

REFERENCES

1. Isabelle Rivals, Léon Personnaz, Lieng Taing, Marie-Claude Potier. Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics, 2006-12-20.

2. Philip J Hajduk, Jonathan Greer. A decade of fragment-based drug design: strategic advances and lessons learned. Nat Rev Drug Discov, 2007-02-09.

3. Ai Muto, Masahiro Hattori, Minoru Kanehisa. Analysis of common substructures of metabolic compounds within the different organism groups. Genome Inform, 2007.

4. Jörg Degen, Christof Wegscheid-Gerlach, Andrea Zaliani, Matthias Rarey. On the art of compiling and using 'drug-like' chemical fragment spaces. ChemMedChem, 2008-10.

5. X Q Lewell, D B Judd, S P Watson, M M Hann. RECAP--retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J Chem Inf Comput Sci, 1998 May-Jun.

6. Noel M O'Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, Geoffrey R Hutchison. Open Babel: An open chemical toolbox. J Cheminform, 2011-10-07.

7. Jessica Ahmed, Catherine L Worth, Paul Thaben, Christian Matzig, Corinna Blasse, Mathias Dunkel, Robert Preissner. FragmentStore--a comprehensive database of fragments linking metabolites, toxic molecules and drugs. Nucleic Acids Res, 2010-10-21.

8. Minoru Kanehisa, Susumu Goto, Yoko Sato, Miho Furumichi, Mao Tanabe. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res, 2011-11-10.

9. David S Wishart, Craig Knox, An Chi Guo, Savita Shrivastava, Murtaza Hassanali, Paul Stothard, Zhan Chang, Jennifer Woolsey. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res, 2006-01-01.

10. Da Wei Huang, Brad T Sherman, Richard A Lempicki. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res, 2008-11-25.
