
Matched Molecular Pair Analysis in Short: Algorithms, Applications and Limitations.

Abstract

Molecular matched pair (MMP) analysis has been used for more than 40 years within molecular design and is still an important tool to analyse potency data and other compound properties. The methods used to find matched pairs range from manual inspection, through supervised methods, to unsupervised methods, which are able to find previously unknown molecular pairs. Recent publications demonstrate the value of automatic MMP analysis of publicly available bioactivity databases. The MMP concept has its limitations, but because of its easy-to-use and intuitive nature, it will remain one of the most important tools in the toolbox of many drug designers.


Year: 2016 PMID: 28066532 PMCID: PMC5198793 DOI: 10.1016/j.csbj.2016.12.003

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

The challenge of molecular design is deciding what to do next based on available data, medicinal chemistry expert knowledge, experience and intuition [1]. In small sets of molecules an experienced chemist can spot trends and relationships by eye. As the number of compounds increases, more systematic approaches are needed. As early as the 1970s, methods for systematic analysis were published, e.g. the Topliss Scheme [1] or the Craig Plot [2], recommending a systematic stepwise method of building a structure–activity relationship for a chemical series. Hansch [3] and Free and Wilson [4] reasoned in the 1960s that the biological activity for a set of analogues could be described by the contributions that substituents or structural elements make to the activity of a parent structure. Generally speaking, these local QSAR methods try to find a correlation between structural and physicochemical descriptors and a given endpoint [4], such as biological activity. The term Molecular Matched Pair (MMP) was coined in 2004 by Kenny and Sadowski [5] for a special case of QSAR; it is now a widely used concept throughout drug design processes. In the most common situation, an MMP is a pair of compounds that differ structurally at a single site through a well-defined transformation (see Fig. 1) that is associated with a relative change in a property value. The correlation between the structural change and the property change is used in rationalizing observed structure–property relationships (SPR) and in compound optimization. Several different applications for MMP analysis originating from industry or academia have been developed and published, highlighting its importance. Among others, these include Drug-Guru [6], [7], Buy me Grease [7], WizePairZ [8], T-Analyse and T-Morph [9], VAMMPIRE [10] as well as the Hussain-Rea MMP algorithm [11] (Table 1).
The MMP concept has been further developed into Matched Pair Series [12], [13] or Matched Molecular Series (MMS) [14] to describe a set of compounds (not only a pair) differing by only a single chemical transformation.
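The pair and series definitions above can be illustrated with a minimal sketch. It assumes that each compound has already been decomposed into a constant core and a single variable R group; the compound identifiers, core labels and pIC50 values are hypothetical toy data, and the function names are illustrative rather than taken from any of the cited tools.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (compound id, constant core, variable R group, pIC50).
# The decomposition into core and R group is assumed to have been done already.
compounds = [
    ("cpd1", "core_A", "H", 6.0),
    ("cpd2", "core_A", "CH3", 6.9),
    ("cpd3", "core_A", "Cl", 7.4),
    ("cpd4", "core_B", "H", 5.2),
    ("cpd5", "core_B", "CH3", 5.3),
]


def group_by_core(records):
    """Group compounds that share the same constant part (a matched series)."""
    series = defaultdict(list)
    for cid, core, r_group, p_act in records:
        series[core].append((cid, r_group, p_act))
    return series


def matched_pairs(records):
    """Enumerate pairs that share a core and differ only in the R group."""
    pairs = []
    for core, members in group_by_core(records).items():
        for (id1, r1, p1), (id2, r2, p2) in combinations(members, 2):
            if r1 != r2:
                # Transformation label and the associated property change
                pairs.append((core, f"{r1}>>{r2}", id1, id2, round(p2 - p1, 2)))
    return pairs


series = group_by_core(compounds)
pairs = matched_pairs(compounds)
for pair in pairs:
    print(pair)
```

In this toy set the three compounds on core_A form a matched series, and the same H to CH3 transformation appears on two different cores with different property changes, which is the kind of context dependence discussed later in the text.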

Fig. 1

Example of a matched molecular pair (MMP).

Table 1

Classified MMP algorithms.

Non-supervised methods
	R. Guha (2012) [46]	BCI structural fingerprints, CDK 1024-bit path fingerprint	2016
	Fuchs et al. (2015) [15]	Sequence alignment for peptide MMP^a	2015
	T.J. Ritchie et al. (2015) [36]	HRF^b	2015
	Matsy (2014) [13]	HRF^b	2014
	VAMMPIRE (2013, 2014) [10], [49]	MCS^d and HRF^b	2013
	C.E. Keefer et al. (2011) [25]	Modified HRF^b (Pairfinder)	2011
	J. Bajorath et al. (2010–2016) [12], [14], [19], [27], [43], [50], [51], [52]	HRF^b, modified HRF^b, RECAP^c fragmentation	2010
	J. Hussain et al. (2010) [11]	Hussain and Rea fragmentation (HRF^b)	2010
	L. Cururull-Sanchez (2010) [35]	ECFP6 fingerprints with sub-structure search	2010
	Papadatos et al. (2010) [42]	dt_commonsubstruct and findsub routine from Daylight and HRF^b	2010
	WizePairZ (2010) [8]	MCS^d and SMIRKS^e	2010
	Raymond et al. (2009) [47]	MSM^f rule framework based on MCS^d	2009
	R. Sheridan et al. (2002, 2006) [9], [28]	Similarity and MCS^d method (T-Analyse)	2008

Supervised methods
	ThricePairs (2010) [34]	Defined transformations, SMARTS^g	2010
	Gleeson et al. (2009) [26]	Substructure Search	2009
	Buy me Grease (2009, 2010) [7], [35]	Defined transformations, RXN^h format	2009
	P.J. Hajduk et al. (2008) [33]	Findsub routine from Daylight and defined transformations, SMIRKS^e	2008
	D.Y. Haubertin et al. (2007) [30]	RECAP^c method	2007
	Drug Guru (2006) [6]	Defined transformations, SMIRKS^e	2006
	N.T. Southall et al. (2006) [53]	Topological torsion similarity and MCS^d	2006
	A. G. Leach et al. (2006) [29]	Defined transformations, SMARTS^g (Leatherface)	2006
	T.J. Ritchie (2016) [48]	SMIRKS^e

a MMP: Molecular Matched Pair.

b HRF: Hussain and Rea fragmentation.

c RECAP: Retrosynthetic Combinatorial Analysis Procedure.

d MCS: Maximum common substructure.

e SMIRKS: SMILES reaction specification.

f MSM: Molecular substructure modification.

g SMARTS: SMiles ARbitrary Target Specification.

h RXN: MDL Molfile Reaction Format.

Recently, an extension of the MMP concept towards biopharmaceutical applications was published, using macromolecular sequence data to predict the effect of single amino acid substitutions on property optimization [15]. Besides supporting hypothesis development and testing, an important application of MMP analysis is the detection of outliers, namely pairs of compounds that show a step change in a property: a so-called activity cliff. These compounds are often the most interesting to study when designing compounds to improve the property that shows this change [16], [17]. Detecting activity cliffs is inherently difficult: the cliffs are a function of the chemical space representation, and their detection is further confounded by experimental uncertainty in the measured properties [18]. One systematic approach to the detection of activity cliffs and the determination of their depth uses support vector regression [19]. Not only can different chemical space representations lead to significant changes in the nature of these activity cliffs, but even simple atomic variations can cause dramatic effects on important complex endpoints in medicinal chemistry, such as dose-to-man prediction, potency, clearance, solubility and permeability, to name a few [18], [20]. If the structural change (R group) is small and the scaffold in a chemical series is conserved, the MMP represents a relevant and easy-to-interpret chemical space representation. The MMP approach can further be extended to systematically analyse non-additivity in a structure–property relationship (SPR) series [21].

Application and Limitation

The assumption that the effect of a chemical substitution can be generalized is inherent in all QSAR methods, including the MMP approach, and was successfully highlighted by the work of Lipinski et al., who correlated physicochemical properties with oral bioavailability [22]. With the increasing availability of public databases containing millions of structure–activity relationship (SAR) [23], [24] or SPR data points, multiple papers have been published applying the MMP concept to: ADME [25], [26], bioisosterism [9], [27], [28], aqueous solubility [29], [30], [31], [32], plasma protein binding [29], [30], oral exposure [29], logD [8], [30], [32], potency [8], [9], [27], [31], [33], intrinsic clearance [7], [34], hERG and P450 metabolism [29], [32], [34], in vitro UGT (uridine 5′-diphospho-glucuronosyltransferase) glucuronidation clearance [35], half-life [31], selectivity against off-targets [31], impact of N- and O-methylation on aqueous solubility and lipophilicity [36] or mode of action [31], the analyses differing only in the MMP algorithm used. Two relatively recent publications [31], [37] analyse an apparently simple MMP transformation, CH → C-CH3, in greater detail and highlight some general limitations and drawbacks of using the MMP concept prospectively in drug design. The methyl group is a commonly occurring carbon fragment in small-molecule drugs and can modulate both the biological and physical properties of a molecule. Two literature analyses of > 2000 cases of methylation revealed that an activity boost of a factor of 10 or more is found with a frequency of approximately 8%, and that the probability of achieving a 100-fold boost is less than 1% [33]. However, the distribution of potency changes for a given MMP transformation is often nearly symmetrical and centred at or near zero, resulting in a similar likelihood of potency gains or losses. A consistent bias of specific substituents towards improved potency could not be observed.
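The two frequencies quoted above can be checked against each other with a short calculation. The sketch below assumes, as a simplification, that the potency changes (in log10 units) follow a normal distribution centred exactly at zero; it derives the standard deviation implied by the roughly 8% frequency of a 10-fold boost and then asks how often a 100-fold boost would be expected under that assumption.

```python
from statistics import NormalDist

# Assumption: potency changes (log10 units) are normally distributed around zero.
p_10_fold = 0.08  # reported frequency of a boost of 10-fold or more (1 log unit)

# z-score whose upper tail contains 8% of a standard normal distribution
z_10_fold = NormalDist().inv_cdf(1.0 - p_10_fold)

# Standard deviation (log units) for which 1 log unit sits at that z-score
sigma = 1.0 / z_10_fold

# Probability of a boost of 100-fold or more (2 log units) under the same model
delta_dist = NormalDist(mu=0.0, sigma=sigma)
p_100_fold = 1.0 - delta_dist.cdf(2.0)

print(f"implied standard deviation: {sigma:.2f} log units")
print(f"P(boost >= 100-fold): {p_100_fold:.4f}")
```

Under this idealised model the probability of a 100-fold boost comes out as a fraction of a percent, in line with the reported value of less than 1%. Real distributions may have heavier tails, so this is a plausibility check and not a prediction.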
Nevertheless, an understanding of these rare events, in which the IC50 value of a compound improves by more than 100-fold, would provide great value in prospective affinity optimisation. From logD measurements, the free energy of binding for this specific transformation can be estimated to be about 0.8 kcal/mol [31]. This corresponds to an approximate 3.5-fold boost in potency from methylation based on the partitioning effect alone. A more empirical evaluation of literature examples by Jorgensen and co-workers suggested that a single methyl group might actually boost potency approximately 10-fold if the new methyl group sits optimally in a hydrophobic pocket of the active site [37]. Further, they found that methyl substitutions ortho to an aryl ring can be particularly effective at modulating activity due to the induction of a conformational change. Coupling the conformational gain with the correct placement of the methyl group in a hydrophobic pocket of the protein might therefore result in the greatest improvements in activity. It must be emphasized that this does not explain all observations of a > 100-fold boost in potency, and the reasons for this increase are not necessarily straightforward to rationalize.
Besides the rationalization of the observed property change in a structural context, the prospective use of MMP analysis is further hampered by several additional limitations. These are discussed in various publications [12], [38] and include: the problem of generalized relationships, such as global versus local effects; molecular context; database, time and data set dependencies; and the definition of the relevance of an observed difference (e.g. a cut-off at a 2-fold change). The interpretation of the relative property value change in the MMP analysis also depends on the experimental error [39], [40]. Authors commonly use, for example, Z-scores to define the significance of the potency difference [9], the mean, standard deviation and standard error [33], [41] or a specific value cut-off [42], [43]. By assuming different experimental errors, Kramer et al. showed how the minimum number of pairs necessary to achieve significance can be calculated, and explained the difference between statistical significance and effect size estimation [21], [44]. Relevant experimental errors for public and industry SAR databases have now been published [21], [39]. Finally, MMP analysis, as a linear substituent contribution model, generally assumes additivity and thus does not work in cases of non-additivity [21]. Matched Square Pairs, an extension of MMP analysis, allow a judgement of the quality of generalized relationships. By looking at squares (pairs of matched pairs) it is possible to check for non-additivity and for outliers. The matched square shows four transforms, which could involve a change of the core and two R-groups or one core and three R-groups, as shown in Fig. 2. Non-additivity is calculated as (pAct3-pAct4)-(pAct2-pAct1) [21] and indicates whether there is an apparent non-linear SAR for a subset of compounds. This in turn could lead to new binding or interaction hypotheses, e.g. conformational changes or restrictions leading to a different binding mode or protein dynamics.
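As a side calculation for the methylation estimate discussed earlier in this section, the link between a free energy difference and a fold change in potency can be sketched with the standard relation ΔG = RT ln(ratio). The temperature of 298.15 K is an assumption, and the function names are illustrative. Because the result depends on the assumed temperature and on rounding of the 0.8 kcal/mol estimate, it lands in the same range as the quoted approximate 3.5-fold boost without matching it exactly.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(delta_g_kcal, temperature_k=298.15):
    """Potency ratio corresponding to a free energy gain (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def free_energy(fold, temperature_k=298.15):
    """Free energy gain (kcal/mol) corresponding to a given potency ratio."""
    return R_KCAL * temperature_k * math.log(fold)


# Partitioning estimate for a methyl group, taken from the text
methyl_fold = fold_change(0.8)
print(f"0.8 kcal/mol -> {methyl_fold:.1f}-fold")

# Free energy needed for the 10-fold and 100-fold boosts discussed in the text
print(f"10-fold  -> {free_energy(10):.2f} kcal/mol")
print(f"100-fold -> {free_energy(100):.2f} kcal/mol")
```

The comparison makes the argument of the text concrete: a 100-fold boost requires several times the free energy that partitioning alone is estimated to provide, so additional effects such as an optimal pocket fit or a conformational change have to contribute.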

Fig. 2

Matched Square Pairs cycle to determine non-additivity in a SAR analysis.
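The non-additivity cycle can be written down directly from the expression (pAct3-pAct4)-(pAct2-pAct1) given in the text. In this minimal sketch the corner numbering is an assumption: compounds 1 to 2 and compounds 4 to 3 are taken to be the same transformation in two different contexts. The noise check is also an assumption: it supposes four independent measurements with the same standard deviation, so that the uncertainty of the cycle value is twice that of a single measurement. The activity values are hypothetical.

```python
def non_additivity(p_act1, p_act2, p_act3, p_act4):
    """Non-additivity of a matched square: (pAct3 - pAct4) - (pAct2 - pAct1)."""
    return (p_act3 - p_act4) - (p_act2 - p_act1)


def exceeds_noise(value, sigma, z=2.0):
    """Flag a cycle value that is large compared with experimental noise.

    Assumption: four independent measurements, each with standard deviation
    sigma, so the standard deviation of the cycle value is 2 * sigma.
    """
    return abs(value) > z * 2.0 * sigma


# Hypothetical squares: the same transformation adds 1.0 log unit in both contexts ...
additive = non_additivity(6.0, 7.0, 7.5, 6.5)
# ... or 1.0 log unit in one context and 2.5 log units in the other
non_additive = non_additivity(6.0, 7.0, 9.0, 6.5)

print(additive, exceeds_noise(additive, sigma=0.3))
print(non_additive, exceeds_noise(non_additive, sigma=0.3))
```

A value near zero means the two substituent effects simply add up; a value well outside the noise band points to a subset of compounds where a linear substituent contribution model would mislead.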

MMP Algorithms

In principle, all published MMP algorithms can be classified as supervised or unsupervised methods. In supervised methods the chemical transformation that generates the MMP is predefined, while in unsupervised methods an algorithm is used to find all possible pairs in a set of compounds, mainly using maximum common substructure (MCS) or fragmentation approaches [38], [45]. The advantage of supervised methods lies in the precise control over the definition of the MMP to address a particular question [7], [29], [34]. On the other hand, these methods cannot find new and surprising MMPs in the way that unsupervised methods can. In the 90s Van Drie and coworkers defined the structure activity landscape index (SALI) as:
$$\mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)}$$
where A_i and A_j are the activities of the ith and the jth molecules, and sim(i, j) is the similarity coefficient between the two molecules [16], in an approach to find compounds with small structural differences and large differences in properties. Guha further extended the structure activity landscape to predict which compounds are likely to form activity cliffs [46]. Later, Wassermann et al. [27] analysed matched molecular pairs, generated by a modified reimplementation of the Hussain-Rea algorithm, with respect to their ability to introduce activity cliffs, using public domain compound data. Approximately 250 non-redundant substitutions with a tendency to display activity cliffs were identified. A definition of an activity cliff is given and the distribution of MMPs is shown. Hu et al. [43] further examined these activity cliffs using substructure relationships (MMPs) rather than similarity searches and could identify more relevant cases with MMPs. Sheridan et al. [9] screened for the most common chemical replacements in a large collection of drug-like molecules from the MDL Drug Data Report. Different treatments of replacements in rings are implemented and used to identify potential bioisosteres.
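The structure activity landscape index can be sketched in a few lines, assuming its commonly quoted form |A_i - A_j| / (1 - sim(i, j)). The fingerprints below are hypothetical sets of feature identifiers, Tanimoto similarity is chosen here only as one possible similarity coefficient, and the handling of identical fingerprints is an illustrative convention.

```python
import math
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of features."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def sali(act_i, act_j, sim_ij):
    """SALI = |A_i - A_j| / (1 - sim(i, j)); identical fingerprints are a special case."""
    if sim_ij >= 1.0:
        return math.inf if act_i != act_j else 0.0
    return abs(act_i - act_j) / (1.0 - sim_ij)


# Hypothetical toy molecules: name -> (feature set, pIC50)
molecules = {
    "m1": ({1, 2, 3, 4}, 6.0),
    "m2": ({1, 2, 3, 5}, 8.5),
    "m3": ({7, 8, 9}, 6.2),
}

ranked = sorted(
    (
        (a, b, sali(molecules[a][1], molecules[b][1],
                    tanimoto(molecules[a][0], molecules[b][0])))
        for a, b in combinations(molecules, 2)
    ),
    key=lambda item: item[2],
    reverse=True,
)

for name_a, name_b, value in ranked:
    print(name_a, name_b, round(value, 2))
```

The pair that is both similar and far apart in activity ranks first, which is the behaviour that makes the index useful for locating activity cliffs.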
A maximum common substructure (MCS) approach was used to define the MMP, based on a clique detection method and a single replacement site. Similarly, Haubertin and Bruneau [30] used about 9000 predefined functional groups to analyse the effects on various compound properties across a corporate compound deck. For MMP creation they used a RECAP fragmentation algorithm that identified one of the predefined groups. Hajduk and Sauer [33] looked at the influence of common chemical substitutions on ligand potency. Overall, 127 different chemical changes were compared and the resulting potency shifts analysed. MMPs were identified by a pairwise comparison with the findsub routine from Daylight, which looks at terminal or side chain groups only. The analysed database consisted of 84,000 compounds from lead optimisation against more than 30 different targets. They could not find a substituent that consistently biased the result towards a gain or a loss; the distribution was normal and nearly symmetrical, centred at or near 0, with a probability of about 8.5% for a 10-fold potency gain and of less than 1% for a 100-fold gain, similar to the findings of Schönherr et al. [31]. Raymond et al. [47] used an MCS-based MMP algorithm to identify chemical changes within a collection of 2.7 million compounds and discussed the statistical relevance of these modifications. Also published in 2009, Gleeson et al. [26] used a partially supervised MMP algorithm, in which a given transformation is described by a substructure and combined with a retrieval algorithm that finds all transformations of this substructure, to derive ADMET rules of thumb. In 2010, Hussain and Rea [11], [34] introduced an efficient fragmentation algorithm to systematically extract all MMPs from a given compound data set. They perform single, double and triple cuts at all bonds except ring bonds. Some retrieval problems are discussed, e.g. an alcohol-to-amide query would also retrieve compounds where a carboxylic acid has been substituted with an amide.
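The fragment-and-index idea behind this kind of unsupervised search can be illustrated with a heavily simplified sketch. It is not the published algorithm: molecules are represented as hypothetical chains of building blocks joined by acyclic bonds (written as "-") instead of molecular graphs, only single cuts are made, and keeping the smallest variable part per compound pair is a simplification chosen here for readability.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy molecules: chains of building blocks joined by acyclic bonds.
molecules = {
    "mol1": "Ph-C(=O)NH-CH3",
    "mol2": "Ph-C(=O)NH-CH2CH3",
    "mol3": "Pyridyl-C(=O)NH-CH3",
}


def single_cuts(chain):
    """Yield (constant part, variable part) for every single cut of the chain."""
    blocks = chain.split("-")
    for i in range(1, len(blocks)):
        left, right = "-".join(blocks[:i]), "-".join(blocks[i:])
        yield left + "-[*]", right  # keep the left side, vary the right side
        yield "[*]-" + right, left  # keep the right side, vary the left side


def build_index(mols):
    """Index every fragmentation by its constant part."""
    index = defaultdict(list)
    for mol_id, chain in mols.items():
        for constant, variable in single_cuts(chain):
            index[constant].append((mol_id, variable))
    return index


def find_pairs(mols):
    """Return, per compound pair, the match with the smallest variable parts."""
    best = {}
    for constant, entries in build_index(mols).items():
        for (id_a, var_a), (id_b, var_b) in combinations(entries, 2):
            if id_a == id_b or var_a == var_b:
                continue
            size = len(var_a.split("-")) + len(var_b.split("-"))
            key = (id_a, id_b)
            if key not in best or size < best[key][0]:
                best[key] = (size, constant, f"{var_a}>>{var_b}")
    return {key: value[1:] for key, value in best.items()}


pairs = find_pairs(molecules)
for (id_a, id_b), (constant, transformation) in pairs.items():
    print(id_a, id_b, constant, transformation)
```

Compounds that share a constant part end up under the same index key, so pairs are found without comparing every compound with every other one. In this toy set mol2 and mol3 differ at two sites and therefore share no key, so they are not reported as a pair.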
Dossetter [34] analysed groups in a similar fashion to Topliss [1] by applying a statistical analysis to in vitro human microsomal metabolic stability data for small phenyl group substituents using ThricePairs, proprietary in-house AstraZeneca software. ThricePairs uses SMARTS patterns to specify substructures. This was later followed by WizePairZ [8], described in 2010 by Warner et al. as a tool that can automatically detect and identify matched molecular pairs and encode them in SMIRKS reaction notation. In essence, it is an MCS method that captures different levels of the local single-site environment, similar to the approach of Sheridan. Using SMIRKS definitions, Ritchie et al. [48] looked at the replacement of a mono-substituted benzene ring with aromatic or aliphatic heterocycle MMPs and the effect on nine ADME properties. In 2013, Weber et al. [10] published a strategy to relate the substitution effect within matched molecular pairs to the atom environment within the cocrystallized protein–ligand complex, with the aim of predicting ligand binding by extrapolating the effect of the substitution while taking the molecular environment into account [49]. In 2014, Leon et al. [50] introduced a method to automatically generate synthetically accessible MMPs by applying reaction rules following the retrosynthetic combinatorial analysis procedure (RECAP), with the aim of generating more chemically interpretable and accessible pairs. A library of more than 92,000 RECAP-MMPs was generated from public domain compounds active against 435 different targets, exclusively utilizing high-confidence activity data. MMP analysis has also lately been coupled to network analysis of composition and topology in order to predict potency, as in QSAR-type approaches [51]. Similarly, Ghosh et al. [52] recently published a study in which pairs from matched molecular series (MMS) were coupled to SAR information, creating a database of MMS with SAR characteristics to be used in future medicinal chemistry work. They applied the fragmentation technique of Hussain and Rea to a set of 48,000 bioactive compounds.

Conclusion

MMP analysis has been used for more than 40 years within molecular design and is still an important tool to analyse potency data and many other compound properties. The methods used to identify matched pairs range from manual inspection, through supervised methods, to unsupervised methods. The MMP framework allows one to study numerous properties (most commonly binding affinity or potency) and to rationalize the design of the next compound to make within a series. Despite its usefulness, several limitations have to be considered, including the representation of molecular structures, the selection of the most appropriate algorithm for the task, and the statistical analysis method applied to the data to ensure that the property difference found is indeed relevant. This is especially important with respect to identifying activity cliffs, where a small chemical change relates to a large change in potency or another property of interest. These activity cliffs are often the most important MMPs to study within a lead series. In contrast to traditional SAR analysis, where similar compounds are assumed to have similar properties, activity cliffs describe the substitution pattern with the most impact upon a small structural change. Because of the intuitive nature of the MMP concept, which connects small structural changes to a property change, and because of the maturity of its framework and its simplicity of use, it will remain one of the most important tools in the toolbox of a drug designer.

46 in total

1. MMP-Cliffs: systematic identification of activity cliffs on the basis of matched molecular pairs.

Authors: Xiaoying Hu; Ye Hu; Martin Vogt; Dagmar Stumpfe; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2012-04-17 Impact factor: 4.956

2. A statistical analysis of in vitro human microsomal metabolic stability of small phenyl group substituents, leading to improved design sets for parallel SAR exploration of a chemical series.

Authors: Alexander G Dossetter
Journal: Bioorg Med Chem Date: 2010-04-29 Impact factor: 3.641

3. Lead optimization using matched molecular pairs: inclusion of contextual information for enhanced prediction of HERG inhibition, solubility, and lipophilicity.

Authors: George Papadatos; Muhammad Alkarouri; Valerie J Gillet; Peter Willett; Visakan Kadirkamanathan; Christopher N Luscombe; Gianpaolo Bravi; Nicola J Richmond; Stephen D Pickett; Jameed Hussain; John M Pritchard; Anthony W J Cooper; Simon J F Macdonald
Journal: J Chem Inf Model Date: 2010-10-25 Impact factor: 4.956

4. On outliers and activity cliffs--why QSAR often disappoints.

Authors: Gerald M Maggiora
Journal: J Chem Inf Model Date: 2006 Jul-Aug Impact factor: 4.956

5. Molecular transformations as a way of finding and exploiting consistent local QSAR.

Authors: Robert P Sheridan; Peter Hunt; J Chris Culberson
Journal: J Chem Inf Model Date: 2006 Jan-Feb Impact factor: 4.956

6. Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes.

Authors: Rajarshi Guha
Journal: J Chem Inf Model Date: 2012-08-16 Impact factor: 4.956

7. Matched molecular pair analysis: significance and the impact of experimental uncertainty.

Authors: Christian Kramer; Julian E Fuchs; Steven Whitebread; Peter Gedeck; Klaus R Liedl
Journal: J Med Chem Date: 2014-04-16 Impact factor: 7.446

8. Prediction of Activity Cliffs Using Condensed Graphs of Reaction Representations, Descriptor Recombination, Support Vector Machine Classification, and Support Vector Regression.

Authors: Dragos Horvath; Gilles Marcou; Alexandre Varnek; Shilva Kayastha; Antonio de la Vega de León; Jürgen Bajorath
Journal: J Chem Inf Model Date: 2016-08-26 Impact factor: 4.956

9. Interdependence between physical parameters and selection of substituent groups for correlation studies.

Authors: P N Craig
Journal: J Med Chem Date: 1971-08 Impact factor: 7.446

10. Using matched molecular series as a predictive tool to optimize biological activity.

Authors: Noel M O'Boyle; Jonas Boström; Roger A Sayle; Adrian Gill
Journal: J Med Chem Date: 2014-03-14 Impact factor: 7.446
