Sebastjan Kralj, Marko Jukič, Urban Bren.
Abstract
High-throughput virtual screening (HTVS) is, in conjunction with rapid advances in computer hardware, becoming a staple in drug design research campaigns and cheminformatics. In this context, virtual compound library design becomes crucial, as it generally constitutes the first step, where quality-filtered databases are essential for efficient downstream research. Therefore, multiple filters for compound library design have been devised and reported in the scientific literature. We collected the most common filters in medicinal chemistry (PAINS, REOS, Aggregators, van de Waterbeemd, Oprea, Fichert, Ghose, Mozziconacci, Muegge, Egan, Murcko, Veber, Ro3, Ro4, and Ro5) to facilitate their open-access use and compared them. We then implemented these filters in the open platform Konstanz Information Miner (KNIME) as a freely accessible and simple workflow compatible with small or large compound databases, to benefit readers and to support the early steps of drug design.
Keywords: compound filtering; compound libraries; high-throughput virtual screening; library design; virtual screening
Year: 2022 PMID: 35628532 PMCID: PMC9147459 DOI: 10.3390/ijms23105727
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Percentage of compounds passing the described filters. The total number of sampled compounds of the unfiltered library is used as the denominator.
Figure 2. (A) The average descriptor value with SD of SlogP for the unfiltered and post-filtering libraries. The majority of the filters have values close to the average, with AggregatorsLow and Rule-of-3 scoring lower, since they have strict cut-off values for SlogP. The partition coefficient is used to assess the lipophilicity of a drug and its ability to cross cell membranes. (B) The average descriptor value with SD of SMR for the unfiltered and post-filtering libraries. The clear outliers are the Ro3 and Ro4, which strictly define the chemical space. (C) The average descriptor value with SD of TPSA for the unfiltered and post-filtering libraries. The Ro3 with molecules small in size scores lower than the average, with the Ro4 being slightly above the average, but not as significantly as in the previous graphs. (D) The average descriptor value with SD of MW for the unfiltered and filtered libraries. The molecular weight descriptor is a very common descriptor used for cut-off values. As most of the filters aim at drug-like molecules except for the Ro4 and Ro3, the average weights are very similar.
Figure 3. (A) The average descriptor value with SD of the number of rotatable bonds for the unfiltered and filtered libraries. The graph resembles the other graphs, where Ro3 and Ro4 stand out and the other filters have average values close to the unfiltered library. (B) The average descriptor value with SD of the number of HBD. Molecules that pass the aggregators filter have a slightly higher number of hydrogen bond donors. Interestingly, the Ro4 scores lower than the average, despite retaining molecules that are larger and contain more N and O atoms, which are usually involved in hydrogen bonding (Figure 4). (C) The average descriptor value with SD of the number of HBA before and after filtering of the library. The Ro3 filter has a significantly lower value, as its aim is to find the starting fragments from which the molecule is built. This usually leaves space for the attachment of desired functional groups to the fragment, but as a result the number of HBA is lower. (D) The average descriptor value with SD of the number of rings present before and after filtering of the library. As Ro4 shifts the chemical space towards larger molecules, the number of rings increases as well. The opposite happens with Ro3, where the small molecular weight does not allow for a large number of rings.
Figure 4. The average number of C, N, and O atoms present in the compounds. The majority of the libraries are within the average values of the unfiltered library, with Rule-of-4 having higher values since the filter retains large molecules that are better suited for inhibiting protein–protein interactions. Rule-of-3 scores lower as it retains smaller compounds suitable for fragment-based drug design.
Figure 5. A 2D scatter plot of SlogP and molecular weight for compounds that passed individual filters. The filters clearly reshape the chemical space: the Rule-of-3 group (blue) is completely separated from the Rule-of-4 group (purple). This is due to the strict molecular weight and SlogP cut-offs, visible as horizontal and vertical lines at the cut-off values. Since Lipinski’s Rule-of-5 (green) allows one rule break per compound, we do not observe such strict lines, and the chemical space after filtering remains very similar to that of the unfiltered (red) library.
Figure 6. A 3D scatter plot of exact molecular weight, SlogP, and total polar surface area (TPSA) for the unfiltered (red) and filtered Rule-of-3 (blue), Rule-of-4 (purple), and Rule-of-5 (green) libraries. We can see how the space occupied by compounds changes drastically; similar observations can be made as in the case of the 2D plot in Figure 5.
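
As a rough illustration of the comparison in Figures 1 and 5, the sketch below computes the percentage of compounds passing a strict Rule-of-3 and a Rule-of-5 that tolerates one violation, using the unfiltered library size as the denominator. The cut-offs are the ones listed for the two filters in this article; the descriptor dictionaries, their key names, and the toy values are hypothetical, and in practice the descriptors would be computed by a cheminformatics toolkit rather than typed in by hand.

```python
from typing import Callable, Dict, List

Descriptors = Dict[str, float]  # hypothetical layout: descriptor name -> value


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Rule-of-5 cut-offs a compound breaks."""
    return sum([d["mw"] > 500, d["logp"] > 5, d["hbd"] > 5, d["hba"] > 10])


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Rule-of-5 with a tolerated number of rule breaks (one by default)."""
    return ro5_violations(d) <= max_violations


def passes_ro3(d: Descriptors) -> bool:
    """Strict Rule-of-3: every cut-off must hold."""
    return (d["mw"] <= 300 and d["logp"] <= 3 and d["hbd"] <= 3
            and d["hba"] <= 3 and d["rotb"] <= 3)


def pass_percentage(library: List[Descriptors],
                    predicate: Callable[[Descriptors], bool]) -> float:
    """Percentage of the unfiltered library that passes a filter."""
    if not library:
        return 0.0
    return 100.0 * sum(1 for d in library if predicate(d)) / len(library)


# Toy descriptor values, invented purely for illustration.
library = [
    {"mw": 250.0, "logp": 2.1, "hbd": 1, "hba": 3, "rotb": 2},  # fragment-like
    {"mw": 420.0, "logp": 5.4, "hbd": 2, "hba": 6, "rotb": 5},  # one Ro5 break
    {"mw": 610.0, "logp": 6.2, "hbd": 3, "hba": 8, "rotb": 9},  # two Ro5 breaks
    {"mw": 350.0, "logp": 3.5, "hbd": 2, "hba": 5, "rotb": 4},  # drug-like
]

print(pass_percentage(library, passes_ro5))  # 75.0
print(pass_percentage(library, passes_ro3))  # 25.0
```

Setting `max_violations=0` turns the Rule-of-5 into a hard cut-off, which would reintroduce the sharp boundaries seen for the Rule-of-3 and Rule-of-4.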
The most common functional group filters described in the scientific literature presented in alphabetical order.
| Name/Reference | Description | Features/Cutoff Values |
|---|---|---|
| Aggregators | Tanimoto coefficient similarity search to a database of known aggregators. | Tanimoto coefficient similarity ≥ 0.85 or SlogP > 5 (high similarity); Tanimoto coefficient similarity ≥ 0.5 and SlogP > 3 (medium similarity); Tanimoto coefficient similarity < 0.85 and SlogP ≤ 3 (low similarity) |
| Eli Lilly Rules | A set of 275 rules, developed over an 18-year period, used to identify compounds that may interfere with biological assays, allowing their removal from screening sets. | Reasons for rejection of compounds: reactivity, interference with assay measurements (fluorescence, absorbance, quenching), instability, and lack of druggability (lacking both oxygen and nitrogen) |
| Muegge method | Bioavailability prediction rules dubbed the Muegge method. Pharmacophore filter developed by analyzing known drug databases, with four functional molecular motifs determined to be important in drug-like molecules: | Primary, secondary, and tertiary amines are considered pharmacophore points but not pyrrole, indole, thiazole, isoxazole, other azoles, or diazines. Compounds with more than one carboxylic acid are dismissed. Compounds without a ring structure are dismissed. Intracyclic amines that occur in the same ring are fused and count as only one pharmacophore point. |
| PAINS | Removal of frequent hitters (promiscuous compounds) by identifying sub-structural features not recognized by filters commonly used to identify reactive compounds. | Functional groups such as rhodanines, phenolic Mannich bases, hydroxyphenylhydrazones, alkylidene barbiturates, alkylidene heterocycles, 1,2,3-aralkylpyrroles, activated benzofurazans, 2-amino-3-carbonylthiophenes, catechols, and quinones do not pass the filters. |
| REOS ¹ | Seven property filters (similar to the rules in the PATTY program developed at Merck) | H-bond donors ≤ 5; H-bond acceptors ≤ 10; −2 ≤ formal charge ≤ +2; number of rotatable bonds ≤ 8; 200 ≤ molecular weight ≤ 500; 20 ≤ number of heavy atoms ≤ 50; −2 ≤ logP ≤ 5 |
| | Functional group filters for the removal of problematic structures dubbed REOS (rapid elimination of swill; program developed at Vertex). | Reactive, toxic, and other undesirable moieties such as nitro groups, peroxides, triflates, aldehydes, acetals, etc. |
¹ REOS is a hybrid filter which combines a set of functional group filters with property filters. As the REOS filter can be combined with other (property) filtering schemes, the property filtering part can be omitted and only functional group filters employed. As implemented in KNIME, the user can also specify the maximum quantity for each of the functional group rules, tuning the filter to the needs of the individual research scenario. REOS moieties in the SMARTS format can be found inside the KNIME workflow “REOS substructures” node.
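
The hybrid nature of REOS can be sketched in a few lines of Python. The seven property ranges are taken from the table above; everything else is an assumption made for illustration: the descriptor key names, the `match_counts` dictionary (standing in for per-group SMARTS match counts that a cheminformatics toolkit would supply), and the default of zero allowed matches per group. This is not the KNIME implementation, only a minimal model of the same idea.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[Optional[float], Optional[float]]  # (lower, upper); None = unbounded

# Seven REOS property ranges as listed in the table above.
REOS_PROPERTY_RANGES: Dict[str, Range] = {
    "hbd": (None, 5),
    "hba": (None, 10),
    "formal_charge": (-2, 2),
    "rotb": (None, 8),
    "mw": (200, 500),
    "heavy_atoms": (20, 50),
    "logp": (-2, 5),
}


def failed_reos_properties(desc: Dict[str, float]) -> List[str]:
    """Return the names of the property rules a compound breaks."""
    failed = []
    for name, (low, high) in REOS_PROPERTY_RANGES.items():
        value = desc[name]
        if (low is not None and value < low) or (high is not None and value > high):
            failed.append(name)
    return failed


def passes_functional_groups(match_counts: Dict[str, int],
                             max_allowed: Dict[str, int],
                             default_max: int = 0) -> bool:
    """Check per-group match counts against user-tunable maxima.

    `match_counts` is assumed to be precomputed elsewhere (hypothetical input);
    groups without an explicit maximum fall back to `default_max`.
    """
    return all(count <= max_allowed.get(group, default_max)
               for group, count in match_counts.items())


# Toy inputs, invented for illustration.
compound = {"hbd": 2, "hba": 5, "formal_charge": 0, "rotb": 4,
            "mw": 320.0, "heavy_atoms": 23, "logp": 2.5}
print(failed_reos_properties(compound))                                  # []
print(passes_functional_groups({"nitro": 1, "aldehyde": 0}, {"nitro": 1}))  # True
```

Keeping the two checks in separate functions mirrors the footnote: the property part can be skipped when another property filter is already in use, while the functional group part stays in place.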
The most common property filters described in the scientific literature.
| Name/Reference | Description | Features/Cutoff Values |
|---|---|---|
| Egan | Set of rules designed by analyzing the data on compounds both well and poorly absorbed in humans with multivariate statistics. Two descriptors (AlogP and PSA) were chosen for inclusion when determining membrane permeability. Compounds that pass exhibit good bioavailability. | AlogP ≤ 5.88; polar surface area ≤ 131.6 Å² |
| Fichert | Structure–permeability rules based on a set of 41 small drug-like molecules. LogD is the main property that determines permeability, with structures passing this filter being highly permeable in the Caco-2 model. | Molecular weight ≤ 500; 0 ≤ logD ≤ 3 |
| Ghose | A set of rules for drug-likeness derived from characterizing 6304 compounds taken from the Comprehensive Medicinal Chemistry Database. | 180 ≤ molecular weight ≤ 480; 40 ≤ molecular refractivity ≤ 130; −0.4 ≤ ClogP ≤ 5.6; 20 ≤ number of atoms ≤ 70 |
| Lee filter | Analysis of natural products to determine potential appealing scaffolds for future drug design. Pharmacophoric properties of natural products, trade drugs, and virtual combinatorial library were assessed, finding key properties and several scaffolds which could work as building blocks. | MW mean ~356; LogP mean ~2.1 |
| Lipinski | A set of four rules for drug-likeness and oral bioavailability derived from a subset of 2245 drugs from the World Drug Index. The rules aim to address the ADME issues. | Molecular weight ≤ 500; logP ≤ 5; H-bond donors ≤ 5; H-bond acceptors ≤ 10 |
| Mozziconacci | Filter developed by Mozziconacci after analyzing 15 freely available chemical libraries (2 million compounds). Drug-likeness was examined using common chemical features, and on that basis successive filters were designed to extract the drug-like subset. | Rotatable bonds ≤ 15; number of rings ≤ 6; oxygen atoms ≥ 1; nitrogen atoms ≥ 1; halogen atoms ≤ 7 |
| Murcko filter | Rules for determining CNS activity, joining 7 property descriptors (Rule-of-5 with the addition of rotatable bonds, aromatic density, and a measure for branching) and 166 fingerprint descriptors to determine presence or absence of functional groups. | MW 200–540; logP 0–5.2; H-bond acceptors ≤ 4; H-bond donors ≤ 3; rotatable bonds ≤ 7; branching behavior 3.4–12.2; aromatic rings < 3 |
| Oprea Lead-Like | A set of rules based on lead-like vs. drug-like comparison after examination of several commercially available databases. The rules aim to maintain focus towards effective and orally absorbable compounds. Besides the properties chosen based on the Rule-of-5, additional properties were chosen to better reflect molecular complexity of a library and the rigidity of a molecule. | Molecular weight < 450; −3.5 ≤ logP < 4.5; −4 ≤ logD ≤ 4; number of rings ≤ 4; nonterminal single bonds ≤ 10; H-bond donors ≤ 5; H-bond acceptors ≤ 8 |
| Rule-of-3 | Rules designed to support “fragment-based” drug research. Hits obtained using this filter can be useful for fragment libraries used to generate potential leads. Fragment libraries are useful for sampling chemical diversity or targeting specific interactions. | Molecular weight ≤ 300; logP ≤ 3; H-bond donors ≤ 3; H-bond acceptors ≤ 3; rotatable bonds ≤ 3 |
| Rule-of-4 | A set of rules derived from analyzing the 2P2I database that contains protein–protein interaction inhibitors with the aim of establishing guidelines for druggable protein–protein inhibitors, since these most often break traditional property filter rules. | Molecular weight ≥ 400; logP ≥ 4; number of rings ≥ 4; H-bond acceptors ≥ 4 |
| van de Waterbeemd | Physicochemical properties for estimating blood–brain barrier crossing of compounds. Rules were derived by examination of lipophilicity, H-bonding capacity, and molecular shape and size descriptors of marketed CNS and CNS-inactive drugs. | Molecular weight ≤ 450; polar surface area ≤ 90 Å² |
| Veber | Two rules to meet the criteria for oral bioavailability, derived from bioavailability measurements in rats for over 1100 drug candidates at GlaxoSmithKline. | Rotatable bonds ≤ 10; polar surface area ≤ 140 Å² |
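
Because most property filters in the table reduce to lower and upper bounds on a handful of descriptors, they can be expressed as data rather than code. The sketch below encodes four of the filters this way and reports which ones a compound passes. The cut-offs come from the table; the descriptor key names and the two example compounds are hypothetical, and a single `logp` value is assumed to stand in for whichever logP variant a given filter was originally defined with.

```python
from typing import Dict, List, Optional, Tuple

Bounds = Dict[str, Tuple[Optional[float], Optional[float]]]  # key -> (low, high)

# Cut-offs from the table; None means the bound is open on that side.
FILTERS: Dict[str, Bounds] = {
    "Veber": {"rotb": (None, 10), "tpsa": (None, 140.0)},
    "van de Waterbeemd": {"mw": (None, 450.0), "tpsa": (None, 90.0)},
    "Rule-of-3": {"mw": (None, 300.0), "logp": (None, 3.0), "hbd": (None, 3),
                  "hba": (None, 3), "rotb": (None, 3)},
    "Rule-of-4": {"mw": (400.0, None), "logp": (4.0, None),
                  "rings": (4, None), "hba": (4, None)},
}


def in_range(value: float, low: Optional[float], high: Optional[float]) -> bool:
    """Inclusive range check with optional open ends."""
    return (low is None or value >= low) and (high is None or value <= high)


def passing_filters(desc: Dict[str, float]) -> List[str]:
    """Names of all filters whose every bound the compound satisfies."""
    return [name for name, bounds in FILTERS.items()
            if all(in_range(desc[key], low, high)
                   for key, (low, high) in bounds.items())]


# Toy descriptor sets, invented for illustration.
fragment = {"mw": 240.0, "logp": 1.8, "hbd": 1, "hba": 3,
            "rotb": 2, "tpsa": 55.0, "rings": 2}
ppi_like = {"mw": 560.0, "logp": 5.1, "hbd": 2, "hba": 7,
            "rotb": 8, "tpsa": 110.0, "rings": 5}

print(passing_filters(fragment))  # ['Veber', 'van de Waterbeemd', 'Rule-of-3']
print(passing_filters(ppi_like))  # ['Veber', 'Rule-of-4']
```

Filters with strict inequalities (for example Oprea's molecular weight < 450) or with non-range rules would need a small extension to this representation.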
Figure 7. KNIME workflow example of our Veber filter implementation for effective design of compound libraries. Black lines represent the expanded meta node that contains sub-nodes [51].
Figure 8. An example of combining meta nodes to form a complex drug design workflow.
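
Outside KNIME, the idea of chaining filter meta nodes can be approximated by applying predicates one after another and recording how many compounds survive each stage. The following is only a loose analogy to the workflow in Figure 8, not a reproduction of it: the function names, the record layout, and the toy library are hypothetical, while the Veber and van de Waterbeemd cut-offs are those given in the property filter table.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, object]  # hypothetical record: an id plus descriptor values
Stage = Tuple[str, Callable[[Compound], bool]]


def veber(c: Compound) -> bool:
    """Veber: rotatable bonds <= 10 and polar surface area <= 140."""
    return c["rotb"] <= 10 and c["tpsa"] <= 140.0


def van_de_waterbeemd(c: Compound) -> bool:
    """van de Waterbeemd: molecular weight <= 450 and polar surface area <= 90."""
    return c["mw"] <= 450.0 and c["tpsa"] <= 90.0


def run_pipeline(library: List[Compound],
                 stages: List[Stage]) -> Tuple[List[Compound], List[Tuple[str, int]]]:
    """Apply filters in order; report the survivor count after each stage."""
    survivors = list(library)
    report = []
    for name, predicate in stages:
        survivors = [c for c in survivors if predicate(c)]
        report.append((name, len(survivors)))
    return survivors, report


# Toy library, invented for illustration.
library = [
    {"id": "a", "rotb": 3, "tpsa": 60.0, "mw": 300.0},
    {"id": "b", "rotb": 12, "tpsa": 80.0, "mw": 350.0},
    {"id": "c", "rotb": 5, "tpsa": 120.0, "mw": 400.0},
    {"id": "d", "rotb": 6, "tpsa": 85.0, "mw": 480.0},
]

survivors, report = run_pipeline(
    library, [("Veber", veber), ("van de Waterbeemd", van_de_waterbeemd)]
)
print(report)                        # [('Veber', 3), ('van de Waterbeemd', 1)]
print([c["id"] for c in survivors])  # ['a']
```

Ordering cheap property checks before more expensive substructure searches is a common design choice in such pipelines, since each stage shrinks the set the next one has to process.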