| Literature DB >> 35524571 |
Dominique Sydow1, Jaime Rodríguez-Guerra1, Talia B Kimber1, David Schaller1, Corey J Taylor1, Yonghui Chen1, Mareike Leja1, Sakshi Misra1, Michele Wichmann1, Armin Ariamajd1, Andrea Volkamer1.
Abstract
Computational pipelines have become a crucial part of modern drug discovery campaigns. Setting up and maintaining such pipelines, however, can be challenging and time-consuming-especially for novice scientists in this domain. TeachOpenCADD is a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects. We offer Python-based solutions for common tasks in cheminformatics and structural bioinformatics in the form of Jupyter notebooks, based on open source resources only. Including the 12 newly released additions, TeachOpenCADD now contains 22 notebooks that cover both theoretical background as well as hands-on programming. To promote reproducible and reusable research, we apply software best practices to our notebooks such as testing with automated continuous integration and adhering to the idiomatic Python style. The new TeachOpenCADD website is available at https://projects.volkamerlab.org/teachopencadd and all code is deposited on GitHub.Entities:
Year: 2022 PMID: 35524571 PMCID: PMC9252772 DOI: 10.1093/nar/gkac267
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1.Overview of 12 new talktorials. (i) Querying webservices (blue): T011 gives a broad introduction to programmatic access to webservices from Python, T012 and T013 demonstrate how to query the KLIFS (11) and PubChem (12) databases for kinase and compound data, respectively. (ii) Structural bioinformatics (orange): T014 detects the binding site in an EGFR kinase structure and compares the prediction to the binding site defined by KLIFS (11). T015 performs a re-docking for an EGFR–ligand complex with Smina (13). T016 detects protein–ligand interactions in an EGFR–ligand complex structure with PLIP (14). T017 introduces basic and advanced usages of the molecular visualization tool NGLView (15), used throughout most of TeachOpenCADD’s talktorials. T018 outlines a fully automated lead optimization pipeline: Based on an input structure, the pocket is detected and a set of compounds similar to a selected ligand are fetched from PubChem (12). These compounds are docked into the selected binding site. The most promising compounds w.r.t. docking scores and interaction profiles are proposed as optimized compounds. T019 demonstrates how to set up and run a molecular dynamics (MD) simulation on Google Colab with OpenMM (16). T020 analyzes the resulting MD trajectory with a focus on the root-mean-square deviation (RMSD) between trajectory frames and the dynamics of protein-ligand interactions using MDAnalysis (17,18). (iii) Cheminformatics (green): T021 exhibits the steps to numerically encode a small molecule from its SMILES representation. T022 lays the groundwork for deep learning and focuses on a simple feed-forward neural network for activity prediction using molecular fingerprints.
Figure 2.Data and information that can be automatically gathered for the EGFR kinase using the different web query talktorials as of September 2021, created based on ChEMBL v.27 (6) (T001), PDB (8) (T008), PubChem (12) (T013), and KLIFS (11) (T012). Input: yellow boxes, output: gray boxes, plots and molecule visualizations (using NGLView (15) and RDKit).
Figure 3.Case study for talktorial T018 depicting (A) 2D structure of the input ligand for the pipeline that was used with an EGFR crystal structure (PDB: 3W32, IC50 = 75nM); (B) 2D structure of gefitinib (IC50 = 0.17nM), an EGFR ligand found during similarity searches; (C) crystal structure of gefitinib co-crystallized with EGFR (PDB: 2ITY, black CPK representation); (D) docked pose of gefitinib (yellow CPK representation). Some segments of the protein structure have been removed for clarity. The ligand RMSD between (C) and (D) and the discovery of a higher-affinity ligand demonstrate the utility of the fully automated pipeline for early stage drug discovery.