canSAR (http://cansar.icr.ac.uk) is a publicly available, multidisciplinary, cancer-focused knowledgebase developed to support cancer translational research and drug discovery. canSAR integrates genomic, protein, pharmacological, drug and chemical data with structural biology, protein networks and druggability data. canSAR is widely used to rapidly access information and help interpret experimental data in a translational and drug discovery context. Here we describe major enhancements to canSAR including new data, improved search and browsing capabilities, new disease and cancer cell line summaries and new and enhanced batch analysis tools.
Translating biological knowledge and discoveries from large-scale omic data into new cancer drugs and clinical biomarkers requires significant effort in understanding mechanisms and in experimental biological validation. These experiments are greatly empowered by the availability of as much relevant information as possible in an easily accessible and understandable form. In our increasingly multidisciplinary world, this information needs to come from many different scientific domains that have historically been separate.
canSAR, initially described in NAR in 2011 (1) and updated in 2014 (2), is the first and, to our knowledge, remains the largest multidisciplinary resource to support cancer drug discovery and translational research. canSAR was developed to bring together diverse data from across all domains that will benefit cancer drug discovery. It has >150 000 unique users from 179 countries, including biologists, chemists and translational and clinical scientists from both academia and industry. Here we describe major updates in canSAR v3.0, in both data and functionality.
DATA CONTENT AND GROWTH
canSAR's aim is to provide comprehensive multidisciplinary annotation for genes and biological systems to enable target validation and drug discovery. canSAR contains the full complement of the human proteome as well as 528 805 proteins from 16 634 model organisms and data for 11 778 cancer and non-transformed cell line models. Furthermore, canSAR contains 208 269 659 experimental data points for 9 390 patient-derived tissue samples (for breakdown see http://cansar.icr.ac.uk/cansar/data-sources/). There are 111 414 3D structures for 21 658 proteins, collectively containing 215 178 ligands determined in complex with a protein. We have collated 367 465 high quality experimentally derived protein–protein interactions (see below) for 16 680 proteins, which we have annotated with all chemogenomic and structural data from canSAR.
canSAR contains chemical and pharmacological data for over one million bioactive small-molecule drugs and compounds, corresponding to >8 121 000 pharmacological bioactivities as well as over 10 million calculated chemical properties. Moreover, we have now begun curating these bioactive compounds for their suitability as investigative chemical probes for target validation (see Target Synopsis section below).
To our knowledge, canSAR remains the world's most comprehensive druggability assessment resource, containing multidisciplinary druggability assessments for the majority of the human proteome. The latest version of canSAR provides 3D-structure-based druggability assessment for 2 836 425 cavities on 109 475 protein structures (PDB chains); ligand-based druggability assessment for 8 197 human proteins; and, more recently, protein network-based druggability results for 13 345 human proteins. Together these provide a powerful enabler for target selection and validation for drug discovery.
The underlying architecture of canSAR is designed to ensure full linkage of all data types across the multidisciplinary data contained within it.
All data are linked to their original data sources or publications, wherever available, thus ensuring data provenance and enabling researchers to access the original studies. The data in canSAR are updated at regular intervals as dictated by the data type. For example, 3D structure data (3) and canSAR's structure-based druggability (4) calculations are updated weekly, while data from the ChEMBL (5) database are typically updated 1–2 weeks after each ChEMBL update. Full details about the updates are provided at http://cansar.icr.ac.uk/cansar/data-sources/.
In the era of mechanism-driven drug discovery and translational research, scientists frequently need to access as much information as possible about a gene or target of interest in one place, in an easily digestible form, so that they can identify key pieces of information and generate hypotheses for experimental validation and biological exploration. The new enhanced canSAR Target Synopsis provides visual and tabular summaries of diverse data including functional data, protein families, 3D structure, chemical bioactivities and pharmacological data, genetic and gene transcriptional alterations, and pharmacologically annotated protein interaction networks. The Target Synopsis allows rapid visualisation of genetic and gene transcriptional alterations from patient tissue as well as cancer cell lines (Figure 1).
Figure 1.
Molecular target synopsis: new features. (A) Curated chemical probes targeting BRD4. (B) BRD4 normalised expression (z-scores) across TCGA studies. Interactive comparison between normal (green bars) and primary tumour (blue bars) samples. Metastatic samples have been deselected in this example. (C) NCI60 and Gene Expression Atlas cell lines ranked by BRD4 expression. (D) Mutation incidence for BRD4 across COSMIC cell lines. Clicking on a cell line graph bar presents a detailed mutational profile for the gene. For example: (E) BRD4 mutations in colorectal cell lines.
We also provide an individual target view of a target's druggability using all calculable druggability assessments (3D structure-based, ligand-based and network-based druggability). canSAR contains an increasing number of manually curated drugs and clinical candidates and, more recently, we have begun the curation of chemical probes from public repositories such as the Chemical Probes Portal (www.chemicalprobes.org) for use in experimental evaluation of the target or its pathway (Figure 1).
The immediate availability and visualisation of these data allow researchers to rapidly gain a view of the state of knowledge around a particular target, including its alteration in cancer cohorts, to assess its druggability, and to discover whether drugs or chemical tools exist to evaluate its function.
DISEASE SYNOPSIS AND CLINICAL TRIAL DATA
A ‘disease’ view of all the multidisciplinary data in canSAR allows users to rapidly view and drill down into the drugs approved, or under clinical investigation, for a particular cancer type. The ‘Disease Synopsis’ (Figure 2) provides summaries of the number of drugs and clinical trials available for any cancer type or subtype and allows the exploration of key genetic and transcriptional alterations identified in patient cohorts as well as cancer cell line models for this cancer type. Moreover, the clinical trial view allows immediate visualisation of the number, phases and status of drugs in clinical trials for this cancer. We include information from >179 150 cancer trials. Finally, the user can also browse and explore cancer cell line models for a particular cancer type (Figure 2). These data are updated monthly.
Figure 2.
Disease synopsis: clinical trials. (A) Global summary of clinical trials with information on approved drugs, clinical candidates, chemical probes and cell line models applicable to each particular cancer type. Clicking on the desired disease link, e.g. Prostate Cancer, reveals detailed information specific to the disease (B). This includes: (i) drugs and clinical candidates with chemical structure links to the detailed canSAR Compound Synopsis. (ii) Timelines for applicable clinical trials, which can be filtered, sorted and grouped by phase by the user. Hovering over a specific timeline displays a brief synopsis for the trial and the bar colour reflects the trial phase. (iii) Cell line models relevant to the disease with sortable links to the canSAR cell line synopsis pages.
CANCER CELL LINE SYNOPSIS
Cancer cell line models remain the workhorse of cancer biological studies and target validation. Despite the plethora of information available for cancer cell lines, few, if any, resources have attempted to bring all the broad multidisciplinary data together in a meaningful way. The canSAR cell-line synopsis summarises genetic, gene expression and pharmacological data for 11 778 cell lines, thus allowing users to identify key mutations, expressed genes and drug sensitivity behaviour for any given cell line. Moreover, we have annotated and clustered cell lines based on tissue and cancer type, allowing simple browsing and navigation. Most importantly, we utilize all the underlying information including mutations, copy number alterations, gene expression and drug sensitivity data to objectively compare all cell lines and present cell line similarity rankings. This feature enables scientists to select groups of cell lines with shared or complementary characteristics, based on full, objective, experimentally derived data (Figure 3).
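To make the idea of a similarity ranking concrete, the following is a minimal sketch of how cell lines could be ranked against a query line by their shared mutated genes. The text does not state which similarity measure canSAR uses; the Jaccard index here is an assumption chosen for simplicity, and the cell line names, gene names and the functions `jaccard` and `rank_similar` are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_similar(query: str, profiles: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank every other cell line by similarity to `query`, most similar first."""
    query_profile = profiles[query]
    scores = [
        (name, jaccard(query_profile, profile))
        for name, profile in profiles.items()
        if name != query
    ]
    # Sort by descending score, then by name so that ties are deterministic.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Invented toy profiles: cell line -> set of mutated genes (placeholders only,
# not real canSAR data).
toy_profiles = {
    "LINE_A": {"GENE1", "GENE2", "GENE3"},
    "LINE_B": {"GENE1", "GENE2"},
    "LINE_C": {"GENE3", "GENE4"},
    "LINE_D": {"GENE5"},
}

ranking = rank_similar("LINE_A", toy_profiles)
for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

The same ranking function could in principle be applied to other set-like profiles (for example, genes with copy number alterations), whereas continuous data such as expression or drug sensitivity would need a different measure, for instance a correlation coefficient.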
Figure 3.
Cell line synopsis: similar cell lines. Apart from accessing the genetic, expression and pharmacological profile for a cell line of interest (e.g. PC-3), the user can also investigate which cell lines exhibit similar features such as (A) similarity across the cell line mutational spectrum and (B) overall or chromosome-specific copy number variation. In addition similarity can be assessed by gene expression or by drug sensitivity (not shown).
ENHANCED DRUGGABLE PROTEIN NETWORKS
One of the new unique utilities of canSAR is the automated annotation of protein interaction networks with key pharmacological, drug and druggability data as well as information on alteration in cancer. This allows researchers to view the environment around their target and to explore other proteins within its pathway or connected cellular network. If the protein of interest is not itself druggable, or has no chemical probes that can be used to explore the biological activity of the pathway, then the immediate knowledge that other proteins that interact with it are druggable or have chemical probes becomes greatly enabling.
In canSAR v3.0, as well as utilizing key protein–protein interaction databases directly (e.g. STRING (6)), we constructed a high confidence experimentally derived interactome by combining data from the IMEx consortium (7), Phosphosite (8) and other resources. The advantage of this new collection of protein interactions is that it contains directional data (>5100 direct interactions are directional) and complements the data found in other public databases.
Starting with either a single target in the Target Synopsis or several targets using one of canSAR's batch annotation tools, the researcher can view and interact with protein networks in which protein nodes are coloured by druggability and icons indicate the availability of key information on drugs or chemical tools, druggability and alterations in cancer (Figure 4).
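The use case described above, in which a target that is not itself druggable is approached through its interaction partners, can be sketched as a simple lookup over an annotated, directed edge list. This is only an illustration under stated assumptions: the edge list, the annotation fields `druggable` and `has_probe`, the protein names and the function `tractable_neighbours` are all hypothetical and do not reflect canSAR's internal data model or any canSAR API.

```python
from typing import Dict, List, Tuple

# Hypothetical directed interactions as (source, target) pairs,
# e.g. a kinase phosphorylating a substrate.
toy_edges: List[Tuple[str, str]] = [
    ("PROT_A", "PROT_X"),
    ("PROT_X", "PROT_B"),
    ("PROT_C", "PROT_A"),
]

# Hypothetical per-protein annotation flags (invented values).
toy_annotations: Dict[str, Dict[str, bool]] = {
    "PROT_A": {"druggable": True, "has_probe": False},
    "PROT_B": {"druggable": False, "has_probe": True},
    "PROT_C": {"druggable": True, "has_probe": True},
    "PROT_X": {"druggable": False, "has_probe": False},
}


def tractable_neighbours(
    target: str,
    edges: List[Tuple[str, str]],
    annotations: Dict[str, Dict[str, bool]],
) -> Dict[str, str]:
    """Return direct interactors of `target` that are druggable or have a probe.

    Each hit is mapped to its position relative to the target:
    "upstream" (partner -> target) or "downstream" (target -> partner).
    """
    hits: Dict[str, str] = {}
    for source, dest in edges:
        if source == target:
            partner, direction = dest, "downstream"
        elif dest == target:
            partner, direction = source, "upstream"
        else:
            continue  # edge does not involve the target
        info = annotations.get(partner, {})
        if info.get("druggable") or info.get("has_probe"):
            hits[partner] = direction
    return hits


neighbours = tractable_neighbours("PROT_X", toy_edges, toy_annotations)
print(neighbours)
```

Keeping the direction of each interaction in the result reflects why directional data are useful: an upstream druggable regulator and a downstream druggable effector suggest different experimental strategies.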
Figure 4.
Enhanced druggable protein networks. (A) Each interactor icon indicates the pharmacological potential and genetic information available in canSAR. (B) Dynamically generated interactome for EGFR, using phosphorylation reactions derived from Phosphosite (purple directed interactions) and transcriptional regulation (light blue directed interactions). Additional interaction types include Reactome Functional Interactions and molecular complexes. The network complexity can be controlled by the number of visible nodes. The network can be saved as an image or a json object for further analysis (e.g. in Cytoscape).
TOOLS EMPOWERING LARGE-SCALE BIOLOGICAL DATA ANALYSIS
Following our successful initial implementation of the Cancer Protein Annotation Tool (CPAT), and in response to user feedback, we have enhanced CPAT and developed a new tool, the Cancer Cell Line Annotation Tool, which provides batch-based summaries of the cell line data in canSAR.
CONCLUDING REMARKS AND FUTURE DEVELOPMENT
canSAR continues to grow both in content and functionality to enable rapid access to data relevant to cancer translational research. canSAR provides unique views on genes and proteins, drugs, 3D structures, protein interaction networks, cancer cell lines, cancer clinical trials and more. canSAR is globally used not only for rapid access to multidisciplinary knowledge, but also as the key resource to aid target selection and prioritization for drug discovery (4,9–11). Documentation and example use cases are published on the canSAR online documentation pages (http://cansar.icr.ac.uk/cansar/documentation/).
canSAR will continue to expand in its data and functionality. We will continue the annotation of patient-derived experimental data and cancer clinical trial information and will include clinical trial outcome data both for cancer drugs and biomarkers. We will expand and further annotate the protein-network data and introduce pathways and pathway exploration tools. Much of the focus in the next phase of canSAR development will be on enhancing the search and browsing power and on developing expert tools in response to user feedback.
Authors: Sandra Orchard; Samuel Kerrien; Sara Abbani; Bruno Aranda; Jignesh Bhate; Shelby Bidwell; Alan Bridge; Leonardo Briganti; Fiona S L Brinkman; Fiona Brinkman; Gianni Cesareni; Andrew Chatr-aryamontri; Emilie Chautard; Carol Chen; Marine Dumousseau; Johannes Goll; Robert E W Hancock; Robert Hancock; Linda I Hannick; Igor Jurisica; Jyoti Khadake; David J Lynn; Usha Mahadevan; Livia Perfetto; Arathi Raghunath; Sylvie Ricard-Blum; Bernd Roechert; Lukasz Salwinski; Volker Stümpflen; Mike Tyers; Peter Uetz; Ioannis Xenarios; Henning Hermjakob Journal: Nat Methods Date: 2012-04 Impact factor: 28.547
Authors: Laurence H Pearl; Amanda C Schierz; Simon E Ward; Bissan Al-Lazikani; Frances M G Pearl Journal: Nat Rev Cancer Date: 2015-03 Impact factor: 60.716
Authors: Mishal N Patel; Mark D Halling-Brown; Joseph E Tym; Paul Workman; Bissan Al-Lazikani Journal: Nat Rev Drug Discov Date: 2013-01 Impact factor: 84.694
Authors: Damian Szklarczyk; Andrea Franceschini; Michael Kuhn; Milan Simonovic; Alexander Roth; Pablo Minguez; Tobias Doerks; Manuel Stark; Jean Muller; Peer Bork; Lars J Jensen; Christian von Mering Journal: Nucleic Acids Res Date: 2010-11-02 Impact factor: 16.971
Authors: Anna Gaulton; Louisa J Bellis; A Patricia Bento; Jon Chambers; Mark Davies; Anne Hersey; Yvonne Light; Shaun McGlinchey; David Michalovich; Bissan Al-Lazikani; John P Overington Journal: Nucleic Acids Res Date: 2011-09-23 Impact factor: 16.971
Authors: Mark D Halling-Brown; Krishna C Bulusu; Mishal Patel; Joe E Tym; Bissan Al-Lazikani Journal: Nucleic Acids Res Date: 2011-10-19 Impact factor: 16.971
Authors: Peter V Hornbeck; Bin Zhang; Beth Murray; Jon M Kornhauser; Vaughan Latham; Elzbieta Skrzypek Journal: Nucleic Acids Res Date: 2014-12-16 Impact factor: 16.971
Authors: Krishna C Bulusu; Joseph E Tym; Elizabeth A Coker; Amanda C Schierz; Bissan Al-Lazikani Journal: Nucleic Acids Res Date: 2013-12-03 Impact factor: 16.971
Authors: Aleksandras Gutmanas; Younes Alhroub; Gary M Battle; John M Berrisford; Estelle Bochet; Matthew J Conroy; Jose M Dana; Manuel A Fernandez Montecelo; Glen van Ginkel; Swanand P Gore; Pauline Haslam; Rowan Hatherley; Pieter M S Hendrickx; Miriam Hirshberg; Ingvar Lagerstedt; Saqib Mir; Abhik Mukhopadhyay; Thomas J Oldfield; Ardan Patwardhan; Luana Rinaldi; Gaurav Sahni; Eduardo Sanz-García; Sanchayita Sen; Robert A Slowley; Sameer Velankar; Michael E Wainwright; Gerard J Kleywegt Journal: Nucleic Acids Res Date: 2013-11-27 Impact factor: 16.971