ChromoHub: a data hub for navigators of chromatin-mediated signalling.

Lihua Liu, Xi Ting Zhen, Emily Denton, Brian D Marsden, Matthieu Schapira

Abstract

The rapidly increasing research activity focused on chromatin-mediated regulation of epigenetic mechanisms is generating waves of data on writers, readers and erasers of the histone code, such as protein methyltransferases, bromodomains or histone deacetylases. To make these data easily accessible to communities of research scientists coming from diverse horizons, we have created ChromoHub, an online resource where users can map disease associations, protein structures, chemical inhibitors, histone substrates, chromosomal aberrations and other types of data extracted from public repositories and the published literature onto phylogenetic trees. The interface can be used to define the structural or chemical coverage of a protein family, highlight domain architectures, interrogate disease relevance or zoom in on specific genes for more detailed information. This open-access resource should serve as a hub for cell biologists, medicinal chemists, structural biologists and other navigators who explore the biology of chromatin signalling.

AVAILABILITY: http://www.thesgc.org/chromohub/

Year: 2012; PMID: 22718786; PMCID: PMC3413389; DOI: 10.1093/bioinformatics/bts340

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 INTRODUCTION

Chromatin-mediated control of gene expression and cell fate is regulated in part by distinct combinations of post-translational modifications on histone proteins, predominantly methylation or acetylation of lysine or arginine side chains at the N-terminal tails of histones (Fierz and Muir, 2012; Kouzarides, 2007; Strahl and Allis, 2000). Alteration of this histone code can lead to disease states, and chemical inhibition of proteins that write, read or erase histone marks represents a promising avenue to restore disease-associated gene expression to normal levels (Arrowsmith et al.). Following the clinical validation of this strategy with histone deacetylase (HDAC) inhibitors (Prince et al.), writers, readers and erasers of histone marks constitute emerging therapeutic target classes for a variety of disease conditions. Consequently, research activity in the field has increased dramatically.

The rapidly growing body of heterogeneous data generated by diverse communities of scientists is in part accessible through public repositories or the published literature. But retrieving the data is time-consuming at best for the non-specialist: will a cell biologist know where to find which small-molecule inhibitors are available for his protein of interest, and what their IC50 values are? Will a medicinal chemist easily retrieve all compounds co-crystallized with her protein target? Will a biochemist know which somatic aberration, buried in vast cancer genomics databases, is linked to the protein he is characterizing?

We have built a database that integrates such heterogeneous data types extracted from multiple repositories and the published literature. A simple yet powerful web interface allows research scientists from diverse horizons who are interested in writers, readers and erasers of the histone code to interrogate this database through phylogenetic representations of each protein family. The interface is freely available and should promote cross-pollination between diverse communities of scientists interested in epigenetic signalling.

2 METHODS

2.1 Assembling protein families

Human protein families were defined by the presence of specific domains involved in writing, reading and erasing histone marks (Arrowsmith et al.; Kouzarides, 2007): protein methyltransferase (PMT) and histone acetyltransferase (HAT) domains for writers, lysine demethylase (KDM) and HDAC domains for erasers, and bromodomains (BRD) for readers of acetylated lysines. Several different domains are known to bind methyl-lysines (Tudor, MBT, Chromo, PWWP, PHD and BAH), and each defined a subfamily of its own (Kuo et al.; Taverna et al.). The Human Protein Reference Database (Keshava Prasad et al.), PFAM (Punta et al.) and SMART (Schultz et al.) were queried to retrieve all human genes containing at least one of these domains. Duplicates were removed, and missing genes clearly documented in the published literature were added manually.
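The merge-and-deduplicate step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assemble_family`, the case-insensitive matching of gene symbols and the example hit lists are hypothetical and are not taken from the ChromoHub pipeline.

```python
def assemble_family(query_results, manual_additions=()):
    """Merge gene lists returned by several domain queries into one family.

    Duplicates are dropped (compared case-insensitively, an assumption of this
    sketch) and manually curated genes are appended at the end.
    """
    seen = set()
    family = []
    for genes in list(query_results) + [list(manual_additions)]:
        for gene in genes:
            key = gene.upper()
            if key not in seen:
                seen.add(key)
                family.append(gene)
    return family


# Hypothetical query results, for illustration only.
hprd_hits = ["BRD2", "BRD4", "EP300"]
pfam_hits = ["BRD4", "BRD3", "brd2"]
smart_hits = ["BRD3", "CREBBP"]

bromodomain_family = assemble_family(
    [hprd_hits, pfam_hits, smart_hits],
    manual_additions=["BRDT"],  # stands in for a gene added from the literature
)
print(bromodomain_family)
```

The first spelling encountered for each gene is the one retained; a production pipeline would more likely normalize symbols against a reference nomenclature before merging.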

2.2 Generating phylogenetic trees

For each protein family, two phylogenetic trees were produced. The first was based on a ClustalW (Larkin et al.) multiple sequence alignment of the default UniProt protein variant of each human gene. The second was based on a multiple sequence alignment of the domain after which the family was named (a domain-based tree was not generated for HATs, as the catalytic domain is not always clearly defined for this family). In this case, a seed sequence alignment was derived from available protein structures by aligning residues that were superimposed in three-dimensional space in ICM (Molsoft, San Diego). Additional sequences were appended by aligning them to the closest seed sequence in ICM. A PHP script plotted a phylogenetic tree from the Newick string of the multiple sequence alignment and automatically defined X, Y coordinates next to each leaf of the tree for metadata mapping (Supplementary Methods). We verified that this methodology produced a phylogeny in agreement with trees previously published in the literature (Filippakopoulos et al.; Richon et al.). A larger version of the PMT family that includes numerous putative arginine methyltransferases has been reported; these proteins were not included here, as the authors of that work stated that they did not want to imply that these proteins are protein arginine methyltransferases per se (Richon et al.).
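To make the leaf-coordinate step concrete, here is a minimal Python sketch of how X, Y positions could be derived from a Newick string. It is not the authors' PHP script (which is described only in the Supplementary Methods); the function names, the rectangular layout and the scaling parameters `x_scale` and `y_step` are assumptions chosen for illustration. The parser handles only plain names and branch lengths, not quoted labels or comments.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip()
    if not text.endswith(";"):
        text += ";"
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"name": "", "length": 0.0, "children": []}
        if text[pos] == "(":
            pos += 1  # skip '('
            while True:
                node["children"].append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # skip ')'
                break
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        node["name"] = text[start:pos]
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    return parse_node()


def leaf_coordinates(tree, x_scale=100.0, y_step=20.0):
    """Assign (x, y) to each leaf: x from cumulative branch length, y from leaf order."""
    coords = {}
    next_row = 0

    def walk(node, depth):
        nonlocal next_row
        x = depth + node["length"]
        if not node["children"]:
            coords[node["name"]] = (x * x_scale, next_row * y_step)
            next_row += 1
        for child in node["children"]:
            walk(child, x)

    walk(tree, 0.0)
    return coords


coords = leaf_coordinates(parse_newick("((A:1,B:2):1,C:3);"))
print(coords)
```

Storing one coordinate pair per leaf is what lets metadata symbols be drawn beside a gene later without re-parsing the tree.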

2.3 Metadata source

Data related to the biology and to the structural and chemical coverage of each gene were extracted from diverse repositories and stored in MySQL. Function summary, sub-cellular location and polymorphisms were retrieved from UniProt records. Tissue expression data were collected from GNF's BioGPS (Wu et al.). Cancer-associated chromosomal aberrations were extracted from the Mitelman database (http://cgap.nci.nih.gov/Chromosomes/Mitelman) and the Sanger Institute's Cancer Gene Census (http://www.sanger.ac.uk/genetics/CGP/Census/). Protein interactions were taken from the STRING database (Szklarczyk et al.). Structural coverage was produced by querying the Protein Data Bank (http://www.rcsb.org/pdb) with BLAST. Protein domain architecture was defined by querying the PFAM database with HMMER (e-value cutoff of 0.01) (Sonnhammer et al.). NIH funding was extracted from NIH's RePORT (http://projectreporter.nih.gov/reporter.cfm) and published literature from PubMed. NCBI's built-in links between PubMed records and genes were used to retrieve articles associated with human, mouse or rat orthologues of the gene of interest, and keywords embedded in PubMed's MeSH terms served to associate PubMed records with diseases. Histone substrates and chemical inhibitors were manually extracted from the literature, and all records were linked to their respective PubMed or patent reference. All chemical inhibitors from BindingDB can also be mapped on the trees (Liu et al.). PubMed records, disease associations, funding and structure coverage are updated automatically on a weekly basis. Other data are updated manually.
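As an illustration of the domain-architecture step, the following Python sketch filters domain hits with the 0.01 e-value cutoff mentioned above and orders the surviving domains along each protein. The tuple layout of a hit, the gene and domain values, and the choice of an inclusive comparison (`<=`) are assumptions of this sketch; it does not call HMMER and does not reproduce the ChromoHub code.

```python
from collections import defaultdict

EVALUE_CUTOFF = 0.01  # cutoff stated in the text


def domain_architecture(hits, cutoff=EVALUE_CUTOFF):
    """Group domain hits by gene, keep those passing the cutoff, sort by position.

    Each hit is a (gene, domain, start, end, evalue) tuple, a hypothetical
    layout standing in for parsed search output.
    """
    by_gene = defaultdict(list)
    for gene, domain, start, end, evalue in hits:
        if evalue <= cutoff:  # inclusive comparison is an assumption
            by_gene[gene].append((start, end, domain))
    return {
        gene: [(domain, start, end) for start, end, domain in sorted(domains)]
        for gene, domains in by_gene.items()
    }


# Hypothetical hits, for illustration only.
example_hits = [
    ("GENE_A", "Bromodomain", 350, 440, 2e-25),
    ("GENE_A", "PHD", 10, 55, 0.5),  # rejected: e-value above the cutoff
    ("GENE_A", "Bromodomain", 60, 150, 1e-30),
    ("GENE_B", "Tudor", 5, 60, 0.009),
]

architecture = domain_architecture(example_hits)
print(architecture)
```

Sorting by start position gives the left-to-right order needed for a linear representation of the protein with its domains highlighted.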

3 RESULTS

The online user interface is based on phylogenetic representations of protein families involved in writing, reading and erasing histone post-translational modifications. Users can choose between phylogenetic classifications derived from multiple alignments of full-length sequences or of the sequences of the domain after which the family was named. Thumbnails of phylogenetic trees for each protein family can be clicked to display larger images. Once a tree is selected, the sequence alignment used to generate the tree can be downloaded. Checkboxes can be selected to map a diverse array of data on the tree of interest. Information on the data source is provided in a window that pops up when hovering over an [i] icon next to the checkbox. Once a checkbox is selected, associated symbols are shown next to each protein for which data are available. More information is then accessible by hovering over or clicking on the symbol of interest.

Users can easily navigate the functional, structural and chemical landscape of each protein family. They can display functional summaries for each gene on the trees, list structures in the Protein Data Bank covering each gene and map them on linear representations of the protein where PFAM domains are highlighted, display small molecules co-crystallized with any protein or retrieve chemical inhibitors reported in the published or patent literature. They can see the number of entries in PubMed for each gene and inspect disease associations automatically inferred from PubMed records. They can also access chromosomal aberrations linked to cancer, tissue expression data, sub-cellular location or histone substrates. Images can be saved to the desktop and embedded in presentations. Newcomers to the field can search for potential collaborators by looking for research laboratories with active funding on their gene of interest.
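The checkbox behaviour described above amounts to placing a symbol beside a leaf only when the selected data type has an entry for that gene. A minimal Python sketch of that lookup follows; the function name `map_layers`, the layer names, the pixel offsets and the example records are all hypothetical and serve only to illustrate the idea, not the actual web interface.

```python
def map_layers(leaf_coords, layers, selected, first_offset=10.0, spacing=12.0):
    """Return (gene, layer, x, y) placements for the selected data layers.

    leaf_coords: {gene: (x, y)} positions of the tree leaves.
    layers:      {layer_name: {gene: data}} available metadata.
    selected:    layer names whose checkboxes are ticked.
    A symbol is emitted only when the layer holds data for that gene;
    symbols for the same gene are laid out side by side.
    """
    placements = []
    for gene, (x, y) in leaf_coords.items():
        offset = first_offset
        for layer in selected:
            if gene in layers.get(layer, {}):
                placements.append((gene, layer, x + offset, y))
                offset += spacing
    return placements


# Hypothetical coordinates and metadata, for illustration only.
leaf_coords = {"GENE_A": (200.0, 0.0), "GENE_B": (300.0, 20.0)}
layers = {
    "structures": {"GENE_A": ["structure record"]},
    "inhibitors": {"GENE_A": ["compound 1"], "GENE_B": ["compound 2"]},
}

placements = map_layers(leaf_coords, layers, selected=["structures", "inhibitors"])
for placement in placements:
    print(placement)
```

Keeping the tree layout separate from the metadata layers means a new data type can be added without regenerating the tree.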

4 CONCLUSION

The explosion of research activity on epigenetic signalling and recent technological breakthroughs in genome-scale biology are providing a wealth of data related to writers, readers and erasers of histone marks. The open-access resource that we have developed should help research scientists involved in chromatin biology rapidly find data that inform their research.

REFERENCES (17 in total)

1. SMART: a web-based tool for the study of genetically mobile domains.

Authors: J Schultz; R R Copley; T Doerks; C P Ponting; P Bork
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

Review 2. Epigenetic protein families: a new frontier for drug discovery.

Authors: Cheryl H Arrowsmith; Chas Bountra; Paul V Fish; Kevin Lee; Matthieu Schapira
Journal: Nat Rev Drug Discov Date: 2012-04-13 Impact factor: 84.694

Review 3. Chromatin modifications and their function.

Authors: Tony Kouzarides
Journal: Cell Date: 2007-02-23 Impact factor: 41.582

Review 4. How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers.

Authors: Sean D Taverna; Haitao Li; Alexander J Ruthenburg; C David Allis; Dinshaw J Patel
Journal: Nat Struct Mol Biol Date: 2007-11-05 Impact factor: 15.369

Review 5. Clinical studies of histone deacetylase inhibitors.

Authors: H Miles Prince; Mark J Bishton; Simon J Harrison
Journal: Clin Cancer Res Date: 2009-06-09 Impact factor: 12.531

6. Clustal W and Clustal X version 2.0.

Authors: M A Larkin; G Blackshields; N P Brown; R Chenna; P A McGettigan; H McWilliam; F Valentin; I M Wallace; A Wilm; R Lopez; J D Thompson; T J Gibson; D G Higgins
Journal: Bioinformatics Date: 2007-09-10 Impact factor: 6.937

7. Pfam: multiple sequence alignments and HMM-profiles of protein domains.

Authors: E L Sonnhammer; S R Eddy; E Birney; A Bateman; R Durbin
Journal: Nucleic Acids Res Date: 1998-01-01 Impact factor: 16.971

8. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities.

Authors: Tiqing Liu; Yuhmei Lin; Xin Wen; Robert N Jorissen; Michael K Gilson
Journal: Nucleic Acids Res Date: 2006-12-01 Impact factor: 16.971

9. Human Protein Reference Database--2009 update.

Authors: T S Keshava Prasad; Renu Goel; Kumaran Kandasamy; Shivakumar Keerthikumar; Sameer Kumar; Suresh Mathivanan; Deepthi Telikicherla; Rajesh Raju; Beema Shafreen; Abhilash Venugopal; Lavanya Balakrishnan; Arivusudar Marimuthu; Sutopa Banerjee; Devi S Somanathan; Aimy Sebastian; Sandhya Rani; Somak Ray; C J Harrys Kishore; Sashi Kanth; Mukhtar Ahmed; Manoj K Kashyap; Riaz Mohmood; Y L Ramachandra; V Krishna; B Abdul Rahiman; Sujatha Mohan; Prathibha Ranganathan; Subhashri Ramabadran; Raghothama Chaerkady; Akhilesh Pandey
Journal: Nucleic Acids Res Date: 2008-11-06 Impact factor: 16.971

10. BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources.

Authors: Chunlei Wu; Camilo Orozco; Jason Boyer; Marc Leglise; James Goodale; Serge Batalov; Christopher L Hodge; James Haase; Jeff Janes; Jon W Huss; Andrew I Su
Journal: Genome Biol Date: 2009-11-17 Impact factor: 13.583

CITED BY (35 in total)

Review 1. The bromodomain: from epigenome reader to druggable target.

Authors: Roberto Sanchez; Jamel Meslamani; Ming-Ming Zhou
Journal: Biochim Biophys Acta Date: 2014-03-28

Review 2. Targeting bromodomains: epigenetic readers of lysine acetylation.

Authors: Panagis Filippakopoulos; Stefan Knapp
Journal: Nat Rev Drug Discov Date: 2014-04-22 Impact factor: 84.694

Review 3. Lysine Methylation Regulators Moonlighting outside the Epigenome.

Authors: Evan M Cornett; Laure Ferry; Pierre-Antoine Defossez; Scott B Rothbart
Journal: Mol Cell Date: 2019-09-19 Impact factor: 17.970

4. Epigenetic Control of Skeletal Development by the Histone Methyltransferase Ezh2.

Authors: Amel Dudakovic; Emily T Camilleri; Fuhua Xu; Scott M Riester; Meghan E McGee-Lawrence; Elizabeth W Bradley; Christopher R Paradise; Eric A Lewallen; Roman Thaler; David R Deyle; A Noelle Larson; David G Lewallen; Allan B Dietz; Gary S Stein; Martin A Montecino; Jennifer J Westendorf; Andre J van Wijnen
Journal: J Biol Chem Date: 2015-09-30 Impact factor: 5.157

5. ChromoHub V2: cancer genomics.

Authors: Muhammad A Shah; Emily L Denton; Lihua Liu; Matthieu Schapira
Journal: Bioinformatics Date: 2013-12-06 Impact factor: 6.937

6. Engineering Methyllysine Writers and Readers for Allele-Specific Regulation of Protein-Protein Interactions.

Authors: Simran Arora; W Seth Horne; Kabirul Islam
Journal: J Am Chem Soc Date: 2019-09-20 Impact factor: 15.419

7. Profiling of human epigenetic regulators using a semi-automated real-time qPCR platform validated by next generation sequencing.

Authors: Amel Dudakovic; Martina Gluscevic; Christopher R Paradise; Halil Dudakovic; Farzaneh Khani; Roman Thaler; Farah S Ahmed; Xiaodong Li; Allan B Dietz; Gary S Stein; Martin A Montecino; David R Deyle; Jennifer J Westendorf; Andre J van Wijnen
Journal: Gene Date: 2017-01-27 Impact factor: 3.688

8. Global Profiling of Acetyltransferase Feedback Regulation.

Authors: David C Montgomery; Julie M Garlick; Rhushikesh A Kulkarni; Steven Kennedy; Abdellah Allali-Hassani; Yin-Ming Kuo; Andrew J Andrews; Hong Wu; Masoud Vedadi; Jordan L Meier
Journal: J Am Chem Soc Date: 2016-05-17 Impact factor: 15.419

Review 9. RNA-modifying proteins as anticancer drug targets.

Authors: P Ann Boriack-Sjodin; Scott Ribich; Robert A Copeland
Journal: Nat Rev Drug Discov Date: 2018-05-18 Impact factor: 84.694

10. An integrated genomic analysis of Tudor domain-containing proteins identifies PHD finger protein 20-like 1 (PHF20L1) as a candidate oncogene in breast cancer.

Authors: Yuanyuan Jiang; Lanxin Liu; Wenqi Shan; Zeng-Quan Yang
Journal: Mol Oncol Date: 2015-10-28 Impact factor: 6.603