
PlantDHS: a database for DNase I hypersensitive sites in plants.

Tao Zhang1, Alexandre P Marand1, Jiming Jiang2.   

Abstract

Gene expression is regulated by orchestrated binding of regulatory proteins to promoters and other cis-regulatory DNA elements (CREs). Several plant databases have been developed for mapping promoters or DNA motifs associated with promoters. However, there is a lack of databases that allow investigation of all CREs. Here we present PlantDHS (http://plantdhs.org), a plant DNase I hypersensitive site (DHS) database that integrates histone modification, RNA sequencing, nucleosome positioning/occupancy, transcription factor binding sites, and genomic sequence within an easily navigated user interface. DHSs are indicative of all CREs, including promoters, enhancers, silencers, insulators and transcription factor binding sites, all of which play major roles in global gene expression regulation. PlantDHS provides a platform to predict all CREs associated with individual genes from three model plant species: Arabidopsis thaliana, Brachypodium distachyon and rice (Oryza sativa). PlantDHS is especially valuable in the detection of distant CREs that are located away from promoters.
© The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2015        PMID: 26400163      PMCID: PMC4702941          DOI: 10.1093/nar/gkv962

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

DNase I hypersensitive sites (DHSs) are genomic regions that exhibit hypersensitivity to cleavage by DNase I endonucleases. These specific sites are typically inferred as open chromatin, which is accessible to regulatory proteins and thus marks cis-regulatory enriched regions in eukaryotic genomes. Chromatin accessibility can be examined by the extent of DNase I digestion, which was discovered over 30 years ago (1). Formerly, this approach involved titration of DNase I followed by characterization of digested DNAs by Southern blot hybridization (2). Recent advances in massively parallel sequencing technologies have enabled genome-wide mapping of DHSs, which can be achieved by partial DNase I digestion followed by sequencing (DNase-seq). Genome-wide DHS mapping has laid the foundations for the assembly of comprehensive catalogs of regulatory DNA sequences (3,4). This method has been particularly useful in identifying accessible cis-regulatory DNA elements (CREs), including promoters and enhancers of actively transcribed genes (4,5). Recently, DHS maps have been developed in plant species, such as Arabidopsis thaliana and rice, by using this high throughput method (6–9). However, information about such DHSs is often trapped in the supplementary materials of a publication or is only accessible through the NCBI GEO database. To address this problem, we developed PlantDHS, a web interface/application of plant DHSs. In PlantDHS, we have introduced unified DHS IDs for plant species comprising several tissues. In addition, we have also integrated histone modification, RNA-seq, nucleosome positioning/occupancy, and transcription factor (TF) binding site data. Moreover, by applying modern web infrastructure that allows user browsing and searching, DHS-related information can be readily visualized.

DATABASE ARCHITECTURE

PlantDHS incorporates the genomes of three model plant species, A. thaliana, B. distachyon and rice, together with DHSs, histone modification data sets developed from chromatin immunoprecipitation followed by sequencing (ChIP-seq), nucleosome positioning and occupancy, RNA-seq, and transcription factor binding site data. All DHSs and DHS scores were identified using in-house software, Popera (https://github.com/forrestzhang/Popera). First, Popera identifies DHSs by applying a kernel density estimation algorithm similar to the one used by F-seq (10). Second, all the DHSs identified from various tissues and developmental stages were merged to create a unified DHS file; DHSs were then assigned unique ID tags. Lastly, normalized scores of the unified DHSs were calculated for each tissue and developmental stage. For RNA-seq data analysis, we used TopHat2 (11) to map RNA-seq reads and Cufflinks (12) to calculate gene expression levels as Fragments Per Kilobase of exon per Million fragments mapped (FPKM). Histone modification ChIP-seq data were mapped using Bowtie (13). We calculated the normalized ChIP-seq read counts within DHSs and ±300 bp flanking the DHSs. The DHS, gene expression and histone modification data sets were integrated using SQLAlchemy, an object-relational mapper (ORM) for the Python programming language. This allowed us to build a virtual object database and import the data into a MySQL database server. PlantDHS uses the Web Server Gateway Interface (WSGI), the standard interface between web servers and Python web applications. For the construction of PlantDHS, we used HTML5 (http://www.w3.org/TR/html5/), CSS (http://www.w3.org/Style/CSS/), JavaScript, jQuery (https://jquery.com) and HighCharts (http://www.highcharts.com).
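The merging step in the pipeline above (combining DHSs called in different tissues into one unified set with unique IDs) can be illustrated with a minimal sketch. This is not Popera's code: the function name `merge_dhs`, the ID format `DHS000001`, and the rule that any overlapping intervals are merged are all assumptions made for illustration, and the example coordinates are invented.

```python
from collections import defaultdict


def merge_dhs(per_tissue_peaks, prefix="DHS"):
    """Merge overlapping peaks from all tissues and assign unified IDs.

    per_tissue_peaks: dict mapping tissue name -> list of (chrom, start, end).
    The ID scheme (prefix + zero-padded counter) is hypothetical.
    """
    by_chrom = defaultdict(list)
    for tissue, peaks in per_tissue_peaks.items():
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end, tissue))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        cur_tissues = set()
        for start, end, tissue in sorted(by_chrom[chrom]):
            if cur_start is not None and start <= cur_end:
                # Overlaps the interval being built: extend it.
                cur_end = max(cur_end, end)
                cur_tissues.add(tissue)
            else:
                # Gap found: close the previous interval, start a new one.
                if cur_start is not None:
                    merged.append((chrom, cur_start, cur_end, sorted(cur_tissues)))
                cur_start, cur_end, cur_tissues = start, end, {tissue}
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end, sorted(cur_tissues)))

    return [
        {"id": f"{prefix}{i:06d}", "chrom": c, "start": s, "end": e, "tissues": t}
        for i, (c, s, e, t) in enumerate(merged, start=1)
    ]


# Invented example peaks for two tissues.
peaks = {
    "leaf": [("Chr1", 100, 200), ("Chr1", 500, 600)],
    "flower": [("Chr1", 150, 260), ("Chr3", 10, 50)],
}
unified = merge_dhs(peaks)
```

Recording which tissues contributed to each merged interval keeps the link back to the per-tissue calls, which is useful when per-tissue scores are attached to the unified IDs afterwards.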
The client side of the web application/interface is delivered via the Apache 2 HTTP server (http://httpd.apache.org) to modern web browsers on any platform, such as Mac OS, Windows, iOS, Android and Linux. The server side was constructed using Flask (http://flask.pocoo.org), SQLAlchemy (http://www.sqlalchemy.org), MySQL (https://www.mysql.com) and JBrowse (14) on an Ubuntu 14.04 LTS server (http://www.ubuntu.com).
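To make the WSGI contract mentioned above concrete, the following sketch shows a bare WSGI callable written with the Python standard library only. PlantDHS itself is built on Flask (a WSGI framework) with a MySQL back end; the in-memory dictionary, the `gene` query parameter, and the JSON response shape below are hypothetical stand-ins, not the site's actual routes or schema.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory stand-in for the MySQL-backed DHS table.
# The single entry uses the SUPERMAN-upstream DHS coordinates from the text.
DHS_BY_GENE = {
    "AT3G23130": [{"chrom": "Chr3", "start": 8241330, "end": 8241426}],
}


def app(environ, start_response):
    """WSGI callable: look up DHSs for ?gene=<id> and return JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    gene = query.get("gene", [""])[0].upper()
    hits = DHS_BY_GENE.get(gene)
    status = "200 OK" if hits is not None else "404 Not Found"
    body = json.dumps({"gene": gene, "dhs": hits or []}).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call(wsgi_app, query_string):
    """Invoke a WSGI app directly (no server) and return (status, parsed JSON)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(wsgi_app({"QUERY_STRING": query_string}, start_response))
    return captured["status"], json.loads(body)


status, data = call(app, "gene=at3g23130")
```

For local experimentation the same callable could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`; in a production setting a front-end server such as Apache would hand requests to the application through the same interface.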

CORE FUNCTIONALITY

PlantDHS includes a total of five web pages within the navigational bar: Home, JBrowse, Genome, Download and Help (Figure 1A). The web application infrastructure provides a user-friendly interface for exploring, visualizing and browsing DHS information. The Home page provides a quick search menu for immediate DHS browsing, basic help information and JBrowse quick links for species selection. JBrowse provides fast navigation, zooming and track selection functions for layering DHSs, TF binding sites and nucleosome positioning/occupancy features of the genome. The Genome page includes information regarding available data sets and a search menu for information about a specific species. Users can download the DHS data sets from the Download page. We supply GFF files, which contain the positional information of the DHSs, and CSV files, which contain DHS scores in different tissues and developmental stages. The Help page contains links to the data sources, several demo/tutorial videos and information regarding website maintenance and version releases.
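As an illustration of how the downloadable GFF files might be used, the sketch below parses GFF-style lines and selects the features overlapping a query region. It assumes the standard nine-column, tab-separated GFF layout with `key=value` attributes; the exact column contents of the PlantDHS files are not described here, so the `source`, `type` and `ID` values in the example are hypothetical, and the second record is invented.

```python
from typing import Iterator, NamedTuple


class Feature(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int
    attributes: dict


def parse_gff(lines) -> Iterator[Feature]:
    """Yield one Feature per data line, skipping comments and malformed rows."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        yield Feature(cols[0], int(cols[3]), int(cols[4]), attrs)


def overlapping(features, seqid, start, end):
    """Return features on seqid that overlap the closed interval [start, end]."""
    return [f for f in features
            if f.seqid == seqid and f.start <= end and f.end >= start]


# Hypothetical file content; only the first coordinate pair comes from the text.
example = [
    "##gff-version 3",
    "Chr3\tPopera\tDHS\t8241330\t8241426\t.\t.\t.\tID=example_dhs_1",
    "Chr3\tPopera\tDHS\t8250000\t8250200\t.\t.\t.\tID=example_dhs_2",
]
features = list(parse_gff(example))
hits = overlapping(features, "Chr3", 8241000, 8242000)
```

With a real download, `parse_gff(open(path))` would replace the in-memory list.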
Figure 1.

Structure of PlantDHS database. (A) Navigation bar of PlantDHS home page. (B) Quick search box for finding DHSs for a region or gene of interest. (C) Species-specific search engine (Genome Tab). (D) DHS search results and links to JBrowse pages (red box). This page shows the gene name, reference genome, gene name synonyms, whether there are DHSs associated with the gene or region of interest, and a short description of the gene. (E) Details of region and/or gene of interest. Top panel shows the table of DHSs in the intersected region, including the coordinates of the DHSs. The ‘View’ links take the user to the histone modification and RNA-seq information for each DHS, which is displayed in Figure 2. Bottom panel is the JBrowse view of the region of interest. The purple box highlights a DHS (blue arrow), a SEP3-binding site, an AP1-binding site, and two positioned nucleosomes that are present in leaf tissue but absent in flower tissue (blue double arrows), all of which are located upstream of the SUPERMAN gene (AT3G23130).

Figure 2.

Histone modification, RNA-seq and DHS score information page. (A) Mean and max DHS scores in different tissues and developmental stages associated with a single DHS. (B) Normalized histone modification scores within and around (flanking) the DHS. (C) Expression levels of SUPERMAN (AT3G23130) and two nearby genes (within 10 kb of DHS) in flower and leaf tissues. None of the three genes is expressed in leaf tissue.

A major function of PlantDHS is mapping putative CREs that are likely binding sites of unknown regulatory proteins. PlantDHS allows for browsing and visualization of histone modification, gene expression and nucleosome positioning around possible regulatory factor binding sites. To begin, users enter a gene ID or specify genomic coordinates for the species of interest (Figure 1B and C). A successful gene ID search yields a page displaying the gene name, reference genome, gene name synonyms and a description of the gene (Figure 1D). To retrieve DHS information for the target gene or region from this page, the user can click the ‘GoTo’ button under ‘Search DHSs’ (red box in Figure 1D). This action displays detailed DHS information for the target gene or region (Figure 1E): at the top of the page, the user finds a list of DHSs considered CREs associated with unknown regulators, and the bottom of the page shows the JBrowse window, which contains all listed DHSs (Figure 1E). Within the JBrowse page, users can integrate DHS genomic coordinates with DHS scores, transcription factor binding sites identified by ChIP-seq or ChIP-chip, and nucleosome positioning/occupancy information. Using the ‘Available Tracks’ menu on the left-hand side, the user can add or remove multiple layers of available data in the JBrowse window. To plot DHS-related data such as DHS scores, histone modification and expression levels of the DHS-associated genes, users can click either ‘DHSs’ in the JBrowse window or the ‘View’ button in the DHSs list (red box in Figure 1E). There are three panels in the DHS-related information plot view page. The DHS score panel (Figure 2A) shows the mean and max score of the DHS in different tissues.
The histone modification panel (Figure 2B) displays normalized scores for various histone modifications within the DHS, where ‘flanking’ represents the normalized score of the histone modification ±300 bp flanking the DHS peak. The gene expression panel (Figure 2C) includes RNA-seq-based expression data for all genes located within ±10 kb of the DHS's midpoint. By looking at these three plot panels, users can easily visualize DHS scores along with gene expression and histone modification data for different tissues or developmental stages simultaneously.
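The gene selection rule for the expression panel (genes within ±10 kb of the DHS midpoint) can be sketched as follows. The function name is hypothetical, "located within" is interpreted here as "overlapping the window" (an assumption), and the gene names and coordinates in the example are placeholders rather than real annotations; only the DHS coordinates come from the text.

```python
def genes_near_dhs(dhs_start, dhs_end, genes, window=10_000):
    """Return names of genes overlapping [midpoint - window, midpoint + window].

    genes: iterable of (name, start, end) on the same chromosome as the DHS.
    """
    midpoint = (dhs_start + dhs_end) // 2
    lo, hi = midpoint - window, midpoint + window
    return [name for name, g_start, g_end in genes
            if g_start <= hi and g_end >= lo]


# Placeholder gene models (not real annotations).
genes = [
    ("geneA", 8242000, 8243000),  # well inside the window
    ("geneB", 8251000, 8252000),  # straddles the right edge of the window
    ("geneC", 8260000, 8261000),  # too far downstream
    ("geneD", 8220000, 8231000),  # ends just before the window starts
]
nearby = genes_near_dhs(8241330, 8241426, genes)
```

If "within" were instead taken to mean fully contained in the window, the comparison would become `g_start >= lo and g_end <= hi`, which would exclude `geneB` in this example.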

EXAMPLE

To further demonstrate the utility of our database, we examined the combined information from these data sets for the well-characterized floral regulator gene, SUPERMAN, in A. thaliana. A DHS (TAIR10_Chr3:8241330–8241426, indicated by a blue arrow in Figure 1E) upstream of the transcription start site (TSS) of SUPERMAN coincides with ChIP-predicted binding sites of the floral MADS box transcription factors AP1 and SEP3. The DNase-seq (DHS) read depth at this position was significantly higher in floral tissue than in the leaf sample, suggesting that this particular DHS is specific to floral tissues (Figures 1E and 2A). In addition, nucleosome occupancy at this site is depleted in flower tissue compared with leaf tissue, indicative of open chromatin and thus increased accessibility to transcription factors, with subsequent downstream consequences. Of the three genes within 10 kb surrounding TAIR10_Chr3:8241330–8241426, SUPERMAN is the most highly expressed gene in flower tissue, whereas none of the three genes is expressed in leaf tissue (Figure 2C). Interestingly, the other florally expressed gene downstream of SUPERMAN is the hormone-induced developmental tissue regulator, UPRIGHT ROSETTE (15). Furthermore, the repressive H3K27me3 histone modification is enriched in leaf tissue compared with flower tissue (Figure 2B). Several additional DHSs can be detected both upstream and downstream of the SUPERMAN gene. Only one of these additional DHSs overlapped with predicted binding sites of AP1 and SEP3 (Figure 1E). These DHSs are potential CREs that may regulate SUPERMAN and/or its neighboring genes. Taken together, these results corroborate the TF binding site data and reveal the complex regulation of the SUPERMAN gene.

DATA SOURCE

TAIR10 (https://www.arabidopsis.org) (16), TIGR7 (http://rice.plantbiology.msu.edu) (17) and MIPS1.2 (http://www.brachypodium.org) (18) were used as the reference genomes of A. thaliana, rice and B. distachyon, respectively. We included a total of seven DHS libraries: Arabidopsis (Col-0) leaf, Arabidopsis (Col-0) flower, Arabidopsis (Col-0) ddm1 mutant (deficient in DNA methylation1) leaf, ddm1 mutant flower (6), rice (Nipponbare) seedling tissue, rice (Nipponbare) callus (7) and B. distachyon (BD21) seedling tissue. RNA-seq libraries included Arabidopsis leaf, Arabidopsis flower (6), rice seedlings, rice callus (19) and B. distachyon leaf (20). Histone modification data sets comprise H3K27me3, H3K27ac and H3K4me1 from Arabidopsis leaf and flower tissues, and H3K36me3, H3K4me3, H3K9me2, H4K12ac, H3K9ac, H3K4me2 and H3K27me3 from rice leaf tissue (7,21). Histone modification was calculated in two distinct regions: (i) ±300 bp flanking the full DHS and (ii) exclusively within the DHS. The nucleosome positioning data cover rice leaf, Arabidopsis leaf and Arabidopsis flower (22,23). Arabidopsis transcription factor data sets include AGL-15 (24), AP1 (25), AP3 (26), BES1 (27), EIN3 (28), ERF115 (29), FHY3 (30), FLC (31), FLM (32), FUS3 (33), GL1 (34), GL3 (34), GTL1 (35), LFY (36), PI (26), PIF3 (37), PIF4 (38), PIF5 (39), PRR5 (40), PRR7 (41), SEP3 (42), SMZ (43), SOC1 (44), TOC1 (45) and WUS (46). All Arabidopsis transcription factor binding site information was downloaded from http://bioinformatics.psb.ugent.be/cig_data/RegNet/ (47).
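The two-region histone modification calculation described above (within the DHS, and ±300 bp flanking it) might look like the sketch below. The normalization formula is not specified in the text; reads per kilobase per million mapped reads is used here purely as an assumption, as is the choice to exclude the DHS itself from the flanking region. Function names and the example read positions are invented.

```python
from bisect import bisect_left, bisect_right


def count_in(sorted_positions, start, end):
    """Number of read positions p with start <= p <= end."""
    return bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)


def per_kb_per_million(count, length_bp, total_reads):
    """Assumed normalization: reads per kilobase per million mapped reads."""
    return count * 1e9 / (length_bp * total_reads)


def dhs_histone_scores(sorted_positions, dhs_start, dhs_end, total_reads, flank=300):
    """Return normalized ChIP-seq scores within the DHS and in its two flanks."""
    inside = count_in(sorted_positions, dhs_start, dhs_end)
    left = count_in(sorted_positions, dhs_start - flank, dhs_start - 1)
    right = count_in(sorted_positions, dhs_end + 1, dhs_end + flank)
    dhs_len = dhs_end - dhs_start + 1
    return {
        "within": per_kb_per_million(inside, dhs_len, total_reads),
        "flanking": per_kb_per_million(left + right, 2 * flank, total_reads),
    }


# Invented read midpoints on one chromosome (must be sorted for bisect).
positions = [750, 900, 1000, 1050, 1099, 1200, 1500]
scores = dhs_histone_scores(positions, 1000, 1099, total_reads=1_000_000)
```

Normalizing by region length matters here because the DHS and its flanks generally differ in size, so raw counts would not be comparable between the two regions.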

DISCUSSION AND FUTURE DIRECTIONS

Identification of CREs in plants has been mainly dependent on bioinformatic and computational predictions (48). Several algorithms and bioinformatic tools have been developed to identify CREs in plants. Most of these tools were established either exclusively based on analysis of DNA sequences from the upstream regions of genes (49,50), or based on identification of co-expressed genes in different tissues and/or under the same biotic or abiotic stress, followed by sequence/motif analysis of the presumed upstream regulatory regions of the co-expressed genes (48). These tools and the established databases have been valuable to the plant research community. However, since these prediction tools have mainly focused on promoter regions, the vast majority of other types of CREs, including enhancers, are missed in these predictions. DNase I hypersensitivity is a universal mark for all active CREs (51). For example, more than 90% of the SEP3-binding and AP1-binding sites detected by ChIP-seq were covered by DHSs (6). Thus, DHSs provide corroborative information for TF-binding sites predicted by the classical ChIP-seq method. PlantDHS provides a platform that allows prediction of all potential CREs associated with specific plant genes. The position of the promoter of a plant gene can be readily predicted based on various tools and databases developed by the plant research community. By contrast, plant enhancers have proved difficult to identify because enhancers can be located at various positions relative to a specific gene. DHSs located outside of the promoter regions are putative enhancers. We recently examined the function of several intergenic DHSs in A. thaliana using the β-glucuronidase gene reporter. Enhancer function was found to be associated with more than 70% of these candidates (52). This result confirmed the power of mapping CREs using DHSs.
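A coverage statistic of the kind cited above (the share of ChIP-seq binding sites covered by DHSs) can be computed with a simple interval overlap test. The sketch below is a naive illustration, not the analysis used in the cited study: "covered" is taken to mean any overlap on the same chromosome (an assumption), the function name is hypothetical, and the intervals are invented.

```python
def fraction_covered(sites, dhss):
    """Fraction of sites that overlap at least one DHS on the same chromosome.

    sites, dhss: lists of (chrom, start, end) with closed intervals.
    Uses a quadratic scan, which is fine for a small illustration; sorted
    intervals or an interval tree would be preferable genome-wide.
    """
    if not sites:
        return 0.0
    covered = sum(
        any(s_chrom == d_chrom and s_start <= d_end and s_end >= d_start
            for d_chrom, d_start, d_end in dhss)
        for s_chrom, s_start, s_end in sites
    )
    return covered / len(sites)


# Invented TF binding sites and DHSs.
tf_sites = [("Chr1", 100, 120), ("Chr1", 300, 320),
            ("Chr2", 50, 70), ("Chr2", 500, 520)]
dhs_list = [("Chr1", 90, 150), ("Chr1", 310, 400), ("Chr2", 40, 60)]
coverage = fraction_covered(tf_sites, dhs_list)
```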
We plan to maintain and improve PlantDHS by adding further DHS data sets, including those from additional plant species and from model plant species grown under various stress conditions. Epigenomic data sets will also be added to the database. We are currently developing a genome-wide enhancer map in A. thaliana based largely on the DHS information (52). Integrating this enhancer information into PlantDHS is one of our near-term goals.
  52 in total

1.  Transcriptional control of a plant stem cell niche.

Authors:  Wolfgang Busch; Andrej Miotk; Federico D Ariel; Zhong Zhao; Joachim Forner; Gabor Daum; Takuya Suzaki; Christoph Schuster; Sebastian J Schultheiss; Andrea Leibfried; Silke Haubeiss; Nati Ha; Raquel L Chan; Jan U Lohmann
Journal:  Dev Cell       Date:  2010-05-18       Impact factor: 12.270

Review 2.  The 'dark matter' in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin.

Authors:  Jiming Jiang
Journal:  Curr Opin Plant Biol       Date:  2015-01-24       Impact factor: 7.834

3.  Genome-Wide Prediction and Validation of Intergenic Enhancers in Arabidopsis Using Open Chromatin Signatures.

Authors:  Bo Zhu; Wenli Zhang; Tao Zhang; Bao Liu; Jiming Jiang
Journal:  Plant Cell       Date:  2015-09-15       Impact factor: 11.277

4.  FLOWERING LOCUS C (FLC) regulates development pathways throughout the life cycle of Arabidopsis.

Authors:  Weiwei Deng; Hua Ying; Chris A Helliwell; Jennifer M Taylor; W James Peacock; Elizabeth S Dennis
Journal:  Proc Natl Acad Sci U S A       Date:  2011-04-04       Impact factor: 11.205

5.  F-Seq: a feature density estimator for high-throughput sequence tags.

Authors:  Alan P Boyle; Justin Guinney; Gregory E Crawford; Terrence S Furey
Journal:  Bioinformatics       Date:  2008-09-10       Impact factor: 6.937

6.  Identification of direct targets of FUSCA3, a key regulator of Arabidopsis seed development.

Authors:  Fangfang Wang; Sharyn E Perry
Journal:  Plant Physiol       Date:  2013-01-11       Impact factor: 8.340

7.  Temperature-dependent regulation of flowering by antagonistic FLM variants.

Authors:  David Posé; Leonie Verhage; Felix Ott; Levi Yant; Johannes Mathieu; Gerco C Angenent; Richard G H Immink; Markus Schmid
Journal:  Nature       Date:  2013-09-25       Impact factor: 49.962

8.  A systems approach reveals regulatory circuitry for Arabidopsis trichome initiation by the GL3 and GL1 selectors.

Authors:  Kengo Morohashi; Erich Grotewold
Journal:  PLoS Genet       Date:  2009-02-27       Impact factor: 5.917

9.  AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors.

Authors:  Ramana V Davuluri; Hao Sun; Saranyan K Palaniswamy; Nicole Matthews; Carlos Molina; Mike Kurtz; Erich Grotewold
Journal:  BMC Bioinformatics       Date:  2003-06-23       Impact factor: 3.169

10.  Dynamics of chromatin accessibility and gene regulation by MADS-domain transcription factors in flower development.

Authors:  Alice Pajoro; Pedro Madrigal; Jose M Muiño; José Tomás Matus; Jian Jin; Martin A Mecchia; Juan M Debernardi; Javier F Palatnik; Salma Balazadeh; Muhammad Arif; Diarmuid S Ó'Maoiléidigh; Frank Wellmer; Pawel Krajewski; José-Luis Riechmann; Gerco C Angenent; Kerstin Kaufmann
Journal:  Genome Biol       Date:  2014-03-03       Impact factor: 13.583

