p53 transcriptional programs in B cells upon exposure to genotoxic stress in vivo: Computational analysis of next-generation sequencing data.

Claudia Tonelli, Bruno Amati, Marco J Morelli.

Abstract

The transcriptional programs activated by p53 in B cells in vivo following exposure to ionizing radiation were studied through the integrated analysis of several types of next-generation sequencing data: genome-wide profiling of p53 binding sites, mapping of histone marks and open chromatin regions, and quantification of gene expression. Moreover, the binding of p53 was associated with a series of specific DNA motifs, which were inferred directly from the data. Here, we describe in detail the computational analysis of the datasets associated with our study (Tonelli et al., Oncotarget 6 (2015), 24611-26), deposited in the GEO archive (accession code GSE71180), and we provide the R scripts needed to generate the figures of the paper.

Keywords:  ChIP-Seq; Genotoxic stress; Motif analysis; RNA-Seq; p53

Year:  2015        PMID: 26981355      PMCID: PMC4778592          DOI: 10.1016/j.gdata.2015.11.006

Source DB:  PubMed          Journal:  Genom Data        ISSN: 2213-5960


Direct link to deposited data

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE71180.

Experimental design, materials and methods

The GEO submission SuperSeries GSE71180, associated with the Tonelli et al. study [1], contains a total of 32 NGS samples, divided into three series: GSE71175, containing 6 ChIP-Seq samples (5 ChIPs against p53 and 1 input); GSE71176, containing 24 RNA-Seq samples (4 conditions with 2 replicates each for the p53 KO cells, 4 conditions with 4 replicates each for the C57/Bl6 cells); and GSE71177, containing a DNase-Seq sample and the corresponding input. The datasets are summarized in Table 1.

Table 1. Summary of the 32 samples available in the GSE71180 SuperSeries.

Sample ID    Sample name                 Replicate   Data type
GSM1828855   p53.wt.Bcells.mock          1/1         ChIP-Seq
GSM1828856   p53.wt.Bcells.IR            1/1         ChIP-Seq
GSM1828857   p53.wt.nonBcells.mock       1/1         ChIP-Seq
GSM1828858   p53.wt.nonBcells.IR         1/1         ChIP-Seq
GSM1828859   p53.null.spleen.IR          1/1         ChIP-Seq
GSM1828860   Input                       1/1         ChIP-Seq
GSM1828861   p53.null.Bcells.mock.1      1/2         RNA-Seq
GSM1828862   p53.null.Bcells.mock.2      2/2         RNA-Seq
GSM1828863   p53.null.nonBcells.mock.1   1/2         RNA-Seq
GSM1828864   p53.null.nonBcells.mock.2   2/2         RNA-Seq
GSM1828865   p53.null.Bcells.IR.1        1/2         RNA-Seq
GSM1828866   p53.null.Bcells.IR.2        2/2         RNA-Seq
GSM1828867   p53.null.nonBcells.IR.1     1/2         RNA-Seq
GSM1828868   p53.null.nonBcells.IR.2     2/2         RNA-Seq
GSM1828869   p53.wt.Bcells.mock.1        1/4         RNA-Seq
GSM1828870   p53.wt.Bcells.mock.2        2/4         RNA-Seq
GSM1828871   p53.wt.Bcells.mock.3        3/4         RNA-Seq
GSM1828872   p53.wt.Bcells.mock.4        4/4         RNA-Seq
GSM1828873   p53.wt.nonBcells.mock.1     1/4         RNA-Seq
GSM1828874   p53.wt.nonBcells.mock.2     2/4         RNA-Seq
GSM1828875   p53.wt.nonBcells.mock.3     3/4         RNA-Seq
GSM1828876   p53.wt.nonBcells.mock.4     4/4         RNA-Seq
GSM1828877   p53.wt.Bcells.IR.1          1/4         RNA-Seq
GSM1828878   p53.wt.Bcells.IR.2          2/4         RNA-Seq
GSM1828879   p53.wt.Bcells.IR.3          3/4         RNA-Seq
GSM1828880   p53.wt.Bcells.IR.4          4/4         RNA-Seq
GSM1828881   p53.wt.nonBcells.IR.1       1/4         RNA-Seq
GSM1828882   p53.wt.nonBcells.IR.2       2/4         RNA-Seq
GSM1828883   p53.wt.nonBcells.IR.3       3/4         RNA-Seq
GSM1828884   p53.wt.nonBcells.IR.4       4/4         RNA-Seq
GSM1828885   p53.wt.Bcells.DNaseI        1/1         DNase-Seq
GSM1828886   Input.DNaseI                1/1         DNase-Seq

These samples allowed us to study the genomic occupancy and the transcriptional changes induced by p53 activation in B and non-B cells in vivo, following DNA damage produced by ionizing radiation. Cells from p53 null mice were analyzed to define the p53-dependent response.
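
The sample names in Table 1 follow a dot-separated convention (p53.<genotype>.<cells>.<treatment>, with an optional replicate suffix for the RNA-Seq samples). As a minimal sketch of how these names could be split into experimental factors, the Python below parses them into a small record. It is an illustration only and not part of the deposited R scripts; the names `Sample` and `parse_sample_name` are hypothetical.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    genotype: str             # "wt" or "null"
    cell_type: str            # "Bcells", "nonBcells" or "spleen"
    treatment: str            # "mock", "IR" or "DNaseI"
    replicate: Optional[int]  # None when the name carries no replicate suffix


def parse_sample_name(name: str) -> Optional[Sample]:
    """Split a Table 1 name of the form p53.<genotype>.<cells>.<treatment>[.<replicate>].

    Returns None for names that do not follow this pattern (e.g. "Input").
    """
    parts = name.split(".")
    if parts[0] != "p53" or len(parts) not in (4, 5):
        return None
    replicate = int(parts[4]) if len(parts) == 5 else None
    return Sample(parts[1], parts[2], parts[3], replicate)


if __name__ == "__main__":
    for name in ("p53.wt.Bcells.IR", "p53.null.nonBcells.mock.2", "Input"):
        print(name, "->", parse_sample_name(name))
```

Grouping the parsed RNA-Seq records by genotype, cell type and treatment gives the 8 conditions described above (2 replicates each for the p53 null cells, 4 each for the wild-type cells).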

Data analysis

We complement the methods of the original publication and the instructions deposited in the GEO archive with the source code used to produce the figures from the next-generation sequencing (NGS) data files. Under the accession number GSE71180, we provided the raw data files (sequencing reads, in fastq format), plus a series of processed data files. For the ChIP-Seq and DNase-Seq samples (excluding the inputs), we supplied the locations of the bound genomic regions in BED format, as obtained with the MACS [2] peak caller (v. 2.0.9). For the RNA-Seq samples, we provided the quantification of the expression of each gene, i.e. the number of reads assigned to that gene, normalized to the gene length and to the total number of reads aligned on any exon of any gene. We called this quantification exonic RPKM, or eRPKM, to distinguish it from the conventional normalization of read counts to the total number of reads aligned anywhere on the genome.

Most of the information needed to produce the figures is already available in the processed data, with the exception of four fields for the ChIP-Seq peaks: 1) annotation, 2) enrichment, 3) summit and 4) motif annotation. Here, we provide the complete resources needed to reproduce the figures of the main paper, and the instructions to generate the missing information. Finally, the genomic regions associated with previously published histone modifications [3], [4] are also attached for convenience.
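
The difference between eRPKM and conventional RPKM lies only in the library-size term of the normalization. The sketch below illustrates this in Python. It is not the code used for the deposited files, and it assumes the usual RPKM scaling (reads per kilobase of gene per million reads), with the total number of exonic reads taken as the sum of the per-gene counts. The function names and the toy numbers in the usage example are hypothetical.

```python
from typing import Dict


def rpkm(count: int, gene_length_bp: int, library_size: int) -> float:
    """Reads per kilobase of gene per million reads in `library_size`."""
    if gene_length_bp <= 0 or library_size <= 0:
        raise ValueError("gene length and library size must be positive")
    return count * 1e9 / (gene_length_bp * library_size)


def erpkm_table(counts: Dict[str, int], gene_lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """eRPKM for every gene: the library size is the total of exonic reads.

    Assumption: `counts` holds the reads assigned to the exons of each gene,
    so their sum is the number of reads aligned on any exon of any gene.
    """
    exonic_total = sum(counts.values())
    return {
        gene: rpkm(count, gene_lengths_bp[gene], exonic_total)
        for gene, count in counts.items()
    }


if __name__ == "__main__":
    # Toy data, for illustration only.
    counts = {"geneA": 500, "geneB": 1500}
    lengths = {"geneA": 2000, "geneB": 1000}
    print(erpkm_table(counts, lengths))
    # Conventional RPKM would instead pass the total number of aligned reads.
    print(rpkm(counts["geneA"], lengths["geneA"], library_size=4000))
```

Because intronic and intergenic reads are excluded from the denominator, eRPKM values are never smaller than the corresponding conventional RPKM values computed on the same alignment.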

Analysis environment

Data analysis was performed entirely in R, the widely used open-source environment for statistical computing and data analysis. The main package used for the analysis is compEpiTools v1.2.6 [5], which is part of the Bioconductor project [6] and can be installed from http://www.bioconductor.org/packages/release/bioc/html/compEpiTools.html. compEpiTools is a flexible and user-friendly package for performing basic analyses of NGS data.

Description of the source files

The source code archive TonelliEtAl2015_sourceCode.zip is composed of 5 files and 2 directories, described below.

filemapping_GEO.R
This file contains the links between the R objects used to produce the figures and the files deposited in the GEO archive, listed in Table 1. In particular, ChIP-Seq BED files are converted to GRanges and gene expression quantifications are organized in a data frame. This code also arranges in a list the ChIP-Seq alignment (BAM) files, which must be obtained from the raw sequencing files (fastq) following the instructions deposited in the GEO archive.

analysisEnvironment.R
This R script loads all the data files needed to produce the figures of the main paper [1], and contains all the libraries and functions invoked by the code in TonelliEtAl2015_Figures.R. In particular, the environment requires the following libraries: compEpiTools, gplots, VennDiagram, lattice, flashClust, TxDb.Mmusculus.UCSC.mm9.knownGene, and org.Mm.eg.db (see sessionInfo.log).

TonelliEtAl2015_Figures.R
This R script sources analysisEnvironment.R to load the pre-generated data objects associated with the main paper [1], and contains the code used to produce all the figures derived from the computational analyses of NGS data. Some figures require computing tag density on genomic intervals, and therefore require alignment (BAM) files: in these cases, a pre-computed table was used.

prepareDatasets.R
This collection of R scripts complements the processed ChIP-Seq files available on GEO with the extra fields required to generate the final figures. The output of these scripts is stored in the ChIPpeaks.rds data file in the data directory, in the form of a list of genomic ranges, which is loaded automatically by the analysisEnvironment.R script.

The scripts contained in prepareDatasets.R use several external tools (MEME [7], TOMTOM [8], FIMO [9]) and the mm9 reference genome, available in Bioconductor in the library BSgenome.Mmusculus.UCSC.mm9 (v. 1.4.0). They may require a considerable amount of time (6–24 h) to complete, depending on the platform used. In order to execute the scripts, the processed ChIP-Seq files should first be downloaded from GEO and organized according to the instructions contained in the filemapping_GEO.R file. Subsequently, alignment (BAM) files must be generated from the raw fastq files deposited in GEO, following the instructions on the archive.

The extra fields consist of:
1) the genomic annotation of the p53 ChIP-Seq peaks: a peak overlapping a [−5 kb, +2 kb] window around a standard promoter is classified as “promoter”, a peak overlapping an H3K4me1 peak as “enhancer”, and any other peak as “distal”;
2) the enrichment of the peak, computed with the GRenrichment function in the compEpiTools suite;
3) the summit of the peak, computed with the GRcoverageSummit function in the compEpiTools suite;
4) the motif annotation of the peak, which is obtained through five main steps:
i) the generation of a FASTA file containing the sequences of the top 1000 enriched genomic regions spanned by the peaks of the p53.wt.Bcells.IR sample;
ii) the estimation of the unspaced p53 motif from these sequences using MEME [7] (we verified with TOMTOM [8] that the estimated motif coincides with the canonical p53 motif contained in the JASPAR Core Vertebrata database [10]);
iii) the creation of the motifs with spacers, obtained by inserting positions with constant probability over the 4 nucleotides (spacers) between the two decameric half-sites;
iv) the scoring of these motifs against the mouse genome with FIMO [9];
v) the assignment of the motifs to the ChIP-Seq peaks.

SessionInfo.log
A log file containing the output of the R sessionInfo() command, which specifies the versions of all the libraries used in the analysis environment.

data folder
This folder contains all the R objects needed to produce the figures of the main paper.

figures folder
This folder contains all the figures of the main paper, in pdf format, obtained by running the code contained in TonelliEtAl2015_Figures.R.

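
The three-way genomic annotation of the p53 peaks described for prepareDatasets.R amounts to a simple priority rule. The Python sketch below illustrates that rule on plain coordinate pairs. It is not the authors' R implementation: it assumes that the promoter window is placed around the transcription start site in a strand-aware manner, that promoter overlap takes precedence over H3K4me1 overlap, and that all intervals lie on the same chromosome. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, both ends inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def promoter_window(tss: int, strand: str,
                    upstream: int = 5000, downstream: int = 2000) -> Interval:
    """[-5 kb, +2 kb] window around a TSS, oriented by strand (assumption)."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    if strand == "-":
        return (tss - downstream, tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def annotate_peak(peak: Interval,
                  promoters: List[Interval],
                  h3k4me1_peaks: List[Interval]) -> str:
    """Classify a peak as 'promoter', 'enhancer' or 'distal', in that priority."""
    if any(overlaps(peak, p) for p in promoters):
        return "promoter"
    if any(overlaps(peak, e) for e in h3k4me1_peaks):
        return "enhancer"
    return "distal"


if __name__ == "__main__":
    # Toy coordinates, for illustration only.
    promoters = [promoter_window(10_000, "+")]
    h3k4me1 = [(50_000, 51_000)]
    for peak in [(11_900, 12_100), (50_500, 50_700), (80_000, 80_200)]:
        print(peak, annotate_peak(peak, promoters, h3k4me1))
```

A linear scan over all promoters and H3K4me1 peaks is adequate for a toy example; on genome-wide peak sets an interval index would be preferable.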
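
Step iii of the motif annotation, the creation of spaced motifs, can be sketched as an operation on a position probability matrix. The Python below is an illustration under stated assumptions and not the code in prepareDatasets.R: the motif is represented as a list of 20 columns of probabilities over A, C, G and T, the two half-sites are the first and last 10 columns, and each spacer position has probability 0.25 for every nucleotide. The function name and the toy matrix are hypothetical, and the spacer lengths used in the study are not specified here.

```python
from typing import List

Column = List[float]  # probabilities for A, C, G, T at one motif position


def insert_spacer(pwm: List[Column], spacer_length: int,
                  half_site_length: int = 10) -> List[Column]:
    """Return a copy of `pwm` with uniform columns between the two half-sites."""
    if len(pwm) != 2 * half_site_length:
        raise ValueError("motif must consist of exactly two half-sites")
    if spacer_length < 0:
        raise ValueError("spacer length cannot be negative")
    spacer = [[0.25, 0.25, 0.25, 0.25] for _ in range(spacer_length)]
    left = [list(col) for col in pwm[:half_site_length]]
    right = [list(col) for col in pwm[half_site_length:]]
    return left + spacer + right


if __name__ == "__main__":
    # Toy 20-column motif, for illustration only.
    toy = [[0.7, 0.1, 0.1, 0.1]] * 10 + [[0.1, 0.1, 0.1, 0.7]] * 10
    for n in range(0, 4):
        print(n, len(insert_spacer(toy, n)))
```

Each spaced matrix produced this way would then be written in a motif format accepted by the scanning tool before being scored against the genome.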
Specifications

Organism/cell line/tissue: Mouse (C57/Bl6 B cells and non-B cells; p53KO B cells and non-B cells)
Sex: Not applicable
Sequencer or array type: Illumina Hi-Seq 2000
Data format: Raw and analyzed
Experimental factors: Spleens from C57/Bl6 and p53KO mice were collected 4 h after exposure to 7 Gy total body irradiation and from a control cohort of mice. After pressing the spleens through nylon cell strainers and hypotonic lysis of red blood cells, the cell suspensions were incubated with B220 MicroBeads (Miltenyi Biotec) and B cells were enriched by magnetic cell sorting (MACS), according to the manufacturer's instructions (Miltenyi Biotec). The remaining fraction constituted the non-B cell populations used in this study.
Experimental features: Previously described cell types were used for ChIP-Seq (for p53), RNA-Seq and DNase-Seq experiments.
Consent: n/a
Sample source location: Milan, Italy

References

1.  Genome-wide analysis of p53 transcriptional programs in B cells upon exposure to genotoxic stress in vivo.

Authors:  Claudia Tonelli; Marco J Morelli; Salvatore Bianchi; Luca Rotta; Thelma Capra; Arianna Sabò; Stefano Campaner; Bruno Amati
Journal:  Oncotarget       Date:  2015-09-22

2.  Model-based analysis of ChIP-Seq (MACS).

Authors:  Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu
Journal:  Genome Biol       Date:  2008-09-17

3.  Selective transcriptional regulation by Myc in cellular growth control and lymphomagenesis.

Authors:  Arianna Sabò; Theresia R Kress; Mattia Pelizzola; Stefano de Pretis; Marcin M Gorski; Alessandra Tesi; Marco J Morelli; Pranami Bora; Mirko Doni; Alessandro Verrecchia; Claudia Tonelli; Giovanni Fagà; Valerio Bianchi; Alberto Ronchi; Diana Low; Heiko Müller; Ernesto Guccione; Stefano Campaner; Bruno Amati
Journal:  Nature       Date:  2014-07-09

4.  Selective transcriptional regulation by Myc: Experimental design and computational analysis of high-throughput sequencing data.

Authors:  Mattia Pelizzola; Marco J Morelli; Arianna Sabò; Theresia R Kress; Stefano de Pretis; Bruno Amati
Journal:  Data Brief       Date:  2015-02-12

5.  methylPipe and compEpiTools: a suite of R packages for the integrative analysis of epigenomics data.

Authors:  Kamal Kishore; Stefano de Pretis; Ryan Lister; Marco J Morelli; Valerio Bianchi; Bruno Amati; Joseph R Ecker; Mattia Pelizzola
Journal:  BMC Bioinformatics       Date:  2015-09-29

6.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15

7.  Fitting a mixture model by expectation maximization to discover motifs in biopolymers.

Authors:  T L Bailey; C Elkan
Journal:  Proc Int Conf Intell Syst Mol Biol       Date:  1994

8.  Quantifying similarity between motifs.

Authors:  Shobhit Gupta; John A Stamatoyannopoulos; Timothy L Bailey; William Stafford Noble
Journal:  Genome Biol       Date:  2007

9.  FIMO: scanning for occurrences of a given motif.

Authors:  Charles E Grant; Timothy L Bailey; William Stafford Noble
Journal:  Bioinformatics       Date:  2011-02-16

10.  JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles.

Authors:  Elodie Portales-Casamar; Supat Thongjuea; Andrew T Kwon; David Arenillas; Xiaobei Zhao; Eivind Valen; Dimas Yusuf; Boris Lenhard; Wyeth W Wasserman; Albin Sandelin
Journal:  Nucleic Acids Res       Date:  2009-11-11
