p53 transcriptional programs in B cells upon exposure to genotoxic stress in vivo: Computational analysis of next-generation sequencing data.

Claudia Tonelli¹, Bruno Amati², Marco J Morelli³.

Abstract

The transcriptional programs activated by p53 in B cells in vivo following exposure to ionizing radiation were studied through the integrated analysis of several types of next-generation sequencing data: genome-wide profiling of p53 binding sites, mapping of histone marks and open chromatin regions, and quantification of gene expression. Moreover, p53 binding was associated with a series of specific DNA motifs, which were inferred directly from the data. Here, we describe in detail the computational analysis of the datasets associated with our study (Tonelli et al., Oncotarget 6 (2015), 24611-26), deposited in the GEO archive (accession code GSE71180), and we provide the R scripts needed to generate the figures of the paper.

Keywords: ChIP-Seq; Genotoxic stress; Motif analysis; RNA-Seq; p53

Year: 2015 PMID: 26981355 PMCID: PMC4778592 DOI: 10.1016/j.gdata.2015.11.006

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Direct link to deposited data

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE71180

Experimental design, materials and methods

The GEO SuperSeries GSE71180, associated with the study by Tonelli et al. [1], contains a total of 32 NGS samples, divided into three series: GSE71175, containing 6 ChIP-Seq samples (5 ChIP against p53 and one input); GSE71176, containing 24 RNA-Seq samples (4 conditions with 2 replicates each for the p53 KO cells, and 4 conditions with 4 replicates each for the C57/Bl6 cells); and GSE71177, containing a DNase-Seq sample and the corresponding input. The datasets are summarized in Table 1.

Table 1

Summary of the 32 samples available in the GSE71180 SuperSeries.

Sample ID	Sample name	Replicate	Data type
GSM1828855	p53.wt.Bcells.mock	1/1	ChIP-Seq
GSM1828856	p53.wt.Bcells.IR	1/1	ChIP-Seq
GSM1828857	p53.wt.nonBcells.mock	1/1	ChIP-Seq
GSM1828858	p53.wt.nonBcells.IR	1/1	ChIP-Seq
GSM1828859	p53.null.spleen.IR	1/1	ChIP-Seq
GSM1828860	Input	1/1	ChIP-Seq
GSM1828861	p53.null.Bcells.mock.1	1/2	RNA-Seq
GSM1828862	p53.null.Bcells.mock.2	2/2	RNA-Seq
GSM1828863	p53.null.nonBcells.mock.1	1/2	RNA-Seq
GSM1828864	p53.null.nonBcells.mock.2	2/2	RNA-Seq
GSM1828865	p53.null.Bcells.IR.1	1/2	RNA-Seq
GSM1828866	p53.null.Bcells.IR.2	2/2	RNA-Seq
GSM1828867	p53.null.nonBcells.IR.1	1/2	RNA-Seq
GSM1828868	p53.null.nonBcells.IR.2	2/2	RNA-Seq
GSM1828869	p53.wt.Bcells.mock.1	1/4	RNA-Seq
GSM1828870	p53.wt.Bcells.mock.2	2/4	RNA-Seq
GSM1828871	p53.wt.Bcells.mock.3	3/4	RNA-Seq
GSM1828872	p53.wt.Bcells.mock.4	4/4	RNA-Seq
GSM1828873	p53.wt.nonBcells.mock.1	1/4	RNA-Seq
GSM1828874	p53.wt.nonBcells.mock.2	2/4	RNA-Seq
GSM1828875	p53.wt.nonBcells.mock.3	3/4	RNA-Seq
GSM1828876	p53.wt.nonBcells.mock.4	4/4	RNA-Seq
GSM1828877	p53.wt.Bcells.IR.1	1/4	RNA-Seq
GSM1828878	p53.wt.Bcells.IR.2	2/4	RNA-Seq
GSM1828879	p53.wt.Bcells.IR.3	3/4	RNA-Seq
GSM1828880	p53.wt.Bcells.IR.4	4/4	RNA-Seq
GSM1828881	p53.wt.nonBcells.IR.1	1/4	RNA-Seq
GSM1828882	p53.wt.nonBcells.IR.2	2/4	RNA-Seq
GSM1828883	p53.wt.nonBcells.IR.3	3/4	RNA-Seq
GSM1828884	p53.wt.nonBcells.IR.4	4/4	RNA-Seq
GSM1828885	p53.wt.Bcells.DNaseI	1/1	DNase-Seq
GSM1828886	Input.DNaseI	1/1	DNase-Seq
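The RNA-Seq sample names in Table 1 encode the experimental design (genotype, cell fraction, treatment and replicate number). The following Python sketch assumes only the name pattern p53.<genotype>.<cells>.<treatment>.<replicate> visible in the table. It recovers the design factors and checks the replicate structure described above: 2 replicates per condition for p53 null, 4 for wild type, 24 RNA-Seq libraries in total. The helper names (RnaSample, parse_rnaseq_name, replicates_per_condition) are hypothetical and not part of the deposited scripts.

```python
from collections import Counter
from itertools import product
from typing import Iterable, NamedTuple


class RnaSample(NamedTuple):
    genotype: str   # "wt" or "null"
    cells: str      # "Bcells" or "nonBcells"
    treatment: str  # "mock" or "IR"
    replicate: int


def parse_rnaseq_name(name: str) -> RnaSample:
    """Split a name such as 'p53.wt.Bcells.IR.3' into its design factors."""
    parts = name.split(".")
    if len(parts) != 5 or parts[0] != "p53":
        raise ValueError(f"unexpected RNA-Seq sample name: {name!r}")
    _, genotype, cells, treatment, rep = parts
    return RnaSample(genotype, cells, treatment, int(rep))


def replicates_per_condition(names: Iterable[str]) -> Counter:
    """Count replicates for each (genotype, cells, treatment) condition."""
    counts: Counter = Counter()
    for name in names:
        s = parse_rnaseq_name(name)
        counts[(s.genotype, s.cells, s.treatment)] += 1
    return counts


# Rebuild the 24 RNA-Seq sample names of Table 1 from the naming scheme
# (the order differs from the table, which does not matter for counting).
names = [
    f"p53.{g}.{c}.{t}.{r}"
    for g, n_rep in (("null", 2), ("wt", 4))
    for c, t in product(("Bcells", "nonBcells"), ("mock", "IR"))
    for r in range(1, n_rep + 1)
]
counts = replicates_per_condition(names)
```

With this design table, each expression column can be labelled by condition before any comparison between wild-type and p53-null cells.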

These samples made it possible to study the genomic occupancy of p53 and the transcriptional changes induced by its activation in B and non-B cells in vivo, following DNA damage caused by ionizing radiation. Cells from p53-null mice were analyzed to define the p53-dependent response.

Data analysis

We complement the methods of the original publication and the instructions deposited in the GEO archive with the source code used to produce the figures from the next-generation sequencing (NGS) data files. Under accession number GSE71180, we deposited the raw data files (sequencing reads, in fastq format) together with a series of processed data files. For the ChIP-Seq and DNase-Seq samples (excluding the inputs), we supplied the locations of the bound genomic regions in BED format, as obtained with the MACS [2] peak caller (v. 2.0.9). For the RNA-Seq samples, we provided the expression of each gene, quantified as the number of reads assigned to the gene, normalized to the gene length and to the total number of reads aligned on any exon of any gene. We called this quantification exonic RPKM, or eRPKM, to distinguish it from the conventional normalization of read counts to the total number of reads aligned anywhere on the genome. Most of the information needed to produce the figures is already available in the processed data, except for four fields of the ChIP-Seq peaks: 1) annotation, 2) enrichment, 3) summit and 4) motif annotation. Here, we provide the complete resources needed to reproduce the figures of the main paper, together with the instructions to generate the missing fields. Finally, for convenience, we also provide the genomic regions associated with previously published histone modifications [3], [4].
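The eRPKM of a gene g can be written as eRPKM_g = 10^9 × n_g / (L_g × N_exon). Here n_g is the number of reads assigned to g, L_g is the gene length in bp, and N_exon is the total number of reads aligned on exons. The 10^9 factor (per kilobase, per million reads) follows the usual RPKM convention. Two further choices are assumptions of this sketch and are not stated above: L_g is taken as the exonic length, and N_exon is taken as the sum of the per-gene counts, which presumes that each read is assigned to at most one gene. A minimal Python version, not the authors' R code:

```python
from typing import Dict


def erpkm(counts: Dict[str, int], exon_lengths: Dict[str, int]) -> Dict[str, float]:
    """Exonic RPKM: reads per kilobase of gene per million exon-aligned reads.

    counts       -- gene -> number of reads assigned to the gene's exons
    exon_lengths -- gene -> gene length in bp (assumed: exonic length)
    """
    # Assumption: the library size is the total number of reads assigned to
    # any gene, i.e. every exon-aligned read is counted for at most one gene.
    total_exonic = sum(counts.values())
    if total_exonic == 0:
        raise ValueError("no exon-aligned reads")
    return {
        gene: counts[gene] * 1e9 / (exon_lengths[gene] * total_exonic)
        for gene in counts
    }


# Hypothetical toy counts, for illustration only.
print(erpkm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 2000}))
# {'geneA': 250000.0, 'geneB': 375000.0}
```

Conventional RPKM differs only in the denominator, which would use all aligned reads instead of the exon-aligned total.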

Analysis environment

Data analysis was performed entirely in R, the widely used open-source environment for statistical computing and data analysis. The main package used for the analysis is compEpiTools v1.2.6 [5], a flexible and user-friendly package for basic analyses of NGS data. It is part of the Bioconductor project [6] and can be installed from http://www.bioconductor.org/packages/release/bioc/html/compEpiTools.html.

Description of the source files

The source code archive TonelliEtAl2015_sourceCode.zip contains five files and two directories, described below.

filemapping_GEO.R: links the R objects used to produce the figures to the files deposited in the GEO archive and listed in Table 1. In particular, it converts the ChIP-Seq BED files to GRanges objects and organizes the gene expression quantifications in a data frame. It also collects the ChIP-Seq alignment (BAM) files in a list. These BAM files must first be generated from the raw sequencing files (fastq), following the instructions deposited in the GEO archive.

analysisEnvironment.R: loads all the data files needed to produce the figures of the main paper [1], and contains all the libraries and functions invoked by the scripts in TonelliEtAl2015_Figures.R. The environment requires the following libraries: compEpiTools, gplots, VennDiagram, lattice, flashClust, TxDb.Mmusculus.UCSC.mm9.knownGene and org.Mm.eg.db (see sessionInfo.log).

TonelliEtAl2015_Figures.R: sources analysisEnvironment.R to load the pre-generated data objects associated with the main paper [1], and contains the code that produces all the figures based on the computational analyses of NGS data. A few figures require computing tag densities over genomic intervals, which would need the alignment (BAM) files. In these cases, a pre-computed table was used instead.

prepareDatasets.R: a collection of R scripts that complements the processed ChIP-Seq files available on GEO with the extra fields required to generate the final figures. Their output is stored in the ChIPpeaks.rds data file in the data directory, as a list of genomic ranges that is loaded automatically by analysisEnvironment.R.
The scripts in prepareDatasets.R use several external tools (MEME [7], TOMTOM [8] and FIMO [9]) and the mm9 reference genome, which is available in Bioconductor as the BSgenome.Mmusculus.UCSC.mm9 package (v. 1.4.0). They may take considerable time to complete (6–24 h, depending on the platform). Before running them, download the processed ChIP-Seq files from GEO and organize them according to the instructions in filemapping_GEO.R. Then generate the alignment (BAM) files from the raw fastq files deposited in GEO, following the instructions in the archive. The extra fields are:
1) the genomic annotation of each p53 ChIP-Seq peak: a peak overlapping a [−5 kb, +2 kb] window around a standard promoter is considered “promoter”, peaks overlapping an H3K4me1 peak are considered “enhancers”, and all other peaks are classified as “distal”;
2) the enrichment of the peak, computed with the GRenrichment function of compEpiTools;
3) the summit of the peak, computed with the GRcoverageSummit function of compEpiTools;
4) the motif annotation of the peak, obtained in five main steps: i) generating a FASTA file with the sequences of the 1000 most enriched genomic regions among the peaks of the p53.wt.Bcells.IR sample; ii) estimating the unspaced p53 motif from these sequences with MEME [7] and verifying with TOMTOM [8] that it matches the canonical p53 motif in the JASPAR CORE Vertebrata database [10]; iii) creating the spaced motifs by inserting spacer positions, each with equal probability for the 4 nucleotides, between the two decameric half-sites; iv) scanning the mouse genome for these motifs with FIMO [9]; v) assigning the motifs to the ChIP-Seq peaks.
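The peak annotation rule in field 1 can be illustrated with a short Python sketch. This is not the authors' R code, which relies on compEpiTools and GRanges. The sketch reads BED lines into 1-based, closed intervals, the coordinate convention of GRanges (BED itself is 0-based and half-open). It then builds a [−5 kb, +2 kb] window around each transcription start site and classifies a peak as promoter, enhancer or distal. Several details are assumptions: the window is oriented along the gene's strand, and promoter overlap is checked before H3K4me1 overlap. The names Interval, read_bed, promoter_window and annotate_peak are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Sequence


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GRanges convention)
    end: int    # 1-based, inclusive


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED records (0-based, half-open) into 1-based closed intervals."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start0, end0 = line.split("\t")[:3]
        intervals.append(Interval(chrom, int(start0) + 1, int(end0)))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def promoter_window(chrom: str, tss: int, strand: str,
                    upstream: int = 5000, downstream: int = 2000) -> Interval:
    """[-5 kb, +2 kb] around a TSS, oriented along the gene's strand (assumption)."""
    if strand == "+":
        return Interval(chrom, max(1, tss - upstream), tss + downstream)
    return Interval(chrom, max(1, tss - downstream), tss + upstream)


def annotate_peak(peak: Interval, promoters: Sequence[Interval],
                  h3k4me1_peaks: Sequence[Interval]) -> str:
    """Classify a p53 peak; promoter overlap is checked first (assumption)."""
    if any(overlaps(peak, p) for p in promoters):
        return "promoter"
    if any(overlaps(peak, e) for e in h3k4me1_peaks):
        return "enhancer"
    return "distal"
```

In a real run, the TSS coordinates could come from an annotation package such as TxDb.Mmusculus.UCSC.mm9.knownGene, which is listed among the required libraries. For genome-wide peak sets, the linear scans would be replaced by an interval index.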
sessionInfo.log: a log file with the output of the R sessionInfo() command, listing the versions of all the libraries used in the analysis environment.

data folder: contains all the R objects needed to produce the figures of the main paper.

figures folder: contains all the figures of the main paper, in PDF format, as generated by running the scripts in TonelliEtAl2015_Figures.R.
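Step iii of the motif annotation performed by prepareDatasets.R builds spaced p53 motifs from the unspaced MEME motif. The following Python sketch represents that motif as a position probability matrix of 20 columns, one per position across the two decameric half-sites, where each column gives the probabilities of A, C, G and T. It inserts k uniform columns (0.25 each) between the two half-sites. The sketch then serializes the result in the minimal MEME motif format that FIMO reads, assuming a uniform background. Several choices are not taken from the original scripts: the default spacer range of 0 to 13 bp, the "p53_spacer" motif names and the helper names spaced_motifs and to_meme_format.

```python
from typing import Dict, List, Sequence

Column = List[float]  # probabilities of A, C, G, T at one motif position
UNIFORM: Column = [0.25, 0.25, 0.25, 0.25]


def spaced_motifs(pwm: Sequence[Column], half_site: int = 10,
                  spacers: Sequence[int] = range(14)) -> Dict[int, List[Column]]:
    """Return one motif per spacer length k, with k uniform columns
    inserted between the two half-sites of an unspaced motif."""
    if len(pwm) != 2 * half_site:
        raise ValueError(f"expected {2 * half_site} columns, got {len(pwm)}")
    for col in pwm:
        if abs(sum(col) - 1.0) > 1e-6:
            raise ValueError("each column must sum to 1")
    left, right = list(pwm[:half_site]), list(pwm[half_site:])
    return {
        k: left + [list(UNIFORM) for _ in range(k)] + right
        for k in spacers
    }


def to_meme_format(motifs: Dict[int, List[Column]], prefix: str = "p53_spacer") -> str:
    """Serialize motifs in the minimal MEME motif format (uniform background assumed)."""
    out = ["MEME version 4", "", "ALPHABET= ACGT", "", "strands: + -", "",
           "Background letter frequencies", "A 0.25 C 0.25 G 0.25 T 0.25", ""]
    for k, matrix in motifs.items():
        out.append(f"MOTIF {prefix}{k}")
        out.append(f"letter-probability matrix: alength= 4 w= {len(matrix)}")
        out.extend(" ".join(f"{p:.3f}" for p in col) for col in matrix)
        out.append("")
    return "\n".join(out)
```

Because the spacer columns carry no information, every spaced motif keeps the same half-site preferences, and only the distance between the half-sites differs when FIMO scores the genome.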

Specifications
Organism/cell line/tissue	Mouse (C57/Bl6 B cells and non-B cells; p53KO B cells and non-B cells)
Sex	Not applicable
Sequencer or array type	Illumina HiSeq 2000
Data format	Raw and analyzed
Experimental factors	Spleens from C57/Bl6 and p53KO mice were collected 4 h after exposure to 7 Gy total body irradiation and from a control cohort of mice. After pressing the spleens through nylon cell strainers and hypotonic lysis of red blood cells, the cell suspensions were incubated with B220 MicroBeads (Miltenyi Biotec) and B cells were enriched by magnetic cell sorting (MACS), according to the manufacturer's instructions (Miltenyi Biotec). The remaining fraction constituted the non-B cell populations used in this study.
Experimental features	Previously described cell types were used for ChIP-Seq (for p53), RNA-Seq and DNase-Seq experiments.
Consent	n/a
Sample source location	Milan, Italy

References

Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994.

Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004.

Sabò A, Kress TR, Pelizzola M, de Pretis S, Gorski MM, Tesi A, Morelli MJ, Bora P, Doni M, Verrecchia A, Tonelli C, Fagà G, Bianchi V, Ronchi A, Low D, Müller H, Guccione E, Campaner S, Amati B. Selective transcriptional regulation by Myc in cellular growth control and lymphomagenesis. Nature. 2014.

Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007.

Pelizzola M, Morelli MJ, Sabò A, Kress TR, de Pretis S, Amati B. Selective transcriptional regulation by Myc: Experimental design and computational analysis of high-throughput sequencing data. Data Brief. 2015.
