
Fine resolution mapping of double-strand break sites for human ribosomal DNA units.

Bernard J Pope¹, Khalid Mahmood¹, Chol-Hee Jung¹, Daniel J Park².

Abstract

DNA breakage arises during a variety of biological processes, including transcription, replication and genome rearrangements. In the context of disease, extensive fragmentation of DNA has been described in cancer cells and during early stages of neurodegeneration (Stephens et al., 2011 [5]; Blondet et al., 2001 [1]). Stults et al. (2009) [6] reported that human rDNA gene clusters are hotspots for recombination and that rDNA restructuring is among the most common chromosomal alterations in adult solid tumours. As such, analysis of rDNA regions is likely to have significant prognostic and predictive value in the clinic. Tchurikov et al. (2015a, 2016) [7], [9] have made major advances in this direction, reporting that human genome double-strand breaks (DSBs) occur frequently at sites in rDNA that are tightly linked with active transcription. The authors used a RAFT (rapid amplification of forum termini) protocol that selects for blunt-ended sites. They reported the relative frequency of these rDNA DSBs within defined co-ordinate 'windows' of varying size and made these data (as well as the relevant 'raw' sequencing information) available to the public (Tchurikov et al., 2015b). Assay designs targeting rDNA DSB hotspots will benefit greatly from the publication of break sites at greater resolution. Here, we re-analyse public RAFT data and make available rDNA DSB co-ordinates to the single-nucleotide level.


Keywords: Double-strand breaks; Forum domains; Fragile sites; HEK293T; rDNA

Year: 2016 PMID: 27656414 PMCID: PMC5021761 DOI: 10.1016/j.gdata.2016.08.012

Source DB: PubMed Journal: Genom Data ISSN: 2213-5960

Direct link to deposited data

https://figshare.com/s/1d21827bb891461845cc

Experimental design, materials and methods

Sequencing data

The FASTQ file for Illumina Genome Analyzer IIx run accession SRR944107 was downloaded from http://www.ebi.ac.uk/ena/data/view/SRR944107, having sourced the accession code via http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49302. The origin of these data has been reported previously [8]. Briefly, HEK 293T cells were suspended in 1% (w/v) low-melt agarose prior to lysis. DNA was then fractionated by gel electrophoresis and collected by electroelution. Free DNA ends (sites of DSBs) were ligated to a double-stranded biotinylated adapter oligonucleotide before digestion with the restriction endonuclease Sau3AI. DSB site-containing termini were phase-purified using streptavidin paramagnetic particles prior to Sau3AI site adapter ligation and PCR amplification. PCR products were ligated to Illumina adapters such that they had the potential to be represented in either orientation. Library fragments of ~200–400 bp (insert plus adapter and PCR primer sequences) were band isolated from an agarose gel and the purified DNA was subjected to Illumina Genome Analyzer IIx sequencing.

Data processing

Fig. 1 provides a schematic representation of our bioinformatic analysis pipeline. In the first step, we wrote custom software [raft_fastq_parse.py] to generate a modified representation of SRR944107.fastq, named SRR944107_cleaned.fastq. We will describe this tool in more detail and make it available to the public via a forthcoming publication. Briefly, it filters reads according to whether the expected arrangements of adapter sequences are observed. Reads exhibiting evidence of ligation artefacts or insufficient evidence of expected adapter sequences were removed. Reads oriented with the DSB site towards the start were processed to remove adapter sequence(s). Reads with the DSB site oriented distally were also processed to remove adapter sequence(s) and were transformed to be represented with the DSB site at the start. Reads harbouring the Sau3AI site towards the start but without both adapters being evident were removed because the DSB site could not be defined. Reads were retained only if the library insert was at least 25 nucleotides in length.
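
The re-orientation and length-filtering steps described above can be illustrated with a minimal Python sketch. This is not the authors' raft_fastq_parse.py, which has not been published here. It assumes that adapter trimming has already been done, that "transforming" a distally oriented read means reverse-complementing it (and reversing its quality string), and that the caller already knows the orientation; the function names and the `dsb_at_start` flag are hypothetical.

```python
# Hypothetical sketch only: adapter detection and trimming are assumed to have
# happened upstream, and the orientation flag is supplied by the caller.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
MIN_INSERT_LENGTH = 25  # inserts shorter than this are discarded (from the text)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def normalise_read(seq, qual, dsb_at_start):
    """Return (seq, qual) with the DSB site at the start, or None if too short.

    seq and qual are the adapter-trimmed insert and its quality string.
    dsb_at_start is True when the DSB end is already at the 5' end of the read.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < MIN_INSERT_LENGTH:
        return None
    if dsb_at_start:
        return seq, qual
    # Assumption: distally oriented reads are reverse-complemented so that
    # the DSB end becomes the first base of the read.
    return reverse_complement(seq), qual[::-1]
```

Placing the DSB at the first base of every retained read is what later allows the break position to be read directly from the alignment start.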

Fig. 1

Schematic illustration of our bioinformatic analysis pipeline to derive counts of DSBs in human rDNA by co-ordinate.

The human rDNA sequence U13369.1.fa was indexed using BWA (version 0.7.5a) [2] using the command:

bwa index -a is U13369.1.fa U13369.1

Reads of the transformed FASTQ file were then mapped to U13369.1 using BWA, thus:

bwa mem U13369.1.fa SRR944107_cleaned.fastq > SRR944107_cleaned_rDNA.sam

Samtools (version 1.3.1) [3] was used to convert from SAM file format to BAM file format and to sort the resulting BAM file with the following command:

samtools view -u SRR944107_cleaned_rDNA.sam | samtools sort -@ 4 -o SRR944107.rDNA.sort.bam

Bedtools (version 2.17.0) [4] was then employed to produce a BED file representing the mapping, including CIGAR string information and mapping orientation, with the following command:

bedtools bamtobed -cigar -i SRR944107.rDNA.sort.bam > SRR944107.rDNA.bed

We then wrote custom software [raft_bed_parse.py] to further filter the data and to count the number of times DSBs were observed at co-ordinates in rDNA U13369.1. We will include further details of this and make it available to the public along with [raft_fastq_parse.py]. Briefly, this tool assesses the orientation of mapping for each read. Because we presented the DSB at the beginning of each read prior to mapping, we can determine the exact location of the DSB at the single nucleotide level for each read. This tool also performs additional filtering steps. Reads that mapped in the ‘+’ orientation (sequence was compared directly to the ‘sense’ strand of U13369.1 to result in successful mapping) were tested for evidence of ‘clipping’ at the start of the CIGAR string. If such clipping was evident, the read did not contribute to DSB recording; instead, the would-be DSB signal was treated as likely artefactual in nature. Similarly, reads that mapped in the ‘-’ orientation (reverse complemented sequence mapped successfully to the ‘sense’ strand of U13369.1) were treated as likely artefactual if the CIGAR string showed clipping at the end.

The most frequent single nucleotide-resolved sites of rDNA DSBs derived from this analysis fall within the hotspot windows reported previously [8]. Fig. 2 illustrates the frequency of DSBs at single nucleotide resolution in the hotspot-enriched rDNA region between co-ordinates 30,000 and 40,000 of rDNA. Fig. 3 depicts such sites across the entire length of human rDNA U13369.1.
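
The strand and clipping rules described above for the BED-parsing step can be sketched in Python as follows. This is an illustrative approximation, not the authors' raft_bed_parse.py. It assumes the seven-column output of `bedtools bamtobed -cigar` (0-based start, end, strand in column 6, CIGAR in column 7), treats both soft (S) and hard (H) clips as clipping, and reports 1-based co-ordinates; the function names are hypothetical.

```python
import re
from collections import Counter

# Clipping operations (soft "S" or hard "H") at either end of a CIGAR string.
LEADING_CLIP = re.compile(r"^\d+[SH]")
TRAILING_CLIP = re.compile(r"\d+[SH]$")


def dsb_coordinate(bed_line):
    """Return the 1-based DSB co-ordinate for one BED line, or None if filtered.

    The DSB is assumed to sit at the first base of the read, so it maps to the
    alignment start for '+' reads and to the alignment end for '-' reads.
    """
    fields = bed_line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    strand, cigar = fields[5], fields[6]
    if strand == "+":
        # Clipping at the start of the CIGAR means the DSB base did not align.
        if LEADING_CLIP.search(cigar):
            return None
        return start + 1  # BED start is 0-based
    if strand == "-":
        # The CIGAR is written in reference orientation, so the read start
        # corresponds to the end of the CIGAR string.
        if TRAILING_CLIP.search(cigar):
            return None
        return end  # BED end is exclusive, so it equals the 1-based last base
    return None


def count_dsbs(bed_lines):
    """Count how often each co-ordinate is observed as a DSB site."""
    counts = Counter()
    for line in bed_lines:
        if not line.strip():
            continue
        coord = dsb_coordinate(line)
        if coord is not None:
            counts[coord] += 1
    return counts
```

Under these assumptions, `count_dsbs(open("SRR944107.rDNA.bed"))` would yield a per-co-ordinate tally of the kind plotted in Fig. 2 and Fig. 3.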

Fig. 2

Histogram depicting counts of DSBs by co-ordinate based on mapping to human rDNA U13369.1.fa. Results are shown for co-ordinates 30,000 to 40,000, the region in which the highest counts of DSBs for rDNA occur.

Fig. 3

Circos plot showing counts of DSBs by co-ordinate based on mapping to human rDNA U13369.1.fa.

Discussion

Here, we present sites contributing to DSBs in human rDNA, among the most fragile regions of the genome, at single nucleotide resolution. The most frequent sites for rDNA DSBs concur with previously reported genomic windows [7], [8]. Given the relevance of these sites to cancer and other diseases, it is likely that these new data will prove useful for the development of predictive, diagnostic and prognostic assays of clinical importance [1], [5], [6], [9]. The fine resolution of such sites will allow better targeted and, therefore, more cost-effective approaches to assessing an individual's level of genomic ‘scarring’. This will likely be informative in stratifying risks for particular types of cancer and in determining likely responses to particular cancer treatments, for example. Such information will be extremely important in informing preventative screening programs and in matching patients to the treatments most likely to be effective, with enormous potential to reduce morbidity and mortality. Our methods will be applicable to future RAFT datasets derived from the use of a variety of restriction endonucleases, alone or in combination (instead of the sole use of Sau3AI). Analysis of such data will improve accuracy incrementally by reducing bias in the form of restriction endonuclease site positioning effects: the locations of these sites relative to DSB sites determine the library insert size distribution, which, in turn, affects relative amplification efficiency and the ability to map reads accurately. In future experiments, the use of paired-end sequencing will assist in locating DSB sites regardless of the orientation of the insert for a given library element.
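
The restriction-site positioning effect mentioned above can be made concrete with a small Python sketch that measures how far a DSB lies from the nearest Sau3AI recognition site (GATC) on either side. This calculation is not part of the published pipeline. It assumes a 0-based DSB index into a plain reference string, treats the distance to the site as a rough proxy for insert length, and uses hypothetical function names.

```python
SAU3AI_SITE = "GATC"  # Sau3AI recognition sequence


def downstream_site_distance(reference, dsb_index, site=SAU3AI_SITE):
    """Bases from dsb_index (0-based) to the start of the next site, or None."""
    pos = reference.upper().find(site, dsb_index)
    return None if pos == -1 else pos - dsb_index


def upstream_site_distance(reference, dsb_index, site=SAU3AI_SITE):
    """Bases between the end of the previous site and dsb_index, or None."""
    pos = reference.upper().rfind(site, 0, dsb_index)
    return None if pos == -1 else dsb_index - (pos + len(site))
```

A DSB with no nearby site in either direction would be expected to yield inserts that fall outside the size-selected library, which is one way the single-enzyme bias described above could arise.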

9 in total

1. Transient massive DNA fragmentation in nervous system during the early course of a murine neurodegenerative disease.

Authors: B Blondet; A Aït-Ikhlef; M Murawsky; F Rieger
Journal: Neurosci Lett Date: 2001-06-15 Impact factor: 3.046

2. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

3. BEDTools: a flexible suite of utilities for comparing genomic features.

Authors: Aaron R Quinlan; Ira M Hall
Journal: Bioinformatics Date: 2010-01-28 Impact factor: 6.937

4. Human rRNA gene clusters are recombinational hotspots in cancer.

Authors: Dawn M Stults; Michael W Killen; Erica P Williamson; Jon S Hourigan; H David Vargas; Susanne M Arnold; Jeffrey A Moscow; Andrew J Pierce
Journal: Cancer Res Date: 2009-11-17 Impact factor: 12.701

5. Massive genomic rearrangement acquired in a single catastrophic event during cancer development.

Authors: Philip J Stephens; Chris D Greenman; Beiyuan Fu; Fengtang Yang; Graham R Bignell; Laura J Mudie; Erin D Pleasance; King Wai Lau; David Beare; Lucy A Stebbings; Stuart McLaren; Meng-Lay Lin; David J McBride; Ignacio Varela; Serena Nik-Zainal; Catherine Leroy; Mingming Jia; Andrew Menzies; Adam P Butler; Jon W Teague; Michael A Quail; John Burton; Harold Swerdlow; Nigel P Carter; Laura A Morsberger; Christine Iacobuzio-Donahue; George A Follows; Anthony R Green; Adrienne M Flanagan; Michael R Stratton; P Andrew Futreal; Peter J Campbell
Journal: Cell Date: 2011-01-07 Impact factor: 41.582

6. Mapping of genomic double-strand breaks by ligation of biotinylated oligonucleotides to forum domains: Analysis of the data obtained for human rDNA units.

Authors: N A Tchurikov; O V Kretova; D M Fedoseeva; V R Chechetkin; M A Gorbacheva; A A Karnaukhov; G I Kravatskaya; Y V Kravatsky
Journal: Genom Data Date: 2014-11-12