Evidences showing wide presence of small genomic aberrations in chronic lymphocytic leukemia.

Yeong C Kim, Yong-Chul Jung, Jun Chen, Ali H Alhasan, Parawee Kaewsaard, Yanming Zhang, Shuo Ma, Steve Rosen, San Ming Wang.

Abstract

BACKGROUND: Chronic lymphocytic leukemia (CLL) is the most common adult leukemia in the Western population. Genetic factors are thought to contribute to CLL etiology. However, fewer genomic aberrations have so far been identified in CLL than in other types of leukemia, which raises the question of how strongly genetic factors influence CLL. We performed a high-resolution genome scanning study to address this issue.
FINDINGS: Using the restriction paired-end-based Ditag Genome Scanning technique, we analyzed three primary CLL samples at kilobase resolution. We then validated the results in eight primary CLL samples, including two of those used for ditag collection. From 51,632 paired-end tags commonly detected in the three CLL samples, representing 5% of the HindIII restriction fragments in the genome, we identified 230 paired-end tags that were present in all three CLL genomes but absent from multiple normal human genome reference sequences. We mapped the full-length sequences of the fragments detected by these unmapped tags in seven additional CLL samples. This confirmed that they represent genomic aberrations caused by small insertions, deletions and base changes spread across coding and non-coding regions.
CONCLUSIONS: Our study identified hundreds of loci with insertions, deletions, base changes and restriction site polymorphisms in both coding and non-coding regions of CLL genomes, indicating the wide presence of small genomic aberrations in chronic lymphocytic leukemia. Our study supports the use of whole genome sequencing to comprehensively decode the CLL genome and better understand the genetic defects in CLL.

Year:  2010        PMID: 21172017      PMCID: PMC3016268          DOI: 10.1186/1756-0500-3-341

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


Findings

Chronic lymphocytic leukemia (CLL) is an incurable disease, mainly of the B-cell lineage, that occurs predominantly in the Western population, with a median age at diagnosis of 72 years [1]. Determining the cause of CLL is crucial for understanding how the disease is acquired and for its clinical diagnosis, treatment and prognosis. Genetic factors have been linked to the etiology of CLL. Cytogenetic analyses have identified chromosomal abnormalities including del(11q23) affecting the ATM gene, trisomy 12, del(13q14), and del(17p13) affecting the TP53 gene [2]. In addition, comparative genomic hybridization (CGH) studies found gains and losses at Xp11.2-p21 and Xq21-qter [3]. Molecular studies identified three genes, IgVH, CD38 and ZAP-70, that correlate with CLL prognosis [4-6]. A CLL-specific microRNA signature has also been identified, suggesting that microRNA deletion could be involved in CLL [7]. SNP array studies identified abnormalities at 2q21.2, 6p22.1 and 18q21.1 that follow a Mendelian inheritance pattern [8]. Genome-wide association studies also identified multiple loci at 2q37.3, 8q24.21, 15q21.3 and 16q24.1 that appear to be associated with genetic susceptibility to CLL [9]. Although this evidence supports the involvement of genetic factors in CLL, the frequency of genomic aberrations identified in CLL is lower than that observed in leukemias of other hematopoietic lineages [10]. This may indicate that the CLL genome is relatively intact, with fewer aberrations than other types of leukemia. Alternatively, CLL genomes may carry more aberrations than currently known, but mainly as small lesions that conventional technologies cannot detect because of their limited resolution. With the rapid progress of genome sequencing technologies, enthusiasm is growing for comprehensively detecting genomic aberrations in cancer by sequencing cancer genomes. 
For CLL, a critical issue is the extent of genomic aberration, which determines whether a whole genome sequencing approach to analyzing the CLL genome is justified. We reasoned that scanning several CLL genomes at sufficiently high resolution and reasonable genome coverage would provide first-hand information for estimating the extent of genomic aberration in CLL. We recently developed the Ditag Genome Scanning (DGS) technique, which uses next-generation DNA sequencing technologies to collect paired-end sequences from restriction DNA fragments across a genome [11], and we used it to analyze CLL genomes. Nine peripheral blood samples from untreated CLL patients diagnosed at the Northwestern University Lurie Cancer Center and the University of Chicago Medical Center were used in this study. Three of these samples were used for paired-end tag collection. Eight samples, including two of those three, were used for full-length sequencing analysis (Additional file 1: Supplemental Table S1). Informed consent was obtained from the patients, and the use of clinical CLL samples was approved by the institutional review boards of the University of Chicago and Northwestern University following institutional guidelines. The experimental process followed the published protocol [11] and is outlined in Figure 1. Briefly, mononuclear cells were isolated from each CLL peripheral blood or bone marrow sample using NycoPrep™ A solution (Axis-Shield). Genomic DNA was extracted from the mononuclear cells using the QIAamp DNA Blood Kit (QIAGEN) following the manufacturer's protocol. To generate the DGS library, genomic DNA was fractionated by HindIII restriction digestion. The restriction fragments were dephosphorylated with calf intestinal alkaline phosphatase (CIP) and cloned into the pDGS-HindIII vector, which contains two MmeI sites next to the HindIII cloning site. The genomic library was then digested with MmeI to release two tags from each cloned DNA fragment. 
The tag-vector-tag fragments were then gel-purified and re-ligated to form a ditag library. Ditags were released from the vectors by HindIII digestion, gel-purified, and concatemerized using T4 DNA ligase (Promega). Concatemers of 200 to 500 bp were purified from agarose gels and sequenced on a 454 GS20 sequencer (454 Life Sciences). Ditags were extracted from the resulting sequences based on the HindIII sites. Identical ditags were then merged into a single unique ditag, and the copy number of each unique ditag was recorded.
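The extraction and merging step just described can be sketched in Python as a minimal illustration. It assumes, as the ditag sequences listed in Table 2 suggest, that each ditag appears in a concatemer read as a 20-base insert between two consecutive AAGCTT sites, giving 32 bp including both sites. It also merges each ditag with its reverse complement, which the text does not specify. The function names are hypothetical and not part of the published DGS pipeline.

```python
from collections import Counter

HINDIII = "AAGCTT"   # HindIII recognition site
INSERT_LEN = 20      # assumption: bases between the two sites of one ditag

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ditags(read: str) -> list[str]:
    """Split a concatemer read at HindIII sites and keep 32-bp ditags."""
    pieces = read.upper().split(HINDIII)
    # The first and last pieces are read ends not flanked by a site on both sides.
    inner = pieces[1:-1]
    return [HINDIII + p + HINDIII for p in inner if len(p) == INSERT_LEN]


def canonical(ditag: str) -> str:
    """Pick one orientation so a ditag and its reverse complement collapse together."""
    return min(ditag, revcomp(ditag))


def collapse_ditags(reads) -> Counter:
    """Merge identical ditags and record each unique ditag's copy number."""
    counts = Counter()
    for read in reads:
        for ditag in extract_ditags(read):
            counts[canonical(ditag)] += 1
    return counts
```

In practice one such counter would be kept per CLL sample, so that the sets of unique ditags can later be compared across samples.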
Figure 1

Outline of the experimental process. Genomic DNA samples were digested by restriction enzymes. Ditags (paired-end tags) were collected from both ends of restriction fragments and sequenced. The ditag sequences were compared to known human reference genome sequences. The unmapped ditags were used as sense and antisense PCR primers to amplify their original DNA fragments to generate full-length sequences. The sequences were mapped to reference genome sequences to determine the type of genomic aberrations.

To generate the reference ditag database, virtual HindIII restriction fragments were generated from known human genomic sequences. Two 16-bp virtual tags were extracted from the 5' and 3' ends of each virtual fragment and joined to form a reference ditag representing that fragment. Reference ditags were extracted from the following sequences:
1. Human genome reference sequences (hg18): http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/
2. Human dbSNP 126: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606
3. Chimpanzee genome reference sequences (PanTro2): http://hgdownload.cse.ucsc.edu/goldenPath/panTro2/bigZips/
4. Human GM15510 fosmid paired-end sequences: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?&cmd=retrieve&val=CENTER_PROJECT%20%3D%20%22G248%22&size=0&retrieve=Submit
5. Celera human genome sequences: http://www.ncbi.nlm.nih.gov/genomeprj/1431
6. Venter genome sequences: ftp://ftp.ncbi.nih.gov/pub/TraceDB/Personal_Genomics/Venter/
7. Watson genome sequences: ftp://ftp.ncbi.nih.gov/pub/TraceDB/Personal_Genomics/Watson/
8. E. coli K12 genome sequences: reference ditags were also extracted from their HindIII fragments to eliminate ditags derived from E. coli DNA contamination introduced during library construction.
Initial ditag mapping required a perfect match between experimental ditags and hg18 reference ditags. 
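The reference-ditag construction can be sketched in Python as a virtual HindIII digest of a single sequence string. This is an illustration under stated assumptions, not the authors' code. Tags are taken to include the 6-bp AAGCTT site, so that each ditag starts and ends with AAGCTT, as in Table 2. Fragments too short to yield two non-overlapping 16-bp tags are skipped. All helper names are hypothetical. With the reference stored this way, the perfect-match step reduces to a dictionary lookup.

```python
import re

HINDIII = "AAGCTT"
TAG_LEN = 16  # 16-bp virtual tags, as described in the text


def virtual_fragments(seq: str):
    """Yield (start, end, fragment) between consecutive HindIII sites, sites included."""
    sites = [m.start() for m in re.finditer(HINDIII, seq)]
    for left, right in zip(sites, sites[1:]):
        end = right + len(HINDIII)
        yield left, end, seq[left:end]


def reference_ditags(seq: str) -> dict[str, list[tuple[int, int]]]:
    """Map each reference ditag (5' tag + 3' tag) to the fragment coordinates it represents."""
    ref: dict[str, list[tuple[int, int]]] = {}
    for start, end, frag in virtual_fragments(seq.upper()):
        if len(frag) < 2 * TAG_LEN:
            continue  # assumption: tags would overlap, so the fragment is skipped
        ditag = frag[:TAG_LEN] + frag[-TAG_LEN:]
        ref.setdefault(ditag, []).append((start, end))
    return ref


def exact_match(ditag: str, ref: dict[str, list[tuple[int, int]]]):
    """Perfect-match lookup; returns the matching fragment coordinates or None."""
    return ref.get(ditag)
```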
For experimental ditags that did not match perfectly, one mismatch was allowed in each tag of the ditag to compensate for possible sequencing errors or SNPs. Unmapped ditags can also result from the homopolymer errors typical of 454 sequencing chemistry. To recover these, ditags containing homopolymer runs of three or more bases were stretched by one base (e.g., AAA -> AAAA) or shortened by one base (e.g., AAA -> AA) and mapped to the reference ditags again. Ditags that remained unmapped were then compared with the reference ditags from the other sequence sources in the database. Ditags still unmapped after all of these steps were defined as unmapped ditags. The unmapped ditag sequences were used to design sense primers and antisense (reverse-complement) primers. Four extra bases, CAGC, were added to the 5' end of each sense primer, and CGCC was added to the 5' end of each antisense primer. HindIII-digested genomic DNA was used as the PCR template. PCR was performed for 35 cycles of 95°C for 30 s, 57°C for 60 s and 72°C for 3 min, followed by a final extension at 72°C for 10 min. The amplified products from each reaction were cloned into the pGEM-T vector (Promega), transformed into E. coli TOP10 (Invitrogen), and plated in a single well of a 48-well Qtray (Genetix). Four clones from each transformation were amplified by colony PCR using M13F and M13R primers and sequenced with the BigDye Terminator v3.1 Cycle Sequencing Kit (ABI) using the M13F primer. For sequences that did not reach full length, a second sequencing reaction was performed using the M13R primer. To determine the genomic aberrations, each full-length sequence was mapped to hg18 using BLAT with a minimum identity cut-off of 90%.

The paired-end ditags were collected from three CLL samples. Genomic DNA from each sample was fractionated by HindIII digestion, which provides an average resolution of 3,561 bp across the genome, based on the hg18 sequence [11]. 
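The mapping cascade described in the methods above can be sketched in Python. It first tries a perfect match, then allows at most one mismatch in each 16-bp tag, then tries one-base homopolymer stretching or shortening. This is a simplified illustration, not the authors' code. The reference is a plain set of 32-bp ditags scanned linearly, the later comparison against the other genome sources is omitted, and the function names are hypothetical.

```python
import re

TAG_LEN = 16


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def within_one_mismatch_per_tag(query: str, ref: str) -> bool:
    """True if both 16-bp tags of query differ from ref's tags by at most one base."""
    if len(query) != len(ref):
        return False
    return (hamming(query[:TAG_LEN], ref[:TAG_LEN]) <= 1
            and hamming(query[TAG_LEN:], ref[TAG_LEN:]) <= 1)


def homopolymer_variants(seq: str, min_run: int = 3) -> set[str]:
    """Variants with one homopolymer run (length >= min_run) stretched or shortened by one base."""
    variants = set()
    for m in re.finditer(r"([ACGT])\1*", seq):
        run, start, end = m.group(0), m.start(), m.end()
        if len(run) < min_run:
            continue
        variants.add(seq[:start] + run + run[0] + seq[end:])  # e.g. AAA -> AAAA
        variants.add(seq[:start] + run[:-1] + seq[end:])      # e.g. AAA -> AA
    return variants


def map_ditag(query: str, reference: set[str]) -> str:
    """Return which step of the cascade maps the query, or 'unmapped'."""
    if query in reference:
        return "exact"
    if any(within_one_mismatch_per_tag(query, ref) for ref in reference):
        return "one mismatch per tag"
    if any(v in reference for v in homopolymer_variants(query)):
        return "homopolymer"
    return "unmapped"
```

A production implementation would index the reference (for example by tag) instead of scanning it for every query.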
In total, 272,193, 320,283 and 307,547 unique paired-end ditags were collected from the three CLL samples, covering 32%, 34% and 38% of the HindIII fragments in each CLL genome, respectively. Between 87,968 and 108,579 ditags were shared by any two of the CLL samples, and 51,632 ditags were present in all three (Table 1A). Ditags found in only one CLL sample could represent individual genomic differences or experimental artifacts. They could also have been missed in the other samples because ditag collection was not saturated at this sequencing depth. The 51,632 ditags detected in all three CLL samples cover 5% of the HindIII fragments in the genome. To provide high confidence for downstream studies, we focused the mapping analysis on these 51,632 common ditags. We compared them with multiple known genome sequences: the human reference sequence hg18, human SNPs (dbSNP), the human GM15510, Celera, Watson and Venter genome sequences, and the chimpanzee genome sequences, which are highly homologous to human sequences. Of the 51,632 ditags, 98.3% (50,799) mapped to hg18 and represent normal genomic fragments in the CLL genomes. Another 0.4% (230) remained unmapped and represent potential genomic aberrations common to all three CLL genomes. The remaining ditags mapped to other genomes and represent normal genome variation (Table 1B).
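The overlap counts in Table 1A come from comparing the three sets of unique ditags. A minimal sketch of that comparison follows, with toy sets standing in for the real data. The sample names and ditag labels are hypothetical.

```python
from itertools import combinations


def overlap_summary(samples: dict[str, set[str]]):
    """Pairwise overlaps, the ditags common to all samples, and each sample's common fraction."""
    pairwise = {
        (a, b): len(samples[a] & samples[b])
        for a, b in combinations(sorted(samples), 2)
    }
    common = set.intersection(*samples.values())
    fraction = {name: len(common) / len(ditags) for name, ditags in samples.items()}
    return pairwise, common, fraction


samples = {
    "CLL1": {"d1", "d2", "d3", "d4"},
    "CLL2": {"d1", "d2", "d3", "d5"},
    "CLL3": {"d1", "d2", "d6"},
}
pairwise, common, fraction = overlap_summary(samples)
```

Applied to the real sets, the same ratio gives the percentages in the last row of Table 1A. For sample 1, using the Table 1A counts, 51,632 / 272,188 is about 0.19, i.e. 19%.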
Table 1

Paired-end tags collected from three CLL samples

A. Ditag distribution in three CLL samples
| CLL sample | 1 | 2 | 3 |
| Total sequence reads | 231,941 | 321,290 | 268,124 |
| Total ditags | 623,539 | 859,836 | 700,991 |
| Unique ditags (%) | 272,188 (100) | 320,278 (100) | 307,542 (100) |
| Ditags common to two samples | 99,815; 87,968; 108,579 (one value per pair of samples) |
| Ditags common to all three (%) | 51,632 (19) | 51,632 (16) | 51,632 (17) |

B. Mapping ditags to reference human genome sequences

| Reference | Mapped (%) | Unmapped (%) |
| Total common ditags | 51,632 (100) | |
| hg18 | 50,799 (98.3) | 833 (1.7) |
| Homopolymer | 22 | 816 |
| Chimpanzee genome | 195 | 616 |
| Other human genomes | 386* | 230 |
|   GM15510 genome | 28 | |
|   Celera genome | 165 | |
|   Venter genome | 352 | |
|   Watson genome | 61 | |
| Total unmapped | | 230 (0.4) |

*Ditags that mapped to multiple genomes were counted only once.

To determine the types of genomic aberration represented by the unmapped ditags, we generated full-length sequences of the restriction DNA fragments detected by these ditags using a "ditag-PCR" method. In this method, the ditag sequences were used as PCR sense and antisense primers to amplify the original DNA fragment from which each unmapped ditag derived. We performed 192 reactions in eight CLL samples: two used in ditag collection and six additional CLL samples. A full-length sequence had to be longer than 50 bases and had to be detected at least in a CLL sample used in ditag collection or in at least two additional CLL samples. Under these conditions, 220 full-length sequences were generated from 100 unmapped ditags. Mapping the full-length sequences to hg18 identified different types of genomic aberration caused by insertions, deletions and base changes. Many of these aberrations created a new HindIII restriction site, leading to the release of an unmapped ditag. Others changed the ditag sequence so that it could not be mapped. These aberrations were observed in both coding and non-coding regions of the CLL genome. For example, aberrations were detected in exons of the NEK8, RUNX1 and MUC2 genes and in introns of 20 other genes (Table 2A, Additional file 2: Supplementary table S2). NEK8 encodes a member of the serine/threonine protein kinase family that plays a role in cell cycle progression from G2 to M phase and is over-expressed in breast cancer [12]. A 353-base sequence converted from the unmapped ditag AAGCTTACCCTCTGGACGCCTGTATGAAGCTT maps to the last exon (exon 15) of NEK8, which contains its 3' UTR. The sequence contains two inserted HindIII restriction sites that are not present in the wild-type NEK8 gene. RUNX1 is involved in acute myeloid leukemia (AML) through the t(8;21) translocation [13]. 
A 434-base full-length sequence from the ditag AAGCTTCGGCCTATAG/ACAACCTAACAAGCTT was detected in all eight CLL samples and maps to intron 3 and exon 4 of RUNX1. The mapped region shows a T-to-C single-base change relative to exon 4 of the RUNX1 gene, and a dbSNP search shows that this is a known SNP (rs1235270). Because the RUNX1 protein-coding sequence itself is uncertain, it is not clear whether this germline SNP changes an encoded amino acid. Several bases are also changed in the mapped part of RUNX1 intron 3. These base changes raise the question of whether RUNX1 could be involved in CLL. MUC2 is a member of the mucin family, which encodes high-molecular-weight glycoproteins, and abnormalities of MUC2 are linked with colorectal and pancreatic cancer [14]. A 410-base sequence derived from the unmapped ditag AAGCTTCCGGTCGGCTTCGCAGTAGAAAGCTT covers intron 29, exon 30 and intron 30 of the MUC2 gene. This sequence also contains two inserted HindIII sites (AAGCTT), one at each end, that do not exist in the wild-type MUC2 gene. Only three aberrations were detected in exons, one in each of three known genes. This may reflect the limited genome coverage of the study and the small fraction of the genome that is exonic. With greater genome coverage, aberrations affecting more exons would likely be identified.
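The ditag-PCR primer design described in the methods can be written as a short function. The sense primer is tag 1 with a 5' CAGC extension. The antisense primer is the reverse complement of tag 2 with a 5' CGCC extension. This sketch covers only the sequence arithmetic and assumes 32-bp ditags made of two 16-bp tags. The function name is hypothetical, and primer properties such as melting temperature are not checked.

```python
TAG_LEN = 16
SENSE_TAIL = "CAGC"      # extra 5' bases on the sense primer (from the methods)
ANTISENSE_TAIL = "CGCC"  # extra 5' bases on the antisense primer (from the methods)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def ditag_primers(ditag: str) -> tuple[str, str]:
    """Return (sense, antisense) primers for amplifying the fragment behind a ditag."""
    if len(ditag) != 2 * TAG_LEN:
        raise ValueError("expected a 32-bp ditag made of two 16-bp tags")
    tag1, tag2 = ditag[:TAG_LEN].upper(), ditag[TAG_LEN:].upper()
    sense = SENSE_TAIL + tag1                 # reads into the fragment from its 5' end
    antisense = ANTISENSE_TAIL + revcomp(tag2)  # reads back from the fragment's 3' end
    return sense, antisense
```

For the RUNX1 ditag above, this yields CAGCAAGCTTCGGCCTATAG (sense) and CGCCAAGCTTGTTAGGTTGT (antisense).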
Table 2

Aberrations in exon and intron, and only present in CLL

| Ditag | Full-length (bp)* | CLL sample 1-8** | Chr. | Location | Aberration | Gene (exon) | Gene (intron) |
A. Aberrations affecting exon and intron
| AAGCTTACCCTCTGGACGCCTGTATGAAGCTT | 353 | +------- | 17q11.2 | 24093147-24093483 | Insertion | NEK8 | |
| | 268 | ++------ | 19p13.2 | 10331021-10331266 | Insertion | | TYK2 |
| AAGCTTCGGCCTATAGACAACCTAACAAGCTT | 434 | +++++++- | 21q22.12 | 35949034-35949460 | Base change | RUNX1 | |
| AAGCTTCCGGTCGGCTTCGCAGTAGAAAGCTT | 410 | -+-+---- | 11p15.5 | 1084558-1084956 | Insertion | MUC2 | MUC2 |
| AAGCTTGAGGGTGGAGTTCCTTCTGTAAGCTT | 181 | -++----- | 2p11.2 | 87573183-87573358 | Base change, insertion | | BC070201 |
| AAGCTTGGCCAGAGACAGGCATCATGAAGCTT | 116 | +------- | 10q22.1 | 73213666-73213767 | Base change | | CDH23 |
| AAGCTTGTGGACCACCGCTGTGAGTCAAGCTT | 300 | +++++-++ | 20p12.3 | 6033711-6034006 | Base change | | FERMT1 |
| AAGCTTCATATGAGGATCAAAAACGAAAGCTT | 283 | ++-+++-- | 3p14.2 | 61015245-61015527 | Base change | | FHIT |
| AAGCTTCTTTGTGATGCTCAGACATGAAGCTT | 332 | -+---+-- | 11q23.3 | 120263755-120264067 | Base change | | GRIK4 |
| AAGCTTAGATCAGTGAGCCTACGGCGAAGCTT | 605 | ++++++-+ | 16q22.2 | 69601759-69602345 | Base change | | HYDIN |
| AAGCTTCGCCGTGGGCTCACTGATCTAAGCTT | 614 | +++-++++ | 16q22.2 | 69601760-69602350 | Base change, insertion | | HYDIN |
| AAGCTTGCTGAACGCACCTGCGTGGAAAGCTT | 448 | +-+---++ | 5q22.2 | 112671099-112671541 | Base change, insertion | | MCC |
| AAGCTTGCTTCTTTGCTGATACTGGCAAGCTT | 582 | -++--+-- | 12q24.31 | 123563471-123564048 | Base change, insertion | | NCOR2 |
| AAGCTTGGCGTCAATCCACACCAAAGAAGCTT | 349 | ----++-- | 5q33.1 | 149870225-149870557 | Insertion | | NDST1 |
| | 1306 | ---+-+-- | 8q22.2 | 99587880-99589168 | Base change, insertion | | STK3 |
| AAGCTTGAAATAAGTGCTGCATCCTGAAGCTT | 163 | -++-+--- | 2q32.1 | 182858618-182858771 | Base change, insertion | | PDE1A |
| AAGCTTTCCTAGGGAGCTGGGTGGTGAAGCTT | 638 | ---++--- | 17q25.1 | 68849242-68849864 | Insertion | | SDK2 |
| | 211 | +------- | 9q34.11 | 129692280-129692481 | Base change | | ST6GALNAC6 |
| AAGCTTGCAGAAGGGGAGCCAGGGTGAAGCTT | 312 | --+---+- | 2p25.3 | 3382092-3382390 | Insertion | | TTC15 |
| | 187 | +----++- | 13q12.13 | 26098258-26098437 | Base change, insertion | | WASF3 |
| AAGCTTCAGGAAAGTCCACTAGCAAAAAGCTT | 197 | +++++++- | 9p13.3 | 33972115-33972294 | Insertion | | UBAP2 |
| | 112 | +------- | 9p13.3 | 33972115-33972294 | Insertion | | UBAP2 |
| | 170 | +------- | 9p13.3 | 33972115-33972294 | Insertion | | UBAP2 |
| AAGCTTTAATGACTGAGGGGTTCTCAAAGCTT | 1147 | +++-+-+- | 6q25.3 | 157931989-157933131 | Base change | | ZDHHC14 |
| AAGCTTTGAGAACCCCTCAGTCATTAAAGCTT | 1134 | ++----+- | 6q25.3 | 157931989-157933131 | Base change | | ZDHHC14 |
| AAGCTTGCACAAGGGGCCCCTTGTGCAAGCTT | 691 | +------- | 4p16.3 | 2372796-2373473 | Insertion | | ZFYVE28 |
B. Sequences only present in CLL genome
| AAGCTTGATATCGTGATCACCTTAAGAAGCTT | 332 | -++----- | - | - | - | - | - |
| AAGCTTAGATAGAGCGCAGTCAACTGAAGCTT | 107 | ++++++++ | - | - | - | - | - |
| AAGCTTCCGGTCGGCTTCGCAGTAGAAAGCTT | 159 | ++++-+-- | - | - | - | - | - |
| AAGCTTCTCATCCTTCACCTTGGTCGAAGCTT | 182*** | +------- | - | - | - | - | - |
| | 540 | -+----++ | - | - | - | - | - |
| | 362 | ------++ | - | - | - | - | - |
| AAGCTTGAAAAAGGTTCAGGCAAACTAAGCTT | 84 | +------- | - | - | - | - | - |
| AAGCTTGCTGAACGCACCTGCGTGGAAAGCTT | 923 | --+--+-+ | - | - | - | - | - |
| AAGCTTGGCGTCAATCCACACCAAAGAAGCTT | 347 | -++----+ | - | - | - | - | - |
| AAGCTTTCTTGATAAGGCTCCTACGCAAGCTT | 250 | -+-----+ | - | - | - | - | - |

*A full-length sequence must be detected in at least one of the two CLL samples used in ditag collection, or in at least one other CLL sample.

**Samples 1 and 2 were used for ditag collection.

***The tag 1 portion maps to chr1:56672307-56672331.

Aberrations also affect the introns of multiple genes. FHIT encodes diadenosine 5',5'''-P1,P3-triphosphate hydrolase, which is involved in purine metabolism [15]. FHIT is located at the common fragile site FRA3B on chromosome 3, where carcinogen-induced damage can lead to translocations in several cancers. A 283-base sequence maps to intron 8 of the FHIT gene, but its tag 1 contains a GA-to-TG change. HYDIN encodes an axonemal protein, and mutation of HYDIN is associated with congenital hydrocephalus [16]. Two full-length sequences, of 605 bp and 614 bp, from two different unmapped ditags were obtained from seven CLL samples. Both sequences map to intron 21 of HYDIN. The 605-bp sequence contains CCTACGGCG in its tag 2 in place of the wild-type gCcACaGCa (lowercase indicates the changed bases). The 614-bp sequence contains CGCC in place of the wild-type tGCt in its tag 1, as well as an internal insertion. NCOR2 is a transcriptional regulator that recruits histone deacetylases to promoters [17]. A 582-base sequence maps to intron 1 of NCOR2. However, its tag 1 contains an AAGC insertion, and its tag 2 contains a C-to-T change, an AG deletion and a T insertion. TYK2 is a member of the JAK family involved in IFN-g, IL-6, IL-10 and IL-12 signaling, and mutation of this gene is associated with hyperimmunoglobulin E syndrome [18]. A 268-base sequence maps to intron 14 of TYK2, but its tag 1 contains an AAGCTTA insertion and its tag 2 contains a TGAAGCTT insertion. Both insertions create HindIII restriction sites, which give rise to the unmapped ditag. A 197-base sequence was detected in seven CLL samples. Two different sequences, of 112 and 170 bases, were generated from one of the CLL samples used in ditag collection. 
All three sequences map to UBAP2 at 9p13.3, a gene involved in the ubiquitination pathway [19]. For the 197-base sequence, 178 bases map to intron 6 of the UBAP2 gene and 18 bases have no match, whereas the 112-base and 170-base sequences contain different insertions. Aberrations in many of these genes have been associated with various types of cancer. However, most of these genes have not been linked with CLL.

Non-coding regions make up the majority of the genome. They contain important functional elements involved in DNA replication, genome stability, regulation of gene expression and the production of non-coding transcripts. Extensive characterization of non-coding regions could provide many candidate markers for clinical applications and identify hotspots of genomic aberration involved in cancer development. A total of 37 sequences generated from 30 unmapped ditags mapped to non-coding regions of the genome, with various types of abnormality (Table 3, Additional file 3: Supplemental Table S3). Although these loci are not located in coding regions, many lie near genes. Of the 26 loci specifically mapped by the sequences, 15 have genes within 100 kb upstream, downstream or both. For example, a 614-base sequence maps to 5q35.1 (169443856-169444467), with DOCK2 located 27,836 bases upstream and FOXI1 21,028 bases downstream. A 398-base sequence maps to 15q26.1 (88110782-88111168). There, two homologous transcription factor genes, MESP1 and MESP2, are located 16,678 bases upstream and 9,425 bases downstream, respectively. A 920-base sequence detected in seven CLL samples maps to 20p11.1 (26157494-26158252), and the microRNA gene MIR663 is located 20,580 bases upstream of this region. 
A 290-base sequence maps to 2q21.2 (132742087-132742356). Another microRNA gene, MIR663B, is located 10,964 bases upstream of this region, with the non-coding RNA gene NCRNA00164 located in between. These aberrations could affect nearby genes by influencing the regulation of their expression.
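Finding genes within 100 kb of an intergenic locus, as done for Table 3, can be sketched as an interval-distance scan. The sketch makes three assumptions. "Upstream" and "downstream" are taken as lower and higher chromosomal coordinates, and gene strand is ignored. Distances are measured between the nearest ends. The gene records in the example are hypothetical, not hg18 annotations.

```python
from dataclasses import dataclass

WINDOW = 100_000  # 100 kb, the distance used in the text


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def nearby_genes(chrom: str, start: int, end: int, genes: list[Gene], window: int = WINDOW):
    """Return (name, side, distance) for genes within `window` bp of the locus, nearest first."""
    hits = []
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.end < start:
            side, dist = "upstream", start - g.end      # gene lies at lower coordinates
        elif g.start > end:
            side, dist = "downstream", g.start - end    # gene lies at higher coordinates
        else:
            side, dist = "overlapping", 0
        if dist <= window:
            hits.append((g.name, side, dist))
    return sorted(hits, key=lambda hit: hit[2])


genes = [
    Gene("GENE_A", "chr5", 80_000, 100_000),
    Gene("GENE_B", "chr5", 160_000, 170_000),
    Gene("GENE_C", "chr5", 400_000, 420_000),
    Gene("GENE_D", "chr7", 150_100, 151_000),
]
```

For a locus at chr5:150,000-150,500, this reports GENE_B 9,500 bp downstream and GENE_A 50,000 bp upstream. GENE_C is outside the window, and GENE_D is on another chromosome.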
Table 3

Aberrations in the intergenic region

| Ditag | Full-length (bp) | CLL sample 1-8 | Chr. | Location | Aberration | Nearby genes (distance, bp) |
| AAGCTTACTTTCTCGGTTCCATTACTAAGCTT | 614 | --+--++- | 5q35.1 | 169443856-169444467 | Base change | DOCK2 upstream (27,836); FOXI1 downstream (21,028) |
| AAGCTTAGCCGGGCATCCTCTTTCCTAAGCTT | 427 | ------++ | 17q25.1 | 68688801-68689216 | Base change | SSTR2 upstream (16,046); COG1 downstream (11,552) |
| | 398 | -++----- | 15q26.1 | 88110782-88111168 | Insertion, base change | MESP1 upstream (16,678); MESP2 downstream (9,425) |
| AAGCTTAGTTTGGCTGCATGAGACTGAAGCTT | 737 | --+----+ | 16q23.1 | 74682385-74683114 | Base change | |
| AAGCTTATGATGATCCCCTGAGCTAAAAGCTT | 358 | +------- | 1q23.3 | 162264795-162265148 | Insertion, base change | |
| | 264 | +------- | 5p15.33 | 2449552-2449774 | Insertion | |
| AAGCTTCAACGATAGTTCATCATCATAAGCTT | 265 | -++----- | 16p13.3 | 808762-808956 | Base change | PRR25 upstream (4,900); LMF1 downstream (34,679) |
| AAGCTTCAATAGCCGAAGCCAAACTAAAGCTT | 556 | ++--+++- | 12q15 | 66234005-66234554 | Base change | DYRK2 (94,467) |
| AAGCTTCACTCAGTCATATGGCATGGAAGCTT | 130 | -----++- | 10q26.3 | 133154300-133154419 | Insertion | |
| AAGCTTCACTGCAGCTATAACACTGCAAGCTT | 920 | ++++++-+ | 20p11.1 | 26157494-26158252 | Insertion, base change | MIR663 upstream (20,580) |
| AAGCTTCCTCTGTACTCACATTAACGAAGCTT | 892 | -++--++- | 9q12 | 67914079-67914963 | Base change | |
| AAGCTTGAAATAAGTGCTGCATCCTGAAGCTT | 606 | +------- | 1q41 | 220652161-220652760 | Insertion, base change | |
| | 252 | --+----+ | 20q13.13 | 46025615-46025861 | Base change | |
| AAGCTTGACTCATTGCGTCTTATTCTAAGCTT | 1060 | +------- | 9q22.31 | 94477651-94478404 | Insertion | IPPK upstream (5,283); BICD2 downstream (35,062) |
| AAGCTTGCACAAGGGGCCCCTTGTGCAAGCTT | 606 | +------- | 2q35 | 220216964-220217558 | Insertion | SLC4A3 (16,038) |
| AAGCTTGCAGAAGGGGAGCCAGGGTGAAGCTT | 553 | ---+---+ | 11q23.2 | 115071517-115072057 | Insertion | |
| AAGCTTGCTGAACGCACCTGCGTGGAAAGCTT | 758 | ---++--- | 13q12.3 | 30403963-30404714 | Insertion, base change | C13orf33 upstream (25,635); C13orf26 downstream (120) |
| | 602 | +++-+--+ | 1q42.13 | 227318140-227318732 | Insertion | |
| AAGCTTGGAGCCCTAGCCACAATTGTAAGCTT | 1453 | -+--+++- | 13q21.33 | 70285628-70287080 | Base change | |
| AAGCTTGGCCAGAGACAGGCATCATGAAGCTT | 900 | +-++-+-+ | 6q23.2 | 135162469-135163363 | Base change | |
| AAGCTTTCACTTCATTGGAGTCAGTGAAGCTT | 322 | +++-+++- | 13q14.12 | 44367511-44367832 | Base change | NUFIP1 (43,552) |
| AAGCTTTCCTAGGGAGCTGGGTGGTGAAGCTT | 290 | --+---+- | 2q21.2 | 132742087-132742356 | Insertion | NCRNA00164 upstream (10,075); MIR663B upstream (10,964) |
| | 120 | +------- | 4p15.33 | 14280120-14280230 | Insertion, base change | |
| AAGCTTTCCTTTTCCTTCTGCTCTTAAAGCTT | 1071 | +-+--+-- | 6q27 | 164534304-164535365 | Base change | |
| AAGCTTTGCATTGGCAGAAGCCACCAAAGCTT | 1039 | -+----+- | 9q12 | 69920049-69921093 | Base change | |
| AAGCTTTTAAGGGATCATGCCTCTCCAAGCTT | 1534 | +-+-+++- | 1q21.2 | 148015439-148016961 | Base change | FCGR1A (3,951) |
| AAGCTTACCCTCTGGACGCCTGTATGAAGCTT | 185 | +------- | 9q22.33 | 100893873-100893975 | Insertion | TGFBR1 (13,258) |
One hundred and forty-seven full-length sequences converted from 57 unmapped ditags map to highly repetitive sequences in non-coding regions. Of these, 110 map to centromeric ALR/Alpha satellite sequences, and chromosomes 2, 10 and 17 are among the most frequent (Table 4, Additional file 4: Supplemental Table S4). On chromosome 2, 23 sequences converted from 13 unmapped ditags map to the centromere at 2p11.1. On chromosome 10, 41 sequences converted from 16 ditags map to the centromere at 10q11.1. On chromosome 17, 22 sequences converted from 6 unmapped ditags map to the centromere at 17p11.1. The high frequency of aberrations in the ALR/Alpha satellite sequences of these three chromosomes suggests that these regions could be hotspots of genomic aberration in CLL. Aberrations in repetitive sequences have been shown to contribute to cancer development [20]. However, aberrations in these highly repetitive regions are difficult to analyze with hybridization-based approaches because specific probes are difficult to design. Our results show that a restriction sequencing-based approach provides a useful tool for studying aberrations in these regions.
Table 4

Aberrations in the centromere region

| Ditag | Full-length (bp) | CLL sample 1-8 | Chr. | Location | Sequence type |
| AAGCTTTCATTGGGATAACAGTGTTGAAGCTT | 562 | -++--+++ | 2p11.1 | 132722630-132722850 | ALR/Alpha |
| | 893 | -+----+- | 2p11.1 | 91677156-91682632 | ALR/Alpha |
| | 217 | ++-+--+- | 2p11.1 | 91677835-91680254 | ALR/Alpha |
| AAGCTTTCCAGTTAAGCTTTCTGGGGAAGCTT | 1067 | +++++++- | 2p11.1 | 91257036-91258039 | |
| | 1002 | +-----+- | 2p11.1 | 91257036-91258039 | |
| AAGCTTCTTTATGAGGAACAGTGTTGAAGCTT | 216 | ++++--+- | 2p11.1 | 91670531-91670746 | ALR/Alpha |
| | 896 | ++-+---- | 2p11.1 | 91670531-91686712 | ALR/Alpha |
| | 901 | +-----+- | 2p11.1 | 91670531-91686712 | ALR/Alpha |
| | 560 | +---+--- | 2p11.1 | 91655191-91672448 | ALR/Alpha |
| | 2231 | ----++-- | 2p11.1 | 91670550-91684334 | ALR/Alpha |
| AAGCTTCTGAGAATGCCATCCCAATGAAGCTT | 686 | +-----+- | 2p11.1 | 91677155-91689898 | ALR/Alpha |
| AAGCTTATTTGAGATGAAAGGAGTAGAAGCTT | 1234 | ++------ | 2p11.1 | 91664565-91688410 | ALR/Alpha |
| | 726 | +++++++- | 2p11.1 | 91676309-91680254 | ALR/Alpha |
| AAGCTTCAACACTGTTGTTCCCAATGAAGCTT | 612 | -++--+-- | 2p11.1 | 91676431-91689428 | ALR/Alpha |
| AAGCTTCAATGGGATGAAGAGTGTTGAAGCTT | 556 | +++-+++- | 2p11.1 | 91684461-91685014 | ALR/Alpha |
| | 894 | --+---+- | 2p11.1 | 91677156-91680254 | ALR/Alpha |
| AAGCTTCAATTGGGATAACAGTGTTGAAGCTT | 555 | ---+-+++ | 2p11.1 | 91677836-91680592 | ALR/Alpha |
| AAGCTTCATTAGGGATAACAGTGTTGAAGCTT | 555 | ++---+-- | 2p11.1 | 91677156-91677709 | ALR/Alpha |
| AAGCTTCATTGGGAACAACAGTGTTGAAGCTT | 269 | -+----+- | 2p11.1 | 91677155-91677709 | ALR/Alpha |
| AAGCTTCATTGGGATGGCATTCTCAGAAGCTT | 685 | +------- | 2p11.1 | 91674610-91684466 | ALR/Alpha |
| AAGCTTCTATTGGGATAACAGTGTTGAAGCTT | 556 | +++--+++ | 2p11.1 | 91672232-91680592 | ALR/Alpha |
| | 893 | -+----++ | 2p11.1 | 91653836-91682632 | ALR/Alpha |
| AAGCTTGACTCATTGCGTCTTATTCTAAGCTT | 1179 | -++++++- | 2p11.1 | 91031886-91033063 | ERVL-B4-int |
| AAGCTTAAAACTCCTTTATGAAAAGAAAGCTT | 637 | +------- | 10q11.1 | 41848823-41861037 | ALR/Alpha |
| AAGCTTAAACTCCGTGCATCAAAAGAAAGCTT | 689 | ++----+- | 10q11.1 | 41718813-41720661 | ALR/Alpha |
| | 1407 | +------- | 10q11.1 | 41718474-41727608 | ALR/Alpha |
| | 601 | +------- | 10q11.1 | 41718888-41720661 | ALR/Alpha |
| AAGCTTAAACTTCTTGTATGAAAAGAAAGCTT | 2067 | -++---+- | 10q11.1 | 41847790-41864775 | ALR/Alpha |
| | 1023 | ++------ | 10q11.1 | 41718797-41729299 | ALR/Alpha |
| | 970 | +------- | 10q11.1 | 41850170-41864775 | ALR/Alpha |
| | 346 | +-----+- | 10q11.1 | 41718460-41719477 | ALR/Alpha |
| AAGCTTCAACGCTGCGCTATTGAAGGAAGCTT | 345 | -++----- | 10q11.1 | 41726415-41729786 | ALR/Alpha |
| | 860 | ++++++++ | 12p11.1 | 34724897-34729300 | ALR/Alpha |
| AAGCTTCAACTCTGTCCGCCTAAAGGAAGCTT | 175 | ----+-+- | 10q11.1 | 41719301-41720316 | ALR/Alpha |
| AAGCTTCAACTCTGTGCATTGGCCTCAAGCTT | 279 | ++++++++ | 10q11.1 | 41849321-41850275 | ALR/Alpha |
| | 622 | ++++-++- | 10q11.1 | 41849321-41858079 | ALR/Alpha |
| | 619 | -+---++- | 10q11.1 | 41849321-41861137 | ALR/Alpha |
| AAGCTTCAACTCTGTGCCGCTAAAGGAAGCTT | 344 | -++-++++ | 10q11.1 | 41718623-41720492 | ALR/Alpha |
| | 2280 | +----+-- | 10q11.1 | 41717944-41720994 | ALR/Alpha |
| | 1359 | +------- | 10q11.1 | 41717944-41729975 | ALR/Alpha |
| | 1190 | +------- | 10q11.1 | 41718623-41720492 | ALR/Alpha |
| | 681 | +------- | 10q11.1 | 41720147-41729975 | ALR/Alpha |
| AAGCTTCCTTCAGAAACAAGGAGTTTAAGCTT | 858 | +-+-+++- | 10q11.1 | 41718623-41720661 | ALR/Alpha |
| AAGCTTCCTTTTAGGCCACAGAGTTGAAGCTT | 348 | -++----- | 10q11.1 | 41719301-41720492 | ALR/Alpha |
| | 1029 | -++----- | 10q11.1 | 41847960-41861361 | ALR/Alpha |
| | 684 | +-----+- | 10q11.1 | 41718622-41721845 | ALR/Alpha |
| AAGCTTCCTTTTCATACAAGGAGTTTAAGCTT | 1498 | -+---+-- | 10q11.1 | 41718461-41720661 | ALR/Alpha |
| AAGCTTCTTTTTCATGCAAGGAGTTTAAGCTT | 385 | +---+++- | 10q11.1 | 41718767-41719477 | ALR/Alpha |
| | 724 | +------- | 10q11.1 | 41718767-41720661 | ALR/Alpha |
| | 722 | +------- | 10q11.1 | 41847452-41858698 | ALR/Alpha |
| AAGCTTTCCTTTAGGCCACAGAGTTGAAGCTT | 1904 | -++----- | 10q11.1 | 41847960-41863582 | ALR/Alpha |
| | 345 | --++--+- | 10q11.1 | 41720491-41722352 | ALR/Alpha |
| AAGCTTTCTTTTTCATCAAGGAGTTTAAGCTT | 386 | +------- | 10q11.1 | 41720293-41720661 | ALR/Alpha |
| AAGCTTTGAAATCTCCCACCTAAAGGAAGCTT | 408 | +-+++++- | 10q11.1 | 41718623-41722413 | ALR/Alpha |
| | 750 | --++---- | 10q11.1 | 41847563-41863586 | ALR/Alpha |
| | 1262 | +------- | 10q11.1 | 41718623-41720552 | ALR/Alpha |
| AAGCTTTTCTTTTCATCAAGGAGTTTAAGCTT | 1379 | --+---+- | 10q11.1 | 41718797-41720661 | ALR/Alpha |
| | 691 | --+--+-- | 10q11.1 | 41718460-41720661 | ALR/Alpha |
| | 2041 | +------- | 10q11.1 | 41847790-41866145 | ALR/Alpha |
| | 832 | +------- | 10q11.1 | 41718460-41720661 | ALR/Alpha |
| | 347 | ---+-++- | 10q11.1 | 41718460-41720661 | ALR/Alpha |
| | 1024 | ++---++- | 10q11.1 | 41718117-41720661 | ALR/Alpha |
| AAGCTTTTGAGGCCAACACAGAGTTGAAGCTT | 620 | -+---+++ | 10q11.1 | 41849321-41855026 | ALR/Alpha |
| | 281 | ++------ | 10q11.1 | 41849321-41850276 | ALR/Alpha |
| AAGCTTCCTGTGATGATTCGAGAGAGAAGCTT | 1419 | ++--++-- | 17p11.1 | 22175465-22179262 | ALR/Alpha |
| | 576 | -----+++ | 17p11.1 | 22182601-22184019 | ALR/Alpha |
| | 1966 | +-+----- | 17p11.1 | 22175465-22186396 | ALR/Alpha |
| | 2314 | --+--+-- | 17p11.1 | 22173089-22186396 | ALR/Alpha |
| | 234 | ++++-+-+ | 17p11.1 | 22170709-22172128 | ALR/Alpha |
| | 578 | +++--+-+ | 17p11.1 | 22176307-22179262 | ALR/Alpha |
| | 1090 | -+----++ | 17p11.1 | 22170709-22179262 | ALR/Alpha |
| AAGCTTTCTCTCTCGACATCACAGAGAAGCTT | 641 | +------- | 17p11.1 | 22178721-22179262 | ALR/Alpha |
| | 1389 | +------- | 17p11.1 | 22175464-22184019 | ALR/Alpha |
| | 1824 | +------- | 17p11.1 | 22175464-22181640 | ALR/Alpha |
| | 579 | -+--+++- | 17p11.1 | 22170709-22179262 | ALR/Alpha |
| | 1420 | -++-++-- | 17p11.1 | 22180222-22184019 | ALR/Alpha |
| AAGCTTCTCTCTCGAACATCGCAGAGAAGCTT | 1091 | +----+-+ | 17p11.1 | 22173083-22184019 | ALR/Alpha |
| | 749 | --+----+ | 17p11.1 | 22183291-22184019 | ALR/Alpha |
| AAGCTTCTCTGAGATGTTCGAGAGAGAAGCTT | 579 | ++++--++ | 17p11.1 | 22181236-22184019 | ALR/Alpha |
| | 918 | +++----- | 17p11.1 | 22173087-22184019 | ALR/Alpha |
| | 407 | -+-+-+-- | 17p11.1 | 22174101-22184019 | ALR/Alpha |
| | 406 | --+----+ | 17p11.1 | 22175464-22176883 | ALR/Alpha |
| | 1087 | +------- | 17p11.1 | 22175459-22181640 | ALR/Alpha |
| AAGCTTCTGAGAATGCTTTTCTGAAAAAGCTT | 355 | ++--+++- | 17p11.1 | 22184624-22184977 | ALR/Alpha |
| | 1037 | +------- | 17p11.1 | 22184624-22186848 | ALR/Alpha |
| AAGCTTTGAGACCTGTCTCAGAGTTGAAGCTT | 799 | +++-+++- | 17p11.1 | 21687309-21687527 | ALR/Alpha |
Aberrations in the centromere region

Ten full-length sequences generated from eight unmapped ditags did not map to known human genome sequences (Table 2B; Additional file 5: Supplementary table S5). For example, a 107-base full-length sequence derived from the unmapped ditag AAGCTTAGATAGAGCGCAGTCAACTGAAGCTT was detected in all eight CLL samples, yet it does not map to the reference genome sequences. These sequences therefore represent DNA content present in CLL genomes but not in normal genomes.

Through high-resolution scanning of three CLL genomes, and verification of the results with full-length sequences and additional CLL genomes, our study provides evidence for the wide presence of genomic aberrations in CLL, most of which are small lesions. Studies with larger numbers of CLL samples and higher genome coverage will be required to better understand the genetic aberrations in CLL. Although the study used multiple genomics databases to exclude changes arising from normal genomic polymorphism, further studies using normal DNA from the same patients will be required to fully distinguish somatic mutations from germline variations in CLL.
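The idea of an "unmapped" ditag can be illustrated with a small Python sketch. It assumes, based only on the 32-base ditags listed in the table (AAGCTT + 10 bases + 10 bases + AAGCTT), that a ditag joins the first 16 and the last 16 bases of a HindIII fragment. This layout, the helper names (`reference_ditags`, `unmapped`) and the toy reference sequence are illustrative assumptions, not the published Ditag Genome Scanning pipeline, which compared tags against multiple normal human genome reference sequences.

```python
"""Sketch: which observed ditags are absent from an in silico HindIII digest?

Assumption (not taken from the paper): a ditag is the first 16 and last 16
bases of a HindIII fragment, i.e. AAGCTT + 10 flanking bases at each end.
"""

SITE = "AAGCTT"  # HindIII recognition sequence (its own reverse complement)
TAG_LEN = 16     # assumed: 6-base site + 10 flanking bases per fragment end

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(ditag: str) -> str:
    """Pick one orientation so a ditag read from either strand compares equal."""
    return min(ditag, reverse_complement(ditag))


def site_positions(genome: str) -> list[int]:
    """Start offsets of every HindIII site in the sequence."""
    positions, start = [], genome.find(SITE)
    while start != -1:
        positions.append(start)
        start = genome.find(SITE, start + 1)
    return positions


def reference_ditags(genome: str) -> set[str]:
    """Hypothetical in silico ditags, one per fragment between adjacent sites."""
    genome = genome.upper()
    sites = site_positions(genome)
    ditags = set()
    for left, right in zip(sites, sites[1:]):
        fragment = genome[left:right + len(SITE)]  # site ... site, inclusive
        if len(fragment) < 2 * TAG_LEN:
            continue  # assumed: too short to yield two non-overlapping tags
        ditags.add(canonical(fragment[:TAG_LEN] + fragment[-TAG_LEN:]))
    return ditags


def unmapped(observed: list[str], genome: str) -> list[str]:
    """Observed ditags with no exact match among the reference ditags."""
    ref = reference_ditags(genome)
    return [d for d in observed if canonical(d.upper()) not in ref]


if __name__ == "__main__":
    # Toy reference with a single HindIII fragment (hypothetical sequence).
    toy = "GG" + "AAGCTT" + "ACGTACGTAC" + "NNNNN" + "TTTTTCCCCC" + "AAGCTT" + "GG"
    observed = [
        "AAGCTTACGTACGTACTTTTTCCCCCAAGCTT",  # matches the toy fragment
        "AAGCTTGGGGGAAAAAGTACGTACGTAAGCTT",  # same fragment, other strand
        "AAGCTTAGATAGAGCGCAGTCAACTGAAGCTT",  # ditag quoted in the text
    ]
    print(unmapped(observed, toy))  # ['AAGCTTAGATAGAGCGCAGTCAACTGAAGCTT']
```

Canonicalizing each ditag to the lexicographically smaller of its two orientations is one simple way to make comparisons strand-independent; because AAGCTT is palindromic, both orientations still begin and end with the site. Real mapping would also need to tolerate sequencing errors and repeats such as the ALR/Alpha satellites, which this exact-match sketch ignores.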

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YK, YCJ, JC, AHA, and PK performed the laboratory work. YZ, SM, and SR provided clinical samples and data analysis; SR and SW designed the experiment. SW wrote the paper. All authors read and approved the final manuscript.

Additional file 1

Supplementary table S1. CLL samples used for the study.

Additional file 2

Supplementary table S2. Aberrations in exons and introns.

Additional file 3

Supplementary table S3. Aberrations in the intergenic region.

Additional file 4

Supplementary table S4. Aberrations in the repetitive region.

Additional file 5

Supplementary table S5. Aberrations only present in CLL genomes.