Adam J de Smith, Robin G Walters, Lachlan J M Coin, Israel Steinfeld, Zohar Yakhini, Rob Sladek, Philippe Froguel, Alexandra I F Blakemore.
Abstract
Copy number variants (CNVs) contribute significantly to human genomic variation, with over 5000 loci reported, covering more than 18% of the euchromatic human genome. Little is known, however, about the origin and stability of variants of different size and complexity. We investigated the breakpoints of 20 small, common deletions, representing a subset of those originally identified by array CGH, using Agilent microarrays, in 50 healthy French Caucasian subjects. By sequencing PCR products amplified using primers designed to span the deleted regions, we determined the exact size and genomic position of the deletions in all affected samples. For each deletion studied, all individuals carrying the deletion share identical upstream and downstream breakpoints at the sequence level, suggesting that the deletion event occurred just once and later became common in the population. This is supported by linkage disequilibrium (LD) analysis, which has revealed that most of the deletions studied are in moderate to strong LD with surrounding SNPs, and have conserved long-range haplotypes. Analysis of the sequences flanking the deletion breakpoints revealed an enrichment of microhomology at the breakpoint junctions. More significantly, we found an enrichment of Alu repeat elements, the overwhelming majority of which intersected deletion breakpoints at their poly-A tails. We found no enrichment of LINE elements or segmental duplications, in contrast to other reports. Sequence analysis revealed enrichment of a conserved motif in the sequences surrounding the deletion breakpoints, although whether this motif has any mechanistic role in the formation of some deletions has yet to be determined. Considered together with existing information on more complex inherited variant regions, and reports of de novo variants associated with autism, these data support the presence of different subgroups of CNV in the genome which may have originated through different mechanisms.
Year: 2008 PMID: 18769679 PMCID: PMC2518860 DOI: 10.1371/journal.pone.0003104
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the 20 sequenced deletion variants.
| Genomic Position (deleted sequence) | Deletion size (bp) | Number of samples with deletion (≥1 copy) | Number of chromosomes sequenced | Deletion within gene | Intronic/exonic | Repeat sequences at upstream breakpoint | Repeat sequences at downstream breakpoint | Maximum r² values within 100 kb window | P-values |
| Chr 1: 145,312,298–145,314,875 | 2578 | 3 | 2 | | | Low-Complexity Repeat (AT-rich) | | 0.07 | 0.155 |
| Chr 2: 229,467,533–229,468,151 | 619 | 10 | 10 | | | | | 0.67 | 1.0E-4 |
| Chr 3: 181,137,036–181,137,500 | 465 | 9 | 10 | | Intron 2–3 | | | 0.43 | 1.0E-4 |
| Chr 4: 98,573,315–98,578,237 | 4923 | 11 | 11 | | | LINE1 element | ERV1 LTR (endogenous retrovirus 1, long terminal repeat) | 0.10 | 0.184 |
| Chr 5: 65,479,440–65,479,975 | 536 | 2 | 2 | | | | | 0.99 | 1.0E-4 |
| Chr 5: 78,145,556–78,147,626 | 2071 | 8 | 6 | | Intron 6–7 | | | 0.24 | 0.032 |
| Chr 6: 24,433,346–24,435,791 | 2446 | 9 | 10 | DCDC2 | Intron 2–3 | | | 0.75 | 1.0E-4 |
| Chr 6: 34,425,089–34,427,582 | 2494 | 4 | 3 | | Intron 1–2 | LINE2 element | | 0.47 | 1.0E-4 |
| Chr 6: 162,645,085–162,645,903 | 819+4 bp insertion | 5 | 6 | | Intron 1–2 | LINE1 element | | 0.41 | 6.0E-3 |
| Chr 7: 82,856,584–82,857,509 | 926 | 12 | 14 | | Intron 14–15 | | | 0.91 | 1.0E-4 |
| Chr 12: 20,859,912–20,859,936 | 25+3 bp insertion | 26 | 32 | | ENSE00000822206 | | | 0.52 | 1.0E-4 |
| Chr 14: 72,402,707–72,403,561 | 855 | 5 | 6 | | Intron 1–2 | | | 0.27 | 0.01 |
| Chr 14: 72,615,524–72,616,685 | 1162 | 10 | 10 | | Intron 4–5 | | | 0.88 | 1.0E-4 |
| Chr 15: 83,858,016–83,860,206 | 2191 | 14 | 19 | | Intron 3–4 | LINE1 element | | 0.90 | 1.0E-4 |
| Chr 16: 22,955,277–22,957,032 | 1756+6 bp insertion | 44 | 36 | | | | | 0.82 | 1.0E-4 |
| Chr 16: 56,282,301–56,285,908 | 3608 | 18 | 3 | | | | | 0.92 | 1.0E-4 |
| Chr 16: 76,115,174–76,115,188 | 15 | 5 | 4 | | | Whole deletion within LINE1 element | Whole deletion within LINE1 element | 0.18 | 0.259 |
| Chr 16: 88,089,521–88,095,227 | 5707 | 2 | 2 | | | | | 0.05 | 0.5 |
| Chr 19: 35,979,321–35,981,593 | 2273 | 6 | 12 | | | | | 0.32 (0.88) | 1.0E-4 |
| Chr 22: 32,085,572–32,090,063 | 4492 | 8 | 6 | | Intron 10–11 | | | 0.80 | 1.0E-4 |
Column 1: exact genomic position (UCSC March 2006 genome build) of each deletion, determined from sequencing results; Column 2: size of each deletion (bp) and 3 breakpoint insertions; Column 3: number of samples carrying each deletion (≥1 copy), identified by aCGH or imputed using CNVhap; Column 4: number of chromosomes sequenced (samples with homozygous deletions are scored twice; not all deletions missed by aCGH but imputed by CNVhap were sequenced); Column 5: names of genes (if any) overlapped by deletions; Column 6: introns/exons overlapped by deletions; Columns 7 & 8: whether the upstream and/or downstream deletion breakpoints are located within or adjacent to any repeat sequence elements. * indicates Alu elements with poly-A tails flanking the breakpoint; Column 9: maximum r² scores for LD with surrounding SNPs within 50 kb window for each deletion; Column 10: significance values for r² scores. For the deletion on chromosome 19, the analysis was first performed assuming that the reference sample had a copy number of 2 (i.e. homozygous undeleted), and then re-calculated on the basis that the reference was a heterozygous deletion at this locus. The number in brackets refers to the calculation assuming a heterozygous deleted reference.
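To make the r² statistic in column 9 concrete, here is a minimal sketch of the standard pairwise LD calculation between a deletion and a SNP. It assumes phased chromosomes coded 0/1 at both sites; the function name and the toy inputs are hypothetical, and this is not the pipeline used in the study.

```python
def r_squared(del_alleles, snp_alleles):
    """r^2 between two biallelic sites, given phased 0/1 alleles per chromosome."""
    if len(del_alleles) != len(snp_alleles) or not del_alleles:
        raise ValueError("need two equal-length, non-empty allele lists")
    n = len(del_alleles)
    p_a = sum(del_alleles) / n   # frequency of the deletion allele
    p_b = sum(snp_alleles) / n   # frequency of the coded SNP allele
    # Frequency of chromosomes carrying both coded alleles
    p_ab = sum(1 for a, b in zip(del_alleles, snp_alleles) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    d = p_ab - p_a * p_b         # linkage disequilibrium coefficient D
    return d * d / denom


# Toy data (hypothetical): the deletion always travels with SNP allele 1.
print(r_squared([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
```

An r² of 1 means the SNP tags the deletion perfectly on these chromosomes; values near 0, as for several rows of the table, mean no nearby SNP predicts deletion status.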
Figure 1. Example of deletion within DCDC2 gene having identical sequence breakpoints in 9 samples (representing 10 chromosomes).
A: CGH Analytics view of deletion detected by 9 consecutive probes within intronic region of gene DCDC2 at chr6: 24,433,346–24,435,791 (UCSC March 2006). The superimposed log2 ratios for the 50 samples are plotted as a function of chromosomal position and with different colours for putatively different copy number states (undeleted samples = green, putative heterozygous deletion samples = blue, and the putative homozygous deletion sample = red). Log2 ratios for 8 samples are around −1 (putative heterozygous deletion compared to the reference sample) and log2 ratio for one sample is around −4 (putative homozygous deletion). Blue arrows indicate approximate position of PCR primers. B: Multiple sequence alignment (using ClustalW) of 9 deleted samples (rows 1–9) with reference genome sequence (row 10). Asterisks indicate where all 10 sequences are perfectly aligned around the deleted region; upstream and downstream deletion breakpoints are indicated by the red arrows. The deleted region begins 345 bp into the reference sequence, and ends at 2790 bp, thus the deletion size is 2446 bp in all 9 samples. Blue boxes indicate 5 bp sequences of microhomology between the upstream and downstream breakpoints.
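The size arithmetic in panel B can be checked with a short sketch. It assumes that both the start and end coordinates are inclusive, which is the reading consistent with the numbers quoted in the caption; the helper name is hypothetical.

```python
def deletion_size(start, end):
    """Number of deleted bases when both coordinates are inclusive."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Alignment coordinates from panel B
print(deletion_size(345, 2790))               # 2446
# Genomic coordinates from panel A (UCSC March 2006)
print(deletion_size(24_433_346, 24_435_791))  # 2446
```

Both coordinate systems give the same 2446 bp, matching the size reported for this deletion.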
Figure 2. Frequency distribution of the percentage of SINE intersection as computed for 1000 sets of 40 random sequences, compared with the percentage determined for the 40 breakpoint flanking sequences (red arrow).
Figure 3. Frequency distribution of the percentage of Alu element intersection as computed for 1000 sets of 40 random sequences, compared with the percentage determined for the 40 breakpoint flanking sequences (red arrow).
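The comparison drawn in Figures 2 and 3 (an observed percentage set against percentages from random sequence sets) can be sketched as an empirical one-sided test. This is only an illustration: the function names are hypothetical, the add-one correction is an assumption rather than something stated in the source, and the toy numbers are invented, not taken from the study.

```python
def percent_intersecting(flags):
    """Percentage of sequences flagged True (e.g. intersecting an Alu element)."""
    if not flags:
        raise ValueError("flags must not be empty")
    return 100.0 * sum(1 for f in flags if f) / len(flags)


def empirical_p(observed, null_values):
    """One-sided empirical p-value: share of random sets at least as extreme.

    Uses the add-one correction (an assumption), so the result is never exactly 0.
    """
    if not null_values:
        raise ValueError("null_values must not be empty")
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)


# Invented toy values: four random-set percentages and one observed percentage.
toy_null = [5.0, 12.5, 10.0, 7.5]
print(empirical_p(30.0, toy_null))  # 0.2
```

With 1000 random sets, as in the figures, the smallest p-value this estimator can return is 1/1001.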
Figure 4. A Shannon logo description of the motif we have found to be enriched around deletion breakpoints.
The representation was generated using the WebLogo service (http://weblogo.berkeley.edu/). The x-axis represents the position in the motif, the y-axis represents the certainty we have in the content of that position, and the mixture of letters represents the position-specific probabilities.
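As a rough illustration of what the y-axis of a sequence logo measures, the sketch below computes the information content of a single motif position as log2(4) minus the Shannon entropy of the observed bases. It assumes a four-letter DNA alphabet and omits any small-sample correction that a logo tool may apply; the function name and example columns are hypothetical.

```python
import math
from collections import Counter


def information_content(column):
    """Bits of information at one DNA motif position: log2(4) minus Shannon entropy."""
    if not column:
        raise ValueError("column must contain at least one base")
    counts = Counter(column)
    total = sum(counts.values())
    # Shannon entropy of the observed base frequencies, in bits
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2.0 - entropy


# A fully conserved position carries 2 bits; a uniform one carries none.
print(information_content("AAAA"))
print(information_content("ACGT"))
```

The relative heights of the letters within a stack then follow the per-position base frequencies, which is the "mixture of letters" the caption refers to.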
Figure 5. Zoomed-in LD plot of deletion on chromosome 7: 82,856,584–82,857,509, showing the 27 kb haplotype block containing the deletion.
The Haploview default colouring scheme is used: positions are white if LOD <2 and D' <1; blue if LOD <2 and D' = 1; shades of red as a function of D' if LOD ≥2; bright red if D' = 1 and LOD ≥2. Numbers within boxes refer to the r² values between two given positions, so are not directly connected to the colouring scheme. Solid black lines delineate LD between the deletion and other markers in this region. This deletion is in a block of strong LD, and has high LD to all positions in this block (see Figure S5-j for full LD plot).
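The colouring rules quoted in the caption can be written out as a small decision function. This is a sketch of the rules as stated here, not Haploview's own code; the function name and the returned labels are hypothetical.

```python
def ld_colour(lod, d_prime):
    """Map a (LOD, D') pair to the colour class described in the caption."""
    if not 0.0 <= d_prime <= 1.0:
        raise ValueError("D' must lie between 0 and 1")
    high_lod = lod >= 2
    complete = d_prime >= 1.0  # D' = 1 after the range check above
    if high_lod:
        return "bright red" if complete else "shade of red"
    return "blue" if complete else "white"


for lod, d_prime in [(1.5, 0.8), (1.0, 1.0), (3.0, 0.7), (3.0, 1.0)]:
    print(lod, d_prime, ld_colour(lod, d_prime))
```

Because the colour depends only on LOD and D', two cells of the same colour can carry quite different r² numbers, which is why the caption notes that the printed values are not directly tied to the colouring.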