Peter E Warburton, Dan Hasson, Flavia Guillem, Chloe Lescale, Xiaoping Jin, Gyorgy Abrusan.
Abstract
BACKGROUND: Tandemly repeated DNA represents a large portion of the human genome and accounts for a significant amount of copy number variation. Here we present a genome-wide analysis of the largest tandem repeats found in the human genome sequence.
Year: 2008 PMID: 18992157 PMCID: PMC2588610 DOI: 10.1186/1471-2164-9-533
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Analysis of tandem repeats from the human genome. Output from Tandem Repeats Finder (TRF), plotted with the repeat unit size on the X axis (log scale) and the array length on the Y axis (log scale). 24,358 arrays between 600 bp and 10,000 bp in length were found (grey squares). The 503 arrays ≥ 10 kb found by TRF are shown classified into different types of repeats (see legend at top). Prominent "simple sequence" satellites are shown as color-coded triangles. Classical satellites are shown as color-coded circles. Single-locus VNTR repeats are indicated by color-coded diamonds. The 373 arrays found at repeat unit sizes that are multiples of 171 bp represent alpha satellite DNA (purple circles). Arrays with repeat units greater than 2 kb that were not found by TRF are also shown, and are listed in Table 2. Some arrays containing repeat units greater than ~1.5 kb are also listed in Table 2 because they contain more complex repeat units than those listed in Table 1. Both the 1.5 kb NBPF repeats (square) and the 1.9 kb "mer5A1" repeats (square) were found by TRF but are listed in Table 2. Multiple LTR arrays (Table 3) are shown as red circles at a repeat unit size of 3.5 kb.
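The caption identifies alpha satellite arrays by repeat unit sizes that fall at multiples of 171 bp. A minimal sketch of such a check is given here; the function name and the 2 bp tolerance are assumptions made for illustration and are not taken from the article or from TRF.

```python
def alpha_satellite_multiple(unit_bp, monomer_bp=171, tolerance_bp=2):
    """Return n if unit_bp lies within tolerance_bp of n * monomer_bp, else None.

    monomer_bp=171 comes from the figure caption; tolerance_bp is a
    hypothetical allowance for small differences in reported unit size.
    """
    n = round(unit_bp / monomer_bp)
    if n >= 1 and abs(unit_bp - n * monomer_bp) <= tolerance_bp:
        return n
    return None


if __name__ == "__main__":
    # Unit sizes 5, 217 and 1823 bp appear in Table 1; 171 and 342 are multiples of 171.
    for unit in (5, 171, 217, 342, 1823):
        print(unit, alpha_satellite_multiple(unit))
```

A fixed tolerance is the simplest choice; a tolerance that grows with the number of monomers could be substituted if longer higher-order units drift further from exact multiples.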
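Table 1. Arrays of satellite DNA >10 kb in size
| Repeat | Location | Repeat unit (bp) | Coordinates | Array length (bp) | Notes |
| --- | --- | --- | --- | --- | --- |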
| GAATG/GAGTG | 1q12 | 5 | chr1:141476959–141484530 | 7,571 | |
| GAATG | 4p11 | 5 | chr4:48788006–48853362 | 65,356 | HOR |
| GAATG | 4p11 | 5 | chr4:49328072–49354872 | 26,800 | |
| GAATG | 10p12.33 | 5 | chr10:18881540–18902473 | 20,933 | inversion |
| GAATG | 10p11.1 | 5 | chr10:38812347–38858839 | 46,492 | |
| GAATG | 10p11.1 | 5 | chr10:39116615–39194939 | 78,302 | inversion |
| GAATG | 10q11.1 | 5 | chr10:41674944–41703229 | 28,286 | |
| GAATG | 10q11.21 | 5 | chr10:42110576–42137136 | 26,560 | |
| GAATG | 20q11.21 | 5 | chr20:29267576–29296923 | 29,347 | |
| GAATG | 21p11.2 | 5 | chr21:9795590–9882589 | 86,855 | |
| GAATG | Yq11.1 | 5 | chrY:12106028–12205425 | 99,398 | HOR, 3360 bp, 5630 bp |
| GAATG | Yq11.1 | 5 | chrY:12308738–12380225 | 71,488 | |
| GAATG | Yq12 | 5 | chrY:57228756–57327036 | 98,280 | HOR, 3600 bp |
| CCTTG | Xp21.2 | 5 | chrX:30716537–30734673 | 18,136 | |
| CAGC | 22q11.1 | 9, 17, 26 | chr22:15419137–15429349 | 10,212 | |
| CAGC | 2p11.2 | 4, 9, 26 | chr2:87486329–87512059 | 25,730 | |
| hsatII | 2p11.2 | 26, 49 | chr2:90958833–90979427 | 20,594 | HOR |
| hsatII | 7q11.21 | 26, 49 | chr7:61377290–61396834 | 19,544 | |
| hsatII | 7q11.21 | 26, 49 | chr7:61417257–61440549 | 23,292 | |
| hsatII | random chr9 | 26 | chr9_random:321296–332315 | 11,019 | |
| hsatII | random chr9 | 26 | chr9_random:421232–456177 | 34,945 | inversion |
| hsatII | 10p11.1 | 26, 98 | chr10:38915450–38928835 | 13,385 | |
| hsatII | 10p11.1 | 26, 49 | chr10:41703232–41717296 | 14,064 | HOR |
| hsatII | 16p11.2 | 26, 49 | chr16:33783664–33806096 | 22,432 | inversion |
| hsatII | 16p11.2 | 23, 26 | chr16:34038041–34057759 | 19,718 | |
| hsatII | 16q11.2 | 23, 26 | chr16:44943305–45014085 | 70,780 | HOR |
| hsatII | 22q11.1 | 26 | chr22:15227855–15242475 | 14,620 | |
| Gsat | 8q11.1 | 217 | chr8:47404069–47479909 | 75,840 | |
| Gsat | 8q11.1 | 217 | chr8:47356921–47374868 | 17,947 | |
| Gsat | 8q11.1 | 217 | chr8:47119593–47154540 | 34,947 | |
| GsatX | 8p11.1 | 217 | chr8:43535538–43546546 | 11,008 | |
| GsatII | 12p11.1 | ~200 | chr12:34330336–34451537 | 121,201 | Inversions |
| GsatII | 12p11.1 | 188 | chr12:34640002–34652160 | 12,158 | |
| CER | random chr17 | 48 | chr17_random:135989–174585 | 38,596 | |
| CER | 14q11.1 | 48 | chr14:18267094–18390499 | 123,405 | inversion |
| CER | 18p11.21 | 48 | chr18:15225272–15236086 | 10,814 | |
| CER | 22q11.1 | 48 | chr22:14886892–15006569 | 119,677 | inversion |
| BSR/beta | Yq12 | 68 | chrY:57392659–57407170 | 14,511 | HOR |
| hsat4 | 1q42.13 | 35 | chr1:226783516–226809283 | 25,767 | |
| hsat4 | 16p11.1 | 35 | chr16:34775197–34825861 | 50,664 | |
| Hsat 4/alpha sat | 19q12 | 35 | chr19:32423625–32954368 | 530,743 | |
| Hsat 4/alpha sat | Xp11.1 | 35 | chrX:58228883–58296456 | 67,573 | |
| VNTR | 2p21 | 20 | chr2:43935739–43946526 | 10,787 | |
| VNTR | 2p25.3 | 28 | chr2:1505487–1520958 | 15,471 | |
| VNTR | 13q34 | 34 | chr13:111978814–112021592 | 47,778 | |
| VNTR | 15q11.2 | 36 | chr15:20373884–20384373 | 10,489 | HOR |
| VNTR | 3q29 | 38 | chr3:196680314–196694112 | 13,798 | HOR |
| VNTR | 19p12 | 38 | chr19:20841181–20890996 | 49,815 | |
| VNTR | 1p36.32 | 40 | chr1:2571135–2624075 | 52,940 | HOR |
| VNTR | 11q23.2 | 42 | chr11:113979947–114001592 | 21,645 | |
| VNTR | 4q35.2 | 59 | chr4:188117458–188131626 | 14,168 | |
| VNTR | Xp22.33 Yp11.32 | 61 | chrX:94937–106217 | 11,280 | HOR |
| VNTR | Xp22.33 Yp11.32 | 61 | chrX:16203–34821 | 18,618 | HOR, 1600 bp, 2400 bp |
| VNTR | 2q37.1 | 89 | chr2:232395638–232422714 | 27,076 | |
| VNTR | 14q32.33 | 102 | chr14:104767008–104778328 | 11,320 | |
| VNTR | Yq11.222 | 125 | chrY:20675953–20922323 | 246,371 | |
| VNTR | 7q22.1 | 177 | chr7:100461970–100472619 | 10,649 | HOR |
| VNTR | 14q32.33 | 495 | chr14:104478982–104490824 | 11,842 | |
| VNTR | 1q21.3 | 972 | chr1:150542280–150553107 | 10,827 | |
| VNTR | 7q11.21 | 1823 | chr7:62846888–62859169 | 12,281 | |
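The coordinate strings in the table above can be turned into numeric spans, and a span divided by the repeat unit size gives a rough copy count. The sketch below is illustrative only: the function names are hypothetical, and the span is computed as a plain end-minus-start difference, which may differ slightly from the listed array lengths.

```python
import re

# Matches strings such as 'chr4:48788006–48853362' or 'chrY:20,675,953–20,922,323'.
# Both an en dash and a plain hyphen are accepted as the range separator.
_REGION = re.compile(r"(\w+):([\d,]+)[–-]([\d,]+)")


def parse_region(region):
    """Split a region string into (chromosome, start, end) with integer positions."""
    match = _REGION.fullmatch(region.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def span_bp(region):
    """Length of the region as end minus start (an assumption about the convention)."""
    _, start, end = parse_region(region)
    return end - start


def approx_copies(length_bp, unit_bp):
    """Rough number of repeat units in an array of the given length."""
    return length_bp / unit_bp


if __name__ == "__main__":
    region = "chr4:48788006–48853362"  # GAATG array at 4p11, 5 bp unit
    length = span_bp(region)
    print(parse_region(region), length, approx_copies(length, 5))
```

The copy count is only a rough figure, because it treats every unit in the array as having the same size.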
Table 2. Tandem arrays, repeat units >2 kb
| Location | Array size (kb) | Repeat unit (kb) | Copies | Identity | +/- | Assembly | Gene/repeat | Coordinates |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1q21.1 | 250.0 | 38.8 | 5 | | + | Gdis, inv | AMY | chr1:103,893,838–104,143,838 |
| 1q21.3 | 30.4 | 10.1 | 3 | >90.3% | + | | LCE2 | chr1:150,897,180–150,927,534 |
| 1q23.3 | 33.9 | 7.4 | 4.5 | >99% | + | | tRNA | chr1:159,675,041–159,708,915 |
| 1q42.13 | 42.1 | 2.5 | 16 | >99% | - | | 5sRNA | chr1:226,809,390–226,851,530 |
| 4p16.1 | 42.7 | 4.7 | 9 | >99% | + | Gdis, inter (1) | DUB3 | chr4:8,935,516–8,978,292 |
| 4q35.2 | 23.8 | 3.3 | 7 | >99% | -- | Inter (2) | DUX4 | chr4:191,224,553–191,248,328 |
| 7p14.1 | 26.9 | 4.4 | 5.8 | >91% | + | | TRGV | chr7:38,347,944–38,374,805 |
| 8p23.1 | 127.0 | 12.1 | 10 | >98% | + | | DEFA | chr8:6,774,101–6,901,753 |
| 8q21.2 | 166.7 | 12.2 | 6 | >99% | + | Gint | Gor1 | chr8:86,744,493–86,911,178 |
| 10q26.3 | 18.4 | 3.3 | 6 | >99% | -- | Inter (2) | DUX4 | chr10:135,328,811–135,347,195 |
| 13q21.1 | 31.1 | 6.6 | 5 | >99% | + | | FLJ40296 | chr13:56,613,947–56,645,084 |
| 16p11.1 | 9.5 | 1.5 | 7 | | + | | 5sRNA | chr16:34,837,642–34,847,159 |
| 17q23.3 | 59.8 | 22.9 | 3 | >95% | -- | | CSH1,2 | chr17:59,292,406–59,340,026 |
| Xp22.31 | 10.1 | 1.9 | 5 | >95% | + | | 5sRNA | chrX:9,331,977–9,342,069 |
| Xp11.23 | 38 | 2.5 | 15 | >99% | - | Gint | Gage4 | chrX:49,059,954–49,271,622 |
| Xq24 | 56.0 | 4.8 | 12 | >99% | + | | CT47 | chrX:119,893,246–119,948,579 |
| Xq26.3 | 108.1 | 19.9 | 4 | >99% | + | | CT45 | chrX:134,683,931–134,792,078 |
| Yp11.2 | 791.7 | 20.3 | 9 | >99% | + | Gint | TSPY | chrY:9,226,249–10,017,916 |
| Yq11.223 | 55.3 | 23.6 | 2.3 | >98% | + | Intra (3) | RBMY | chrY:22,069,208–22,124,461 |
| Yq11.223 | 46.2 | 23.6 | 2 | >98% | + | Intra (3) | RBMY | chrY:22,431,619–22,477,826 |
| 1q21.1 | 66.3 | 1.5 | 41 | >97% | + | Gdis, Gprox, intra (4) | NBPF | chr1:142,869,841–142,935,173 |
| 1q21.1 | 56.1 | 1.5 | 35.7 | >95% | + | Intra (4) | NBPF | chr1:144,022,952–144,079,081 |
| 1q21.1 | 10.8 | 1.5 | 6.9 | >95% | + | Gprox, Intra (4) | NBPF | chr1:146,472,125–146,482,922 |
| 1q21.3 | 7.8 | 1.4 | 5.6 | | + | | HRNR | chr1:150,452,354–150,460,167 |
| 1q32.2 | 59.4 | 18.6 | 3 | >97% | + | | CR1 | chr1:205,769,057–205,828,425 |
| 2q11.2 | 49.5 | 1.9 | 26.5 | 85% | - | SD, intra (5) | FLJ41632 | chr2:95,910,350–95,959,888 |
| 2q11.2 | 39.9 | 1.9 | 21.3 | 85% | + | SD, intra (5) | UNG2430 | chr2:97,208,306–97,248,230 |
| 2q11.2 | 25.2 | 1.9 | 13.5 | 82% | + | SD, intra (5) | KIAA1641 | chr2:97,519,639–97,544,901 |
| 6q26 | 26.4 | 5.6 | 5 | >99% | + | SD | LPA | chr6:160,953,270–160,979,666 |
| 7q22.1 | 7.0 | 1.4 | 5.1 | 98% | - | SD | BC056606 | chr7:99,743,773–99,750,792 |
| 8p23.1 | 47.8 | 7.1 | 6.2 | >99% | + | SD, intra (6) | AK090418 | chr8:7,098,335–7,146,182 |
| 8p23.1 | 61.7 | 7.1 | 8 | >97% | + | SD, intra (6) | AK090418 | chr8:7,606,874–7,668,576 |
| 8p23.1 | 23.5 | 7.7 | 3 | >97% | + | SD, intra (6) | AK090418 | chr8:7,905,630–7,929,101 |
| 10p11.21 | 47.4 | 11.1 | 4 | >95% | + | SD | ANKRD30A | chr10:37,483,216–37,530,613 |
| 10q26.13 | 42.0 | 2.9 | 13 | >79% | + | | DMBT1 | chr10:124,329,825–124,371,833 |
| 19q13.2 | 48.0 | 15.7 | 3 | >95% | + | | FCGBP | chr19:45,056,041–45,104,055 |
| Yq11.223 | 17.5 | 2.4 | 7 | | + | Intra (7) | Daz1 | chrY:23,706,609–23,724,156 |
| Yq11.223 | 29.5 | 2.4 | 12 | | + | Intra (7) | Daz2 | chrY:23,783,768–23,813,249 |
| Yq11.223 | 16.7 | 2.4 | 7 | | + | Intra (7) | Daz2 | chrY:23,820,447–23,837,326 |
| Yq11.23 | 22.3 | 2.4 | 9 | | + | Intra (7) | Daz3 | chrY:25,337,948–25,360,227 |
| Yq11.23 | 22.3 | 2.4 | 9 | | + | Intra (7) | Daz4 | chrY:25,409,003–25,431,283 |
| Yq11.23 | 14.8 | 2.4 | 6 | | + | Intra (7) | Daz4 | chrY:25,438,190–25,453,002 |
| Yq11.23 | 26.2 | 10.8 | 2.4 | | + | Intra (7) | Daz1 | chrY:23,726,459–23,752,640 |
| 1p36.33 | 14.4 | 3.4 | 4 | >98% | + | SD | AK125248 | chr1:655,059–669,457 |
| 2p13.2 | 28.4 | 4.9 | 5.7 | >95% | + | | FLJ43987 | chr2:73,862,733–73,891,172 |
| 4q28.3 | 41.3 | 2.5 | 17 | >95% | + | SD, inter (8) | SST | chr4:132,864,491–132,905,799 |
| 5q13.2 | 36.9 | 21.3 | 2 | >99% | + | SD | GUSBP | chr5:69,823,338–69,860,219 |
| 16p11.2 | 6.4 | 1.9 | 3.4 | >88% | + | SD, inter (5) | BC012355 | chr16:33,450,421–33,456,850 |
| 7p11.2 | 7.0 | 2.4 | 3 | >87% | + | SD | Div SST | chr7:57,712,475–57,719,436 |
| 17p11.2 | 11.1 | 2.4 | 4 | >84% | + | SD | Div SST | chr17:21,825,509–21,836,586 |
| 20p11.1 | 16.1 | 1.9 | 8.2 | >87% | | | Div SST | chr20:25,781,895–25,797,997 |
| 2q11.2 | 6.4 | 1.9 | 3.4 | >89% | - | SD, intra (5) | Mer5A1 | chr2:95,986,764–95,993,173 |
| 2q37.1 | 28.1 | 1.5 | 18.7 | | + | | Mer20 | chr2:232,396,082–232,424,219 |
| 4p16.1 | 16.5 | 5.6 | 3 | >99% | + | | Mer65A | chr4:8,673,058–8,689,560 |
| 4p11 | 51.4 | 6.0 | 8 | >96% | + | SD, inv, Gprox | Acro | chr4:48,976,229–49,027,623 |
| 5p15.1 | 71.0 | 3.4 | 20 | >97% | + | SD, Inv | Charlie 2a | chr5:17,570,661–17,641,581 |
| 7q36.1 | 10.1 | 1.7 | 5 | | + | SD | HERVE | chr7:149,361,474–149,371,533 |
| 8p23.1 | 38.8 | 7.7 | 5 | >98% | + | SD, intra (6) | LTR5A | chr8:7,392,330–7,431,109 |
| 8p23.1 | 12.9 | 4.7 | 3 | >94% | + | SD, intra (1) | CA | chr8:7,175,100–7,187,953 |
| 8p23.1 | 12.5 | 4.7 | 3 | >87% | + | SD, intra (1) | CA | chr8:12,023,084–12,035,607 |
| 9q32 | 32.1 | 5.5 | 5.9 | >96% | + | | L1MA7 | chr9:114,860,525–114,892,874 |
| 18q22.1 | 6.3 | 2.3 | 3.2 | >96% | + | | L1PB4 | chr18:64,354,361–64,360,677 |
| 19p13.2 | 53.7 | 7.5 | 7.5 | >97% | + | | Charlie 5 | chr19:8,708,195–8,761,926 |
| 19q13.12 | 44.5 | 2.5 | 18 | >95% | - | SD, intra (8) | SST | chr19:41,448,243–41,492,723 |
| 19q13.31 | 38.5 | 2.5 | 16 | >95% | - | SD, intra (8) | SST | chr19:42,451,366–42,489,869 |
| 19q13.32 | 54.4 | 5.4 | 10 | >98% | + | SD, intra (9) | Mer33 | chr19:53,098,557–53,152,942 |
| 19q13.33 | 56.2 | 5.4 | 10 | >97% | + | SD, intra (9) | Mer33 | chr19:55,280,783–55,336,942 |
| Xq23 | 51.7 | 3.0 | 17 | >99% | + | | DXZ4 | chrX:114,867,433–114,919,088 |
Assembly: Gdis, gap at distal end of array; Gprox, gap at proximal end of array; Gint, gap internal to array; Inv, inversion within array; SD, array is associated with a surrounding segmental duplication (not including segmental duplications due merely to repeat unit homology within the array); Intra, intrachromosomal duplication of related arrays on the same chromosome; Inter, interchromosomal duplication of related arrays on different chromosomes. (1)–(9): related arrays are grouped by number, either intra- or interchromosomal, or both, as indicated.
Figure 2. Dot-plot analysis of tandem arrays reveals higher-order structure. For each dot plot shown, the type of repeat, chromosomal location and stringency (window size and % homology) are indicated. Black dots and horizontal lines represent tandem orientation, whereas blue dots and vertical lines represent inverted orientation. The RepeatMasker tracks for each region are shown below. A-G) Arrays are listed in Table 1. The RepeatMasker tracks indicate a large continuous domain of satellite DNA. A) 70.7 kb array of hsatII from 16q11.2 at low stringency, showing a dense pattern indicative of homologous satellite DNA. A large inversion is seen in this array. ~20 kb of neighboring non-satellite DNA is also shown. B) 121 kb array of GsatII from 12p11.1, showing complex multiple inversions within this array. C) Same region as in B at increased stringency, showing 3 distinct domains of homology within the overall array. D) Same array as in A at increased stringency, showing higher-order repeats in the proximal 50 kb in both orientations. E) ~100 kb array of GAATG on Yq11.1, showing the 3.36 kb higher-order repeats in the distal 60 kb region. F) The 100 kb array of GAATG on Yq12, showing the 3.6 kb HOR across the entire sequenced array. G) The 61 bp VNTR from Xp22.33 at high stringency, showing complex higher-order structure. H-L) Arrays listed in Table 2. The RepeatMasker tracks show repetitive patterns containing the different classes of transposable elements. H) The array containing the CT47 genes. I) The DMBT1 gene, showing the internal repetitive domain structure. J) The LPA gene, showing the internal repetitive domain structure. K) The 54.5 kb array of 5.4 kb megasatellite repeats, each of which contains a Mer33 repeat. L) The 51.4 kb array containing the ~6.0 kb Acro repeats. This array has an inversion in orientation of the repeat units, indicated by the vertical lines visible on the dot plot.
Figure 3. Analysis of the large tandem array in 8q21.2. A) Information from the UCSC genome browser (hg18) showing the region containing the 12 kb tandem repeat from 8q21.2. This repeat array contains an "87 kb" gap with ~5 repeat units on the proximal side and ~1.5 repeats on the distal side. The repeats can be seen in the repeating patterns of the RepeatMasker tracks. The AF495523 (Gor1) gene is found once in each repeat unit. Copy number variation was detected at this repeat array using both BAC microarrays and fosmids. The restriction enzyme PmeI does not cut in the 12 kb repeats, but cuts close to the edge of the array in the genomic DNA sequence. The positions of the PCR-amplified probes used on the Southern blot are indicated. B) Pulsed-field gel analysis of the array size in two pedigrees (lanes 1–4, and lanes 5–10).
Figure 4. In situ hybridization of megasatellite DNA families. A) FISH using a 636 bp probe to the SST repeats from chromosome 4, which hybridizes to chromosome 19 (two arrays, Table 2d) and chromosome 4. B) FISH using a probe from the Acro repeats from chromosome 4p11 (Table 2d), which hybridizes to pericentromeric regions of chromosomes 3 and 4, and the acrocentric chromosomes. C) FISH using a probe to the 3.5 kb repeats from the LTR arrays. Right: additional acrocentric chromosomes from different individuals, showing the variation in hybridization patterns.
Figure 5. Analysis of the repeat unit structure of the LTR arrays from chromosomes 13, 18 and 21. A) Genomic region from chromosome 13q11 containing the LTR array. The RepeatMasker tracks from the UCSC genome browser are shown, which indicate the large 60 kb array of LTR transposons. Homologous monomeric repeat units are indicated by arrows. B) Self-similarity dot plot of the LTR array. A 30 bp window at 90% homology reveals the ~3.5 kb monomeric repeat units as horizontal lines. C) Schematic of a higher-order repeat unit (HOR) consisting of six ~3.5 kb monomeric repeat units. The insertion of an LTR6A into monomer C of each HOR is shown, as well as additional deletions of the MSTA-int repeats. D) Detail of the composition of monomers C and D indicating the MaLR LTR fragments that make up the repeat units, taken from the RepeatMasker output and numbered relative to the consensus for each element. The insertion of a full-length LTR6A into monomer C can be seen. E) Self-similarity dot plots of the LTR arrays from chromosomes 13q11, 18p11 and 21q11 at 50 bp windows and 90% homology. The HOR organization is revealed as bold solid horizontal lines, and is shown schematically by arrows below. Putative unequal crossing-over events unique to each LTR array are revealed by the gaps and shifts in these lines, with deleted monomeric repeat units indicated below.
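The self-similarity dot plots described above compare windows of a sequence against windows of the same sequence and keep the pairs that meet a percent-identity threshold. The following is a minimal sketch of that idea, not the software used for the figures: the function name is hypothetical, the defaults (30 bp window, 90% identity) simply echo panel B, only the direct (tandem) orientation is scored, and the brute-force loop is suitable only for short sequences.

```python
def dot_plot_hits(seq, window=30, min_identity=0.90, step=1):
    """Return (i, j) window-start pairs, i <= j, whose windows match at >= min_identity.

    Identity is the fraction of positions that are equal in an ungapped
    comparison of seq[i:i+window] with seq[j:j+window].
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    hits = []
    for i in range(0, n_windows, step):
        a = seq[i:i + window]
        for j in range(i, n_windows, step):
            b = seq[j:j + window]
            matches = sum(x == y for x, y in zip(a, b))
            if matches / window >= min_identity:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy tandem array: four copies of an 8 bp unit, scored with exact 8 bp windows.
    toy = "ACGTTGCA" * 4
    offsets = sorted({j - i for i, j in dot_plot_hits(toy, window=8, min_identity=1.0)})
    print(offsets)  # diagonal offsets at multiples of the unit size
```

In a plot of such hits, a tandem array shows lines parallel to the main diagonal, spaced at the repeat unit size; whether they appear diagonal or horizontal depends on how the axes are drawn.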
Table 3. LTR arrays in the human genome
| Location | Array size (kb) | Repeat units | Coordinates |
| --- | --- | --- | --- |
| 1q12 | 14.7 | 4 | chr1:141,614,596–141,629,292 |
| 1q12 | 14.7 | 4 | chr1:142,011,066–142,025,761 |
| 1q12 | 14.4 | 4 | chr1:142,245,576–142,259,928 |
| 2q11 | 21.7 | 8 | chr2:94,794,007–94,815,544 |
| 4p11 | 17.8 | 5 | chr4:48,911,896–48,929,687 |
| 4p11 | 14.8 | 4 | chr4:49,259,299–49,274,079 |
| 8p11.1 | 17.0 | 5 | chr8:43,489,114–43,506,146 |
| 9p13.3 | 20.3 | 7 | chr9:33,567,377–33,587,718 |
| 9p13.1 | 20.5 | 7 | chr9:38,536,498–38,557,002 |
| 9p11.2 | 27.2 | 9 | chr9:42,429,334–42,456,540 |
| 9p11.2 | 27.2 | 9 | chr9:43,025,353–43,052,560 |
| 9p11.2 | 22.1 | 7 | chr9:43,964,543–43,986,620 |
| 9q12 | 22.1 | 7 | chr9:66,111,804–66,133,881 |
| 9q12 | 21.0 | 7 | chr9:68,742,783–68,763,792 |
| 9q22.33 | 17.5 | 5 | chr9:98,936,138–98,953,601 |
| 13q11 | 60.4 | 18 | chr13:18,218,666–18,279,040 |
| 18p11.21 | 79.5 | 25 | chr18:14,244,667–14,324,176 |
| 21q11.2 | 53.8 | 16 | chr21:14,145,756–14,199,554 |