Agnès Thierry, Christiane Bouchier, Bernard Dujon, Guy-Franck Richard.
Abstract
Minisatellites are DNA tandem repeats that are found in all sequenced genomes. In the yeast Saccharomyces cerevisiae, they are frequently encountered in genes encoding cell wall proteins. Minisatellites present in the completely sequenced genome of the pathogenic yeast Candida glabrata were similarly analyzed, and two new types of minisatellites were discovered: minisatellites that are composed of two different intermingled repeats (called compound minisatellites), and minisatellites containing unusually long repeated motifs (126-429 bp). These long repeat minisatellites may reach unusual lengths for such elements (up to 10 kb). Due to these peculiar properties, they have been named 'megasatellites'. They are found essentially in genes involved in cell-cell adhesion, and could therefore be involved in the ability of this opportunistic pathogen to colonize the human host. In addition to megasatellites, found in large paralogous gene families, there are 93 minisatellites with simple shorter motifs, comparable to those found in S. cerevisiae. Most of the time, these minisatellites are not conserved between C. glabrata and S. cerevisiae, although their host genes are well conserved, raising the question of an active mechanism creating minisatellites de novo in hemiascomycetes.
Year: 2008 PMID: 18812401 PMCID: PMC2566889 DOI: 10.1093/nar/gkn594
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Simple minisatellites in C. glabrata genes
| MS # | Gene Name | MS | Size (nt) | S.c. gene | MS in S.c. | Domain (1) | |
|---|---|---|---|---|---|---|---|
| 1 | 5 × 15 | 75 | YGL028c ( | 8 × 12 | |||
| 2 | 18 × 9 | 162 | |||||
| 3 | A02728g | 3 × 18 | 54 | YDR363wa ( | |||
| 4 | 3 × 12 | 36 | YLR194c | ||||
| 5 | A04257g | 6 × 12 | 72 | YBL054w | |||
| 6 | B00946g | 3 × 18 | 54 | YCL028w ( | |||
| 7 | B02299g | 3 × 12 | 36 | YML114c ( | |||
| 8 | 11 × 18 | 198 | YJR151c ( | 30 × 18; 7 × 72 | TM × 3; Serpaup | ||
| 9 | 5 × 12 | 60 | YOL155c ( | 5 × 39 | TM × 6 | ||
| 10 | 3 × 12 | 36 | |||||
| 11 | 42 × 12 | 504 | |||||
| 12 | 45 × 12 | 540 | YOL155c ( | 5 × 39 | TM × 8; ABC | ||
| 13 | C01265g | 3 × 15 | 45 | YIL115c ( | |||
| 14 | D01254g | 3 × 12 | 36 | ||||
| 15 | D01364g | 4 × 12 | 48 | YBR112c ( | 3 × 18 | ||
| 16 | D03674g | 4 × 9 | 36 | YPL226w ( | |||
| 17 | D04708g | 4 × 9 | 36 | YPR124w ( | |||
| 18 | 3 × 12 | 36 | |||||
| 19 | 3 × 108 | 324 | YAL063c ( | 13 × 135 | TM × 9 | ||
| 20 | 4 × 15 | 60 | YLL021w ( | ||||
| 21 | 3 × 12 | 36 | YOL109w ( | ||||
| 22 | 4 × 120 | 480 | YAR050w ( | 10 × 135 | TM × 1 | ||
| 23 | 3 × 18 | 54 | YOR010c ( | 5 × 33 | |||
| 24 | 3 × 12 | 36 | YLR054c ( | ||||
| 25 | F06831g | 4 × 18 | 72 | YIR033w | |||
| 26 | F07513g | 3 × 12 | 36 | YKL093w ( | |||
| 27 | G00154g | 6 × 12 | 72 | YGR285c ( | |||
| 28 | G01991g | 4 × 12 | 48 | YOR056c ( | |||
| 29 | G02827g | 6 × 12 | 72 | YIL105c ( | |||
| 30 | 9 × 12 | 108 | YJR004c ( | ||||
| 31 | G04829g | 5 × 12 | 60 | YML017w ( | |||
| 32 | G05830g | 5 × 54 | 270 | YHR146w | |||
| 33 | 4 × 12 | 48 | YHR143w ( | ||||
| 34 | G08954g | 6 × 12 | 72 | YOL019w ( | |||
| 35 | 3 × 12 | 36 | YGR189c ( | 5 × 24 | |||
| 36 | 5 × 30 | 150 | YJR151c ( | 30 × 18; 7 × 72 | TM × 4; Serpaup | ||
| 37 | H02057g | 5 × 30 | 150 | YHR089c ( | |||
| 38 | H02123g | 9 × 12 | 108 | YHR086w ( | |||
| 39 | H02189g | 13 × 9 | 117 | YMR269w ( | |||
| 40 | H03443g | 3 × 18 | 54 | YGL073w ( | |||
| 41 | H04037g | 3 × 12 | 36 | YOR178c | |||
| 42 | H05577g | 9 × 18 | 162 | YPL085w ( | |||
| 43 | H06897g | 3 × 12 | 36 | YML098w ( | |||
| 44 | H07557g | 5 × 12 | 60 | YGL254w ( | |||
| 45 | 36 × 45 | 1,620 | YMR173w ( | 6 × 24; 4 × 24 | |||
| 46 | 3 × 18 | 54 | YER011w ( | 7 × 36 | |||
| 47 | 3 × 18 | 54 | YER011w ( | 7 × 36 | |||
| 48 | I02156g | 3 × 21 | 63 | YHR161c ( | |||
| 49 | I05610g | 5 × 12 | 60 | YNR014w | |||
| 50 | I06006g | 4 × 9 | 36 | YJL148w ( | |||
| 51 | 4 × 57 | 228 | YKL164c | 8 × 57 or 6 × 54 | |||
| 52 | I07161g | 4 × 12 | 48 | YOR141c ( | |||
| 53 | J01980g | 7 × 12 | 84 | YIR002c ( | |||
| 54 | 9 × 15 | 135 | TM × 3; Collagen; Antifreeze | ||||
| 55 | 11 × 18 | 198 | |||||
| 56 | J02530g | 9 × 15 | 135 | TM × 1; Collagen | |||
| 57 | 26 × 18 | 468 | |||||
| 58 | 12 × 15 | 180 | TM × 3; Collagen; Antifreeze; PRich × 3 | ||||
| 59 | 11 × 18 | 198 | |||||
| 60 | 23 × 18 | 414 | |||||
| 61 | J04246g | 3 × 18 | 54 | YMR234w ( | |||
| 62 | 8 × 15 | 120 | YNL166c ( | ||||
| 63 | J09988g | 4 × 15 | 60 | YNL063w ( | |||
| 64 | J10076g | 3 × 15 | 45 | YNL058c | |||
| 65 | J11352g | 7 × 12 | 84 | YNL186w ( | 7 × 12 | ||
| 66 | J11968g(2) ( | 27 × 12 | 324 | YIR019c ( | 5 × 30; 5 × 36 | TM × 1; PT × 2; S-T Kin.; Plakin | |
| 67 | 17 × 24 | 408 | |||||
| 68 | K03707g | 4 × 12 | 48 | YMR124w | |||
| 69 | K06435g | 6 × 12 | 72 | YDR464w ( | |||
| 70 | 9 × 27 | 243 | YFL023w ( | 6 × 30 | |||
| 71 | L04488g | 4 × 15 | 60 | YOR166c ( | |||
| 72 | L05280g | 3 × 15 | 45 | YKL087c ( | |||
| 73 | 9 × 27 | 243 | YLR110c | 3 × 12 | |||
| 74 | L06644g | 6 × 12 | 72 | YHR154w ( | |||
| 75 | L11418g | 4 × 21 | 84 | YML071c ( | |||
| 76 | M01738g | 3 × 18 | 54 | YBR081c ( | |||
| 77 | 3 × 18 | 54 | YNL298w ( | ||||
| 78 | 4 × 12 | 48 | YNL322c ( | ||||
| 79 | M05181g | 4 × 15 | 60 | YMR240c ( | |||
| 80 | M09273g | 3 × 15 | 45 | YJR083c ( | |||
| 81 | 3 × 39 | 117 | TM × 3 | ||||
| 82 | M12573g | 3 × 12 | 36 | YIL061c ( |
C. g. gene names are abbreviated: only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or are involved in cell wall metabolism (see text).
(1) As described by InterProScan. TM, transmembrane span domain. Given the high number of false positive predictions, proteins with only one TM span have a low probability of being transmembrane proteins [ref. (17)]; Serpaup, Seripauperin domain; PT, short repeat domain composed of the tetrapeptide XPTX; ABC, ABC transporter type-1 domain; EGF, found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted; Q6CXZ8, lipoprotein GPI-anchor membrane facilitator; α−β Hyd., domain found in the superfamily of α−β hydrolases; PA14, found in bacterial toxins, glucosidases and adhesins, probably involved in carbohydrate binding [ref. (18)]; β-lac., β-lactamase/transpeptidase domain; NADP, NADP-binding domain; Invasin, Invasin/intimin cell-adhesion domain; Ribo, ribosomal protein S14 domain; Collagen, member of the collagen superfamily, involved in connective tissue structure; Antifreeze, insect cysteine-rich antifreeze protein; PRich, highly glycosylated proline-rich cell wall proteins (extensins) in plants, probably involved in interactions with cell-wall carbohydrates [ref. (21)]; S-T Kin., Serine-Threonine kinase domain; Plakin, multiple repeats of beta(2)-alpha(2) motif, found in Ankyrin and Plakin repeats; GLNA, glutamine synthetase domain; CWP, cell wall peptidoglycan-anchor surface signal; Kelch, actin-interacting Kelch domain; ASP, aspartyl protease active site; Dynein, outer arm dynein light chain superfamily domain; GETHR, pentapeptide repeat of unknown function, mainly found in C. elegans.
(2) Overall quality of the sequence is not sufficient to determine the precise number of repeat units.
Figure 1. Distribution of minisatellites in the C. glabrata genome. Each chromosome is represented by a horizontal line, from the left to the right telomere. Short vertical lines represent the 109 minisatellite-containing genes and pseudogenes. Each gene name starts with CAGL0, followed by the chromosome letter (A–M), then the five-digit gene number and a final ‘g’ (38). Only the five-digit number is given here (e.g. 01284 on chromosome A stands for CAGL0A01284g). Note that some minisatellites may combine several properties, e.g. being a compound minisatellite with a long motif, in which case the name is both underlined and colored. The size of the two rDNA arrays is not precisely known.
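The naming rule in the Figure 1 legend can be sketched in a few lines of Python. This is only an illustration of the convention described above; the function names `full_gene_name` and `expand_table_name` are hypothetical and not part of any published tool.

```python
import re

# Abbreviated form used in the tables: chromosome letter (A-M),
# five-digit gene number, final 'g' (e.g. "B03685g").
_TABLE_NAME = re.compile(r"([A-M])(\d{5})g")


def full_gene_name(chromosome: str, number: int) -> str:
    """Build the full name from a chromosome letter and a gene number."""
    chromosome = chromosome.upper()
    if len(chromosome) != 1 or not "A" <= chromosome <= "M":
        raise ValueError(f"chromosome must be a letter A-M, got {chromosome!r}")
    if not 0 <= number <= 99999:
        raise ValueError(f"gene number must fit in five digits, got {number}")
    # The number is zero-padded to five digits, as in CAGL0A01284g.
    return f"CAGL0{chromosome}{number:05d}g"


def expand_table_name(abbrev: str) -> str:
    """Expand an abbreviated table name such as 'B03685g'."""
    match = _TABLE_NAME.fullmatch(abbrev.strip())
    if match is None:
        raise ValueError(f"not an abbreviated gene name: {abbrev!r}")
    return full_gene_name(match.group(1), int(match.group(2)))
```

For instance, `full_gene_name("A", 1284)` reproduces the example given in the legend, and `expand_table_name("B03685g")` expands a name as it appears in the tables.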
Comparative distributions of minisatellites in the S. cerevisiae and C. glabrata genomes
| Regions | Characteristics | S. cerevisiae (1) | C. glabrata |
|---|---|---|---|
| | Genome size (Mb) | 12.1 | 12.3 |
| | Total number of minisatellites (2) | 66 | 238 |
| Coding regions | Number of minisatellites | 55 | 145 |
| | Average GC% (3) | 44 | 43 |
| | Minisatellite GC% | 48 | 51 |
| | Minisatellite GC skew (4) | −0.11 | 0.00 |
| Intergenic regions | Number of minisatellites | 11 | 93 |
| | Average GC% | 29–36 (5) | ND |
| | Minisatellite GC% | 29 | 46 |
(1) From Richard and Dujon (3).
(2) Excluding the 18 Y′ subtelomeric minisatellites.
(3) Calculated on minisatellite-containing genes only. The average for the complete set of genes is 39% for S. cerevisiae and 41% for C. glabrata.
(4) Calculated as the difference between the minisatellite GC skew and the gene GC skew (excluding the minisatellite sequence).
(5) GC% in intergenic regions varies between promoter-convergent and promoter-divergent regions.
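Notes (3) and (4) describe two simple sequence statistics. The sketch below shows one way they could be computed, assuming the conventional definition of GC skew, (G − C)/(G + C); the source does not state the exact formula the authors used, and the function names and coordinates are illustrative.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence, in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C); assumed definition, 0.0 if no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def relative_gc_skew(gene: str, ms_start: int, ms_end: int) -> float:
    """Minisatellite GC skew minus the skew of the rest of the gene.

    ms_start and ms_end are 0-based, end-exclusive coordinates of the
    minisatellite inside the gene sequence (hypothetical inputs).
    """
    minisatellite = gene[ms_start:ms_end]
    # The gene skew excludes the minisatellite itself, as in note (4).
    rest_of_gene = gene[:ms_start] + gene[ms_end:]
    return gc_skew(minisatellite) - gc_skew(rest_of_gene)
```

With a toy gene `"ATGC" + "GGGC" + "ATCC"` and the minisatellite at positions 4 to 8, the repeat has a skew of 0.5 and the flanking sequence a skew of −0.5, so the relative skew is 1.0.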
Compound minisatellites in C. glabrata genes
| MS# | Gene Name | MS | Size (nt) | S.c. gene | MS in S.c. | Domain (1) | Motif (3) |
|---|---|---|---|---|---|---|---|
| 101 | 5 × 27/3 × 30 | 135/90 | TM × 4 | ||||
| 102 | B03685g | 4 × 15/3 × 18 | 60/54 | YCR004c | |||
| 83 | 13 × 42 | 546 | YIR019c ( | 5 × 30; 5 × 36 | TM × 5; PRich × 7; PA14 | ||
| 201 | ( | 4 × 168 | 672 | TTITL | |||
| 103 | 48 × 15/14 × 48 | 720/672 | |||||
| 84 | 10 × 42 | 420 | YKR102w ( | 3 × 81 | TM × 5; PRich × 5; PA14 | ||
| 104 | ( | 26 × 15/9 × 63 | 390/567 | ||||
| 85 | 8 × 33 | 264 | YIR019c ( | 5 × 30; 5 × 36 | TM × 10; PA14; PRich × 4; CWP; Kelch | ||
| 105 | 35 × 33/4 × 240 | 1155/912 | -/SHITT-G | ||||
| 106 | 47 × 15/13 × 24 | 705/312 | YIR019c ( | 5 × 30; 5 × 36 | TM × 8 | ||
| 86 | J01774g ( | 5 × 24 | 120 | YKL112w ( | Collagen | ||
| 107 | 32 × 24/60 × 48 | 768/2880 | |||||
| 108 | 6 × 33/2 × 258 | 198/420 | -/SHITT-G | ||||
| 202 | J05170g ( | 7 × 135 | 945 | SHITT | |||
| 203 | 4 × 270 | 1080 | SFFIT degen. | ||||
| 109 | 20 × 12/3 × 168 | 240/504 | -/SHITT | ||||
| 110 | J11924g ( | 35 × 12/3 × 429 | 420/1287 | TM × 2; ASP | -/SFFIT degen. | ||
| 87 | 12 × 15 | 180 | TM × 7; Dynein; GETHR × 9 | ||||
| 111 | 45 × 15/10 × 48 | 675/480 | |||||
| 204 | 5 × 168 | 840 | TTITL | ||||
| 88 | 16 × 42 | 672 | |||||
| 89 | L00227g ( | 39 × 39 | 1521 | TM × 1 | |||
| 112 | 31 × 75/44 × 45/4 × 243 | 2325/1980/924 | -/-/SHITT-G | ||||
| 90 | M00132g ( | 17 × 24 | 408 | YIR019c ( | 5 × 30; 5 × 36 | TM × 1; PT × 2; Plakin | |
| 113 | ( | 31 × 12/4 × 51 | 372/204 |
C. g. gene names are abbreviated: only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or are involved in cell wall metabolism (see text). Please refer to Table 2 for notes (1–2).
(3) Only long motifs are indicated (see text for details).
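The "MS" column uses a compact notation: "number of repeats × motif length", with the components of a compound minisatellite separated by "/". A minimal Python sketch for reading that notation follows; the helper names are hypothetical. Note that a few rows list a size that differs from the simple product (for example 4 × 240 listed as 912), so the computed value should be treated as nominal only.

```python
import re

# One component: "<repeats> × <motif length>"; a plain "x" is also accepted.
_UNIT = re.compile(r"\s*(\d+)\s*[×xX]\s*(\d+)\s*")


def parse_ms(notation: str) -> list[tuple[int, int]]:
    """Parse '5 × 27/3 × 30' into [(5, 27), (3, 30)]."""
    components = []
    for part in notation.split("/"):
        match = _UNIT.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse component {part!r}")
        components.append((int(match.group(1)), int(match.group(2))))
    return components


def nominal_sizes(notation: str) -> list[int]:
    """Return repeats * motif length (in nt) for each component."""
    return [repeats * motif for repeats, motif in parse_ms(notation)]


def is_compound(notation: str) -> bool:
    """A compound minisatellite has more than one component."""
    return len(parse_ms(notation)) > 1
```

For MS#101, `nominal_sizes("5 × 27/3 × 30")` gives `[135, 90]`, matching the "135/90" entry in the Size column.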
Megasatellites in C. glabrata genes
| MS # | Gene Name | MS | Size (nt) | S.c. gene | MS in S.c. | Domain (1) | Motif (3) |
|---|---|---|---|---|---|---|---|
| 205 | C00253g ( | 6 × 300 | 1800 | YIR019 ( | 5 × 30; 5 × 36 | TM × 1 | SFFIT |
| 206 | E00231g ( | 5 × 135 | 675 | SHITT | |||
| 207 | 3 × 300 | 900 | SFFIT | ||||
| 208 | E01661g | 5 × 300 | 1500 | YIR019 ( | 5 × 30; 5 × 36 | TM × 1 | SFFIT |
| 209 | G10219g ( | 5 × 138 | 690 | YHR211w ( | 7 × 135; 3 × 21 | EGF × 1 | SHITT |
| 210 | H02783g | 3 × 135 | 405 | YJL076w ( | unknown ( | ||
| 211 | H10626g ( | 3 × 135 | 405 | YAR050w ( | 10 × 135 | TM × 1 | SHITT |
| 212 | I00220g ( | 4 × 177 | 708 | YAR050w ( | 10 × 135 | TM × 1 | SHITT degen. |
| 213 | I07293g ( | 16 × 135 | 2160 | known ( | |||
| 214 | I10147g ( | 32 × 300 | 9600 | YHR211w ( | 7 × 135; 3 × 21 | PA14; β-lac.; NADP | SFFIT |
| 215 | I10246g ( | 5 × 300 | 1500 | TM × 1 | SFFIT | ||
| 216 | I10340g ( | 3 × 300 | 900 | YAL063c ( | 13 × 135 | TM × 1 | SFFIT |
| 217 | I10362g ( | 3 × 135 | 405 | PA14; Invasin | SHITT | ||
| 218 | 4 × 135 | 540 | SHITT | ||||
| 219 | 27 × 12 | 324 | TM × 6; Ribo | ||||
| 220 | 4 × 135 | 540 | unknown ( | ||||
| 91 | K13024g ( | 4 × 39 | 156 | YIR019c ( | 5 × 30; 5 × 36 | TM × 8 | |
| 221 | 5 × 132 | 660 | SHITT | ||||
| 222 | L00157g ( | 11 × 141 | 1551 | YAR050w ( | β-lac.; GLNA | SHITT | |
| 223 | 4 × 300 | 1200 | SFFIT | ||||
| 224 | L09911g ( | 5 × 300 | 1500 | SFFIT | |||
| 225 | L13310g ( | 10 × 141 | 1410 | PA14 | SHITT | ||
| 226 | ( | 7 × 300 | 2100 | SFFIT | |||
| 227 | L13332g ( | 4 × 297 | 1188 | YAL063c ( | 13 × 135 | TM × 1 | SHITT-V |
| 228 | L10092g ( | 3 × 300 | 900 | TM × 1; β-lac. | SFFIT | ||
| 229 | 3 × 306 | 918 | SHITT-V |
C. g. gene names are abbreviated: only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or are involved in cell wall metabolism (see text). Please refer to Tables 2 and 3 for notes (1–3).
(4) No occurrence of this motif was found in databases.
(5) Several occurrences of genes containing this motif were found in K. delphensis (see text).
Minisatellites in C. glabrata pseudogenes
| MS# | Pseudogene Name | MS | Size (nt) | Coordinates (4) | Domain (1) | Motif (3) |
|---|---|---|---|---|---|---|
| 230 | A04873g ( | 4 × 300 | 1200 | 482 956–484 291 | GLNA | SFFIT |
| 114 | 20 × 141/10 × 309 | 2780/3090 | SHITT/SHITT-V | |||
| 231 | B05093g ( | 11 × 135 | 1485 | 499 712–501 364 | TM × 1 | SHITT |
| 92 | 17 × 12 | 204 | 104 419–106 401 | TM × 9; PT × 2; ABC | ||
| 93 | 29 × 12 | 348 | ||||
| 232 | E00143g ( | 6 × 141 | 846 | 4621–6420 | EGF × 2 | SHITT |
| 233 | 4 × 300 | 1200 | SFFIT | |||
| 115 | F00110g ( | 11 × 9/5 × 126 | 99/630 | 2275–2910 | TM × 2; EGF | -/SHITT |
| 234 | H00132g ( | 4 × 129 | 516 | 4229–4837 | TM × 2; Q6CXZ8 | SHITT |
| 235 | I00110g ( | 3 × 141 | 423 | 2407–5280 | TM × 2; α−β Hyd. | SHITT |
| 236 | 9 × 300 | 2700 | SFFIT | |||
| 237 | I10200g ( | 10 × 300 | 3000 | 992 434–998 401 | PA14; TM × 1 | SFFIT |
C. g. gene names are abbreviated: only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or are involved in cell wall metabolism (see text). Please refer to Tables 2 and 3 for notes (1–3).
(4) Coordinates of beginning and end of the pseudogene, in nucleotides.
Figure 2. Two examples of compound minisatellites. Minisatellites are shown as colored boxes, short motifs in yellow, long motifs in blue. Gray boxes represent partial 5′ and 3′ parts of gene coding sequences, along with gene names for which the first five characters have been omitted (see legend to Figure 1). Short motifs are numbered from 1 to 10; motif 1 is used as the reference, and point mutations are shaded. Long motifs are lettered from A to E; motif A is used as the reference, and point mutations are shaded. The 5′ part of each motif (L region) is common to both short and long motifs, whereas the 3′ part (R region) differs between short and long motifs. Duplicated blocks are marked with Roman numerals under each minisatellite. Note that for MS#111, the large duplicated block in the middle of the minisatellite contains several shorter internal duplications.
Figure 3. Comparison of minisatellites in EPA genes in two different C. glabrata strains. (A) Schematic representation of the EPA1, EPA2 and EPA3 genes, located on the right subtelomeric region of chromosome VI. Note that gene order and organization are identical in both the BG2 and the CBS138 strains. (B) Minisatellites in the three EPA genes show size polymorphism. DNA self-matrices of EPA1, EPA2 and EPA3 are shown for each of the two strains studied (BG2 and CBS138). Gene names are indicated in the upper right corner of each matrix. The number and size of each motif are shown next to each minisatellite. Note the additional compound minisatellite in EPA3 in the BG2 strain. The smaller repeats (2 × 24 bp and 2 × 15 bp), not detected in the CBS138 strain due to the parameters chosen for the program (see Materials and methods section), are slightly expanded in the BG2 strain.
Figure 4. Alignments of megasatellite motifs. The first motif of each megasatellite was aligned using ClustalW. The signature motif in each family (SFFIT, SHITT, TTITL) is shown in a light gray box to the left. The megasatellite number (MS#) is indicated to the left of the sequence, followed by the number of repeat motifs within the megasatellite (in parentheses). The central part of the SHITT motif in which insertions occur (see text) is indicated by a dark gray box.
Amino acids encoded by minisatellites and megasatellites
| Minisatellites | | | Megasatellites | | |
|---|---|---|---|---|---|
| Amino acid | AA | % | Amino acid | AA | % |
| Serine | S | 27.8 | Threonine | T | 20.5 |
| Glycine | G | 18.0 | Serine | S | 9.9 |
| Proline | P | 11.8 | Aspartic acid | D | 7.8 |
| Asparagine | N | 10.0 | Valine | V | 7.3 |
| Alanine | A | 6.8 | Glycine | G | 6.8 |
| Threonine | T | 3.8 | Proline | P | 6.7 |
| Glutamic acid | E | 3.8 | Isoleucine | I | 6.4 |
| Valine | V | 3.5 | Glutamic acid | E | 5.8 |
| Aspartic acid | D | 3.3 | Alanine | A | 4.7 |
| Lysine | K | 2.9 | Asparagine | N | 3.9 |
| Glutamine | Q | 2.6 | Tyrosine | Y | 3.6 |
| Methionine | M | 1.7 | Leucine | L | 3.4 |
| Isoleucine | I | 1.0 | Phenylalanine | F | 3.3 |
| Leucine | L | <1.0 | Lysine | K | 3.3 |
| Arginine | R | <1.0 | Histidine | H | 2.4 |
| Histidine | H | <1.0 | Arginine | R | 1.6 |
| Tyrosine | Y | <1.0 | Tryptophan | W | 1.1 |
| Phenylalanine | F | <1.0 | Glutamine | Q | <1.0 |
| Cysteine | C | <1.0 | Methionine | M | <1.0 |
| Tryptophan | W | <1.0 | Cysteine | C | <1.0 |
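A composition table of this kind can be derived by counting residues in the translated repeat sequences. The sketch below assumes the input is already a protein string in one-letter code and that the values are percentages of all residues; both points, and the function name `aa_composition`, are assumptions for illustration rather than the authors' procedure.

```python
from collections import Counter


def aa_composition(protein: str) -> list[tuple[str, float]]:
    """Return (one-letter code, percentage) pairs, most frequent first.

    Non-letter characters such as '*' (stop) or '-' (gap) are ignored.
    Ties are broken alphabetically so the output order is deterministic.
    """
    residues = [r for r in protein.upper() if r.isalpha()]
    if not residues:
        return []
    counts = Counter(residues)
    total = len(residues)
    table = [(aa, round(100.0 * n / total, 1)) for aa, n in counts.items()]
    return sorted(table, key=lambda item: (-item[1], item[0]))
```

On the toy input `"SSGT"`, serine accounts for half of the residues, and glycine and threonine for a quarter each.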
Size distribution of C. glabrata minisatellites and megasatellites
| Minisatellites | | | | | | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Motif size | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 30 | 33 | 39 | 42 | 45 | 48 | 51 | 54 | 57 | 63 | 75 | 108 | 120 |
| Nb of occurrences | 6 | 41 | 19 | 17 | 2 | 5 | 3 | 3 | 3 | 3 | 3 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Nb of families | 5 | 38 | 18 | 15 | 2 | 5 | 3 | 3 | 3 | 3 | 2 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Megasatellites | | | | | | | | | | | | | | | | | | | | |
| Motif size | 126 | 129 | 132 | 135 | 138 | 141 | 168 | 177 | 240 | 243 | 258 | 270 | 297 | 300 | 306 | 309 | 429 | | | |
| Nb of occurrences | 1 | 1 | 1 | 9 | 1 | 5 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 14 | 1 | 1 | 1 | | | |
| Nb of families | 1 | 1 | 1 | 4 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | | | |
| Family (1) | H | H | H | H (2) | H | H | H (3) | H | G | G | G | S | V | S | V | V | S | | | |
(1) Megasatellite family: H, SHITT; G, SHITT-G; V, SHITT-V; S, SFFIT; T, TTITL.
(2) Four families, including one SHITT (H) and three other unrelated families.
(3) Two families, including one SHITT (H) and one TTITL (T).
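The family codes of note (1) and the split between the two halves of the table can be captured in a small lookup. In this sketch the 126-nt cut-off is simply the smallest megasatellite motif size listed in the table (the largest minisatellite motif is 120 nt); it is used here as an assumed working threshold, not as a definition given by the authors.

```python
# One-letter family codes from note (1) of the size-distribution table.
FAMILY_CODES = {
    "H": "SHITT",
    "G": "SHITT-G",
    "V": "SHITT-V",
    "S": "SFFIT",
    "T": "TTITL",
}

# Assumed threshold: shortest megasatellite motif listed in the table (nt).
MEGA_MIN_MOTIF = 126


def classify_by_motif(motif_length: int) -> str:
    """Label a tandem repeat by its motif length, in nucleotides."""
    if motif_length <= 0:
        raise ValueError("motif length must be positive")
    if motif_length >= MEGA_MIN_MOTIF:
        return "megasatellite"
    return "minisatellite"


def family_name(code: str) -> str:
    """Expand a one-letter family code, e.g. 'V' -> 'SHITT-V'."""
    return FAMILY_CODES[code.strip().upper()]
```

Under this assumption, a 120-nt motif is labelled a minisatellite and a 126-nt motif a megasatellite.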
Figure 5. Insertion of a new motif within a minisatellite: two possible models. The motif may target a pre-existing minisatellite, and subsequently spread by intragenic gene conversion (A). Alternatively, the same motif may target a gene that does not contain a minisatellite, and is afterwards expanded into a minisatellite (B). Note that the two models are not mutually exclusive, but only model A may lead to compound minisatellites.