Megasatellites: a peculiar class of giant minisatellites in genes involved in cell adhesion and pathogenicity in Candida glabrata.

Agnès Thierry, Christiane Bouchier, Bernard Dujon, Guy-Franck Richard.

Abstract

Minisatellites are DNA tandem repeats that are found in all sequenced genomes. In the yeast Saccharomyces cerevisiae, they are frequently encountered in genes encoding cell wall proteins. Minisatellites present in the completely sequenced genome of the pathogenic yeast Candida glabrata were similarly analyzed, and two new types of minisatellites were discovered: minisatellites that are composed of two different intermingled repeats (called compound minisatellites), and minisatellites containing unusually long repeated motifs (126-429 bp). These long repeat minisatellites may reach unusual length for such elements (up to 10 kb). Due to these peculiar properties, they have been named 'megasatellites'. They are found essentially in genes involved in cell-cell adhesion, and could therefore be involved in the ability of this opportunistic pathogen to colonize the human host. In addition to megasatellites, found in large paralogous gene families, there are 93 minisatellites with simple shorter motifs, comparable to those found in S. cerevisiae. Most of the time, these minisatellites are not conserved between C. glabrata and S. cerevisiae, although their host genes are well conserved, raising the question of an active mechanism creating minisatellites de novo in hemiascomycetes.

Year:  2008        PMID: 18812401      PMCID: PMC2566889          DOI: 10.1093/nar/gkn594

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

As more and more eukaryotic genomes are sequenced, a wealth of new information on gene duplications, evolution of paralogous sets of genes, differential loss of genes and neo-functionalization of paralogues becomes available. With the largest number of species sequenced within a single phylum, hemiascomycetous yeasts stand out as a reference for comparative genomics (1). Minisatellites are a subclass of DNA tandem repeats that exhibit size polymorphism among different individuals or isolates (2). In previous work, it was found that minisatellites are spread throughout the Saccharomyces cerevisiae genome, and are preferentially encountered in genes encoding proteins involved in cell wall formation (3,4). Such proteins, including those belonging to the FLO family of flocculins, exhibit a variable number of repeats among different yeast strains. The role of such repeats is illustrated by the fact that strains having a larger number of repeats in the FLO1 gene exhibit better adhesion than those with a smaller number of repeats (5). Similarly, S. cerevisiae strains isolated from biofilms formed at the surface of sherry wines contain an increased number of repeats in one of the FLO11 minisatellites, supporting the importance of such sequences in cell adhesion (6). We previously reported that S. cerevisiae minisatellites are frequently not conserved in the corresponding orthologous gene in other hemiascomycetes, suggesting that minisatellites are created, evolve and disappear at a faster pace than the genes containing them (3), a property shared by microsatellites, another class of DNA tandem repeats with shorter motifs (7). In order to investigate more thoroughly the mechanisms of creation and loss of minisatellites in a pathogenic hemiascomycete, we searched for all such repeats in the genome of Candida glabrata, a human opportunistic pathogen responsible for mucosal candidiasis, bloodstream infections and vaginitis.
Candida glabrata is the second most common cause of nosocomial infections due to yeasts, after C. albicans. Its genome has been completely sequenced, revealing a closer relationship to S. cerevisiae than to C. albicans (8), which makes comparisons easier. Despite similar genome sizes, we found three times as many minisatellites in C. glabrata as in S. cerevisiae. We also discovered two new types of minisatellites absent from the S. cerevisiae genome, including some unusually long minisatellites, composed of several kilobases of a tandemly repeated sequence. We propose to name them ‘megasatellites’, in order to distinguish them from more regular minisatellites. Megasatellites are present in genes whose sequences suggest that they are involved in cellular adhesion. Some of the peculiar protein motifs encoded by megasatellites are also found in Kluyveromyces delphensis, but not in more distantly related yeast species (nor in any other living organism), suggesting that they are specific to this branch of the hemiascomycetes, and may be involved in creating new gene functions in these yeast species.

MATERIALS AND METHODS

Analysis of the C. glabrata genome

The complete sequence of C. glabrata strain CBS138 was analyzed using the MREPS program (9), with the following parameters: minimal size of repeat unit (-minp) equal to 10, minimal repeat length (-minsize) equal to 30. Since the resolution parameter (allowing some degree of ‘fuzziness’ within the repeat) was set at the minimal value, variant repeats could not be detected. Therefore, repeats were individually examined and minisatellites manually extended 5′ and 3′ of the initial repeat detected by MREPS, as described previously (3). In addition, some minisatellites, corresponding in fact to imperfect microsatellites (10), were detected by the program but not taken into account thereafter. Using this approach, MREPS detected 706 repeats fulfilling the required criteria. Careful examination showed that some of the repeats found by the program partially overlapped or were part of the same minisatellite, resulting in a final number of 238 minisatellites used for the present analysis, including 145 detected in coding regions. Since several genes contain more than one repeat array, each of the 145 minisatellites was given a unique identifier, from MS#1 to MS#237. Compound minisatellites are also given a single identifier, followed by a letter for each motif of the minisatellite (e.g. MS#109a represents the 20 × 12 bp motif and MS#109b represents the 3 × 168 bp motif of compound MS#109). Minisatellite size polymorphism was determined by standard PCR and Southern blot methods, using the CBS138 type strain, the laboratory BG2 strain (11), and two strains isolated from infected patients, F11017Blo1 and F15035Blo1, a kind gift of C. Hennequin (Muller,H. et al., manuscript in preparation).
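
To make the two detection thresholds concrete, the sketch below scans a sequence for exact tandem repeats with a repeat unit of at least 10 nt and a total array length of at least 30 nt. It is not the MREPS algorithm and does not reproduce its resolution parameter; the function name, the upper bound on the period and the toy sequence are assumptions made for illustration only.

```python
def find_tandem_repeats(seq, min_period=10, max_period=500, min_size=30):
    """Return (start, period, length) tuples for exact tandem repeats.

    An array is reported when it holds at least two full copies of a unit
    of `period` bases and spans at least `min_size` bases. The defaults
    mirror the -minp and -minsize values quoted in the text; max_period
    is an arbitrary bound chosen for this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period < n:
            # Extend while each base equals the base one period downstream.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            length = (j - i) + period  # span of the periodic stretch
            if j - i >= period and length >= min_size:
                # Skip arrays already reported with a shorter period
                # (a 12-nt repeat also matches periods 24, 36, ...).
                if not any(s <= i and i + length <= s + span
                           for s, _, span in hits):
                    hits.append((i, period, length))
            i = j + 1  # position j mismatched, so restart just after it
    return hits


# Hypothetical 12-nt unit repeated three times between unrelated flanks.
motif = "ACGTTGCAAGCT"
toy = "CCCCC" + motif * 3 + "GGGGG"
for start, period, length in find_tandem_repeats(toy):
    print(start, period, length // period)  # start, unit size, copy number
```

Because only exact copies are matched, variant repeats at the edges of an array are missed, which is consistent with the manual 5′ and 3′ extension step described above.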

Search for orthologues

The functional annotation of the C. glabrata genome, developed during the course of the Génolevures 2 project, was used [http://cbi.labri.fr/Genolevures; (8)]. Whenever several homologs were found in the S. cerevisiae genome, synteny data were used to discriminate among the possible genes. When synteny data were insufficient to discriminate between two or three possible homologs of a C. glabrata gene, all of them were indicated (seven instances, Table 2). Many C. glabrata genes exhibit sequence similarities to several S. cerevisiae genes belonging to the FLO/STA superfamily of flocculins. These similarities were always limited in size and not sufficient to identify the right ortholog. Synteny data did not allow discrimination among the possible homologues either. Sequence similarities to large DNA motifs in K. delphensis were searched with tblastx, using the motif itself (SHITT, SFFIT or TTITL) as a query, in a K. delphensis DNA database of 17 000 sequences (genome coverage 0.8×), provided by the Pasteur Genopole DNA sequencing facilities.
Table 2.

Simple minisatellites in C. glabrata genes

MS # | Gene Name | MS | Size (nt) | S. cerevisiae homologue | MS in S.c. | Domain (1)
1 | A01474g | 5 × 15 | 75 | YGL028c (SCW11) | 8 × 12 |
2 | | 18 × 9 | 162 | | |
3 | A02728g | 3 × 18 | 54 | YDR363wa (SEM1) | |
4 | A04081g | 3 × 12 | 36 | YLR194c | |
5 | A04257g | 6 × 12 | 72 | YBL054w | |
6 | B00946g | 3 × 18 | 54 | YCL028w (RNQ1) | |
7 | B02299g | 3 × 12 | 36 | YML114c (TAF65) | |
8 | C00209g (2) | 11 × 18 | 198 | YJR151c (DAN4) | 30 × 18; 7 × 72 | TM × 3; Serpaup
9 | C00968g (2) | 5 × 12 | 60 | YOL155c (HPF1) | 5 × 39 | TM × 6
10 | | 3 × 12 | 36 | | |
11 | | 42 × 12 | 504 | | |
12 | C01133g (2) | 45 × 12 | 540 | YOL155c (HPF1) | 5 × 39 | TM × 8; ABC
13 | C01265g | 3 × 15 | 45 | YIL115c (NUP159) | |
14 | D01254g | 3 × 12 | 36 | | |
15 | D01364g | 4 × 12 | 48 | YBR112c (CYC8) | 3 × 18 |
16 | D03674g | 4 × 9 | 36 | YPL226w (NEW1) | |
17 | D04708g | 4 × 9 | 36 | YPR124w (CTR1) | |
18 | | 3 × 12 | 36 | | |
19 | D06226g | 3 × 108 | 324 | YAL063c (FLO9) | 13 × 135 | TM × 9
20 | D06622g | 4 × 15 | 60 | YLL021w (SPA2) | |
21 | E02255g | 3 × 12 | 36 | YOL109w (ZEO1) | |
22 | E06644g (EPA1) | 4 × 120 | 480 | YAR050w (FLO1) | 10 × 135 | TM × 1
23 | F01463g | 3 × 18 | 54 | YOR010c (TIR2) or YER011w (TIR1) | 5 × 33 |
24 | F01859g | 3 × 12 | 36 | YLR054c (OSW2) | |
25 | F06831g | 4 × 18 | 72 | YIR033w (MGA2) or YKL020c (SPT23) | |
26 | F07513g | 3 × 12 | 36 | YKL093w (MBR1) | |
27 | G00154g | 6 × 12 | 72 | YGR285c (ZUO1) | |
28 | G01991g | 4 × 12 | 48 | YOR056c (NOB1) | |
29 | G02827g | 6 × 12 | 72 | YIL105c (SLM1) | |
30 | G04125g | 9 × 12 | 108 | YJR004c (SAG1) | |
31 | G04829g | 5 × 12 | 60 | YML017w (PSP2) | |
32 | G05830g | 5 × 54 | 270 | YHR146w (CRP1) or YNL173c (MDG1) | |
33 | G05896g | 4 × 12 | 48 | YHR143w (DSE2) | |
34 | G08954g | 6 × 12 | 72 | YOL019w (TOS7) | |
35 | G09449g | 3 × 12 | 36 | YGR189c (CRH1) | 5 × 24 |
36 | G10175g | 5 × 30 | 150 | YJR151c (DAN4) | 30 × 18; 7 × 72 | TM × 4; Serpaup
37 | H02057g | 5 × 30 | 150 | YHR089c (GAR1) | |
38 | H02123g | 9 × 12 | 108 | YHR086w (NAM8) | |
39 | H02189g | 13 × 9 | 117 | YMR269w (TMA23) | |
40 | H03443g | 3 × 18 | 54 | YGL073w (HSF1) | |
41 | H04037g | 3 × 12 | 36 | YOR178c (GAC1) or YLR273c (PIG1) | |
42 | H05577g | 9 × 18 | 162 | YPL085w (SEC16) | |
43 | H06897g | 3 × 12 | 36 | YML098w (TAF13) | |
44 | H07557g | 5 × 12 | 60 | YGL254w (FZF1) | |
45 | H08844g | 36 × 45 | 1,620 | YMR173w (DDR48) | 6 × 24; 4 × 24 |
46 | H09592g | 3 × 18 | 54 | YER011w (TIR1) | 7 × 36 |
47 | H09614g | 3 × 18 | 54 | YER011w (TIR1) | 7 × 36 |
48 | I02156g | 3 × 21 | 63 | YHR161c (YAP1801) | |
49 | I05610g | 5 × 12 | 60 | YNR014w | |
50 | I06006g | 4 × 9 | 36 | YJL148w (RPA34) | |
51 | I06204g | 4 × 57 | 228 | YKL164c (PIR1) or YKL163w (PIR3) | 8 × 57 or 6 × 54 |
52 | I07161g | 4 × 12 | 48 | YOR141c (ARP8) | |
53 | J01980g | 7 × 12 | 84 | YIR002c (MPH1) | |
54 | J02508g | 9 × 15 | 135 | | | TM × 3; Collagen; Antifreeze
55 | | 11 × 18 | 198 | | |
56 | J02530g | 9 × 15 | 135 | | | TM × 1; Collagen
57 | | 26 × 18 | 468 | | |
58 | J02552g | 12 × 15 | 180 | | | TM × 3; Collagen; Antifreeze; PRich × 3
59 | | 11 × 18 | 198 | | |
60 | | 23 × 18 | 414 | | |
61 | J04246g | 3 × 18 | 54 | YMR234w (RNH1) | |
62 | J06160g | 8 × 15 | 120 | YNL166c (BNI5) | |
63 | J09988g | 4 × 15 | 60 | YNL063w (MTQ1) | |
64 | J10076g | 3 × 15 | 45 | YNL058c | |
65 | J11352g | 7 × 12 | 84 | YNL186w (UBP10) | 7 × 12 |
66 | J11968g (2) (EPA15) | 27 × 12 | 324 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 1; PT × 2; S-T Kin.; Plakin
67 | | 17 × 24 | 408 | | |
68 | K03707g | 4 × 12 | 48 | YMR124w | |
69 | K06435g | 6 × 12 | 72 | YDR464w (SPP41) | |
70 | K07700g | 9 × 27 | 243 | YFL023w (BUD27) | 6 × 30 |
71 | L04488g | 4 × 15 | 60 | YOR166c (SWT1) | |
72 | L05280g | 3 × 15 | 45 | YKL087c (CYT2) | |
73 | L06424g | 9 × 27 | 243 | YLR110c (CCW12) or YDR134c | 3 × 12 |
74 | L06644g | 6 × 12 | 72 | YHR154w (RTT107) | |
75 | L11418g | 4 × 21 | 84 | YML071c (COG8) | |
76 | M01738g | 3 × 18 | 54 | YBR081c (SPT7) | |
77 | M03729g | 3 × 18 | 54 | YNL298w (CLA4) | |
78 | M04169g | 4 × 12 | 48 | YNL322c (KRE1) | |
79 | M05181g | 4 × 15 | 60 | YMR240c (CUS1) | |
80 | M09273g | 3 × 15 | 45 | YJR083c (ACF4) | |
81 | M10395g | 3 × 39 | 117 | | | TM × 3
82 | M12573g | 3 × 12 | 36 | YIL061c (SNP1) | |

C. g. gene names are abbreviated, only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or involved in cell wall metabolism (see text).

(1) As described by InterProScan. TM, transmembrane span domain. Given the high number of false positive predictions, proteins with only one TM span have a low probability of being transmembrane proteins [ref. (17)]; Serpaup, Seripauperin domain; PT, short repeat domain composed of the tetrapeptide XPTX; ABC, ABC transporter type-1 domain; EGF, found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted; Q6CXZ8, lipoprotein GPI-anchor membrane facilitator; α−β Hyd., domain found in the superfamily of α−β hydrolases; PA14, found in bacterial toxins, glucosidases and adhesins, probably involved in carbohydrate binding [ref. (18)]; β-lac., β-lactamase/transpeptidase domain; NADP, NADP-binding domain; Invasin, Invasin/intimin cell-adhesion domain; Ribo, ribosomal protein S14 domain; Collagen, member of the collagen superfamily, involved in connective tissue structure; Antifreeze, insect cysteine-rich antifreeze protein; PRich, highly glycosylated proline-rich cell wall proteins (extensins) in plants, probably involved in interactions with cell-wall carbohydrates [ref. (21)]; S-T Kin., Serine-Threonine kinase domain; Plakin, multiple repeats of beta(2)-alpha(2) motif, found in Ankyrin and Plakin repeats; GLNA, glutamine synthetase domain; CWP, cell wall peptidoglycan-anchor surface signal; Kelch, actin-interacting Kelch domain; ASP, aspartyl protease active site; Dynein, outer arm dynein light chain superfamily domain; GETHR, pentapeptide repeat of unknown function, mainly found in C. elegans.

(2) Overall quality of the sequence is not sufficient to determine the precise number of repeat units.

Amino acid composition and motif analysis

To determine the global composition of the 93 minisatellites with short motifs, all motifs were concatenated and the calculation was performed using the DNA Strider 1.4f6 software (12). Long motifs were aligned using the ClustalW software on the BioWeb interface at the Pasteur Institute (http://bioweb.pasteur.fr/seqanal/interfaces/clustalw-simple.html). GC and AT skews were calculated as (G−C)/(G+C) and (A−T)/(A+T), respectively, using DNA Strider. Both GC content and GC skew of minisatellite-containing genes were calculated on the gene DNA sequence without the minisatellite. Search for known domains in cell wall proteins was performed using the InterProScan software (http://www.ebi.ac.uk/InterProScan). The long motifs (SHITT, SFFIT, TTITL and the three unknown motifs) were used as queries for a Blast search into the NCBI non-redundant GenBank CDS translations, PDB, Swissprot and PIR databases.
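
The skew definitions above translate directly into a few lines of code. The following is a minimal sketch, not the DNA Strider implementation: the function names and the toy sequence are hypothetical, and the last helper follows the convention used for Table 1, where the reported GC skew is the minisatellite skew minus the skew of the host gene with the minisatellite removed.

```python
def base_counts(seq):
    """Count A, C, G and T in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def gc_content(seq):
    """Fraction of G+C among the A/C/G/T bases."""
    n = base_counts(seq)
    total = sum(n.values())
    return (n["G"] + n["C"]) / total if total else 0.0


def gc_skew(seq):
    """(G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    n = base_counts(seq)
    gc = n["G"] + n["C"]
    return (n["G"] - n["C"]) / gc if gc else 0.0


def at_skew(seq):
    """(A - T) / (A + T); returns 0.0 when the sequence has no A or T."""
    n = base_counts(seq)
    at = n["A"] + n["T"]
    return (n["A"] - n["T"]) / at if at else 0.0


def relative_gc_skew(gene, start, end):
    """Minisatellite GC skew minus the skew of the gene without it.

    `start` and `end` are 0-based, end-exclusive coordinates of the
    minisatellite within `gene` (a convention chosen for this sketch).
    """
    minisatellite = gene[start:end]
    rest = gene[:start] + gene[end:]
    return gc_skew(minisatellite) - gc_skew(rest)


# Toy example: a 4-nt "minisatellite" (GGGC) embedded in a made-up gene.
toy_gene = "ATATAT" + "GGGC" + "GCGC"
print(gc_skew("GGGC"), relative_gc_skew(toy_gene, 6, 10))
```

Computing the gene values on the sequence with the repeat array removed keeps a long, compositionally biased minisatellite from dominating the statistics of its own host gene.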

RESULTS

We performed a genome-wide search for minisatellites in the C. glabrata genome, using the MREPS software (9), set to the same parameters used previously for the S. cerevisiae genome (3) (see Materials and methods section). Given that the two yeast genomes have similar sizes and nucleotide composition [12.1 Mb for S. cerevisiae, 12.3 Mb for C. glabrata; (8)], a similar number of minisatellites was expected. Instead, a total of 145 minisatellites in 112 protein-coding genes and 93 minisatellites in intergenic regions were found in C. glabrata, compared to 55 in genes and 11 in intergenic regions in S. cerevisiae. Minisatellites in C. glabrata show no obvious bias for specific chromosomal locations (Figure 1).
Figure 1.

Distribution of minisatellites in the C. glabrata genome. Each chromosome is represented by a horizontal line, from the left to the right telomere. Vertical short lines represent the 109 minisatellite-containing genes and pseudogenes. Each gene starts with CAGL0 followed by the chromosome letter (A–M) then by the gene five-digit number and a final ‘g’ (38). Only the five-digit number is given here (e.g. 01284 on chromosome A stands for CAGL0A01284g). Note that some minisatellites may cumulate several properties, i.e. being a compound minisatellite with a long motif, in which case it is both underlined and colored. Size of the two rDNA arrays is not precisely known.

Minisatellites are, on average, more GC-rich than the genes containing them, but no obvious GC skew was noted, in contrast to S. cerevisiae, where minisatellites show more cytosines than guanines on the coding strand. In intergenic regions, minisatellites are shorter and contain fewer repeat units than in coding regions, as in S. cerevisiae. All these data are summarized in Table 1.
Table 1.

Comparative distributions of minisatellites in the S. cerevisiae and C. glabrata genomes

Regions | Characteristics | S. cerevisiae (1) | C. glabrata
 | Genome size (Mb) | 12.1 | 12.3
 | Total number of minisatellites (2) | 66 | 238
Coding regions | Number of minisatellites | 55 | 145
 | Average GC% (3) | 44 | 43
 | Minisatellite GC% | 48 | 51
 | Minisatellite GC skew (4) | −0.11 | 0.00
Intergenic regions | Number of minisatellites | 11 | 93
 | Average GC% | 29–36 (5) | ND
 | Minisatellite GC% | 29 | 46

(1) From Richard and Dujon (3).

(2) Excluding the 18 Y′ subtelomeric minisatellites.

(3) Calculated on minisatellite-containing genes only. The average for the complete set of genes is 39% for S. cerevisiae and 41% for C. glabrata.

(4) Calculated as the difference between the minisatellite GC skew and the gene GC skew (excluding the minisatellite sequence).

(5) GC% in intergenic regions varies between promoter-convergent and promoter-divergent regions.

Unusual minisatellites scatter the C. glabrata genome

In addition to the presence of 93 ‘simple’ minisatellites similar in size and composition to those discovered in S. cerevisiae (Tables 2, 3, 4, 5, minisatellites numbered from #1 to #93), the C. glabrata genome contains two peculiar types of minisatellites. First, 15 minisatellites are made of two different motifs (or even three, in one case) intermingled with each other (Tables 3, 5, minisatellites numbered from #101 to #115). In each case, the two motifs have different sizes and are repeated a different number of times, with no regular period, as if two decks of playing cards were shuffled with each other (two examples are shown in Figure 2). In 10 cases out of these 15 ‘compound minisatellites’, the two motifs share a common sequence at their 5′-ends (L on Figure 2) but the 3′-ends (R on Figure 2) are different (5′- and 3′-ends are defined according to the coding DNA strand of the gene that contains the minisatellite) (Tables 3, 5).
Table 3.

Compound minisatellites in C. glabrata genes

MS# | Gene Name | MS | Size (nt) | S. cerevisiae homologue | MS in S.c. | Domain (1) | Motif (3)
101 | A01284g (2) | 5 × 27/3 × 30 | 135/90 | | | TM × 4 |
102 | B03685g | 4 × 15/3 × 18 | 60/54 | YCR004c (YCP4) or YDR032c (PST2) or YBR052c | | |
83 | E06666g (2) | 13 × 42 | 546 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 5; PRich × 7; PA14 |
201 | (EPA2) | 4 × 168 | 672 | | | | TTITL
103 | | 48 × 15/14 × 48 | 720/672 | | | |
84 | E06688g (2) | 10 × 42 | 420 | YKR102w (FLO10) | 3 × 81 | TM × 5; PRich × 5; PA14 |
104 | (EPA3) | 26 × 15/9 × 63 | 390/567 | | | |
85 | I10098g | 8 × 33 | 264 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 10; PA14; PRich × 4; CWP; Kelch |
105 | | 35 × 33/4 × 240 | 1155/912 | | | | -/SHITT-G
106 | J01727g | 47 × 15/13 × 24 | 705/312 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 8 |
86 | J01774g (2) | 5 × 24 | 120 | YKL112w (ABF1) | | Collagen |
107 | | 32 × 24/60 × 48 | 768/2880 | | | |
108 | | 6 × 33/2 × 258 | 198/420 | | | | -/SHITT-G
202 | J05170g (2) | 7 × 135 | 945 | | | | SHITT
203 | | 4 × 270 | 1080 | | | | SFFIT degen.
109 | | 20 × 12/3 × 168 | 240/504 | | | | -/SHITT
110 | J11924g (2) | 35 × 12/3 × 429 | 420/1287 | | | TM × 2; ASP | -/SFFIT degen.
87 | K00170g (2) | 12 × 15 | 180 | | | TM × 7; Dynein; GETHR × 9 |
111 | | 45 × 15/10 × 48 | 675/480 | | | |
204 | | 5 × 168 | 840 | | | | TTITL
88 | | 16 × 42 | 672 | | | |
89 | L00227g (2) | 39 × 39 | 1521 | | | TM × 1 |
112 | | 31 × 75/44 × 45/4 × 243 | 2325/1980/924 | | | | -/-/SHITT-G
90 | M00132g (2) | 17 × 24 | 408 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 1; PT × 2; Plakin |
113 | (EPA12) | 31 × 12/4 × 51 | 372/204 | | | |

C. g. gene names are abbreviated, only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or involved in cell wall metabolism (see text). Please refer to Table 2 for cues (1–2).

(3) Only long motifs are indicated (see text for details).

Table 4.

Megasatellites in C. glabrata genes

MS # | Gene Name | MS | Size (nt) | S. cerevisiae homologue | MS in S.c. | Domain (1) | Motif (3)
205 | C00253g (2) | 6 × 300 | 1800 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 1 | SFFIT
206 | E00231g (2) | 5 × 135 | 675 | | | | SHITT
207 | | 3 × 300 | 900 | | | | SFFIT
208 | E01661g | 5 × 300 | 1500 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 1 | SFFIT
209 | G10219g (2) | 5 × 138 | 690 | YHR211w (FLO5) | 7 × 135; 3 × 21 | EGF × 1 | SHITT
210 | H02783g | 3 × 135 | 405 | YJL076w (NET1) | | | unknown (4)
211 | H10626g (2) | 3 × 135 | 405 | YAR050w (FLO1) | 10 × 135 | TM × 1 | SHITT
212 | I00220g (2) | 4 × 177 | 708 | YAR050w (FLO1) | 10 × 135 | TM × 1 | SHITT degen.
213 | I07293g (2) | 16 × 135 | 2160 | | | | known (5)
214 | I10147g (2) | 32 × 300 | 9600 | YHR211w (FLO5) | 7 × 135; 3 × 21 | PA14; β-lac.; NADP | SFFIT
215 | I10246g (2) | 5 × 300 | 1500 | | | TM × 1 | SFFIT
216 | I10340g (2) | 3 × 300 | 900 | YAL063c (FLO9) | 13 × 135 | TM × 1 | SFFIT
217 | I10362g (2) | 3 × 135 | 405 | | | PA14; Invasin | SHITT
218 | | 4 × 135 | 540 | | | | SHITT
219 | J01800g (2) | 27 × 12 | 324 | | | TM × 6; Ribo |
220 | | 4 × 135 | 540 | | | | unknown (4)
91 | K13024g (2) | 4 × 39 | 156 | YIR019c (FLO11) | 5 × 30; 5 × 36 | TM × 8 |
221 | | 5 × 132 | 660 | | | | SHITT
222 | L00157g (2) | 11 × 141 | 1551 | YAR050w (FLO1) | | β-lac.; GLNA | SHITT
223 | | 4 × 300 | 1200 | | | | SFFIT
224 | L09911g (2) | 5 × 300 | 1500 | | | | SFFIT
225 | L13310g (2) | 10 × 141 | 1410 | | | PA14 | SHITT
226 | (EPA11) | 7 × 300 | 2100 | | | | SFFIT
227 | L13332g (2) (EPA13) | 4 × 297 | 1188 | YAL063c (FLO9) | 13 × 135 | TM × 1 | SHITT-V
228 | L10092g (2) | 3 × 300 | 900 | | | TM × 1; β-lac. | SFFIT
229 | | 3 × 306 | 918 | | | | SHITT-V

C. g. gene names are abbreviated, only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or involved in cell wall metabolism (see text). Please refer to Tables 2 and 3 for cues (1–3).

(4) No occurrence of this motif was found in databases.

(5) Several occurrences of genes containing this motif were found in K. delphensis (see text).

Table 5.

Minisatellites in C. glabrata pseudogenes

MS# | Pseudogene Name | MS | Size (nt) | Coordinates (4) | Domain (1) | Motif (3)
230 | A04873g (2) | 4 × 300 | 1200 | 482 956–484 291 | GLNA | SFFIT
114 | | 20 × 141/10 × 309 | 2780/3090 | | | SHITT/SHITT-V
231 | B05093g (2) | 11 × 135 | 1485 | 499 712–501 364 | TM × 1 | SHITT
92 | C01067g (2) | 17 × 12 | 204 | 104 419–106 401 | TM × 9; PT × 2; ABC |
93 | | 29 × 12 | 348 | | |
232 | E00143g (2) | 6 × 141 | 846 | 4621–6420 | EGF × 2 | SHITT
233 | | 4 × 300 | 1200 | | | SFFIT
115 | F00110g (2) | 11 × 9/5 × 126 | 99/630 | 2275–2910 | TM × 2; EGF | -/SHITT
234 | H00132g (2) | 4 × 129 | 516 | 4229–4837 | TM × 2; Q6CXZ8 | SHITT
235 | I00110g (2) | 3 × 141 | 423 | 2407–5280 | TM × 2; α−β Hyd. | SHITT
236 | | 9 × 300 | 2700 | | | SFFIT
237 | I10200g (2) | 10 × 300 | 3000 | 992 434–998 401 | PA14; TM × 1 | SFFIT

C. g. gene names are abbreviated, only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or involved in cell wall metabolism (see text). Please refer to Tables 2 and 3 for cues (1–3).

(4) Coordinates of beginning and end of the pseudogene, in nucleotides.

Figure 2.

Two examples of compound minisatellites. Minisatellites are shown by color boxes, short motifs in yellow, long motifs in blue. Gray boxes represent partial 5′ and 3′ parts of gene coding sequences, along with gene names for which the first five characters have been omitted (see legend to Figure 1). Short motifs have been numbered from 1 to 10, motif 1 is used as the reference, and point mutations are shaded. Long motifs have been lettered from A to E, motif A is used as the reference, and point mutations are shaded. The 5′ part of each motif (L region) is common to both short and long motifs, whereas the 3′ part (R region) is different between short and long motifs. Duplicated blocks are numbered with roman numerals under each minisatellite. Note that for MS#111, the large duplicated block in the middle of the minisatellite contains several shorter internal duplications.

The second type of peculiar minisatellite consists of unusually long motifs (from 126 to 429 bp) repeated from 3 to 32 times. Thirty-seven such minisatellites with long motifs were detected (numbered from #201 to #237), in addition to seven that are part of a compound minisatellite, for a total of 44 minisatellites with long motifs. Five of them reach a total length >2 kb, the longest being 9.6 kb long (MS#214 in gene CAGL0I10147g), a length that no S. cerevisiae minisatellite reaches. Given the unusual size of the repeated motif, these tandem repeats were named ‘megasatellites’ (Tables 3, 4).
Both compound minisatellites and megasatellites are exclusively found in coding regions and their motif is always a multiple of three, raising the intriguing question of their formation in the genes containing them. Note that 12 out of 144 minisatellites are found in pseudogenes (Table 5), a much higher proportion than expected from random distribution, the C. glabrata genome containing ∼1% of pseudogenes, compared to active genes (I. Lafontaine and B. D., personal communication). In order to estimate the degree of polymorphism found in such large arrays, we analyzed megasatellite sizes, by Southern blot hybridization of DNA extracted from three C. glabrata strains, isolated from infected patients (Muller,H. et al., manuscript in preparation), and compared them to the same megasatellites in the sequenced strain (CBS138), used as a reference. For two megasatellites out of the three tested, we found polymorphism in at least one of the strains tested (data not shown). One strain shows a large increase in the MS#213 megasatellite size, corresponding to 7–8 additional 135-bp repeat units within gene CAGL0I07293g. In addition, gene CAGL0J05170g shows size increase in this strain, whereas in another strain it exhibits a size decrease. This gene contains three different megasatellites (MS#202, 203 and 109), and we did not determine which one is polymorphic (more than one may exhibit polymorphism). The last minisatellite tested (MS#224) did not show any clear polymorphism among the three strains tested as compared to the reference strain (data not shown). In addition, we also compared the size of minisatellites found in the EPA gene family, between the CBS138 reference strain and the BG2 strain. The EPA gene family is composed of at least 15 members in the BG2 strain. These genes encode surface glycoproteins involved in cell–cell adhesion and pathogenicity. Eight of them (EPA1 to EPA8) were sequenced in the BG2 strain (13,14). 
Among them, EPA7 and EPA8 do not contain minisatellites, the six other members containing simple minisatellites, as well as compound minisatellites and megasatellites. EPA4 and EPA5 were not found in the CBS138 strain, and EPA6 does not contain any minisatellite in this strain. We therefore focused on the three remaining members (EPA1 to EPA3), which contain minisatellites both in the CBS138 and in the BG2 strains. As shown in Figure 3, five out of six minisatellites found within these three genes exhibit polymorphism between the two strains sequenced. An additional megasatellite was detected in the EPA3 gene in the BG2 strain that was absent from the CBS138 strain. This suggests that this megasatellite was inserted or deleted since the separation of the two strains. Note that outside of the regions containing tandem repeats, the EPA3 genes in both strains show 99.8% identity at the nucleotide level. In a more specific analysis, Frieman and colleagues (15) showed that the number of repeat units of the 120 bp minisatellite found within the EPA1 gene varied from three to six (four repeat units are found in the CBS138 and in the BG2 strains) among a panel of 25 clinical isolates of C. glabrata. Additional experiments using PCR and Southern blot analyses to determine the size of minisatellites in EPA genes confirmed that they exhibit size polymorphism among four different strains of C. glabrata (Muller,H. et al., manuscript in preparation). We concluded that, like microsatellites and minisatellites in S. cerevisiae, several minisatellites exhibit size polymorphism in C. glabrata.
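
The pseudogene observation made earlier in this section (12 of 144 minisatellites located in pseudogenes, against roughly 1% of pseudogenes in the genome) can be checked with a simple binomial tail probability. This is only an illustrative sketch and not an analysis performed in the study: it assumes that each minisatellite falls independently in a pseudogene with probability 0.01, and the variable and function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_minisatellites = 144   # minisatellites considered in the text
n_in_pseudogenes = 12    # of which located in pseudogenes (Table 5)
p_pseudogene = 0.01      # assumption: ~1% chance per minisatellite

expected = n_minisatellites * p_pseudogene
fold_enrichment = n_in_pseudogenes / expected
p_value = binomial_tail(n_in_pseudogenes, n_minisatellites, p_pseudogene)

print(f"expected in pseudogenes: {expected:.2f}")
print(f"fold enrichment: {fold_enrichment:.1f}")
print(f"P(X >= {n_in_pseudogenes}): {p_value:.2e}")
```

Under these assumptions fewer than two minisatellites would be expected in pseudogenes, so the observed count corresponds to an enrichment of about eightfold. The independence assumption is a simplification, since several of these arrays sit in the same pseudogene.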
Figure 3.

Comparison of minisatellites in EPA genes in two different C. glabrata strains. (A) Schematic representation of the EPA1, EPA2 and EPA3 genes, located on the right subtelomeric region of chromosome VI. Note that gene order and organization are identical in both the BG2 and the CBS138 strains. (B) Minisatellites in the three EPA genes show size polymorphism. DNA self-matrix of EPA1, EPA2 and EPA3 are shown for each of the two strains studied (BG2 and CBS138). Gene names are indicated in the right upper corner of each matrix. Number and size of each motif are shown next to each minisatellite. Note the additional compound minisatellite in EPA3 in the BG2 strain. The smaller repeats (2 × 24 bp and 2 × 15 bp), not detected in the CBS138 strain due to the parameters chosen for the program (see Materials and methods section), are slightly expanded in the BG2 strain.

Proteins encoded by megasatellite-containing genes

There are 44 megasatellites, located in 33 different genes. Sixteen of these 44 megasatellites share a common motif, which was called the SFFIT motif. It is conserved in all cases except two, in which it is slightly degenerate (MS#203 and MS#110; Tables 3, 4 and 5, and Figure 4). This 100 amino acid SFFIT motif is conserved in 37 proteins in Kluyveromyces delphensis, a hemiascomycetous yeast closely related to C. glabrata. In these proteins, it is tandemly repeated, as in C. glabrata. This protein motif is threonine rich (20%), but also contains numerous serine (9%) and proline (8%) residues.
Figure 4.

Alignments of megasatellite motifs. The first motif of each megasatellite was aligned using ClustalW. The signature motif in each family (SFFIT, SHITT, TTITL) is shown in a light gray box to the left. The megasatellite number (MS#) is indicated to the left of the sequence, followed by the number of repeat motifs within the megasatellite (in parentheses). The central part of the SHITT motif in which insertions occur (see text) is indicated by a dark gray box.

Megasatellites in C. glabrata genes

C. g. gene names are abbreviated: only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or of proteins involved in cell wall metabolism (see text). Please refer to Tables 2 and 3 for cues (1–3). (4) No occurrence of this motif was found in databases. (5) Several occurrences of genes containing this motif were found in K. delphensis (see text).

Minisatellites in C. glabrata pseudogenes

C. g. gene names are abbreviated: only the chromosome letter and the five-digit number are given. Underlined C. g. names bear the signature of cell wall components or of proteins involved in cell wall metabolism (see text). Please refer to Tables 2 and 3 for cues (1–3). (4) Coordinates of beginning and end of the pseudogene, in nucleotides.

We identified a second long motif, contained in 16 megasatellites, that was called the SHITT motif. It is 45 amino acids long, and rich in threonine (28%), serine (16%) and valine (14%) residues. In one megasatellite (MS#212), it is slightly degenerate, and in another one (MS#109b) it contains a small insertion of nine amino acids (Figure 4). Two variant forms of the SHITT motif are also found: one corresponds to an insertion of 60 amino acids in the middle of the motif, and the other to an insertion of 30 amino acids at the same position.
The first one was called SHITT-V (three occurrences: MS#114b, #227 and #229), and is rich in threonine (20%) and valine (16%) residues, and the second one was called SHITT-G (three occurrences: MS#105b, #108b and #112c; Figure 4), and is rich in glycine (48%) and serine (23%) residues. Seven occurrences of the SHITT motif were detected in K. delphensis, tandemly repeated in proteins, but no other match was found in protein databases, suggesting that this motif, like the SFFIT motif, is specific to species closely related to C. glabrata.

Another long motif was found in two megasatellites (MS#201 and #204). It was called the TTITL motif, and is rich in threonine (26%), serine (16%) and aspartic acid, glycine and proline residues (9% each). It is found in two genes (EPA2 and CAGL0K00170g) in the completely sequenced CBS138 strain of C. glabrata and, in addition, in two other genes (EPA4 and EPA5) in the BG2 strain of C. glabrata (13,14). All these proteins are involved in cell–cell adhesion. Finally, three megasatellites contain a motif that is unique in C. glabrata (MS#210, #213, #220), one of them (MS#213) being homologous to a motif found in five proteins in K. delphensis, in which it is tandemly repeated from two to eight times.

Proteins containing the SFFIT or any of the three SHITT motifs often show weak matches with the flocculin family of proteins in S. cerevisiae [FLO/STA superfamily; (16)], involved in flocculation and cell adhesion, but this similarity is restricted to the serine-rich repeated region. We subsequently used the InterProScan software (see Materials and methods section) to look for possible conserved domains within genes that contain megasatellites. Several of the corresponding proteins contain putative transmembrane spans (TM in Tables 3, 4 and 5), the signature of membrane-anchored proteins (17). Another frequently encountered domain is the PA14 domain, found in bacterial toxins, glucosidases and adhesins (18), and shown to be involved in carbohydrate binding, both in C. glabrata (19) and in S. cerevisiae (20), making proteins containing this domain good candidates to interact with membrane glycoproteins. In S. cerevisiae, the PA14 domain is found in four proteins encoded by minisatellite-containing genes, belonging to the family of flocculins and directly involved in cell–cell adhesion: FLO1, FLO5, FLO9 and FLO10. The PRich domain is found in highly glycosylated proline-rich cell wall proteins in plants, and is probably involved in interactions with cell wall carbohydrates (21). It is present in two proteins encoded by genes that contain compound minisatellites (CAGL0E06666g/EPA2 and CAGL0I10098g; Table 3). EPA2 encodes an adhesin responsible for cell–cell adhesion in C. glabrata (13,14). Altogether, 10 proteins encoded by megasatellite-containing genes, out of 33, show the signature of membrane proteins, many of them probably involved in interactions with glycoproteins.

InterProScan was also used to find structural domains in proteins that are not encoded by megasatellite-containing genes. Out of 79 such genes (Table 2), 12 contain at least three transmembrane spans and are therefore good candidates to be membrane proteins. Altogether, ∼30–40% of minisatellite-containing genes are suspected to encode either cell wall components or proteins involved in cell wall formation, a much higher figure than expected if minisatellites were randomly distributed among C. glabrata genes.

SHITT motifs are well conserved among the different minisatellites in the N- and C-terminal parts of the motif, but insertions are found in the central region (Figure 4). The SHITT-G and SHITT-V motifs correspond to insertions of 90 and 180 nt (30 and 60 amino acids), respectively. At the DNA level, SHITT motifs are split into two parts: the sequence corresponding to the N-terminal region is very rich in cytosines (GC skew: −0.7) and adenosines (AT skew: +0.4), whereas the sequence encoding the C-terminal part does not show such biases.
The central region, in which minisatellite insertions occur (MS#105b, 108b, 112c, 114b, 227 and 229), is also rich in cytosines and adenosines. This observation suggests that negative GC skews (and to a lesser extent positive AT skews) are a determinant favoring the insertion of new DNA sequences, a conclusion that was also reached for S. cerevisiae minisatellites (3). In comparison, the SFFIT motif does not exhibit any particular sequence bias, whereas the TTITL motif is almost as skewed as the SHITT motif (GC skew: −0.5, AT skew: +0.3).

The remaining 93 minisatellites contain shorter motifs (up to 120 nt) that do not belong to any of the families described above. The global amino acid composition of the peptides encoded by these 93 minisatellites is given in Table 7. The most common amino acid found in such repeats is serine, followed by glycine, proline and asparagine. This is quite different from the motif composition of S. cerevisiae minisatellites, in which serine and threonine residues are the most frequent amino acids encountered, as in the C. glabrata SFFIT and SHITT megasatellites. In S. cerevisiae, serine- and threonine-rich repeats are thought to be the sites of O-glycosylation of cell wall proteins by the Pmt4 protein (22,23). It is therefore possible that in C. glabrata, proteins containing long-motif minisatellites are targets of similar posttranslational modifications and play a role at the cell wall surface, whereas short-motif minisatellites are involved in a variety of other cellular processes.
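The skew values quoted above can be illustrated with a short sketch. The formula is not given in this section, so the code assumes the usual definitions, GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T), which agree with the signs reported here (cytosine-rich sequences give a negative GC skew, adenine-rich sequences a positive AT skew). The function names and the example sequence are hypothetical.

```python
def base_skew(seq, plus, minus):
    """Return (n_plus - n_minus) / (n_plus + n_minus) on one DNA strand,
    or 0.0 when neither base is present."""
    seq = seq.upper()
    n_plus = seq.count(plus)
    n_minus = seq.count(minus)
    total = n_plus + n_minus
    if total == 0:
        return 0.0
    return (n_plus - n_minus) / total

def gc_skew(seq):
    """Assumed definition: (G - C) / (G + C)."""
    return base_skew(seq, "G", "C")

def at_skew(seq):
    """Assumed definition: (A - T) / (A + T)."""
    return base_skew(seq, "A", "T")

# Hypothetical cytosine- and adenine-rich stretch (not a real SHITT sequence).
example = "CCACCAACCTCCAGCACCAT"
print(round(gc_skew(example), 2), round(at_skew(example), 2))
```

Both skews depend on which strand is read, so values are only comparable between motifs when all sequences are taken on the coding strand.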
Table 7.

Amino acids encoded by minisatellites and megasatellites

Minisatellites                       Megasatellites
Amino acid       AA   n (%)          Amino acid       AA   n (%)
Serine           S    27.8           Threonine        T    20.5
Glycine          G    18.0           Serine           S    9.9
Proline          P    11.8           Aspartic acid    D    7.8
Asparagine       N    10.0           Valine           V    7.3
Alanine          A    6.8            Glycine          G    6.8
Threonine        T    3.8            Proline          P    6.7
Glutamic acid    E    3.8            Isoleucine       I    6.4
Valine           V    3.5            Glutamic acid    E    5.8
Aspartic acid    D    3.3            Alanine          A    4.7
Lysine           K    2.9            Asparagine       N    3.9
Glutamine        Q    2.6            Tyrosine         Y    3.6
Methionine       M    1.7            Leucine          L    3.4
Isoleucine       I    1.0            Phenylalanine    F    3.3
Leucine          L    <1.0           Lysine           K    3.3
Arginine         R    <1.0           Histidine        H    2.4
Histidine        H    <1.0           Arginine         R    1.6
Tyrosine         Y    <1.0           Tryptophan       W    1.1
Phenylalanine    F    <1.0           Glutamine        Q    <1.0
Cysteine         C    <1.0           Methionine       M    <1.0
Tryptophan       W    <1.0           Cysteine         C    <1.0
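Percentages such as those in Table 7 can be obtained by counting residues in the translated repeats. The sketch below shows one way to do this for a single peptide; how the authors pooled motifs and repeat units is not stated here, so the function name and the toy peptide are assumptions for illustration.

```python
from collections import Counter

def aa_composition(peptide):
    """Return {one-letter code: percentage}, most frequent residue first."""
    peptide = peptide.upper()
    if not peptide:
        return {}
    counts = Counter(peptide)
    total = len(peptide)
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}

# Hypothetical threonine- and serine-rich peptide (not a real motif).
for aa, pct in aa_composition("TTSTVTPSTT").items():
    print(aa, round(pct, 1))
```

For a whole class of repeats, the same counting would be applied to the concatenation of all translated motifs.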

DISCUSSION

In the present work, we analyzed the distribution and composition of all minisatellites detected in the genome of the pathogenic yeast C. glabrata. Although similar in size to that of S. cerevisiae, the genome of C. glabrata exhibits a much larger number of minisatellites. The human genome was estimated to contain approximately 6000 minisatellites (≈2 minisatellites/Mb of sequence), whereas 6 and 7 minisatellites/Mb were found in Arabidopsis thaliana and in Caenorhabditis elegans, respectively (2,24). Similar figures were found in S. cerevisiae [9 minisatellites/Mb; (3)], but a larger number of minisatellites was found in the present study of the C. glabrata genome (19 minisatellites/Mb). Of particular interest are two new types of minisatellites absent in S. cerevisiae: compound minisatellites, containing two different intermingled motifs, and megasatellites with long motifs (126–429 bp) that can be tandemly repeated up to 32 times. The latter are often encountered in genes whose products show signatures of cell wall proteins (Tables 3, 4 and 5). In contrast to microsatellites, which have been the subject of numerous studies in all sequenced organisms, there are very few reports in the literature on minisatellite distribution in eukaryotic genomes. The genome of Tetraodon nigroviridis, extensively examined in search of such elements (25), revealed that minisatellites cover only 0.41% of the total sequence, compared to 0.7% in C. glabrata. In T. nigroviridis, minisatellites are mainly located in two regions: a subtelocentric minisatellite (10 bp highly polymorphic motif) hybridizing on the short arm of 10 out of 11 subtelocentric chromosomes, and a minisatellite with a 118 bp repeated motif, found at all centromeres. Except for these two minisatellites, found in very large arrays in the tetraodon genome, no minisatellite with a repeat motif size >200 bp was detected, nor any kind of tandem repeat resembling compound minisatellites or megasatellites.
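The densities compared above are simple ratios of a minisatellite count to a genome size. As a small worked example, the human figure can be reproduced with a round genome size of 3000 Mb, which is an assumption implied by the two numbers quoted (approximately 6000 minisatellites and ≈2 per Mb) rather than a value given in the text.

```python
def density_per_mb(n_repeats, genome_size_mb):
    """Number of tandem repeats per megabase of sequence."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return n_repeats / genome_size_mb

# 6000 minisatellites over an assumed 3000 Mb genome.
human_density = density_per_mb(6000, 3000)
print(human_density)  # 2.0

# Fold difference between the densities quoted for C. glabrata (19/Mb)
# and S. cerevisiae (9/Mb).
print(round(19 / 9, 1))  # 2.1
```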

Possible origin of C. glabrata minisatellites

One intriguing question is the origin of the numerous C. glabrata minisatellites. Are they created de novo, or are they propagated when the genes that contain them are duplicated? We classified minisatellites into families, based on their motif length and sequence. In total, 109 different motifs are found in 117 simple minisatellites (Table 6, top), showing that, most of the time, each motif is unique. Therefore, most minisatellites do not propagate through duplication of a minisatellite-containing gene, but are probably created de novo in existing genes.
Table 6.

Size distribution of C. glabrata minisatellites and megasatellites

Minisatellites
Motif size          9  12  15  18  21  24  27  30  33  39  42  45  48  51  54  57  63  75  108  120
Nb of occurrences   6  41  19  17   2   5   3   3   3   3   3   2   3   1   1   1   1   1    1    1
Nb of families      5  38  18  15   2   5   3   3   3   3   2   2   3   1   1   1   1   1    1    1
Megasatellites
Motif size126129132135138141168177240243258270297300306309429
Nb of occurrences111915311111114111
Nb of families1114411221111111111
Family (1)HHHH (2)HHH (3)HGGGSVSVVS

(1) Megasatellite family: H, SHITT; G, SHITT-G; V, SHITT-V; S, SFFIT; T, TTITL.

(2) Four families, including one SHITT (H) and three other unrelated families

(3) Two families, including one SHITT (H) and one TTITL (T)
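The totals quoted in the text (109 different motifs in 117 simple minisatellites) can be checked against the minisatellite rows of Table 6. In the sketch below, the per-column values are a reading of the flattened table rows, split so that there is one value per motif size; this split is an assumption, supported by the fact that it reproduces both totals and never gives more families than occurrences.

```python
# Minisatellite half of Table 6 (one value per motif size, assumed split).
motif_sizes = [9, 12, 15, 18, 21, 24, 27, 30, 33, 39,
               42, 45, 48, 51, 54, 57, 63, 75, 108, 120]
occurrences = [6, 41, 19, 17, 2, 5, 3, 3, 3, 3,
               3, 2, 3, 1, 1, 1, 1, 1, 1, 1]
families = [5, 38, 18, 15, 2, 5, 3, 3, 3, 3,
            2, 2, 3, 1, 1, 1, 1, 1, 1, 1]

total_minisatellites = sum(occurrences)
total_families = sum(families)
print(total_minisatellites, total_families)  # 117 109

# Motif sizes at which at least two minisatellites share a family.
shared = [size for size, n, f in zip(motif_sizes, occurrences, families) if n > f]
print(shared)  # [9, 12, 15, 18, 42]
```

Under this reading, only 8 of the 117 minisatellites (117 minus 109) share their family with another minisatellite, which is the basis for the statement that most motifs are unique.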

Megasatellites can be classified into defined families, even though their motif size exhibits some variation (Table 6, bottom). We compared ten genes containing SFFIT megasatellites with each other, and found that only two of them (CAGL0L00157g and CAGL0E00231g) are similar at their 3′-end (35% identity at the nucleotide level), and are therefore most probably paralogues. The remaining genes do not show any significant similarity (besides the SFFIT motif itself), suggesting that these megasatellites are also, most of the time, created de novo in genes.

It was previously proposed that minisatellites result from replication slippage between two short DNA sequences located downstream and upstream of a central element (26). Almost all S. cerevisiae minisatellites exhibit such short repeated DNA sequences, consistent with this model (3). However, in C. glabrata, only half of the simple minisatellites show such short repeats upstream and downstream of the minisatellite. When present, their mean size is 5 ± 0.8 nt, very similar to what was observed in S. cerevisiae (5 ± 0.4 nt). The absence of such repeats in so many minisatellites in C. glabrata suggests that an additional mechanism may exist to create minisatellites in C. glabrata, or that these short repeats were subsequently erased by mutational decay in this yeast species.
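The search for short repeats flanking a minisatellite can be sketched as follows. The detection procedure used in the study is not described in this section, so the sketch makes several assumptions: the two flanks are supplied as short windows of sequence, only exact matches count, and the function name and minimum length are hypothetical.

```python
def longest_shared_flank_repeat(upstream, downstream, min_len=3):
    """Return the longest exact word (at least min_len nt) present in both
    flanking sequences, or an empty string if there is none."""
    upstream = upstream.upper()
    downstream = downstream.upper()
    best = ""
    for i in range(len(upstream)):
        # Only try words longer than the best one found so far.
        first_end = i + max(min_len, len(best) + 1)
        for j in range(first_end, len(upstream) + 1):
            word = upstream[i:j]
            if word in downstream:
                best = word
            else:
                break  # a longer word starting at i cannot match either
    return best

# Hypothetical flanks sharing a 5-nt direct repeat.
print(longest_shared_flank_repeat("AAACGTGACCC", "TTTCGTGATTT"))  # CGTGA
```

Short words are expected to match by chance when the flank windows are long, so a real analysis would also need to fix the window size and assess significance.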

Evolution of C. glabrata minisatellites

In the present study, only 15 of the 65 (23%) S. cerevisiae homologs of the C. glabrata genes containing simple minisatellites also contain a minisatellite in S. cerevisiae (Table 2). It was previously reported (3) that out of 24 minisatellite-containing S. cerevisiae genes, only six (25%) also contain a minisatellite in C. glabrata, a proportion similar to what was found in the present study. Hence, minisatellites evolve faster than the genes containing them. It is interesting to note that among the 53 S. cerevisiae homologs that do not contain a minisatellite (Table 2), only six encode products that probably play a role in cell wall metabolism (YLR194c, OSW2, SAG1, DSE1, BNI1 and KRE1). The others exhibit various functions, in all the known cellular compartments. This suggests that minisatellites in C. glabrata are found in a much wider variety of genes than in S. cerevisiae, in which they are mostly found in cell wall genes (3). This could be due to a higher flexibility of the C. glabrata genes to accommodate such tandem repeats, and underlines the fact that minisatellites may have a function in other genes besides cell wall genes.

The insertion of internal motifs into the SHITT motif itself (Figure 4) can be explained by two models that are not mutually exclusive (Figure 5). In the first model, a pre-existing gene already containing a minisatellite is modified by the insertion of a second motif into one of the previous motifs; the new motif is subsequently either lost or propagated by intra-allelic gene conversion or replication slippage (Figure 5A) [for an in-depth review on gene conversion, see ref. (27)]. The alternative hypothesis supposes that a gene contains a nonrepeated motif. A new amino acid motif is inserted into it, and is subsequently amplified to give rise to a minisatellite (Figure 5B). Note that both hypotheses postulate that a short DNA sequence has the propensity to ‘jump’ into another DNA sequence, a property reminiscent of transposable elements (2,28).
The same model may be used to explain the presence of the compound minisatellites, showing an irregular alternation of two different motifs (Figure 2), which would result from intermediate steps before complete homogenization of the minisatellite (Figure 5A, bottom).
Figure 5.

Insertion of a new motif within a minisatellite: two possible models. The motif may target a pre-existing minisatellite, and subsequently spread by intragenic gene conversion (A). Alternatively, the same motif may target a gene that does not contain a minisatellite, and is afterwards expanded in a minisatellite (B). Note that both models are not mutually exclusive, but only model A may lead to compound minisatellites.

In S. cerevisiae, the CUP1 locus is amplified under selection pressure, by intra-allelic gene conversion and unequal crossing-over between tandem repeats of the CUP1 gene (29). Similarly, human minisatellites CEB1 and MS32 show high levels of inter- and intra-allelic gene conversions during meiosis and mitosis in S. cerevisiae, leading to complex reshuffling of repeat order and composition (30–32). Such mechanisms also operate on human minisatellites, both during meiosis (33) and mitosis (34), and are probably also active in C. glabrata. Given that its genome contains significantly more unusual minisatellites than the S. cerevisiae genome, one can hypothesize that replication and/or recombination machineries have slightly different properties in each yeast species. In silico comparisons of the gene content of several hemiascomycetous yeasts showed that both replication and recombination machineries are very well conserved between C. glabrata and S. cerevisiae, exhibiting very few differences (35). However, the few differences found (like the presence of two copies of the TOP1 gene and an extra truncated copy of the SGS1 helicase in C. glabrata) might point to some specific properties of replication and/or recombination of the C. glabrata genome that may explain the numerous peculiar minisatellites found there.

In a very recent analysis, Muller and colleagues (Muller,H. et al., manuscript in preparation) showed that two deletions in two C. glabrata strains isolated from infected patients (F11017Blo1 and F15035Blo1) were located in close proximity to three megasatellites (MS#228/229 and MS#214, the largest megasatellite in the genome), suggesting that megasatellites may behave as fragile sites. Fragile sites are natural sites of chromosomal breakage in humans (36) and in yeast (37). It is therefore possible that, owing to the large repeated nature of megasatellites, spontaneous breakage occurs during DNA replication at or near the megasatellite, giving rise to deletions around it (2).

FUNDING

Agence Nationale de la Recherche (ANR-05-BLAN-0331). Funding for open access charge: Agence Nationale de la Recherche. Conflict of interest statement. None declared.
  37 in total

Review 1.  Minisatellites: mutability and genome architecture.

Authors:  G Vergnaud; F Denoeud
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

2.  O-mannosylation precedes and potentially controls the N-glycosylation of a yeast cell wall glycoprotein.

Authors:  Margit Ecker; Vladimir Mrsa; Ilja Hagen; Rainer Deutzmann; Sabine Strahl; Widmar Tanner
Journal:  EMBO Rep       Date:  2003-06       Impact factor: 8.807

3.  An AT-rich sequence in human common fragile site FRA16D causes fork stalling and chromosome breakage in S. cerevisiae.

Authors:  Haihua Zhang; Catherine H Freudenreich
Journal:  Mol Cell       Date:  2007-08-03       Impact factor: 17.970

4.  Intragenic tandem repeats generate functional variability.

Authors:  Kevin J Verstrepen; An Jansen; Fran Lewitter; Gerald R Fink
Journal:  Nat Genet       Date:  2005-08-07       Impact factor: 38.330

Review 5.  A unified classification system for eukaryotic transposable elements.

Authors:  Thomas Wicker; François Sabot; Aurélie Hua-Van; Jeffrey L Bennetzen; Pierre Capy; Boulos Chalhoub; Andrew Flavell; Philippe Leroy; Michele Morgante; Olivier Panaud; Etienne Paux; Phillip SanMiguel; Alan H Schulman
Journal:  Nat Rev Genet       Date:  2007-12       Impact factor: 53.242

Review 6.  Fragile sites and human disease.

Authors:  Kim Debacker; R Frank Kooy
Journal:  Hum Mol Genet       Date:  2007-06-13       Impact factor: 6.150

7.  Modular domain structure in the Candida glabrata adhesin Epa1p, a beta1,6 glucan-cross-linked cell wall protein.

Authors:  Matthew B Frieman; J Michael McCaffery; Brendan P Cormack
Journal:  Mol Microbiol       Date:  2002-10       Impact factor: 3.501

Review 8.  Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae.

Authors:  F Pâques; J E Haber
Journal:  Microbiol Mol Biol Rev       Date:  1999-06       Impact factor: 11.056

9.  Analysis of microsatellites in 13 hemiascomycetous yeast species: mechanisms involved in genome dynamics.

Authors:  Alain Malpertuy; Bernard Dujon; Guy-Franck Richard
Journal:  J Mol Evol       Date:  2003-06       Impact factor: 2.395

10.  Region of FLO1 proteins responsible for sugar recognition.

Authors:  O Kobayashi; N Hayashi; R Kuroki; H Sone
Journal:  J Bacteriol       Date:  1998-12       Impact factor: 3.490
