Roman Matyášek, Alena Kuderová, Eva Kutílková, Marek Kučera, Aleš Kovařík.
Abstract
The intergenic spacer (IGS) of rDNA is frequently built of long blocks of tandem repeats. To estimate the intragenomic variability of such knotty regions, we employed PacBio sequencing of the Cucurbita moschata genome, in which thousands of rDNA copies are distributed across a number of loci. The rRNA coding regions are highly conserved, indicating intensive interlocus homogenization and/or high selection pressure. However, the IGS exhibits high intragenomic structural diversity. Two repeated blocks, R1 (300-1250 bp) and R2 (290-643 bp), account for most of the IGS variation. They exhibit minisatellite-like features built of multiple periodically spaced short GC-rich sequence motifs with the potential to adopt non-canonical DNA conformations, G-quadruplex-folded and left-handed Z-DNA. The mutual arrangement of these motifs can be used to classify IGS variants into five structural families. Subtle polymorphisms exist within each family due to a variable number of repeats, suggesting the coexistence of an enormous number of IGS variants. The substantial length and structural heterogeneity of IGS minisatellites suggest that the tempo of their divergence exceeds the tempo of the homogenization of rDNA arrays. Because such minisatellites occur frequently among plants, we hypothesize that their instability may influence transcription regulation and/or destabilize rDNA units, possibly spreading them across the genome.
Keywords: Cucurbita moschata; DNA-minisatellite; intragenomic structural heterogeneity; non-canonical DNA conformations; ribosomal DNA intergenic spacer
Year: 2019 PMID: 31231763 PMCID: PMC6589552 DOI: 10.1093/dnares/dsz008
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. FISH on metaphase chromosomes of C. pepo. Four very strong and nine significantly weaker signals for 35S rDNA (green; panels A and D), complemented by two strong signals for 5S rDNA (red; panels B and D), were detected. Chromosomes were counterstained with DAPI (C). So that all the very small chromosomes, which are hard to detect in colour, could be counted, chromosomes were highlighted in inverted black-and-white format (panels E and F). Two metaphase preparations are shown; panel E corresponds to panels A–D. The chromosomes that harbour 35S rDNA are marked by arrowheads. Panel F shows a metaphase preparation with all chromosomes well separated, counted and numbered from 1 to 41, approximately in order of relative size. Note that the odd total number of chromosomes suggests aneuploidy in this metaphase, which agrees with the odd number of rDNA signals in panels A, D and E, although these show a different metaphase. Scale bars (A–F) = 10 µm.
Figure 2. Repetitive organization of the IGS in the pumpkin 35S rDNA unit. (A) Three regions (R1–R3), each built of tandemly arranged repetitive units, are separated by two mutually highly homologous sequences (Ps and Pg), as demonstrated by the dot-plot diagram constructed from the consensus sequence derived from mapped ROIs. Motifs related to TISs are denoted by arrows. (B) The frequency of abundant (>10%) SNPs plotted along the rDNA unit (100-bp window) shows that the R1 and R2 regions are the most variable parts of the rDNA unit. In contrast, all three genes are highly conserved and separated by the moderately variable ITS1 and ITS2. The pattern of all abundant SNPs along the source consensus sequence is shown in Supplementary Fig. S4. Note that the highest variability at repeats R1 and R2 may be artificially overestimated owing to possible incorrect mapping of the corresponding ROIs with variable copy numbers of repeats.
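The windowed SNP frequency described for panel B can be illustrated with a minimal sketch. It assumes non-overlapping windows and 1-based SNP coordinates; the caption does not say whether the original windows overlap, and the function name `snp_density` and the toy positions are hypothetical.

```python
from collections import Counter

def snp_density(snp_positions, unit_length, window=100):
    """Count SNPs per non-overlapping window along a unit of the given length.

    Assumption: positions are 1-based; positions outside the unit are ignored.
    """
    n_windows = -(-unit_length // window)  # ceiling division
    counts = Counter(
        (pos - 1) // window
        for pos in snp_positions
        if 1 <= pos <= unit_length
    )
    return [counts.get(i, 0) for i in range(n_windows)]

# Hypothetical toy data, not taken from the paper.
toy_positions = [5, 17, 99, 100, 101, 250, 251, 252, 399]
print(snp_density(toy_positions, unit_length=400))
```

Windows with unusually high counts would correspond to the variable extremes the caption attributes to R1 and R2, subject to the mapping caveat stated there.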
Figure 3. The multimodal distribution of size variants in the R2 repeated region. Three independent methods were used to evaluate the size variability: (A) Distance between two boundary unique sequences (see Section 2.3). (B) Overall size of all tandem repeats detected in the R2 region by Tandem Repeats Finder. (C) Size of the uninterrupted region with GC content higher than 75%. Significant correlations between the results obtained using these three approaches are demonstrated in panels (D–F).
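One plausible reading of method (C) is sketched below: GC content is computed in a sliding window, and the longest run of consecutive windows above the threshold is reported as the region size. The window length of 20 bp and the function name `longest_gc_rich_region` are assumptions for illustration; the caption does not specify how the original analysis was parameterized.

```python
def longest_gc_rich_region(seq, window=20, threshold=0.75):
    """Length (bp) of the longest stretch covered by consecutive sliding
    windows whose GC fraction exceeds `threshold`. Returns 0 if none."""
    seq = seq.upper()
    if len(seq) < window:
        return 0
    is_gc = [1 if base in "GC" else 0 for base in seq]
    gc = sum(is_gc[:window])  # GC count of the first window
    best = run = 0
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one base: add the entering base, drop the leaving one.
            gc += is_gc[start + window - 1] - is_gc[start - 1]
        if gc / window > threshold:
            run += 1
            best = max(best, run)
        else:
            run = 0
    # A run of n consecutive window starts spans n + window - 1 bases.
    return best + window - 1 if best else 0

# Hypothetical toy sequence: a 60-bp GC block flanked by AT-rich DNA.
toy = "AT" * 20 + "GC" * 30 + "AT" * 20
print(longest_gc_rich_region(toy))
```

Because windows that straddle the block boundary can still pass the threshold, a windowed estimate of this kind slightly exceeds the true block length, which is one reason to compare it against independent size estimates.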
Figure 4. Southern blot analyses support size polymorphisms in the R2 (and R1) regions on a genome-wide scale. (A) Target(s) for IGS hybridization probe(s) are shown on the background of a restriction map for NdeI (N), HpaI (H) and DraI (D) constructed from the consensus sequence for the corresponding IGS regions (Supplementary Fig. S4). The restricted DNA fragments that are expected to hybridize with the probe(s) are drawn below. (B) Southern hybridization of IGS probe(s) to genomic DNA digested with the indicated restriction enzymes reveals multiple, rather diffuse bands. For better resolution, only short (<2 kb) fragments are shown (long fragments are shown in Supplementary Fig. S10). (C) Densitometric evaluations of hybridization profiles. Note that NdeI digests show size variability in R2 (Bands 1–2), whereas double digestion with NdeI and HpaI shows size variability in both R1 (Bands 3–4) and R2 (Bands 1–2) regions. (D) Longer exposure of NdeI digests highlighted structural variability (asterisk) across three plants selected within a single pumpkin cultivar. The plant marked by number 1 was sequenced and analysed in panel (B).
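The expected fragments in panel A come from a restriction map of the consensus sequence. A minimal in silico digest of a linear sequence can be sketched as follows, using the standard recognition sites of the three enzymes (NdeI CA^TATG, HpaI GTT^AAC, DraI TTT^AAA). The function name `digest` and the toy sequence are hypothetical, and the sketch ignores methylation sensitivity and partial digestion.

```python
# Recognition site and cut offset (bases from the site start, top strand).
SITES = {
    "NdeI": ("CATATG", 2),  # CA^TATG
    "HpaI": ("GTTAAC", 3),  # GTT^AAC (blunt)
    "DraI": ("TTTAAA", 3),  # TTT^AAA (blunt)
}

def digest(seq, enzymes):
    """Return fragment lengths (5' to 3') after cutting a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        pos = seq.find(site)
        while pos != -1:
            cuts.add(pos + offset)
            pos = seq.find(site, pos + 1)  # step by 1 to catch overlaps
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical toy sequence with one NdeI and one HpaI site.
toy = "A" * 10 + "CATATG" + "C" * 20 + "GTTAAC" + "G" * 10
print(digest(toy, ["NdeI"]))
print(digest(toy, ["NdeI", "HpaI"]))
```

All three sites are palindromic, so scanning one strand is sufficient. Applied to IGS variants that differ in repeat copy number, such a digest would predict fragments of different sizes between fixed flanking sites.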
Figure 5. Five prominent repeated families coexist in R2 regions in pumpkin IGSs. (A) Each family (A–E) is characterized by a specific arrangement of two kinds of short sequence motifs, pZ and pG4. Individual abundant pZ- and pG4-variants are distinguished by lowercase letters and numbered capital letters, respectively, within corresponding rectangles and specified in Supplementary Tables S3 and S4, respectively. C3/C4 means that two pG4-motifs (C3 and C4) overlap. Three prominent pG4 variants that differ in the length of their G tracts (G2–G4) are considered. A particular family may consist of several size variants (subfamilies) with a distinct number of basic tandem repeats (arrows). Each subfamily is represented by the most abundant distribution of pG4-motifs. (B) The relative abundance of individual subfamilies (left) and families (right) in the genome is estimated from the number of corresponding ROIs.
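To make the two motif classes concrete, the sketch below scans a sequence with simple regular expressions: alternating GC/CG runs as candidate Z-DNA-forming (pZ) motifs, and four G-tracts separated by short loops as candidate G-quadruplex (pG4) motifs. This is a generic pattern search, not the procedure used in the paper; the thresholds (`min_repeats`, `min_tract`, `max_loop`) and function names are assumptions chosen for illustration.

```python
import re

def find_pz(seq, min_repeats=5):
    """Find alternating G/C runs such as (GC)6 or (CG)5C."""
    pattern = re.compile(
        r"(?:GC){%d,}G?|(?:CG){%d,}C?" % (min_repeats, min_repeats)
    )
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]

def find_pg4(seq, min_tract=2, max_loop=7):
    """Find four G-tracts (>= min_tract Gs) joined by loops of 1..max_loop nt."""
    tract = "G{%d,}" % min_tract
    loop = "[ACGT]{1,%d}?" % max_loop
    pattern = re.compile(tract + (loop + tract) * 3)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]

# Hypothetical toy sequence containing one motif of each kind.
toy = "ATAT" + "GC" * 6 + "TTTT" + "GGGTAGGGTAGGGTAGGG" + "AT"
print(find_pz(toy))
print(find_pg4(toy, min_tract=3))
```

Setting `min_tract` to 2, 3 or 4 mirrors the G2–G4 distinction drawn in the caption, although a scoring tool would be needed to rank variants by their predicted propensity to fold.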
Selected features characterizing individual R2 families

| Feature | Specification | Family A | Family B | Family C | Family D | Family E |
|---|---|---|---|---|---|---|
| Overall size<sup>a,b,c</sup> | mean [bp] | 341; 294; 433 | 303.5; 284.5; 396.5 | 424; 427; 519 | 207; 189; 295 | 264; 215; 358 |
| | min. [bp] | 199; 186; 286 | 216; 205; 318 | 309; 310; 401 | 194; 133; 283 | 200; 170; 291 |
| | max. [bp] | 446; 435; 539 | 415; 403; 511 | 551; 553; 643 | 239; 201; 327 | 310; 275; 401 |
| | max. – min. [bp] | 247; 249; 253 | 199; 198; 193 | 242; 243; 242 | 45; 68; 44 | 110; 105; 110 |
| Tandem unit<sup>a,d</sup> | consensus mean size [bp] | 34 | 79 [39]<sup>e</sup> | 42 | 38 | 37 |
| | copy numbers range | 4.8–12.6 | 2.0–4.3 [7.1]<sup>e</sup> | 6.4–13.5 | 3.6 | 2.4–3.8 |
| pZ-motifs<sup>a,f</sup> | copy numbers range | 5–12 | 6–10 | 6–12 | 5 | 5–8 |
| | no. of variants (family-specific) | 7 (4) | 6 (1) | 5 (3) | 5 (4) | 5 (2) |
| | the most abundant variant(s) | (GC)6 | (GC)6; (CG)6C | (CG)5C | (GC)5 | (CG)5C; (CG)6C |
| pG4-motifs<sup>a,g,c</sup> | copy numbers range<sup>h</sup> | 2–3 | 4–7 | 9–14 | 4 | 5–8 |
| | no. of variants (family-specific) | 8 (6) | 10 (7) | 9 (6) | 6 (5) | 8 (4) |
| | the most abundant variant<sup>i</sup> | G2 | G2; G3 | G4 | G2 | G3 |
| Microsatellite | consensus | nd | c4g; c4gc3(gc)3<sup>j</sup> | cgc3 | nd | c4g; ccacgc |
| | abundance [% of ROIs] | nd | 59; 30<sup>k</sup> | 92 | nd | 97; 76 |
| | copy number mean | nd | 7.6; 5.6 | 43.2 | nd | 26.2; 4.3 |
| | copy number min. | nd | 6.6; 5.6 | 13.0 | nd | 10.2; 4.3 |
| | copy number max. | nd | 19.6; 8.6 | 62.2 | nd | 33.6; 5.3 |
| Ratio GC3/C3G<sup>c</sup> | mean | 1.5 | 1.1 [0.9]<sup>e</sup> | 1.0 | 1.2 | 1.3 |
| RTAnT-size<sup>c,l</sup> | mean; min.; max. [bp] | 39; 21; 49 | 12; 11; 14 | 8; 0; 8 | 10; 9; 10 | 9; 8; 9 |

<sup>a</sup> For individual subfamilies, see also Supplementary Table S5.
<sup>b</sup> Three values for each descriptive statistical parameter are derived from three independent methods used for repeat size estimation, representing: (i) region with GC content >75%, (ii) Tandem Repeats Finder and (iii) distance between conserved unique bordering sequences, respectively.
<sup>c</sup> For comparative statistics, see also Supplementary Table S6.
<sup>d</sup> For the consensus, see also Supplementary Table S2.
<sup>e</sup> Values for the B8a subfamily are in square brackets if significantly different from other B-subfamilies.
<sup>f</sup> Frequencies of abundant pZ-motifs are comprehensively specified in Supplementary Table S3.
<sup>g</sup> Frequencies of abundant pG4-variants are comprehensively specified in Supplementary Table S4.
<sup>h</sup> Only motifs with G3 and G4 tracts are considered.
<sup>i</sup> Only variants with G-tracts of different lengths (G2–G4) are distinguished.
<sup>j</sup> The longer motif is specific for the B8a-subfamily.
<sup>k</sup> Computed from the number of ROIs in B8 and B8a subfamilies, respectively.
<sup>l</sup> Length of the spatially linked AT-region (RTAnT) located between the putative spacer promoter and R1 minisatellite (Supplementary Fig. S4).
Figure 6. Structural variability of pG4- and pZ-motifs in distinct positions along the R2 minisatellites. The relative abundance of particular pG4- and pZ-variants is represented by charts located above and below the corresponding positions, respectively, in selected A9- and C9-subfamilies (for more annotation, see Fig. 5). The pG4-variants are distinguished by numerals representing the in silico computed potential (score) to form a G4-conformation. Each pG4-variant may, however, be composed of several sequence motifs with the same score. Only variants that occurred in more than one ROI are distinguished.
Figure 7. ML phylogenetic tree constructed from the SNP distribution along putative promoters. Genic and spacer promoter variants, determined in Supplementary Fig. S6, are well separated. In addition, both genic and spacer promoter sequences are split into the same number (five) of clusters, 1g–5g and 1s–5s, respectively. Owing to the high number of sequences, individual clusters are represented by circles with areas proportional to the number (specified in brackets) of corresponding promoters, specified in more detail in Supplementary Table S10. Only bootstrap values higher than 50% are shown.