Qiutao Ding1, Runsheng Li1,2, Xiaoliang Ren1, Lu-Yan Chan1, Vincy W S Ho1, Dongying Xie1, Pohao Ye1, Zhongying Zhao3,4.
Abstract
BACKGROUND: Ribosomal DNAs (rDNAs) are arranged in purely tandem repeats, which prevents them from being reliably assembled onto chromosomes during genome assembly. The uncertainty of rDNA genomic structure presents a significant barrier to studying their function and evolution.
Keywords: 18S-5.8S-26S; 5S; 5S rDNA cluster; C. briggsae; Caenorhabditis elegans; Oxford Nanopore technologies
Year: 2022 PMID: 35346033 PMCID: PMC8961926 DOI: 10.1186/s12864-022-08476-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Read statistics
| Library name | Total number of reads | Total bases (Gbp) | Mean length (bp) | Median length (bp) | N50 length (bp) | Max mapped length (bp) |
|---|---|---|---|---|---|---|
| | 789,871 | 10.8 | 13,724 | 13,084 | 18,558 | 163,153 |
| | 199,712 | 3.7 | 18,300 | 13,605 | 31,265 | 196,902 |
| | 822,902 | 9.3 | 11,341 | 9187 | 19,566 | 174,664 |
| | 1,433,280 | 11.1 | 7724 | 4148 | 15,427 | 182,506 |
| | 870,874 | 12.6 | 14,479 | 10,248 | 25,074 | 247,180 |
| | 2,696,939 | 12.9 | 4785 | 2463 | 9429 | 252,751 |
| | 60,187 | 0.6 | 10,720 | 2409 | 27,723 | 139,839 |
| | 2,294,403 | 15.1 | 6562 | 2347 | 19,197 | 382,430 |
EMB: mix-staged embryos; L1: larval stage 1; YA: young adult
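The N50 column in the read statistics table can be illustrated with a short calculation. The following is a minimal sketch using the standard definition of N50 (the read length at which reads of that length or longer contain at least half of all sequenced bases); it is not necessarily the tool the authors used. The function name `n50` and the toy read lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Standard definition (assumed here): sort lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy library: six reads totalling 30 bases.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))
```

Because N50 is weighted by bases rather than by read count, it can sit well above the median when a library mixes many short reads with a few very long ones, which is the pattern seen in several rows of the table.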
Fig. 1 Structure of the C. elegans (N2a) 5S rDNA cluster. a INDELs identified with Nanopore reads within the 5S rDNA unit. Shown are normalized INDEL occurrences along with GC content. Deletions and insertions identified with Nanopore raw reads are shown in red and blue, respectively. Cross-validated INDELs used in the subsequent analysis are indicated with black circles (see Methods). Two large INDELs are indicated. b SNPs in the 5S rDNA identified with existing NGS data. SNPs present or absent in new rDNA variants are colored in red and black, respectively. c Structure of the 5S rDNA-containing regions on chromosome V in the current C. elegans N2 reference genome (WBcel235). d Structure of the 5S rDNA cluster assembled from ONT reads, which carries a total of at least 167 partial or complete units. The cluster is divided into 5 regions (R1-5) based on the SNPs and INDELs present in each unit or the position relative to Repeat 1a. Newly identified rDNA units or unique repeats are differentially color coded (see Table 2). Variation in rDNA copy number is indicated with a dashed line. Note that three copies of Repeat 1a are inserted into the 5S rDNA unit at the same position within R5
List of the variants of 5S rDNA unit in C. elegans (N2a) used in this study
| Variant | Size (bp) | Copy number | Sequence variation relative to |
|---|---|---|---|
| unit 1.1 | 976 | Dynamic | Not applicable |
| unit 1.2 | 971 | 2 | 766_771delinsC |
| unit 1.3 | 972 | 16 | 99_102del, 162C > G |
| unit 1.4 | 976 | 2 | 621 T > G |
| unit 1.5 | 946 | 3 | 780_809del, 621 T > G |
| unit 1.6 | 972 | 2 | 99_102del, 162C > G, 220C > A, 621 T > G |
| unit 1.7 | 976 | 4 | 220C > A, 621 T > G |
| unit 1.9 | 976 | 9 | 99_102del, 162C > G, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 339C > T, 621 T > G |
| unit 1.10 | 976 | 1 | 99_102del, 162C > G, 318 T > C, 325_326insCAAT, 329G > A, 332 T > G, 339C > T, 621 T > G |
| unit 1.11 | 976 | 3 | 99_102del, 162C > G, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 339C > T, 431 T > G, 621 T > G |
| unit 1.12 | 963 | 4 | 99_102del, 162C > G, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 339C > T, 393_405del, 431 T > G, 621 T > G |
| unit 1.13 | 976 | 6 | 99_102del, 162C > G, 318 T > C, 325_326insCAAT, 329G > A, 332 T > G, 339C > T, 431 T > G, 621 T > G, 636 T > G |
| unit 1.14 | 980 | 6 | 309 T > C, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 621 T > G |
| unit 1.15 | 950 | 1 | 780_809del, 220C > A, 309 T > C, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 621 T > G |
| unit 1.16 | 976 | 1 | 99_102del, 162C > G, 309 T > C, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 621 T > G |
| unit 1.18 | 984 | 1 | 99_102del, 162C > G, 309 T > C, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 354_355insGGTATT, 367A > T, 371 T > A, 621 T > G, 718_719insGA |
| unit 1.19 | 982 | 3 | 99_102del, 162C > G, 309 T > C, 318 T > C, 325_326insCAAT, 329G > T, 332 T > G, 354_355insGGTATT, 367A > T, 371 T > A, 621 T > G |
| unit 1.20 | 972 | 15 | 99_102del, 162C > G, 431 T > G, 621 T > G |
| unit 1.21 | 976 | 1 | 162C > G, 621 T > G |
| unit 1.23 | 972 | 29 | 99_102del, 162C > G, 621 T > G |
| unit 1.24 | 942 | 7 | 780_809del, 99_102del, 162C > G, 621 T > G |
| unit 1.26 | 976 | 1 | 99_102del, 162C > G, 335G > C, 407C > T, 621 T > G |
| unit s1a | 975 | 0 | 99_102del, 162C > G, 318_319insA, 390delC, 621 T > G |
| unit s2a | 981 | 0 | 354_355insGGTATT, 367A > T, 371 T > A, 545G > A, 621 T > G |
| unit s3a | 972 | 0 | 325_326insCAAT, 329G > A, 332 T > G, 339C > T, 431 T > G, 621 T > G |
a Combinations of variants in s1-s3 were not identified in the 5S rDNA cluster. Del: deletion; Ins: insertion; Delins: deletion followed by insertion
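The Size column of the variant table can be cross-checked against the listed sequence variations with simple length arithmetic. The sketch below assumes that coordinates are given relative to the 976 bp unit 1.1 (the table header is truncated in this record, so this is an assumption) and that the notation follows the Del/Ins/Delins conventions in the footnote. The names `REF_SIZE`, `size_change`, and `expected_size` are hypothetical, and the parser covers only the notation patterns that appear in the table.

```python
import re

REF_SIZE = 976  # size of unit 1.1 (assumed to be the reference unit)


def size_change(variant):
    """Return the net change in length (bp) caused by one variant."""
    v = variant.replace(" ", "")
    # Deletion followed by insertion, e.g. 766_771delinsC
    m = re.fullmatch(r"(\d+)_(\d+)delins([ACGT]+)", v)
    if m:
        deleted = int(m.group(2)) - int(m.group(1)) + 1
        return len(m.group(3)) - deleted
    # Range deletion, e.g. 99_102del
    m = re.fullmatch(r"(\d+)_(\d+)del", v)
    if m:
        return -(int(m.group(2)) - int(m.group(1)) + 1)
    # Single-base deletion, e.g. 390delC
    if re.fullmatch(r"\d+del[ACGT]?", v):
        return -1
    # Insertion between two positions, e.g. 325_326insCAAT
    m = re.fullmatch(r"\d+_\d+ins([ACGT]+)", v)
    if m:
        return len(m.group(1))
    # Substitution, e.g. 162C>G (no length change)
    if re.fullmatch(r"\d+[ACGT]>[ACGT]", v):
        return 0
    raise ValueError(f"unrecognized variant notation: {variant!r}")


def expected_size(description, ref_size=REF_SIZE):
    """Sum the length changes of a comma-separated variant list."""
    return ref_size + sum(size_change(v) for v in description.split(","))


print(expected_size("766_771delinsC"))                            # unit 1.2
print(expected_size("780_809del, 99_102del, 162C > G, 621 T > G"))  # unit 1.24
```

This checks only unit length, not sequence content; a full check would apply each variant to the unit 1.1 sequence itself.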
Fig. 2 Structural variations within the 5S rDNA cluster between our C. elegans N2a and other N2-derived strains. a Overview of the structures of 5S rDNA clusters for five strains as shown in Fig. 1d. Strain names are indicated on the left. Positions and sizes of transgenic insertions are drawn to scale. b Comparison of unit compositions and estimated copy number in R1. Identified variation in unit composition is highlighted with a vertical dashed line. c-f Comparison of unit compositions in R2 (c), R3 (d), R4 (e), and R5 (f) as in (b). g Ancestry of the strains based on strain history. Our N2a was shipped from the Waterston lab in 2010. PD1074 is a recent derivative of VC2010, which was derived from a separate N2 in the Don Moerman lab. ZZY0600 and ZZY0603 were generated by transgene insertion into unc-119(tm4063) worms, which were derived from another C. elegans N2 in the Mitani lab
Fig. 3 Structural variations within 5S rDNA clusters between C. elegans N2a and CB4856 strains. a INDELs identified with CB4856 ONT reads within the 5S rDNA unit as in Fig. 1a. Cross-validated INDELs used in the subsequent analysis are indicated with black circles (see Methods). b SNPs in the 5S rDNA identified with existing NGS data as in Fig. 1b. SNPs present or absent in new rDNA variants are colored in red and black, respectively. c Overview of the structures of 5S rDNA clusters in N2a and CB4856 as shown in Fig. 1d. Note the differences between the two: unit 1.1 (red), predominantly seen in N2a, is lacking in CB4856, whereas cel-5S unit 1.17 (see Tables 2 and S5) is unique to and predominantly seen in CB4856. d Structure of the C. elegans CB4856 5S rDNA cluster. Repeats 1a and 1b are shown as in Fig. 1d
Fig. 4 Evaluation of the C. briggsae AF16 genome assembled from ONT reads. a Schematic representation of the lengths of contigs assembled from AF16 long reads. b Dot plot of corresponding chromosomes between CB4 and the genome assembled from ONT reads. c Bar chart summarizing the proportion of genes present in three assembled genomes. AF16-ONT: the C. briggsae draft genome assembled in this study, WBcel235: the C. elegans N2 reference genome, CB4: the C. briggsae AF16 reference genome
Fig. 5 Characterization of the 5S rDNA units in C. briggsae AF16. a Phylogenetic tree of two divergent 5S rDNA units in C. briggsae (cbr) and the canonical C. elegans (cel) 5S rDNA unit. b Dot plot showing the sequence alignment between two C. briggsae 5S rDNA units. c Multiple sequence alignment of 5S rDNA units from C. elegans and C. briggsae. The aligned 5S rRNA gene region is shaded in grey (indicated at the top). d A contig was misassembled into the rDNA cluster on chromosome III in the reference genome CB4. e Schematic representation of the C. briggsae AF16 5S rDNA cluster annotated with ONT reads
Fig. 6 Comparison of 45S rDNA units and clusters between strains and species. a Comparison of 45S rDNA units between C. elegans and C. briggsae. b Dot plot showing the alignment of the 45S rDNA unit sequences between the two species. c Pairwise sequence alignment of the 45S rDNA unit between the two species. The 18S, 5.8S, and 26S rRNA gene regions are shaded in grey. Conservation scores are shown at the bottom. d Schematics of the 45S rDNA clusters of C. elegans N2a and CB4856 annotated with ONT reads. In N2a, the left and right boundaries of the cluster are flanked by partial 26S rRNA sequences and a partial ETS, respectively. In the 45S rDNA-containing region in C. elegans CB4856, the 45S rDNA cluster is located at the right end of chromosome I, while fragmented 45S rDNA sequences along with other sequences are located at the left end. The estimated copy number of the unit is shown. Note that both the left and right ends of the chromosome are flanked by a ~11.6 kb fragment derived from the left end of chromosome IV (pink, see Fig. S8), which is interrupted by some non-homologous sequences (white box). A pSX1 cluster is also found adjacent to the 45S rDNA. e Schematics of the C. briggsae AF16 genomic regions containing the 45S rDNA annotated with ONT reads in this study. The reconstructed 45S rDNA cluster is located at the left end of chromosome V and contains about 85 copies of the 45S rDNA unit. Bottom: A misassembled contig containing partial 26S rRNA gene sequences and 5 protein-coding genes was assigned to chromosome I in CB4