Yuan Li, Xiao Chen, Kun Wu, Jiao Pan, Hongan Long, Ying Yan.
Abstract
Simple sequence repeats (SSRs) are prevalent in the genomes of all organisms. They are widely used as genetic markers and are insertion/deletion mutation hotspots that directly influence genome evolution. However, little is known about these important genomic components in ciliated protists, a large group of unicellular eukaryotes with an extremely long evolutionary history and high genome diversity. Taking advantage of recently published ciliate genomes, we explored perfect SSRs with motif sizes of 1–100 bp and at least three motif repeats in nine species from two ciliate classes, Oligohymenophorea and Spirotrichea. We found that homopolymers are the most prevalent SSRs in these A/T-rich species, with AAA (lysine, a charged amino acid; also an SSR in which a one-adenine motif is repeated three times) being the codon repeated at the highest frequency in coding SSR regions, consistent with the widespread alveolin proteins rich in lysine repeats found in Tetrahymena. Micronuclear SSRs are universally more abundant than macronuclear SSRs of the same motif size, except for the 8-bp-motif SSRs in extensively fragmented chromosomes. Both the abundance and the A/T content of SSRs decrease as motif size increases, while SSR abundance is positively correlated with the A/T content of the genome. In addition, among Paramecium species, smaller genomes have lower proportions of coding SSRs out of all SSRs. This genome-wide, cross-species analysis reveals the high diversity of SSRs and reflects the rapid evolution of these simple repetitive elements in ciliate genomes.
Keywords: evolution; genome instability; genome repetivity; protists; simple sequence repeats
Year: 2020 PMID: 32370063 PMCID: PMC7285179 DOI: 10.3390/microorganisms8050662
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
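The SSR definition used in the abstract (perfect repeats, motif size 1–100 bp, at least three motif repeats) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name `find_perfect_ssrs`, the left-to-right scan, and the rule that each repeat is reported under its shortest motif are assumptions made for illustration.

```python
def find_perfect_ssrs(seq, max_motif=100, min_repeats=3):
    """Return (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper: positions are 0-based, the shortest motif that
    reaches min_repeats at a position wins, and hits never overlap.
    """
    seq = seq.upper()
    hits = []
    pos = 0
    while pos < len(seq):
        hit = None
        for size in range(1, max_motif + 1):
            motif = seq[pos:pos + size]
            if len(motif) < size:
                break  # ran past the end of the sequence
            count = 1
            # Count exact, back-to-back copies of the motif.
            while seq.startswith(motif, pos + count * size):
                count += 1
            if count >= min_repeats:
                hit = (pos, motif, count)
                break  # report the shortest qualifying motif only
        if hit:
            hits.append(hit)
            pos += len(hit[1]) * hit[2]  # jump past the whole repeat
        else:
            pos += 1
    return hits


# Toy sequence, not taken from any of the genomes analyzed.
print(find_perfect_ssrs("GCAAAAAGCATATATGC"))
# [(2, 'A', 5), (9, 'AT', 3)]
```

Under this sketch a run such as AAAAAA is counted once as a homopolymer (motif A, six repeats) rather than also as an AA or AAA repeat; a different convention would change the counts per motif size.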
Features of macronuclear and micronuclear genomes analyzed in this study.
| Species | G (Mbp) | A/T (%) | TNG | n | N50 (kbp) | Platform | Class | Data Source |
|---|---|---|---|---|---|---|---|---|
| | 48.80 | 84.09 | 8096 | 49 | 55.11 | 454, Sanger | Oligohymenophorea | |
| | 67.16 | 68.65 | 18500 | 0 | 3.74 | Illumina, 454, Sanger | Spirotrichea | |
| | 496.29 | 71.56 | 810 a | - | 27.81 | Illumina, PacBio | Spirotrichea | |
| | 79.96 | 74.23 | 39242 | 0 | - | Illumina, 454 | Oligohymenophorea | |
| | 30.48 | 71.80 | 18509 | 0 | - | Illumina, 454 | Oligohymenophorea | |
| | 68.02 | 75.93 | 34939 | 0 | - | Illumina, 454 | Oligohymenophorea | |
| | 72.09 | 71.95 | 39521 | 144 | 413 | Sanger | Oligohymenophorea | |
| | 55.46 | 81.19 | 13186 | 0 | 368 | Illumina | Oligohymenophorea | |
| | 50.16 | 68.30 | 20740 | 0 | - | Illumina | Spirotrichea | |
| | 103.01 | 77.68 | 24725 | 60 | 521 | Sanger | Oligohymenophorea | |
| | 157.69 | 77.92 | 47 b | - | 486.55 | Illumina | Oligohymenophorea | |
A/T, A/T content of the genome; Class, taxonomic class of the species; G, genome size; MAC, macronucleus; MIC, micronucleus; n, number of overlapping genes; N50, scaffold N50; Platform, genome sequencing platform; TNG, total number of genes in the genome; a, not including genes without internally eliminated sequences (IESs); b, genes predicted only in non-maintained macronuclear chromosomes, which are lost after macronuclear differentiation.
Macronuclear simple sequence repeats information.
| Species | A/T | SSR/G | H/SSR | A/T-H | r1(P) | r2(P) | CSP | RPG(SEM) |
|---|---|---|---|---|---|---|---|---|
| | 97.63 | 11.97 | 91.04 | 97.62 | −0.72(3.76 × 10^−6) | −0.55(0.01) | 17.08(20.60) | 0.50(2.62 × 10^−4) |
| | 87.74 | 8.02 | 95.12 | 87.76 | −0.73(1.27 × 10^−3) | −0.80(6.08 × 10^−4) | 63.41(70.50) | 0.50(1.58 × 10^−4) |
| | 95.18 | 8.22 | 94.52 | 93.95 | −0.19(0.33) | −0.02(0.93) | 73.67(72.77) | 0.51(1.41 × 10^−4) |
| | 92.17 | 7.59 | 95.15 | 91.86 | −0.81(4.51 × 10^−4) | −0.79(7.10 × 10^−4) | 15.34(86.46) | 0.51(2.01 × 10^−4) |
| | 95.54 | 8.68 | 94.83 | 95.49 | −0.31(0.09) | −0.40(0.05) | 69.24(73.43) | 0.51(1.97 × 10^−4) |
| | 91.97 | 7.80 | 94.99 | 92.07 | −0.31(0.15) | −0.08(0.74) | 72.24(75.55) | 0.50(1.49 × 10^−4) |
| | 95.91 | 11.38 | 93.75 | 95.95 | −0.48(1.23 × 10^−3) | −0.56(0.01) | 34.59(39.34) | 0.50(1.59 × 10^−4) |
| | 87.35 | 7.81 | 94.96 | 87.39 | −0.71(4.76 × 10^−3) | −0.72(8.67 × 10^−3) | 63.70(71.39) | 0.50(1.88 × 10^−4) |
| | 96.69 | 10.09 | 95.29 | 96.61 | −0.35(0.05) | −0.72(8.67 × 10^−3) | 41.40(49.39) | 0.50(1.21 × 10^−4) |
All numbers are percentages, except for those in the r1, r2, and RPG columns. A/T, A/T content of SSRs in the genome; SSR/G, proportion of SSR sequences in the whole genome; H/SSR, proportion of homopolymer runs in SSR sequences; A/T-H, proportion of A or T homopolymers out of all homopolymers; r1(P), Pearson's correlation coefficient (P value) of motif size vs. A/T content at all sites; r2(P), Pearson's correlation coefficient (P value) of motif size vs. A/T content at coding sites; CSP, coding SSR proportion, i.e., the proportion of SSRs in coding regions out of all SSRs, with the proportion of coding sequences out of the whole-genome sequence in parentheses; RPG, relative position of homopolymer SSRs in a gene, calculated as (|homopolymer median genomic coordinate − gene start position| + 1)/(gene length); SEM, standard error of the mean.
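The RPG formula in the table note can be worked through with a short Python sketch. The function names are hypothetical, and two details are assumptions not stated in the note: coordinates are 1-based and inclusive, and the "median genomic coordinate" of a homopolymer run is taken as the midpoint of its start and end.

```python
import math
import statistics


def relative_position_in_gene(ssr_start, ssr_end, gene_start, gene_length):
    """RPG = (|median coordinate - gene start| + 1) / gene length.

    Assumes the median coordinate of the run is its midpoint.
    """
    median_coord = (ssr_start + ssr_end) / 2
    return (abs(median_coord - gene_start) + 1) / gene_length


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Toy gene: starts at 1001, 1000 bp long; homopolymer spans 1496-1504.
print(relative_position_in_gene(1496, 1504, 1001, 1000))  # 0.5
```

With this definition, a run centered in the middle of a gene gives a value of 0.5, and runs near the gene start give values close to 0.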
Figure 1. Counts of simple sequence repeats (SSRs) with 1–100 bp motifs (≥3 repeats) in the nine ciliate macronuclear genomes. The y-axis is log10-transformed.
Figure 2. Number of motif repeats (y-axis values ≥3) and A/T content (y-axis values ≤1) of SSRs with different motif sizes. Dots are jittered. Because the jittering distance is limited, the sizes of the dots do not reflect the dominating number of homopolymer SSRs. The y-axis is log10-transformed.
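The relationship between motif size and A/T content (reported as r1 and r2 in the macronuclear SSR table) is a Pearson correlation. The sketch below shows how such a coefficient could be computed; the helper names and the four toy motifs are assumptions for illustration and are not data from the study. The P value is not computed here.

```python
import math


def at_content(seq):
    """Fraction of A or T bases in a sequence."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy motifs only: longer motifs here happen to be less A/T-rich.
motifs = ["A", "AT", "ATG", "ATGC"]
sizes = [len(m) for m in motifs]
at_fractions = [at_content(m) for m in motifs]
r = pearson_r(sizes, at_fractions)
print(r < 0)  # True
```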
Figure 3. Comparison of SSR counts in the macronucleus and micronucleus of Oxytricha trifallax and Tetrahymena thermophila. The arrow marks the 8-bp-motif SSRs in the macronuclear genome. The y-axis is log10-transformed.
Figure 4. Numbers of codons that are in SSR regions. White boxes represent 0. Ich, Ichthyophthirius multifiliis; Oxy, Oxytricha trifallax; Pbia, Paramecium biaurelia; Pcau, P. caudatum; Psex, P. sexaurelia; Ptet, P. tetraurelia; Pseudo, Pseudocohnilembus persalinus; Sty, Stylonychia lemnae; Tetra, Tetrahymena thermophila.
Total counts of SSRs with codon repeats (≥10) in the nine ciliate genomes.
| Codons | Amino Acid | Ich | Oxy | Pbia | Pcau | Psex | Ptet | Pseudo | Sty | Tetra |
|---|---|---|---|---|---|---|---|---|---|---|
| GCA/GCG/GCC/GCT | Alanine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| CGA/CGG/CGC/CGT/AGA/AGG | Arginine | 8 | 0 | 0 | 0 | 5 | 0 | 1 | 0 | 1 |
| AAC/AAT | Asparagine | 65 | 0 | 70 | 0 | 111 | 38 | 8 | 0 | 12 |
| GAC/GAT | Aspartic acid | 13 | 0 | 0 | 0 | 1 | 3 | 2 | 0 | 1 |
| TGC/TGT | Cysteine | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| GGA/GGG/GGC/GGT | Glycine | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 |
| GAA/GAG | Glutamic acid | 16 | 0 | 1 | 1 | 6 | 5 | 7 | 0 | 3 |
| CAA/CAG | Glutamine | 0 | 0 | 1 | 0 | 1 | 3 | 1 | 0 | 0 |
| CAC/CAT | Histidine | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ATA/ATC/ATT | Isoleucine | 80 | 1 | 70 | 1 | 113 | 20 | 6 | 1 | 6 |
| CTA/CTG/CTC/CTT/TTA/TTG | Leucine | 13 | 0 | 98 | 0 | 0 | 48 | 1 | 0 | 2 |
| AAA/AAG | Lysine | 15 | 0 | 5 | 0 | 10 | 1 | 10 | 0 | 5 |
| ATG | Methionine | 2 | 0 | 0 | 0 | 1 | 0 | 3 | 0 | 1 |
| TTC/TTT | Phenylalanine | 2 | 0 | 5 | 0 | 0 | 0 | 2 | 0 | 0 |
| CCA/CCG/CCC/CCT | Proline | 1 | 0 | 1 | 0 | 0 | 5 | 1 | 0 | 0 |
| TCA/TCG/TCC/TCT/AGC/AGT | Serine | 4 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| ACA/ACG/ACC/ACT | Threonine | 10 | 0 | 1 | 0 | 4 | 3 | 2 | 0 | 1 |
| TGG | Tryptophan | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| TAC/TAT | Tyrosine | 17 | 0 | 60 | 0 | 0 | 30 | 0 | 0 | 0 |
| GTA/GTG/GTC/GTT | Valine | 3 | 0 | 0 | 0 | 1 | 0 | 3 | 0 | 1 |
Ich, Ichthyophthirius multifiliis; Oxy, Oxytricha trifallax; Pbia, Paramecium biaurelia; Pcau, P. caudatum; Psex, P. sexaurelia; Ptet, P. tetraurelia; Pseudo, Pseudocohnilembus persalinus; Sty, Stylonychia lemnae; Tetra, Tetrahymena thermophila.
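Counting codon repeats of the kind tallied in the table above can be sketched in Python as follows. The table does not spell out the counting rule, so this sketch assumes that a "codon repeat" is the same codon repeated consecutively and in frame at least ten times within a coding sequence; the function name `codon_runs` and the toy sequence are hypothetical.

```python
from itertools import groupby


def codon_runs(cds, min_repeats=10):
    """Return (codon, repeat_count) for in-frame runs of an identical codon.

    Assumes cds starts in frame; trailing bases that do not fill a
    complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    runs = []
    for codon, group in groupby(codons):
        count = sum(1 for _ in group)
        if count >= min_repeats:
            runs.append((codon, count))
    return runs


# Toy coding sequence: start codon, 12 x AAT, 3 x GGA, stop codon.
toy_cds = "ATG" + "AAT" * 12 + "GGA" * 3 + "TGA"
print(codon_runs(toy_cds))  # [('AAT', 12)]
```

Summing such runs per amino acid would additionally require a codon table appropriate for the nuclear genetic code of each species.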