Wei-Jen Chang, Victoria M Addis, Anya J Li, Elin Axelsson, David H Ardell, Laura F Landweber.
Abstract
BACKGROUND: The somatic DNA molecules of spirotrichous ciliates are present as linear chromosomes containing mostly single-gene coding sequences with short 5' and 3' flanking regions. Only a few conserved motifs have been found in the flanking DNA. Motifs that may play roles in promoting and/or regulating transcription have not been consistently detected. Moreover, comparing subtelomeric regions of 1,356 end-sequenced somatic chromosomes failed to identify more putatively conserved motifs.
Year: 2007 PMID: 17270054 PMCID: PMC1805493 DOI: 10.1186/1745-6150-2-6
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 4.540
Figure 1. Different layers of DNA and RNA processing in ciliates. A schematic drawing of how germline (micronucleus) information is passed to soma (macronucleus). Genes, or macronuclear-destined sequences (MDSs, large white open boxes), in micronuclear DNA are separated by internal eliminated sequences (IESs, thick black lines), flanked by pairs of direct repeats (grey boxes). Intergenic noncoding sequences are indicated by thin lines. After extensive DNA processing, IESs and intergenic noncoding sequences are deleted, and MDSs are sewn together with one copy of each direct repeat retained. Telomeres (hatched boxes) are added to the ends, and each macronuclear chromosome undergoes different levels of replication. mRNA transcribed from macronuclear chromosomes is capped (solid oval) and polyadenylated, and a representative intron (small white box) is deleted.
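The deletion step described in this caption (an IES removed, one copy of the flanking direct repeat retained) can be sketched in Python. This is a toy illustration only, not the authors' analysis code: the function name `excise_ies`, the example sequence, and the choice to locate the IES by the first two occurrences of a known repeat are all assumptions made for the sketch.

```python
def excise_ies(seq: str, repeat: str) -> str:
    """Remove an IES bounded by two copies of `repeat`, keeping one copy.

    Hypothetical helper: real IES boundaries are not found by naive
    string search; this only illustrates the bookkeeping.
    """
    if not repeat:
        raise ValueError("repeat must be non-empty")
    first = seq.find(repeat)
    second = seq.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("need two non-overlapping copies of the repeat")
    # Drop the first repeat copy plus the IES; the second copy remains.
    return seq[:first] + seq[second:]


# MDS1 + repeat + IES (lower case) + repeat + MDS2, all invented for the example.
micronuclear = "AAAC" + "TTG" + "ccccc" + "TTG" + "GGA"
print(excise_ies(micronuclear, "TTG"))  # AAACTTGGGA
```

The number of bases removed equals the IES length plus one repeat length (here 5 + 3 = 8), which is why the retained repeat copy matters when comparing DNA-level excision with intron splicing.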
Comparisons of sequence removal via intron (RNA) versus IES (DNA) splicing.
| -X| | XY | XGTZAGY | |
| -XA| | XATY | XAGTY | Impossible |
| -XAGZ | XAGZZGTY | XAGZGTY | |
† X, Y, and Z are arbitrary (potentially zero-length) sequences.
‡ refers to the length in nucleotides of sequence Z and "mod" refers to the modulo operator.
Figure 2. Features of DNA polymerase α (pol α) genes from nine spirotrichous ciliates. A schematic macronuclear DNA pol α gene is shown in the inset. This is flanked by telomeres. The inset also shows the consensus sequence of the 5' conserved motif (AATACCGCC), the transcription start site (right arrow), the putative translation start site (ATG), introns (3 found in seven stichotrichous ciliates as small grey boxes and 2 in Euplotes spp. as small hatched boxes), the putative translation termination codon (STOP), the putative polyadenylation signal sequence (poly(A)), and the mRNA polyadenylation site (solid black diamond). Although relative positions of these features are shown, they are not drawn to scale. Italicized numbers indicate intron lengths; allelic differences, when detected, are separated by "/". Numbers in the last column indicate the putative lengths (number of amino acids) of DNA pol α proteins from each species. Other numbers represent the distances from one motif to the next motif. For example, numbers in the first column represent distances from the 5' telomere to the 5' conserved motif. The symbol "#" indicates that data were not available due to the unavailability of RNA, while a dashed line ("-") indicates that the feature was not detected. Nucleotides in the 5' conserved motifs are shown as dots if they are identical to the consensus AATACCGCC. For each species, only nucleotides that differ from this consensus sequence are shown, as well as nucleotides that comprise the putative polyadenylation signal. A phylogenetic tree [18] is provided at the left of the figure for reference.
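The dot convention used for the 5' conserved motif can be illustrated with a short Python sketch. Only the consensus AATACCGCC comes from the caption; the function name `dot_mask` and the variant motif in the example are hypothetical.

```python
CONSENSUS = "AATACCGCC"  # 5' conserved motif consensus quoted in the caption


def dot_mask(motif: str, consensus: str = CONSENSUS) -> str:
    """Show '.' where `motif` matches the consensus, else the differing base."""
    if len(motif) != len(consensus):
        raise ValueError("motif and consensus must have equal length")
    return "".join(
        "." if base.upper() == ref else base.upper()
        for base, ref in zip(motif, consensus)
    )


print(dot_mask("AATACCGCC"))  # .........
print(dot_mask("AATTCCGCA"))  # ...T....A  (invented variant, for illustration)
```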
Figure 3. Sequence logos of the 5' subtelomeric regions of the DNA pol α genes from nine spirotrichous ciliates. Sequences were aligned at A, the transcription start site (position 0); B, the 5' conserved motif; C, the 5' telomere sequence; and D, the putative translation start site (ATG, position 0). Logos were calculated at [63].
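As background for reading such logos, the sketch below computes the per-position information content (in bits) that usually sets the stack height in a DNA sequence logo. It assumes the standard measure of 2 bits minus the Shannon entropy of the column and omits any small-sample correction; whether the tool cited as [63] applies a correction is not stated here. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(sequences):
    """Return information content (bits) for each column of aligned DNA.

    Assumes equal-length sequences over A, C, G, T with no gaps.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    bits = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sequences)
        total = sum(counts.values())
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        bits.append(2.0 - entropy)  # log2(4) = 2 bits maximum for DNA
    return bits


toy_alignment = ["AAA", "ACC", "AGA", "ATC"]  # invented example
print(column_information(toy_alignment))  # [2.0, 0.0, 1.0]
```

A fully conserved column reaches 2 bits, a column with all four bases at equal frequency carries 0 bits, and a column split evenly between two bases carries 1 bit.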
Direct repeats found near intron/exon boundaries. The first six rows correspond to the first 5' intron, the next five rows to the second 5' intron, and the last three to the 3' intron.
| Species | Intron | Intron length (bp) | 5' repeat† | 3' repeat | XL?‡ | Location (MDS no.) | Freq.§ |
| | 1 | 49 | CAG < gta | cag < GTA | Yes | 8 | 2 |
| | 1 | 79 | TAG < gtt | tag < GTT | Yes | 1 | 2 |
| | 1 | 76 | TAG < gt | tag < GT | Yes | 1 | 3 |
| | 1 | 43 | A < g | ag < | No | 1–2 | 1936 |
| | 1 | 49 | A < g | ag < | No | 1 | 1871 |
| | 1 | 73 | A < g | ag < | No | ND | 1940 |
| | 2 | 32 | T < gta | | No | 8–9 | 15 |
| | 2 | 34 | < g | | No | 2 | 2178 |
| | 2 | 26 | | | No | 3 | 412 |
| | 2 | 71 | AT < | | No | 2 | 2325 |
| | 2 | 53 | AT < | | No | 2 | 2174 |
| | 1 | 192 | < gtaag | g < CTTATTA | Yes | 41/41–42 | 2 |
| | 3 | 33 | G < gtaa | g < GTAA | Yes | 34 | 10 |
| | 2 | 193 | AG < g | ag < G | Yes | ND | 245 |
† Symbolism: EXON < intron or intron < EXON. Repeats may be underlined for clarity.
‡ XL?: translationally frame-preserving after excision as an IES
§ Freq: frequency (%) of repeat word in entire MAC sequence
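The EXON < intron notation in the table can be reproduced with a small Python sketch. It assumes a repeat is read as the bases shared across both splice junctions: the end of the upstream exon matching the end of the intron, and the start of the intron matching the start of the downstream exon. That reading, the function names, and the toy sequences are assumptions for illustration; the sketch does not compute the XL? column.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def flanking_repeat(exon5: str, intron: str, exon3: str):
    """Return (5' repeat, 3' repeat) strings in the table's notation."""
    exon5, intron, exon3 = exon5.upper(), intron.upper(), exon3.upper()
    a = shared_prefix_len(exon5[::-1], intron[::-1])  # exon tail == intron tail
    b = shared_prefix_len(intron, exon3)              # intron head == exon head
    five = f"{exon5[len(exon5) - a:]} < {intron[:b].lower()}".strip()
    three = f"{intron[len(intron) - a:].lower()} < {exon3[:b]}".strip()
    return five, three


# Invented sequences whose junctions mimic the first table row.
print(flanking_repeat("ATGCAG", "GTATTTTTCAG", "GTACCC"))
# ('CAG < gta', 'cag < GTA')
```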
Estimated frequency of intron-flanking repeats as large or larger than observed in the natural data under species-specific random models of ciliate genes.
| | | Assumed intron splicing constraints | | |
| Species | Intron | None† | Eukaryotic‡ | Ciliate§ |
| 1 | ||||
| 1 | ||||
| 1 | 6848 | 14413 | ||
| 1|| | ||||
| 1|| | ||||
| 1|| | ||||
| 2 | 709 | |||
| 2|| | ||||
| 2|| | ||||
| 2|| | ||||
| 2|| | ||||
| 1 | ||||
| 3 | 6895 | 42397 | ||
| 2|| | ||||
** FDR ≤ 0.01; + FDR ≤ 0.1; ~ FDR ≤ 0.25. The false discovery rate (FDR) was controlled within each column by the method of Benjamini and Hochberg (1995), which Benjamini and Yekutieli (2001) showed to control the FDR for positively dependent test statistics.
† No intron splicing constraints: the entire intron was permuted.
‡ Eukaryotic intron splicing constraints: the two bases at the 5' and 3' intron ends were fixed.
§ Putative ciliate intron splicing constraints: the five 5'-most and three 3'-most bases of the introns were fixed.
|| The values in italics were calculated exactly, multiplied by 10^5 and rounded; other values in upright face were calculated from permutation tests (N = 100,000).
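The footnotes describe two procedures that a short Python sketch can make concrete: a permutation test in which the intron is shuffled while a fixed number of bases at each end is held in place, and Benjamini-Hochberg adjustment of the resulting values. This is a minimal sketch under stated assumptions, not the authors' code. It assumes that only intron bases are permuted (the paper's species-specific random models may do more) and that the test statistic is the number of bases shared across both splice junctions. All function names and the toy sequences are hypothetical, and the default `n` is smaller than the N = 100,000 reported.

```python
import random


def _prefix(a, b):
    """Length of the longest common prefix."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def repeat_len(exon5, intron, exon3):
    """Assumed statistic: bases shared across the 5' and 3' junctions."""
    return _prefix(exon5[::-1], intron[::-1]) + _prefix(intron, exon3)


def shuffle_interior(intron, fix5, fix3, rng):
    """Permute the intron, keeping `fix5` leading and `fix3` trailing bases."""
    if fix5 + fix3 > len(intron):
        raise ValueError("fixed ends exceed intron length")
    end = len(intron) - fix3
    middle = list(intron[fix5:end])
    rng.shuffle(middle)
    return intron[:fix5] + "".join(middle) + intron[end:]


def permutation_frequency(exon5, intron, exon3, fix5, fix3, n=10_000, seed=0):
    """Fraction of shuffles with a repeat at least as long as observed."""
    rng = random.Random(seed)
    observed = repeat_len(exon5, intron, exon3)
    hits = sum(
        repeat_len(exon5, shuffle_interior(intron, fix5, fix3, rng), exon3)
        >= observed
        for _ in range(n)
    )
    return hits / n


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy gene fragment (invented). fix5/fix3 follow the three footnoted settings.
exon5, intron, exon3 = "ATGCAG", "GTATTTTTCAG", "GTACCC"
for label, fix5, fix3 in [("None", 0, 0), ("Eukaryotic", 2, 2), ("Ciliate", 5, 3)]:
    freq = permutation_frequency(exon5, intron, exon3, fix5, fix3)
    print(label, round(freq * 10**5))  # scaled by 10^5 as in the table
```

Under this statistic, fixing at least as many end bases as the repeat occupies inside the intron preserves the repeat in every shuffle, so the estimated frequency is 1 and the test carries no information for that column.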