Joseph N Fass, Nikhil A Joshi, Mary T Couvillion, Josephine Bowen, Martin A Gorovsky, Eileen P Hamilton, Eduardo Orias, Kyungah Hong, Robert S Coyne, Jonathan A Eisen, Douglas L Chalker, Dawei Lin, Kathleen Collins.
Abstract
Genetically programmed DNA rearrangements can regulate mRNA expression at an individual locus or, for some organisms, on a genome-wide scale. Ciliates rely on a remarkable process of whole-genome remodeling by DNA elimination to differentiate an expressed macronucleus (MAC) from a copy of the germline micronucleus (MIC) in each cycle of sexual reproduction. Here we describe results from the first high-throughput sequencing effort to investigate ciliate genome restructuring, comparing Sanger long-read sequences from a Tetrahymena thermophila MIC genome library to the MAC genome assembly. With almost 25% coverage of the unique-sequence MAC genome by MIC genome sequence reads, we created a resource for positional analysis of MIC-specific DNA removal that pinpoints MAC genome sites of DNA elimination at nucleotide resolution. The widespread distribution of internal eliminated sequences (IES) in promoter regions and introns suggests that MAC genome restructuring is essential not only for what it removes (for example, active transposons) but also for what it creates (for example, splicing-competent introns). Consistent with the heterogeneous boundaries and epigenetically modulated efficiency of individual IES deletions studied to date, we find that IES sites are dramatically under-represented in the ∼25% of the MAC genome encoding exons. As an exception to this general rule, we discovered a previously unknown class of small (<500 bp) IES with precise elimination boundaries that can contribute the 3' exon of an mRNA expressed during genome restructuring, providing a new mechanism for expanding mRNA complexity in a developmentally regulated manner.
Keywords: DNA breakage and joining; Tetrahymena; ciliate nuclear dualism; community sequencing project; genome rearrangement
Year: 2011 PMID: 22384362 PMCID: PMC3276166 DOI: 10.1534/g3.111.000927
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1 Annotation of MAC positions of IES excision. (A) A representative high-confidence win1 IES site prediction in the MAC genome. The candidate IES site (IES D) falls within an internal exon of the gene model for TTHERM_00198180, which is displayed in its entirety (see scale bar at top). The genome browser track “Putative IES sites using uniquely mapped Sanger reads” indicates positions of the win1, win2, and win3 IES predictions described in the text. The track “Sanger Reads Left (unique)” shows the extent of MAC-matching sequence within a MIC genome read that maps to the MAC with its left end but has a nonmapping sequence on its right end; the read matched a unique MAC sequence, in contrast to reads segregated to the track “Sanger Reads Left (multi).” The track “Sanger Reads Right (unique)” shows the extent of MAC-matching sequence within a MIC genome read that maps to the MAC with its right end but has a nonmapping sequence on its left end; the read matched a unique MAC sequence, in contrast to reads segregated to the track “Sanger Reads Right (multi).” The tracks “Sanger Reads Fully Mapped” show read alignments that matched the MAC without a left-end-only or right-end-only extension of nonmapping sequence. (B) Widespread MAC chromosome distribution of IES sites predicted from Sanger L and/or R read alignments. Note that the conMAC region containing the smallest MAC genome contigs at right is also overrepresented in the 10 kbp spacer blocks of N that were added between contigs; therefore, the bp amount of MAC assembly in this region is exaggerated as a proportion of conMAC length. The upper track shows the entire conMAC assembly of MAC genome contigs joined in order of decreasing length; in the browser, this track is designated as the fakeasome.
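The conMAC layout described in the Figure 1 legend (contigs joined in order of decreasing length, with 10 kbp blocks of N between them) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `build_conmac`, its parameters, and the contig names in the usage note are hypothetical, and the code only shows how such a concatenation yields a single browser coordinate system.

```python
def build_conmac(contigs, spacer_len=10_000):
    """Concatenate contigs into one pseudo-chromosome (hypothetical helper).

    contigs: dict mapping contig name -> sequence string.
    Returns (sequence, offsets), where offsets maps each contig name to its
    0-based start coordinate in the concatenated sequence.
    """
    # Longest contigs first, as stated in the figure legend.
    ordered = sorted(contigs.items(), key=lambda item: len(item[1]), reverse=True)

    parts = []
    offsets = {}
    position = 0
    for index, (name, seq) in enumerate(ordered):
        if index > 0:
            # Spacer block of N between neighbouring contigs.
            parts.append("N" * spacer_len)
            position += spacer_len
        offsets[name] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets
```

For example, `build_conmac({"a": "ACGT", "b": "ACGTACGT", "c": "AC"}, spacer_len=3)` places `b` at 0, `a` at 11, and `c` at 18. Because every contig after the first is preceded by a fixed-length spacer, the end of the assembly that holds the shortest contigs is mostly spacer, which is why that region is exaggerated as a proportion of conMAC length.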
Exon-interrupting IES candidates
| IES | conMAC | TTHERM | Gene | Reads | MAC Junction | Exon | IES Length |
|---|---|---|---|---|---|---|---|
| 1 | 18231806 | 00142380 | e, a | 1L+1R | ttaTTAAtgg | 3′ | 194 |
| 2 | 41846824 | 00348490 | e, a | 1L+2R | ttaTTAAtta | 5′ | 453 |
| 3 | 62203355 | 00586680 | e, a | 2L+3R | tttTTAATTttt | Single | n.d. |
| A | 94944570 | 01101620 | e | 1L+1R | tacATAatc | Single | 483 |
| B | 15671490 | 00569290 | e, a | 1R | tcaTTAAatt | 3′ | 337 |
| C | 64246345 | 00617820 | e | 3L+3R | cccAAtgt | 3′ | ∼1,500 |
| D | 24290765 | 00198180 | e, a | 1L+1R | atat/cctg | Mid | ∼1,200 |
| E | 95586240 | 01119380 | e | 1L+1R | aaaGAttg | Mid | ∼2,000 |
| F | 42109185 | 00359230 | e, a | 1L+3R | tccTtta | Jxn 3′ | n.d. |
| G | 99551500 | 01259660 | a | 3L+1R | gtcAAata | 5′ | n.d. |
ConMAC, approximate browser coordinate of IES.
TTHERM, gene model number.
Gene, evidence for gene function based on putative mRNA expression (EST and/or microarray detection) and/or predicted protein properties (protein domain annotation) indicated by “e” and/or “a” respectively.
Reads, number of Sanger L and/or R reads.
MAC Junction, MAC sequence following IES removal: sequence present on both sides of the IES before elimination and retained as single-copy in the MAC is indicated in upper case; a slash separates flanking sequences joined without microhomology.
Exon, predicted position of the IES-containing exon within the gene model: single indicates a single-exon gene model, Mid indicates an internal exon, Jxn 3′ is the intron/exon boundary of the 3′ exon.
IES Length, actual or minimum length of IES in bp: n.d. indicates size not determined. Note that it is possible that IES length is longer than detected by PCR if the IES contains internal repeat(s).
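The MAC Junction notation defined in the footnotes above can be read mechanically. The sketch below is only an illustration of that notation: the function name `parse_junction` and the returned keys are hypothetical and do not come from the source.

```python
import re


def parse_junction(junction):
    """Split a MAC junction string into flanks and microhomology.

    Upper-case letters mark sequence present on both sides of the IES and
    retained as a single copy in the MAC; a slash marks flanks joined
    without microhomology.
    """
    if "/" in junction:
        left, right = junction.split("/", 1)
        return {"left": left, "microhomology": "", "right": right}

    # Expected form: lower-case flank, upper-case core, lower-case flank.
    match = re.fullmatch(r"([a-z]*)([A-Z]*)([a-z]*)", junction)
    if match is None:
        raise ValueError(f"unrecognized junction notation: {junction!r}")
    left, core, right = match.groups()
    return {"left": left, "microhomology": core, "right": right}
```

Applied to the table, `parse_junction("ttaTTAAtgg")` (IES 1) gives a 4 nt microhomology `TTAA`, while `parse_junction("atat/cctg")` (IES D) gives an empty microhomology.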
Figure 2 IES validation by PCR. Genomic DNA isolated from strain SB210, CU428, or B2086 was amplified by PCR using primers flanking the putative IES site in MAC-destined DNA, as schematized at right. PCR products are visualized here as the negative image of an agarose gel stained with ethidium bromide. The smaller panels below the main panels for IES 2, A, and E show IES-specific PCR amplification using primer(s) that overlap or are internal to the IES, as also schematized. Relevant DNA standards are indicated. Expected MAC genome amplification products are labeled with “”; IES-containing PCR products are labeled with “”; note that IES size could be underestimated for IES C, D, and E if the IES contains internal repeat(s).
Figure 3 IES excision requirements. For each IES, PCR was done with the schematized primer pairs using genomic DNA isolated from the strain SB210 in vegetative growth or from the polyclonal pool of cells arrested 28 hr after initiation of conjugation for gene knockout strains lacking DCL1, PDD1, or LIA1. PCR products are visualized as the negative image of an agarose gel stained with ethidium bromide. DNA standard lanes have the 500 bp marker denoted with “o”; expected MAC genome amplification products are labeled with “”; and IES-containing PCR products are labeled with “” (these labels are included in a subset of adjacent gel lanes for clarity).
Figure 4 Sequence alignment of MAC junctions resulting from removal of IES C. A multiple sequence alignment is shown for cloned MAC junctions resulting from the removal of IES C, which together have three lengths and four distinct sequences. Clones were sequenced for junctions amplified from DNA of the inbred strains B2086, CU428, and SB210 (lines 1–3) or amplified from the DNA of noninbred strain crosses used in Figure 3 (lines 4–5; conj1 and conj2 indicate the two distinct sequences obtained from many cloned DNA fragment sequences). Below the alignment, “” indicates the region of consensus sequence.
Figure 5 IES host gene expression. (A) Total RNA was isolated from strain SB210 in asexual vegetative growth (V) or at the indicated times after initiation of conjugation using strains SB210 and CU428 (in hours). Northern blot hybridization was performed to detect the putative mRNA region adjacent to the indicated IES. Blots are shown in size register; nonspecific signal left from a previous probing of the same membrane is indicated by “NS.” (B) Transcripts from the region of the LIA2 locus hosting IES B were isolated by RT-PCR and sequenced. Boxes denote exons, and thick lines denote introns or untranslated regions. Gray boxes and solid lines indicate MAC-destined sequence, and the open box and dashed line indicate IES. Translation termination codons (TGA) and positions of mRNA polyadenylation (pA) are shown.