Cestmir Vlcek, William Marande, Shona Teijeiro, Julius Lukes, Gertraud Burger.
Abstract
Arguably, the most bizarre mitochondrial DNA (mtDNA) is that of the euglenozoan eukaryote Diplonema papillatum. The genome consists of numerous small circular chromosomes none of which appears to encode a complete gene. For instance, the cox1 coding sequence is spread out over nine different chromosomes in non-overlapping pieces (modules), which are transcribed separately and joined to a contiguous mRNA by trans-splicing. Here, we examine how many genes are encoded by Diplonema mtDNA and whether all are fragmented and their transcripts trans-spliced. Module identification is challenging due to the sequence divergence of Diplonema mitochondrial genes. By employing most sensitive protein profile search algorithms and comparing genomic with cDNA sequence, we recognize a total of 11 typical mitochondrial genes. The 10 protein-coding genes are systematically chopped up into three to 12 modules of 60-350 bp length. The corresponding mRNAs are all trans-spliced. Identification of ribosomal RNAs is most difficult. So far, we only detect the 3'-module of the large subunit ribosomal RNA (rRNA); it does not trans-splice with other pieces. The small subunit rRNA gene remains elusive. Our results open new intriguing questions about the biochemistry and evolution of mitochondrial trans-splicing in Diplonema.
Year: 2010 PMID: 20935050 PMCID: PMC3035467 DOI: 10.1093/nar/gkq883
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Sequences deposited in the public domain
| Sequence | Length (bp) | Description | GenBank acc. no. | References |
|---|---|---|---|---|
| Chromosome A (A3207) | 5852 | Carries | EU12356 | Marande and Burger ( |
| Chromosome A (A3208) | 5802 | Carries | HQ288823 | This report |
| Chromosome A (A4001) | 5794 | Carries | HQ288824 | This report |
| Chromosome B (B3209) | 7182 | Carries | EU12357 | Marande and Burger ( |
| set of sequences | | Comprise | HQ288825-33 | This report |
| | 1214 | Sequences include 5′-UTR and A-tail; all module junctions are annotated | HQ288819 | This report |
| | 2280 | | EU123538 | This report: updated version of Marande and Burger ( |
| | 924 | | HQ288820 | This report |
| | 1012 | | HQ288821 | This report |
| | 1251 | | HQ288822 | This report |
Figure 1. Mitochondrial chromosome architecture in D. papillatum. (A) The constant region is identical within Class A and within Class B chromosomes. The portion shared by Class A and Class B chromosomes is indicated in dark grey. The cassette includes the coding region (gene module) and two unique module-flanking regions. (B) Repeat structure of two representative chromosomes, A3207 (Class A) and B3209 (Class B). Triangles labelled a–h symbolize distinct repeat motifs that are all arranged in the same orientation. The replication origin inferred from the GC skew is indicated by a hollow arrow. For GenBank accession numbers, see Table 1.
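The caption mentions a replication origin inferred from the GC skew but does not say how the skew was computed. As a minimal sketch only, the following assumes the common windowed definition (G − C)/(G + C); the function name `gc_skew` and the window and step sizes are illustrative choices, not taken from the paper.

```python
def gc_skew(seq, window=100, step=100):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C).

    Assumption: fixed-size, non-circular windows; windows without any
    G or C are reported with a skew of 0.0.
    """
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result
```

A sign change of the skew along a chromosome is the kind of signal usually inspected when looking for an origin; whether the authors used exactly this criterion is not stated here.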
Table 2. Cassette structure for completely determined genes and cDNAs
| Gene | Module no. | Length of cassette | Chromosome class (chromosome id) | Strand |
|---|---|---|---|---|
| | 1 | 276 (24/198/54) | A | + |
| | 2 | 286 (34/154/96) | A | – |
| | 3 | 290 (53/138/99) | A | + |
| | 4 | 294 (42/198/54) | A | – |
| | 5 | 317 (15/279/22) | A | – |
| | 6 | 206 (39/123/44) | B | – |
| | 1 | 282 (52/195/35) | B | + |
| | 2 | 222 (34/124/63) | A | + |
| | 3 | 321 (26/263/32) | A | + |
| | 4 | 310 (43/226/47) | B (B3209) | + |
| | 5 | 266 (63/179/24) | A | + |
| | 6 | 284 (35/169/80) | A | + |
| | 7 | 241 (116/89/36) | A | + |
| | 8 | 251 (118/110/23) | A | + |
| | 9 | 311 (11/248/52) | A (A3207; A3208) | – |
| | 1 | 308 (41/237/30) | A | + |
| | 2 | 248 (80/160/8) | B | + |
| | 3 | 288 (57/76/155) | A | + |
| | 4 | 284 (26/125/133) | A | – |
| | 1 | 357 ( | A | + |
| | 2 | 333 (55/266/12) | A | + |
| | 3 | 304 (17/230/57) | A | – |
| | 1 | 296 (40/221/35) | A | + |
| | 2 | 274 (110/75/89) | A | – |
| | 3 | 289 ( | A | + |
| | 4 | 186 (46/ | B | – |
| | 5 | 284 (81/192/11) | A | + |
| | 6 | 295 (36/66/ | A (A4001) | – |
| | 7 | 253 (31/169/53) | B | + |
| | 8 | 219 (34/182/ | A | + |
| | 9 | 273 (66/79/128) | A | – |
(a) Sizes of modules and flanking regions have been inferred by comparison of genomic and cDNA sequences. The chromosome class was inferred from the sequence of the constant region adjacent to the cassettes. Underscores highlight the minimum and maximum length of modules and module-flanking regions. In cases where module junctions are not precisely known (see text), the shortest length is indicated. For sequence accession numbers, see Table 1.
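The cassette entries above follow the pattern "total (x/y/z)". A minimal sketch for reading them is given below, assuming that the three values are the upstream flanking region, the gene module and the downstream flanking region, in that order (the order is an assumption; the table does not state it). The names `Cassette` and `parse_cassette` are hypothetical. Entries that are truncated in this record, such as "357 (", are returned as `None` rather than guessed.

```python
import re
from typing import NamedTuple, Optional

class Cassette(NamedTuple):
    total: int       # cassette length in bp as listed
    upstream: int    # assumed: first flanking region
    module: int      # assumed: gene module (coding region)
    downstream: int  # assumed: second flanking region

    @property
    def discrepancy(self) -> int:
        """Listed total minus the sum of the three parts."""
        return self.total - (self.upstream + self.module + self.downstream)

_PATTERN = re.compile(r"^\s*(\d+)\s*\((\d+)/(\d+)/(\d+)\)\s*$")

def parse_cassette(cell: str) -> Optional[Cassette]:
    """Parse a 'total (x/y/z)' table cell; return None if incomplete."""
    match = _PATTERN.match(cell)
    if match is None:
        return None
    return Cassette(*(int(value) for value in match.groups()))
```

The `discrepancy` property makes it easy to see that some listed totals differ from the sum of their parts by a few base pairs, for example "286 (34/154/96)"; the sketch reports such differences without trying to explain or correct them.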
Table 3. In silico assignment of gene modules in mtDNA sequence
(a) Shading indicates the modules that a given gene includes. Dark shading, modules for which cDNA data were obtained. X, identified module displaying a strong hit (Blast: ≤1.0e-12; HMM-full: ≤1.0e-8; HMM-partials: ≤1.0e-5). x, identified module displaying a moderate hit (Blast: between >1.0e-12 and <1.0e-5; HMM-full: >1.0e-8; HMM-partials: >1.0e-5). +, identified module displaying a marginal hit with an e-value in the range of numerous false positives (Blast: ≥1.0e-5; HMM-full: >2.0e-2; HMM-partials: >1.0e-2); true positives have been validated by the criteria described in the ‘Materials and Methods’ section. –, module not identified by similarity search, with top hits to unrelated proteins. Modules that could not be identified by the three in silico methods are likely not missing from the genomic data set (see ‘Materials and Methods’ section for coverage), but rather unrecognizable. Note that many modules have been assigned by database searches with the conceptually translated contiguous cDNA sequence, while no significant results were obtained with individual gene modules.
(b) Proteins encoded by mtDNA. For full product names, see Table 4.
(c) Blast-nr, remote blastp search of proteins <14 residues long, conceptually translated in all six frames from Diplonema cassette sequences, against the non-redundant nucleotide database in GenBank. Hits to published Diplonema sequences were not considered in this table. HMM-full, search with a profile HMM, which was constructed from complete homologous proteins from non-diplonemid species, against proteins conceptually translated in all six frames from Diplonema cassette sequences. HMM-partials, profile HMMs were constructed from protein sub-regions corresponding to modules in Diplonema mtDNA.
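The e-value categories in the table footnote can be expressed as a small lookup. The sketch below is illustrative only: it assumes that a moderate Blast hit lies strictly between 1.0e-12 and 1.0e-5, and that the upper bound of a moderate HMM hit is the lower bound given for a marginal hit (2.0e-2 for HMM-full, 1.0e-2 for HMM-partials). The names `THRESHOLDS` and `classify_hit` are hypothetical. The "–" symbol is not covered, because it depends on the identity of the top hits rather than on an e-value.

```python
# (strong_max, marginal_min) per search method, taken from the footnote.
THRESHOLDS = {
    "blast": (1.0e-12, 1.0e-5),
    "hmm_full": (1.0e-8, 2.0e-2),
    "hmm_partials": (1.0e-5, 1.0e-2),
}

def classify_hit(method: str, evalue: float) -> str:
    """Return 'X' (strong), 'x' (moderate) or '+' (marginal)."""
    strong_max, marginal_min = THRESHOLDS[method]
    if evalue <= strong_max:
        return "X"
    if method == "blast":
        # Footnote: marginal Blast hits start at >= 1.0e-5.
        return "x" if evalue < marginal_min else "+"
    # Footnote: marginal HMM hits start strictly above the bound.
    return "x" if evalue <= marginal_min else "+"
```

A marginal classification alone does not make a module a true positive; the footnote states that such hits were validated by additional criteria described in the ‘Materials and Methods’ section.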
Table 4. Gene content and structure of D. papillatum mtDNA
(a) Genes and corresponding gene products are: atp6, ATP synthase subunit 6; cob, cytochrome b apoprotein; cox1-cox3, cytochrome c oxidase subunits; nad1-nad9, NADH dehydrogenase subunits; rps12, ribosomal protein S12; rnl, LSU rRNA; rns, SSU rRNA. Completely sequenced genes and cDNAs are highlighted by grey background.
(b) The gene set of Trypanosoma brucei serves for comparison. Data are inferred from the following GenBank accession nos. (protein GI). cox1: AAB59223 (343538); cox3: AAA32122 (343596); cob: CAA24915 (578828); atp6: AAA97428 (343544); nad1, rps2, cox2: M94286 (343546); nad4: AAB59224 (343539); nad5: AAB59225.1 (343540); nad7: M55645 (343542); nad8: AA91499 (552291); nad9: AAA03749 (162166).
(c) m1…m12, all modules from m1 to m12. Modules in boldface indicate those so far only found in genomic DNA. Modules exclusively found in the 454 data set are nad5-m3 and nad8-m2. For incomplete genes, module numbers have been estimated from multiple protein alignments assuming an average module length of 170 bp.
(d) Tentative module numbers, see footnote c.
(e) Poly-adenylated 3′-terminal module.
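Footnote c describes estimating module numbers for incomplete genes from protein alignments with an assumed average module length of 170 bp. The arithmetic could look like the sketch below. The rounding rule (rounding up) and the conversion of three nucleotides per aligned residue are assumptions of this sketch, and `estimate_module_count` is a hypothetical name; the footnote does not specify either detail.

```python
AVERAGE_MODULE_LENGTH_BP = 170  # value stated in footnote c

def estimate_module_count(protein_length_aa: int) -> int:
    """Estimate how many modules a gene of the given protein length has.

    Assumptions: coding length = 3 nt per residue (stop codon ignored),
    and a partial module counts as a whole one (ceiling division).
    """
    coding_length_bp = 3 * protein_length_aa
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-coding_length_bp // AVERAGE_MODULE_LENGTH_BP)
```

For instance, a protein of 170 residues gives 510 bp of coding sequence and therefore an estimate of three modules under these assumptions.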
Figure 2. Genomic cox1 modules aligned according to cox1 cDNA. Parts of module-flanking regions are shown as well. m1–m9, gene modules 1–9. Six Ts are present at the junction of modules 4 and 5 in cDNA, but not in the corresponding genomic modules or their flanking regions. Bold uppercase, sequence of genomic modules corresponding to that of cDNA. Lowercase, sequence of module-flanking regions in mtDNA. Five module junctions (1/2, 2/3, 5/6, 7/8, 8/9) cannot be inferred accurately from the genomic sequence alone; all junctions except 7/8 (highlighted in grey) have been mapped by sequencing transcript intermediates (not shown).