Andreas Simon, Gernot Glöckner, Marius Felder, Michael Melkonian, Burkhard Becker.
Abstract
BACKGROUND: The Viridiplantae (land plants and green algae) consist of two monophyletic lineages, the Chlorophyta and the Streptophyta. The Streptophyta include all embryophytes and a small but diverse group of freshwater algae traditionally known as the Charophyceae (e.g. Charales, Coleochaete and the Zygnematales). The only flagellate currently included in the Streptophyta is Mesostigma viride Lauterborn. To gain insight into genome evolution in streptophytes, we have sequenced 10,395 ESTs from Mesostigma representing 3,300 independent contigs and compared the ESTs of Mesostigma with available plant genomes (Arabidopsis, Oryza, Chlamydomonas), with ESTs from the bryophyte Physcomitrella, the genome of the rhodophyte Cyanidioschyzon, the ESTs from the rhodophyte Porphyra, and the genome of the diatom Thalassiosira.
Year: 2006 PMID: 16476162 PMCID: PMC1413533 DOI: 10.1186/1471-2229-6-2
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Mesostigma viride cDNA libraries used.
| Name | cDNA | Number of primary clones | Percentage recombinant clones | size of inserts1 (bp) | average size of inserts1 (bp) | number of ESTs sequenced |
| Meso1 | small size fraction | 310 000 | 90 | 250–2000 | 706 | 100 |
| Meso2 | large size fraction | 295 000 | 88 | 600–3200 | 1142 | 4954 |
| Meso3 | total cDNA normalized | 106 000 | 63 | 100–1100 | 650 | 535 |
| Meso4 | large size fraction normalized | 304 000 | 56 | 400–6600 | 2025 | 2403 cDNAs = 4806 ESTs2 |
1 Determined by agarose gel electrophoresis. 2 Sequenced from the 3' and 5' ends.
Summary of Mesostigma viride expressed genes obtained from four cDNA libraries (Meso 1 – Meso 4).
| Category | No. of contigs |
| Mitochondrial1 | 36 |
| Plastidic1 | 65 |
| Bacterial2 | 193 |
| Novel | 1691 |
| – with recognizable protein motif | 574 |
| – no protein motifs | 1117 |
| Similarity | 1315 |
| – with known function3 | 574 |
| – unknown function4 | 395 |
| – low similarity5 | 346 |
| Total contigs | 3300 |
1 Sequences showing similarity only to organelle genomes. 2 Sequences showing similarity only to bacterial sequences, or the highest similarity to bacterial sequences; the origin of these putative bacterial contaminations is currently not clear, as bacteria-free cultures of Mesostigma were used. 3 Similarity to proteins with a well-defined function (BLAST score >100). 4 Similarity to conserved proteins with no established function (BLAST score >100). 5 Low similarity to proteins from a few organisms (BLAST score generally between 100 and 200); might reflect conserved protein domains.
Functional classification of 3006 Mesostigma viride contigs using the KOG system [43] and an expectation threshold of e = 10⁻⁷.
| Functional Category | No. of Contigs |
| INFORMATION STORAGE AND PROCESSING | 236 |
| [J] Translation, ribosomal structure and biogenesis | 168 |
| [A] RNA processing and modification | 28 |
| [K] Transcription | 26 |
| [L] Replication, recombination and repair | 9 |
| [B] Chromatin structure and dynamics | 5 |
| CELLULAR PROCESSES AND SIGNALING | 209 |
| [D] Cell cycle control, cell division, chromosome partitioning | 7 |
| [Y] Nuclear structure | 0 |
| [V] Defense mechanisms | 2 |
| [T] Signal transduction mechanisms | 31 |
| [M] Cell wall/membrane/envelope biogenesis | 9 |
| [N + Z] Cytoskeleton and cell motility | 18 |
| [W] Extracellular structures | 0 |
| [U] Intracellular trafficking, secretion, and vesicular transport | 41 |
| [O] Posttranslational modification, protein turnover, chaperones | 101 |
| METABOLISM | 212 |
| [C] Energy production and conversion | 87 |
| [G] Carbohydrate transport and metabolism | 35 |
| [E] Amino acid transport and metabolism | 24 |
| [F] Nucleotide transport and metabolism | 9 |
| [H] Coenzyme transport and metabolism | 13 |
| [I] Lipid transport and metabolism | 21 |
| [P] Inorganic ion transport and metabolism | 16 |
| [Q] Secondary metabolites biosynthesis, transport and catabolism | 7 |
| POORLY CHARACTERIZED | 158 |
| [R] General function prediction only | 55 |
| [S] Function unknown | 37 |
| [X] Unnamed protein | 66 |
| Unknown | 2191 |
Figure 1. Classification of expressed genes from Mesostigma. All non-redundant expressed genes were used as queries in (t)blastx similarity searches against the Swissprot, Genbank, Chlamydomonas, Cyanidioschyzon, Porphyra, Physcomitrella, Arabidopsis and Oryza data sets. The outermost circle represents all Mesostigma expressed genes. The inner circles, labeled chlorophyte, streptophyte and rhodophyte, represent genes that have similarity to chlorophyte, streptophyte or rhodophyte sequences, respectively. The areas depicted are not proportional to the gene numbers; the number of Mesostigma expressed genes in each category is written in each segment. Numbers in brackets indicate the number of expressed genes in a category after removal of low similarity hits (see Table 2 for a definition of low similarity hits).
Comparison of the Mesostigma expressed genes with the genomes and ESTs from various organisms. Average identity (AI) of pairwise comparisons of Mesostigma expressed genes with the indicated organismal data set.
| Data set (No. of contigs) | ||||||||
| Genome | EST | Genome | EST | Genome | EST | Genome | Genome | |
| All (969) | 0.573 | 0.563 | 0.522 | 0.528 | 0.528 | 0.590 | 0.577 | 0.565 |
| Constrained (314) | 0.648 | 0.653 (n = 244)1) | 0.557 | 0.585 (n = 188)1) | 0.569 | 0.675 (n = 301)1) | 0.679 | 0.671 |
| Evolutionary distance D2) | 0.473 | 0.463 | 0.658 | 0.597 | 0.631 | 0.425 | 0.418 | 0.432 |
The total data set contains all Mesostigma expressed genes with significant similarity to proteins from other organisms with known or unknown function (see Table 2). The constrained data set contains only Mesostigma expressed genes with significant similarity to proteins in all completely sequenced eukaryotic autotroph organisms.
1) Number of ESTs showing similarity to Mesostigma expressed genes from the constrained data set in a tBLASTX analysis. 2) Evolutionary distances were calculated using the constrained data set and the approximation given by Kimura [28]: D = -ln(1 - p - 0.2p²), where p is the fraction of amino acids that differ between the two species.
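The distance row of the table can be reproduced from the AI row with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes that p is taken as 1 - AI for the constrained data set, and the function names (`average_identity`, `kimura_distance`, `distance_from_ai`) are hypothetical.

```python
import math


def average_identity(identities):
    """Mean of per-gene fractional identities (each between 0 and 1)."""
    if not identities:
        raise ValueError("need at least one pairwise identity")
    return sum(identities) / len(identities)


def kimura_distance(p):
    """Kimura's approximation D = -ln(1 - p - 0.2 p^2).

    p is the fraction of amino acids that differ between the two species.
    """
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("p is too large for this approximation")
    return -math.log(inner)


def distance_from_ai(ai):
    """Assumption: the fraction of differing residues is p = 1 - AI."""
    return kimura_distance(1.0 - ai)


# Three AI values of the constrained data set, taken from the table above.
for ai in (0.648, 0.557, 0.679):
    print(ai, round(distance_from_ai(ai), 3))
# 0.648 0.473
# 0.557 0.658
# 0.679 0.418
```

Under this assumption the computed values match the tabulated distances to three decimals, which suggests that D was derived directly from the constrained AI values.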
Statistical significance of the obtained AI values. A paired Student's t-test was performed for the constrained data set to test whether the observed differences between the average identities of pairwise comparisons of Mesostigma expressed genes with the indicated organismal data sets are significant. Differences are considered significant when p < 0.0071 (0.05/7, Bonferroni adjustment for the seven comparisons [22]).
| | Variable1 | No. of genes2 | mean3 | standard deviation4 | t-value | Degrees of freedom | p |
| 1 | | 244 | 0.652992 0.652992 | 0.149868 0.151255 | -0.03107 | 243 | 0.975239 |
| 2 | | 301 | 0.649934 0.674618 | 0.149292 0.153641 | -3.24578 | 300 | 0.001304 |
| 3 | | 314 | 0.648057 0.677994 | 0.148512 0.140940 | -4.44025 | 313 | 0.000012 |
| 4 | | 314 | 0.648057 0.670382 | 0.148512 0.148384 | -3.15371 | 313 | 0.001768 |
| 5 | Mesostigma/Physcomitrella | 302 | 0.675364 0.681159 | 0.153933 0.140567 | -1.35783 | 301 | 0.175535 |
| 6 | Mesostigma/Physcomitrella | 302 | 0.675364 0.673311 | 0.153933 0.147624 | 0.43158 | 301 | 0.666355 |
| 7 | | 314 | 0.678730 0.671175 | 0.141321 0.148813 | 2.480053 | 313 | 0.013660 |
1 Compared data sets E = ESTs, G = Genome. 2 No. of genes shared between the compared data sets. 3 AI recalculated on the basis of the genes shared between the compared datasets. 4 Standard deviation of 3.
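The significance call can be checked with a short Python sketch. It assumes that the Bonferroni family consists of the seven comparisons listed in the table, which reproduces the stated threshold of 0.0071; the helper name `bonferroni_threshold` is hypothetical, and the p-values are copied from the last column of the table.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


# p-values of comparisons 1 to 7, as listed in the table above.
p_values = {
    1: 0.975239,
    2: 0.001304,
    3: 0.000012,
    4: 0.001768,
    5: 0.175535,
    6: 0.666355,
    7: 0.013660,
}

# Assumption: the family size is the number of comparisons in the table.
threshold = bonferroni_threshold(0.05, len(p_values))
significant = [k for k, p in sorted(p_values.items()) if p < threshold]

print(round(threshold, 4))  # 0.0071
print(significant)          # [2, 3, 4]
```

With this threshold, comparisons 2, 3 and 4 are significant, whereas comparison 7 (p = 0.013660) falls short of it despite being below the unadjusted 0.05 level.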
Figure 2. Consistency of the constrained data set used to calculate AI values. (A) Effect of the number of genes included on the AI values. The significant differences in the AI values are stable when more than 150 genes are included. (B) 150 genes were resampled randomly and the AIs calculated for the indicated organisms (1 – 8). AI values were also calculated for the 150 most strongly (9, as revealed by the number of ESTs in a contig) and most weakly (10, only single ESTs) expressed genes.
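The resampling in panel (B) can be illustrated with a minimal Python sketch. It is not the authors' procedure: sampling without replacement, the number of resamples, the seeds and the synthetic per-gene identities are all assumptions made for illustration, and `resampled_ai` is a hypothetical name.

```python
import random


def resampled_ai(identities, sample_size=150, n_resamples=100, seed=0):
    """Draw random gene subsets and return the AI of each subset.

    Assumption: genes are drawn without replacement within a subset.
    """
    rng = random.Random(seed)
    ais = []
    for _ in range(n_resamples):
        subset = rng.sample(identities, sample_size)
        ais.append(sum(subset) / sample_size)
    return ais


# Hypothetical per-gene identities for a 314-gene constrained data set.
data_rng = random.Random(1)
identities = [data_rng.uniform(0.4, 0.9) for _ in range(314)]

ais = resampled_ai(identities)
print(len(ais), round(min(ais), 3), round(max(ais), 3))
```

A narrow spread of the resampled AI values relative to the differences between organisms would indicate that the ranking does not depend on which 150 genes happen to be included.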
Figure 3. Alignment of the deduced amino acid sequence of the putative GAPDHB (Meso2a42g12) gene from Mesostigma. The conserved cysteine residues are indicated in red letters. Numbers refer to the amino acid position (spinach) or nucleotide position (Mesostigma).
Figure 4. Phylogenetic tree of glycolate oxidase and glycolate oxidase-like genes. The tree shown was derived by Bayesian inference analysis from 402 amino acid positions using a mixed model for amino acid substitutions and a gamma correction for rate variation among sites. The Bayesian inference utilized MRBAYES, Ver. 3.0 * with posterior probabilities derived from 100000 generations and discarding a burn-in of 1000. The tree obtained with a parsimony analysis using PHYLIP gave essentially the same topology.
Comparison of Mesostigma genes related to metabolic functions with Chlamydomonas and three embryophytes. The average identity (AI) of pair-wise comparisons of Mesostigma expressed genes coding for the indicated metabolic function with the ESTs or genome of the given organisms are presented.
| Function1 | ||||
| Metabolism (259) | 0.613 | 0.601 | 0.593 | 0.594 |
| Plastidic metabolism (107)2 | 0.595 | 0.572 | 0.567 | 0.572 |
| Mitochondrial metabolism (16)2 | 0.647 | 0.593 | 0.569 | 0.578 |
| Cytosolic metabolism (glycolysis, NDP-sugar metabolism, nucleotide synthesis; 22)2 | 0.568 | 0.639 | 0.641 | 0.632 |
1 Numbers in brackets indicate the number of genes in this category.
Comparison of Mesostigma genes related to cell structure functions with the genome or ESTs of Chlamydomonas and three embryophytes. The average identity of pair-wise comparisons of Mesostigma expressed genes coding for the indicated cellular functions with the ESTs or genomes of the given organisms are presented.
| Function1 | ||||
| Cell structure (201) | 0.629 | 0.641 | 0.627 | 0.618 |
| Cytoskeleton (7) | 0.751 | 0.729 | 0.731 | 0.731 |
| Protein folding/Chaperones (21) | 0.685 | 0.605 | 0.580 | 0.581 |
| Cytosolic protein degradation (22) | 0.682 | 0.733 | 0.722 | 0.703 |
| Vesicular transport (22) | 0.633 | 0.701 | 0.680 | 0.673 |
| Regulation (18) | 0.585 | 0.599 | 0.548 | 0.544 |
| DNA structure, replication, cell cycle (21) | 0.580 | 0.616 | 0.655 | 0.625 |
| Transcription (16) | 0.684 | 0.675 | 0.647 | 0.623 |
| RNA metabolism (23) | 0.599 | 0.619 | 0.637 | 0.637 |
1 Numbers in brackets indicate the number of genes in this category.
Regulation of plastidic enzymes by the thioredoxin system. Proteins similar to embryophyte plastidic thioredoxin-regulated proteins were identified in the genomes of Cyanidioschyzon, Chlamydomonas, and the ESTs of Mesostigma using the BLASTP or BLASTX algorithms. A putative thioredoxin-regulated orthologue as revealed by the conserved cysteine residues is indicated with +. An asterisk indicates putative cyanobacterial/plastidic proteins, which do not contain the conserved cysteines required for thioredoxin-regulation. Missing enzymes are indicated with -.
| PRK | + | + | + | + | + |
| SBPase | * | + | + | + | + |
| G6PDH | * | + | + | + | |
| FBPase | * | *1) | + | + | + |
| γ-ATPase | * | * | + | + | + |
| GAPDHB | - | - | - | + | + |
| NADP-MDH | - | - | +2) | + | |
| Rubisco activase | -3) | - | * | * | (+)4) |
n.d., not detected in Mesostigma. 1) In Galdieria (Cyanidioschyzon), 2 (1) of the 3 conserved cysteines occurring in the Viridiplantae are present [48]. 2) Chlorophyte NADP-malate dehydrogenase possesses a C- and an N-terminal extension like the embryophyte enzyme; however, only the C-terminal cysteines of the embryophyte enzyme are conserved [49, 50]. 3) A few cyanobacteria contain an unusual rubisco activase. Only the central AAA+ domain shows similarity to plant rubisco activases, whereas the N- and C-terminal domains are very different [51]. 4) Many angiosperms contain two forms of rubisco activase. Only the long form is regulated by the thioredoxin system [52].