Rodrigo A Gutiérrez, Pamela J Green, Kenneth Keegstra, John B Ohlrogge.
Abstract
BACKGROUND: The availability of the complete genome sequence of Arabidopsis thaliana together with those of other organisms provides an opportunity to decipher the genetic factors that define plant form and function. To begin this task, we have classified the nuclear protein-coding genes of Arabidopsis thaliana on the basis of their pattern of sequence similarity to organisms across the three domains of life.
Year: 2004 PMID: 15287975 PMCID: PMC507878 DOI: 10.1186/gb-2004-5-8-r53
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Identification of plant-specific proteins. (a) Classification of Arabidopsis proteins based on their pattern of sequence similarity to other organisms. The 27,288 Arabidopsis proteins were classified on the basis of their phylogenetic profiles (PP). Each PP recorded whether similar sequences were found or not found in the protein sets from the following organisms: Homo sapiens, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Schizosaccharomyces pombe, Saccharomyces cerevisiae, a combined set of 88 species of Bacteria, and a combined set of 16 species of Archaea. Not drawn to scale. (b) Identification of putative plant-specific proteins. The Arabidopsis proteins that lack similarity to any other organism (7,868 proteins represented in the black circle in (a)) were compared against sequences in the expressed sequence tag (EST) database of Arabidopsis and 13 other plant species. A total of 3,848 Arabidopsis proteins were identified as plant specific because they showed sequence similarity to proteins in the Arabidopsis EST database and to proteins in EST databases of at least four other plant species (at E-value ≤ 10⁻¹⁰). In addition, 892 other Arabidopsis proteins show similarity to the Arabidopsis and one to three other plant EST databases, 2,691 Arabidopsis proteins exhibit similarity to sequences in the Arabidopsis EST database only, and 437 lack similarity to any sequence in the EST databases used.
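The EST-based filter described in panel (b) can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function name `est_tier`, its arguments and the tier labels are hypothetical, and applying the E-value cutoff to the Arabidopsis EST set as well as to the other species is an assumption. The caption does not describe proteins that match other plant ESTs but not Arabidopsis ESTs, so that case is flagged separately.

```python
E_VALUE_CUTOFF = 1e-10  # threshold quoted in the caption for EST similarity


def est_tier(arabidopsis_evalue, other_species_evalues):
    """Assign a protein to one of the EST-support tiers named in the caption.

    arabidopsis_evalue: best E-value against the Arabidopsis EST set,
        or None when there is no hit (hypothetical input format).
    other_species_evalues: dict mapping a plant species name to the best
        E-value against that species' EST set; species without a hit
        are simply omitted.
    """
    in_arabidopsis = (arabidopsis_evalue is not None
                      and arabidopsis_evalue <= E_VALUE_CUTOFF)
    n_other = sum(1 for e in other_species_evalues.values()
                  if e <= E_VALUE_CUTOFF)

    if not in_arabidopsis and n_other == 0:
        return "no EST similarity"
    if not in_arabidopsis:
        # Assumption: this combination is not described in the caption.
        return "not covered by the caption"
    if n_other >= 4:
        return "plant specific"
    if n_other >= 1:
        return "Arabidopsis and 1-3 other plant species"
    return "Arabidopsis EST only"


# Counts reported in the caption for each tier.
reported = {
    "plant specific": 3848,
    "Arabidopsis and 1-3 other plant species": 892,
    "Arabidopsis EST only": 2691,
    "no EST similarity": 437,
}
total_reported = sum(reported.values())
```

The four reported counts add up to the 7,868 proteins that entered the EST comparison, which is a quick consistency check on the tier definitions.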
Proteins with poor annotation are abundant among plant-specific proteins
| Annotation | Plant-specific proteins | Proteins conserved throughout the phylogeny | Whole proteome |
| Hypothetical protein | 557 (14%) | 12 (0.5%) | 4,363 (15%) |
| Unknown proteins | 3 (0.1%) | 0 (0.0%) | 26 (0.1%) |
| Expressed protein | 1,457 (38%) | 50 (2.0%) | 6,683 (23%) |
| Total | 2,017 (52%) | 62 (2.5%) | 11,072 (38%) |
| Total in class | 3,848 | 2,436 | 28,581 |
Comparison of the number of proteins annotated as 'expressed protein', 'hypothetical protein' or 'unknown protein' in the list of plant-specific proteins, proteins conserved throughout the phylogeny and in the whole Arabidopsis proteome. Total number of protein sequences per category (percentage relative to the total in the class) are shown. *Although the Arabidopsis TIGR genome release v3.0 (2002) was used to make the classification (plant-specific or other groups) and for all the other data analysis in this study, the numbers in this table reflect the latest protein annotation available from TIGR (genome release v4.0, April 2003).
Prediction of subcellular localization and transmembrane helices
| Predicted location | Unknown plant-specific proteins | Known plant-specific proteins | Whole proteome |
| Any other location | 1,073 (53%) | 1,006 (55%) | 15,706 (58%) |
| Chloroplast | 360 (18%) | 253 (14%) | 3,972 (15%) |
| Mitochondria | 254 (13%) | 178 (10%) | 2,963 (11%) |
| Secretory pathway | 330 (16%) | 394 (22%) | 4,647 (17%) |
| Membrane associated | 220 (11%) | 89 (5%) | 2,075 (8%) |
| Total in group | 2,017 | 1,831 | 27,288 |
Total number of protein sequences per category (percentage relative to the total in the group).
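As a worked check of how the percentages in this table relate to the group totals, the short Python sketch below recomputes them from the counts. The variable names are illustrative only. The four subcellular-location rows partition each group, while the membrane-associated row is a separate prediction and is expressed against the same totals.

```python
# Group totals from the last row of the table.
totals = {"unknown_ps": 2017, "known_ps": 1831, "whole": 27288}

# Counts per category, copied from the table rows.
counts = {
    "Any other location": {"unknown_ps": 1073, "known_ps": 1006, "whole": 15706},
    "Chloroplast":        {"unknown_ps": 360,  "known_ps": 253,  "whole": 3972},
    "Mitochondria":       {"unknown_ps": 254,  "known_ps": 178,  "whole": 2963},
    "Secretory pathway":  {"unknown_ps": 330,  "known_ps": 394,  "whole": 4647},
    "Membrane associated": {"unknown_ps": 220, "known_ps": 89,   "whole": 2075},
}


def percent(count, total):
    """Percentage of the group total, rounded to a whole number."""
    return round(100 * count / total)


percentages = {
    category: {group: percent(n, totals[group]) for group, n in row.items()}
    for category, row in counts.items()
}

# The four location categories should account for every protein in a group.
location_rows = [c for c in counts if c != "Membrane associated"]
location_sums = {
    group: sum(counts[c][group] for c in location_rows) for group in totals
}

for category, row in percentages.items():
    print(category, row)
```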
Arabidopsis plant-specific proteins with known or hypothetical function and that are involved in central cellular processes
| Gene family | Number of plant-specific proteins |
| AP2/ERF | 124 |
| ARF | 16 |
| B3 | 14 |
| bHLH | 31 |
| bZIP | 12 |
| C2C2 | 42 |
| C2H2 | 2 |
| EIN/EIL | 5 |
| GRAS | 28 |
| HD | 13 |
| Leafy | 1 |
| MADS | 2 |
| MYB | 25 |
| NAC | 73 |
| SBP | 11 |
| TCP | 21 |
| Trihelix | 1 |
| VP1/ABI3 | 1 |
| WRKY | 44 |
| Other | 28 |
| AUX/IAA | 24 |
| Other | 10 |
| 1 | |
| 7 | |
| F-box proteins | 115 |
Plant-specific proteins that are found in the AraCyc database
| Locus | Protein description | Metabolic pathway | Enzyme name | Reaction* |
| At1g78240 | Similar to early-responsive to dehydration stress ERD3 protein | Carbon monoxide dehydrogenase pathway | Methyltransferase | 2.1.1.- |
| At1g08550 | Violaxanthin de-epoxidase precursor, putative | Carotenoid biosynthesis | Violaxanthin de-epoxidase | RXN-325 |
| At1g08550 | Violaxanthin de-epoxidase precursor, putative | Carotenoid biosynthesis | Violaxanthin de-epoxidase | RXN-314 |
| At1g78240 | Similar to early-responsive to dehydration stress ERD3 protein | CO2 formation from methanol | Methyltransferase | METHTRANSBARK-RXN |
| At1g53520 | Chalcone-flavanone isomerase-related | Flavonoid biosynthesis | Chalcone isomerase | 5.5.1.6 |
| At5g05270 | Chalcone-flavanone isomerase family | Flavonoid biosynthesis | Chalcone-flavonone isomerase | 5.5.1.6 |
| At5g66220 | Putative chalcone-flavanone isomerase (chalcone isomerase) (CHI) | Flavonoid biosynthesis | Chalcone isomerase | 5.5.1.6 |
| At1g27690 | Lipase -related | Glycerol biosynthesis | Lipase | 3.1.1.3 |
| At5g03980 | gdsl-motif lipase/hydrolase protein | Glycerol biosynthesis | Lipase | 3.1.1.3 |
| At1g13280 | Allene oxide cyclase family similar to ERD12 | Jasmonic acid biosynthesis | Allene oxide cyclase | 5.3.99.6 |
| At1g19640 | | Jasmonic acid biosynthesis | S-adenosyl L-methionine:jasmonic acid carboxyl methyltransferase | RXN1F-28 |
| At1g13280 | Allene oxide cyclase family similar to ERD12 | Lipoxygenase pathway | Allene oxide cyclase | 5.3.99.6 |
| At4g21610 | Lsd1 like protein | L-serine degradation | LSD1 | 4.2.1.13 |
| At1g53520 | Chalcone-flavanone isomerase-related | Phytoalexin biosynthesis | Chalcone isomerase | 5.5.1.6 |
| At5g66220 | Putative chalcone-flavanone isomerase (chalcone isomerase) (CHI) | Phytoalexin biosynthesis | Chalcone isomerase | 5.5.1.6 |
| At5g05270 | Chalcone-flavanone isomerase family | Phytoalexin biosynthesis | Chalcone-flavonone isomerase | 5.5.1.6 |
| At1g03040 | bHLH protein component of the pyruvate dehydrogenase complex E3 | Pyruvate dehydrogenase | Pyruvate dehydrogenase (lipoamide) | 1.2.4.1 |
| At2g45880 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At3g23920 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At4g15210 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At4g17090 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At5g18670 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At5g45300 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At5g55700 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
| At2g32290 | Glycosyl hydrolase family 14 (beta-amylase) | Starch degradation† | Beta-amylase | 3.2.1.2 |
*EC number is given when available, otherwise the AraCyc [70] frame name for the reaction is given. †AraCyc designation for this metabolic pathway is 'Starch and cellulose biosynthesis'. However, as far as we know, genes in this family are only involved in starch degradation.
Figure 2. Arabidopsis genes encoding plant-specific proteins exhibit preferential expression in organs. (a) Heat map showing the 600 plant-specific genes that exhibited differential expression in at least one microarray experiment comparing RNA samples from different plant organs. Microarray experiments were obtained from the Stanford Microarray Database. The mean was calculated for the replicates. Organ-preferential expression was defined as a twofold or higher ratio in the comparison. Gene expression is shown as log2(ratio). The bar at the top right indicates the magnitude of change. Green indicates induction and red indicates repression of gene expression. Ref, reference sample; see Materials and methods for details. (b) For all organ comparisons, the number of differentially expressed genes in the plant-specific category was significantly higher than the number of differentially expressed genes that are not plant specific. Statistical significance was calculated using the chi-square test for contingency tables.
Plant-specific genes are preferentially expressed in organs compared with genes that are evolutionarily conserved
| Conclusion of statistical test | Classification | Differentially expressed | Percentage of total | χ2 | p value |
| Statistically significant organ-preferential expression | Plant specific | 600 | 56% | 51.27 | 8.1e-13 |
| | Bacteria (includes cyanobacteria) | 122 | 67% | 32.74 | 1.1e-08 |
| No statistically significant difference from a random sample | Eukaryotes and archaea | 37 | 61% | 4.82 | 2.8e-02 |
| | Eukaryotes and bacteria | 370 | 46% | 0.22 | 6.4e-01 |
| | Archaea | 3 | 60% | 0.03 | 8.5e-01 |
| | Archaea and bacteria | 52 | 49% | 0.01 | 9.2e-01 |
| | Common to all | 15 | 48% | 0.00 | 9.6e-01 |
| Statistically significant expression everywhere | Eukaryotes | 112 | 27% | 61.75 | 3.9e-15 |
The first column indicates the conclusion from the statistical test. The second column indicates the phylogenetic classification of the genes analysed in each row. The number of genes from each class (for example, plant-specific) that showed organ-preferential expression is indicated in the third column. The fourth column shows the percentage of genes that showed organ-preferential expression as compared to the total number of genes represented on Arabidopsis glass-slide microarrays for each class. The χ2 statistic and the p value are presented in the fifth and sixth columns, respectively.
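The relationship between the χ2 statistic and the p value in this table can be illustrated with a small Python sketch of a chi-square test on a 2×2 contingency table. Several points here are assumptions rather than details given in the source: one degree of freedom, no continuity correction, and the function names `chi_square_2x2` and `p_value_1df`. The counts in the example call are hypothetical and are not the paper's underlying data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, 'in class' vs 'not in class', and columns
    'differentially expressed' vs 'not differentially expressed'.
    No continuity correction is applied (an assumption).
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def p_value_1df(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom.

    For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, for illustration only.
example_chi2 = chi_square_2x2(10, 20, 20, 10)
example_p = p_value_1df(example_chi2)

# Converting statistics reported in the table into p values.
p_plant_specific = p_value_1df(51.27)
p_euk_archaea = p_value_1df(4.82)
```

Under the one-degree-of-freedom assumption, `p_value_1df(51.27)` comes out at about 8e-13 and `p_value_1df(4.82)` at about 0.028, in line with the values listed for the plant-specific and the eukaryotes-and-archaea rows.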
Two groups of plant-specific genes exhibit common expression profiles
| PP* | Representative EST clone ID | Locus | Description |
| PS | 111O21XP | At1g19180 | Expressed protein |
| PS | 123B21T7 | At1g30755 | Expressed protein |
| PS | 209F11T7 | At1g63090 | F-box protein (SKP1 interacting partner 3-related) |
| PS | 181I16T7 | At1g72510 | Expressed protein |
| PS | 148B19T7 | At1g74950 | Expressed protein |
| PS | 40F4T7 | At2g23320 | Identical to WRKY DNA-binding protein 15 |
| EBA | 240G12T7 | At2g31880 | Putative leucine-rich repeat transmembrane protein kinase |
| EBA | 169J16T7 | At2g39660 | Putative protein kinase |
| PS | 172K21XP | At3g16860 | Expressed protein |
| PS | 94C19T7 | At3g25870 | Expressed protein |
| PS | 114O7T7 | At4g12070 | Expressed protein |
| PS | 250F15T7 | At4g19515 | Similar to disease resistance protein |
| PS | 137B1T7 | At4g30390 | Expressed protein |
| PS | 122N24T7 | At5g13180 | NAM-like protein; hypothetical senescence upregulated protein SENU5 |
| PS | 204H15T7 | At5g13200 | GRAM-domain-containing protein similar to ABA-responsive protein |
| E | 195M6T7 | At5g22250 | CCR4-associated factor-like protein |
| PS | 200J12T7 | At5g62520 | Expressed protein |
| PS | 169C12T7 | At1g05250 | Putative peroxidase |
| PS | 113H5XP | At1g52050 | Jacalin lectin family similar to myrosinase-binding protein homolog |
| EBA | 121N12T7 | At1g61590 | Putative serine/threonine protein kinase |
| PS | 40E4T7 | At1g74770 | Hypothetical protein; predicted by GenemarkHMM |
| B | 34E12T7 | At3g24670 | Polysaccharide lyase family 1 (pectate lyase) |
| PS | 122J15T7 | At4g14060 | Major latex protein (MLP)-related |
| PS | 194B13T7 | At4g15390 | Acyltransferase family |
| PS | 204N5XP | At4g26010 | Putative peroxidase |
| PS | 144C19T7 | At5g07080 | Transferase family similar to 10-deacetylbaccatin III-10-O-acetyl transferase |
| PS | 116F2T7 | At5g45070 | Putative disease resistance protein (TIR class) |
| PS | 110O2T7 | At5g57685 | Unknown protein; predicted by GenemarkHMM |
Experiments were ranked according to the proportion of genes in the cluster that were differentially expressed. The most important experiments for each cluster are indicated. *PP, phylogenetic profile. †Low expression in flowers compared to leaves, unstable and moderately unstable transcripts. ‡High expression in roots as compared to a reference made of the whole plant, repressed during shoot development from root explants. PS, plant specific; EBA, Arabidopsis protein with similarity to proteins in other eukaryotes, bacteria and archaea; B, Arabidopsis protein with similarity to proteins in bacteria.