Computational and experimental analysis identifies Arabidopsis genes specifically expressed during early seed development.

Cristian Becerra, Pere Puigdomenech, Carlos M Vicient.

Abstract

BACKGROUND: Plant seeds are complex organs in which maternal tissues, embryo and endosperm, follow distinct but coordinated developmental programs. Some morphogenetic and metabolic processes are exclusively associated with seed development. The goal of this study was to explore the feasibility of incorporating the available online bioinformatics databases to discover Arabidopsis genes specifically expressed in certain organs, in our case immature seeds.
RESULTS: A total of 11,032 EST sequences obtained from isolated immature seeds were used as the initial dataset (178 of them newly described here). A pilot study was performed using EST virtual subtraction followed by microarray data analysis, using the Genevestigator tool. These techniques led to the identification of 49 immature seed-specific genes. The findings were validated by RT-PCR analysis and in situ hybridization.
CONCLUSION: We conclude that the combined in silico data analysis is an effective data mining strategy for the identification of tissue-specific gene expression.

Year:  2006        PMID: 16504176      PMCID: PMC1420293          DOI: 10.1186/1471-2164-7-38

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Seeds are complex genetic entities with a diploid maternal genotype, derived from the ovary wall, a diploid embryo, with equal genetic contributions from the pollen donor and pollen recipient, and a triploid endosperm, in which the maternal genetic contribution is twice that of the paternal parent. Endosperm development is a process with many unique features determining the coordinated development and disappearance of a highly specialized organ [1]. During embryogenesis, the egg cell divides and develops into an embryo, passing through different developmental phases: globular, heart, torpedo, cotyledon, curled-cotyledon and maturation [2]. Key steps in early embryo development are the acquisition of a polar structure with a shoot-root axis, the formation of the apical and root meristems, and the differentiation of the cotyledon primordia. After this last stage, the size of the embryo increases and deposition of storage macromolecules begins. Finally, during maturation, the embryo desiccates. During this process, the seed coat develops from the two integuments that surround the embryo. Several of the processes described above are not present in any other plant tissues, so the genetic program for seed development is likely to involve the concerted activity of many seed-specific genes. Determination of the genes involved in seed development, and their functions, is one of the major goals in plant developmental biology. Mutational approaches have been extensively used to analyse seed development in Arabidopsis [3-5]. Several mutants showing loss of, or alterations in, seed development have been isolated, allowing the identification of several genes [6,7]. However, insertional mutagenesis has some deficiencies. For example, probably due to gene redundancy, many insertions in genes do not produce any detectable phenotype, and genes whose disruption produces alterations in seed development are not necessarily genes with seed-specific expression [6].
In consequence, although mutational approaches have been, and still are, basic for understanding the processes involved in seed development, they are not enough to build a complete picture of the process. Expression profiling and definition of genes specifically or preferentially expressed in certain tissues complement the genetic and molecular approaches. The generation of EST collections and the oligonucleotide-based microarrays can produce reliable, high-quality data [8,9]. The deposition of the results of RNA profiling experiments in public databases provides a valuable tool for in silico analysis of organ-specific gene expression. There have been several reports of EST-based computer analysis of human tissue transcriptomes [10-15], and differential searches of human EST databases have also been performed computationally [16]. EST abundance in plants is not as high as for humans, but for some species the total number of ESTs in publicly available databases exceeds the total number of genes by more than one order of magnitude. For example, the NCBI dbEST database release 111105 (November 11, 2005) [17] included 656,945 sequences from Zea mays (maize), 600,039 from Triticum aestivum (wheat), 420,789 from Arabidopsis thaliana (thale cress) and 406,790 from Oryza sativa (rice), compared with 7,057,754 for humans. Despite this, there are few examples of in silico expression studies in plants [18,19]. Since the complete sequencing of certain plant genomes, it has been possible to monitor gene expression on a genome scale using high-density oligonucleotide arrays [20]. Thousands of Arabidopsis arrays, containing probes for more than twenty thousand genes, have been processed, and systematic analyses of gene expression in different organs, developmental conditions and stress responses have been performed [9,21-23]. The results of many of these are publicly available through web browser interfaces such as the Genevestigator tool [24-26].
In view of this, at least for Arabidopsis, data analysis rather than data collection is the first challenge for biologists in determining patterns of gene expression. The focus of this work was the identification of genes whose expression is specific in immature seeds. Firstly, we sequenced cDNA clones from isolated immature seeds. Secondly, we used in silico subtraction in a combination of EST selection and microarray data analysis in order to select genes with the desired pattern of expression. Finally, 49 genes specifically expressed during seed development were selected. Our study demonstrates the reliability of in silico subtraction methods in Arabidopsis and provides a basis for targeted reverse-genetic approaches aimed at identifying key genes involved in reproductive development in plants.

Results and discussion

Sequencing Arabidopsis young seed ESTs

ESTs from isolated Arabidopsis immature seeds are not very abundant in EST databases (Figure 1). Among the 420,789 Arabidopsis ESTs deposited (release 111105) [17], 10,854 correspond to isolated immature seeds, 10,800 correspond to seeds in mid-development stages [27] and only 54 were obtained from early stages of seed development. We constructed a cDNA library from developing Arabidopsis seeds isolated at a stage from mid-globular to curled-cotyledon (2 to 6 days after pollination) and obtained 178 single pass 5' end sequences (>140 bp). The average sequence length was 579 bp. Newly sequenced ESTs were assembled in contigs and gene identities were assigned querying against the Arabidopsis genome database at TAIR [28] using the BLAST algorithm. They corresponded to 95 individual genes: 93 nuclear and two from chloroplasts. Functional categories were determined based on GO data in the TAIR database [28]. 21% of the genes are linked to translation, 6% to carbohydrate metabolism and 5% to development. The function of 31% of the genes remained unknown. For two of the genes (At1g60987 and At2g02490) no ESTs have been previously sequenced.
Figure 1

Overview of EST libraries from isolated immature Arabidopsis seeds. At the top, a representation of the available EST collections extracted from immature seeds. Lines in colour represent the period of development covered by the library. The library code according to the TIGR Arabidopsis Gene Index [29] is indicated next to the line. The number of ESTs available from the corresponding library is indicated above the line. Green lines correspond to previously existing EST collections, and the blue line corresponds to the new library described here. At the bottom, the stages of embryo and seed development, related to days after flowering (DAF), are shown [49]. The main processes associated with seed development are indicated.

Identification of genes specifically expressed in seeds during early development

A two-step in silico subtraction procedure was used to select genes specifically transcribed in immature seeds. The first selection step was based on EST abundance and the second step on microarray data analysis. The objective of the first step was to identify genes having ESTs only from immature seeds and not from other organs. We divided the Arabidopsis EST libraries deposited in the TIGR Arabidopsis Gene Index [29] into three categories, according to the organs they were made from (Additional file 1):
a) Immature seed: this includes 10,854 ESTs from four cDNA libraries (Figure 1).
b) Other tissues: this includes 50,992 ESTs from 78 cDNA libraries obtained from vegetative tissues, non-pollinated flowers and dry seeds.
c) Non-informative: this includes libraries obtained from mixed organs and whole plants, including libraries from siliques.
Subtraction was done based on the EST contigs and gene assignments in the TIGR Arabidopsis Gene Index [29]. We selected genes having corresponding EST sequences in category a (immature seeds) and not in category b (other tissues). A total of 634 genes passed our first subtraction criteria (Additional file 2). Two correspond to chloroplast genes, three to mitochondrial genes and 26 had homology to parts of the Arabidopsis genome in which no genes have been reported.
The second selection step was based on the Arabidopsis Affymetrix GeneChip® average data available on the Genevestigator analysis tool site [24-26]. We used the meta-analyzer program, which produces a heat map of normalized signal intensity values, corresponding to the different organs of the plant, for each gene. Values range from 0 to 100, 100 being the highest level of expression. We selected the genes using the following criteria:
(i) The expression in seeds should be higher than 80.
(ii) The expression in other organs should be lower than 5, except for siliques, carpels and inflorescences, as these three organs could contain immature seeds at the very early stages after pollination. A detection level of 5 is probably low, but was chosen in order to avoid possible errors in the normalisation algorithm in the meta-analyzer program.
(iii) The expression level in seeds should be higher than or equal to the expression in siliques, carpels or inflorescences.
Of the 634 selected genes, 49 were not considered in the second analysis because they are not included in the Arabidopsis Affymetrix 22K GeneChip®. Of the remaining 585 genes, 49 (8%) fulfilled the selection criteria and may represent genes specifically expressed in immature seeds (Table 1). Of the non-selected genes, 51% did not fit selection condition (i), 96% condition (ii) and 35% condition (iii). Surprisingly, 21% of the genes showed higher values in siliques than in seeds. The different conditions in which tissues were collected for cDNA synthesis and microarray hybridizations could explain these results.
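The three microarray criteria can be read as a simple filter over the organ profile of one gene. The following Python sketch is illustrative only: the organ labels, the dictionary layout and the function name `passes_criteria` are assumptions made for the example and are not part of Genevestigator or of the procedure as actually run.

```python
# Organs that may contain immature seeds shortly after pollination.
# They are exempt from criterion (ii) but subject to criterion (iii).
# The labels are hypothetical; real organ names depend on the data export.
SEED_CONTAINING = {"silique", "carpel", "inflorescence"}


def passes_criteria(profile, seed_key="seed", seed_min=80.0, other_max=5.0):
    """Apply criteria (i)-(iii) to one gene.

    profile: dict mapping organ name -> normalized signal (0 to 100).
    """
    seed = profile[seed_key]
    # (i) expression in seeds must be higher than 80
    if not seed > seed_min:
        return False
    for organ, value in profile.items():
        if organ == seed_key:
            continue
        if organ in SEED_CONTAINING:
            # (iii) seed expression must be >= these organs
            if seed < value:
                return False
        elif not value < other_max:
            # (ii) every other organ must be lower than 5
            return False
    return True


example = {"seed": 95, "silique": 60, "rosette leaf": 1, "root": 0}
print(passes_criteria(example))  # prints True
```

Keeping the thresholds as parameters makes it easy to see how sensitive the selection is to the cut-offs of 80 and 5.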
Table 1

Genes selected by in silico subtraction

Gene AGI code | Imm. seed ESTs, Indifferent ESTs | Definition | Functional category | Pattern of expression (1) | Mutants | Tandem arrays | Segmental duplication
At1g03790 | 15 | Zinc finger (CCCH-type) family protein | Regulation of gene expression | IIc | - | 1 | 1
At1g03890 | 4122 | Cruciferin 12S seed storage protein | Nutrient reservoir | IIb | - | 2 | 1
At1g14950 | 215 | Major latex protein type 1 | Secondary metabolism | IIc | - | 4 | 1
At1g48130 | 212 | Peroxiredoxin | Response to abiotic stress | IIb | - | 1 | 1
At1g48660 | 10 | Auxin-responsive GH3 family protein | Development | IIc | - | 3 | 1
At1g62060 | 3225 | Unknown | Unknown | IIa | - | 2 | 1
At1g65090 | 26 | Unknown | Unknown | IIb | - | 1 | 1
At1g67100 | 33 | Seed specific protein Bn15D17A | Unknown | IIb | - | 1 | 1
At1g73190 | 816 | Tonoplast intrinsic protein 3.1 | Protein processing | IIb | - | 1 | 2
At1g80090 | 32 | Unknown | Unknown | IIb | - | 1 | 1
At2g28420 | 14 | Lactoylglutathione lyase family protein | Carbohydrate metabolism | IIc | - | 1 | 1
At2g33520 | 11 | Unknown | Unknown | IIc | - | 1 | 1
At2g34700 | 24 | Proline-rich glycoprotein | Development | I | - | 1 | 1
At3g01570 | 1552 | Oleosin | Nutrient reservoir | IIb | - | 1 | 1
At3g04170 | 10 | Germin-like protein subfamily 1 | Unknown | I | - | 5 | 1
At3g04190 | 10 | Germin-like protein subfamily 1 | Unknown | I | - | 5 | 1
At3g12960 | 10 | Similar to seed maturation protein PM28 | Unknown | IIc | - | 1 | 1
At3g24650 | 63 | ABI3 protein | Regulation of gene expression | IIb | Abi3 (2) | 1 | 1
At3g27660 | 70 | Oleosin | Nutrient reservoir | IIb | - | 1 | 1
At3g48580 | 11 | Xyloglucan:xyloglucosyl transferase | Carbohydrate metabolism | IIc | - | 1 | 1
At3g54940 | 418 | Cysteine proteinase | Protein processing | IIb | - | 1 | 1
At3g60730 | 22 | Pectinesterase-like protein | Development | IIc | - | 1 | 1
At3g61040 | 10 | Cytochrome P450 monooxygenase-like | Respiration and energy | IIc | - | 1 | 1
At3g62730 | 5517 | Desiccation-related protein | Response to abiotic stress | IIb | - | 1 | 1
At3g63040 | 10 | Unknown | Unknown | IIb | - | 1 | 1
At4g25140 | 15 | Glycine-rich protein/oleosin | Nutrient reservoir | IIb | - | 1 | 1
At4g27150 | 6833 | 2S seed storage protein 2 precursor | Nutrient reservoir | IIb | - | 4 | 1
At4g28520 | 924 | 12S cruciferin seed storage protein (CRU3) | Nutrient reservoir | IIb | - | 1 | 1
At4g36700 | 4816 | Globulin-like protein | Nutrient reservoir | IIa | - | 1 | 1
At4g37050 | 25 | Patatin-like | Nutrient reservoir | IIa | - | 3 | 1
At5g01670 | 11 | Aldose reductase-like protein | Carbohydrate metabolism | IIc | - | 1 | 1
At5g03860 | 118 | Malate synthase | Carbohydrate metabolism | IIc | - | 1 | 1
At5g04010 | 10 | Unknown | Unknown | IIc | - | 1 | 1
At5g07190 | 1015 | Embryo-specific protein 3 (ATS3) | Unknown | IIb | - | 1 | 1
At5g09640 | 104 | Serine carboxypeptidase-like | Protein processing | I | Sng2 (3) | 1 | 1
At5g22470 | 81 | Poly (ADP-ribose) polymerase family protein | Protein processing | IIc | - | 1 | 1
At5g40420 | 3968 | Oleosin | Nutrient reservoir | IIb | - | 1 | 1
At5g44310 | 51 | Late embryogenesis abundant protein-like | Response to abiotic stress | IIc | - | 1 | 1
At5g45690 | 46 | Unknown | Unknown | IIc | - | 1 | 1
At5g45830 | 11 | Unknown | Unknown | IIc | - | 1 | 1
At5g48100 | 3019 | Laccase | Response to abiotic stress | IIa | - | 1 | 1
At5g49190 | 90 | Sucrose synthase (SUS2) | Carbohydrate metabolism | I | - | 1 | 1
At5g50700 | 941 | 11-beta-hydroxysteroid dehydrogenase-like | Response to abiotic stress | IIb | - | 2 | 1
At5g54740 | 737 | 2S storage protein-like | Nutrient reservoir | IIb | - | 1 | 1
At5g55240 | 63 | Embryo-specific protein 1 | Unknown | IIb | - | 1 | 1
At5g57260 | 10 | Cytochrome P450 | Respiration and energy | IIb | - | 1 | 2
At5g59170 | 116 | Cell wall protein precursor, extensin | Development | IIb | - | 1 | 1
At5g62490 | 25 | AtHVA22b | Response to abiotic stress | IIc | - | 1 | 1
At5g62800 | 10 | Seven in absentia (SINA) family protein | Protein processing | IIb | - | 1 | 1

(1) Information in Figure 4.

(2) Mutant is abscisic acid-insensitive and lacks seed dormancy.

(3) Mutant accumulates sinapoylglucose instead of sinapoylcholine.

The advantage of the selection method is demonstrated by the presence of several genes already characterized as specifically expressed in seeds, such as: abi3 [30]; At1g48130, encoding a peroxiredoxin (PER1) whose expression is restricted to seeds [31]; At1g67100, which is homologous to the Brassica Bn15D17A gene, highly and specifically expressed in embryos and seed coat at the early stages of seed development [32]; and At5g07190 and At5g55240, which encode embryo-specific proteins isolated in the course of a differential display experiment [33]. We also tested the direct application of the microarray subtraction without EST selection. We chose the first 1,500 genes from chromosome 1 (according to the AGI code) included in the Arabidopsis Affymetrix 22K GeneChip® (from At1g01010 to At1g18340). 28 of the 1,500 genes (1.9%) fell within the microarray-based selection criteria. If there is the same proportion in the whole genome, about 550 genes would be selected. These results indicate that Genevestigator may be a useful tool to investigate organ specific gene expression in Arabidopsis. However, data obtained from Genevestigator is based on the normalised average signal intensity values obtained from several array experiments [24-26]. The normalisation algorithms used to generate Genevestigator values could introduce false positives and negatives, particularly for genes with low levels of expression. In consequence, combining Genevestigator results with EST abundance data gives a more reliable dataset of genes specifically expressed in a certain organ, seeds in our case.
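The whole-genome extrapolation mentioned in this paragraph is proportional arithmetic. The short Python sketch below reproduces the 1.9% figure and back-calculates the genome size implied by the estimate of about 550 genes. The function names are hypothetical, and the implied gene count is derived from the two published numbers rather than stated in the text.

```python
def selected_fraction(n_selected, n_tested):
    """Fraction of tested genes that met the microarray-only criteria."""
    return n_selected / n_tested


def implied_total(estimate, fraction):
    """Number of genes that would yield `estimate` hits at the given fraction."""
    return estimate / fraction


frac = selected_fraction(28, 1500)
print(f"{frac:.1%}")                    # prints 1.9%
print(round(implied_total(550, frac)))  # prints 29464
```

An estimate of about 550 genes is therefore consistent with a total of roughly 29,000 genes being assumed for the extrapolation.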

Experimental validation of the patterns of expression of the selected genes

We used RT-PCR to check our selection procedure (Figure 2). Ten genes were tested: five that passed only the EST-based selection and five that passed both selection steps. Two genes were used as additional controls: actin, which is expressed in all tissues, and AtEm6, which is specifically expressed during late embryogenesis [34]. All 10 genes analyzed showed higher expression levels in siliques, but silique specificity is, in general, higher in the genes selected by EST and microarray than in the genes selected only by EST subtraction. Two of the genes in the EST and microarray group, At1g67100 and At5g22470, gave low levels of amplification in rosette leaves and At1g67100 also in stem. This difference between Genevestigator and experimental data could be a consequence of different levels of detection in RT-PCR and microarray experiments or of different experimental conditions. These discrepancies do not indicate a strong bias in the results. The combined EST and microarray based selection thus produces a specific, expression-based list of genes.
Figure 2

RT-PCR analysis of the expression profiles of ten genes isolated by in silico screening. "EST + microarray" indicates genes isolated by the combination of EST selection and microarray data analyses. "EST" indicates genes isolated only by EST selection. Siliques 1 to 3 correspond to whole siliques at different stages of development (1, young green; 2, green fully developed; 3, desiccating siliques). Siliques I to V correspond to siliques at different stages of development (I, 0–4 daf; II, 4–8 daf; III, 8–12 daf; IV, 12–16 daf; V, 17–21 daf). In each case, the size of the bands was as expected.

Seed specific expression was further demonstrated by in situ hybridization for the At5g22470 gene encoding a Poly (ADP-ribose) polymerase family protein (PARP) (Figure 3). The At5g22470 transcripts were detected specifically in the embryo and not in the endosperm, pericarp, valves or septum. The profile of the expression of the At5g22470 gene is consistent with the predicted seed specific transcription.
Figure 3

Seed-specific transcript labelling of embryos at the late torpedo stage as shown by in situ hybridization of transverse sections of Arabidopsis siliques probed with digoxigenin-labelled At5g22470 mRNA, viewed under bright-field optics.

The RT-PCR experiments and the presence of genes known to be specifically expressed in seed demonstrate that the selection procedure identifies genes specifically, or at least predominantly, expressed in developing seeds. The relatively low number of genes selected is probably a consequence of the small number of initial ESTs corresponding to immature seeds (11,032 sequences). This is especially true in the case of genes only expressed during very early stages of seed development, for which only 232 ESTs are available. A recent report showed that only 16,115 Arabidopsis genes are represented in the EST databases [35]. An additional problem is that not all the genes are represented in the Affymetrix 22K GeneChip®. We estimate that, if all genes were present in EST and microarray databases, about a hundred would have been selected by our in silico method. It has been proposed that the developmental processes occurring during embryogenesis are also active during the vegetative development of the plant; some genes may therefore also be expressed in other growing organs of the plant, and so would not be seed-specific.

Functional classification of the selected genes

The 49 selected seed-specific genes were grouped into different functional categories (Table 2) according to their predicted gene products, based on the Gene Ontology (GO) Consortium through the Arabidopsis consortium information [28]. The data were compared with the functional categories assigned for all Arabidopsis genes [36].
Table 2

Functional categories of the seed specific genes

Functional category | Whole genome (%) | Subtracted genes (%) | p-value (1)
Amino acid metabolism | 0.1 | 0.0 | 1.00
Carbohydrate metabolism | 2.4 | 10.2 | 0.01*
Cell division cycle | 2.3 | 0.0 | 0.63
Defense | 0.9 | 0.0 | 1.00
Development | 6.0 | 8.2 | 0.54
Lipid metabolism | 0.9 | 0.0 | 1.00
Metabolism | 6.4 | 0.0 | 0.07
Nucleic acid metabolism | 3.1 | 0.0 | 0.41
Nutrient reservoir | 0.2 | 20.4 | 0.00*
Photosynthesis | 0.3 | 0.0 | 1.00
Protein processing | 9.4 | 10.2 | 0.81
Regulation of gene expression | 7.4 | 4.1 | 0.58
Respiration and energy | 4.0 | 4.1 | 1.00
Response to abiotic stress | 3.1 | 12.2 | 0.00*
Secondary metabolism | 0.7 | 2.0 | 0.28
Transport and subcellular trafficking | 8.7 | 0.0 | 0.02*
Transcription and splicing | 6.1 | 0.0 | 0.07
Translation | 2.7 | 0.0 | 0.64
Unknown | 38.4 | 28.6 | 0.17

1. p-value for the same or a stronger association of Fisher's exact test compared with total genome

*. p-value < 0.05.

Fourteen of the selected genes correspond to genes of unknown function (28.6%). This is lower than, but not significantly different from (Fisher's exact test, α = 0.05), the percentage obtained for the total genome (38.4%). Particularly interesting is At1g62060, whose function is unknown but which is represented in databases by a total of 57 EST sequences (32 from immature seed libraries). Two of the genes encode germin-like proteins (At3g04170 and At3g04190), and four have been listed as seed or embryo specific genes of unknown function (At1g67100, At3g12960, At5g07190 and At5g55240). Genes in the "nutrient reservoir" category represent 20.4% of the selection and include ten genes: four encoding oleosins, three globulins, two cruciferins and one a patatin-like protein. Accumulation of seed storage proteins is a highly seed specific process [37], so it is not surprising that the proportion of these genes in the selected group is significantly higher than that obtained for the whole genome (0.2%). Another over-represented category is "response to abiotic stress", which includes six genes (12.2%), and is significantly more abundant than in the whole genome (3.1%). This is an indication of the importance of genes providing stress tolerance in correct seed development. Three of the genes encode oxidative stress-related enzymes, the function of two genes is related to desiccation (At3g62730 and At5g44310), and one is an ABA and stress inducible gene (At5g62490). Five genes involved in carbohydrate metabolism were selected (10.2%). This percentage is significantly higher than that observed for the whole genome (2.4%). This category includes a gene encoding a xyloglucan:xyloglucosyl transferase (At3g48580), an enzyme (E.C.2.4.1.207) involved in the biosynthesis of the cell wall. It also includes a gene encoding a sucrose synthase (At5g49190). Sucrose represents a signal for differentiation during embryo development and up-regulates storage-associated gene expression [38].
Five genes involved in protein modification, localization or degradation were selected (10.2%), two of them being proteases (At3g54940 and At5g09640). No genes involved in translation were selected, even though these represent 2.7% of the genes in the whole genome, nor any involved in transport and subcellular trafficking, even though these represent 8.7% of the genes in the whole genome. Four genes involved in different aspects of development (8.2%) were selected. Two of them are involved in cell wall synthesis or modification (At5g59170, encoding a cell wall protein precursor, extensin; and At3g60730, encoding a pectinesterase-like protein). This is an indication of the high rate of synthesis of new cell wall during seed development, and could also be an indication of the importance of specific cell wall components in co-ordinating gene expression programmes during embryo development [39], an effect observed in immature maize embryos [40]. The proportion of selected genes involved in development is not significantly higher than in the whole genome (6.0%). This is not surprising as the whole genome contains several genes involved, for example, in flower or root development. A third development-related gene encodes an auxin-responsive GH3 family protein (At1g48660). Auxins are important signalling molecules involved in shoot/root axis establishment, among other processes [41]. Two genes involved in the regulation of gene expression (4.1%) were selected: abi3 and a gene encoding a CCCH-type zinc finger protein (At1g03790). This proportion is lower than that observed for the whole genome (7.4%), although not significantly so. The reduced number of transcription factor genes selected is surprising, but recent data from global analysis of gene expression indicate that the number of transcription factor genes specifically expressed during seed development is relatively low compared with other organs [8,42].
The expression of several MADS-box genes has been analyzed in different Arabidopsis tissues and it was found that, although many of these genes are expressed in embryonic tissue culture, few of them are exclusively expressed in this tissue [42]. Similarly, the number of specifically expressed transcription factor genes in developing siliques is relatively low compared to other tissues [8]. An additional explanation could be that, as this category of genes has relatively low levels of expression, they may be under-represented in the EST collections used for selection. Finally, two genes involved in respiration and energy (4.1%) and one in secondary metabolism (2.0%) (At1g14950, encoding a major latex protein type 1) were selected. Interestingly, two of the most highly represented categories in the genome are not represented in our selection: metabolism (6.4%) and transcription and splicing (6.1%). Nor were any genes detected for cell division, metabolism of amino acids, nucleic acids or lipids, defense or photosynthesis. As these genes are involved in general cell processes, they are expressed in several tissues and organs and are unlikely to be selected in a seed-specific subtraction.

Gene redundancy and mutant phenotypes

Mutational approaches have been extensively used in Arabidopsis to identify gene functions [3]. Mutations in about 800 genes produce loss-of-function phenotypes in Arabidopsis [6]. Of these, about 250 produce an altered embryo. Based on the information available in the Arabidopsis Information Resource (TAIR) [28] and Seedgenes [7], two of the 49 genes have a mutant phenotype (4%) (Table 1), and in only one of them does the mutation produce alterations in embryo development (abi3). Gene redundancy may explain the reduced number of mutants detected. Many Arabidopsis genes are in tandem arrays or segmental duplications [43]. We examined how many of the genes in our selection were part of gene tandem arrays or duplicated in different parts of the genome (Table 1). Eleven of the selected genes (22%) are duplicated, a proportion that is higher than, but not significantly different from, that observed in the whole genome (17%; p-value = 0.33 in Fisher's exact test).

Patterns of gene expression during silique and seed development

The patterns of expression during seed development were investigated for each of the selected genes. Expression data were obtained from the Digital Northern tool in Genevestigator [24], corresponding to hybridization of Affymetrix ATH1 GeneChip® microarrays using labelled cDNAs of siliques and seeds at different stages of development, from mid-globular to green cotyledon embryos [9]. We used SOTA analysis in the TMEV 3.1 analysis package to identify expression patterns during silique and seed development (Figure 4). From this analysis, we can distinguish four major patterns of expression (Table 1):
Figure 4

Expression profiles during seed development showing four different patterns of expression in the subtracted genes. Expression data are based on the microarray results [9]. Blue, pattern I; yellow, pattern IIa; red, pattern IIb; green, pattern IIc. Solid lines correspond to average expression and shaded areas to the standard errors. Developmental stages: 3, siliques with embryos at the mid-globular to early heart embryo stage; 4, siliques with embryos at the early to late heart-embryo stage; 5, siliques with embryos at the late heart to mid torpedo stages; 6, seeds with embryos at the late torpedo stage; 7, seeds with embryos at the late torpedo to early walking-stick stage; 8, seeds with embryos at the walking-stick to early curled-cotyledon stages; 9, seeds with embryos at the curled-cotyledon to early green-cotyledon stages; 10, seeds with embryos at the green cotyledon stage. The dotted line corresponds to 25% of the maximum expression.

Group I: higher expression at early seed development. Genes that reach the maximum level of expression between the late torpedo and early walking-stick embryo stages. This group includes five genes: At5g09640, encoding a serine carboxypeptidase; At5g49190, encoding a sucrose synthase; At2g34700, encoding a proline-rich glycoprotein; and two genes encoding germin-like proteins (At3g04170 and At3g04190).
Group II: higher expression at mid seed development or later. The expression increases progressively, reaching the maximum level at the early cotyledon stage or later. In turn, SOTA analysis divided this class into three groups that can be distinguished by the stage at which their transcription level is higher than 25% of the maximum:
• IIa. Very early expression. The expression increases to more than 25% of the maximum before the early heart embryo stage. Four genes are included in this group: At5g48100, encoding a laccase; At4g36700, encoding a globulin-like protein; At4g37050, encoding a patatin-like protein; and At1g62060, encoding a protein of unknown function.
• IIb. Early expression. The expression increases to more than 25% of the maximum between the early heart and late torpedo stages. This group has 23 genes and includes the majority of the "nutrient reservoir" genes.
• IIc. Mid stage expression. The expression increases to more than 25% of the maximum later than the late torpedo stage. It includes 17 genes of diverse functions.
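The 25% rule that separates the three Group II subclasses can be illustrated with a short Python sketch. This is not the SOTA clustering used for the analysis. It only shows the threshold logic, and the mapping of the Figure 4 stage numbers to subclasses (stage 3 or earlier for IIa, stages 4 to 6 for IIb, stage 7 or later for IIc) is an assumption made for the example. The function names are hypothetical.

```python
def onset_stage(profile, fraction=0.25):
    """Return the first stage whose expression exceeds `fraction` of the maximum.

    profile: dict mapping stage number -> expression value.
    Returns None for an empty or all-zero profile.
    """
    if not profile:
        return None
    threshold = fraction * max(profile.values())
    for stage in sorted(profile):
        if profile[stage] > threshold:
            return stage
    return None


def group_ii_subclass(profile, last_iia_stage=3, last_iib_stage=6):
    """Assign IIa, IIb or IIc from the onset stage (cut-offs are assumed)."""
    stage = onset_stage(profile)
    if stage is None:
        return None
    if stage <= last_iia_stage:
        return "IIa"
    if stage <= last_iib_stage:
        return "IIb"
    return "IIc"


# Made-up expression values for stages 3 to 10, for illustration only.
example = {3: 2, 4: 5, 5: 40, 6: 60, 7: 80, 8: 90, 9: 100, 10: 95}
print(onset_stage(example), group_ii_subclass(example))  # prints 5 IIb
```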

Conclusion

Despite the technical problems associated with the relatively small number of Arabidopsis ESTs available, we have demonstrated here that the combination of EST profiling with microarray-based in silico selection may be a quick and cheap first step in the identification of Arabidopsis genes specifically expressed in certain organs, or in response to certain environmental stimuli. The same method could be applied to several other plant species in which EST sequences are available from several different organs and under different conditions (maize, wheat, rice, barley, soybean, loblolly pine, etc). However, microarray data available for species other than Arabidopsis are very limited and less openly accessible, severely limiting the applicability of our two-step selection approach. An increase in EST sequencing, using more specific libraries, and in the contents of public microarray databases will greatly contribute to the efficiency of the method in plants.

Methods

Plant material

Arabidopsis thaliana Col-0 plants were grown in soil, in growth chambers, at 22°C, with 18 h day. Plants used for root RNA extractions were grown on 0.8% (w/v) MS basal salt mixture agar plates in growth chambers, at 22°C, with 18 h day.

cDNA library construction and tag sequencing of expressed sequences

Total RNA was extracted from frozen seeds as previously described [44] and treated with RNase-free DNase I (Promega). Double stranded cDNA was built using the SMART cDNA Library Construction Kit (Clontech) according to the manufacturer's instructions, and introduced into the pCRII-TOPO (Invitrogen) vector for sequencing using the TOPO TA Cloning kit (Invitrogen). For sequencing, DNA was amplified using PCR primers specific for the plasmid vector (5'-GTCACGACGTTGTTAAACGACGGC-3' and 5'-GGAAACAGCTATGACCATGATTACG-3') and sequencing was carried out using a 5' specific primer (5'-GTATCAACGCAGAGTCG-3') and BigDye Terminator (Applied Biosystems) technology according to the manufacturer's instructions, in an ABI PRISM 3700 (Applied Biosystems). Cloning vector sequences were masked, and low quality and short (<190 bp) sequences removed. Homology searches for function assignment were performed using the BLASTN program in the Arabidopsis Information Resource (TAIR) [28]. EST sequences were deposited in the GenBank database under the Accession numbers AM111128-AM111305.

In Silico Subtraction

Newly sequenced expressed sequence tags and 10,854 EST sequences from three libraries of immature Arabidopsis seeds (5564, 5576 and #C6I in the TIGR Arabidopsis Gene Index [29]) were used as the initial source of immature seed sequences. In silico subtraction was done using a second set of EST libraries that did not contain immature seed sequences (50,992 ESTs from 78 libraries). Comparisons were based on the tentative gene contig classification in the TIGR Arabidopsis database [29]. Libraries constructed from mixed tissues that could include immature seeds, such as immature siliques, were not considered for the subtraction. Subtraction was done by comparing the list of genes represented in "immature seed" EST libraries with the list of genes represented in "other organ" EST libraries. A second selection step was based on the Arabidopsis Affymetrix GeneChip® data, available from the Meta-analyzer tool of the Genevestigator software [24-26]. Genes represented in the arrays by more than one probe were selected only when the results for all the probes passed the selection criteria.
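The two selection steps can be illustrated with a minimal Python sketch. It is not the procedure as actually run, which relied on the TIGR contig classification and the Genevestigator interface. The gene names, library names and probe results below are invented, the function names are hypothetical, and dropping genes that have no probe on the array is an assumption of this sketch rather than something stated in the text.

```python
def union_of(libraries):
    """Collect every gene represented by at least one EST in any library."""
    genes = set()
    for members in libraries.values():
        genes |= members
    return genes


def est_subtraction(seed_libs, other_libs):
    """Step 1: genes seen in immature seed libraries but in no other library."""
    return union_of(seed_libs) - union_of(other_libs)


def passes_all_probes(probe_calls):
    """Step 2 rule: a gene is kept only if every one of its probes passes.

    probe_calls is a list of booleans, one per probe on the array.
    Assumption of this sketch: a gene with no probe at all is not selected.
    """
    return bool(probe_calls) and all(probe_calls)


def microarray_selection(candidates, probe_results):
    return {g for g in candidates if passes_all_probes(probe_results.get(g, []))}


# Invented toy data, for illustration only.
seed_libraries = {"seed_1": {"geneA", "geneB", "geneC"}, "seed_2": {"geneB", "geneD"}}
other_libraries = {"root": {"geneC", "geneE"}, "leaf": {"geneE"}}
probe_results = {"geneA": [True], "geneB": [True, False], "geneD": [True, True]}

step1 = est_subtraction(seed_libraries, other_libraries)
step2 = microarray_selection(step1, probe_results)
```

In this toy case `geneC` is removed in step 1 because it also appears in a root library, and `geneB` is removed in step 2 because one of its two probes fails.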

Gene Ontology

Functional characterization was performed according to the Gene Ontology (GO) Consortium through the Arabidopsis consortium information [28]. Fisher's exact test was performed using the MATFORSK, Norwegian Food Research Institute online facility [45,46].
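For readers who want to reproduce this kind of test without the online facility, a two-sided Fisher's exact test on a 2x2 table can be sketched in pure Python. The text does not state whether a one- or two-sided test was used, so the two-sided form is an assumption here, as are the function name and the table layout: a = selected genes in a GO category, b = selected genes outside it, c = remaining genes in the category, d = remaining genes outside it.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the sum of the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are counted.
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

A call such as `fisher_exact_two_sided(a, b, c, d)` with the four counts for one GO category returns the p-value for that category; testing many categories would additionally call for a multiple-testing correction, which is not covered by this sketch.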

RT-PCR

Total RNA was extracted from frozen organs of Arabidopsis as previously described [44] and treated with RNase-free DNase I (Promega). Total pre-treated RNA (2 μg) was reverse transcribed with the Omniscript reverse transcriptase kit (Qiagen) using an oligo-dT primer. cDNAs were amplified with specific primers (Table 3); controls with non-reverse-transcribed RNA were included to detect genomic DNA contamination. The actin gene was used as a control for RNA loading. PCR reactions were performed using 0.2 mM of each dNTP, 360 μg/mL BSA and 1 pmol/μL of each primer in a final volume of 50 μL. The reaction mixtures were heated to 95°C for 5 min, followed by 28 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 90 s. Reactions were completed by incubating at 72°C for 10 min. The amount of template cDNA and the number of PCR cycles were determined for each gene to ensure that amplification occurred in the linear range and allowed for good comparison of the amplified products. At least two independent analyses were carried out on the different RNA samples. Reactions were performed in a Minicycler (MJ Research, Waltham, MA) thermal cycler.
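As a quick check on the quantities implied by these conditions, the amounts per 50 μL reaction and the programmed hold time can be worked out with a short Python sketch. The function names are hypothetical, the 28 cycles are the value given in the text (cycle number was adjusted per gene), and the time excludes temperature ramping, which depends on the instrument.

```python
REACTION_VOLUME_UL = 50.0  # final reaction volume stated in the text


def primer_pmol(conc_pmol_per_ul=1.0, volume_ul=REACTION_VOLUME_UL):
    """Amount of each primer per reaction, in pmol."""
    return conc_pmol_per_ul * volume_ul


def dntp_nmol(conc_mm=0.2, volume_ul=REACTION_VOLUME_UL):
    """Amount of each dNTP per reaction, in nmol (1 mM equals 1 nmol per microlitre)."""
    return conc_mm * volume_ul


def bsa_ug(conc_ug_per_ml=360.0, volume_ul=REACTION_VOLUME_UL):
    """Amount of BSA per reaction, in micrograms."""
    return conc_ug_per_ml * volume_ul / 1000.0


def programmed_minutes(cycles=28):
    """Sum of programmed hold times in minutes, ignoring ramping between steps."""
    initial_denaturation_s = 5 * 60
    per_cycle_s = 30 + 30 + 90  # 94°C, 55°C and 72°C steps
    final_extension_s = 10 * 60
    return (initial_denaturation_s + cycles * per_cycle_s + final_extension_s) / 60
```

With the stated values this corresponds to 50 pmol of each primer, 10 nmol of each dNTP and 18 μg of BSA per reaction, and 85 min of programmed hold time for 28 cycles.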
Table 3

Primers used for RT-PCR analysis

Gene (Atg)   Forward primer           Reverse primer
At5g09640    GACACACCAAACATCAGAACCG   CTACTCATCATCCAAGGTCTCC
At5g22470    TATGCTCTCTTCCGGTTCCTGG   ATGGAACCAACCGTCCACAAGG
At5g45690    ACGATTGCGACTCCTCTAAACC   GAACGGAGCCAATTTCTGCATC
At1g67100    GCTCATGAACCTCCTCAACACC   CCCGATCCAAGTCTTTGGTTCC
At3g60730    TCAAGCTGTGGCGTTGAGAGTG   GGTAAACGGAGAAGCCTCTTCC
At3g12203    GGCACTGATCTCTGATGAACAC   TTCTGAACCATCCATGGTCTCC
At1g71691    GCTTGTTCTTCATCGGAATGGG   TACGACAAGGCGTTTCAAAGGG
At2g43260    TTCCGGCTTGAACCATAACTGC   TGAACCACCTTTTCTGCCTTCG
At1g68380    TGTTTTATGGCCGCCGTATTCC   TCCAAGTAAGCGTCCTATTCGC
At4g14780    TCAAACTCGCTCTTGATCTCGC   TTTCACCACCTCCTTCATCTCC

In situ hybridization

In situ hybridization was performed as previously described [47], except for the labelling of the probes and the detection of the signal. Probes were synthesized and labelled using the Boehringer digoxigenin system, and detected using the BM purple AP substrate (Boehringer). Each probe was synthesized from a PCR amplification product cloned into the pCRII-TOPO vector (Invitrogen).

Gene distribution in tandem arrays and mutants

The presence of the selected genes in tandem arrays was determined from previously described data [43]. Genes whose loss of function gives an embryo mutant phenotype were identified from previously collected data [6,7].

Expression cluster analysis

For expression cluster analysis, we used the TIGR Multi Experiment Viewer (TMEV) software [48]. The original data were obtained from the Genevestigator tool [24-26] and correspond to a microarray analysis of silique and seed development [9].

Authors' contributions

CB carried out the experimental molecular genetic studies. Database searches and analyses were performed by CMV and CB. PP supervised the study and wrote the manuscript jointly with CMV and CB.

Additional file 1

Libraries used in the subtraction process step 1. Data obtained from the TIGR Arabidopsis Gene Index.

Additional file 2

Genes selected by EST subtraction. Genes having corresponding EST sequences in immature seed libraries and not in libraries of other tissues.