Coexpression analysis of tomato genes and experimental verification of coordinated expression of genes found in a functionally enriched coexpression module.

Soichi Ozaki, Yoshiyuki Ogata, Kunihiro Suda, Atsushi Kurabayashi, Tatsuya Suzuki, Naoki Yamamoto, Yoko Iijima, Taneaki Tsugane, Takashi Fujii, Chiaki Konishi, Shuji Inai, Somnuk Bunsupa, Mami Yamazaki, Daisuke Shibata, Koh Aoki.

Abstract

Gene-to-gene coexpression analysis is a powerful approach to infer the function of uncharacterized genes. Here, we report comprehensive identification of coexpression gene modules of tomato (Solanum lycopersicum) and experimental verification of coordinated expression of module member genes. On the basis of the gene-to-gene correlation coefficient calculated from 67 microarray hybridization data points, we performed a network-based analysis. This facilitated the identification of 199 coexpression modules. A gene ontology annotation search revealed that 75 out of the 199 modules are enriched with genes associated with common functional categories. To verify the coexpression relationships between module member genes, we focused on one module enriched with genes associated with the flavonoid biosynthetic pathway. A non-enzyme, non-transcription factor gene encoding a zinc finger protein in this module was overexpressed in S. lycopersicum cultivar Micro-Tom, and expression levels of flavonoid pathway genes were investigated. Flavonoid pathway genes included in the module were up-regulated in the plant overexpressing the zinc finger gene. This result demonstrates that coexpression modules, at least the ones identified in this study, represent actual transcriptional coordination between genes, and can facilitate the inference of tomato gene function.

Year: 2010; PMID: 20130013; PMCID: PMC2853382; DOI: 10.1093/dnares/dsq002
Source DB: PubMed; Journal: DNA Res; ISSN: 1340-2838; Impact factor: 4.458


Introduction

To elucidate functional relationships between genes, coexpression analysis has proven to be a powerful approach. From a practical point of view, coexpression analysis requires two technical bases. The first is transcriptome data. Several model organisms, including Escherichia coli, yeast, and Arabidopsis, have been regarded as excellent targets for coexpression analysis, since a large amount of microarray data is publicly available for them. The second technical basis is the development of analytical methods. Generally, once coexpression measures between genes (e.g. correlation coefficients) have been estimated, subsequent coexpression analysis steps include visualization of the coexpression relationships, identification of densely correlated gene groups, and interpretation of biological relevance.[1] Among several possible visualization methods, network representation provides an efficient way to depict complex relationships between many genes. To identify densely correlated gene groups, several algorithms have been developed based on the connectivity of the network.[2-4] For biologically relevant interpretation, data from other types of ‘omics’ analyses (e.g. metabolomics and interactomics) often help greatly. With these analytical methods in place, coexpression analysis finally allows the function of an unknown gene to be inferred. Excellent technical bases for coexpression analysis have been established for the model plant Arabidopsis thaliana. For transcriptome data, various repositories are now available, including NASCArrays,[5] GEO,[6] SMD,[7] ArrayExpress,[8] and AtGenExpress,[9-11] which collectively provide more than 4753 microarray data points (as of 3 September 2009).
The results of coexpression analysis using this large data set were combined with comprehensive gene annotations and metabolomics data, and the functions of several previously unknown genes involved in glucosinolate and flavonoid biosynthesis were elucidated.[12-14] In parallel with the elucidation of individual gene functions, genome-wide coexpression profiles have also been investigated, and the results are available in databases such as ATTED-II,[15] GENEVESTIGATOR,[16] and BAR.[17] Recently, several large-scale gene expression analyses of tomato (Solanum lycopersicum) have been attempted. For example, the gene expression profile during fruit development was investigated in detail using a tomato microarray.[18,19] Another example is the investigation of tissue-dependent gene expression. In fruit peel, hierarchical clustering analysis demonstrated coordinated expression of genes associated with the metabolism of cuticular components, hormones, and cell wall components.[20] Hierarchical clustering of gene expression patterns also demonstrated that acquisition of the fleshy fruit trait depends on tight regulation of gene expression.[21] These studies suggest that large-scale coexpression analysis can shed light on the molecular mechanisms that control fruit development. However, these studies focused on a few specific biological processes, and comprehensive identification of groups of highly correlated genes has not been reported. In this study, we report the comprehensive identification of groups of highly correlated genes, or coexpression modules, in tomato. We performed a gene-to-gene coexpression analysis using a network-based approach. We evaluated developmental changes in gene expression profiles in various tissues of tomato plants using an Affymetrix GeneChip Tomato Genome Array.
Using gene expression data from 67 hybridizations, we estimated gene-to-gene Pearson's correlation coefficients (PCCs) and then identified 199 coexpression modules associated with various biological processes on the basis of an analysis of network topology. Gene ontology (GO) annotation analysis revealed enrichment of genes belonging to common functional categories in 75 modules. We then experimentally verified the coordinated expression of module member genes using tomato plants overexpressing a non-enzymatic module member gene that is a strong candidate for a regulatory gene in flavonoid biosynthesis. This result demonstrates that coexpression analysis can facilitate the identification of the functions of uncharacterized tomato genes.

Materials and methods

Plant materials

Miniature tomato, S. lycopersicum cultivar Micro-Tom, was grown as described previously.[22] Roots were harvested 5 weeks after germination. Hypocotyls and cotyledons were harvested 3 weeks after germination. Third leaves were harvested 3 weeks after germination. All leaves of tomato plants were harvested 3 and 5 weeks after germination. Fruits were harvested at four developmental stages: mature green (MG, ∼30 days after anthesis), yellow (Y, ∼35 days after anthesis), orange (O, ∼38–40 days after anthesis), and red (R, ∼45–48 days after anthesis). S. lycopersicum cultivar Momotaro 8® (Takii & Co., Ltd, Kyoto, Japan) was grown in a greenhouse under natural photoperiod conditions from March to July 2006 in Chiba Prefecture. S. lycopersicum line 27859 was grown under field conditions from March to July 2006 in Gunma Prefecture. The monogenic mutant tomato Anthocyanin fruit (Aft, LA1996) was provided by the C. M. Rick Tomato Genetic Resource Center (University of California, Davis, CA, USA) and was grown in a greenhouse under natural photoperiod conditions from March to July 2006 in Chiba Prefecture. Fruits of Momotaro 8®, line 27859, and Aft were harvested at the MG and R stages. The peel and flesh of fruits of Micro-Tom, Momotaro 8®, line 27859, and Aft were separated using a razor blade. Harvested tissues were immediately frozen in liquid nitrogen and stored at −80°C.

Preparation of RNA

Total RNA was extracted from tissues by an acid guanidinium thiocyanate-phenol-chloroform method.[23] Sugars were removed by a sodium acetate-precipitation method.[24]

DNA microarray analysis

Targets for hybridization were prepared using GeneChip One-Cycle Labeling and Control Reagents (Affymetrix, URL: http://www.affymetrix.com/) according to the manufacturer's instructions. GeneChip Tomato Genome Arrays (Affymetrix) were used for hybridization. Hybridization, washing, and staining were performed according to the manufacturer's instructions. Scanned GeneChip images were analysed using Microarray Suite version 5.0.1 software (Affymetrix). Normalization and analysis of microarray data were performed using GeneSpring GX 7.3 software (Agilent Technologies, URL: http://www.home.agilent.com/). The data were normalized per chip and per gene to the median value. CEL files of these experiments are available in Gene Expression Omnibus[6] (GEO) DataSets (http://www.ncbi.nlm.nih.gov/gds), series record GSE19326.
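GeneSpring's normalization code is proprietary, so the following is only a minimal sketch of what per-chip and per-gene normalization to the median commonly means: every value on a chip is divided by that chip's median, and each probe's profile is then divided by its median across chips. The function name `median_normalize` and the probes-by-chips array layout are assumptions made for illustration.

```python
import numpy as np


def median_normalize(signal: np.ndarray) -> np.ndarray:
    """Per-chip, then per-gene, median normalization (assumed scheme, not GeneSpring's code).

    signal: array of shape (n_probes, n_chips) holding positive signal values.
    """
    values = np.asarray(signal, dtype=float)
    # Per chip: divide every value on a chip by that chip's median signal.
    per_chip = values / np.median(values, axis=0, keepdims=True)
    # Per gene: divide each probe's profile by its median across all chips.
    return per_chip / np.median(per_chip, axis=1, keepdims=True)
```

After this step, every probe has a median of 1 across chips, so values read as fold changes relative to that probe's typical level.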

Coexpression analysis and network analysis

Before coexpression analysis, the probes were screened as follows. First, probes flagged ‘A’ (absent) in all samples were excluded. Second, the coefficient of variation between biological replicates of each tissue was calculated, and probes were retained only if their coefficient of variation was <1 in all samples. This screening left 7644 probes for the subsequent coexpression analysis. Normalized values of the selected probes were used to estimate pairwise PCCs. The resulting PCC data set was then analysed using the network-based module-finding algorithm described previously by Ogata et al.[4] This algorithm generates a coexpression module from a given ‘seed’ gene in six steps.[4] In the first step, a seed gene was chosen arbitrarily. In the second step, genes directly connected to the seed gene with a PCC higher than the cutoff value (0.6) were selected and referred to as a highly correlated gene group. In the third step, the VB index was defined as VB(i) = e(i)/d(i), where VB(i) is the VB value of the ith gene in the group, e(i) is the number of edges between the ith gene and the other group member genes, and d(i) is the number of edges between the ith gene and all genes irrespective of group membership. The VB value was calculated for all group member genes, and the gene with the lowest VB value was excluded from the group. In the fourth step, the subgroup of the highly correlated gene group with the highest NB value[4] was selected. The NB value is defined as NB = Σe(i)/Σd(i), where e(i) and d(i) are defined as above; in short, NB is the ratio of the number of edges within the subgroup to the number of all edges associated with subgroup members. The selected subgroup was referred to as ‘the best kernel gene group’. In the fifth step, the VB value was calculated for all non-member genes, and any non-member gene whose VB value exceeded the threshold was incorporated into the group.
In the sixth and final step, the genes of the best kernel group and the genes incorporated in the fifth step were selected as members of a coexpression module. NB values of the coexpression modules were then recalculated, and modules with NB values >0.5 were selected. The threshold values were as follows: 0.6 for PCC, 0.333 for VB value, and 0.5 for NB value. For GO annotation of tomato genes, a similarity search of the Affymetrix tomato consensus sequences used to design the GeneChip probes (Tomato Consensus Sequences, downloaded from http://www.affymetrix.com/products_services/arrays/specific/tomato.affx#1_4) was performed against Arabidopsis genes (TAIR8_cdna_20080412, downloaded from the TAIR FTP site, http://www.arabidopsis.org/download/index.jsp) using the BLASTN algorithm. GO annotations of tomato genes were retrieved from TAIR GO Annotation Search (http://www.arabidopsis.org/tools/bulk/go/index.jsp) according to the best match to Arabidopsis genes.
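The six steps can be sketched in Python as follows. This is one reading of the description above, not the published implementation by Ogata et al. It assumes that the seed belongs to the highly correlated gene group, that step three is repeated (removing one lowest-VB gene at a time, down to three genes, the smallest module size reported) so that step four can choose among the resulting nested subgroups, that step five is a single pass against the kernel, and that an edge requires PCC strictly above the cutoff. The helper names `vb`, `nb`, and `find_module` are hypothetical, and the bundling of overlapping modules is omitted.

```python
import numpy as np


def vb(adj: np.ndarray, i: int, members: list) -> float:
    """VB(i) = e(i) / d(i): fraction of gene i's edges that land inside `members`."""
    d = int(adj[i].sum())
    return float(adj[i, members].sum()) / d if d else 0.0


def nb(adj: np.ndarray, members: list) -> float:
    """NB = sum of e(i) / sum of d(i) over the genes in `members`."""
    internal = adj[np.ix_(members, members)].sum()  # each internal edge seen from both ends
    total = adj[members].sum()                       # sum of the degrees d(i)
    return float(internal) / total if total else 0.0


def find_module(pcc: np.ndarray, seed: int, pcc_cutoff=0.6, vb_cutoff=0.333, nb_cutoff=0.5):
    """Hypothetical seed-based module search; returns sorted member indices or None."""
    adj = pcc > pcc_cutoff              # edges where PCC is strictly above the cutoff
    np.fill_diagonal(adj, False)        # no self-loops
    # Steps 1-2: the seed plus every gene directly connected to it.
    group = [seed] + [int(j) for j in np.flatnonzero(adj[seed])]
    # Steps 3-4 (assumed iterative): drop the lowest-VB gene repeatedly and keep
    # the subgroup with the highest NB as the best kernel gene group.
    kernel, kernel_nb = list(group), nb(adj, group)
    while len(group) > 3:
        scores = [vb(adj, i, group) for i in group]
        group.pop(int(np.argmin(scores)))
        score = nb(adj, group)
        if score > kernel_nb:
            kernel, kernel_nb = list(group), score
    # Step 5: recruit non-member genes whose VB towards the kernel exceeds the cutoff.
    recruits = [j for j in range(adj.shape[0])
                if j not in kernel and vb(adj, j, kernel) > vb_cutoff]
    module = sorted(kernel + recruits)
    # Step 6: accept the module only if its recalculated NB exceeds the cutoff.
    return module if nb(adj, module) > nb_cutoff else None
```

For example, a 7 × 7 PCC matrix made of two internally correlated blocks (genes 0 to 3 and genes 4 to 6, PCC 0.9 within blocks and 0.1 between them) yields [0, 1, 2, 3] from seed 0 and [4, 5, 6] from seed 5, while a seed with no edges yields None.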

Transformation of tomato plant

A full-length cDNA clone of the zinc finger protein (clone ID: LEFL2003DB10, GenBank accession number AK326277) was provided by the National Bio-Resource Project Tomato[25] (http://tomato.nbrp.jp/indexEn.html). The protein-coding region of LEFL2003DB10 was amplified by PCR using a gene-specific primer set (5′-GGGGGGATCCATGGCAGTTGAGGCAAGACATC and 5′-GGGGGAGTCTTCAAGAAGACATGTTAACATGCAC). The PCR product was cloned between the BamHI and SacI sites of pBE2113-GUS.[26] Transformation of S. lycopersicum cv. Micro-Tom was performed essentially as described by Sun et al.[27] with slight modifications. Cotyledon and hypocotyl segments from 7-day-old seedlings were used as explants. Explants were dipped in Agrobacterium tumefaciens (strain EHA105) suspension for 10 min and blotted dry on a sterilized paper towel. The explants were then placed on co-cultivation medium [MS salts, 3% (w/v) sucrose, 0.8% (w/v) agar, 1.75 mg/l zeatin, pH 5.8], and the plate was incubated for 48 h in the dark at 25°C. The explants were then cultured and selected on callus induction plates containing MS salts, 3% (w/v) sucrose, 0.8% (w/v) agar, 1.5 mg/l zeatin, 50 mg/l kanamycin, 125 mg/l carbenicillin, and 50 mg/l Meropen (Dainippon Sumitomo Pharma, Osaka, Japan) (pH 5.8). Calli were subcultured to fresh callus induction plates every 2 weeks. Subculture was repeated three times, and the zeatin concentration in the medium was gradually decreased (1.5, 1.0, and then 0.75 mg/l). Regenerated shoots were then rooted on rooting plates containing half-strength MS salts, 3% (w/v) sucrose, 0.8% (w/v) agar, and 50 mg/l Meropen (pH 5.8). Rooted plants were transferred to rock fibre (Nittobo, Tokyo, Japan, URL: http://www.nittobo.co.jp/english/index.htm), and then to a 1:1 mixture of vermiculite and Powersoil (Kureha Chemical Ind., Tokyo, Japan, and Kanto Hiryou Ind., Saitama, Japan).

Real-time RT–PCR

RT–PCR experiments were performed to confirm gene expression patterns observed in the microarray experiments. The total RNA samples used as templates in the microarray analysis were reverse transcribed using the SuperScript III First-Strand Synthesis System (Invitrogen Corp., URL: http://www.invitrogen.com/) according to the manufacturer's instructions. Following reverse transcription, PCR was carried out using rTaq DNA polymerase (Takara Bio Inc., URL: http://www.takara-bio.com/index.htm). Real-time PCR reactions to confirm gene expression were carried out using a DyNAmo™ HS SYBR® Green qPCR Kit (New England Biolabs Inc., URL: http://www.neb.com/nebecomm/default.asp) on a DNA Engine Opticon 2 system (MJ Research Inc., Waltham, MA, USA). Primers used in this study are shown in Supplementary data 1. The elongation factor 1a gene (GenBank accession number X14449) was used as a control.
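The section does not state how relative expression levels (Fig. 4D) were calculated. A common choice for SYBR Green data with a single reference gene is the comparative Ct (2^(−ΔΔCt)) method; the sketch below assumes that method, roughly 100% amplification efficiency, elongation factor 1a as the reference gene, and the control line as the calibrator. The function name and its arguments are hypothetical.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene versus the control line (assumed 2^-ddCt method).

    ct_ref and ct_ref_control are Ct values of the reference gene (assumed here
    to be elongation factor 1a) in the sample and in the control line.
    """
    delta_sample = ct_target - ct_ref                   # normalize to the reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control          # compare with the control line
    return 2.0 ** (-delta_delta)                        # assumes ~100% PCR efficiency


# A target crossing the threshold 3 cycles earlier than in the control line
# (after reference-gene normalization) is about 8-fold more abundant.
print(relative_expression(22.0, 18.0, 25.0, 18.0))     # 8.0
```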

Transient expression of GFP-fusion protein

Transient expression vectors of the zinc finger protein fused to GFP were produced as described in Supplementary data 2. The CaMV35S-sGFP(S65T)-nos3′ vector[28] was used for transient expression of free GFP. The vectors were introduced into epidermal cells of onions purchased from a local market by particle bombardment using a Helios Gene Gun (Bio-Rad Laboratories, URL: http://www3.bio-rad.com/) according to the manufacturer's instructions. Expression of GFP-fusion proteins was monitored using an LSM700 confocal laser scanning microscope (Carl Zeiss, URL: http://www.zeiss.com/). Image processing was performed using ZEN 2008 software (Carl Zeiss).

Results and discussion

Identification of coexpression modules

We obtained gene expression data for tomato from 67 hybridizations using RNA derived from roots, hypocotyls, cotyledons, leaves, and fruits (Table 1, Supplementary data 3). To estimate coexpression profiles, we first calculated PCC values for all pairwise combinations of the 7644 quality-checked probes (see Materials and methods). To find coexpression modules, we then generated network graphs using different PCC cutoff values. Network density reached its minimum at a cutoff value of 0.91, suggesting that, at PCC cutoff values >0.91, a decreasing number of nodes are more tightly connected (Fig. 1A). Indeed, the numbers of nodes and edges decreased at PCC cutoffs >0.91 (Fig. 1B and C). However, even at a PCC cutoff of 0.95, the network was still complex, containing ∼1000 nodes and 5000 edges (Fig. 1B and C). Thus, we concluded that a PCC cutoff alone cannot efficiently identify coexpression modules at a resolution that allows module functions to be inferred.
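The probe screening and a cutoff sweep like the one in Fig. 1 can be sketched as follows. The code assumes that the coefficient of variation uses the sample standard deviation, that an edge exists wherever PCC exceeds the cutoff, and that network density is 2E/(N(N−1)) computed over probes with at least one edge; the paper does not give its density formula, and the function names are hypothetical.

```python
import numpy as np


def screen_probes(values, absent, replicate_groups):
    """Indices of probes passing both filters (hypothetical helper).

    values: (n_probes, n_samples) normalized signal
    absent: boolean (n_probes, n_samples), True where the MAS5 flag is 'A'
    replicate_groups: one list of column indices per tissue/stage replicate set
    """
    keep = ~absent.all(axis=1)                    # drop probes absent in every sample
    for cols in replicate_groups:
        block = values[:, cols]
        cv = block.std(axis=1, ddof=1) / block.mean(axis=1)  # assumed sample SD
        keep &= cv < 1                            # require CV < 1 in every replicate set
    return np.flatnonzero(keep)


def network_stats(pcc, cutoff):
    """Nodes, edges, and density of the graph with an edge wherever PCC > cutoff."""
    upper = np.triu(pcc > cutoff, k=1)            # count each pair once, skip the diagonal
    edges = int(upper.sum())
    degree = upper.sum(axis=0) + upper.sum(axis=1)
    nodes = int((degree > 0).sum())               # assumed: only probes with >= 1 edge count
    density = 2 * edges / (nodes * (nodes - 1)) if nodes > 1 else 0.0
    return nodes, edges, density
```

With `kept = screen_probes(values, absent, groups)` and `pcc = np.corrcoef(values[kept])` (rows are probes), calling `network_stats(pcc, c)` for cutoffs c from 0.85 to 1.0 would trace curves of the kind plotted in Fig. 1. Counting only connected probes in N is one way density can pass through a minimum: at high cutoffs, loosely connected probes drop out, and the surviving tight clusters raise the density again.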
Table 1

Microarray data used for coexpression analysis

| Tomato cultivar | Tissue | Developmental stage | Biological replicates | ID in Supplementary data 1 |
|---|---|---|---|---|
| Micro-Tom | Root | 5 weeks after germination | 3 | MT_5Wroot |
| Micro-Tom | Hypocotyl | 3 weeks after germination | 2 | MT_3Whypocoty |
| Micro-Tom | Cotyledon | 3 weeks after germination | 2 | MT_3Wcotyledon |
| Micro-Tom | Leaf | 3 weeks after germination | 2 | MT_3rdleaf |
| Micro-Tom | Leaf | 3 weeks after germination | 3 | MT_3Wleaf |
| Micro-Tom | Leaf | 5 weeks after germination | 3 | MT_5Wleaf |
| Micro-Tom | Fruit flesh | MG, 30 days after anthesis | 3 | MT_MG_flesh |
| Micro-Tom | Fruit flesh | Y, 35 days after anthesis | 2 | MT_Y_flesh |
| Micro-Tom | Fruit flesh | O, 38–40 days after anthesis | 2 | MT_O_flesh |
| Micro-Tom | Fruit flesh | R, 45–48 days after anthesis | 3 | MT_R_flesh |
| Micro-Tom | Fruit peel | MG, 30 days after anthesis | 3 | MT_MG_peel |
| Micro-Tom | Fruit peel | Y, 35 days after anthesis | 2 | MT_Y_peel |
| Micro-Tom | Fruit peel | O, 38–40 days after anthesis | 2 | MT_O_peel |
| Micro-Tom | Fruit peel | R, 45–48 days after anthesis | 3 | MT_R_peel |
| Aft (LA1996) | Fruit flesh | MG, 40 days after anthesis | 3 | Aft_MG_flesh |
| Aft (LA1996) | Fruit flesh | R, 50–55 days after anthesis | 3 | Aft_R_flesh |
| Aft (LA1996) | Fruit peel | MG, 40 days after anthesis | 3 | Aft_MG_peel |
| Aft (LA1996) | Fruit peel | R, 50–55 days after anthesis | 3 | Aft_R_peel |
| Line27859 | Fruit flesh | MG, 40 days after anthesis | 3 | Line27859_MG_flesh |
| Line27859 | Fruit flesh | R, 50–55 days after anthesis | 3 | Line27859_R_flesh |
| Line27859 | Fruit peel | MG, 40 days after anthesis | 3 | Line27859_MG_peel |
| Line27859 | Fruit peel | R, 50–55 days after anthesis | 3 | Line27859_R_peel |
| Momotaro8 | Fruit flesh | R, 50–55 days after anthesis | 4 | MO_R_flesh |
| Momotaro8 | Fruit peel | R, 50–55 days after anthesis | 4 | MO_R_peel |

MG, mature green; Y, yellow; O, orange; R, red.

Figure 1

Global topology of the tomato coexpression network. (A) Network density, (B) number of nodes, and (C) number of edges, at varied PCC cutoff values. Insets are magnified curves within a cutoff range from 0.85 to 1.0. Network density reached its minimum at a PCC cutoff of 0.91; arrows in the insets indicate this cutoff value.

We therefore attempted to identify coexpression modules using an alternative module-finding algorithm developed by Ogata et al.[4] This algorithm detects coexpression modules not only by applying a PCC cutoff but also by evaluating the density and connectivity of the network. Each coexpression module was reconstructed from a given seed gene. Genes directly connected to the seed gene were first selected using the PCC cutoff value. From this set of correlated genes, the subgroup with the highest NB value[4] (see Materials and methods) was selected and referred to as the kernel group. Next, the VB value (for its definition, see Materials and methods) was calculated for all genes not belonging to the kernel group, and any gene with a VB value higher than the threshold was incorporated into the kernel group. The resulting set of genes was defined as a coexpression module, and modules with NB values above the threshold were selected for further analysis. As a result, the generated modules have dense connections within each module and sparse connections to other modules. When member genes overlapped between multiple modules, the non-redundant member genes were bundled into a larger module. This approach has been reported to assign coexpression modules to biological processes (e.g. metabolic pathways) more accurately than other algorithms.[4] Using this approach with the following threshold values (PCC cutoff, 0.6; VB value, 0.333; NB value, 0.5), we identified 199 coexpression modules (Supplementary data 4).
The number of member probes per module ranged from 3 to 103, with a median value of 7 member probes per module (Fig. 2A). The distribution of the NB value[4] showed that more than 40% of the modules have an NB value >0.8, indicating that the modules have high intra-modular connectivity (Fig. 2B).
Figure 2

Distribution of characteristic parameters of the identified coexpression modules. Distribution of (A) the number of member probes per module and (B) NB value. NB value is defined as the ratio of the number of edges within the module to the total number of edges between module members and all nodes, irrespective of membership in the module. The median number of member probes per module is 7. More than 40% of the modules have NB values >0.8, indicating high intra-modular connectivity in the coexpression modules.

Functions of the modules were inferred using GO annotations. GO annotations were assigned to tomato probes according to their similarity to Arabidopsis genes, using the TAIR GO annotation search. First, we investigated whether specific GO terms were enriched in a given module compared with the GO term distribution among all Affymetrix tomato microarray probes. Enrichment of GO categories significant at the 1% level was observed in 75 modules (Table 2). Enriched GO categories included chloroplast, plastid, cytosol, ribosome, other enzyme activity, transferase activity, hydrolase activity, kinase activity, structural molecule activity, protein metabolism, and response to stress (Fig. 3). Ribosome-related genes were expected to be coexpressed, since the ribosome is a protein complex. Coexpression modules enriched with chloroplast-related genes appear to fall into several subgroups according to the sub-plastid localization (e.g. envelope, thylakoid, and stroma) of the proteins encoded by the module member genes.
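The section does not name the statistical test behind the 1% significance level. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the sketch below assumes that test, uses all annotated probes on the array as the background, and applies no multiple-testing correction. The function name and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, module_size: int, category_size: int, background: int) -> float:
    """One-sided hypergeometric P(X >= k) for GO-category enrichment in a module (assumed test).

    k: module members annotated with the GO category
    module_size: number of probes in the module
    category_size: background probes annotated with the category
    background: total number of annotated probes on the array
    """
    upper = min(module_size, category_size)
    # Sum the probabilities of drawing k or more annotated probes into the module.
    hits = sum(comb(category_size, i) * comb(background - category_size, module_size - i)
               for i in range(k, upper + 1))
    return hits / comb(background, module_size)
```

Under this assumption, a category would be called enriched in a module when the returned value is below 0.01.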
Table 2

Coexpression modules in which transcription factor genes are present or GO annotations are enriched

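| ID | NB | Average PCC | Num. of member probes | Num. of Arabidopsis genes | Num. of SGN unigenes | Num. of DFCI TCs | Num. of transcription factors | Enriched GO categories^a |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.000 | 0.805 | 5 | 4 | 5 | 5 | 2 | |
| 2 | 1.000 | 0.870 | 5 | 3 | 2 | 2 | 1 | |
| 3 | 1.000 | 0.642 | 11 | 7 | 6 | 9 | 1 | CC2, CC6 |
| 5 | 1.000 | 0.749 | 5 | 2 | 0 | 3 | 1 | |
| 6 | 1.000 | 0.552 | 9 | 5 | 5 | 6 | 0 | BP5, BP6 |
| 9 | 1.000 | 0.647 | 10 | 9 | 8 | 9 | 0 | MF1, MF4 |
| 15 | 1.000 | 0.368 | 6 | 4 | 4 | 6 | 1 | |
| 17 | 1.000 | 0.731 | 6 | 6 | 6 | 6 | 0 | BP7 |
| 18 | 1.000 | 0.572 | 4 | 3 | 4 | 4 | 0 | CC2 |
| 22 | 1.000 | 0.774 | 7 | 6 | 6 | 7 | 0 | BP8 |
| 25 | 1.000 | 0.847 | 11 | 10 | 11 | 11 | 0 | CC4, BP6 |
| 27 | 1.000 | 0.655 | 7 | 6 | 6 | 7 | 0 | MF9 |
| 29 | 1.000 | 0.483 | 7 | 7 | 5 | 6 | 0 | MF1 |
| 33 | 1.000 | 0.758 | 5 | 5 | 3 | 5 | 0 | BP8 |
| 34 | 1.000 | 0.407 | 4 | 2 | 3 | 4 | 1 | |
| 36 | 0.938 | 0.881 | 8 | 6 | 6 | 7 | 0 | BP1 |
| 38 | 0.914 | 0.752 | 10 | 6 | 8 | 10 | 0 | CC2 |
| 39 | 0.910 | 0.667 | 10 | 7 | 6 | 7 | 0 | MF3 |
| 41 | 0.901 | 0.837 | 5 | 5 | 3 | 5 | 0 | CC10, CC11 |
| 48 | 0.875 | 0.538 | 12 | 10 | 12 | 12 | 1 | |
| 49 | 0.863 | 0.664 | 14 | 11 | 13 | 13 | 0 | MF1, MF4, BP6 |
| 50 | 0.858 | 0.634 | 13 | 8 | 8 | 11 | 0 | CC10, CC11, MF12, BP4 |
| 52 | 0.858 | 0.842 | 12 | 4 | 10 | 11 | 2 | |
| 53 | 0.858 | 0.609 | 9 | 8 | 8 | 8 | 3 | |
| 55 | 0.858 | 0.780 | 7 | 6 | 6 | 7 | 0 | CC10, CC11, MF12, BP4 |
| 61 | 0.858 | 0.766 | 5 | 5 | 4 | 5 | 0 | CC2 |
| 62 | 0.858 | 0.654 | 10 | 8 | 9 | 9 | 0 | BP5, BP6 |
| 64 | 0.850 | 0.824 | 17 | 13 | 13 | 15 | 0 | MF1, BP2 |
| 65 | 0.847 | 0.404 | 5 | 4 | 5 | 5 | 1 | |
| 66 | 0.834 | 0.771 | 12 | 12 | 11 | 12 | 0 | CC9, MF8 |
| 67 | 0.834 | 0.700 | 11 | 11 | 8 | 11 | 0 | CC2, CC6 |
| 70 | 0.826 | 0.578 | 18 | 15 | 13 | 16 | 0 | CC2, CC6 |
| 71 | 0.819 | 0.470 | 8 | 8 | 8 | 8 | 1 | MF7 |
| 72 | 0.815 | 0.597 | 6 | 3 | 4 | 6 | 1 | |
| 74 | 0.811 | 0.628 | 19 | 18 | 18 | 18 | 3 | CC8 |
| 77 | 0.801 | 0.753 | 4 | 4 | 1 | 3 | 0 | CC11 |
| 78 | 0.801 | 0.745 | 6 | 4 | 6 | 6 | 1 | |
| 79 | 0.801 | 0.631 | 10 | 7 | 10 | 10 | 0 | MF5 |
| 80 | 0.801 | 0.894 | 5 | 5 | 4 | 5 | 0 | MF3 |
| 82 | 0.801 | 0.888 | 5 | 5 | 5 | 5 | 1 | |
| 86 | 0.786 | 0.786 | 103 | 78 | 85 | 97 | 2 | CC1, CC2, CC3, CC6, MF1, MF4, MF9, BP2, BP12 |
| 88 | 0.767 | 0.737 | 9 | 7 | 6 | 7 | 0 | MF1 |
| 89 | 0.750 | 0.666 | 14 | 13 | 12 | 13 | 0 | CC4, MF2, BP3 |
| 90 | 0.750 | 0.772 | 8 | 8 | 8 | 8 | 0 | BP4 |
| 93 | 0.750 | 0.613 | 12 | 12 | 10 | 12 | 0 | MF2, BP3 |
| 101 | 0.734 | 0.810 | 14 | 11 | 10 | 14 | 0 | CC10, CC11, MF12, BP4 |
| 102 | 0.734 | 0.825 | 12 | 11 | 12 | 12 | 0 | CC2, CC6 |
| 103 | 0.733 | 0.677 | 11 | 8 | 7 | 10 | 0 | MF7 |
| 104 | 0.728 | 0.564 | 7 | 5 | 3 | 7 | 0 | MF1 |
| 105 | 0.728 | 0.845 | 8 | 5 | 8 | 8 | 1 | |
| 108 | 0.719 | 0.652 | 8 | 8 | 6 | 8 | 1 | |
| 109 | 0.711 | 0.633 | 100 | 69 | 80 | 94 | 14 | CC4, CC8, MF1, MF11, BP5, BP6, BP7 |
| 110 | 0.706 | 0.800 | 10 | 10 | 8 | 10 | 0 | CC7, CC13, BP7 |
| 111 | 0.693 | 0.722 | 10 | 8 | 8 | 9 | 0 | CC2, CC6 |
| 117 | 0.688 | 0.635 | 10 | 8 | 8 | 10 | 0 | MF5 |
| 118 | 0.688 | 0.784 | 8 | 4 | 8 | 8 | 1 | |
| 119 | 0.671 | 0.851 | 14 | 13 | 10 | 13 | 0 | BP9 |
| 120 | 0.667 | 0.599 | 14 | 14 | 11 | 14 | 1 | CC4, BP4 |
| 122 | 0.667 | 0.639 | 7 | 4 | 3 | 7 | 0 | BP5, BP6 |
| 126 | 0.667 | 0.639 | 10 | 10 | 9 | 10 | 0 | CC7, MF6 |
| 127 | 0.667 | 0.601 | 7 | 7 | 7 | 7 | 0 | BP3 |
| 128 | 0.667 | 0.783 | 9 | 9 | 6 | 9 | 0 | BP4 |
| 132 | 0.667 | 0.814 | 7 | 7 | 4 | 7 | 2 | |
| 137 | 0.667 | 0.692 | 4 | 3 | 4 | 4 | 0 | CC11 |
| 139 | 0.666 | 0.669 | 52 | 44 | 43 | 50 | 12 | CC4, CC8, CC12, MF10, MF11, BP5, BP6, BP9, BP11 |
| 140 | 0.650 | 0.623 | 9 | 8 | 8 | 9 | 0 | CC4 |
| 141 | 0.650 | 0.662 | 9 | 9 | 7 | 9 | 0 | CC3, CC10, CC11, MF10, MF12 |
| 142 | 0.643 | 0.819 | 8 | 7 | 8 | 8 | 1 | |
| 144 | 0.635 | 0.787 | 16 | 14 | 15 | 16 | 2 | CC4, MF7, BP7 |
| 145 | 0.632 | 0.722 | 13 | 13 | 13 | 13 | 1 | CC4, MF7 |
| 147 | 0.632 | 0.893 | 3 | 3 | 3 | 3 | 1 | |
| 148 | 0.625 | 0.662 | 5 | 5 | 4 | 5 | 0 | MF3, MF6 |
| 149 | 0.622 | 0.864 | 13 | 10 | 10 | 13 | 0 | CC1, MF5, BP4 |
| 151 | 0.617 | 0.734 | 34 | 22 | 27 | 34 | 0 | CC7, MF5 |
| 152 | 0.609 | 0.860 | 18 | 14 | 16 | 18 | 0 | CC9, BP7, BP10 |
| 154 | 0.600 | 0.893 | 3 | 2 | 2 | 2 | 0 | MF6 |
| 155 | 0.600 | 0.623 | 7 | 7 | 6 | 7 | 1 | |
| 163 | 0.596 | 0.778 | 14 | 12 | 9 | 14 | 0 | CC4, CC5, BP8 |
| 165 | 0.589 | 0.788 | 9 | 9 | 8 | 9 | 1 | |
| 166 | 0.587 | 0.838 | 16 | 11 | 12 | 14 | 0 | CC10, CC11, MF12, BP4 |
| 167 | 0.581 | 0.832 | 23 | 17 | 22 | 22 | 1 | MF5, MF6 |
| 169 | 0.576 | 0.686 | 32 | 22 | 30 | 29 | 4 | BP9 |
| 171 | 0.571 | 0.876 | 25 | 19 | 20 | 25 | 1 | CC9, MF7, MF8, MF9 |
| 172 | 0.570 | 0.758 | 47 | 34 | 41 | 43 | 1 | CC2, CC3, CC6, MF1, BP10 |
| 174 | 0.563 | 0.898 | 8 | 8 | 7 | 8 | 0 | CC10, CC11, MF12, BP4 |
| 175 | 0.556 | 0.868 | 7 | 7 | 5 | 7 | 0 | MF6 |
| 176 | 0.551 | 0.719 | 13 | 11 | 9 | 13 | 0 | CC4, MF2, BP3 |
| 180 | 0.546 | 0.762 | 7 | 2 | 7 | 7 | 1 | |
| 182 | 0.546 | 0.704 | 7 | 7 | 6 | 7 | 1 | CC2 |
| 184 | 0.542 | 0.716 | 28 | 26 | 21 | 27 | 3 | MF6, BP7 |
| 188 | 0.524 | 0.810 | 10 | 7 | 7 | 10 | 0 | CC10, CC11 |
| 189 | 0.522 | 0.520 | 6 | 5 | 5 | 6 | 0 | BP6 |
| 190 | 0.519 | 0.775 | 9 | 6 | 7 | 7 | 0 | MF4 |
| 192 | 0.500 | 0.781 | 7 | 6 | 7 | 7 | 0 | CC3 |
| 193 | 0.500 | 0.848 | 7 | 5 | 6 | 6 | 0 | CC10, CC11, MF12, BP4 |
| 197 | 0.500 | 0.785 | 4 | 4 | 4 | 4 | 0 | CC10, CC11 |
| 199 | 0.500 | 0.792 | 9 | 8 | 8 | 9 | 1 | |

^a CC1, other intracellular components; CC2, chloroplast; CC3, other cytoplasmic components; CC4, unknown cellular components; CC5, other membranes; CC6, plastid; CC7, plasma membrane; CC8, nucleus; CC9, mitochondria; CC10, cytosol; CC11, ribosome; CC12, cell wall; CC13, ER; MF1, other enzyme activity; MF2, unknown molecular functions; MF3, transferase activity; MF4, other binding; MF5, hydrolase activity; MF6, kinase activity; MF7, protein binding; MF8, nucleotide binding; MF9, transporter activity; MF10, DNA or RNA binding; MF11, transcription factor activity; MF12, structural molecule activity; BP1, other cellular processes; BP2, other metabolic processes; BP3, unknown biological processes; BP4, protein metabolism; BP5, response to abiotic or biotic stimulus; BP6, response to stress; BP7, developmental processes; BP8, transport; BP9, other biological processes; BP10, cell organization and biogenesis; BP11, transcription; BP12, electron transport or energy pathways.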

Figure 3

Distribution of GO categories significantly enriched within coexpression modules. Note that the GO categories ‘chloroplast’ and ‘plastid’ are frequently associated with the same genes, as are ‘cytosol’, ‘ribosome’, ‘structural molecule activity’, and ‘protein metabolism’.


Coexpression modules containing transcription factor genes

Coexpression analysis can facilitate prediction of the functions of regulatory proteins that do not have enzymatic, transporter, or structural molecule activities. Modules containing transcription factor genes are of particular interest, since these transcription factors may have a role in controlling the expression of other module member genes. Of the 199 modules, 37 contained transcription factor genes, and significant enrichment of certain GO categories was observed in 16 of these (Table 2). For example, two transcription factor genes are found in module 52 (Table 2, Supplementary data 4). The genes corresponding to Les.3716.1.S1_at (GenBank accession number AJ277944) and Les.3517.2.S1_a_at (GenBank accession number BT012879), which encode Myb-family and TCP-family transcription factors, respectively, are tightly correlated with seven protease inhibitor unigenes (SGN tomato unigenes: SGN-U313509, SGN-U312622, SGN-U312829, SGN-U312623, SGN-U312822, SGN-U313508, and SGN-U312824) (Supplementary data 4). This implies that these transcription factors regulate the expression of protease inhibitor genes. Another example is module 3 (Table 2, Supplementary data 4). The gene corresponding to LesAffx.69411.1.S1_at (GenBank accession number AW651000), which encodes a bHLH-family transcription factor, is correlated exclusively with plastid-associated genes, implying that this bHLH-family protein is associated with the regulation of plastid function.

A regulatory protein that does not belong to a transcription factor family can regulate expression of module member genes

The role of transcription factors in regulating the expression of coexpressed genes has been well documented.[13] However, the role of regulatory protein genes that are not classified as transcription factors in the regulation of coexpressed genes remains unclear. We therefore tested whether a non-transcription factor-type regulatory gene can control the coordinated expression of genes in a given module. To exemplify this, we performed an experimental analysis of module 64 (NB value 0.850, average PCC value 0.822), in which flavonoid biosynthesis genes are enriched (Table 3). Expression profiles of the member genes of this module show that they are highly expressed in fruit peel tissues and expressed at lower levels in leaf and fruit flesh tissues, a pattern that correlates with the localization of tomato flavonoid compounds[22] (Fig. 4A). A protein domain search using InterProScan[29] (http://www.ebi.ac.uk/Tools/InterProScan/) showed that one of the non-enzymatic genes, corresponding to Les.2294.2.A1_at (GenBank accession number AK326277), encodes a RING-finger-type zinc finger protein, although the descriptions of the best-matching Arabidopsis gene (At1g79110) and SGN unigene (SGN-U323178) indicate only ‘expressed protein’ (Fig. 4B, Supplementary data 4). In the network graph of module 64, the enzymatic genes of flavonoid biosynthesis are tightly interconnected. The zinc finger protein gene, hereafter referred to as ZnF, has direct links to the genes for 4-coumarate-CoA ligase, cinnamoyl-CoA reductase, chalcone synthase 1, flavanone 3-hydroxylase, flavonol synthase, glycosyltransferases, and malonyl-CoA synthetase (Fig. 4C). To test whether ZnF controls the expression of flavonoid biosynthetic genes, we overexpressed a full-length cDNA of this gene (clone ID: LEFL2003DB10) in Micro-Tom. Gene expression analysis was performed using leaf tissues, in which expression of flavonoid biosynthesis genes is low in wild-type plants.
Expression of the genes for 4-coumarate-CoA ligase, cinnamate 4-hydroxylase, cinnamoyl-CoA reductase, chalcone synthase 1, chalcone synthase 2, chalcone isomerase, flavanone 3-hydroxylase 1, and flavonol synthase was higher in ZnF-overexpressing leaves than in control leaves, although PCC values between ZnF and these up-regulated genes were not very high, mainly because of the high expression levels in one of the transformant lines (Fig. 4D). In contrast, the expression of phenylalanine ammonia-lyase, which is not a member of the module, did not change significantly. The expression of flavonoid 3′-hydroxylase genes was negatively correlated with overexpression of the ZnF gene. These results demonstrate that the ZnF gene positively regulates the expression of enzymatic genes in the early part of the flavonoid biosynthetic pathway, consistent with the coexpression relationships seen in module 64.
Table 3

Coexpression module 64

| Module ID | Member probe | SGN unigene | DFCI TC | Description |
|---|---|---|---|---|
| 64 | Les.3649.1.S1_at |  | TC193015 | Chalcone synthase 2 |
|  | Les.3650.1.S1_at | SGN-U316359 | TC193390 | Chalcone synthase 1 |
|  | Les.5427.1.S1_at | SGN-U317537 | TC193461 | Malonyl-CoA synthetase |
|  | LesAffx.61398.1.S1_at | SGN-U320999 | TC208694 | Expressed protein |
|  | LesAffx.63776.1.S1_at | SGN-U316228 | TC195757 | UDP-glucosyl transferase family protein |
|  | Les.2633.1.A1_at | SGN-U316228 | TC203484 | UDP-glucosyl transferase family protein |
|  | LesAffx.68320.1.S1_at | SGN-U319782 | ES893432 | Chalcone-flavanone isomerase family protein |
|  | Les.1968.1.A1_at | SGN-U319782 | TC205850 | Chalcone-flavanone isomerase family protein |
|  | Les.3085.1.S1_at |  | TC200116 | Flavonol synthase |
|  | LesAffx.34276.2.A1_at |  | TC198877 | Cinnamoyl-CoA reductase |
|  | Les.2278.1.S1_at | SGN-U312401 | TC191763 | Flavanone 3-hydroxylase |
|  | LesAffx.34276.1.S1_at |  | TC198877 | Cinnamoyl-CoA reductase |
|  | LesAffx.30397.1.A1_at |  | TC209623 | Allyl alcohol dehydrogenase |
|  | LesAffx.34276.2.S1_at |  | TC198877 | Cinnamoyl-CoA reductase |
|  | Les.2294.2.A1_at | SGN-U323178 | TC211502 | Expressed protein (zinc finger protein) |
|  | Les.5848.2.S1_at | SGN-U321355 | TC199613 | 4-coumarate–CoA ligase |
|  | LesAffx.5010.2.S1_at | SGN-U316789 | TC194689 | Cytochrome b-561 family protein |
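According to Fig. 4A, the 17 probes in Table 3 correspond to 15 member genes. As a quick consistency check, the probe-to-transcript pairs from Table 3 can be grouped in Python. This assumes that probes sharing a DFCI TC identifier (or, for one probe, an EST accession) represent the same gene. How the probes were actually collapsed into genes is not stated in this section.

```python
from collections import defaultdict

# Probe -> DFCI TC pairs copied from Table 3 (ES893432 is an EST accession).
PROBE_TO_TC = {
    "Les.3649.1.S1_at": "TC193015",
    "Les.3650.1.S1_at": "TC193390",
    "Les.5427.1.S1_at": "TC193461",
    "LesAffx.61398.1.S1_at": "TC208694",
    "LesAffx.63776.1.S1_at": "TC195757",
    "Les.2633.1.A1_at": "TC203484",
    "LesAffx.68320.1.S1_at": "ES893432",
    "Les.1968.1.A1_at": "TC205850",
    "Les.3085.1.S1_at": "TC200116",
    "LesAffx.34276.2.A1_at": "TC198877",
    "Les.2278.1.S1_at": "TC191763",
    "LesAffx.34276.1.S1_at": "TC198877",
    "LesAffx.30397.1.A1_at": "TC209623",
    "LesAffx.34276.2.S1_at": "TC198877",
    "Les.2294.2.A1_at": "TC211502",
    "Les.5848.2.S1_at": "TC199613",
    "LesAffx.5010.2.S1_at": "TC194689",
}


def group_probes_by_transcript(probe_to_tc):
    """Collect the probes that map to each transcript identifier."""
    groups = defaultdict(list)
    for probe, tc in probe_to_tc.items():
        groups[tc].append(probe)
    return dict(groups)


groups = group_probes_by_transcript(PROBE_TO_TC)
print(len(PROBE_TO_TC), "probes ->", len(groups), "transcripts")
for tc, probes in groups.items():
    if len(probes) > 1:
        print(tc, probes)
```

Under this grouping, the three cinnamoyl-CoA reductase probes collapse onto TC198877, giving 15 transcripts for 17 probes. Grouping by SGN unigene instead would not work here, because several probes have no SGN entry.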
Figure 4

Experimental verification of the coexpression relationship between members of module 64. (A) Expression profiles of 15 member genes (corresponding to 17 probes, see Supplementary data 2). 5W, five week; 3W, three week; MG, mature green; Y, yellow; O, orange; R, red. (B) Sequence of a full-length cDNA corresponding to the probe Les.2294.2.A1_at (LEFL2003DB10, GenBank accession number AK326277). Grey-shaded letters indicate the unigene sequence used to design Les.2294.2.A1_at. The boxed ATG indicates the start codon. The underlined TGA indicates the stop codon. The dotted line indicates the cDNA sequence corresponding to the zinc finger domain. (C) Flavonoid biosynthesis pathway (left) and coexpression network of module 64 (right, correlation coefficient cutoff at 0.6). ZnF, zinc finger; PAL, phenylalanine ammonia-lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate–CoA ligase; CCR, cinnamoyl-CoA reductase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; FLS, flavonol synthase; GT, glycosyltransferase; F3′H, flavonoid 3′-hydroxylase; EP, expressed protein; Cyt B561, cytochrome b-561; AADH, allyl alcohol dehydrogenase; dKae, dihydrokaempferol; dQue, dihydroquercetin; Kae, kaempferol; Que, quercetin. In the network graph, black edges indicate PCC ≥0.8 and grey edges indicate PCC from 0.6 to 0.8. (D) Changes in expression levels of flavonoid biosynthesis genes in module 64. The expression level of each gene is shown relative to the level in the control line. Black and grey bars indicate control lines and ZnF-overexpression lines, respectively. Each of the four grey bars represents an independent ZnF-overexpression plant.
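Panel C draws an edge only when the PCC between two genes is at least 0.6. It colors the edge black at PCC ≥0.8 and grey between 0.6 and 0.8. The Python sketch below applies that rule to build an undirected network and list a gene's direct neighbors. It treats 0.6 and 0.8 as inclusive lower bounds, which is an assumption. The gene pairs and PCC values in the usage lines are hypothetical, not values from the study.

```python
def edge_style(pcc, cutoff=0.6, strong=0.8):
    """Return 'black', 'grey', or None (no edge) for a given PCC."""
    if pcc >= strong:
        return "black"
    if pcc >= cutoff:
        return "grey"
    return None


def build_network(pair_pcc):
    """Build an undirected adjacency map {gene: {neighbor: edge style}}."""
    network = {}
    for (a, b), pcc in pair_pcc.items():
        style = edge_style(pcc)
        if style is None:
            continue  # below the 0.6 cutoff: no edge is drawn
        network.setdefault(a, {})[b] = style
        network.setdefault(b, {})[a] = style
    return network


# Hypothetical PCC values for illustration only.
pair_pcc = {
    ("ZnF", "CHS1"): 0.85,
    ("ZnF", "FLS"): 0.70,
    ("ZnF", "PAL"): 0.40,
    ("CHS1", "F3H"): 0.92,
}
network = build_network(pair_pcc)
print(network["ZnF"])
```

With these toy values, ZnF is directly linked to CHS1 (black) and FLS (grey). PAL falls below the cutoff and does not appear in the graph at all.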

Analysis of intracellular localization demonstrated that the localization of GFP-ZnF fusion protein was the same as that of free GFP protein (Fig. 5). We obtained the same result using ZnF–GFP fusion protein (data not shown).
This result suggests that the ZnF protein is localized to the cytosol and that it is not a canonical transcription factor. RING-finger-type zinc finger domains are reportedly involved in protein–protein interactions.[30] It can therefore be hypothesized that ZnF positively regulates the expression of flavonoid biosynthetic genes through interaction with other transcriptional regulator proteins. This example demonstrates the potential of coexpression analysis for inferring the functions of uncharacterized regulatory genes that do not belong to transcription factor families.
Figure 5

Intracellular localization of (A) GFP protein and (B) GFP-ZnF fusion protein. N, nucleus. Scale bar, 50 µm. The localization pattern of GFP-ZnF is the same as that of free GFP, suggesting that the ZnF protein is localized to the cytosol.

Potential of coexpression analysis in predicting functions of uncharacterized genes

Coexpression analysis was recently used to predict the function of a transporter gene involved in glucosinolate biosynthesis in Arabidopsis.[31] The function of this transporter gene, BASS5, was experimentally demonstrated using BASS5 knockout Arabidopsis plants, in which the accumulation of methionine-derived glucosinolates decreased. Together with this transporter study, the results of the present study suggest that gene-to-gene coexpression analysis is not limited to genes involved in protein complex formation or transcriptional regulation, but can also be used to infer the functions of various other types of uncharacterized genes. Experimental verification of the functions of several other candidate regulatory protein genes is in progress.

Supplementary data

Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was partly supported by the Research and Development Program for New Bio-Industry Initiatives, and by a grant from Kazusa DNA Research Institute.
References

1. G Sherlock. Analysis of large-scale gene expression data. Brief Bioinform. 2001-12.

2. Koh Aoki, Yoshiyuki Ogata, Daisuke Shibata. Approaches for extracting practical information from gene co-expression networks in plant biology. Plant Cell Physiol. 2007-01-23.

3. K L Borden, P S Freemont. The RING finger domain: a recent example of a sequence-structure family. Curr Opin Struct Biol. 1996-06.

4. I Mitsuhara, M Ugaki, H Hirochika, M Ohshima, T Murakami, Y Gotoh, Y Katayose, S Nakamura, R Honkura, S Nishimiya, K Ueno, A Mochizuki, H Tanimoto, H Tsugawa, Y Otsuki, Y Ohashi. Efficient promoter cassettes for enhanced expression of foreign genes in dicotyledonous and monocotyledonous plants. Plant Cell Physiol. 1996-01.

5. Shira Mintz-Oron, Tali Mandel, Ilana Rogachev, Liron Feldberg, Ofra Lotan, Merav Yativ, Zhonghua Wang, Reinhard Jetter, Ilya Venger, Avital Adato, Asaph Aharoni. Gene expression and metabolism in tomato fruit surface tissues. Plant Physiol. 2008-04-25.

6. Masami Yokota Hirai, Kenjiro Sugiyama, Yuji Sawada, Takayuki Tohge, Takeshi Obayashi, Akane Suzuki, Ryoichi Araki, Nozomu Sakurai, Hideyuki Suzuki, Koh Aoki, Hideki Goda, Osamu Ishizaki Nishizawa, Daisuke Shibata, Kazuki Saito. Omics-based identification of Arabidopsis Myb transcription factors regulating aliphatic glucosinolate biosynthesis. Proc Natl Acad Sci U S A. 2007-04-09.

7. Md Altaf-Ul-Amin, Yoko Shinbo, Kenji Mihara, Ken Kurokawa, Shigehiko Kanaya. Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics. 2006-04-14.

8. E Quevillon, V Silventoinen, S Pillai, N Harte, N Mulder, R Apweiler, R Lopez. InterProScan: protein domains identifier. Nucleic Acids Res. 2005-07-01.

9. Takeshi Obayashi, Shinpei Hayashi, Motoshi Saeki, Hiroyuki Ohta, Kengo Kinoshita. ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res. 2008-10-25.

10. Yoko Iijima, Yukiko Nakamura, Yoshiyuki Ogata, Ken'ichi Tanaka, Nozomu Sakurai, Kunihiro Suda, Tatsuya Suzuki, Hideyuki Suzuki, Koei Okazaki, Masahiko Kitayama, Shigehiko Kanaya, Koh Aoki, Daisuke Shibata. Metabolite annotations based on the integration of mass spectral information. Plant J. 2008-02-07.