VitisNet: "Omics" integration through grapevine molecular networks.

Jérôme Grimplet, Grant R Cramer, Julie A Dickerson, Kathy Mathiason, John Van Hemert, Anne Y Fennell.

Abstract

BACKGROUND: Genomic data release for the grapevine has increased exponentially in the last five years. The Vitis vinifera genome has been sequenced, and Vitis EST, transcriptomic, proteomic, and metabolomic tools and data sets continue to be developed. The next critical challenge is to provide biological meaning to this tremendous amount of data by annotating genes and integrating them within their biological context. We have developed and validated a system of Grapevine Molecular Networks (VitisNet).
METHODOLOGY/PRINCIPAL FINDINGS: The sequences from the Vitis vinifera (cv. Pinot Noir PN40024) genome sequencing project and ESTs from the Vitis genus have been paired, and the 39,424 resulting unique sequences have been manually annotated. Among these, 13,145 genes have been assigned to 219 networks. The pathway sets include 88 "Metabolic", 15 "Genetic Information Processing", 12 "Environmental Information Processing", 3 "Cellular Processes", 21 "Transport", and 80 "Transcription Factors" networks. Quantitative data are loaded onto the molecular networks, allowing simultaneous visualization of changes in the transcriptome, proteome, and metabolome for a given experiment.
CONCLUSIONS/SIGNIFICANCE: VitisNet uses manually annotated networks in SBML or XML format, enabling the integration of large datasets, streamlining functional analysis, and improving the understanding of dynamic processes in systems biology experiments. VitisNet is grounded in the Vitis vinifera genome (currently at 8x coverage) and can be readily updated as the genome assembly is revised or new biochemical discoveries are made. The molecular network files can be searched dynamically by pathway name or by individual genes, proteins, or metabolites through the MetNet Pathway database and web portal at http://metnet3.vrac.iastate.edu/. All VitisNet files, including the manual annotation of the grape genome with pathway names, individual genes, their genome identifiers, and chromosome locations, can be accessed and downloaded from the VitisNet tab at http://vitis-dormancy.sdstate.org.
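The abstract notes that VitisNet networks are distributed as SBML/XML files that can be searched by gene, protein, or metabolite. As a rough illustration only, the Python sketch below parses a tiny, hypothetical SBML Level 2 document with the standard-library `xml.etree.ElementTree` module and looks up networks by species name. The toy file, the `parse_sbml` and `search_networks` helpers, and the choice to represent an enzyme as a reaction modifier are assumptions made for this example; they are not taken from the actual VitisNet files or the MetNet portal.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy network; NOT an actual VitisNet file.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_network" name="Toy network">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s1" name="sucrose" compartment="cytosol"/>
      <species id="s2" name="glucose" compartment="cytosol"/>
      <species id="s3" name="fructose" compartment="cytosol"/>
      <species id="p1" name="invertase" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" reversible="false">
        <listOfReactants><speciesReference species="s1"/></listOfReactants>
        <listOfProducts>
          <speciesReference species="s2"/>
          <speciesReference species="s3"/>
        </listOfProducts>
        <listOfModifiers><modifierSpeciesReference species="p1"/></listOfModifiers>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def parse_sbml(text):
    """Return the model id, species names and reactions of an SBML document."""
    root = ET.fromstring(text)
    # SBML elements live in a level/version-specific namespace; read it off the root tag.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(tag):
        return f"{{{ns}}}{tag}" if ns else tag

    model = root.find(q("model"))
    names = {s.get("id"): s.get("name", s.get("id")) for s in model.iter(q("species"))}

    def refs(reaction, list_tag, ref_tag):
        lst = reaction.find(q(list_tag))
        if lst is None:
            return []
        return [names[r.get("species")] for r in lst.findall(q(ref_tag))]

    reactions = []
    for rxn in model.iter(q("reaction")):
        reactions.append({
            "id": rxn.get("id"),
            "reactants": refs(rxn, "listOfReactants", "speciesReference"),
            "products": refs(rxn, "listOfProducts", "speciesReference"),
            "modifiers": refs(rxn, "listOfModifiers", "modifierSpeciesReference"),
        })
    return {"id": model.get("id"), "species": set(names.values()), "reactions": reactions}


def search_networks(networks, term):
    """Names of parsed networks whose species list contains `term` (case-insensitive)."""
    term = term.lower()
    return sorted(name for name, net in networks.items()
                  if any(term == s.lower() for s in net["species"]))


net = parse_sbml(TOY_SBML)
print(search_networks({"toy_network": net}, "Glucose"))  # ['toy_network']
```

Reading the namespace from the root tag keeps the sketch independent of the SBML level and version declared in the file.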

Substances:
Transcription Factors

Year: 2009 PMID: 20027228 PMCID: PMC2791446 DOI: 10.1371/journal.pone.0008365

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

During the pre-genomics era, gene function was established through a reductionist approach [1], in which organism physiology was understood by breaking the organism into its components, studying them individually, and then reassembling them to see the larger picture. With the emergence of genome sequencing, organisms are now viewed as complex interactive systems. Systems biology, adapted from general system theory [2] and living systems theory [3], seeks to explain biological phenomena through the relationships among components rather than through their composition alone [4]. Integrative functional genomics combines the molecular components of an organism (transcripts, proteins, and metabolites) into functional networks or models designed to describe its dynamic activities. Although the functions of many individual parts are unknown or poorly defined, their biological role can sometimes be inferred through association with known parts, improving our understanding of the biological system as a whole. At the system-wide scale, such a description requires three levels of information [5], [6]: (1) identification of the components (structural annotation) and characterization of their identity (functional annotation); (2) identification of the molecules that interact with each component, which leads to the reconstruction of a biochemical reaction network; and (3) characterization of the behavior of transcripts, proteins, and metabolites under various conditions. Integrating these three levels of information into a coherent framework (or canvas) provides a powerful approach to the difficult problem of extracting system-wide behavior from component interactions. The most advanced applications of this approach are found in prokaryotes, owing to their small genomes [7], [8]. In E. coli, for example, 92% of gene product functions have been experimentally verified.
Genome-scale models (GEMs) have been used in metabolic engineering to systematically manipulate E. coli strains to overproduce lycopene, lactic acid, ethanol, succinate, amino acids, and many other products, including hydrogen and vanillin. New open reading frames (ORFs) can be discovered by focusing on the gaps in the unknown portions of the omic maps, using the genomic responses of different genotypes under different conditions to identify probable candidate genes that fill these gaps. GEMs have also been widely used to characterize physiological responses to environmental conditions such as abiotic and biotic stresses, which has been particularly useful for identifying resistance mechanisms that can be established in new strains. Such global analyses have become possible with the development of high-throughput technologies for both nucleic acid sequencing and quantitative data acquisition. Over the last 20 years, expressed sequence tag (EST) sequencing [9] has been widely used for gene discovery and genome characterization. EST data are stored in comprehensive databases such as UniGene [10] or the DFCI Gene Indices [11]. More recently, cheaper and faster next-generation sequencing technologies have emerged, such as 454 [12] and Illumina [13]. In parallel, methods have been developed for quantitative data acquisition: microarrays are used to quantitatively assess the transcriptome [14], and two-dimensional gels have routinely been used for proteome studies [15], although gel-free technologies such as ICAT [16] and iTRAQ [17] have recently emerged. Metabolome studies use a variety of tools, such as gas chromatography or high-performance liquid chromatography for separation, and mass spectrometry or nuclear magnetic resonance for the identification and quantification of metabolites [18].
Genomics resources for Vitis vinifera and related species have proliferated rapidly within the last several years, ranging from EST sequencing [19], [20], [21] to whole-genome sequencing [22], [23] and integrated genetic maps [24]. These resources have enabled large-scale expression profiling of berry development using cDNA or oligonucleotide microarrays [25], [26]. A high-density Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array, containing approximately one-third of the expected gene content of the V. vinifera genome with some bias towards leaf and berry tissues, was developed and has led to numerous publications [27], [28], [29], [30], [31], [32], [33]. At the encouragement of the international grape community, the microarray data for several of these experiments have been centralized and can be accessed at PLEXdb (http://www.plexdb.org) [34]. Six additional microarray datasets using cDNA, oligonucleotide, or Affymetrix arrays are available through the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/sites/entrez?db=geo), and the corresponding publications are linked to these public data sets [33], [35], [36]. Proteomics resources have also emerged recently. Most of these studies use 2-D gel analysis and focus on berry metabolism [37], [38], abiotic stress resistance [39], [40], [41], or both [42]. High-resolution techniques such as iTRAQ have also recently been applied to grape [43]. Metabolomics studies of grape are still rudimentary; however, several studies have analyzed about 50 to 120 compounds simultaneously [28], [30], [42], [44]. To date, only two studies have combined transcriptomic, proteomic, and metabolomic analyses of the same material: one on berry tissues [29], [42] and the other on abiotic stress in shoots [40], [28].
Information from structural and functional genomics must be combined with detailed biochemical reaction networks to further our understanding of biological function and to incorporate this knowledge into cultural practice. While considerable effort has been put into resolving structural information (level 1) and into the “Omics” characterization of individual groups of transcripts, proteins, or metabolites (level 3), relatively few biochemical reaction networks (level 2) have been constructed for grapevines or other plant systems. Although pathway databases such as KEGG (http://www.genome.jp/kegg/pathway.html) and AraCyc [45] exist, they are limited to metabolic pathways. In contrast, MetNet (http://metnet3.vrac.iastate.edu/) stores both metabolic and regulatory interactions for Arabidopsis and soybean [46]. To place molecular components and measurements of their behavior in context, we have developed a model of the molecular networks present in grapevines (VitisNet). This resource allows visualization of the dynamic interactions in the transcriptome, proteome, and metabolome within known molecular networks (for example, metabolic or signaling pathways). Integrating transcript, protein, and metabolite profiles into a comprehensive molecular map enables researchers to elucidate the different biochemical responses of grapevines to developmental and environmental cues.
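The integration described above amounts to attaching measurements from each omics level to the nodes of a network. A minimal Python sketch of that idea follows, assuming a network stored as gene, protein, and metabolite nodes joined by typed edges, and measurements supplied as log2 fold changes keyed by node identifier. The `Network` and `Node` classes, the identifiers, the edge labels, and the values are hypothetical illustrations, not the VitisNet data model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str  # "gene", "protein" or "metabolite"
    values: dict = field(default_factory=dict)  # experiment label -> measurement


@dataclass
class Network:
    name: str
    nodes: dict = field(default_factory=dict)  # node_id -> Node
    edges: list = field(default_factory=list)  # (source_id, target_id, relation)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, source, target, relation):
        # Both ends must already be nodes of this network.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in edge {source!r} -> {target!r}")
        self.edges.append((source, target, relation))

    def load_data(self, label, measurements):
        """Attach measurements (node id -> value) under `label`; return the ids that mapped."""
        mapped = []
        for node_id, value in measurements.items():
            node = self.nodes.get(node_id)
            if node is not None:
                node.values[label] = value
                mapped.append(node_id)
        return mapped


# Hypothetical toy network: one gene, its protein, and the reaction it catalyzes.
net = Network("toy sucrose cleavage")
net.add_node("gene_INV", "gene")
net.add_node("INV", "protein")
net.add_node("sucrose", "metabolite")
net.add_node("glucose", "metabolite")
net.add_edge("gene_INV", "INV", "encodes")
net.add_edge("INV", "sucrose", "consumes")
net.add_edge("INV", "glucose", "produces")

# Illustrative log2 fold changes from three omics levels of one experiment.
net.load_data("transcript", {"gene_INV": 1.8, "gene_not_in_network": 0.4})
net.load_data("protein", {"INV": 0.9})
net.load_data("metabolite", {"sucrose": -0.7, "glucose": 1.2})

for node in net.nodes.values():
    print(node.kind, node.node_id, node.values)
```

Measurements whose identifiers are absent from the network (here `gene_not_in_network`) are simply not mapped, which mirrors the situation of genes that have an annotation but no place on a network.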

Results and Discussion

A Set of 39,424 Unique Sequences Defined

The set of unique genes was not restricted to the Pinot Noir genome sequences, because an extensive amount of data has been produced for other V. vinifera cultivars and other Vitis species. Pinot Noir sequences make up only a very small fraction of the V. vinifera EST database (1.8%, or 6,385 of 353,688), whereas Cabernet Sauvignon (half of the EST sequences), Chardonnay, Thompson Seedless, Muscat de Hambourg, and Perlette each have at least twice as many sequences as Pinot Noir. In addition, a substantial number of ESTs have been produced for other Vitis species. Many transcript sequences are therefore expected to be cultivar- or species-specific and may not be represented in the Pinot Noir PN40024 genome. A set of 39,424 unique sequences was defined after matching the genomic sequences with the transcripts (Figure 1). Only 36.4% of these sequences (14,330) were found in both the genomic sequences and the transcripts. When a genomic sequence and a transcript matched, the genomic sequence was retained in the unique set, because it should represent the full-length gene, whereas a transcript may be incomplete. In some cases, several apparently unique transcript sequences matched a single gene, mainly because they matched different regions of that gene. A total of 652 unique sequences corresponded to previously published grapevine sequences (Table S1).
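The pairing rule described in this paragraph (keep the genomic sequence when a transcript matches it, collapse several transcripts that hit the same gene, and keep unmatched transcripts as transcript-only entries) can be sketched in Python as follows. The sketch assumes the transcript-to-gene matches have already been computed, for example by a sequence similarity search, and are supplied as a dictionary; the `build_unique_set` function and the toy identifiers are hypothetical, and the sketch does not reproduce the matching criteria actually used for VitisNet.

```python
from collections import Counter


def build_unique_set(genome_genes, transcripts, matches):
    """Classify every unique sequence by where it was found.

    genome_genes: gene ids predicted from the genome sequence
    transcripts:  transcript (EST/contig) ids
    matches:      transcript id -> gene id it matched; unmatched transcripts are absent
    """
    genes = set(genome_genes)
    supported = set()      # genes hit by at least one transcript
    transcript_only = []   # transcripts with no gene match
    for t in transcripts:
        gene = matches.get(t)
        if gene in genes:
            # Several transcripts hitting one gene collapse onto that gene;
            # the genomic sequence is kept as the representative.
            supported.add(gene)
        else:
            transcript_only.append(t)

    unique = {g: ("genome+transcript" if g in supported else "genome_only") for g in genes}
    unique.update({t: "transcript_only" for t in transcript_only})
    return unique


# Toy example: t1 and t2 hit different regions of g1, t3 hits g2, t4 hits nothing.
unique = build_unique_set(
    genome_genes=["g1", "g2", "g3"],
    transcripts=["t1", "t2", "t3", "t4"],
    matches={"t1": "g1", "t2": "g1", "t3": "g2"},
)
print(len(unique), Counter(unique.values()))
```

The three labels correspond to the three boxes of Figure 1: sequences found in both sources, in the genome only, and as transcripts only.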

Figure 1

Overview of the unique set assembly and results of the annotation procedure.

Box sizes reflect the number of genes in each box.

The set found only in the genomic sequences comprised 40.8% (16,104) of the unique sequences, meaning that there is so far no evidence that these sequences are transcribed. Finally, 22.8% (8,990) of the unique sequences were found only as transcripts. This set could include cultivar- or species-specific genes absent from the Pinot Noir genome, or genes not yet extracted from the genome. However, because 73% (6,553) of these sequences were not homologous to sequences from other organisms, most of them are likely short or consist mainly of UTRs, so that a BLAST search against the genome sequences encoding their putative proteins was not possible. These sequences are of interest because many of them are represented on the widely used Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array: of the 11,734 non-redundant sequences on the chip, 3,208 had no match in the genome.
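The counts reported for Figure 1 can be cross-checked with a few lines of Python: the three subsets should add up to the 39,424 unique sequences, and each share is simply the subset size divided by the relevant total. The `share` helper is an illustrative name; the printed shares agree with the reported percentages to within 0.1 percentage point.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


UNIQUE_TOTAL = 39_424
subsets = {
    "genome and transcripts": 14_330,
    "genome only": 16_104,
    "transcripts only": 8_990,
}

# The three subsets partition the unique set.
assert sum(subsets.values()) == UNIQUE_TOTAL

for label, n in subsets.items():
    print(f"{label}: {n:,} ({share(n, UNIQUE_TOTAL)}%)")

# Transcript-only sequences without homologs in other organisms.
print("no homolog:", share(6_553, subsets["transcripts only"]), "% of transcript-only set")
# Non-redundant Affymetrix chip sequences without a genome match.
print("Affymetrix, no genome match:", share(3_208, 11_734), "% of 11,734")
```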

Half of the Matched Sequences Were Assigned to Molecular Networks

Seventy percent (27,680) of the unique genes matched a previously described Vitis cDNA or protein or a sequence from another organism. The remaining 11,744 sequences were Vitis-specific, and no function could be assigned to them. The proportion of matched genes rose to 83% when only genes from the genome sequences were considered. The matched genes were divided into two groups: those that could not be assigned to molecular networks and those that could. The group not assigned to molecular networks consisted of 14,535 genes (52.5%) covering a wide range of functional descriptions. At one extreme, 1,817 sequences had a completely unknown function. At the other extreme, 1,578 sequences received an identifier, either because an EC or KO number could be attributed to them or because an Arabidopsis homolog had one, but they could not be placed on the networks. Between these extremes, the functional descriptions ranged from a poorly described domain, to a general enzymatic activity, to a well-documented gene. The second group of matched genes (13,145 sequences, 47.5%), homologous to proteins of known function, was assigned to the molecular networks. These 13,145 genes were classified into 6 main overlapping categories (Tables 1-6): Metabolism (5,442 sequences), Genetic Information Processing (1,249 sequences), Environmental Information Processing (1,305 sequences), Cellular Processes (1,121 sequences), Transport (3,523 sequences), and Transcription Factors (2,423 sequences). The complete annotation of the genes and the relevant information for each is presented in Table S1. The references used for annotating genes and for developing pathways not found in KEGG are presented in Text S1.
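Because the six categories overlap, their sizes do not sum to the 13,145 genes placed on networks. The short Python check below, which uses only the counts given in this paragraph, makes the bookkeeping explicit: matched plus Vitis-specific genes give the unique set, assigned plus unassigned genes give the matched set, and the category totals exceed the assigned total by the number of surplus category memberships. The variable names are illustrative.

```python
UNIQUE = 39_424
MATCHED = 27_680      # homologous to a known Vitis or other sequence
UNASSIGNED = 14_535   # matched, but not placed on a network
ASSIGNED = 13_145     # matched and placed on at least one network

categories = {
    "Metabolism": 5_442,
    "Genetic Information Processing": 1_249,
    "Environmental Information Processing": 1_305,
    "Cellular Processes": 1_121,
    "Transport": 3_523,
    "Transcription Factors": 2_423,
}

vitis_specific = UNIQUE - MATCHED
assert UNASSIGNED + ASSIGNED == MATCHED

memberships = sum(categories.values())
# Each gene counted in k categories contributes k - 1 surplus memberships.
surplus = memberships - ASSIGNED

print(f"matched: {100 * MATCHED / UNIQUE:.1f}% of unique genes; Vitis-specific: {vitis_specific:,}")
print(f"assigned: {100 * ASSIGNED / MATCHED:.1f}% of matched genes")
print(f"category memberships: {memberships:,}; surplus over distinct genes: {surplus:,}")
```

A surplus of 1,918 memberships is only possible because a single gene can sit in more than one category, for example a transporter that also appears in a metabolic network.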

Table 1

List of Metabolic Pathways.

VVID	Network name	Genes	Proteins	Metabolites	VVID	Network name	Genes	Proteins	Metabolites
1.1	Carbohydrate Metabolism
10010	Glycolysis / Gluconeogenesis	192	28	28	10530	Aminosugars metabolism	90	9	11
10020	Citrate cycle (TCA cycle)	74	17	21	10520	Nucleotide sugars met.	60	16	18
10030	Pentose phosphate pathway	83	17	21	10620	Pyruvate metabolism	197	27	19
10040	Pentose/glucuron. interconv.	57	11	14	10630	Glyoxyl., dicarboxyl. met.	90	18	19
10051	Fructose and mannose met.	108	21	21	10640	Propanoate metabolism	73	9	12
10052	Galactose metabolism	155	17	27	10650	Butanoate metabolism	85	18	22
10053	Ascorbate and aldarate met.	40	9	9	10562	Inositol phosphate met.	131	18	20
10500	Starch and sucrose met.	337	43	34
1.2	Energy Metabolism
10190	Oxidative phosphorylation	343	101	7	10720	Red. Carb. cyc. (CO2 fix.)	41	10	14
10195	Photosynthesis	173	52		10680	Methane metabolism	130	9	11
10196	Photosynthesis - antenna prot.	27	11		10910	Nitrogen metabolism	112	22	19
10710	Carbon fixation	140	21	20	10920	Sulfur metabolism	46	12	12
1.3	Lipid Metabolism
10061	Fatty acid biosynthesis	76	13	36	10561	Glycerolipid met.	146	19	18
10062	Fatty acid elongation in mitoc.	25	7	29	10564	Glycerophospholipid met.	140	29	32
10071	Fatty acid metabolism	94	17	40	10565	Ether lipid metabolism	57	8	9
10072	Synth. / degr. of ketone bodies	18	3	4	10600	Sphingolipid metabolism	67	13	15
10100	Biosynthesis of steroids	142	47	74	10592	alpha-Linolenic acid met.	104	14	29
10140	C21-Steroid hormone met.	20	6	14	11040	Biosynth. unsat. fatty ac.	42	14	27
1.4	Nucleotide Metabolism
10230	Purine metabolism	151	48	62	10240	Pyrimidine metabolism	109	35	46
1.5	Amino Acid Metabolism
10251	Glutamate metabolism	93	28	25	10330	Arginine and proline met.	54	17	23
10252	Alanine and aspartate met.	109	23	24	10340	Histidine metabolism	70	16	19
10260	Gly, ser and thr met.	110	30	38	10350	Tyrosine metabolism	149	25	39
10271	Methionine metabolism	124	33	48	10360	Phenylalanine metabolism	212	15	14
10272	Cysteine metabolism	78	17	25	10380	Tryptophan metabolism	20	6	7
10280	Val, leu and Ile degr.	85	18	34	10400	Phe, tyr and trp biosynth.	144	30	35
10290	Val, leu and Ile biosynth.	60	13	26	10220	Urea cyc., met. amino grp	120	31	41
10300	Lysine biosynthesis	82	17	22
1.6	Met. of Other Amino Acids
10410	beta-Alanine met.	60	12	13	10460	Cyanoamino acid met.	35	8	16
10450	Selenoamino acid met.	69	15	17	10480	Glutathione met.	127	35	16
1.7	Glycan Biosynth. and Met.
10510	N-Glycan biosynthesis	50	19	21	10563	GPI-anchor biosynthesis	21	12	14
10511	N-Glycan degradation	67	8		10602	Glycosphingolip. biosynth.	15	7	16
10540	Lipopolysac. Biosynth.	12	10	13	11030	Glycan struct. biosynth. 1	88	26	49
10550	Peptidoglycan biosynth.	18	3	15
1.8	Met. of Cofactors and Vit.
10730	Thiamine metabolism	21	12	20	10780	Biotin metabolism	12	6	8
10740	Riboflavin metabolism	63	12	15	10790	Folate biosynthesis	39	18	24
10750	Vitamin B6 metabolism	23	7	13	10670	One carbon pool by folate	42	15	9
10760	Nicotinate, nicotinamide met	30	12	13	10860	Porph. and chloroph. met.	67	31	39
10770	Pantothenate, CoA biosynth.	44	15	19	10130	Ubiquinone biosynthesis	31	17	25
1.9	Biosynth. of Secondary Met.
10900	Terpenoid biosynthesis	182	18	24	10941	Flavonoid biosynthesis	183	25	52
10904	Diterpenoid biosynthesis	72	18	37	10942	Anthocyanin biosynthesis	59	8	18
10902	Monoterpenoid biosynth.	192	24	37	10943	Isoflavonoid biosynthesis	63	7	17
10908	Zeatin biosynthesis	52	10	20	10950	Alkaloid biosynthesis I	65	17	23
10906	Carotenoid biosynth.	40	19	33	10311	Penicillin/cephalosp. bioS.	14	4	5
10905	Brassinosteroid biosynth.	19	7	24	11002	Auxin biosynthesis	98	18	12
10940	Phenylpropanoid biosynth.	220	21	44	11012	IBA metabolism	14	11	5
1.10	Other
11000	Single reactions	162	15	38
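As a consistency check between Table 1 and the abstract, which reports 88 metabolic networks, the Python snippet below lists the VitisNet IDs (VVIDs) from the table by subcategory and counts them. The dictionary was transcribed by hand from the table above, so it is only as reliable as that transcription.

```python
# VVIDs transcribed from Table 1, grouped by metabolic subcategory.
TABLE1_VVIDS = {
    "1.1 Carbohydrate Metabolism": [10010, 10020, 10030, 10040, 10051, 10052, 10053, 10500,
                                    10530, 10520, 10620, 10630, 10640, 10650, 10562],
    "1.2 Energy Metabolism": [10190, 10195, 10196, 10710, 10720, 10680, 10910, 10920],
    "1.3 Lipid Metabolism": [10061, 10062, 10071, 10072, 10100, 10140,
                             10561, 10564, 10565, 10600, 10592, 11040],
    "1.4 Nucleotide Metabolism": [10230, 10240],
    "1.5 Amino Acid Metabolism": [10251, 10252, 10260, 10271, 10272, 10280, 10290, 10300,
                                  10330, 10340, 10350, 10360, 10380, 10400, 10220],
    "1.6 Met. of Other Amino Acids": [10410, 10450, 10460, 10480],
    "1.7 Glycan Biosynth. and Met.": [10510, 10511, 10540, 10550, 10563, 10602, 11030],
    "1.8 Met. of Cofactors and Vit.": [10730, 10740, 10750, 10760, 10770,
                                       10780, 10790, 10670, 10860, 10130],
    "1.9 Biosynth. of Secondary Met.": [10900, 10904, 10902, 10908, 10906, 10905, 10940,
                                        10941, 10942, 10943, 10950, 10311, 11002, 11012],
    "1.10 Other": [11000],
}

all_ids = [vvid for ids in TABLE1_VVIDS.values() for vvid in ids]
assert len(all_ids) == len(set(all_ids)), "a VVID appears twice"

for category, ids in TABLE1_VVIDS.items():
    print(f"{category}: {len(ids)}")
print("total metabolic networks:", len(all_ids))
```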