
VitisNet: "Omics" integration through grapevine molecular networks.

Jérôme Grimplet, Grant R Cramer, Julie A Dickerson, Kathy Mathiason, John Van Hemert, Anne Y Fennell.

Abstract

BACKGROUND: Genomic data release for the grapevine has increased exponentially in the last five years. The Vitis vinifera genome has been sequenced and Vitis EST, transcriptomic, proteomic, and metabolomic tools and data sets continue to be developed. The next critical challenge is to provide biological meaning to this tremendous amount of data by annotating genes and integrating them within their biological context. We have developed and validated a system of Grapevine Molecular Networks (VitisNet).
METHODOLOGY/PRINCIPAL FINDINGS: The sequences from the Vitis vinifera (cv. Pinot Noir PN40024) genome sequencing project and ESTs from the Vitis genus have been paired and the 39,424 resulting unique sequences have been manually annotated. Among these, 13,145 genes have been assigned to 219 networks. The pathway sets include 88 "Metabolic", 15 "Genetic Information Processing", 12 "Environmental Information Processing", 3 "Cellular Processes", 21 "Transport", and 80 "Transcription Factors". The quantitative data are loaded onto molecular networks, allowing the simultaneous visualization of changes in the transcriptome, proteome, and metabolome for a given experiment.
CONCLUSIONS/SIGNIFICANCE: VitisNet uses manually annotated networks in SBML or XML format, enabling the integration of large datasets, streamlining biological functional processing, and improving the understanding of dynamic processes in systems biology experiments. VitisNet is grounded in the Vitis vinifera genome (currently at 8x coverage) and can be readily updated with subsequent updates of the genome or biochemical discoveries. The molecular network files can be dynamically searched by pathway name or individual genes, proteins, or metabolites through the MetNet Pathway database and web-portal at http://metnet3.vrac.iastate.edu/. All VitisNet files including the manual annotation of the grape genome encompassing pathway names, individual genes, their genome identifier, and chromosome location can be accessed and downloaded from the VitisNet tab at http://vitis-dormancy.sdstate.org.


Year:  2009        PMID: 20027228      PMCID: PMC2791446          DOI: 10.1371/journal.pone.0008365

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

During the pre-genomics era, gene function was established through a reductionist approach [1] where organism physiology was understood by breaking components into pieces, studying them, and then putting them back together to see the larger picture. With the emergence of genome sequencing, organisms are now seen as complex interactive systems. Systems biology, adapted from the general system theory [2] and the living system theory [3], intends to explain biological phenomena utilizing a systemic view of the objects' relationships rather than their simple composition [4]. Integrative functional genomics combines the molecular components (transcripts, proteins, and metabolites) of an organism and incorporates them into functional networks or models designed to describe the dynamic activities of that organism. While many of the functions of individual parts are unknown or not well defined, their biological role can sometimes be inferred through association with other known parts, providing a better understanding of the biological system as a whole. On a system-wide scale the description requires three levels of information [5], [6]: (1) identification of the components (structural annotation) and characterization of their identity (functional annotation); (2) identification of molecules that interact with each component, which leads to the reconstruction of a biochemical reaction network; and (3) characterization of the behaviors of the transcripts, proteins, and metabolites under various conditions. Integration of the three levels of information into a coherent framework (or canvas) provides a powerful approach to tackle the difficult problem of extracting systems-wide behavior from the component interactions. The most developed examples of application of this approach can be found in prokaryotes, because of their small genomes [7], [8]. For example, in E. coli, 92% of the gene product functions have been experimentally verified. 
Genome-scale models (GEMs) have been used for metabolic engineering to systematically manipulate E. coli strains to overproduce lycopene, lactic acid, ethanol, succinate, amino acids, and many other products including hydrogen and vanillin. New biological discoveries of open reading frames (ORFs) can be made by focusing on the gaps in the unknown portions of the Omic maps, using the genomic responses of different genotypes under different conditions to determine the probable gene candidates that fill knowledge gaps. GEMs have been widely used to characterize and understand physiological responses to environmental conditions such as abiotic and biotic stresses. This has been particularly useful in the identification of resistance mechanisms that can be established in new strains. Such global analyses have become possible with the development of high-throughput genomics technologies in the fields of both nucleic acid sequencing and quantitative data acquisition. Over the last 20 years, expressed sequence tag (EST) sequencing [9] has been widely utilized for gene discovery and genome characterization. EST data are stored in comprehensive databases such as UniGene [10] or the DFCI Gene Indices [11]. Recently, cheaper and faster Next-Gen sequencing technologies have emerged such as 454 [12] or Illumina [13]. In parallel, methods have been developed for quantitative data acquisition: microarrays are used to quantitatively assess the transcriptome [14]. Two-dimensional gels have routinely been used for proteome studies [15]. Recently, however, gel-free technologies have emerged such as ICAT [16] or iTRAQ [17]. Metabolome studies are performed with a variety of tools such as gas chromatography or high performance liquid chromatography for separation and mass spectrometry and nuclear magnetic resonance for the identification and quantification of the metabolites [18].
Genomics resources for Vitis vinifera and related species have proliferated rapidly within the last several years, ranging from EST sequencing [19], [20], [21] to whole genome sequencing [22], [23] and integrated genetic maps [24]. These resources have permitted large-scale mRNA expression profiling during berry development using cDNA or oligonucleotide microarrays [25], [26]. A high-density Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array, containing approximately one-third of the expected gene content of the V. vinifera genome with some bias towards leaf and berry tissues, was developed, leading to numerous publications [27], [28], [29], [30], [31], [32], [33]. With the encouragement of the international grape community, the microarray data for several of these experiments have been centralized and can be accessed at PLEXdb (http://www.plexdb.org) [34]. Six additional microarray datasets using cDNA, oligo, or Affymetrix arrays are available through Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/sites/entrez?db=geo) and citations for publications are also linked to these public data sets [33], [35], [36]. Proteomics resources have also emerged recently. Most of these studies use 2-D gel analysis and focus either on berry metabolism [37], [38] or abiotic stress resistance [39], [40], [41] or both [42]. Recently, high-resolution techniques, such as iTRAQ, have also been applied to grape [43]. Metabolomics studies for grape are still rudimentary; however, several works have presented simultaneous analysis of about 50 to 120 compounds [28], [30], [42], [44]. To date, only two studies present transcriptomic, proteomic, and metabolomic analyses on the same material, one in berry tissues [29], [42] and the other on abiotic stress in shoots [40], [28].
Information from structural and functional genomics must be combined with detailed biochemical reaction networks to further our understanding of biological function and incorporate the knowledge into cultural practice. While a considerable amount of effort has been put into resolving the structural information (level 1) and “Omics” characterization of individual groups of transcripts, proteins or metabolites (level 3), relatively few biochemical reaction networks (level 2) have been constructed in grapevines or other plant systems. While pathway databases exist at the KEGG (http://www.genome.jp/kegg/pathway.html) or AraCyc [45], they are limited to metabolic pathways. In contrast, MetNet (http://metnet3.vrac.iastate.edu/) stores both metabolic and regulatory interactions for Arabidopsis and soybean [46]. In order to contextualize the molecular structure and a metric representing their behavior, we have developed a model of the molecular networks present in grapevines (VitisNet). This resource allows visualization of the dynamic interactions in the transcriptome, proteome, and metabolome within known molecular networks (for example, metabolic or signaling pathways). Integrating transcripts with protein and metabolite profiles in a comprehensive molecular map enables the researcher to elucidate different biochemical responses of grapevines to developmental and environmental cues.

Results and Discussion

A Set of 39,424 Unique Sequences Defined

The set of unique genes was not restricted to the Pinot Noir genome sequences, as an extensive amount of data has been produced on other V. vinifera cultivars and other Vitis species. The V. vinifera EST database contains only a very small fraction of Pinot Noir sequences (1.8% or 6,385/353,688), whereas Cabernet Sauvignon (half of the EST sequences), Chardonnay, Thompson Seedless, Muscat de Hambourg, and Perlette each have at least two times the number of Pinot Noir sequences. In addition, a significant number of ESTs have been produced for other Vitis species. It is expected that a significant number of transcript sequences are cultivar or species specific and may not be represented within the Pinot Noir PN40024 genome. A set of 39,424 unique sequences was defined after matching the genomic sequences with the transcripts (Figure 1). Only 36.4% of these sequences (14,330) were found in both the genomic sequences and the transcripts. Where both were available, the genomic sequence was retained in the set of unique sequences in preference to the transcript sequence, because the genomic sequence should represent the full-length gene, whereas there is less certainty for the transcript. In some cases, several supposedly unique transcript sequences matched a single gene, mainly because they matched different regions of the gene. A total of 652 unique sequences corresponded to previously published grapevine sequences (Table S1).
Figure 1

Overview of the unique set assembly and results of the annotation procedure.

Box sizes are relative to respective number of genes inside.

The set that was found only in the genomic sequences included 40.8% (16,104) of the unique sequences. This means that so far there is no proof that these sequences are actually transcribed. Finally, 22.8% (8,990) of the unique sequences were found only as transcripts. This set could include cultivar- or species-specific genes absent from the Pinot Noir genome or genes not yet extracted from the genome. However, because 73% (6,553) of these unique sequences were not homologous to sequences from other organisms, it is likely that most of them corresponded to short sequences or contained mostly UTR regions, so that a BLAST analysis could not be conducted against the genome sequences encoding their putative proteins. These sequences were of interest because many of them were placed on the highly popular Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array. There were 3,208 sequences amongst the 11,734 non-redundant sequences in the Affymetrix chip that did not present a match in the genome.
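The partition of the unique set described above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the variable and function names are illustrative and are not part of VitisNet. The recomputed shares agree with the reported percentages to within about 0.1 percentage point.

```python
# Counts quoted in the text for the 39,424 unique sequences.
TOTAL_UNIQUE = 39_424
subsets = {
    "genome and transcripts": 14_330,
    "genome only": 16_104,
    "transcripts only": 8_990,
}

def percentages(counts, total):
    """Return each count as a percentage of the total."""
    return {name: 100.0 * n / total for name, n in counts.items()}

shares = percentages(subsets, TOTAL_UNIQUE)

# The three subsets should account for every unique sequence.
partition_is_complete = sum(subsets.values()) == TOTAL_UNIQUE

# Share of transcript-only sequences without a homolog in other organisms.
no_homolog_share = 100.0 * 6_553 / subsets["transcripts only"]

for name, pct in shares.items():
    print(f"{name}: {subsets[name]:,} sequences ({pct:.2f}%)")
print(f"partition complete: {partition_is_complete}")
print(f"transcript-only without homolog: {no_homolog_share:.1f}%")
```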

Half of the Matched Sequences Were Assigned to Molecular Networks

Seventy percent (27,680) of the unique genes matched a previously described Vitis cDNA or protein or a sequence from another organism. This proportion rose to 83% when only genes from the genome sequences were considered. The remaining 11,744 sequences were Vitis-specific and a function could not be assigned. The matched gene set was divided into two groups: a group that could not be assigned to molecular networks and a group that could be assigned. The group that was not assigned to molecular networks consisted of 14,535 genes (52.5%) that covered a wide range of functional descriptions. At one extreme, 1,817 sequences presented a completely unknown function. At the other extreme, an identifier was attributed to 1,578 unmapped sequences. An identifier was assigned because an EC or KO number could be attributed to these sequences or an Arabidopsis homolog had an identifier; however, they could not be placed on the networks. Between these two extremes, the description of the function ranged from a poorly described domain, through a general enzymatic activity, to a well-documented gene. The second subset of the matched genes (13,145 sequences, 47.5%), which were homologous to proteins with a known function, was assigned to the molecular networks. The 13,145 genes present in the networks were classified into 6 main overlapping categories (Tables 1-6): Metabolism (5,442 sequences), Genetic Information Processing (1,249 sequences), Environmental Information Processing (1,305 sequences), Cellular Processes (1,121 sequences), Transport (3,523 sequences), and Transcription Factors (2,423 sequences). The complete annotation of the genes and relevant information for each is presented in Table S1. The references used for annotating genes and for developing pathways not found in KEGG are presented in Text S1.
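Because the six categories overlap, their counts add up to more than the 13,145 genes placed in networks. The following sketch, which uses only the numbers quoted above (the names are illustrative), makes that bookkeeping explicit.

```python
# Counts quoted in the text.
UNIQUE_SEQUENCES = 39_424
MATCHED = 27_680          # matched a known Vitis or other-organism sequence
UNMATCHED = 11_744        # Vitis-specific, no function assigned
NOT_IN_NETWORKS = 14_535  # matched but not placed on a network
IN_NETWORKS = 13_145      # matched and placed on a network

CATEGORY_COUNTS = {
    "Metabolism": 5_442,
    "Genetic Information Processing": 1_249,
    "Environmental Information Processing": 1_305,
    "Cellular Processes": 1_121,
    "Transport": 3_523,
    "Transcription Factors": 2_423,
}

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

matched_share = share(MATCHED, UNIQUE_SEQUENCES)
unassigned_share = share(NOT_IN_NETWORKS, MATCHED)
assigned_share = share(IN_NETWORKS, MATCHED)

# Summing the categories counts a gene once per category it belongs to,
# so the excess over IN_NETWORKS is the number of additional memberships.
category_total = sum(CATEGORY_COUNTS.values())
extra_memberships = category_total - IN_NETWORKS

print(matched_share, unassigned_share, assigned_share)
print(category_total, extra_memberships)
```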
Table 1

List of Metabolic Pathways.

VVIDNetwork Name#gen#pro#metVVIDNetwork Name#gen#pro#met
1.1 Carbohydrate Metabolism
10010Glycolysis / Gluconeogenesis192282810530Aminosugars metabolism90911
10020Citrate cycle (TCA cycle)74172110520Nucleotide sugars met.601618
10030Pentose phosphate pathway83172110620Pyruvate metabolism1972719
10040Pentose/glucuron. interconv.57111410630Glyoxyl., dicarboxyl. met.901819
10051Fructose and mannose met.108212110640Propanoate metabolism73912
10052Galactose metabolism155172710650Butanoate metabolism851822
10053Ascorbate and aldarate met.409910562Inositol phosphate met.1311820
10500Starch and sucrose met.3374334
1.2 Energy Metabolism
10190Oxidative phosphorylation343101710720Red. Carb. cyc. (CO2 fix.)411014
10195Photosynthesis1735210680Methane metabolism130911
10196Photosynthesis - antenna prot.271110910Nitrogen metabolism1122219
10710Carbon fixation140212010920Sulfur metabolism461212
1.3 Lipid Metabolism
10061Fatty acid biosynthesis76133610561Glycerolipid met.1461918
10062Fatty acid elongation in mitoc.2572910564Glycerophospholipid met.1402932
10071Fatty acid metabolism94174010565Ether lipid metabolism5789
10072Synth. / degr. of ketone bodies183410600Sphingolipid metabolism671315
10100Biosynthesis of steroids142477410592alpha-Linolenic acid met.1041429
10140C21-Steroid hormone met.2061411040Biosynth. unsat. fatty ac.421427
1.4 Nucleotide Metabolism
10230Purine metabolism151486210240Pyrimidine metabolism1093546
1.5 Amino Acid Metabolism
10251Glutamate metabolism93282510330Arginine and proline met.541723
10252Alanine and aspartate met.109232410340Histidine metabolism701619
10260Gly, ser and thr met.110303810350Tyrosine metabolism1492539
10271Methionine metabolism124334810360Phenylalanine metabolism2121514
10272Cysteine metabolism78172510380Tryptophan metabolism2067
10280Val, leu and Ile degr.85183410400Phe, tyr and try biosynth.1443035
10290Val, leu and Ile biosynth.60132610220Urea cyc., met. amino grp1203141
10300Lysine biosynthesis821722
1.6 Met. of Other Amino Acids
10410beta-Alanine met.60121310460Cyanoamino acid met.35816
10450Selenoamino acid met.69151710480Glutathione met.1273516
1.7 Glycan Biosynth. And Met.
10510N-Glycan biosynthesis50192110563GPI-anchor biosynthesis211214
10511N-Glycan degradation67810602Glycosphingolip. biosynth.15716
10540Lipopolysac. Biosynth.12101311030Glycan struct. biosynth. 1882649
10550Peptidoglycan biosynth.18315
1.8 Met. of Cofactors and Vit.
10730Thiamine metabolism21122010780Biotin metabolism1268
10740Riboflavin metabolism63121510790Folate biosynthesis391824
10750Vitamin B6 metabolism2371310670One carbon pool by folate42159
10760Nicotinate, nicotinamide met30121310860Porph. and chloroph. met.673139
10770Pantothenate, CoA biosynth.44151910130Ubiquinone biosynthesis311725
1.9 Biosynth. of Secondary Met.
10900Terpenoid biosynthesis182182410941Flavonoid biosynthesis1832552
10904Diterpenoid biosynthesis72183710942Anthocyanin biosynthesis59818
10902Monoterpenoid biosynth.192243710943Isoflavonoid biosynthesis63717
10908Zeatin biosynthesis52102010950Alkaloid biosynthesis I651723
10906Carotenoid biosynth.40193310311Penicillin/cephalosp. bioS.1445
10905Brassinosteroid biosynth.1972411002Auxin biosynthesis981812
10940Phenylpropanoid biosynth.220214411012IBA metabolism14115
1.10 Other
11000Single reactions1621538

VVID: VitisNet identification number; #gen: number of genes in network; #pro: number of proteins in network; #met: number of metabolites in network.

Table 2

List of Genetic Information Processing Networks.

VVID | Network Name | #gen | #pro | #met
2.1 Transcription
23020 | RNA polymerase | 85 | 32 | -
23022 | Basal transcription factors | 55 | 20 | -
2.2 Translation
23010 | Ribosome | 473 | 147 | -
20970 | Aminoacyl-tRNA biosynthesis | 128 | 22 | 64
2.3 Folding, Sorting Degr.
23060 | Protein export | 36 | 16 | -
24130 | SNARE int. in ves. transport | 63 | 22 | -
23050 | Proteasome | 58 | 48 | -
24120 | Ubiquitin mediated proteolysis | 158 | 65 | 4
24140 | Regulation of autophagy | 48 | 15 | 5
2.4 Replication and Repair
23030 | DNA replication | 60 | 38 | -
23410 | Base excision repair | 31 | 21 | -
23420 | Nucleotide excision repair | 53 | 36 | -
23430 | Mismatch repair | 37 | 19 | -
23440 | Homologous recombination | 39 | 19 | -
23450 | Non-homologous end-joining | 14 | 8 | -

VVID: VitisNet identification number; #gen: number of genes in network; #pro: number of proteins in network; #met: number of metabolites in network.

Table 3

List of Environmental Information Processing Networks.

VVIDNetwork Name#gen#pro#metVVIDNetwork Name#gen#pro#met
3.1 Signal Transduction
34020Calcium signaling142272234150mTOR signaling2817
34070Phosphatidylinositol sign. syst.981317
3.2 Hormone Signaling
30001ABA signaling102561130008Ethylene signaling2481013
30003Auxin signaling262103230010Gibberellin signaling31141
30005Brassinosteroids signaling3013230011Jasmonate signaling86364
30007Cytokinin signaling70422
3.3 Plant-Specific Signaling
34710Circadian rhythm944830009Flower development185

VVID: VitisNet identification number; #gen: number of genes in network; #pro: number of proteins in network; #met: number of metabolites in network.

Table 4

List of Cellular Processes Networks.

VVID | Network Name | #gen | #pro | #met
4.1 Cell Motility
44810 | Regulation of actin cytoskeleton | 360 | 114 | 1
4.2 Cell Growth and Death
44110 | Cell cycle | 315 | 192 | -
4.3 Cell Wall
40006 | Cell wall | 448 | 53 | 11

VVID: VitisNet identification number; #gen: number of genes in network; #pro: number of proteins in network; #met: number of metabolites in network.

Table 5

List of Transport Networks.

VVID | Network Name | #gen | #pro | #met
5.1 Membrane Transport
52010 | ABC transporters | 283 | 87 | -
5.2 Hormone Transport
50004 | Auxin transport | 57 | 23 | 2
5.3 Transport System
50110 | Protein coat | 157 | 83 | -
50111 | tethering factors | 100 | 65 | -
50112 | Nuclear pore complex | 72 | 26 | -
50113 | Thylakoid targeting pathway | 62 | 15 | -
5.4 Transporter Catalog
50101 | Channels and pores | 391 | 131 | -
50104 | Group translocators | 39 | 4 | -
50105 | Transport electron carriers | 89 | 38 | -
50108 | Accessory fact. Inv. in transp. | 173 | 11 | -
50109 | Incomp. charact. transp. syst. | 332 | 101 | -
50121 | Porters categories 1 to 6 | 187 | 85 | -
50122 | Porters categories 7 to 17 | 242 | 49 | -
50123 | Porters categories 18 to 29 | 204 | 46 | -
50124 | Porters categories 30 to 64 | 155 | 69 | -
50125 | Porters categories 66 to 94 | 215 | 50 | -
50131 | Prim. active transp. cat. A2-A4 | 200 | 44 | -
50132 | Prim. active transp. cat. A5-A8 | 184 | 69 | -
50133 | Prim. act. transp. cat. A9-A18 | 191 | 71 | -
50134 | Primary. active transp. Cat. D1 | 164 | 39 | -
50135 | Prim. active transp. Cat. D3-E2 | 125 | 43 | -

VVID: VitisNet identification number; #gen: number of genes in network; #pro: number of proteins in network; #met: number of metabolites in network.

Table 6

List of Transcription Factors Networks.

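VVID | Network Name | #gen | #pro
60001 | ABI3VP1 | 26 | 26
60002 | Alfin | 8 | 8
60003 | AP2 EREBP | 139 | 139
60004 | ARF | 27 | 27
60005 | ARID | 11 | 11
60006 | ARR-B | 15 | 15
60007 | AS2 | 42 | 42
60008 | AUXIAA | 28 | 28
60009 | BBR | 5 | 5
60010 | BES1 | 7 | 7
60011 | BHLH | 146 | 146
60012 | BZIP | 66 | 66
60013 | BHSH | 1 | 1
60014 | C2C2-CO | 15 | 15
60015 | C2C2-DOF | 26 | 26
60016 | C2C2-GATA | 20 | 20
60017 | C2H2 | 117 | 116
60018 | C3H | 79 | 79
60019 | C2C2-YABBY | 7 | 7
60020 | CAMTA | 6 | 6
60021 | CCAAT | 30 | 30
60022 | CPP | 7 | 7
60023 | CSD | 3 | 3
60024 | DBP | 3 | 3
60025 | DDT | 8 | 8
60026 | E2F-DP | 9 | 9
60027 | EIL | 4 | 4
60028 | FHA | 19 | 19
60029 | G2-like | 39 | 39
60030 | GeBP | 7 | 7
60031 | GIF | 4 | 4
60032 | GRAS | 53 | 53
60033 | GRF | 14 | 14
60034 | HB | 93 | 93
60035 | HMG | 16 | 16
60036 | HRT | 1 | 1
60037 | HSF | 23 | 23
60038 | Jumonji | 27 | 27
60039 | LFY | 1 | 1
60040 | LIM | 15 | 15
60041 | LUG | 7 | 7
60042 | MADS | 71 | 71
60043 | MBF1 | 5 | 5
60044 | MYB | 176 | 176
60045 | MYB rel. | 59 | 59
60046 | NAC | 86 | 86
60047 | PBF-2like | 2 | 2
60048 | PHD | 71 | 71
60049 | PLATZ | 11 | 11
60050 | PsARR-B | 8 | 8
60051 | RB | 2 | 2
60052 | RWP-RK | 10 | 10
60053 | S1Fa-like | 3 | 3
60054 | SAP | 1 | 1
60055 | SBP | 22 | 22
60056 | SET PCG | 52 | 52
60057 | Sigma70-like | 8 | 8
60058 | SNF2 | 44 | 44
60059 | SRS | 5 | 5
60060 | TAZ | 7 | 7
60061 | TCP | 20 | 20
60062 | Trihelix | 36 | 36
60063 | TUB | 17 | 17
60064 | ULT | 1 | 1
60065 | VOZ | 2 | 2
60066 | WRKY | 69 | 69
60067 | zf-MYND | 4 | 4
60068 | zf-HD | 15 | 15
60069 | ZIM | 14 | 14
60070 | Orph_CCT | 9 | 9
60071 | Orph_FAR-RED | 53 | 53
60072 | Orph_Resp_reg | 14 | 14
60073 | Orph_zf-b_box | 14 | 14
60074 | Orph_zf-SWIM | 9 | 9
60075 | Other BSD | 8 | 8
60076 | Other GTF | 7 | 7
60077 | Other zf-AN1 | 14 | 14
60078 | Other zf-C3HC4 | 244 | 244
60079 | Other zf-DHHC | 24 | 24
60080 | Other zf | 32 | 32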

VVID: VitisNet identification number; #gen: number of genes in network; #pro: number of proteins in network.


Construction of 219 Networks

The networks were constructed with the CellDesigner software. This software has the benefit of being able to save the networks in the SBML (Systems Biology Markup Language) format. This format is highly portable into a variety of software packages, including Cytoscape, which was used here for data visualization of molecular expression. The networks were constructed with four main families of nodes (genes, transcripts, proteins, and metabolites) represented by specific shapes and colors in CellDesigner (Figure 2) and by shape only in Cytoscape (Figure 3; color was used to visualize abundance). In VitisNet, some extra node styles can be used in the networks for additional categories (phenotypes, phylogenetic tree nodes, etc.). Edge styles represented different types of reactions, and they were specified by shape in CellDesigner and color in Cytoscape; Text S2 has a legend that summarizes the node and edge styles used in VitisNet in Cytoscape. Five-digit IDs were assigned to the networks (Table S2). The first digit refers to the network category (metabolic pathway, etc.), and the last four digits refer to the KEGG pathway number (if it existed in KEGG).
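The five-digit ID scheme can be illustrated with a small parser. This is a minimal sketch, assuming the category digits 1 to 6 follow the numbering of Tables 1-6; the function and type names are hypothetical and are not part of any VitisNet tool.

```python
from typing import NamedTuple

# Category digit -> category name, following the numbering of Tables 1-6.
CATEGORIES = {
    "1": "Metabolic",
    "2": "Genetic Information Processing",
    "3": "Environmental Information Processing",
    "4": "Cellular Processes",
    "5": "Transport",
    "6": "Transcription Factors",
}

class NetworkID(NamedTuple):
    vvid: str            # full five-digit VitisNet ID
    category: str        # category named by the first digit
    pathway_number: str  # last four digits (KEGG number where one exists)

def parse_vvid(vvid: str) -> NetworkID:
    """Split a five-digit VitisNet ID into category and pathway number."""
    if len(vvid) != 5 or not (vvid.isascii() and vvid.isdigit()):
        raise ValueError(f"expected a five-digit ID, got {vvid!r}")
    category = CATEGORIES.get(vvid[0])
    if category is None:
        raise ValueError(f"unknown category digit in {vvid!r}")
    return NetworkID(vvid, category, vvid[1:])

# 10020 is listed in Table 1 as the citrate cycle (TCA cycle) network.
tca = parse_vvid("10020")
print(tca.category, tca.pathway_number)
```

IDs are handled as strings rather than integers so that leading zeros in the four-digit pathway number are preserved.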
Figure 2

Citrate cycle pathway visualized using CellDesigner.

Symbols represent different molecules or reactions, i.e. blue rectangle: gene; green parallelogram: transcript; orange round rectangle: protein; and yellow ellipse: metabolite. Edges with a circle at the tip: catalysis (A). Edges with Delta at the tip: metabolic reaction (B). Edges with dash-dot-dot-dash: transcription (C). Edges with dash-dot-dash: translation (D). Insert box at the upper right represents a zoom-in of an area of the network showing the different molecule types.

Figure 3

Flavonoid biosynthesis pathway and tissue-specific molecule abundance visualized using Cytoscape.

Parallelogram: transcript; rectangle: protein; ellipse: metabolite; triangle: reaction node. Blue edges with circle at the tip: catalysis. Black edges with Delta at the tip: metabolic reaction. Turquoise edges: translation. Transcript node in bold: existence of an Affymetrix (Vitis Vinifera (Grape) Genome Array) probeset. Red: over abundant in seed; magenta: over abundant in skin; green: over abundant in pulp; orange: over abundant in seed and skin. Insert box at the upper right represents a zoom-in of an area of the network showing the different molecule types.


Metabolic pathways (1)

Metabolic pathways are the most common type of pathway available for plants in online databases such as KEGG or PlantCyc (http://www.plantcyc.org/). These networks (Table 1) represented metabolic reactions known to occur in grapevines. The software package KEGG2SBML made it easy to import the metabolic pathways from KEGG. However, the imported KEGG pathways were limited: they showed only the metabolites and proteins involved in reactions, and they included reactions that may not occur in plants. Therefore, symbols representing the missing grape genes and transcripts were added to the VitisNet networks. KEGG reactions for which no putative grape protein was identified, and for which no evidence of occurrence in plants could be found in the literature, were removed. Finally, reactions in grapevines that were absent in KEGG were manually added to the networks. The total number of items in the 88 grape metabolic pathways constructed included 7,854 genes and transcripts, 1,631 proteins, and 1,998 metabolites. Some of these items were present in more than one network.
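Since the networks are stored as SBML, their nodes and reactions can be listed with any XML parser. The sketch below uses Python's standard library on a tiny hand-written fragment; the fragment, its IDs, and the function names are assumptions made for illustration. It is not a validated SBML document and does not reproduce the layout of the actual VitisNet or CellDesigner files, which carry additional annotations.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified SBML-like fragment (one reaction only).
SBML_EXAMPLE = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="example_network">
    <listOfSpecies>
      <species id="s1" name="oxaloacetate"/>
      <species id="s2" name="acetyl-CoA"/>
      <species id="s3" name="citrate"/>
      <species id="s4" name="citrate synthase"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants>
          <speciesReference species="s1"/>
          <speciesReference species="s2"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="s3"/>
        </listOfProducts>
        <listOfModifiers>
          <modifierSpeciesReference species="s4"/>
        </listOfModifiers>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

ROLE_LISTS = {
    "listOfReactants": "reactants",
    "listOfProducts": "products",
    "listOfModifiers": "modifiers",
}

def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def list_species(sbml_text):
    """Map species id -> name for every <species> element."""
    root = ET.fromstring(sbml_text)
    return {el.get("id"): el.get("name")
            for el in root.iter() if local_name(el.tag) == "species"}

def list_reactions(sbml_text):
    """Map reaction id -> species ids grouped by role."""
    root = ET.fromstring(sbml_text)
    reactions = {}
    for rxn in root.iter():
        if local_name(rxn.tag) != "reaction":
            continue
        roles = {role: [] for role in ROLE_LISTS.values()}
        for child in rxn:
            role = ROLE_LISTS.get(local_name(child.tag))
            if role is not None:
                roles[role] = [ref.get("species") for ref in child]
        reactions[rxn.get("id")] = roles
    return reactions

species = list_species(SBML_EXAMPLE)
reactions = list_reactions(SBML_EXAMPLE)
print(species)
print(reactions)
```

Matching on the local tag name rather than a fixed namespace keeps the sketch independent of the SBML level and version declared in a given file.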

Genetic information processing (2)

The category “Genetic Information Processing” (Table 2) corresponds to housekeeping mechanisms that are present and highly conserved in all eukaryotes. These networks were present on the KEGG website but in a different format than the metabolic networks; therefore exportation with KEGG2SBML was not possible. These networks were represented by a picture of a specific modus operandi, with every involved protein listed at the side rather than in a diagram of the enzymatic reactions. In VitisNet, we have tried to represent these pictures interactively. Where this was not possible, the networks were presented as lists of genes, transcripts, and proteins. The total number of items in the 15 “Genetic Information Processing” networks included 1,338 genes and transcripts, 527 proteins, and 71 metabolites.

Environmental information processing (3)

The category “Environmental Information Processing” (Table 3) represents signal processes that occur in the grapevine. The networks belonging to “Signal Transduction” are highly variable amongst species but they are well documented for Arabidopsis in KEGG and were constructed using the Arabidopsis data. The networks for hormone signaling and plant-specific signaling were reconstructed from the literature. To the best of our knowledge, these networks could not be found in any other pathway databases. These networks are particularly valuable for the plant community since hormonal signaling is an important subject in many plant physiology studies. The total number of items in the 12 “Environmental Information Processing” networks included 1,373 genes and transcripts, 563 proteins, and 63 metabolites.

Cellular processes (4)

The networks in the “Cellular Processes” category (Table 4) were named after the KEGG pathways; however, the KEGG pathways were not related to the molecular events occurring in plants. Although a small portion of the pathways were derived from KEGG, most components of the networks were constructed from information collected from the literature. The total number of items in the 3 “Cellular Processes” networks included 1,123 genes and transcripts, 359 proteins, and 12 metabolites.

Transport (5)

The networks for Hormone Transport (5.2) and Transport Systems were constructed from the literature (Table 5). The networks in “Transporters Catalog” present the classification of the putative grape transporters according to the transporter classification (TC) system. This classification was formally adopted by the International Union of Biochemistry and Molecular Biology (IUBMB) in June 2001 and is the international standard for the classification of transporters. In VitisNet, molecules designating a transporter were linked to their corresponding category. The total number of items in the 21 “Transport” networks included 3,622 genes and transcripts, 1,149 proteins, and 1 metabolite.

Transcription factors (6)

These networks presented the classification of the putative grape transcription factors (Table 6). The classification used here was a customized version of two plant transcription factor databases and contained a total of 80 families. The PlantTFDB [47] contained 64 families and the PlnTFDB [48] contained 68 families. Most of the families (58) were present in the two databases, although two families were exclusive to PlantTFDB and eight were exclusive to PlnTFDB. In addition, 12 families were exclusive to the grapevine transcription factors. Representatives of five of these families were present in the PlnTFDB under the family named “orphans” and we chose to break this group into distinct families. The seven other families identified were proteins that contain a domain found in BTF2-like transcription factors, Synapse associated proteins and DOS2-like proteins (BSD, [49]), the Global Transcription Factor group (GTF), and subfamilies of zinc finger proteins. The transcription factor families were presented as a phylogenetic tree, which allowed subfamilies to be grouped together. The total number of items in the 80 “Transcription Factors” networks included 2,423 genes, transcripts, and proteins.

Omics Data Can Be Visualized on the Networks

Annotation of the genes and construction of VitisNet have filled a major gap in precise descriptive and quantitative tools for grapevine systems biology. The next challenge is the integration of the data. The molecular networks were built to allow simultaneous visualization of transcripts, proteins, and metabolites. Their respective abundance under various conditions can be visualized through the Cytoscape software. Several methods exist to correlate and integrate transcript, protein, and metabolite profiles. For example, molecular abundance profiles have been linked with Pearson [50], [51] and Spearman [52] correlation coefficients, the BL-SOM method [53], [54], and the O2PLS method [55]. The O2PLS method enables the determination of the effect of each variable, in a multivariable experiment, on the co-expression of molecules. More recently, the O2PLS method has been developed further to integrate all three molecular profiles (transcripts, proteins, and metabolites) [56]. In most of these statistical studies, data were visualized by representing molecules as nodes and correlations as edges. Subsequently, selected pathways were drawn manually for the biological phenomena highlighted by the correlations of molecular abundance. In the visualization of “omics” data in VitisNet, edges represented biological processes and nodes represented molecules, as in classical presentations of pathways. Molecular abundance was represented by color changes of the nodes, and biological phenomena could be visualized automatically. As an illustration of the methodology used in VitisNet to visualize “omics” data, datasets from a study of the differential transcript, protein, and metabolite abundance measured in three berry tissues [29], [42] were uploaded into the molecular maps.
For consistency, proteins and metabolites [42] were clustered with the same methods used for clustering the transcripts [29], and the same color scheme was used (green = molecules over-abundant in pulp, purple = molecules over-abundant in skin, and orange = molecules over-abundant in seed [29]). The flavonoid biosynthesis pathway (Figure 3) presented here was more complex than previous representations of the pathway in [29] and [42]. Here it was further customized from the total flavonoid biosynthesis pathway in VitisNet by removing the gene nodes for easier visualization. As these studies have illustrated, molecules involved in the flavonoid biosynthesis pathway are slightly more abundant in skin than in seed and clearly more abundant in both skin and seed than in pulp. Transcriptomic results from the Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array were used here, but data from any microarray platform can be uploaded onto the networks. For example, Table S1 contains data for mapping the cDNA array used in a grape bud chilling requirement fulfillment study [35]. The integration of the berry tissue “omics” data on all the pathways was divided into higher level pathway categories; the Cytoscape session files, molecular networks, and a tutorial (Text S2) can be accessed and downloaded at the VitisNet tab at vitis-dormancy.sdstate.org. All molecular network files are also available for browsing or downloading at MetNet (http://metnet3.vrac.iastate.edu/).
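The tissue color scheme described above can be expressed as a small lookup. The following is a minimal sketch, not the authors' Cytoscape procedure (which is documented in Text S2): the node identifiers and abundance values are hypothetical, and assigning each molecule to the single tissue with the highest abundance is a simplifying assumption that stands in for the 12-cluster scheme of [29].

```python
# Colors taken from the scheme described in the text.
TISSUE_COLORS = {"pulp": "green", "skin": "purple", "seed": "orange"}


def dominant_tissue(abundance):
    """Return the tissue with the highest abundance (simplifying assumption)."""
    return max(abundance, key=abundance.get)


def node_colors(profiles):
    """Map each node ID to the color of its most abundant tissue."""
    return {
        node: TISSUE_COLORS[dominant_tissue(abundance)]
        for node, abundance in profiles.items()
    }


# Hypothetical node IDs and abundance values, for illustration only.
profiles = {
    "protein_A": {"pulp": 0.2, "skin": 1.8, "seed": 1.1},
    "metabolite_B": {"pulp": 0.1, "skin": 0.9, "seed": 1.4},
    "transcript_C": {"pulp": 2.0, "skin": 0.5, "seed": 0.3},
}

for node, color in node_colors(profiles).items():
    print(node, color)
```

A table of this kind (node identifier plus a categorical value) is the general shape of data that a network viewer can use to style nodes, whatever rule is used to assign the category.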

Conclusion

An exhaustive coverage of the network of grapevine molecules has been developed. It presents an easy, fast, and comprehensive method for simultaneous integration and visualization of “omics” data. These molecular networks provide biological value for both grapevine researchers and the rest of the plant science community. The following attributes are provided: (i) original plant-specific pathways within VitisNet, (ii) the possibility to create a mapping file of genes from other plants, and (iii) the ability to customize the schematics for new or species-specific reactions. In the future, in cooperation with the scientific community's curation of gene annotations, we plan to release new networks and update existing networks with emerging data (e.g., miRNA) at MetNet (http://metnet3.vrac.iastate.edu/) and VitisNet (http://vitis-dormancy.sdstate.org/pathways.cfm).

Materials and Methods

Definition of a Unique Set of Genes

The 30,434 DNA sequences encoding putative proteins from the Vitis vinifera (cv. Pinot Noir PN40024) genome [23] were matched to EST sequences from Vitis vinifera and other Vitis species. The V. vinifera sequences originated from the 5.0 release of the DFCI grape index (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=grape), which contained 34,134 unique sequences. The set of non-vinifera sequences contained a total of 26,589 redundant ESTs obtained from the NCBI website. This set included sequences from the following species: V. shuttleworthii (10,704 sequences), hybrid cultivars (6,542 sequences), V. arizonica x rupestris (5,421 sequences), V. aestivalis (2,101 sequences), and V. riparia (1,821 sequences). A BLAST analysis of the sequences from the V. vinifera EST set and the non-vinifera EST set (Megablast, p > 95, e-value<1e-15) was conducted against the genomic sequences. Sequences not identified in the genome were added to the genomic sequences to constitute the unique sequences set. The 1,395 mRNAs corresponding to grapevine protein sequences registered in UniProt and not belonging to one of the two genome sequencing projects were manually retrieved and BLAST analyzed (blastn, e-value<1e-15) against the unique sequences set.
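The construction of the unique set can be sketched as a filter over BLAST results. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the hits are available in the standard 12-column tabular BLAST output, it interprets "p > 95" as percent identity above 95 (the text does not define "p"), and all sequence identifiers are hypothetical.

```python
import csv
import io

MIN_IDENTITY = 95.0  # assumed meaning of "p > 95": percent identity
MAX_EVALUE = 1e-15   # e-value threshold quoted in the text


def matched_queries(tabular_text):
    """Return the query IDs with at least one hit passing both thresholds.

    Expects tab-separated columns: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    matched = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, identity, evalue = row[0], float(row[2]), float(row[10])
        if identity > MIN_IDENTITY and evalue < MAX_EVALUE:
            matched.add(query)
    return matched


def build_unique_set(genome_ids, est_ids, tabular_text):
    """Genome sequences plus every EST that was not identified in the genome."""
    matched = matched_queries(tabular_text)
    unmatched = [est for est in est_ids if est not in matched]
    return list(genome_ids) + unmatched


# Hypothetical identifiers and hits, for illustration only.
hits = "\n".join([
    "EST1\tGENE1\t99.1\t500\t4\t0\t1\t500\t1\t500\t1e-120\t900",
    "EST2\tGENE2\t88.0\t300\t36\t0\t1\t300\t1\t300\t1e-40\t300",
    "EST3\tGENE1\t97.0\t60\t2\t0\t1\t60\t1\t60\t1e-8\t80",
])
unique = build_unique_set(["GENE1", "GENE2"], ["EST1", "EST2", "EST3", "EST4"], hits)
print(unique)
```

In this toy input only EST1 passes both thresholds, so EST2 (identity too low), EST3 (e-value too high), and EST4 (no hit) are appended to the genomic sequences.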

Gene Annotation

During the first steps of annotation, a batch BLAST analysis (blastx, e-value<1e-10) of unique sequences was conducted against several relevant databases, including the Arabidopsis and rice genomes and the Viridiplantae protein sequences in NCBI. For each gene, the ten best significant matches in each database were retained and reviewed to define the most likely annotation. Particular attention was paid to using identical nomenclature for genes with the same function. A BLAST analysis of the genes that had at least one significant match containing a putative function was conducted against the KEGG database (http://www.genome.jp/kegg/) to define an enzyme commission (EC) number or a KEGG Orthology (KO) number. For genes not identified in this screen, the EC number of genes suspected to encode a protein with enzymatic function was identified by browsing enzyme nomenclature databases (such as ExPASy (http://www.expasy.org/enzyme/) or BRENDA (http://www.brenda-enzymes.org/)). A BLAST analysis (blastx, e-value<1e-10) of the unique set was conducted against the Transport Classification Database (TCDB) (http://www.tcdb.org/), and the genes matching sequences from that database were again manually reviewed and assigned to a category from the Transport Classification System [57]. A BLAST analysis (blastx, e-value<1e-10) of the unique set was conducted against two plant transcription factor databases, PlantTFDB (http://planttfdb.cbi.pku.edu.cn/) [47] and PlnTFDB (http://plntfdb.bio.uni-potsdam.de/v2.0/) [48]. InterPro domains obtained for the grape sequences from the UniProt website were also used for the classification of transcription factors. The transcription factors were then grouped into families. Where molecular interactions were identified in the literature, the gene function was browsed to identify the Vitis gene potentially involved.
The genes described in the literature were validated by BLAST against the unique set of Vitis sequences to correctly identify any potential homolog that was previously mislabeled. A short identifier was defined for genes that were present on the networks but did not have a previously defined EC number or a KO. For most of these, that identifier corresponded to the one commonly used for their Arabidopsis homolog in their Entrez webpage (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene). For genes without an Arabidopsis homolog with a clear identifier, a unique identifier was created that was consistent with the gene function.
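The first annotation step, retaining the ten best significant matches per gene in each database, can be sketched as follows. This is a minimal illustration that assumes the hits have already been parsed into tuples; the gene, database, and subject names are hypothetical, and the manual review that followed is not modeled.

```python
from collections import defaultdict

MAX_EVALUE = 1e-10  # blastx threshold quoted in the text
TOP_N = 10          # number of best matches kept per gene and database


def best_hits(hits, max_evalue=MAX_EVALUE, top_n=TOP_N):
    """Group significant hits by (gene, database) and keep the best top_n.

    hits: iterable of (gene, database, subject, evalue) tuples.
    Returns {(gene, database): [(subject, evalue), ...]} sorted by e-value.
    """
    grouped = defaultdict(list)
    for gene, database, subject, evalue in hits:
        if evalue < max_evalue:
            grouped[(gene, database)].append((subject, evalue))
    return {
        key: sorted(values, key=lambda hit: hit[1])[:top_n]
        for key, values in grouped.items()
    }


# Hypothetical hits: 12 significant Arabidopsis matches and 2 rice matches.
hits = [
    ("GENE1", "arabidopsis", f"AT_hit{i}", 10.0 ** -(20 + i)) for i in range(12)
]
hits += [
    ("GENE1", "rice", "OS_hit1", 1e-5),   # not significant, dropped
    ("GENE1", "rice", "OS_hit2", 1e-50),  # significant, kept
]

selected = best_hits(hits)
for key, values in selected.items():
    print(key, [subject for subject, _ in values])
```

The retained matches would then be reviewed by hand, which is the step that determines the final annotation.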

Network Construction

KEGG metabolic pathways were downloaded from the KEGG website and converted into SBML files with the KEGG2SBML software package [58]. Grape genes and transcripts were manually added to the networks and linked to their corresponding proteins with the CellDesigner software package [59]. Plant- or grape-specific reactions that were not present in KEGG but were described in the literature were added manually.
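The gene, transcript, and protein linkage described above can be pictured with a simplified SBML-like document. The sketch below is only an illustration built with the Python standard library: it is not the output of KEGG2SBML or CellDesigner, it omits the CellDesigner-specific annotations, it has not been validated against the SBML schema, it treats the gene as a plain reactant for simplicity, and every identifier in it is hypothetical.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def add_reaction(list_of_reactions, reaction_id, reactant, product):
    """Append a one-reactant, one-product reaction element."""
    reaction = ET.SubElement(list_of_reactions, "reaction", id=reaction_id)
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference", species=reactant)
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference", species=product)


def build_network(model_id, gene, transcript, protein):
    """Build a minimal model linking a gene to its transcript and protein."""
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="2", version="4")
    model = ET.SubElement(sbml, "model", id=model_id)
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", id="cell")
    species = ET.SubElement(model, "listOfSpecies")
    for species_id in (gene, transcript, protein):
        ET.SubElement(species, "species", id=species_id, name=species_id,
                      compartment="cell")
    reactions = ET.SubElement(model, "listOfReactions")
    add_reaction(reactions, "transcription_1", gene, transcript)
    add_reaction(reactions, "translation_1", transcript, protein)
    return sbml


# Hypothetical identifiers, for illustration only.
network = build_network("example_pathway", "gene_X", "transcript_X", "protein_X")
print(ET.tostring(network, encoding="unicode"))
```

Keeping gene, transcript, and protein as separate nodes is what allows transcriptomic and proteomic values to be displayed side by side on the same pathway.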

Genetic information processing (2), signal transduction (3.1), and ABC transporters (5.1)

KEGG pathways were manually reconstructed with CellDesigner using the SBML format, and then grape genes and transcripts were manually added to the networks and linked to their corresponding proteins. Plant- or grape-specific processes that were not present in KEGG but were described in the literature were manually added.

Hormones signaling (3.2), plant-specific signaling (3.3), cellular processes (4), hormone transport (5.2), and transport system (5.3)

Networks were manually constructed from the literature with CellDesigner using the SBML format, and then grape genes and transcripts were manually added to the networks and linked to their corresponding proteins.

Transport catalog (5.4)

Networks were manually constructed with CellDesigner using the SBML format. Grape genes and transcripts matching transporter proteins from any other organism were manually added to the networks and linked to their corresponding proteins. Proteins were linked to an object class representing a transporter subcategory from the TCDB.
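Linking proteins to a transporter subcategory amounts to grouping them by a prefix of their TC number. The sketch below assumes five-component TC numbers of the form class.subclass.family.subfamily.transporter and groups on the first three components; the choice of that level, the protein names, and the TC assignments shown are all hypothetical and are not taken from the VitisNet files.

```python
from collections import defaultdict


def tc_family(tc_number):
    """Return the class.subclass.family prefix, e.g. '2.A.1' for '2.A.1.1.1'."""
    parts = tc_number.split(".")
    if len(parts) < 3:
        raise ValueError(f"TC number too short to contain a family: {tc_number!r}")
    return ".".join(parts[:3])


def group_by_family(assignments):
    """Group protein IDs by the TC family of their assigned TC number."""
    groups = defaultdict(list)
    for protein, tc_number in assignments.items():
        groups[tc_family(tc_number)].append(protein)
    return dict(groups)


# Hypothetical protein IDs and TC assignments, for illustration only.
assignments = {
    "protein_1": "2.A.1.1.1",
    "protein_2": "2.A.1.2.3",
    "protein_3": "1.A.8.11.1",
}

for family, proteins in group_by_family(assignments).items():
    print(family, proteins)
```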

Transcription (6)

Networks were manually constructed with CellDesigner using the SBML format. Grape genes and transcripts matching transcription factors from other species were manually added to the networks and linked to their corresponding proteins. For each transcription factor family, a phylogenetic tree was constructed with the neighbor-joining method from a protein alignment generated using ClustalW. The transcription factors were then grouped according to the phylogenetic tree. Distances drawn in the networks are not related to the respective phylogenetic distances. All the relevant bibliography for the construction of literature-based pathways is included in Table S2 and Text S1.
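For readers unfamiliar with neighbor joining, the following sketch shows the general technique on a small distance matrix. It is a textbook-style illustration, not the ClustalW implementation used for VitisNet, and the taxa and distances are illustrative values rather than grapevine data.

```python
def neighbor_joining(names, dist):
    """Run neighbor joining on a symmetric distance table.

    dist: {name: {other_name: distance}}.
    Returns (joins, last_edge), where each join is
    (child_a, length_a, child_b, length_b, parent).
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}  # working copy of the distances
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair that minimizes the Q criterion.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        length_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        parent = f"node{len(joins) + 1}"
        # Distances from the new parent to every remaining node.
        d[parent] = {}
        for c in nodes:
            if c not in (a, b):
                d[parent][c] = d[c][parent] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
        joins.append((a, length_a, b, length_b, parent))
    last_edge = (nodes[0], nodes[1], d[nodes[0]][nodes[1]])
    return joins, last_edge


# Illustrative distances between five taxa (not grapevine data).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {names[i]: {names[j]: matrix[i][j] for j in range(5)} for i in range(5)}

joins, last_edge = neighbor_joining(names, dist)
for join in joins:
    print(join)
print(last_edge)
```

Sequences joined early in this procedure are the closest relatives, which is the kind of information that can be used to group family members into subfamilies.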

Expression Profiling

Affymetrix probesets were matched to the genome using the same process as that used to match the genome sequences and the EST sequences. The tentative contigs from the DFCI Grape Gene Index (http://compbio.dfci.harvard.edu/tgi/cgi-bin/tgi/gimain.pl?gudb=grape) that contain the ESTs used as templates for the Affymetrix probesets were BLAST analyzed against the genome sequences (Megablast, p>95, e-value<1e-15). Transcriptomic data were retrieved from Grimplet et al. [29]. Proteomics and metabolomics data were retrieved from Grimplet et al. [42]. All molecules with differential abundance were grouped into the 12 clusters presented by Grimplet et al. [29] according to their abundance in the three berry tissues. Data were visualized using VitisNet with the Cytoscape software [60] (see Text S2 for a tutorial on the complete procedure).

Supporting Information

Table S1. The complete grape gene annotation based on the 8X assembly (Jaillon et al., 2007) of transcript sequences. (10.28 MB XLS)
Unique Gene: Genoscope ID (Jaillon et al., 2007) is used if a genome sequence has been identified, otherwise the VVGI 5 TC (Tentative Consensus sequence) number or EST GenBank ID is used.
Unique transcript: VVGI 5 TC number or EST GenBank ID is used if a transcript has been identified, otherwise the Genoscope ID is used.
Function: tentative functional annotation.
Network ID: the identifier that is used in the networks.
Network or simplified category: list of the networks where the gene appears, otherwise a short description of the biological role.
In Network: the gene is present in at least one network.
Probeset: probeset ID for the Affymetrix GeneChip® Vitis vinifera (Grape) Genome Array.
Best Arabidopsis match: best matched hit in Arabidopsis putative proteins.
InterPro domain ID: list of the domains detected from InterPro (Hunter et al., 2009).
Gene Ontology ID: list of the identified GO terms.
Gene Ontology description: description of the GO term (The Gene Ontology Consortium, 2009).
Accession UniProt: UniProt ID for the genome sequences (Apweiler et al., 2004).
Accession UniProt for published grapevine protein: UniProt ID for grapevine proteins published individually, apart from the genome sequencing.
EST probeset: EST from which the probeset was designed.
IASMA gene: ID from the heterozygote Vitis genome (Velasco et al., 2007).
Chromosome position: position of the gene on the chromosome, retrieved from Gramene.org.
Other Vitis: presence in non-vinifera Vitis species.
cDNA array: ID used in the cDNA array from Mathiason et al. (2009).
Other TC from VVGI5: list of other TCs from the DFCI matching the gene.
Other probesets: other Affymetrix probesets matching the gene.

Table S2. List of pathways constructed from bibliographic data and the corresponding journal articles used. (0.03 MB DOC)

Text S1. References for supporting material. (0.06 MB DOC)

Text S2. Tutorial for using VitisNet, a database for the grapevine molecular networks. (3.51 MB DOC)
  53 in total

Review 1.  Systems biology: a brief overview.

Authors:  Hiroaki Kitano
Journal:  Science       Date:  2002-03-01       Impact factor: 47.728

2.  BSD: a novel domain in transcription factors and synapse-associated proteins.

Authors:  Tobias Doerks; Saskia Huber; Erich Buchner; Peer Bork
Journal:  Trends Biochem Sci       Date:  2002-04       Impact factor: 13.807

3.  Grape berry biochemistry revisited upon proteomic analysis of the mesocarp.

Authors:  Jean-Emmanuel Sarry; Nicolas Sommerer; François-Xavier Sauvage; Alexis Bergoin; Michel Rossignol; Guy Albagnac; Charles Romieu
Journal:  Proteomics       Date:  2004-01       Impact factor: 3.984

4.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

5.  Phenotype characterisation using integrated gene transcript, protein and metabolite profiling.

Authors:  Matej Oresic; Clary B Clish; Eugene J Davidov; Elwin Verheij; Jack Vogels; Louis M Havekes; Eric Neumann; Aram Adourian; Stephen Naylor; Jan van der Greef; Thomas Plasterer
Journal:  Appl Bioinformatics       Date:  2004

6.  Elucidation of gene-to-gene and metabolite-to-gene networks in arabidopsis by integration of metabolomics and transcriptomics.

Authors:  Masami Yokota Hirai; Marion Klein; Yuuta Fujikawa; Mitsuru Yano; Dayan B Goodenowe; Yasuyo Yamazaki; Shigehiko Kanaya; Yukiko Nakamura; Masahiko Kitayama; Hideyuki Suzuki; Nozomu Sakurai; Daisuke Shibata; Jim Tokuhisa; Michael Reichelt; Jonathan Gershenzon; Jutta Papenbrock; Kazuki Saito
Journal:  J Biol Chem       Date:  2005-05-02       Impact factor: 5.157

7.  MetaCyc and AraCyc. Metabolic pathway databases for plant research.

Authors:  Peifen Zhang; Hartmut Foerster; Christophe P Tissier; Lukas Mueller; Suzanne Paley; Peter D Karp; Seung Y Rhee
Journal:  Plant Physiol       Date:  2005-05       Impact factor: 8.340

8.  The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species.

Authors:  J Quackenbush; J Cho; D Lee; F Liang; I Holt; S Karamycheva; B Parvizi; G Pertea; R Sultana; J White
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

9.  Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents.

Authors:  Philip L Ross; Yulin N Huang; Jason N Marchese; Brian Williamson; Kenneth Parker; Stephen Hattan; Nikita Khainovski; Sasi Pillai; Subhakar Dey; Scott Daniels; Subhasish Purkayastha; Peter Juhasz; Stephen Martin; Michael Bartlet-Jones; Feng He; Allan Jacobson; Darryl J Pappin
Journal:  Mol Cell Proteomics       Date:  2004-09-22       Impact factor: 5.911

10.  Genome-scale reconstruction of the metabolic network in Staphylococcus aureus N315: an initial draft to the two-dimensional annotation.

Authors:  Scott A Becker; Bernhard Ø Palsson
Journal:  BMC Microbiol       Date:  2005-03-07       Impact factor: 3.605
