
Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics.

Zhangjun Fei¹, Je-Gun Joung, Xuemei Tang, Yi Zheng, Mingyun Huang, Je Min Lee, Ryan McQuinn, Denise M Tieman, Rob Alba, Harry J Klee, James J Giovannoni.

Abstract

Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process the microarray, metabolite and sRNA data sets archived in the database, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allows intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu.

Entities: Chemical Species

Substances: RNA, Small Untranslated

Year: 2010 PMID: 20965973 PMCID: PMC3013811 DOI: 10.1093/nar/gkq991

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Tomato (Solanum lycopersicum) is an economically important vegetable/fruit crop throughout the world with significant importance for human health and nutrition. It has long served as a model system for fleshy fruit development, plant genetics, pathology and physiology. Currently the entire genome of tomato is being sequenced by an international consortium, resulting in a wealth of genomics resources including BAC and fosmid libraries and their end sequences, high density genetic and physical maps, a large set of molecular markers and a number of powerful computational pipelines for sequence analysis and genome annotation (1). Meanwhile, numerous large-scale functional genomics resources for tomato have been developed over the past several years with new resources accumulating at a rapid rate. A large collection of tomato Expressed Sequence Tags (ESTs) that currently represents approximately 40 000 unigenes derived from more than 300 000 ESTs has been generated (http://www.sgn.cornell.edu). In addition a collection of more than 11 000 tomato full-length cDNA sequences has been released (2). Several publicly available microarray platforms have been created based on the tomato EST collection and some have been used extensively by the community to investigate the dynamics of the tomato transcriptome in different biological processes, resulting in large amounts of gene expression data. Recently, comprehensive profiles of numerous tomato metabolites have been generated and integrated with phenotypic trait data and transcript profiles to identify metabolic regulatory networks for the purpose of improving fruit quality (3,4). Profiles of fruit flavor and nutrition-related metabolites from well-defined tomato introgression lines (ILs) have also been generated in order to identify loci or genes affecting flavor and nutrition through systems analysis of genotype, metabolite and gene expression data (5,6). 
In the past several years, large numbers of tomato small RNA (sRNA) sequences have been accumulated and are exponentially expanding due to rapid advances in sequencing technologies (7–9). Understanding functions of these tomato sRNAs represents an emerging and relatively unexploited opportunity to provide novel insights into the regulatory mechanisms of biologically and agriculturally important processes amenable to the tomato system including fruit development and ripening. We previously described the Tomato Expression Database (TED), which serves as a central repository of tomato expression data and contains a suite of data presentation and analysis tools to assist in the development and testing of biological hypotheses (10). With the rapid accumulation of large-scale metabolite profile and sRNA data in tomato, we have expanded the database into what we now term the Tomato Functional Genomics Database (TFGD). Besides the newly added tomato metabolite and sRNA data sets, TFGD has been significantly improved with a number of new features and analysis tools. Furthermore, we have developed computational pipelines to process and analyze raw tomato microarray expression, metabolite profile and sRNA data sets, respectively, to ensure uniformity of the analyzed results for cross-experiment comparisons.

DATABASE CONTENTS AND FUNCTIONS

TFGD is a comprehensive collection of tomato functional genomics data. Currently, the database contains three major data components: gene expression, metabolite profiles and sRNAs. All the data were collected from the public domain with regular updates and retrieval of newly released data. The database is tightly linked to the Solanaceae Genomics Network (SGN, http://www.sgn.cornell.edu), a community database containing comprehensive genomics information for solanaceous species including the tomato genome sequence (11).

Annotations of microarray probes

Currently three microarray platforms are publicly available for tomato: the TOM1 cDNA array, the TOM2 oligonucleotide array and the Affymetrix genome array. Probes on these three array platforms were annotated by comparing their corresponding consensus or SGN unigene sequences against the GenBank nr, Swiss-Prot/TrEMBL and Arabidopsis protein databases. Gene Ontology (GO) terms were then assigned to array probes using the Gene Ontology Annotation Database (12) based on their top Swiss-Prot/TrEMBL hits. GO terms assigned to each probe were then mapped to a set of plant-specific GO slims using a Perl script, map2slim.pl, available at the Gene Ontology website (http://www.geneontology.org/GO.slims.shtml). Array probes were further assigned to tomato metabolic pathways based on the LycoCyc database (tomato metabolic pathway database) available at SGN.

Updates on tomato gene expression data

Tomato microarray data sets have been collected from public repositories including NCBI Gene Expression Omnibus (13) and EBI ArrayExpress (14), in addition to those directly submitted to TFGD. All array data sets are archived in TFGD following MIAME guidelines (15). Currently TFGD contains a total of 1308 hybridizations from 43 experiments, of which 773, 248 and 287 are from the TOM1 cDNA, TOM2 oligonucleotide and Affymetrix genome arrays, respectively (Table 1).

Table 1.

Statistics of tomato microarray experiments in TFGD

Array platform	No. of experiments	No. of hybridizations^a	No. of distinct hybridizations
TOM1 cDNA array	20	773	132
TOM2 oligonucleotide array	8	248	38
Affymetrix genome array	15	287	100
Total	43	1308	270

^a Including biological and technical replicates.

To ensure uniformity of the analyzed results for cross-experiment comparisons, we have implemented computational pipelines to process the microarray data sets archived in our database. Briefly, for data sets generated using spotted arrays (TOM1 and TOM2), raw data were normalized using the print-tip LOWESS normalization strategy (16). Spots flagged by image quantification programs as poor quality and spots that were not expressed in both channels were filtered out. Probes with at least two replicated data points were included in downstream statistical analysis. Significance of differential gene expression was determined using Patterns from Gene Expression (PaGE) (17). For data sets generated using the Affymetrix array, raw array data (CEL files) were normalized at the probe level using the gcRMA algorithm (18) and significance of differential gene expression was determined with the LIMMA package (19). All raw and analyzed array data can be downloaded from the database without restriction.
In addition to the query interfaces and tools described in our previous report (10), a number of new tools to facilitate mining and analyzing the array results have been implemented. A co-expression analysis tool that can identify genes whose expression profiles are highly positively or negatively correlated with that of a given gene was implemented in TFGD. This tool can help to identify genes with similar functions, since co-expressed genes are often involved in the same or related pathways and biological processes.
Microarray experiments typically produce a list of hundreds or thousands of interesting genes based on defined statistical criteria. Condensing and translating such a list into biologically meaningful and manageable information is required to better understand the underlying biological phenomena of interest.
To achieve this goal, we implemented GO term enrichment analysis and biochemical pathway analysis tools in the database. Both tools were adopted from Plant MetGenMAP (20). The GO term enrichment tool, which was implemented based on the GO::TermFinder Perl module (21), can identify a set of over-represented GO terms reflecting highly affected biological processes from a list of user input genes or a microarray data set archived in the database. The pathway analysis tool can rapidly retrieve a list of significantly altered biochemical pathways (Figure 1A) and provide intuitive visualization of transcriptional events within a pathway with genes highlighted in different colors to reflect their expression level changes (Figure 1B). The results obtained from these analyses can provide insight into the mechanisms that underlie targeted biological phenomena or biochemical changes associated with them at the molecular level.
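As a rough illustration of the kind of test behind GO term enrichment, the sketch below scores over-representation with a hypergeometric upper-tail probability, the distribution on which GO::TermFinder is based. The function names, the input layout (a dictionary from gene to GO terms) and the Bonferroni correction are assumptions made for this example; they are not taken from TFGD or Plant MetGenMAP.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_terms(gene_list, annotations, alpha=0.05):
    """Return (term, hits, adjusted_p) for over-represented terms.

    annotations: dict mapping gene -> set of GO terms; its keys are used
    as the background population (an assumption of this sketch).
    """
    background = set(annotations)
    study = set(gene_list) & background
    N, n = len(background), len(study)

    # Invert the annotation table: term -> genes carrying it.
    term_to_genes = {}
    for gene, terms in annotations.items():
        for term in terms:
            term_to_genes.setdefault(term, set()).add(gene)

    results = []
    for term, genes in term_to_genes.items():
        k = len(study & genes)
        if k == 0:
            continue
        p = hypergeom_upper_tail(k, n, len(genes), N)
        # Simple Bonferroni correction over all tested terms (assumed here).
        p_adj = min(1.0, p * len(term_to_genes))
        if p_adj <= alpha:
            results.append((term, k, p_adj))
    return sorted(results, key=lambda r: r[2])
```

Calling `enriched_terms(my_genes, annotations)` returns the terms sorted by adjusted P-value, with the most strongly over-represented term first.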

Figure 1.

Pathway analysis in the database. (A) Screenshot of an example result returned by the pathway analysis tool in TFGD which lists altered pathways identified from a gene expression data set. (B) Visualization of detailed transcript expression changes in a pathway.

One of the major tasks in gene expression data analysis is to sort a list of genes into different functional categories as a means of furthering downstream analysis. In TFGD, we implemented a tool that uses plant-specific GO slims, which are a list of high-level GO terms providing a broad overview of the ontology content (http://www.geneontology.org/GO.slims.shtml), to functionally classify a list of user input genes.
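The idea of classifying genes with GO slims can be sketched as follows: each GO term assigned to a gene is traced up the ontology graph, and the gene is counted under every slim term found among its ancestors. This is a simplified illustration, not the map2slim.pl algorithm, which is more careful about choosing the most specific slim terms. The term identifiers and data layout are hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the ontology DAG.

    parents: dict mapping a term to an iterable of its direct parent terms.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def classify_by_slim(gene_terms, parents, slim):
    """Count genes per slim term; a gene counts at most once per slim term.

    gene_terms: dict mapping gene -> set of GO terms assigned to it.
    slim: set of high-level terms used as functional categories.
    """
    counts = {s: 0 for s in slim}
    for terms in gene_terms.values():
        reached = set()
        for term in terms:
            reached |= ancestors(term, parents) & slim
        for s in reached:
            counts[s] += 1
    return counts
```

A gene annotated with several terms under the same slim category is counted once for that category, so the counts can be read directly as "number of genes per functional class".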

Tomato metabolite data

During the last decade, analyses of mRNA at the whole-genome level have proven central to most functional genomics initiatives. Recently, metabolite profiling has emerged as an additional layer of phenotypic information to more fully inform gene functional interpretation and has the potential not only to provide deeper insight into complex regulatory processes but also to determine biochemical and downstream phenotypes directly (22). Currently TFGD contains profiles of numerous flavor and nutrition-related metabolites. Fruit flavor and nutritional composition are of clear benefit to humans. However, flavor and nutrition are difficult traits to modify via either traditional breeding or transgenic approaches due to their generally complex biosynthetic and regulatory pathways. Recent advances in genomics, bioinformatics and high-throughput technologies provide an opportunity to dissect the regulatory mechanisms of fruit nutrition and flavor through systems biology to reveal key regulatory steps and thus putative targets for breeding or engineering. Profiles of more than 60 flavor and nutrition-related metabolites from multiple seasons in the ripe fruit tissues of a collection of 76 S. pennellii-derived ILs (23) and a collection of 89 S. habrochaites-derived ILs (24), in addition to their corresponding parental control lines, have been generated (5,6) and archived in TFGD. These ILs represent overlapping single introgressions of the S. pennellii and S. habrochaites genomes, respectively, into the S. lycopersicum genome. Metabolite profiles from multiple seasons were analyzed using two-way ANOVA tests followed by post hoc Dunnett’s tests to identify lines with significant metabolite content changes compared to parental controls. Based on these metabolite profiles, multiple loci affecting fruit nutrition and flavor have been identified (5,6). All the analyzed metabolite profile data were included in the database.
Interfaces which allow users to efficiently retrieve profiles of a specific metabolite across all ILs in addition to metabolite profiles of a specific IL were implemented. Tools to identify ILs that display significant changes in a specific metabolite as well as ILs with specific metabolite properties were also implemented in the database. Transcriptome profiles of S. pennellii-derived ILs from the same tissue samples used for flavor and nutrition-related metabolite profile generation are all available in TFGD. Inclusion of both data types with tools that link them allows identification of novel genes involved in or regulating specific metabolic pathways using an integrated systems approach. To this end, a tool to correlate metabolite and transcript profiles by employing the Pearson or Spearman rank correlation coefficient to measure the similarity of profiles was implemented (Figure 2A). Using this tool, several meaningful and significant correlations between metabolite and gene expression profiles were identified (Figure 2B). In addition, a number of novel correlations have been observed. Based on these correlations, we have successfully identified a number of transcription factors associated with fruit metabolite levels and have functionally verified at least one which influences fruit ripening and carotenoid levels when repressed in transgenic tomato fruits (Lee and Giovannoni, submitted).

Figure 2.

Correlation analysis between gene expression and metabolite profiles in TFGD. (A) Interface of the correlation analysis. (B) An example of known correlations identified in the database: correlation between profiles of phytoene (green) and phytoene synthase (red) across 17 ILs (r = 0.655, P = 0.00432).
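The correlation analysis between metabolite and transcript profiles can be illustrated with a small, dependency-free sketch of the Pearson and Spearman rank correlation coefficients. The function names and the input layout (one list of values per profile, ordered by the same lines) are assumptions for this example; P-values such as the one reported for Figure 2B are not computed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    # A constant profile has no defined correlation.
    return sxy / denom if denom else float("nan")


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_genes(metabolite, expression, method=pearson):
    """Sort genes by the absolute correlation of their profile with a metabolite.

    expression: dict mapping gene -> profile across the same lines as `metabolite`.
    """
    scores = {gene: method(metabolite, profile) for gene, profile in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Sorting by absolute value keeps strongly negative correlations near the top, which matters when looking for repressors as well as activators of a pathway.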

Tomato sRNA data

In the past few years, small RNAs (sRNAs) have been found to act as key regulators of cellular processes. They regulate gene expression by acting either on DNA to guide sequence elimination and chromatin remodeling or on RNA to guide cleavage and translation repression (25). Advances in high-throughput sequencing technologies have greatly accelerated the discovery and characterization of new classes of sRNAs including miRNA, ta-siRNA and nat-siRNA, as well as identification of their novel regulatory roles in diverse biological processes. Recently several large-scale sRNA data sets have been generated for tomato (7–9; http://smallrna.udel.edu). TFGD provides a central repository with tools to disseminate these sRNAs and assist in their analysis. Currently, the database contains approximately 15.4 million sRNA sequences that are mainly derived from fruit, leaf and flower tissues, representing more than 5.3 million unique sRNAs. The sRNAs were first annotated by comparing them to rRNA, tRNA and tomato repeat sequence databases. miRNA candidates were then identified using an in-house pipeline. In short, highly abundant sRNAs were first aligned to tomato genome sequences and the flanking sequences (200 bp on each side) of sRNAs were extracted and folded in silico using the RNAfold program (26). Resulting folded structures were then checked with miRcheck (27) to identify potential miRNA candidates. These miRNA candidates were further compared to miRBase (28) to identify conserved miRNAs. In addition, potential miRNA star sequences for each miRNA candidate were also identified from the tomato sRNA data set. Finally, miRNA targets were identified using a program we developed according to the scoring matrix described in Jones-Rhoades and Bartel (29). Expression profiles of the target genes, if available, were provided in the database by linking to the gene expression module. 
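The flank-extraction step of the miRNA pipeline can be sketched as below: every exact occurrence of an sRNA in a genomic sequence is located and the surrounding window (200 bp on each side by default) is returned as a candidate precursor for folding. This is only an assumed, minimal version; it searches the forward strand by exact string matching, whereas the actual pipeline used a sequence aligner, and the folding (RNAfold) and structure checks (miRcheck) are not reproduced. All names are hypothetical.

```python
def find_all(genome, srna):
    """Yield the start of every (possibly overlapping) exact match of srna."""
    pos = genome.find(srna)
    while pos != -1:
        yield pos
        pos = genome.find(srna, pos + 1)


def precursor_windows(genome, srna, flank=200):
    """Yield (start, end, sequence) for each match plus its flanking regions.

    Coordinates are 0-based and end-exclusive; windows are clipped at the
    ends of the genomic sequence.
    """
    for pos in find_all(genome, srna):
        start = max(0, pos - flank)
        end = min(len(genome), pos + len(srna) + flank)
        yield start, end, genome[start:end]
```

Each returned window could then be written to a FASTA file and passed to a folding program of choice.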
TFGD provides sRNA sequence data, annotations, their digital expression in each sample, and candidate miRNA information. For each candidate miRNA, the database provides several lines of evidence to determine the confidence that a given miRNA candidate is a true miRNA. Potential evidence includes the abundance of the candidate, whether the candidate is conserved with known miRNAs, whether the candidate has corresponding miRNA star sequences, and whether it has predicted targets (Figure 3). Several query interfaces and tools have been developed to assist in exploring and analyzing the tomato sRNAs and miRNA candidates. Users can retrieve a specific family of miRNA candidates, as well as the most abundant sRNAs in each tissue. The database also allows users to compare their own sRNA sequences against the sRNAs archived in the database. Finally, a tool to identify potential miRNA targets of user-supplied miRNA sequences and to identify tomato miRNAs that potentially target specific transcript sequences was developed and added to the database.
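To make the idea of penalty-based miRNA target scoring concrete, the sketch below slides a miRNA along a transcript and scores each ungapped site by its complementarity, with a smaller penalty for G:U wobble pairs than for other mismatches. The penalty values, the score cutoff and the absence of bulge handling are placeholder assumptions of this example; the scoring matrix actually used by TFGD is the one described in reference 29 and is not reproduced here.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def site_score(mirna, site, mismatch=1.0, wobble=0.5):
    """Penalty score for an ungapped miRNA:site duplex (0 = perfect pairing).

    Both sequences are written 5'->3' in upper-case RNA letters, so miRNA
    position i pairs with the site read from its 3' end.
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have the same length")
    score = 0.0
    for m, t in zip(mirna, reversed(site)):
        if (m, t) in WATSON_CRICK:
            continue
        score += wobble if (m, t) in WOBBLE else mismatch
    return score


def scan_transcript(mirna, transcript, max_score=3.0, mismatch=1.0, wobble=0.5):
    """Return (start, score) for every window scoring at most max_score."""
    mirna = mirna.upper().replace("T", "U")
    transcript = transcript.upper().replace("T", "U")
    size = len(mirna)
    hits = []
    for start in range(len(transcript) - size + 1):
        score = site_score(mirna, transcript[start:start + size], mismatch, wobble)
        if score <= max_score:
            hits.append((start, score))
    return hits
```

Because the penalties are keyword arguments, a different scoring scheme can be tried without changing the scanning logic.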

Figure 3.

Tomato miRNA candidate information in TFGD. (A) Abundances of a miRNA candidate in each sample. (B) Conservation between the miRNA candidate and known miRNAs. (C) miRNA candidate precursors and their secondary structures and corresponding miRNA star sequences. (D) Predicted miRNA targets.

Tomato sRNAs were further aligned to EST/mRNA sequences in order to identify small interfering RNAs (siRNAs). A siRNA viewer was developed in the database which shows the distribution of siRNAs in each tomato gene (Figure 4). Expression profiles of siRNAs and their corresponding genes can be compared, which also helps in designing siRNAs to efficiently silence their target genes. In short, we have merged our sRNA and EST/gene expression functions to facilitate prediction and likelihood analysis of sRNA involvement in regulation of specific genes via a user-driven interface.

Figure 4.

siRNA viewer in TFGD. siRNAs in red were aligned to mRNA in the forward direction while those in green were in the reverse direction.
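The forward and reverse alignments shown in the siRNA viewer can be mimicked with a small sketch that places sRNAs on an mRNA by exact matching of the read and of its reverse complement. Exact string matching and the function names are assumptions of this example; it stands in for whatever aligner the database actually uses, and sequences are expected in DNA letters (A, C, G, T).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def map_srnas(mrna, srnas):
    """Return (srna, start, strand) for every exact match on either strand.

    strand is "+" when the sRNA matches the mRNA as written (forward) and
    "-" when its reverse complement matches (reverse). Starts are 0-based
    positions on the mRNA.
    """
    hits = []
    for srna in srnas:
        for strand, query in (("+", srna), ("-", reverse_complement(srna))):
            pos = mrna.find(query)
            while pos != -1:
                hits.append((srna, pos, strand))
                pos = mrna.find(query, pos + 1)
    return hits
```

Grouping the returned hits by strand gives the two tracks that a viewer would draw in different colors.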

FUTURE DIRECTIONS

The complex functions of a living cell are carried out through the concerted activity of many genes and gene products. This activity is often coordinated by the organization of the genome into regulatory modules, or sets of co-regulated genes that share a common function. Identifying these regulatory modules is crucial for understanding important cellular processes. For this purpose, we are identifying tomato regulatory modules using large-scale expression data sets archived in the database. In addition, with the recent completion of the tomato genome sequence, we are in the process of mapping probes on tomato arrays to the genome and extracting promoter sequences for every probe. Tools are being developed in the database to assist in the identification of regulatory motifs from sets of co-regulated genes. We will continue to collect and archive publicly available tomato microarray, metabolite profile and sRNA data sets, and to incorporate emerging RNA-seq, proteomics and phenotypic data sets, to ensure capture and utilization of the full complement of public tomato genomics resources developed and released in the plant science community.

FUNDING

National Science Foundation (IOS-0501778 and IOS-0923312). Funding for open access charge: National Science Foundation.

Conflict of interest statement. None declared.

26 in total

1. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Authors: A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron
Journal: Nat Genet Date: 2001-12 Impact factor: 38.330

2. GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes.

Authors: Elizabeth I Boyle; Shuai Weng; Jeremy Gollub; Heng Jin; David Botstein; J Michael Cherry; Gavin Sherlock
Journal: Bioinformatics Date: 2004-08-05 Impact factor: 6.937

Review 3. Post-transcriptional small RNA pathways in plants: mechanisms and regulations.

Authors: Hervé Vaucheret
Journal: Genes Dev Date: 2006-04-01 Impact factor: 11.361

4. Metabolite profiling for plant functional genomics.

Authors: O Fiehn; J Kopka; P Dörmann; T Altmann; R N Trethewey; L Willmitzer
Journal: Nat Biotechnol Date: 2000-11 Impact factor: 54.908

5. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology.

Authors: Evelyn Camon; Michele Magrane; Daniel Barrell; Vivian Lee; Emily Dimmer; John Maslen; David Binns; Nicola Harte; Rodrigo Lopez; Rolf Apweiler
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

6. NCBI GEO: archive for high-throughput functional genomic data.

Authors: Tanya Barrett; Dennis B Troup; Stephen E Wilhite; Pierre Ledoux; Dmitry Rudnev; Carlos Evangelista; Irene F Kim; Alexandra Soboleva; Maxim Tomashevsky; Kimberly A Marshall; Katherine H Phillippy; Patti M Sherman; Rolf N Muertter; Ron Edgar
Journal: Nucleic Acids Res Date: 2008-10-21 Impact factor: 16.971

7. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

Authors: Koh Aoki; Kentaro Yano; Ayako Suzuki; Shingo Kawamura; Nozomu Sakurai; Kunihiro Suda; Atsushi Kurabayashi; Tatsuya Suzuki; Taneaki Tsugane; Manabu Watanabe; Kazuhide Ooga; Maiko Torii; Takanori Narita; Tadasu Shin-I; Yuji Kohara; Naoki Yamamoto; Hideki Takahashi; Yuichiro Watanabe; Mayumi Egusa; Motoichiro Kodama; Yuki Ichinose; Mari Kikuchi; Sumire Fukushima; Akiko Okabe; Tsutomu Arie; Yuko Sato; Katsumi Yazawa; Shinobu Satoh; Toshikazu Omura; Hiroshi Ezura; Daisuke Shibata
Journal: BMC Genomics Date: 2010-03-30 Impact factor: 3.969

8. Small RNAs in tomato fruit and leaf development.

Authors: Asuka Itaya; Ralf Bundschuh; Anthony J Archual; Je-Gun Joung; Zhangjun Fei; Xinbin Dai; Patrick X Zhao; Yuhong Tang; Richard S Nelson; Biao Ding
Journal: Biochim Biophys Acta Date: 2007-12-03

9. Identification of loci affecting flavour volatile emissions in tomato fruits.

Authors: Denise M Tieman; Michelle Zeigler; Eric A Schmelz; Mark G Taylor; Peter Bliss; Matias Kirst; Harry J Klee
Journal: J Exp Bot Date: 2006-02-10 Impact factor: 6.992

10. Identification of novel small RNAs in tomato (Solanum lycopersicum).

Authors: Rachel L Rusholme Pilcher; Simon Moxon; Nima Pakseresht; Vincent Moulton; Kenneth Manning; Graham Seymour; Tamas Dalmay
Journal: Planta Date: 2007-04-06 Impact factor: 4.540
