
Update of the Diatom EST Database: a new tool for digital transcriptomics.

Uma Maheswari, Thomas Mock, E Virginia Armbrust, Chris Bowler.

Abstract

The Diatom Expressed Sequence Tag (EST) Database was constructed to provide integral access to ESTs from these ecologically and evolutionarily interesting microalgae. It has now been updated with 130,000 Phaeodactylum tricornutum ESTs from 16 cDNA libraries and 77,000 Thalassiosira pseudonana ESTs from seven libraries, derived from cells grown in different nutrient and stress regimes. The updated relational database incorporates results from statistical analyses such as log-likelihood ratios and hierarchical clustering, which help to identify differentially expressed genes under different conditions, and allow similarities in gene expression in different libraries to be investigated in a functional context. The database also incorporates links to the recently sequenced genomes of P. tricornutum and T. pseudonana, enabling an easy cross-talk between the expression pattern of diatom orthologs and the genome browsers. These improvements will facilitate exploration of diatom responses to conditions of ecological relevance and will aid gene function identification of diatom-specific genes and in silico gene prediction in this largely unexplored class of eukaryotes. The updated Diatom EST Database is available at http://www.biologie.ens.fr/diatomics/EST3.

Year:  2008        PMID: 19029140      PMCID: PMC2686495          DOI: 10.1093/nar/gkn905

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Diatoms are globally distributed, eukaryotic brown microalgae that participate in various biogeochemical cycles and play key roles in maintaining the ecological balance of the earth. They are major contributors to global primary production and CO2 sequestration (1,2), and are also receiving attention as a potential source of biofuels (3). They fall within the heterokont branch of the eukaryotic tree (4) and are believed to have evolved from a secondary endosymbiotic process (5–7). The molecular and cellular biology of diatoms is dramatically underexplored. Previous Expressed Sequence Tag (EST) studies (8,9), together with the first whole genome sequences from diatoms, Thalassiosira pseudonana (10) and Phaeodactylum tricornutum (11), have shown that fewer than 50% of diatom genes can be assigned a putative function using homology-based methods, owing to the lack of genomic information from well-studied, taxonomically related organisms. Similar observations were also made in a pilot study of ESTs derived from the polar diatom Fragilariopsis cylindrus grown at low temperature (12). Our earlier diatom EST database (9) enabled comparative studies of eukaryotic algal genomes and revealed some interesting differences in genes involved in basic cell metabolism (13,14). It also aided the study of key signalling and regulatory pathways (15), silica metabolism (16,17), nitrogen metabolism (18) and carbohydrate metabolism (19). Furthermore, elucidation of the functions of diatom-specific genes can be facilitated by identifying conditions in which they are expressed. Non-normalized EST libraries made from cells grown in different growth conditions can therefore provide a good dataset for comparative, functional and phylogenetic studies. For example, a comparative study of the mRNAs expressed under different conditions can provide a systematic exploration of the molecular adaptations of a cell through differential gene expression.
As a case in point, EST collections derived from cells grown under different conditions have proven to be a good tool for transcriptomics studies and genome annotation in the green alga Chlamydomonas reinhardtii (20–24). By comparing the expression profiles from more than one growth condition, differential gene expression studies can therefore provide a useful means to explore diatom gene function and genome annotation. In this update we describe EST collections derived from diatom cells grown under different conditions and statistical methods used to explore gene expression. This digital gene expression database contains more than 200 000 ESTs from the two recently sequenced diatom genomes, T. pseudonana (10) and P. tricornutum (11). T. pseudonana is a centric diatom and has been a model organism for physiological studies of widely distributed species belonging to the order Thalassiosirales. P. tricornutum is a pennate diatom for which a range of reverse genetics tools have been generated (25), therefore making it a good model for functional genomic studies. The sequenced diatoms revealed many interesting features of diatom genes and metabolic pathways, although comparative studies also revealed a high level of molecular divergence (11,15). Bearing in mind these striking differences, the updates in the Diatom EST Database described here provide key insights into differential gene expression in diatoms grown in a range of ecologically relevant conditions.

DATA SOURCES AND DATABASE CONSTRUCTION

The Diatom EST Database was initially made with 12 136 ESTs from P. tricornutum and 15 174 ESTs from T. pseudonana, each obtained from a single growth condition (9). These libraries were expanded with 120 411 ESTs from P. tricornutum and 61 913 ESTs from T. pseudonana obtained from cells grown in 15 and 6 additional growth conditions, respectively. The new sets of ESTs were subjected to preliminary processing, including vector clipping and quality control (9), and sequence assembly and redundancy checking were then done in two steps. First, the ESTs were clustered together with the predicted gene models from their respective genomes (http://genome.jgi-psf.org/Thaps3/Thaps3.home.html and http://genome.jgi-psf.org/Phatr2/Phatr2.home.html). We were able to assign 120 575 ESTs to 8944 of the 10 402 gene models in P. tricornutum and 43 114 ESTs to 7268 of the 11 776 gene models in T. pseudonana using the BLASTN programme (cut-off e-value 10^-10) (26). These 8944 and 7268 transcriptional units (TUs) with predicted gene models were directly added to the non-redundant transcript sets with new sequence identifiers containing ‘G’ as a prefix along with the gene model identifier, e.g. G10065 for gene model 10065. The number of ESTs clustering to each gene model gives the redundancy or cluster size of the transcript. Second, transcripts which did not have a predicted gene model (11 513 ESTs from P. tricornutum and 18 073 ESTs from T. pseudonana), mainly because ESTs from only a few libraries were used for training the gene prediction programmes (11), were subjected to analysis by CAP3 (27). Sequences with greater than 95% identity over a region longer than 30 base pairs were clustered using this programme, and we thus obtained 1330 contigs and 2096 singletons for P. tricornutum and 1769 contigs and 2039 singletons for T. pseudonana. These were added to the non-redundant transcript set with sequence identifiers starting with ‘C’ for contigs and ‘S’ for the singletons.
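The first assembly step can be illustrated with a minimal Python sketch. It is not the pipeline used for the database: the input tuples are a hypothetical, simplified stand-in for BLASTN tabular output, and keeping the hit with the highest bit score for each EST is an assumption, since the text does not say how multiple hits were resolved. Only the e-value cut-off and the ‘G’ identifier prefix are taken from the description above.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-10  # cut-off e-value stated in the text


def assign_ests_to_gene_models(blast_rows, cutoff=E_VALUE_CUTOFF):
    """Map each EST to one gene model and count ESTs per gene model.

    blast_rows: iterable of (est_id, gene_model_id, e_value, bit_score)
    tuples (hypothetical layout). Returns (assignment, cluster_sizes,
    unassigned), where unassigned ESTs would go on to CAP3 assembly.
    """
    best = {}    # est_id -> (bit_score, gene_model_id) of the best hit so far
    seen = set()  # every EST that appeared in the input
    for est_id, model_id, e_value, bit_score in blast_rows:
        seen.add(est_id)
        if e_value > cutoff:
            continue  # hit is weaker than the cut-off, ignore it
        # Assumption: keep the highest-scoring hit per EST.
        if est_id not in best or bit_score > best[est_id][0]:
            best[est_id] = (bit_score, model_id)
    # TU identifiers carry the 'G' prefix plus the gene model identifier.
    assignment = {est: f"G{model}" for est, (_, model) in best.items()}
    cluster_sizes = Counter(assignment.values())
    unassigned = sorted(seen - set(assignment))
    return assignment, cluster_sizes, unassigned


rows = [
    ("est1", 10065, 1e-50, 200.0),
    ("est1", 20311, 1e-12, 80.0),   # weaker second hit for est1
    ("est2", 10065, 1e-30, 150.0),
    ("est3", 20311, 1e-5, 40.0),    # fails the cut-off
]
assignment, cluster_sizes, unassigned = assign_ests_to_gene_models(rows)
print(assignment)
print(cluster_sizes)
print(unassigned)
```

In this toy input, two ESTs fall on gene model 10065, so the TU G10065 has a cluster size of 2, and the EST without a qualifying hit is left for the second (CAP3) step.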
Adding the TUs with gene models to the contigs and singletons obtained from CAP3, we counted 12 370 non-redundant TUs in P. tricornutum and 11 076 TUs in T. pseudonana. Among the non-redundant TUs which do not have a predicted gene model, we found only 612 TUs in P. tricornutum and 1 083 TUs in T. pseudonana that do not align in their respective genomes, likely because of remaining gaps in the genome sequences. The contribution of ESTs from different libraries to the cluster size of each TU gives the abundance of each expressed transcript across different libraries. The counts were normalized to the library size by converting the counts to frequencies, which allows a statistical comparison to be made of expression levels of transcripts in different conditions. Specifically, the log-likelihood ratio was calculated for each contig (28) to statistically validate whether a difference in frequency across different libraries was random or due to differential expression. The database schematized in Figure 1 provides access to frequency distribution plots (Figure 1E) and log-likelihood ratios (R-values) for each TU, which are catalogued by library (Figure 1C) as well as across libraries (Figure 1D and H). Figure 1E shows an example of a TU with high R-value (i.e. a gene that is strongly differentially expressed in the conditions tested). By cataloguing the TUs based on their R-values we were then able to identify transcripts that are differentially expressed under each condition. For example, transcripts expressed during iron limitation served as a useful starting point to explore the molecular response of P. tricornutum to life at low iron concentrations (29), providing experimental validation of our statistical methods. TUs were also subjected to hierarchical clustering (30) to identify transcripts with similar expression profiles in the different conditions. These analyses together with relevant functional information were visualized using Java Treeview (31). 
Figure 1F shows a screen shot of hierarchical clustering (30) of P. tricornutum contigs.
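The normalization and R-value calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the database's own code: it uses one common formulation of a log-likelihood ratio for EST counts across libraries, R = sum_i x_i * ln(x_i / (N_i * f)), where x_i is the count of a TU in library i, N_i is the library size and f is the pooled frequency over all libraries. Whether this matches the statistic of reference (28) in every detail is an assumption, and the function names are hypothetical.

```python
import math


def frequencies(counts, library_sizes):
    """Convert per-library EST counts for one TU into frequencies."""
    return [c / n for c, n in zip(counts, library_sizes)]


def r_value(counts, library_sizes):
    """Log-likelihood ratio of observed counts against a uniform frequency.

    Under the null hypothesis the TU has the same frequency f in every
    library, so the expected count in library i is N_i * f.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    pooled = total / sum(library_sizes)  # f, the pooled frequency
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # the x * ln(x) term tends to 0 as x approaches 0
            r += x * math.log(x / (n * pooled))
    return r


library_sizes = [10000, 10000, 10000]  # hypothetical library sizes
uniform = [5, 5, 5]    # same frequency everywhere
skewed = [15, 0, 0]    # all ESTs come from one library

print(frequencies(skewed, library_sizes))
print(r_value(uniform, library_sizes))
print(r_value(skewed, library_sizes))
```

A TU with the same frequency in every library gives R = 0, whereas a TU whose ESTs all come from one library gives a large R, which is the behaviour used above to rank candidate differentially expressed transcripts.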
Figure 1. Overview of the updated Diatom EST Database.

The updated dataset and the accompanying results are stored on upgraded servers running Debian Linux ‘etch’: a DELL1850 hosting the relational database PostgreSQL 8.3, and a DELL1855 hosting the web server Apache 2.0 with PHP 5. The relational database was migrated to PostgreSQL for faster access and to enable the dynamic clustering of expression data. The new web interface is also linked to the gene models on the JGI diatom genome browsers (http://genome.jgi-psf.org/Thaps3/Thaps3.home.html and http://genome.jgi-psf.org/Phatr2/Phatr2.home.html), which gives the user direct access to annotation and gene structure for each TU (Figure 1G).

DATABASE CONTENTS AND WEB INTERFACE

The database provides access to details of each cDNA library and corresponding growth conditions (Figure 1A). The raw sequences are catalogued by library and each raw sequence table gives access to DNA sequence, length and BLAST output. These tables also provide links to the TU that each sequence belongs to. The contig tables give access to the TU of each library, catalogued based on the abundance of ESTs in each condition (Figure 1C). The cluster size of each TU is linked to the dynamically generated frequency plot (Figure 1E), which enables comparison of expression levels in the other libraries. This table also shows R-values and the best BLAST results. The expression of each TU across all the libraries can be accessed by two different methods (Figure 1D), either in tabular form (Figure 1H) or as a hierarchical cluster visualized using Java Treeview (Figure 1F). The tabular view gives access to all TUs expressed more than once in any given condition and they are catalogued based on cluster size, which is again linked to each frequency plot (Figure 1E). This table also provides a link to the ortholog if present in the other diatom and its expression profile, as well as the corresponding gene models hyperlinked to the genome databases hosted at JGI, providing access to further functional annotation and visualization of neighbouring genes (Figure 1G). The Java Treeview visualizes the two-way hierarchical clustering of all the transcripts which are expressed more than once, helping to identify libraries that cluster together and transcripts with similar expression patterns. The annotations for each TU are hyperlinked to the frequency plots and to the JGI genome browsers. The new web interface is inspired by Google, having a simplified, self-explanatory look and easy retrieval of data. 
The database is queryable by keyword, based on annotation from homology search methods and the TU identifier, and sequence retrieval is possible by using either the sequence identifier or TU identifier. Homology searches, using BLAST against each library and the total non-redundant sets are also available via the web interface.
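The hierarchical clustering of expression profiles mentioned in this section can be illustrated with a small, dependency-free Python sketch. The choice of distance (1 minus the Pearson correlation) and of average linkage is an assumption made for illustration; the text cites (30) for the method but does not state the settings used. The TU identifiers and frequencies below are invented.

```python
import math
from itertools import combinations


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order made.

    profiles: dict mapping a TU identifier to its per-library frequencies.
    Each merge is (members_a, members_b, average_distance).
    """
    def linkage(pair):
        a, b = pair
        dists = [pearson_distance(profiles[p], profiles[q])
                 for p in a for q in b]
        return sum(dists) / len(dists)

    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


profiles = {
    "G10065": [0.0, 0.001, 0.004],
    "C12": [0.0, 0.002, 0.008],    # same shape as G10065, twice the level
    "S7": [0.005, 0.001, 0.0],     # opposite trend
}
merges = average_linkage(profiles)
for members_a, members_b, dist in merges:
    print(members_a, members_b, round(dist, 3))
```

Because the distance depends on the shape of the profile and not its level, the two TUs with proportional profiles merge first. Clustering libraries instead of transcripts, the second direction of a two-way clustering, would apply the same function to the transposed table.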

FUTURE DIRECTIONS

The diatom genomic repository is rapidly expanding with several sequencing projects. For example, the genomes of two additional pennate diatoms, Pseudo-nitzschia multiseries and F. cylindrus, are currently nearing completion at JGI, together with accompanying EST collections. The database analysis and pipeline described here are semi-automated and can easily incorporate these and other data sets from diatoms and related species. Pilot microarray projects in T. pseudonana and P. tricornutum have already provided experimental validation for this EST-based digital transcriptomics database under some conditions (29,32) and possibilities to link microarray based studies to the existing database are currently being explored, as is the incorporation of transcriptomics data from massively parallel sequencing platforms. Reverse genetics studies are providing additional experimental validation for the expression, localization and functions of individual TUs (33) and so information derived from the database can also be used to train the gene prediction programmes to improve in silico gene annotation in diatoms and related organisms.

AVAILABILITY

The Diatom EST Database is freely available on the web at http://www.biologie.ens.fr/diatomics/EST3. The P. tricornutum ESTs have been submitted to the NCBI dbEST (GenBank accession numbers CD374840–CD384835, BI306757–BI307753, CT868744–CT950687 and CU695349–CU740080). Requests for bulk queries of the expression data and to house EST data from other diatoms can be addressed to Dr Chris Bowler.

FUNDING

Partial funding for the Diatom EST Database was obtained from the EU-funded Diatomics (LSHG-CT-2004-512035) and Marine Genomics Europe (GOCE-CT-2004-505403) projects and the Agence Nationale de la Recherche. P. tricornutum ESTs were funded by Genoscope (Evry, Paris). Generation of T. pseudonana ESTs was funded by a Gordon and Betty Moore Foundation Marine Microbiology Investigator Award (EVA). Funding for open access charge: Centre National de la Recherche Scientifique.

Conflict of interest statement. None declared.
  28 in total

1.  Java Treeview--extensible visualization of microarray data.

Authors:  Alok J Saldanha
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

2.  Global biodiversity patterns of marine phytoplankton and zooplankton.

Authors:  Xabier Irigoien; Jef Huisman; Roger P Harris
Journal:  Nature       Date:  2004-06-24       Impact factor: 49.962

3.  The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism.

Authors:  E Virginia Armbrust; John A Berges; Chris Bowler; Beverley R Green; Diego Martinez; Nicholas H Putnam; Shiguo Zhou; Andrew E Allen; Kirk E Apt; Michael Bechner; Mark A Brzezinski; Balbir K Chaal; Anthony Chiovitti; Aubrey K Davis; Mark S Demarest; J Chris Detter; Tijana Glavina; David Goodstein; Masood Z Hadi; Uffe Hellsten; Mark Hildebrand; Bethany D Jenkins; Jerzy Jurka; Vladimir V Kapitonov; Nils Kröger; Winnie W Y Lau; Todd W Lane; Frank W Larimer; J Casey Lippmeier; Susan Lucas; Mónica Medina; Anton Montsant; Miroslav Obornik; Micaela Schnitzler Parker; Brian Palenik; Gregory J Pazour; Paul M Richardson; Tatiana A Rynearson; Mak A Saito; David C Schwartz; Kimberlee Thamatrakoln; Klaus Valentin; Assaf Vardi; Frances P Wilkerson; Daniel S Rokhsar
Journal:  Science       Date:  2004-10-01       Impact factor: 47.728

4.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

5.  A diatom gene regulating nitric-oxide signaling and susceptibility to diatom-derived aldehydes.

Authors:  Assaf Vardi; Kay D Bidle; Clifford Kwityn; Donald J Hirsh; Stephanie M Thompson; James A Callow; Paul Falkowski; Chris Bowler
Journal:  Curr Biol       Date:  2008-06-05       Impact factor: 10.834

6.  Expressed sequence tags with cDNA termini: previously overlooked resources for gene annotation and transcriptome exploration in Chlamydomonas reinhardtii.

Authors:  Chun Liang; Yuansheng Liu; Lin Liu; Adam C Davis; Yingjia Shen; Qingshun Quinn Li
Journal:  Genetics       Date:  2008-05       Impact factor: 4.562

7.  The Phaeodactylum genome reveals the evolutionary history of diatom genomes.

Authors:  Chris Bowler; Andrew E Allen; Jonathan H Badger; Jane Grimwood; Kamel Jabbari; Alan Kuo; Uma Maheswari; Cindy Martens; Florian Maumus; Robert P Otillar; Edda Rayko; Asaf Salamov; Klaas Vandepoele; Bank Beszteri; Ansgar Gruber; Marc Heijde; Michael Katinka; Thomas Mock; Klaus Valentin; Fréderic Verret; John A Berges; Colin Brownlee; Jean-Paul Cadoret; Anthony Chiovitti; Chang Jae Choi; Sacha Coesel; Alessandra De Martino; J Chris Detter; Colleen Durkin; Angela Falciatore; Jérome Fournet; Miyoshi Haruta; Marie J J Huysman; Bethany D Jenkins; Katerina Jiroutova; Richard E Jorgensen; Yolaine Joubert; Aaron Kaplan; Nils Kröger; Peter G Kroth; Julie La Roche; Erica Lindquist; Markus Lommer; Véronique Martin-Jézéquel; Pascal J Lopez; Susan Lucas; Manuela Mangogna; Karen McGinnis; Linda K Medlin; Anton Montsant; Marie-Pierre Oudot-Le Secq; Carolyn Napoli; Miroslav Obornik; Micaela Schnitzler Parker; Jean-Louis Petit; Betina M Porcel; Nicole Poulsen; Matthew Robison; Leszek Rychlewski; Tatiana A Rynearson; Jeremy Schmutz; Harris Shapiro; Magali Siaut; Michele Stanley; Michael R Sussman; Alison R Taylor; Assaf Vardi; Peter von Dassow; Wim Vyverman; Anusuya Willis; Lucjan S Wyrwicz; Daniel S Rokhsar; Jean Weissenbach; E Virginia Armbrust; Beverley R Green; Yves Van de Peer; Igor V Grigoriev
Journal:  Nature       Date:  2008-10-15       Impact factor: 49.962

8.  Genome properties of the diatom Phaeodactylum tricornutum.

Authors:  Simona Scala; Nicolas Carels; Angela Falciatore; Maria Luisa Chiusano; Chris Bowler
Journal:  Plant Physiol       Date:  2002-07       Impact factor: 8.340

9.  Whole-cell response of the pennate diatom Phaeodactylum tricornutum to iron starvation.

Authors:  Andrew E Allen; Julie Laroche; Uma Maheswari; Markus Lommer; Nicolas Schauer; Pascal J Lopez; Giovanni Finazzi; Alisdair R Fernie; Chris Bowler
Journal:  Proc Natl Acad Sci U S A       Date:  2008-07-24       Impact factor: 11.205

10.  A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis.

Authors:  Peter G Kroth; Anthony Chiovitti; Ansgar Gruber; Veronique Martin-Jezequel; Thomas Mock; Micaela Schnitzler Parker; Michele S Stanley; Aaron Kaplan; Lise Caron; Till Weber; Uma Maheswari; E Virginia Armbrust; Chris Bowler
Journal:  PLoS One       Date:  2008-01-09       Impact factor: 3.240

