Richard Barker, Jonathan Lombardino, Kai Rasmussen, Simon Gilroy.
Abstract
Recent advances in the routine access to space along with increasing opportunities to perform plant growth experiments on board the International Space Station have led to an ever-increasing body of transcriptomic, proteomic, and epigenomic data from plants experiencing spaceflight. These datasets hold great promise to help understand how plant biology reacts to this unique environment. However, analyses that mine across such expanses of data are often complex to implement, being impeded by the sheer number of potential comparisons that are possible. Complexities in how the output of these multiple parallel analyses can be presented to the researcher in an accessible and intuitive form provide further barriers to such research. Recent developments in computational systems biology have led to rapid advances in interactive data visualization environments designed to perform just such tasks. However, to date none of these tools have been tailored to the analysis of the broad-ranging plant biology spaceflight data. We have therefore developed the Test Of Arabidopsis Space Transcriptome (TOAST) database (https://astrobiology.botany.wisc.edu/astrobotany-toast) to address this gap in our capabilities. TOAST is a relational database that uses the Qlik database management software to link plant biology, spaceflight-related omics datasets, and their associated metadata. This environment helps visualize relationships across multiple levels of experiments in an easy-to-use, gene-centric platform. TOAST draws on data from the US National Aeronautics and Space Administration's (NASA's) GeneLab and other data repositories and also connects results to a suite of web-based analytical tools to facilitate further investigation of responses to spaceflight and related stresses.
The TOAST graphical user interface allows for quick comparisons between plant spaceflight experiments using real-time, gene-specific queries, or by using functional gene ontology, Kyoto Encyclopedia of Genes and Genomes pathway, or other filtering systems to explore genetic networks of interest. Testing of the database shows that TOAST confirms patterns of gene expression already highlighted in the literature, such as revealing the modulation of oxidative stress-related responses across multiple plant spaceflight experiments. However, this data exploration environment can also drive new insights into patterns of spaceflight responsive gene expression. For example, TOAST analyses highlight changes to mitochondrial function as likely shared responses in many plant spaceflight experiments.
Keywords: Arabidopsis thaliana; Qlik; RNAseq; bioinformatics; microarray; proteomics; spaceflight; transcriptomics
Year: 2020 PMID: 32265943 PMCID: PMC7076552 DOI: 10.3389/fpls.2020.00147
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Publicly available spaceflight transcriptomics datasets. (A) Relationships between species, ecotype, genotype (i.e., mutant or wild type), growth environment, and assay technique for datasets from plants experiencing spaceflight. (B) Relationships between species, ecotype, and genotype versus the tissue or organ type that was sampled to generate the spaceflight-related transcriptomics dataset. Col-0, Columbia ecotype of Arabidopsis thaliana; Ws, Wassilewskija ecotype; Ler, Landsberg erecta ecotype; Cvi, Cape Verde Islands ecotype; mutants of Arabidopsis: phyD, Phytochrome D; arg, Altered Response to Gravity; act2, Actin 2; hsfa2, heat shock factor a2; Wt, wild type; BRIC, Biological Research in Canister; ABRS, Advanced Biological Research System; EMCS, European Modular Cultivation System.
Figure 2. Database structure underlying TOAST 4.5. Each dataset within TOAST includes a series of pre-computed factors for each gene: minimally including fold-change, P-value, Q-value, and a yes/no value for whether the fold-change for each gene is significant at P < 0.05. These pre-computed values greatly speed the real-time processing of interactive visualizations within the TOAST user interface. The identifiers in the raw data, such as Transcript ID from RNAseq, Probe ID for Microarray, or TAIR ID, are translated to their unique Entrez and Ensembl IDs to allow for uniform indexing within TOAST itself and to facilitate passing of analyzed data produced by TOAST analyses to exterior sites and tools. Within TOAST, the strings of molecular IDs from a dataset are both directly transferred to a series of data visualization and exploration tools and imported into a series of analytical packages accessing a range of databases that have been imported into the TOAST environment. These databases include: the Gene Ontology (GO) consortium databases that allow analysis of the relationships between gene lists of interest and known biological processes, the SUBA4 database which catalogs predicted subcellular locales for each gene, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database that analyzes relationships to known cellular pathways, and Ensembl's Orthologous Matrix database, allowing TOAST to make comparisons between species. The outputs of these analytical modules are then passed to TOAST's interactive data visualization tools to help explore each dataset. Results from the visualizations are in turn returned as lists of Gene IDs to allow for reiterative analyses.
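
The per-gene factors described for Figure 2 can be illustrated with a minimal Python sketch. It is not TOAST's own code: the function names, the record layout, the use of a log2 fold-change, and the choice of the Benjamini-Hochberg procedure for the Q-value are all assumptions made for illustration. Only the yes/no flag at P < 0.05 is taken from the caption.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order.

    Assumption: the source only says the Q-value is an adjusted P-value;
    Benjamini-Hochberg is one common choice, used here as an example.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the
    # adjusted values never decrease as the raw P-value increases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q_values[i] = running_min
    return q_values


def precompute(records, alpha=0.05):
    """Build one row of pre-computed factors per gene.

    records: list of (gene_id, mean_flight, mean_ground, p_value) tuples
    (a hypothetical layout, not TOAST's schema).
    """
    q_values = benjamini_hochberg([r[3] for r in records])
    table = []
    for (gene_id, flight, ground, p), q in zip(records, q_values):
        table.append({
            "gene_id": gene_id,
            "log2_fc": math.log2(flight / ground),
            "p_value": p,
            "q_value": q,
            # Yes/no significance flag at P < alpha, as in the caption.
            "significant": "yes" if p < alpha else "no",
        })
    return table
```

Storing these values once per dataset means an interactive view only has to filter rows rather than recompute statistics on every user action.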
Figure 3. The TOAST 4.5 user interface. (A) The web interface for TOAST launches an overview menu of dashboard icons allowing the user to directly access the introductory materials, omics data, and related analysis tools. (B) Each icon provides a visual summary of the data or tools that it links to, including elements such as spaceflight vehicle (e.g., Shuttle, ISS, Shenzhou vs ground-based experimentation), the growth hardware used, plant/seedling vs cell culture experiment, RNAseq vs microarray vs proteomics, species and ecotype, and dataset identifier (e.g., GLDS number).
Figure 4. Graphical user interface for a typical dataset. Clicking on a volcano plot also activates an interactive graphical tool for manual selection of groups of genes of interest. *Defaults to showing 33.43K, i.e., all Entrez identifiers, until a filter or gene selection is applied. Inset, a lasso tool allows user selection of data points from the volcano plot in addition to activation of filters such as on significance of change, KEGG Pathway, or GO annotation.
Figure 5. Overview of use of the TOAST 4.5 database. (1a) The user selects an initial study of interest and then can review the summary of its metadata to ensure it is the correct focus for study (1b). The dataset is then opened and (2) when the study is selected an interactive dashboard launches and the user has a direct link to any associated manuscript. Gene filtering: statistical (3a), gene ontology (3b), and other related functional filters can be applied to focus the number of loci being visualized in the volcano plot (3c) to genes of interest. In addition, the volcano plot itself can be interactively and manually filtered using a graphical selection tool. All filters can be toggled on and off using selectable tabs at the top of the interface (3d). If an interesting subset of loci is selected, the user can activate the download option (4a) and save the related data in Word or XML format (4b). (5) The user can also perform further bioinformatic and statistical analysis with other online tools linked from the main user interface.
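
The filtering steps in Figure 5 (statistical, gene ontology, then volcano plot) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Qlik implementation behind TOAST: the row layout, the function names, and the fold-change threshold are hypothetical, while the P < 0.05 cutoff follows the Figure 2 caption.

```python
import math


def volcano_points(rows):
    """Map each gene to volcano plot coordinates.

    x is the log2 fold-change and y is -log10(P-value), so strongly and
    significantly changed genes sit in the upper left and upper right.
    """
    return {
        r["gene_id"]: (r["log2_fc"], -math.log10(r["p_value"]))
        for r in rows
    }


def apply_filters(rows, alpha=0.05, min_abs_log2_fc=1.0, go_term=None):
    """Return the gene IDs that pass every active filter.

    alpha:            statistical filter (P-value cutoff).
    min_abs_log2_fc:  hypothetical fold-change threshold.
    go_term:          optional GO identifier; None switches the filter off,
                      mirroring a filter tab being toggled off.
    """
    selected = []
    for r in rows:
        if r["p_value"] >= alpha:
            continue
        if abs(r["log2_fc"]) < min_abs_log2_fc:
            continue
        if go_term is not None and go_term not in r["go_terms"]:
            continue
        selected.append(r["gene_id"])
    return selected
```

Each row is assumed to be a dictionary with `gene_id`, `log2_fc`, `p_value`, and a set of `go_terms`. Calling `apply_filters(rows, go_term="GO:0005739")`, for example, would keep only significant genes annotated to the mitochondrion.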
Figure 6. Analysis of metadata within TOAST 4.5. (A) Initial dashboards allow access to comparisons between a range of experiment-related factors such as lighting conditions, growth environment, and plant genotypes. (B) A typical dashboard for metadata exploration, in this case for light conditions and age of seedling. Preset filters for, e.g., lab group performing the research and growth and radiation environments are available to the user, and the identity of the filtered datasets is shown in the bottom left window.
Figure 7. TOAST confirms the “high light early” ROS response from spaceflight data. The “high light early” clade in the ROS wheel analysis represents 8.87K transcripts from a total of 21.33K transcripts detected, or 41.5% of all transcripts.
Figure 8. Analysis of mitochondrion-related genes altered by spaceflight. (A) Screenshot depicting an example of a user's interaction with the TOAST graphical user interface to define mitochondrion-related transcripts. (B) Using TOAST for iterative filtering of differentially expressed genes across multiple spaceflight studies where plants were light grown. (C) More extensive analysis of the studies in (B) using differentiation within the individual datasets for different analytical approaches (microarray vs RNAseq) and for different analysis periods (4 days vs 8 days). (D) Similar analysis but for dark grown plant samples. (E) The effects of spaceflight on the alternative oxidase gene family in dark grown samples. Maximum likelihood tree of AOX gene family generated using ClustalW alignment with Mega-X software (www.megasoftware.net). Venn diagrams plotted using jvenn (Bardou et al., 2014).
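
The cross-study comparison summarized by the Venn diagrams in Figure 8 amounts to counting how many studies each differentially expressed gene appears in. The sketch below shows that idea with plain Python sets. The function name, the `min_studies` parameter, and the study labels in the usage note are hypothetical and do not correspond to TOAST internals or to real GeneLab datasets.

```python
from collections import Counter


def shared_genes(study_hits, min_studies=None):
    """Return genes flagged as differentially expressed in several studies.

    study_hits:  dict mapping a study label to an iterable of gene IDs.
    min_studies: minimum number of studies a gene must appear in;
                 defaults to all studies (the central Venn region).
    """
    if min_studies is None:
        min_studies = len(study_hits)
    # set() guards against a gene being listed twice within one study.
    counts = Counter(
        gene for genes in study_hits.values() for gene in set(genes)
    )
    return {gene for gene, n in counts.items() if n >= min_studies}
```

For example, with three placeholder studies `{"study_1": {"A", "B", "C"}, "study_2": {"B", "C"}, "study_3": {"C", "D"}}`, the default call returns the genes common to all three, and `min_studies=2` relaxes the requirement to any two.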
| Acronym/term | Name | Definition | Reference |
|---|---|---|---|
| ABRS | Advanced Biological Research System | NASA on-orbit growth facility that provided LED lighting and sample photography | ( |
| Affymetrix microarray | – | Microarray to monitor patterns of gene expression produced by Affymetrix Inc. | – |
| AGI | Arabidopsis Genome Initiative | Consortium of researchers studying the genome of Arabidopsis thaliana | ( |
| AGRIS AtTFD | AGRIS Arabidopsis Transcription Factor database | A searchable database of ~1770 Arabidopsis transcription factors | ( |
| ATTED II | – | A database cataloging plant gene co-expression data | ( |
| BAM | Binary compressed sequence Alignment Map | A file containing information on the alignment of each read from a DNA sequencing machine to the genome of a target organism | – |
| BRIC | Biological Research in Canister | Spaceflight hardware allowing for plant growth on orbit. Samples are sealed prior to launch. Lighting provided only in the BRIC-LED version | |
| CATdb | – | A repository of transcriptome data for Arabidopsis | ( |
| CATMA microarray | Complete Arabidopsis Transcript MicroArray | Microarray to monitor patterns of gene expression using technology developed by the European CATMA initiative. | ( |
| CPM | counts per million reads mapped | In RNAseq: the counts of number of reads per gene scaled to the number of fragments sequenced. Unlike FPKM (see below), this value is not normalized for the effects of gene length or amount of sequencing on count number per gene. | – |
| CyVerse | – | A cloud computing infrastructure supported through the National Science Foundation's Directorate of Biological Sciences. | ( |
| D3 JavaScript | – | A library of routines for the JavaScript programming language that enables interactive data visualizations within a web browser. | – |
| DESeq | – | An analysis tool for calculating differential gene expression. | ( |
| EdgeR | Empirical analysis of Digital Gene Expression in R | An analysis tool for calculating differential gene expression. | ( |
| eFP-Seq Browser | – | An RNA-seq data exploration and visualization tool. | ( |
| EMBL EBI Expression atlas | – | A database of patterns of gene expression under different conditions. Maintained by The European Molecular Biology Laboratory's (EMBL) European Bioinformatics Institute (EBI). | ( |
| EMCS | European Modular Cultivation System | On-orbit growth hardware developed by the European Space Agency. Provides an on-board centrifuge, video and lighting, temperature and atmospheric control. | ( |
| Ensembl | – | A database of genome-related information maintained by the European Bioinformatics Institute and the Wellcome Trust Sanger Institute. | ( |
| Entrez | – | The US National Center for Biotechnology Information (NCBI)'s database for gene-specific information | ( |
| ePlant | – | A portal that provides access to multiple web services to download genome-level data on plant genes. | ( |
| Expression Angler | – | A tool that finds other genes with similar expression patterns to a gene of interest. | ( |
| FASTQ file | – | File containing the nucleotide sequences identified by next generation nucleotide sequencing machines | – |
| FPKM | Fragments Per Kilobase of transcript per Million mapped reads | An estimate of gene expression based on RNA-sequencing data that is normalized for gene length and the amount of sequencing (longer and more heavily sequenced genes will naturally produce more reads independent of their expression level). | – |
| Gene Symbol | – | Commonly used gene name, such as AOX1A to denote the Arabidopsis gene Alternative Oxidase 1A | – |
| GeneLab | – | A repository for spaceflight-related ‘omics-level data administered by the US National Aeronautics and Space Administration (NASA). | – |
| Genemania | Gene Multiple Association Network Integration Algorithm | A tool that generates a single functional interaction network for a gene of interest drawing on multiple data sources. | ( |
| Genevisible | – | A search portal to curated expression data from the GENEVESTIGATOR database | ( |
| GEO | Gene Expression Omnibus | A functional genomics data repository administered by US National Center for Biotechnology Information (NCBI). | – |
| GLDS | GeneLab Dataset | Unique identifier of a dataset (usually microarray, RNAseq or proteomics data) deposited in NASA's GeneLab data repository | – |
| GO | Gene Ontology | Descriptive terms drawn from a standard set that classify genes dependent on their relationships to biological processes or functions or subcellular locales. | ( |
| GO Enrichment analysis | Gene Ontology Enrichment analysis | Statistical analysis of dataset as to whether there is an over-representation of genes associated with a particular biological process or function, or cellular locale relative to that expected from a random selection of the same number of genes. | ( |
| HZE | – | High-charge, high-energy radiation. | – |
| iDEP | integrated Differential Expression and Pathway analysis | Software package for the R programming language designed to process genetic data. | ( |
| KEGG | Kyoto Encyclopedia of Genes and Genomes | A widely used database that categorizes genes into the cellular pathways in which they are involved. | ( |
| Metadata | – | Additional data about parameters and conditions that adds to the description of each experiment and provides context for interpreting results. | – |
| microRNA annotation TAIR10 | – | A database of microRNAs predicted in the genome of Arabidopsis thaliana | ( |
| NCBI | The National Center for Biotechnology Information | Part of the National Library of Medicine that is run by the US National Institutes of Health. This unit maintains a series of databases relevant to biological research | – |
| NCBI PubMed | – | Online aggregator of scientific publications curated by NCBI | – |
| OM | Orthologous Matrix | A table linking gene identifiers in one species to orthologous genes in a different species | |
| Ortholog | – | Related genes between species that originated from a common ancestral gene prior to speciation | – |
| P-value vs Q-value | – | In transcriptomics: P-value is a measure of the statistical significance of a gene's differential expression when comparing between treatments; Q-value is an adjusted P-value that takes into account the cumulative effect of making multiple comparisons (tests of significance) within a dataset, such as across many genes. | |
| Promomer | – | A tool for identifying promoter elements | ( |
| Qlik | – | Database management software | – |
| R | – | Programming language widely used in the statistical analysis of scientific data. | – |
| R-Shiny | – | An R software package that allows for easy development of interactive web-based applications. | – |
| R-studio | – | Commercially produced software that aids with the development of programs using R. | – |
| Reactome | – | A curated and peer-reviewed molecular pathway database | ( |
| RMA | Robust Multi-array Average | An algorithm used to normalize microarray data between multiple microarray chips | ( |
| RNA-seq | – | High-throughput sequencing of RNA. | – |
| ROS-wheel | – | A meta-analysis of many publicly available microarray experiments related to responses to reactive oxygen species (ROS) and oxidative stress. | ( |
| SIMBOX | Science In Microgravity BOX | An on-orbit experiment facility developed by the German Aerospace Center's (DLR) Space Administration. Contains an internal centrifuge and lighting and temperature control. | ( |
| STRING | – | A database and web tool for visualizing protein:protein interaction networks. | ( |
| SUBA4 | The SUBcellular location database for Arabidopsis | Database of predicted subcellular locations for a given gene product. | ( |
| TAIR | The Arabidopsis Information Resource | A database of genetic and molecular biology data focused on Arabidopsis thaliana | ( |
| TAIR9/TAIR10 | The Arabidopsis Genome Annotation Version 9 or 10 | Annotated versions of the sequenced Arabidopsis genome produced by TAIR. Each successive version has used newer information to improve the annotation of the entire genome. | – |
| Thalemine | – | A data warehouse aggregating many genomic tools and datasets for Arabidopsis thaliana | ( |
| TOAST | Test Of Arabidopsis Space Transcriptome database | A relational database that compares plant biology, spaceflight-related omics datasets and their associated metadata. | – |
| Veggie | – | NASA's Vegetable Production System; an ISS-based growth hardware providing LED lighting. | ( |
| Volcano plot | – | A scatter plot of data. For the microarray and RNAseq data in TOAST the volcano plot presents fold-change per gene ID plotted versus statistical significance for each data point. | |
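
The CPM and FPKM entries in the table above can be made concrete with a short Python sketch using the standard textbook formulas. It is an illustration only and not the normalization pipeline used for any TOAST dataset. The function names and the input layout (a dictionary of raw counts per gene and a dictionary of gene lengths in base pairs) are assumptions.

```python
def cpm(counts):
    """Counts per million: each gene's count scaled by the library total."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments.

    Adds a gene length term to the CPM scaling:
        FPKM = count * 1e9 / (total_count * length_in_bp)
    where 1e9 combines the per-million (1e6) and per-kilobase (1e3) factors.
    """
    total = sum(counts.values())
    return {
        gene: c * 1e9 / (total * lengths_bp[gene])
        for gene, c in counts.items()
    }
```

Because FPKM divides by gene length, two genes with equal counts but different lengths receive different FPKM values, whereas their CPM values are identical.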