Taylor Falk¹, Nic Herndon¹, Emily Grau¹, Sean Buehler¹, Peter Richter¹, Sumaira Zaman¹, Eliza M Baker¹, Risharde Ramnath¹, Stephen Ficklin², Margaret Staton³, Frank A Feltus⁴, Sook Jung², Doreen Main², Jill L Wegrzyn¹.
Abstract
Forest trees are valued sources of pulp, timber and biofuels, and they play a role in carbon sequestration, biodiversity maintenance and watershed stability. Examining the relationships among genetic, phenotypic and environmental factors for these species provides insight into areas of concern for breeders and researchers alike. The TreeGenes database is a web-based repository that is home to 1790 tree species and over 1500 registered users. The database provides a curated archive for high-throughput genomics, including reference genomes, transcriptomes, genetic maps and variant data. These resources are paired with extensive phenotypic information and environmental layers. TreeGenes recently migrated to Tripal, an integrated and open-source database schema and content management system. This migration enabled developments focused on data exchange, data transfer and improved analytical capacity, and it gave TreeGenes the opportunity to communicate with three partner databases: Hardwood Genomics Web, the Genome Database for Rosaceae and the Citrus Genome Database. Recent development in TreeGenes has focused on coordinating information for georeferenced accessions, including metadata acquisition and ontological frameworks, to improve integration across studies that combine genetic, phenotypic and environmental data. This focus was paired with the development of tools for comparative genomics and data visualization. By combining advanced data importers, relevant metadata standards and integrated analytical frameworks, TreeGenes provides a platform for researchers to store, submit and analyze forest tree data.
Year: 2018 PMID: 30239664 PMCID: PMC6146132 DOI: 10.1093/database/bay084
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Phylogenetic representation of the forest tree orders held in the TreeGenes database. The orders are listed, and the number of species in each is given in parentheses. Cyan branches represent gymnosperm orders; green branches represent angiosperm orders.
Figure 2. The internal design of Chado and its connections to other areas of TreeGenes. Items in green boxes are part of the structure of the Chado relational database. Items in blue are used exclusively by CartograTree, although the module also pulls data from multiple parts of the database. Sequence data are displayed or manipulated by a number of tools: those developed by TreeGenes are dark green, and those implemented by other groups are light green.
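Chado, the relational schema at the center of Figure 2, links sequence features to organisms and to controlled-vocabulary terms. As a rough illustration only, the sketch below reproduces a tiny subset of three Chado tables (organism, cvterm and feature, with just a few of their columns) in an in-memory SQLite database instead of PostgreSQL, and runs the kind of join a display tool might use to list the features of one species. The helper name `features_for_species` and all inserted rows are hypothetical, not TreeGenes code or data.

```python
import sqlite3

# Hypothetical, heavily reduced subset of three Chado tables. Real Chado runs
# on PostgreSQL, and these tables carry many more columns and constraints.
SCHEMA = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    genus       TEXT NOT NULL,
    species     TEXT NOT NULL
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism (organism_id),
    type_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    uniquename  TEXT NOT NULL,
    seqlen      INTEGER
);
"""


def features_for_species(conn, genus, species):
    """Return (uniquename, feature type, seqlen) rows for one species."""
    query = """
        SELECT f.uniquename, t.name, f.seqlen
        FROM feature AS f
        JOIN organism AS o ON o.organism_id = f.organism_id
        JOIN cvterm AS t ON t.cvterm_id = f.type_id
        WHERE o.genus = ? AND o.species = ?
        ORDER BY f.uniquename
    """
    return conn.execute(query, (genus, species)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Illustrative rows only; these are not TreeGenes records.
conn.execute("INSERT INTO organism VALUES (1, 'Pinus', 'taeda')")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "gene"), (2, "mRNA")])
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, "gene_0001", 1200), (2, 1, 2, "mRNA_0001", 950)],
)
print(features_for_species(conn, "Pinus", "taeda"))
```

Typing each feature through a cvterm, rather than a fixed column of type names, is what lets a schema like Chado hold genes, transcripts, markers and other feature kinds in one table.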
Figure 3. Screen captures of the main TreeGenes webpages. (A) The TreeGenes home page, with links to tools such as JBrowse and CartograTree as well as to download and search pages. (B) The JBrowse landing page, which provides access to the viewer as well as associated flat files and short-read alignment indexes.
Figure 4. Conceptual overview of the architecture (hardware and software) underlying the TreeGenes database. There are three distinct servers: database, web and application. The database server runs PostgreSQL and houses Chado and GeoServer. It connects to the web server, which houses Tripal, Drupal and the NGINX web server software, as well as FTP access through h5ai. ElasticSearch is installed as an extension module to provide access to indexed content in the database and to search indexed content in the partner databases (Genome Database for Rosaceae and Hardwood Genomics Web). The web server connects to the application server, which hosts Galaxy and gives TreeGenes users access to computational resources.
Environmental data available in TreeGenes. For each data set, the table shows the provider, the name of the layer (with a footnote giving its source), a short description, the format in which it is stored and how it can be accessed in TreeGenes.
| Provider | Environmental layer | Description | Storage format | Access in TreeGenes |
|---|---|---|---|---|
| WorldClim v2 ( | Minimum temperature¹ | Average monthly climate data for 1970–2000 | GeoTIFF files | Raster layers in CartograTree |
| | Maximum temperature² | | | |
| | Average temperature³ | | | |
| | Precipitation⁴ | | | |
| | Solar radiation⁵ | | | |
| | Wind speed⁶ | | | |
| | Water vapor pressure⁷ | | | |
| Conservation Biology Institute ( | Harmonized World Soil Database⁸ | Composition of 15,773 soil mapping units and standardized soil parameters for topsoil and subsoil | PostGIS table | Vector layer in CartograTree |
| US Forest Service ( | Species range maps⁹ | Distribution maps of 135 tree species | Shapefiles | Vector layers in CartograTree |
| US Geological Survey (43) | Tree cover¹⁰ | Estimates of tree cover and bare ground | GeoTIFF files | Raster layer in CartograTree |
| Global Aridity and PET Database ( | Annual global aridity¹¹ | Data related to evapotranspiration processes and rainfall deficit for potential vegetative growth | GeoTIFF file | Raster layer in CartograTree |
| NASA (44) | Canopy height¹² | The height of the world’s forests | GeoTIFF file | Raster layer in CartograTree |
| Intact Forest Landscapes (45) | Intact Forest Landscapes¹³ | The extent of the intact forest landscapes | Shapefiles | Vector layers in CartograTree |

¹ http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_tmin.zip
² http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_tmax.zip
³ http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_tavg.zip
⁴ http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_prec.zip
⁵ http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_srad.zip
⁶ http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_wind.zip
⁷ http://biogeo.ucdavis.edu/data/worldclim/v2.0/tif/base/wc2.0_30s_vapr.zip
⁸ http://www.arcgis.com/home/item.html?id=1d16ed2a0aa24ab39e5ee6c491965883
⁹ https://www.fs.fed.us/nrs/atlas/littlefia/
¹⁰ https://landcover.usgs.gov/glc/
¹¹ http://www.cgiar-csi.org/data/global-aridity-and-pet-database
¹² https://landscape.jpl.nasa.gov/
¹³ http://www.intactforests.org/data.ifl.html
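The WorldClim files listed above are 30-arc-second ("30s") rasters. Placing a georeferenced accession on such a layer first requires converting its longitude and latitude to a pixel row and column. The sketch below shows that arithmetic under assumptions that should be checked against each file's own georeferencing (for example, the geotransform that GDAL reports): a global, north-up grid whose origin is the north-west corner (−180°, 90°), with square cells of 1/120°. The function name `pixel_for` is hypothetical and is not part of CartograTree.

```python
import math

# Assumed geometry of a global 30-arc-second raster: north-up, origin at the
# north-west corner (lon -180, lat 90), square cells of 1/120 degree.
CELLS_PER_DEGREE = 120
N_COLS = 360 * CELLS_PER_DEGREE  # 43,200 columns
N_ROWS = 180 * CELLS_PER_DEGREE  # 21,600 rows


def pixel_for(lon, lat):
    """Map a longitude/latitude in decimal degrees to a (row, col) cell index."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinate out of range: ({lon}, {lat})")
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    # Points lying exactly on the east or south edge would index one cell past
    # the grid, so clamp them into the last column or row.
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)


print(pixel_for(-72.25, 41.75))  # -> (5790, 12930)
```

Reading the environmental value itself would then be a lookup of that cell in the decoded raster array; rows count downward from the north edge because the grid is assumed to be north-up.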
Figure 5. Submission and data acquisition avenues for the TreeGenes database. Clockwise from top left: TreeGenes automatically imports literature data from Web of Science, sequence and genotypic information from GenBank and georeferenced phenotypic data from TreeSnap; users can submit genotypic, phenotypic, environmental and literature data through the TPPS platform; users can also submit genetic maps through the CMap pipeline; TreeGenes managers manually import sequence data from Ensembl and 1KP, phenotypic data from TRY-DB and georeferenced genotypic and phenotypic data from Dryad.
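The acquisition routes in Figure 5 can be summarized as a small lookup structure. The sketch below transcribes the caption into Python so that, for a given kind of data, the matching sources and entry modes can be listed. The class `Avenue`, the function `avenues_for` and the data-type labels are shorthand invented for this sketch, not identifiers used by TreeGenes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Avenue:
    source: str            # where the data come from or are submitted through
    mode: str              # "automatic import", "user submission" or "manual import"
    data_types: frozenset  # kinds of data accepted via this route
    georeferenced: bool = False


# Transcribed from the Figure 5 caption; labels are shorthand for this sketch.
AVENUES = [
    Avenue("Web of Science", "automatic import", frozenset({"literature"})),
    Avenue("GenBank", "automatic import", frozenset({"sequence", "genotypic"})),
    Avenue("TreeSnap", "automatic import", frozenset({"phenotypic"}), georeferenced=True),
    Avenue("TPPS", "user submission",
           frozenset({"genotypic", "phenotypic", "environmental", "literature"})),
    Avenue("CMap pipeline", "user submission", frozenset({"genetic map"})),
    Avenue("Ensembl", "manual import", frozenset({"sequence"})),
    Avenue("1KP", "manual import", frozenset({"sequence"})),
    Avenue("TRY-DB", "manual import", frozenset({"phenotypic"})),
    Avenue("Dryad", "manual import", frozenset({"genotypic", "phenotypic"}), georeferenced=True),
]


def avenues_for(data_type):
    """Return (source, mode) pairs through which data_type can enter TreeGenes."""
    return [(a.source, a.mode) for a in AVENUES if data_type in a.data_types]


print(avenues_for("phenotypic"))
```

Keeping the georeferenced flag separate from the data type mirrors the caption, where TreeSnap and Dryad are singled out as sources of georeferenced records.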
Ontology frameworks and statistics for TreeGenes
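| Ontology | Unique terms | Total measures |
|---|---|---|
| Plant Ontology (PO) | 23 | 724,225 |
| Plant Trait Ontology (TO) | 20 | 227,338 |
| Crop Ontology (CO) | 19 | 173,066 |
| Chemical Entities of Biological Interest (ChEBI) | 126 | 42,412 |
| Phenotype And Trait Ontology (PATO) | 21 | 642,326 |
| TreeGenes Ontology (TGDR) | 5 | 1,745 |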
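Summing the ontology table's columns gives 214 unique terms and 1,811,112 measures, and dividing measures by terms shows how heavily each ontology's terms are used: the Plant Ontology averages roughly 31,488 measures per term, while ChEBI averages about 337. The sketch below reproduces this arithmetic from the transcribed values. It assumes the counts can simply be added; if one measurement carries terms from several ontologies, the grand total would count it more than once, and the table does not say whether that happens.

```python
# (unique terms, total measures) per ontology, transcribed from the table above.
ONTOLOGY_COUNTS = {
    "PO": (23, 724_225),
    "TO": (20, 227_338),
    "CO": (19, 173_066),
    "ChEBI": (126, 42_412),
    "PATO": (21, 642_326),
    "TGDR": (5, 1_745),
}


def column_totals(counts):
    """Sum unique terms and total measures across all ontologies."""
    terms = sum(t for t, _ in counts.values())
    measures = sum(m for _, m in counts.values())
    return terms, measures


def measures_per_term(counts):
    """Mean number of measures per unique term, for each ontology."""
    return {name: m / t for name, (t, m) in counts.items()}


print(column_totals(ONTOLOGY_COUNTS))  # -> (214, 1811112)
ratios = measures_per_term(ONTOLOGY_COUNTS)
print(max(ratios, key=ratios.get), min(ratios, key=ratios.get))  # -> PO ChEBI
```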