En-Hua Xia, Fang-Dong Li, Wei Tong, Peng-Hui Li, Qiong Wu, Hui-Juan Zhao, Ruo-Heng Ge, Ruo-Pei Li, Ye-Yun Li, Zheng-Zhu Zhang, Chao-Ling Wei, Xiao-Chun Wan.
Abstract
Tea is among the world's most widely consumed nonalcoholic beverages, with important economic and health benefits. With large-scale omics data sets for the tea plant accumulating rapidly, particularly the released genome sequence, a comprehensive knowledgebase is urgently needed to facilitate the use of these data in molecular breeding. We present the first integrative, purpose-built, web-accessible database for tea plant, the Tea Plant Information Archive (TPIA; http://tpia.teaplant.org). The current release of TPIA uses the comprehensively annotated tea plant genome as its framework and incorporates well-organized transcriptomes, gene expression data (across species, tissues and stresses), orthologs and the characteristic metabolites that determine tea quality. It also hosts transcription factors, polymorphic simple sequence repeats, single nucleotide polymorphisms, correlations between gene expression and metabolite accumulation, manually curated functional genes and globally collected germplasm information. A variety of analytic tools (e.g. JBrowse, BLAST and enrichment analysis) help users to perform further comparative, evolutionary and functional analyses. We present a case application of TPIA that provides novel insights into the variation in phytochemical content across section Thea of genus Camellia under a well-resolved phylogenetic framework. The knowledgebase will serve as a central gateway for the global tea community to better understand tea plant biology, which should largely benefit the whole tea industry.
Keywords: bioinformatics platform; comparative genomics; evolution biology; knowledge database; tea plant
Year: 2019 PMID: 30913342 PMCID: PMC6737018 DOI: 10.1111/pbi.13111
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Data resources in TPIA database (as of 1 January 2019)
| Data type | Entries | Data type | Entries |
|---|---|---|---|
| Assembly | 1 | Assembly of tea plant | 60 |
| Gene models | 33 932 | PacBio SMRT transcriptome | 1 |
| Blast to CSA (Yunkang 10) | 32 708 | Expression studies (developmental stages) | 8 |
| Blast to | 32 122 | Expression studies (under biotic and abiotic stresses) | 5 |
| Annotated to NR database | 33 380 | Assemblies from close relatives | 23 |
| Annotated to PFAM database | 27 262 | Polymorphic EST‐SSR | 1663 |
| Annotated to GO database | 15 138 | SNP genetic variations | 190 363 |
| Annotated to KEGG pathway | 27 267 | | |
| Annotated to KOG groups | 19 230 | Catechins | 26 |
| Annotated to COG groups | 13 757 | PAs | 25 |
| Annotated to InterPro database | 27 262 | Theanine | 26 |
| Transposable elements (Gb) | 1.86 | Caffeine | 25 |
| Transcription factors | 2486 | | |
| Simple sequence repeat (SSR) | 59 765 | Wild | 37 |
| OrthMCL gene family | 15 224 | Breeding varieties | 352 |
| Organelle genome/insertions | 19/269 | Other lines | 731 |
| Experimentally validated genes | 107 | Total | 1120 |
| Genomic synteny | 151 | | |
| CircRNA | 1942 | | |
| Leaves | 2 | microRNA | 767 |
Figure 1. Architecture of the TPIA database. (a) Data source layer; (b) middleware layer; (c) application layer.
Figure 2. Visualization of the major genomic features of tea plant using JBrowse. (a) Overview of the JBrowse instance, which includes 27 tracks in total; (b) functional annotations of tea plant genes using the NCBI NR database (seven annotations in total); (c) expression patterns of tea plant genes in different tissues, under stress (e.g. cold acclimation) or in leaves of other Camellia species; and (d) variations of tea plant genes among 16 representative Camellia species.
Figure 3. Search engine of the TPIA database. Screenshots of the results from (a) locus search (e.g. TEA000001.1); (b) text search (e.g. kinase); (c) function‐based search (e.g. PF00069, protein kinase domain); (d) expression pattern search for 10 genes; and (e) sequence‐based search (e.g. UGT84A22).
Figure 4. Genomewide analysis of tea plant transcription factors. (a) A total of 2486 TFs from 67 families are provided for analysis. A case study of the Alfin‐like gene family includes (b) functional annotation; (c) multiple sequence download; (d) expression patterns in different tissues, under diverse abiotic stresses (cold, drought, salt) and hormone treatment (MeJA), or across 16 Camellia species; (e) correlation with genes and metabolites; and (f) gene structure visualization.
Figure 5. Visualization of tea quality‐related metabolites and pathways. (a) Visualization of EGCG accumulation in eight representative tissues of tea plant and in 15 other Camellia species; (b) the genes whose expression patterns are highly correlated with the accumulation patterns of selected metabolites (e.g. EGCG), with further filtering options, analysis tools and download menus provided; (c) biosynthesis pathways of secondary metabolites; and (d) pathway mapping for caffeine metabolism, in which the tea plant genes in the pathway are highlighted in green and cross‐linked to JBrowse and the KEGG database (https://www.genome.jp/kegg/).
Figure 6. Flexible tools implemented in TPIA. (a) Gene set functional enrichment analysis using KEGG pathways; (b) orthologous groups between tea plant and 11 other representative plant species (e.g. TEA005863.1); (c) correlation analysis between gene expression and metabolite accumulation; (d) ORF finder tool; (e) polymorphic SSR discovery tool; (f) automatic primer designer; and (g) primer blaster (e.g. tea caffeine synthesis gene).
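The correlation analysis between gene expression and metabolite accumulation mentioned for Figure 6(c) can be illustrated with a minimal sketch. The record does not state which correlation coefficient TPIA uses, so Pearson's r is an assumption here. The function names, the gene identifiers and all numeric values are hypothetical and are not taken from TPIA.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        # A constant profile has no defined correlation.
        return float("nan")
    return cov / math.sqrt(ss_x * ss_y)


def correlated_genes(expression, metabolite, threshold=0.9):
    """Return (gene, r) pairs with |r| >= threshold, strongest first."""
    hits = []
    for gene, profile in expression.items():
        r = pearson(profile, metabolite)
        if not math.isnan(r) and abs(r) >= threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression values per tissue (not real TPIA data).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 5.0, 1.0, 5.0],
}
# Hypothetical metabolite content measured in the same four tissues.
metabolite = [10.0, 20.0, 30.0, 40.0]

hits = correlated_genes(expression, metabolite)
```

With these made-up values, `geneA` and `geneB` pass the threshold (perfect positive and negative correlation), while `geneC` is filtered out. A rank-based coefficient such as Spearman's could be substituted if the expression or metabolite data are strongly skewed.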
Figure 7. A case application of TPIA. (a) Phylogenetic tree of section Thea of genus Camellia constructed using 313 high‐quality 1:1 single‐copy orthologous genes from the transcriptome data in TPIA; (b) accumulation dynamics of the characteristic metabolites associated with tea quality under a well‐resolved phylogenetic framework of section Thea; (c) expression patterns of key genes involved in catechin biosynthesis across different Thea groups. The FPKM expression values are centred and scaled using the ‘pheatmap’ package; (d) variation of 22 SCPL1A genes across different Thea species. The left panel indicates the scaffolds that hold the SCPL1A genes (brown: reverse strand; green: forward strand). The middle panel shows the cis‐regulatory elements identified in the putative promoter region (2 kb upstream) of each SCPL1A gene. The right panel shows the total number of functional (nonsynonymous) SNPs detected in the coding sequence of each SCPL1A gene, visualized as red solid circles.
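The centring and scaling of FPKM values mentioned for Figure 7(c) can be sketched as follows. The sketch assumes scaling is applied per gene (row-wise), subtracting the row mean and dividing by the sample standard deviation. The caption does not state the scaling direction, so this is an assumption. The function name and the FPKM values are hypothetical.

```python
import math


def scale_rows(matrix):
    """Centre each row to mean 0 and scale it to unit sample standard deviation.

    Each row needs at least two values.
    """
    scaled = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        # Sample standard deviation (n - 1 denominator).
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
        # Design choice of this sketch: a constant row is mapped to zeros
        # instead of dividing by zero.
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical FPKM matrix: rows are genes, columns are samples.
fpkm = [
    [1.0, 2.0, 3.0],
    [10.0, 10.0, 10.0],
]
z_scores = scale_rows(fpkm)
```

After scaling, each gene's values are on a common scale, so a heatmap highlights relative differences between samples rather than absolute expression levels.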