Dehe Wang1,2, Weiliang Fan1, Xiaolong Guo1,2, Kai Wu1,2, Siyu Zhou3, Zonggui Chen4, Danyang Li1, Kun Wang1, Yuxian Zhu1,4, Yu Zhou1,2,4.
Abstract
Malvaceae is a family of flowering plants that contains many economically important species, including cotton, cacao and durian. The genomes of several Malvaceae species have recently been decoded, and many omics datasets have been generated for individual species. However, no integrative multi-species database that lets users jointly compare and analyse these data is available for Malvaceae. We therefore developed a user-friendly database named MaGenDB (http://magen.whu.edu.cn) as a functional genomics hub for the plant community. We collected the genomes of 13 Malvaceae species and comprehensively annotated their genes from several perspectives, including functional RNA/protein elements, gene ontology, KEGG orthology, and gene families. We processed 374 sets of diverse omics data with the ENCODE pipelines and integrated them into a customised genome browser, and we designed multiple dynamic charts to present gene-, RNA- and protein-level knowledge such as dynamic expression profiles and functional elements. We also implemented a smart search system for efficiently mining genes. In addition, we constructed a functional comparison system that supports comparative analysis of genes across multiple features, within one species or across closely related species. This database and its associated tools will allow users to quickly retrieve large-scale functional information for biological discovery.
Year: 2020 PMID: 31665439 PMCID: PMC7145696 DOI: 10.1093/nar/gkz953
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of MaGenDB. (A) Data sources included 13 species from five subfamilies in Malvaceae and associated gene models, GWAS, eight types of sequencing data, and MS data. (B) Summary of data processing and the biological knowledge obtained. (C) All data are stored in a MySQL relational database with additional indexes. The Django and React frameworks are used for interactive queries between frontend and backend. (D) Overview of the web interface and usage of MaGenDB. Main functions or data are listed under the first-level menu. The dashed lines indicate the linkages between different pages.
Summary of MaGenDB data
| Data | Statistics |
|---|---|
| Subfamily/Genus/Species | 5/9/13 |
| Transcripts/Proteins | 759 700/728 366 |
| Types of omics data | 9 |
| Omics data samples | 374 |
| GO/KO/EC annotation | 494 364/254 551/85 157 |
| Gene/TF families | 656 516/280 976 |
| BLAST annotation (NT/NR/TAIR) | 63 313 713/307 542 675/635 804 |
| Types of functional elements | 7 |
| Functional elements | 24 855 369 |
| Synteny gene pairs | 28 976 317 |
| Protein-protein interactions | 170 660 615 |
| Protein 3D structures | 98 831 |
Query fields and examples in smart search system
| Query field | Query content | Query example |
|---|---|---|
| Name | Locus/Transcript/Protein name | GhRDL1 |
| Name | Locus/Transcript/Protein ID | Ga14g01129.F1 |
| Name | TAIR homolog gene symbol | ACX3 |
| GO | GO accession ID | GO:0000032 |
| KO | KO accession ID | K11091 |
| KO | KEGG orthology name | ATPF1B, atpD |
| EC | Enzyme Commission ID | 1.17.7.2 |
| EC | Enzyme Commission name | Very-long-chain 3-oxoacyl-CoA synthase |
| KEGG pathway | KEGG pathway ID | ko04024 |
| KEGG pathway | KEGG pathway name | cAMP signaling pathway |
| KEGG model | KEGG model ID | M00087 |
| KEGG model | KEGG model name | Fe-S protein |
| KEGG disease | KEGG disease ID | H00254 |
| KEGG disease | KEGG disease name | Lysosomal acid lipase deficiency |
| InterPro domain | InterPro ID | IPR001736 |
| InterPro domain | InterPro domain name | cd01254 |
| Pfam domain | Pfam ID | PF00614.22 |
| Pfam domain | Pfam domain name | PLDc |
| CDD domain | NCBI CDD ID | 197200 |
| TF family | TF family name | C2H2 |
| eggNOG gene family | eggNOG root name | EOY06593 |
| eggNOG gene family | eggNOG hit gene | COG1502 |
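Many of the accession-style examples in the table have a recognisable shape, so a search box could in principle guess the query field from the text alone. The following is a minimal sketch of that idea, not MaGenDB's actual implementation: the function name `guess_query_field` and all regular expressions are assumptions inferred only from the example IDs listed above, and anything that matches no pattern falls back to the "Name" field.

```python
import re

# Hypothetical patterns inferred from the example IDs in the table above.
# They are illustrative assumptions, not the rules MaGenDB uses.
PATTERNS = [
    ("GO", re.compile(r"GO:\d{7}")),               # e.g. GO:0000032
    ("KO", re.compile(r"K\d{5}")),                 # e.g. K11091
    ("EC", re.compile(r"\d+(\.(\d+|-)){3}")),      # e.g. 1.17.7.2
    ("KEGG pathway", re.compile(r"ko\d{5}")),      # e.g. ko04024
    ("KEGG model", re.compile(r"M\d{5}")),         # e.g. M00087
    ("KEGG disease", re.compile(r"H\d{5}")),       # e.g. H00254
    ("InterPro domain", re.compile(r"IPR\d{6}")),  # e.g. IPR001736
    ("Pfam domain", re.compile(r"PF\d{5}(\.\d+)?")),  # e.g. PF00614.22
]


def guess_query_field(query: str) -> str:
    """Return the query field whose ID pattern matches the whole query.

    Free-text queries (gene names, pathway names, and so on) match no
    pattern and are treated as a "Name" search.
    """
    text = query.strip()
    for field, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return field
    return "Name"


if __name__ == "__main__":
    for example in ["GO:0000032", "K11091", "1.17.7.2", "ko04024", "GhRDL1"]:
        print(example, "->", guess_query_field(example))
```

Purely numeric IDs such as the NCBI CDD example are ambiguous on shape alone, so this sketch leaves them to the fallback rather than guessing.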
Figure 2. Functional annotations in GeneWiki. (A) Different types of functional annotations in GeneWiki page. (B) Custom figure and table showing the positions and details of MS evidence for a protein. (C) Bar plot and table showing the gene expression level in FPKM across different tissues. (D) Custom dynamic charts and tables of functional elements annotated for a protein. (E) An interactive view of the PPI network for the query protein. (F) Dynamic visualization of the predicted 3D structure and the model details for a protein.
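For readers unfamiliar with the FPKM unit used in panel C, the standard definition (fragments per kilobase of transcript per million mapped fragments) can be sketched as below. This shows only the textbook formula; the function and parameter names are hypothetical, and the pipeline behind MaGenDB may compute its values differently.

```python
def fpkm(fragment_count: float,
         transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # 500 fragments on a 2 kb transcript in a library of 10 million fragments.
    print(fpkm(500, 2000, 10_000_000))
```

Because the value is normalised by both transcript length and library size, it allows expression levels to be compared across genes and across tissues within such a bar plot.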
Figure 3. Genome browser and functional genomics tools in MaGenDB. (A) Genome browser view of processed omics data for an example gene. For any genome, the available data are organised in track groups and can be dynamically selected or deselected. (B) Overview of genomics tools provided in MaGenDB. (C) The interface of the BLAST service for the different databases. (D) Example of the genome map viewer for multiple genes. (E) Primer design page that takes alternative splicing (AS) events into account.
Figure 4. Comparative genomics tools in MaGenDB. (A) Management of a gene list of interest for comparative analysis. (B) Gene expression heatmap across different tissues for genes collinear with a query gene from the GeneWiki page. (C) Genomic position mapping of collinear genes. (D) Comparison of functional domains between two proteins. The grey box marks the region missing in one protein relative to the other. (E) An interactive view of multiple alignments of collinear genes from the gene structure perspective. (F) A circular view of the gene synteny clusters between G. arboreum Chr06 and G. hirsutum chromosomes.