Juan Yu, Zhenhai Zhang, Jiangang Wei, Yi Ling, Wenying Xu, Zhen Su.
Abstract
BACKGROUND: Soybean (Glycine max L.) is one of the world's most important leguminous crops, producing high-quality protein and oil. Increasing the relative oil concentration in soybean seeds is a goal of many researchers, but a complete analysis platform for functional annotation of the genes involved in the soybean acyl-lipid pathway is still lacking. Following the success of soybean whole-genome sequencing, functional annotation has become a major challenge for the scientific community. Whole-genome transcriptome analysis is a powerful way to predict genes with biological functions. It is essential to build a comprehensive analysis platform that integrates soybean whole-genome sequencing data, the available transcriptome data and protein information. This platform could also be used to identify acyl-lipid metabolism pathways.

DESCRIPTION: In this study, we describe our construction of the Soybean Functional Genomics Database (SFGD) using Generic Genome Browser (Gbrowse) as the core platform. We integrated microarray expression profiling with 255 samples from 14 groups' experiments and mRNA-seq data with 30 samples from four groups' experiments, including spatial and temporal transcriptome data for different soybean development stages and environmental stresses. The SFGD includes a gene co-expression regulatory network containing 23,267 genes and 1873 miRNA-target pairs, and a group of acyl-lipid pathways containing 221 enzymes and more than 1550 genes. The SFGD also provides some key analysis tools, i.e. BLAST search, expression pattern search and cis-element significance analysis, as well as gene ontology information search and single nucleotide polymorphism display.
Year: 2014 PMID: 24712981 PMCID: PMC4051163 DOI: 10.1186/1471-2164-15-271
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of published soybean databases
| Database | Description |
| --- | --- |
| SGMD | Genomic data, expressed sequence tags and microarray expression experiments, Proteomics of Oilseeds |
| SoyGD | Soybean physical map and genetic map using Gbrowse as platform |
| Soybean Full-length cDNA Database | 40,000 full-length sequences of cDNA clones |
| SoyDB | Soybean transcription factors |
| SoyTEDB | Soybean transposable elements |
| SoyBase | Comprehensive database for curated genetics, genomics, and related data resources developed by USDA-ARS |
| LegumeIP | Comparative genomics and transcriptomics database of model legumes |
| PlaNet | Whole-genome co-expression networks for seven important plant crop species |
| SoyKB | Integration of soybean omics data along with annotation of gene function and biological pathway |
| SoyXpress | Microarray expression data and expressed sequence tags |
| Phytozome | Soybean genome sequence and gene annotation information |
| Soybean eFP Browser | Creates ‘electronic fluorescent pictographic’ representations of genes’ expression patterns |
| SoyProDB | Soybean seed proteins |
| GmGDB | Soybean genome and gene models |
| SoyPLEX | Soybean gene expression resource |
Data sources of SFGD
| Data type | Content |
| --- | --- |
| Coding gene and annotation | 66,207 genes; 75,778 entries each for cDNA, CDS and protein |
| Full-length cDNA | 37,870 (4708 full, 32,063 forward and 27,927 reverse sequences) |
| Consensus sequence | 37,593 |
| Consensus sequence annotation | 18,872 (including GO term, EC number and description) |
| Microarray experiment | 14 experiments, 245 samples |
| Deep sequencing data | Four experiments, 30 samples |
| MicroRNA data | 229 (precursor and mature sequences) |
| SNP | 17 wild and 14 cultivated soybean species |
Soybean acyl-lipid metabolism pathway
| Pathway | Enzymes | Genes |
| --- | --- | --- |
| Fatty acid synthesis | 20 | 88 |
| Fatty acid elongation, desaturation and export from plastid | 17 | 55 |
| Triacylglycerol biosynthesis | 14 | 126 |
| Triacylglycerol and fatty acid degradation | 18 | 148 |
| Eukaryotic galactolipid and sulfolipid synthesis | 17 | 63 |
| Prokaryotic galactolipid, sulfolipid, phospholipid synthesis | 25 | 112 |
| Eukaryotic phospholipid metabolism | 18 | 97 |
| Mitochondrial phospholipid metabolism | 9 | 62 |
| Sphingolipid synthesis and transport | 22 | 61 |
| Mitochondrial lipoic acid synthesis | 13 | 46 |
| Wax synthesis and transport | 22 | 207 |
| Cutin synthesis and transport | 7 | 63 |
| Suberin synthesis and transport | 17 | 279 |
| Oxylipin metabolism | 21 | 120 |
| Choline synthesis | 8 | 23 |
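The pathway table can be cross-checked against the totals quoted in the abstract (221 enzymes, more than 1550 genes). The short sketch below sums the two numeric columns, reading the first as enzyme counts and the second as gene counts; that reading is an inference from the totals, since the extracted table carries no column labels. The gene column sums to 1550, matching the abstract. The enzyme column sums to 248, which exceeds 221; a plausible explanation, stated here as an assumption, is that some enzymes participate in more than one pathway and are counted once per pathway in the table.

```python
# Rows copied from the acyl-lipid pathway table: (pathway, enzymes, genes).
# The column roles (enzymes, genes) are inferred, not stated in the source.
PATHWAYS = [
    ("Fatty acid synthesis", 20, 88),
    ("Fatty acid elongation, desaturation and export from plastid", 17, 55),
    ("Triacylglycerol biosynthesis", 14, 126),
    ("Triacylglycerol and fatty acid degradation", 18, 148),
    ("Eukaryotic galactolipid and sulfolipid synthesis", 17, 63),
    ("Prokaryotic galactolipid, sulfolipid, phospholipid synthesis", 25, 112),
    ("Eukaryotic phospholipid metabolism", 18, 97),
    ("Mitochondrial phospholipid metabolism", 9, 62),
    ("Sphingolipid synthesis and transport", 22, 61),
    ("Mitochondrial lipoic acid synthesis", 13, 46),
    ("Wax synthesis and transport", 22, 207),
    ("Cutin synthesis and transport", 7, 63),
    ("Suberin synthesis and transport", 17, 279),
    ("Oxylipin metabolism", 21, 120),
    ("Choline synthesis", 8, 23),
]

# Per-pathway counts summed over all 15 pathways.
total_enzymes = sum(enzymes for _, enzymes, _ in PATHWAYS)
total_genes = sum(genes for _, _, genes in PATHWAYS)

print(f"pathways: {len(PATHWAYS)}")
print(f"enzymes (summed per pathway): {total_enzymes}")  # 248
print(f"genes (summed per pathway): {total_genes}")      # 1550
```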
Figure 1. The tissue-specific gene expression and regulation network of the triacylglycerol (TAG) biosynthesis pathway. Multiple genes of interest in the TAG biosynthesis pathway (A) are shown simultaneously in a heatmap (B) with the same color. (C) The co-expression network of WRI1 (Glyma15g34770), which regulates the biosynthesis of TAG.
Soybean lipid biosynthesis related genes (WRI1 network genes)
| Gene locus | Probe set(s) | Arabidopsis homolog | Annotation |
| --- | --- | --- | --- |
| Glyma06g11860 | GmaAffx.50807.1.S1_at, GmaAffx.50807.2.S1_at, GmaAffx.26813.1.A1_at | AT1G77590 | LACS9 (LONG CHAIN ACYL-COA SYNTHETASE 9); long-chain-fatty-acid-CoA ligase |
| Glyma18g50020 | Gma.16819.1.S1_at | AT5G16390 | CAC1 (CHLOROPLASTIC ACETYLCOENZYME A CARBOXYLASE 1); acetyl-CoA carboxylase/biotin binding |
| Glyma07g37050 | GmaAffx.84778.1.S1_at | AT3G16950 | LPD1 (LIPOAMIDE DEHYDROGENASE 1); dihydrolipoyl dehydrogenase |
| Glyma08g22750 | GmaAffx.47472.1.S1_at | AT3G15820 | Phosphatidic acid phosphatase-related/PAP2-related |
| Glyma03g39860 | Gma.959.1.S1_at | AT1G54860 | Unknown protein |
| Glyma04g14830 | GmaAffx.86095.1.S1_at | AT1G65870 | Disease resistance-responsive family protein |
| Glyma18g01280 | Gma.910.1.A1_at | AT1G24360 | 3-oxoacyl-(acyl-carrier protein) reductase, chloroplast/3-ketoacyl-acyl carrier protein reductase |
| Glyma05g36450 | Gma.8414.1.S1_at, Gma.8414.1.S1_s_at | AT5G35360 | CAC2; acetyl-CoA carboxylase/ biotin carboxylase |
| Glyma12g11150 | GmaAffx.3734.1.S1_at | AT5G05410 | DREB2A; DNA binding/transcription activator/ transcription factor |
| Glyma16g27460 | GmaAffx.25251.1.S1_at | AT2G39210 | nodulin family protein |
| Glyma01g21480 | GmaAffx.85056.2.S1_at, Gma.14628.1.S1_at | AT1G25510 | aspartyl protease family protein |
| Glyma15g35080 | Gma.12634.1.A1_s_at | AT3G56850 | AREB3 (ABA-RESPONSIVE ELEMENT BINDING PROTEIN 3); DNA binding/transcription activator/transcription factor |
| Glyma12g34740 | Gma.11970.1.S1_at | AT5G54250 | ATCNGC4 (CYCLIC NUCLEOTIDE-GATED CATION CHANNEL 4); calmodulin binding/cation channel/cation transmembrane transporter/cyclic nucleotide binding |
| Glyma15g00450 | Gma.12064.1.S1_at | AT5G25900 | GA3 (GA REQUIRING 3); ent-kaurene oxidase/oxygen binding |
| Glyma03g34760 | GmaAffx.93571.1.S1_s_at, GmaAffx.93571.1.S1_at, GmaAffx.73653.1.S1_at | AT3G52970 | CYP76G1; electron carrier/heme binding/iron ion binding/monooxygenase/oxygen binding |
| Glyma01g36180 | Gma.3792.1.A1_at | AT1G42960 | unknown protein |
| Glyma10g31540 | Gma.17374.1.S1_at, GmaAffx.64124.1.S1_at, GmaAffx.87290.1.S1_at | AT1G32900 | starch synthase, putative |
| Glyma04g27740 | GmaAffx.25768.1.S1_at | AT1G65870 | disease resistance-responsive family protein |
| Glyma08g03120 | Gma.181.1.S1_at | AT5G35360 | CAC2; acetyl-CoA carboxylase/ biotin carboxylase |
| Glyma05g03210 | Gma.8701.1.S1_at, GmaAffx.7258.1.S1_s_at | AT4G24830 | arginosuccinate synthase family |
| Glyma18g43140 | Gma.2316.1.S1_at | AT5G05600 | oxidoreductase, 2OG-Fe(II) oxygenase family protein |
| Glyma07g13560 | GmaAffx.5167.1.S1_at | AT4G01070 | GT72B1; UDP-glucosyltransferase/UDP-glycosyltransferase/transferase, transferring glycosyl groups |
| Glyma09g38570 | GmaAffx.81233.1.A1_at | AT5G16460 | Unknown protein |
| Glyma10g12290 | Gma.1883.1.S1_at | AT2G41190 | Amino acid transporter family protein |
| Glyma08g36590 | Gma.14186.1.A1_at | AT5G46690 | bHLH071 (beta HLH protein 71); DNA binding/transcription factor |
| Glyma13g44870 | GmaAffx.59734.1.A1_at, GmaAffx.86023.1.S1_at, GmaAffx.33541.1.S1_at | AT5G25900 | GA3 (GA REQUIRING 3); ent-kaurene oxidase/oxygen binding |
| Glyma03g20380 | GmaAffx.37979.1.S1_at | AT3G07250 | Nuclear transport factor 2 (NTF2) family protein/RNA recognition motif (RRM)-containing protein |
| Glyma05g31820 | GmaAffx.29450.1.S1_at | AT1G10500 | ATCPISCA (chloroplast-localized IscA-like protein); structural molecule |
| Glyma17g00580 | Gma.10258.1.A1_s_at, Gma.10258.2.S1_at, GmaAffx.83788.1.S1_at | AT5G49820 | emb1879 (embryo defective 1879) |
| Glyma07g03350 | Gma.11469.1.S1_at, GmaAffx.67403.1.S1_at, GmaAffx.67403.1.A1_at | AT3G15820 | phosphatidic acid phosphatase-related/PAP2-related |
| Glyma18g44350 | Gma.6041.1.S1_at | AT1G62640 | KAS III (3-KETOACYL-ACYL CARRIER PROTEIN SYNTHASE III); 3-oxoacyl-[acyl-carrier-protein] synthase/catalytic/transferase, transferring acyl groups other than amino-acyl groups |
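Motif significance analysis results of soybean triacylglycerol biosynthesis related genes
| Motif sequence | Motif name | Count | ZFM | P-value | Description |
| --- | --- | --- | --- | --- | --- |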
| GTCATTATCGG | CATTAT-motif | 1 | 14.1 | 0 | phyA3;Avena sativa |
| CGCCACGTGTCC | ABREBNNAPA | 2 | 10.81 | 0 | napA; storage protein; ABRE; napin; seed |
| AATTAAA | POLASIG2 | 453 | 8.42 | 0 | poly A signal |
| GGACACGTGGC | ABRETAEM | 3 | 5.87 | 0 | ABA; ABRE; EMBP-1; seed |
| ACGTGKC | ACGTABREMOTIFA2OSEM | 30 | 5.17 | 0 | ABA; ABRE; motif A; DRE |
| MCACGTGGC | GBOXLERBCS | 9 | 5.17 | 0 | G box; rbcS; tomato; G-box; leaf; shoot |
| ATTAAT | Box 4 | 664 | 4.7 | 0.000001 | pal-CMA1;light responsiveness |
| TCCACGTGGC | LREBOXIIPCCHS1 | 3 | 4.69 | 0.000001 | Chalcone synthase; CHS; light; Box II; LRE; leaf; shoot |
| GTATGATGG | SORLIP4AT | 4 | 4.63 | 0.000002 | phyA; phytochrome; light |
| YACGTGGC | ABREATCONSENSUS | 11 | 4.38 | 0.000006 | ABA; ABF; bZIP factors |
| ACGTGGC | BOXIIPCCHS | 15 | 4.35 | 0.000007 | Box II; Box 2; CHS; chs; light regulation |
| CACGTGGC | EMBP1TAEM | 9 | 4.24 | 0.000011 | EMBP-1; Em; ABA; ABF; ABRE; bZIP; seed |
| TGTATATAT | SORLREP3AT | 43 | 4.13 | 0.000018 | phyA; phytochrome; light |
| CCNNNNNNNNNNNNCCACG | UPRMOTIFIIAT | 8 | 3.96 | 0.000037 | UPR; unfolded protein response |
| TCCACGTGTC | SGBFGMGMAUX28 | 2 | 3.85 | 0.000058 | Aux28; G box; auxin; bZIP; SGBF-1; SGBF-2 |
| AGATATGATAAAA | IBOXLSCMCUCUMISIN | 1 | 3.79 | 0.000075 | Cucumisin; fruit |
| ACGTGGCA | LRENPCABE | 8 | 3.68 | 0.000117 | CAB; cab; cab-E; CABE; light; leaf; shoot |
| CACGTGG | IRO2OS | 16 | 3.55 | 0.00019 | root; shoot; Fe; iron |
| CCACGTGG | ABREZMRAB28 | 8 | 3.54 | 0.000201 | Freezing tolerance; seed; shoot; CBF2 |
| ACGTGTC | GADOWNAT | 15 | 3.35 | 0.000404 | Ga; seed; germination |
| GCCACGTGGC | ACGTROOT1 | 2 | 3.12 | 0.000898 | Root; ACGT; G box; G-box; ABRE motif; bZIP binding enhancement |
| ACGTCA | HEXMOTIFTAH3H4 | 33 | 2.83 | 0.002325 | Leucine zipper motif; meristem; OBF1; bZIP; lip19; LIP1 |
| TGACGT | TGACGTVMAMY | 33 | 2.83 | 0.002325 | Alpha-Amylase; cotyledon; seed germination; seed |
| CACGTG | CACGTGMOTIF | 44 | 2.83 | 0.002359 | G box; G-box; rbcs; chs; ACGT element; adh; Bz-2; R-motif; STR;GT-1; GBF; elicitor; bZIP; napin; strictosidine synthase; cell |
| TGTAATAATATATTTATATT | Unnamed__5 | 5 | 2.77 | 0.002822 | SEF1 factor binding site;seeds |
| AATTATTTTTTATT | AT1-motif | 4 | 2.5 | 0.006162 | Light responsive element |
| CCACGTGGCC | CPRFPCCHS | 1 | 2.48 | 0.006584 | BoxII; CPRF; bZIP; leaf; shoot; CHS; ACE; light; bZIP |
| CCWWWWWWWWGG | CARGNCAT | 16 | 2.45 | 0.007059 | MADS; AGAMOUS; AGL; embryo |
| RYACGTGGYR | ABREATRD22 | 5 | 2.34 | 0.009586 | ABA; responsive element; ABRE; rd22; RD22; dehydration; shoot |
Note: ZFM (Z-score for motif) and P-value are described in the ‘Cis-element significant analysis’ section (job ID: job2014Mar4201558, produced by inputting genes appearing in Table 4).
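The relationship between the ZFM and P-value columns can be illustrated with a small sketch. This excerpt does not say how the P-values were derived, so the formula below is an assumption: it treats the P-value as the one-sided upper-tail probability of a standard normal distribution at the given Z-score. For the rows sampled here, that assumption reproduces the tabulated values to within the rounding of ZFM. The function name `upper_tail_p` is hypothetical and is not part of SFGD.

```python
import math


def upper_tail_p(z: float) -> float:
    """One-sided P(Z >= z) for a standard normal variable.

    Assumed (not confirmed by the source) to be how the table's
    P-values relate to the ZFM column.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (motif name, ZFM, P-value) rows copied from the motif table.
ROWS = [
    ("Box 4", 4.70, 0.000001),
    ("GADOWNAT", 3.35, 0.000404),
    ("HEXMOTIFTAH3H4", 2.83, 0.002325),
    ("ABREATRD22", 2.34, 0.009586),
]

for name, zfm, reported in ROWS:
    computed = upper_tail_p(zfm)
    print(f"{name:16s} ZFM={zfm:5.2f} reported={reported:.6f} computed={computed:.6f}")
```

Small differences between the reported and computed values are expected, because the table lists ZFM to only two decimal places.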
Figure 2. Structure of the SFGD database. Rectangles with rounded corners are pages in the database, and the directed lines show the links between pages.