Hajime Ohyanagi, Tomoyuki Takano, Shin Terashima, Masaaki Kobayashi, Maasa Kanno, Kyoko Morimoto, Hiromi Kanegae, Yohei Sasaki, Misa Saito, Satomi Asano, Soichi Ozaki, Toru Kudo, Koji Yokoyama, Koichiro Aya, Keita Suwabe, Go Suzuki, Koh Aoki, Yasutaka Kubo, Masao Watanabe, Makoto Matsuoka, Kentaro Yano.
Abstract
Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far we have incorporated information about 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) from the public short read archive, digitally profiled the entire set of gene expression profiles, and drawn GENs by using correspondence analysis (CA) to take advantage of gene expression similarities. In order to understand the evolutionary significance of the GENs from multiple species, they were linked according to the orthology of each node (gene) among species. In addition to other gene expression information, functional annotation of the genes will facilitate biological comprehension. Currently we are improving the given gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database, PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, providing GENs, functional annotations and additional comprehensive omics resources.
Keywords: Correspondence analysis; Database; Gene expression network; Manual curation; Natural language processing (NLP); Omics
Year: 2014 PMID: 25505034 PMCID: PMC4301748 DOI: 10.1093/pcp/pcu188
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
The number of PubMed papers for NLP and manual curation
| Keyword | ||||||||
|---|---|---|---|---|---|---|---|---|
| Reproduction | 367 | 213 | 23 | 17 | 27 | 11 | 95 | 110 |
| Fertilization | 183 | 185 | 18 | 12 | 11 | 15 | 34 | 28 |
| Flowering | 1,303 | 515 | 36 | 36 | 44 | 38 | 50 | 123 |
| Pistil | 48 | 33 | 3 | 0 | 0 | 0 | 13 | 1 |
| Heading | 20 | 277 | 0 | 1 | 0 | 0 | – | – |
| Pollen | 729 | 381 | 25 | 13 | 10 | 8 | 60 | 42 |
| Embryo | 557 | 290 | 4 | 15 | 16 | 35 | 22 | 83 |
| Hybrid | 738 | 675 | 34 | 34 | 42 | 21 | 128 | 56 |
| Yield | 423 | 1,185 | 60 | 63 | 64 | 29 | 316 | 369 |
| Meiosis | 242 | 109 | 4 | 4 | 5 | 3 | 17 | 5 |
| Vernalization | 147 | 15 | 0 | 0 | 0 | 2 | – | – |
| Flower development | 172 | 49 | 7 | 0 | 9 | 6 | 4 | 2 |
| Pollination | 137 | 75 | 26 | 9 | 7 | 8 | 9 | 15 |
| Short-day | 127 | 69 | 1 | 4 | 3 | 1 | 26 | 13 |
| Long-day | 126 | 69 | 0 | 0 | 1 | 3 | 14 | 15 |
| Incompatibility | 75 | 23 | 3 | 2 | 2 | 0 | 14 | 3 |
| Inflorescence | 373 | 97 | 8 | 12 | 17 | 4 | – | 3 |
| Endosperm | 204 | 479 | 5 | 32 | 6 | 9 | 36 | 13 |
| Anther | 160 | 190 | 6 | 2 | 4 | 3 | 2 | 2 |
| Fruit | 275 | 170 | 358 | 0 | 442 | 8 | 167 | 50 |
| Sterility | 125 | 249 | 5 | 9 | 1 | 1 | 13 | 20 |
| Flowering/anthesis | 1,337 | 561 | 49 | 39 | 59 | – | 55 | 138 |
| Flowering/fertilization | 1,444 | 690 | 46 | 49 | 57 | – | 84 | 149 |
| Flowering/flower development | 1,445 | 548 | 35 | 37 | 51 | – | 54 | – |
| Floral initiation/flower bud initiation/floral differentiation/flower development/flower bud differentiation | 202 | 51 | 8 | 9 | 9 | – | 4 | 6 |
| Heading/ear emergence | 20 | 280 | 0 | 1 | 0 | – | – | – |
| Seed-setting/fruition/fruit | 281 | 198 | 359 | 1 | 446 | – | 167 | – |
| Fertilization/syngamy/pollination | 291 | 250 | 36 | 21 | 17 | – | 41 | 41 |
| Long-day/short-day | 187 | 89 | 1 | 4 | 3 | – | 32 | 17 |
| Crossbreeding/hybridization | 856 | 891 | 58 | 50 | 52 | 97 | 264 | 158 |
| Total | 12,594 | 8,906 | 1,218 | 476 | 1,405 | 302 | 1,721 | 1,462 |
A list of keywords for plant reproduction processes and the corresponding number of papers in each PubMed search is shown.
A solidus (/) indicates a search for papers containing any of the listed keywords.
PubMed search query examples: ‘Arabidopsis thaliana’ AND ‘reproduction’; (‘Arabidopsis thaliana’ AND ‘flowering’) OR (‘Arabidopsis thaliana’ AND ‘anthesis’).
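The query pattern in the table notes can be sketched in a few lines of Python. This is only an illustration of how a species name and a solidus-separated keyword entry might be expanded into a Boolean query of the form shown in the examples. The function name `build_pubmed_query`, the use of straight double quotes and the lowercasing of keywords are assumptions made here, not details taken from the paper.

```python
def build_pubmed_query(species: str, keyword_entry: str) -> str:
    """Expand a table keyword entry into a PubMed-style Boolean query.

    A solidus (/) in keyword_entry separates alternative keywords; each
    alternative is paired with the species name and the pairs are OR-ed.
    (Hypothetical helper, written only to mirror the footnote examples.)
    """
    # Split on the solidus and drop surrounding whitespace and empty parts.
    keywords = [k.strip().lower() for k in keyword_entry.split("/") if k.strip()]
    if not keywords:
        raise ValueError("keyword_entry contains no keyword")

    clauses = [f'"{species}" AND "{k}"' for k in keywords]
    if len(clauses) == 1:
        return clauses[0]
    # Parenthesize each species/keyword pair before joining with OR.
    return " OR ".join(f"({c})" for c in clauses)


single = build_pubmed_query("Arabidopsis thaliana", "Reproduction")
combined = build_pubmed_query("Arabidopsis thaliana", "Flowering/anthesis")
print(single)
print(combined)
```

Each generated string corresponds to one cell of the table: one species combined with one keyword entry.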
Fig. 1 Home page and flowchart of the PODC. A keyword search for gene annotations including NLP relationships (blue pane), a sequence homology search with the BLAST program (green pane) and a GEN search using gene IDs (red pane) are available. In each search result page, the gene detail information page and GEN viewer are hyperlinked.
Fig. 2 Search query pages (advanced search) and search result pages of the PODC. (A) Gene search query page. (B) BLAST search query page. (C) GEN search query page. (D) Gene search result page. (E) BLAST search result page. (F) GEN search result page. Each search result is also downloadable as a table file.
Fig. 3 Gene detail information page. Each page has a vertically long layout and contains functional annotations (A), NLP annotations (A), genes having similar expression patterns and their gene expression profile (B), orthologous and paralogous genes (B), the GEN (B), GO annotations (C), KEGG pathway information (C), and DNA and amino acid sequences (C).
Fig. 4 Details in GEN viewer. (A) An interspecies network with genes from multiple species. Each node indicates a gene, each solid edge indicates a similarly expressed gene pair and each dashed edge represents an orthologous or paralogous relationship. Some of those genes are orthologous to the centered Arabidopsis gene (gray dashed edges). (B) Zoomed-in view of the red box in (A). The blue dashed edge represents a paralogous relationship between two Arabidopsis genes. (C) Detailed information pages, including gene expression profiles, network members and gene annotations.
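The network structure described in the Fig. 4 caption can be illustrated with a small data-structure sketch. It is not the PODC implementation. The class name, the gene IDs and the rule that a homology edge is labeled "paralog" within one species and "ortholog" across species are simplifying assumptions made for this example.

```python
from collections import defaultdict


class InterspeciesNetwork:
    """Toy gene network with two edge types, loosely following Fig. 4."""

    def __init__(self):
        self.species_of = {}           # gene ID -> species name
        self.edges = defaultdict(set)  # gene ID -> {(neighbor ID, edge kind)}

    def add_gene(self, gene_id, species):
        self.species_of[gene_id] = species

    def _link(self, a, b, kind):
        # Edges are undirected, so record both directions.
        self.edges[a].add((b, kind))
        self.edges[b].add((a, kind))

    def add_coexpression(self, a, b):
        """Solid edge: the two genes have similar expression profiles."""
        self._link(a, b, "coexpression")

    def add_homology(self, a, b):
        """Dashed edge, labeled by whether the genes share a species.

        Assumption for this sketch: same species -> paralog,
        different species -> ortholog.
        """
        same_species = self.species_of[a] == self.species_of[b]
        self._link(a, b, "paralog" if same_species else "ortholog")

    def neighbors(self, gene_id, kind=None):
        """Return sorted neighbor IDs, optionally filtered by edge kind."""
        linked = self.edges.get(gene_id, set())
        return sorted({n for n, k in linked if kind is None or k == kind})


# Hypothetical gene IDs, used only to exercise the structure.
net = InterspeciesNetwork()
net.add_gene("ath_gene_1", "Arabidopsis thaliana")
net.add_gene("ath_gene_2", "Arabidopsis thaliana")
net.add_gene("osa_gene_1", "Oryza sativa")
net.add_coexpression("ath_gene_1", "ath_gene_2")
net.add_homology("ath_gene_1", "ath_gene_2")   # same species
net.add_homology("ath_gene_1", "osa_gene_1")   # different species

print(net.neighbors("ath_gene_1", kind="ortholog"))
print(net.neighbors("ath_gene_1", kind="paralog"))
```

Storing the edge kind next to each neighbor lets a gene pair carry both a coexpression edge and a homology edge, which a viewer could then draw as solid and dashed lines respectively.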