Takeshi Obayashi, Yasunobu Okamura, Satoshi Ito, Shu Tadaka, Yuichi Aoki, Matsuyuki Shirota, Kengo Kinoshita.
Abstract
ATTED-II (http://atted.jp) is a database of coexpressed genes that was originally developed to identify functionally related genes in Arabidopsis and rice. Herein, we describe an updated version of ATTED-II, which expands this resource to include additional agriculturally important plants. To improve the quality of the coexpression data for Arabidopsis and rice, we included more gene expression data from microarray and RNA sequencing studies. The RNA sequencing-based coexpression data now cover 94% of the Arabidopsis protein-encoding genes, a substantial increase over the previously available microarray-based coexpression data (76% coverage). We also generated coexpression data for four dicots (soybean, poplar, grape and alfalfa) and one monocot (maize). Because both the quantity and quality of expression data for the non-model species are generally poorer than those for the model species, we verified the coexpression data for these new species using multiple methods. First, the overall performance of the coexpression data was evaluated using gene ontology annotations and the coincidence with a genomic feature. Secondly, the reliability of each guide gene was determined by comparing coexpressed gene lists between platforms. With the expanded and newly evaluated coexpression data, ATTED-II represents an important resource for identifying functionally related genes in agriculturally important plants.
Keywords: Arabidopsis; Comparative transcriptomics; Database; Gene coexpression; Gene network; Non-model species
Year: 2013 PMID: 24334350 PMCID: PMC3894708 DOI: 10.1093/pcp/pct178
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Coexpression data in ATTED-II version 7.1
| Species | Version | No. of genes | Gene coverage (%) | No. of experiments | No. of samples | Platform | Release date |
|---|---|---|---|---|---|---|---|
| Arabidopsis | Ath.c5-0 | 20,836 | 76 | 737 | 11,171 | A-AFFY-2 | May 23, 2013 |
| Arabidopsis | Ath2.c1-0 | 25,838 | 94 | 28 | 328 | RNAseq | August 17, 2013 |
| Soybean | Gma.c1-0 | 15,902 | 29 | 31 | 938 | A-AFFY-59 | May 23, 2013 |
| Poplar | Ppo.c1-0 | 21,909 | 53 | 23 | 404 | A-AFFY-131 | May 23, 2013 |
| Grape | Vvi.c1-0 | 8,351 | 32 | 14 | 245 | A-AFFY-78 | May 23, 2013 |
| Alfalfa | Mtr.c1-0 | 4,166 | 9 | 43 | 585 | A-AFFY-71 | May 23, 2013 |
| Rice | Osa.c3-0 | 20,625 | 53 | 73 | 1214 | A-AFFY-126 | May 23, 2013 |
| Maize | Zma.c1-0 | 8,397 | – | 47 | 617 | A-AFFY-77 | May 23, 2013 |
Gene coverage: the percentage of protein-encoding genes (as annotated in Phytozome v9.1) that are included in the coexpression data set (Goodstein et al. 2012). Gene coverage is not provided for maize because of poor annotation quality.
No. of samples: the number of slides for each microarray platform, or the number of runs for the RNAseq platform (Ath2).
Fig. 1 An example of a coexpressed gene list in ATTED-II. The Arabidopsis PSBS gene is used as the guide gene, and coexpressed genes are shown along with their mutual rank (MR) values (a smaller MR value indicates stronger coexpression). The six columns on the right indicate the degree of coexpression for ortholog pairs in other species (or on the other Arabidopsis platform). Coexpression with an MR value >200 is considered weak (gray text). A blank cell means that coexpression data were not available. Reliability was calculated on the basis of coexpression conservation and is represented by stars: three stars indicate excellent reliability, whereas no stars indicate that the data are not reliable. This list is available at http://atted.jp/cgi-bin/coex_list.cgi?gene=At1g44575.
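The MR values in Fig. 1 can be illustrated with a short sketch. It assumes the mutual-rank definition commonly used for ATTED-II, MR(A, B) = sqrt(rank of B in A's coexpression list × rank of A in B's list), and, for simplicity, ranks partners by Pearson correlation on a toy gene × sample matrix. The correlation measure, the lack of normalization, and the function name `mutual_rank` are assumptions; this is not the database's actual pipeline.

```python
import numpy as np


def mutual_rank(expr: np.ndarray) -> np.ndarray:
    """Return a symmetric gene x gene mutual rank (MR) matrix.

    expr: genes x samples expression matrix (toy input; ATTED-II's real
    normalization and correlation measure are NOT reproduced here).
    Assumed definition: MR(a, b) = sqrt(rank_a(b) * rank_b(a)), where
    rank_a(b) is b's 1-based position in a's partner list sorted by
    decreasing Pearson correlation.
    """
    corr = np.corrcoef(expr)
    n = corr.shape[0]
    # Push each gene's self-correlation to the bottom of its partner list.
    np.fill_diagonal(corr, -np.inf)
    order = np.argsort(-corr, axis=1, kind="stable")
    ranks = np.empty((n, n), dtype=float)
    # ranks[a, b] = 1-based position of gene b in gene a's sorted list.
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    mr = np.sqrt(ranks * ranks.T)
    np.fill_diagonal(mr, np.nan)  # MR of a gene with itself is undefined
    return mr


if __name__ == "__main__":
    expr = np.array(
        [
            [1.0, 2.0, 3.0, 4.0],
            [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with gene 0
            [4.0, 3.0, 2.0, 1.0],  # anti-correlated with gene 0
            [1.0, 3.0, 2.0, 4.0],
        ]
    )
    mr = mutual_rank(expr)
    print(mr[0, 1])  # 1.0: each gene is the other's top partner
```

Because MR combines both directions, it is symmetric, and a pair scores 1 only when each gene is the other's best partner. On a genome-scale matrix, pairs with `mr > 200` would correspond to the "weak" gray entries in Fig. 1.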
Performance of the coexpression data
| Species | Version | No. of genes | No. of samples | GO score | Codon score |
|---|---|---|---|---|---|
| Arabidopsis | Ath.c5-0 | 20,836 | 11,171 | 7.27 | 4.02 |
| Arabidopsis | Ath2.c1-0 | 25,838 | 328 | 4.88 | 2.63 |
| Soybean | Gma.c1-0 | 15,902 | 938 | – | 2.53 |
| Poplar | Ppo.c1-0 | 21,909 | 404 | – | 1.77 |
| Grape | Vvi.c1-0 | 8,351 | 245 | – | 1.42 |
| Alfalfa | Mtr.c1-0 | 4,166 | 585 | – | 1.37 |
| Rice | Osa.c3-0 | 20,625 | 1214 | 3.73 | 2.38 |
| Maize | Zma.c1-0 | 8,397 | 617 | – | 1.96 |
| Random | – | – | – | 0.5 | 1.00 |
Italicized lines indicate previous versions of Arabidopsis and rice coexpression data.
No. of samples: the number of slides for each microarray platform, or the number of runs for the RNAseq platform (Ath2).
GO score: predictive performance for GO annotations, represented by AUC0.01 (the area under the ROC curve up to a false-positive rate of 0.01) in units of 10^-4. A larger score indicates better performance.
Codon score: the coincidence of coexpression with codon similarity, represented by the median of the normalized COXSIM value. A larger score indicates better performance.
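The "Random" GO score in the table above can be checked with a small sketch of a partial ROC area. It assumes AUC0.01 is the unnormalized area under the ROC curve for false-positive rates from 0 to 0.01, reported in units of 10^-4. Under that reading a random predictor (TPR = FPR) scores 0.5 × 0.01² = 0.5 × 10^-4, and a perfect one scores 0.01 = 100 × 10^-4. The function name `partial_auc` and the toy inputs are hypothetical, and tied scores are not handled.

```python
import numpy as np


def partial_auc(scores, labels, max_fpr=0.01):
    """Unnormalized ROC area for FPR in [0, max_fpr] (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    # Walk the ranking from the highest score down.
    y = labels[np.argsort(-scores, kind="stable")]
    tps = np.cumsum(y)
    # Without ties the ROC curve is a staircase: each negative adds a
    # horizontal step of width 1/n_neg at height TPR = tps / n_pos.
    heights = tps[~y] / n_pos
    fpr_grid = np.minimum(np.arange(n_neg + 1) / n_neg, max_fpr)
    widths = np.diff(fpr_grid)  # steps beyond max_fpr get zero width
    return float(np.sum(heights * widths))


if __name__ == "__main__":
    perfect = partial_auc([4, 3, 2, 1], [1, 1, 0, 0])
    print(perfect * 1e4)  # 100.0 (in units of 10^-4)
    # Alternating labels give TPR ~= FPR, i.e. a random-like ranking.
    n = 10_000
    labels = np.tile([1, 0], n)
    scores = np.arange(2 * n, 0, -1)
    print(round(partial_auc(scores, labels) * 1e4, 3))  # 0.505
```

The alternating example lands at 0.505 rather than exactly 0.5 only because the staircase is discrete; with finer steps it approaches the continuous baseline of 0.5 × 10^-4.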
Number of GO biological process (BP) terms and genes used to validate the predictive power of the gene coexpression data
| Coexpression data | No. of GO BP terms | No. of assessed genes |
|---|---|---|
| Ath.c5-0 | 2,785 | 3,410 |
| Ath2.c1-0 | 2,950 | 4,058 |
| Osa.c3-0 | 679 | 203 |
Italicized lines indicate previous versions of Arabidopsis and rice coexpression data.
Fig. 2 Number of genes at each reliability level. Reliability levels are represented by stars: three stars indicate excellent reliability, whereas no stars indicate that the data are not reliable. The numbers within the bars indicate the percentage of genes in each reliability category for each species. Genes with no stars include genes without orthologs.
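The idea of coexpression conservation behind the star ratings can be illustrated, under loose assumptions, by asking how many of a guide gene's coexpressed partners keep an MR ≤ 200 between their orthologs in another species. This is a hypothetical sketch, not ATTED-II's reliability formula: the one-to-one ortholog map, the dictionary layout, the gene IDs, and the function name `conserved_fraction` are all assumptions, and no mapping to star levels is attempted.

```python
from typing import Dict, List, Optional, Tuple

WEAK_MR = 200.0  # MR > 200 is treated as weak, following the Fig. 1 caption


def conserved_fraction(
    guide: str,
    partners: List[str],
    orthologs: Dict[str, str],
    mr_other: Dict[Tuple[str, str], float],
) -> Optional[float]:
    """Fraction of a guide gene's partners whose ortholog pair is coexpressed
    (MR <= WEAK_MR) in another species. Hypothetical illustration only."""
    guide_ortholog = orthologs.get(guide)
    if guide_ortholog is None:
        return None  # no ortholog: conservation cannot be assessed
    checked = conserved = 0
    for partner in partners:
        partner_ortholog = orthologs.get(partner)
        if partner_ortholog is None:
            continue  # partner has no ortholog in the other species
        # MR is symmetric, so look the pair up in either order.
        mr = mr_other.get(
            (guide_ortholog, partner_ortholog),
            mr_other.get((partner_ortholog, guide_ortholog)),
        )
        if mr is None:
            continue  # no coexpression data for this ortholog pair
        checked += 1
        if mr <= WEAK_MR:
            conserved += 1
    return conserved / checked if checked else None


if __name__ == "__main__":
    # Hypothetical gene IDs: A* in the guide species, B* in the other one.
    orthologs = {"A1": "B1", "A2": "B2", "A3": "B3"}
    mr_other = {("B1", "B2"): 5.0, ("B3", "B1"): 350.0}
    print(conserved_fraction("A1", ["A2", "A3", "A4"], orthologs, mr_other))  # 0.5
```

Returning `None` for a guide gene without an ortholog mirrors, loosely, the caption's note that genes without orthologs fall into the no-star group.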