Takeshi Obayashi, Yuichi Aoki, Shu Tadaka, Yuki Kagaya, Kengo Kinoshita.
Abstract
ATTED-II (http://atted.jp) is a coexpression database for plant species to aid in the discovery of relationships of unknown genes within a species. As an advanced coexpression analysis method, multispecies comparisons have the potential to detect alterations in gene relationships within an evolutionary context. However, determining the validity of comparative coexpression studies is difficult without quantitative assessments of the quality of coexpression data. ATTED-II (version 9) provides 16 coexpression platforms for nine plant species, including seven species supported by both microarray- and RNA sequencing (RNAseq)-based coexpression data. Two independent sources of coexpression data enable the assessment of the reproducibility of coexpression. The latest coexpression data for Arabidopsis (Ath-m.c7-1 and Ath-r.c3-0) showed the highest reproducibility (Jaccard coefficient = 0.13) among previous coexpression data in ATTED-II. We also investigated the statistical basis of the mutual rank (MR) index as a coexpression measure by bootstrap sampling of experimental units. We found that the error distribution of the logit-transformed MR index showed normality with equal variances for each coexpression platform. Because the MR error was strongly correlated with the number of samples for the coexpression data, typical confidence intervals for the MR index can be estimated for any coexpression platform. These new, high-quality coexpression data can be analyzed with any tool in ATTED-II and combined with external resources to obtain insight into plant biology.
Keywords: Arabidopsis; Comparative transcriptomics; Database; Gene coexpression; Gene network; Statistics
Year: 2018 PMID: 29216398 PMCID: PMC5914358 DOI: 10.1093/pcp/pcx191
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Fig. 1. Quality assessment based on the consistency of known gene functions. As a measure of the quality of gene coexpression data, the power to discriminate gene pairs sharing a common functional annotation from other gene pairs was used. (A) Previous and current Arabidopsis coexpression data were assessed by this discrimination analysis. Irrespective of the gene annotation source (GO Biological Process or KEGG pathway), the quality trend was consistent. (B) The current 16 coexpression platforms were assessed with the discrimination analysis using the KEGG pathway annotation. Circles and diamonds indicate microarray-based and RNAseq-based coexpression datasets, respectively.
Fig. 2. Properties of the error distribution of MR and logit-MR values. SDs of the bootstrapped MRs from RNAseq-based Arabidopsis coexpression data (Ath-r.c3-0) are shown. (A, B) SDs of bootstrapped MR values (A) and of logit-MR values (B) are shown against the mean of the bootstrapped MR values. The black lines show the median values with a sliding window corresponding to the 0.01 percentile of the MR without overlap, whereas the gray lines represent the first and third quartiles. (C) Mean SDs for the current 16 coexpression platforms in ATTED-II are plotted against the number of samples for each platform. Circles and diamonds indicate microarray-based and RNAseq-based coexpression datasets, respectively.
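The bootstrap procedure behind Fig. 2 can be illustrated with a minimal sketch. It assumes the usual definition of the MR index (the geometric mean of the two directional correlation ranks of a gene pair) and Pearson correlation as the underlying measure. The `logit_mr` transform below, log(p / (1 - p)) with p = MR / number of genes, is only one plausible form and is an assumption, not necessarily the formula used for ATTED-II. All function names and the toy data are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def mutual_ranks(expr):
    """expr maps gene -> list of expression values; returns {(a, b): MR} for a < b."""
    genes = sorted(expr)
    rank = {}
    for a in genes:
        # Rank every other gene by its correlation with gene a (rank 1 = strongest).
        others = [g for g in genes if g != a]
        others.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        for r, g in enumerate(others, start=1):
            rank[(a, g)] = r
    # MR is the geometric mean of the two directional ranks.
    return {
        (a, b): math.sqrt(rank[(a, b)] * rank[(b, a)])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


def logit_mr(mr, n_genes):
    """Assumed logit transform of MR scaled by the number of genes."""
    p = mr / n_genes  # MR <= n_genes - 1, so p stays strictly below 1
    return math.log(p / (1 - p))


def bootstrap_logit_mr_sd(expr, n_boot=100, seed=0):
    """SD of logit-MR per gene pair over bootstrap resamples of the samples."""
    rng = random.Random(seed)
    n_samples = len(next(iter(expr.values())))
    n_genes = len(expr)
    draws = {}
    for _ in range(n_boot):
        # Resample experimental units (columns) with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        resampled = {g: [v[i] for i in idx] for g, v in expr.items()}
        for pair, mr in mutual_ranks(resampled).items():
            draws.setdefault(pair, []).append(logit_mr(mr, n_genes))
    return {pair: statistics.stdev(vals) for pair, vals in draws.items()}


# Toy data: six random genes, with g1 made a noisy copy of g0.
rng = random.Random(1)
expr = {f"g{i}": [rng.gauss(0, 1) for _ in range(30)] for i in range(6)}
expr["g1"] = [v + rng.gauss(0, 0.05) for v in expr["g0"]]
full_mr = mutual_ranks(expr)
sd = bootstrap_logit_mr_sd(expr, n_boot=50)
```

In this toy setting the strongly coexpressed pair (g0, g1) has MR = 1 in the full data, and `sd` holds one bootstrap SD per gene pair, the quantity that Fig. 2B plots against the mean MR.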
Coexpression data in ATTED-II version 9
| Platform ID (a) | Version | Genes | Samples | Logit-MR error | Function score (b) | Reproducibility (c) |
|---|---|---|---|---|---|---|
| Ath-m | c7.1 | 20,819 | 16,033 | 0.37 | 5.43 | 0.136 |
| Ath-r | c3.0 | 22,760 | 2,120 | 0.71 | 5.17 | 0.136 |
| Bra-r | c2.1 | 28,978 | 188 | 1.04 | 4.77 | – |
| Gma-m | c3.1 | 15,746 | 1,131 | 0.74 | 3.37 | 0.076 |
| Gma-r | c3.0 | 8,373 | 599 | 1.02 | 7.64 | 0.076 |
| Mtr-m | c3.1 | 20,376 | 975 | 1.04 | 4.43 | 0.021 |
| Mtr-r | c1.1 | 3,753 | 41 | 1.46 | 2.65 | 0.021 |
| Osa-m | c6.1 | 19,867 | 2,250 | 0.76 | 4.98 | 0.041 |
| Osa-r | c2.1 | 24,437 | 336 | 1.07 | 4.06 | 0.041 |
| Ppo-m | c2.1 | 21,910 | 765 | 1.10 | 3.82 | – |
| Sly-m | c2.1 | 5,721 | 401 | 1.04 | 4.08 | 0.041 |
| Sly-r | c2.1 | 20,564 | 282 | 1.01 | 3.87 | 0.041 |
| Vvi-m | c3.1 | 9,421 | 314 | 1.14 | 4.47 | 0.028 |
| Vvi-r | c1.1 | 18,587 | 346 | 0.90 | 3.10 | 0.028 |
| Zma-m | c3.1 | 10,777 | 806 | 1.11 | 4.62 | 0.055 |
| Zma-r | c2.1 | 32,274 | 1,794 | 0.88 | 4.42 | 0.055 |

(a) Xxx-m, microarray-based coexpression; Xxx-r, RNAseq-based coexpression.
(b) Predictive performance of the KEGG annotation represented by partial AUROC (1E-04). A higher score indicates a better performance.
(c) Jaccard coefficient for common edges between the two platforms of the same species; the value is shared by both platforms. The top three coexpressed genes from every gene were used as edges.
Jaccard coefficients of common edges among a series of coexpression data for Arabidopsis in ATTED-II
| | Ath-r.c3-0 | Ath-r.c2-0 | Ath-r.c1-0 |
|---|---|---|---|
| Ath-m.c7-0 | 0.134 | 0.055 | 0.038 |
| Ath-m.c6-0 | 0.111 | 0.057 | 0.040 |
| Ath-m.c5-0 | 0.106 | 0.056 | 0.040 |
| Ath-m.c4-1 | 0.078 | 0.046 | 0.032 |
| Ath-m.c3-1 | 0.061 | 0.042 | 0.029 |
Xxx-m, microarray-based coexpression; Xxx-r, RNAseq-based coexpression.
Note that a Jaccard coefficient of 1 indicates complete overlap between the two sets of coexpression edges, whereas a Jaccard coefficient of 0 indicates no overlap.
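As an illustration of this reproducibility measure, the sketch below builds an edge set from the top-k coexpressed partners of every gene on each platform and compares the two sets with the Jaccard coefficient. Treating edges as undirected, using higher scores to mean stronger coexpression, and the toy scores themselves are assumptions made for the example; the function names are hypothetical.

```python
def top_k_edges(coex, k=3):
    """coex maps gene -> {partner: score}; higher score = stronger (assumed).

    Returns the set of undirected edges linking each gene to its k best partners.
    """
    edges = set()
    for gene, partners in coex.items():
        best = sorted(partners, key=partners.get, reverse=True)[:k]
        edges.update(frozenset((gene, p)) for p in best)
    return edges


def jaccard(a, b):
    """Jaccard coefficient of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy coexpression scores for four genes on two platforms (made-up values).
microarray = {
    "A": {"B": 0.9, "C": 0.2, "D": 0.1},
    "B": {"A": 0.9, "C": 0.3, "D": 0.2},
    "C": {"A": 0.2, "B": 0.3, "D": 0.8},
    "D": {"A": 0.1, "B": 0.2, "C": 0.8},
}
rnaseq = {
    "A": {"B": 0.7, "C": 0.1, "D": 0.0},
    "B": {"A": 0.7, "C": 0.6, "D": 0.1},
    "C": {"A": 0.1, "B": 0.6, "D": 0.5},
    "D": {"A": 0.0, "B": 0.1, "C": 0.5},
}

# k=1 keeps the toy example small; the tables here use the top three genes.
edges_m = top_k_edges(microarray, k=1)  # {A-B, C-D}
edges_r = top_k_edges(rnaseq, k=1)      # {A-B, B-C, C-D}
reproducibility = jaccard(edges_m, edges_r)  # 2 shared edges / 3 in the union
```

Two of the three distinct edges are shared, so the coefficient is 2/3 here. Identical edge sets would give 1 and disjoint sets 0.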
The 90% confidence intervals of typical MR values with different error levels
| MR | Bootstrap SD = 0.5, lower bound | Bootstrap SD = 0.5, upper bound | Bootstrap SD = 1, lower bound | Bootstrap SD = 1, upper bound |
|---|---|---|---|---|
| 1 | 0.8 | 1.4 | 0.7 | 2.1 |
| 3 | 1.9 | 4.9 | 1.3 | 8.3 |
| 10 | 5.9 | 17.3 | 3.5 | 30.2 |
| 30 | 14.2 | 52.6 | 6.6 | 92.5 |
| 100 | 56.9 | 175.8 | 32.4 | 308.4 |
Comparison of calculation methods
| Platform | No. of genes | No. of samples | Q | CQ | CQB | ComBat effect | Bagging effect |
|---|---|---|---|---|---|---|---|
| Ath-m | 20,819 | 16,033 | 5.46 | 5.42 | 5.43 | 0.99 | 1.00 |
| Gma-m | 15,746 | 1,131 | 3.29 | 3.35 | 3.37 | 1.02 | 1.01 |
| Mtr-m | 20,376 | 975 | 3.41 | 4.32 | 4.43 | 1.27 | 1.03 |
| Osa-m | 19,867 | 2,250 | 4.60 | 4.92 | 4.98 | 1.07 | 1.01 |
| Ppo-m | 21,910 | 765 | 3.49 | 3.74 | 3.82 | 1.07 | 1.02 |
| Sly-m | 5,721 | 401 | 3.31 | 3.80 | 4.08 | 1.15 | 1.07 |
| Vvi-m | 9,421 | 314 | 3.93 | 4.22 | 4.47 | 1.07 | 1.06 |
| Zma-m | 10,777 | 806 | 4.00 | 4.46 | 4.62 | 1.12 | 1.03 |
| Ath-r | 22,760 | 2,120 | 4.56 | 5.12 | 5.17 | 1.12 | 1.01 |
| Bra-r | 28,978 | 188 | 4.54 | 4.63 | 4.77 | 1.02 | 1.03 |
| Gma-r | 8,373 | 599 | 6.56 | 7.60 | 7.64 | 1.16 | 1.01 |
| Mtr-r | 3,753 | 41 | 2.54 | 2.62 | 2.65 | 1.03 | 1.01 |
| Osa-r | 24,437 | 336 | 3.43 | 3.80 | 4.06 | 1.11 | 1.07 |
| Sly-r | 20,564 | 282 | 3.55 | 3.70 | 3.87 | 1.04 | 1.05 |
| Vvi-r | 18,587 | 346 | 3.21 | 3.07 | 3.10 | 0.96 | 1.01 |
| Zma-r | 32,274 | 1,794 | 3.98 | 4.35 | 4.42 | 1.09 | 1.01 |
Q, quantile normalization; CQ, ComBat-Quantile normalization; CQB, Bagging procedure for ComBat-Quantile normalized expression data.
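The two effect columns are not defined in the footnote, but the tabulated values appear consistent with simple ratios of the function scores: ComBat effect = CQ / Q and Bagging effect = CQB / CQ. The sketch below recomputes them for three rows under that assumption. Small differences in the last digit are possible for other rows because the published scores are already rounded.

```python
# (Q, CQ, CQB) function scores copied from three rows of the table above.
scores = {
    "Ath-m": (5.46, 5.42, 5.43),
    "Mtr-m": (3.41, 4.32, 4.43),
    "Vvi-r": (3.21, 3.07, 3.10),
}


def effects(q, cq, cqb):
    """Assumed definitions: ComBat effect = CQ / Q, Bagging effect = CQB / CQ."""
    return round(cq / q, 2), round(cqb / cq, 2)


recomputed = {platform: effects(*vals) for platform, vals in scores.items()}
for platform, (combat, bagging) in recomputed.items():
    print(f"{platform}: ComBat effect {combat:.2f}, Bagging effect {bagging:.2f}")
```

A ratio above 1 means the additional step raised the function score for that platform, and a ratio below 1 (as for ComBat on Vvi-r) means it lowered it.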