Min Tang, Jianqiang Sun, Kentaro Shimizu, Koji Kadota.
Abstract
BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, the evaluations reported so far have been restricted to two-group comparisons. Comparative studies on multi-group data are also needed.
Year: 2015 PMID: 26538400 PMCID: PMC4634584 DOI: 10.1186/s12859-015-0794-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Average AUC values for simulation data with replicates
| PG1 | 33 % | 50 % | 50 % | 60 % | 60 % | 70 % | 80 % |
| PG2 | 33 % | 30 % | 40 % | 20 % | 30 % | 20 % | 10 % |
| PG3 | 33 % | 20 % | 10 % | 20 % | 10 % | 10 % | 10 % |
| (a) PDEG = 5 % | |||||||
| | 91.57 | | | | | | |
| | 90.70 | 90.62 | 90.64 | 90.54 | 90.55 | 90.59 | 90.62 |
| | 88.34 | 88.33 | 88.30 | 88.24 | 88.23 | 88.21 | 88.30 |
| | | 91.48 | 91.47 | 91.38 | 91.37 | 91.38 | 91.34 |
| edgeR_robust | 90.95 | 90.86 | 90.85 | 90.75 | 90.74 | 90.74 | 90.73 |
| | 90.71 | 90.60 | 90.60 | 90.50 | 90.49 | 90.50 | 90.48 |
| | 88.34 | 88.31 | 88.26 | 88.19 | 88.17 | 88.11 | 88.14 |
| voom | 87.16 | 87.01 | 86.99 | 86.88 | 86.91 | 86.88 | 86.86 |
| SAMseq | 85.04 | 84.97 | 84.93 | 84.83 | 84.88 | 84.88 | 84.91 |
| PoissonSeq | 87.31 | 87.25 | 87.25 | 87.19 | 87.17 | 87.22 | 87.23 |
| baySeq | 90.24 | 90.21 | 90.21 | 90.22 | 90.17 | 90.13 | 90.07 |
| EBSeq | 85.77 | 85.85 | 85.78 | 85.81 | 85.73 | 85.71 | 85.77 |
| (b) PDEG = 25 % | |||||||
| | 91.47 | | | | | | |
| | 90.77 | 90.73 | 90.72 | 90.70 | 90.68 | 90.65 | 90.57 |
| | 88.13 | 88.11 | 88.13 | 88.14 | 88.12 | 88.09 | 88.06 |
| | | 91.30 | 91.18 | 91.06 | 90.98 | 90.62 | 89.97 |
| edgeR_robust | 90.89 | 90.69 | 90.57 | 90.43 | 90.34 | 89.97 | 89.27 |
| | 90.77 | 90.54 | 90.37 | 90.25 | 90.15 | 89.73 | 89.04 |
| | 88.12 | 87.83 | 87.62 | 87.49 | 87.36 | 86.79 | 85.92 |
| voom | 87.08 | 86.71 | 86.52 | 86.29 | 86.18 | 85.60 | 84.56 |
| SAMseq | 84.95 | 84.82 | 84.82 | 84.77 | 84.75 | 84.72 | 84.63 |
| PoissonSeq | 87.22 | 87.18 | 87.14 | 87.13 | 87.11 | 87.06 | 86.97 |
| baySeq | 90.34 | 90.13 | 90.07 | 89.92 | 89.83 | 89.52 | 88.86 |
| EBSeq | 85.82 | 85.61 | 85.49 | 85.34 | 85.30 | 84.74 | 84.02 |
Average AUC values (%) over 100 trials are shown for each simulation condition: (a) PDEG = 5 % and (b) PDEG = 25 %. Each simulated dataset contains a total of 10,000 genes, of which PDEG % are DEGs; PG1 % of those DEGs are expressed at a higher level in G1 than in the other groups (PG2 and PG3 are defined likewise for G2 and G3). Each group has three biological replicates (BRs; Nrep = 3). Seven conditions are shown in total. The highest AUC value for each condition is in bold.
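The quantities in this legend can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the score convention (a higher score means stronger evidence that a gene is a DEG) is an assumption made for the example. The first function splits the DEGs among the groups according to PDEG and PG1 to PG3; the second computes an AUC from ranks using the rank-sum identity; the third averages AUCs over trials and reports them as percentages.

```python
from statistics import mean


def deg_counts(n_genes, pdeg_pct, group_pcts):
    """Number of DEGs up-regulated in each group (integer percentages)."""
    n_deg = n_genes * pdeg_pct // 100
    return [n_deg * p // 100 for p in group_pcts]


def auc(scores, is_deg):
    """AUC via the rank-sum identity.

    Assumption for this sketch: a higher score means the gene looks
    more like a DEG. Tied scores share their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(is_deg)
    n_neg = len(is_deg) - n_pos
    rank_sum = sum(r for r, d in zip(ranks, is_deg) if d)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_auc_percent(trials):
    """Mean AUC over trials, as a percentage; trials = [(scores, is_deg), ...]."""
    return 100 * mean(auc(s, d) for s, d in trials)
```

For example, `deg_counts(10000, 5, [50, 30, 20])` gives the DEG counts for the second condition in panel (a): 500 DEGs in total, split 250, 150, and 100 among G1, G2, and G3.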
Effect of different choices for the possible pipelines in TCC
| PG1 | 33 % | 50 % | 50 % | 60 % | 60 % | 70 % | 80 % |
| PG2 | 33 % | 30 % | 40 % | 20 % | 30 % | 20 % | 10 % |
| PG3 | 33 % | 20 % | 10 % | 20 % | 10 % | 10 % | 10 % |
| (a) PDEG = 5 % | |||||||
| | 91.57 | 91.50 | 91.50 | | 91.42 | 91.45 | 91.46 |
| | 91.57 | | | 91.43 | | | |
| | 91.57 | 91.50 | 91.50 | 91.43 | 91.42 | 91.45 | 91.46 |
| | 91.57 | 91.50 | 91.50 | 91.43 | 91.42 | 91.45 | 91.46 |
| | 90.70 | 90.62 | 90.64 | 90.54 | 90.55 | 90.58 | 90.62 |
| | 90.71 | 90.62 | 90.64 | 90.54 | 90.55 | 90.59 | 90.62 |
| | 90.70 | 90.62 | 90.64 | 90.54 | 90.55 | 90.58 | 90.62 |
| | 90.70 | 90.62 | 90.64 | 90.54 | 90.55 | 90.59 | 90.62 |
| | 91.58 | 91.48 | 91.47 | 91.38 | 91.37 | 91.38 | 91.34 |
| | | 91.48 | 91.46 | 91.38 | 91.36 | 91.36 | 91.32 |
| | 90.70 | 90.61 | 90.61 | 90.50 | 90.50 | 90.51 | 90.50 |
| | 90.71 | 90.60 | 90.60 | 90.50 | 90.49 | 90.50 | 90.48 |
| (b) PDEG = 25 % | |||||||
| | 91.47 | 91.46 | 91.45 | 91.45 | 91.43 | 91.42 | 91.37 |
| | 91.47 | | | | | | |
| | 91.47 | 91.43 | 91.41 | 91.40 | 91.36 | 91.30 | 91.19 |
| | 91.47 | 91.44 | 91.43 | 91.42 | 91.39 | 91.36 | 91.29 |
| | 90.77 | 90.74 | 90.74 | 90.73 | 90.71 | 90.71 | 90.65 |
| | 90.77 | 90.74 | 90.76 | 90.75 | 90.73 | 90.74 | 90.71 |
| | 90.77 | 90.71 | 90.70 | 90.68 | 90.64 | 90.60 | 90.47 |
| | 90.77 | 90.73 | 90.72 | 90.70 | 90.68 | 90.65 | 90.57 |
| | 91.47 | 91.30 | 91.18 | 91.06 | 90.98 | 90.62 | 89.97 |
| | | 91.25 | 91.08 | 90.96 | 90.86 | 90.44 | 89.75 |
| | 90.77 | 90.59 | 90.48 | 90.35 | 90.26 | 89.92 | 89.25 |
| | 90.77 | 90.54 | 90.37 | 90.25 | 90.15 | 89.73 | 89.04 |
The legend is essentially the same as for Table 1. Results for a total of 12 pipelines are shown. The AUC values for the four pipelines in bold (EEE-E, DDD-D, E-E, and D-D) are also shown in Table 1. The DED-E pipeline outperforms the others overall.
Average AUC values for simulation data without replicates
| PG1 | 33 % | 50 % | 50 % | 60 % | 60 % | 70 % | 80 % |
| PG2 | 33 % | 30 % | 40 % | 20 % | 30 % | 20 % | 10 % |
| PG3 | 33 % | 20 % | 10 % | 20 % | 10 % | 10 % | 10 % |
| | 77.15 | 76.88 | 76.78 | 76.63 | 76.88 | 76.15 | 75.48 |
| | 77.15 | 76.86 | 76.73 | 76.59 | 76.86 | 76.08 | 75.41 |
| | 77.15 | 76.88 | 76.79 | 76.64 | 76.88 | 76.19 | 75.57 |
| | 77.15 | 76.87 | 76.75 | 76.61 | 76.87 | 76.13 | 75.50 |
| | 81.51 | 81.14 | 81.28 | 80.93 | 81.14 | 80.51 | 79.97 |
| | 81.52 | 81.14 | 81.25 | 80.90 | 81.14 | 80.45 | 79.90 |
| | 81.49 | 81.14 | 81.28 | 80.94 | 81.14 | 80.55 | 80.05 |
| | 81.51 | 81.15 | 81.26 | 80.91 | 81.15 | 80.49 | 79.98 |
| | 77.15 | 76.87 | 76.76 | 76.60 | 76.87 | 76.10 | 75.36 |
| | 77.15 | 76.86 | 76.71 | 76.57 | 76.86 | 76.04 | 75.35 |
| | 81.49 | 81.13 | 81.27 | 80.91 | 81.13 | 80.46 | 79.86 |
| | 81.53 | 81.12 | 81.23 | 80.88 | 81.12 | 80.41 | 79.84 |
| | 82.46 | 82.18 | 82.08 | 81.98 | 82.18 | 81.52 | 80.97 |
| | | 82.18 | 82.08 | 81.98 | 82.18 | 81.50 | 80.89 |
| | 82.46 | 82.17 | 82.04 | 81.95 | 82.17 | 81.43 | 80.81 |
| | 82.46 | | | | | | |
| | 82.46 | 82.17 | 82.06 | 81.97 | 82.17 | 81.48 | 80.90 |
| | 82.46 | 82.16 | 82.01 | 81.92 | 82.16 | 81.38 | 80.73 |
| | 82.46 | 82.17 | 82.07 | 81.96 | 82.17 | 81.45 | 80.76 |
| | 82.46 | 82.16 | 82.02 | 81.93 | 82.16 | 81.39 | 80.74 |
The legend is essentially the same as for Table 1. Results for a total of 20 pipelines under PDEG = 25 % are shown. The EDE-S pipeline outperforms the others overall.
Fig. 1 Overall similarity of the 12 ranked gene lists obtained from Blekhman’s real count data. The dendrogram of average-linkage clustering is shown. Spearman’s rank correlation coefficient (r) is used as the similarity metric; the left-hand scale represents (1 - r).
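The clustering described in this caption can be sketched as follows. This is an illustrative, standard-library-only version and not the analysis code behind the figure: the function names and the input layout (one list of per-gene values per pipeline, in the same gene order) are assumptions. It computes Spearman's r as the Pearson correlation of average ranks, converts it to the distance 1 - r, and merges clusters by average linkage.

```python
def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def average_linkage(scores):
    """Cluster pipelines by 1 - r; returns (cluster_a, cluster_b, height) merges.

    scores maps a (hypothetical) pipeline name to its per-gene values.
    """
    names = list(scores)
    d = {(a, b): 1 - spearman(scores[a], scores[b])
         for a in names for b in names if a != b}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [d[(a, b)] for a in clusters[i] for b in clusters[j]]
                height = sum(pairs) / len(pairs)
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The merge heights returned here correspond to the (1 - r) scale on the dendrogram: two lists that rank the genes identically merge at height 0, and two lists with exactly reversed rankings are at distance 2.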
Classification of expression patterns for DEGs
| | G1 = G2 = G3 | G1 > G2 = G3 | G1 > G2 > G3 | G1 > G3 > G2 | G2 > G1 = G3 | G2 > G1 > G3 | G2 > G3 > G1 | G3 > G1 = G2 | G3 > G1 > G2 | G3 > G2 > G1 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| all_genes | 13.5 | 2.2 | 15.1 | 8.7 | 2.3 | 15.9 | 9.4 | 2.9 | 15.1 | 14.8 | 20689 |
| common | 0.0 | 0.1 | 23.2 | 5.8 | 0.2 | 26.4 | 5.7 | 0.7 | 18.6 | 19.2 | 2376 |
| | 0.0 | 0.6 | 20.7 | 7.4 | 0.7 | 21.9 | 8.1 | 1.6 | 19.9 | 19.2 | 7247 |
| | 0.0 | 0.4 | 25.0 | 7.3 | 0.6 | 25.0 | 6.0 | 1.4 | 17.3 | 17.1 | 3850 |
| | 0.0 | 0.2 | 19.3 | 7.1 | 0.3 | 21.7 | 9.4 | 0.9 | 19.9 | 21.2 | 7295 |
| | 0.0 | 0.6 | 20.4 | 7.3 | 0.7 | 22.1 | 8.3 | 1.6 | 19.7 | 19.3 | 7247 |
| edgeR_robust | 0.0 | 0.3 | 20.6 | 8.4 | 0.5 | 22.0 | 8.8 | 1.2 | 19.1 | 18.9 | 8076 |
| | 0.0 | 0.4 | 24.3 | 7.2 | 0.6 | 24.2 | 6.0 | 1.4 | 17.8 | 18.1 | 3832 |
| | 0.0 | 0.2 | 20.4 | 8.0 | 0.3 | 21.8 | 8.9 | 0.8 | 19.7 | 19.9 | 7585 |
| voom | 0.0 | 0.7 | 21.3 | 7.7 | 0.7 | 22.5 | 8.2 | 1.3 | 18.7 | 19.0 | 7016 |
| SAMseq | 0.0 | 0.2 | 20.9 | 9.7 | 0.3 | 21.8 | 9.2 | 0.8 | 18.9 | 18.3 | 9453 |
| PoissonSeq | 0.0 | 0.0 | 19.5 | 8.9 | 0.1 | 22.2 | 9.4 | 0.3 | 20.3 | 19.3 | 6613 |
| baySeq | 0.0 | 0.8 | 21.0 | 5.5 | 1.3 | 23.7 | 6.3 | 2.8 | 19.0 | 19.6 | 3975 |
| EBSeq | 0.0 | 0.0 | 21.0 | 7.0 | 0.1 | 23.7 | 7.1 | 0.3 | 20.8 | 19.9 | 5699 |
Percentages of genes assigned to each of the ten possible expression patterns, as classified by baySeq. Numbers in the “Total” column indicate the numbers of genes. For example, baySeq assigned 13.5 % of all 20,689 genes to the “G1 = G2 = G3” pattern.
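As a small worked example of how to read the table, the percentages in a row can be converted back to approximate gene counts using the "Total" column. The sketch below uses the all_genes row as printed; the variable and function names are hypothetical, and because the percentages are rounded to one decimal place the recovered counts are approximate.

```python
# Percentages from the all_genes row, in the column order of the table.
ALL_GENES_PCT = {
    "G1 = G2 = G3": 13.5,
    "G1 > G2 = G3": 2.2,
    "G1 > G2 > G3": 15.1,
    "G1 > G3 > G2": 8.7,
    "G2 > G1 = G3": 2.3,
    "G2 > G1 > G3": 15.9,
    "G2 > G3 > G1": 9.4,
    "G3 > G1 = G2": 2.9,
    "G3 > G1 > G2": 15.1,
    "G3 > G2 > G1": 14.8,
}
ALL_GENES_TOTAL = 20689


def pattern_count(pct, total):
    """Approximate number of genes for a pattern, from a rounded percentage."""
    return round(total * pct / 100)


# Roughly 2793 of the 20,689 genes fall in the "G1 = G2 = G3" pattern.
n_equal = pattern_count(ALL_GENES_PCT["G1 = G2 = G3"], ALL_GENES_TOTAL)

# The ten percentages should add up to about 100 (up to rounding).
pct_sum = sum(ALL_GENES_PCT.values())
```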
Fig. 2 Reproducibility between ranked gene lists. Numbers of common genes between top-ranked genes for individual pipelines are shown: (a) results for the 100 top-ranked genes and (b) results for the 1000 top-ranked genes. Bars in black (rep1-6 vs. rep1-2), gray (rep1-6 vs. rep3-4), and blue (rep1-6 vs. rep5-6) in Fig. 2a indicate the numbers of common genes between the two sets of 100 top-ranked genes obtained from the individual pipelines. For example, the gray bar (rep1-6 vs. rep3-4) for DDD-D in Fig. 2a indicates that there were 46 common genes when the 100 top-ranked genes from the dataset rep1-6 were compared with the 100 top-ranked genes from the dataset rep3-4. Analogously, bars in red (rep1-2 vs. rep3-4 vs. rep5-6) in Fig. 2b indicate the numbers of common genes among the three sets of 1000 top-ranked genes for the three datasets (rep1-2, rep3-4, and rep5-6). For example, the red bar for EEE-E in Fig. 2b indicates that there were 397 common genes (an overlap of 39.7 %) when the three gene lists (each containing 1000 top-ranked genes) obtained from the EEE-E pipeline for the three datasets were compared. The full R code for this analysis is given in Additional file 5.
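The overlap counts described in this caption amount to intersecting the top-k entries of two or more ranked gene lists. The sketch below is a hedged illustration in Python, not the R code from Additional file 5; the function names and the toy gene identifiers are hypothetical.

```python
def n_common(ranked_lists, k):
    """Number of genes shared by the top-k entries of every ranked list.

    Each ranked list is assumed to be ordered from most to least
    significant and to contain no duplicate gene identifiers.
    """
    top_sets = [set(ranked[:k]) for ranked in ranked_lists]
    return len(set.intersection(*top_sets))


def overlap_percent(common, k):
    """Shared genes as a percentage of the list length k."""
    return 100 * common / k


# Toy example with hypothetical gene identifiers.
full_data = ["g1", "g2", "g3", "g4", "g5"]
subset_a = ["g2", "g1", "g5", "g3", "g4"]
subset_b = ["g1", "g4", "g2", "g3", "g5"]

pairwise = n_common([full_data, subset_a], 2)           # shares g1 and g2
three_way = n_common([full_data, subset_a, subset_b], 3)
```

With the figure's own numbers, `overlap_percent(397, 1000)` reproduces the 39.7 % quoted for the EEE-E pipeline in Fig. 2b.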