Yan Guo, Shilin Zhao, Fei Ye, Quanhu Sheng, Yu Shyr.
Abstract
BACKGROUND: After a decade in which microarray technology dominated high-throughput gene expression profiling, the introduction of RNAseq has revolutionized gene expression research. While RNAseq provides more abundant information than microarray, its analysis has proved considerably more complicated. To date, no consensus has been reached on the best approach for RNAseq-based differential expression analysis. Not surprisingly, different studies have drawn different conclusions about the best approach to identifying differentially expressed genes, each based on its own criteria and the scenarios it considered. Furthermore, the lack of effective quality control may lead to misleading interpretation of results and erroneous conclusions. To address these problems, we propose MultiRankSeq, a simple yet safe and practical rank-sum approach for RNAseq-based differential gene expression analysis. MultiRankSeq first performs a quality control assessment. For data meeting the quality control criteria, MultiRankSeq compares the study groups using several of the most commonly applied analytical methods and combines their results to generate a new rank-sum interpretation. MultiRankSeq thus provides a unique approach to RNAseq differential expression analysis. MultiRankSeq is written in R and is easy to apply. Detailed graphical and tabular analysis reports can be generated with a single command line.
Year: 2014 PMID: 24977143 PMCID: PMC4058234 DOI: 10.1155/2014/248090
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: (a) Cluster result using all genes; control 1 is clustered together with the disease group. (b) Cluster result using the genes with the top 5% coefficient of variation; the control group and the disease group are now clustered correctly.
Figure 2: (a) Boxplots of raw gene read counts. (b) Correlation matrix of all genes between all pairs of samples, using raw read counts.
Figure 3: (a) Venn diagram of differential expression analyses by DESeq, edgeR, and baySeq. The Venn diagram can be drawn based on P value, fold change, or rank. (b) Scalable volcano plot representing fold change, P value, and rank. Rank is represented by the size of the circle; a larger circle denotes a higher ranking. (c) Heatmap of the top differentially expressed genes. MultiRankSeq produces heatmaps based on P value, fold change, and rank; only genes selected by fold change are shown here.
Analysis difference for IGHG2.
| Method | Adjusted P value | log2 FC | Rank |
|---|---|---|---|
| DESeq | 0.278 | 3.00 | 2572 |
| edgeR | 0.047 | 2.92 | 712 |
| baySeq | 0.907 | NA | 24962 |
| Cuffdiff | <0.001 | 5.83 | 13 |
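To make the rank-sum idea from the abstract concrete, the sketch below adds up the per-method ranks reported for IGHG2 in the table above. It is only an illustration under stated assumptions: the source does not spell out the exact combination formula, so a plain sum of ranks is assumed, and the choice of DESeq, edgeR, and baySeq as the combined methods is taken from the Venn diagram in Figure 3(a). The names `rank_sum`, `rank_spread`, and `COMBINED_METHODS` are hypothetical and are not part of MultiRankSeq.

```python
# Ranks for IGHG2, copied from the table above.
ighg2_ranks = {"DESeq": 2572, "edgeR": 712, "baySeq": 24962, "Cuffdiff": 13}

# Assumption: the combined score uses the three methods shown in Figure 3(a).
COMBINED_METHODS = ("DESeq", "edgeR", "baySeq")


def rank_sum(per_method_ranks, methods=COMBINED_METHODS):
    """Sum a gene's ranks over the chosen methods (smaller means more significant)."""
    return sum(per_method_ranks[m] for m in methods)


def rank_spread(per_method_ranks, methods=COMBINED_METHODS):
    """Gap between the worst and best rank, a crude measure of disagreement."""
    values = [per_method_ranks[m] for m in methods]
    return max(values) - min(values)


ighg2_rank_sum = rank_sum(ighg2_ranks)
ighg2_rank_spread = rank_spread(ighg2_ranks)
print(ighg2_rank_sum)     # 28246
print(ighg2_rank_spread)  # 24250
```

Under this assumed scheme, a gene ranked highly by only one method (here edgeR at 712, against baySeq at 24962) ends up with a large rank sum, which is how the disagreement between methods becomes visible.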
Read count of samples for IGHG2 gene.
| | Disease 1 | Disease 2 | Disease 3 | Control 1 | Control 2 | Control 3 |
|---|---|---|---|---|---|---|
| Read count (IGHG2) | 391 | 2038 | 338 | 634 | 10282 | 1764 |
| Total read count | 49870084 | 65550902 | 71454121 | 35641084 | 44863975 | 49052840 |
| Adjusted read count¹ | 78 | 311 | 47 | 178 | 2292 | 360 |
¹ The adjusted read count of a gene is computed as the gene's read count divided by the total read count of the sample, multiplied by a constant.
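The adjustment described in the footnote can be checked against the table with a few lines of Python. The footnote does not give the constant; the value 10^7 used below is an inference from the tabulated numbers, not something the source states. The names `SCALE` and `adjusted_read_count` are hypothetical.

```python
# Values copied from the read-count table above.
samples = ["Disease 1", "Disease 2", "Disease 3",
           "Control 1", "Control 2", "Control 3"]
ighg2_counts = [391, 2038, 338, 634, 10282, 1764]
total_counts = [49870084, 65550902, 71454121, 35641084, 44863975, 49052840]

# Assumption: the unspecified constant is 10**7 (inferred from the table).
SCALE = 10**7


def adjusted_read_count(gene_count, total_count, scale=SCALE):
    """Gene read count divided by the sample's total read count, times a constant."""
    return round(gene_count / total_count * scale)


adjusted = [adjusted_read_count(g, t) for g, t in zip(ighg2_counts, total_counts)]
for name, value in zip(samples, adjusted):
    print(f"{name}: {value}")
```

With this constant the computed values match the adjusted row of the table (78, 311, 47, 178, 2292, 360). They also show that the high IGHG2 count in Control 2 remains after correcting for sequencing depth.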