Hao Cai1, Xiangyu Li1, Jing Li1, Qirui Liang1, Weicheng Zheng1,2, Qingzhou Guan1, Zheng Guo1,3,2, Xianlong Wang1.
Abstract
Identifying differentially expressed genes (DEGs) between two phenotypes is a basic task in high-throughput gene expression profiling studies. However, weak differential expression signals between two phenotypes are hardly detectable with limited sample sizes. To solve this problem, many researchers have tried to combine multiple independent datasets using meta-analysis or batch effect adjustment algorithms. These algorithms, however, may distort true biological differences between the two phenotypes and introduce unacceptably high false rates, as demonstrated in this study. These problems pose critical obstacles to analyzing the transcriptional data in The Cancer Genome Atlas, which contains many small-scale batches of data. Previously, we developed RankComp to detect DEGs for individual disease samples by exploiting the incongruous relative expression orderings (REOs) between two phenotypes; here we further improved it to identify DEGs using multiple independent datasets. We demonstrated that the improved RankComp can directly analyze integrated cross-site data to detect DEGs between two phenotypes without the need for batch effect adjustments. Its usage was illustrated by detecting weak differential expression signals in breast cancer drug-response data using combined datasets from multiple experiments.
Keywords: Batch effect; Differentially expressed genes; Drug response; Relative expression orderings
Year: 2018 PMID: 29989020 PMCID: PMC6036750 DOI: 10.7150/ijbs.24548
Source DB: PubMed Journal: Int J Biol Sci ISSN: 1449-2288 Impact factor: 6.580
Description of the datasets used in this study
| Dataset | Platform | Genes | Control | Case | Control vs. Case | Database |
|---|---|---|---|---|---|---|
| GSE23878 | GPL570 (a) | 20486 | 24 | 35 | Normal vs. COAD | GEO |
| GSE37364 | GPL570 | 20486 | 38 | 27 | Normal vs. COAD | GEO |
| GSE20916 | GPL570 | 20486 | 24 | 45 | Normal vs. COAD | GEO |
| GSE29001 | GPL571 (b) | 12432 | 12 | 12 | Normal vs. ESCC | GEO |
| GSE20347 | GPL571 | 12432 | 17 | 17 | Normal vs. ESCC | GEO |
| GSE38129 | GPL571 | 12432 | 30 | 30 | Normal vs. ESCC | GEO |
| Batch93 | RNASeqV2 (c) | 17618 | 16 | 41 | Normal vs. BRCA | TCGA |
| Batch96 | RNASeqV2 | 17675 | 15 | 41 | Normal vs. BRCA | TCGA |
| Batch109 | RNASeqV2 | 17679 | 15 | 70 | Normal vs. BRCA | TCGA |
| ER-positive breast cancer | | | | | | |
| GSE22093 | GPL96 (d) | 12432 | 10 | 32 | pCR vs. RD | GEO |
| GSE23988 | GPL96 | 12432 | 7 | 24 | pCR vs. RD | GEO |
| GSE42822 | GPL96 | 12432 | 7 | 19 | pCR vs. RD | GEO |
| GSE20271 | GPL96 | 12432 | 6 | 83 | pCR vs. RD | GEO |
| GSE20194T | GPL96 | 12432 | 4 | 61 | pCR vs. RD | GEO |
| ER-negative breast cancer | | | | | | |
| GSE22093 | GPL96 | 12432 | 18 | 37 | pCR vs. RD | GEO |
| GSE23988 | GPL96 | 12432 | 13 | 16 | pCR vs. RD | GEO |
| GSE42822 | GPL96 | 12432 | 13 | 15 | pCR vs. RD | GEO |
| GSE20271 | GPL96 | 12432 | 13 | 50 | pCR vs. RD | GEO |
| GSE20194T | GPL96 | 12432 | 16 | 16 | pCR vs. RD | GEO |

Note: (a) Affymetrix Human Genome U133 Plus 2.0 Array; (b) Affymetrix Human Genome U133A 2.0 Array; (c) UNC IlluminaHiSeq_RNASeqV2; (d) Affymetrix Human Genome U133A Array.
Abbreviations: COAD, colon adenocarcinoma; ESCC, esophageal squamous cell carcinoma; BRCA, breast invasive carcinoma; pCR, pathologic complete response; RD, residual disease; GEO, Gene Expression Omnibus; TCGA, The Cancer Genome Atlas.
Figure 1. A schematic diagram of the RankCompV2 algorithm. Focusing on the overlap of the two lists of stable gene pairs, the stable REOs in the control samples are defined as the background stable REOs, while the reversed stable REOs in the case samples are defined as the reversal REOs of the disease. For a given gene G1, a is the number of gene pairs with the REO pattern (G1 > Gi) and b is the number with the pattern (G1 < Gj) in the background stable REOs; x is the number of reversal gene pairs with the shift (G1 < Gj → G1 > Gj) and y is the number with the shift (G1 > Gi → G1 < Gi) in the reversal REOs of the disease [28, 30, 35].
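The counts a, b, x and y in the Figure 1 caption can be illustrated with a minimal sketch. This is not the RankCompV2 implementation. It assumes a pair is "stable" when one ordering holds in at least a fixed fraction of samples (the hypothetical `min_fraction` parameter), and it reads "overlap" as pairs that are stable in both groups. The function names and the toy data are made up for illustration, and the statistical test that would follow the counts is omitted.

```python
import numpy as np

def stable_reos(expr, min_fraction=0.95):
    """expr: genes x samples matrix.
    Returns S with S[i, j] = 1 if gene i > gene j in at least
    min_fraction of samples, -1 if gene i < gene j in at least
    min_fraction of samples, and 0 otherwise.
    Note: builds a genes x genes x samples array, so toy sizes only."""
    greater = (expr[:, None, :] > expr[None, :, :]).mean(axis=2)
    less = (expr[:, None, :] < expr[None, :, :]).mean(axis=2)
    s = np.zeros(greater.shape, dtype=int)
    s[greater >= min_fraction] = 1
    s[less >= min_fraction] = -1
    return s

def reo_counts(control, case, gene, min_fraction=0.95):
    """Counts (a, b, x, y) for one gene, following the Figure 1 caption."""
    bg = stable_reos(control, min_fraction)[gene]  # orderings in controls
    ds = stable_reos(case, min_fraction)[gene]     # orderings in cases
    both = (bg != 0) & (ds != 0)                   # stable in both groups
    a = int(np.sum(both & (bg == 1)))              # background: G1 > Gi
    b = int(np.sum(both & (bg == -1)))             # background: G1 < Gj
    x = int(np.sum((bg == -1) & (ds == 1)))        # G1 < Gj -> G1 > Gj
    y = int(np.sum((bg == 1) & (ds == -1)))        # G1 > Gi -> G1 < Gi
    return a, b, x, y

# Toy data (hypothetical): 4 genes x 3 samples per group; gene 0 rises in cases.
control = np.array([[5, 5, 5], [1, 2, 1], [8, 9, 8], [3, 3, 4]], dtype=float)
case = np.array([[10, 10, 11], [1, 2, 1], [8, 9, 8], [3, 4, 3]], dtype=float)

print(reo_counts(control, case, gene=0))  # (2, 1, 1, 0)

# A strictly increasing distortion applied to the case data, used here as a
# simplified stand-in for a batch effect, leaves every within-sample
# ordering unchanged, and therefore the counts as well.
distorted = np.log2(3 * case + 1)
print(reo_counts(control, distorted, gene=0) == reo_counts(control, case, gene=0))
```

Because the counts depend only on within-sample orderings, any strictly increasing transformation of a sample leaves them unchanged. This is the property that lets rank-based comparisons pool data across sites, under the simplifying assumption that batch effects preserve within-sample orderings.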
Figure 2. Performance evaluation of different algorithms. (A) Performance of RankCompV2 in nine datasets; (B) Comparison of the results of RankCompV2 with those of other algorithms in the merged COAD, ESCC and BRCA datasets, respectively.
Figure 3. Stacked bar chart of the distribution of the numbers of DEGs identified from the simulated null datasets across 100 repeated experiments.
Figure 4. Biological pathways enriched with the DEGs in the RD groups.