Xiangyu Li1, Hao Cai1, Xianlong Wang1, Lu Ao1, You Guo1, Jun He1, Yunyan Gu1, Lishuang Qi1, Qingzhou Guan1, Xu Lin1, Zheng Guo2.
Abstract
To detect differentially expressed genes (DEGs) in small-scale cell line experiments, usually with only two or three technical replicates for each state, the commonly used statistical methods such as significance analysis of microarrays (SAM), limma and RankProd (RP) lack statistical power, while the fold change method lacks any statistical control. In this study, we demonstrated that the within-sample relative expression orderings (REOs) of gene pairs were highly stable among technical replicates of a cell line but were often widely disrupted after treatments such as gene knockdown, gene transfection and drug treatment. Based on this finding, we customized the RankComp algorithm, previously designed for individualized differential expression analysis through REO comparison, to identify DEGs with certain statistical control for small-scale cell line data. In both simulated and real data, the new algorithm, named CellComp, exhibited high precision with much higher sensitivity than the original RankComp, SAM, limma and RP methods. Therefore, CellComp provides an efficient tool for analyzing small-scale cell line data.
Keywords: differentially expressed genes; small-scale cell line data; technical replicates; within-sample relative expression orderings
Year: 2019 PMID: 29040359 PMCID: PMC6433897 DOI: 10.1093/bib/bbx135
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Data sets for human cancer cell lines analyzed in this study
| Data accession | Cell line | Cancer type | Treatment | Size |
|---|---|---|---|---|
| GSE29084 | HepG2 | Liver cancer | HNF4A knockdown | 2 VS 2 |
| GSE38581 | MKN45 | Gastric cancer | Hsa-miR-29c transfection | 2 VS 2 |
| GSE38581 | MKN74 | Gastric cancer | Hsa-miR-29c transfection | 2 VS 2 |
| GSE31450 | SNU638 | Gastric cancer | LAP2β transfection | 3 VS 3 |
| E-MEXP-1691 | HCT116 | Colon cancer | 5-FU-treatment for 24h | 3 VS 3 |
| GSE35004 | Hep3B | Liver cancer | YAP knockdown | 3 VS 3 |
| GSE78167 | MCF7 | Breast cancer | Estrogen-treatment for 2h | 3 VS 3 |
| GSE15709 | A2780 | Ovarian cancer | Cisplatin-induced resistance | 5 VS 5 |
Figure 1. The flowchart for the CellComp algorithm. We use a given gene g to illustrate the algorithm. The first step is to extract the g-specific background gene pairs, each including g, which are stable in both State 1 and State 2. The second step is to perform Fisher’s exact test of the null hypothesis that f1 and f2 are equal, where f1 and f2 denote the frequencies of gene pairs, among all g-specific background gene pairs, in which g shows a higher expression level than its partner gene in State 1 and State 2, respectively. After all genes are judged as potential DEGs or non-DEGs, the third step is to renew the g-specific background gene pairs: only the gene pairs that include g and a potential non-DEG identified in Step 2 are retained. Step 2 and Step 3 are repeated until the number of detected DEGs stops changing.
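The three steps in the Figure 1 caption can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`cellcomp_sketch`, `stable_sign`, `fisher_exact_two_sided`, `benjamini_hochberg`) are hypothetical, a gene pair is assumed to be "stable" in a state when its ordering is identical in every technical replicate, and Benjamini-Hochberg adjustment is assumed for the FDR control, which the caption does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed correction), original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stable_sign(state, g, h):
    """+1 if g > h in every replicate, -1 if g < h in every replicate, else 0."""
    if all(rep[g] > rep[h] for rep in state):
        return 1
    if all(rep[g] < rep[h] for rep in state):
        return -1
    return 0


def cellcomp_sketch(state1, state2, fdr=0.05, max_iter=50):
    """Return the indices of genes called DEGs; each state is a list of replicates."""
    n_genes = len(state1[0])
    # Step 1: g-specific background pairs that are stable in both states.
    background = []
    for g in range(n_genes):
        pairs = {}
        for h in range(n_genes):
            s1, s2 = stable_sign(state1, g, h), stable_sign(state2, g, h)
            if h != g and s1 and s2:
                pairs[h] = (s1 > 0, s2 > 0)  # is g higher than h in state 1 / state 2?
        background.append(pairs)

    degs = set()
    for _ in range(max_iter):
        pvals = []
        for g in range(n_genes):
            # Step 3: keep only partners currently judged to be non-DEGs.
            kept = [v for h, v in background[g].items() if h not in degs]
            n = len(kept)
            up1 = sum(1 for in1, _ in kept if in1)
            up2 = sum(1 for _, in2 in kept if in2)
            # Step 2: test whether f1 = up1/n equals f2 = up2/n.
            pvals.append(fisher_exact_two_sided(up1, n - up1, up2, n - up2))
        adjusted = benjamini_hochberg(pvals)
        new_degs = {g for g in range(n_genes) if adjusted[g] < fdr}
        converged = len(new_degs) == len(degs)
        degs = new_degs
        if converged:  # the number of detected DEGs stopped changing
            break
    return degs


# Toy data (made up): 30 genes, two replicates per state; gene 0 moves from
# the lowest to the highest expression after "treatment".
state1 = [[float(i) for i in range(30)], [i + 0.1 for i in range(30)]]
state2 = [[100.0] + row[1:] for row in state1]
degs = cellcomp_sketch(state1, state2)
print(sorted(degs))
```

In the toy data only gene 0 changes its ordering against every partner, so it is the only gene whose f1 and f2 differ substantially; every other gene differs in a single pair.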
GEO or ArrayExpress accession numbers of the untreated samples
| Data set | Untreated sample 1 | Untreated sample 2 | Untreated sample 3 |
|---|---|---|---|
| HepG2 | GSM720616 | GSM720618 | – |
| MKN45 | GSM945747 | GSM945748 | – |
| MKN74 | GSM945751 | GSM945752 | – |
| SNU638 | GSM781665 | GSM781666 | GSM781667 |
| HCT116 | S0114F032 | S0114F040 | S0114F042 |
| Hep3B | GSM860172 | GSM860173 | GSM860174 |
| MCF7 | GSM2068643 | GSM2068653 | GSM2068663 |
The average numbers of DEGs identified from the null data sets based on the HepG2, MKN45 and MKN74 data, each state with two technical replicates
| | Null data set: HepG2 | Null data set: MKN45 | Null data set: MKN74 |
|---|---|---|---|
| 5 | 2231.35 | 1689.44 | 1837.88 |
| 15 | 18.16 | 326.85 | 373.11 |
| 25 | 0 | 32.32 | 31.05 |
Sensitivity, specificity and F-score of DEGs identified by CellComp, RankComp, SAM, limma and RP for simulated data
| FDR (%) | Evaluation | HepG2: CellComp | HepG2: RankComp | HepG2: SAM | HepG2: limma | HepG2: RP | SNU638: CellComp | SNU638: RankComp | SNU638: SAM | SNU638: limma | SNU638: RP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Sensitivity | 0.709 | 0.628 | 0 | 0.025 | 0 | 0.956 | 0.736 | 0.475 | 0.648 | 0 |
| 1 | Specificity | 1 | 1 | 1 | 1 | 1 | 0.999 | 0.999 | 1 | 1 | 1 |
| 1 | F-score | 0.829 | 0.772 | 0 | 0.049 | 0 | 0.977 | 0.848 | 0.644 | 0.786 | 0 |
| 5 | Sensitivity | 0.726 | 0.644 | 0.041 | 0.197 | 0.015 | 0.965 | 0.744 | 0.635 | 0.878 | 0.288 |
| 5 | Specificity | 1 | 1 | 1 | 1 | 1 | 0.999 | 0.999 | 1 | 1 | 1 |
| 5 | F-score | 0.842 | 0.784 | 0.079 | 0.328 | 0.029 | 0.982 | 0.853 | 0.777 | 0.935 | 0.447 |
| 10 | Sensitivity | 0.798 | 0.755 | 0.345 | 0.393 | 0.058 | 0.970 | 0.755 | 0.802 | 0.936 | 0.506 |
| 10 | Specificity | 1 | 1 | 0.994 | 1 | 1 | 0.999 | 0.999 | 1 | 1 | 1 |
| 10 | F-score | 0.888 | 0.860 | 0.508 | 0.564 | 0.109 | 0.984 | 0.860 | 0.890 | 0.967 | 0.672 |
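For readers who want to reproduce this kind of evaluation on their own simulated data, the sketch below shows how the three measures could be computed from a set of true DEGs and a set of called DEGs. Sensitivity and specificity follow their standard definitions. The F-score definition is not given in this record, so the harmonic mean of sensitivity and specificity is used here as an assumption; it is numerically consistent with the table (for example, 0.956 and 0.999 give about 0.977), but the conventional precision-recall F-score may be what was intended. All function names and the toy gene sets are hypothetical.

```python
def confusion_counts(true_degs, called_degs, all_genes):
    """Return (TP, FP, FN, TN) for a set of called DEGs against the truth."""
    tp = len(true_degs & called_degs)
    fp = len(called_degs - true_degs)
    fn = len(true_degs - called_degs)
    tn = len(all_genes) - tp - fp - fn
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of true DEGs that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true non-DEGs that were not called."""
    return tn / (tn + fp)


def harmonic_mean(x, y):
    """Harmonic mean of two rates (assumed F-score form, see the lead-in)."""
    return 2 * x * y / (x + y) if x + y else 0.0


# Toy example (made up): 10 genes, 4 simulated DEGs, 4 calls of which 3 are correct.
all_genes = set(range(10))
true_degs = {0, 1, 2, 3}
called_degs = {0, 1, 2, 9}

tp, fp, fn, tn = confusion_counts(true_degs, called_degs, all_genes)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec, harmonic_mean(sens, spec))
```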
Figure 2. Concordance analyses of the detected DEGs. The precision of the DEGs is given in brackets. (A) The concordance of DEGs identified by CellComp and SAM (or SAMseq), limma or RP. (B) The concordance of DEGs identified by CellComp and RankComp.