Guini Hong, Hongdong Li, Jiahui Zhang, Qingzhou Guan, Rou Chen, Zheng Guo.
Abstract
Because tissue biopsy is invasive, investigators often cannot collect enough normal controls for comparison with diseased samples. We developed a pathway enrichment tool, DRFunc, that detects significantly disease-disrupted pathways by incorporating normal controls from other experiments. The method was validated using both microarray and RNA-seq expression data for different cancers. Highly concordant differentially ranked (DR) gene pairs were identified between cases and controls from independent datasets. The DRFunc algorithm uses these DR gene pairs to detect significantly disrupted pathways in one-phenotype expression data by combining controls from other studies. We illustrated the algorithm by detecting significant pathways in glioblastoma samples. The algorithm can also detect altered pathways in datasets with weak expression signals, as shown by an analysis of expression data from chemotherapy-treated breast cancer samples.
Year: 2017 PMID: 28465555 PMCID: PMC5431047 DOI: 10.1038/s41598-017-01536-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Datasets used in this study.
| Dataset^a | Case | Control | Data source | Platform |
|---|---|---|---|---|
| GC38-31 | 38 | 31 | GSE13911 | GPL570 |
| GC12-15 | 12 | 15 | GSE19826 | GPL570 |
| LC91-65 | 91 | 65 | GSE19188 | GPL570 |
| LC60-60 | 60 | 60 | GSE19804 | GPL570 |
| BC12-27 ER | 12 | 27 | GSE10810 | GPL570 |
| BC34-17 ER | 34 | 17 | GSE42568 | GPL570 |
| GBM34-13 | 34 | 13 | GSE50161 | GPL570 |
| GBM70-0 | 70 | 0 | GSE53733 | GPL570 |
| BC68-46 Response | 68 | 46 | GSE20194 | GPL96 |
| BC61-19 Response | 61 | 19 | GSE20271 | GPL96 |
| LUAD125-37 | 125 | 37 | TCGA | HiSeq2000 |
| CRC32-32 | 32 | 32 | GSE8671 | GPL570 |
| COAD285-41 | 285 | 41 | TCGA | HiSeq2000 |
^a GC denotes gastric cancer, LC denotes lung cancer, BC denotes breast cancer, ER denotes estrogen receptor, GBM denotes glioblastoma, LUAD denotes lung adenocarcinoma, CRC denotes colorectal cancer, and COAD denotes colon adenocarcinoma. Each dataset is named by its cancer type followed by the numbers of case and control samples, separated by a hyphen.
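The naming scheme described in the footnote can be decoded mechanically. The following is a minimal sketch, assuming names follow the pattern shown in the table (letters, case count, hyphen, control count, optional trailing label such as "ER" or "Response"); the function name `parse_dataset_name` is hypothetical and not part of DRFunc.

```python
import re

# Pattern assumed from the table: cancer type, case count, "-", control count,
# then an optional free-text label (e.g. "ER", "Response").
_NAME = re.compile(r"^([A-Za-z]+)(\d+)-(\d+)(?:\s+(.+))?$")

def parse_dataset_name(name):
    """Split a dataset name into (cancer_type, n_case, n_control, label)."""
    m = _NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognised dataset name: {name!r}")
    cancer, n_case, n_control, label = m.groups()
    return cancer, int(n_case), int(n_control), label or ""

print(parse_dataset_name("GC38-31"))
print(parse_dataset_name("BC12-27 ER"))
```

For example, `GBM70-0` decodes to a glioblastoma dataset with 70 cases and no controls, which is the one-phenotype situation the abstract describes.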
Figure 1. Flowchart of DRFunc. The DRFunc algorithm includes three steps: input of expression profiles for case and control samples (from the same or different experiments), DR gene pair identification, and annotation and detection of significant pathways.
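The record does not spell out how DR gene pairs are called, so the following is only an illustrative sketch of the second step under a stated assumption: a gene pair counts as differentially ranked when the within-sample ordering of the two genes is stable in controls and reversed in cases (or the other way round). The function name `dr_gene_pairs` and the `stable` threshold are hypothetical; the published method may use a formal statistical test instead of a fixed threshold.

```python
import numpy as np

def dr_gene_pairs(case, control, stable=0.9):
    """Return {(i, j): direction} for gene pairs whose rank order flips.

    case, control: arrays of shape (n_genes, n_samples).
    direction is -1 if gene i > gene j in controls but not in cases,
    and +1 for the opposite flip. The threshold rule is an assumption.
    """
    case = np.asarray(case)
    control = np.asarray(control)
    pairs = {}
    n_genes = case.shape[0]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            # Fraction of samples in which gene i is expressed above gene j.
            f_ctrl = np.mean(control[i] > control[j])
            f_case = np.mean(case[i] > case[j])
            if f_ctrl >= stable and f_case <= 1 - stable:
                pairs[(i, j)] = -1
            elif f_ctrl <= 1 - stable and f_case >= stable:
                pairs[(i, j)] = 1
    return pairs

control = [[5, 6, 7], [1, 2, 3], [2, 3, 4]]
case = [[1, 1, 1], [5, 5, 5], [2, 3, 4]]
print(dr_gene_pairs(case, control))
```

Because the comparison uses only the relative order of two genes within each sample, controls taken from a different experiment can in principle be compared with the cases without cross-dataset normalisation, which matches the motivation stated in the abstract. The double loop is quadratic in the number of genes and is meant for illustration only.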
Mean and standard deviation of the number of DR gene pairs identified from random subsets.
| Dataset | #DR pair | #Overlapped pair | #Concordant pair | Concordant ratio |
|---|---|---|---|---|
| GC38-31 | 1054900 ± 237429 | | | |
| | 1169868 ± 271089 | 586201 ± 36373 | 586198 ± 36373 | 0.9999 ± 8.42 × 10⁻⁶ |
| LC91-65 | 5211347 ± 236859 | | | |
| | 4983364 ± 256758 | 4078924 ± 69845 | 4078880 ± 69861 | 0.9999 ± 6.74 × 10⁻⁶ |
| BC34-17 ER | 1199844 ± 328353 | | | |
| | 1046124 ± 308752 | 595768 ± 86284 | 595768 ± 86284 | 0.9999 ± 2.2 × 10⁻¹⁶ |
Concordance of DR gene pairs identified for each cancer dataset.
| Dataset | #DR pair | #Overlapped pair | #Concordant pair | Concordant ratio |
|---|---|---|---|---|
| GC12-15 | 249379 | 188706 | 186655 | 0.9997 |
| GC38-31 | 3060133 | | | |
| LC60-60 | 5035285 | 3785548 | 3724663 | 0.9839 |
| LC91-65 | 7977878 | | | |
| BC12-27 ER | 2527003 | 1406505 | 1404282 | 0.9984 |
| BC34-17 ER | 3087813 | | | |
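The concordance columns can be read as follows: the overlapped pairs are those called DR in both datasets, and the concordant pairs are the overlapped ones with the same direction of change. The sketch below assumes that reading and assumes each DR list is stored as a mapping from gene pair to direction; the function name `concordance` is hypothetical.

```python
def concordance(pairs_a, pairs_b):
    """Return (n_overlapped, n_concordant, ratio) for two DR pair mappings.

    pairs_a, pairs_b: dicts mapping a gene pair to its direction (+1 or -1).
    """
    overlapped = pairs_a.keys() & pairs_b.keys()
    concordant = sum(1 for p in overlapped if pairs_a[p] == pairs_b[p])
    ratio = concordant / len(overlapped) if overlapped else float("nan")
    return len(overlapped), concordant, ratio

# Toy example with made-up gene identifiers.
a = {("g1", "g2"): 1, ("g1", "g3"): -1, ("g2", "g3"): 1}
b = {("g1", "g2"): 1, ("g1", "g3"): 1, ("g4", "g5"): -1}
print(concordance(a, b))

# Ratios recomputed from the counts reported in the table.
print(round(3724663 / 3785548, 4))  # LC60-60 vs LC91-65
print(round(1404282 / 1406505, 4))  # BC12-27 ER vs BC34-17 ER
```

Under this reading, the ratio is simply the number of concordant pairs divided by the number of overlapped pairs, which reproduces the lung and breast cancer ratios in the table from their reported counts.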
Concordance of DR gene pairs identified from datasets with the same case samples but different control samples.
| Dataset | #DR pair | #Overlapped pair | #Concordant pair | Concordant ratio |
|---|---|---|---|---|
| GC12-31 | 3870438 | 163670 | 162305 | 0.9919 |
| GC12-15 | 249379 | | | |
| GC38-15 | 4523783 | 1560772 | 1442242 | 0.9241 |
| GC38-31 | 3060133 | | | |
| LC60-65 | 7387229 | 3982182 | 3799350 | 0.9541 |
| LC60-60 | 5035285 | | | |
| LC91-60 | 8935664 | 6335001 | 6030374 | 0.9519 |
| LC91-65 | 7977878 | | | |
| BC12-17 ER | 2649823 | 1130216 | 1117603 | 0.9888 |
| BC12-27 ER | 2527003 | | | |
| BC34-27 ER | 6630077 | 2393323 | 2332764 | 0.9747 |
| BC34-17 ER | 3087813 | | | |
Figure 2. Overlaps of significant pathways detected for the three cancer types. The bar plot shows the number of significant pathways (y-axis) shared by at least two, three, four, five and six datasets (x-axis) for BC, LC and GC.
Figure 3. Numbers of significant MSigDB pathways detected for GC, LC, and BC.
Figure 4. Venn diagrams for the number of significant MSigDB pathways detected for GBM. The 1330 significant MSigDB pathways were divided into five groups according to their source databases: Biocarta, KEGG, PID, Reactome and the others.