Abstract
MOTIVATION: Detecting differentially expressed (DE) genes between disease and normal control groups is one of the most common analyses of genome-wide transcriptomic data. Because most studies have few samples, researchers have used meta-analysis to pool different data sets for the same disease. Even then, statistical power is often still insufficient. Because many diseases share disease genes, it is desirable to design a statistical framework that identifies the common and disease-specific DE genes of several diseases simultaneously, thereby improving identification power.
Keywords: Cross disease transcriptome; Differentially expressed; Gene expression; Public data integration
Year: 2018 PMID: 29467826 PMCID: PMC5819186 DOI: 10.1186/s13040-018-0163-y
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Summaries of six different cancer data sets used in this study
| Dataset ID | Disease Name | Microarray Platforms | # of disease samples | # of control samples | Reference |
|---|---|---|---|---|---|
| GSE13507 | Bladder cancer | GPL6102 | 165 | 68 | [ |
| GSE41258 | Colorectal cancer | GPL96 | 181 | 58 | [ |
| GSE19188 | NSCLC | GPL570 | 91 | 65 | [ |
| GSE9476 | AML | GPL96 | 26 | 38 | [ |
| GSE32863 | Lung adenocarcinoma | GPL6884 | 58 | 58 | [ |
| GSE1542 | Pancreatic Cancer | GPL96 | 24 | 25 | [ |
Abbreviations: NSCLC, Non-Small Cell Lung Carcinoma; AML, Acute Myelocytic Leukemia
Fig. 1 Workflow of the proposed joint analysis framework
Fig. 2 Average sensitivity and false discovery rate (FDR) of single data set analysis versus joint analysis under different simulation parameter setups. The results are summarized from 100 runs. (a) Average sensitivity comparison. (b) Average FDR comparison
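The two metrics reported for the simulations can be made concrete with a minimal sketch. It assumes each simulated run yields a set of truly DE genes and a set of genes called DE by a method; the function names and the gene labels in the usage note are hypothetical and are not taken from the paper.

```python
def sensitivity_and_fdr(true_de, called_de):
    """Return (sensitivity, FDR) for one simulated run.

    sensitivity = true positives / number of truly DE genes
    FDR         = false positives / number of genes called DE
    """
    true_de, called_de = set(true_de), set(called_de)
    true_pos = len(true_de & called_de)
    false_pos = len(called_de - true_de)
    # Guard against empty sets so the ratios are always defined.
    sensitivity = true_pos / len(true_de) if true_de else 0.0
    fdr = false_pos / len(called_de) if called_de else 0.0
    return sensitivity, fdr


def average_over_runs(runs):
    """Average (sensitivity, FDR) over a list of (true_de, called_de) pairs."""
    pairs = [sensitivity_and_fdr(t, c) for t, c in runs]
    n = len(pairs)
    return sum(s for s, _ in pairs) / n, sum(f for _, f in pairs) / n
```

For example, `sensitivity_and_fdr({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g5"})` has two true positives and one false positive, giving a sensitivity of 0.5 and an FDR of one third.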
Fig. 3 Number of true genes plotted against the number of top-ranked genes, evaluated by different methods under different simulation parameter setups: shared percentage = 0.6, 0.8, 1; number of data sets = 2, 4, 6
Comparison of estimated prior probability with true ratio in the simulation study
| DE | X = 600, Estimate | X = 600, Truth | X = 700, Estimate | X = 700, Truth | X = 800, Estimate | X = 800, Truth | X = 900, Estimate | X = 900, Truth | X = 1000, Estimate | X = 1000, Truth |
|---|---|---|---|---|---|---|---|---|---|---|
| (0,0) | 0.8857 (0.006)^a | 0.86 | 0.8914 (0.006) | 0.87 | 0.8982 (0.006) | 0.88 | 0.9054 (0.005) | 0.89 | 0.9068 (0.003) | 0.9 |
| (0,1) | 0.041 (0.005) | 0.04 | 0.0356 (0.005) | 0.03 | 0.0284 (0.004) | 0.02 | 0.0218 (0.003) | 0.01 | 0.018 (0.003) | 0 |
| (1,0) | 0.042 (0.008) | 0.04 | 0.0365 (0.008) | 0.03 | 0.0316 (0.006) | 0.02 | 0.0258 (0.003) | 0.01 | 0.02 (0.004) | 0 |
| (1,1) | 0.031 (0.003) | 0.06 | 0.0364 (0.002) | 0.07 | 0.0417 (0.004) | 0.08 | 0.0469 (0.005) | 0.09 | 0.0546 (0.004) | 0.1 |
a The values in the parentheses represent the standard deviation summarized from 10 repeated runs
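The Truth columns can be reproduced with simple arithmetic if one assumes two data sets over 10,000 genes, each with 1,000 DE genes, of which X are shared. These totals are assumptions inferred from the tabulated ratios, not values stated in this section, and the function name is hypothetical.

```python
def true_pattern_proportions(n_shared, n_genes=10_000, n_de_per_dataset=1_000):
    """Proportion of genes in each DE pattern (status in data set 1, data set 2).

    n_genes and n_de_per_dataset are ASSUMED defaults chosen because they
    reproduce the Truth columns of the table; they are not given in the text.
    """
    specific = n_de_per_dataset - n_shared            # DE in one data set only
    null = n_genes - n_shared - 2 * specific          # DE in neither data set
    return {
        (0, 0): null / n_genes,
        (0, 1): specific / n_genes,
        (1, 0): specific / n_genes,
        (1, 1): n_shared / n_genes,
    }


for x in (600, 700, 800, 900, 1000):
    print(x, true_pattern_proportions(x))
```

Under these assumptions, X = 600 gives 0.86, 0.04, 0.04, and 0.06 for the patterns (0,0), (0,1), (1,0), and (1,1), matching the first Truth column.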
Fig. 4 Influence of the sample size of the similar disease from which information is borrowed. The results are averaged over 100 runs for each sample size setting. (a) Average sensitivity comparison. (b) Average FDR comparison
Fig. 5 Comparison of DE gene identification results between joint analysis and single data set analysis on six cancer data sets. (a) The total number of identified DE genes in each cancer. (b) The number of cancer-related DE genes identified in each cancer. The enrichment level is evaluated by the p-value of a hypergeometric test. ***: p < 0.001; **: p < 0.01; *: p < 0.05
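A hypergeometric enrichment test of the kind named in the caption can be sketched as follows. This is a generic upper-tail test written with the standard library only. The argument names are hypothetical, and the background size and gene-set definitions used in the study are not given here, so this should not be read as the authors' exact procedure.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_total, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(overlap >= n_overlap).

    n_total     : genes in the background (assumed input)
    n_annotated : background genes annotated as cancer-related
    n_selected  : genes identified as DE
    n_overlap   : identified DE genes that are cancer-related
    """
    denom = comb(n_total, n_selected)
    upper = min(n_annotated, n_selected)
    # math.comb returns 0 when k > n, so impossible terms drop out.
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denom


def significance_stars(p):
    """Map a p-value to the star notation used in the figure caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

As a small check, with 10 background genes, 4 annotated, and 3 selected, the probability of at least 2 annotated genes among the selected ones is (6·6 + 4·1)/120 = 1/3.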
Fig. 6 Scatterplots of Z-scores between two cancers. (a) Z-score pairs between colorectal cancer and pancreatic cancer. (b) Z-score pairs between colorectal cancer and bladder cancer
Pair-wise similarity estimated among cancers
| | Bladder | Colorectal | NSCLC | AML | Lung | Pancreatic |
|---|---|---|---|---|---|---|
| Bladder | 1 | 0.521 | 0.525 | 0.219 | 0.435 | 0.132 |
| Colorectal | 0.521 | 1 | 0.511 | 0.168 | 0.486 | 0.117 |
| NSCLC | 0.525 | 0.511 | 1 | 0.251 | 0.906 | 0.148 |
| AML | 0.219 | 0.168 | 0.251 | 1 | 0.197 | 0.085 |
| Lung | 0.435 | 0.486 | 0.906 | 0.197 | 1 | 0.124 |
| Pancreatic | 0.132 | 0.117 | 0.148 | 0.085 | 0.124 | 1 |
Number of genes in the KEGG pathways of Alzheimer’s disease (AD) and Huntington’s disease (HD) among the top-ranked genes in each neurodegenerative disorder
| Top Rank | Alzheimer’s disease, Single | Alzheimer’s disease, Joint | Huntington’s disease, Single | Huntington’s disease, Joint |
|---|---|---|---|---|
| < 250 | 2 | 4 | 5 | 6 |
| < 500 | 9 | 12 | 10 | 16 |
| < 750 | 19 | 21 | 14 | 24 |
| < 1000 | 29 | 30 | 19 | 32 |
Posterior probability and rank comparison of 15 HD-related genes exclusively identified by joint analysis among top 1000 genes
| Gene Symbol | Single P^a | Joint P^b | Single Rank | Joint Rank |
|---|---|---|---|---|
| ATP5B | 0.722686833 | 0.938701726 | 1475 | 652 |
| ATP5F1 | 0.712111394 | 0.933163998 | 1690 | 954 |
| ATP5G1 | 0.724132275 | 0.939691574 | 1443 | 594 |
| ATP5J | 0.71465263 | 0.935545911 | 1639 | 827 |
| CLTA | 0.704691436 | 0.933779808 | 1833 | 918 |
| COX4I1 | 0.732382138 | 0.938415838 | 1208 | 673 |
| NDUFA7 | 0.718914525 | 0.938173615 | 1553 | 683 |
| NDUFA9 | 0.738565931 | 0.942103957 | 1037 | 460 |
| NDUFB5 | 0.731956544 | 0.940700886 | 1218 | 546 |
| NDUFB6 | 0.709427659 | 0.932649277 | 1741 | 972 |
| POLR2K | 0.705645646 | 0.935145538 | 1806 | 850 |
| SLC25A5 | 0.737126871 | 0.934489089 | 1089 | 884 |
| UQCRC1 | 0.727082935 | 0.932648306 | 1373 | 973 |
| UQCRH | 0.723833736 | 0.934133476 | 1448 | 902 |
| VDAC2 | 0.738319827 | 0.944168188 | 1045 | 327 |
a Posterior probability of true DE status in single data set analysis
b Posterior probability of true DE status in joint analysis
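The rank shift in the table above can be summarized by counting how many of the 15 genes fall under each top-rank cutoff. The sketch below copies the rank pairs from the table; the helper name `count_within_top` is hypothetical.

```python
# (single rank, joint rank) for the 15 HD-related genes, copied from the table.
RANKS = {
    "ATP5B": (1475, 652), "ATP5F1": (1690, 954), "ATP5G1": (1443, 594),
    "ATP5J": (1639, 827), "CLTA": (1833, 918), "COX4I1": (1208, 673),
    "NDUFA7": (1553, 683), "NDUFA9": (1037, 460), "NDUFB5": (1218, 546),
    "NDUFB6": (1741, 972), "POLR2K": (1806, 850), "SLC25A5": (1089, 884),
    "UQCRC1": (1373, 973), "UQCRH": (1448, 902), "VDAC2": (1045, 327),
}


def count_within_top(ranks, cutoff):
    """Count genes whose rank is strictly below the cutoff."""
    return sum(1 for r in ranks if r < cutoff)


single = [s for s, _ in RANKS.values()]
joint = [j for _, j in RANKS.values()]

for cutoff in (250, 500, 750, 1000):
    print(cutoff, count_within_top(single, cutoff), count_within_top(joint, cutoff))
```

None of these genes ranks below 1000 in the single data set analysis, whereas the joint analysis places 2 below 500, 7 below 750, and all 15 below 1000.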
Comparison of the top 10 KEGG pathway enrichment results between (A) single data set analysis and (B) joint analysis in Huntington’s disease
| Term | Count | Enrichment P-value | Bonferroni-corrected P |
|---|---|---|---|
| (A) Single | | | |
| hsa03050:Proteasome | 12 | 5.14E-07 | 1.21E-04 |
| hsa05010:Alzheimer’s disease | 20 | 1.95E-05 | 0.004579209 |
| hsa05012:Parkinson’s disease | 18 | 2.58E-05 | 0.006034046 |
| hsa05016:Huntington’s disease | 19 | 3.66E-04 | 0.08242395 |
| hsa05033:Nicotine addiction | 7 | 0.003777846 | 0.589128604 |
| hsa00190:Oxidative phosphorylation | 13 | 0.004721495 | 0.671158323 |
| hsa04932:Non-alcoholic fatty liver disease (NAFLD) | 14 | 0.00497103 | 0.689975889 |
| hsa05169:Epstein-Barr virus infection | 15 | 0.013743654 | 0.9613094 |
| hsa04723:Retrograde endocannabinoid signaling | 10 | 0.01471692 | 0.969321024 |
| hsa04728:Dopaminergic synapse | 11 | 0.02422846 | 0.996860832 |
| (B) Joint | | | |
| hsa05012:Parkinson’s disease | 29 | 3.18E-13 | 7.59E-11 |
| hsa00190:Oxidative phosphorylation | 27 | 2.90E-12 | 6.92E-10 |
| hsa05016:Huntington’s disease | 32 | 4.34E-12 | 1.04E-09 |
| hsa05010:Alzheimer’s disease | 29 | 2.33E-11 | 5.57E-09 |
| hsa04932:Non-alcoholic fatty liver disease (NAFLD) | 22 | 2.14E-07 | 5.12E-05 |
| hsa03050:Proteasome | 10 | 3.21E-05 | 0.007653523 |
| hsa01100:Metabolic pathways | 73 | 5.47E-05 | 0.012999463 |
| hsa05169:Epstein-Barr virus infection | 20 | 1.02E-04 | 0.024158392 |
| hsa04721:Synaptic vesicle cycle | 10 | 5.70E-04 | 0.127417187 |
| hsa01200:Carbon metabolism | 13 | 0.001156092 | 0.241540496 |