Limei Wang, Weixin Xie, Kongning Li, Zhenzhen Wang, Xia Li, Weixing Feng, Jin Li.
Abstract
Differential co-expression-based pathway analysis remains limited and is not widely used. Most current methods treat pathways as gene sets, do not consider gene regulation relationships, and are computationally slow. In this article, we propose a novel method, Dysregulated Pathway Identification Analysis (DysPIA), to overcome these shortcomings. We incorporated the idea of Correlation by Individual Level Product (CILP) into the analysis and performed a fast enrichment analysis. We constructed a combined gene-pair background that was considerably more comprehensive than the background used in Edge Set Enrichment Analysis (ESEA). In a simulation study, DysPIA identified the causal pathways with high AUC (0.9584 to 0.9896). On p53 mutation data, DysPIA performed better than the other methods: it identified more potentially dysregulated pathways that could be verified in the literature, and it ran much faster (∼1,700 to 8,000 times faster than the other methods with 10,000 permutations). DysPIA was also applied to a breast cancer relapse dataset and a breast cancer subtype dataset. The results show that DysPIA is effective and biologically meaningful. The R packages "DysPIA" and "DysPIAData" are freely available on CRAN (https://cran.r-project.org/web/packages/DysPIA/index.html and https://cran.r-project.org/web/packages/DysPIAData/index.html) and on GitHub (https://github.com/lemonwang2020).
Keywords: differential co-expression; differential expression; differential variability; dysregulated pathway; enrichment analysis; gene regulation
Year: 2021 PMID: 34290733 PMCID: PMC8287415 DOI: 10.3389/fgene.2021.647653
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Toy example of differential expression (DE), differential variability (DV), and differential co-expression (DC). (A) DE: the means of gene expression differ significantly between the two conditions (e.g., case and control), while the variances are similar. (B) DV: the variances of gene expression differ significantly between the two conditions, while the means are similar. (C) DC: the correlation coefficients of the two genes differ significantly between the two conditions.
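The three patterns in Figure 1 can be reproduced with simulated data. The following is a minimal sketch, not the authors' code: the sample size, means, standard deviations, and correlation values are all illustrative assumptions.

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlated_pair(rho, n, rng):
    """Simulate two standard-normal genes with population correlation rho."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500                 # assumed number of samples per condition

# (A) DE: the mean shifts, the variance stays the same.
de_control = [rng.gauss(0.0, 1.0) for _ in range(n)]
de_case = [rng.gauss(2.0, 1.0) for _ in range(n)]

# (B) DV: the variance changes, the mean stays the same.
dv_control = [rng.gauss(0.0, 1.0) for _ in range(n)]
dv_case = [rng.gauss(0.0, 3.0) for _ in range(n)]

# (C) DC: the gene-gene correlation changes between conditions.
dc_control = correlated_pair(0.8, n, rng)
dc_case = correlated_pair(0.0, n, rng)

mean_shift = statistics.fmean(de_case) - statistics.fmean(de_control)
sd_ratio = statistics.stdev(dv_case) / statistics.stdev(dv_control)
r_control = pearson(*dc_control)
r_case = pearson(*dc_case)
print(f"DE mean shift: {mean_shift:.2f}")
print(f"DV sd ratio:   {sd_ratio:.2f}")
print(f"DC r control:  {r_control:.2f}, r case: {r_case:.2f}")
```

In case (C) neither gene changes its mean or variance, which is why a DC pattern is invisible to per-gene DE or DV tests.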
FIGURE 2. Flowchart of DysPIA. The left side shows the inputs: gene expression data, the combined background, and the pathway gene list. CILP-like gene pair scores are calculated from the gene expression data; the DysGPS and DysPS are then calculated on the right side. The results are shown at the bottom.
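To make the "CILP-like gene pair score" step concrete, here is a minimal sketch of the individual-level-product idea. It assumes that expression is standardized within each condition and that the per-sample products are then compared between conditions; the Welch t-statistic used for that comparison is an assumption for illustration, since the text here does not state which statistic DysPIA uses. All function and variable names are hypothetical.

```python
import random
import statistics

def individual_products(x, y):
    """Per-sample products of standardized expression for one gene pair.

    With population standard deviations, the mean of these products
    equals the Pearson correlation of x and y.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)]

def welch_t(a, b):
    """Welch t-statistic comparing the means of two samples (assumed score)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / (va / len(a) + vb / len(b)) ** 0.5

def simulate_pair(rho, n, rng):
    """Two standard-normal genes with population correlation rho."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

rng = random.Random(1)
case_x, case_y = simulate_pair(0.8, 200, rng)  # correlated in case
ctrl_x, ctrl_y = simulate_pair(0.0, 200, rng)  # uncorrelated in control

# Standardize within each condition, then compare the products.
case_products = individual_products(case_x, case_y)
ctrl_products = individual_products(ctrl_x, ctrl_y)
pair_score = welch_t(case_products, ctrl_products)
print(f"mean product (case):    {statistics.fmean(case_products):.3f}")
print(f"mean product (control): {statistics.fmean(ctrl_products):.3f}")
print(f"gene-pair score:        {pair_score:.2f}")
```

Working with per-sample products turns a difference in correlation into a difference in means, so an ordinary two-group test can be applied to each gene pair.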
The simulation data with 100,000 gene pairs.
| Gene pair | Number of gene pairs | Gene pairs per subgroup | Correlation coefficients |
| GCDG | 2,500 | 500 | |
| | | 500 | |
| | | 500 | |
| | | 500 | |
| | | 500 | |
| NDG | 95,000 | | |
| LCDG | 2,500 | 500 | |
| | | 500 | |
| | | 500 | |
| | | 500 | |
| | | 500 | |
Summary of pathways and gene pairs.
| Database name | Number of pathways | Number of gene pairs |
| Reactome | 1,901 | 264,867 |
| KEGG | 306 | 60,571 |
| BioCarta | 247 | 5,421 |
| Panther | 84 | 12,951 |
| PathBank | 48,593 | 6,882 |
| NCI | 212 | 14,198 |
| SMPDB | 48,581 | 6,777 |
| PharmGKB | 60 | 2,727 |
| Total pathways | 99,984 | 333,484 |
| Random background | NA | 349,915 |
| Combined background | NA | 682,417 |
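The combined background is smaller than the sum of the pathway and random backgrounds, which is what one would expect if it is built as a set union. The sketch below works under that assumption; it also treats a gene pair as unordered, which is a second assumption. The gene symbols are illustrative placeholders, not entries taken from DysPIAData.

```python
def canonical(pair):
    """Represent an unordered gene pair as a sorted tuple (assumption)."""
    return tuple(sorted(pair))

# Toy inputs; the gene symbols are placeholders for illustration only.
pathway_pairs = {canonical(p) for p in [("TP53", "MDM2"), ("MDM2", "TP53"), ("BRCA1", "BARD1")]}
random_pairs = {canonical(p) for p in [("TP53", "MDM2"), ("GAPDH", "ACTB")]}

# Union: pairs present in both sources are counted once.
combined = pathway_pairs | random_pairs
print(len(pathway_pairs), len(random_pairs), len(combined))

# Same arithmetic on the counts reported in the table:
# overlap = pathway pairs + random pairs - combined pairs.
implied_overlap = 333_484 + 349_915 - 682_417
print(implied_overlap)  # 982 pairs shared, if the background is a plain union
```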
FIGURE 3. ROC curves in the simulation study. (A) ROC curves in the five simulations with different proportions (20–60%) of dysregulated gene pairs. (B) ROC curves of pathways with different sizes (20, 40, 60, 80, and 100) when the proportion of dysregulated gene pairs is 20%. The group “Pathways with all sizes” is the union of the pathways of all sizes (20, 40, 60, 80, and 100). (C–F) As in (B), ROC curves of pathways when the proportions of dysregulated gene pairs are 30, 40, 50, and 60%, respectively.
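The AUC that summarizes each ROC curve is the probability that a randomly chosen causal pathway is ranked above a randomly chosen non-causal one. The following is a minimal sketch of that rank-based calculation; the scores and labels are made-up toy values, and using a score where higher means "more dysregulated" (for example, 1 minus the P-value) is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, is_causal in zip(scores, labels) if is_causal]
    neg = [s for s, is_causal in zip(scores, labels) if not is_causal]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data: six pathways, three of them causal (label 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

toy_auc = auc(scores, labels)
print(toy_auc)  # 8 of 9 causal/non-causal comparisons are won: 8/9
```

The pairwise form is quadratic in the number of pathways, which is fine for a sketch; a sort-based rank sum gives the same value for larger inputs.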
FIGURE 4. Running time comparison of the four methods.
Correlation coefficients between the P-values obtained with 1,000 and 10,000 permutations (PCC, Pearson correlation coefficient; SCC, Spearman correlation coefficient).
| Method | PCC | SCC |
| DysPIA | 0.9986 | 0.9983 |
| ESEA | 0.9064 | 0.9037 |
| GSCA | 0.9982 | 0.9976 |
| GSNCA | 0.9986 | 0.9981 |
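A stability check of this kind can be sketched as follows, taking PCC and SCC to denote the Pearson and Spearman correlation coefficients between the per-pathway P-values of two runs. The P-values below are hypothetical toy numbers, not results from the study.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical P-values for the same five pathways from two runs.
p_1k = [0.001, 0.02, 0.3, 0.5, 0.9]       # 1,000 permutations
p_10k = [0.0004, 0.025, 0.28, 0.55, 0.88]  # 10,000 permutations

pcc = pearson(p_1k, p_10k)
scc = spearman(p_1k, p_10k)
print(f"PCC = {pcc:.4f}, SCC = {scc:.4f}")
```

Values close to 1 for both coefficients indicate that 1,000 permutations already rank the pathways almost exactly as 10,000 permutations do.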
The correlation of P-values between methods.
| Method | DysPIA | ESEA | GSCA | GSNCA |
| DysPIA | 1 | 0.1900 | 0.1138 | −0.1011 |
| ESEA | 0.1967 | 1 | 0.0516 | −0.0308 |
| GSCA | 0.1083 | 0.0531 | 1 | 0.0150 |
| GSNCA | −0.1100 | −0.0308 | 0.0222 | 1 |
Number of dysregulated pathways between subtypes.
| Subtype | Basal | Her2 | LumA | LumB |
| Basal | – | 53 | 75 | 50 |
| Her2 | 33 | – | 49 | 19 |
| LumA | 48 | 20 | – | 42 |
| LumB | 20 | 10 | 26 | – |