Zongliang Yue1, Qi Zheng1,2, Michael T Neylon3, Minjae Yoo4, Jimin Shin4, Zhiying Zhao1,5, Aik Choon Tan4, Jake Y Chen1.
Abstract
Integrative Gene-set, Network and Pathway Analysis (GNPA) is a powerful data analysis approach developed to help interpret high-throughput omics data. In PAGER 1.0, we demonstrated that researchers can gain unbiased and reproducible biological insights with the introduction of PAGs (Pathways, Annotated-lists and Gene-signatures) as the basic data representation elements. In PAGER 2.0, we improve the utility of integrative GNPA by significantly expanding the coverage of PAGs and PAG-to-PAG relationships in the database, defining a new metric to quantify PAG data qualities, and developing new software features to simplify online integrative GNPA. Specifically, we included 84 282 PAGs spanning 24 different data sources that cover human diseases, published gene-expression signatures, drug-gene, miRNA-gene interactions, pathways and tissue-specific gene expressions. We introduced a new normalized Cohesion Coefficient (nCoCo) score to assess the biological relevance of genes inside a PAG, and RP-score to rank genes and assign gene-specific weights inside a PAG. The companion web interface contains numerous features to help users query and navigate the database content. The database content can be freely downloaded and is compatible with third-party Gene Set Enrichment Analysis tools. We expect PAGER 2.0 to become a major resource in integrative GNPA. PAGER 2.0 is available at http://discovery.informatics.uab.edu/PAGER/.
Year: 2018 PMID: 29126216 PMCID: PMC5753198 DOI: 10.1093/nar/gkx1040
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Statistics of PAGER 2.0 as compared to PAGER 1.0
| | PAGER 1.0 | PAGER 2.0 | Increase ratio |
|---|---|---|---|
| | 44 313 | 65 774 | 148% |
| PPI + Gene Regulation | 115 840 | 601 164 | 518% |
| PPI | 93 713 | 579 037 | 617% |
| Gene Regulation | 22 127 | 22 127 | 100% |
| PAGs (Singleton + Regular) | 38 379 | 84 282 | 219% |
| Singleton ( | 19 772 | 27 206 | 137% |
| Regular ( | 18 607 | 57 076 | 306% |
| with CoCo scores ( | 14 701 | 42 048 | 286% |
| with CoCo score ≥ 1 | 13 856 | 15 028 | 108% |
| m-type (V1: logPMF > 5; V2: logCDF > 10) | 3 101 499 | 7 418 174 | 239% |
| r-type (V1: PMF < 0.05; V2: CDF < 0.05) | 72 824 | 120 101 | 164% |
| sPAG to mPAG | 7 250 | 28 744 | 396% |
| mPAG to mPAG | 39 253 | 83 741 | 213% |
| mPAG to sPAG | 2 479 | 4 613 | 186% |
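The "Increase ratio" column appears to be the PAGER 2.0 count divided by the PAGER 1.0 count, expressed as a percentage and truncated to a whole number, so 100% means no change. The following is a minimal Python sketch under that assumption; the function name `increase_ratio` is illustrative and not part of PAGER.

```python
def increase_ratio(v1_count: int, v2_count: int) -> str:
    """Return v2/v1 as a whole-number percentage.

    Assumption: the table truncates (rather than rounds) the percentage,
    which is what the listed values are consistent with.
    """
    if v1_count <= 0:
        raise ValueError("v1_count must be positive")
    return f"{(100 * v2_count) // v1_count}%"


# A few rows copied from the statistics table: (PAGER 1.0, PAGER 2.0).
rows = {
    "PPI": (93_713, 579_037),
    "Gene Regulation": (22_127, 22_127),
    "sPAG to mPAG": (7_250, 28_744),
}

for label, (v1, v2) in rows.items():
    print(label, increase_ratio(v1, v2))
```

Integer arithmetic is used so the truncation does not depend on floating-point rounding.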
An example of comparing the PAG quality using nCoCo score
| PAG Id | Type | PAG name | PAG size | Theoretical PPI | PPI | | |
|---|---|---|---|---|---|---|---|
| WIG001980 | P | Non-homologous end joining | 6 | 15 | 13 | 88 | 1153 |
| WIG001424 | P | Actin Nucleation and Branching | 101 | 5050 | 612 | 2094 | 130 |
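The "Theoretical PPI" values match the number of unordered gene pairs in a PAG of size n, that is n(n-1)/2 (15 for n = 6 and 5050 for n = 101). The sketch below recomputes that count and the fraction of possible pairs observed as PPIs. It does not reproduce the CoCo or nCoCo scores, whose formulas are not given here, and the function names are illustrative.

```python
from math import comb


def theoretical_ppi(pag_size: int) -> int:
    """Number of unordered gene pairs in a PAG of the given size."""
    return comb(pag_size, 2)


def ppi_density(pag_size: int, observed_ppi: int) -> float:
    """Fraction of all possible gene pairs that are observed PPIs."""
    possible = theoretical_ppi(pag_size)
    if possible == 0:
        raise ValueError("a PAG needs at least two genes to have a pair")
    return observed_ppi / possible


# Values taken from the two example PAGs in the table.
for pag_id, size, observed in [("WIG001980", 6, 13), ("WIG001424", 101, 612)]:
    print(pag_id, theoretical_ppi(size), round(ppi_density(size, observed), 3))
```

On this simple density measure, the small pathway has most of its possible pairs supported by a PPI, while the large one has only a small fraction.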
Figure 1. Distribution of PAG size across 22 data sources. Color indicates the PAG size distribution from the 22 different sources. MSigDB includes ImmuSigDB, and Genome Data is not shown because its PAG size is equal to 1.
Figure 2. Comparison of the nCoCo distributions in PAGER 2.0 and PAGER 1.0. (A) Cumulative percentage of nCoCo scores in PAGER 1.0 and PAGER 2.0. The gray line is the nCoCo score of PAGER 1.0 and the black line is the nCoCo score of PAGER 2.0. The dashed line marks the crossing point of the PAGER 1.0 and PAGER 2.0 curves at PAG size = 128 and cumulative percentage = 0.50. (B) PAGER 1.0 nCoCo distribution. Bin edges increase in increments of 2^0.2 to form ranges of [2^x, 2^(x+0.2)], where x ranges from 0 to 16. (C) PAGER 2.0 nCoCo distribution. The pie chart shows the nCoCo bin [2^6.8, 2^7].
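The histogram panels bin nCoCo scores on a log2 scale. As a rough sketch, assuming bins of the form [2^x, 2^(x+0.2)) with x stepping from 0 to 16 in increments of 0.2, a score can be mapped to its bin as follows. The half-open intervals and the helper name `ncoco_bin` are assumptions of this sketch, not details taken from PAGER.

```python
import math
from typing import Tuple

STEPS_PER_UNIT = 5   # 1 / 0.2 exponent steps per unit of x
MAX_EXPONENT = 16    # largest x used as a lower bin edge (assumed)


def ncoco_bin(score: float) -> Tuple[float, float]:
    """Return the (lower, upper) edges of the log2 bin containing score."""
    if score < 1:
        raise ValueError("scores below 2^0 fall outside the binned range")
    # Index of the 0.2-wide exponent step that contains log2(score).
    k = math.floor(math.log2(score) * STEPS_PER_UNIT)
    if k > MAX_EXPONENT * STEPS_PER_UNIT:
        raise ValueError("score exceeds the largest bin")
    lower = 2 ** (k / STEPS_PER_UNIT)
    upper = 2 ** ((k + 1) / STEPS_PER_UNIT)
    return lower, upper


low, high = ncoco_bin(120.0)
print(f"[{low:.2f}, {high:.2f})")
```

A score of 120 lands in the bin whose exponents run from 6.8 to 7, the bin highlighted by the pie chart in panel C.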
Figure 3. PAGER 2.0 web interface. (A) The refined result page for a keyword search. (B) Overview of the PAGs retrieved using a list of genes relevant to Non-Small Cell Lung Cancer, with statistical parameters and the nCoCo score available for filtering the results. (C) PAGs related to the query genes relevant to Non-Small Cell Lung Cancer. (D) The m-type and r-type PAG-to-PAG relationships. (E) Visualization of the m-type and r-type PAG networks and the PAG-to-PAG similarity matrix.
Figure 4. The r-type PAG-to-PAG network of the NSCLC study. Nodes represent PAGs. Edge width denotes the score of the r-type PAG-to-PAG relationship. Node color represents the -log2(FDR) value of the PAG in the NSCLC enrichment analysis. Node size and shape represent the degree and the type of the PAG, respectively.
Figure 5. Gene prioritization with RP-score. (A) Top 10 genes in the ‘Non-Small Cell Lung Cancer’ PAG WAG000379 ranked by RP-score. (B) Genes with high RP-scores (colored in red) are tightly connected by protein–protein interactions. Node size represents the RP-score, and edge width represents the confidence score of the protein–protein interaction as obtained from the HAPPI-2 database.
New features in PAGER 2.0
| New features in PAGER 2.0 |
|---|
| • Gene prioritization in PAGs |
| • PubMed evidence supporting gene membership in PAGs |
| • m-type and r-type PAG-to-PAG relationship details, gene–gene interactions and gene–gene regulation in PAGs |
| • New PAGER 2.0 GMT file for GSEA |
| • Bulk download of PAG information, PAG-to-PAG relationships, and gene–gene relationships |
| • Marking of suspect PAGs with comments and submission to our system for curation |
| • Upload system update: file uploading is now supported |
| • Search button to filter the content in the results |
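Because the PAGER 2.0 GMT file is intended for GSEA tools, it can presumably be read with a generic GMT parser. The sketch below assumes the standard tab-delimited GMT layout (set name, description, then member genes on one line). What PAGER puts in the description field is not stated here, and the sample line, PAG name and gene names are hypothetical placeholders.

```python
import io
from typing import Dict, List, TextIO


def read_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Parse GMT text into a mapping of gene-set name to member genes."""
    gene_sets: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets with no member genes
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Hypothetical one-line GMT file; real PAG IDs and genes will differ.
sample = io.StringIO("EXAMPLE_PAG\texample description\tGENE_A\tGENE_B\tGENE_C\n")
pags = read_gmt(sample)
print(pags)
```

For a downloaded file, the same function can be called on an opened file handle, for example `with open(path) as fh: read_gmt(fh)`, where `path` is wherever the GMT file was saved.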