Qing Xiong, Sayan Mukherjee, Terrence S Furey.
Abstract
RNA-Seq is quickly becoming the preferred method for comprehensively characterizing whole transcriptome activity, and the analysis of count data from RNA-Seq requires new computational tools. We developed GSAASeqSP, a novel toolset for genome-wide gene set association analysis of sequence count data. This toolset offers a variety of statistical procedures via combinations of multiple gene-level and gene set-level statistics, each with its own strengths under different sample and experimental conditions. These methods can be employed independently, or results generated from several or all methods can be integrated to determine more robust profiles of significantly altered biological pathways. Using simulations, we demonstrate the ability of these methods to identify association signals and to measure the strength of the association. We show that GSAASeqSP analyses of RNA-Seq data from diverse tissue samples provide meaningful insights into the biological mechanisms that differentiate these samples. GSAASeqSP is a powerful platform for investigating the molecular underpinnings of complex traits and diseases arising from differential activity within biological pathways. GSAASeqSP is available at http://gsaa.unc.edu.
Year: 2014 PMID: 25213199 PMCID: PMC4161965 DOI: 10.1038/srep06347
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic flow diagram of GSAASeqSP.
(A): GSAASeqSP takes as input an experimental count dataset and a priori defined gene sets, and first generates permuted datasets based on the experimental dataset; (B): Data are normalized, and extremely small and extremely large gene sets are filtered out; (C): Differential expression analysis is performed using one of: Signal2Noise, log2Ratio, and Signal2Noise_log2Ratio; (D): Gene set association analysis is performed using one of: Weighted_KS, L2Norm, Mean, WeightedSigRatio, SigRatio, GeometricMean, FisherMethod, and RankSum; (E): Outputs include 1) a ranked summary gene set association table with the name of the gene set, the number of genes (SIZE), association score (AS), normalized association score (NAS), P-VALUE, FDR, and FWER; 2) a link to the gene set annotation in MSigDB (where applicable); 3) a heat map of the gene expression data for each gene set; and 4) the null distribution of the AS.
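The flow in steps (A), (C), and (D) can be illustrated with a minimal Python sketch. It is not the GSAASeqSP implementation. It assumes the conventional signal-to-noise definition (difference of class means divided by the sum of class standard deviations), a "Mean"-style set statistic taken as the average absolute gene score, sample-label shuffling as the permutation scheme, and an NAS obtained by dividing the AS by the mean of its null distribution. All function names and the toy values are hypothetical, and the data are treated as already normalized.

```python
import random
import statistics

def signal2noise(a, b):
    """Assumed form: (mean_a - mean_b) / (sd_a + sd_b)."""
    denom = statistics.stdev(a) + statistics.stdev(b)
    return (statistics.mean(a) - statistics.mean(b)) / denom if denom else 0.0

def gene_scores(counts, labels):
    """Gene-level step: score every gene from its two sample classes."""
    scores = {}
    for gene, values in counts.items():
        a = [v for v, lab in zip(values, labels) if lab == 0]
        b = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = signal2noise(a, b)
    return scores

def mean_set_score(scores, gene_set):
    """Gene set-level step: average absolute gene score (assumed form)."""
    return statistics.mean(abs(scores[g]) for g in gene_set)

def permutation_test(counts, labels, gene_set, n_perm=200, seed=0):
    """Return (AS, NAS, p-value) using shuffled sample labels as the null."""
    rng = random.Random(seed)
    observed = mean_set_score(gene_scores(counts, labels), gene_set)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(mean_set_score(gene_scores(counts, shuffled), gene_set))
    # One common convention: add 1 to numerator and denominator so p > 0.
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    null_mean = statistics.mean(null)
    nas = observed / null_mean if null_mean else float("nan")
    return observed, nas, p_value

# Made-up toy values: three genes, three samples per class.
toy_counts = {
    "g1": [10, 12, 11, 30, 33, 31],
    "g2": [5, 7, 6, 20, 18, 22],
    "g3": [8, 9, 7, 8, 10, 9],
}
toy_labels = [0, 0, 0, 1, 1, 1]
print(permutation_test(toy_counts, toy_labels, ["g1", "g2"], n_perm=50, seed=1))
```

Swapping `signal2noise` or `mean_set_score` for another function is how a different gene-level or gene set-level combination would be expressed in this sketch.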
Figure 2. Recognition rates for all combinations of gene-level and gene set-level statistics applied to simulation scenarios 1–6.
Figure 3. FDRs for all combinations of gene-level and gene set-level statistics applied to simulation scenarios 1–6.
The occurrences and ranks of the top pathways across eight methods associated with differences between kidney and liver tissue
| Index | Pathway | NOC | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 | Rank 7 | Rank 8 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | BIOCARTA AMI PATHWAY | 8 | 1 | 2 | 7 | 1 | 15 | 1 | 1 | 2 | 3.75 |
| G2 | REACTOME XENOBIOTICS | 8 | 3 | 1 | 6 | 3 | 23 | 2 | 2 | 1 | 5.13 |
| G3 | REACTOME COMPLEMENT CASCADE | 6 | 0 | 4 | 9 | 0 | 6 | 4 | 9 | 4 | 6.00 |
| G4 | BIOCARTA INTRINSIC PATHWAY | 8 | 2 | 3 | 8 | 2 | 26 | 3 | 3 | 3 | 6.25 |
| G5 | BIOCARTA COMP PATHWAY | 5 | 0 | 10 | 1 | 0 | 0 | 8 | 6 | 8 | 6.60 |
| G6 | KEGG PRIMARY BILE ACID BIOSYNTHESIS | 6 | 4 | 6 | 12 | 0 | 0 | 10 | 8 | 7 | 7.83 |
| G7 | KEGG RETINOL METABOLISM | 6 | 11 | 8 | 13 | 0 | 0 | 6 | 4 | 6 | 8.00 |
| G8 | KEGG COMPLEMENT AND COAGULATION CASCADES | 6 | 12 | 11 | 10 | 0 | 0 | 9 | 7 | 5 | 9.00 |
| G9 | REACTOME BILE ACID AND BILE SALT METABOLISM | 6 | 0 | 9 | 11 | 0 | 10 | 5 | 10 | 9 | 9.00 |
| G10 | REACTOME SYNTHESIS OF BILE ACIDS AND BILE SALTS | 7 | 9 | 5 | 14 | 0 | 18 | 7 | 5 | 15 | 10.43 |
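NOC: number of occurrences; 1: Weighted_KS; 2: L2Norm; 3: Mean; 4: WeightedSigRatio; 5: SigRatio; 6: GeometricMean; 7: FisherMethod; 8: RankSum; Avg: the average rank. A rank of 0 appears to mark a method in which the pathway did not occur; the NOC and Avg columns count only the nonzero ranks.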
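The NOC and Avg columns can be reproduced from the eight rank columns. The short Python sketch below assumes that a rank of 0 means the pathway did not occur for that method, which is an inference from the table values. The helper name `summarize_ranks` is hypothetical, and the rank lists are copied from rows G1, G3, and G5 above.

```python
def summarize_ranks(ranks):
    """Return (NOC, Avg): count of nonzero ranks and their mean."""
    present = [r for r in ranks if r > 0]  # 0 is treated as "did not occur"
    noc = len(present)
    avg = sum(present) / noc if noc else float("nan")
    return noc, avg

print(summarize_ranks([1, 2, 7, 1, 15, 1, 1, 2]))   # G1 -> (8, 3.75)
print(summarize_ranks([0, 4, 9, 0, 6, 4, 9, 4]))    # G3 -> (6, 6.0)
print(summarize_ranks([0, 10, 1, 0, 0, 8, 6, 8]))   # G5 -> (5, 6.6)
```

Integrating methods this way favors pathways that rank highly wherever they occur, so NOC should be read alongside Avg.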
The occurrences and ranks of top pathways across eight methods associated with differences in breast cancer subtypes
| Index | Pathway | NOC | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 | Rank 6 | Rank 7 | Rank 8 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| G1 | REACTOME DNA STRAND ELONGATION | 7 | 0 | 5 | 4 | 2 | 4 | 1 | 1 | 2 | 2.71 |
| G2 | REACTOME ACTIVATION OF THE PRE REPLICATIVE COMPLEX | 8 | 2 | 9 | 2 | 4 | 6 | 3 | 3 | 19 | 6.00 |
| G3 | PID FOXM1PATHWAY | 8 | 1 | 2 | 3 | 10 | 20 | 5 | 4 | 5 | 6.25 |
| G4 | PID AURORA B PATHWAY | 8 | 3 | 4 | 11 | 1 | 13 | 4 | 5 | 25 | 8.25 |
| G5 | REACTOME G1 S SPECIFIC TRANSCRIPTION | 8 | 9 | 7 | 1 | 18 | 12 | 2 | 2 | 16 | 8.38 |
| G6 | REACTOME G1 PHASE | 8 | 11 | 6 | 8 | 15 | 21 | 8 | 7 | 4 | 10.00 |
| G7 | PID ATR PATHWAY | 7 | 0 | 19 | 14 | 3 | 5 | 9 | 11 | 13 | 10.57 |
| G8 | KEGG DNA REPLICATION | 8 | 8 | 16 | 10 | 5 | 8 | 11 | 10 | 18 | 10.75 |
| G9 | REACTOME G2 M CHECKPOINTS | 8 | 18 | 8 | 5 | 25 | 1 | 13 | 9 | 7 | 10.75 |
| G10 | REACTOME CYCLIN A B1 ASSOCIATED EVENTS DURING G2 M TRANSITION | 7 | 10 | 1 | 12 | 16 | 0 | 6 | 6 | 29 | 11.43 |
Figure 4. The predicted protein-protein interaction network of protein products of genes in the top three differential pathways associated with breast cancer subtypes (confidence: 0.90).
The nodes represent proteins; the edges represent the predicted functional associations. The associations were inferred from two types of evidence from the STRING database: the presence of experimental evidence (purple line) and text-mining evidence (yellow line). Experimental evidence was obtained from protein-protein interaction databases and text-mining evidence from abstracts of scientific literature.