Sahil D Shah, Rosemary Braun.
Abstract
BACKGROUND: A key challenge in identifying disease-associated genes is analyzing transcriptomic data in the context of the regulatory networks that control cellular processes, in order to capture multi-gene interactions and yield mechanistically interpretable results. One existing category of analysis techniques identifies groups of related genes using interaction networks, but these gene sets often comprise tens or hundreds of genes, making experimental follow-up challenging. A more recent category of methods identifies precise gene targets while incorporating systems-level information, but these techniques do not determine whether a gene is a driving source of changes in its network, an important characteristic when looking for potential drug targets.
Keywords: Algorithms; Gene expression; Networks; Pathways; Systems biology
Year: 2019 PMID: 31060502 PMCID: PMC6503437 DOI: 10.1186/s12859-019-2829-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of the GeneSurrounder algorithm. The algorithm combines systems-level information, in the form of a network model of cellular interactions, with gene expression data to identify the genes that control disease-associated mechanisms. The algorithm then identifies “disruptive” genes by assessing the significance of the combined evidence that (1) a gene has an influence on others in the network and (2) its influence is driving disease.
Fig. 2 Procedure for Sphere of Influence. The Sphere of Influence computation tests whether a putative driver gene is more correlated with its neighbors than with a random sample of genes.
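The Sphere of Influence test can be sketched as a permutation test. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is a hypothetical dict-of-lists adjacency, expression is a dict mapping each gene to its per-sample values, the neighborhood is every gene within a given shortest-path radius of the putative driver, and the test statistic is the mean absolute Pearson correlation between the driver and those genes. The helper names `neighbors_within` and `sphere_of_influence_p` are hypothetical.

```python
import math
import random
from collections import deque


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def neighbors_within(graph, source, radius):
    """Genes at shortest-path distance 1..radius from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the radius
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return [g for g, d in dist.items() if d > 0]


def sphere_of_influence_p(graph, expr, driver, radius, n_perm=1000, seed=0):
    """Permutation p-value: is the driver more correlated with its network
    neighborhood than with random gene sets of the same size? (assumed statistic)"""
    rng = random.Random(seed)
    hood = [g for g in neighbors_within(graph, driver, radius) if g in expr]
    if not hood:
        return 1.0
    others = [g for g in expr if g != driver]

    def score(genes):
        # Mean absolute correlation between the driver and the given genes.
        return sum(abs(pearson(expr[driver], expr[g])) for g in genes) / len(genes)

    observed = score(hood)
    hits = sum(score(rng.sample(others, len(hood))) >= observed for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

Adding one to both the hit count and the number of permutations keeps the estimated p-value strictly positive, which matters later if p-values are log-transformed or combined.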
Fig. 3 Procedure for Decay of Differential Expression. The Decay of Differential Expression computation tests whether the discordance between differential expression and distance from the driver gene is greater with the phenotype labels we observe than with
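A similarly hedged sketch of the Decay of Differential Expression test follows. It assumes, for illustration only, that differential expression is measured by the absolute Welch t-statistic, that the discordance is Kendall's tau between each gene's distance from the driver and that statistic (negated so that larger values mean stronger decay), and that the null distribution comes from randomly shuffling the phenotype labels. None of these choices is confirmed by the caption, and the function names are hypothetical.

```python
import math
import random
from collections import deque


def welch_t(x, y):
    """Welch's t-statistic for mean(x) - mean(y); both groups need nonzero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def bfs_distances(graph, source):
    """Shortest-path (hop) distance from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs; needs len >= 2."""
    def sign(v):
        return (v > 0) - (v < 0)
    n = len(x)
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    return s / (n * (n - 1) / 2)


def decay_statistic(graph, expr, labels, driver, radius):
    """Higher values mean |t| falls off more consistently with distance from the driver."""
    dist = bfs_distances(graph, driver)
    genes = [g for g, d in dist.items() if d <= radius and g in expr]
    abs_t = []
    for g in genes:
        cases = [v for v, lab in zip(expr[g], labels) if lab == 1]
        ctrls = [v for v, lab in zip(expr[g], labels) if lab == 0]
        abs_t.append(abs(welch_t(cases, ctrls)))
    return -kendall_tau([dist[g] for g in genes], abs_t)


def decay_p(graph, expr, labels, driver, radius, n_perm=200, seed=0):
    """Permutation p-value: observed labels vs. randomly shuffled labels."""
    rng = random.Random(seed)
    observed = decay_statistic(graph, expr, labels, driver, radius)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if decay_statistic(graph, expr, shuffled, driver, radius) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Shuffling labels, rather than genes, keeps the gene-gene correlation structure intact, so the comparison asks whether the decay pattern is tied to the observed phenotype rather than to co-expression alone.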
Fig. 4 Illustration of the method. Displayed are the results for the gene MCM2 when our algorithm was applied to ovarian cancer study GSE14764. (a) −log10(pSphere) vs. neighborhood radius. (b) −log10(pDecay) vs. neighborhood radius. (c) −log10(pCombined) vs. neighborhood radius. (d) Number of assayed genes vs. neighborhood radius. In the top three plots, the dashed and dotted lines correspond to significance levels of 0.05 and 0.01, respectively. In the bottom plot, the solid line corresponds to the total number of genes that are both assayed and present on the network.
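Panel c plots a combined p-value, but the caption does not state how pSphere and pDecay are combined. The sketch below assumes Fisher's method for two independent p-values, which has a closed form here because the chi-square survival function with 4 degrees of freedom is elementary. Treat it as an illustration of one standard combination rule, not necessarily the one GeneSurrounder uses; `fisher_combine` and `combined_profile` are hypothetical names.

```python
import math


def fisher_combine(p1, p2):
    """Combine two independent p-values (both > 0) with Fisher's method.

    Under the null, X = -2 ln(p1 * p2) is chi-square with 4 degrees of freedom,
    whose survival function is exp(-X/2) * (1 + X/2). Substituting X gives the
    closed form q * (1 - ln q) with q = p1 * p2.
    """
    q = p1 * p2
    return q * (1.0 - math.log(q))


def combined_profile(p_sphere, p_decay):
    """-log10 of the combined p-value at each neighborhood radius."""
    return [-math.log10(fisher_combine(a, b)) for a, b in zip(p_sphere, p_decay)]
```

For example, two p-values of 0.05 combine to about 0.0175, stronger evidence than either alone but weaker than their raw product of 0.0025.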
Ovarian cancer datasets used in this study
| GEO Accession No. | Low-grade (n) | High-grade (n) |
|---|---|---|
| GSE14764 | 24 | 44 |
| GSE17260 | 67 | 43 |
| GSE9891 | 103 | 154 |
Comparisons were made between low- and high-grade serous ovarian carcinoma using public data. Sample sizes for each group in each dataset are given. The data are publicly accessible as part of the curatedOvarianData package [23].
“Disruptive” disease genes in high-grade ovarian cancer consistently found by GeneSurrounder
| Gene | −log10(p), GSE14764 | −log10(p), GSE17260 | −log10(p), GSE9891 |
|---|---|---|---|
| ADRB3 | 3.033 | 2.933 | 3.554 |
| AURKA | 2.865 | 3.383 | 3.716 |
| CDC45 | 4.270 | 3.741 | 4.830 |
| CDC7 | 4.386 | 3.769 | 4.830 |
| DBF4 | 4.270 | 3.769 | 4.830 |
| IL7 | 3.055 | 2.898 | 2.910 |
| ITGAM | 2.961 | 3.024 | 3.094 |
| MCM2 | 4.830 | 3.372 | 4.830 |
| MCM3 | 4.830 | 3.383 | 4.830 |
| MCM4 | 4.830 | 3.394 | 4.830 |
| MCM5 | 4.830 | 3.372 | 4.830 |
| MCM6 | 4.830 | 3.428 | 4.830 |
| ORC4 | 4.386 | 3.172 | 4.830 |
| ORC6 | 4.386 | 3.691 | 4.830 |
| TTK | 2.904 | 3.089 | 4.830 |
With a significance level of p = 0.05 and a network diameter of D = 34, the Bonferroni-corrected threshold is −log10(p) ≥ 2.83 (i.e., p ≤ 0.05/34 ≈ 1.5 × 10⁻³). Listed are the genes that pass this threshold in all three studies.
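The threshold in the note above is a Bonferroni correction with D = 34 tests, presumably one per neighborhood radius up to the network diameter: −log10(0.05/34) = log10(680) ≈ 2.83. The sketch below reproduces that arithmetic and applies the "passes in all three studies" filter to the values transcribed from the table; the function names are hypothetical.

```python
import math

# -log10(p) values transcribed from the table: (GSE14764, GSE17260, GSE9891).
SCORES = {
    "ADRB3": (3.033, 2.933, 3.554), "AURKA": (2.865, 3.383, 3.716),
    "CDC45": (4.270, 3.741, 4.830), "CDC7": (4.386, 3.769, 4.830),
    "DBF4": (4.270, 3.769, 4.830), "IL7": (3.055, 2.898, 2.910),
    "ITGAM": (2.961, 3.024, 3.094), "MCM2": (4.830, 3.372, 4.830),
    "MCM3": (4.830, 3.383, 4.830), "MCM4": (4.830, 3.394, 4.830),
    "MCM5": (4.830, 3.372, 4.830), "MCM6": (4.830, 3.428, 4.830),
    "ORC4": (4.386, 3.172, 4.830), "ORC6": (4.386, 3.691, 4.830),
    "TTK": (2.904, 3.089, 4.830),
}


def bonferroni_neg_log10_threshold(alpha, n_tests):
    """-log10 of the per-test significance level alpha / n_tests."""
    return -math.log10(alpha / n_tests)


def consistent_genes(scores, threshold):
    """Genes whose -log10(p) meets the threshold in every study."""
    return sorted(g for g, vals in scores.items() if all(v >= threshold for v in vals))


threshold = bonferroni_neg_log10_threshold(0.05, 34)  # about 2.8325
```

Every listed gene clears 2.83 in all three studies; raising the cutoff slightly, to 2.9, would drop AURKA and IL7, which shows how close some entries sit to the corrected threshold.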
Correlation between GeneSurrounder results and network/gene statistics
| Network/Gene statistic | GSE14764 | GSE17260 | GSE9891 |
|---|---|---|---|
| Degree Cor. | 0.044 | 0.070 | 0.038 |
| Betweenness Cor. | 0.047 | 0.059 | 0.030 |
| pDE Cor. | 0.060 | 0.103 | −0.051 |
For each dataset, the columns give the rank correlation, across all genes, between the GeneSurrounder results (pGS) and each network/gene statistic (Degree, Betweenness, and pDE). Degree and Betweenness are two network centrality measures: the Degree is the number of connections a node has, and the Betweenness is the fraction of shortest paths that pass through the node. pDE is the p-value obtained from a standard differential expression t-test.
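To make the two centrality definitions concrete, the sketch below computes degree and betweenness on a small undirected graph stored as a dict-of-lists adjacency (an assumed representation in which every node is a key and edges are listed in both directions). Betweenness uses Brandes' algorithm and is normalized by the number of node pairs that exclude the node, (n − 1)(n − 2)/2, so each value is the average, over pairs of other nodes, of the fraction of shortest paths between them that pass through it. It illustrates the definitions only; the paper's network and software are not reproduced here.

```python
from collections import deque


def degree(graph):
    """Number of connections of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def betweenness(graph, normalized=True):
    """Brandes' algorithm for unweighted, undirected graphs."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    for v in bc:
        bc[v] /= 2  # each unordered pair was counted from both endpoints
        if normalized and n > 2:
            bc[v] /= (n - 1) * (n - 2) / 2
    return bc
```

On a star graph, the hub lies on every shortest path between leaves and gets a normalized betweenness of 1.0, while each leaf gets 0.0; degree alone would only say the hub has more neighbors.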
Cross-study concordance of GeneSurrounder results compared to differential expression analysis and LEAN
| Ovarian cancer study pair | pGS Cor. | pDE Cor. | pLEAN Cor. |
|---|---|---|---|
| GSE14764 - GSE17260 | 0.342 | 0.040 | 0.056 |
| GSE14764 - GSE9891 | 0.436 | 0.056 | 0.130 |
| GSE17260 - GSE9891 | 0.485 | 0.138 | 0.290 |
The columns pGS Cor., pDE Cor., and pLEAN Cor. are the Spearman rank correlations, for each study pair, between the per-gene results obtained in the two studies from GeneSurrounder, differential expression analysis, and LEAN, respectively.
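A cross-study concordance of this kind can be sketched as Spearman's rho computed over the genes shared by two studies. The sketch assumes each study's results are a dict mapping gene names to p-values (a hypothetical format) and assigns tied values the average of their ranks; `cross_study_concordance` is a hypothetical name.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cross_study_concordance(p_study_a, p_study_b):
    """Spearman correlation of per-gene p-values over genes present in both studies."""
    shared = sorted(set(p_study_a) & set(p_study_b))
    return spearman([p_study_a[g] for g in shared], [p_study_b[g] for g in shared])
```

Because only ranks enter the calculation, the measure rewards two studies for ordering genes the same way even when their p-values differ in scale, which suits comparisons across cohorts of different sizes.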