Ke Liu, Elizabeth Theusch, Yun Zhou, Tal Ashuach, Andrea C Dose, Peter J Bickel, Marisa W Medina, Haiyan Huang.
Abstract
Rapid advances in genomic technologies have led to a wealth of diverse data, from which novel discoveries can be gleaned through the application of robust statistical and computational methods. Here, we describe GeneFishing, a semisupervised computational approach to reconstruct context-specific portraits of biological processes by leveraging gene-gene coexpression information. GeneFishing incorporates multiple high-dimensional statistical ideas, including dimensionality reduction, clustering, subsampling, and results aggregation, to produce robust results. To illustrate the power of our method, we applied it using 21 genes involved in cholesterol metabolism as "bait" to "fish out" (or identify) genes not previously identified as being connected to cholesterol metabolism. Using simulation and real datasets, we found that the results obtained through GeneFishing were more interesting for our study than those provided by related gene prioritization methods. In particular, application of GeneFishing to the GTEx liver RNA sequencing (RNAseq) data not only reidentified many known cholesterol-related genes, but also pointed to glyoxalase I (GLO1) as a gene implicated in cholesterol metabolism. In a follow-up experiment, we found that GLO1 knockdown in human hepatoma cell lines increased levels of cellular cholesterol ester, validating a role for GLO1 in cholesterol metabolism. In addition, we performed pantissue analysis by applying GeneFishing on various tissues and identified many potential tissue-specific cholesterol metabolism-related genes. GeneFishing appears to be a powerful tool for identifying related components of complex biological systems and may be used across a wide range of applications.
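The combination of clustering, subsampling, and results aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `two_way_spectral_split` and `gene_fishing`, the parameters `pool_size` and `rounds`, the use of absolute Pearson correlation as the coexpression measure, the two-way split by the sign of the second eigenvector of the normalized Laplacian (a simplified stand-in for full spectral clustering), and the rule that a round only counts when every bait gene falls in one cluster are all assumptions made for illustration. The returned per-gene frequency is presumably comparable to the CFR values referred to in the figure captions.

```python
import numpy as np


def two_way_spectral_split(expr):
    """Return one boolean cluster label per gene (row of expr).

    Assumes no gene has constant expression, so every correlation is defined.
    """
    w = np.abs(np.corrcoef(expr))          # assumed similarity: |Pearson r|
    np.fill_diagonal(w, 0.0)               # no self-loops in the gene graph
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    # Split on the sign of the second eigenvector (simplified 2-way cut)
    return vecs[:, 1] * d_inv_sqrt >= 0


def gene_fishing(expr, genes, baits, pool_size=10, rounds=50, seed=0):
    """Fraction of rounds in which each candidate gene clusters with the baits."""
    rng = np.random.default_rng(seed)
    index = {g: i for i, g in enumerate(genes)}
    bait_set = set(baits)
    bait_idx = [index[g] for g in baits]
    candidates = [g for g in genes if g not in bait_set]
    hits = dict.fromkeys(candidates, 0)

    for _ in range(rounds):
        # Subsampling: randomly partition the candidates into small pools
        order = rng.permutation(len(candidates))
        for start in range(0, len(candidates), pool_size):
            pool = [candidates[i] for i in order[start:start + pool_size]]
            rows = bait_idx + [index[g] for g in pool]
            labels = two_way_spectral_split(expr[rows])
            bait_labels = labels[:len(bait_idx)]
            # Assumed rule: fish only if all baits land in the same cluster
            if bait_labels.all() or not bait_labels.any():
                for g, lab in zip(pool, labels[len(bait_idx):]):
                    if lab == bait_labels[0]:
                        hits[g] += 1

    # Aggregation: each candidate is tested exactly once per round
    return {g: hits[g] / rounds for g in candidates}
```

Here `expr` is a genes-by-samples matrix whose rows follow the order of `genes`. Keeping each pool small relative to the bait set is the design choice that matters: the baits then form a visible module in each subproblem instead of being swamped by unrelated genes.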
Keywords: cholesterol metabolism; context-specific gene functional groups; gene pathways; gene prioritization; pantissue analysis
Year: 2019 PMID: 31484776 PMCID: PMC6754596 DOI: 10.1073/pnas.1820340116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Motivation and workflow of GeneFishing. (A to D) Spectral clustering plot of the 21 bait genes (colored in red) with another 61 genes (colored in blue) associated with the GO BP term “cholesterol metabolic process” (A), and 100 (B), 1,500 (C), and 2,000 (D) random genes (colored in gray). (E) Workflow of GeneFishing.
Fig. 2. Evaluation of GeneFishing. (A) Distribution of CFR values when GeneFishing was applied to the CAP-LCL dataset. (B) For each method, 2 ranked gene lists were generated by applying the method to the CAP-LCL and GEUVADIS-LCL datasets. Each colored curve corresponds to a gene prioritization method, plotting the number of overlapped genes between the 2 lists up to a rank position (y axis) against the rank (x axis). GBA is short for guilt-by-association. (C) Scatterplots of the CFR values when GeneFishing was applied to the raw CAP-LCL dataset and 3 randomly perturbed datasets.
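The overlap curve described for panel B can be computed directly from two ranked lists. The sketch below is one straightforward way to do it; the function name `overlap_curve` and the gene labels in the usage note are hypothetical, and each list is assumed to contain no duplicate genes.

```python
def overlap_curve(ranked_a, ranked_b):
    """Number of genes shared by the top-k of both lists, for k = 1..n."""
    n = min(len(ranked_a), len(ranked_b))
    return [
        len(set(ranked_a[:k]) & set(ranked_b[:k]))  # shared genes up to rank k
        for k in range(1, n + 1)
    ]
```

For example, `overlap_curve(["G1", "G2", "G3", "G4"], ["G2", "G1", "G5", "G3"])` gives the y values to plot against ranks 1 to 4. A curve that rises close to the diagonal indicates that a method ranks genes consistently across the two datasets.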
Fig. 3. Effect of candidate gene knockdown on transcript levels of cholesterol-related genes. (A) Transcript levels (in the Huh7 cell line) of candidate genes were quantified by SYBR Green assay via qPCR to assess the degree of gene knockdown. (B) Transcript level of SQLE (in the Huh7 cell line) was quantified by SYBR Green assay to test whether candidate gene knockdown modulated its expression level. (C) Transcript levels (in the HepG2 cell line) of GLO1 and RDH11 were quantified by SYBR Green assay via qPCR to assess the degree of gene knockdown. Transcript level of SQLE (in the HepG2 cell line) was quantified by SYBR Green assay to test whether GLO1 and RDH11 knockdown modulated its expression level. In A to C, data were analyzed using the delta Ct (cycle threshold) method and normalized to CLPTM1 transcript levels as a loading control. All qPCR assays were performed in triplicate. (D) Cellular cholesterol levels were quantified using the Amplex Red Cholesterol Assay kit with values normalized to total cellular protein quantified via Bradford assay. There are 3 to 6 replicates per treatment condition. NTC, nontargeting control.
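For readers unfamiliar with the delta Ct normalization mentioned in the caption, the sketch below shows the standard arithmetic: the reference gene's Ct is subtracted from the target gene's Ct, and the difference is converted to a relative level as 2 to the power of minus delta Ct. It assumes roughly 100% amplification efficiency and averages technical replicates before subtraction. The function name `relative_expression` and the Ct values in the usage note are illustrative assumptions, not data from the study.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference):
    """Target transcript level relative to a reference gene (delta Ct method)."""
    # Average the technical replicates, then normalize to the reference gene
    delta_ct = mean(ct_target) - mean(ct_reference)
    # Each cycle corresponds to a doubling, assuming ~100% efficiency
    return 2.0 ** (-delta_ct)
```

With made-up triplicates, a control sample with target Ct values of 24.0 and reference Ct values of 20.0 can be compared to a knockdown sample with target Ct values of 26.0 and the same reference values. Dividing the knockdown result by the control result expresses the remaining transcript as a fraction of the control.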
Fig. 4. Pantissue GeneFishing analysis. (A) Examination of modularity of the 21 bait genes across GTEx tissues. GeneFishing was applied to the 17 tissues inside the blue circle. The Inset shows the detailed coordinates of the 17 tissues. (B) The coexpression pattern of the genes associated with the GO BP term “cholesterol metabolic process” in 6 representative tissues. In each heat map, the rows and columns have identical gene orders, and the side bar indicates whether the gene belongs to the 21 bait genes (red means yes). (C) Visualization of pantissue GeneFishing results. Each row is associated with a gene, and each column is associated with a tissue (labeled with different colors). A non-gray entry indicates that the CFR of the corresponding gene is higher than 0.9 in the corresponding tissue.
Fig. 5. In both panels, each colored curve corresponds to a method, with the x axis representing the rank and the y axis representing the number of lipid metabolism genes among the top-ranked genes.