Literature DB >> 28500050

A Systemic Analysis of Transcriptomic and Epigenomic Data To Reveal Regulation Patterns for Complex Disease.

Chao Xu¹, Ji-Gang Zhang¹, Dongdong Lin², Lan Zhang¹, Hui Shen¹, Hong-Wen Deng^3,4.

Abstract

Integrating diverse genomics data can provide a global view of the complex biological processes related to the human complex diseases. Although substantial efforts have been made to integrate different omics data, there are at least three challenges for multi-omics integration methods: (i) How to simultaneously consider the effects of various genomic factors, since these factors jointly influence the phenotypes; (ii) How to effectively incorporate the information from publicly accessible databases and omics datasets to fully capture the interactions among (epi)genomic factors from diverse omics data; and (iii) Until present, the combination of more than two omics datasets has been poorly explored. Current integration approaches are not sufficient to address all of these challenges together. We proposed a novel integrative analysis framework by incorporating sparse model, multivariate analysis, Gaussian graphical model, and network analysis to address these three challenges simultaneously. Based on this strategy, we performed a systemic analysis for glioblastoma multiforme (GBM) integrating genome-wide gene expression, DNA methylation, and miRNA expression data. We identified three regulatory modules of genomic factors associated with GBM survival time and revealed a global regulatory pattern for GBM by combining the three modules, with respect to the common regulatory factors. Our method can not only identify disease-associated dysregulated genomic factors from different omics, but more importantly, it can incorporate the information from publicly accessible databases and omics datasets to infer a comprehensive interaction map of all these dysregulated genomic factors. Our work represents an innovative approach to enhance our understanding of molecular genomic mechanisms underlying human complex diseases.

Entities: Chemical Disease Gene Species

Keywords: glioblastoma multiforme; integrative analysis; multi-omics data; network analysis; sparse modeling

Mesh：

Substances：

Year: 2017 PMID： 28500050 PMCID： PMC5499134 DOI： 10.1534/g3.117.042408

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Human complex diseases (e.g., cancer) are induced by various genomic and epigenomic alterations in multiple biological processes (Wang ). Studying a single type of biological data is generally insufficient to fully explore the underlying mechanisms of the human complex diseases. Recent advances in high-throughput technologies allow efficient investigation of various omics data, such as single nucleotide polymorphism (SNP), copy number variation (CNV), DNA methylation, and gene expression. Several pioneering studies have yielded genome-scale large datasets, including genomic, epigenomic, transcriptomic, and proteomic information, which are publicly accessible from large collaborative projects, including the ENCODE Project Consortium (2012), NIH Epigenomics Roadmap (Bernstein ), and The Cancer Genome Atlas (TCGA) (Weinstein ). Concomitantly, the integration analyses of these diverse omics data are increasingly adopted to identify the potential causal (epi)genomic factors and ultimately provide the systematic view of fundamental insights into the complex mechanisms underlying the etiology of human diseases (Chen ; Hamed ). For example, integrating genotype data with whole-genome gene expression data or DNA methylation data can identify the expression quantitative trait loci (eQTL) or the methylation quantitative trait loci (meQTL) (Shabalin 2012; Schadt ). Methods for integrating gene expression data with miRNA, SNP, CNV, and DNA methylation data have been applied in cancer genomics (West ; Taylor ). Additionally, some latent variable models, such as canonical correlation analysis and partial least squares, were applied to identify the relationship between different omics datasets (Friedman ; Zhao ; Soneson ). Recently, network analysis is increasingly gaining acceptance as a useful tool for data integration (Chen ; Kholodenko ). Various network analysis methods have been developed to incorporate transcriptomic and proteomic data for computation of biological networks (Gosline ; Wachter and Beissbarth 2015), to elucidate causality in biological networks (Gitter ; Ourfali ), and to integrate and visualize complex metabolomics results even in cases where biochemical domain knowledge or molecular annotations are unknown (Grapov ; Karnovsky ). Although substantial efforts have been made to integrate different omics data, there are at least three challenges for multi-omics integration methods to overcome: (i) How to simultaneously consider the effects of all kinds of genomic and epigenomic factors, since these factors jointly influence the phenotypes (Lander 2011); (ii) How to effectively incorporate the information from publicly accessible databases and omics datasets to fully capture the interactions among (epi)genomic factors from diverse omics data sources; and (iii) Until present, the combination of more than two omics datasets has been poorly explored compared with those that intend to integrate two various omics datasets, such as those in eQTL and meQTL analyses (Pineda ). Current integration approaches are not sufficient to address all of these challenges simultaneously in one analytical framework. To address these three challenges in multi-omics integration analysis, we presented a novel integrative analysis framework that incorporated sparse model, multivariate analysis, Gaussian graphical model (GGM), and network analysis. Our method can not only simultaneously identify disease-associated (epi)genomic factors from diverse omics data, but also incorporate the information from publicly accessible databases and omics datasets to infer the regulatory modules of these (epi)genomic factors. By applying this strategy to systemically study the genome-wide gene expression, DNA methylation, and miRNA expression data of glioblastoma multiforme (GBM) samples, we identified three regulatory modules of dysregulated (epi)genomic factors associated with GBM patient survival time. By combining the three regulatory modules with respect to the common regulatory factors, we further revealed a promising global regulatory pattern critical for GBM survival. Our integrative analysis represents an innovative approach to enhance our comprehensive understanding of molecular genomic mechanisms underlying human complex diseases.

Methods

It is well known that genes usually jointly contribute to certain diseases and many epigenomic factors play an important role in the development of complex diseases by regulating gene expression. Our goal is to infer a comprehensive interaction map of all these dysregulated (epi)genomic factors. Thus, we divided our integrative analysis into two major stages as shown in Figure 1: we first identify the trait-related mRNAs and build optimized coexpression modules with these mRNAs, and then infer regulation pattern between the epigenomic (and/or other omics factors) and the identified coexpression module.

Figure 1

Workflow of the integrative analysis of multi-omics data.

Stage 1: build optimized mRNA coexpression module networks associated with the trait of interest

Considering the different gene functional modules may contribute to certain diseases and the computational burden for the downstream analysis, we propose to identify trait-associated mRNAs and subsequently discover the coexpression modules with trait-associated mRNAs through coexpression network analysis. First, we apply the elastic net penalized regression model to select a set of trait-associated mRNAs. The elastic net is particularly useful to handle the situation of small sample size and a large number of features. In addition, it encourages the selection of strongly correlated predictors in or out of the model together, which is helpful to preserve the information for the following module identification (Zou and Hastie 2005). The elastic net penalized regression model is illustrated as follows:where the regularization parameter λ > 0 controls the overall strength of the penalty and 0 < α ≤ 1 bridges the gap between lasso (α = 1, the default) and ridge (α = 0) penalty. is the loss function given a fitted model, such as the residual sum of squares for the ordinary linear model or the negative log partial likelihood function for the Cox’s proportional hazards model. The optimal λ and α can be chosen by 10-fold cross-validation. The trait-associated mRNAs are then subjected to weighted correlation network analysis (WGCNA) (Zhang and Horvath 2005) for the identification of high coexpression modules, denoted as where is the number of modules identified. Computations are carried out using the R package WGCNA (Zhang and Horvath 2005). The relationships stored in the coexpression modules include direct interactions, which connects one pair of genes directly, and indirect interactions, where two genes are connected due to a path with multiple edges (Poyatos 2011). We then remove the indirect interactions in the coexpression modules through a partial correlation analysis. The GGM reveals direct associations with conditional independence/dependence among variables using partial correlation coeffcients (Mader ). Given a coexpression module it is assumed the expression of the genes in follows a multivariate Gaussian distribution with mean μ and covariance matrix Σ. The conditional independence between g and g given the other genes g − {}, denoted by P(g, g|g−{}) = P(g|g − {}) P(g|g − {}), is equivalent to that the corresponding element in the precision matrix is zero (Wang and Huang 2014), i.e., ω = 0. The precision matrix is the inverse of the covariance matrix of genes in denoted by Ω = (ω) = Σ−1. A partial correlation P(g, g|g − {}) is formally written as with the property where , and are the estimators of , and (Wang ). Only the edges with significantly different from zero will be preserved. An FDR of 0.05 is used as the cut-off for statistical significance to adjust for the multiple testing. The filtered modules are denoted by After removing the indirect interactions, an optimal subnetwork is refined that which may play an important role on the trait of interest. Given one module is , with a node set as and direct interactions set as the Prize Collecting Steiner Tree (PCST) algorithm, which is able to reconstruct compact networks of the functionally relevant connections with control of the false positives in the network (Huang and Fraenkel 2009), is used to find a set of most confident interactions that connect the terminal genes in the network , using the following function that simultaneously minimizes the cost of edges included and the penalties of nodes excluded:where the node prize b is the weight of node i; c is the cost of edges with k interaction evidence, r indicates the score of the jth interaction evidence. The weight of node i is from the univariate mRNA trait regression analysis. We assign the absolute value of the regression coefficient of the node i to b. The cost of edges is derived from two kinds of evidence: (i) information from publicly accessible databases and omics datasets, such as the Search Tool for the Retrieval of Interacting Genes (STRING) database (http://string-db.org/), or public datasets; and (ii) correlations between nodes in the local datasets. Instead of using the standard definition of we combine the two kinds of evidence into the edge weight using wherea naive Bayesian approach to measure the interaction evidence among nodes (Szklarczyk ). is the subscore downloaded from STRING or the absolute value of the correlations from the local datasets. This method integrates the scores by multiplying the probabilities of associations not predicting a functional interaction while adjusting for the prior probability () for any two genes being linked, which is 0.063 according to the KEGG benchmark dataset (Franceschini ). In the calculation, the prior corrected score is constrained to be within [0, 1] (see the source code online). Compared with the standard method, it yields higher confidence when more than one type of evidence supports a given interaction (von Mering ).

Stage 2: infer module-based regulation pattern with epigenomic and/or other omics data

To identify DNA methylation sites and other factors that potentially regulate the optimized module subnetwork (such as miRNAs), we examine their associations with the genes within Let be the data matrix of derived in stage 1, where n is the number of patients with complete phenotypic and multi-omics data, and p is the number of mRNAs in the given module First, we screened trait-associated methylation sites and other regulators by the elastic net penalized regression model as in the analyses of mRNAs in stage 1. Let be the combined data matrix for the q trait-associated regulators, which may include methylation sites and other factors. To deal with the large number of features and small sample size, we use the sparse partial least square (SPLS) regression model to identify the regulators that are correlated with the module (Meng ):where β are loading vectors for the latent vectors respectively. The sparse regularization function including L1 and L2 penalties is imposed on β: the L1 penalty is applied to set the coefficients of the irrelevant variables to 0; the L2 penalty is added to handle multicollinearity among covariates (Chun and Keles 2010; Chun ). The optimal regularization parameter can be chosen by 10-fold cross-validation (Chun ). The resulted regulators associated with the module are denoted by To reveal the module-based regulation patterns, the Pearson’s correlation is used to detect correlations between the identified regulators and mRNAs within the module The cut-off of correlation test p value ≤0.05 (t-test) is applied to select those regulator mRNA pairs of potential interest. Those chosen pairs are further filtered by external databases if there are any, i.e., Exiqon miRSearch, TargetScan, and microRNA.org for miRNA–mRNA regulatory relationships (Chen and Rajewsky 2007; Akhtar ).

Data availability

The source code in this study is publicly available at https://github.com/xu1912/SMON.git.

Results

Here, we apply our framework to integrate three different omics datasets (mRNA expression, miRNA expression, and DNA methylation) from the GBM study. All normalized data (level 3) were collected from The Cancer Genome Atlas (TCGA) portal, and can be accessed from the TCGA-GBM project repository at National Cancer Institute Genomic Data Commons Data Portal (https://portal.gdc.cancer.gov/legacy-archive/search/f). To minimize the scale differences among different omics data, the features of the three omics datasets were standardized to have zero means and unit SDs. The clinical outcome of interest is the patient survival time. From the GBM datasets, we excluded those patients whose survival time was <30 d to remove short survival due to reasons such as postoperative complications from the surgery (Kim ). Figure 2 presents the sample size of the three omics datasets and the overlapping samples among them.

Figure 2

Venn diagram of the samples with different Omics data. Methylation ∩ miRNA = 261; methylation ∩ mRNA = 265; miRNA ∩ mRNA = 504.

Venn diagram of the samples with different Omics data. Methylation ∩ miRNA = 261; methylation ∩ mRNA = 265; miRNA ∩ mRNA = 504. Identifying trait-associated genes in mRNA data are the first step in reconstructing trait-associated coexpression modules. As detailed in the Methods section, the sparse Cox’s proportional hazards model was used to select genes associated with patients’ survival time, which can simultaneously incorporate thousands of genomic markers working collaboratively with joint effects on the trait of interest in a single statistical model. After 10-fold cross-validation, we identified 217 genes with 512 subjects in mRNA data significantly associated with GBM survival time, including numerous GBM-related genes such as FZD7 (Kierulf-Vieira ), TPPP3 (Fomchenko ), and LGALS3 (Ma ). Using the Dynamic Tree Cut method in WGCNA, we identified three coexpression modules, which are composed of 50, 28, and 18 genes, respectively. There were 121 genes (55.76% in total identified genes) that were not grouped in any community. This was partially due to the minimum size of modules we set. In this study, the minimum size of modules was set as 15, which means those genes were not assigned in a module due to their size being <15, even if they were grouped. Figure 3 presents the correlation heat map of modules constructed through WGCNA. In the heat map, each row (column) corresponds to a gene. The independent modules are represented as isolated boxes along the diagonal. Inspecting the correlation between and within the module memberships, these genes within each module are found to be strongly connected (reflected by the majority of red blocks within each module in Figure 3), and the genes between modules show weak connections (reflected by the overall green block in the heat map in Figure 3). The weak interconnectivity between modules suggests that the three modules may function separately and affect patient survival time in relatively independent ways.

Figure 3

Heat map of correlations between and within coexpression modules constructed by WGCNA. Each row/column represents a gene. Each cell element is the absolute value of correlation coefficient between two genes. The intensity of red coloring indicates the strength of correlation between pairs of genes, with green color corresponding to low correlation. The independent modules are represented as isolated boxes along the diagonal. We estimated the partial correlations between genes in each module to trim indirect interactions. In each module, we selected significant gene pairs (FDR ≤ 0.05) for their partial correlations and reconstructed the coexpression networks using these significant gene pairs. As shown in Supplemental Material, Figure S1, in modules 1–3, we identified 272, 141, and 141 gene pairs with significant partial correlations, respectively, and trimmed 77.80, 62.70, and 7.84% of gene pairs as indirect interactions. In addition, by incorporating the information on protein–protein interactions (PPIs) from the STRING database, we further filtered the unreliable or indirect interactions in the modules by PCST. In total, 49 gene pairs from module 1, 27 pairs from module 2, and 17 pairs from module 3 were determined as the most reliable interactions (Figure S1) that had multiple lines of evidence supporting their potential functionality in the cell. We also performed pathway enrichment analysis on the genes collected from all the three modules with WEB-based GEne SeT AnaLysis Toolkit (Wang ) to investigate how well the modules functioned in a GBM-related process, as annotated by KEGG database (Kanehisa ). An additional file (Table S1) lists the nine most significant KEGG pathways with adjusted p values <0.05, including mRNA surveillance pathway, pathways in cancer, WNT signaling pathway, etc. Among these pathways, it is well known that WNT signaling pathway regulates proliferation, death, and migration and cell fate decision. Dysregulation of the WNT signaling pathway was associated with various cancers, including GBM (Lee ). The members FZD7 and FZD10 in the WNT signaling pathway are important receptors. In many types of cancer, FZD2 expression was strongly correlated with poor prognosis (Mine ). Therefore, our results may reveal the regulation pattern of FZD2 and FZD10 expression by network analysis, which could be utilized for epigenomic-based therapy for GBM. To incorporate DNA methylation and miRNA data into the coexpression modules, we identified miRNAs and methylation sites that are associated with patient survival time, and then used SPLS regression to determine those miRNAs and DNA methylation sites that are also associated with the genes of the three coexpression modules in Table 1. Further, within each module, we determined genes with cis-correlated DNA methylation sites, as well as miRNA–mRNA pairs.

Table 1

The identified miRNAs and methylation sites for three modules by SPLS model

	mRNAs	miRNAs	Methylation Sites
Module 1	50	46	353
Module 2	28	11	125
Module 3	18	33	174

It is known that DNA methylation is an important epigenomic mechanism to regulate gene expression. If there is a significant association for one gene between its methylation level and expression level, it is called cis relationship; otherwise, it is defined as trans relationship (Smith and Meissner 2013). Since the reproducibility of trans relationships is still in debate, we focused on genes with cis relationships (van Nas ; van Eijk ). It can be seen that only four gene sites in module 1 have cis effects (Figure 4). The methylation levels of these four gene sites show negative correlations with the expression levels of corresponding genes. In modules 2 and 3, we did not identify the gene sites with cis effects on corresponding genes.

Figure 4

The interaction modules incorporating information of miRNAs and methylation sites. Each rectangle represents a gene; the circle represents miRNA. The green rectangles are genes with cis effects, with brighter indicating higher. The pink dashed edges indicate miRNA–gene interactions annotated in previously mentioned miRNA databases. The green solid edges are gene–gene connections resulting from PCST. miRNA is well known for the major function of cleaving transcripts of its target genes at the post-transcriptional level (He and Hannon 2004). Thus, we were most interested in a negative correlation between miRNA and gene expression. The external databases Exiqon miRSearch, TargetScan, and microRNA.org were used to filter the miRNA–mRNA pairs with miRNA–target regulatory relationships (Chen and Rajewsky 2007; Akhtar ). Those miRNA–mRNA interactions with significant negative correlations and miRNA–target relationships are kept. As shown in Figure 4, we identified 15 miRNAs for module 1, four for module 2, and four for module 3. Our results highlight a number of interesting regulatory mechanisms that may be critical for GBM development and progress. For example, our results suggest that miRNA-181c and the methylation level of DIRAS3 both contribute to the alteration of DIRAS3 expression (module 1 in Figure 4), which may, in turn, affect the GBM survival time. DIRAS3 (also known as ARHI) is a known tumor suppressor gene and overexpression of DIRAS3 resulted in suppression of glioma cell proliferation, arrest of cell-cycle progression, reduction in cell migration and invasion, and promotion of cell apoptosis (Chen ). In addition, miRNA-181c was reported as a tumor-related gene in glioma cells (Ruan ). Thus, our results indicate one possible regulation mechanism of these tumor-related factors and may provide candidate targets for gene therapy of glioma. We combined the three modules to have a global view (Figure 5) of the regulatory networks contributing to GBM patient survival time. The top six miRNAs with most edges in the combined network were miRNA-221, miRNA-204, miRNA-20a, miRNA-340, miRNA-222, and miRNA-181c. Among these six miRNAs, miRNA-221, -204, and -222 were shared by three modules; miRNA-181c was shared by modules 1 and 2; miRNA-20a was shared by modules 1 and 3; and miRNA-340 was not shared among modules. These miRNAs may mediate cooperative regulation of different modules and thus may play particularly critical roles in regulating GBM development and progress. miRNA-221 and miRNA-222 are oncogenic miRNAs that have been studied in relation to a diverse list of cancers, including GBM. When overexpressed in vitro, both miRNA-221 and miRNA-222 potentiate classic cancer hallmarks, i.e., proliferation, angiogenesis, and invasion (Henriksen ; Singh ; Zhang ). Due to their broader functional relevance, miRNA-20a, -204, -181c, and -340 also have been identified as oncogenic genes and may serve as targets for treatment of GBM (Wang ; Wei ; Xia ; Huang ; Ruan ). Additionally, several target genes of these miRNAs have been validated in previous studies, e.g., RAB18, RSU1, GTPBP4, DIRAS3, and F3 (Behrends ; Chunduru ; Lee ; Riemenschneider ; Gessler ). Taken together, our findings highlight several miRNAs that may regulate multiple signaling cascades crucial for gliomagenesis and therefore, these miRNAs could be therapeutically significant.

Figure 5

Combined regulatory network with three identified modules. Each rectangle represents a gene; the circle represents miRNA. Different colors of the rectangles/circles indicate their belonging to a single module or overlap of the three modules. The pink dashed edges indicate miRNA–gene interactions annotated in previously mentioned miRNA databases. The green solid edges are gene–gene connections resulting from PCST.

Discussion

Single-omics studies (genome/transcriptome/epigenome/proteome) fall short of illuminating the underlying functional mechanisms and providing a comprehensive view of the regulatory patterns of genomic factors across multiple omics datasets for the etiology of human diseases (Farber and Lusis 2009). Integrating multi-omics datasets in network analysis may facilitate the discovery of novel susceptibility genes for human complex diseases, and more importantly, yield a comprehensive understanding of the complex regulatory mechanisms embedded in and across multi-omics data (Farber 2010). In this study, we proposed an integrative network analysis framework with epigenomic and transcriptomic data to identify regulatory patterns relevant to the trait of interest. The additional analysis in File S1 indicates that our framework can produce reliable results. Our framework started with the coexpression network analysis to identify the coexpression modules based on the following two considerations: (i) due to the complexity of human diseases, it is highly likely that different gene functional modules may contribute to certain diseases, and since different modules tend to have different biological functions, it is reasonable to analyze each module separately in the sense that different biological functionalities should be considered separately (Ma ); (ii) a benefit of coexpression network analysis is that it can greatly decrease the computational burden for the downstream analysis, e.g., partial correlation analysis and inference of optimal subnetworks. For example, if we have 300 trait-associated genes, there will be 44,850 gene pairs to be tested in partial correlation analysis, and this number will increase dramatically with the increase of the number of trait-associated genes, which will lead to heavy computation burden and decrease of power to identify gene pairs with significant partial correlations. In the case of complex diseases (such as GBM), comprehensively identifying interactions among (epi)genomic factors is important to systematically dissect cellular roles of those (epi)genomic factors and to gain insights into metabolic pathways. With a coexpression network, the number of correlations is generally considerably high, suggesting a plethora of indirect interactions (Krumsiek ). To remove indirect interactions among genomic factors and refer reliable regulation networks, our proposed method incorporated two kinds of analyses for a given coexpression module: partial correlation analysis and inference of optimal subnetworks. In partial correlation analysis, GGMs were applied to distinguish direct from indirect associations by estimating the conditional dependence between genes based on partial correlation coefficients. However, conditional independence by itself is insufficient to remove all indirect relationships. Thus, compiling the information from external public databases will be helpful to further prune those unlikely, indirect, and spurious interactions. We can retrieve various genomic interactions from many available public databases and multi-omics datasets, such as PPIs. In this study, we incorporated interaction information from PPIs and the results of the partial correlation analysis to compute a score as a confidence score for each interaction in the module. The most reliable interactions of each module were further inferred through searching optimal subnetwork for the given module. Since our goal is to identify regulatory patterns relevant to the trait of interest from epigenomic and transcriptomic data, it is reasonable to only choose the trait-associated (epi)genomic factors from multi-omics data for the subsequent network analysis. This selection can not only remove noise but also decrease the computational cost in the network construction of multi-omics data. During the selection procedure, we adopted a sparse model using L1 and L2 penalties to identify the trait-associated genomic factors, which has the following specific advantages: (i) it accommodates tens of thousands of features at a time and identifies joint effects of a combination of trait-associated genomic factors, including those with small effect sizes; (ii) using both L1 and L2 penalties, it is able to select groups of correlated variables, which are very common in high-dimensional genomic data. The selection of relevant correlated genomic factors is essentially important for coexpression network analysis and discovery of regulation patterns. Thus, the sparse model using both L1 and L2 penalties demonstrates the efficiency in feature selection and captures informative genomic factors. In summary, our method can not only identify disease-associated dysregulated genomic factors, but also, more importantly, construct a comprehensive map of interactions of all these dysregulated genomic factors implicated in a specific disease. It is essential to understand the intricacy of the genomic mechanisms behind complex diseases, and this may support the development of new therapeutics. However, it should be recognized that network representation of the complexity of biological systems is just the beginning. This study is expected to pioneer an innovative approach to comprehensively enhance our understanding of molecular genomic mechanism in human complex diseases.

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.042408/-/DC1. Click here for additional data file. Click here for additional data file. Click here for additional data file.

66 in total

Review 1. MicroRNAs: small RNAs with a big role in gene regulation.

Authors: Lin He; Gregory J Hannon
Journal: Nat Rev Genet Date: 2004-07 Impact factor: 53.242

2. The huge Package for High-dimensional Undirected Graph Estimation in R.

Authors: Tuo Zhao; Han Liu; Kathryn Roeder; John Lafferty; Larry Wasserman
Journal: J Mach Learn Res Date: 2012-04 Impact factor: 3.654

3. An integrative genomics approach to infer causal associations between gene expression and disease.

Authors: Eric E Schadt; John Lamb; Xia Yang; Jun Zhu; Steve Edwards; Debraj Guhathakurta; Solveig K Sieberts; Stephanie Monks; Marc Reitman; Chunsheng Zhang; Pek Yee Lum; Amy Leonardson; Rolf Thieringer; Joseph M Metzger; Liming Yang; John Castle; Haoyuan Zhu; Shera F Kash; Thomas A Drake; Alan Sachs; Aldons J Lusis
Journal: Nat Genet Date: 2005-06-19 Impact factor: 38.330

Review 4. The evolution of gene regulation by transcription factors and microRNAs.

Authors: Kevin Chen; Nikolaus Rajewsky
Journal: Nat Rev Genet Date: 2007-02 Impact factor: 53.242

5. Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks.

Authors: Shao-Shan Carol Huang; Ernest Fraenkel
Journal: Sci Signal Date: 2009-07-28 Impact factor: 8.192

6. Wnt inhibition is dysregulated in gliomas and its re-establishment inhibits proliferation and tumor sphere formation.

Authors: Kirsten Strømme Kierulf-Vieira; Cecilie J Sandberg; Zanina Grieg; Clara-Cecilie Günther; Iver A Langmoen; Einar O Vik-Mo
Journal: Exp Cell Res Date: 2015-12-19 Impact factor: 3.905

7. Integrative analysis of gene expression and copy number alterations using canonical correlation analysis.

Authors: Charlotte Soneson; Henrik Lilljebjörn; Thoas Fioretos; Magnus Fontes
Journal: BMC Bioinformatics Date: 2010-04-15 Impact factor: 3.169

8. Recruited cells can become transformed and overtake PDGF-induced murine gliomas in vivo during tumor progression.

Authors: Elena I Fomchenko; Joseph D Dougherty; Karim Y Helmy; Amanda M Katz; Alexander Pietras; Cameron Brennan; Jason T Huse; Ana Milosevic; Eric C Holland
Journal: PLoS One Date: 2011-07-06 Impact factor: 3.240

9. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013.

Authors: Jing Wang; Dexter Duncan; Zhiao Shi; Bing Zhang
Journal: Nucleic Acids Res Date: 2013-05-23 Impact factor: 16.971

Review 10. Bioinformatic tools for microRNA dissection.

Authors: Most Mauluda Akhtar; Luigina Micolucci; Md Soriful Islam; Fabiola Olivieri; Antonio Domenico Procopio
Journal: Nucleic Acids Res Date: 2015-11-17 Impact factor: 16.971

3 in total

1. Integrative analysis of multi-omics data to detect the underlying molecular mechanisms for obesity in vivo in humans.

Authors: Qiang Zhang; Xiang-He Meng; Chuan Qiu; Hui Shen; Qi Zhao; Lan-Juan Zhao; Qing Tian; Chang-Qing Sun; Hong-Wen Deng
Journal: Hum Genomics Date: 2022-05-14 Impact factor: 6.481

2. DNA methylation profiling identifies potentially significant epigenetically-regulated genes in glioblastoma multiforme.

Authors: Shifeng Kan; Song Chai; Wenhua Chen; Bo Yu
Journal: Oncol Lett Date: 2019-06-21 Impact factor: 2.967

Review 3. A New Era of Neuro-Oncology Research Pioneered by Multi-Omics Analysis and Machine Learning.

Authors: Satoshi Takahashi; Masamichi Takahashi; Shota Tanaka; Shunsaku Takayanagi; Hirokazu Takami; Erika Yamazawa; Shohei Nambu; Mototaka Miyake; Kaishi Satomi; Koichi Ichimura; Yoshitaka Narita; Ryuji Hamamoto
Journal: Biomolecules Date: 2021-04-12

3 in total