Literature DB >> 27612452

Mergeomics: a web server for identifying pathological pathways, networks, and key regulators via multidimensional data integration.

Douglas Arneson¹, Anindya Bhattacharya¹, Le Shu¹, Ville-Petteri Mäkinen^2,3,4, Xia Yang⁵.

Abstract

BACKGROUND: Human diseases are commonly the result of multidimensional changes at molecular, cellular, and systemic levels. Recent advances in genomic technologies have enabled an outpour of omics datasets that capture these changes. However, separate analyses of these various data only provide fragmented understanding and do not capture the holistic view of disease mechanisms. To meet the urgent needs for tools that effectively integrate multiple types of omics data to derive biological insights, we have developed Mergeomics, a computational pipeline that integrates multidimensional disease association data with functional genomics and molecular networks to retrieve biological pathways, gene networks, and central regulators critical for disease development.
RESULTS: To make the Mergeomics pipeline available to a wider research community, we have implemented an online, user-friendly web server ( http://mergeomics. RESEARCH: idre.ucla.edu/ ). The web server features a modular implementation of the Mergeomics pipeline with detailed tutorials. Additionally, it provides curated genomic resources including tissue-specific expression quantitative trait loci, ENCODE functional annotations, biological pathways, and molecular networks, and offers interactive visualization of analytical results. Multiple computational tools including Marker Dependency Filtering (MDF), Marker Set Enrichment Analysis (MSEA), Meta-MSEA, and Weighted Key Driver Analysis (wKDA) can be used separately or in flexible combinations. User-defined summary-level genomic association datasets (e.g., genetic, transcriptomic, epigenomic) related to a particular disease or phenotype can be uploaded and computed real-time to yield biologically interpretable results, which can be viewed online and downloaded for later use.
CONCLUSIONS: Our Mergeomics web server offers researchers flexible and user-friendly tools to facilitate integration of multidimensional data into holistic views of disease mechanisms in the form of tissue-specific key regulators, biological pathways, and gene networks.

Entities: Chemical Disease Gene Species

Keywords: Disease network; EWAS; GWAS; Key driver; Multidimensional data integration; Network meta-analysis; Omics integration; Pathway meta-analysis; TWAS; Web server

Mesh：

Year: 2016 PMID： 27612452 PMCID： PMC5016927 DOI： 10.1186/s12864-016-3057-8

Source DB: PubMed Journal: BMC Genomics ISSN： 1471-2164 Impact factor: 3.969

Background

Human diseases are complex in nature and commonly involve pathological changes at multiple molecular, cellular and systemic levels [1]. The recent revolution in genomic technologies has enabled the generation of massive amounts of molecular data encompassing genetics, transcriptomics, epigenomics, metabolomics, and proteomics, which have become easily accessible in public domains and private sectors. It is increasingly recognized that analysis of individual types of data separately only reveals a fraction of the complex biology and often misses the key players driving diseases, making multidimensional big data integration an urgent need [1-3]. To date, the majority of the available data integration tools focus on individual data types or analysis steps and lack flexibility in accommodating diverse types of genomic data. Users are typically required to turn to multiple tools to bridge the gaps between wet lab data and well-founded hypothesis for future experimental exploration. Variations in performance and potential incompatibility between tools further amplify the difficulty in big data integration. To overcome these challenges, we have developed an R package, Mergeomics (Shu et al., companion manuscript), which has the capacity to integrate summary-level disease association data, functional genomics (such as expression quantitative trait loci (eQTLs) and ENCODE annotations), biological pathways, and gene networks to identify disease-associated gene subnetworks and key regulatory genes. Examples of disease association data are genetic association studies via genome-wide association studies (GWAS) or exome sequencing, transcriptome-wide association (TWAS) via microarray or RNA sequencing studies, and epigenome-wide association studies (EWAS). Mergeomics has a streamlined workflow to incorporate multiple analysis components in modular format, allowing flexibility in accommodating a variety of data types and study designs. In order to achieve this flexibility in data types and study design, it is not feasible to deal with these diverse data types in their native, pre-processed format. We find that the association of markers (e.g., SNPs, genes, methylation sites) to a phenotype from various data modalities is a converging point that allows us to apply our strategy across such data types as GWAS, microarray, RNA sequencing, DNA methylation, etc. In the context of the current study, we use the term “association dataset” in a broader sense by referring to the association statistics between any omics-derived markers and a phenotype. The applicability of the approach to various data types including GWAS, TWAS, and EWAS has been demonstrated in the companion paper. It is also feasible for metabolome and proteome association datasets. In addition, the analytical pipeline can be used for datasets from any species as long as the corresponding species-specific datasets and information are provided. Furthermore, our pipeline offers a meta-analysis component that can converge findings across data types and/or species in a single analysis, as demonstrated in the companion method paper. To maximize the usability of Mergeomics, here we present a user-friendly web server with full implementation of the features in the Mergeomics R package. The web server features four analytical modules that can correct for dependencies between omics markers (MDF - Marker Dependency Filtering), identify biological pathways and networks that are enriched for disease-associated signals from a single omics study (MSEA - Marker Set Enrichment Analysis) or from two or more omics studies of the same or different data types (Meta-MSEA), and identify key network regulators or hubs of the disease-associated pathways and networks (wKDA - Weighted Key Driver Analysis). The web server is also accompanied by a detailed step-by-step tutorial and pre-processed, commonly used datasets such as functional genomics, pathways, and gene networks as resources to facilitate data integration. Users have the full flexibility to select analysis modules that best suit their study design and data types. Summary results and interactive network views are displayed online and links to detailed downloadable results are also available.

Implementation

The Mergeomics web server is hosted within UCLA’s Hoffman2 High Performance Cluster as part of the Institute for Digital Research and Education (IDRE), and jobs are submitted using the Sun Grid Engine queuing system. The web server front-end is written in HTML, PHP and JavaScript and hosted through an Apache HTTP server, which allows users to submit jobs through web browsers (all major browsers are supported). A 400 MB file size upload restriction is imposed to ensure reasonable compute time. User files are temporarily kept under a randomly generated file name on the web server during the analysis. As soon as the compute completes, user uploaded files are deleted from the server. Submitted jobs are assigned unique alphanumeric identifiers that allow users to control access to their results. Only the job submitter will be given the job-specific links to download the result files, which are kept on the web server for 24 h to allow sufficient time to download. Users can provide an email address (optional) if they would like to receive notifications about their job status and have results emailed upon task completion. Sever traffic is monitored through the total number of hits and the number of unique hits.

Results and discussion

Web server overview

As depicted in Fig. 1, the web server hosts three resource pages (HOME, TUTORIAL, and DOWNLOAD) and four analytical modules (MDF, MSEA, Meta-MSEA, and wKDA). The “HOME” page provides basic information about the Mergeomics pipeline including data types involved, overall workflow, and citation information. The “TUTORIAL” section provides step-by-step instructions for using the web server and detailed descriptions about input file format requirements, analysis parameters, and result interpretation for each analytical component. The “DOWNLOAD” section gives users access to the standalone Mergeomics R package and a wide variety of useful, publically available genomic resources applicable in the pipeline, including sample disease association studies, tissue-specific eQTL studies, ENCODE resources, knowledge-driven biochemical and signaling pathways, and data-driven gene networks (Additional file 1: Table S1). The four analytical modules are: 1) MDF, standing for Marker Dependency Filtering, which corrects for dependencies between omics markers (e.g. correcting for linkage disequilibrium or LD between genetic variants in GWAS), 2) MSEA, standing for Marker Set Enrichment Analysis, that allows users to leverage multi-omics association data, functional genomics, canonical pathways and/or data-driven gene modules to identify causal subnetworks of disease or traits; 3) Meta-MSEA, which performs pathway- or network-level meta-analysis when multiple association datasets from different studies (e.g., multiple GWAS) and/or of different data types (e.g., GWAS and EWAS) are available; 4) wKDA, standing for weighted key driver analysis, which maps disease associated genes to tissue-specific gene networks to identify potential key drivers or regulators of disease. We also provide an interactive network visualization module for visualizing select key drivers and their associated disease subnetworks.

Fig. 1

Web server pipeline overview with red lines indicating cross-module interactions, green headers indicating resource pages, and teal headers indicating analytical modules

Web server pipeline overview with red lines indicating cross-module interactions, green headers indicating resource pages, and teal headers indicating analytical modules To use the web server, users follow a streamlined workflow (Fig. 1) by setting up the parameters and selecting or uploading datasets necessary for each of the analytical components. Due to the modular design, users are allowed the flexibility to choose subsets of the analytical components according to available datasets and specific study design (Fig. 2). For example, if a single association dataset (e.g., GWAS/EWAS/TWAS) is available, users can identify the causal subnetwork and key regulatory gene of a trait or disease using the MSEA-wKDA-Visualization workflow (Fig. 2a). If multiple association datasets of either the same data type or different data types are available, the Meta-MSEA-wKDA-Visualization workflow is more appropriate (Fig. 2b). If only groups of disease-associated genes are available, the wKDA-Visualization workflow is sufficient to generate the key regulators and disease subnetworks, and users could still explore the association of the gene sets with the same disease in other organisms or other relevant disease types using MSEA or Meta-MSEA, if the corresponding association data is available (Fig. 2c). As technological advances bring about new data types, they can be effectively incorporated into the analysis framework of Mergeomics. In the following sections we explain each analytical component of the Mergeomics web server in detail.

Fig. 2

Flexibility of Mergeomics in accommodating different datasets and study designs. a When a single disease association dataset is available, an MSEA-wKDA-visualization flow can be utilized. b When multiple disease association datasets are available, use a Meta-MSEA then wKDA flow is more appropriate. c When selected disease-associated genes or proteins are already identified, wKDA can be directly used. MSEA could also be run if additional association datasets are available, allowing for association testing in other diseases or cross-species validation

Marker dependency filtering (MDF)

Depending on the type of data to be analyzed in MSEA or Meta-MSEA, an additional step may be required to correct for any dependencies between the markers (e.g., LD between single nucleotide polymorphisms (SNPs) in GWAS, correlation between adjacent methylation sites, etc.). For GWAS, we provide on-the-fly pruning of SNPs based on HapMap3 LD information and user-defined LD cutoffs. For other types of markers, users need to prepare and upload the marker association files containing independent markers. If dependency among markers is not corrected, spurious results may be generated.

Data input

Association summary results between molecular markers and a disease or phenotype as described in the following MSEA section. Gene-marker mapping file as described in the MSEA section. Marker dependency file, which describes the pairwise relationships of the markers in the association file.

Parameter setting

Two parameters are required for MDF: marker dependency cutoff and the percentage of top markers to be used. For GWAS, LD cutoff defines the maximum acceptable r2 values when pruning the SNPs, which is done using the built-in MDPRUNE program of Mergeomics pipeline. The MDPRUNE program is also downloadable from the web server and can be run independently for marker pruning. The percentage of top marker parameter is introduced to increase signal to noise ratio since more noise tends to be present towards the weaker association spectrum [4]. Default parameters yield optimal performance for GWAS based on simulation.

Result interpretation

The output of MDF contains the marker-disease association results only for independent markers after pruning, which can be directly used in the downstream MSEA or meta-MSEA analysis.

Marker-set enrichment analysis (MSEA)

MSEA proceeds in the following order. Step 1: Upload marker-level association data. Only summary level association results featuring marker IDs and association –log10 p-values are needed. Step 2: Map marker-level association p-values to genes based on available genomic mapping. Step 3: Assign p-values to knowledge-driven or data-driven gene sets in the form of pathways of gene subnetwork modules. Step 4: Perform enrichment analysis and output gene sets significantly enriched for disease-associated markers. Step 5: Merge top significant gene sets into non-overlapping supersets to reduce redundancy.

Data preparation

MSEA requires three different types of data as input, and our web server provides a number of public datasets and allows users to upload datasets of their interest. Marker-disease association summary results including Marker IDs and –log10 transformed association p-values (Fig. 3a). Marker types can be SNPs, methylation loci, transcripts, proteins, metabolites, etc. (sample datasets provided in the web server). Non-independent relationship between markers should be corrected if possible (using MDF described above).

Fig. 3

Screenshots showing a step-by-step overview of the MSEA module using sample data on the web server. a Association data format. b Marker mapping data format. c Parameters. d Gene sets format. e Gene sets descriptions. f Example result summary table containing the IDs and descriptions of the gene sets tested to be associated with the disease, the p - values, FDR values, top genes and markers with the strongest disease association within each gene set, and the strongest association scores in the form of –log10 p - values Gene-marker mapping file that links genomic markers to genes (Fig. 3b). For GWAS, the most commonly used mapping is based on genomic distance (e.g., 10 kb, 20 kb, 50 kb), which is provided on the web server. Such distance mapping could be applied to methylation sites in EWAS as well. A data-driven function-based mapping is more preferred if available. To facilitate this aspect, we provide curated publically available functional genomics resources from ENCODE [5] and various tissue-specific eQTLs from numerous studies including GTEx [6] and Muther [7] (Additional file 1: Table S1). Users can upload their own mapping files between markers and genes. Functionally related gene sets defined based on various criteria (Fig. 3d), such as co-regulation, shared response, co-localization on chromosomes, or participants of specific biological processes or signaling pathways. Typical sources of gene sets includes canonical pathways such as Reactome [8] and KEGG [9], coexpression modules constructed using algorithms like weighted coexpression gene networks analysis (WGCNA) [10], and gene signatures from gene expression profiling studies. We provide curated functional gene sets from these sources on our web server (Additional file 1: Table S1), but also allow users to upload their own gene sets. To better annotate the gene sets in MSEA output, the user could optionally provide a description file for the gene sets to specify the source of the gene set and a detailed description of the functional information used to group genes (Fig. 3e). The core parameter of MSEA is the permutation type for assessing null distribution. Either marker-level permutation or gene-level permutation can be selected. Gene-level permutation is set as default as it has higher true positive rate, low false positive rate, and low sensitivity to variation in other parameters (Shu et al., companion manuscript). However, users may use the more sensitive marker-level permutation for suggestive signals. Additional parameters include the maximum and minimum number of genes in gene sets, false discovery rate (FDR) cutoff to determine significant signals, and merging criteria for overlapping gene sets (Fig. 3c; detailed in online tutorial). On the result webpage, a summary table for the top disease-associated gene sets at a user-defined cutoff is generated for quick reference of the MSEA results (Fig. 3f). This summary table reports the gene set ID, disease association enrichment p - value of the gene set, FDR, gene set description, top 3 genes and top 3 markers in each gene set that contribute to the overall enrichment signal, and the association p - values (in –log10 format) of the markers. The detailed results of MSEA are provided in four downloadable tables: 1) “MSEA_modules_pval.txt”: a list of gene sets ranked by enrichment p -values and false discovery rate (FDR). 2) “MSEA_modules_full_results.txt”: detailed enrichment results including enrichment p -values, number of mapped genes and markers for each gene set, and the density ratio which gives the ratio of unique markers mapped to gene set divided by the total markers in the analysis. 3) “MSEA_top_modules_details.txt”: detailed list of genes from top gene sets (default: FDR < 25 %), accompanied by the total number of markers, top markers and the highest –log10 transformed association p -value for each gene. Users could use this output to track the origin of enrichment signals as well as to identify potentially important genes and markers in each of the significant gene sets. 4) “MSEA_genes_details.txt”: stores gene-related MSEA results, including enrichment score, number of markers mapped to each gene, and markers with top association values for each gene.

Meta-MSEA

When multiple disease association datasets are available, Meta-MSEA can be used to conduct meta-analysis at the pathway or network level. This function allows users to achieve maximal power by combining results from independent association studies of different ethnicities, platforms or even species, while avoiding the technical difficulties when performing meta-analysis directly on the marker-level association data [11]. To run meta-MSEA, users simply need to navigate to the Meta-MSEA tab, and upload multiple datasets following the same workflow as previously described for MSEA to generate results for individual datasets as well as the pathway/network-level meta-analysis results. The result files produced by Meta-MSEA follow the same layout as MSEA.

Weighted key driver analysis (wKDA)

wKDA aims to pinpoint key regulator genes or key drivers (KDs) of the disease related gene sets from MSEA or meta-MSEA using gene network topology and edge weight information. Specifically, wKDA first screens the network for candidate hub genes. Then the disease gene sets are overlaid onto the subnetworks of the candidate hubs to identify KDs whose neighbors are enriched with disease genes. Two types of files are required for wKDA: 1) disease-associated gene sets (Fig. 4a) and 2) molecular networks (Fig. 4c). wKDA can be run as either the continuing step of MSEA or meta-MSEA or as an independent step (Fig. 2). If the user elects to continue wKDA from MSEA or meta-MSEA, then the enriched gene sets from these analyses will be used as the disease-associated gene sets. If the user elects to run wKDA as a separate module, they must upload their own gene sets to the web server or they can use the pre-loaded sample gene set for testing. With regards to molecular networks, wKDA supports a wide range of directed and undirected regulatory networks. wKDA is designed to utilize edge weight information in gene networks, which could be connection strength or reliability measures. If no edge weight information is available, wKDA can also operate by considering equal weights to all edges. The web server provides a collection of tissue-specific Bayesian networks previously constructed in human and mouse studies (Additional file 1: Table S1). There are also a large number of publicly available network resources, such as protein-protein interactions (PPI) [12], BioGRID [13], GeneMANIA [14] and GIANT [15], which could be used in wKDA.

Fig. 4

Screenshots showing a step-by-step overview of the wKDA module using sample data on the web server. a Gene set format. b Parameters. c Network format. d Example result summary table containing the IDs of the disease associated gene sets, their KDs, p - values and FDRs of the KDs, gene counts of KD subnetworks, and fold enrichment of the KD subnetworks for disease genes Core wKDA parameters include 1) search depth, which specifies the number of layers to expand in the network when determining the neighboring genes of candidate KDs for enrichment assessment, and 2) edge directionality, which specifies whether to neglect edge directionality (incoming and outgoing) or require the candidate hubs to be upstream of neighbor genes (only outgoing) for networks that carry directionality information (Fig. 4b). Additional parameters are described in detail in the online tutorial. Summary results of wKDA will be displayed on the webpage (Fig. 4d), which reports top 5 KDs for each disease gene set along with the statistics. Users could also download four detailed results files: 1) “wKDA_kd_pvalues.txt”, a summary table of all KDs ranked by p -values and FDR; 2) “wKDA_kd_full_results.txt”, providing detailed statistics on all KDs identified; 3) “wKDA_kd_tophits.txt”, p-value summary table for only the top KDs for each disease gene set. 4) “wKDA_hub_structure.txt”, specifying hub-cohub relationship. The co-hub structure is useful to group KDs with highly overlapping network topology, and to retrieve list of independent KDs for more efficient prioritization. Additionally, wKDA provides Cytoscape-ready files that can be used in Cytoscape [16] for a more customized visualization than the included web-based network visualization module.

Network visualization

Our web server provides a convenient module to allow users visualize top KDs and subnetworks using Cytoscape Web v0.8 [17]. The top 5 KDs for each disease gene set from wKDA will be automatically visualized, as exemplified in Fig. 5a. The visualization is interactive so that users can make real-time changes such as zooming in on a node of interest by only considering that particular subnetwork (Fig. 5b) or by filtering a subnetwork based on the edge weight information (Fig. 5c), as detailed in the tutorial page.

Fig. 5

Screenshots of the network visualization module output generated by the default sample files in the wKDA module. a Overview of a network comprised of top KDs for LDL-associated gene sets, with 5 KDs for each gene set displayed. KDs are depicted as diamond nodes and node colors indicating gene set membership. b Focused subnetwork view of key driver of interest Fasn when filtering the network in (a) by submitting a “Fasn” query or right clicking the node. c The network can be filtered to display only nodes with edge weights higher than a certain threshold: 4.0 in this case

Application example

As an illustration of our web server’s workflow and analysis results, we applied the Mergeomics web server to a publically available low-density lipoprotein (LDL) GWAS dataset from the GLGC consortium [18]. All files mentioned within this section are provided as example files on the web server. To correct for LD between SNPs, we used the MDF analysis module and the following input files: the GLGC LDL GWAS summary statistics (SNPs, −log10 p - values), SNP-gene mapping based on 50 kb chromosomal distance, and the Hapmap CEU LD file containing SNPs with r2 > 0.7 as the Marker Dependency file. We also filtered the GWAS loci by only consider the top 50 % SNPs ranked by p - values to reduce random noise from the weaker association spectrum. After correction for marker-dependency using MDF, the resulting association and mapping files were used directly as input for MSEA (Fig. 3a-b) using gene permutation and default setting for the other parameters (Fig. 3c), to test for enrichment for canonical pathways collected from KEGG, Biocarta and Reactome databases (Fig. 3d-e). Upon completion of MSEA, a summary table was produced which details the top pathways ranked by FDR along with descriptions of the pathways and top associated genes and SNPs in each pathway. As exemplified in Fig. 3f, “Metabolism of lipids and lipoproteins”, “biosynthesis of unsaturated fatty acids”, and “terepenoid backbone biosynthesis” were three of the top pathways identified among others. APOC2, APOE, and LDLR were listed as the top associated genes in the “metabolism of lipids and lipoproteins” pathway, and their corresponding SNPs were also provided in the summary table. Links to detailed result tables were also displayed for file download. Furthermore, these top pathways were checked for overlaps and merged if significant overlaps in gene membership between pathways were identified. A summary table of the merged pathways was also displayed (not shown). To identify potential KDs and subnetworks for the LDL-associated pathways, the merged pathways were used directly as input for wKDA (Fig. 4a). wKDA was run using default parameters (Fig. 4b) and a liver Bayesian network (Fig. 4c). Upon completion, a summary table is produced (Fig. 4d) which lists the top 5 KDs for each merged module and information about their local subnetwork structure. For example, Fasn was a KD for the Metabolism of Lipids and Lipoproteins pathway. Links to detailed result tables were also displayed for file download. The wKDA results can be viewed directly using the interactive visualization feature, which by default illustrates the top 5 KDs for each gene set and their local subnetworks with disease genes highlighted (Fig. 5a). The networks can be filtered by selecting a particular KD of interest to focus on (Fig. 5b) or by removing edges below an edge weight cutoff to focus on high confidence network connections (Fig. 5c). To facilitate further customization of network views, Cytoscape-ready files can be downloaded for external visualization.

Conclusions

We have implemented the Mergeomics pipeline as a user-friendly, publicly available web server that can facilitate multidimensional omics data integration to expedite novel discoveries. The web server also pre-populates a wide range of publically available data sources. Users can apply the pipeline to their own data in conjunction with any preloaded data to identify disease-associated pathways, gene networks, and key regulators. The web server includes step-by-step tutorials, examples and visualization tools in a web-based platform. The flexibility of the web server to accommodate various omics data types and to conduct pathway and network-level meta-analysis of multiple studies of different design will boost our ability to integrate big data.

18 in total

1. KEGG: kyoto encyclopedia of genes and genomes.

Authors: M Kanehisa; S Goto
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal: Genome Res Date: 2003-11 Impact factor: 9.043

3. An integrative genomics approach to infer causal associations between gene expression and disease.

Authors: Eric E Schadt; John Lamb; Xia Yang; Jun Zhu; Steve Edwards; Debraj Guhathakurta; Solveig K Sieberts; Stephanie Monks; Marc Reitman; Chunsheng Zhang; Pek Yee Lum; Amy Leonardson; Rolf Thieringer; Joseph M Metzger; Liming Yang; John Castle; Haoyuan Zhu; Shera F Kash; Thomas A Drake; Alan Sachs; Aldons J Lusis
Journal: Nat Genet Date: 2005-06-19 Impact factor: 38.330

Review 4. The model organism as a system: integrating 'omics' data sets.

Authors: Andrew R Joyce; Bernhard Ø Palsson
Journal: Nat Rev Mol Cell Biol Date: 2006-03 Impact factor: 94.444

5. Hundreds of variants clustered in genomic loci and biological pathways affect human height.

Authors: Hana Lango Allen; Karol Estrada; Guillaume Lettre; Sonja I Berndt; Michael N Weedon; Fernando Rivadeneira; Cristen J Willer; Anne U Jackson; Sailaja Vedantam; Soumya Raychaudhuri; Teresa Ferreira; Andrew R Wood; Robert J Weyant; Ayellet V Segrè; Elizabeth K Speliotes; Eleanor Wheeler; Nicole Soranzo; Ju-Hyun Park; Jian Yang; Daniel Gudbjartsson; Nancy L Heard-Costa; Joshua C Randall; Lu Qi; Albert Vernon Smith; Reedik Mägi; Tomi Pastinen; Liming Liang; Iris M Heid; Jian'an Luan; Gudmar Thorleifsson; Thomas W Winkler; Michael E Goddard; Ken Sin Lo; Cameron Palmer; Tsegaselassie Workalemahu; Yurii S Aulchenko; Asa Johansson; M Carola Zillikens; Mary F Feitosa; Tõnu Esko; Toby Johnson; Shamika Ketkar; Peter Kraft; Massimo Mangino; Inga Prokopenko; Devin Absher; Eva Albrecht; Florian Ernst; Nicole L Glazer; Caroline Hayward; Jouke-Jan Hottenga; Kevin B Jacobs; Joshua W Knowles; Zoltán Kutalik; Keri L Monda; Ozren Polasek; Michael Preuss; Nigel W Rayner; Neil R Robertson; Valgerdur Steinthorsdottir; Jonathan P Tyrer; Benjamin F Voight; Fredrik Wiklund; Jianfeng Xu; Jing Hua Zhao; Dale R Nyholt; Niina Pellikka; Markus Perola; John R B Perry; Ida Surakka; Mari-Liis Tammesoo; Elizabeth L Altmaier; Najaf Amin; Thor Aspelund; Tushar Bhangale; Gabrielle Boucher; Daniel I Chasman; Constance Chen; Lachlan Coin; Matthew N Cooper; Anna L Dixon; Quince Gibson; Elin Grundberg; Ke Hao; M Juhani Junttila; Lee M Kaplan; Johannes Kettunen; Inke R König; Tony Kwan; Robert W Lawrence; Douglas F Levinson; Mattias Lorentzon; Barbara McKnight; Andrew P Morris; Martina Müller; Julius Suh Ngwa; Shaun Purcell; Suzanne Rafelt; Rany M Salem; Erika Salvi; Serena Sanna; Jianxin Shi; Ulla Sovio; John R Thompson; Michael C Turchin; Liesbeth Vandenput; Dominique J Verlaan; Veronique Vitart; Charles C White; Andreas Ziegler; Peter Almgren; Anthony J Balmforth; Harry Campbell; Lorena Citterio; Alessandro De Grandi; Anna Dominiczak; Jubao Duan; Paul Elliott; Roberto Elosua; Johan G Eriksson; Nelson B Freimer; Eco J C Geus; Nicola Glorioso; Shen Haiqing; Anna-Liisa Hartikainen; Aki S Havulinna; Andrew A Hicks; Jennie Hui; Wilmar Igl; Thomas Illig; Antti Jula; Eero Kajantie; Tuomas O Kilpeläinen; Markku Koiranen; Ivana Kolcic; Seppo Koskinen; Peter Kovacs; Jaana Laitinen; Jianjun Liu; Marja-Liisa Lokki; Ana Marusic; Andrea Maschio; Thomas Meitinger; Antonella Mulas; Guillaume Paré; Alex N Parker; John F Peden; Astrid Petersmann; Irene Pichler; Kirsi H Pietiläinen; Anneli Pouta; Martin Ridderstråle; Jerome I Rotter; Jennifer G Sambrook; Alan R Sanders; Carsten Oliver Schmidt; Juha Sinisalo; Jan H Smit; Heather M Stringham; G Bragi Walters; Elisabeth Widen; Sarah H Wild; Gonneke Willemsen; Laura Zagato; Lina Zgaga; Paavo Zitting; Helene Alavere; Martin Farrall; Wendy L McArdle; Mari Nelis; Marjolein J Peters; Samuli Ripatti; Joyce B J van Meurs; Katja K Aben; Kristin G Ardlie; Jacques S Beckmann; John P Beilby; Richard N Bergman; Sven Bergmann; Francis S Collins; Daniele Cusi; Martin den Heijer; Gudny Eiriksdottir; Pablo V Gejman; Alistair S Hall; Anders Hamsten; Heikki V Huikuri; Carlos Iribarren; Mika Kähönen; Jaakko Kaprio; Sekar Kathiresan; Lambertus Kiemeney; Thomas Kocher; Lenore J Launer; Terho Lehtimäki; Olle Melander; Tom H Mosley; Arthur W Musk; Markku S Nieminen; Christopher J O'Donnell; Claes Ohlsson; Ben Oostra; Lyle J Palmer; Olli Raitakari; Paul M Ridker; John D Rioux; Aila Rissanen; Carlo Rivolta; Heribert Schunkert; Alan R Shuldiner; David S Siscovick; Michael Stumvoll; Anke Tönjes; Jaakko Tuomilehto; Gert-Jan van Ommen; Jorma Viikari; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael A Province; Manfred Kayser; Alice M Arnold; Larry D Atwood; Eric Boerwinkle; Stephen J Chanock; Panos Deloukas; Christian Gieger; Henrik Grönberg; Per Hall; Andrew T Hattersley; Christian Hengstenberg; Wolfgang Hoffman; G Mark Lathrop; Veikko Salomaa; Stefan Schreiber; Manuela Uda; Dawn Waterworth; Alan F Wright; Themistocles L Assimes; Inês Barroso; Albert Hofman; Karen L Mohlke; Dorret I Boomsma; Mark J Caulfield; L Adrienne Cupples; Jeanette Erdmann; Caroline S Fox; Vilmundur Gudnason; Ulf Gyllensten; Tamara B Harris; Richard B Hayes; Marjo-Riitta Jarvelin; Vincent Mooser; Patricia B Munroe; Willem H Ouwehand; Brenda W Penninx; Peter P Pramstaller; Thomas Quertermous; Igor Rudan; Nilesh J Samani; Timothy D Spector; Henry Völzke; Hugh Watkins; James F Wilson; Leif C Groop; Talin Haritunians; Frank B Hu; Robert C Kaplan; Andres Metspalu; Kari E North; David Schlessinger; Nicholas J Wareham; David J Hunter; Jeffrey R O'Connell; David P Strachan; H-Erich Wichmann; Ingrid B Borecki; Cornelia M van Duijn; Eric E Schadt; Unnur Thorsteinsdottir; Leena Peltonen; André G Uitterlinden; Peter M Visscher; Nilanjan Chatterjee; Ruth J F Loos; Michael Boehnke; Mark I McCarthy; Erik Ingelsson; Cecilia M Lindgren; Gonçalo R Abecasis; Kari Stefansson; Timothy M Frayling; Joel N Hirschhorn
Journal: Nature Date: 2010-09-29 Impact factor: 49.962

6. An integrated encyclopedia of DNA elements in the human genome.

Authors:
Journal: Nature Date: 2012-09-06 Impact factor: 49.962

7. Mapping cis- and trans-regulatory effects across multiple tissues in twins.

Authors: Elin Grundberg; Kerrin S Small; Åsa K Hedman; Alexandra C Nica; Alfonso Buil; Sarah Keildson; Jordana T Bell; Tsun-Po Yang; Eshwar Meduri; Amy Barrett; James Nisbett; Magdalena Sekowska; Alicja Wilk; So-Youn Shin; Daniel Glass; Mary Travers; Josine L Min; Sue Ring; Karen Ho; Gudmar Thorleifsson; Augustine Kong; Unnur Thorsteindottir; Chrysanthi Ainali; Antigone S Dimas; Neelam Hassanali; Catherine Ingle; David Knowles; Maria Krestyaninova; Christopher E Lowe; Paola Di Meglio; Stephen B Montgomery; Leopold Parts; Simon Potter; Gabriela Surdulescu; Loukia Tsaprouni; Sophia Tsoka; Veronique Bataille; Richard Durbin; Frank O Nestle; Stephen O'Rahilly; Nicole Soranzo; Cecilia M Lindgren; Krina T Zondervan; Kourosh R Ahmadi; Eric E Schadt; Kari Stefansson; George Davey Smith; Mark I McCarthy; Panos Deloukas; Emmanouil T Dermitzakis; Tim D Spector
Journal: Nat Genet Date: 2012-09-02 Impact factor: 38.330

8. The Reactome pathway knowledgebase.

Authors: David Croft; Antonio Fabregat Mundo; Robin Haw; Marija Milacic; Joel Weiser; Guanming Wu; Michael Caudy; Phani Garapati; Marc Gillespie; Maulik R Kamdar; Bijay Jassal; Steven Jupe; Lisa Matthews; Bruce May; Stanislav Palatnik; Karen Rothfels; Veronica Shamovsky; Heeyeon Song; Mark Williams; Ewan Birney; Henning Hermjakob; Lincoln Stein; Peter D'Eustachio
Journal: Nucleic Acids Res Date: 2013-11-15 Impact factor: 16.971

9. Discovery and refinement of loci associated with lipid levels.

Authors: Cristen J Willer; Ellen M Schmidt; Sebanti Sengupta; Michael Boehnke; Panos Deloukas; Sekar Kathiresan; Karen L Mohlke; Erik Ingelsson; Gonçalo R Abecasis; Gina M Peloso; Stefan Gustafsson; Stavroula Kanoni; Andrea Ganna; Jin Chen; Martin L Buchkovich; Samia Mora; Jacques S Beckmann; Jennifer L Bragg-Gresham; Hsing-Yi Chang; Ayşe Demirkan; Heleen M Den Hertog; Ron Do; Louise A Donnelly; Georg B Ehret; Tõnu Esko; Mary F Feitosa; Teresa Ferreira; Krista Fischer; Pierre Fontanillas; Ross M Fraser; Daniel F Freitag; Deepti Gurdasani; Kauko Heikkilä; Elina Hyppönen; Aaron Isaacs; Anne U Jackson; Åsa Johansson; Toby Johnson; Marika Kaakinen; Johannes Kettunen; Marcus E Kleber; Xiaohui Li; Jian'an Luan; Leo-Pekka Lyytikäinen; Patrik K E Magnusson; Massimo Mangino; Evelin Mihailov; May E Montasser; Martina Müller-Nurasyid; Ilja M Nolte; Jeffrey R O'Connell; Cameron D Palmer; Markus Perola; Ann-Kristin Petersen; Serena Sanna; Richa Saxena; Susan K Service; Sonia Shah; Dmitry Shungin; Carlo Sidore; Ci Song; Rona J Strawbridge; Ida Surakka; Toshiko Tanaka; Tanya M Teslovich; Gudmar Thorleifsson; Evita G Van den Herik; Benjamin F Voight; Kelly A Volcik; Lindsay L Waite; Andrew Wong; Ying Wu; Weihua Zhang; Devin Absher; Gershim Asiki; Inês Barroso; Latonya F Been; Jennifer L Bolton; Lori L Bonnycastle; Paolo Brambilla; Mary S Burnett; Giancarlo Cesana; Maria Dimitriou; Alex S F Doney; Angela Döring; Paul Elliott; Stephen E Epstein; Gudmundur Ingi Eyjolfsson; Bruna Gigante; Mark O Goodarzi; Harald Grallert; Martha L Gravito; Christopher J Groves; Göran Hallmans; Anna-Liisa Hartikainen; Caroline Hayward; Dena Hernandez; Andrew A Hicks; Hilma Holm; Yi-Jen Hung; Thomas Illig; Michelle R Jones; Pontiano Kaleebu; John J P Kastelein; Kay-Tee Khaw; Eric Kim; Norman Klopp; Pirjo Komulainen; Meena Kumari; Claudia Langenberg; Terho Lehtimäki; Shih-Yi Lin; Jaana Lindström; Ruth J F Loos; François Mach; Wendy L McArdle; Christa Meisinger; Braxton D Mitchell; Gabrielle Müller; Ramaiah Nagaraja; Narisu Narisu; Tuomo V M Nieminen; Rebecca N Nsubuga; Isleifur Olafsson; Ken K Ong; Aarno Palotie; Theodore Papamarkou; Cristina Pomilla; Anneli Pouta; Daniel J Rader; Muredach P Reilly; Paul M Ridker; Fernando Rivadeneira; Igor Rudan; Aimo Ruokonen; Nilesh Samani; Hubert Scharnagl; Janet Seeley; Kaisa Silander; Alena Stančáková; Kathleen Stirrups; Amy J Swift; Laurence Tiret; Andre G Uitterlinden; L Joost van Pelt; Sailaja Vedantam; Nicholas Wainwright; Cisca Wijmenga; Sarah H Wild; Gonneke Willemsen; Tom Wilsgaard; James F Wilson; Elizabeth H Young; Jing Hua Zhao; Linda S Adair; Dominique Arveiler; Themistocles L Assimes; Stefania Bandinelli; Franklyn Bennett; Murielle Bochud; Bernhard O Boehm; Dorret I Boomsma; Ingrid B Borecki; Stefan R Bornstein; Pascal Bovet; Michel Burnier; Harry Campbell; Aravinda Chakravarti; John C Chambers; Yii-Der Ida Chen; Francis S Collins; Richard S Cooper; John Danesh; George Dedoussis; Ulf de Faire; Alan B Feranil; Jean Ferrières; Luigi Ferrucci; Nelson B Freimer; Christian Gieger; Leif C Groop; Vilmundur Gudnason; Ulf Gyllensten; Anders Hamsten; Tamara B Harris; Aroon Hingorani; Joel N Hirschhorn; Albert Hofman; G Kees Hovingh; Chao Agnes Hsiung; Steve E Humphries; Steven C Hunt; Kristian Hveem; Carlos Iribarren; Marjo-Riitta Järvelin; Antti Jula; Mika Kähönen; Jaakko Kaprio; Antero Kesäniemi; Mika Kivimaki; Jaspal S Kooner; Peter J Koudstaal; Ronald M Krauss; Diana Kuh; Johanna Kuusisto; Kirsten O Kyvik; Markku Laakso; Timo A Lakka; Lars Lind; Cecilia M Lindgren; Nicholas G Martin; Winfried März; Mark I McCarthy; Colin A McKenzie; Pierre Meneton; Andres Metspalu; Leena Moilanen; Andrew D Morris; Patricia B Munroe; Inger Njølstad; Nancy L Pedersen; Chris Power; Peter P Pramstaller; Jackie F Price; Bruce M Psaty; Thomas Quertermous; Rainer Rauramaa; Danish Saleheen; Veikko Salomaa; Dharambir K Sanghera; Jouko Saramies; Peter E H Schwarz; Wayne H-H Sheu; Alan R Shuldiner; Agneta Siegbahn; Tim D Spector; Kari Stefansson; David P Strachan; Bamidele O Tayo; Elena Tremoli; Jaakko Tuomilehto; Matti Uusitupa; Cornelia M van Duijn; Peter Vollenweider; Lars Wallentin; Nicholas J Wareham; John B Whitfield; Bruce H R Wolffenbuttel; Jose M Ordovas; Eric Boerwinkle; Colin N A Palmer; Unnur Thorsteinsdottir; Daniel I Chasman; Jerome I Rotter; Paul W Franks; Samuli Ripatti; L Adrienne Cupples; Manjinder S Sandhu; Stephen S Rich
Journal: Nat Genet Date: 2013-10-06 Impact factor: 38.330

10. Understanding multicellular function and disease with human tissue-specific networks.

Authors: Casey S Greene; Arjun Krishnan; Aaron K Wong; Emanuela Ricciotti; Rene A Zelaya; Daniel S Himmelstein; Ran Zhang; Boris M Hartmann; Elena Zaslavsky; Stuart C Sealfon; Daniel I Chasman; Garret A FitzGerald; Kara Dolinski; Tilo Grosser; Olga G Troyanskaya
Journal: Nat Genet Date: 2015-04-27 Impact factor: 38.330

19 in total

1. Integration of Multi-omics Data from Mouse Diversity Panel Highlights Mitochondrial Dysfunction in Non-alcoholic Fatty Liver Disease.

Authors: Karthickeyan Chella Krishnan; Zeyneb Kurt; Rio Barrere-Cain; Simon Sabir; Aditi Das; Raquel Floyd; Laurent Vergnes; Yuqi Zhao; Nam Che; Sarada Charugundla; Hannah Qi; Zhiqiang Zhou; Yonghong Meng; Calvin Pan; Marcus M Seldin; Frode Norheim; Simon Hui; Karen Reue; Aldons J Lusis; Xia Yang
Journal: Cell Syst Date: 2018-01-18 Impact factor: 10.304

2. Integrative Prioritization of Causal Genes for Coronary Artery Disease.

Authors: Ke Hao; Raili Ermel; Katyayani Sukhavasi; Haoxiang Cheng; Lijiang Ma; Ling Li; Letizia Amadori; Simon Koplev; Oscar Franzén; Valentina d'Escamard; Nirupama Chandel; Kathryn Wolhuter; Nicole S Bryce; Vamsidhar R M Venkata; Clint L Miller; Arno Ruusalepp; Heribert Schunkert; Johan L M Björkegren; Jason C Kovacic
Journal: Circ Genom Precis Med Date: 2021-12-28

3. The HDAC9-associated risk locus promotes coronary artery disease by governing TWIST1.

Authors: Lijiang Ma; Nicole S Bryce; Adam W Turner; Antonio F Di Narzo; Karishma Rahman; Yang Xu; Raili Ermel; Katyayani Sukhavasi; Valentina d'Escamard; Nirupama Chandel; Bhargavi V'Gangula; Kathryn Wolhuter; Daniella Kadian-Dodov; Oscar Franzen; Arno Ruusalepp; Ke Hao; Clint L Miller; Johan L M Björkegren; Jason C Kovacic
Journal: PLoS Genet Date: 2022-06-17 Impact factor: 6.020

4. Differential metabolic and multi-tissue transcriptomic responses to fructose consumption among genetically diverse mice.

Authors: Guanglin Zhang; Hyae Ran Byun; Zhe Ying; Montgomery Blencowe; Yuqi Zhao; Jason Hong; Le Shu; Karthick Chella Krishnan; Fernando Gomez-Pinilla; Xia Yang
Journal: Biochim Biophys Acta Mol Basis Dis Date: 2019-10-25 Impact factor: 5.187

Review 5. Systems genetics applications in metabolism research.

Authors: Marcus Seldin; Xia Yang; Aldons J Lusis
Journal: Nat Metab Date: 2019-10-21

6. CoffeeProt: an online tool for correlation and functional enrichment of systems genetics data.

Authors: Jeffrey Molendijk; Marcus M Seldin; Benjamin L Parker
Journal: Nucleic Acids Res Date: 2021-07-02 Impact factor: 16.971

7. Integration of multiple networks and pathways identifies cancer driver genes in pan-cancer analysis.

Authors: Claudia Cava; Gloria Bertoli; Antonio Colaprico; Catharina Olsen; Gianluca Bontempi; Isabella Castiglioni
Journal: BMC Genomics Date: 2018-01-06 Impact factor: 3.969

8. Shared genetic regulatory networks for cardiovascular disease and type 2 diabetes in multiple populations of diverse ethnicities in the United States.

Authors: Le Shu; Kei Hang K Chan; Guanglin Zhang; Tianxiao Huan; Zeyneb Kurt; Yuqi Zhao; Veronica Codoni; David-Alexandre Trégouët; Jun Yang; James G Wilson; Xi Luo; Daniel Levy; Aldons J Lusis; Simin Liu; Xia Yang
Journal: PLoS Genet Date: 2017-09-28 Impact factor: 5.917

9. Multiple independent mechanisms link gene polymorphisms in the region of ZEB2 with risk of coronary artery disease.

Authors: Lijiang Ma; Nirupama Chandel; Raili Ermel; Katyayani Sukhavasi; Ke Hao; Arno Ruusalepp; Johan L M Björkegren; Jason C Kovacic
Journal: Atherosclerosis Date: 2020-08-29 Impact factor: 5.162

10. Mergeomics 2.0: a web server for multi-omics data integration to elucidate disease networks and predict therapeutics.

Authors: Jessica Ding; Montgomery Blencowe; Thien Nghiem; Sung-Min Ha; Yen-Wei Chen; Gaoyan Li; Xia Yang
Journal: Nucleic Acids Res Date: 2021-07-02 Impact factor: 16.971