MoDentify: phenotype-driven module identification in metabolomics networks at different resolutions.

Kieu Trinh Do¹, David J N-P Rasp¹, Gabi Kastenmüller²,³, Karsten Suhre⁴, Jan Krumsiek¹,²,⁵

Abstract

Summary: Associations of metabolomics data with phenotypic outcomes are expected to span functional modules, which are defined as sets of correlating metabolites that are coordinately regulated. Moreover, these associations occur at different scales, from entire pathways to only a few metabolites, an aspect that has not been addressed by previous methods. Here, we present MoDentify, a free R package to identify regulated modules in metabolomics networks at different layers of resolution. Importantly, MoDentify shows higher statistical power than classical association analysis. Moreover, the package offers direct interactive visualization of the results in Cytoscape. We present an application example using complex, multifluid metabolomics data. Due to its generic character, the method is widely applicable to other types of data.

Availability and implementation: https://github.com/krumsieklab/MoDentify (vignette includes detailed workflow).

Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2019; PMID: 30032270; PMCID: PMC6361241; DOI: 10.1093/bioinformatics/bty650

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937

1 Introduction

Associations with clinical phenotypic outcomes in large-scale metabolomics datasets are complex. They typically span entire modules, which are defined as groups of correlating molecules that are functionally coordinated, coregulated or generally driven by a common biological process (Mitra et al.). The systematic identification of modules is often based on networks, where the aim is to identify highly connected parts containing nodes that are coordinately associated with a given phenotype. Systematic module identification algorithms are well established (Chuang et al.; Martignetti et al.; May et al.; Polanski et al.); however, none of the previously published methods considers that phenotype associations can occur at different scales. These range from global associations spanning entire pathways or even sets of pathways (‘dense’ associations, e.g. between metabolites and phenotypic traits such as gender or BMI) to localized associations with only a few metabolites [‘sparse’ associations, e.g. between metabolites and phenotypic traits such as insulin-like growth-factor I (IGF-I) levels or asthma; Do et al.].

For sparse associations, the identification and interpretation of modules is usually straightforward. However, modules for dense phenotype associations at the metabolite level are challenging to interpret due to their overwhelming number. To facilitate interpretation, the plethora of information at the fine-grained metabolite level can be condensed to a hierarchically superordinate level, such as a pathway network (i.e. a network of pathways).

We have recently introduced a module identification algorithm for multifluid metabolomics data (Do et al.), which has been successfully applied to IGF-I and gender as examples of sparse and dense phenotype associations, respectively. We here present MoDentify, a free R package implementing the approach for general use. MoDentify offers network inference, module identification and interactive module visualization at different levels of resolution. In particular, it increases statistical power compared with classical association analysis and can easily be applied to any type of quantitative data due to its generic character.

2 Description

MoDentify identifies network-based modules that are highly affected by a given phenotype. The underlying network is either directly inferred from the data at the single metabolite or pathway level (see below) or can be provided from an external source. Any external network can be used for the module identification procedure. This includes (i) networks obtained from public databases such as KEGG (Kanehisa et al.) or Recon3D (Brunk et al.), (ii) networks inferred from statistical approaches such as partial, Pearson or Spearman correlations or (iii) networks produced by newly emerging hybrid prior-knowledge/data-based approaches (e.g. Zuo et al.). Regardless of the source of the network, all nodes in the network must be measured in the given dataset. Details can be found in the Supplementary Material.

2.1 Network inference

MoDentify estimates Gaussian graphical models, which have been shown to reconstruct metabolic pathways from metabolomics data (Krumsiek et al.). At the fine-grained level, the network consists of nodes corresponding to metabolites, while at the pathway level, the nodes correspond to entire pathways (sets of metabolites). Such pathway definitions are available from public databases such as the Human Metabolome Database (HMDB) (Wishart et al.), MetaCyc (Caspi et al.), KEGG (Kanehisa et al.) or Recon3D (Brunk et al.). Edges represent significant (partial) correlations between two nodes after multiple testing correction.
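The idea behind a Gaussian graphical model can be illustrated with a small, self-contained Python sketch (not the package's R code). It derives partial correlations from the inverse covariance matrix of a three-metabolite chain A, B, C in which A and C are related only through B. The covariance values, the helper names and the fixed cutoff of 0.1 are assumptions made for illustration; the method described above instead keeps edges whose partial correlation is significant after multiple testing correction.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I] is reduced to [I | M^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of i and j given all others: -P_ij / sqrt(P_ii * P_jj)."""
    prec = invert(cov)  # precision matrix P
    n = len(cov)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


def edges(pcor, names, cutoff):
    """Keep pairs whose absolute partial correlation exceeds a fixed cutoff
    (a stand-in for the significance test with multiple testing correction)."""
    return [(names[i], names[j])
            for i in range(len(names)) for j in range(i + 1, len(names))
            if abs(pcor[i][j]) > cutoff]


# Hypothetical covariance of a chain A -> B -> C (B = A + noise, C = B + noise).
names = ["A", "B", "C"]
cov = [[1, 1, 1],
       [1, 2, 2],
       [1, 2, 3]]

pcor = partial_correlations(cov)
pearson_ac = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])
network = edges(pcor, names, cutoff=0.1)
print("Pearson A-C:", round(pearson_ac, 3))
print("Partial A-C:", round(abs(pcor[0][2]), 3))
print("Edges:", network)
```

In this toy example A and C are clearly correlated in the Pearson sense, yet their partial correlation vanishes once B is accounted for, so only the direct links A-B and B-C remain as edges.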

2.2 Pathway representation

To build a network of interacting pathways, new variables are defined as representatives for each pathway, aggregating the total abundance of metabolites from the pathway into a single value. MoDentify implements two approaches: (i) eigenmetabolite approach, where the first principal component (eigenmetabolite) from a Principal Component Analysis is used as a representative value (Langfelder and Horvath, 2007); (ii) average approach, where the pathway representative is calculated as the average of all z-scored metabolite concentrations in the pathway.
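Both pathway representatives can be sketched in a few lines of plain Python. The function names, the toy concentrations and the use of power iteration to obtain the first principal component are assumptions for illustration and are not taken from the package; the sign convention for the eigenmetabolite (loadings summing to a non-negative value) is likewise an arbitrary choice, since the sign of a principal component is not defined.

```python
import math


def zscore_columns(rows):
    """Z-score each metabolite (column) across samples (rows)."""
    n = len(rows)
    scaled = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*scaled)]


def average_representative(rows):
    """Average approach: mean of the z-scored metabolites per sample."""
    return [sum(r) / len(r) for r in zscore_columns(rows)]


def eigenmetabolite(rows, iterations=200):
    """Eigenmetabolite approach: sample scores on the first principal component.

    The leading eigenvector of the correlation matrix is found by power
    iteration; a start vector orthogonal to it is not handled in this sketch.
    """
    z = zscore_columns(rows)
    n, p = len(z), len(z[0])
    corr = [[sum(z[s][i] * z[s][j] for s in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # fix the arbitrary sign of the component
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in z]


# Hypothetical pathway with three metabolites measured in five samples.
pathway = [[1.0, 2.1, 5.0],
           [2.0, 3.9, 3.0],
           [3.0, 6.2, 4.0],
           [4.0, 8.0, 1.0],
           [5.0, 9.8, 2.0]]

print("average:        ", [round(v, 2) for v in average_representative(pathway)])
print("eigenmetabolite:", [round(v, 2) for v in eigenmetabolite(pathway)])
```

The average treats every metabolite equally, whereas the eigenmetabolite weights metabolites by their loading on the dominant axis of variation, so the two can differ when a pathway contains metabolites that do not follow the main trend.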

2.3 Module identification and scoring

Given a network, a scoring function, and a starting node (seed node) as initial candidate module, the algorithm identifies an optimal module by score maximization. To this end, candidate modules are extended along the network edges until no further score improvement can be achieved. The score of a candidate module is calculated as the negative logarithmized P-value obtained from a multivariable linear regression model with the candidate module as dependent and the phenotype and optional covariates as independent variables. The procedure is repeated for each node in the network as seed node. Overlapping optimal modules are combined into single modules in an optional consolidation step. The combined module is then re-evaluated by the scoring function. If multiple resolution levels are available, each resolution level is represented by its own network and module identification is performed at each resolution level separately.
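The greedy search can be sketched as follows in plain Python. Several simplifications are assumptions of this sketch rather than properties of the package: a module is represented by the average of its z-scored members, there are no covariates, and the score is the absolute Pearson correlation between that representative and the phenotype. For a fixed sample size and no covariates this ranks candidates in the same order as the negative log P-value of a simple linear regression, which keeps the example free of external dependencies. The data, the network and all names are hypothetical, and the optional consolidation step is omitted.

```python
import math


def zscore(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def module_score(module, zdata, phenotype):
    """Stand-in score: |correlation| of the module average with the phenotype."""
    rep = [sum(zdata[m][s] for m in module) / len(module)
           for s in range(len(phenotype))]
    return abs(pearson(rep, phenotype))


def grow_module(seed, neighbors, zdata, phenotype, tol=1e-12):
    """Extend the module along network edges until the score stops improving."""
    module = {seed}
    best = module_score(module, zdata, phenotype)
    while True:
        frontier = sorted({nb for m in module for nb in neighbors[m]} - module)
        candidates = [(module_score(module | {nb}, zdata, phenotype), nb)
                      for nb in frontier]
        if not candidates:
            break
        score, node = max(candidates)
        if score <= best + tol:  # no real improvement: stop
            break
        module.add(node)
        best = score
    return module, best


# Hypothetical data: A and B carry the phenotype signal with opposite noise,
# C and D are unrelated to the phenotype. The network is the chain A-B-C-D.
phenotype = [0, 0, 0, 0, 1, 1, 1, 1]
raw = {"A": [1, -1, 1, -1, 5, 3, 5, 3],
       "B": [-1, 1, -1, 1, 3, 5, 3, 5],
       "C": [1, 1, -1, -1, 1, 1, -1, -1],
       "D": [1, -1, -1, 1, 1, -1, -1, 1]}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
zdata = {name: zscore(values) for name, values in raw.items()}

# The procedure is repeated with every node as seed.
results = {seed: grow_module(seed, neighbors, zdata, phenotype)
           for seed in sorted(raw)}
for seed, (module, score) in results.items():
    print(seed, sorted(module), round(score, 3))
```

In this toy dataset the seeds A and B both grow into the module {A, B}, whose score exceeds that of either metabolite alone, while the seed D stays a singleton because adding its neighbor does not improve the score.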

2.4 Module visualization

In addition to returning R data structures and producing flat-file results, MoDentify visualizes the identified modules as an interactive network in the open-source software Cytoscape (Shannon et al.).

2.5 Complexity and runtime

The algorithm has a complexity of , which will lead to quadratic runtime in the worst-case scenario of a fully connected network. In practice, we assume biological networks to be sparse, i.e. with constant neighborhood sizes, leading to an approximate complexity of . On a 64-bit Windows 8 system with Intel(R) Core(TM) i7-4600U CPU @ 2.10 GHz, network inference took ∼21 s, module identification ∼100 s and module visualization ∼48 s for a network with 1524 nodes.

3 Application example

We demonstrate the ease of use of MoDentify on plasma, urine and saliva metabolomics data from the Qatar Metabolomics Study on Diabetes (QMDiab, see Supplementary Material; Mook-Kanamori et al.), aiming to identify functional modules associated with type 2 diabetes (T2D). Pathway annotations were provided by Metabolon, Inc., on whose metabolomics platform the measurements were performed. The dataset is also available via https://doi.org/10.6084/m9.figshare.5904022. MoDentify was applied to the dataset at metabolite and pathway levels. To produce the list of T2D-associated modules, as well as their interactive visualization in Cytoscape (Fig. 1A), only three lines of code are required. Briefly, generate.network estimates partial correlations between metabolites, identify.modules searches the network for modules associated with the given phenotype, and draw.modules visualizes the results in Cytoscape.

Fig. 1.

Visualization of identified modules for type 2 diabetes. The metabolomics networks with embedded modules at metabolite (A) and pathway (B) level are screenshots of the interactive versions in Cytoscape produced by MoDentify. Zoom-ins have been added to highlight examples of MoDentify’s increased statistical power and its ability to extract biologically valuable insights. Round nodes correspond to metabolic entities not significantly associated with T2D when considered alone. Diamond nodes represent metabolic entities significantly related to T2D.

MoDentify identified 36 modules for T2D at the metabolite level (Fig. 1A) and six modules at the pathway level (Fig. 1B). Many of these modules consist of metabolites or pathways that are not significantly associated with T2D if considered alone. In combination, however, they form modules that are more strongly associated with T2D than any of their single components. This increased statistical power in MoDentify can be attributed to the reduction of uncorrelated technical noise by aggregation of multiple metabolites and allows the detection of links with the phenotype that would have been missed with classical association analysis.
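The noise-reduction argument can be checked with a small simulation in plain Python. The toy model is an assumption made for illustration and is unrelated to the QMDiab data: each of ten hypothetical metabolites equals a continuous phenotype plus independent noise with three times the phenotype's standard deviation. Under this model, averaging k metabolites divides the noise variance by k, so the average should correlate with the phenotype more strongly than any single metabolite does.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)       # fixed seed so the run is reproducible
n_samples, k, noise_sd = 2000, 10, 3.0

# Toy model: metabolite_i = phenotype + independent technical noise.
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
metabolites = [[p + rng.gauss(0.0, noise_sd) for p in phenotype]
               for _ in range(k)]

# Association of each metabolite alone versus the aggregated module.
single_r = [pearson(m, phenotype) for m in metabolites]
module_avg = [sum(m[s] for m in metabolites) / k for s in range(n_samples)]
module_r = pearson(module_avg, phenotype)

# Expected correlations under the toy model.
expected_single = 1 / math.sqrt(1 + noise_sd ** 2)
expected_module = 1 / math.sqrt(1 + noise_sd ** 2 / k)

print("strongest single metabolite:", round(max(single_r), 3))
print("module average:             ", round(module_r, 3))
print("expected (single, module):  ",
      round(expected_single, 3), round(expected_module, 3))
```

The gain shrinks when the noise is shared between metabolites, which is why the argument above refers specifically to uncorrelated technical noise.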

4 Conclusion

To the best of our knowledge, MoDentify implements the first approach for the systematic identification of phenotype-driven modules based on networks at different layers of resolution. The algorithm utilizes pathway definitions in combination with network topology to search for functional modules. Due to its increased statistical power, novel links between phenotypic outcomes and molecular levels can be detected that would be missed by classical analysis. We presented an application example using complex multifluid metabolomics data, but our approach can be applied to any quantitative dataset.

References

1. metaModules identifies key functional subnetworks in microbiome-related disease.

Authors: Ali May; Bernd W Brandt; Mohammed El-Kebir; Gunnar W Klau; Egija Zaura; Wim Crielaard; Jaap Heringa; Sanne Abeln
Journal: Bioinformatics Date: 2015-09-05 Impact factor: 6.937

2. Integrative approaches for finding modular structure in biological networks.

Authors: Koyel Mitra; Anne-Ruxandra Carvunis; Sanath Kumar Ramesh; Trey Ideker
Journal: Nat Rev Genet Date: 2013-10 Impact factor: 53.242

3. Gaussian graphical modeling reconstructs pathway reactions from high-throughput metabolomics data.

Authors: Jan Krumsiek; Karsten Suhre; Thomas Illig; Jerzy Adamski; Fabian J Theis
Journal: BMC Syst Biol Date: 2011-01-31

4. HMDB: the Human Metabolome Database.

Authors: David S Wishart; Dan Tzur; Craig Knox; Roman Eisner; An Chi Guo; Nelson Young; Dean Cheng; Kevin Jewell; David Arndt; Summit Sawhney; Chris Fung; Lisa Nikolai; Mike Lewis; Marie-Aude Coutouly; Ian Forsythe; Peter Tang; Savita Shrivastava; Kevin Jeroncic; Paul Stothard; Godwin Amegbey; David Block; David D Hau; James Wagner; Jessica Miniaci; Melisa Clements; Mulu Gebremedhin; Natalie Guo; Ying Zhang; Gavin E Duggan; Glen D Macinnis; Alim M Weljie; Reza Dowlatabadi; Fiona Bamforth; Derrick Clive; Russ Greiner; Liang Li; Tom Marrie; Brian D Sykes; Hans J Vogel; Lori Querengesser
Journal: Nucleic Acids Res Date: 2007-01 Impact factor: 16.971

5. Wigwams: identifying gene modules co-regulated across multiple biological conditions.

Authors: Krzysztof Polanski; Johanna Rhodes; Claire Hill; Peijun Zhang; Dafyd J Jenkins; Steven J Kiddle; Aleksey Jironkin; Jim Beynon; Vicky Buchanan-Wollaston; Sascha Ott; Katherine J Denby
Journal: Bioinformatics Date: 2013-12-18 Impact factor: 6.937

6. ROMA: Representation and Quantification of Module Activity from Target Expression Data.

Authors: Loredana Martignetti; Laurence Calzone; Eric Bonnet; Emmanuel Barillot; Andrei Zinovyev
Journal: Front Genet Date: 2016-02-19 Impact factor: 4.599

7. Phenotype-driven identification of modules in a hierarchical map of multifluid metabolic correlations.

Authors: Kieu Trinh Do; Maik Pietzner; David Jnp Rasp; Nele Friedrich; Matthias Nauck; Thomas Kocher; Karsten Suhre; Dennis O Mook-Kanamori; Gabi Kastenmüller; Jan Krumsiek
Journal: NPJ Syst Biol Appl Date: 2017-09-21

8. Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO.

Authors: Yiming Zuo; Yi Cui; Guoqiang Yu; Ruijiang Li; Habtom W Ressom
Journal: BMC Bioinformatics Date: 2017-02-10 Impact factor: 3.169

9. Network-based classification of breast cancer metastasis.

Authors: Han-Yu Chuang; Eunjung Lee; Yu-Tsueng Liu; Doheon Lee; Trey Ideker
Journal: Mol Syst Biol Date: 2007-10-16 Impact factor: 11.429

10. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases.

Authors: Ron Caspi; Tomer Altman; Richard Billington; Kate Dreher; Hartmut Foerster; Carol A Fulcher; Timothy A Holland; Ingrid M Keseler; Anamika Kothari; Aya Kubo; Markus Krummenacker; Mario Latendresse; Lukas A Mueller; Quang Ong; Suzanne Paley; Pallavi Subhraveti; Daniel S Weaver; Deepika Weerasinghe; Peifen Zhang; Peter D Karp
Journal: Nucleic Acids Res Date: 2013-11-12 Impact factor: 16.971
