comoRbidity: an R package for the systematic analysis of disease comorbidities.

Alba Gutiérrez-Sacristán¹,², Àlex Bravo¹,³, Alexia Giannoula¹, Miguel A Mayer¹, Ferran Sanz¹, Laura I Furlong¹.

Abstract

Motivation: The study of comorbidities is a major priority because of their impact on life expectancy, quality of life and healthcare costs. The availability of electronic health records (EHRs) for data mining offers the opportunity to discover disease associations and comorbidity patterns from the clinical history of patients gathered during routine medical care. This creates a need for analytical tools to detect disease comorbidities, including tools to investigate their underlying genetic basis.
Results: We present comoRbidity, an R package that provides a systematic and comprehensive analysis of disease comorbidities from both the clinical and molecular perspectives. comoRbidity leverages (i) user-provided clinical data from EHR databases (the clinical comorbidity analysis) and (ii) genotype-phenotype information on the diseases under study (the molecular comorbidity analysis). The clinical comorbidity analysis identifies significant disease comorbidities from clinical data, including sex and age stratification and temporal directionality analyses, while the molecular comorbidity analysis supports the generation of hypotheses on the mechanisms underlying the comorbidities by exploring genes shared among disorders. The open-source comoRbidity package aims to expedite the integrative analysis of disease comorbidities by incorporating several analytical and visualization functions.
Availability and implementation: https://bitbucket.org/ibi_group/comorbidity.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year: 2018 PMID: 29897411 PMCID: PMC6137966 DOI: 10.1093/bioinformatics/bty315

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

The co-existence of two or more diseases in the same patient, also known as comorbidity (van den Akker et al.; Valderas et al.), is a public health concern because it has important consequences for both patients and the healthcare system (Gijsen et al.; Valderas et al.). According to several studies, the prevalence of comorbidity ranges from ∼20% to ∼90% (Bonavita and De Simone, 2008; Fortin et al.; Mezzich and Salloum, 2008; Marengoni et al.). This variation depends on the population under study and on other characteristics of the study design, such as the definition of comorbidity (van den Akker et al.; Valderas et al.). Although the prevalence of comorbidity increases with age, it is not limited to the elderly population (Doshi-Velez et al.; Jakovljević and Ostojić, 2013; Marengoni et al.; Taylor et al.). The availability of electronic health records (EHRs) for data mining offers the opportunity to discover disease associations and comorbidity patterns from the clinical history of patients gathered during routine medical care (Bagley et al.; Backenroth et al.; Holmes et al.). In recent years, there has been growing interest in the re-use of clinical data for research (Jensen et al.). In this context, tools that enable the analysis of clinical data in a reproducible manner and in a secure environment are key. Analytical tools that identify comorbidity patterns from clinical data will enable (i) estimating the prevalence of comorbidities in particular populations, (ii) stratifying patients according to their comorbidities and (iii) developing decision support systems in the clinical setting. In this paper, we introduce comoRbidity, an R package that provides a comprehensive analysis of disease comorbidities from both the clinical and molecular perspectives.
comoRbidity leverages clinical data obtained from EHR databases or health registries (the clinical comorbidity analysis) and genotype-phenotype information on the diseases under study (the molecular comorbidity analysis), taken either from DisGeNET (Piñero et al.) or provided by the user.
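The molecular comorbidity analysis rests on genes shared between disorders. As a minimal sketch of that idea, and not the package's own R code, the Python snippet below assumes gene-disease associations are held as a mapping from a disease identifier to a set of gene symbols (the kind of data DisGeNET or the user would supply) and scores each disease pair by its shared genes and, as an assumed overlap metric, the Jaccard index. The function name `shared_gene_overlap` and the disease and gene identifiers are hypothetical placeholders.

```python
from itertools import combinations


def shared_gene_overlap(disease_genes):
    """For every pair of diseases, return (disease_1, disease_2, shared genes,
    Jaccard index), where the Jaccard index is |A & B| / |A | B| of the gene sets."""
    results = []
    for (d1, g1), (d2, g2) in combinations(sorted(disease_genes.items()), 2):
        shared = g1 & g2
        union = g1 | g2
        jaccard = len(shared) / len(union) if union else 0.0
        results.append((d1, d2, sorted(shared), jaccard))
    return results


# Hypothetical gene-disease associations: disease id -> set of associated genes.
disease_genes = {
    "D1": {"g1", "g2", "g3"},
    "D2": {"g2", "g3", "g4"},
    "D3": {"g4", "g5"},
}

for d1, d2, shared, j in shared_gene_overlap(disease_genes):
    print(f"{d1} - {d2}: shared={shared} jaccard={j:.2f}")
```

With the toy data, D1 and D2 share g2 and g3 (Jaccard index 0.50), D2 and D3 share g4 (0.25), and D1 and D3 share no genes.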

2 Design and implementation

comoRbidity aims to expedite the analysis of disease comorbidities by providing several analytical functions and visualization options for clinical data supplied by the user. comoRbidity is built on standard CRAN and Bioconductor classes, which allows full flexibility and integration with other R packages. It runs under Linux, Windows and Mac operating systems. The R package parallel (R Core Team, 2014), which is distributed with base R, is used to speed up the comorbidity estimation by spreading the computation over the number of cores requested by the user. comoRbidity contains 14 R functions (see Supplementary Table S1 for details) that process clinical and molecular data, perform the disease comorbidity analysis and visualize the results. The package includes a dataset of artificially generated clinical data (http://www.emrbots.org/) to illustrate its functionalities. The software implements two independent types of analysis: the clinical comorbidity analysis and the molecular comorbidity analysis. An overview of the data analysis workflow provided by the package is shown in Figure 1. Each analysis includes three sequential steps:

(i) Data Selection: From the user's input data, comoRbidity provides an overview of the data, including a demographic analysis based on age and sex, the number of genes associated with the diseases under study and the number of diseases sharing genes.
(ii) Data Analysis: The comorbidity analysis is performed according to parameters set by the user.
(iii) Results Visualization: The package offers several options for visualizing the results.

Fig. 1. Overview of comoRbidity
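To make the Data Analysis step concrete, the Python sketch below shows the kind of association statistics commonly computed for a disease pair from patient records. It is not comoRbidity's R implementation: it assumes comorbidity is scored on a 2x2 patient contingency table and computes the relative risk and phi coefficient as defined by Hidalgo et al. (2009), plus a one-sided Fisher's exact test. The function name `comorbidity_stats` and the toy cohort are hypothetical.

```python
from math import comb, sqrt


def comorbidity_stats(patients, dis_a, dis_b):
    """Return (relative_risk, phi, fisher_p) for the disease pair (dis_a, dis_b).

    patients: dict mapping a patient id to the set of diagnosis codes recorded
    for that patient. fisher_p is the one-sided Fisher exact p-value for
    co-occurrence being higher than expected under independence.
    """
    n = len(patients)
    p_a = sum(dis_a in codes for codes in patients.values())
    p_b = sum(dis_b in codes for codes in patients.values())
    c_ab = sum(dis_a in codes and dis_b in codes for codes in patients.values())
    if not (0 < p_a < n and 0 < p_b < n):
        raise ValueError("each disease must occur in some, but not all, patients")

    # Relative risk: observed co-occurrence divided by the count expected
    # if the two diseases were independent (p_a * p_b / n).
    rr = c_ab * n / (p_a * p_b)
    # Phi coefficient: Pearson correlation between the two binary indicators.
    phi = (c_ab * n - p_a * p_b) / sqrt(p_a * p_b * (n - p_a) * (n - p_b))
    # One-sided Fisher exact test: P(X >= c_ab) with X ~ Hypergeometric(n, p_a, p_b).
    fisher_p = sum(
        comb(p_a, k) * comb(n - p_a, p_b - k)
        for k in range(c_ab, min(p_a, p_b) + 1)
    ) / comb(n, p_b)
    return rr, phi, fisher_p


# Hypothetical toy cohort: patient id -> set of diagnosis codes.
cohort = {
    "p1": {"A", "B"}, "p2": {"A", "B"}, "p3": {"A", "B"},
    "p4": {"A"}, "p5": {"B"}, "p6": {"C"}, "p7": {"C"},
    "p8": set(), "p9": set(), "p10": {"C"},
}

rr, phi, p_value = comorbidity_stats(cohort, "A", "B")
print(f"RR={rr:.3f} phi={phi:.3f} Fisher p={p_value:.3f}")
```

In this toy cohort, A and B co-occur in 3 of 10 patients against 1.6 expected under independence, giving a relative risk of 1.875, a phi of about 0.58 and a one-sided Fisher p-value of 25/210 (about 0.119). Sex or age stratification would amount to calling the same function on each patient subset.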

3 Related work

To the best of our knowledge, only a few tools have been developed for the analysis of disease comorbidities, namely comoR (Moni and Liò), CytoCom (Moni et al.) and medicalRisk (McCormick, 2016). In the R environment, the comoR package (Moni and Liò) computes statistically significant associations among diseases based on the US Medicare claims database (Hidalgo et al.), along with several molecular and phenotypic association metrics. The same authors developed CytoCom2 (Moni et al.), a Cytoscape App to visualize and query their disease comorbidity networks (Hidalgo et al.). The medicalRisk R package (McCormick, 2016) can be used to obtain medical risk status from large datasets with diseases encoded in ICD-9-CM, based on mortality predictors such as the Charlson Comorbidity Index and the Elixhauser comorbidity map. Compared with these tools, the main advantage of comoRbidity is that users can analyze their own clinical data on their private workstation, avoiding the privacy issues raised by sharing patient data. In addition, comoRbidity supports any classification system for encoding diseases and provides different statistics and functions for assessing comorbidity between disorders. Finally, it allows exploring the genetic basis of disease comorbidities through the analysis of gene-disease association data from DisGeNET (Piñero et al.) or provided by the user.

4 Conclusions

The comoRbidity package is a novel, publicly available tool for processing healthcare data to identify comorbidity patterns, enabling their analysis in a user-friendly and reproducible manner. More importantly, it lets users provide their own clinical data, which can be analyzed locally in a secure environment. comoRbidity supports any classification system used to identify diseases and/or phenotypes. In addition, it gives the user full flexibility in defining comorbidity with respect to the temporal window considered, the diseases of interest and the use of primary or secondary diagnoses in the analysis, among other aspects. Several analytical and visualization functions are provided, including metrics to assess disease associations and their temporal directionality. Finally, it allows a molecular analysis of the comorbidities even when no patient genomic data are available, by using publicly available information on gene-disease associations, which makes it possible to formulate hypotheses regarding the etiology of disease comorbidities.
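The temporal window and temporal directionality mentioned above can be illustrated with a short sketch. This is not comoRbidity's implementation: it is a Python example under the assumptions that each patient record stores the date of first diagnosis per code, that a pair counts as "A then B" when B is first diagnosed after A within the window, and that directionality is judged with a two-sided sign (exact binomial) test. The function `directionality`, its `window_days` parameter and the toy records are hypothetical.

```python
from datetime import date
from math import comb


def directionality(diagnoses, dis_a, dis_b, window_days=365):
    """Count patients whose first dis_a diagnosis precedes their first dis_b
    diagnosis within window_days (and vice versa), and return a two-sided
    sign-test p-value for the null hypothesis that both orders are equally likely.

    diagnoses: dict mapping patient id -> {diagnosis code: date of first diagnosis}.
    """
    a_first = b_first = 0
    for events in diagnoses.values():
        if dis_a not in events or dis_b not in events:
            continue  # the patient does not have both diseases
        gap = (events[dis_b] - events[dis_a]).days
        if 0 < gap <= window_days:
            a_first += 1
        elif -window_days <= gap < 0:
            b_first += 1
        # Same-day diagnoses and pairs farther apart than the window are ignored.
    n = a_first + b_first
    if n == 0:
        return a_first, b_first, 1.0
    k = min(a_first, b_first)
    # Two-sided exact binomial (sign) test with success probability 0.5.
    p = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
    return a_first, b_first, p


# Hypothetical toy records: patient id -> {code: date of first diagnosis}.
diagnoses_by_patient = {
    "p1": {"A": date(2015, 1, 1), "B": date(2015, 6, 1)},
    "p2": {"A": date(2015, 3, 1), "B": date(2015, 9, 1)},
    "p3": {"A": date(2016, 1, 1), "B": date(2016, 2, 1)},
    "p4": {"A": date(2016, 5, 1), "B": date(2016, 1, 1)},
    "p5": {"A": date(2010, 1, 1), "B": date(2014, 1, 1)},  # gap of four years
    "p6": {"A": date(2017, 1, 1)},
}

print(directionality(diagnoses_by_patient, "A", "B"))
```

In the toy records, three patients are diagnosed with A before B and one with B before A within a year, giving p = 0.625, so this tiny sample shows no preferred order; patient p5 only counts once the window covers its four-year gap. Ignoring same-day diagnoses is a design choice of this sketch, since their order cannot be resolved from dates alone.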

Funding

The authors received support from ISCIII-FEDER (PI13/00082, CP10/00524, CPII16/00026), from IMI-JU under grant agreements no. 115372 (EMIF) and no. 115735 (iPiE), the resources of which are composed of a financial contribution from the EU-FP7 (FP7/2007-2013) and EFPIA companies' in kind contribution, and from the EU H2020 Programme 2014–2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (ElixirExcelerate). The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII, and is supported by grant PT13/0001/0023 of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. AGS acknowledges financial support from the Spanish Ministry of Economy and Competitiveness, through the ‘María de Maeztu’ Programme for Units of Excellence in R&D (MDM-2014-0370). Conflict of Interest: none declared.

References

1. Causes and consequences of comorbidity: a review.

Authors: R Gijsen; N Hoeymans; F G Schellevis; D Ruwaard; W A Satariano; G A van den Bos
Journal: J Clin Epidemiol Date: 2001-07 Impact factor: 6.437

2. Clinical complexity and person-centered integrative diagnosis.

Authors: Juan E Mezzich; Ihsan M Salloum
Journal: World Psychiatry Date: 2008-02 Impact factor: 49.548

3. Multimorbidity is common to family practice: is it commonly researched?

Authors: Martin Fortin; Lise Lapointe; Catherine Hudon; Alain Vanasse
Journal: Can Fam Physician Date: 2005-02 Impact factor: 3.275

4. Mining electronic health records: towards better research applications and clinical care.

Authors: Peter B Jensen; Lars J Jensen; Søren Brunak
Journal: Nat Rev Genet Date: 2012-05-02 Impact factor: 53.242

5. A dynamic network approach for the study of human phenotypes.

Authors: César A Hidalgo; Nicholas Blumm; Albert-László Barabási; Nicholas A Christakis
Journal: PLoS Comput Biol Date: 2009-04-10 Impact factor: 4.475

6. Towards a definition of comorbidity in the light of clinical complexity.

Authors: Vincenzo Bonavita; Roberto De Simone
Journal: Neurol Sci Date: 2008-05 Impact factor: 3.307

7. comoR: a software for disease comorbidity risk assessment.

Authors: Mohammad Ali Moni; Pietro Liò
Journal: J Clin Bioinforma Date: 2014-05-23

8. CytoCom: a Cytoscape app to visualize, query and analyse disease comorbidity networks.

Authors: Mohammad Ali Moni; Haoming Xu; Pietro Liò
Journal: Bioinformatics Date: 2014-11-07 Impact factor: 6.937

9. Using Rich Data on Comorbidities in Case-Control Study Design with Electronic Health Record Data Improves Control of Confounding in the Detection of Adverse Drug Reactions.

Authors: Daniel Backenroth; Herbert Chase; Carol Friedman; Ying Wei
Journal: PLoS One Date: 2016-10-07 Impact factor: 3.240

10. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants.

Authors: Janet Piñero; Àlex Bravo; Núria Queralt-Rosinach; Alba Gutiérrez-Sacristán; Jordi Deu-Pons; Emilio Centeno; Javier García-García; Ferran Sanz; Laura I Furlong
Journal: Nucleic Acids Res Date: 2016-10-19 Impact factor: 16.971
