Abdelaali Hassaine, Gholamreza Salimi-Khorshidi, Dexter Canoy, Kazem Rahimi.
Abstract
The prevalence of multimorbidity has been increasing in recent years, posing a major burden for health care delivery and service. Understanding its determinants and impact is proving to be a challenge, yet it offers new opportunities for research to go beyond the study of diseases in isolation. In this paper, we review how the field of machine learning provides many tools for addressing research challenges in multimorbidity. We highlight recent advances in promising methods such as matrix factorisation, deep learning, and topological data analysis, and how these can take multimorbidity research beyond cross-sectional, expert-driven or confirmatory approaches to gain a better understanding of evolving patterns of multimorbidity. We discuss the challenges and opportunities of using machine learning to identify likely causal links between previously poorly understood disease associations while giving an estimate of the uncertainty on such associations. Finally, we summarise some of the challenges for wider clinical adoption of machine learning research tools and propose some solutions.
Keywords: Deep learning; Electronic health records; Machine learning; Multimorbidity; Phenotyping
Year: 2020 PMID: 32768443 PMCID: PMC7493712 DOI: 10.1016/j.mad.2020.111325
Source DB: PubMed Journal: Mech Ageing Dev ISSN: 0047-6374 Impact factor: 5.432
Fig. 1. Annual crude and age/sex-standardised prevalence of number of comorbidities in incident cardiovascular disease patients (credits to Tran et al. (2018)). Number labels for each line refer to the number of comorbidities. (A) Crude prevalence. (B) Age/sex-standardised prevalence.
Fig. 2. Two common types of factorisation methods employed in the multimorbidity literature: (a) matrix factorisation and (b) tensor factorisation. Note that one can change the concept that each dimension represents; these illustrations show a very common way of choosing the dimensions.
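As a rough illustration of the matrix factorisation idea in Fig. 2(a), the sketch below factorises a small patient-by-disease matrix into two non-negative factors using the standard multiplicative-update rule for non-negative matrix factorisation. The choice of patients as rows and diseases as columns, the toy data, the rank, and the function name `nmf` are all assumptions made for illustration; the caption itself does not fix the dimensions or the algorithm.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0.

    Uses multiplicative updates that minimise the Frobenius norm of V - W @ H.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps   # rows -> latent patterns
    H = rng.random((rank, n_cols)) + eps   # latent patterns -> columns
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy binary matrix (made-up data): rows are hypothetical patients,
# columns are hypothetical diseases; 1 means the disease was recorded.
V = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
], dtype=float)

W, H = nmf(V, rank=2)
reconstruction_error = np.linalg.norm(V - W @ H)
```

Under these assumptions, each row of `H` can be read as a candidate disease pattern and each row of `W` as a patient's loading on those patterns.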
Fig. 3. Examples of temporal phenotyping: (a) using a tensor where time is mapped to a dimension, (b) using a tensor where the encounters are mapped to a dimension, (c) using concatenated matrix representations.
Fig. 4. Network showing how male disease clusters (DC) may lead to one another over time. Edges are coloured with the colour of the nodes they originate from. Credits to Hassaine et al. (2019).
A summary of the papers that introduced a new method for the study of multimorbidity patterns.
| Study | Method | Context | Data |
|---|---|---|---|
| Hernández et al. (2019) | Pairwise correlations | 6101 Irish adults aged 50+ years | Self-reported conditions |
| Aguado et al. (2020) | Pairwise correlations | 500K adults in Spain with type 2 diabetes mellitus | EHR |
| Jin et al. (2018) | Pairwise correlations | 21,435 adults from Jilin province, China | Self-reported conditions |
| Khorrami et al. (2020) | Latent class analysis | 10,069 Iranian adults | Self-reported conditions |
| Wang et al. (2019) | Principal component analysis | 2713 adults in São Paulo, Brazil | Self-reported conditions |
| Schiltz et al. (2017) | Classification/regression trees and random forest | 5771 people from the US aged 65+ years | Self-reported conditions linked to Medicare claims |
| Haug et al. (2020) | Hierarchical clustering | 5M patients in Austria | EHR |
| Bueno et al. (2018) | Hidden Markov models | Dutch patients with comorbidities related to atherosclerosis | EHR |
| Violán et al. (2018) | K-means non-hierarchical cluster analysis | 400 patients aged 45−64 years from Spain | EHR |
| Marengoni et al. (2019) | Fuzzy c-means cluster algorithm | 2931 individuals in Sweden aged 60+ years | EHR |
| Medlock-Brown et al. (2019) | Pairwise correlations | 574,172 patients with obesity in the US | EHR |
Fig. 5. Occurrence and co-occurrence of diseases as a function of the size of the dataset. Error bars correspond to 95% CI estimated using 10 bootstrapped samples. Experiments conducted on CPRD (Herrett et al., 2015).
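The error bars described for Fig. 5 can be approximated along the following lines. This is a minimal sketch on synthetic data, assuming a percentile bootstrap over patients; the caption states only that 10 bootstrapped samples were used, so the percentile method, the helper names, and the simulated prevalences are assumptions.

```python
import numpy as np

def bootstrap_ci(records, stat, n_boot=10, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat`, resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    n = len(records)
    estimates = [
        stat(records[rng.integers(0, n, size=n)])  # one resampled dataset
        for _ in range(n_boot)
    ]
    low, high = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high

def co_occurrence(records):
    """Fraction of patients who have both diseases (columns 0 and 1)."""
    return np.mean(records[:, 0] & records[:, 1])

# Synthetic data (made up): 1000 hypothetical patients, two binary disease
# flags drawn independently with prevalences 0.3 and 0.2.
rng = np.random.default_rng(42)
records = rng.random((1000, 2)) < np.array([0.3, 0.2])

point_estimate = co_occurrence(records)
ci_low, ci_high = bootstrap_ci(records, co_occurrence, n_boot=10)
```

With only 10 resamples the interval bounds are themselves noisy; increasing `n_boot` tightens the estimate of the bounds at the cost of more computation.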
Fig. 6. Illustration of a patient’s multimodal health record, where events/encounters tend to happen at irregular intervals.
Fig. 7. (a) UMAP projections of ICD-10 disease embeddings extracted from CPRD using the CBOW algorithm. Note that diseases within the same ICD-10 chapter are very close in the embedding space. (b) Cosine similarity between vector embeddings of a selected group of diseases.
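The cosine similarity shown in Fig. 7(b) is the dot product of two embedding vectors after each is scaled to unit length. The sketch below computes the full pairwise matrix; the three vectors are made-up placeholders, not embeddings learned from CPRD, and the function name is chosen only for illustration.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms          # scale every row to unit length
    return unit @ unit.T               # dot products of unit vectors

# Placeholder vectors for three hypothetical disease codes.
embeddings = np.array([
    [1.0, 0.0, 0.0],   # disease A
    [2.0, 0.0, 0.0],   # disease B: same direction as A, different length
    [0.0, 3.0, 0.0],   # disease C: orthogonal to A and B
])

similarity = cosine_similarity_matrix(embeddings)
```

Because the measure ignores vector length, A and B score 1 here even though their norms differ, while C scores 0 against both.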
Fig. 8. Illustration of the self-attention in BEHRT. The left column shows the outcome of interest; the right column shows the corresponding associations that “attracted” the attention of the model. The darker the colour, the more relevant the disease was in predicting the outcome of interest.