Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records.

Yanshan Wang, Yiqing Zhao, Terry M Therneau, Elizabeth J Atkinson, Ahmad P Tafti, Nan Zhang, Shreyasee Amin, Andrew H Limper, Sundeep Khosla, Hongfang Liu.

Abstract

Machine learning has become ubiquitous and is a key technology for mining electronic health records (EHRs) to facilitate clinical research and practice. Unsupervised machine learning, as opposed to supervised learning, has shown promise in identifying novel patterns and relations from EHRs without using human-created labels. In this paper, we investigate the application of unsupervised machine learning models to discovering latent disease clusters and patient subgroups based on EHRs. We utilized Latent Dirichlet Allocation (LDA), a generative probabilistic model, and proposed a novel model named the Poisson Dirichlet Model (PDM), which extends the LDA approach by using a Poisson distribution to model patients' disease diagnoses and alleviates the effects of age and sex by considering both observed and expected observations. In the empirical experiments, we evaluated LDA and PDM on three patient cohorts, namely the Osteoporosis, Delirium/Dementia, and Chronic Obstructive Pulmonary Disease (COPD)/Bronchiectasis cohorts, with their EHR data retrieved from the Rochester Epidemiology Project (REP) medical records linkage system, for the discovery of latent disease clusters and patient subgroups. We compared the effectiveness of LDA and PDM in identifying disease clusters through the visualization of disease representations. We tested the performance of LDA and PDM in differentiating patient subgroups through survival analysis, as well as statistical analysis of demographics and Elixhauser Comorbidity Index (ECI) scores in those subgroups. The experimental results show that the proposed PDM could effectively identify distinct disease clusters based on the latent patterns hidden in the EHR data by alleviating the impact of age and sex, and that LDA could stratify patients into more differentiable subgroups than PDM in terms of p-values. However, the subgroups identified by LDA are highly associated with patients' age and sex. The subgroups discovered by PDM might reveal underlying disease patterns of greater interest in epidemiology research because the effects of age and sex are alleviated. Both unsupervised machine learning approaches could be leveraged to discover patient subgroups using EHRs, but with different foci.
Copyright © 2019 Elsevier Inc. All rights reserved.
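
The abstract's idea of comparing observed with expected diagnoses to reduce the influence of age and sex can be illustrated with a small sketch. This is not the authors' PDM. It is plain indirect standardization, offered only as an assumption about what an observed-versus-expected comparison looks like in its simplest form. The records, age bands, and function names below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (patient_id, age_band, sex, has_diagnosis).
records = [
    ("p1", "60-69", "F", 1),
    ("p2", "60-69", "F", 0),
    ("p3", "60-69", "M", 0),
    ("p4", "60-69", "M", 0),
    ("p5", "80-89", "F", 1),
    ("p6", "80-89", "F", 1),
    ("p7", "80-89", "M", 1),
    ("p8", "80-89", "M", 0),
]


def stratum_rates(rows):
    """Return the diagnosis rate within each (age band, sex) stratum."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for _, age, sex, dx in rows:
        totals[(age, sex)] += 1
        cases[(age, sex)] += dx
    return {key: cases[key] / totals[key] for key in totals}


def observed_expected(rows, subgroup_ids, rates):
    """Return observed and age/sex-expected diagnosis counts for a subgroup."""
    observed = 0
    expected = 0.0
    for pid, age, sex, dx in rows:
        if pid in subgroup_ids:
            observed += dx
            expected += rates[(age, sex)]
    return observed, expected


rates = stratum_rates(records)
subgroup = {"p5", "p6", "p7", "p8"}  # hypothetical subgroup of older patients

obs, exp = observed_expected(records, subgroup, rates)

# Crude comparison that ignores age and sex, for contrast.
crude_rate = sum(row[3] for row in records) / len(records)
crude_expected = crude_rate * len(subgroup)

print(f"observed={obs}, crude expected={crude_expected}, adjusted expected={exp}")
```

In this toy data the older subgroup looks over-represented against the crude expectation, but not once the expectation accounts for age band and sex. That is the kind of demographic effect the abstract says PDM is designed to alleviate.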

Keywords: Aging; Artificial intelligence; Electronic health records; Epidemiology; Unsupervised machine learning

Year:  2019        PMID: 31891765      PMCID: PMC7028517          DOI: 10.1016/j.jbi.2019.103364

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


