A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record.

Yuri Ahuja, Jun Wen, Chuan Hong, Zongqi Xia, Sicong Huang, Tianxi Cai.

Abstract

While numerous methods exist to identify binary phenotypes (e.g. COPD) using electronic health record (EHR) data, few exist to ascertain the timing of phenotype events (e.g. COPD onset or exacerbations). Estimating event times could enable more powerful use of EHR data for longitudinal risk modeling, including survival analysis. Here we introduce the Semi-supervised Adaptive Markov Gaussian Embedding Process (SAMGEP), a semi-supervised machine learning algorithm that estimates phenotype event times from EHR data when only a limited number of labels are observed, since labels require resource-intensive chart review to obtain. SAMGEP models latent phenotype states as a binary Markov process, and it employs an adaptive weighting strategy to map timestamped EHR features to an embedding function that it models as a state-dependent Gaussian process. SAMGEP's feature weighting achieves meaningful feature selection, and its predictions significantly improve AUCs and F1 scores over existing approaches in diverse simulations and real-world settings. It is particularly adept at predicting cumulative risk and event counting process functions, and is robust to diverse generative model parameters. Moreover, it achieves high accuracy with few (50-100) labels, efficiently leveraging unlabeled EHR data to maximize information gain from costly-to-obtain event time labels. SAMGEP can be used to estimate accurate phenotype state functions for risk modeling research.
© 2022. The Author(s).
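
The modeling idea in the abstract (a latent binary Markov state observed through a state-dependent Gaussian embedding) can be illustrated with a toy example. The sketch below is not the SAMGEP algorithm: it assumes a scalar embedding, replaces the Gaussian process with independent Gaussian emissions per visit, treats the event state as absorbing, and uses made-up parameter values. The function name `forward_backward` and all numbers are hypothetical. It only shows how posterior state probabilities, and from them a cumulative risk curve, could be computed for one patient.

```python
import numpy as np

def forward_backward(x, trans, means, sds, init):
    """Posterior P(state_t = k | x_1..x_T) for a 2-state Markov chain
    with independent Gaussian emissions (a simplifying assumption)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # lik[t, k] = Normal density of x_t under state k
    lik = np.exp(-0.5 * ((x[:, None] - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))

    alpha = np.zeros((T, 2))
    beta = np.ones((T, 2))

    # Forward pass, rescaled at each step to avoid underflow
    alpha[0] = init * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also rescaled
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Hypothetical parameters: state 0 = no event yet, state 1 = event has occurred.
trans = np.array([[0.9, 0.1],    # event can occur between visits
                  [0.0, 1.0]])   # assumed absorbing: no return to state 0
means = np.array([-1.0, 1.0])    # assumed embedding mean per state
sds = np.array([0.5, 0.5])       # assumed embedding spread per state
init = np.array([0.95, 0.05])

# Made-up embedding values for one patient across five visits
x = [-1.0, -0.8, 0.1, 1.2, 0.9]
post = forward_backward(x, trans, means, sds, init)

# With an absorbing event state, P(state_t = 1 | x) acts as a cumulative risk curve
cumulative_risk = post[:, 1]
# First visit (0-based index) at which the event is more likely than not
onset_index = int(np.argmax(cumulative_risk > 0.5))
print(cumulative_risk, onset_index)
```

In this toy setting the transition matrix and emission parameters are fixed by hand; a semi-supervised method would instead estimate them from the labeled and unlabeled records.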

Year:  2022        PMID: 36273240     DOI: 10.1038/s41598-022-22585-3

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.996


