Increasing the Density of Laboratory Measures for Machine Learning Applications.

Vida Abedi, Jiang Li, Manu K Shivakumar, Venkatesh Avula, Durgesh P Chaudhary, Matthew J Shellenberger, Harshit S Khara, Yanfei Zhang, Ming Ta Michael Lee, Donna M Wolk, Mohammed Yeasin, Raquel Hontecillas, Josep Bassaganya-Riera, Ramin Zand.

Abstract

BACKGROUND: Imputing missing values is a key step in Electronic Health Records (EHR) mining, as it can significantly affect the conclusions derived from downstream analysis in translational medicine. Laboratory values in EHR are not missing at random, yet imputation techniques tend to disregard this key distinction. Consequently, the development of an adaptive imputation strategy designed specifically for EHR is an important step in reducing data imbalance and enhancing the predictive power of modeling tools for healthcare applications.
METHOD: We analyzed the laboratory measures derived from Geisinger's EHR on patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted laboratory measures identified by Logical Observation Identifiers Names and Codes (LOINC) and excluded those with 75% or more missingness. The comorbidities, primary or secondary diagnoses, and active problem lists were also extracted. The adaptive imputation strategy was designed based on a hybrid approach. The comorbidity patterns of patients were transformed into latent patterns and then clustered. Imputation was performed on a cluster of patients, for each cohort independently, to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns.
RESULTS: We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC codes and 11,230 diagnosis codes for the IBD cohort, 8,160 diagnosis codes for the Cdiff cohort, and 2,042 diagnosis codes for the OA cohort based on the primary/secondary diagnosis and active problem list in the EHR. Overall, the greatest improvement from this strategy was observed when the laboratory measures had a higher level of missingness. The best root mean square error (RMSE) difference for each dataset was -35.5 for the Cdiff, -8.3 for the IBD, and -11.3 for the OA dataset.
CONCLUSIONS: An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can be used to improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
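
The comparison described in the METHOD section (drop sparse laboratory codes, impute within comorbidity-based clusters, and compare against imputation on the complete dataset by RMSE) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not name the imputation algorithm, so group-wise mean imputation is used here as a stand-in, and the cluster labels are assumed to have already been produced by the latent-pattern clustering step. All function names and the toy values are hypothetical. The RMSE difference is defined here as cluster-wise minus global, so a negative value means lower error; the abstract does not state its sign convention.

```python
import math

MISSING = None  # marker for an unobserved laboratory value


def drop_sparse_labs(rows, threshold=0.75):
    """Return indices of lab columns whose missing fraction is below threshold."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        missing = sum(1 for r in rows if r[j] is MISSING)
        if missing / n < threshold:  # exclude columns with >= 75% missingness
            keep.append(j)
    return keep


def mean_impute(rows, groups):
    """Fill each missing value with the column mean of the patient's group.

    `groups` holds one label per row; pass one shared label for imputation
    on the complete dataset. Falls back to the overall column mean when a
    group has no observed value. Assumes every column has at least one
    observed value (true after drop_sparse_labs).
    """
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        overall = sum(observed) / len(observed)
        by_group = {}
        for r, g in zip(rows, groups):
            if r[j] is not MISSING:
                by_group.setdefault(g, []).append(r[j])
        for i, g in enumerate(groups):
            if filled[i][j] is MISSING:
                vals = by_group.get(g)
                filled[i][j] = sum(vals) / len(vals) if vals else overall
    return filled


def rmse(filled, truth, masked):
    """RMSE over the (row, column) cells hidden for evaluation."""
    sq = [(filled[i][j] - truth[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(sq) / len(sq))


# Toy data: 6 patients, 3 lab codes; the third code is mostly missing.
truth = [
    [10.0, 1.0, None], [11.0, 1.2, None], [12.0, 1.1, None],
    [30.0, 3.0, None], [31.0, 3.2, None], [32.0, 3.1, 5.0],
]
masked = [(0, 0), (4, 0)]  # known values hidden to score the imputation
observed = [list(r) for r in truth]
for i, j in masked:
    observed[i][j] = MISSING

# Hypothetical cluster labels, standing in for the comorbidity clustering.
labels = ["A", "A", "A", "B", "B", "B"]

keep = drop_sparse_labs(observed)
obs_kept = [[r[j] for j in keep] for r in observed]
truth_kept = [[r[j] for j in keep] for r in truth]

global_filled = mean_impute(obs_kept, ["all"] * len(obs_kept))
cluster_filled = mean_impute(obs_kept, labels)

rmse_global = rmse(global_filled, truth_kept, masked)
rmse_cluster = rmse(cluster_filled, truth_kept, masked)
rmse_diff = rmse_cluster - rmse_global  # negative: cluster-wise is better
print(f"global={rmse_global:.3f} cluster={rmse_cluster:.3f} diff={rmse_diff:.3f}")
```

In this toy setup the two clusters have clearly different laboratory ranges, so imputing within a cluster gives a lower RMSE than imputing over all patients. Real EHR data would need a proper clustering step and a stronger imputation model than the group mean.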

Keywords:  C. difficile infection; EHR; complex diseases; electronic health records; imputation; inflammatory bowel disease; laboratory measures; machine learning; medical informatics; osteoarthritis

Year:  2020        PMID: 33396741     DOI: 10.3390/jcm10010103

Source DB:  PubMed          Journal:  J Clin Med        ISSN: 2077-0383            Impact factor:   4.241


  5 in total

1.  Early Detection of Septic Shock Onset Using Interpretable Machine Learners.

Authors:  Debdipto Misra; Venkatesh Avula; Donna M Wolk; Hosam A Farag; Jiang Li; Yatin B Mehta; Ranjeet Sandhu; Bipin Karunakaran; Shravan Kethireddy; Ramin Zand; Vida Abedi
Journal:  J Clin Med       Date:  2021-01-15       Impact factor: 4.241

2.  Machine Learning-Enabled 30-Day Readmission Model for Stroke Patients.

Authors:  Negar Darabi; Niyousha Hosseinichimeh; Anthony Noto; Ramin Zand; Vida Abedi
Journal:  Front Neurol       Date:  2021-03-31       Impact factor: 4.003

3.  Artificial Intelligence: A Shifting Paradigm in Cardio-Cerebrovascular Medicine. (Review)

Authors:  Vida Abedi; Seyed-Mostafa Razavi; Ayesha Khan; Venkatesh Avula; Aparna Tompe; Asma Poursoroush; Alireza Vafaei Sadr; Jiang Li; Ramin Zand
Journal:  J Clin Med       Date:  2021-12-06       Impact factor: 4.241

4.  Predicting short and long-term mortality after acute ischemic stroke using EHR.

Authors:  Vida Abedi; Venkatesh Avula; Seyed-Mostafa Razavi; Shreya Bavishi; Durgesh Chaudhary; Shima Shahjouei; Ming Wang; Christoph J Griessenauer; Jiang Li; Ramin Zand
Journal:  J Neurol Sci       Date:  2021-06-29       Impact factor: 4.553

5.  Prediction of Long-Term Stroke Recurrence Using Machine Learning Models.

Authors:  Vida Abedi; Venkatesh Avula; Durgesh Chaudhary; Shima Shahjouei; Ayesha Khan; Christoph J Griessenauer; Jiang Li; Ramin Zand
Journal:  J Clin Med       Date:  2021-03-20       Impact factor: 4.241

