A machine learning-based framework to identify type 2 diabetes through electronic health records.

Tao Zheng, Wei Xie, Liling Xu, Xiaoying He, Ya Zhang, Mingrong You, Gong Yang, You Chen.

Abstract

OBJECTIVE: Discovering diverse genotype-phenotype associations for Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), e.g., from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from a low recall rate and can miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop, as a pilot study, a semi-automated machine learning framework that liberalizes the filtering criteria to improve recall while keeping the false positive rate low.
MATERIALS AND METHODS: We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. The framework was applied to 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from a diabetes-related cohort of 23,281 retrieved from a regional distributed EHR repository covering 2012 to 2014.
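To make one of the listed model families concrete, here is a minimal k-Nearest-Neighbors sketch in plain Python. It is only an illustration: the function name `knn_predict`, the two features, and all data values are hypothetical and are not taken from the study, whose engineered features are not described in this abstract.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical, made-up feature vectors (NOT from the paper):
# (scaled lab value, count of diabetes-related codes).
train_X = [(0.9, 5), (0.8, 4), (0.85, 6), (0.3, 0), (0.2, 1), (0.25, 0)]
train_y = ["case", "case", "case", "control", "control", "control"]

prediction = knn_predict(train_X, train_y, (0.7, 4), k=3)
print(prediction)
```

An odd `k` avoids tied votes when there are only two labels. In practice the features would need to be on comparable scales, since Euclidean distance is dominated by whichever feature has the largest range.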
RESULTS: We apply top-performing machine learning algorithms to the engineered features. We benchmark and contrast the accuracy, precision, AUC, sensitivity and specificity of the classification models against the state-of-the-art expert algorithm for identifying T2DM subjects. Our results indicate that the framework achieves high identification performance (average AUC of ∼0.98), much higher than that of the state-of-the-art algorithm (AUC of 0.71).
DISCUSSION: Expert algorithm-based identification of T2DM subjects from EHR is often hampered by high missing rates caused by conservative selection criteria. Our framework leverages machine learning and feature engineering to loosen such selection criteria and achieve a high identification rate of cases and controls.
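The five reported metrics can be sketched in a few lines of plain Python. This is a generic illustration under stated assumptions, not the authors' evaluation code: the function names and the toy labels and scores are made up, labels are encoded as 1 for case and 0 for control, and both classes (and at least one positive prediction) are assumed to be present so no denominator is zero.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),  # equals 1 - false positive rate
    }


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

results = classification_metrics(y_true, y_pred)
results["auc"] = auc(y_true, scores)
print(results)
```

Sensitivity is the recall rate that the objective aims to raise, and specificity is one minus the false positive rate it aims to keep low. AUC summarizes that trade-off across all score thresholds, so it does not depend on the 0.5 cutoff used here.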
CONCLUSIONS: Our proposed framework demonstrates a more accurate and efficient approach for identifying subjects with and without T2DM from EHR. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

Keywords:  Data mining; Electronic health records; Feature engineering; Machine learning; Type 2 diabetes

Year:  2016        PMID: 27919371      PMCID: PMC5144921          DOI: 10.1016/j.ijmedinf.2016.09.014

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


