Explanation and prediction of clinical data with imbalanced class distribution based on pattern discovery and disentanglement.

Pei-Yuan Zhou, Andrew K C Wong.

Abstract

BACKGROUND: Statistical data analysis, especially advanced machine learning (ML) methods, has attracted considerable interest in clinical practice. We seek interpretable diagnostic/prognostic results that will give doctors, patients and their relatives confidence in therapeutics and clinical practice. When datasets are imbalanced across diagnostic categories, ordinary ML methods may produce results dominated by the majority classes, which diminishes prediction accuracy. Hence, methods are needed that produce explicit, transparent and interpretable results for decision-making without sacrificing accuracy, even for data with imbalanced groups.
METHODS: To interpret clinical patterns and predict patient diagnoses with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which can discover patterns (correlated traits/indicants) and use them to classify clinical data even when the class distribution is imbalanced. In the most general setting, a relational dataset is a large table in which each column represents an attribute (trait/indicant) and each row contains the set of attribute values (AVs) of an entity (patient). Compared with existing pattern discovery approaches, cPDD can discover a small, succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of patients, even when groups are small and rare.
RESULTS: Experiments on synthetic and thoracic clinical datasets showed that cPDD can 1) discover a smaller set of succinct significant patterns than other existing pattern discovery methods; 2) allow users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare/small; and 3) achieve better prediction performance than other interpretable classification approaches.
CONCLUSIONS: cPDD discovers fewer patterns with more comprehensive coverage, improving the interpretability of the patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data and displays them precisely and succinctly, with statistical support for interpretation and prediction, a capability that traditional ML methods lack. The success of cPDD as a novel interpretable method for the imbalanced class problem shows its great potential for clinical data analysis in years to come.
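
To make the relational-dataset setting concrete, the following is a minimal sketch, not the cPDD algorithm itself. It assumes a toy table (the attribute names `smoker`, `cough`, `risk` and all values are hypothetical) in which the "high" class is rare, and it scores pairs of attribute values (AVs) with the adjusted standardized residual, a standard contingency-table statistic chosen here for illustration; the abstract does not state which statistic cPDD uses. It also shows why plain accuracy can mislead on imbalanced data: a predictor that always outputs the majority class scores well while never identifying the rare group.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy relational dataset: one row per patient, one key per attribute.
# The "high" risk class is deliberately rare (4 of 20 rows).
ROWS = (
    [{"smoker": "yes", "cough": "yes", "risk": "high"}] * 4
    + [{"smoker": "yes", "cough": "no", "risk": "low"}] * 2
    + [{"smoker": "no", "cough": "yes", "risk": "low"}] * 2
    + [{"smoker": "no", "cough": "no", "risk": "low"}] * 12
)


def count(rows, *avs):
    """Number of rows containing every given (attribute, value) pair."""
    return sum(all(row[attr] == val for attr, val in avs) for row in rows)


def adjusted_residual(rows, av1, av2):
    """Adjusted standardized residual of the co-occurrence of two AVs.

    Values above about 1.96 suggest the pair co-occurs more often than
    expected under independence (roughly the 5% two-sided level).
    """
    n = len(rows)
    n1, n2 = count(rows, av1), count(rows, av2)
    observed = count(rows, av1, av2)
    expected = n1 * n2 / n
    variance = expected * (1 - n1 / n) * (1 - n2 / n)
    return (observed - expected) / sqrt(variance) if variance > 0 else 0.0


def significant_pairs(rows, threshold=1.96):
    """All AV pairs from different attributes whose residual exceeds threshold."""
    avs = sorted({(attr, val) for row in rows for attr, val in row.items()})
    found = []
    for av1, av2 in combinations(avs, 2):
        if av1[0] == av2[0]:
            continue  # two values of the same attribute never co-occur
        z = adjusted_residual(rows, av1, av2)
        if z > threshold:
            found.append((av1, av2, z))
    return found


def majority_baseline(rows, target):
    """Accuracy and rare-class recall of always predicting the majority class."""
    labels = [row[target] for row in rows]
    majority = max(set(labels), key=labels.count)
    accuracy = labels.count(majority) / len(labels)
    minority = [lab for lab in labels if lab != majority]
    recall = sum(lab == majority for lab in minority) / len(minority) if minority else 1.0
    return accuracy, recall


if __name__ == "__main__":
    for av1, av2, z in significant_pairs(ROWS):
        print(av1, av2, round(z, 2))
    print(majority_baseline(ROWS, "risk"))
```

In this toy table the majority-class baseline looks accurate yet recovers none of the rare "high" cases, whereas the AV pair linking `smoker=yes` to `risk=high` still stands out statistically despite the class being small. Only pairwise associations are scored here; the high-order patterns and the disentanglement step described in the abstract are not reproduced.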

Keywords:  Clinical decision-making; Disentanglement; Imbalance classification; Pattern discovery

Year:  2021        PMID: 33422088      PMCID: PMC7796578          DOI: 10.1186/s12911-020-01356-y

Source DB:  PubMed          Journal:  BMC Med Inform Decis Mak        ISSN: 1472-6947            Impact factor:   2.796


  3 in total

1.  Predicting Amyloid Positivity in Cognitively Unimpaired Older Adults: A Machine Learning Approach Using A4 Data.

Authors:  Kellen K Petersen; Richard B Lipton; Ellen Grober; Christos Davatzikos; Reisa A Sperling; Ali Ezzati
Journal:  Neurology       Date:  2022-04-25       Impact factor: 11.800

2.  Cluster-Based Ensemble Learning Model for Aortic Dissection Screening.

Authors:  Yan Gao; Min Wang; Guogang Zhang; Lingjun Zhou; Jingming Luo; Lijue Liu
Journal:  Int J Environ Res Public Health       Date:  2022-05-06       Impact factor: 4.614

3.  Pattern discovery and disentanglement on relational datasets.

Authors:  Andrew K C Wong; Pei-Yuan Zhou; Zahid A Butt
Journal:  Sci Rep       Date:  2021-03-11       Impact factor: 4.379
