| Literature DB >> 35629190 |
Alramzana Nujum Navaz 1, Hadeel T. El-Kassabi 2, Mohamed Adel Serhani 1, Abderrahim Oulhaj 3,4, Khaled Khalil 5.
Abstract
Precision medicine involves comparing a new patient with existing patients who share similar characteristics, a task referred to as patient similarity. Several deep learning models have been used to build and apply patient similarity networks (PSNs). However, the challenges related to data heterogeneity and dimensionality make it difficult for a single model to both reduce data dimensionality and capture the features of diverse data types. In this paper, we propose a multi-model PSN that considers heterogeneous static and dynamic data. The combination of deep learning models and a PSN provides ample clinical evidence and extracted information against which similar patients can be compared. We use bidirectional encoder representations from transformers (BERT) to analyze the contextual data and generate word embeddings, from which semantic features are captured using a convolutional neural network (CNN). Dynamic data are analyzed using a long short-term memory (LSTM)-based autoencoder, which reduces data dimensionality while preserving the temporal features of the data. We propose a data fusion approach combining temporal and clinical narrative data to estimate patient similarity. Our experiments showed that our model provides higher classification accuracy in determining various patient health outcomes than other traditional classification algorithms.
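The fusion and similarity steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each patient is represented by concatenating a static text embedding (standing in for the BERT/CNN features) with a temporal embedding (standing in for the LSTM-autoencoder output), and uses cosine similarity as the pairwise PSN edge weight; the embedding dimensions and random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two learned representations the abstract
# describes: a BERT/CNN embedding of clinical narratives (static data) and
# an LSTM-autoencoder embedding of time series (dynamic data), one row per
# patient. Dimensions here are illustrative only.
static_emb = rng.normal(size=(4, 8))
dynamic_emb = rng.normal(size=(4, 6))

# Feature-level fusion: concatenate the two representations per patient.
fused = np.concatenate([static_emb, dynamic_emb], axis=1)

# Pairwise cosine similarity: entry (i, j) is the PSN edge weight between
# patients i and j.
unit = fused / np.linalg.norm(fused, axis=1, keepdims=True)
similarity = unit @ unit.T
```

The resulting matrix is symmetric with a unit diagonal, so it can be read directly as a weighted adjacency matrix for the similarity network.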
Keywords: BERT; autoencoder; big data; deep learning; electronic health records; patient; patient similarity network; patient-centered framework; personalized healthcare; precision medicine; transformers
Year: 2022 PMID: 35629190 PMCID: PMC9144142 DOI: 10.3390/jpm12050768
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Supervised patient similarity matching framework.
Methods used for building patient similarity.
| Method | Parameters/Factors | Applications |
|---|---|---|
| Deep learning | ICD-9 | Unsupervised/supervised patient similarity (CNN) |
| Triplet-loss metric learning | Longitudinal EHRs | Personalized prediction |
| Temporal similarity | Temporal sequences | Clinical (workflow) case similarity |
| Clustering | Variety of components of patient data | Patient similarity analytics loop |
| Similarity measure construction | ICD codes, empirical co-occurrence frequency | Predicting individual discharge diagnoses |
| Deep patient representation (three-layer stacked denoising autoencoders) | ICD-9 | Future disease prediction |
| Similarity network fusion (SNF) | Nodes represent patients; edges represent patients' pairwise similarities | Network-based survival risk prediction |
| Locally supervised metric learning (LSML) | Longitudinal patient data | Personalized predictive models and generation of personalized risk-factor profiles |
| Collaborative filtering methodology | ICD data | Creates a personalized disease risk profile and a disease management plan for the patient |
| Anonymous indexing of health conditions for a similarity measure | Text similarity | Recommends two other patients for each patient based on a keyword |
| SimSVM | 14 similarity measures from relevant clinical and imaging data | Predicting the survival of patients suffering from hepatocellular carcinoma (HCC) |
| Concept hierarchy | Hierarchical distance measure | Detecting correlations in medical records by comparing term hierarchies, considering the distance between non-similar records in a hierarchy |
Figure 2. System architecture.
Figure 3. Visualization dashboard: a physician's perspective.
Figure 4. Key processes in building a PSN.
Summary of the datasets used in our experiments.
| | Dataset-1 | Dataset-2 |
|---|---|---|
| Disease | COVID-19 | CVD |
| Data type | Static | Static and Dynamic |
| Size | Small (200) | Big (20,000) |
Figure 5. Accuracy with various distance measures (one-hot encoding and BERT).
Evaluation of the PSN distance measures with one-hot encoding and BERT.
| Distance Measure | One-Hot Encoding | | | | BERT | | | |
|---|---|---|---|---|---|---|---|---|
| | Accuracy | Accuracy Std. Dev. | Precision | F1-Score | Accuracy | Accuracy Std. Dev. | Precision | F1-Score |
| | 71.86 | 4.78 | 72.10 | 83.35 | 72.37 | 4.77 | 99.73 | 83.73 |
| | 70.78 | 5.63 | 71.01 | 82.62 | 72.28 | 5.52 | 99.89 | 83.70 |
| | 71.00 | 5.24 | 71.24 | 82.68 | 84.60 | 5.51 | 97.64 | 89.97 |
| | 69.58 | 5.70 | 71.90 | 80.98 | 72.12 | 5.61 | 99.66 | 83.59 |
| | 71.79 | 5.40 | 72.33 | 82.82 | 71.83 | 4.99 | 96.93 | 83.04 |
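An evaluation like the one above, which scores several distance measures by the labels their nearest neighbors assign, can be sketched as below. The specific distance measures in the table are not named in this excerpt; Euclidean, Manhattan, and cosine distance are shown here only as common illustrative choices, and the feature vectors and labels are hypothetical.

```python
import numpy as np

# Three candidate distance measures over patient feature vectors.
def euclidean(a, b):
    return np.linalg.norm(a - b)

def manhattan(a, b):
    return np.abs(a - b).sum()

def cosine_dist(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def nn_predict(train_X, train_y, x, dist):
    """Predict the label of the most similar (nearest) training patient."""
    i = min(range(len(train_X)), key=lambda j: dist(train_X[j], x))
    return train_y[i]

# Toy cohort: two labeled patients and one new patient to classify.
train_X = np.array([[0.1, 0.0], [1.0, 1.0]])
train_y = ["low-risk", "high-risk"]
new_patient = np.array([0.9, 1.1])

preds = [nn_predict(train_X, train_y, new_patient, d)
         for d in (euclidean, manhattan, cosine_dist)]
```

Repeating this prediction over a held-out set and comparing predictions to true outcomes yields the per-measure accuracy, precision, and F1 scores reported in the table.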
Weighted scoring table.
| | Age | Sex | Symptoms | Addnl_Info | Chronic_Disease_Binary | Chronic_Disease | Score | Rank |
|---|---|---|---|---|---|---|---|---|
| Weight | 0.1 | 0.15 | 0.2 | 0.15 | 0.2 | 0.2 | | |
| Option1 | 1 | 1 | 3 | 3 | 3 | 1 | | |
| Option2 | 1 | 1 | 3 | 2 | 3 | 3 | | |
| Option3 | 1 | 1 | 4 | 3 | 2 | 2 | | |
| Option4 | 1 | 1 | 3 | 3 | 3 | 3 | | |
| Option5 | 1 | 1 | 3 | 1 | 1 | 1 | | |
| Option6 | 2 | 1 | 1 | 2 | 1 | 1 | | |
| Option7 | 1 | 1 | 1 | 1 | 1 | 1 | | |
| Option8 | 1 | 2 | 2 | 1 | 1 | 1 | | |
| Option9 | 1 | 1 | 1 | 2 | 1 | 2 | | |
| Option10 | 1 | 1 | 1 | 2 | 2 | 1 | | |
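A weighted scoring scheme like the one above can be computed as sketched below. This assumes, hypothetically, that each option's score is the weighted sum of its per-feature ratings with the weights from the table, and that rank orders options from highest to lowest score; two options from the table are used as examples.

```python
# Feature weights from the weighted scoring table.
weights = {"Age": 0.1, "Sex": 0.15, "Symptoms": 0.2, "Addnl_Info": 0.15,
           "Chronic_Disease_Binary": 0.2, "Chronic_Disease": 0.2}

# Per-feature ratings for two options, taken from the table rows.
options = {
    "Option1": {"Age": 1, "Sex": 1, "Symptoms": 3, "Addnl_Info": 3,
                "Chronic_Disease_Binary": 3, "Chronic_Disease": 1},
    "Option4": {"Age": 1, "Sex": 1, "Symptoms": 3, "Addnl_Info": 3,
                "Chronic_Disease_Binary": 3, "Chronic_Disease": 3},
}

def weighted_score(ratings):
    """Weighted sum of feature ratings (assumed scoring rule)."""
    return sum(weights[f] * r for f, r in ratings.items())

scores = {name: weighted_score(r) for name, r in options.items()}
# Rank options from the highest score to the lowest.
ranking = sorted(scores, key=scores.get, reverse=True)
```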
Figure 6. Weighted accuracy based on weighted features.
Figure 7. Accuracy with varying training data involving similar patients.
Figure 8. Static data: accuracy in the case of similar patients.
Figure 9. Dataset 2: dynamic data distribution.
Figure 10. The architecture of the data reduction autoencoder.
Figure 11. Reconstruction loss associated with the autoencoder.
Figure 12. Accuracy of the fusion PSN.
Benchmark PSN model compared to other classification algorithms.
| Dataset | Accuracy | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | PSN | Naïve Bayes | SVM | ZeroR | CNN | Logistic | Random Tree | Decision Tree |
| CVD | 96% | 80.67% | 87.20% | 87.03% | 91.2% | 87.10% | 87.32% | 87.03% |
| COVID-19 (Dataset 1) | 89% | 84.80% | 88.45% | 83.20% | 85.84% | 83.20% | 88.80% | 86.40% |