Sang Ho Oh, Seunghwa Back, Jongyoul Park.
Abstract
Patient similarity research is one of the most fundamental tasks in healthcare, helping to make decisions without incurring additional time and cost in clinical practice. Patient similarity also applies to various medical fields, such as cohort analysis and personalized treatment recommendation, and is therefore being actively studied. However, medical data are complex, irregular, and sequential, which makes accurate similarity measurement challenging. Existing studies use supervised learning to calculate the similarity between patients, and each considers only one specific disease. Considering a single disease is unrealistic, because other conditions usually accompany it; a method that measures similarity across multiple diseases is needed. This research proposes a convolutional neural network-based model that jointly combines feature learning and similarity learning to define similarity among patients with multiple diseases. We used cohort data from the National Health Insurance Sharing Service of Korea for the experiments. Experimental results show that the proposed model outperforms existing models in measuring multiple-disease patient similarity.
Keywords: convolution neural network; electronic health records; feature learning; joint learning; multiple diseases; patient similarity measurement
Year: 2021 PMID: 35009673 PMCID: PMC8749530 DOI: 10.3390/s22010131
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. An illustration of EHRs.
Group division for continuous measurement.
| | Age | BP | BMI |
|---|---|---|---|
| Group 1 | 0–30 | Prehypertension | Underweight |
| Group 2 | 31–60 | Stage 1 | Normal |
| Group 3 | Over 61 | Stage 2 | Overweight |
For BP, the prehypertension range is a systolic blood pressure (SBP) of 120–139 mmHg and a diastolic blood pressure (DBP) of 80–89 mmHg. The stage 1 range is an SBP of 140–159 mmHg and a DBP of 90–99 mmHg. The stage 2 range is an SBP over 160 mmHg and a DBP over 100 mmHg. For BMI, the underweight group is below 18.5 kg/m², the normal group is between 18.5 kg/m² and 22.9 kg/m², and the overweight group is more than 23 kg/m².
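The grouping rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, and several boundary choices are assumptions that the source does not state (age 61 is placed in the third group, a reading whose SBP and DBP fall in different stages takes the higher stage, a BMI between 22.9 and 23 counts as normal, and readings below 120/80 mmHg get no group because the table defines none).

```python
def age_group(age):
    """Map an age in years to one of the three age groups."""
    if age <= 30:
        return "0-30"
    if age <= 60:
        return "31-60"
    return "Over 61"  # assumption: age 61 itself also falls here


def bp_group(sbp, dbp):
    """Map SBP/DBP in mmHg to a blood pressure group.

    Assumption: if SBP and DBP point to different stages, the higher
    stage wins. Readings below 120/80 mmHg return None because the
    source table defines no group for them.
    """
    if sbp >= 160 or dbp >= 100:
        return "Stage 2"
    if sbp >= 140 or dbp >= 90:
        return "Stage 1"
    if sbp >= 120 or dbp >= 80:
        return "Prehypertension"
    return None


def bmi_group(bmi):
    """Map BMI in kg/m^2 to a weight group."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 23:  # assumption: values between 22.9 and 23 count as normal
        return "Normal"
    return "Overweight"


if __name__ == "__main__":
    print(age_group(45), bp_group(150, 85), bmi_group(24.0))
```

Each continuous measurement is thus reduced to a single category, which can then be one-hot encoded alongside gender and disease indicators.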
Example of similarity labeling.
| Features | | Patient A | Patient B | Match | No. of Matches |
|---|---|---|---|---|---|
| Age | 0–30 | 0 | 0 | 0 | 4 |
| | 31–60 | 1 | 1 | 1 | |
| | Over 61 | 0 | 0 | 0 | |
| Gender | Male | 1 | 1 | 1 | |
| | Female | 0 | 0 | 0 | |
| BP | Prehypertension | 0 | 0 | 0 | |
| | Stage 1 | 1 | 0 | 0 | |
| | Stage 2 | 0 | 1 | 0 | |
| BMI | Underweight | 0 | 0 | 0 | |
| | Normal | 0 | 0 | 0 | |
| | Overweight | 1 | 1 | 1 | |
| Disease | Diabetes | 1 | 0 | 0 | |
| | Cerebrovascular disease | 1 | 1 | 1 | |
| | Ischemic heart disease | 0 | 0 | 0 | |
| Output | Similarity Label | 1 | | | |
If a patient's record belongs to a given feature category, we write 1; otherwise, 0. In the Match column, we write 1 if the features of patients A and B match, and 0 otherwise; in the example above, a match is recorded only where both patients have a 1. The similarity label is 1 if the number of matches is greater than or equal to 4; otherwise, it is 0.
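As a minimal sketch of this labeling rule, the example pair from the table can be encoded as two binary vectors and compared position by position. The feature names and the `similarity_label` helper are hypothetical and chosen only for illustration; a match is counted where both patients have a 1, which is how the example table behaves.

```python
# Hypothetical feature order, following the rows of the example table.
FEATURES = [
    "age_0_30", "age_31_60", "age_over_61",
    "male", "female",
    "bp_prehypertension", "bp_stage_1", "bp_stage_2",
    "bmi_underweight", "bmi_normal", "bmi_overweight",
    "diabetes", "cerebrovascular", "ischemic_heart",
]

# One-hot records of patients A and B, copied from the example table.
PATIENT_A = [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0]
PATIENT_B = [0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0]


def similarity_label(a, b, threshold=4):
    """Return (number of matches, similarity label) for two binary records.

    A position matches when both patients have a 1 there. The label is 1
    when the number of matches is at least `threshold`, otherwise 0.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    matches = sum(x & y for x, y in zip(a, b))
    return matches, int(matches >= threshold)


if __name__ == "__main__":
    n_matches, label = similarity_label(PATIENT_A, PATIENT_B)
    matched = [f for f, x, y in zip(FEATURES, PATIENT_A, PATIENT_B) if x & y]
    print(n_matches, label, matched)
```

For the example pair, the matching features are the 31–60 age group, male gender, overweight BMI, and cerebrovascular disease, which gives four matches and a similarity label of 1.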
Figure 2. CNN block.
Figure 3. An illustration of feature learning.
Figure 4. An illustration of the proposed similarity learning.
Dataset descriptions.
| Descriptions | Male | Female |
|---|---|---|
| Sex (%) | 56 | 44 |
| Age, mean (SD) | 57.8 (23.4) | 60.5 (21.5) |
| SBP (mmHg), mean (SD) | 126.4 (27.1) | 131.8 (24.8) |
| DBP (mmHg), mean (SD) | 78.8 (12.6) | 81.8 (14.2) |
| BMI (kg/m²), mean (SD) | 22.3 (5.8) | 23.1 (4.2) |
| No. of visits | 1,052,085 | 648,066 |
Figure 5. Population distribution of diabetes, cerebrovascular disease, and ischemic heart disease.
Table 4. Similarity-based learning results.
| | Diabetes and | Diabetes and | Cerebrovascular Disease and | Diabetes and | Average |
|---|---|---|---|---|---|
| Mean | 0.7859 | 0.8027 | 0.8429 | 0.8474 | 0.8197 |
| Max | 0.8757 | 0.8607 | 0.8952 | 0.9274 | 0.8897 |
| Min | 0.5571 | 0.7470 | 0.8035 | 0.8137 | 0.7304 |
The metric presented in Table 4 is the similarity score calculated by Equation (4). The score ranges from 0 to 1; the closer it is to 1, the higher the similarity. The mean, max, and min are computed over the scores of the patient pairs.
Table 5. Joint learning results.
| | Diabetes and | Diabetes and | Cerebrovascular Disease and | Diabetes and | Average |
|---|---|---|---|---|---|
| Mean | 0.8446 | 0.9148 | 0.9227 | 0.9333 | 0.9039 |
| Max | 0.9589 | 0.9789 | 0.9849 | 0.9782 | 0.9752 |
| Min | 0.6439 | 0.8125 | 0.8526 | 0.8645 | 0.7934 |
The metric presented in Table 5 is the similarity score calculated by Equation (5). The score ranges from 0 to 1; the closer it is to 1, the higher the similarity. The mean, max, and min are computed over the scores of the patient pairs.
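The Average column in the similarity-based and joint learning tables appears to be the plain arithmetic mean of the four disease-combination columns. The short check below recomputes it from the reported values; the dictionary layout and helper names are hypothetical, and the tolerance of 1e-4 is an assumption that allows for rounding to four decimals.

```python
# Reported values: four disease-combination columns, then the Average column.
SIMILARITY_LEARNING = {
    "Mean": ([0.7859, 0.8027, 0.8429, 0.8474], 0.8197),
    "Max": ([0.8757, 0.8607, 0.8952, 0.9274], 0.8897),
    "Min": ([0.5571, 0.7470, 0.8035, 0.8137], 0.7304),
}
JOINT_LEARNING = {
    "Mean": ([0.8446, 0.9148, 0.9227, 0.9333], 0.9039),
    "Max": ([0.9589, 0.9789, 0.9849, 0.9782], 0.9752),
    "Min": ([0.6439, 0.8125, 0.8526, 0.8645], 0.7934),
}


def column_average(values):
    """Arithmetic mean of the per-combination scores."""
    return sum(values) / len(values)


def averages_consistent(table, tol=1e-4):
    """True if every reported Average equals the recomputed mean within tol."""
    return all(
        abs(column_average(values) - reported) < tol
        for values, reported in table.values()
    )


if __name__ == "__main__":
    for name, table in (("similarity", SIMILARITY_LEARNING),
                        ("joint", JOINT_LEARNING)):
        for row, (values, reported) in table.items():
            print(name, row, round(column_average(values), 5), reported)
```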
Figure 6. Performance of feature learning, similarity learning, and joint learning.
Performance comparison between existing algorithms.
| Algorithms | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Euclidean | 0.5423 | 0.5218 | 0.6285 | 0.5457 |
| Cosine | 0.5751 | 0.5684 | 0.6071 | 0.5895 |
| Mahalanobis | 0.6573 | 0.6817 | 0.7825 | 0.7782 |
| LSML | 0.8015 | 0.8148 | 0.8533 | 0.8759 |
| Joint Learning | 0.8572 | 0.8511 | 0.8925 | 0.9227 |
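For reference, the four metrics in this comparison are conventionally derived from the confusion counts of predicted versus true similarity labels. The sketch below uses the standard binary definitions; the source does not state how the authors computed or averaged their metrics, so the `binary_metrics` helper and the toy labels are assumptions for illustration only.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = similar)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy similarity labels for eight hypothetical patient pairs.
    true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(true_labels, predicted))
```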