Kishan Rama, Helena Canhão, Alexandra M Carvalho, Susana Vinga.
Abstract
BACKGROUND: Patient stratification is a critical task in clinical decision making since it can allow physicians to choose treatments in a personalized way. Given the increasing availability of electronic medical records (EMRs) with longitudinal data, one crucial problem is how to efficiently cluster patients based on the temporal information from medical appointments. In this work, we propose applying the Temporal Needleman-Wunsch (TNW) algorithm to align discrete sequences together with the transition time information between symbols. These symbols may correspond to a patient's current therapy, their overall health status, or any other discrete state. The transition time information represents the duration of each of those states. The obtained TNW pairwise scores are then used to perform hierarchical clustering. To find the best number of clusters and assess their stability, a resampling technique is applied.
Keywords: Bootstrap; Clustering; Temporal sequence alignment; clustering indices
Year: 2019; PMID: 31888660; PMCID: PMC6938005; DOI: 10.1186/s12911-019-1013-7
Source DB: PubMed; Journal: BMC Med Inform Decis Mak; ISSN: 1472-6947; Impact factor: 2.796
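The temporal alignment step described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact recurrence: the function name `tnw_score`, the match and mismatch values, and the way the temporal penalty is scaled by the relative difference in state durations are all assumptions made for illustration. Only the defaults for the gap penalty (g=0.7) and the temporal penalty (T=0.25) are taken from the figure captions. Each sequence is assumed to be a list of `(symbol, duration)` pairs.

```python
def tnw_score(seq_a, seq_b, gap=0.7, temporal=0.25, match=1.0, mismatch=-1.0):
    """Global alignment score for two sequences of (symbol, duration) pairs.

    Hypothetical scoring scheme (an assumption, not the published one):
    aligning two states earns `match` or `mismatch`, minus `temporal` times
    the relative difference of their durations; a gap costs `gap`.
    """
    n, m = len(seq_a), len(seq_b)
    # H[i][j] = best score aligning the first i states of A with the first j of B
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = -gap * i
    for j in range(1, m + 1):
        H[0][j] = -gap * j

    for i in range(1, n + 1):
        sym_a, dur_a = seq_a[i - 1]
        for j in range(1, m + 1):
            sym_b, dur_b = seq_b[j - 1]
            longest = max(dur_a, dur_b)
            # Relative duration difference in [0, 1]; 0 when both durations are 0
            rel_diff = abs(dur_a - dur_b) / longest if longest > 0 else 0.0
            base = match if sym_a == sym_b else mismatch
            diagonal = H[i - 1][j - 1] + base - temporal * rel_diff
            H[i][j] = max(diagonal, H[i - 1][j] - gap, H[i][j - 1] - gap)
    return H[n][m]


if __name__ == "__main__":
    # Two hypothetical therapy histories: drug code and duration of each state
    patient_1 = [("A", 10), ("B", 5)]
    patient_2 = [("A", 5), ("B", 5)]
    print(tnw_score(patient_1, patient_2))
```

Computing this score for every pair of patients would give the kind of pairwise similarity matrix that the clustering step takes as input.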
Fig. 1 The proposed AliClu approach. First, raw data is pre-processed to obtain PE sequences. Then, pairwise sequence alignment is performed to obtain a similarity matrix S. Next, S is converted into a distance matrix D, which is used for agglomerative clustering. The clustering results are validated with a bootstrapping approach. Finally, the retrieved clusters are analysed by clinicians
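The middle steps of the pipeline in Fig. 1 (similarity matrix S, distance matrix D, agglomerative clustering) can be sketched in plain Python. The conversion D = max(S) - S is an assumption, since the caption does not say how S is converted. The function names `similarity_to_distance` and `ward_clusters` are hypothetical. The merging rule is the standard Lance-Williams update for Ward's method applied to squared distances; on alignment-derived distances, which need not be Euclidean, it should be read as a heuristic.

```python
def similarity_to_distance(S):
    """Turn a symmetric similarity matrix into distances (assumed: max(S) - S)."""
    top = max(max(row) for row in S)
    n = len(S)
    return [[0.0 if i == j else top - S[i][j] for j in range(n)] for i in range(n)]


def ward_clusters(D, k):
    """Merge clusters with Ward's Lance-Williams update until k clusters remain."""
    n = len(D)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    # Squared distances between active clusters, keyed by (smaller id, larger id)
    d2 = {(i, j): D[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > max(k, 1):
        a, b = min(d2, key=d2.get)  # closest pair of clusters
        na, nb, dab = len(clusters[a]), len(clusters[b]), d2[(a, b)]
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            nc = len(clusters[c])
            dac = d2[(min(a, c), max(a, c))]
            dbc = d2[(min(b, c), max(b, c))]
            # Lance-Williams formula for Ward's criterion
            updated[c] = ((na + nc) * dac + (nb + nc) * dbc - nc * dab) / (na + nb + nc)
        # Drop every distance that involves the two merged clusters
        d2 = {pair: v for pair, v in d2.items() if a not in pair and b not in pair}
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        for c, value in updated.items():
            d2[(c, next_id)] = value  # new ids are always the largest
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Toy similarity matrix with two obvious groups: {0, 1} and {2, 3}
    S = [
        [2.0, 1.8, 0.1, 0.0],
        [1.8, 2.0, 0.2, 0.1],
        [0.1, 0.2, 2.0, 1.9],
        [0.0, 0.1, 1.9, 2.0],
    ]
    print(ward_clusters(similarity_to_distance(S), k=2))
```

For real cohorts a library routine such as SciPy's hierarchical clustering would be the usual choice; the loop above only makes the merging rule explicit.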
Fig. 2 Percentage of biologic drugs taken by Rheumatoid Arthritis (RA) patients. Almost 60% of the patients had only one biologic drug. Patients who have taken five or more biologic drugs are rare: three patients have taken five, two have taken six, and another two have taken seven
Fig. 3 Dendrogram of the agglomerative hierarchical clustering of Rheumatoid Arthritis (RA) patients. The dendrogram was obtained with Ward’s method, gap penalty g=0.7 and temporal penalty T=0.25. Twenty-five clusters were selected based on the analysis of the clustering indices and clinical interpretation
Average values of five clustering indices for the dendrogram of Fig. 3
| Number of clusters | Rand | AR | FM | Jaccard | AW |
|---|---|---|---|---|---|
| 2 | 0.876 | 0.744 | 0.897 | 0.827 | 0.704 |
| 3 | 0.852 | 0.675 | 0.789 | 0.658 | 0.661 |
| 4 | 0.872 | 0.689 | 0.780 | 0.644 | 0.644 |
| 5 | 0.897 | 0.705 | 0.773 | 0.632 | 0.759 |
| 6 | 0.920 | 0.751 | 0.802 | 0.672 | 0.768 |
| 7 | 0.935 | 0.780 | 0.820 | 0.699 | 0.771 |
| 8 | 0.931 | 0.753 | 0.796 | 0.662 | 0.700 |
| 9 | 0.950 | 0.801 | 0.830 | 0.712 | 0.782 |
| 10 | 0.966 | 0.855 | 0.875 | 0.779 | 0.861 |
| 11 | 0.969 | 0.863 | 0.881 | 0.789 | 0.857 |
| 12 | 0.973 | 0.876 | 0.892 | 0.805 | 0.878 |
| 13 | 0.975 | 0.883 | 0.897 | 0.814 | 0.883 |
| 14 | 0.979 | 0.897 | 0.909 | 0.833 | 0.914 |
| 15 | 0.982 | 0.910 | 0.920 | 0.852 | 0.917 |
| 16 | 0.985 | 0.925 | 0.933 | 0.875 | 0.931 |
| 17 | 0.987 | 0.932 | 0.940 | 0.887 | 0.937 |
| 18 | 0.988 | 0.936 | 0.943 | 0.893 | 0.939 |
| 19 | 0.989 | 0.940 | 0.946 | 0.899 | 0.944 |
| 20 | 0.988 | 0.937 | 0.943 | 0.893 | 0.933 |
| 21 | 0.989 | 0.938 | 0.945 | 0.895 | 0.939 |
| 22 | 0.990 | 0.942 | 0.948 | 0.901 | 0.940 |
| 23 | 0.991 | 0.946 | 0.951 | 0.907 | 0.961 |
| 24 | 0.992 | 0.953 | 0.958 | 0.919 | 0.965 |
| 25 | | 0.958 | 0.962 | 0.926 | |
| 26 | | | | | 0.964 |
| 27 | | 0.958 | 0.962 | 0.928 | 0.960 |
| 28 | 0.992 | 0.955 | 0.959 | 0.923 | 0.952 |
| 29 | 0.992 | 0.952 | 0.957 | 0.920 | 0.945 |
| 30 | 0.991 | 0.940 | 0.947 | 0.903 | 0.924 |
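The Rand, AR (Adjusted Rand), FM and Jaccard columns are indices that compare two partitions of the same patients. A minimal sketch of their standard pair-counting definitions follows; FM is taken here to be the Fowlkes-Mallows index, which is an assumption about the abbreviation. The function names are hypothetical, the AW column is not covered, and nothing here reproduces the paper's bootstrap procedure, which averages such indices over resamples.

```python
import math
from itertools import combinations


def pair_counts(labels_u, labels_v):
    """Count item pairs by whether each partition puts them in the same cluster."""
    if len(labels_u) != len(labels_v):
        raise ValueError("both labelings must cover the same items")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_u)), 2):
        same_u = labels_u[i] == labels_u[j]
        same_v = labels_v[i] == labels_v[j]
        if same_u and same_v:
            a += 1  # together in both partitions
        elif same_u:
            b += 1  # together only in U
        elif same_v:
            c += 1  # together only in V
        else:
            d += 1  # separated in both partitions
    return a, b, c, d


def _ratio(num, den):
    # Undefined cases (zero denominator) are reported as NaN
    return num / den if den else math.nan


def clustering_indices(labels_u, labels_v):
    """Rand, Adjusted Rand, Fowlkes-Mallows and Jaccard from pair counts."""
    a, b, c, d = pair_counts(labels_u, labels_v)
    return {
        "Rand": _ratio(a + d, a + b + c + d),
        "AR": _ratio(2 * (a * d - b * c), (a + b) * (b + d) + (a + c) * (c + d)),
        "FM": _ratio(a, math.sqrt((a + b) * (a + c))),
        "Jaccard": _ratio(a, a + b + c),
    }


if __name__ == "__main__":
    original = [0, 0, 1, 1, 2, 2]
    resampled = [0, 0, 1, 2, 2, 2]
    print(clustering_indices(original, resampled))
```

Values close to 1 mean the two partitions agree on most pairs, which is how the rows of the table can be read when choosing the number of clusters.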
Fig. 4 Standard deviation of Adjusted Rand (AR) versus the number of clusters for the dendrogram in Fig. 3. The standard deviation trends downward as the number of clusters increases, and its minimum is attained at 25 clusters
Stability of the 25 clusters for Ward’s method, g=0.7, and T=0.25
| Cluster Nb. (# patients) | median | median | median | average | average | average | std | std | std |
|---|---|---|---|---|---|---|---|---|---|
| 1 (4) | 0.475 | 0.298 | 0.625 | 0.475 | 0.298 | 0.625 | 0.389 | 0.185 | 0.177 |
| 2 (4) | 0.750 | 0.429 | 0.750 | 0.750 | 0.429 | 0.750 | 0.000 | 0.000 | 0.000 |
| 3 (5) | 0.083 | 0.077 | 0.200 | 0.083 | 0.077 | 0.200 | 0.000 | 0.000 | 0.000 |
| 4 (5) | 0.400 | 0.271 | 0.600 | 0.400 | 0.271 | 0.600 | 0.283 | 0.147 | 0.000 |
| 5 (5) | 0.275 | 0.215 | 0.500 | 0.275 | 0.215 | 0.500 | 0.035 | 0.022 | 0.141 |
| 6 (6) | 0.833 | 0.455 | 0.833 | 0.833 | 0.455 | 0.833 | 0.000 | 0.000 | 0.000 |
| 7 (7) | 0.741 | 0.423 | 0.786 | 0.741 | 0.423 | 0.786 | 0.164 | 0.054 | 0.101 |
| 8 (7) | 0.307 | 0.233 | 0.500 | 0.307 | 0.233 | 0.500 | 0.080 | 0.047 | 0.101 |
| 9 (7) | 0.643 | 0.390 | 0.643 | 0.643 | 0.390 | 0.643 | 0.101 | 0.037 | 0.101 |
| 10 (8) | 0.688 | 0.407 | 0.688 | 0.688 | 0.407 | 0.688 | 0.088 | 0.031 | 0.088 |
| 11 (9) | 0.542 | 0.347 | 0.611 | 0.542 | 0.347 | 0.611 | 0.177 | 0.075 | 0.079 |
| 12 (9) | 0.389 | 0.269 | 0.444 | 0.389 | 0.269 | 0.444 | 0.236 | 0.124 | 0.157 |
| 13 (10) | 0.352 | 0.256 | 0.400 | 0.352 | 0.256 | 0.400 | 0.145 | 0.080 | 0.141 |
| 14 (10) | 0.489 | 0.311 | 0.550 | 0.489 | 0.311 | 0.550 | 0.337 | 0.156 | 0.354 |
| 15 (13) | 0.513 | 0.330 | 0.577 | 0.513 | 0.330 | 0.577 | 0.254 | 0.112 | 0.163 |
| 16 (13) | 0.472 | 0.321 | 0.577 | 0.472 | 0.321 | 0.577 | 0.039 | 0.018 | 0.054 |
| 17 (14) | 0.571 | 0.358 | 0.571 | 0.571 | 0.358 | 0.571 | 0.202 | 0.082 | 0.202 |
| 18 (16) | 0.719 | 0.416 | 0.719 | 0.719 | 0.416 | 0.719 | 0.133 | 0.045 | 0.133 |
| 19 (17) | 0.309 | 0.235 | 0.353 | 0.309 | 0.235 | 0.353 | 0.084 | 0.049 | 0.083 |
| 20 (19) | 0.716 | 0.416 | 0.737 | 0.716 | 0.416 | 0.737 | 0.119 | 0.041 | 0.149 |
| 21 (20) | 0.791 | 0.440 | 0.825 | 0.791 | 0.440 | 0.825 | 0.154 | 0.048 | 0.106 |
| 22 (32) | 0.696 | 0.410 | 0.719 | 0.696 | 0.410 | 0.719 | 0.056 | 0.019 | 0.088 |
| 23 (37) | 0.791 | 0.441 | 0.811 | 0.791 | 0.441 | 0.811 | 0.104 | 0.032 | 0.076 |
| 24 (46) | 0.728 | 0.420 | 0.728 | 0.728 | 0.420 | 0.728 | 0.108 | 0.036 | 0.108 |
| 25 (101) | 0.777 | 0.437 | 0.777 | 0.777 | 0.437 | 0.777 | 0.007 | 0.002 | 0.007 |
Fig. 5 Cluster visualization. Graph representation of selected clusters based on stability measures and clinical interpretation. Drug codes: A - Etanercept; B - Infliximab; C - Rituximab; D - Adalimumab; E - Anakinra; F - Abatacept; G - Tocilizumab; H - Golimumab; Z - Follow-up/end