Zheng Jia1, Xudong Lu1, Huilong Duan1, Haomin Li2,3.
Abstract
BACKGROUND: Many clinical concepts are standardized under categorical, hierarchical taxonomies such as ICD-10 and ATC. These taxonomic clinical concepts provide insight into the semantic meaning of, and similarity among, clinical concepts and have been applied to patient similarity measures. However, how the differing set sizes of taxonomic clinical concepts affect similarity at the patient level has not been well studied.
Keywords: Concept similarity; Data visualization; ICD-10; Patient similarity; Predictive model; Taxonomic concept
Year: 2019 PMID: 31023325 PMCID: PMC6485152 DOI: 10.1186/s12911-019-0807-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Taxonomic clinical concepts and patient similarity. a Taxonomic concepts and concept-level semantic similarity. b Patient similarity based on concept set-level similarity
The formulas used in taxonomic concept-based patient similarity
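| Category | # | Formula | Reference |
|---|---|---|---|
| Information Content (IC) | 1 | levels(a → r) | […] |
| Information Content (IC) | 2 | (formula missing in source) | […] |
| Code-level Similarity (CS) | 1 | (formula missing in source) | – |
| Code-level Similarity (CS) | 2 | (formula missing in source) | […] |
| Code-level Similarity (CS) | 3 | (formula missing in source) | […] |
| Code-level Similarity (CS) | 4 | (formula missing in source) | – |
| Set-level Similarity (SS) | 1 | Dice | – |
| Set-level Similarity (SS) | 2 | Jaccard | – |
| Set-level Similarity (SS) | 3 | Cosine | – |
| Set-level Similarity (SS) | 4 | Overlap | – |
| Set-level Similarity (SS) | 5 | (formula missing in source) | […] |
| Set-level Similarity (SS) | 6 | (formula missing in source) | […] |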
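
The four named set-level measures (Dice, Jaccard, Cosine, Overlap) can be illustrated on two patients' diagnosis code sets. The sketch below uses the textbook set-based definitions, which is an assumption, since the formulas in the table above did not survive extraction. The function names and example codes are hypothetical. Note that these versions count only exact code matches and ignore the taxonomy.

```python
from math import sqrt


def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cosine(a: set, b: set) -> float:
    """Set cosine (Ochiai): |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def overlap(a: set, b: set) -> float:
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical ICD-10 code sets for two patients.
    patient_a = {"I10", "E11", "J18"}
    patient_b = {"I10", "E11", "N18", "K21"}
    for measure in (dice, jaccard, cosine, overlap):
        print(measure.__name__, round(measure(patient_a, patient_b), 4))
```

Because the denominators differ, the four measures react differently when one set is much larger than the other, which is the kind of set-size effect the abstract raises.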
Selected combinations of algorithms
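| Triple# | IC | Code-level Similarity (CS) | Set-level Similarity (SS) |
|---|---|---|---|
| < 1,2,5> | IC #1: levels(a → r) | CS #2 | SS #5 |
| < 1,2,6> | IC #1: levels(a → r) | CS #2 | SS #6 |
| < 1,2,7> | IC #1: levels(a → r) | CS #2 | SS #7 |
| < 1,2,8> | IC #1: levels(a → r) | CS #2 | SS #8: Minimum Weighted Bipartite Matching |
| < 2,2,5> | IC #2 | CS #2 | SS #5 |
| < 2,2,6> | IC #2 | CS #2 | SS #6 |
| < 2,2,7> | IC #2 | CS #2 | SS #7 |
| < 2,2,8> | IC #2 | CS #2 | SS #8: Minimum Weighted Bipartite Matching |
| < 1,3,8> | IC #1: levels(a → r) | CS #3 | SS #8: Minimum Weighted Bipartite Matching |
| < 1,4,8> | IC #1: levels(a → r) | CS #4 | SS #8: Minimum Weighted Bipartite Matching |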
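
Minimum weighted bipartite matching, as a set-level measure, pairs each code in the smaller set with a distinct code in the larger set so that the total code-level distance is as small as possible. The following is a minimal sketch under stated assumptions, not the authors' implementation: `toy_code_distance` is a hypothetical stand-in for a code-level measure (it only compares shared code prefixes), and the matching is found by brute force over permutations, which is practical only for small sets.

```python
from itertools import permutations


def toy_code_distance(a: str, b: str) -> float:
    """Hypothetical code-level distance: 1 - (shared prefix length / longer code length)."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return 1.0 - shared / max(len(a), len(b))


def min_weight_matching(set_a, set_b, dist=toy_code_distance):
    """Return (total cost, matched pairs) of the cheapest one-to-one matching.

    Every code in the smaller set is matched to a distinct code in the larger set.
    Brute force: tries every ordered selection from the larger set.
    """
    small, large = sorted(set_a), sorted(set_b)
    if len(small) > len(large):
        small, large = large, small
    if not small:
        return 0.0, []
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(large, len(small)):
        cost = sum(dist(s, t) for s, t in zip(small, perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(small, perm))
    return best_cost, best_pairs


if __name__ == "__main__":
    # Hypothetical ICD-10 code sets for two patients.
    cost, pairs = min_weight_matching({"I10", "E11"}, {"I11", "E11", "J18"})
    print(cost, pairs)
```

For realistic set sizes, a polynomial-time assignment solver (for example the Hungarian algorithm) would replace the permutation loop; the brute-force version is kept here only because it is easy to verify by hand.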
Numbers of patients of four pre-defined subpopulations
| Criteria | 18 ≤ Age ≤ 50 | Age ≥ 51 |
|---|---|---|
| 1 ≤ HLOS ≤ 18 | 283 | 257 |
| 19 ≤ HLOS ≤ 50 | 82 | 83 |
Fig. 2 Visual comparison of two IC algorithms through the mapping distance of ICD-10 concepts. a IC #1 formula was used to generate the distance matrix. b IC #2 formula was used to generate the distance matrix
Fig. 3 Variation of the distance between long HLOS and short HLOS subpopulation prototypes (y-axis) generated by different code-level similarity formulas with respect to the size of the subpopulation prototype (x-axis). A longer distance indicates that the prototypes can be separated more distinctly. a The comparison results of the younger population (age ≤ 50 years). b The comparison results of the older population (age ≥ 50 years)
Fig. 4 Scatter plots of the diagnosis prototypes of the four subpopulations used for evaluation. The size of each circle is the relative scaled prototype score of the diagnosis code in the subpopulation. The color of each circle depends on the first letter of the ICD-10 code. a A subpopulation younger than 50 years old with HLOS shorter than 18 days. b A subpopulation younger than 50 years old with HLOS longer than 19 days. c A subpopulation older than 50 years old with HLOS shorter than 18 days. d A subpopulation older than 50 years old with HLOS longer than 19 days
Fig. 5 Variation of the algorithm performance (y-axis) with respect to the size of the retrieved concept list forming the prototype (x-axis). a The IC-level comparisons. b The code-level comparisons. c The set-level comparisons
Fig. 6 Correlation between set-level methods. a Using four set-level similarity algorithms to measure the distance between two prototypes with different prototype set sizes. b The correlation between each pair of SS methods