Lawrence Wc Chan1, Ying Liu2, Tao Chan3, Helen Kw Law4, S C Cesar Wong4, Andy Ph Yeung4, K F Lo4, S W Yeung4, K Y Kwok4, William Yl Chan4, Thomas Yh Lau4, Chi-Ren Shyu5.
Abstract
BACKGROUND: Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems gives physicians evidence to support diagnosis and the referral of suspected cases for examination. Clinical terms in EHRs represent high-level conceptual information, and a similarity measure based on these terms reflects the chance of inter-patient disease co-occurrence. Assuming that all clinical terms are equally relevant to a disease is unrealistic and reduces prediction accuracy. Here we propose a term weighting approach, supported by the PubMed search engine, to address this issue.
Year: 2015 PMID: 26032596 PMCID: PMC4450834 DOI: 10.1186/s12911-015-0166-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1. Projection of image finding terms to feature concepts in the SNOMED CT “is-a” hierarchy. Part of the “is-a” hierarchy is illustrated with three examples demonstrating the rules for determining semantic distances. Four image finding terms are considered: “cirrhosis”, “hepatic fibrosis”, “splenomegaly” and “fatty liver”. The level-4 concepts are regarded as feature concepts; in this case, the feature concepts “liver finding”, “abdominal organ finding” and “fatty liver” are involved. (a) The term “cirrhosis” at level 7 is a descendant of “liver finding”. Their semantic distance is 3 because there are three “is-a” links between them. (b) The semantic distance between “hepatic fibrosis” and “liver finding” is 2. (c) The term “splenomegaly” is not a descendant of “liver finding” but is a descendant of “abdominal organ finding”. Thus, the semantic distance between “splenomegaly” and “liver finding” is infinity, and that between “splenomegaly” and “abdominal organ finding” is 2. Finally, the term “fatty liver” at level 4 is itself a feature concept, so its semantic distance is 0
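The edge-counting rule in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the intermediate nodes named `placeholder ...` are hypothetical stand-ins for the SNOMED CT concepts between each term and its feature concept, and taking the shortest upward path when a concept has several parents is an assumption the caption does not address.

```python
import math
from collections import deque

# Toy "is-a" graph: child -> list of parents.
# The "placeholder ..." nodes are hypothetical; only the link counts
# (3, 2, 2, 0) are taken from the figure caption.
IS_A = {
    "cirrhosis": ["placeholder A2"],
    "placeholder A2": ["placeholder A1"],
    "placeholder A1": ["liver finding"],
    "hepatic fibrosis": ["placeholder B1"],
    "placeholder B1": ["liver finding"],
    "splenomegaly": ["placeholder C1"],
    "placeholder C1": ["abdominal organ finding"],
    "fatty liver": [],
    "liver finding": [],
    "abdominal organ finding": [],
}


def semantic_distance(term, feature_concept, is_a=IS_A):
    """Count the fewest "is-a" links from term up to feature_concept.

    Returns math.inf when feature_concept is not an ancestor of term.
    """
    queue = deque([(term, 0)])
    seen = {term}
    while queue:
        node, dist = queue.popleft()
        if node == feature_concept:
            return dist
        for parent in is_a.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return math.inf


d_cirrhosis = semantic_distance("cirrhosis", "liver finding")
d_fibrosis = semantic_distance("hepatic fibrosis", "liver finding")
d_spleen_liver = semantic_distance("splenomegaly", "liver finding")
d_spleen_abdo = semantic_distance("splenomegaly", "abdominal organ finding")
d_fatty = semantic_distance("fatty liver", "fatty liver")
```

A breadth-first search is used because a polyhierarchy can offer more than one route to an ancestor; on a strict tree a simple parent walk would give the same result.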
Fig. 2. A schematic view of the method. Step 1: The image finding terms and their corresponding synonyms are manually extracted from the reports. Step 2: The SNOMED CT concepts of the image finding terms are identified using UMLS Terminology Services. Step 3: The semantic distances between the extracted terms and the level-4 feature concepts are obtained by edge counting. Step 4: The feature concepts are weighted by (Step 4a) the generic term weighting approach and (Step 4b) the specific term weighting approach. Step 5: The feature vectors are generated. Step 6: Similarity scores between feature vectors are calculated by the modified direction cosine
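Steps 5 and 6 can be illustrated with a short Python sketch. It uses the plain direction cosine (cosine similarity), not the modified form named in the caption, whose definition is not given here. The feature values and weights below are hypothetical and chosen only to show the mechanics.

```python
import math


def weighted_vector(values, weights):
    """Scale each feature value by the weight of its feature concept."""
    return [x * w for x, w in zip(values, weights)]


def cosine_similarity(u, v):
    """Plain direction cosine between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty feature vectors
    return dot / (norm_u * norm_v)


# Feature order (hypothetical): liver finding, abdominal organ finding, fatty liver
patient_a = [1, 0, 0]
patient_b = [1, 1, 0]

equal_weights = [1, 1, 1]
uneven_weights = [2, 1, 1]  # hypothetical weights, not taken from the paper

sim_equal = cosine_similarity(
    weighted_vector(patient_a, equal_weights),
    weighted_vector(patient_b, equal_weights),
)
sim_uneven = cosine_similarity(
    weighted_vector(patient_a, uneven_weights),
    weighted_vector(patient_b, uneven_weights),
)
```

With equal weights the two patients score about 0.707; raising the weight of the shared concept lifts the score to about 0.894, which is the effect term weighting is meant to have on the ranking.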
Fig. 3. Plot of AUROC against the value of k. The accuracy of inter-patient HCC co-occurrence prediction increases as k rises from 0 to 2 and saturates at 0.735 as k increases further
- Comparison of term weighting approaches. AUROCs and the 95 % CIs of the equal, generic and specific term weighting approaches are summarized here
| Term weighting approach | AUROC | 95 % CI |
|---|---|---|
| Equal term weighting | 0.735 | (0.724, 0.746) |
| Generic term weighting | 0.728 | (0.717, 0.739) |
| Specific term weighting | 0.743 | (0.732, 0.754) |
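For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The scores and labels are hypothetical, and the method used for the 95 % CIs in the table is not shown because it is not described here.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison of positive and negative scores.

    labels: 1 for a positive case, 0 for a negative case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores for four patient pairs
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auroc = auroc(example_scores, example_labels)
```

Three of the four positive-negative comparisons are won by the positive case, so the example evaluates to 0.75. The quadratic loop is kept for clarity; a rank-based formulation would be preferable for large numbers of pairs.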
- Top ten image finding terms. The PubMed search results indicated that some image finding terms were co-mentioned with HCC very frequently in the abstracts of biomedical journal articles. The conditional probability of “Dysplastic nodule” (0.934) is the highest among all the extracted terms
| Rank | Image finding | Conditional probability |
|---|---|---|
| 1 | Dysplastic nodule | 0.934 |
| 2 | Nodule of liver | 0.513 |
| 3 | Equal density (isodense) lesion | 0.438 |
| 4 | Nodular hyperplasia of liver | 0.329 |
| 5 | Solitary necrotic liver nodule | 0.259 |
| 6 | Portal vein thrombosis | 0.209 |
| 7 | Space occupying lesion of liver | 0.175 |
| 8 | Cirrhosis of liver | 0.170 |
| 9 | Hepatic fibrosis | 0.082 |
| 10 | Nontraumatic hemoperitoneum | 0.064 |
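A conditional probability of this kind can be estimated from search hit counts. The sketch below assumes the probability is that of HCC being mentioned given that the image finding term is mentioned, i.e. the number of abstracts mentioning both divided by the number mentioning the term; the direction of conditioning and the exact queries are assumptions, and the counts are hypothetical rather than actual PubMed results.

```python
def conditional_probability(n_term_and_disease, n_term):
    """Estimate P(disease | term) from two abstract counts.

    n_term_and_disease: abstracts mentioning both the term and the disease
    n_term: abstracts mentioning the term
    """
    if n_term <= 0:
        raise ValueError("n_term must be positive")
    if not 0 <= n_term_and_disease <= n_term:
        raise ValueError("joint count must lie between 0 and n_term")
    return n_term_and_disease / n_term


# Hypothetical counts for illustration only
example_probability = conditional_probability(n_term_and_disease=50, n_term=200)
```

Here 50 co-mentions out of 200 abstracts give 0.25. A term with very few hits yields an unstable estimate, so in practice some minimum count or smoothing may be worth considering.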