Hongfang Liu, Kavishwar Wagholikar, Stephen Tze-Inn Wu.
Abstract
Extracting clinical information captured in free text and encoding it with standard medical terminologies is vital to enabling secondary use of electronic medical records (EMRs) for clinical decision support, improved patient safety, and clinical/translational research. A critical portion of free text consists of 'summary level' information in the form of problem lists, diagnoses, and reasons for visit. We conducted a systematic analysis of how well SNOMED-CT represents summary level information, using a large collection of summary level data in the form of itemized entries. Results indicate that about 80% of the entries can be encoded with SNOMED-CT normalized phrases. When tolerating one unmapped token, 96% of the itemized entries can be encoded with SNOMED-CT concepts. The study provides a solid foundation for developing an automated system to encode summary level data using SNOMED-CT.
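To make the "one unmapped token" criterion concrete, the following is a minimal sketch of greedy longest-phrase matching over an itemized entry. It is not the authors' pipeline: the dictionary `TOY_PHRASES`, the placeholder concept labels, the whitespace tokenization, and the function `encode_entry` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: normalized phrase (as a token tuple) -> placeholder label.
# The labels are illustrative only and are NOT real SNOMED CT identifiers.
TOY_PHRASES: Dict[Tuple[str, ...], str] = {
    ("coronary", "artery", "disease"): "CONCEPT_A",
    ("hypertension",): "CONCEPT_B",
    ("diabetes", "mellitus"): "CONCEPT_C",
}


def encode_entry(
    entry: str,
    phrases: Dict[Tuple[str, ...], str] = TOY_PHRASES,
    max_unmapped: int = 0,
) -> Tuple[List[str], List[str], bool]:
    """Greedily match the longest dictionary phrase at each position.

    Returns (matched labels, unmapped tokens, whether the entry counts as
    encodable when up to `max_unmapped` tokens may stay unmapped).
    """
    tokens = entry.lower().split()  # assumption: simple whitespace tokenization
    longest = max((len(p) for p in phrases), default=0)
    concepts: List[str] = []
    unmapped: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in phrases:
                concepts.append(phrases[key])
                i += n
                break
        else:
            # No phrase starts at this token: record it as unmapped and move on.
            unmapped.append(tokens[i])
            i += 1
    return concepts, unmapped, len(unmapped) <= max_unmapped
```

Under these assumptions, an entry such as "severe coronary artery disease" is not encodable with `max_unmapped=0`, because "severe" is absent from the toy dictionary, but it is encodable with `max_unmapped=1`.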
Year: 2012 PMID: 22779045 PMCID: PMC3392059
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. An example of summary level data in our study.
Most frequent itemized entries in summary level data.
| Hypertension | 777 | 1260 |
| Hyperlipidemia | 566 | 814 |
| Health-maintenance | 230 | 402 |
| Depression | 175 | 456 |
| Obesity | 151 | 288 |
| Coronary artery disease | 147 | 380 |
| Osteoporosis | 124 | 199 |
| Hypothyroidism | 122 | 251 |
| Diabetes mellitus | 103 | 386 |
Figure 2. Statistics of itemized entries and the number of mapped phrases.
Distribution statistics of SNOMED Semantic Tags.
| disorder | 44,116 | 1,221 |
| finding | 16,030 | 1,724 |
| procedure | 12,825 | 1,195 |
| body structure | 8,528 | 2,521 |
| substance | 6,761 | 758 |
| morphologic abnormality | 6,280 | 2,078 |
| qualifier value | 6,025 | |
| situation | 4,077 | 3,346 |
| product | 3,503 | 808 |
| observable entity | 3,137 | 2,074 |
| organism | 1,975 | 692 |
| physical | 1,550 | 940 |
| regime/therapy | 931 | 1,476 |
| attribute | 739 | |
| event | 528 | 1,044 |
Figure 3. Token distribution in the corpus. Dashed lines map the distribution of total tokens to the distribution of unique tokens.
Statistics of the composition level for the entries.
| #unique Entries (in thousands) | 32 | 565 | 1,772 | 2,218 | 1,715 | 1,103 | 671 | 403 | 244 | 152 | 95 |
| #Total Entries (in thousands) | 306 | 16,522 | 8,532 | 4,653 | 2,483 | 1,403 | 812 | 474 | 281 | 174 | 107 |
| Average occurrences | 9.70 | 29.22 | 4.82 | 2.10 | 1.45 | 1.27 | 1.21 | 1.18 | 1.15 | 1.14 | 1.13 |
| Cumulative % occurrence | 1 | 47 | 70 | 83 | 90 | 94 | 96 | 98 | 99 | 99 | 99 |
Figure 4. Compositional statistics. The x-axis shows the number of SNOMED CT phrases needed to encode an entry. The y-axis shows the number of unique entries and the total number of itemized entries.
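The "Average occurrences" row in the composition-level table appears to be the total number of entries divided by the number of unique entries in each column. That reading is an assumption, not something stated in the text. The short sketch below recomputes the ratio from the transcribed counts; because the counts are rounded to thousands, small differences from the reported averages are to be expected.

```python
# Counts transcribed from the composition-level table (in thousands).
unique_k = [32, 565, 1772, 2218, 1715, 1103, 671, 403, 244, 152, 95]
total_k = [306, 16522, 8532, 4653, 2483, 1403, 812, 474, 281, 174, 107]
reported_avg = [9.70, 29.22, 4.82, 2.10, 1.45, 1.27, 1.21, 1.18, 1.15, 1.14, 1.13]


def average_occurrences(total, unique):
    """Assumed definition: total entries divided by unique entries, per column."""
    return [t / u for t, u in zip(total, unique)]


recomputed = average_occurrences(total_k, unique_k)

# Print reported and recomputed values side by side for comparison.
for rep, rec in zip(reported_avg, recomputed):
    print(f"reported {rep:6.2f}   recomputed {rec:6.2f}")
```

The steep drop after the second column is consistent with the pattern in the table: entries that need few phrases recur often, while longer compositions are mostly unique.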