Mina Kim, Soo-Yong Shin, Mira Kang, Byoung-Kee Yi, Dong Kyung Chang.
Abstract
BACKGROUND: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, standardizing EHR data remains difficult because of nonidentical duplicates, typographical errors, and inconsistencies. To address this, standardization efforts have targeted both the collection of data in a standardized format and the curation of data already stored in EHRs. For clinical big data research, the stored EHR data should be standardized, starting with laboratory results, given their importance. However, most previous efforts have relied on labor-intensive manual methods.
Keywords: data quality; data science; electronic health records; standardization
Year: 2019 PMID: 31469075 PMCID: PMC6740165 DOI: 10.2196/14083
Source DB: PubMed Journal: JMIR Med Inform
Five common value sets.
| Category | Value set |
| Urine color tests | Clear, cloudy, orange, purple, brown, green, blue, red, black, yellow, dark yellow, pink, turbid, milky white, amber, straw, colorless, bloody |
| Urine dipstick tests | Negative, normal, trace, +, ++, +++, ++++ |
| Blood type tests | Rh+, Rh−, weak D, partial D, variant D, A, B, AB, O, cis-AB |
| Presence-finding tests | Positive, negative, weakly positive |
| Pathogenesis tests | Reactive, nonreactive, weakly reactive |
Figure 1. Process of the proposed standardization algorithm for laboratory tests—categorical results (SALT-C). neg: negative; pos: positive.
Figure 2. Character-level vectorization. neg: negative; norm: normal; pos: positive.
Figure 3. Distribution of laboratory tests data. Example laboratory test in the urine color tests category.
Figure 4. Distribution of laboratory tests data. Example laboratory test in the urine dipstick tests category.
Figure 5. Distribution of laboratory tests data. Example laboratory test in the blood type tests category.
Figure 6. Distribution of laboratory tests data. Example laboratory test in the presence-finding tests category.
Figure 7. Distribution of laboratory tests data. Example laboratory test in the pathogenesis tests category.
Laboratory test categorization.
| Category | Number of classified laboratory tests | Representative laboratory tests |
| Urine color tests | 2 | Urinalysis: color, turbidity |
| Urine dipstick tests | 14 | Urinalysis: glucose, protein, ketones, hemoglobin, urobilinogen, bilirubin, leukocyte esterase |
| Blood type tests | 3 | Rh type, ABO group |
| Presence-finding tests | 453 | Hepatitis C virus antibody, Anti-HIV antibody, hepatitis B surface antigen, hepatitis B surface antibody, hepatitis B e-antigen, barbiturate screen, opiate screen, toxoplasma antibody, rubella antibody |
| Pathogenesis tests | 8 | Rapid plasma reagin, venereal disease research laboratory (VDRL), |
Manual validation in unlabeled data.
| Category | Result | Cosine similarity: value, n (%) | Cosine similarity: data, n (%) | Euclidean distance: value, n (%) | Euclidean distance: data, n (%) | Hybrid method: value, n (%) | Hybrid method: data, n (%) |
| Urine color tests | Correct | 123 (97.6) | 8,592,841 (>99.99) | 122 (96.8) | 8,592,835 (>99.99) | 123 (97.6) | 8,592,841 (>99.99) |
| Urine color tests | Incorrect | 3 (2.4) | 140 (<0.01) | 4 (3.2) | 146 (<0.01) | 3 (2.4) | 140 (<0.01) |
| Urine dipstick tests | Correct | 162 (79.8) | 28,747,699 (93.96) | 198 (97.5) | 30,594,572 (>99.99) | 198 (97.5) | 30,594,572 (>99.99) |
| Urine dipstick tests | Incorrect | 41 (20.2) | 1,846,897 (6.04) | 5 (2.5) | 24 (<0.01) | 5 (2.5) | 24 (<0.01) |
| Blood type tests | Correct | 50 (89) | 3,261,963 (>99.99) | 53 (95) | 3,261,994 (>99.99) | 53 (95) | 3,261,994 (>99.99) |
| Blood type tests | Incorrect | 6 (11) | 44 (<0.01) | 3 (5) | 13 (<0.01) | 3 (5) | 13 (<0.01) |
| Presence-finding tests | Correct | 162,291 (99.68) | 14,788,631 (99.97) | 162,296 (99.69) | 14,788,663 (99.97) | 162,291 (99.68) | 14,788,631 (99.97) |
| Presence-finding tests | Incorrect | 514 (0.32) | 4021 (0.03) | 509 (0.31) | 3989 (0.03) | 514 (0.32) | 4021 (0.03) |
| Pathogenesis tests | Correct | 4643 (99.61) | 1,944,729 (99.98) | 4638 (99.51) | 1,941,960 (99.84) | 4643 (99.61) | 1,944,729 (99.98) |
| Pathogenesis tests | Incorrect | 18 (0.39) | 283 (0.01) | 23 (0.49) | 3052 (0.16) | 18 (0.39) | 283 (0.01) |
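
The validation table compares cosine similarity and Euclidean distance as ways of matching a raw result string to a value set, and Figure 2 refers to character-level vectorization. The sketch below illustrates these two measures on character-count vectors. It is not the authors' implementation: the vectorization scheme (plain character counts of the lowercased string) and all function names are assumptions made for illustration, and the hybrid method is not reproduced because this record does not describe it.

```python
import math
from collections import Counter

# Value sets copied from the "Five common value sets" table.
PRESENCE_FINDING = ["positive", "negative", "weakly positive"]
URINE_DIPSTICK = ["negative", "normal", "trace", "+", "++", "+++", "++++"]


def char_vector(text):
    """Assumed vectorization: count each character of the lowercased text."""
    return Counter(text.strip().lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two character-count vectors."""
    dot = sum(count * b[ch] for ch, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two character-count vectors."""
    chars = set(a) | set(b)
    return math.sqrt(sum((a[ch] - b[ch]) ** 2 for ch in chars))


def nearest_by_cosine(raw, value_set):
    """Return the value-set entry with the highest cosine similarity."""
    vec = char_vector(raw)
    return max(value_set, key=lambda v: cosine_similarity(vec, char_vector(v)))


def nearest_by_euclidean(raw, value_set):
    """Return the value-set entry with the smallest Euclidean distance."""
    vec = char_vector(raw)
    return min(value_set, key=lambda v: euclidean_distance(vec, char_vector(v)))


if __name__ == "__main__":
    # A typographical error still lands on the intended value.
    print(nearest_by_cosine("Positve", PRESENCE_FINDING))
    print(nearest_by_euclidean("Positve", PRESENCE_FINDING))
    # "+" and "+++" have parallel count vectors, so cosine similarity is 1.0
    # for both and cannot separate them; Euclidean distance can.
    print(cosine_similarity(char_vector("+"), char_vector("+++")))
    print(nearest_by_euclidean("+++", URINE_DIPSTICK))
```

Under this assumed vectorization, values that differ only in how often a single character repeats (such as `+`, `++`, and `+++`) are indistinguishable by cosine similarity but are separated by Euclidean distance, which is one reason a pipeline might prefer different measures for different categories.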