Jiang Han, Ken Chen, Lei Fang, Shaodian Zhang, Fei Wang, Handong Ma, Liebin Zhao, Shijian Liu.
Abstract
BACKGROUND: The growing interest in observational trials using patient data from electronic medical records poses challenges to both the efficiency and the quality of clinical data collection and management. Even with the help of electronic data capture systems and electronic case report forms (eCRFs), manual data entry followed by chart review remains time-consuming.
Keywords: case report form; electronic medical records; electronic data capture; field research; natural language processing
Year: 2019 PMID: 31313661 PMCID: PMC6672807 DOI: 10.2196/13331
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Workflow of the natural language processing–driven medical information extraction system. EMR: electronic medical record; NLP: natural language processing; eCRF: electronic case report form.
Figure 2. Electronic case report form design for congenital heart disease.
Figure 3. Electronic case report form design for pneumonia.
Figure 4. Graphic user interface for electronic case report form (eCRF) data entry.
Average accuracy for electronic case report form data entry.
| Type of disease and data element | NLPa only | Manual group (median, IQRb) | NLP-MIESc group (median, IQR) | Logarithmic mean difference (95% CI) | Ratio of change in geometric mean (95% CI) | P value | |
| True-false Id | 97.50 | 79.17 (66.74, 84.17) | 96.81 (95.69, 97.29) | 0.41 (0.04 to 0.79) | 1.51 (1.03 to 2.20) | .04 | |
| True-false IIe | 92.00 | 95.39 (92.67, 95.89) | 97.78 (97.19, 98.44) | 0.10 (–0.01 to 0.21) | 1.10 (0.99 to 1.24) | .10 | |
| Multiple choice | 89.33 | 82.80 (73.13, 85.83) | 95.00 (94.58, 97.42) | 0.29 (0.10 to 0.49) | 1.34 (1.10 to 1.63) | .009 | |
| Fill-in-the-blank | 94.17 | 96.33 (95.25, 97.00) | 97.00 (95.83, 97.42) | 0.01 (–0.01 to 0.02) | 1.01 (0.99 to 1.02) | .22 | |
| Overall | 92.77 | 90.42 (87.75, 92.68) | 97.17 (96.83, 97.44) | 0.14 (0.03 to 0.25) | 1.15 (1.04 to 2.20) | .03 | |
| True-false I | 88.00 | 70.83 (65.25, 77.75) | 88.17 (87.25, 89.00) | 0.30 (0.11 to 0.50) | 1.35 (1.11 to 1.65) | .009f | |
| True-false II | 94.44 | 91.25 (88.26, 93.78) | 95.83 (95.21, 96.81) | 0.11 (0.01 to 0.21) | 1.12 (1.01 to 1.23) | .04 | |
| Multiple choice | 80.83 | 67.50 (50.21, 72.50) | 81.25 (77.92, 85.00) | 0.33 (0.14 to 0.52) | 1.39 (1.15 to 1.68) | .003f | |
| Overall | 84.15 | 84.21 (80.53, 86.23) | 92.19 (91.49, 93.20) | 0.17 (0.06 to 0.28) | 1.18 (1.06 to 1.32) | .008 | |
aNLP: natural language processing.
bIQR: interquartile range.
cNLP-MIES: NLP-driven medical information extraction system.
dTrue-false I: data elements retrieved from admissions records.
eTrue-false II: data elements retrieved from imaging reports (ultrasonic cardiogram or chest x-ray).
fIndependent group t test.
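The last two numeric columns of the accuracy table are related: the ratio of change in geometric mean is the exponential of the logarithmic mean difference. The following is a minimal sketch of that back-transformation, assuming the differences were computed on natural-log values (the record does not state the log base; the function name is illustrative and not from the paper).

```python
import math

def ratio_from_log_diff(diff, lower, upper):
    """Back-transform a difference of log means and its 95% CI into a
    ratio of geometric means (natural logarithm assumed)."""
    return tuple(round(math.exp(v), 2) for v in (diff, lower, upper))

# First accuracy row (True-false I): 0.41 (0.04 to 0.79)
print(ratio_from_log_diff(0.41, 0.04, 0.79))
```

Because the tabulated log differences are already rounded to two decimals, a value recomputed this way can differ from the printed ratio in the last digit.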
Average elapsed time for electronic case report form data entry.
| Type of disease and data element | Manual group, seconds (median, IQRa) | NLP-MIESb group, seconds (median, IQR) | Logarithmic mean difference (95% CI) | Ratio of change in geometric mean (95% CI) | P value | |
| True-false Ic | 26.43 (21.43, 30.24) | 13.84 (11.83, 16.06) | –0.71 (–1.02 to –0.39) | 0.49 (0.36 to 0.68) | <.001 | |
| True-false IId | 49.48 (43.08, 51.44) | 35.47 (31.34, 38.63) | –0.29 (–0.46 to –0.11) | 0.75 (0.63 to 0.89) | .003 | |
| Multiple choice | 9.70 (10.61, 12.29) | 7.34 (7.47, 8.55) | –0.36 (–0.53 to –0.19) | 0.70 (0.59 to 0.82) | <.001 | |
| Fill-in-the-blank | 18.41 (17.35, 19.60) | 12.38 (11.38, 14.70) | –0.34 (–0.50 to –0.17) | 0.71 (0.60 to 0.84) | <.001 | |
| Overall | 103.79 (94.59, 109.39) | 69.73 (60.91, 79.66) | –0.40 (–0.55 to –0.25) | 0.67 (0.58 to 0.78) | <.001 | |
| True-false I | 28.71 (25.61, 32.61) | 15.82 (14.36, 16.88) | –0.64 (–0.97 to –0.30) | 0.53 (0.38 to 0.74) | .001 | |
| True-false II | 31.59 (28.29, 32.49) | 25.22 (22.07, 28.80) | –0.19 (–0.35 to –0.03) | 0.83 (0.71 to 0.97) | .02 | |
| Multiple choice | 11.02 (10.65, 12.05) | 8.61 (8.05, 9.25) | –0.33 (–0.51 to –0.15) | 0.72 (0.60 to 0.86) | .001 | |
| Overall | 73.28 (65.80, 74.47) | 49.42 (44.33, 53.88) | –0.37 (–0.53 to –0.21) | 0.69 (0.59 to 0.81) | <.001 | |
aIQR: interquartile range.
bNLP-MIES: NLP-driven medical information extraction system.
cTrue-false I: data elements retrieved from admissions records.
dTrue-false II: data elements retrieved from imaging reports (ultrasonic cardiogram or chest x-ray).
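A ratio of geometric means can be read as a relative change: a ratio below 1 in the elapsed-time table means the NLP-MIES group needed less time than the manual group. The sketch below converts a ratio to a percentage change; the helper name is hypothetical and this is only one way of presenting the tabulated ratios.

```python
def percent_change(ratio):
    """Express a ratio of geometric means as a percentage change
    relative to the manual group (negative means a reduction)."""
    return round((ratio - 1.0) * 100, 1)

# Overall ratio in the first block of the elapsed-time table: 0.67
print(percent_change(0.67))
# Overall ratio in the second block: 0.69
print(percent_change(0.69))
```

Read this way, the two overall ratios of 0.67 and 0.69 correspond to roughly one-third less time per form.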
Error analysis for natural language processing–driven medical information extraction system–supported data entry.
| Types | True-false (n=1167), n (%) | Multiple choice (n=439), n (%) | Fill-in-the-blank (n=121), n (%) | Total (N=1727), n (%) |
| Errors with modification | 325 (27.85) | 158 (36.00) | 16 (13.22) | 499 (28.89) |
| Errors without modification | 842 (72.15) | 281 (64.01) | 105 (86.78) | 1228 (71.11) |
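The percentages in the error analysis table are column-wise shares: each count divided by the number of errors for that data element type. A minimal sketch that recomputes them from the tabulated counts (the dictionary layout and function name are illustrative assumptions, not the authors' code):

```python
# Counts copied from the error analysis table:
# (errors with modification, errors without modification)
counts = {
    "True-false": (325, 842),
    "Multiple choice": (158, 281),
    "Fill-in-the-blank": (16, 105),
    "Total": (499, 1228),
}

def shares(with_mod, without_mod):
    """Return both counts as percentages of their column total."""
    total = with_mod + without_mod
    return (round(100 * with_mod / total, 2),
            round(100 * without_mod / total, 2))

for element_type, (with_mod, without_mod) in counts.items():
    print(element_type, with_mod + without_mod, shares(with_mod, without_mod))
```

Recomputed this way, the multiple-choice share of errors with modification comes out as 35.99 rather than the tabulated 36.00, a last-digit rounding difference; the other cells match the table.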