Marco Duz, John F Marshall, Tim Parkin.
Abstract
BACKGROUND: The use of electronic medical records (EMRs) offers opportunities for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary but require thorough validation before they can be routinely used.
Keywords: data mining; electronic medical record; text mining; validation studies
Year: 2017 PMID: 28663163 PMCID: PMC5509949 DOI: 10.2196/medinform.7123
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Flowchart summary of the text mining process adopted in the study. The lower portion of the picture summarizes how the final dataset (dark gray) resulted from subtracting the exclusion dataset from the inclusion dataset and then adding back the reinclusion dataset obtained from the exclusion dataset.
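The dataset arithmetic described in the Figure 1 caption (inclusion minus exclusion, plus reinclusion drawn from the excluded cases) can be sketched with plain set operations. This is only an illustrative sketch, not the software used in the study: the function names, the substring-matching rule, the example records, and the dictionary terms are all hypothetical.

```python
def matches(text, terms):
    """Return True if any dictionary term occurs in the lowercased text."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def select_cases(records, inclusion_terms, exclusion_terms, reinclusion_terms):
    """Apply inclusion, exclusion, and reinclusion dictionaries in sequence.

    records maps a case id to its free-text clinical note (hypothetical layout).
    """
    # Cases whose text contains at least one inclusion term.
    included = {rid for rid, text in records.items()
                if matches(text, inclusion_terms)}
    # Of those, cases that also contain an exclusion term (e.g. a negation).
    excluded = {rid for rid in included
                if matches(records[rid], exclusion_terms)}
    # Of the excluded cases, those rescued by a reinclusion term.
    reincluded = {rid for rid in excluded
                  if matches(records[rid], reinclusion_terms)}
    # Final dataset = inclusion - exclusion + reinclusion.
    return (included - excluded) | reincluded


# Hypothetical records and dictionary terms, for illustration only.
records = {
    1: "Presented with colic, treated with flunixin",
    2: "No signs of colic on examination",
    3: "No colic today but colic surgery last year",
    4: "Routine vaccination",
}
final = select_cases(
    records,
    inclusion_terms=["colic"],
    exclusion_terms=["no colic", "no signs of colic"],
    reinclusion_terms=["colic surgery"],
)
print(sorted(final))  # [1, 3]
```

Case 2 is dropped by the exclusion dictionary, while case 3 is excluded and then rescued by the reinclusion dictionary, which mirrors the flow in the lower portion of the figure.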
Figure 2. Flowchart summary of the data flow of the computer-assisted text mining process for the validation dataset. The columns include either the number of terms (each term is either a word or a combination of words) in each dictionary, cases, or number of times (“repeats”) terms in the relevant dictionary are identified in the dataset.
Sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively) of computer-assisted analysis compared with manual analysis, reported as percentages. Rows are the classifications made by the software and columns are the manual classifications.
| Category | C-A [d] | Manual [a] + | Manual [a] − | Sensitivity | Specificity | PPV [b] | NPV [c] | |
|---|---|---|---|---|---|---|---|---|
| NSAIDs [e] | + | 1128 | 19 | 99.8 | 99.9 | 98.3 | 100 | 99.0 |
| | − | 2 | 16410 | | | | | |
| Colic | + | 226 | 13 | 100 | 99.9 | 94.6 | 100 | 100 |
| | − | 0 | 17322 | | | | | |
| RF [f] | + | 22 | 0 | 100 | 100 | 100 | 100 | 100 |
| | − | 0 | 17539 | | | | | |
| RDC [g] | + | 7 | 0 | 100 | 100 | 100 | 100 | 100 |
| | − | 0 | 17554 | | | | | |
[a] The +/− in the “manual” column identifies the number of positive and negative terms classified manually in each category (colic, nonsteroidal antiinflammatory drugs [NSAIDs], renal failure [RF], and right dorsal colitis [RDC]).
[b] PPV: positive predictive value.
[c] NPV: negative predictive value.
[d] The +/− in the “C-A” column identifies the number of positive and negative terms classified with the computer-assisted method for each category.
[e] NSAIDs: nonsteroidal antiinflammatory drugs.
[f] RF: renal failure.
[g] RDC: right dorsal colitis.
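As a worked check of how the four named measures follow from the 2×2 counts, the sketch below recomputes them for the NSAIDs rows using the standard definitions, with manual classification as the reference. The function name and dictionary keys are illustrative choices, not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic measures, returned as percentages.

    tp: software +, manual +    fp: software +, manual -
    fn: software -, manual +    tn: software -, manual -
    """
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


# Counts taken from the NSAIDs rows of the table.
nsaids = diagnostic_metrics(tp=1128, fp=19, fn=2, tn=16410)
for name, value in nsaids.items():
    print(f"{name}: {value:.1f}")
# sensitivity: 99.8
# specificity: 99.9
# ppv: 98.3
# npv: 100.0
```

The NPV prints as 100.0 only because of rounding: two of the 16,412 software-negative terms were manually classified as positive.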