Mats Martinell, Jan Stålhammar, Johan Hallqvist.
Abstract
INTRODUCTION: Electronic medical records (EMRs) enable analysis of health care data by using data mining techniques to build research databases. Though the reliability of the data extraction process is crucial for the credibility of the final analysis, there are few published validations of this process. In this paper we validate the performance of an automated data mining tool on EMR in a primary care setting.
Year: 2012 PMID: 22335391 PMCID: PMC3282243 DOI: 10.3109/03009734.2011.653015
Source DB: PubMed Journal: Ups J Med Sci ISSN: 0300-9734 Impact factor: 2.384
The CXP arranged extracted data into nine tables that correspond to the nine modules in the original EMR. Data in the top eight tables are extracted from structural data, whereas “Terminology” is extracted from narrative data.
| Name of table | Data to be customized for second step in the extraction process |
|---|---|
| Contacts | Type of contact registration (e.g. doctor, nurse, telephone, administrative), user ID, and date or time interval of contact |
| Diagnosis | Name, code according to ICD-9 or ICD-10, and date or time interval of diagnosis |
| Biometrics | Weight, height, and BMI sorted by date |
| Documents | Referrals and other documentation sorted by date |
| Drugs | Prescriptions (name, ATC code, iteration, and dosage) sorted by date |
| Biochemical analysis | Analyses, values, and units sorted by date |
| Measurement | Biometric measurements documented outside the table “Terminology” |
| Patients | Gender, age, alive/dead |
| Terminology | |
Figure 1. Flow chart for the procedure of selecting personal identification numbers (PIN) to assess the congruity of CXP extracted data to the original EMRs. In step B, the PIN already selected in step A was excluded. In step C, PINs from step A and step B were excluded. Altogether 3,045 data items were compared.
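The stepwise exclusion described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that each step draws PINs at random from its own candidate pool, which the caption does not state; the function name `select_pins_stepwise`, the pools, and the sample sizes are all hypothetical and are not taken from the study.

```python
import random


def select_pins_stepwise(pools, sizes, seed=0):
    """Select PINs step by step, excluding PINs chosen in any earlier step.

    pools: one list of candidate PINs per step (hypothetical data).
    sizes: how many PINs to draw in each step (hypothetical values).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    already_chosen = set()
    selections = []
    for pool, size in zip(pools, sizes):
        # Drop PINs selected in earlier steps before sampling.
        eligible = sorted(set(pool) - already_chosen)
        picked = rng.sample(eligible, min(size, len(eligible)))
        selections.append(picked)
        already_chosen.update(picked)
    return selections


# Hypothetical candidate pools for steps A, B, and C.
pools = [
    ["p1", "p2", "p3", "p4"],
    ["p2", "p3", "p5"],
    ["p1", "p5", "p6"],
]
steps = select_pins_stepwise(pools, sizes=[2, 2, 2])
```

Because earlier selections are removed from each later pool, no PIN can appear in more than one step.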
Figure 2. Venn diagrams for data extraction by CXP and by eXtractor, illustrating distribution of inclusion criteria (ICD-10 code, ATC code, laboratory value) for each patient. Patients extracted by Pygargus CXP (n = 445) and by eXtractor (n = 433).
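The counts behind a three-set Venn diagram like the one in Figure 2 can be derived by grouping patients by the exact combination of inclusion criteria they meet. The sketch below is illustrative only: the criterion labels, the function `venn_regions`, and the patient records are hypothetical and do not reproduce the study data.

```python
from collections import Counter

# Hypothetical labels for the three inclusion criteria.
CRITERIA = ("icd10", "atc", "lab")


def venn_regions(patients):
    """Count patients in each Venn region.

    patients: dict mapping a PIN to the set of criteria that patient meets.
    Returns a Counter keyed by the tuple of criteria met, in CRITERIA order.
    """
    counts = Counter()
    for met in patients.values():
        region = tuple(c for c in CRITERIA if c in met)
        if region:  # patients meeting no criterion fall outside the diagram
            counts[region] += 1
    return counts


# Hypothetical patients, not taken from the study.
patients = {
    "p1": {"icd10"},
    "p2": {"icd10", "atc"},
    "p3": {"atc", "lab"},
    "p4": {"icd10", "atc", "lab"},
    "p5": {"atc", "icd10"},
}
regions = venn_regions(patients)
```

Running the same function on the patient sets from two extraction tools would give two sets of region counts that can be compared region by region.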
Figure 3. Data items (n = 3,045) compared to the original EMR distributed over nine CXP modules and by inclusion (ATC code, laboratory values, or ICD code).