Sophia Z Shalhout, Farees Saqlain, Kayla Wright, Oladayo Akinyemi, David M Miller.
Abstract
OBJECTIVE: To develop a clinical informatics pipeline designed to capture large-scale structured Electronic Health Record (EHR) data for a national patient registry.
Keywords: EHR; Merkel Cell Carcinoma; R statistical software; REDCap; patient registries; rare tumor registry
Year: 2022 PMID: 35156001 PMCID: PMC8827011 DOI: 10.1093/jamiaopen/ooab118
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1. Schematic overview of the multi-institutional MCCPR. The EHR-R-REDCap pipeline is implemented at each site (eg, Sites A–D) to allow rapid remapping and transformation of structured data from a variety of sources for import into REDCap, using several MCCPR-driven R statistical software scripts and registry-driven R packages. This pipeline augments the manual data abstraction used to capture nonstructured EHR data. Data collection and single-site analysis are streamlined at each institution. The multisite aggregated data will be hosted on Project Data Sphere’s (PDS) open source platform.
Figure 2. Schematic overview of the eLAB clinical informatics pipeline. eLAB is designed to take csv data from several EHR/EDW sources as input and may be adapted for other site-specific inputs. Once the delimited data are assigned to the R object “dt,” a single-line command chosen by source type (eg, ehr_reformat() or edw_reformat()) transforms the data into the REDCap-ready registry configuration for import. eLAB is designed to de-identify the data by replacing patient names and medical record numbers (MRNs) with registry-specific record identification (record_id) numbers. Furthermore, 300 subtypes of laboratory tests (ehr_labs) are remapped into 35 registry fields (mcc_labs). Once the data are imported into the registry, via an output csv file or a REDCap API token, single-line commands are used for outcomes research and analysis (baselabs_os()), as well as for remapping to standardized SNOMED or LOINC codes for interoperability (loinc()). LOINC remapping is an optional feature that allows the MCCPR data to be linked to other non-MCCPR clinical research efforts or registries that may utilize LOINC. Using a lookup table dependent on the MCCPR data dictionary, eLAB maps 1 LOINC code to each data dictionary field variable name, and only after the data have been successfully cleaned, remapped, transformed, and imported.
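The remapping and de-identification steps in Figure 2 can be illustrated with a minimal sketch. eLAB itself is an R package, and this Python version is not its code. It only mirrors the logic the caption describes. The lookup entries, the column names (`mrn`, `patient_name`, `lab_name`, `lab_date`, `value`), the MRN-to-`record_id` key, the function name `ehr_reformat_sketch`, and the sample rows are all hypothetical and made up for illustration.

```python
import csv
import io

# Hypothetical lookup: EHR lab names (ehr_labs) -> registry fields (mcc_labs).
# The caption describes about 300 subtypes mapped to 35 fields; three are shown.
LAB_MAP = {
    "SODIUM": "na",
    "SODIUM, WHOLE BLOOD": "na",
    "POTASSIUM": "k",
}

# Hypothetical MRN -> registry record_id key, assumed to stay at the local site.
RECORD_IDS = {"0001234": "1", "0005678": "2"}


def ehr_reformat_sketch(rows):
    """Remap lab names, swap MRN for record_id, and drop patient names."""
    out = []
    for row in rows:
        field = LAB_MAP.get(row["lab_name"].strip().upper())
        record_id = RECORD_IDS.get(row["mrn"])
        if field is None or record_id is None:
            continue  # unmapped labs and unknown patients are skipped
        out.append({
            "record_id": record_id,
            "lab_date": row["lab_date"],
            "field": field,
            "value": row["value"],
        })
    return out


# Made-up export standing in for the delimited EHR data assigned to "dt".
raw = io.StringIO(
    "mrn,patient_name,lab_name,lab_date,value\n"
    "0001234,Jane Doe,Sodium,03/01/2021,139\n"
    "0005678,John Roe,\"Sodium, Whole Blood\",03/02/2021,141\n"
    "0001234,Jane Doe,Unknown Test,03/01/2021,7\n"
)
dt = list(csv.DictReader(raw))
clean = ehr_reformat_sketch(dt)
```

In this sketch, two differently named sodium tests collapse into the single registry field `na`, and the output rows carry only `record_id`, never the MRN or patient name. Silently skipping unmapped labs is a simplification; a real pipeline would more likely log them for review.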
Proof-of-concept univariable model (N = 176, Hazard Ratios with a P value < 0.05 shown in bold)
| Laboratory tests | Overall survival: hazard ratio (95% CI) | P value |
|---|---|---|
| Electrolytes/renal/glucose | ||
| Sodium (na) | 0.9332 (0.82–1.06) | .300 |
| Potassium (k) | 1.7750 (0.72–4.40) | .215 |
| Chloride (cl) | 0.9146 (0.82–1.02) | .097 |
| Carbon dioxide (co2) | 0.9600 (0.84–1.1) | .560 |
| Blood urea nitrogen (bun) | | |
| Creatinine (cre) | | |
| Estimated glomerular filtration rate (gfr) | | |
| Blood glucose (glu) | 1.0030 (0.99–1.01) | .575 |
| Anion gap (anion) | 1.112 (0.96–1.29) | .149 |
| General chemistries | ||
| Albumin (alb) | 0.7204 (0.18–2.92) | .646 |
| Total bilirubin (tbili) | 0.1340 (0.01–1.90) | .137 |
| Calcium (ca) | 0.8072 (0.38–1.70) | .574 |
| Total protein (tp) | 0.6961 (0.20–2.38) | .563 |
| Liver function tests | ||
| Alanine aminotransferase (sgpt) | 0.9798 (0.92–1.05) | .535 |
| Aspartate aminotransferase (sgot) | 1.0078 (0.93–1.08) | .827 |
| Alkaline phosphatase (alkp) | 1.0116 (0.99–1.03) | .122 |
| Globulin (glob) | 1.0763 (0.35–3.29) | .897 |
| Hematological studies | ||
| White blood cells (wbc) | 0.9891 (0.94–1.04) | .661 |
| Red blood cells (rbc) | 0.5838 (0.26–1.32) | .197 |
| Hemoglobin (hgb) | 0.9562 (0.74–1.23) | .726 |
| Hematocrit (hct) | 0.9897 (0.90–1.10) | .836 |
| Mean corpuscular volume (mcv) | | |
| Mean corpuscular hemoglobin (mch) | 1.1878 (0.95–1.48) | .129 |
| Mean corpuscular hemoglobin conc. (mchc) | 0.9195 (0.70–1.21) | .551 |
| Platelet count (plt) | 0.9949 (0.99–1.00) | .127 |
| Mean platelet volume (mpv) | 1.1619 (0.82–1.65) | .405 |
| Red cell distribution width (rdw) | 1.1377 (0.89–1.46) | .312 |
| Percent neutrophils (%) (neut) | 0.9926 (0.96–1.02) | .615 |
| Absolute neutrophil count (anc) | 0.8880 (0.61–1.29) | .537 |
| Percent lymphocytes (%) (lymp) | 1.0884 (0.98–1.04) | .555 |
| Absolute lymphocyte count (alc) | 0.9955 (0.95–1.04) | .837 |
| Percent monocytes (%) (mon) | | |
| Absolute monocyte count (amc) | | |
| Percent eosinophils (%) (eosp) | 0.8099 (0.58–1.13) | .217 |
| Absolute eosinophil count (aec) | 0.0468 (0.0003–6.46) | .223 |
| Percent basophils (%) (basop) | 0.3741 (0.08–1.77) | .214 |
| Absolute basophil count (abc) | 0.00012 (2e-12–7297) | .324 |
| Percent granulocytes, immature (immgranp) | 0.4491 (0.03–6.28) | .552 |
| Absolute immature granulocyte count (agc) | 0.0005 (4e-16–6.7e8) | .593 |
| Percent nucleated RBCs (nrbc) | 3.8e-16 (0–∞) | .997 |
| Neutrophil to lymphocyte ratio (nlr) | 1.034 (0.78–1.37) | .815 |
| Lactate dehydrogenase (ldh) | 1.002 (0.99–1.02) | .707 |
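As a reading aid for the table, the hazard ratio, its 95% CI, and the P value in each row are linked by the usual Wald approximation on the log scale. The sketch below assumes the reported intervals are symmetric Wald intervals around log(HR), which the source does not state. The function name `wald_p_from_ci` is hypothetical, and the result is only approximate because the table values are rounded.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate two-sided Wald P value from a hazard ratio and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the log-scale CI.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# Sodium row: HR 0.9332 (0.82–1.06), reported P = .300.
p_sodium = wald_p_from_ci(0.9332, 0.82, 1.06)
print(round(p_sodium, 2))
```

For the sodium row this gives about 0.29, close to the reported .300. The check is meaningful only when the point estimate lies inside its interval, which is not the case for the percent lymphocytes row as printed.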
Figure 3. eLAB and the data dictionary to harmonize multi-institutional data aggregation. The data dictionary, once uploaded into REDCap, creates the lab capture system, or “Labs instrument.” eLAB reformats and normalizes laboratory values and units that are bulk-pulled from the EHR, transforming the data into the format predefined by the data dictionary and its associated variable codes. eLAB performs the primary data cleansing steps, and a final quality check takes place during import. If an attempt is made to import incorrectly formatted data (eg, a numerical value in a date field, which must be in M/D/Y format), the REDCap scaffold associated with the preconfigured data dictionary raises an error at the final stage of import, and the data fail to import. The REDCap data import tool flags the data that do not conform to the data dictionary and displays error messages that provide guidance on resolving each issue. The data must then be re-evaluated and corrected; a successful import into REDCap can occur only once all errors are resolved. Because every site in the multi-institutional registry uses eLAB and the exact same data dictionary, aggregation is straightforward and only requires appending the outputs of each site (ie, the individual site “.csv” files are combined into one large multisite “.csv” file for the final aggregated data). Multisite data aggregation is thus facilitated by each participating site using (1) eLAB for transformation and normalization and (2) the accompanying data dictionary.
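The two ideas in Figure 3, rejecting malformed dates and appending identically structured site exports, can be sketched as follows. In the registry REDCap itself performs the date check at import, so the pre-check here is only an illustration. The function names `invalid_dates` and `aggregate_sites`, the column names, and the sample values are hypothetical.

```python
import csv
import io
from datetime import datetime


def invalid_dates(rows, field="lab_date", fmt="%m/%d/%Y"):
    """Return (row_number, value) pairs whose date does not parse as M/D/Y."""
    bad = []
    for i, row in enumerate(rows, start=1):
        try:
            datetime.strptime(row[field], fmt)
        except ValueError:
            bad.append((i, row[field]))
    return bad


def aggregate_sites(site_csvs):
    """Append per-site rows, requiring every site to share one header."""
    combined, header = [], None
    for text in site_csvs:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("site export does not match the shared header")
        combined.extend(reader)
    return combined


# Made-up exports from two sites that share one data dictionary layout.
site_a = "record_id,lab_date,na\n1,03/01/2021,139\n"
site_b = "record_id,lab_date,na\n7,44256,141\n"  # numeric value in a date field

merged = aggregate_sites([site_a, site_b])
problems = invalid_dates(merged)
```

Here `problems` holds the single offending row from the second site, which would need correcting before import. The header comparison stands in for the guarantee that a shared data dictionary provides. The sketch does not address `record_id` values that collide across sites.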