Andrew R Post1, Akshatha K Pai1, Richard Willard1, Bradley J May1, Andrew C West1, Sanjay Agravat1, Stephen J Granite2, Raimond L Winslow2, David S Stephens1.
Abstract
Clinical and Translational Science Award (CTSA) recipients need to create research data marts from their clinical data warehouses, through research data networks and the use of i2b2 and SHRINE technologies. These data marts may have different data requirements and representations, thus necessitating separate extract, transform and load (ETL) processes for populating each mart. Maintaining duplicative procedural logic for each ETL process is onerous. We have created an entirely metadata-driven ETL process that can be customized for different data marts through separate configurations, each stored in an extension of i2b2's ontology database schema. We extended our previously reported, open-source Eureka! Clinical Analytics software with this capability. The same software has created i2b2 data marts for several projects, the largest being the nascent Accrual for Clinical Trials (ACT) network, for which it has loaded over 147 million facts about 1.2 million patients.
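The idea of one ETL routine customized by per-mart configuration can be illustrated with a minimal sketch. It is not Eureka!'s implementation: the `MartConfig` class, the `load_facts` function, the record fields, and all codes are hypothetical, and the configuration lives in a Python object here rather than in an extended i2b2 ontology schema. Only the output keys `patient_num`, `concept_cd`, and `start_date` borrow names from i2b2's fact table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MartConfig:
    """Hypothetical per-data-mart configuration (not Eureka!'s actual schema)."""
    name: str
    concept_map: dict  # source code -> concept code requested by this mart


def load_facts(records, config):
    """Generic ETL step: the logic is shared, only the configuration differs."""
    facts = []
    for rec in records:
        concept = config.concept_map.get(rec["code"])
        if concept is None:
            continue  # this mart's metadata does not request this kind of data
        facts.append({
            "patient_num": rec["patient_id"],
            "concept_cd": concept,
            "start_date": rec["date"],
        })
    return facts


# Made-up source records standing in for a clinical data warehouse extract.
records = [
    {"patient_id": 1, "code": "SRC:DX_DIABETES", "date": "2014-03-01"},
    {"patient_id": 1, "code": "SRC:LAB_HBA1C", "date": "2014-03-02"},
    {"patient_id": 2, "code": "SRC:DX_HYPERTENSION", "date": "2014-05-10"},
]

# Two marts with different data requirements and representations.
mart_a = MartConfig("mart_a", {
    "SRC:DX_DIABETES": "MART_A:DIABETES",
    "SRC:DX_HYPERTENSION": "MART_A:HYPERTENSION",
})
mart_b = MartConfig("mart_b", {"SRC:LAB_HBA1C": "MART_B:HBA1C"})

mart_a_facts = load_facts(records, mart_a)
mart_b_facts = load_facts(records, mart_b)
```

Adding a third mart in this sketch means writing another `MartConfig`, not another copy of `load_facts`, which is the maintenance benefit the abstract describes.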
Year: 2016 PMID: 27570667 PMCID: PMC5001768
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. Eureka! Clinical Analytics component diagram.
Figure 3. Eureka! internal data model, showing various entities and their properties (blue boxes), and references between entities (black lines).
Figure 4. Extended i2b2 metadata schema for use with Eureka!, showing four ontology tables that each have one additional column as compared with i2b2’s standard ontology tables, the standard TABLE_ACCESS table supporting concept lookup, and the Eureka!-specific EK_MODIFIER_INTERP table (see text for details).
Figure 5. Mappings from the Eureka! internal data model to facts and the dimension tables in i2b2’s star schema.
Ongoing and completed project data (2011 - present).
| Project name | # of users | Data source | Data target (and software version, where applicable) | Count of phenotypes computed | Count of patients loaded | Count of facts loaded (including phenotypes) | Project status |
|---|---|---|---|---|---|---|---|
| Hospital readmissions analysis | 11 | Emory Clinical Data Warehouse (inpatient discharges from 10/2006-3/2011) | Tab-delimited file | 7,886,868 | 149,514 | 41,031,515 [a] | Completed |
| Hospital readmissions analysis | 11 | UHC Clinical Database (inpatient discharges from 10/2006-3/2011) | Tab-delimited file | 593,428,440 | 11,794,310 | 1,685,673,683 [a] | Completed |
| Lymphoma data registry | 4 | Emory Clinical Data Warehouse | i2b2 1.7.05 | 0 | 4,870 | 13,086,071 [b] | Completed |
| Lung cancer data registry | 4 | Emory Clinical Data Warehouse | i2b2 1.7.05 | 0 | 1,554 | 310,330 [b] | Completed |
| Cardiovascular Research Grid | 34 | MIMIC-II (all encounters) | i2b2 1.7.05 | 13,196 | 32,074 | 6,446,413 [b] | Production |
| Quantitative Imaging Network | 6 | NLST data files (all patients) | Neo4j 2.2.2 | 0 | 53,452 | 374,164 [c] | Development |
| NCATS ACT | 42 | Emory Clinical Data Warehouse (all discharges 1/2012-5/2015) | i2b2 1.7.05 | 0 | 1,153,320 | 147,345,659 [b] | Beta testing |

[a] Number of non-null values in the output file.
[b] Number of rows in the i2b2 project’s observation_fact table.
[c] Number of nodes in the graph database.
Figure 6. Eureka! data flow illustrated for three ETL processes for loading data into i2b2 (ACT Network, local copy of UHC Clinical Database, and local copy of MIMIC-II Database). Ont=Ontology; CRC=Clinical Research Chart; PM=Project Management.