Naveen Ashish, Arthur W Toga.
Abstract
This paper presents a system for declaratively transforming medical subjects' data into a common data model representation. Our work is part of the "GAAIN" project on Alzheimer's disease data federation across multiple data providers. We present a general-purpose data transformation system that we have developed by leveraging the existing state of the art in data integration and query rewriting. In this work, we have further extended the current technology with new formalisms that facilitate expressing a broader range of data transformation tasks, and with new execution methodologies that ensure efficient data transformation for disease datasets.
Keywords: Alzheimer's disease datasets; data integration; data mapping; query rewriting
Year: 2015 PMID: 25750622 PMCID: PMC4335467 DOI: 10.3389/fninf.2015.00001
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
GAAIN common data model.
| GAAIN.SUBJECT(SUBJECT_ID,AGE,SEX,RACE,ETHNIC,COUNTRY) |
| GAAIN.HEALTH(SUBJECT_ID,DIAGNOSIS_M0,DIAGNOSIS_M6,…,DIAGNOSIS_M48,APOE) |
| GAAIN.COGNITIVE(SUBJECT_ID,MMSE_M0,MMSE_M6,…,CDR_M0,CDR_M6,…) |
Figure 1. (A) Data transformation. (B) Transformer.
Uniform staging representation.
| STAGING_ADNI_PATIENT_RACE(SUBJECT_ID,VCODE,RACE) |
| STAGING_ADNI_PATIENT_ETHNIC(SUBJECT_ID,VCODE,ETHNIC) |
| STAGING_ADNI_PATIENT_MMSE(SUBJECT_ID,VCODE,MMSE) |
Domain model.
| ADNI_RACE(SUBJECT_ID,VCODE,RACE) | SELECT SUBJECT_ID, SEX FROM GAAIN.SUBJECT |
| ADNI_ETHNIC(SUBJECT_ID,VCODE,ETHNIC) | |
| ADNI_GENDER(SUBJECT_ID,VCODE,GENDER) | |
| GAAIN.SUBJECT(SUBJECT_ID,AGE,SEX,RACE,ETHNIC,COUNTRY) | |
| GAAIN.SUBJECT(SUBJECT_ID,“M”) | SELECT SUBJECT, PTGENDER as “M” FROM ADNI_PTGENDER |
| ADNI_GENDER(SUBJECT_ID,VCODE,GENDER) ∧ (GENDER=1) | WHERE PTGENDER=1 |
| | UNION |
| GAAIN.SUBJECT(SUBJECT_ID,“F”) | SELECT SUBJECT, PTGENDER as “F” FROM ADNI_PTGENDER |
| ADNI_GENDER(SUBJECT_ID,VCODE,GENDER) ∧ (GENDER=2) | WHERE PTGENDER=2 |
| | UNION |
| GAAIN.SUBJECT(SUBJECT_ID,“U”) | SELECT SUBJECT, PTGENDER as “U” FROM ADNI_PTGENDER |
| ADNI_GENDER(SUBJECT_ID,VCODE,GENDER) ∧ (GENDER=9) | WHERE PTGENDER=9 |
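The rows above pair three gender rules (source codes 1, 2, and 9 mapped to "M", "F", and "U") with a rewritten query made of three SELECT branches joined by UNION. The following is a minimal sketch of how such a query could be assembled from the rules. The names `GENDER_RULES` and `unfold_gender_query` are hypothetical and do not come from the paper, and the sketch only reproduces the textual shape of the query in the table; it is not the system's actual rewriter.

```python
# Hypothetical encoding of the three gender rules from the table:
# (source GENDER code, target SEX literal). Names are illustrative only.
GENDER_RULES = [(1, "M"), (2, "F"), (9, "U")]


def unfold_gender_query(rules, source_table="ADNI_PTGENDER", code_column="PTGENDER"):
    """Build one SELECT branch per rule and join the branches with UNION.

    The branch text mirrors the rewritten query shown in the table; the
    table and column names are taken from that query.
    """
    branches = [
        f'SELECT SUBJECT, {code_column} as "{label}" '
        f"FROM {source_table} WHERE {code_column}={code}"
        for code, label in rules
    ]
    return "\nUNION\n".join(branches)


if __name__ == "__main__":
    print(unfold_gender_query(GENDER_RULES))
```

Adding a further code value under this sketch would only require one more entry in the rule list, which is the appeal of keeping the mapping declarative.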
Transformation rules.
Iterative rules.
| MMSE_M0,MMSE_M6,MMSE_M12,MMSE_M18,MMSE_M24,MMSE_M36,MMSE_M48] |
Rule sets.
| MMSE_M0,MMSE_M6,MMSE_M12,MMSE_M18,MMSE_M24,MMSE_M36,MMSE_M48] | MERGE INTO GAAIN.ASSESSMENT(SUBJECT_ID,MMSE_M0) |
| | KEY(SUBJECT_ID) |
| | SELECT SUBJECT_ID,MMSE from ADNI_MMSE WHERE VISIT=1 |
| | …. |
| | MERGE INTO GAAIN.ASSESSMENT(SUBJECT_ID,MMSE_M48) |
| | KEY(SUBJECT_ID) |
| | SELECT SUBJECT_ID,MMSE from ADNI_MMSE WHERE VISIT=7 |
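The rule set above expands a single list of target columns into one MERGE statement per column. A minimal sketch of that expansion follows. The function name `expand_rule_set` and its parameters are hypothetical. The table shows only the first statement (VISIT=1, MMSE_M0) and the last (VISIT=7, MMSE_M48), so the sketch assumes that the intermediate visits 2 through 6 map to the intermediate columns in list order; that mapping is an assumption, not something the table states.

```python
# Target columns as listed in the rule set.
MMSE_COLUMNS = [
    "MMSE_M0", "MMSE_M6", "MMSE_M12", "MMSE_M18",
    "MMSE_M24", "MMSE_M36", "MMSE_M48",
]


def expand_rule_set(target, key, columns, source, value, first_visit=1):
    """Emit one MERGE statement per target column.

    Assumption: the i-th column corresponds to VISIT = first_visit + i.
    Only the first and last pairs are confirmed by the table.
    """
    statements = []
    for visit, column in enumerate(columns, start=first_visit):
        statements.append(
            f"MERGE INTO {target}({key},{column})\n"
            f"KEY({key})\n"
            f"SELECT {key},{value} from {source} WHERE VISIT={visit}"
        )
    return statements


if __name__ == "__main__":
    script = "\n\n".join(
        expand_rule_set("GAAIN.ASSESSMENT", "SUBJECT_ID", MMSE_COLUMNS, "ADNI_MMSE", "MMSE")
    )
    print(script)
```

Writing the rule once and generating the seven statements keeps the per-visit logic in one place, which appears to be the motivation for grouping rules into sets.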
Transformation script.
| RuleSet RS=<T,K,R> |
| Transformation SQL Script |
| for each rule r ∈ R |
| ER |
| M |
| rewriter |
| for each rule r ∈ ER |
| H |
| V |
| RWQ |
| sqlQ |
| TABLE+T+VALUES(K,V), key(K)+RWQ |
| finalScript |
| return finalScript |
Congruent rules.
| G.SEX(SUB_ID,SEX) | G.SEX(SUB_ID,SEX) |
| G.RACE(SUB_ID,RACE) | G.RACE(SUB_ID,RACE) |
Effort optimization.
| Baseline | 21 h | 20 h |
| Iterative rules | 16 h | 17 h |
| Uniform staging + congruent rules | 15 h | 5 h |
Figure 2. Merge vs. join execution times.