Isaac Kunz, Ming-Chin Lin, Lewis Frey.
Abstract
BACKGROUND: This paper proposes that interoperability across biomedical databases can be improved by using a repository of Common Data Elements (CDEs), UML model class-attributes, and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can be used to map models effectively and to enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs so that their systems are interoperable when developed within the caBIG framework or other frameworks that use metadata repositories.
Year: 2009 PMID: 19208192 PMCID: PMC2646244 DOI: 10.1186/1471-2105-10-S2-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. caBIG™ UML model. This is an example of a portion of a UML model of a system available in caBIG™. The model describes the classes and attributes of the system, along with information about their functions and relationships. Note that the class Race has the attributes id (a string identifier of a race) and raceDesc (a string description of a race). This Race class is mapped to a CDE within caBIG™ to give it a semantic definition and to allow reuse of this type of data element.
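Lexical matching needs a UML class-attribute pair such as Race.raceDesc turned into plain words first. The following is a minimal Python sketch under an assumption not stated in the source: that class and attribute names are split at camelCase and underscore boundaries, lowercased, and pooled into one sorted word bag. The names `split_identifier` and `class_attribute_tokens` are hypothetical. The sorted, lowercase form mirrors how the UML entries in the difficult-matches table appear (e.g. "csm id user user").

```python
import re
from typing import List


def split_identifier(name: str) -> List[str]:
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase words."""
    # Insert a space wherever a capital letter follows a lowercase letter or digit,
    # then split on whitespace and underscores.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return [word.lower() for word in re.split(r"[\s_]+", spaced) if word]


def class_attribute_tokens(class_name: str, attribute: str) -> List[str]:
    """Hypothetical normalization: pool class and attribute words into one sorted bag."""
    return sorted(split_identifier(class_name) + split_identifier(attribute))


print(class_attribute_tokens("Race", "raceDesc"))
print(class_attribute_tokens("Race", "id"))
```

With a hypothetical class SampleType and attribute sampleTypeId, `class_attribute_tokens` returns id, sample, sample, type, type, which has the same shape as the ProtLIMS entry "id sample sample type type".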
Figure 2. Visual representation of several caBIG™ UML models. Examples of several UML models available in caBIG™ for reuse.
Figure 3. UML and ISO/IEC 11179. The mapping of UML elements to the ISO/IEC 11179 Common Data Elements (CDEs) within the caDSR. A UML Class maps to an Object Class, a UML Attribute to a Property, and a UML Data Type to a Value Domain. The Object Class and Property components of the Data Element Concept are then mapped to terminology concepts stored in the Enterprise Vocabulary Services (EVS).
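The class-to-object-class and attribute-to-property correspondence can be pictured as a small data structure. This is an illustrative Python sketch, not the caDSR or caCORE SDK API: `UmlAttribute`, `DataElementConcept`, `to_data_element_concept`, and the concept codes are hypothetical, and a plain dictionary stands in for an EVS terminology lookup. Data types are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class UmlAttribute:
    """A UML class-attribute pair, e.g. Race.raceDesc (hypothetical type)."""
    class_name: str
    attribute_name: str


@dataclass(frozen=True)
class DataElementConcept:
    """ISO/IEC 11179-style concept: an Object Class plus a Property (hypothetical type)."""
    object_class: str
    property_name: str
    object_class_concepts: Tuple[str, ...] = ()
    property_concepts: Tuple[str, ...] = ()


def to_data_element_concept(
    attr: UmlAttribute, concept_lookup: Dict[str, Tuple[str, ...]]
) -> DataElementConcept:
    """UML class -> Object Class, UML attribute -> Property, each tagged with terminology concepts."""
    return DataElementConcept(
        object_class=attr.class_name,
        property_name=attr.attribute_name,
        object_class_concepts=concept_lookup.get(attr.class_name, ()),
        property_concepts=concept_lookup.get(attr.attribute_name, ()),
    )


# A plain dict standing in for an EVS lookup; the concept codes are made-up placeholders.
lookup = {"Race": ("CONCEPT_RACE",), "raceDesc": ("CONCEPT_DESCRIPTION",)}
dec = to_data_element_concept(UmlAttribute("Race", "raceDesc"), lookup)
print(dec)
```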
caBIG™ project/application sizes (66 UML projects). caBIG™-enabled projects/models used in this research, with their corresponding UML element sizes (number of class-attribute pairs).

| Project | Class-attribute pairs |
|---|---|
| Bioconductor 1 | 75 |
| BiospecimenCoreResource 1 | 286 |
| BRIDG 1 | 343 |
| C3PR 1.1 | 58 |
| C3PR 2 | 185 |
| CaAERS 1 | 308 |
| CaArray 2 | 440 |
| CaArray_1.1 | 318 |
| caBIO 4 | 302 |
| CaElmir 1 | 174 |
| caFE Server 2 | 80 |
| caGrid 1 | 14 |
| caIntegrator 2 | 271 |
| caIntegrator 2.1 | 328 |
| Caisis 3.5 | 67 |
| caNano 1 | 150 |
| caNanoLab | 621 |
| Cancer Models Database 2.0 | 242 |
| Cancer Models Database 2.1 | 272 |
| Cancer Molecular Pages 1 | 152 |
| CAP Cancer Checklists 1 | 194 |
| caTIES 1.0 | 38 |
| caTIES 2.0 | 219 |
| caTISSUE CAE 1.2 | 284 |
| caTISSUE Core 1 | 287 |
| caTissue_Core 1.1 | 327 |
| caTissue_Core_1_2 | 329 |
| caTissue_Core_caArray 1 | 329 |
| caTRIP Annotation Engine 1 | 63 |
| CaTRIP Tumor Registry 1 | 114 |
| CDC NCPHI Proof of Concept .1 | 9 |
| CGWB 1 | 91 |
| ChemBank 1 | 19 |
| Clinical Trials Lab Model 1 | 84 |
| Clinical Trials Object Data System (CTODS) .53 | 434 |
| CoCaNUT 1 | 244 |
| CTMS Metadata Project 1 | 51 |
| DemoService 1 | 4 |
| DSD 1 | 31 |
| GeneConnect 1 | 59 |
| GenePattern 1 | 88 |
| Generic Image 1 | 39 |
| Genomic Identifiers 1 | 12 |
| geworkbench 1 | 80 |
| GoMiner 1 | 69 |
| Grid-enablement of Protein Information Resource (PIR) 1.1 | 183 |
| Grid-enablement of Protein Information Resource (PIR) 1.2 | 200 |
| LabKey CPAS Client API 2.1 | 364 |
| LexBIG 2.2 | 206 |
| MicroArray Gene Expression Object Model (Mage-OM) 1 | 140 |
| NCI-60 Drug 1 | 124 |
| NCI-60 SKY 1 | 109 |
| NCIA_Model 3 | 110 |
| NHLBI 1 | 772 |
| Organism Identification 1 | 10 |
| PathwayInteractionDatabase 1 | 59 |
| Patient Study Calendar 2 | 67 |
| Potential CDEs for Reuse 1 | 185 |
| ProteomicsLIMS 1 | 200 |
| Reactome Database Sharing 1 | 83 |
| RProteomics 1 | 40 |
| Seed 1 | 17 |
| SNP500Cancer 1 | 29 |
| TobaccoInformaticsGrid 1 | 15 |
| Training Models 1 | 37 |
| Transcription Annotation Prioritization and Screening System 1 | 92 |
Figure 4. Per-project Dice vs. Dynamic. Total percentage of "Gold Standard" matches at each cumulative rank, per project.
Figure 5. Combined-project Dice vs. Dynamic. Total percentage of "Gold Standard" matches at each cumulative rank for all "RELEASED" CDEs.
Per-project percentages. Percentage of "Gold Standard" mappings returned correctly within each cumulative ranking cutoff. For example, Dice returned 85.1% of the "Gold Standard" mappings in the top 5 results.
| Dice | 58.4 | 85.1 | 91.8 | 96.6 | 98.3 |
| Dynamic | 56.3 | 82.6 | 89.5 | 95.4 | 97.8 |
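The cumulative-rank percentages above count a "Gold Standard" mapping as found at cutoff k if the correct CDE appears anywhere in the top k candidates, so each row can only stay level or grow from left to right. A minimal Python sketch of that metric follows; the function name, data layout, and CDE identifiers are hypothetical, and only the top-5 cutoff is named in the caption, so the cutoffs in the example are illustrative.

```python
from typing import Dict, List


def cumulative_hit_rates(
    ranked: Dict[str, List[str]], gold: Dict[str, str], cutoffs: List[int]
) -> Dict[int, float]:
    """For each cutoff k, the percentage of gold-standard mappings whose correct CDE is in the top k."""
    rates = {}
    for k in cutoffs:
        hits = sum(1 for item, correct in gold.items() if correct in ranked.get(item, [])[:k])
        rates[k] = 100.0 * hits / len(gold)
    return rates


# Hypothetical ranked candidate lists for four UML class-attribute pairs.
ranked = {
    "a": ["cde1", "cde2", "cde3"],                  # correct CDE at rank 1
    "b": ["cde5", "cde4"],                          # correct CDE at rank 2
    "c": ["cde9", "cde8", "cde7", "cde6", "cde0"],  # correct CDE at rank 5
    "d": ["cde2"],                                  # correct CDE never returned
}
gold = {"a": "cde1", "b": "cde4", "c": "cde0", "d": "cde3"}
print(cumulative_hit_rates(ranked, gold, [1, 2, 5]))
```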
Combined-project percentages. Percentage of "Gold Standard" mappings returned correctly within each cumulative ranking cutoff. For example, Dice returned 72.1% of the "Gold Standard" mappings in the top 5 results.
| Dice | 45.1 | 72.1 | 79.5 | 86.1 | 90.2 |
| Dynamic | 47.6 | 70.9 | 78.6 | 85.3 | 89.6 |
Figure 6. Dice per project. This graph shows 20 of the 66 projects mapped to a restricted set of CDEs using the Dice algorithm. The restriction limits each mapping to the corresponding CDEs indicated in caDSR.
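The difference between the restricted and the combined results appears to come down to which CDEs are allowed as candidates. The sketch below, in Python, ranks candidate CDEs by a word-set Dice coefficient and optionally restricts them to a given set of CDE identifiers, as a stand-in for the caDSR-indicated restriction. The identifiers, token sets, and function names are hypothetical, and the paper's exact tokenization and tie-breaking are not described here.

```python
from typing import Dict, Iterable, List, Optional, Set


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient on word sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_cdes(
    query: Iterable[str],
    cdes: Dict[str, Set[str]],
    allowed_ids: Optional[Set[str]] = None,
    top_k: int = 5,
) -> List[str]:
    """Rank candidate CDEs by Dice score; optionally keep only CDEs in allowed_ids."""
    q = set(query)
    candidates = {
        cid: toks for cid, toks in cdes.items() if allowed_ids is None or cid in allowed_ids
    }
    # Highest score first; ties broken by CDE id so the output is deterministic.
    ordered = sorted(candidates, key=lambda cid: (-dice(q, candidates[cid]), cid))
    return ordered[:top_k]


# Hypothetical CDE token sets.
cdes = {
    "CDE-1": {"identifier", "site"},
    "CDE-2": {"address", "identifier", "site"},
    "CDE-3": {"name", "site"},
}
print(rank_cdes({"identifier", "site"}, cdes))                                  # combined
print(rank_cdes({"identifier", "site"}, cdes, allowed_ids={"CDE-2", "CDE-3"}))  # restricted
```

Restricting the candidate pool can only remove competitors, which is one plausible reason restricted mappings would score higher than mappings against every released CDE.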
Dice results for caTissue_Core_caArray and ProteomicsLIMS. Percentage of "Gold Standard" mappings returned correctly within each cumulative ranking cutoff for caTissue_Core_caArray and ProteomicsLIMS. Differences in mapping scores reflect different degrees of alignment between UML class-attributes and CDE class-properties.
| caTissue_Core_caArray | 77.2 | 98.5 | 99.7 | 99.7 |
| ProteomicsLIMS | 62.0 | 91.5 | 95.0 | 97.5 |
Difficult matches. caTissue and ProtLIMS UML class-attribute pairs compared with CDE class-property pairs, for cases where the Dice algorithm scored lower than expected. The algorithms tend to perform worse when abbreviations and synonyms appear. For example, the ProtLIMS UML model uses gel2d to represent a 2-dimensional electrophoresis gel.

| caTissue UML class-attribute | caTissue CDE class-property | ProtLIMS UML class-attribute | ProtLIMS CDE class-property |
|---|---|---|---|
| distribute id item | distribution identifier specimen | label sample | specimen tracer |
| biohazard id | biohazardous identifier substance | identification sample | name specimen |
| csm id user user | common identifier module security user user | gel2d id sample | 2 dimensional electrophoresis gel identifier |
| id site | identifier site | id plate plate sample sample | identifier microplate |
| check check event id out parameter | identifier object parameter present remove status | gel2d identification | 2 dimensional electrophoresis gel name |
| numb participant security social | participant ssn | id log log sample sample | identifier log quantity specimen |
| container id storage | identifier storage unit | file file id lim lim | file identifier information laboratory management system |
| audit event id user | audit event login name | id sample sample type type | identifier specimen type |
| date start user | begin date user | id raw sample sample | identifier raw specimen |
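The caption attributes weak scores to abbreviations and synonyms. A word-set Dice coefficient, 2|A ∩ B| / (|A| + |B|), makes this concrete on three of the rows above. This Python sketch assumes whitespace tokenization and a hand-written abbreviation dictionary; neither is taken from the paper, and the paper's Dice variant may differ (for example, by counting repeated words or comparing character n-grams).

```python
from typing import Dict, List, Optional, Set


def tokens(text: str, expansions: Optional[Dict[str, List[str]]] = None) -> Set[str]:
    """Lowercase whitespace tokens, replacing any abbreviation found in `expansions`."""
    expansions = expansions or {}
    out: Set[str] = set()
    for word in text.lower().split():
        out.update(expansions.get(word, [word]))
    return out


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient on word sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical abbreviation dictionary (not from the paper).
ABBREVIATIONS = {
    "id": ["identifier"],
    "ssn": ["social", "security", "number"],
    "gel2d": ["2", "dimensional", "electrophoresis", "gel"],
}

# (UML class-attribute, CDE class-property) pairs copied from the table.
pairs = [
    ("id site", "identifier site"),
    ("numb participant security social", "participant ssn"),
    ("gel2d id sample", "2 dimensional electrophoresis gel identifier"),
]
for uml, cde in pairs:
    raw = dice(tokens(uml), tokens(cde))
    expanded = dice(tokens(uml, ABBREVIATIONS), tokens(cde, ABBREVIATIONS))
    print(f"{uml:<40} raw={raw:.2f} expanded={expanded:.2f}")
```

Expanding id and gel2d lifts the third pair from 0.00 to about 0.91, while "numb" versus "number" still fails to match in the second pair, which is the kind of residual mismatch a larger synonym list would have to cover.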