Guoqian Jiang, Harold R Solbrig, Christopher G Chute.
Abstract
OBJECTIVE: To develop an approach for evaluating the quality of terminological annotations on the value set (ie, enumerated value domain) components of common data elements (CDEs) in the context of clinical research, using both Unified Medical Language System (UMLS) semantic types and semantic groups.
Year: 2012 PMID: 22511016 PMCID: PMC3392855 DOI: 10.1136/amiajnl-2011-000739
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. The linkage between the value set constructs (ie, enumerated value domain, permissible value, value meaning, and enumerated conceptual domain) of a data element and the UMLS semantic groups (with an example data element, 2429490: Fatigue Symptom Prior Surgery Present Ind-3). Note that the ‘Described Value Domain’ is crossed out, as it is not the focus of this paper. The solid lines between the components denote the linkage specified in the ISO/IEC 11179 standard, and the dotted lines denote the linkage between the terminological annotations and their corresponding semantic types and groups.
Figure 2. System architecture of the semantic web infrastructure implemented for quality evaluation of common data element (CDE) value set components. caDSR, Cancer Data Standards Repository; OWL, web ontology language; RDF, resource description framework; TSV, tab separated values.
Figure 3. A portion of a common data element (CDE) example in the transformed resource description framework (RDF) format.
Figure 4. A SPARQL query example that extracts data element name, value domain name, valid value, value meaning, and meaning concepts (ie, terminological annotations by National Cancer Institute (NCI) concept codes) for a specific data element ID (ie, 2429490).
Figure 5. A SPARQL query example that extracts the semantic type and group information for a specific National Cancer Institute Thesaurus (NCIt) concept code (ie, C12392|liver). The code P106 denotes semantic type in the NCIt.
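The two-step linkage that the Figure 5 query resolves (NCIt concept code to semantic type, then semantic type to semantic group) can be illustrated with plain lookup tables. This is only a minimal sketch, not the authors' SPARQL implementation: the dictionaries contain a handful of entries copied from the tables in this article, and the names `NCIT_SEMANTIC_TYPE`, `SEMANTIC_TYPE_GROUP`, and `semantic_group_for` are hypothetical.

```python
# Illustrative lookup tables. Entries are copied from the tables in this
# article; they are NOT retrieved from the NCIt or the UMLS at run time.
NCIT_SEMANTIC_TYPE = {
    "C12392": "T023",  # Liver: Body Part, Organ, or Organ Component
    "C12366": "T024",  # Bone: Tissue
    "C12470": "T017",  # Skin: Anatomical Structure
    "C17649": "T080",  # Other: Qualitative Concept
    "C17204": "T060",  # Computed Tomography: Diagnostic Procedure
}

SEMANTIC_TYPE_GROUP = {
    "T017": "ANAT",  # Anatomy
    "T023": "ANAT",
    "T024": "ANAT",
    "T060": "PROC",  # Procedures
    "T080": "CONC",  # Concepts & Ideas
}


def semantic_group_for(ncit_code):
    """Return (type code, group code) for an NCIt code, or None if unknown."""
    type_code = NCIT_SEMANTIC_TYPE.get(ncit_code)
    if type_code is None:
        return None
    return type_code, SEMANTIC_TYPE_GROUP[type_code]


print(semantic_group_for("C12392"))  # ('T023', 'ANAT')
```

In the infrastructure described in Figure 2, the same information would come from querying the RDF data rather than from hard-coded dictionaries.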
Permissible values of a data element ‘Lesion Measurable Evaluation Anatomic Site’ from the selected data element samples for human review
| Valid value | Value meaning | NCIt code (meaning concept) | NCIt concept label | Semantic type | Type code | Semantic group | Group code |
| Liver | Liver | C12392 | Liver | Body Part, Organ, or Organ Component | T023 | Anatomy | ANAT |
| Brain | Brain | C12439 | Brain | Body Part, Organ, or Organ Component | T023 | Anatomy | ANAT |
| Lung | Lung | C12468 | Lung | Body Part, Organ, or Organ Component | T023 | Anatomy | ANAT |
| Bone | Bone | C12366 | Bone | Tissue | T024 | Anatomy | ANAT |
| Skin | Skin | C12470 | Skin | Anatomical Structure | T017 | Anatomy | ANAT |
| Breast | Breast | C12971 | Breast | Body Part, Organ, or Organ Component | T023 | Anatomy | ANAT |
| Other | Other | C17649 | Other | Qualitative Concept | T080 | Concepts & Ideas | CONC |
Each row represents a permissible value, its asserted NCIt concept code, the linkage to the semantic type and code, and the linkage to its corresponding semantic group name and code. Data element: Lesion Measurable Evaluation Anatomic Site (2003735). Value domain: Lesion Anatomic Site (2017124).
NCIt, National Cancer Institute Thesaurus.
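One plausible way to derive a dominant semantic group from a table like the one above is a simple majority count over the group codes of the permissible values, with every annotation outside the majority group set aside for human review. The sketch below assumes exactly that reading; the function names `dominant_group` and `outliers` are hypothetical, and the article's own evaluation ran on a semantic web infrastructure rather than this code.

```python
from collections import Counter

# (NCIt code, concept label, semantic group code) for the rows of the
# table above (data element 2003735).
rows = [
    ("C12392", "Liver", "ANAT"),
    ("C12439", "Brain", "ANAT"),
    ("C12468", "Lung", "ANAT"),
    ("C12366", "Bone", "ANAT"),
    ("C12470", "Skin", "ANAT"),
    ("C12971", "Breast", "ANAT"),
    ("C17649", "Other", "CONC"),
]


def dominant_group(rows):
    """Return the most frequent semantic group code.

    On a tie, Counter.most_common keeps the group that was seen first.
    """
    counts = Counter(group for _, _, group in rows)
    return counts.most_common(1)[0][0]


def outliers(rows):
    """Return the rows whose group differs from the dominant group."""
    dominant = dominant_group(rows)
    return [row for row in rows if row[2] != dominant]


print(dominant_group(rows))  # ANAT
print(outliers(rows))        # [('C17649', 'Other', 'CONC')]
```

A flagged row is only a candidate: a generic value such as "Other" falls outside the anatomy group without necessarily being a wrong annotation, which is why the flagged codes still go to human review.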
Summary of evaluation results for 17 data elements including the dominant semantic group of permissible values and inconsistent code examples for each CDE
| Data element ID | Data element name | Number of asserted NCIt codes | Dominant semantic group | Other semantic group | Inconsistent asserted NCIt code examples |
| 3179024 | Malignant Neoplasm Measurable Disease Evaluation Method Clinical Trial Eligibility Criteria Type | 6 | PROC | PHEN | C17262|X-Ray |
| 2663176 | Template (Object Class) Name Prefix Code | 13 | CONC | LIVB | C25174|Father |
| 2188290 | Disease Response Site Status | 9 | CONC | DISO | C35571|Progressive Disease |
| 2672955 | Preparative Regimen Other Finished Pharmaceutical Product Planned Administered Dose Unit of Measure Name | 49 | CONC | PHYS, DISO, ACTI | C25379|Course, C28245|Inhalation, C67447|Session |
| 2673966 | Glioblastoma Pathology Or Primary Neoplasm Metastatic Neoplasm Status Tumor Status Text Name | 8 | CONC | DISO | C14174|Metastatic |
| 62339 | Gynecologic Malignant Neoplasm Progression Anatomic Site | 13 | ANAT | DISO | C3331|Pleural Effusion, C2885|Ascites |
| 2785775 | Hematopoietic Stem Cell Graft Arrival Facility Shipping Environment Type | 10 | CONC | ANAT, DEVI, CHEM, OBJC, ACTI | Multiple codes |
| 2783910 | New Specimen Order Object Specimen Type Specimen Type Collection Text Type | 64 | ANAT | CONC, OBJC, ACTI, CHEM, PHYS | Multiple codes |
| 3109755 | National Surgical Adjuvant Breast and Bowel Project Laboratory Procedure Outcome Type | 4 | CHEM | PROC, ANAT | C51951|Platelet Count |
| 2963645 | Bone Sarcoma Or Soft Tissue Sarcoma Disease or Disorder Primary Occurrence Anatomic Site Type | 47 | ANAT | CONC | C63921|Radius, C25253|Multifocal |
| 3197150 | Participant Personal Medical History Cardiac Surgery Type | 4 | PROC | OCCU, ANAT | C17173|Surgery |
| 2784056 | Kidney Biopsy Pathology Surgical Procedure Specimen Procedure | 53 | PROC | OCCU, CONC, ANAT, ACTI, OBJC | C17173|Surgery, C25436|Block |
| 2919627 | Molecular Specimen Class Specimen Class Text Type | 4 | ANAT | CONC, OBJC | C25574|Molecular, C25278|Fluid |
| 3162656 | Study Agent Dosage Unit of Measure Code | 22 | CONC | CHEM, ACTI | C25158|Capsule, C25397|Application |
| 2695092 | Laboratory Procedure IgG Or Total Cytomegalovirus Antibody Laboratory Finding Result | 38 | CONC | DISO, PHEN, PROC, LIVB, ACTI | C38757|Negative Finding, C38758|Positive Finding |
| 3012793 | First Electrocardiogram Right Bundle Branch Block Status | 5 | DISO | CONC, LIVB | C17734|At-Risk Population |
| 2755993 | Composition Element Type Composing Element Type | 16 | CHEM | OBJC, LIVB, CONC | Multiple codes |
CDE, common data element; NCIt, National Cancer Institute Thesaurus.
Permissible values of the first example from the data elements identified with inconsistent codes (highlighted in bold and italic)
| Valid value | Value meaning | NCIt code (meaning concept) | NCIt concept label | Semantic type | Type code | Semantic group | Group code |
| Palpatation | Palpation | C16950 | Palpation | Diagnostic Procedure | T060 | Procedures | PROC |
| CT | Computed Tomography | C17204 | Computed Tomography | Diagnostic Procedure | T060 | Procedures | PROC |
| Chest x-ray | Chest Radiography | C38103 | Chest Radiography | Diagnostic Procedure | T060 | Procedures | PROC |
| Spiral CT | Spiral CT | C20645 | Spiral CT | Diagnostic Procedure | T060 | Procedures | PROC |
| MRI | Magnetic Resonance Imaging | C16809 | Magnetic Resonance Imaging | Diagnostic Procedure | T060 | Procedures | PROC |
Data element: Malignant Neoplasm Measurable Disease Evaluation Method Clinical Trial Eligibility Criteria Type (3179024). Value domain: Malignant Neoplasm Measurable Disease Evaluation Method Type (3179022).
NCIt, National Cancer Institute Thesaurus.
Permissible values of the second example from the data elements identified with inconsistent code (highlighted in bold and italic)
| Valid value | Value meaning | NCIt code (meaning concept) | NCIt concept label | Semantic type | Type code | Semantic group | Group code |
| Residual | Residual | C37895 | Residual | Qualitative Concept | T080 | Concepts & Ideas | CONC |
| Recurrent | RECURRENT | C14173 | Recurrent | Qualitative Concept | T080 | Concepts & Ideas | CONC |
| Invasive | Invasive | C14159 | Invasive | Qualitative Concept | T080 | Concepts & Ideas | CONC |
| Primary | PRIMARY | C25251 | Primary | Qualitative Concept | T080 | Concepts & Ideas | CONC |
| Malignant | MALIGNANT | C14143 | Malignant | Qualitative Concept | T080 | Concepts & Ideas | CONC |
| N/A | Not Applicable | C48660 | Not Applicable | Qualitative Concept | T080 | Concepts & Ideas | CONC |
| Benign | BENIGN | C14172 | Benign | Qualitative Concept | T080 | Concepts & Ideas | CONC |
Data element: Glioblastoma Pathology Or Primary Neoplasm Metastatic Neoplasm Status Tumor Status Text Name (2673966). Value domain: Tumor Status Text Name (2231026).
NCIt, National Cancer Institute Thesaurus.