Alex Butler, Wei Wei, Chi Yuan, Tian Kang, Yuqi Si, Chunhua Weng.
Abstract
Much effort has been devoted to leveraging EHR data for matching patients to clinical trials. However, EHRs may not contain all data elements important for clinical research eligibility screening. To better design research-friendly EHRs, an important step is to identify data elements that are frequently used for eligibility screening but not yet available in EHRs. This study fills this knowledge gap. Using the Alzheimer's disease domain as an example, we performed text mining on the eligibility criteria text in ClinicalTrials.gov to identify frequently used eligibility criteria concepts. We compared them to the EHR data elements of a cohort of Alzheimer's disease patients to assess the data gap, using the OMOP Common Data Model to standardize the representations of both criteria concepts and EHR data elements. We identified the most common SNOMED CT concepts used in Alzheimer's disease trials and found that 40% of common eligibility criteria concepts were not even defined in the concept space of the EHR dataset for a cohort of Alzheimer's disease patients, indicating that a significant data gap may impede EHR-based eligibility screening. The results of this study can be useful for designing targeted research data collection forms to help fill the data gap in the EHR.
Year: 2018 PMID: 29888090 PMCID: PMC5961795
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 2. The eight-step workflow of this study.
Figure 3. The process of deriving SNOMED CT terms from clinical trial eligibility criteria. 42,131 clinical entities were extracted from the eligibility criteria of 1,587 clinical trials. A simplified list of 4,260 clinical entities was generated following manual review and filtration, and this list was mapped first to 3,294 UMLS concepts, and then to 1,991 SNOMED CT variables, of which 304 variables occur in more than 1% of all trials (i.e., 15 trials).
Manual revision of clinical entities.
| Types of Revision | Example | Times |
|---|---|---|
| Formatting; Typo | delerium -> delirium | 207 |
| Formatting; Plural | cancers -> cancer | 253 |
| Formatting; removal of non-informative words | heart rate measurement -> heart rate | 364 |
| Formatting; removal of abbreviations | absolute neutrophil count (ANC) -> absolute neutrophil count | 1768 |
| Simplification | asthmatic conditions -> asthma | 573 |
| Breaking down long phrases to logically-connected single phrases | basal or squamous cell carcinoma -> basal cell carcinoma or squamous cell carcinoma | 445 |
The most commonly adopted eligibility criteria variables and their prevalence in AD trials (the last column with column header as “#” indicates the number of parent concepts)
| SNOMED-CT Concept Representation for Commonly Adopted Eligibility Variables | SNOMED ID | Prevalence | Semantic type | Level | Parent SNOMED ID | # |
|---|---|---|---|---|---|---|
| Clinical finding | 404684003 | 97.09% | finding | 1 | 138875005 | 1 |
| Disease | 64572001 | 94.25% | disorder | 2 | 404684003 | 1 |
| Mental disorder | 74732009 | 82.21% | disorder | 3 | 64572001 | 1 |
| Disorder of brain | 81308009 | 79.50% | disorder | 3 | 64572001 | 1 |
| Organic mental disorder | 111479008 | 74.74% | disorder | 4 | 74732009, 81308009 | 2 |
| Dementia | 52448006 | 74.60% | disorder | 5 | 111479008 | 1 |
| Cerebral degeneration presenting primarily with dementia | 279982005 | 64.62% | disorder | 3 | 64572001 | 1 |
| Clinical history and observation findings | 250171008 | 64.55% | finding | 2 | 404684003 | 1 |
| Alzheimer’s disease | 26929004 | 64.29% | disorder | 6 | 52448006, 279982005 | 2 |
| Staging and scales | 254291000 | 60.65% | staging scale | 1 | 138875005 | 1 |
| Assessment scales | 273249006 | 60.65% | assessment scale | 2 | 254291000 | 1 |
| Procedure | 71388002 | 58.33% | procedure | 1 | 138875005 | 1 |
| Observable entity | 363787002 | 51.19% | observable entity | 1 | 138875005 | 1 |
| Mini-mental state examination | 273617000 | 46.63% | assessment scale | 3 | 273249006 | 1 |
| Qualifier value | 362981000 | 45.24% | qualifier value | 1 | 138875005 | 1 |
| General finding of observation of patient | 118222006 | 41.14% | finding | 3 | 250171008 | 1 |
| Presenile dementia | 12348006 | 39.62% | disorder | 6 | 52448006 | 1 |
| Disorder of cardiovascular system | 49601007 | 39.55% | disorder | 3 | 64572001 | 1 |
| Psychological finding | 116367006 | 38.96% | finding | 3 | 250171008 | 1 |
| Mental state, behavior and/or psychosocial function finding | 384821006 | 38.96% | finding | 4 | 116367006 | 1 |
| Disorder of nervous system | 118940003 | 35.78% | disorder | 3 | 64572001 | 1 |
| Current chronological age | 424144002 | 34.06% | observable entity | 3 | 105727008 | 1 |
| Age AND/OR growth period | 105727008 | 34.06% | observable entity | 2 | 363787002 | 1 |
| Disorder of blood vessel | 27550009 | 33.33% | disorder | 4 | 49601007 | 1 |
| Evaluation procedure | 386053000 | 33.33% | procedure | 2 | 71388002 | 1 |
| Disorder of body system | 362965005 | 32.41% | disorder | 3 | 64572001 | 1 |
| Cerebrovascular disease | 62914000 | 32.28% | disorder | 5 | 27550009 | 1 |
| Magnetic resonance imaging | 113091000 | 31.88% | procedure | 2 | 71388002 | 1 |
| Disorder by body site | 123946008 | 30.16% | disorder | 3 | 64572001 | 1 |
| Procedure by method | 128927009 | 25.79% | procedure | 2 | 71388002 | 1 |
| Mood disorder | 46206005 | 25.73% | disorder | 4 | 74732009 | 1 |
| Substance abuse | 66214007 | 25.66% | disorder | 3 | 64572001 | 1 |
| Descriptor | 272099008 | 24.80% | qualifier value | 2 | 362981000 | 1 |
| Cerebrovascular accident | 230690007 | 24.54% | disorder | 6 | 62914000 | 1 |
| Global assessment of functioning -1993 Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition adaptation | 284061009 | 23.94% | assessment scale | 3 | 273249006 | 1 |
| Systemic disease | 56019007 | 23.35% | finding | 4 | 118222006 | 1 |
| General body state finding | 82832008 | 22.49% | finding | 4 | 118222006 | 1 |
| Impaired cognition | 386806002 | 21.43% | finding | 5 | 384821006 | 1 |
| System disorder of the nervous system | 230226000 | 21.16% | disorder | 4 | 118940003 | 1 |
| Movement disorder | 60342002 | 21.16% | disorder | 5 | 230226000 | 1 |
| Extrapyramidal disease | 76349003 | 21.16% | disorder | 6 | 60342002 | 1 |
| Disorder of head | 118934005 | 21.03% | disorder | 4 | 123946008 | 1 |
| Depressive disorder | 35489007 | 21.03% | disorder | 5 | 46206005 | 1 |
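
The level column in the table above can be reproduced from the parent links alone. The following is a minimal sketch, not the authors' procedure: it assumes the root concept 138875005 sits at level 0 and that a concept with several parents takes the longest chain to the root. The source does not state this rule, but it is consistent with the rows shown (for example, Alzheimer's disease has parents at levels 5 and 3 and is listed at level 6). The names `PARENTS` and `level` are hypothetical.

```python
from functools import lru_cache

ROOT = 138875005  # parent ID of the level-1 rows; assumed here to be level 0

# Parent links copied from a subset of the table rows (child -> parents).
PARENTS = {
    404684003: [ROOT],                # Clinical finding
    64572001: [404684003],            # Disease
    74732009: [64572001],             # Mental disorder
    81308009: [64572001],             # Disorder of brain
    111479008: [74732009, 81308009],  # Organic mental disorder
    52448006: [111479008],            # Dementia
    279982005: [64572001],            # Cerebral degeneration presenting primarily with dementia
    26929004: [52448006, 279982005],  # Alzheimer's disease
}


@lru_cache(maxsize=None)
def level(concept_id: int) -> int:
    """Length of the longest parent chain from concept_id up to ROOT (assumed rule)."""
    if concept_id == ROOT:
        return 0
    return 1 + max(level(parent) for parent in PARENTS[concept_id])


levels = {concept_id: level(concept_id) for concept_id in PARENTS}
```

Under these assumptions, `levels[26929004]` evaluates to 6 and `levels[279982005]` to 3, matching the table.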
The counts of trials containing each SNOMED CT semantic type.
| SNOMED-CT Semantic Type | Trial Count | Prevalence in Trials |
|---|---|---|
| Disorder | 1425 | 94.25% |
| Finding | 1072 | 70.90% |
| Assessment scale | 917 | 60.65% |
| Staging scale | 917 | 60.65% |
| Procedure | 882 | 58.33% |
| Observable entity | 774 | 51.19% |
| Qualifier value | 684 | 45.24% |
| Situation | 250 | 16.53% |
| Physical object | 231 | 15.28% |
| Attribute | 163 | 10.78% |
| Linkage concept | 163 | 10.78% |
| Body structure | 154 | 10.19% |
| Metadata | 105 | 6.94% |
| Morphologic abnormality | 125 | 8.27% |
| Mother | 56 | 3.70% |
| Substance | 21 | 1.39% |
| Regime/therapy | 33 | 2.18% |
| Environment | 19 | 1.26% |
| Environment/location | 19 | 1.26% |
| Event | 17 | 1.12% |
| Organism | 15 | 0.99% |
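
The prevalence column above is a trial count divided by a total number of trials. The sketch below recomputes a few rows. The denominator of 1,512 is an assumption inferred from the table itself (for example, 1425 / 1512 gives 94.25%); the table does not state it, and it differs from the 1,587 trials mentioned in the Figure 3 caption. The names `TOTAL_TRIALS`, `trial_counts`, and `prevalence` are hypothetical.

```python
# Assumption: denominator inferred from the table, not stated in the source.
TOTAL_TRIALS = 1512

# Trial counts copied from a few rows of the table.
trial_counts = {
    "Disorder": 1425,
    "Finding": 1072,
    "Assessment scale": 917,
    "Procedure": 882,
    "Organism": 15,
}


def prevalence(count: int, total: int = TOTAL_TRIALS) -> str:
    """Share of trials mentioning a semantic type, as a percentage with two decimals."""
    return f"{100 * count / total:.2f}%"


prevalences = {name: prevalence(count) for name, count in trial_counts.items()}
```

With this denominator, Organism (15 trials) comes out at 0.99%, which is why it falls just below the 1% mark in the table.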
The top 20 common SNOMED CT terms in AD trials and their prevalence in the EHR dataset.
| SNOMED CT Term | SNOMED-CT ID | Trial Count | Prevalence in Trials | Count of uses in EHR data for AD patients |
|---|---|---|---|---|
| Alzheimer’s disease | 26929004 | 972 | 64.29% | 30,262 |
| Mini-mental state examination | 273617000 | 705 | 46.63% | 0 |
| Presenile dementia | 12348006 | 599 | 39.62% | 7,089 |
| Disease | 64572001 | 555 | 36.71% | 12,029,900 |
| Current chronological age | 424144002 | 515 | 34.06% | 0 |
| Mental disorder | 74732009 | 499 | 33.00% | 505,870 |
| Magnetic resonance imaging | 113091000 | 482 | 31.88% | 63,171 |
| Cerebrovascular accident | 230690007 | 371 | 24.54% | 4 |
| Global assessment of functioning -1993 Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition adaptation | 284061009 | 361 | 23.88% | 0 |
| Systemic disease | 56019007 | 353 | 23.35% | 0 |
| Disorder of nervous system | 118940003 | 335 | 22.16% | 780,478 |
| Substance abuse | 66214007 | 279 | 18.45% | 9,466 |
| | 49049000 | 275 | 18.19% | 0 |
| Impaired cognition | 386806002 | 260 | 17.20% | 13,375 |
| | 128613002 | 240 | 15.87% | 28,586 |
| | 421961002 | 218 | 14.42% | 4,686 |
| | 191526005 | 216 | 14.29% | 40,777 |
| | 417662000 | 207 | 13.69% | 189,543 |
| | 386414004 | 205 | 13.56% | 0 |
| | 273367002 | 204 | 13.49% | 0 |
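
One way to read the table above is to pick out the frequently used trial concepts that never appear in the EHR data. The sketch below does that with the 20 rows as listed. It is illustrative only; `rows` and `missing_in_ehr` are hypothetical names, and a zero count here means only what the table reports, not why the concept is absent.

```python
# (SNOMED CT ID, trial count, count of uses in EHR data), copied from the table.
rows = [
    (26929004, 972, 30262),
    (273617000, 705, 0),
    (12348006, 599, 7089),
    (64572001, 555, 12029900),
    (424144002, 515, 0),
    (74732009, 499, 505870),
    (113091000, 482, 63171),
    (230690007, 371, 4),
    (284061009, 361, 0),
    (56019007, 353, 0),
    (118940003, 335, 780478),
    (66214007, 279, 9466),
    (49049000, 275, 0),
    (386806002, 260, 13375),
    (128613002, 240, 28586),
    (421961002, 218, 4686),
    (191526005, 216, 40777),
    (417662000, 207, 189543),
    (386414004, 205, 0),
    (273367002, 204, 0),
]

# Concepts common in trials but with no recorded use in the EHR dataset.
missing_in_ehr = [concept_id for concept_id, _trials, ehr_uses in rows if ehr_uses == 0]

print(f"{len(missing_in_ehr)} of {len(rows)} top terms have no EHR uses")
```

The same filter could take a threshold instead of zero, since a term such as 230690007 with only 4 uses is present in the EHR data but barely populated.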
The count of SNOMED CT variables from the “master list” in the five categories.
| Category Description | Example | Count | Total Count |
|---|---|---|---|
| In EHR, categorical variables | | 132 | 181 (60%) |
| In EHR, continuous variables | | 40 | |
| Not in EHR, can be derived | | 9 | |
| Not in EHR, answerable by patient | | 59 | 123 (40%) |
| Not in EHR, not answerable by patient | | 34 | |
| Not applicable | | 30 | |
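
The two totals in the table above follow from the per-category counts. The sketch below recomputes them. The grouping is an assumption inferred from the arithmetic: the first three categories sum to 181 and the last three to 123, and together they give the 304 variables mentioned in the Figure 3 caption. The variable names are hypothetical.

```python
# Per-category counts copied from the table.
category_counts = {
    "In EHR, categorical variables": 132,
    "In EHR, continuous variables": 40,
    "Not in EHR, can be derived": 9,
    "Not in EHR, answerable by patient": 59,
    "Not in EHR, not answerable by patient": 34,
    "Not applicable": 30,
}

# Assumed grouping: the first three rows form the 181 total, the rest the 123 total.
available_keys = list(category_counts)[:3]
gap_keys = list(category_counts)[3:]

available = sum(category_counts[key] for key in available_keys)
gap = sum(category_counts[key] for key in gap_keys)
total = available + gap

available_pct = round(100 * available / total)
gap_pct = round(100 * gap / total)

print(f"available: {available} ({available_pct}%), gap: {gap} ({gap_pct}%), total: {total}")
```

Note that under this grouping the 40% figure includes the 30 variables marked not applicable.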