Yingcheng Sun, Alex Butler, Latoya A Stewart, Hao Liu, Chi Yuan, Christopher T Southard, Jae Hyun Kim, Chunhua Weng.
Abstract
Clinical trials are essential for generating reliable medical evidence, but they often suffer from expensive and delayed patient recruitment because unstructured eligibility criteria descriptions prevent automatic query generation for eligibility screening. In response to the COVID-19 pandemic, many trials have been created, but their information is not computable. We included 700 COVID-19 trials available at the time of the study and developed a semi-automatic approach to generate an annotated corpus of COVID-19 clinical trial eligibility criteria called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: study cohort, eligibility criteria, named entity and standard concept. In COVIC, 39 trials with more than one study cohort were identified and labelled with an identifier for each cohort. A total of 1,943 criteria for non-clinical characteristics such as "informed consent" and "exclusivity of participation" were annotated. A total of 9,767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types and 16,443 relationships of 11 relationship types. In total, 17,171 entities were mapped to standard medical concepts and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and as a benchmark for machine-learning-based criteria extraction.
Keywords: COVID-19; Clinical trial; Eligibility criteria; Machine readable dataset; Structured text corpus
Year: 2021 PMID: 33887457 PMCID: PMC8079156 DOI: 10.1016/j.jbi.2021.103790
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
Fig. 1. Overview of the structured dataset generation framework.
Fig. 2. Hierarchical trial annotation model with four layers: study cohort layer, eligibility criteria layer, named entity layer and standard concept layer.
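
The four layers named in Fig. 2 can be pictured as nested records. The following is a minimal sketch under stated assumptions, not the COVIC file format: the class and field names (`Cohort`, `Criterion`, `Entity`, `concept_id`, `concept_ids`) are hypothetical, the cohort identifier and the entity-to-concept mapping are copied from the tables in this record, and the criterion text is invented for illustration and is not taken from that trial.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Named entity layer, optionally linked to the standard concept layer."""
    text: str
    domain: str
    concept_id: Optional[int] = None    # standard concept ID, if mapped
    concept_name: Optional[str] = None  # standard concept name, if mapped


@dataclass
class Criterion:
    """Eligibility criteria layer: one criterion and its annotated entities."""
    text: str
    entities: List[Entity] = field(default_factory=list)


@dataclass
class Cohort:
    """Study cohort layer: one cohort of a trial and its criteria."""
    cohort_id: str
    description: str
    criteria: List[Criterion] = field(default_factory=list)


def concept_ids(cohort: Cohort) -> List[int]:
    """Collect the standard concept IDs of all mapped entities in a cohort."""
    return [
        entity.concept_id
        for criterion in cohort.criteria
        for entity in criterion.entities
        if entity.concept_id is not None
    ]


# Illustrative instance: the criterion text below is made up.
cohort = Cohort("NCT04494893_exposed", "Cohort 1. Exposed to coronavirus disease")
criterion = Criterion("Patients with shortness of breath")
criterion.entities.append(Entity("shortness of breath", "Condition", 312437, "Dyspnea"))
cohort.criteria.append(criterion)
```

Keeping the concept fields optional reflects that, per the abstract, not every annotated entity was mapped to a standard concept.
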
Example of a trial with three cohort groups.
| # | Cohort ID | Cohort description |
|---|---|---|
| 1 | NCT04494893_exposed | Cohort 1. Exposed to coronavirus disease |
| 2 | NCT04494893_active | Cohort 2. Infected with coronavirus disease |
| 3 | NCT04494893_recovered | Cohort 3. Recovered from coronavirus disease |
Examples of four particular types of criteria.
| Non-query-able | |
| Informed consent | |
| Competing trial | |
| Post-eligibility | |
Entity and attribute with different categories and examples.
| Condition | Patients with | |
| Observation | ||
| Drug | Has received treatment with | |
| Measurement | ||
| Procedure | requires | |
| Person | If | |
| Visit | Patients | |
| Device | Has a | |
| Value | SpO2 | |
| Temporal | Has had plasmapheresis | |
| Qualifier | ||
| Reference_point | within two weeks of a | |
| Mood | Patients who | |
| Negation | ||
| Multiplier |
Examples of entity and standard concept.
| # | Entity | Domain | Standard concept | Concept ID |
|---|---|---|---|---|
| 1 | acute hepatic failure | Condition | acute hepatic failure | 4026032 |
| 2 | discharged from hospital | Observation | discharged from hospital | 4084843 |
| 3 | drug addiction | Condition | Dependent drug abuse | 4275756 |
| 4 | CPAP | Device | Continuous positive airway pressure (cpap) device | 2616666 |
| 5 | shortness of breath | Condition | Dyspnea | 312437 |
| 6 | solid tumor | Condition | Neoplasm | 4030314 |
Fig. 3. Semi-automatic eligibility criteria annotation and normalization.
Fig. 4. Example of the eligibility criteria annotations in the Brat tool.
A short segment of the dictionary.
| Entity | Domain | Concept ID | Standard concept |
|---|---|---|---|
| GI perforation | Condition | 4202064 | Gastrointestinal perforation |
| encephalitis | Condition | 378143 | Encephalitis |
| oxygen saturation (pulse oximetry) | Measurement | 4310328 | Blood oxygen saturation |
| individual NEWS parameters | Measurement | 44808684 | National early warning score |
| hematopoietic stem cell transplant | Procedure | 4120445 | Hemopoietic stem cell transplant |
| transthoracic echocardiogram | Procedure | 4335825 | Transthoracic echocardiography |
The dictionary is a lookup table that stores all manually corrected mappings and is consulted for concept mapping before Usagi is applied. This ensures that mappings updated by domain experts take precedence, which preserves annotation quality. The dictionary is updated iteratively and incrementally, so inappropriate mappings found in earlier annotations are not repeated when new trials are annotated.
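
The dictionary-first lookup described above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: keying the dictionary on (normalized text, domain), the case- and whitespace-insensitive `normalize` step, and the function names are all hypothetical. The `fallback` argument is a placeholder callable standing in for the automatic mapper (Usagi in the source); it does not call any real Usagi API.

```python
from typing import Callable, Dict, Optional, Tuple

Mapping = Tuple[int, str]  # (concept ID, standard concept name)
Fallback = Callable[[str, str], Optional[Mapping]]

# Curated entries copied from the dictionary segment shown above.
CURATED: Dict[Tuple[str, str], Mapping] = {
    ("gi perforation", "Condition"): (4202064, "Gastrointestinal perforation"),
    ("encephalitis", "Condition"): (378143, "Encephalitis"),
    ("transthoracic echocardiogram", "Procedure"): (4335825, "Transthoracic echocardiography"),
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def map_entity(text: str, domain: str, fallback: Fallback) -> Optional[Mapping]:
    """Return the expert-curated mapping if one exists, else ask the fallback."""
    key = (normalize(text), domain)
    if key in CURATED:
        return CURATED[key]
    return fallback(text, domain)


def record_correction(text: str, domain: str, mapping: Mapping) -> None:
    """Store a manually corrected mapping so later trials reuse it."""
    CURATED[(normalize(text), domain)] = mapping


def no_match(text: str, domain: str) -> Optional[Mapping]:
    """Placeholder fallback that never finds a mapping."""
    return None


curated_hit = map_entity("GI  Perforation", "Condition", no_match)
before_correction = map_entity("shortness of breath", "Condition", no_match)
record_correction("shortness of breath", "Condition", (312437, "Dyspnea"))
after_correction = map_entity("Shortness of breath", "Condition", no_match)
```

Because the curated table is checked first, a correction recorded once is applied to every later occurrence of the same entity text, regardless of what the automatic mapper would have proposed.
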
Statistical information of 700 trials: 6.A Study Type, 6.B Study Allocation, 6.C Study Phase, 6.D Trial Locations.
| Study type | Count | Percentage |
|---|---|---|
| Interventional | 503 | 71.86% |
| Observational | 153 | 21.86% |
| Observational [Patient Registry] | 31 | 4.43% |
| Expanded Access | 13 | 1.86% |
Total count and percentage (%) of annotated entities in 8 domains.
| Domain | Count | Percentage |
|---|---|---|
| Condition | 6,614 | 36.42% |
| Observation | 4,243 | 23.36% |
| Drug | 2,179 | 12.00% |
| Measurement | 2,112 | 11.63% |
| Procedure | 1,364 | 7.51% |
| Person | 1,008 | 5.55% |
| Visit | 470 | 2.59% |
| Device | 171 | 0.94% |
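
The percentages in the table above are each domain's share of all annotated entities. A short sketch of that arithmetic, using the counts copied from the table (the variable names are illustrative):

```python
# Entity counts per domain, copied from the table above.
counts = {
    "Condition": 6614,
    "Observation": 4243,
    "Drug": 2179,
    "Measurement": 2112,
    "Procedure": 1364,
    "Person": 1008,
    "Visit": 470,
    "Device": 171,
}

# The total matches the 18,161 entities reported in the abstract.
total = sum(counts.values())

# Share of each domain as a percentage, rounded to two decimals.
shares = {domain: round(100 * n / total, 2) for domain, n in counts.items()}

for domain, share in shares.items():
    print(f"{domain}: {share:.2f}%")
```
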
Total count and percentage of annotated attributes of 7 types.
| Attribute | Count | Percentage |
|---|---|---|
| Value | 2,832 | 36.57% |
| Temporal | 1,803 | 23.29% |
| Qualifier | 1,613 | 20.83% |
| Reference_point | 537 | 6.94% |
| Mood | 520 | 6.72% |
| Negation | 380 | 4.91% |
| Multiplier | 58 | 0.75% |
Total count and percentage of annotated relationships of 11 types.
| Relationship | Count | Percentage |
|---|---|---|
| OR | 3,155 | 19.19% |
| has_value | 2,910 | 17.70% |
| has_temporal | 2,651 | 16.12% |
| has_qualifier | 1,599 | 9.72% |
| AND | 1,549 | 9.42% |
| has_context | 1,529 | 9.30% |
| subsumes | 1,382 | 8.40% |
| has_mood | 572 | 3.48% |
| has_index | 529 | 3.22% |
| has_negation | 509 | 3.10% |
| has_multiplier | 58 | 0.35% |
Total count and percentage of the four particular types of criteria.
| Criteria type | Count | Percentage |
|---|---|---|
| Non-query-able | 1,112 | 50.32% |
| Informed consent | 478 | 21.63% |
| Competing trial | 331 | 14.98% |
| Post-eligibility | 289 | 13.08% |
The count (#) of most common (top 10) concepts mapped in each domain.
| Condition concept ID | Condition concept | # | Observation concept ID | Observation concept | # | Drug concept ID | Drug concept | # | Measurement concept ID | Measurement concept | # |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 37311061 | Disease caused by severe acute respiratory syndrome coronavirus 2 | 660 | 4188893 | History of | 523 | 21605200 | Corticosteroids | 78 | 37310255 | Detection of 2019 novel coronavirus using polymerase chain reaction technique | 192 |
| 4299535 | Patient currently pregnant | 372 | 40218805 | CDC laboratory | 148 | 21602457 | Prolactine inhibitors | 66 | 4146380 | Alanine aminotransferase measurement | 118 |
| 437663 | Fever | 133 | 4185135 | Breastfeeding | 124 | 1777087 | Hydroxychloroquine | 53 | 4263457 | Aspartate aminotransferase measurement | 114 |
| 312437 | Dyspnea | 118 | 36685445 | on room air at rest | 91 | 21603891 | IMMUNOSUPPRESSANTS | 52 | 4310328 | Blood oxygen saturation | 96 |
| 254761 | Cough | 104 | 37310260 | Close exposure to 2019 novel coronavirus infection | 52 | 21601386 | ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS | 43 | 4233883 | Ratio of arterial oxygen tension to inspired oxygen fraction | 86 |
| 439727 | Human immunodeficiency virus infection | 83 | 4120014 | Polymerase chain reaction | 43 | 2718732 | Immunosuppressive drug not otherwise classified | 42 | 4267147 | Platelet count | 74 |
| 316866 | Hypertensive disorder | 68 | 45766517 | Confirmatory technique | 40 | 1507835 | Vasopressin (USP) | 41 | 4313591 | Respiratory rate | 72 |
| 255573 | Chronic obstructive lung disease | 66 | 4244251 | Confirmed by | 40 | 40171288 | tocilizumab | 36 | 44806420 | Estimation of glomerular filtration rate | 69 |
| 443392 | Malignant neoplastic disease | 61 | 4289014 | Normal breast feeding | 35 | 1792515 | Chloroquine | 30 | 3027315 | Oxygen [Partial pressure] in Blood | 66 |
| 4212484 | Multiple organ failure | 60 | 4142947 | Symptomatic | 33 | 1309944 | Amiodarone | 23 | 44789311 | Pregnancy test | 60 |

| Procedure concept ID | Procedure concept | # | Person concept ID | Person concept | # | Visit concept ID | Visit concept | # | Device concept ID | Device concept | # |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4230167 | Artificial respiration | 151 | 4265453 | age | 567 | 38004515 | Hospital | 298 | 4139525 | High flow oxygen nasal cannula | 28 |
| 4239130 | Oxygen therapy | 123 | 442986 | female | 200 | 38004311 | Inpatient Hospice | 29 | 4138614 | BiPAP oxygen nasal cannula | 16 |
| 4052536 | Extracorporeal membrane oxygenation | 71 | 442985 | Male | 100 | 38004519 | Home Health Agency | 21 | 2614925 | Cannula nasal | 15 |
| 4032243 | Dialysis procedure | 67 | 4323831 | old | 34 | 32037 | Intensive Care | 19 | 2616666 | Continuous positive airway pressure (cpap) device | 14 |
| 4202832 | Intubation | 60 | 4046779 | Adult | 23 | 9201 | Inpatient Visit | 13 | 4145528 | Nonrebreather oxygen mask | 12 |
| 4273629 | Chemotherapy | 59 | 4119673 | year | 7 | 38004522 | Department Store | 10 | 4030875 | Cardiac pacemaker | 6 |
| 44790095 | Invasive ventilation | 51 | 1332764 | children | 9 | 38004284 | Psychiatric Hospital | 8 | 4232657 | Vascular stent | 5 |
| 4208341 | Solid organ transplant | 45 | 4305451 | Infant | 3 | 8717 | Inpatient Hospital | 8 | 4234106 | Metal periosteal implant | 4 |
| 37018292 | Continuous renal replacement therapy | 37 | 42073776 | Newborn | 5 | 8676 | Nursing Facility | 8 | 4148006 | Epidural catheter | 3 |
| 40486624 | Noninvasive positive pressure ventilation | 34 | 2090691 | Mothers | 3 | 9202 | Outpatient Visit | 8 | 45760696 | Spinal catheter | 3 |
Instance-level and token-level agreement rates of entities in different types and the average (arithmetic mean for all trials).
| Entity type | Instance-level | Token-level |
|---|---|---|
| Measurement | 0.874 | 0.887 |
| Condition | 0.832 | 0.845 |
| Person | 0.790 | 0.830 |
| Drug | 0.781 | 0.825 |
| Visit | 0.765 | 0.792 |
| Procedure | 0.728 | 0.746 |
| Observation | 0.601 | 0.718 |
| Device | 0.565 | 0.714 |
| Average | 0.809 | 0.825 |
Instance-level and token-level agreement rates of attributes in different types and the average (arithmetic mean for all trials).
| Attribute type | Instance-level | Token-level |
|---|---|---|
| Temporal | 0.713 | 0.752 |
| Value | 0.927 | 0.954 |
| Negation | 0.653 | 0.693 |
| Qualifier | 0.660 | 0.741 |
| Multiplier | 0.678 | 0.764 |
| Reference_point | 0.344 | 0.372 |
| Mood | 0.624 | 0.643 |
| Average | 0.720 | 0.798 |
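
This record does not state how the instance-level and token-level agreement rates were computed. One common convention, assumed here purely for illustration, scores two annotators' sets A and B as 2·|A∩B| / (|A| + |B|). At the instance level an item is a whole annotation (span plus type); at the token level an item is a single labelled token, so partially overlapping spans earn partial credit. The span representation and function names below are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (first token index, last token index inclusive, annotation type)
Span = Tuple[int, int, str]


def dice(a: Set, b: Set) -> float:
    """Overlap score 2|A∩B| / (|A| + |B|); two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def instance_agreement(x: Iterable[Span], y: Iterable[Span]) -> float:
    """Spans agree only if boundaries and type match exactly."""
    return dice(set(x), set(y))


def to_tokens(spans: Iterable[Span]) -> Set[Tuple[int, str]]:
    """Expand each span into its (token index, type) pairs."""
    return {(i, kind) for start, end, kind in spans for i in range(start, end + 1)}


def token_agreement(x: Iterable[Span], y: Iterable[Span]) -> float:
    """Spans agree token by token, so partial overlaps count partially."""
    return dice(to_tokens(x), to_tokens(y))


# Toy example: the annotators disagree on the boundary of the first span.
annotator_1 = [(0, 2, "Condition"), (5, 5, "Value")]
annotator_2 = [(0, 1, "Condition"), (5, 5, "Value")]

instance_score = instance_agreement(annotator_1, annotator_2)
token_score = token_agreement(annotator_1, annotator_2)
```

In this toy example the boundary disagreement costs a full instance but only one token, so the token-level score is higher than the instance-level score, which is the same direction as the differences reported in the two tables above.
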