Fabrício Kury, Alex Butler, Chi Yuan, Li-Heng Fu, Yingcheng Sun, Hao Liu, Ida Sim, Simona Carini, Chunhua Weng.
Abstract
We present Chia, a novel, large annotated corpus of patient eligibility criteria extracted from 1,000 interventional, Phase IV clinical trials registered in ClinicalTrials.gov. This dataset includes 12,409 annotated eligibility criteria, represented by 41,487 distinctive entities of 15 entity types and 25,017 relationships of 12 relationship types. Each criterion is represented as a directed acyclic graph, which can be easily transformed into Boolean logic to form a database query. Chia can serve as a shared benchmark to develop and test future machine learning, rule-based, or hybrid methods for information extraction from free-text clinical trial eligibility criteria.
Year: 2020 PMID: 32855408 PMCID: PMC7452886 DOI: 10.1038/s41597-020-00620-0
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Annotated eligibility criteria with citations, methods of annotation, coverage, availability and size.
| Citation | Annotation Method | Coverage | Availability | Criteria Count |
|---|---|---|---|---|
| Chondrogiannis | Manual | 87% | Online View Only | 2,000 |
| Tu | Manual | 62% | Methods Only | 1,000 |
| Zhang | Manual | 85% | None | 1,043 |
| Milian | Automated | 18% | Methods Only | 1,773 |
| Lonsdale | Automated | 34% | Methods Only | 1,545 |
| Kang | Automated | 71% | Available Upon Request | 3,619 |
Fig. 1 Sample eligibility criterion with associated visual annotation (a), annotation graph (b), and pseudo-SQL query for relational patient database (c).
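As a rough illustration of how an annotation graph like the one in panel (b) could be turned into Boolean logic, the following Python sketch walks a small graph and renders it as a Boolean expression. The `Entity` class, the `to_boolean` helper, and the example criterion are hypothetical and are not taken from the Chia files or the authors' code; only the relationship names (`OR`, `AND`, `has_value`) and entity types appear in the tables of this paper.

```python
from dataclasses import dataclass, field

LOGICAL = ("AND", "OR")  # relationships that combine sub-expressions


@dataclass
class Entity:
    """A node in a hypothetical annotation graph."""
    etype: str  # entity type, e.g. "Condition", "Measurement", "Value"
    text: str   # annotated text span
    edges: list = field(default_factory=list)  # (relationship, target Entity)

    def link(self, relationship, target):
        """Add a directed edge from this entity to `target`."""
        self.edges.append((relationship, target))
        return self


def to_boolean(node):
    """Recursively render a node and its outgoing edges as a Boolean expression."""
    # Attribute-like relationships (has_value, has_temporal, ...) refine the node itself.
    attrs = [f"{rel}={tgt.text!r}" for rel, tgt in node.edges if rel not in LOGICAL]
    expr = f"{node.etype}({node.text!r}" + "".join(f", {a}" for a in attrs) + ")"
    # Logical relationships join the node with the expression of the target.
    for rel, tgt in node.edges:
        if rel in LOGICAL:
            expr = f"({expr} {rel} {to_boolean(tgt)})"
    return expr


# Hypothetical criterion: "diabetes or HbA1c > 7%"
hba1c = Entity("Measurement", "HbA1c").link("has_value", Entity("Value", "> 7%"))
diabetes = Entity("Condition", "diabetes").link("OR", hba1c)

result = to_boolean(diabetes)
print(result)
# (Condition('diabetes') OR Measurement('HbA1c', has_value='> 7%'))
```

A real converter would also need to handle negation, temporal constraints, and scopes, and would emit SQL against a concrete schema rather than a string; this sketch only shows the recursive shape of the transformation.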
Total count and percentage of unevaluable criteria using unevaluable entity annotations.
| Entity Type | Count (%; n = 1,606) | Example |
|---|---|---|
| Non-query-able | 557 (34.7%) | |
| Post-eligibility | 425 (26.5%) | |
| Informed_consent | 223 (13.8%) | |
| Pregnancy_considerations | 172 (10.7%) | |
| Parsing_Error | 135 (8.4%) | |
| Non-representable | 120 (7.4%) | |
| Competing_trial | 86 (5.4%) | |
| Context_Error | 61 (3.8%) | |
| Subjective_judgement | 43 (2.7%) | |
| Not_a_criteria | 33 (2.1%) | |
| Undefined_semantics | 21 (1.3%) | |
| Intoxication_considerations | 5 (0.3%) | |
Comparison of EliIE and Chia Annotated Datasets.
| Statistic | EliIE | Chia |
|---|---|---|
| Disease Domain | Alzheimer’s | Representative of all diseases |
| No. of Trials | 230 | 1,000 |
| No. of Criteria | 3,619 | 12,409 |
| No. of Annotations | 15,596 | 65,886 |
| No. of Entity Types | 8 | 15 |
| No. of Relationship Types | 3 | 12 |
| Criteria Coverage | 71% | 85.9% |
Most common relationship types, with overall count and percentage of all relationships.
| Relationship | Count | Percent (n = 25,017) |
|---|---|---|
| OR | 4,939 | 19.8% |
| has_value | 3,806 | 15.2% |
| AND | 3,679 | 14.7% |
| has_qualifier | 3,535 | 14.1% |
| has_temporal | 3,336 | 13.3% |
Most common relationship triplets (excluding OR relationships) including overall count and percentage of all relationship triplets.
| Root Type | Relationship | Target Type | Count | Percent (n = 20,078) |
|---|---|---|---|---|
| Measurement | Has_value | Value | 2,799 | 13.94% |
| Condition | Has_qualifier | Qualifier | 2,445 | 12.18% |
| Condition | Has_temporal | Temporal | 1,323 | 6.59% |
| Temporal | Has_index | Reference_point | 889 | 4.43% |
| Procedure | Has_temporal | Temporal | 857 | 4.27% |
| Person | Has_value | Value | 752 | 3.75% |
| Condition | AND | Drug | 645 | 3.21% |
| Condition | Subsumes | Condition | 624 | 3.11% |
| Drug | Has_temporal | Temporal | 532 | 2.65% |
| Condition | AND | Procedure | 514 | 2.56% |
| Procedure | Has_qualifier | Qualifier | 465 | 2.32% |
| Condition | AND | Condition | 459 | 2.29% |
| Condition | AND | Measurement | 408 | 2.03% |
| Condition | Has_negation | Negation | 380 | 1.89% |
| Procedure | AND | Condition | 315 | 1.57% |
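The denominator in this table follows from the counts reported elsewhere in this record: 25,017 relationships in total minus 4,939 OR relationships leaves 20,078. The short Python sketch below reproduces that arithmetic and the percentages for the first three rows; the variable and function names are illustrative only.

```python
TOTAL_RELATIONSHIPS = 25_017  # all relationships in Chia
OR_RELATIONSHIPS = 4_939      # OR relationships, excluded from the triplet table

n_triplets = TOTAL_RELATIONSHIPS - OR_RELATIONSHIPS  # 20,078

# (root type, relationship, target type, count), copied from the table above
triplets = [
    ("Measurement", "Has_value", "Value", 2799),
    ("Condition", "Has_qualifier", "Qualifier", 2445),
    ("Condition", "Has_temporal", "Temporal", 1323),
]


def share(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {t[:3]: share(t[3], n_triplets) for t in triplets}

for (root, rel, target), pct in shares.items():
    print(f"{root} -{rel}-> {target}: {pct}%")
```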
Mapping accuracy to OMOP CDM via Usagi per Entity Category.
| Entity Category | Percent of Entities with Confidence Score ≥ 0.70 |
|---|---|
| Condition | 74.9% |
| Procedure | 66.5% |
| Drug | 64.8% |
| Device | 62.1% |
| Person | 61.8% |
| Measurement | 55.2% |
| Observation | 39.8% |
| Visit | 31.3% |
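The per-category figures above are shares of entities whose mapping confidence score reaches 0.70. A minimal Python sketch of that kind of summary is given below, assuming the mapping results are available as (category, score) pairs. The records are made up for illustration, and the code does not call Usagi or read its output format.

```python
from collections import defaultdict

THRESHOLD = 0.70  # confidence cut-off used in the table

# Hypothetical (entity category, mapping confidence score) pairs; not real Chia data.
mappings = [
    ("Condition", 0.95), ("Condition", 0.72), ("Condition", 0.40), ("Condition", 0.88),
    ("Visit", 0.30), ("Visit", 0.81),
]


def percent_above(records, threshold=THRESHOLD):
    """Per category, the percentage of records with score >= threshold."""
    totals, hits = defaultdict(int), defaultdict(int)
    for category, score in records:
        totals[category] += 1
        if score >= threshold:
            hits[category] += 1
    return {c: round(100 * hits[c] / totals[c], 1) for c in totals}


result = percent_above(mappings)
print(result)
# {'Condition': 75.0, 'Visit': 50.0}
```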
Most common annotated entities by Domain.
| Condition | Count | Qualifier | Count | Drug | Count | Procedure | Count |
|---|---|---|---|---|---|---|---|
| pregnancy | 442 | severe | 326 | systemic corticosteroids | 81 | treatment | 174 |
| allergy | 269 | significant | 117 | medication | 72 | surgery | 99 |
| contraindications | 197 | active | 114 | anticoagulants | 55 | chemotherapy | 81 |
| infection | 129 | other | 112 | prednisone | 49 | radiation therapy | 62 |
| malignancy | 104 | uncontrolled | 106 | antibiotics | 48 | general anesthesia | 58 |
| hypertension | 92 | clinically significant | 83 | study medications | 45 | physical examination | 42 |
| lactation | 90 | chronic | 57 | antidepressants | 40 | cardiac surgery | 41 |
| heart failure | 89 | serious | 55 | aspirin | 39 | contraception | 39 |
| stroke | 88 | symptomatic | 54 | opioids | 39 | intubation | 38 |
| diabetes | 82 | moderate | 47 | vaccine | 36 | transplant | 36 |
| lactating | 82 | acute | 43 | statin | 32 | implantation | 35 |
| myocardial infarction | 81 | elective | 40 | warfarin | 27 | liver transplant | 35 |
| cardiovascular disease | 64 | untreated | 39 | insulin | 27 | dialysis | 34 |
| liver disease | 63 | stable | 38 | rifampin | 27 | hysterectomy | 33 |
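
| Measurement | Count | Person | Count | Observation | Count | Device | Count |
|---|---|---|---|---|---|---|---|
| serum creatinine | 77 | age | 577 | breastfeeding | 68 | pacemakers | 18 |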
| body mass index | 65 | female | 355 | life expectancy | 64 | intrauterine device | 12 |
| blood pressure | 64 | male | 355 | informed consent | 29 | prosthetic valve | 12 |
| weight | 59 | older | 67 | family history | 18 | prosthetic material | 11 |
| hemoglobin | 57 | adult | 54 | english speaking | 16 | prosthetic mesh | 11 |
| bilirubin | 55 | years | 47 | smoking | 15 | contraceptive implant | 10 |
| systolic blood pressure | 52 | children | 32 | childbearing potential | 13 | drug-eluting stent | 9 |
| diastolic blood pressure | 52 | patients | 16 | alcohol abuse | 9 | metal implants | 9 |
| pregnancy test | 48 | prisoners | 13 | evidence | 8 | device | 8 |
| platelet count | 45 | smokers | 7 | nursing | 7 | cochlear implants | 8 |
| creatinine clearance | 44 | infants | 6 | contraception | 7 | condom | 7 |
| ast [aspartate aminotransferase] | 43 | newborns | 5 | lactating | 6 | joint prosthesis | 7 |
| hba1c [hemoglobin a1c] | 41 | donor | 5 | last vaccination intervals | 6 | aneurysm clips | 6 |
| alt [alanine aminotransferase] | 41 | liver transplant recipients | 5 | suspected | 6 | metal in the body | 6 |
| asa [american society of anesthesiologists] | 40 | adolescents | 5 | sexually active | 6 | bare-metal stent | 5 |
Examples of Scope objects in Chia (contained on Scope object).
| Trial Number | Inc/Exc | Line | Sample Criterion |
|---|---|---|---|
| NCT02781610 | Exclusion | 5 | …worsening lower respiratory symptoms (e.g.,, |
| NCT02596555 | Exclusion | 13 | …strong inhibitors of P-glycoprotein like |
| NCT00650312 | Inclusion | 4 | …judged normal and healthy during a pre-study medical evaluation ( |
| NCT01373684 | Exclusion | 13 | …immunodeficiency syndromes (e.g., |
| NCT02531971 | Inclusion | 2 | …including tobacco products (e.g., |
Fig. 2 Comparisons of Chia annotation model to previous annotation efforts using identical sample eligibility criteria text. (a) EliIE annotation model proposed by Kang et al., (b) hepatitis C trials outlined by Zhang et al., (c) ERGO annotation model proposed by Tu et al.
Examples of subsumes relationships in Chia (parent entity and subsumed entity).
| Trial Number | Inc/Exc | Line | Sample Criterion |
|---|---|---|---|
| NCT00050349 | Inclusion | 2 | |
| NCT00094861 | Exclusion | 7 | |
| NCT00182520 | Inclusion | 2 | …open label trial of one the following SRI’s…and demonstrating a |
| NCT00343668 | Exclusion | 10 | …significant |
| Measurement(s) | Clinical Trial Eligibility Criteria • Analytical Procedure Accuracy |
| Technology Type(s) | digital curation • computational modeling technique |
| Sample Characteristic - Organism | Homo sapiens |