Chi Yuan1,2, Patrick B Ryan1,3, Casey Ta1, Yixuan Guo1, Ziran Li1, Jill Hardin3, Rupa Makadia3, Peng Jin1, Ning Shang1, Tian Kang1, Chunhua Weng1.
Abstract
OBJECTIVE: Cohort definition is a bottleneck for conducting clinical research and depends on subjective decisions by domain experts. Data-driven cohort definition is appealing but requires substantial knowledge of terminologies and clinical data models. Criteria2Query is a natural language interface that facilitates human-computer collaboration for cohort definition and execution using clinical databases.
Keywords: cohort definition; common data model; natural language interfaces to database; natural language processing
Year: 2019 PMID: 30753493 PMCID: PMC6402359 DOI: 10.1093/jamia/ocy178
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. System architecture and data flow of Criteria2Query.
Named entities and attributes recognized by Criteria2Query
| Category | Definition | Examples |
|---|---|---|
| Condition | Conditions are records of a person suggesting the presence of a disease or medical condition, stated as a diagnosis, a sign, or a symptom. | |
| Drug | Drugs are biochemical substances formulated in such a way that, when administered to a person, they exert a certain physiological effect. | |
| Measurement | The standardized examination or testing of a person or a person's sample. | |
| Procedure | Procedures are activities or processes performed on the patient for a diagnostic or therapeutic purpose. | |
| Observation | Observations are clinical facts about a person obtained in the context of an examination, questioning, or a procedure. | |
| Value | Numeric attributes, including but not limited to age range and lab test results. | |
| Temporal | Temporal constraints imposed on clinical diagnoses, drugs, etc. | |
Relationships in Criteria2Query
| Relationship | Entity | Attribute | Example |
|---|---|---|---|
| has_temp | Condition, Measurement, Drug, Observation, Procedure | Temporal | |
| has_value | Demographic, Measurement | Value | |
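The relationship table above can be read as a small schema: each relationship links an entity of certain categories to an attribute of one category. The following is a minimal sketch of that reading, not the Criteria2Query implementation. The names `RELATION_SCHEMA`, `Span`, `Relation`, and `is_valid` are hypothetical, and the example spans are invented for illustration.

```python
from dataclasses import dataclass

# Allowed entity categories and attribute category for each relationship,
# copied from the relationship table. The dictionary layout is an assumption.
RELATION_SCHEMA = {
    "has_temp": (
        {"Condition", "Measurement", "Drug", "Observation", "Procedure"},
        "Temporal",
    ),
    "has_value": ({"Demographic", "Measurement"}, "Value"),
}


@dataclass(frozen=True)
class Span:
    """A recognized piece of criterion text with its category (hypothetical type)."""
    text: str
    category: str


@dataclass(frozen=True)
class Relation:
    """A named link from an entity span to an attribute span (hypothetical type)."""
    name: str
    entity: Span
    attribute: Span


def is_valid(relation: Relation) -> bool:
    """Return True if the relation matches the schema in the table."""
    schema = RELATION_SCHEMA.get(relation.name)
    if schema is None:
        return False
    entity_categories, attribute_category = schema
    return (
        relation.entity.category in entity_categories
        and relation.attribute.category == attribute_category
    )


# Invented example spans, for illustration only.
lab = Span("HbA1c", "Measurement")
threshold = Span("> 7%", "Value")
drug = Span("metformin", "Drug")

print(is_valid(Relation("has_value", lab, threshold)))   # Measurement + Value
print(is_valid(Relation("has_value", drug, threshold)))  # Drug is not listed for has_value
```

Under this reading, a Drug entity can carry a Temporal attribute through has_temp but cannot carry a Value through has_value, because the table lists only Demographic and Measurement for that relationship.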
Figure 2. An example of one criterion on ATLAS.
Figure 3. Concept set autogeneration process. AD: Alzheimer's disease; ICD10: International Classification of Diseases–Tenth Revision; ICD9CM: International Classification of Diseases–Ninth Revision–Clinical Modification; N: no; Y: yes.
Figure 4. User workflow of Criteria2Query.
Figure 5. The user interface of the Criteria2Query system.
Figure 6. Automatically generated cohort query presented by ATLAS to allow query review, refinement, and execution for patient cohort generation using clinical databases.
The evaluation matrix of criteria representation with 95% confidence intervals
| Task | Metric | Criteria crawled from ClinicalTrials.gov (n = 125) | Criteria entered by testers (n = 52) | Combined (n = 177) |
|---|---|---|---|---|
| Entity recognition | Precision | 0.902 (156/173) [0.844–0.936] | 0.899 (62/69) [0.783–0.942] | 0.901 (218/242) [0.851–0.930] |
| Entity recognition | Recall | 0.726 (156/215) [0.661–0.777] | 0.681 (62/91) [0.571–0.758] | 0.712 (218/306) [0.657–0.758] |
| Entity recognition | F1 | 0.804 [0.760–0.841] | 0.775 [0.694–0.833] | 0.795 [0.758–0.828] |
| Relation extraction | Precision | 0.958 (23/24) [0.792–1.000] | 1.00 (10/10) | 0.971 (33/34) [0.824–1.000] |
| Relation extraction | Recall | 0.676 (23/34) [0.471–0.794] | 0.714 (10/14) [0.357–0.857] | 0.688 (33/48) [0.521–0.792] |
| Relation extraction | F1 | 0.793 [0.576–0.867] | 0.833 [0.526–0.923] | 0.805 [0.647–0.871] |
| Negation detection | Accuracy | 0.985 (135/137) [0.942–0.993] | 0.979 (47/48) [0.896–1.000] | 0.984 (182/185) [0.946–0.995] |
| Logic detection | Accuracy | 0.944 (17/18) [0.722–1.00] | 0.500 (2/4) [0.000–0.750] | 0.864 (19/22) [0.591–0.955] |
| Entity normalization | Accuracy | 0.447 (51/114) [0.351–0.535] | 0.808 (21/26) [0.577–0.885] | 0.514 (72/140) [0.429–0.586] |
| Attribute normalization | Accuracy | 0.800 (16/20) [0.500–0.900] | 0.778 (7/9) [0.222–0.889] | 0.793 (23/29) [0.586–0.897] |
Values are precision, recall, and F1 score (n/n) [95% confidence interval] or accuracy [95% confidence interval], unless otherwise indicated.
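As a worked check of how the (n/n) counts relate to the reported scores, the sketch below recomputes precision, recall, and F1 for entity recognition on the ClinicalTrials.gov criteria, using the standard definitions. The function name `precision_recall_f1` is hypothetical. Reading the two denominators as the number of predicted entities (173) and the number of gold-standard entities (215) is an assumption. The confidence intervals are not reproduced because this page does not state how they were computed.

```python
def precision_recall_f1(correct: int, predicted: int, gold: int):
    """Standard precision, recall, and F1 from raw counts.

    correct   -- entities both predicted and present in the gold standard
    predicted -- all entities the system produced (assumed meaning of 173)
    gold      -- all entities in the gold standard (assumed meaning of 215)
    """
    precision = correct / predicted
    recall = correct / gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the entity recognition rows, ClinicalTrials.gov column.
p, r, f1 = precision_recall_f1(correct=156, predicted=173, gold=215)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.902 recall=0.726 f1=0.804
```

The same function applies to the relation extraction rows. The accuracy rows are a single ratio (for example, 135/137 for negation detection) and need no further computation.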