Angelo Restificar, Ioannis Korkontzelos, Sophia Ananiadou.
Abstract
BACKGROUND: We consider the user task of designing clinical trial protocols and propose a method that discovers and outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D' (|D'| ≪ |D|) that a user has initially identified as relevant, e.g., via a user query interface, our scoring method automatically suggests eligibility criteria from D (D ⊃ D') by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. Appropriateness is measured by the degree to which the criteria are consistent with the user-supplied sample documents D'.
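The abstract describes ranking candidate criteria from D by their consistency with the sample documents D' but does not give the scoring function. Purely as an illustration, the sketch below assumes a bag-of-words representation and scores each candidate criterion by its mean cosine similarity to the documents in D'. The names `rank_criteria`, `bag_of_words`, and `cosine` are hypothetical, and this consistency score is a stand-in, not the authors' method.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lower-cased whitespace tokenization into term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_criteria(candidates, sample_docs):
    """Rank candidate criteria by mean similarity to the user-selected samples D'.

    Hypothetical consistency score for illustration; not the paper's scoring method.
    """
    samples = [bag_of_words(d) for d in sample_docs]
    scored = []
    for criterion in candidates:
        vec = bag_of_words(criterion)
        score = sum(cosine(vec, s) for s in samples) / len(samples)
        scored.append((score, criterion))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most consistent first
    return [criterion for _, criterion in scored]

# Hypothetical sample documents D' and candidate criteria drawn from D.
sample_docs = ["breast cancer female patients", "female breast cancer chemotherapy"]
candidates = [
    "pregnant or breast feeding female",
    "able to give informed consent",
    "history of breast cancer",
]
print(rank_criteria(candidates, sample_docs))
```

A real system would use a richer representation than raw term overlap; the point here is only the shape of the task: score every candidate against D', then sort.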
Year: 2013 PMID: 23566239 PMCID: PMC3618207 DOI: 10.1186/1472-6947-13-S1-S6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Criteria Counts
| criteria type | count | pct |
|---|---|---|
| inclusion criteria | 178,178 | 39% |
| exclusion criteria | 284,281 | 61% |
| Total | 462,459 | 100% |
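The percentages in the Criteria Counts table follow directly from the two counts; a quick arithmetic check:

```python
# Counts copied from the "Criteria Counts" table above.
counts = {"inclusion criteria": 178_178, "exclusion criteria": 284_281}

total = sum(counts.values())
# Share of each criteria type, rounded to a whole percent as in the table.
shares = {name: round(100 * n / total) for name, n in counts.items()}

print(total)   # 462459
print(shares)  # {'inclusion criteria': 39, 'exclusion criteria': 61}
```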
XML tags used to extract PICO data
| PICO element | XML tags |
|---|---|
| P (population) | brief_summary, detailed_description, condition, official_title, brief_title |
| I (intervention) | intervention |
| C (comparison) | arm_group |
| O (outcome) | primary_outcome, secondary_outcome |
Illustration of topics (2 out of 5 topics) and corresponding top words
| LDA topic # | top words |
|---|---|
| 1 | female, study, breast, feeding, subjects, potential, contraception, childbearing, drug |
| 3 | consent, informed, study, patient, subject, inability, comply, give, protocol |
Illustration of topic proportions (5 topics)
| document id | topic1 | topic2 | topic3 | topic4 | topic5 |
|---|---|---|---|---|---|
| NCT00000174e8 | 0.44 | 0.15 | 0.08 | 0.18 | 0.15 |
| NCT00000408e11 | 0.86 | 0.0 | 0.14 | 0.0 | 0.0 |
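Each row of the topic-proportion table is an LDA topic mixture, i.e. a probability distribution over the 5 topics, so its entries sum to 1. The sketch below checks that, finds each row's dominant topic, and compares the two rows with cosine similarity. Cosine is an illustrative choice made here; the source does not state which similarity measure, if any, the paper applies to topic vectors.

```python
import math

# Topic proportions copied from the 5-topic table above.
proportions = {
    "NCT00000174e8": [0.44, 0.15, 0.08, 0.18, 0.15],
    "NCT00000408e11": [0.86, 0.0, 0.14, 0.0, 0.0],
}

def dominant_topic(theta):
    """1-based index of the topic with the largest proportion."""
    return max(range(len(theta)), key=theta.__getitem__) + 1

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

for doc_id, theta in proportions.items():
    # Each row is a distribution over topics, so it should sum to 1 (up to rounding).
    print(doc_id, round(sum(theta), 2), dominant_topic(theta))

first, second = proportions.values()
print(round(cosine(first, second), 3))
```

Both rows are dominated by topic 1, which is why their cosine similarity is high despite the second row having zero weight on topics 2, 4, and 5.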
Feature Count and Training Set Size of Logistic Regression Models
| model type | avg # features | avg train size |
|---|---|---|
| inclusion models | 60,589 | 36,505 |
| exclusion models | 61,368 | 58,105 |
Logistic Regression Average Accuracy (%, 10-fold CV)
| model type | criteria only | criteria+PICO |
|---|---|---|
| inclusion models | 63.43 | 79.80 |
| exclusion models | 61.95 | 79.51 |
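The accuracy table reports averages over 10-fold cross-validation. The stdlib-only sketch below shows what that average means: shuffle the examples, split them into 10 folds, train on 9, score the held-out fold, and average the 10 scores. The record does not describe the paper's features, regularization, or solver, so the unregularized gradient-descent trainer, its learning rate and epoch count, and the toy data are all assumptions for illustration.

```python
import math
import random

def train_logreg(xs, ys, lr=0.05, epochs=200):
    """Fit logistic regression weights by full-batch gradient descent (no regularization)."""
    w = [0.0] * (len(xs[0]) + 1)  # feature weights followed by a bias term
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]  # zip stops before the bias
            err = 1.0 / (1.0 + math.exp(-z)) - y              # predicted probability minus label
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if z >= 0 else 0

def cv_accuracy(xs, ys, k=10, seed=0):
    """Mean held-out accuracy over k shuffled folds (10-fold CV by default)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = train_logreg([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(w, xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

# Hypothetical toy data: one feature, two well-separated classes.
xs = [[float(v)] for v in list(range(-12, -2)) + list(range(3, 13))]
ys = [0] * 10 + [1] * 10
print(cv_accuracy(xs, ys))
```

Averaging per-fold accuracies, rather than pooling all predictions, is one common convention; with equal-sized folds, as here, the two give the same number.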
Average Counts
| criteria type | neighbors | criteria | candidates |
|---|---|---|---|
| inclusion | 23.79 | 5.03 | 120.12 |
| exclusion | 23.79 | 6.95 | 173.12 |
Inclusion Criteria Results, Number of Topics = 75
| inclusion | LMT | ITR | BFS | RND |
|---|---|---|---|---|
| avg sim | 0.581 | 0.435 | 0.425 | 0.051 |
| normalized sim | 100.00 | 74.823 | 73.107 | 8.806 |
| % vs random | | 849.64 | 830.16 | 100.00 |
Exclusion Criteria Results, Number of Topics = 100
| exclusion | LMT | ITR | BFS | RND |
|---|---|---|---|---|
| avg sim | 0.524 | 0.368 | 0.349 | 0.039 |
| normalized sim | 100.00 | 70.173 | 66.481 | 7.358 |
| % vs random | | 953.76 | 903.58 | 100.00 |
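The record does not define the derived rows of the two results tables, but the published numbers are consistent, up to rounding, with normalized sim = 100 × avg sim / avg sim of LMT and % vs random = 100 × normalized sim / normalized sim of RND. That reading is inferred from the numbers, not stated in the source. The sketch below recomputes both rows under that assumption. Small gaps against the tables are expected because avg sim is published to only three decimals.

```python
# Values copied from the inclusion (75 topics) and exclusion (100 topics) tables above.
avg_sim = {
    "inclusion": {"LMT": 0.581, "ITR": 0.435, "BFS": 0.425, "RND": 0.051},
    "exclusion": {"LMT": 0.524, "ITR": 0.368, "BFS": 0.349, "RND": 0.039},
}
normalized_sim = {
    "inclusion": {"LMT": 100.00, "ITR": 74.823, "BFS": 73.107, "RND": 8.806},
    "exclusion": {"LMT": 100.00, "ITR": 70.173, "BFS": 66.481, "RND": 7.358},
}

def normalize(sims, best="LMT"):
    """Each method's average similarity as a percentage of the best method's (assumed formula)."""
    return {m: 100 * s / sims[best] for m, s in sims.items()}

def percent_vs_random(norm, baseline="RND"):
    """Each method's normalized similarity as a percentage of the random baseline (assumed formula)."""
    return {m: 100 * v / norm[baseline] for m, v in norm.items()}

for task in ("inclusion", "exclusion"):
    recomputed = normalize(avg_sim[task])
    ratios = percent_vs_random(normalized_sim[task])
    print(task,
          {m: round(v, 2) for m, v in recomputed.items()},
          {m: round(v, 2) for m, v in ratios.items()})
```

Computing % vs random from the normalized column rather than from avg sim matters here: dividing by the three-decimal RND similarity (0.039) amplifies rounding error by several percentage points.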
Figure 1. Normalized average similarity (inclusion) for different numbers of topics.
Figure 2. Normalized average similarity (exclusion) for different numbers of topics.
Figure 3. Eligibility Criteria Comparison Tool Snapshot.