Svetlana Kiritchenko, Berry de Bruijn, Simona Carini, Joel Martin, Ida Sim.
Abstract
BACKGROUND: Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text (e.g., in journal publications), which is labour-intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs).
Year: 2010 PMID: 20920176 PMCID: PMC2954855 DOI: 10.1186/1472-6947-10-56
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Target trial characteristics (information elements)
| Element | Description |
|---|---|
| Eligibility criteria | logical conditions for being included in the trial, usually split into inclusion and exclusion criteria |
| Sample size | the total number of participants actually enrolled (randomized) in the trial |
| Start date of enrolment | date the enrolment actually started, including day, month, year or as much as presented |
| End date of enrolment | date the enrolment actually ended, including day, month, year or as much as presented |
| Name of experimental treatment | name of experimental intervention |
| Name of control treatment | name of control intervention |
| Dose | dosage of experimental/control intervention |
| Frequency of treatment | frequency of administration of experimental/control intervention |
| Route of treatment | route of administration of experimental/control intervention |
| Duration of treatment | duration of administration of experimental/control intervention |
| Primary outcome name | the outcome(s) of greatest importance, where outcome is a "component of a participant's clinical and functional status after an intervention has been applied, that is used to assess the effectiveness of an intervention" (source: Glossary of Terms in the Cochrane Collaboration) |
| Primary outcome time point | point in time when a primary outcome was assessed |
| Secondary outcome name | outcome(s) used to evaluate additional effects of the intervention deemed a priori as being less important than the primary outcomes (source: Glossary of Terms in the Cochrane Collaboration) |
| Secondary outcome time point | point in time when a secondary outcome was assessed |
| Funding organization name | name of a funding source |
| Funding number | funding grant number |
| Early stopping | whether the trial was stopped early |
| Registration identifier of trial | trial registration ID, often ClinicalTrials.gov NCT number |
| Author name | first and last name of the first author |
| Date of publication | year the article was published |
| DOI | digital object identifier for the publication |
Figure 1. Example of an abstract and the corresponding template filling. The top part of the figure shows the abstract of a journal article regarding an RCT. The bottom part shows the template with the slots filled in with text excerpts from the abstract. Some slots are left empty as the information is not present in the abstract.
Figure 2. Example of the system's output. The publication details of an article are retrieved directly from PubMed. For other information elements, the system outputs the five best candidate sentences in decreasing order of confidence. The text fragments identified by the system as containing the target information are highlighted in the retrieved sentences whose confidence score is above a certain threshold. For eligibility criteria, the whole sentence is considered the target, so no fragments are highlighted in those sentences. Sentences in black were confirmed as correct answers by the field expert.
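
The output behaviour described in the Figure 2 caption (five best sentences in decreasing order of confidence, with fragments highlighted only above a threshold) can be sketched roughly as follows. This is a minimal illustration, not ExaCT's actual code: the `Candidate` class, the `top_candidates` function, and the threshold value of 0.5 are all assumptions, and the example sentences are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """A candidate sentence for one information element (hypothetical structure)."""
    sentence: str
    confidence: float
    # Character span (start, end) of the proposed target fragment, if any.
    fragment: Optional[Tuple[int, int]] = None


def top_candidates(candidates: List[Candidate], k: int = 5,
                   highlight_threshold: float = 0.5) -> List[Candidate]:
    """Return the k highest-confidence candidates, best first.

    A fragment is kept only when the sentence's confidence exceeds the
    threshold; 0.5 is an assumed placeholder, not a value from the paper.
    """
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)[:k]
    return [
        Candidate(c.sentence, c.confidence,
                  c.fragment if c.confidence > highlight_threshold else None)
        for c in ranked
    ]


# Made-up candidates for the "Sample size" element.
candidates = [
    Candidate("A total of 120 patients were randomized.", 0.91, (11, 14)),
    Candidate("Patients were followed for 12 weeks.", 0.22, (27, 29)),
    Candidate("We screened 300 patients for eligibility.", 0.48, (12, 15)),
]
for c in top_candidates(candidates):
    highlighted = c.sentence[slice(*c.fragment)] if c.fragment else "(no highlight)"
    print(f"{c.confidence:.2f}  {highlighted}  {c.sentence}")
```

Separating the ranking step from the highlighting decision mirrors the caption, where a low-confidence sentence can still be listed without a highlighted fragment.
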
Figure 3. Hierarchy of information elements. The hierarchy of the 21 information elements used in ExaCT. The elements are grouped semantically in a four-level hierarchical structure to enhance the sentence classification method.
Figure 4. Interactive user interface for curation. The user interface consists of two panels simultaneously displayed on a screen: the left panel displays the system's suggestions and the right panel displays the original article. A button next to each sentence on the left panel highlights the same sentence within the article on the right panel. The information elements are divided into five tabs: publication information, meta information, enrolment, interventions, and outcomes. For each element, the top-scored solution with a high confidence score or a "not found" message is initially displayed. A user can expand the list of the system's suggestions to the five highest scoring solutions and choose one or more sentences as the most relevant to the element. If a target sentence is not among the system's five choices, a curator can add a sentence directly from the article by copying-and-pasting or dragging-and-dropping it into the corresponding element text box, or by right-clicking and selecting the relevant element from the menu. Automatically extracted fragments are highlighted in the sentences. The highlighting can be modified by a curator using the mouse.
Sentence level performance of ExaCT
| Information element | At least one relevant answer per article: # of articles with expert's answers | Per article: precision | Per article: recall | Per article: top5 recall | All answers: # of sentences with expert's answers | All answers: top5 sentence recall |
|---|---|---|---|---|---|---|
| Eligibility criteria | 50 | 0.78 | 0.78 | 0.98 | 133 | 0.77 |
| Sample size | 50 | 0.77 | 0.68 | 0.84 | 52 | 0.83 |
| Start date of enrolment | 37 | 0.97 | 0.86 | 0.86 | 37 | 0.86 |
| End date of enrolment | 37 | 0.91 | 0.81 | 0.89 | 37 | 0.89 |
| Name of experimental treatment | 50 | 0.86 | 0.86 | 0.98 | 56 | 0.95 |
| Name of control treatment | 50 | 0.86 | 0.86 | 1.00 | 54 | 0.96 |
| Dose | 49 | 0.81 | 0.78 | 0.98 | 72 | 0.90 |
| Frequency of treatment | 41 | 0.80 | 0.78 | 1.00 | 55 | 0.95 |
| Route of treatment | 37 | 0.86 | 0.81 | 0.95 | 39 | 0.95 |
| Duration of treatment | 41 | 0.74 | 0.76 | 0.93 | 44 | 0.91 |
| Primary outcome name | 48 | 0.66 | 0.69 | 0.88 | 52 | 0.81 |
| Primary outcome time point | 31 | 0.53 | 0.61 | 0.81 | 33 | 0.76 |
| Secondary outcome name | 43 | 0.69 | 0.79 | 0.98 | 49 | 0.88 |
| Secondary outcome time point | 26 | 0.69 | 0.69 | 0.88 | 29 | 0.83 |
| Funding organization name | 40 | 0.47 | 0.50 | 0.72 | 42 | 0.74 |
| Funding number | 5 | 0.31 | 0.80 | 0.80 | 5 | 0.80 |
| Early stopping | 2 | 0.33 | 1.00 | 1.00 | 2 | 1.00 |
| Registration identifier of trial | 31 | 1.00 | 0.94 | 0.94 | 31 | 0.94 |
| Author name | 50 | 0.98 | 0.98 | 0.98 | 50 | 0.98 |
| Date of publication | 50 | 0.98 | 0.98 | 0.98 | 50 | 0.98 |
| DOI | 48 | 1.00 | 0.98 | 0.98 | 48 | 0.98 |
| Total | 816 | | | | 970 | |
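
The record does not spell out how the article-level columns are computed, so the following is only a sketch under assumed definitions: precision is taken over articles for which the system returned a top answer, recall over articles for which the expert marked an answer, and top5 recall counts an article as a hit when any of the first five suggestions is relevant. The function name and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def article_level_scores(ranked: Dict[str, List[str]],
                         gold: Dict[str, Set[str]],
                         k: int = 5) -> Tuple[float, float, float]:
    """Return (precision, recall, top-k recall) under the assumed definitions.

    ranked: article id -> sentence ids suggested by the system, best first
    gold:   article id -> sentence ids marked relevant by the expert
    """
    answered = [a for a in ranked if ranked[a]]   # system returned something
    with_gold = [a for a in gold if gold[a]]      # expert found an answer

    # Articles whose top-ranked sentence is relevant.
    top1_hits = sum(1 for a in answered if ranked[a][0] in gold.get(a, set()))
    # Articles with at least one relevant sentence among the first k.
    topk_hits = sum(1 for a in with_gold if set(ranked.get(a, [])[:k]) & gold[a])

    precision = top1_hits / len(answered) if answered else 0.0
    recall = top1_hits / len(with_gold) if with_gold else 0.0
    topk_recall = topk_hits / len(with_gold) if with_gold else 0.0
    return precision, recall, topk_recall


# Toy data: three articles, one of which has no expert answer.
ranked = {"a1": ["s3", "s1"], "a2": ["s9"], "a3": ["s2", "s4"]}
gold = {"a1": {"s3"}, "a2": set(), "a3": {"s4"}}
precision, recall, top5 = article_level_scores(ranked, gold)
print(f"precision={precision:.2f} recall={recall:.2f} top5 recall={top5:.2f}")
```

Under these assumed definitions, precision can fall well below recall when the system proposes answers for many articles in which the expert found none, which would be consistent with rows such as "Funding number" and "Early stopping".
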
Fragment level performance of ExaCT
| Information element | # of expert's fragments | Exact match: precision | Exact match: recall | Partial match: precision | Partial match: recall |
|---|---|---|---|---|---|
| Eligibility criteria | 103 | 1.00 | 1.00 | 1.00 | 1.00 |
| Sample size | 46 | 0.89 | 0.87 | 0.89 | 0.87 |
| Start date of enrolment | 32 | 1.00 | 1.00 | 1.00 | 1.00 |
| End date of enrolment | 31 | 1.00 | 1.00 | 1.00 | 1.00 |
| Name of experimental treatment | 54 | 0.72 | 0.54 | 0.97 | 0.72 |
| Name of control treatment | 55 | 0.83 | 0.80 | 0.89 | 0.85 |
| Dose | 103 | 0.91 | 0.90 | 0.96 | 0.97 |
| Frequency of treatment | 70 | 0.91 | 0.87 | 0.99 | 0.93 |
| Route of treatment | 53 | 0.94 | 0.92 | 0.94 | 0.92 |
| Duration of treatment | 45 | 0.84 | 0.91 | 0.86 | 0.93 |
| Primary outcome name | 38 | 0.97 | 0.97 | 0.97 | 0.97 |
| Primary outcome time point | 33 | 0.90 | 0.79 | 0.97 | 0.85 |
| Secondary outcome name | 43 | 0.93 | 0.88 | 1.00 | 1.00 |
| Secondary outcome time point | 25 | 0.72 | 0.72 | 0.92 | 0.92 |
| Funding organization name | 45 | 0.90 | 0.98 | 0.90 | 0.98 |
| Funding number | 7 | 1.00 | 1.00 | 1.00 | 1.00 |
| Early stopping | 2 | 1.00 | 1.00 | 1.00 | 1.00 |
| Registration identifier of trial | 29 | 1.00 | 1.00 | 1.00 | 1.00 |
| Author name | 49 | 1.00 | 1.00 | 1.00 | 1.00 |
| Date of publication | 49 | 1.00 | 1.00 | 1.00 | 1.00 |
| DOI | 47 | 1.00 | 1.00 | 1.00 | 1.00 |
| Total | 959 | | | | |
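
The difference between the exact-match and partial-match columns can be illustrated with a small sketch. It assumes fragments are character spans, that an exact match requires identical boundaries, and that a partial match requires only some overlap; the record does not give the paper's precise criteria, and the function names and spans below are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def fragment_scores(system: List[Span], expert: List[Span],
                    partial: bool = False) -> Tuple[float, float]:
    """Return (precision, recall) for system fragments against expert fragments."""
    def match(s: Span, e: Span) -> bool:
        return overlaps(s, e) if partial else s == e

    # System fragments that match some expert fragment.
    sys_hits = sum(1 for s in system if any(match(s, e) for e in expert))
    # Expert fragments that are matched by some system fragment.
    exp_hits = sum(1 for e in expert if any(match(s, e) for s in system))

    precision = sys_hits / len(system) if system else 0.0
    recall = exp_hits / len(expert) if expert else 0.0
    return precision, recall


# Made-up spans: one exact hit, one boundary error, one spurious fragment.
system = [(0, 10), (20, 28), (40, 45)]
expert = [(0, 10), (22, 30)]
print("exact:  ", fragment_scores(system, expert))
print("partial:", fragment_scores(system, expert, partial=True))
```

With these spans, the boundary error counts as a miss under exact matching but as a hit under partial matching, which is the kind of gap visible in rows such as "Name of experimental treatment".
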
Performance of the entire IE system
| Information element | Fully correct solution | Partially correct solution: total | Partially correct: sentence selection only | Partially correct: change in highlighting | Partially correct: sentence adding | Incorrect solution |
|---|---|---|---|---|---|---|
| Eligibility criteria | 0.08 | 0.90 | 0.50 | 0.00 | 0.40 | 0.02 |
| Sample size | 0.56 | 0.28 | 0.04 | 0.24 | 0.02 | 0.16 |
| Start date of enrolment | 0.88 | 0.02 | 0.02 | 0.00 | 0.00 | 0.10 |
| End date of enrolment | 0.82 | 0.10 | 0.06 | 0.04 | 0.00 | 0.08 |
| Name of experimental treatment | 0.38 | 0.60 | 0.04 | 0.56 | 0.02 | 0.02 |
| Name of control treatment | 0.62 | 0.38 | 0.08 | 0.30 | 0.02 | 0.00 |
| Dose | 0.50 | 0.48 | 0.10 | 0.32 | 0.10 | 0.02 |
| Frequency of treatment | 0.60 | 0.40 | 0.02 | 0.34 | 0.06 | 0.00 |
| Route of treatment | 0.74 | 0.22 | 0.04 | 0.18 | 0.00 | 0.04 |
| Duration of treatment | 0.58 | 0.36 | 0.10 | 0.26 | 0.02 | 0.06 |
| Primary outcome name | 0.58 | 0.30 | 0.12 | 0.10 | 0.08 | 0.12 |
| Primary outcome time point | 0.42 | 0.46 | 0.22 | 0.22 | 0.02 | 0.12 |
| Secondary outcome name | 0.60 | 0.38 | 0.20 | 0.14 | 0.06 | 0.02 |
| Secondary outcome time point | 0.56 | 0.38 | 0.12 | 0.24 | 0.04 | 0.06 |
| Funding organization name | 0.38 | 0.40 | 0.26 | 0.14 | 0.00 | 0.22 |
| Funding number | 0.80 | 0.18 | 0.18 | 0.00 | 0.00 | 0.02 |
| Early stopping | 0.92 | 0.08 | 0.08 | 0.00 | 0.00 | 0.00 |
| Registration identifier of trial | 0.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.04 |
| Author name | 0.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
| Date of publication | 0.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
| DOI | 0.98 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
The numbers represent the proportion of the 50 test articles for which a fully correct, a partially correct, or no correct solution has been found by the IE engine.
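
Because each value is a proportion of the 50 test articles, it can be converted back to an article count. The sketch below does this for the "Eligibility criteria" row; the helper name `to_counts` is hypothetical, and the proportions are copied from the table.

```python
from typing import Dict

N_ARTICLES = 50  # size of the test set stated in the table note


def to_counts(proportions: Dict[str, float], n: int = N_ARTICLES) -> Dict[str, int]:
    """Convert proportions of n articles into whole article counts."""
    return {label: round(p * n) for label, p in proportions.items()}


# "Eligibility criteria" row from the table.
eligibility = {
    "fully correct": 0.08,
    "partially correct": 0.90,
    "incorrect": 0.02,
}
counts = to_counts(eligibility)
print(counts)
print(sum(counts.values()))
```

For this row the three top-level categories account for all 50 articles: 4 fully correct, 45 partially correct, and 1 incorrect.
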