Siddhartha R Jonnalagadda, Pawan Goyal, Mark D Huffman.
Abstract
BACKGROUND: Automating parts of the systematic review process, specifically the data extraction step, may be an important strategy for reducing the time needed to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper presents a systematic review of published and unpublished methods for automating data extraction for systematic reviews.
Year: 2015 PMID: 26073888 PMCID: PMC4514954 DOI: 10.1186/s13643-015-0066-7
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Data elements, category, sources and existing automation work
| Data element | Category | Included in standards | Published method to extract? |
|---|---|---|---|
| Total number of participants | Participants | Cochrane, PICO, PECODR, PIBOSO, STARD | Yes [ |
| Settings | Participants | Cochrane, CONSORT, STARD | No |
| Diagnostic criteria | Participants | Cochrane, STARD | No |
| Age | Participants | Cochrane, STARD | Yes [ |
| Sex | Participants | Cochrane, STARD | Yes [ |
| Country | Participants | Cochrane | Yes [ |
| Co-morbidity | Participants | Cochrane, STARD | Yes [ |
| Socio-demographics | Participants | Cochrane, STARD | No |
| Spectrum of presenting symptoms, current treatments, recruitment centers | Participants | STARD | Yes [ |
| Ethnicity | Participants | Cochrane | Yes [ |
| Date of study | Participants | Cochrane | Yes [ |
| Date of recruitment and follow-up | Participants | CONSORT, STARD | No |
| Participant sampling | Participants | STARD | No |
| Total number of intervention groups | Intervention | Cochrane | Yes [ |
| Specific intervention | Intervention | Cochrane, PICO, PIBOSO, PECODR | Yes [ |
| Intervention details (sufficient for replication, if feasible) | Intervention | Cochrane, CONSORT | Yes [ |
| Integrity of intervention | Intervention | Cochrane | No |
| Outcomes and time points (i) collected; (ii) reported | Outcomes | Cochrane, CONSORT, PICO, PECODR, PIBOSO | Yes [ |
| Outcome definition (with diagnostic criteria if relevant) | Outcomes | Cochrane | No |
| Unit of measurement (if relevant) | Outcomes | Cochrane | No |
| For scales: upper and lower limits, and whether high or low score is good | Outcomes | Cochrane | No |
| Comparison | Comparisons | PICO, PECODR | Yes [ |
| Sample size | Results | Cochrane, CONSORT | Yes [ |
| Missing participants | Results | Cochrane | No |
| Summary data for each intervention group (e.g. 2 × 2 table for dichotomous data; means and SDs for continuous data) | Results | Cochrane, PECODR, STARD | No |
| Estimate of effect with confidence interval; | Results | Cochrane | No |
| Subgroup analyses | Results | Cochrane | No |
| Adverse events and side effects for each study group | Results | CONSORT, STARD | No |
| Overall evidence | Interpretation | CONSORT | Yes [ |
| Generalizability: external validity of trial findings | Interpretation | CONSORT | Yes [ |
| Research questions and hypotheses | Objectives | CONSORT, PECODR, PIBOSO, STARD | Yes [ |
| Reference standard and its rationale | Method | STARD | No |
| Technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standard | Method | STARD | No |
| Study design | Method | Cochrane, PIBOSO | Yes [ |
| Total study duration | Method | Cochrane, PECODR | Yes [ |
| Sequence generation | Method | Cochrane | Yes [ |
| Allocation sequence concealment | Method | Cochrane | Yes [ |
| Blinding | Method | Cochrane, CONSORT, STARD | Yes [ |
| Methods used to generate random allocation sequence, implementation | Method | CONSORT, STARD | Yes [ |
| Other concerns about bias | Method | Cochrane | No |
| Methods used to compare groups for primary outcomes and for additional analyses | Method | CONSORT, STARD | No |
| Methods for calculating test reproducibility | Method | STARD | No |
| Definition and rationale for the units, cutoffs and/or categories of the results of the index tests and reference standard | Method | STARD | No |
| Number, training, and expertise of the persons executing and reading the index tests and the reference standard | Method | STARD | No |
| Participant flow: flow of participants through each stage: randomly assigned, received intended treatment, completed study, analyzed for primary outcome, inclusion and exclusion criteria | Method | CONSORT | Yes [ |
| Funding source | Miscellaneous | Cochrane | No |
| Key conclusions of the study authors | Miscellaneous | Cochrane | Yes [ |
| Clinical applicability of the study findings | Miscellaneous | STARD | No |
| Miscellaneous comments from the study authors | Miscellaneous | Cochrane | No |
| References to other relevant studies | Miscellaneous | Cochrane | No |
| Correspondence required | Miscellaneous | Cochrane | No |
| Miscellaneous comments by the review authors | Miscellaneous | Cochrane | No |
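
The table above marks elements such as the total number of participants as having published extraction methods. As a rough illustration of what a rule-based extractor for one such element might look like, here is a minimal Python sketch. It is not the method of any reviewed study: the patterns, the noun list, and the function name `extract_participant_count` are assumptions made for illustration, and real abstracts phrase sample sizes in many more ways.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only.
# A number is either comma-grouped (1,356) or a plain run of digits (233).
_NUMBER = r"(\d{1,3}(?:,\d{3})+|\d+)"
_PATTERNS = [
    # "n = 50" or "N=50"
    re.compile(r"\b[nN]\s*=\s*" + _NUMBER),
    # "233 participants", "1,356 patients", ...
    re.compile(
        _NUMBER + r"\s+(?:patients|participants|subjects|women|men|children|adults)\b",
        re.IGNORECASE,
    ),
]


def extract_participant_count(text: str) -> Optional[int]:
    """Return the first number that looks like a participant count, else None."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return int(match.group(1).replace(",", ""))
    return None


if __name__ == "__main__":
    print(extract_participant_count("A total of 1,356 patients were enrolled."))
    print(extract_participant_count("Participants (n = 50) were randomized."))
```

A sketch like this returns only the first match, so it cannot distinguish the total sample size from a subgroup size; handling that would need sentence-level context.
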
Fig. 1 Process of screening the articles to be included for this systematic review
A summary of included extraction methods and their evaluation
| Study | Extracted elements | Dataset | Method | Sentence/Concept/Neither | Full text (F)/Abstract (A) | Results |
|---|---|---|---|---|---|---|
| Dawes et al. (2007) [ | PECODR | 20 evidence-based medicine journal synopses (759 extracts from the corresponding PubMed abstracts) | Proposed potential lexical patterns and assessed them using NVivo software | Neither | Abstract | Agreement among the annotators was 86.6 and 85 %, which rose to 98.4 and 96.9 % after consensus. No automated system. |
| Kim et al. (2011) [ | PIBOSO | 1000 medical abstracts (PIBOSO corpus) | Conditional random fields with various features based on lexical, semantic, structural and sequential information | Sentence | Abstract | Micro-averaged F-scores on structured and unstructured: 80.9 and 66.9 %, 63.1 % on an external dataset |
| Boudin et al. (2010) [ | PICO (I and C were combined together) | 26,000 abstracts from PubMed, first sentences from the structured abstract | Combination of multiple supervised classification algorithms: random forests (RF), naive Bayes (NB), support vector machines (SVM), and multi-layer perceptron (MLP) | Sentence | Abstract | F-score of 86.3 % for P, 67 % for I (and C), and 56.3 % for O |
| Huang et al. (2011) [ | PICO (except C) | 23,472 sentences from the structured abstracts | naïve Bayes | Sentence | Abstract | F-measure of 0.91 for patient/problem, 0.75 for intervention, and 0.88 for outcome |
| Verbeke et al. (2012) [ | PIBOSO | PIBOSO corpus | Statistical relational learning with kernels, kLog | Sentence | Abstract | Micro-averaged F of 84.29 % on structured abstracts and 67.14 % on unstructured abstracts |
| Huang et al. (2013) [ | PICO (except C) | 19,854 structured abstracts of randomized controlled trials | First sentence of the section or all sentences in the section, NB classifier | Sentence | Abstract | First sentence of the section: F-scores for P: 0.74, I: 0.66, and O: 0.73. All sentences in the section: F-scores for P: 0.73, I: 0.73, and O: 0.74 |
| Hassanzadeh et al. (2014) [ | PIBOSO (Population-Intervention-Background-Outcome-Study Design-Other) | PIBOSO corpus, 1000 structured and unstructured abstracts | CRF with discriminative set of features | Sentence | Abstract | Micro-averaged F-score: 91 |
| Robinson (2012) [ | Patient-oriented evidence: morbidity, mortality, symptom severity, quality of life | 1356 PubMed abstracts | SVM, NB, multinomial NB, logistic regression | Sentence | Abstract | Best results achieved via SVM: F-measure of 0.86 |
| Chung (2009) [ | Intervention, comparisons | 203 RCT abstracts for training and 124 for testing | Coordinating constructs are identified using a full parser, which are further classified as positive or not using CRF | Sentence | Abstract | F-score: 0.76 |
| Hara and Matsumoto (2007) [ | Patient population, comparison | 200 abstracts labeled as ‘Neoplasms’ and ‘Clinical Trial, Phase III’ | Categorizing noun phrases (NPs) into classes such as ‘Disease’, ‘Treatment’ etc. using CRF, then applying regular expressions to the sentences containing the classified noun phrases | Sentence | Abstract | F-measure of 0.91 for the task of noun phrase classification. Results of sentence classification: F-measure of 0.8 for patient population and 0.81 for comparisons |
| Davis-Desmond and Molla (2012) [ | Detecting statistical evidence | 194 randomized controlled trial abstracts from PubMed | Rule-based classifier using negation expressions | Sentence | Abstract | Accuracy: between 88 and 98 % at 95 % CI |
| Zhao et al. (2012) [ | Patient, result, Intervention, Study Design, Research Goal | 19,893 medical abstracts and full text articles from 17 journal websites | Conditional random fields | Sentence | Full text | F-scores for sentence classification: patient: 0.75, intervention: 0.61, result: 0.91, study design: 0.79, research goal: 0.76 |
| Hsu et al. (2012) [ | Hypothesis, statistical method, outcomes and generalizability | 42 full-text papers | Regular expressions | Sentence | Full text | For classification task, F-score of 0.86 for hypothesis, 0.84 for statistical method, 0.9 for outcomes, and 0.59 for generalizability |
| Song et al. (2013) [ | Analysis (statistical facts), general (generally accepted facts), recommend (recommendations about interventions), rule (guidelines) | 346 sentences from three clinical guideline documents | Maximum entropy (MaxEnt), SVM, MLP, radial basis function network (RBFN), NB as classifiers and information gain (IG), genetic algorithm (GA) for feature selection | Sentence | Full text | F-score of 0.98 for classifying sentences |
| Demner-Fushman and Lin (2007) [ | PICO (I and C were combined) | 275 manually annotated abstracts | Rule-based approach to identify sentence containing PICO and supervised classifier for Outcomes | Concept | Abstract | Precision of 0.8 for population, 0.86 for problem, 0.80 for intervention, 0.64–0.95 for outcome |
| Kelly and Yang (2013) [ | Age of subjects, duration of study, ethnicity of subjects, gender of subjects, health status of subjects, number of subjects | 386 abstracts from PubMed obtained with the query ‘soy and cancer’ | Regular expressions, gazetteer | Concept | Abstract | F-scores for age of subjects: 1.0, duration of study: 0.911, ethnicity of subjects: 0.949, gender of subjects: 1.0, health status of subjects: 0.874, number of subjects: 0.963 |
| Hansen et al. (2008) [ | Number of trial participants | 233 abstracts from PubMed | Support vector machines | Concept | Abstract | F-measure: 0.86 |
| Xu et al. (2007) [ | Subject demographics such as subject descriptors, number of participants and diseases/symptoms and their descriptors | 250 randomized controlled trial abstracts | Text classification augmented with hidden Markov models was used to identify sentences; rules over parse tree to extract relevant information | Sentence, concept | Abstract | Precision for subject descriptors: 0.83 %, number of trial participants: 0.923, diseases/symptoms: 51.0 %, descriptors of diseases/symptoms: 92.0 % |
| Summerscales et al. (2009) [ | Treatments, groups and outcomes | 100 abstracts from | Conditional random fields | Concept | Abstract | F-scores for treatments: 0.49, groups: 0.82, outcomes: 0.54 |
| Summerscales et al. (2011) [ | Groups, outcomes, group sizes, outcome numbers | 263 abstracts from | CRF, MaxEnt, template filling | Concept | Abstract | F-scores for groups: 0.76, outcomes: 0.42, group sizes: 0.80, outcome numbers: 0.71 |
| Kiritchenko et al. (2010) [ | Eligibility criteria, sample size, drug dosage, primary outcomes | 50 full-text journal articles with 1050 test instances | SVM classifier to recover relevant sentences, extraction rules for correct solutions | Concept | Full text | P5 precision for the classifier: 0.88, precision and recall of the extraction rules: 93 and 91 %, respectively |
| Lin et al. (2010) [ | Intervention, age group of the patients, geographical area, number of patients, time duration of the study | 93 open-access full-text articles documenting oncological and cardiovascular studies from 2005 to 2008 | Linear-chain conditional random fields | Concept | Full text | Precision of 0.4 for intervention, 0.63 for age group, 0.44 for geographical area, 0.43 for number of patients and 0.83 for time period |
| Restificar et al. (2012) [ | Eligibility criteria | 44,203 full-text articles with clinical trials | Latent Dirichlet allocation along with logistic regression | Concept | Full text | 75 and 70 % accuracy based on similarity for inclusion and exclusion criteria, respectively. |
| De Bruijn et al. (2008) [ | Eligibility criteria, sample size, treatment duration, intervention, primary and secondary outcomes | 88 full-text articles on randomized controlled trials from five medical journals | SVM classifier to identify the most promising sentences; manually crafted weak extraction rules for the information elements | Sentence, concept | Full text | Precision for eligibility criteria: 0.69, sample size: 0.62, treatment duration: 0.94, intervention: 0.67, primary outcome: 1.00, secondary outcome: 0.67 |
| Zhu et al. (2012) [ | Subject demographics: patient age, gender, disease and ethnicity | 50 full-text articles on randomized controlled trials | Manually crafted rules for extraction from the parse tree | Concept | Full text | Disease extraction: for exact matching, the F-score was 0.64. For partial matching, it was 0.85. |
| Marshall et al. (2014) [ | Risk of bias concerning sequence generation, allocation concealment and blinding | 2200 clinical trial reports | Soft-margin SVM for a joint model of risk of bias prediction and supporting sentence extraction | Sentence | Full text | For sentence identification: F-score of 0.56, 0.48, 0.35 and 0.38 for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment |
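
Most results in the table above are reported as precision, recall, F-score, or micro-averaged F-score. The sketch below shows, under stated assumptions, how these quantities relate: the F-score is the harmonic mean of precision and recall, a micro-average pools the counts across classes before computing it, and a macro-average computes it per class and then takes the mean. The class names and counts in the usage example are invented for illustration and do not come from any of the listed studies.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one class
Counts = Tuple[int, int, int]


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_f(counts: Dict[str, Counts]) -> float:
    """Pool the counts over all classes, then compute a single F-score."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f_score(tp, fp, fn)


def macro_f(counts: Dict[str, Counts]) -> float:
    """Compute the F-score per class, then take the unweighted mean."""
    return sum(f_score(*c) for c in counts.values()) / len(counts)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    example = {"Population": (80, 20, 20), "Intervention": (30, 10, 30)}
    print(f"micro-F = {micro_f(example):.3f}, macro-F = {macro_f(example):.3f}")
```

Because micro-averaging pools counts, frequent classes dominate the score, which is one reason figures from studies with different class balances are hard to compare directly.
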
Checklist of items to consider in data collection or data extraction from Cochrane Handbook [1]
| Source |
| • Study ID (created by review author) |
| • Report ID (created by review author) |
| • Review author ID (created by review author) |
| • Citation and contact details |
| Eligibility |
| • Confirm eligibility for review |
| • Reason for exclusion |
| Methods |
| • Study design |
| • Total study duration |
| • Sequence generationᵃ |
| • Allocation sequence concealmentᵃ |
| • Blindingᵃ |
| • Other concerns about biasᵃ |
| Participants |
| • Total number |
| • Setting |
| • Diagnostic criteria |
| • Age |
| • Sex |
| • Country |
| • [Co-morbidity] |
| • [Socio-demographics] |
| • [Ethnicity] |
| • [Date of study] |
| Interventions |
| • Total number of intervention groups. |
| For each intervention and comparison group of interest: |
| • Specific intervention |
| • Intervention details (sufficient for replication, if feasible) |
| • [Integrity of intervention] |
| Outcomes |
| • Outcomes and time points (i) collected; (ii) reportedᵃ |
| For each outcome of interest: |
| • Outcome definition (with diagnostic criteria if relevant) |
| • Unit of measurement (if relevant) |
| • For scales: upper and lower limits, and whether high or low score is good |
| Results |
| • Number of participants allocated to each intervention group. |
| For each outcome of interest: |
| • Sample size |
| • Missing participantsᵃ |
| • Summary data for each intervention group (e.g. 2 × 2 table for dichotomous data; means and SDs for continuous data) |
| • [Estimate of effect with confidence interval; |
| • [Subgroup analyses] |
| Miscellaneous |
| • Funding source |
| • Key conclusions of the study authors |
| • Miscellaneous comments from the study authors |
| • References to other relevant studies |
| • Correspondence required |
| • Miscellaneous comments by the review authors |
Items without parentheses should normally be collected in all reviews; items in square brackets may be relevant to some reviews and not to others
ᵃ Full description required for standard items in the ‘Risk of bias’ tool
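
The checklist separates items that should normally be collected in every review from bracketed items that apply only to some reviews. One way to carry that distinction into an extraction form is to make the first group required fields and the second group optional ones. The sketch below does this for the Participants block only; the class name, field names, field types, and helper function are assumptions made for illustration, not part of the Cochrane Handbook.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ParticipantsRecord:
    """Hypothetical extraction form for the Participants block."""

    # Items normally collected in all reviews (required).
    total_number: int
    setting: str
    diagnostic_criteria: str
    age: str
    sex: str
    country: str
    # Bracketed items, relevant to some reviews only (optional).
    co_morbidity: Optional[str] = None
    socio_demographics: Optional[str] = None
    ethnicity: Optional[str] = None
    date_of_study: Optional[str] = None


def unfilled_optional_items(record: ParticipantsRecord) -> List[str]:
    """List the optional items that have not been filled in."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


if __name__ == "__main__":
    # Invented values, for illustration only.
    record = ParticipantsRecord(
        total_number=120,
        setting="outpatient clinic",
        diagnostic_criteria="not reported",
        age="18 to 65 years",
        sex="both",
        country="not reported",
        ethnicity="not reported",
    )
    print(unfilled_optional_items(record))
```

Omitting a required item raises a `TypeError` when the record is constructed, which gives an early signal that a normally collected element was skipped.
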