Fala Cramond, Alison O'Mara-Eves, Lee Doran-Constant, Andrew SC Rice, Malcolm Macleod, James Thomas.
Abstract
Background: The extraction of data from the reports of primary studies, on which the results of systematic reviews depend, needs to be carried out accurately. To aid reliability, it is recommended that two researchers carry out data extraction independently. The extraction of statistical data from graphs in PDF files is particularly challenging, as the process is usually completely manual, and reviewers sometimes need to resort to holding a ruler against the page to read off values: an inherently time-consuming and error-prone process.
Keywords: Systematic review; automation; data extraction; graphs
Year: 2019 PMID: 30809592 PMCID: PMC6372928 DOI: 10.12688/wellcomeopenres.14738.3
Source DB: PubMed Journal: Wellcome Open Res ISSN: 2398-502X
Figure 1. The ‘current methods’ data collection tool in the Qualtrics platform.
Figure 2. The graph as displayed in PDF.JS.
Figure 3. Specification of graph characteristics.
Figure 4. Data extraction.
Figure 5. Data table.
Figure 6. An alternative type of data table, supporting individual and aggregate level data.
Mean and standard deviation of the time difference for each graph across n participants.
| Graph | Mean | Standard deviation | Participants, n |
|---|---|---|---|
| 1 | 180.02 | 240.55 | 10 |
| 2 | 363.60 | 255.27 | 10 |
| 3 | 314.59 | 225.16 | 9 |
| 4 | 140.22 | 121.24 | 10 |
| 5 | 113.68 | 132.87 | 10 |
| 6 | 486.54 | 298.17 | 10 |
| 7 | 463.91 | 410.30 | 10 |
| 8 | 167.77 | 153.72 | 9 |
| 9 | 332.57 | 252.34 | 8 |
| 10 | 546.28 | 649.84 | 7 |
| 11 | 412.55 | 243.81 | 9 |
| 12 | 564.25 | 820.52 | 8 |
| 13 | 210.15 | 169.28 | 8 |
| 14 | 377.24 | 466.44 | 8 |
| 15 | 281.76 | 331.66 | 8 |
| 16 | 478.74 | 404.05 | 8 |
| 17 | 691.20 | 738.84 | 7 |
| 18 | 119.62 | 104.42 | 8 |
| 19 | 93.34 | 151.23 | 9 |
| 20 | 650.31 | 750.77 | 9 |
| 21 | 469.19 | 804.92 | 8 |
| 22 | 373.28 | 462.56 | 9 |
| 23 | 270.24 | 258.40 | 8 |
Note: A positive time difference indicates that the current methods condition took longer than the new graphical data extraction application method condition.
Frequency per graph of data points deemed sufficiently or insufficiently accurate, with the percentage of data points that were sufficiently accurate, by condition.
| Graph | Current methods: sufficient | Current methods: insufficient | Current methods: percent sufficient | New application: sufficient | New application: insufficient | New application: percent sufficient |
|---|---|---|---|---|---|---|
| 1 | 3 | 1 | 75.00% | 4 | 0 | 100.00% |
| 2 | 16 | 8 | 66.67% | 18 | 2 | 90.00% |
| 3 | 7 | 5 | 58.33% | 7 | 5 | 58.33% |
| 4 | 1 | 9 | 10.00% | 10 | 0 | 100.00% |
| 5 | 8 | 4 | 66.67% | 6 | 0 | 100.00% |
| 6 | 15 | 33 | 31.25% | 17 | 0 | 100.00% |
| 7 | 14 | 6 | 70.00% | 5 | 15 | 25.00% |
| 8 | 9 | 11 | 45.00% | 9 | 11 | 45.00% |
| 9 | 16 | 16 | 50.00% | 18 | 0 | 100.00% |
| 10 | could not match data so removed from analysis | |||||
| 11 | 0 | 30 | 0.00% | 0 | 20 | 0.00% |
| 12 | 3 | 33 | 8.33% | 20 | 14 | 58.82% |
| 13 | 10 | 10 | 50.00% | 20 | 0 | 100.00% |
| 14 | 37 | 3 | 92.50% | 14 | 0 | 100.00% |
| 15 | could not match data so removed from analysis | |||||
| 16 | 12 | 28 | 30.00% | 10 | 12 | 45.45% |
| 17 | 22 | 33 | 40.00% | 8 | 12 | 40.00% |
| 18 | 0 | 12 | 0.00% | 5 | 5 | 50.00% |
| 19 | 10 | 2 | 83.33% | 12 | 0 | 100.00% |
| 20 | could not match data so removed from analysis | |||||
| 21 | 22 | 38 | 36.67% | 27 | 9 | 75.00% |
| 22 | 12 | 30 | 28.57% | 14 | 0 | 100.00% |
| 23 | 10 | 14 | 41.67% | 20 | 0 | 100.00% |
Notes: Three of the graphs (10, 15, 20) had incompatible data because participants in the new graphical data extraction application condition selected too many different data input types, so a comparison could not be made. The total number of data points in the two conditions differs due to issues including missing data or incorrect selection of graph type in the new graphical data extraction application condition. This value represents the mean for this column, not the total.
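The "percent sufficient" columns appear to be the share of sufficiently accurate data points among all data points recorded for a graph in a given condition. The following is a minimal sketch of that arithmetic, assuming this definition; the function name `percent_sufficient` and the small `rows` dictionary are illustrative only, with counts copied from three rows of the table above.

```python
def percent_sufficient(sufficient: int, insufficient: int) -> float:
    """Share of data points judged sufficiently accurate, as a percentage.

    Assumes the denominator is sufficient + insufficient, which matches
    the figures reported in the table above.
    """
    total = sufficient + insufficient
    if total == 0:
        raise ValueError("no data points recorded for this graph")
    return 100.0 * sufficient / total


# Counts copied from the table: graph -> ((current methods), (new application)),
# each pair given as (sufficient, insufficient).
rows = {
    12: ((3, 33), (20, 14)),
    16: ((12, 28), (10, 12)),
    21: ((22, 38), (27, 9)),
}

for graph, (current, new) in rows.items():
    print(
        f"graph {graph}: "
        f"current {percent_sufficient(*current):.2f}%, "
        f"new {percent_sufficient(*new):.2f}%"
    )
```

Because the number of data points differs between conditions for some graphs (as the notes explain), the percentages are more directly comparable across conditions than the raw counts.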
Figure 7. Satisfaction with the features of the new graphical data extraction tool: percentage of respondents who ‘agreed’ or ‘strongly agreed’.