Nassr Nama, Margaret Sampson, Nicholas Barrowman, Ryan Sandarage, Kusum Menon, Gail Macartney, Kimmo Murto, Jean-Philippe Vaccani, Sherri Katz, Roger Zemek, Ahmed Nasr, James Dayre McNally.
Abstract
BACKGROUND: Systematic reviews (SRs) are often cited as the highest level of evidence available, as they involve the identification and synthesis of published studies on a topic. Unfortunately, it is increasingly challenging for small teams to complete SR procedures in a reasonable time period, given the exponential rise in the volume of primary literature. Crowdsourcing has been postulated as a potential solution.
Keywords: crowdsourcing; meta-analysis as topic; research design; systematic reviews as topic
Year: 2019 PMID: 31033444 PMCID: PMC6658317 DOI: 10.2196/12953
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Description of systematic reviews.
| Systematic reviewᵃ | Description | Total citationsᵇ, N | Validation studyᶜ, N | Eligible citationsᵈ, N (%) |
| Anesthesiologyᵉ | A systematic review of preoperative screening for factors associated with postoperative critical respiratory events in children undergoing elective adenotonsillectomy | 5458 | 300 | 29 (9.7) |
| Cardiologyᶠ | A scoping review of all randomized controlled trials in pediatric cardiology | 7540 | 490 | 71 (14.5) |
| Emergency | A systematic review of studies on concussion education and outcomes for children | 513 | 503 | 9 (1.8) |
| Endocrinologyᵍ | 2017 update of a previously published systematic review on high-dose supplementation of vitamin D in children | 201 | 201 | 30 (14.9) |
| Respirology | A systematic review of studies on predictors of positive airway pressure adherence at home among children with sleep-disordered breathing | 277 | 265 | 23 (8.7) |
| Surgery | A systematic review of studies on asymptomatic antenatal diagnoses of congenital pulmonary airway malformation that describe the natural history of the disease and future symptoms | 574 | 564 | 16 (2.8) |
ᵃ A total of 6 systematic reviews and 2323 citations were included; 178 (7.7%) of these citations were identified as eligible by the experts (ie, true positives).
ᵇ Total number of citations identified by the search strategy.
ᶜ Number of citations included in the validation study, after excluding the 10 citations used as a training set.
ᵈ Eligible citations as identified by the experts (ie, true positives).
ᵉ A random sample of 300 citations was selected and enriched with up to 30 eligible citations.
ᶠ A random sample of 500 citations was selected.
ᵍ Given the limited number of citations, the 10 training set citations were selected from the original publication.
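The proportions in the table and in footnote a can be rechecked from the validation-set counts. The following Python sketch transcribes the table's N columns into a dictionary and recomputes each percentage to one decimal place. The variable and function names are illustrative and do not come from the study.

```python
# Validation-set size and expert-identified eligible citations per review,
# transcribed from the table above.
REVIEWS = {
    "Anesthesiology": (300, 29),
    "Cardiology": (490, 71),
    "Emergency": (503, 9),
    "Endocrinology": (201, 30),
    "Respirology": (265, 23),
    "Surgery": (564, 16),
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100 * part / whole, 1)


total_citations = sum(n for n, _ in REVIEWS.values())
total_eligible = sum(eligible for _, eligible in REVIEWS.values())

if __name__ == "__main__":
    for name, (n, eligible) in REVIEWS.items():
        print(f"{name}: {eligible}/{n} = {pct(eligible, n)}%")
    print(f"All reviews: {total_eligible}/{total_citations} = "
          f"{pct(total_eligible, total_citations)}%")
```

Running it reproduces each reported percentage, including 178/2323 = 7.7% overall.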
Figure 1. Study flow diagram. To focus the study on the crowd’s capacity to assess abstracts rather than titles alone, citations with missing abstracts (n=129) were removed. These citations were later added to the full text screening stage. Also added were all citations whose share of crowd exclusion votes did not exceed our a priori exclusion threshold of 75% at the abstract screening level. True positives reflect the number of citations that were identified as eligible by the experts. CHEO: Children’s Hospital of Eastern Ontario; PI: principal investigator.
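The routing described in this caption can be expressed as a small decision function. The Python sketch below rests on several assumptions:

- Votes are recorded as booleans, where True means the crowd member voted to exclude.
- The `Citation` type and the function name are hypothetical.
- A citation with no votes is conservatively sent on to full text screening.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    """Hypothetical record for one citation in the screening workflow."""
    citation_id: str
    abstract: Optional[str]
    # One entry per crowd assessment; True means "exclude".
    exclude_votes: List[bool] = field(default_factory=list)


def route_after_abstract(citation: Citation, threshold: float = 0.75) -> str:
    """Return "excluded" or "full_text" for a citation after abstract screening.

    Citations without an abstract bypass the crowd's abstract screening and go
    straight to full text screening. Otherwise the citation is excluded only
    when the share of exclude votes is strictly higher than the threshold.
    """
    if not citation.abstract:
        return "full_text"
    votes = citation.exclude_votes
    if not votes:
        return "full_text"  # assumption: unassessed citations are retained
    share = sum(votes) / len(votes)
    return "excluded" if share > threshold else "full_text"


if __name__ == "__main__":
    unanimous = Citation("c1", "Abstract text", [True, True, True, True])
    split = Citation("c2", "Abstract text", [True, True, True, False])
    no_abstract = Citation("c3", None)
    for c in (unanimous, split, no_abstract):
        print(c.citation_id, route_after_abstract(c))
```

The study targets 4 assessments per citation. Under that target, a 3-to-1 vote gives an exclusion share of exactly 75%, which does not exceed the threshold. In that case only a unanimous vote excludes the citation.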
Comparison of crowd members who proceeded to complete the minimum 50 citations with those who did not.
| Crowd members | <50 assessmentsᵃ, N (%) | ≥50 assessments, N (%) | P valueᵇ | Total |
| Total reviewers | 241 (77.2) | 71 (22.8) | —ᶜ | 312 |
|  | — | — | .15 | — |
| Undergraduate studies | 107 (44.4) | 24 (33.8) | — | 131 |
| Medical student | 41 (17.0) | 20 (28.2) | — | 61 |
| Graduate studies | 36 (14.9) | 9 (12.7) | — | 45 |
| Allied health professional | 20 (8.3) | 3 (4.2) | — | 23 |
| Physician | 7 (2.9) | 3 (4.2) | — | 10 |
| Other | 4 (1.7) | 3 (4.2) | — | 7 |
|  | — | — | .08 | — |
| None | 65 (27.0) | 27 (38.0) | — | 92 |
| Student | 130 (53.9) | 35 (49.3) | — | 165 |
| Volunteer | 81 (33.6) | 23 (32.4) | — | 104 |
| Coordinator | 66 (27.4) | 11 (15.5) | — | 77 |
| Investigator | 25 (10.4) | 3 (4.2) | — | 28 |
|  | — | — | .23 | — |
| None | 156 (64.7) | 53 (74.6) | — | 209 |
| 1-3 | 57 (23.7) | 14 (19.7) | — | 71 |
| >3 | 28 (11.6) | 4 (5.6) | — | 32 |
|  | — | — | — | — |
| Involvement in a review | 52 (21.6) | 13 (18.3) | .62 | 65 |
| Leading a review | 12 (5.0) | 5 (7.0) | .55 | 17 |
| Publishing a review | 38 (15.8) | 12 (16.9) | .85 | 50 |
ᵃ A minimum of 50 citations in a systematic review was requested from crowd members at the beginning of the study. Crowd members who assessed 50 or more citations performed 98.8% (16,789/16,988) and 93.0% (7071/7604) of the abstract and full text assessments, respectively.
ᵇ Comparison between those who performed fewer than 50 assessments and those who performed 50 or more (Fisher exact test).
ᶜ Not applicable.
ᵈ Only 277 crowd members provided their background.
ᵉ Reviewers could select multiple choices.
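Footnote b names the Fisher test. Consider a 2×2 comparison such as involvement in a review (52 of 241 versus 13 of 71). A two-sided exact P value for it can be computed from the hypergeometric distribution.

The Python sketch below assumes the common two-sided convention: it sums the probabilities of all tables that are no more likely than the observed one, as R's `fisher.test` does. The study does not state which convention its software used. The sketch is therefore illustrative and is not a reproduction of the reported P values.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Row and column totals are held fixed; the P value is the total probability
    of every possible table whose probability does not exceed that of the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        # The small relative tolerance treats floating-point ties as equal.
        if p <= observed * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


if __name__ == "__main__":
    # Rows: <50 vs >=50 assessments; columns: involved in a review, not involved.
    print(fisher_exact_two_sided(52, 241 - 52, 13, 71 - 13))
```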
Figure 2. Time to review completion during abstract screening. Time required to complete the target of 4 assessments per citation at the abstract screening level. On day 61, additional incentives were offered for the surgery review.
Figure 3. Time to review completion during full text screening. Time required to complete the target of 4 assessments per citation at the full text screening level. Between days 58 and 77, reviewers were notified that the screening deadline was day 90, and further incentives were offered for the anesthesiology, surgery, and respirology reviews.
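The completion times in Figures 2 and 3 can be derived from an assessment log. The Python sketch below assumes a hypothetical log of (citation ID, day) pairs, with days counted from the start of screening. Under that assumption, a review is complete on the day its last citation receives its fourth assessment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def completion_day(
    log: Iterable[Tuple[str, int]],
    citation_ids: Iterable[str],
    target: int = 4,
) -> Optional[int]:
    """Day on which every citation in a review had reached `target` assessments.

    Returns None if at least one citation never reached the target.
    """
    days_by_citation: Dict[str, List[int]] = defaultdict(list)
    for citation_id, day in log:
        days_by_citation[citation_id].append(day)

    finished = 0
    for citation_id in citation_ids:
        days = sorted(days_by_citation[citation_id])
        if len(days) < target:
            return None
        # Day of this citation's target-th assessment (index target - 1).
        finished = max(finished, days[target - 1])
    return finished


if __name__ == "__main__":
    log = [("a", 1), ("a", 2), ("a", 5), ("a", 3),
           ("b", 2), ("b", 2), ("b", 9), ("b", 4)]
    print(completion_day(log, ["a", "b"]))  # prints 9
```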
Crowd’s sensitivity and work performed at different exclusion thresholds.
| Crowd agreement required to excludeᵃ | Sensitivityᵇ, mean % (95% CI) | P valueᵉ | Work performedᶜ, mean % (95% CI) | P valueᵉ | Specificityᵈ, mean % (95% CI) | P valueᵉ |
| Abstract screeningᶠ |  |  |  |  |  |  |
| =100% | 100 (97.9-100) | .50 | 44.9 (42.8-46.9) | <.001 | 48.6 (46.5-50.7) | <.001 |
| >75% | 100 (97.9-100) | (Refᵍ) | 60.1 (58.1-62.1) | (Ref) | 65.1 (63.0-67.1) | (Ref) |
| >50% | 98.9 (96.0-99.7) | .25 | 68.0 (66.1-69.9) | <.001 | 73.6 (71.7-75.4) | <.001 |
| Abstract and full text screeningʰ |  |  |  |  |  |  |
| =100% | 100 (97.9-100) | .50 | 68.3 (66.4-70.1) | <.001 | 73.9 (72.0-75.8) | <.001 |
| >75% | 100 (97.9-100) | (Ref) | 72.9 (71.0-74.6) | (Ref) | 78.9 (77.2-80.6) | (Ref) |
| >50% | 98.9 (96.0-99.7) | .25 | 80.4 (78.7-82.0) | <.001 | 87.0 (85.5-88.4) | <.001 |
ᵃ Citations were excluded at different thresholds of crowd agreement.
ᵇ Sensitivity is the percentage of eligible citations, identified by the experts, that were retained by the crowd.
ᶜ Work performed is the percentage of citations that were excluded by the crowd and therefore did not require assessment by the investigative team at the abstract level.
ᵈ Specificity is the percentage of ineligible citations, as identified by the experts, that were excluded by the crowd.
ᵉ P value comparing sensitivity, work performed, or specificity with the respective value at the 75% threshold (McNemar test).
ᶠ Outcomes were measured after abstract screening. A citation was excluded if the percentage of assessments that excluded it at the abstract level was higher than the specified threshold.
ᵍ Ref: reference category.
ʰ Outcomes were measured at the end of both screening levels. A citation was excluded if the percentage of assessments that excluded it at either the abstract or the full text level was higher than the specified threshold.
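The exclusion rules in footnotes f and h and the definitions in footnotes b to d can be combined into one calculation. The Python sketch below is an illustration under the following assumptions:

- Each screening level stores boolean exclude votes.
- The function names are hypothetical.
- A level with no assessments never excludes.

The reported 95% CIs appear consistent with Wilson score intervals, although the paper does not name its interval method. For example, 176 of 178 eligible citations retained gives 98.9% (96.0-99.7), matching the >50% rows, so `wilson_ci` uses that method.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def is_excluded(exclude_votes: Sequence[bool], threshold: float) -> bool:
    """Exclusion rule at one screening level.

    threshold >= 1.0 stands for the "=100%" rows (unanimous exclusion);
    otherwise the share of exclude votes must be strictly above the threshold.
    """
    if not exclude_votes:
        return False  # assumption: a level with no assessments never excludes
    share = sum(exclude_votes) / len(exclude_votes)
    return share == 1.0 if threshold >= 1.0 else share > threshold


def excluded_either_level(abstract_votes: Sequence[bool],
                          full_text_votes: Sequence[bool],
                          threshold: float) -> bool:
    """Footnote h: excluded if either screening level satisfies the rule."""
    return (is_excluded(abstract_votes, threshold)
            or is_excluded(full_text_votes, threshold))


def crowd_metrics(eligible: Sequence[bool],
                  excluded: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and work performed, in percent.

    eligible[i]: the experts judged citation i eligible.
    excluded[i]: the crowd excluded citation i under the chosen rule.
    """
    pairs = list(zip(eligible, excluded))
    n_eligible = sum(1 for e, _ in pairs if e)
    n_ineligible = len(pairs) - n_eligible
    retained_eligible = sum(1 for e, x in pairs if e and not x)
    excluded_ineligible = sum(1 for e, x in pairs if not e and x)
    return {
        "sensitivity": 100 * retained_eligible / n_eligible,
        "specificity": 100 * excluded_ineligible / n_ineligible,
        "work_performed": 100 * sum(1 for _, x in pairs if x) / len(pairs),
    }


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion, in percent (95% by default)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return 100 * max(0.0, centre - half), 100 * min(1.0, centre + half)
```

For the abstract-level rows, the `excluded` flags come from `is_excluded` on the abstract votes alone. For the combined rows, they come from `excluded_either_level`.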
Figure 4. Sensitivity and work performed as a function of the exclusion threshold at the abstract level. A citation is excluded when the percentage of exclusion assessments is above the exclusion cut-off at the abstract level. Sensitivity is the percentage of eligible citations identified by the experts that were retained by the crowd. Work performed is the percentage of citations that were excluded by the crowd and did not require assessment by the investigative team at the abstract level.
Figure 5. Sensitivity and work performed as a function of the exclusion threshold after abstract and full text screening. A citation is excluded when the percentage of exclusion assessments is above the exclusion cut-off at either abstract or full text screening. Sensitivity is the percentage of eligible citations identified by the experts that were retained by the crowd. Work performed is the percentage of citations that were excluded by the crowd and did not require assessment by the investigative team at the abstract or full text level.
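The curves in Figures 4 and 5 amount to repeating the sensitivity and work-performed calculation over a grid of cut-offs. The Python sketch below assumes each citation has already been reduced to its share of exclude votes. The inputs in the example are hypothetical, not study data.

```python
from typing import Dict, List, Sequence


def sweep_thresholds(exclude_shares: Sequence[float],
                     eligible: Sequence[bool],
                     cutoffs: Sequence[float]) -> List[Dict[str, float]]:
    """Sensitivity and work performed (percent) at each exclusion cut-off.

    exclude_shares[i]: share of crowd assessments voting to exclude citation i.
    eligible[i]: the experts judged citation i eligible.
    A citation is excluded when its share is strictly above the cut-off.
    """
    n = len(exclude_shares)
    n_eligible = sum(1 for e in eligible if e)
    curve = []
    for cutoff in cutoffs:
        excluded = [share > cutoff for share in exclude_shares]
        retained = sum(1 for e, x in zip(eligible, excluded) if e and not x)
        curve.append({
            "cutoff": cutoff,
            "sensitivity": 100 * retained / n_eligible,
            "work_performed": 100 * sum(excluded) / n,
        })
    return curve


if __name__ == "__main__":
    shares = [1.0, 0.75, 0.5, 0.0]
    eligible = [False, False, True, True]
    for point in sweep_thresholds(shares, eligible, [0.25, 0.5, 0.75]):
        print(point)
```

For the either-level rule of Figure 5, one option is to pass the larger of each citation's abstract-level and full text-level exclusion shares. A share strictly above the cut-off at either level is equivalent to the larger share being strictly above it.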
Individual crowd members’ performance.
| Performanceᵃ,ᵇ | Abstract level (N=40), median (IQRᶜ) | Abstract level, range | Full text level (N=41), median (IQR) | Full text level, range |
| Assessments | 306.5 (108.5-513.5) | 16-2194 | 141 (72-206) | 5-786 |
| Sensitivityᵈ, % | 96.6 (92.0-100.0) | 55.0-100.0 | 96.7 (89.6-99.0) | 32.3-100.0 |
| Specificityᵉ, % | 76.4 (66.2-92.8) | 42.4-96.3 | 64.3 (58.5-73.8) | 22.9-100.0 |
ᵃ Only crowd members who completed 50 or more assessments in 1 review were included in this table. Crowd members who assessed 50 or more citations performed 98.8% (16,789/16,988) and 93.0% (7071/7604) of the abstract and full text assessments, respectively.
ᵇ Results are provided per crowd member.
ᶜ IQR: interquartile range.
ᵈ Sensitivity is the percentage of eligible citations, identified by the experts, that were retained by the crowd member. It is based on 38 crowd members at the abstract level and 38 at the full text level; the remaining crowd members did not assess any eligible citations.
ᵉ Specificity is the percentage of ineligible citations, as identified by the experts, that were also excluded by the crowd member.
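The per-member summaries can be produced in two steps: compute each member's sensitivity and specificity against the expert decisions, then take the median, IQR, and range across members. The Python sketch below makes these assumptions:

- Each member's work is a hypothetical list of (expert_eligible, member_excluded) pairs.
- Members with no eligible citations are skipped for sensitivity, as in footnote d.
- The quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the paper does not state one.

```python
from statistics import median, quantiles
from typing import Optional, Sequence, Tuple

Decision = Tuple[bool, bool]  # (expert_eligible, member_excluded)


def member_sensitivity(decisions: Sequence[Decision]) -> Optional[float]:
    """Percent of expert-eligible citations the member retained, or None."""
    flags = [excluded for is_eligible, excluded in decisions if is_eligible]
    if not flags:
        return None  # member assessed no eligible citation (footnote d)
    return 100 * sum(1 for x in flags if not x) / len(flags)


def member_specificity(decisions: Sequence[Decision]) -> Optional[float]:
    """Percent of expert-ineligible citations the member also excluded, or None."""
    flags = [excluded for is_eligible, excluded in decisions if not is_eligible]
    if not flags:
        return None
    return 100 * sum(1 for x in flags if x) / len(flags)


def summarise(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Median, first quartile, third quartile, minimum and maximum.

    Requires at least two values; the quartile method is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3, min(values), max(values)
```

A sensitivity row would then be `summarise([s for s in map(member_sensitivity, members) if s is not None])`, where `members` is a hypothetical list of per-member decision lists.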