| Literature DB >> 35923956 |
Gianfranco Brambilla, Antonella Rosi, Francesco Antici, Andrea Galassi, Daniele Giansanti, Fabio Magurano, Federico Ruggeri, Paolo Torroni, Evaristo Cisbani, Marco Lippi.
Abstract
Background: The COVID-19 pandemic prompted the scientific community to share timely evidence, also in the form of preprints not yet peer-reviewed. Purpose: To develop an artificial intelligence system for the analysis of the scientific literature, leveraging recent developments in the field of Argument Mining (AM). Methodology: Scientific quality criteria were borrowed from two selected Cochrane systematic reviews. Four independent reviewers gave a blind evaluation on a 1-5 scale to 40 papers for each review. These scores were matched with the automatic analysis performed by an AM system named MARGOT, which detected claims and supporting evidence in the cited papers. Outcomes were evaluated with inter-rater indices (Cohen's Kappa, Krippendorff's Alpha, s* statistics).
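The inter-rater indices named in the abstract quantify agreement beyond chance between scorers. As a minimal sketch of the simplest of them, Cohen's Kappa compares observed agreement to the agreement expected from the raters' marginal score distributions; the scores below are hypothetical 1-5 grades, not data from the study:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters scoring the same items (nominal categories)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the two raters give the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies, summed over categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 scores from two reviewers on ten papers (illustrative only).
a = [5, 4, 4, 3, 5, 2, 1, 4, 3, 5]
b = [5, 4, 3, 3, 5, 2, 2, 4, 3, 4]
print(round(cohens_kappa(a, b), 3))  # → 0.61
```

Kappa is 0 when agreement is purely what chance would predict and 1 for perfect agreement; Krippendorff's Alpha generalizes the same idea to multiple raters and ordinal scales.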
Keywords: COVID-19; argument mining; artificial intelligence; inter-rater agreement; scientific literature quality assessment
Year: 2022 PMID: 35923956 PMCID: PMC9339778 DOI: 10.3389/fpubh.2022.945181
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. Flowchart representing the methodology adopted to compare MARGOT and human reviewers.
Figure 2. Scatter plots illustrating how two scores computed by MARGOT (namely, AR and AAS) relate to the eligibility criteria of the two Cochrane reviews. Left: review #1; right: review #2. Orange (respectively, blue) dots correspond to papers included (respectively, excluded) in the review.
Figure 3. Comparison of grading by the four experts, labeled 1-4, on the 40 papers of the Cochrane 1 (top) and Cochrane 2 (bottom) reviews. Scores range from 1 to 5.
Figure 4. Different inter-rater indices vs. the MARGOT metrics described in the text (8 categories considered). The yellow stars correspond to the Cohen's Kappa index between the MARGOT score and the binary Cochrane accepted/not-accepted decision. The error bars represent standard deviations, evaluated according to (21) for Cohen's Kappa, and as the standard deviation under the null hypothesis (agreement due to chance) for Krippendorff's Alpha and the s* statistics, as described in (18).