Building an Evaluation Scale using Item Response Theory.

Abstract

Evaluation of NLP methods requires testing against a previously vetted gold-standard test set and reporting standard metrics (accuracy, precision, recall, F1). The current assumption is that all items in a given test set are equal with regard to difficulty and discriminating power. We propose Item Response Theory (IRT) from psychometrics as an alternative means for gold-standard test-set generation and NLP system evaluation. IRT describes characteristics of individual items (their difficulty and discriminating power) and accounts for these characteristics in its estimation of human intelligence or ability for an NLP task. In this paper, we demonstrate IRT by generating a gold-standard test set for Recognizing Textual Entailment. By collecting a large number of human responses and fitting an IRT model to them, we show that the model can compare NLP systems against the performance of a human population and provides more insight into system performance than standard evaluation metrics. We show that a high accuracy score does not always imply a high IRT score, which depends on the item characteristics and the response pattern.
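
The last claim of the abstract, that equal accuracy need not mean equal IRT score, can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) model and a grid-search maximum-likelihood ability estimate; the abstract does not state which IRT model or fitting procedure the authors used, and the item parameters and response patterns below are hypothetical values chosen only for illustration.

```python
import math


def p_correct(theta, a, b):
    """2PL item characteristic curve: probability of a correct response
    at ability theta, for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b), y in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of ability on [lo, hi]."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical items as (discrimination a, difficulty b); not from the paper.
ITEMS = [(2.0, -1.0), (2.0, 0.0), (0.5, 1.0), (2.0, 1.5)]

# Two hypothetical systems, each answering 3 of 4 items correctly.
system_a = [1, 1, 1, 0]  # misses the hard, highly discriminating item
system_b = [1, 1, 0, 1]  # misses the weakly discriminating item

theta_a = estimate_ability(ITEMS, system_a)
theta_b = estimate_ability(ITEMS, system_b)

print("accuracy A:", sum(system_a) / len(ITEMS), "ability A:", round(theta_a, 2))
print("accuracy B:", sum(system_b) / len(ITEMS), "ability B:", round(theta_b, 2))
```

Both patterns have the same accuracy, yet under these assumed parameters the estimated abilities differ, because the 2PL likelihood weights each response by the discrimination of the item it belongs to.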

Year: 2016 PMID: 28004039 PMCID: PMC5167538 DOI: 10.18653/v1/d16-1062

Source DB: PubMed Journal: Proc Conf Empir Methods Nat Lang Process

3 in total

1. Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds.

Authors: John P Lalor; Hao Wu; Hong Yu
Journal: Proc Conf Empir Methods Nat Lang Process Date: 2019-11

2. Improving Electronic Health Record Note Comprehension With NoteAid: Randomized Trial of Electronic Health Record Note Comprehension Interventions With Crowdsourced Workers.

Authors: John P Lalor; Beverly Woolf; Hong Yu
Journal: J Med Internet Res Date: 2019-01-16 Impact factor: 5.428

3. Using Item Response Theory for Explainable Machine Learning in Predicting Mortality in the Intensive Care Unit: Case-Based Approach.

Authors: Adrienne Kline; Theresa Kline; Zahra Shakeri Hossein Abad; Joon Lee
Journal: J Med Internet Res Date: 2020-09-25 Impact factor: 5.428