Halil Kilicoglu, Asma Ben Abacha, Yassine Mrabet, Sonya E Shooshan, Laritza Rodriguez, Kate Masterton, Dina Demner-Fushman.
Abstract
BACKGROUND: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can be fully expressed only in natural language. Consumer health question answering (QA) systems aim to fill this gap. A major challenge in developing consumer health QA systems is extracting relevant semantic content from natural language questions (question understanding). Developing effective question understanding tools requires question corpora that are semantically annotated for relevant question elements. In this paper, we present a two-part consumer health question corpus annotated with several semantic categories: named entities, question triggers/types, question frames, and question topic. The first part (CHQA-email) consists of relatively long email requests received by the U.S. National Library of Medicine (NLM) customer service, while the second part (CHQA-web) consists of shorter questions posed to the MedlinePlus search engine as queries. Each question has been annotated by two annotators. The annotation methodology is largely the same for the two parts of the corpus; where it differs, we explain and justify the differences. Additionally, we provide information about corpus characteristics, inter-annotator agreement, and our attempts to measure annotation confidence in the absence of adjudication of annotations.
Keywords: Annotation confidence modeling; Consumer health informatics; Corpus annotation; Question answering
Year: 2018 PMID: 29409442 PMCID: PMC5802048 DOI: 10.1186/s12859-018-2045-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Frame representation of the question in Example 3
| Frame 1 | |
| Question type | |
| Theme | |
| Frame 2 | |
| Question type | |
| Theme | |
| Frame 3 | |
| Question type | |
| Theme |
The question is represented with three frames, each composed of two elements, question type and theme. The content of each element is shown as a mention:TYPE pair
Fig. 1 Brat annotation for the consumer health question in Example 3. Named entities and question triggers are indicated with text spans, and the question frames are represented as edges between the question trigger and the named entities that act as arguments. Question topic (ACTIVE LUNG TB) is indicated with (F) next to its named entity category. Named entity categories are sometimes abbreviated: ANATOMY (Anat), DIAGNOSTIC_PROCEDURE (DiaP), GEOGRAPHIC_LOCATION (GeoL), PERSON_POPULATION (Pers), PROFESSION (Prof), SUBSTANCE (Subt). For question type categories, the abbreviated forms are: DIAGNOSIS (DIAG), SUSCEPTIBILITY (SUSC), and TREATMENT (TRTM).
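The frame representation described above (a question type and a theme, each written as a mention:TYPE pair) could be modelled roughly as follows. This is only an illustrative sketch, not the corpus's actual data format: the class names `Element` and `Frame`, the mention strings, and the theme label `ENTITY` are all hypothetical placeholders. Only the question type labels DIAGNOSIS, SUSCEPTIBILITY, and TREATMENT are taken from the figure caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One frame element, written in the tables as a mention:TYPE pair."""
    mention: str  # text span taken from the question
    type: str     # category label of that span

    @classmethod
    def parse(cls, pair: str) -> "Element":
        # Split on the LAST colon so that mentions containing ':' still parse.
        mention, sep, type_ = pair.rpartition(":")
        if not sep or not mention.strip() or not type_.strip():
            raise ValueError(f"not a mention:TYPE pair: {pair!r}")
        return cls(mention.strip(), type_.strip())


@dataclass(frozen=True)
class Frame:
    """A question frame with the two elements named in the table note."""
    question_type: Element
    theme: Element


# Hypothetical mentions; 'ENTITY' is a placeholder theme label, not a corpus category.
theme = Element.parse("active lung TB:ENTITY")
frames = [
    Frame(Element.parse("diagnosed:DIAGNOSIS"), theme),
    Frame(Element.parse("at risk:SUSCEPTIBILITY"), theme),
    Frame(Element.parse("treated:TREATMENT"), theme),
]

for frame in frames:
    print(frame.question_type.type, "->", frame.theme.mention)
```

Sharing one `theme` object across the three frames mirrors the situation in which several question triggers point at the same named entity.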
The number of questions annotated by each annotator in CHQA-email
| Annotator | # questions |
|---|---|
| Ann1 | 565 |
| Ann2 | 495 |
| Ann3 | 553 |
| Ann4 | 489 |
| Ann5 | 554 |
| Ann6 | 544 |
| Ann7 | 240 |
Named entity categories with definitions, examples, and relevant UMLS semantic types
| Entity type | Brief definition | Examples | UMLS semantic types |
|---|---|---|---|
| | Includes organs, body parts, and tissues. | | Body System, Anatomical Structure |
| | Includes anatomical entities at the molecular or cellular level. | | Cell, Cell Component |
| | Includes tests and procedures used for diagnosis. | | Diagnostic Procedure, Laboratory Procedure |
| | Includes substances used for therapeutic purposes. | | Clinical Drug, Vitamin |
| | Refers to specific nutritional substances. | | Food |
| | Includes specific genes and gene products. | | Gene or Genome, Enzyme |
| | Includes countries, cities, etc. | | Geographic Area |
| | Refers to daily and recreational activities. | | Daily or Recreational Activity |
| | A quantity that is a core attribute of a named entity, such as dosage of a drug. | | Quantitative Concept |
| | Includes institutions as well as their subparts. | | Organization |
| | Includes individuals (gender, age group, etc.) and population groups. | | Age Group, Population Group |
| | Includes disorders, symptoms, abnormalities, and complications. | | Disease or Syndrome, Neoplastic Process |
| | Refers to procedures or medical devices used for therapeutic purposes as well as unspecific interventions. | | Medical Device, Therapeutic or Preventive Procedure |
| | Includes occupations, disciplines, or areas of expertise. | | Professional or Occupational Group |
| | Includes chemicals, hazardous substances, and body substances. | | Inorganic Chemical, Biologically Active Substance |
| | Includes entities that are relevant to question understanding, but do not fit in one of the categories above. | | Temporal Concept |
| | Refers to physiological functions of the organism. | | Organism Function |
| | Indicates that consumer is interested in research information. | | Qualitative Concept |
Question type categories in CHQA-email with their definitions and some commonly used triggers
| Question type | Brief definition | Example triggers | In [ |
|---|---|---|---|
| | | | |
| | Concerned with comparison of several entities (often of the same type) | | |
| | General information about an entity | | ✓ |
| | Information not covered with other types | | ✓ |
| | | | |
| | Cause of a disease | | ✓ |
| | Longer term effects of a disease | | ✓ |
| | Methods of diagnosing a disease | | ✓ |
| | Unspecific effects of a disease | | ✓(a) |
| | Prevalence of a disease | | ✓(b) |
| | Inheritance patterns of a disease | | ✓(b) |
| | Lifestyle/diet changes after a disease | | |
| | Body location of a disease | | ✓(c) |
| | Individuals/organizations specializing in a disease | | ✓(d) |
| | Methods of prevention for a disease | | ✓(e) |
| | Life expectancy and quality of life for a disease | | ✓ |
| | Support groups for a disease | | ✓(d) |
| | Transmission of a disease | | ✓ |
| | Signs and symptoms of a disease | | ✓(f) |
| | Treatment, cure, management of a disease | | ✓(e) |
| | | | |
| | How a drug acts in the body | | |
| | Alternatives for an intervention | | |
| | Conditions in which an intervention should be avoided | | |
| | Pricing of an intervention | | |
| | Appropriate dosage of a drug | | |
| | Conditions to use an intervention | | |
| | Ingredients of a drug | | |
| | Interactions between drugs | | |
| | Long term consequences of an intervention | | |
| | Consequences of a substance overdose | | |
| | Short-term, adverse reactions to an intervention | | |
| | Instructions for storing/disposing a drug | | |
| | Instructions for how to stop using a drug | | |
| | Patient instructions for an intervention | | |
| | Other intervention question (e.g., drug form) | | |
Notes: (*) Applies to procedures/medical devices, as well. (a) As other_effect. (b) As susceptibility. (c) As anatomy. (d) As person_org. (e) As management. (f) As manifestation
Restrictions on THEME arguments in CHQA-email
| Question types | Theme restrictions |
|---|---|
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
{2,} indicates cardinality of at least two, + indicates at least one argument
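The two cardinality markers in the note above can be read as lower bounds on the number of THEME arguments. A minimal sketch of such a check follows; the function name `satisfies` is hypothetical, and only the two notations defined in the note are handled (anything else is rejected rather than guessed at).

```python
import re


def satisfies(cardinality: str, n_args: int) -> bool:
    """Return True if n_args theme arguments meet the cardinality marker.

    '+'    -> at least one argument
    '{k,}' -> at least k arguments (the note uses '{2,}')
    """
    if cardinality == "+":
        return n_args >= 1
    match = re.fullmatch(r"\{(\d+),\}", cardinality)
    if match:
        return n_args >= int(match.group(1))
    # Other markers are not defined in the table note, so refuse to interpret them.
    raise ValueError(f"unknown cardinality marker: {cardinality!r}")


print(satisfies("{2,}", 1))  # a single theme does not meet 'at least two'
print(satisfies("+", 1))     # a single theme meets 'at least one'
```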
Illustration of KEYWORD and EXCLUDE_KEYWORD semantic roles in question frames
| Question |
| |
| Frame | Question type | |
| Theme | ||
| Keyword | ||
| Keyword | ||
| Question |
| |
|
| ||
|
| ||
| Frame | Question type | |
| Theme | ||
| ExcludeKeyword | ||
The contents of frame elements are shown as mention:TYPE pairs
Basic corpus statistics
| Corpus Part | # questions | # tokens | Average tokens per question | Range | Std. Dev. |
|---|---|---|---|---|---|
| CHQA-email | 1,740 | 95,834 | 55.1 | 2-427 | 51.3 |
| | 20 | | | | |
| | 1,720 | | | | |
| CHQA-web | 874 | 6,597 | 7.5 | 3-51 | 4.1 |
| Total | 2,614 | 102,431 | 39.2 | 2-427 | 136.7 |
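As a quick consistency check on the statistics table, the averages can be recomputed as tokens divided by questions. The sketch below uses only figures from the table; reading the "Average" column as tokens per question is an interpretation that the arithmetic supports, and the variable names are illustrative.

```python
# (questions, tokens, reported average) copied from the table rows.
rows = {
    "CHQA-email": (1740, 95834, 55.1),
    "CHQA-web": (874, 6597, 7.5),
    "Total": (2614, 102431, 39.2),
}

for part, (n_questions, n_tokens, reported) in rows.items():
    computed = n_tokens / n_questions
    # Reported values are rounded to one decimal place, hence the 0.05 tolerance.
    status = "ok" if abs(computed - reported) < 0.05 else "MISMATCH"
    print(f"{part}: {computed:.2f} tokens/question (reported {reported}) {status}")

# The Total row should equal the sum of the two corpus parts.
totals_add_up = (
    rows["CHQA-email"][0] + rows["CHQA-web"][0] == rows["Total"][0]
    and rows["CHQA-email"][1] + rows["CHQA-web"][1] == rows["Total"][1]
)
print("totals add up:", totals_add_up)
```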
The distribution of annotated named entity categories
| Category | # questions | % (Rank) | # questions | % (Rank) | # questions | % (Rank) |
|---|---|---|---|---|---|---|
| | 31 | 12.8 (4) | 5,339 | 15.8 (2) | 153 | 10.8 (3) |
| | 0 | 0 (17) | 224 | 0.7 (16) | 13 | 0.9 (12) |
| | 3 | 1.2 (8) | 967 | 2.9 (8) | 101 | 7.2 (4) |
| | 26 | 10.8 (5) | 3,264 | 9.7 (4) | 237 | 16.8 (2) |
| | 3 | 1.2 (8) | 474 | 1.4 (11) | 35 | 2.5 (11) |
| | 1 | 0.4 (16) | 156 | 0.5 (17) | 9 | 0.6 (14) |
| | 2 | 0.8 (13) | 455 | 1.4 (13) | 3 | 0.2 (17) |
| | 2 | 0.8 (13) | 438 | 1.3 (14) | 44 | 3.1 (9) |
| | 3 | 1.2 (8) | 331 | 1.0 (15) | 10 | 0.7 (13) |
| | 3 | 1.2 (8) | 469 | 1.4 (12) | 66 | 4.7 (6) |
| | 7 | 2.9 (7) | 576 | 1.7 (9) | 1 | 0.1 (18) |
| | 36 | 14.9 (2) | 3,763 | 11.2 (3) | 60 | 4.2 (7) |
| | 75 | 31.0 (1) | 11,711 | 34.7 (1) | 476 | 33.7 (1) |
| | 32 | 13.2 (3) | 2,481 | 7.4 (5) | 99 | 7.0 (5) |
| | 2 | 0.8 (13) | 1,144 | 3.4 (7) | 8 | 0.6 (15) |
| | - | - | - | - | 4 | 0.3 (16) |
| | 13 | 5.4 (6) | 1,466 | 4.3 (6) | 56 | 4.0 (8) |
| | 3 | 1.2 (8) | 489 | 1.5 (10) | 38 | 2.7 (10) |
| Total | 242 | 100.0 | 33,747 | 100.0 | 1,413 | 100.0 |
| Average | 12.1 | | 9.8 | | 1.6 | |
| Range | 1-35 | | 1-84 | | 1-5 | |
Note that questions in the unadjudicated set are counted twice since this set is double-annotated
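The effect of double counting on the "Average" row can be checked with a little arithmetic. The sketch below assumes, as an inference from the tables rather than a statement in the source, that the three column groups correspond to the 20-question set (counted once), the 1,720-question double-annotated set (counted twice), and the 874 CHQA-web questions (counted once). The per-annotator counts come from the CHQA-email annotator table.

```python
# Per-annotator question counts from the CHQA-email annotator table.
annotator_counts = [565, 495, 553, 489, 554, 544, 240]

# Each double-annotated question appears once per annotator, so the
# counts should sum to twice the size of the double-annotated set.
double_annotated = 1720
print(sum(annotator_counts), "==", 2 * double_annotated)

# (total annotated entities, question instances, reported average)
columns = [
    (242, 20, 12.1),
    (33747, 2 * double_annotated, 9.8),  # questions counted twice
    (1413, 874, 1.6),
]
averages = [total / instances for total, instances, _ in columns]
for (total, instances, reported), avg in zip(columns, averages):
    print(f"{total} / {instances} = {avg:.2f} (reported {reported})")
```

Dividing 33,747 by 1,720 instead of 3,440 would give roughly 19.6, which does not match the reported 9.8; this is consistent with the note about double counting.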
Fig. 2 The distribution of named entities in CHQA-email and CHQA-web parts of the corpus. RESEARCH_CUE, annotated only in CHQA-web, is not included.
The distribution of annotated question triggers/types
| Category | # questions | # questions | % (Rank) | # questions | % (Rank) |
|---|---|---|---|---|---|
| | | | | | |
| | 1 | 52 | 1.2 (19) | 25 | 2.8 (11) |
| | 1 | 692 | 15.6 (2) | 147 | 16.7 (1) |
| | - | - | - | 40 | 4.5 (9) |
| | 0 | 165 | 3.7 (7) | 0 | 0 (24) |
| | | | | | |
| | 4 | 251 | 5.7 (3) | 95 | 10.8 (2) |
| | 1 | 66 | 1.5 (16) | 32 | 3.6 (10) |
| | - | - | - | 11 | 1.3 (19) |
| | 0 | 166 | 3.8 (6) | 16 | 1.8 (15) |
| | 0 | 111 | 2.5 (10) | 11 | 1.3 (19) |
| | 0 | 20 | 0.5 (29) | - | - |
| | 0 | 87 | 2.0 (14) | - | - |
| | 0 | 50 | 1.1 (20) | 15 | 1.7 (16) |
| | 0 | 8 | 0.2 (32) | 79 | 9.0 (4) |
| | 1 | 224 | 5.1 (4) | 17 | 1.9 (14) |
| | 0 | 42 | 1.0 (21) | 4 | 0.5 (22) |
| | 1 | 205 | 4.6 (5) | 14 | 1.6 (17) |
| | 0 | 12 | 0.3 (31) | - | - |
| | 0 | 66 | 1.5 (16) | 49 | 5.6 (7) |
| | 0 | 80 | 1.8 (15) | 18 | 2.0 (13) |
| | 9 | 1,243 | 28.1 (1) | 87 | 9.9 (3) |
| | | | | | |
| | 0 | 26 | 0.6 (28) | 41 | 4.7 (8) |
| | 0 | 35 | 0.8 (22) | 0 | 0 (24) |
| | 1 | 33 | 0.7 (25) | 24 | 2.7 (12) |
| | 1 | 20 | 0.5 (29) | 2 | 0.2 (23) |
| | 0 | 34 | 0.8 (23) | - | - |
| | 1 | 111 | 2.5 (10) | 75 | 8.5 (5) |
| | 1 | 123 | 2.8 (9) | 7 | 0.8 (21) |
| | 0 | 60 | 1.4 (18) | 13 | 1.5 (18) |
| | 1 | 33 | 0.7 (25) | - | - |
| | 1 | 8 | 0.2 (32) | - | - |
| | 1 | 109 | 2.5 (12) | - | - |
| | 0 | 31 | 0.7 (27) | - | - |
| | 0 | 34 | 0.8 (23) | - | - |
| | 0 | 133 | 3.0 (8) | 60 | 6.8 (6) |
| | 0 | 99 | 2.2 (13) | 0 | 0 (24) |
| Total | 25 | 4,429 | | 882 | |
| Average | 1.25 | 1.29 | | 1.01 | |
| Range | 1-4 | 1-15 | | 1-2 | |
Note that questions in the unadjudicated set are counted twice since this set is double-annotated
Fig. 3 The distribution of question trigger types in CHQA-email and CHQA-web parts of the corpus. The question type categories in CHQA-web are used, and some of the CHQA-email types are merged with their supertypes (e.g., SUPPORT_GROUP is merged with PERSON_ORGANIZATION) for simplicity.
The distribution of annotated question triggers
| Category | # | Top triggers | # | Top triggers |
|---|---|---|---|---|
|
|
| |||
|
| ||||
|
| 28 |
| 20 |
|
|
| 272 |
| 52 |
|
|
| - | - | 23 |
|
|
| 131 |
| 0 | - |
|
| ||||
|
| 70 |
| 26 |
|
|
| 40 |
| 29 |
|
|
| - | - | 10 |
|
|
| 85 |
| 11 |
|
| 69 |
| 8 |
| |
|
| 14 |
| - | |
|
| 54 |
| - | |
|
| 37 |
| 11 |
|
|
| 7 |
| 21 |
|
|
| 114 |
| 13 |
|
|
| 22 |
| 2 |
|
|
| 148 |
| 13 |
|
|
| 10 |
| - | |
|
| 41 |
| 29 |
|
|
| 51 |
| 14 |
|
|
| 303 |
| 46 |
|
|
| ||||
|
| 21 |
| 29 |
|
|
| 25 |
| 0 | - |
|
| 23 |
| 22 |
|
|
| 5 |
| 1 |
|
|
| 19 |
| 0 | - |
|
| 73 |
| 34 |
|
|
| 38 |
| 4 |
|
|
| 41 |
| 12 |
|
|
| 26 |
| 0 | - |
|
| 8 | - | 0 | - |
|
| 60 |
| 0 | - |
|
| 21 |
| 0 | - |
|
| 28 |
| 0 | - |
|
| 89 |
| 42 |
|
Numbers in the second and fourth columns are unique counts of triggers used for the corresponding category. Only triggers that occur at least twice are shown. The most frequent trigger for a given category is indicated with its frequency in parentheses (when this frequency is 2, all triggers given occur twice)
The distribution of proposed question types annotated as OTHER_QUESTION or DRUG_QUESTION
| Category | # questions | Brief description |
|---|---|---|
|
| ||
| Antidote | 2 | How to deal with a chemical |
| Availability | 9 | Availability of an intervention on the market, where to get it |
| Complication management | 1 | How to fix an issue arising from a procedure |
| Contraindicated | 1 | What is contraindicated for a disease |
| Diagnose Me | 5 | Diagnosis given a list of symptoms |
| Duration | 3 | How long for a procedure/treatment |
| Fertility | 1 | Possible to have children with existing condition |
| Function | 1 | How a body part works |
| Gene-disease association | 2 | Association between a gene and a disease |
| History | 3 | History of a disease |
| Incubation | 1 | Incubation period for a disease |
| Interpretation | 5 | Lab result interpretation |
| Post-procedure management | 3 | Management options after a procedure |
| Preparation | 2 | How to prepare for a lab test |
| Procedure follow-up | 7 | Whether procedures are still needed after a problem is solved |
| Progression | 3 | How a disease progresses |
| Test result range | 10 | Reference values for a lab test/procedure |
|
| ||
| Clinical trial | 3 | Trials for a drug |
| Coindication | 1 | Whether to use a drug with another |
| Coverage | 5 | Whether insurance pays for a drug |
| Effect duration | 3 | How long the effect lasts |
| Form | 4 | What form the drug comes in |
| Manufacturer | 6 | Manufacturer of a drug |
| Packaging | 1 | How a drug is packaged |
| Pharmacokinetics | 6 | How long it takes for a drug to have effect |
| Potency | 3 | Whether a drug retains its potency after a time period |
| Prescription | 4 | Whether a prescription is needed |
| Stability | 1 | Whether a drug is stable when diluted |
| Transmission | 3 | Whether a drug is transmitted through body fluids |
The distribution of annotated question frame categories
| Category | # questions | # questions | % (Rank) | # questions | % (Rank) |
|---|---|---|---|---|---|
| | | | | | |
| | 1 | 52 | 1.1 (19) | 25 | 2.7 (12) |
| | 1 | 736 | 15.7 (2) | 148 | 16.5 (1) |
| | - | - | - | 40 | 4.5 (9) |
| | 0 | 172 | 3.7 (7) | 0 | 0 (24) |
| | | | | | |
| | 4 | 263 | 5.6 (3) | 100 | 11.1 (2) |
| | 1 | 67 | 1.4 (17) | 32 | 3.6 (10) |
| | - | - | - | 11 | 1.2 (20) |
| | 0 | 180 | 3.8 (6) | 16 | 1.8 (15) |
| | 0 | 112 | 2.4 (12) | 12 | 1.3 (19) |
| | 0 | 21 | 0.5 (29) | - | - |
| | 0 | 90 | 1.9 (14) | - | - |
| | 0 | 51 | 1.1 (20) | 15 | 1.7 (16) |
| | 0 | 8 | 0.2 (32) | 79 | 8.8 (4) |
| | 1 | 237 | 5.1 (4) | 17 | 1.9 (14) |
| | 0 | 48 | 1.0 (21) | 4 | 0.5 (22) |
| | 1 | 209 | 4.5 (5) | 14 | 1.6 (17) |
| | 0 | 14 | 0.3 (31) | - | - |
| | 0 | 67 | 1.4 (17) | 49 | 5.5 (7) |
| | 0 | 83 | 1.8 (15) | 18 | 2.0 (13) |
| | 9 | 1,298 | 27.7 (1) | 90 | 10.0 (3) |
| | | | | | |
| | 0 | 26 | 0.6 (28) | 43 | 4.8 (8) |
| | 0 | 39 | 0.8 (22) | 0 | 0 (24) |
| | 1 | 34 | 0.7 (26) | 27 | 3.0 (11) |
| | 1 | 20 | 0.4 (30) | 2 | 0.2 (23) |
| | 0 | 36 | 0.8 (23) | - | - |
| | 1 | 127 | 2.7 (10) | 78 | 8.7 (5) |
| | 1 | 127 | 2.7 (10) | 7 | 0.8 (21) |
| | 0 | 73 | 1.6 (16) | 13 | 1.5 (18) |
| | 1 | 34 | 0.7 (26) | - | - |
| | 1 | 8 | 0.2 (32) | - | - |
| | 1 | 134 | 2.9 (9) | - | - |
| | 0 | 36 | 0.8 (23) | - | - |
| | 0 | 35 | 0.7 (25) | - | - |
| | 0 | 143 | 3.1 (8) | 60 | 6.7 (6) |
| | 0 | 104 | 2.2 (13) | 0 | 0 (24) |
| Total | 25 | 4,684 | | 900 | |
| Range | 1-4 | 1-18 | | 1-2 | |
| Average | 1.25 | 1.36 | | 1.03 | |
Fig. 4 The distribution of question frames in CHQA-email and CHQA-web parts of the corpus. The question frame categories in CHQA-web are used, and some of the CHQA-email types are merged with their supertypes.
The distribution of frame semantic roles
| Category | # questions | Range | # questions | Range | # questions | Range |
|---|---|---|---|---|---|---|
| | 27 | 1-3 | 4,860 | 1-7 | 958 | 1-5 |
| | 18 | 1-2 | 2,959 | 0-9 | - | - |
| | 0 | - | 107 | 0-6 | - | - |
| | - | - | - | - | 91 | 0-2 |
| | - | - | - | - | 85 | 0-2 |
| | - | - | - | - | 58 | 0-1 |
| | - | - | - | - | 24 | 0-1 |
| | - | - | - | - | 185 | 0-3 |
| | - | - | - | - | 28 | 0-1 |
| | - | - | - | - | 4 | 0-1 |
Inter-annotator agreement results
| Category | Average | Range | Average | Range |
|---|---|---|---|---|
| Avg. # of questions shared | 81.9 | 36-120 | 540 | 540-540 |
| Named entity | - | - | 0.72 | 0.66-0.78 |
| Question trigger | 0.37 | 0.18-0.52 | 0.60 | 0.46-0.66 |
| Question type | 0.58 | 0.39-0.69 | 0.74 | 0.65-0.81 |
| Full frame w/ trigger | 0.22 | 0.08-0.34 | 0.41 | 0.33-0.48 |
| Core frame w/ trigger | 0.27 | 0.11-0.41 | 0.48 | 0.38-0.56 |
| Full frame w/ type | 0.32 | 0.15-0.46 | 0.54 | 0.47-0.58 |
| Core frame w/ type | 0.41 | 0.22-0.56 | 0.64 | 0.55-0.70 |
| Question topic | 0.71 | 0.61-0.87 | - | - |
Inter-annotator agreement is calculated as the micro-average F1 score when one set of annotations is taken as the gold standard
Fig. 5 Inter-annotator agreement for various question elements in CHQA-email. Exact match criterion is used as the basis of agreement.
Fig. 6 Inter-annotator agreement for various question elements in CHQA-web. Exact match criterion is used as the basis of agreement.
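The agreement measure described above (micro-averaged F1 under an exact-match criterion, with one annotator's annotations treated as the gold standard) might be computed along the following lines. This is a sketch under assumptions, not the authors' evaluation script: the annotation representation (a set of `(start, end, label)` tuples per question) and the labels `X`, `Y`, `Z` are hypothetical.

```python
from typing import Dict, Set, Tuple

Annotation = Tuple[int, int, str]            # (start offset, end offset, label)
AnnotationSet = Dict[str, Set[Annotation]]   # question id -> annotations


def micro_f1(gold: AnnotationSet, pred: AnnotationSet) -> float:
    """Micro-averaged F1 with exact match: counts are pooled over all questions."""
    tp = fp = fn = 0
    for qid in set(gold) | set(pred):
        g = gold.get(qid, set())
        p = pred.get(qid, set())
        tp += len(g & p)   # identical span and label in both sets
        fp += len(p - g)   # only in the annotator treated as prediction
        fn += len(g - p)   # only in the annotator treated as gold
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up offsets and labels.
ann_a = {"q1": {(0, 5, "X"), (10, 15, "Y")}, "q2": {(3, 8, "X")}}
ann_b = {"q1": {(0, 5, "X")}, "q2": {(3, 8, "Z")}}

score_ab = micro_f1(ann_a, ann_b)
score_ba = micro_f1(ann_b, ann_a)
print(round(score_ab, 3), round(score_ba, 3))
```

Swapping which annotator is treated as gold exchanges precision and recall, so under exact matching the F1 score is the same in both directions.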
Inter-annotator agreement broken down by question types and corresponding triggers
| Category | Trigger | Type | Trigger | Type |
|---|---|---|---|---|
| | | | | |
| | 0.28 | 0.43 | 0.66 | 0.77 |
| | 0.37 | 0.51 | 0.65 | 0.75 |
| | - | - | 0.47 | 0.65 |
| | 0.23 | 0.39 | 0.00 | 0.00 |
| | | | | |
| | 0.60 | 0.64 | 0.63 | 0.76 |
| | 0.17 | 0.21 | 0.58 | 0.64 |
| | - | - | 0.72 | 0.82 |
| | 0.30 | 0.48 | 0.51 | 0.72 |
| | 0.35 | 0.45 | 0.17 | 0.17 |
| | 0.20 | 0.30 | - | - |
| | 0.30 | 0.46 | - | - |
| | 0.02 | 0.25 | 0.32 | 0.37 |
| | 0.20 | 0.20 | 0.60 | 0.85 |
| | 0.43 | 0.63 | 0.18 | 0.63 |
| | 0.44 | 0.51 | 0.50 | 0.50 |
| | 0.25 | 0.49 | 0.41 | 0.77 |
| | 0.00 | 0.00 | - | - |
| | 0.26 | 0.34 | 0.57 | 0.73 |
| | 0.26 | 0.45 | 0.53 | 0.65 |
| | 0.52 | 0.75 | 0.76 | 0.86 |
| | | | | |
| | 0.03 | 0.19 | 0.57 | 0.60 |
| | 0.18 | 0.22 | - | - |
| | 0.10 | 0.25 | 0.57 | 0.67 |
| | 0.40 | 0.40 | 1.00 | 1.00 |
| | 0.23 | 0.37 | - | - |
| | 0.15 | 0.29 | 0.48 | 0.75 |
| | 0.36 | 0.83 | 0.50 | 0.50 |
| | 0.20 | 0.73 | 0.57 | 0.78 |
| | 0.22 | 0.30 | - | - |
| | 0.00 | 0.33 | - | - |
| | 0.31 | 0.39 | - | - |
| | 0.19 | 0.35 | - | - |
| | 0.13 | 0.36 | - | - |
| | 0.08 | 0.40 | 0.62 | 0.72 |
| | 0.14 | 0.28 | - | - |
Annotation confidence estimation in CHQA-web
| Method | Precision | Recall | F1 |
|---|---|---|---|
| AGREE | 0.99 | 0.66 | 0.79 |
| IAA_RANK ( | 0.78 | 0.80 | 0.79 |
| IAA_RANK ( | 0.85 | 0.84 | 0.84 |
| MACE | 0.82 | 0.86 | 0.84 |
| MACE (Reliability Rank) | 0.83 | 0.83 | 0.83 |
| P&C | 0.87 | 0.79 | 0.83 |
| P&C (Reliability Rank) | 0.81 | 0.82 | 0.82 |
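The F1 column in the table above should be the harmonic mean of the precision and recall columns. The sketch below recomputes it from the reported (already rounded) values, so small deviations of up to 0.01 are expected; method names are copied from the table as they appear, and the two truncated IAA_RANK rows are distinguished only by their position.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, precision, recall, reported F1) as listed in the table.
reported = [
    ("AGREE", 0.99, 0.66, 0.79),
    ("IAA_RANK (first row)", 0.78, 0.80, 0.79),
    ("IAA_RANK (second row)", 0.85, 0.84, 0.84),
    ("MACE", 0.82, 0.86, 0.84),
    ("MACE (Reliability Rank)", 0.83, 0.83, 0.83),
    ("P&C", 0.87, 0.79, 0.83),
    ("P&C (Reliability Rank)", 0.81, 0.82, 0.82),
]

for method, p, r, reported_f1 in reported:
    computed = f1(p, r)
    # Inputs are rounded to two decimals, so allow a 0.01 discrepancy.
    consistent = abs(computed - reported_f1) <= 0.01
    print(f"{method}: computed {computed:.3f}, reported {reported_f1}, consistent={consistent}")
```

The AGREE row illustrates the trade-off the table reports: very high precision (0.99) with much lower recall (0.66) yields the same F1 as a method with balanced but lower precision and recall.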