Slavko Žitnik, Lovro Šubelj, Marko Bajec.
Abstract
Coreference resolution tries to identify all expressions (called mentions) in observed text that refer to the same entity. Besides entity extraction and relation extraction, it represents one of the three complementary tasks in Information Extraction. In this paper we describe a novel coreference resolution system, SkipCor, that reformulates the problem as a sequence labeling task. None of the existing supervised, unsupervised, pairwise or sequence-based models are similar to our approach, which uses only linear-chain conditional random fields and supports high scalability with fast model training and inference, and straightforward parallelization. We evaluate the proposed system against the ACE 2004, CoNLL 2012 and SemEval 2010 benchmark datasets. SkipCor clearly outperforms two baseline systems that detect coreferentiality using the same features as SkipCor. The obtained results are at least comparable to the current state-of-the-art in coreference resolution.
Year: 2014 PMID: 24956272 PMCID: PMC4067305 DOI: 10.1371/journal.pone.0100101
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Classification of coreference resolution approaches.
| UNSUPERVISED | SUPERVISED | |
According to the two-dimensional classification of coreference resolution systems, the proposed SkipCor system solves the problem in a novel fashion.
Figure 1. Linear-chain conditional random fields model.
Black nodes represent observable values, which are in our case entity mentions. White nodes represent hidden labels that we need to predict and define whether the current observable value is coreferent with the previous one.
Feature functions description.
| Name | Description | Model |
| Target label distribution | Distribution of target labels. | A, S, C |
| Starts upper | Does the mention start with an upper case letter. | A, S, C |
| Starts upper twice | Do two consecutive mentions start with an upper case letter. | A, S, C |
| Prefix value | Value of the prefix (length of 2 and 3) for the mention at an offset distance (distances from −5 to 5) from the current mention. | A, S, C |
| Suffix value | Value of the suffix (length of 2 and 3) for the mention at an offset distance (distances from −5 to 5) from the current mention. | A, S, C |
| Consequent value | A combination of values of the consecutive mentions at an offset distance (distances from −4 to 4) from the current mention. | A, S, C |
| String match | Do consecutive mention values match. | A, S, C |
| Gender match | Does the gender of two consecutive mentions match. | A, S, C |
| Gender value | The gender value of the mention. | A, S, C |
| Is appositive | Is the mention an appositive of the other. | A, S, C |
| Alias | Is the mention an alias or abbreviation of the other. | A, S, C |
| Is prefix | Is the mention a prefix of the other. | A, S, C |
| Is suffix | Is the mention a suffix of the other. | A, S, C |
| Similarity value | How similar are the two mention values according to the Jaro-Winkler metric. | A, S, C |
| Is pronoun | Is the mention a pronoun. | A, S, C |
| Same sentence | Are consecutive mentions in the same sentence. | A, S, C |
| Hearst co-occurrence | Does the text between the two mentions follow some predefined rules. | A, S, C |
| Sentence distance | What is the distance between the sentences of the two mentions. | A, S, C |
| Is quoted | Is the mention within the parentheses. | A, S, C |
| Substring match | Is the mention a substring of the other. | A |
| Starts with | Does the mention start with the other. | A, S, C |
| Ends with | Does the mention end with the other. | A, S, C |
| Number match | Do the mentions match in number (i.e., singular, plural). | A, S, C |
| Mention type | Type of mention (i.e., pronoun, name, nominal). | A |
| Relative pronoun | Heuristic decision if the mention is a relative pronoun of the other. | A |
| WordNet | How is the mention semantically connected to the other (e.g., is a hypernym, synonym). | A |
| WordNet synset | Are the two consecutive mentions in the same synset. | S, C |
| Entity type | What is the named entity type or subtype of the mention. | A |
| Length difference | What is the difference in length of the two consecutive mentions. | A, S, C |
| Is demonstrative | Is the mention a demonstrative noun phrase. | A, S, C |
| Offset match | Do consecutive POS values on distances from −2 to 2 match. | A |
| Parse tree path | Path values between the two mentions in a parse tree. | A, S, C |
| Parse tree mention depth | Depth of the mention within the parse tree. | A, S, C |
| Parse tree parent value | Parse tree value of the mention on lengths of one, two or three. | A, S, C |
| Relation | Does a relationship exist between the two consecutive mentions. | S |
| Speaker | Who is the current speaker in a transcript text. | C |
The feature functions are used by all skip-mention CRF models and are modeled as unigram or bigram features. The exact details (e.g., which mention values are used by a specific feature function) and implementations can be retrieved from our public source repository [31] (within the class FeatureFunctionPackages). The abbreviations A, S and C define which feature functions were used when training the models for the ACE2004, SemEval2010 and CoNLL2012 datasets, respectively.
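To make the table more concrete, the following is a minimal sketch of how a few of the simpler string-based features could be computed for a pair of consecutive mentions. It is not the implementation from the FeatureFunctionPackages class: the function name `pair_features`, the dictionary keys, and the choice to compare lower-cased strings are all assumptions made for illustration.

```python
from typing import Dict, Union

FeatureValue = Union[bool, str]


def pair_features(prev: str, cur: str) -> Dict[str, FeatureValue]:
    """Compute a handful of string features for two consecutive mentions.

    `prev` is the previous mention in the sequence, `cur` the current one.
    Case-insensitive comparison is an assumption, not taken from the paper.
    """
    a, b = prev.lower(), cur.lower()
    return {
        # "Starts upper" and "Starts upper twice"
        "starts_upper": cur[:1].isupper(),
        "starts_upper_twice": prev[:1].isupper() and cur[:1].isupper(),
        # "Prefix value" and "Suffix value" (lengths 2 and 3, offset 0 only)
        "prefix_2": cur[:2],
        "prefix_3": cur[:3],
        "suffix_2": cur[-2:],
        "suffix_3": cur[-3:],
        # "String match"
        "string_match": a == b,
        # "Is prefix" / "Is suffix": the current mention is a prefix or
        # suffix of the previous one
        "is_prefix": a.startswith(b),
        "is_suffix": a.endswith(b),
    }


features = pair_features("Barack Obama", "Obama")
print(features)
```

In a CRF toolkit such values would typically be turned into unigram or bigram feature strings; that step is omitted here.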
Figure 2. Distribution of distances between two consecutive coreferent mentions.
The data was taken from the SemEval2010 [46] coreference dataset. A distance of x between two consecutive coreferent mentions means that x other mentions lie between them.
Figure 3. Coreference resolution results using different skip-mention sequences.
Evaluation of the proposed system on the whole ACE2004 [45] and SemEval2010 [46] datasets using the metrics BCubed [41], MUC [9] and CEAFe [42].
Figure 4. Zero skip-mention training sequence.
Initial mention sequence that contains all mentions from the input text “John is married to Jena. He is a mechanic at OBI and she works there. It is a DIY market.” If the current mention is coreferent with the previous one, it is labeled with C, otherwise with O.
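The labeling rule in this caption can be sketched in a few lines. The caption gives only the input text, so the mention list and the entity ids below are an assumed reading of the example sentence, and the function name `label_sequence` is hypothetical.

```python
from typing import List, Tuple

# (mention text, entity id). The ids are an assumed gold annotation of the
# example sentence; they are not taken from the paper.
Mention = Tuple[str, int]

MENTIONS: List[Mention] = [
    ("John", 0), ("Jena", 1), ("He", 0), ("OBI", 2),
    ("she", 1), ("there", 2), ("It", 2), ("DIY market", 2),
]


def label_sequence(seq: List[Mention]) -> List[str]:
    """Label a mention C if it shares an entity with its predecessor, else O."""
    labels = []
    for i, (_, entity) in enumerate(seq):
        if i > 0 and seq[i - 1][1] == entity:
            labels.append("C")
        else:
            labels.append("O")
    return labels


labels = label_sequence(MENTIONS)
for (text, _), label in zip(MENTIONS, labels):
    print(f"{text}\t{label}")
```

Under this reading only "It" and "DIY market" receive the label C. Links such as John/He or Jena/she are not adjacent in the zero skip-mention sequence, so it cannot express them.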
Figure 5. One skip-mention training sequences.
Mention sequences that include every second mention (i.e., one skip-mention) from the input text “John is married to Jena. He is a mechanic at OBI and she works there. It is a DIY market.” If the current mention is coreferent with the previous one, it is labeled with C, otherwise with O.
Figure 6. Two skip-mention training sequences.
Mention sequences that include every third mention (i.e., two skip-mention) from the input text “John is married to Jena. He is a mechanic at OBI and she works there. It is a DIY market.” If the current mention is coreferent with the previous one, it is labeled with C, otherwise with O.
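A minimal sketch of how such sequences could be generated is given below. It assumes that an s skip-mention transformation yields one sequence per starting offset, each taking every (s+1)-th mention; the function name `skip_mention_sequences` and the mention list (an assumed reading of the example sentence) are illustrative only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def skip_mention_sequences(mentions: Sequence[T], s: int) -> List[List[T]]:
    """Split `mentions` into sequences that skip `s` mentions between items.

    One sequence is produced for each starting offset 0..s, so every mention
    appears in exactly one sequence.
    """
    if s < 0:
        raise ValueError("s must be non-negative")
    step = s + 1
    return [list(mentions[start::step])
            for start in range(min(step, len(mentions)))]


# Assumed mention list for the example sentence in the caption.
mentions = ["John", "Jena", "He", "OBI", "she", "there", "It", "DIY market"]

for s in range(3):
    print(s, skip_mention_sequences(mentions, s))
```

With s = 1 the pairs John/He and Jena/she become neighbours, which is what lets a linear-chain model label links that span one intermediate mention.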
Figure 7. High level skip-mention coreference resolution data flow.
The input to the system is given as a set of documents. For each document we select mentions and transform them into mention sequences. According to the system parameters, sequences contain every (s+1)-th mention (i.e., s skip-mention). A model is trained for each sequence type and then used for labeling. After sequences are labeled, the mentions are then clustered. Each cluster of mentions represents a specific entity, which is also the final result of the system.
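The final clustering step can be illustrated with a small union-find sketch. The caption states only that labeled mentions are clustered; merging every pair labeled C by transitive closure is an assumption here, the function name `cluster_mentions` is hypothetical, and the labels below are hand-written stand-ins for model output on the example sentence used in Figures 4 to 6.

```python
from typing import Dict, List, Tuple

# A labeled sequence is a list of (mention index, label) pairs, where the
# label C means "coreferent with the previous mention in this sequence".
LabeledSeq = List[Tuple[int, str]]


def cluster_mentions(n_mentions: int, labeled: List[LabeledSeq]) -> List[List[int]]:
    """Merge mentions linked by a C label into entity clusters (union-find)."""
    parent = list(range(n_mentions))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for seq in labeled:
        for (prev, _), (cur, label) in zip(seq, seq[1:]):
            if label == "C":
                parent[find(cur)] = find(prev)

    clusters: Dict[int, List[int]] = {}
    for m in range(n_mentions):
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())


# Indices: 0 John, 1 Jena, 2 He, 3 OBI, 4 she, 5 there, 6 It, 7 DIY market.
labeled_sequences: List[LabeledSeq] = [
    # zero skip-mention
    [(0, "O"), (1, "O"), (2, "O"), (3, "O"), (4, "O"), (5, "O"), (6, "C"), (7, "C")],
    # one skip-mention
    [(0, "O"), (2, "C"), (4, "O"), (6, "O")],
    [(1, "O"), (3, "O"), (5, "C"), (7, "C")],
    # two skip-mention
    [(0, "O"), (3, "O"), (6, "C")],
    [(1, "O"), (4, "C"), (7, "O")],
    [(2, "O"), (5, "O")],
]

entities = cluster_mentions(8, labeled_sequences)
print(entities)
```

Because each skip-mention model labels its own sequences independently, the per-sequence labeling can run in parallel and only this merge step needs all results.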
Algorithm 1. Skip-mention classifier training.
Algorithm 2. Skip-mention classifier labeling.
Dataset descriptions.
| Dataset | # documents | # sentences | # tokens | # mentions | # entities |
| ACE2004-ALL | 450 | 7,518 | 191,387 | 29,724 | 12,439 |
| ACE2004-NW | 127 | 2,865 | 74,987 | 11,188 | 4,701 |
| ACE2004-BN | 220 | 3,782 | 71,602 | 11,323 | 4,918 |
| SemEval2010-Train | 229 | 3,648 | 78,831 | 21,550 | 16,082 |
| SemEval2010-Test | 85 | 1,141 | 24,121 | 6,692 | 4,839 |
| CoNLL2012-ALL-Train | 1,914 | 75,185 | 1,299,310 | 154,760 | 33,113 |
| CoNLL2012-ALL-Test | 221 | 9,479 | 169,579 | 19,677 | 4,217 |
| CoNLL2012-NW-Train | 734 | 15,288 | 387,082 | 34,470 | 9,404 |
| CoNLL2012-NW-Test | 88 | 1,898 | 49,235 | 4,361 | 1,168 |
| CoNLL2012-BN-Train | 748 | 9,723 | 180,300 | 22,262 | 6,433 |
| CoNLL2012-BN-Test | 93 | 1,252 | 23,209 | 2,936 | 790 |
The acronyms ALL (i.e., whole), NW (i.e., newswire), BN (i.e., broadcast news) stand for different subdatasets of the whole dataset, which is further divided into training and test portions.
Results of the proposed SkipCor system, baseline systems, and other approaches on the ACE2004 datasets.
| System | MUC P | MUC R | MUC F | BCubed P | BCubed R | BCubed F |
| | | | | | | |
| SkipCor | 78.6 | 68.8 | 73.4 | 75.7 | | |
| SkipCorZero | 78.5 | 22.6 | 35.1 | | 51.9 | 67.4 |
| SkipCorPair | 78.2 | 49.0 | 60.3 | 85.3 | 61.7 | 71.6 |
| Finkel et al. | 78.7 | 58.5 | 67.1 | 86.8 | 65.2 | 74.5 |
| Soon et al. | | 37.8 | 52.4 | 94.1 | 56.9 | 70.9 |
| Haghighi et al. | 77.0 | | | 79.4 | 74.5 | 76.9 |
| Stoyanov et al. | - | - | 62.1 | - | - | 75.5 |
| | | | | | | |
| SkipCor | 76.3 | | | 76.2 | | |
| SkipCorZero | 79.3 | 28.3 | 41.7 | | 57.3 | 71.8 |
| SkipCorPair | 80.9 | 59.4 | 68.5 | 86.3 | 70.7 | 77.7 |
| Finkel et al. | 87.8 | 46.8 | 61.1 | 93.5 | 59.9 | 73.1 |
| Soon et al. | 90.0 | 43.2 | 58.3 | 95.6 | 58.4 | 72.5 |
| | | | | | | |
| SkipCor | 79.5 | 70.9 | 75.0 | 76.3 | | 78.6 |
| SkipCorZero | | 28.9 | 42.6 | | 55.4 | 70.2 |
| SkipCorPair | 80.5 | 57.1 | 66.8 | 84.8 | 68.9 | 76.0 |
| Culotta et al. | - | - | - | 86.7 | 73.2 | 79.3 |
| Bengtson et al. | - | - | - | 88.3 | 74.5 | |
| Haghighi et al. | 74.8 | | | 79.6 | 78.5 | 79.0 |
Coreference resolution systems evaluated on the ACE2004 dataset (i.e., ALL) [45] and its newswire (i.e., NW) and broadcast news (i.e., BN) subdatasets using the metrics MUC [9] and BCubed [41].
Results were reported by Finkel and Manning [42].
The MUC F1-score value does not agree with reported precision and recall and has been recalculated.
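The recalculation mentioned in this note follows from the definition of the F1-score as the harmonic mean of precision and recall. The short check below is a sketch of that arithmetic (the function name `f1_score` is our own), applied to the MUC precision and recall values listed for SkipCor and Finkel et al. in the first block of the table above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MUC precision and recall as listed in the ACE2004 results table.
skipcor_f1 = round(f1_score(78.6, 68.8), 1)
finkel_f1 = round(f1_score(78.7, 58.5), 1)
print(skipcor_f1, finkel_f1)
```

Small deviations of about 0.1 from a reported F1 are expected when the published precision and recall have already been rounded.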
Results of the proposed SkipCor system, baseline systems, and other approaches on the CoNLL2012 datasets.
| System | MUC P | MUC R | MUC F | BCubed P | BCubed R | BCubed F | CEAF P | CEAF R | CEAF F |
| | | | | | | | | | |
| SkipCor | 76.1 | | | 69.6 | | | 39.9 | | |
| SkipCorZero | 76.7 | 16.8 | 27.5 | | 36.7 | 53.0 | 35.1 | 35.3 | 35.2 |
| SkipCorPair | | 45.9 | 58.7 | 86.4 | 56.7 | 68.5 | | 54.7 | 47.5 |
| Fernandes et al. | - | - | 61.6 | - | - | 70.0 | - | - | 46.6 |
| | | | | | | | | | |
| SkipCor | 76.2 | | | 69.8 | | | 34.2 | | 43.3 |
| SkipCorZero | 76.8 | 19.4 | 31.0 | | 45.8 | 62.0 | 37.4 | 37.6 | 37.5 |
| SkipCorPair | | 44.7 | 57.8 | 88.6 | 54.6 | 67.6 | | 56.3 | 46.1 |
| Fernandes et al. | - | - | 65.6 | - | - | 70.0 | - | - | |
| | | | | | | | | | |
| SkipCor | 84.9 | 63.6 | | 74.6 | 65.6 | 69.8 | 32.9 | | 41.7 |
| SkipCorZero | 81.8 | 21.7 | 34.3 | | 39.7 | 56.1 | 32.1 | 32.7 | 32.4 |
| SkipCorPair | | 50.7 | 63.8 | 86.5 | 53.4 | 66.0 | 30.8 | 53.8 | 39.2 |
| Fernandes et al. | 77.5 | 64.9 | 70.7 | 79.0 | 64.3 | | | 56.5 | |
| Björkelund et al. | 71.6 | 63.4 | 67.3 | 76.6 | 64.0 | 69.7 | 41.4 | 50.0 | 45.3 |
| Chen et al. | 66.8 | 63.3 | 65.0 | 73.6 | 65.4 | 69.2 | 44.9 | 48.8 | 46.8 |
| Stamborg et al. | 58.8 | | 62.3 | 65.0 | | 68.0 | 45.8 | 38.5 | 41.8 |
| Zhekova et al. | 54.7 | 55.0 | 54.8 | 55.6 | 61.9 | 58.6 | 34.7 | 34.4 | 34.5 |
| Li et al. | 33.7 | 44.2 | 38.2 | 53.9 | 66.4 | 59.5 | 36.5 | 27.5 | 31.4 |
Coreference resolution systems evaluated on the CoNLL2012 dataset (i.e., ALL) [43], and its newswire (i.e., NW) and broadcast news (i.e., BN) subdatasets using the metrics MUC [9], BCubed [41] and CEAF [42].
Results of the proposed SkipCor system, baseline systems, and other approaches on the SemEval2010 dataset.
| System | MUC P | MUC R | MUC F | BCubed P | BCubed R | BCubed F | CEAF P | CEAF R | CEAF F |
| SkipCor | 68.8 | 30.1 | 41.8 | 94.8 | | | 74.0 | 78.5 | |
| SkipCorZero | 67.0 | 3.6 | 6.8 | | 75.1 | 85.7 | 73.0 | 73.1 | 73.1 |
| SkipCorPair | | 35.6 | 48.7 | 97.1 | 79.0 | 87.1 | 72.7 | | 75.9 |
| RelaxCor | 72.4 | 21.9 | 33.7 | 97.0 | 74.8 | 84.5 | | 75.6 | 75.6 |
| SUCRE | 54.9 | | | 78.5 | 86.7 | 82.4 | 74.3 | 74.3 | 74.3 |
| TANL-1 | 24.4 | 23.7 | 24.0 | 72.1 | 74.6 | 73.4 | 61.4 | 75.0 | 67.6 |
| UBIU | 25.5 | 17.2 | 20.5 | 83.5 | 67.8 | 74.8 | 68.2 | 63.4 | 65.7 |
Coreference resolution systems evaluated on the SemEval2010 dataset [46] using the metrics MUC [9], BCubed [41] and CEAF [42].
Comparison of the results when training on one type of dataset or domain and testing on another.
| Dataset / Model | A-BN | A-NW | C-BN | C-NW | SemEval2010 |
| | | | 65, 70, 28 | 64, 69, 29 | 42, |
| | | | 60, 64, 27 | 59, 69, 29 | 42, 67, |
| | 33, 56, 37 | 40, 58, 39 | | | |
| | 39, 57, 39 | 41, 59, 41 | | | 56, 64, 32 |
| | 19, | 23, | 39, 76, 40 | 39, | |
Coreference resolution results comparison on the ACE2004 (i.e., A), CoNLL2012 (i.e., C) and SemEval2010 datasets, where NW denotes the newswire and BN the broadcast news subdatasets. Each column represents a model trained on a specific dataset, while each row represents a dataset on which the model was tested. Values represent F1-scores of MUC [9], BCubed [41] and CEAF [42], respectively.