Rakesh Patra, Sujan Kumar Saha.
Abstract
Support vector machine (SVM) is one of the popular machine learning techniques used in various text processing tasks, including named entity recognition (NER). The performance of an SVM classifier largely depends on the appropriateness of its kernel function. In the last few years, a number of task-specific kernel functions have been proposed and used in text processing tasks, for example, string kernels, graph kernels, and tree kernels. So far, very few efforts have been devoted to developing an NER-specific kernel. In the literature, we found that the tree kernel has been used in the NER task only for entity boundary detection or reannotation; the conventional tree kernel cannot perform the complete NER task on its own. In this paper, we propose a kernel function, motivated by the tree kernel, that is able to perform the complete NER task. To examine the effectiveness of the proposed kernel, we applied it to the openly available JNLPBA 2004 data. Our kernel executes the complete NER task and achieves reasonable accuracy (an overall F-score of 60.60%).
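For readers unfamiliar with the conventional tree kernel that the abstract contrasts against, the following is a minimal sketch of one widely used variant, the subset-tree kernel of Collins and Duffy, which counts the tree fragments two parse trees share. It is not the SL kernel proposed in this paper, whose definition is not reproduced here; the tuple tree representation, the function names, and the decay parameter `lam` are illustrative assumptions.

```python
from functools import lru_cache
from typing import Iterator, Union

# Illustrative representation (an assumption): an internal node is a tuple
# (label, child1, child2, ...) and a leaf word is a plain string.
Tree = Union[str, tuple]


def production(node: tuple) -> tuple:
    """Grammar production at a node: its label followed by its children's labels."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])


def is_preterminal(node: tuple) -> bool:
    """A preterminal (e.g. a POS node) has only leaf words as children."""
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(tree: Tree) -> Iterator[tuple]:
    """Yield every non-leaf node of the tree in pre-order."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Collins-Duffy kernel: weighted count of tree fragments shared by t1 and t2."""

    @lru_cache(maxsize=None)
    def common(n1: tuple, n2: tuple) -> float:
        # C(n1, n2): weighted number of common fragments rooted at n1 and n2.
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        result = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):  # identical leaf words add no fragments
                result *= 1.0 + common(c1, c2)
        return result

    return sum(common(n1, n2) for n1 in internal_nodes(t1) for n2 in internal_nodes(t2))
```

For example, ("NP", ("D", "the"), ("N", "dog")) compared with itself shares 6 fragments, while against ("NP", ("D", "a"), ("N", "dog")) it shares 3. Because the count grows with tree size, such kernels are usually normalised as K(t1, t2) / sqrt(K(t1, t1) · K(t2, t2)) before being used in an SVM.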
Year: 2013 PMID: 24459452 PMCID: PMC3891429 DOI: 10.1155/2013/950796
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: Tree kernels.
Figure 2: Sliding tree kernel.
Figure 3: SL constituent trees for a word's sliding window.
Performance of the overall NER system using a linear SVM with word features.
| Feature | Precision | Recall | F-score |
|---|---|---|---|
| Word window 5 | 54.89% | 57.34% | 56.09% |
| Word window 7 | 53% | 54.85% | 53.91% |
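The baseline above trains a linear SVM on word features drawn from a window of 5 or 7 words. The exact feature encoding is not given here; the sketch below assumes a window of size n means the current word plus (n − 1)/2 words on each side, encoded as sparse binary indicators keyed by relative position. The `PAD` token, the function name, and the example sentence are hypothetical.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical filler for positions beyond the sentence boundary


def window_features(tokens: List[str], i: int, size: int = 5) -> Dict[str, int]:
    """Binary word features for a window of `size` tokens centred on token i."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so it can be centred")
    half = size // 2
    feats: Dict[str, int] = {}
    for offset in range(-half, half + 1):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else PAD
        # Encode relative position and word, e.g. "w[-1]=IL-2".
        feats[f"w[{offset:+d}]={word}"] = 1
    return feats


sentence = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
print(sorted(window_features(sentence, 1, size=5)))
```

Each token then yields one sparse feature dictionary, which could be mapped to a vector (for instance with scikit-learn's `DictVectorizer`) before training a linear classifier. Widening the window from 5 to 7 adds two more context features per token.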
Performance of the SL kernel with different values of s on class B-DNA.
| Class | SL tree kernel (value of s) | Precision | Recall | F-score |
|---|---|---|---|---|
| B-DNA | … | 79.61% | 42.89% | 55.75% |
| B-DNA | … | 82.44% | 42.7% | 56.26% |
| B-DNA | … | 82.17% | 41.47% | 55.12% |
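The final column of these result tables is consistent with the balanced F-score, the harmonic mean of precision and recall, F = 2PR / (P + R): recomputing it from each row's precision and recall reproduces the listed value to two decimals. A minimal check in Python (the function name is illustrative):

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) for class B-DNA, copied from the table above.
b_dna_rows = [(79.61, 42.89), (82.44, 42.70), (82.17, 41.47)]
for p, r in b_dna_rows:
    print(f"P = {p:.2f}%  R = {r:.2f}%  F = {f_score(p, r):.2f}%")
```

Running it prints F = 55.75%, 56.26%, and 55.12%, matching the table. The harmonic mean also explains why the second row scores best despite its lower recall than the first: its higher precision outweighs the small drop in recall.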
Overall NER system performance using the proposed kernel with s = 7.
| SL tree kernel | Precision | Recall | F-score |
|---|---|---|---|
| s = 7 | 72.63% | 51.99% | 60.60% |
Our system compared with existing systems.
| System | ML approach | Domain knowledge | F-score |
|---|---|---|---|
| Normal SVM | Linear SVM | None | 56.09% |
| Our system | SVM with SL kernel | None | 60.60% |
| Zhou and Su (2004) | HMM, SVM | Resolution of name aliases, cascaded NEs, and abbreviations; dictionary; POS | 72.55% |
| Zhou and Su (2004) | HMM, SVM | (baseline) | 64.1% |
| Song et al. (2004) | SVM, CRF | POS information, phrases, and virtual samples | 66.28% |
| Song et al. (2004) | SVM | (baseline) | 63.85% |
| Saha et al. (2010) | Composite kernel | None | 67.89% |