Denis Griffis, Chaitanya Shivade, Eric Fosler-Lussier, Albert M. Lai.
Abstract
Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation.
Year: 2016 PMID: 27570656 PMCID: PMC5001746
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
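To make the task concrete: SBD takes raw text and decides where sentences end. A minimal rule-based sketch (not any of the evaluated toolkits) is shown below; it splits after terminal punctuation followed by whitespace and an uppercase letter, and its failure on the abbreviation "Dr." illustrates the kind of error class the paper discusses.

```python
import re

def naive_sbd(text):
    """Split text into sentences at ., !, or ? followed by whitespace
    and an uppercase letter -- a deliberately simple baseline."""
    pieces = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
    return [p.strip() for p in pieces if p.strip()]

# The heuristic wrongly splits after the abbreviation "Dr.",
# yielding three "sentences" instead of two.
print(naive_sbd("The patient arrived. She was seen by Dr. Smith."))
```

Real toolkits replace this heuristic with abbreviation lists or trained classifiers, which is why their training corpora (PTB, GENIA, MEDLINE, etc.) matter for clinical-domain performance.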
Summary of datasets
| Corpus | Type of Data | # of Documents | # of Sentences | Avg. Sentence Length (# Tokens) |
|---|---|---|---|---|
| BNC | General text, mixed-domain | 4,049 | 6,027,378 | 16.1 |
| Switchboard | Telephone conversations | 650 | 110,504 | 7.4 |
| GENIA | MEDLINE abstracts | 1,999 | 16,479 | 24.4 |
| i2b2 | Clinical Notes | 426 | 43,940 | 9.5 |
Summary of toolkits
| Toolkit | Version | URL (as of 1/7/2016) | Training Corpora |
|---|---|---|---|
| Stanford | 3.5.2 | | PTB, GENIA, Other |
| Lingpipe | 4.1.0 | | MEDLINE abstracts, |
| Splitta | 1.03 | | PTB |
| SPECIALIST | 2.4C | | SPECIALIST |
| cTAKES | 3.2.2 | | GENIA, PTB, Mayo |
Precision (Pr), Recall (Re), and F1 score (FS) of sentence boundary detection task, evaluated for each tool on each dataset. The best results for each dataset are highlighted in bold.
[Table garbled during extraction: the per-toolkit Pr/Re/FS values for BNC, SWB, GENIA, and i2b2 survive only as unlabeled numeric fragments and cannot be reliably reattached to their toolkit rows.]
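The caption above defines the three scores reported per corpus. A hedged sketch of how boundary-level Pr/Re/F1 might be computed follows, treating predicted and gold boundaries as sets of character offsets; the exact matching criterion used in the paper may differ, and `sbd_scores` is an illustrative name, not the paper's code.

```python
def sbd_scores(predicted, gold):
    """Precision, recall, and F1 over sentence-boundary positions,
    comparing predicted vs. gold boundary offsets as sets (assumed
    matching criterion)."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                       # correctly placed boundaries
    pr = tp / len(pred) if pred else 0.0        # fraction of predictions correct
    rc = tp / len(gold) if gold else 0.0        # fraction of gold boundaries found
    f1 = 2 * pr * rc / (pr + rc) if pr + rc else 0.0
    return pr, rc, f1

# Example: 3 predicted boundaries, 4 gold boundaries, 2 in agreement.
print(sbd_scores([10, 25, 40], [10, 25, 33, 50]))
```

Under this set-based view, over-splitting (as on clinical abbreviations) lowers precision, while missed boundaries (e.g., list items without terminal periods) lower recall.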
Distribution of terminal sentence characters among the four corpora, by character type.
| Corpus | Period | Other Punctuation | Lowercase | Uppercase | Numeric | Special |
|---|---|---|---|---|---|---|
| BNC | 74% | 15% | 9% | 1% | <1% | <1% |
| Switchboard | 56% | 33% | 10% | <1% | 0% | <1% |
| GENIA | 99% | <1% | <1% | <1% | 0% | <1% |
| i2b2 | 51% | 22% | 9% | 7% | 9% | 1% |
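The distribution above can be reproduced by bucketing each sentence's final character into the table's six categories. A minimal sketch, assuming a particular punctuation set for "Other Punctuation" and a catch-all "Special" category (the paper's exact definitions are not given here):

```python
def terminal_char_type(sentence):
    """Classify a sentence's final character into the categories used
    in the terminal-character table (category boundaries assumed)."""
    ch = sentence.rstrip()[-1]
    if ch == '.':
        return 'Period'
    if ch in '!?;:,)("\'':        # assumed membership of this category
        return 'Other Punctuation'
    if ch.islower():
        return 'Lowercase'
    if ch.isupper():
        return 'Uppercase'
    if ch.isdigit():
        return 'Numeric'
    return 'Special'

print(terminal_char_type("BP was 120/80"))  # → Numeric
```

Note how this highlights why i2b2 is hard: nearly half its sentences end in something other than a period (numeric vitals, unpunctuated list items), whereas 99% of GENIA sentences end in a period.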
Figure 1. Average errors per 1,000 sentences, by the type of terminal character in the sentence. A and B show errors created by each toolkit, calculated as a sum of the errors on each corpus weighted by the number of sentences in that corpus. C and D show errors that occurred in each corpus, averaged across the toolkits used.
Figure 2. Runtime to process the clinical notes corpus.