Richárd Farkas, György Szarvas.
Abstract
BACKGROUND: In this paper we focus on the problem of automatically constructing ICD-9-CM coding systems for radiology reports. ICD-9-CM codes are used for billing purposes by health institutes and are assigned to clinical records manually following clinical treatment. Since this labeling task requires expert knowledge in the field of medicine, the process itself is costly and prone to errors, as human annotators have to consider thousands of possible codes when assigning the right ICD-9-CM labels to a document. In this study we use the datasets made available for training and testing automated ICD-9-CM coding systems by the organisers of an International Challenge on Classifying Clinical Free Text Using Natural Language Processing in spring 2007. The challenge itself was dominated by entirely or partly rule-based systems that solve the coding task using a set of hand-crafted expert rules. Since the feasibility of constructing such systems for thousands of ICD codes is questionable, we decided to examine the problem of automatically constructing similar rule sets, which turned out to achieve remarkable accuracy in the shared task challenge.
Year: 2008 PMID: 18426545 PMCID: PMC2352868 DOI: 10.1186/1471-2105-9-S3-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Generating expert rules from an ICD-9-CM coding guide.
Pulmonary collapse
    Atelectasis
    Collapse of lung
    Middle lobe syndrome
    Excludes:
        atelectasis:
            congenital (partial) (770.5)
            primary (770.4)
            tuberculous, current disease (011.8)
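A guide entry like the one above can be turned mechanically into a classifier rule: the synonym lines become inclusion phrases and the Excludes lines become exclusion phrases. The sketch below illustrates this idea in Python; the class name `GuideRule`, the phrase lists, and the mapping of this entry to code 518.0 are our own illustrative choices, not the paper's implementation.

```python
class GuideRule:
    """Keyword rule derived from one ICD-9-CM coding-guide entry:
    fire the code when an inclusion phrase appears, unless an
    exclusion phrase (from the Excludes list) also appears."""

    def __init__(self, code, include_phrases, exclude_phrases):
        self.code = code
        self.include = [p.lower() for p in include_phrases]
        self.exclude = [p.lower() for p in exclude_phrases]

    def matches(self, text):
        text = text.lower()
        if any(p in text for p in self.exclude):
            return False
        return any(p in text for p in self.include)

# Illustrative rule built from the guide entry above (phrase lists assumed).
rule_518_0 = GuideRule(
    "518.0",
    include_phrases=["pulmonary collapse", "atelectasis",
                     "collapse of lung", "middle lobe syndrome"],
    exclude_phrases=["congenital atelectasis", "primary atelectasis",
                     "tuberculous atelectasis"],
)

print(rule_518_0.matches("Bibasilar atelectasis is noted."))  # True
print(rule_518_0.matches("Known congenital atelectasis."))    # False
```

A real coder built this way would hold one such rule per ICD-9-CM code and emit every code whose rule fires on a report.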
Overview of our results.
| Model | train | test |
| 45-class statistical | 88.20 | 86.69 |
| Simple rule-based | 84.07 | 83.21 |
| Rule-based with label-dependencies | 85.57 | 84.85 |
| Hybrid rule-based + C4.5 | 90.22 | 88.92 |
| Hybrid rule-based + MaxEnt | 90.26 | 88.93 |
| CMC challenge best system | 90.02 | 89.08 |
All values are micro-averaged Fβ=1.
The 45-class statistical row stands for a C4.5 classifier trained for single labels. The CMC challenge best system row gives the results of the best system submitted to the CMC challenge. All our models use the same algorithm to detect negation and speculative assertions, were trained on the whole training set (the simple rule-based model needs no training), and were evaluated on both the training and the challenge test sets. The difference in performance between the 45-class statistical model and our best hybrid system (rule-based + MaxEnt) proved statistically significant on both the training and test datasets by McNemar's test at the p < 0.05 significance level. On the other hand, the difference between our best hybrid model (constructed automatically) and our manually constructed ICD-9-CM coder (the CMC challenge best system) was not statistically significant on either set.
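The note above says every model shares a negation and speculation detection algorithm; the paper's actual detector is not reproduced here. As a rough illustration of what such an assertion classifier does, here is a minimal keyword-cue sketch; the cue lists and the function name `assertion_status` are our own assumptions and are far simpler than a real detector.

```python
# Illustrative cue lists (assumed, not taken from the paper).
NEGATION_CUES = ["no ", "without ", "ruled out", "negative for"]
SPECULATION_CUES = ["may ", "possible", "suggestive of", "cannot be excluded"]

def assertion_status(sentence):
    """Label a report sentence as 'negated', 'speculative', or 'positive'
    based on simple substring cues."""
    s = sentence.lower()
    if any(cue in s for cue in NEGATION_CUES):
        return "negated"
    if any(cue in s for cue in SPECULATION_CUES):
        return "speculative"
    return "positive"

print(assertion_status("No evidence of pneumonia."))              # negated
print(assertion_status("Possible right lower lobe infiltrate."))  # speculative
print(assertion_status("Atelectasis in the right lung base."))    # positive
```

Filtering out negated and speculative mentions before rule matching prevents a code from firing on findings the report explicitly denies or only suspects.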
Inter-annotator agreement rates on the challenge train / test sets, in micro-averaged Fβ=1.
| | A1 | A2 | A3 | GS | BasicRB | Hybrid |
| A1 | – | 73.97/75.79 | 65.61/67.28 | 83.67/84.62 | 75.11/75.56 | 78.02/79.19 |
| A2 | 73.97/75.79 | – | 70.89/72.68 | 88.48/89.63 | 78.52/78.43 | 83.40/82.84 |
| A3 | 65.61/67.28 | 70.89/72.68 | – | 82.01/82.64 | 75.48/74.29 | 80.11/78.97 |
| GS | 83.67/84.62 | 88.48/89.63 | 82.01/82.64 | – | 85.57/84.85 | 90.26/88.93 |
| BasicRB | 75.11/75.56 | 78.52/78.43 | 75.48/74.29 | 85.57/84.85 | – | – |
| Hybrid | 78.02/79.19 | 83.40/82.84 | 80.11/78.97 | 90.26/88.93 | – | – |
A1, A2 and A3 refer to Annotators 1, 2 and 3, respectively. GS stands for the gold standard labeling, while BasicRB represents our rule-based system that models inter-label dependencies. Hybrid denotes our hybrid rule-based + MaxEnt statistical model.
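Both tables report micro-averaged Fβ=1: true positives, false positives, and false negatives are pooled over all label decisions of all documents before precision, recall, and F are computed. A minimal sketch, with the function name and the example label sets being our own illustration:

```python
def micro_f1(gold, predicted):
    """Micro-averaged F(beta=1) over per-document label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels both assigned
        fp += len(p - g)   # spurious labels
        fn += len(g - p)   # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Two documents: one exact match, one with a missed and a spurious code.
gold = [{"518.0"}, {"486", "780.6"}]
pred = [{"518.0"}, {"486", "786.2"}]
print(round(micro_f1(gold, pred), 4))  # tp=2, fp=1, fn=1 -> 0.6667
```

The inter-annotator entries in the table above follow the same recipe, treating one annotator's labels as the reference and the other's as the predictions.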