Jose Antonio Miñarro-Giménez1, Catalina Martínez-Costa1, Daniel Karlsson2, Stefan Schulz1, Kirstine Rosenbeck Gøeg3.
Abstract
SNOMED CT provides about 300,000 codes with fine-grained concept definitions to support interoperability of health data. Coding clinical texts with medical terminologies is not a trivial task and is prone to disagreements between coders. We conducted a qualitative analysis to identify sources of disagreement in an annotation experiment that used a subset of SNOMED CT with some restrictions. A corpus of 20 English clinical text fragments from diverse origins and languages was annotated independently by two medically trained annotators following a specific annotation guideline. Following this guideline, the annotators had to assign sets of SNOMED CT codes to noun phrases, together with concept and term coverage ratings. The annotations were then manually examined against a reference standard to determine sources of disagreement. Five categories were identified. In our results, the most frequent cause of inter-annotator disagreement was related to human issues. In several cases, disagreements revealed gaps in the annotation guidelines and a lack of annotator training. The remaining issues can be influenced by some SNOMED CT features.
Year: 2018 PMID: 30589855 PMCID: PMC6307753 DOI: 10.1371/journal.pone.0209547
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Number of concepts associated with each semantic group and extracted from SNOMED CT to create the terminology setting.
| Semantic group | Number of concepts |
|---|---|
| Disorder | 111,424 |
| Objects | 312 |
| Living Beings | 20,467 |
| Devices | 12,822 |
| Chemicals and Drugs | 5,802 |
| Procedures | 55,783 |
| Genes and Molecular Sequences | 356 |
| Concepts and Ideas | 5,994 |
| Anatomy | 28,646 |
Definition of concept coverage scores for ASSESS CT manual annotation.
| Score | Definition |
|---|---|
| | When the meaning of a chunk is fully represented by a set of codes, e.g. the term “Heart attack” is fully covered with “22298006 \| Myocardial infarction (disorder)\|”. |
| | When the meaning of elliptic or ambiguous chunks of text can be inferred from the context and can be fully represented by a set of codes, e.g. a specific use of the term “hypertension” could mean “Renal arterial hypertension”, so the code “39018007 \| Renal arterial hypertension (disorder)\|” is justified. |
| | When the meaning of the chunk comes close to the meaning of a set of codes, e.g. “Third rib fracture” is more specific than the code “20274005 \| Fracture of one rib (disorder)\|”. Yet the meaning is close enough to justify annotation with this code. |
| | When there is not any set of codes that has a closer meaning to the chunk, e.g. generic codes such as “125605004 \| Fracture of bone (disorder)\|” for coding “third rib fracture” must not be used as partial coverage. |
Fig 1. Fragment of an annotation spreadsheet.
It shows the header of the document and an example of the annotation of two chunks with SNOMED CT (SCT ONLY) and UMLS terminology settings.
The number of concepts from each semantic group that were used by each annotator and the reference standard for coding the 20 medical text snippets.
| Semantic group | Annotator 1 | Annotator 2 | Reference standard |
|---|---|---|---|
| Disorder | 69 | 98 | 94 |
| Objects | 1 | 2 | 2 |
| Living Beings | 0 | 0 | 0 |
| Devices | 2 | 1 | 1 |
| Chemicals and Drugs | 12 | 10 | 10 |
| Procedures | 28 | 31 | 33 |
| Genes and Molecular Sequences | 0 | 0 | 0 |
| Concepts and Ideas | 68 | 64 | 65 |
| Anatomy | 67 | 38 | 56 |
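The per-group counts above can be compared against the reference standard with simple arithmetic. The following is a minimal sketch in Python, not part of the original study; the variable names are illustrative, and it only compares how many concepts were used per group, which says nothing about whether the same concepts were chosen.

```python
# Concept counts per semantic group, copied from the table above:
# (annotator 1, annotator 2, reference standard).
usage = {
    "Disorder": (69, 98, 94),
    "Objects": (1, 2, 2),
    "Living Beings": (0, 0, 0),
    "Devices": (2, 1, 1),
    "Chemicals and Drugs": (12, 10, 10),
    "Procedures": (28, 31, 33),
    "Genes and Molecular Sequences": (0, 0, 0),
    "Concepts and Ideas": (68, 64, 65),
    "Anatomy": (67, 38, 56),
}

# Signed difference between each annotator's count and the reference count.
# A positive value means the annotator used more concepts from that group.
deviation = {
    group: (a1 - ref, a2 - ref) for group, (a1, a2, ref) in usage.items()
}

# Column totals: overall number of concepts used by each source.
totals = tuple(sum(column) for column in zip(*usage.values()))

for group, (d1, d2) in deviation.items():
    print(f"{group}: annotator 1 {d1:+d}, annotator 2 {d2:+d}")
print("totals (annotator 1, annotator 2, reference):", totals)
```

In this sketch the largest count differences appear in the Disorder and Anatomy groups, which matches what can be read directly from the table.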
Quantitative analysis of chunk-level agreement between the reference standard and the two English annotators.
| Cases | Agreements |
|---|---|
| Reference standard agrees with both annotators | 50 |
| Reference standard agrees only with the first annotator | 39 |
| Reference standard agrees only with the second annotator | 50 |
| Reference standard does not agree with any annotator | 92 |
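These counts can be expressed as proportions. The sketch below is a minimal Python illustration under one stated assumption: the four cases are mutually exclusive and together cover every chunk compared, which the table suggests but does not state. The variable names are hypothetical.

```python
# Chunk-level agreement counts copied from the table above.
agreement_counts = {
    "both annotators": 50,
    "first annotator only": 39,
    "second annotator only": 50,
    "neither annotator": 92,
}

# Assumption: the four cases are mutually exclusive, so their sum is the
# number of chunks compared against the reference standard.
total_chunks = sum(agreement_counts.values())

# Share of chunks falling into each case, as a percentage.
agreement_share = {
    case: 100 * count / total_chunks
    for case, count in agreement_counts.items()
}

# Per-annotator agreement with the reference standard: chunks where that
# annotator matched, whether or not the other annotator also matched.
both = agreement_counts["both annotators"]
annotator1_rate = (both + agreement_counts["first annotator only"]) / total_chunks
annotator2_rate = (both + agreement_counts["second annotator only"]) / total_chunks

for case, share in agreement_share.items():
    print(f"{case}: {share:.1f}%")
print(f"annotator 1 vs reference: {annotator1_rate:.1%}")
print(f"annotator 2 vs reference: {annotator2_rate:.1%}")
```

Under that assumption the table covers 231 chunks, and the reference standard agrees with neither annotator more often than it agrees with both.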
Typology of disagreements and their frequency.
| Categories of disagreement | Frequency |
|---|---|
| Human issues | 156 |
| Annotation guidelines issues | 38 |
| Ontology issues | 22 |
| Interface term issues | 22 |
| Language issues | 9 |
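To see how strongly one category dominates, the frequencies can be converted to relative shares. This is a minimal Python sketch with illustrative names. It assumes the frequencies can be summed into a single total; the table does not say whether one disagreement may be counted under more than one category.

```python
# Disagreement frequencies copied from the table above.
disagreement_counts = {
    "Human issues": 156,
    "Annotation guidelines issues": 38,
    "Ontology issues": 22,
    "Interface term issues": 22,
    "Language issues": 9,
}

# Assumption: frequencies are additive across categories.
total_disagreements = sum(disagreement_counts.values())

# Relative share of each category, sorted from most to least frequent.
disagreement_share = {
    category: count / total_disagreements
    for category, count in sorted(
        disagreement_counts.items(), key=lambda item: item[1], reverse=True
    )
}

for category, share in disagreement_share.items():
    print(f"{category}: {share:.1%}")
```

Under that assumption human issues account for well over half of the recorded disagreements, consistent with the abstract's statement that they were the most frequent cause.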
Fig 2. Hodgkin's disease hierarchy.