Zhen Qi, Li-Ping Tu, Zhi-Yu Luo, Xiao-Juan Hu, Ling-Zhi Zeng, Wen Jiao, Xu-Xiang Ma, Cong-Cong Jing, Wei-Jian Wang, Zhi-Feng Zhang, Jia-Tuo Xu.
Abstract
This study introduces a method of individual agreement evaluation that identifies the discordant raters within an expert group, excludes those experts, and determines the best expert selection method, so as to improve the reliability of a tongue image database constructed from experts' opinions. Fifty experienced experts in the TCM diagnostic field from all over China were invited to rate 300 randomly selected tongue images. Gwet's AC1 (first-order agreement coefficient) was used to calculate the interrater and intrarater agreement. Optimization of the interrater agreement and a disagreement score were proposed to evaluate the external consistency of each individual expert. The proposed method successfully optimized the interrater agreement. Comparing three expert selection methods, the interrater agreement increased from 0.53 [0.32-0.75] for the original group to 0.64 [0.39-0.80] with method A (inclusion of experts whose intrarater agreement > 0.6), 0.69 [0.63-0.81] with method B (inclusion of experts whose disagreement score = 0), and 0.76 [0.67-0.83] with method C (inclusion of experts whose intrarater agreement > 0.6 and disagreement score = 0). In this study, we provide an estimate of the external consistency of each individual expert, and considering both the internal consistency and the external consistency of each expert is superior to relying on either one alone when constructing a tongue image database based on expert opinions.
Year: 2018 PMID: 30369958 PMCID: PMC6189655 DOI: 10.1155/2018/8491057
Source DB: PubMed Journal: Evid Based Complement Alternat Med ISSN: 1741-427X Impact factor: 2.629
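Gwet's AC1 is the agreement statistic used throughout the study. As a point of reference, below is a minimal Python sketch of the standard multi-rater AC1 computation (observed agreement corrected by a chance-agreement term based on average category prevalence); the function name and data layout are illustrative and not taken from the paper.

```python
import numpy as np

def gwet_ac1(ratings):
    """Gwet's first-order agreement coefficient (AC1) for multiple raters and
    categorical ratings (at least two categories are required).

    ratings: 2-D sequence of shape (n_subjects, n_raters); each entry is a
    category label, with None marking a missing rating.
    """
    rows = [list(row) for row in ratings]
    categories = sorted({v for row in rows for v in row if v is not None})
    q = len(categories)
    # r[i, k]: number of raters assigning subject i to category k
    r = np.array([[sum(1 for v in row if v == c) for c in categories]
                  for row in rows], dtype=float)
    n_i = r.sum(axis=1)              # raters per subject
    rated = n_i >= 2                 # agreement is defined only with >= 2 raters
    # observed agreement: average pairwise agreement per subject
    p_a = np.mean((r[rated] * (r[rated] - 1)).sum(axis=1)
                  / (n_i[rated] * (n_i[rated] - 1)))
    # chance agreement from the average prevalence of each category
    pi = (r[rated] / n_i[rated][:, None]).mean(axis=0)
    p_e = (pi * (1 - pi)).sum() / (q - 1)
    return (p_a - p_e) / (1 - p_e)

# Toy example: 4 tongue images, 3 raters, binary "pale tongue" calls -> ~0.33
print(gwet_ac1([[1, 1, 1],
                [1, 1, 0],
                [0, 0, 0],
                [0, 1, 0]]))
```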
Figure 1. Web interface of tongue image diagnosis for experts.
Figure 2. The whole process of the proposed method.
Disagreement scoring method for each expert.
| Experts | Recognition times | Disagreement scores |
|---|---|---|
| Discordant experts (k) | 1 | m |
| | 2 | m − 1 |
| | 3 | m − 2 |
| | ⋮ | ⋮ |
| | m | 1 |
| Rest of the experts (n − k) | None | 0 |
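The scoring rule in the table maps directly onto a small helper. The sketch below is a hedged illustration of that rule, assuming the discordant experts are supplied per recognition round; the names `disagreement_scores` and `flagged_by_round` are hypothetical, and the usage example reuses the eight rounds reported later for "the moderate tongue texture."

```python
def disagreement_scores(flagged_by_round, all_experts):
    """Scoring rule from the table above: with m recognition rounds in total,
    experts flagged in round 1 receive score m, round 2 receives m - 1, ...,
    round m receives 1; experts never flagged receive 0."""
    m = len(flagged_by_round)
    scores = {expert: 0 for expert in all_experts}
    for t, flagged in enumerate(flagged_by_round, start=1):
        for expert in flagged:
            scores[expert] = m - t + 1
    return scores

# Eight rounds from the "moderate tongue texture" example (m = 8):
rounds = [
    {"rater 45", "rater 49", "rater 32", "rater 42", "rater 34", "rater 40"},
    {"rater 20", "rater 16", "rater 13", "rater 3"},
    {"rater 5", "rater 43"},
    {"rater 15", "rater 25"},
    {"rater 29", "rater 39", "rater 33", "rater 47"},
    {"rater 31", "rater 35", "rater 50", "rater 26", "rater 38"},
    {"rater 41", "rater 17", "rater 1", "rater 10"},
    {"rater 46", "rater 19", "rater 44", "rater 37", "rater 30", "rater 18", "rater 11"},
]
all_experts = [f"rater {i}" for i in range(1, 51)]
scores = disagreement_scores(rounds, all_experts)
# scores["rater 45"] == 8, scores["rater 46"] == 1, never-flagged experts == 0
```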
Changes in the interrater agreement after the first identification of discordant experts.
| Category | Tongue feature (25 in total) | Interrater agreement before first identification | Interrater agreement after first identification |
|---|---|---|---|
| Tongue body | Pale | 0.7622 | 0.8125 |
| | Light red | 0.3431 | 0.4357 |
| | Red and crimson | 0.5397 | 0.6254 |
| | Purplish | 0.7292 | 0.7845 |
| | Old | 0.5566 | 0.6420 |
| | Moderate texture | 0.0948 | 0.1674 |
| | Tender | 0.5349 | 0.5842 |
| | Enlarged | 0.3345 | 0.4411 |
| | Moderate shape | 0.1477 | 0.2212 |
| | Thin | 0.8389 | 0.8864 |
| | Teeth-print | 0.4580 | 0.5349 |
| | Red dot | 0.4911 | 0.6269 |
| | Crack | 0.4972 | 0.5686 |
| | Bruise | 0.7780 | 0.8787 |
| | Petechia | 0.7803 | 0.8897 |
| Tongue fur | White fur | 0.4902 | 0.5941 |
| | Yellowish fur | 0.6699 | 0.7398 |
| | Black and gray fur | 0.9818 | 0.9869 |
| | White and yellowish fur | 0.8173 | 0.8826 |
| | Thin fur | 0.2769 | 0.3507 |
| | Thick fur | 0.3147 | 0.3816 |
| | Moist fur | 0.2706 | 0.3625 |
| | Damp and smooth fur | 0.7115 | 0.8084 |
| | Dry and rough fur | 0.6600 | 0.7357 |
| | Greasy fur | 0.2213 | 0.3167 |
Improvement process of the interrater agreement for “the moderate tongue texture.”
| Recognition times | Identified discordant experts | Remaining experts | Interrater agreement for the remaining experts |
|---|---|---|---|
| 1 | rater 45, rater 49, rater 32, rater 42, rater 34, rater 40 | 44 | 0.1674 |
| 2 | rater 20, rater 16, rater 13, rater 3 | 40 | 0.2220 |
| 3 | rater 5, rater 43 | 38 | 0.2470 |
| 4 | rater 15, rater 25 | 36 | 0.2707 |
| 5 | rater 29, rater 39, rater 33, rater 47 | 32 | 0.3247 |
| 6 | rater 31, rater 35, rater 50, rater 26, rater 38 | 27 | 0.4098 |
| 7 | rater 41, rater 17, rater 1, rater 10 | 23 | 0.4890 |
| 8 | rater 46, rater 19, rater 44, rater 37, rater 30, rater 18, rater 11 | 16 | 0.6333 |
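The table traces an iterative removal process: in each recognition round, the raters pulling the group agreement down are identified and dropped, and the interrater AC1 of the remaining raters is recomputed. The exact discordance criterion and stopping rule are not stated in this excerpt, so the sketch below is only a plausible leave-one-out reconstruction: it assumes a rater is discordant when excluding them raises the group AC1, assumes iteration stops once the agreement reaches roughly 0.6 (inferred from the reported end points), and reuses `gwet_ac1` from the earlier sketch.

```python
import numpy as np

def optimize_interrater_agreement(ratings, rater_ids, target=0.6):
    """Hedged leave-one-out sketch of the iterative optimization traced above.

    ratings: 2-D array of shape (n_images, n_raters); rater_ids labels columns.
    Returns the ids of the remaining raters plus a per-round history of
    (removed raters, resulting interrater AC1).
    """
    ratings = np.asarray(ratings, dtype=object)
    keep = list(range(len(rater_ids)))
    rounds = []
    while len(keep) > 2:
        current = gwet_ac1(ratings[:, keep])
        if current >= target:
            break
        # recompute AC1 with each remaining rater left out in turn
        loo = {j: gwet_ac1(ratings[:, [k for k in keep if k != j]]) for j in keep}
        # raters whose exclusion raises the group agreement are treated as discordant
        discordant = [j for j in keep if loo[j] > current]
        if not discordant or len(keep) - len(discordant) < 2:
            break
        keep = [k for k in keep if k not in discordant]
        rounds.append(([rater_ids[j] for j in discordant],
                       gwet_ac1(ratings[:, keep])))
    return [rater_ids[k] for k in keep], rounds
```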
Optimization results of the interrater agreement for all tongue features that needed optimization.
| Category | Optimized tongue features | Recognition times | Remaining experts | Interrater agreement before optimization | Interrater agreement after optimization |
|---|---|---|---|---|---|
| Tongue body | Light red | 3 | 19 | 0.3431 | 0.6450 |
| | Moderate texture | 8 | 16 | 0.0948 | 0.6333 |
| | Tender | 2 | 41 | 0.5349 | 0.6535 |
| | Enlarged | 3 | 26 | 0.3345 | 0.6404 |
| | Moderate shape | 5 | 10 | 0.1477 | 0.6310 |
| | Teeth-print | 2 | 16 | 0.4580 | 0.6363 |
| | Crack | 2 | 21 | 0.4972 | 0.6960 |
| Tongue fur | White fur | 2 | 35 | 0.4902 | 0.6907 |
| | Thin fur | 3 | 24 | 0.2769 | 0.4601 |
| | Thick fur | 3 | 4 | 0.3147 | 0.6406 |
| | Moist fur | 5 | 22 | 0.2706 | 0.6989 |
| | Greasy fur | 4 | 24 | 0.2213 | 0.6200 |
Disagreement scores of discordant experts for “the moderate tongue texture”.
| Recognition times | Identified discordant experts | Disagreement scores |
|---|---|---|
| 1 | rater 45, rater 49, rater 32, rater 42, rater 34, rater 40 | 8 |
| 2 | rater 20, rater 16, rater 13, rater 3 | 7 |
| 3 | rater 5, rater 43 | 6 |
| 4 | rater 15, rater 25 | 5 |
| 5 | rater 29, rater 39, rater 33, rater 47 | 4 |
| 6 | rater 31, rater 35, rater 50, rater 26, rater 38 | 3 |
| 7 | rater 41, rater 17, rater 1, rater 10 | 2 |
| 8 | rater 46, rater 19, rater 44, rater 37, rater 30, rater 18, rater 11 | 1 |
Figure 3. Distribution of intrarater agreement and disagreement scores for the 50 experts. Notes. Section A = experts with lower internal consistency and more discordant test results; Section B = experts with higher internal consistency but more discordant test results; Section C = experts with lower internal consistency but fewer discordant test results; Section D = experts with higher internal consistency and fewer discordant test results.
Figure 4. Interrater agreement of the 25 tongue features after the three expert selection methods. Notes. ∗ Compared with the interrater agreement before selection, P < 0.05.
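The three selection methods compared in Figure 4 are simple filters on the two per-expert statistics: intrarater AC1 for internal consistency and the disagreement score for external consistency. A minimal sketch follows, assuming both statistics are available as dictionaries keyed by expert id; method C (both criteria) corresponds in spirit to Section D of Figure 3, although the exact quadrant boundaries are not stated in this excerpt.

```python
def select_experts(intra_ac1, disagreement_score, method="C", threshold=0.6):
    """Method A: intrarater AC1 > threshold.
    Method B: disagreement score == 0.
    Method C: both conditions must hold."""
    selected = []
    for expert, ac1 in intra_ac1.items():
        internal_ok = ac1 > threshold                      # internal consistency
        external_ok = disagreement_score.get(expert, 0) == 0  # external consistency
        if ((method == "A" and internal_ok)
                or (method == "B" and external_ok)
                or (method == "C" and internal_ok and external_ok)):
            selected.append(expert)
    return selected
```

Applied to the 50 experts, the three filters reproduce the comparison reported in the abstract: the stricter the filter, the smaller the retained panel and the higher the resulting interrater agreement.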