Literature DB >> 21765855

Analysis of agreement on traditional chinese medical diagnostics for many practitioners.

Lun-Chien Lo¹, Tsung-Lin Cheng, You-Chieh Huang, Ying-Ling Chen, Jeng-Ting Wang.

Abstract

In Traditional Chinese Medicine (TCM) diagnostics, it is an important issue to study the degree of agreement among several distinct practitioners. In order to study the reliability of TCM diagnostics, we have to design an experiment to simultaneously deal with both of the cases when the data is ordinal and when there are many TCM practitioners. In this study, we consider a reliability measure called "Krippendorff's alpha" to investigate the agreement of tongue diagnostics in TCM. Besides, since it is not easy to obtain a large data set with patients rated simultaneously by many TCM practitioners, we use the renowned "bootstrapping" to obtain a 95% confidence interval for the Krippendorff's alpha. The estimated Krippendorff's alpha for the agreement among ten physicians that discerned fifteen randomly chosen patients is 0.7343, and the 95% bootstrapping confidence interval for the true alpha coefficient is [0.6570, 0.7349]. The data was collected and analyzed at the Department of Traditional Chinese Medicine, Changhua Christian Hospital (CCH) in Taiwan.

Entities: Chemical Disease Species

Year: 2011 PMID： 21765855 PMCID： PMC3133885 DOI： 10.1155/2012/178081

Source DB: PubMed Journal: Evid Based Complement Alternat Med ISSN： 1741-427X Impact factor: 2.629

1. Introduction

Studying reliability and validity is important in designing questionnaires in psychological research. The practitioners of western medical system are often skeptical about objectivity of clinical examination in TCM. In TCM diagnostics, there are four clinical diagnostics to evaluate a patient's health condition, which include “Inspection,” “Smelling and Listening,” “Inquiring,” and “Palpation.” The outcome of tongue inspection is an index among many important characteristics in TCM diagnostics. In general, the tongue inspection in TCM refers to the shape, luxuriance and witheredness, toughness and softness, thinness and swelling, and so forth. For example, a patient having an enlarged tongue with slippery fur is categorized into the Yang deficiency and requires corresponding TCM treatment. The diagnostic of TCM depends mainly on the sensorial evaluation. Therefore, the reliability and objectivity of such sensorial diagnostics is important in the modernization of the TCM theory since unreliable diagnoses lead to inappropriate prescriptions. To compare with western modern medical research, only few attempts have so far been made at agreement analysis in TCM diagnostics. In Kim et al. [1], the authors examine the reliability of TCM tongue inspection by the evaluation of inter- and intrapractitioner agreement levels for specific tongue characteristics. Mist et al. [2] investigates whether a training process that focused on a questionnaire-based diagnosis in TCM would improve the agreement of TCM diagnoses. Zhang et al. [3] studied the effect of training that aims to improve the agreement in TCM diagnosis among practitioners for persons with the conventional diagnosis of rheumatoid arthritis. The above studies used proportion of agreement, similar to Goodman and Kruskal [4], to express the degree of agreement among the TCM practitioners. While the proportion of agreement is widely used, such a statistic overlooks the possibility that randomness might cause agreement and/or disagreement. This problem has been partly solved by Cohen [5] who invented the renowned “kappa” coefficient to measure agreement between two raters. Since Cohen's kappa deals only with binary or nominal data, it does not take the discrepancy of agreement for different categories into account. The degree of disagreement may vary according to the categories that classify the data. For example, if patients' health condition can be categorized into “very good,” “ordinary,” and “severely bad,” the agreement between a rating of “very good” and a rating of “ordinary” differs from the agreement between a rating of “very good” and a rating of “severely bad.” O'Brien et al. [6] studied the reliability of diagnostic variables in a TCM examination. In their study, they used the Cohen's kappa to measure the agreement among three TCM practitioners and suggest that even when there are certain features of the TCM system that are highly objective and repeatable, there are also other features that are subjective and unreliable. However, Cohen's kappa cannot deal with the ordinal data. Weighted kappa [7] is a generalization of the original kappa, and it uses the same contingent table to describe the data. However, the weighted kappa cannot deal with the cases when there are more than two raters. Fleiss [8] proposed another “kappa” to measure agreement among more than two practitioners while it only works for nominal data. In the study of the reliability of TCM diagnostics which discerns ordinal categories, not only the levels of disagreements but also the generalization to the case of more than two practitioners should be taken into account simultaneously. To overcome both difficulties, Krippendorff's alpha [9-12] emerges as a good substitute for both of the Cohen's kappa and Fleiss kappa. In this study, we recruited 10 TCM physicians with ages ranging from 28 to 46 and randomly chose 15 patients taking TCM treatments in CCH. Each patient's tongue is photographed using digital camera. Then the recruited TCM practitioners independently classified the patient's tongues into three categories: thin tongue, normal tongue, and enlarged tongue. The estimated Krippendorff's alpha is 0.7343 and its 95% confidence interval by a modified bootstrapping is [0.6570, 0.7349]. We will report the results in next section.

2. Method

2.1. Patients and TCM Tongue Inspectors

Fifteen patients were recruited randomly from the archive of the Department of Traditional Chinese Medicine (TCM), Changhua Christian Hospital (CCH). Their tongues were photographed by a digital camera and were rated, within a day, by ten TCM practitioners educated in China Medical University, Taiwan. All of the recruited TCM practitioners have passed the National Professional & Technical Examinations for Doctors of Chinese Medicine. The rating levels are classified into three categories: enlarged tongue, normal (moderate) tongue, and thin tongue. In general, an enlarged tongue and a thin tongue indicate unhealthy conditions. The ages of the TCM practitioners range from 30 to 45. About five of them just graduated from the medical school within 5 years, and the other five are senior TCM physicians in Changhua Christian Hospital.

2.2. Statistical Analysis

Cohen's kappa is a popular measure of agreement, and its confidence interval relies on a large sample which is, in general, hard to obtain in medical study. Cohen [5] proposed an algorithm based on bootstrapping to obtain a 95% confidence interval for Krippendorff's alpha. In our setting, the algorithm cannot be directly applied and requires some modification such that it can comprise the estimated Krippendorff's alpha. A concrete example on how to calculate Krippendorff's alpha can be found in Cohen [5, 13]. The Krippendorff's alpha measure for tongue inspection data obtained in the Department of Chinese Medicine in Changhua Christian Hospital of Taiwan, using nominal weight, is about 0.7343. In general, people applied asymptotic normality to obtain confidence interval when the data at hand is large enough. While in medical study, it is not easy to obtain a large sample with many raters and many patients in a clinical trial. When we are confronted with a small sample, we may apply Efron's bootstrapping [14] to obtain a reasonable confidence interval for Krippendorff's alpha that measures the agreement of diagnostics among raters. On modifying Krippendorff's original algorithm, we may obtain a reasonable 95% confidence interval for the true Krippendorff's alpha (Appendix B). Table 1 is the data of tongue inspection obtained in the Department of Chinese Medicine, Changhua Christian Hospital of Taiwan. Figure 1 reports the 95% confidence interval for Krippendorff's alpha for the tongue inspection data by Krippendorff's original algorithm. From Figure 1, we see that the confidence interval using Krippendorff's original algorithm does not include the estimated Krippendorff's alpha = 0.7343. However, from Figure 2, the 95% confidence interval of bootstrapped α using our modified algorithm contains the estimated Krippendorff's alpha.

Table 1

Tongue diagnostics obtained by Changhua Christian Hospital.

Unit	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Rater 1	2	3	1	1	3	3	1	1	1	2	3	1	2	2	3
Rater 2	2	3	1	2	3	3	1	1	1	2	3	1	2	2	3
Rater 3	2	3	1	2	3	3	1	1	1	2	3	1	2	2	3
Rater 4	2	3	1	2	3	3	1	1	2	2	3	1	2	2	3
Rater 5	2	3	1	2	3	3	1	2	2	2	3	1	2	2	3
Rater 6	2	3	1	2	3	3	1	2	2	2	3	2	2	2	3
Rater 7	2	3	1	2	3	3	1	2	2	2	3	2	3	2	3
Rater 8	2	3	1	2	3	3	2	2	2	2	3	2	3	2	3
Rater 9	2	3	1	2	3	3	2	2	3	2	2	2	3	3	3
Rater 10	2	3	2	3	3	3	2	2	3	3	2	2	3	3	2

Figure 1

The distribution of bootstrapped α adopting Krippendorff's original algorithm.

Figure 2

The distribution of bootstrapped α adopting our modified algorithm.

3. Conclusion

There are many works investigating agreement measures for western medical diagnostics, while only few study agreement analysis among TCM physicians. In the literature concerning agreement analysis, although many researchers consider complex TCM diagnostics, most of them adopted a so-called “proportion of agreement” measure which overlooks the possible bias caused by randomness. O'Brien et al. [6] used the Cohen's kappa to measure the agreement among three TCM practitioners while Cohen's kappa cannot deal with data of ordinal scale. To simultaneously deal with the case when there are many raters and the case when the data is ordinal as well as multinomial distributed, Krippendorff's alpha provides itself as a good substitute both for Cohen's and Fleiss' kappa. We not only estimate the Krippendorff's alpha coefficient of 0.7343 for the tongue inspection data obtained in the Department of TCM, CCH of Taiwan, but also modify Krippendorff's bootstrapping algorithm to obtain a 95% confidence interval [0.6570, 0.7349] for the Krippendorff's alpha. In this study, for such a dataset that a patient's tongue is classified into three distinct categories, it seems that the diagnostics of tongue's shapes in TCM is moderately reliable in the standard of reliability requirement. Apart from tongue inspection, there are many other diagnostics that are regularly used to rate a patient's health condition, for example, listening, smelling, inquiring, palpation, and so forth. The agreement analysis of other diagnostics in TCM among many practitioners involves more complicated methods of experimental design. This study may serve itself as a touchstone of approaching the reliabilities of many other diagnostics among several practitioners in TCM. We will focus on this topic in the future.

(a)

Rater	Unit
Rater	1	2	⋯	N
1	c ₁₁	c ₁₂	⋯	c _1r
2	c ₂₁	c ₂₂	⋯	c _2r
⋮	⋮	⋮	⋱	⋮
r	c _r1	c _r2	⋯	c _rN

Number of ratings	m ₁	m ₂	⋯	m _N

(b)

	1	⋯	j	·
1	o ₁₁	·	o _1j	⋯
·	·		·	⋮
i	o _i1	·	o _ij	⋯
·	·		·	⋱

6 in total

1. Understanding the reliability of diagnostic variables in a Chinese Medicine examination.

Authors: Kylie A O'Brien; Estelle Abbas; Jiansheng Zhang; Zhi-Xin Guo; Ruizhi Luo; Alan Bensoussan; Paul A Komesaroff
Journal: J Altern Complement Med Date: 2009-07 Impact factor: 2.579

2. Traditional Chinese medicine tongue inspection: an examination of the inter- and intrapractitioner reliability for specific tongue characteristics.

Authors: Minah Kim; Deirdre Cobbin; Christopher Zaslawski
Journal: J Altern Complement Med Date: 2008-06 Impact factor: 2.579

3. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.

Authors: J Cohen
Journal: Psychol Bull Date: 1968-10 Impact factor: 17.737

4. Quantitative guidelines for communicable disease control programs.

Authors: K Krippendorff
Journal: Biometrics Date: 1978-03 Impact factor: 2.571

5. Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine.

Authors: Scott Mist; Cheryl Ritenbaugh; Mikel Aickin
Journal: J Altern Complement Med Date: 2009-07 Impact factor: 2.579

6. Improvement of agreement in TCM diagnosis among TCM practitioners for persons with the conventional diagnosis of rheumatoid arthritis: effect of training.

Authors: Grant G Zhang; Betsy Singh; Wenlin Lee; Barry Handwerger; Lixing Lao; Brian Berman
Journal: J Altern Complement Med Date: 2008-05 Impact factor: 2.579