
Declining accuracy in disease classification on health insurance claims: should we reconsider classification by principal diagnosis?

Etsuji Okamoto

Abstract

BACKGROUND: An ideal classification should have maximum intercategory variance and minimal intracategory variance. Health insurance claims typically include multiple diagnoses and are classified into different disease categories by choosing principal diagnoses. The accuracy of classification based on principal diagnoses was evaluated by comparing intercategory and intracategory variance of per-claim costs and the trend in accuracy was reviewed.
METHODS: Means and standard deviations of log-transformed per-claim costs were estimated from outpatient claims data from the National Health Insurance Medical Benefit Surveys of 1995 to 2007, a period during which only the ICD10 classification was applied. Intercategory and intracategory variances were calculated for each of 38 mutually exclusive disease categories and the percentage of intercategory variance to overall variance was calculated to assess the trend in accuracy of classification.
RESULTS: A declining trend in the percentage of intercategory variance was observed: from 19.5% in 1995 to 10% in 2007. This suggests that there was a decline in the accuracy of disease classification in discriminating per-claim costs for different disease categories. The declining trend temporarily reversed in 2002, when hospitals and clinics were directed to assign the principal diagnosis. However, this reversal was only temporary and the declining trend appears to be consistent.
CONCLUSIONS: Classification of health insurance claims based on principal diagnoses is becoming progressively less accurate in discriminating per-claim costs. Researchers who estimate disease-specific health care costs using health insurance claims must therefore proceed with caution.


Year:  2010        PMID: 20065616      PMCID: PMC3900816          DOI: 10.2188/jea.je20090044

Source DB:  PubMed          Journal:  J Epidemiol        ISSN: 0917-5040            Impact factor:   3.211


INTRODUCTION

Health insurance claims contain diagnostic information and are a valuable data source for economic and epidemiological studies. However, 2 problems arise when researchers use health insurance claims for epidemiological studies: the need to ensure (1) the accuracy of diagnoses and (2) the accuracy of disease classification. The former is challenging because health insurance claims are essentially financial documents, not medical records. The latter arises because principal diagnoses are chosen rather arbitrarily when coders are not properly trained. To bypass these difficulties, studies evaluating the economic effects of smoking,[1] walking,[2] and health promotional activities[3] have largely used per-capita health care cost without disease classification. Some studies estimating disease-specific health care costs for diseases such as asthma[4] and liver disease[5] have also used other data sources, including the Patient Survey (a one-day cross-sectional sampling survey conducted by the Japanese Ministry of Health, Labour & Welfare), to increase the accuracy of disease classification. Indeed, disease classification on health insurance claims was shown to be of questionable accuracy when compared with the Patient Survey, even for a well-defined disease category such as dialysis.[6] Health insurance claims are widely used for epidemiological studies outside Japan, and foreign researchers have validated the accuracy of diagnoses empirically, with generally positive results.
A Korean study reported 76% accuracy of acute myocardial infarction (AMI) diagnoses by matching claims with medical records.[7] A Taiwanese study reported 74.6% accuracy of diabetes diagnoses on the basis of a questionnaire survey of patients.[8] Researchers in the United States reported similar or higher accuracy: 94.1% positive predictive value (PPV) for AMI diagnoses,[9] 72.6% to 80.8% PPV for pneumonia,[10] and 76.2% sensitivity and 93.3% specificity for hypertension.[11] Some researchers went so far as to match cases with a cancer registry to validate diagnoses of malignancy.[12] However, when researchers use health insurance claims classified by principal diagnoses, the second problem, ie, the accuracy of classification, is more important than the accuracy of diagnoses per se. There may also be systematic biases in classification, because some diseases are more likely than others to be chosen as principal diagnoses.[13] The accuracy of diagnoses can only be validated empirically, by matching with a gold standard such as medical records, but the accuracy of disease classification can be evaluated statistically. If all claims in the same disease category had the same value, perfectly accurate classification would yield uniform claims within each category, ie, zero intracategory variance. In other words, accurate classification should maximize intercategory variance while minimizing intracategory variance. In this study, statistical analysis is used to evaluate the accuracy of classification by analyzing the per-claim costs of outpatient claims. Per-claim cost is the amount of money charged for medical treatment and is written on the bottom line of a health insurance claim. It is expressed in points, which can be converted into Japanese yen by multiplying by 10.

METHODS

Theory

Disease-specific means and variances can be estimated from published frequency tables, without access to microdata (which are not readily available), by using an optimization program such as Excel Solver under an assumed distribution. If a normal distribution is assumed, as is usually the case, the frequency tables must actually follow a normal distribution for the optimization program to yield good estimates. Per-claim costs of health insurance claims do not follow a normal distribution, as the skewed distributions in the frequency tables show; they follow a log-normal distribution.[14] Therefore, the ranges of the frequency tables were log-transformed so that the data would approximate a normal distribution. The goodness-of-fit of the log-normal distribution was evaluated with the Kolmogorov-Smirnov test.

Data source

The data source was the National Health Insurance Medical Benefit Survey (NHIMBS), a sampling survey of health insurance claims submitted in May of the survey year. Japan’s National Health Insurance covers the population that does not have regular employment, eg, the self-employed, retired, and part-time workers. The NHIMBS has been administered every year since 1955 by the Ministry of Health, Labour & Welfare (MHLW), Bureau of Health Insurance, Department of Investigation. The reports include 24 summary tables and 11 raw output tables, which are distributed by the Central Federation of National Health Insurance. Summary tables from 1998 to 2005 and both summary and raw output tables from 2005 to 2007 are available from the Portal Site of Official Statistics of Japan, maintained by the Statistics Bureau, Ministry of Internal Affairs and Communications, in collaboration with related ministries and agencies (http://www.e-stat.go.jp). The NHIMBS is a survey of all insurers (1818 municipal governments and 165 National Health Insurance societies as of March 2007). Health insurance claims are sampled randomly by each insurer at a specified sampling proportion. The sampling proportion is approximately 1/500 for regular and elderly beneficiaries. Until 2002, elderly beneficiaries were defined as those aged 70 years or older; the threshold was then raised gradually to 75 years by 2007. Elderly beneficiaries also include people aged 65 years or older with certain disabilities. The sampling proportion for retiree beneficiaries is 1/100. Because health insurance claims are administrative data, the population of health insurance claims can be determined from monthly administrative reports compiled by the Central Federation of National Health Insurance (www.kokuho.or.jp). The exact population and sample size of outpatient claims, as well as the number of beneficiaries from whom the data were derived, are shown in Table 1.
Thirteen years of data (1995–2007) were used because the same ICD10 classification (commonly referred to as the “119 classification”[15]) has been applied since 1995.
Table 1.

Data on beneficiaries and outpatient claims

Year | No. of beneficiaries^a (in thousands): Regular | Retiree | Elderly | Total | Outpatient claims, regular and elderly beneficiaries: Population (P)^a | Sample (S)^b | P/S | Outpatient claims, retiree beneficiaries: Population (P)^a | Sample (S)^b | P/S
1995 | 30 507 | 4153 | 8290 | 42 950 | 23 315 377 | 47 572 | 490.1 | 3 660 465 | 36 707 | 99.7
1996 | 30 319 | 4254 | 8831 | 43 404 | 24 472 909 | 49 297 | 496.4 | 3 840 657 | 36 303 | 105.8
1997 | 30 451 | 4373 | 9352 | 44 176 | 25 123 382 | 48 647 | 516.4 | 3 935 421 | 39 472 | 99.7
1998 | 30 155 | 4590 | 9921 | 44 666 | 26 074 183 | 52 711 | 494.7 | 4 144 267 | 41 694 | 99.4
1999 | 30 520 | 4786 | 10 542 | 45 848 | 27 134 887 | 55 406 | 489.7 | 4 315 181 | 43 426 | 99.4
2000 | 30 710 | 5140 | 11 190 | 47 040 | 29 160 510 | 59 429 | 490.7 | 4 741 168 | 47 450 | 99.9
2001 | 31 213 | 5343 | 12 396 | 48 952 | 30 485 879 | 62 331 | 489.1 | 4 949 316 | 49 243 | 100.5
2002 | 31 460 | 5531 | 12 567 | 49 558 | 31 434 551 | 63 893 | 492.0 | 5 028 551 | 50 398 | 99.8
2003 | 32 264 | 6047 | 12 591 | 50 902 | 32 250 061 | 65 420 | 493.0 | 5 463 517 | 54 600 | 100.1
2004 | 32 691 | 6795 | 12 204 | 51 690 | 31 511 976 | 64 424 | 489.1 | 6 035 495 | 60 264 | 100.2
2005 | 32 680 | 7500 | 11 730 | 51 910 | 32 192 705 | 65 289 | 493.1 | 6 949 882 | 69 212 | 100.4
2006 | 32 340 | 8190 | 11 270 | 51 800 | 32 516 241 | 65 900 | 493.4 | 7 878 017 | 78 171 | 100.8
2007 | 31 783 | 8822 | 10 819 | 51 424 | 32 368 390 | 65 436 | 494.7 | 8 727 709 | 86 610 | 100.8

a: from monthly administrative reports compiled by the Central Federation of National Health Insurance.

b: from National Health Insurance Medical Benefit Survey.

The representativeness of the data is believed to be satisfactory because the survey includes all insurers. However, some irregularities were observed for renal failure in 2000, as shown in Table 2. Data in the genitourinary disease category for 2000 were therefore adjusted on the basis of the 1999 and 2001 data.
Table 2.

Distribution of claims of renal failure (ICD10:N00–19) by per-claim cost (%)

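Year | Sample size (sampling proportion): Regular beneficiaries (1/500) | Elderly beneficiaries (1/500) | Retiree beneficiaries (1/100) | Per-claim cost (yen), % of claims: 1–500 | 500–1000 | 1000–2000 | 2000–3000 | 3000–5000 | 5000–10 000 | >10 000 | Arithmetic mean (yen)
1995 | 211 | 138 | 245 | 12.3 | 11.1 | 16.3 | 11.8 | 7.5 | 5.5 | 34.9 | 159 292
1996 | 211 | 165 | 211 | 14.1 | 9.1 | 17.0 | 10.5 | 7.7 | 5.3 | 35.6 | 163 785
1997 | 193 | 162 | 234 | 10.5 | 12.0 | 15.5 | 8.5 | 9.2 | 5.0 | 39.2 | 170 907
1998 | 217 | 185 | 267 | 10.5 | 9.5 | 19.8 | 9.2 | 7.9 | 5.7 | 36.7 | 168 703
1999 | 193 | 212 | 266 | 11.4 | 11.6 | 12.2 | 8.1 | 6.6 | 3.5 | 46.1 | 186 753
2000 | 219 | 247 | 287 | 13.8 | 10.3 | 17.8 | 8.6 | 7.5 | 27.0^a | 14.5^a | 94 008^a
2001 | 219 | 240 | 314 | 12.2 | 10.7 | 16.5 | 8.0 | 6.4 | 5.2 | 41.0 | 177 358
2002 | 198 | 255 | 270 | 13.7 | 9.4 | 14.6 | 5.7 | 6.6 | 3.4 | 46.5 | 189 810
2003 | 194 | 269 | 314 | 15.7 | 10.0 | 14.6 | 6.4 | 5.6 | 5.2 | 42.4 | 174 586
2004 | 263 | 229 | 362 | 11.1 | 11.1 | 14.1 | 7.3 | 6.7 | 4.0 | 45.7 | 174 838
2005 | 248 | 289 | 377 | 14.6 | 12.7 | 13.4 | 6.1 | 5.6 | 4.9 | 42.6 | 166 683
2006 | 241 | 251 | 399 | 14.4 | 13.6 | 14.7 | 5.2 | 4.1 | 4.1 | 43.8 | 173 836
2007 | 227 | 260 | 428 | 11.2 | 12.9 | 12.3 | 6.6 | 6.8 | 3.3 | 47.0 | 184 728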

a: irregular values.

Source: National Health Insurance Medical Benefit Survey.

Estimation of means and standard deviations of log-transformed per-claim costs

To determine whether an observed distribution follows a given theoretical distribution (such as a normal distribution), the Kolmogorov-Smirnov (KS) test is used. In this test, the maximum discrepancy between the 2 cumulative distributions, the KS value, is used as the test statistic. If the KS value is smaller than 1.63/√n, where n is the sample size, the observed distribution can be assumed to follow the theoretical distribution at the 1% significance level (P = 0.01).[16]

Because the NHIMBS provides only arithmetic means for per-claim costs, the means (m) and standard deviations (σ) of log-transformed per-claim costs were estimated from disease-specific frequency tables (Summary table 16-2) using Excel Solver, an add-in program for Microsoft Excel. For example, 25.1% of outpatient claims with a principal diagnosis of diabetes were in the range of 500 to 1000 yen per claim in 2006. This range was log-transformed to LN(500)–LN(1000), or 6.21 to 6.91 (LN, natural logarithm). If log-transformed per-claim costs follow a normal distribution, the proportion of claims in this range is expressed with Excel functions as follows (in NORMDIST, “TRUE” returns the cumulative distribution function and “FALSE” the probability density function):

NORMDIST(LN(1000), m, σ, TRUE) − NORMDIST(LN(500), m, σ, TRUE)   [1]

Frequency tables consist of 7 ranges (1–500, 500–1000, 1000–2000, 2000–3000, 3000–5000, 5000–10 000, and ≥10 000 yen per claim). Let R_k denote the cumulative proportion of claims in the frequency table up to the kth range (1 ≤ k ≤ 7), and let E_k denote the estimated cumulative proportion up to the log-transformed kth range, obtained with formula [1]. The KS value is then expressed as follows:

$$\mathrm{KS} = \max_{1 \le k \le 7} \left| R_k - E_k \right| \qquad [2]$$

Optimal m and σ were obtained for all disease categories and years by using Excel Solver to minimize the KS value in formula [2]. The square of σ, σ², gives the variance within a given disease category (hereafter, intracategory variance).
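The estimation step can be sketched in Python. This is a minimal, hypothetical re-implementation, not the author's workbook: a shrinking grid search stands in for Excel Solver, `math.erf` supplies the normal CDF in place of NORMDIST, and the published percentages are rescaled to sum to 100 before cumulating (an assumption, because rounded percentages do not always add up exactly). It uses the 1995 frequency table for all outpatient claims from Table 3 and assumes that n in the critical value is the regular-plus-elderly sample plus one fifth of the retiree sample from Table 1.

```python
import math

# Upper bounds (yen) of the first 6 per-claim cost ranges in the NHIMBS
# frequency tables; the 7th range (>10 000 yen) is open-ended.
UPPER_EDGES = [500, 1000, 2000, 3000, 5000, 10000]

# 1995, all outpatient claims: observed % of claims in each range (Table 3).
PCT_1995 = [24.0, 25.8, 25.8, 11.3, 8.3, 3.7, 1.2]


def norm_cdf(x, mu, sigma):
    """Normal CDF, standing in for Excel's NORMDIST(x, mu, sigma, TRUE)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_value(m, s, cum_obs, log_edges):
    """Formula [2]: largest |R_k - E_k| over the finite range boundaries."""
    return max(abs(r - norm_cdf(x, m, s)) for r, x in zip(cum_obs, log_edges))


def fit_lognormal_ks(percents, upper_edges=UPPER_EDGES, rounds=6, steps=40):
    """Search for (m, sigma) of log cost that minimise the KS value.

    Stand-in for Excel Solver: scan a grid, shrink the box around the
    best point, and repeat.
    """
    total = sum(percents)  # rescale so cumulative proportions end at 1
    cum_obs, running = [], 0.0
    for p in percents[:-1]:  # the open-ended last range always cumulates to 1
        running += p
        cum_obs.append(running / total)
    log_edges = [math.log(b) for b in upper_edges]

    best = (float("inf"), 0.0, 1.0)  # (KS, m, sigma)
    m_lo, m_hi, s_lo, s_hi = 4.0, 10.0, 0.2, 3.0
    for _ in range(rounds):
        for i in range(steps + 1):
            m = m_lo + (m_hi - m_lo) * i / steps
            for j in range(steps + 1):
                s = s_lo + (s_hi - s_lo) * j / steps
                d = ks_value(m, s, cum_obs, log_edges)
                if d < best[0]:
                    best = (d, m, s)
        _, bm, bs = best
        dm, ds = (m_hi - m_lo) / 8, (s_hi - s_lo) / 8
        m_lo, m_hi = bm - dm, bm + dm
        s_lo, s_hi = max(0.01, bs - ds), bs + ds
    ks, m, s = best
    return m, s, ks


m, s, ks = fit_lognormal_ks(PCT_1995)
# Assumed sample size: regular+elderly claims plus retiree claims / 5 (Table 1, 1995).
n = 47572 + 36707 / 5
critical = 1.63 / math.sqrt(n)  # KS critical value at alpha = 0.01
print(f"m = {m:.2f}, sigma = {s:.2f}, KS = {ks:.4f}, critical = {critical:.4f}")
```

With these inputs the fitted values land close to the optimal mean and SD reported for 1995 in Table 3; exact agreement is not expected, because Solver's stopping rule and the rounding of the published percentages differ from this search.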

Estimation of intercategory variances

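Let n, m, and σ denote the number of claims and the mean and standard deviation of log-transformed per-claim costs, respectively, of the entire sample, and let n_k, m_k, and σ_k denote those of the kth disease category. The relationship between the entire sample and the disease categories is expressed as follows:

$$n = \sum_k n_k, \qquad n m = \sum_k n_k m_k \qquad [3]$$

$$n \sigma^2 = \sum_k n_k \sigma_k^2 + \sum_k n_k \left(m_k - m\right)^2 \qquad [4]$$

Formula [4] signifies the following relationship: overall variance = intracategory variance + intercategory variance. Hence, intercategory variance was calculated using the second term on the right side of formula [4].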
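Formulas [3] and [4] can be checked numerically with a short Python sketch. The `Category` record and `decompose` function are hypothetical names introduced here, and the toy log-cost values are invented for illustration; they are not NHIMBS data. The check compares the decomposed total with n·σ² computed directly from the pooled values, using population (divide-by-n) standard deviations as the formula requires.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Category:
    n: int        # number of claims in the category (n_k)
    mean: float   # mean of log-transformed per-claim costs (m_k)
    sd: float     # population SD of log-transformed per-claim costs (sigma_k)


def decompose(categories):
    """Split n * sigma^2 of the pooled sample into its two parts (formulas [3], [4])."""
    n = sum(c.n for c in categories)
    m = sum(c.n * c.mean for c in categories) / n               # pooled mean, formula [3]
    intra = sum(c.n * c.sd ** 2 for c in categories)            # 1st term of formula [4]
    inter = sum(c.n * (c.mean - m) ** 2 for c in categories)    # 2nd term of formula [4]
    overall = intra + inter                                     # equals n * sigma^2
    return {"mean": m, "intra": intra, "inter": inter,
            "overall": overall, "share_inter": inter / overall}


# Invented log-cost values for two categories (illustration only).
groups = {"A": [6.0, 6.5, 7.0], "B": [7.5, 8.0, 8.5, 9.0]}
cats = [Category(len(v), statistics.fmean(v), statistics.pstdev(v)) for v in groups.values()]
result = decompose(cats)

pooled = [x for v in groups.values() for x in v]
grand = statistics.fmean(pooled)
direct = sum((x - grand) ** 2 for x in pooled)  # n * sigma^2 computed from raw values
print(result, direct)
```

For this toy input the intracategory part is 1.75 and the intercategory part 5.25, which together reproduce the directly computed total of 7.0.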

Extrapolation of sample size

The number of sampled claims was obtained from the raw output tables of the NHIMBS (Table 7-1 for regular, 7-2 for elderly, and 7-3 for retiree beneficiaries). However, the numbers of claims from these 3 beneficiary categories cannot simply be summed because the sampling proportions differ (1/500 for regular and elderly beneficiaries and 1/100 for retiree beneficiaries). Hence, the number of claims for retiree beneficiaries was divided by 5 to adjust for the difference in sampling proportion.
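The adjustment amounts to rescaling each stratum to the 1/500 sampling proportion. A minimal Python sketch follows; `effective_claims` is a hypothetical helper that generalizes the divide-by-5 rule to any sampling proportion, and the 1995 sample sizes are taken from Table 1.

```python
from fractions import Fraction

# Sampling proportion of regular and elderly beneficiaries, used as the common scale.
REFERENCE = Fraction(1, 500)


def effective_claims(strata):
    """Sum sampled claim counts after rescaling each stratum to a 1/500 sample.

    `strata` is a list of (sampled_count, sampling_proportion) pairs. A stratum
    sampled at proportion p is weighted by (1/500) / p, so a 1/100 sample
    (retiree beneficiaries) is divided by 5.
    """
    return sum((Fraction(count) * REFERENCE / p for count, p in strata), Fraction(0))


# 1995 samples from Table 1: 47 572 regular+elderly claims, 36 707 retiree claims.
n_1995 = effective_claims([(47572, Fraction(1, 500)), (36707, Fraction(1, 100))])
print(float(n_1995))  # 54913.4
```

Exact fractions are used only to keep the arithmetic transparent; ordinary floats give the same result here.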

Calculation of means and variance of residual subcategories

The NHIMBS presents disease-specific data on all 19 major disease categories in ICD10 (I–XIX), plus some selected subcategories. For example, the NHIMBS presents data on ophthalmic diseases (VII), as well as on a subcategory, cataract (H25–26). From these data, a residual subcategory, “other ophthalmic diseases (H00–59 minus H25–26)”, must be derived to create a mutually exclusive disease classification. The means and variances of residual subcategories can be calculated using formula [4]. A total of 38 mutually exclusive disease categories were thus created. A subcategory, “renal failure”, was merged into a major category, “genitourinary diseases”, because of irregularities in the data.
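Deriving a residual subcategory reverses the pooling in formula [4]: given the figures for a whole major category and for one of its subcategories, the remainder's size, mean, and variance follow by subtraction. The Python sketch below is illustrative only; `residual_category` is a hypothetical helper, and the check uses invented log-cost values rather than NHIMBS data.

```python
import math
import statistics


def residual_category(n, m, sd, n_sub, m_sub, sd_sub):
    """Return (n, mean, SD) of 'major category minus subcategory'.

    Inputs are claim counts, means, and population SDs of log-transformed
    per-claim costs for the whole major category and for one subcategory.
    """
    n_res = n - n_sub
    m_res = (n * m - n_sub * m_sub) / n_res                 # pooled-mean relation
    # Formula [4] for two parts, solved for the residual part's sum of squares.
    ss_res = (n * sd ** 2
              - n_sub * sd_sub ** 2
              - n_sub * (m_sub - m) ** 2
              - n_res * (m_res - m) ** 2)
    return n_res, m_res, math.sqrt(ss_res / n_res)


# Invented log costs: a subcategory and the rest of its major category.
sub = [6.0, 6.4, 6.8]
rest = [6.2, 7.0, 7.5, 8.1]
major = sub + rest

n_res, m_res, sd_res = residual_category(
    len(major), statistics.fmean(major), statistics.pstdev(major),
    len(sub), statistics.fmean(sub), statistics.pstdev(sub),
)
print(n_res, round(m_res, 4), round(sd_res, 4))
```

The recovered mean and SD match those computed directly from `rest`, which is the property the residual categories rely on.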

RESULTS

Table 3 illustrates how the optimal m and σ were obtained. The left frequency table presents the observed distribution of per-claim costs, and the right frequency table presents the theoretical distribution obtained when per-claim costs are log-transformed and assumed to follow a normal distribution with the optimal m and σ that minimize the KS value.
Table 3.

Optimal means and standard deviations of log-transformed per-claim costs of outpatient claims

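Year | Per-claim cost from NHIMBS, frequency table (%): 1–500 | 500–1000 | 1000–2000 | 2000–3000 | 3000–5000 | 5000–10 000 | >10 000 | LN(per-claim cost) optimized to minimize KS values, frequency table (%): 0–6.21 | 6.21–6.91 | 6.91–7.60 | 7.60–8.01 | 8.01–8.52 | 8.52–9.21 | >9.21 | KS value | Optimal mean | Optimal SD
1995 | 24.0 | 25.8 | 25.8 | 11.3 | 8.3 | 3.7 | 1.2 | 24.0 | 26.0 | 26.0 | 10.9 | 8.1 | 4.1 | 1.0 | 0.30 | 6.91 | 0.98
1996 | 22.5 | 26.0 | 26.4 | 12.0 | 8.3 | 3.6 | 1.2 | 22.2 | 26.2 | 26.9 | 11.3 | 8.4 | 4.1 | 0.9 | 0.34 | 6.95 | 0.96
1997 | 22.2 | 25.5 | 27.0 | 12.2 | 8.2 | 3.5 | 1.3 | 22.1 | 26.1 | 26.9 | 11.3 | 8.4 | 4.2 | 0.9 | 0.47 | 6.95 | 0.96
1998 | 24.3 | 26.5 | 27.8 | 10.6 | 6.6 | 2.9 | 1.3 | 24.0 | 27.3 | 26.6 | 10.6 | 7.4 | 3.4 | 0.7 | 0.68 | 6.88 | 0.94
1999 | 25.5 | 26.8 | 27.4 | 9.8 | 6.3 | 2.8 | 1.3 | 25.6 | 27.5 | 26.0 | 10.1 | 7.0 | 3.2 | 0.6 | 0.70 | 6.84 | 0.95
2000 | 25.3 | 26.3 | 27.8 | 10.3 | 6.6 | 3.1 | 0.7 | 24.7 | 27.6 | 26.5 | 10.3 | 7.1 | 3.2 | 0.6 | 0.64 | 6.85 | 0.94
2001 | 26.2 | 26.3 | 27.0 | 10.2 | 6.5 | 2.8 | 1.1 | 25.6 | 27.4 | 26.0 | 10.1 | 7.1 | 3.2 | 0.6 | 0.56 | 6.84 | 0.95
2002 | 28.0 | 26.5 | 27.1 | 9.3 | 5.6 | 2.4 | 1.0 | 27.2 | 28.1 | 25.5 | 9.6 | 6.4 | 2.7 | 0.5 | 0.78 | 6.78 | 0.94
2003 | 29.5 | 27.3 | 25.6 | 8.9 | 5.3 | 2.3 | 1.0 | 29.7 | 27.7 | 24.4 | 9.0 | 6.0 | 2.6 | 0.5 | 0.61 | 6.73 | 0.96
2004 | 30.9 | 27.3 | 24.8 | 8.6 | 5.0 | 2.2 | 1.1 | 30.8 | 27.6 | 23.9 | 8.8 | 5.9 | 2.6 | 0.5 | 0.71 | 6.70 | 0.97
2005 | 31.1 | 26.9 | 24.6 | 8.8 | 5.2 | 2.3 | 1.1 | 31.3 | 27.2 | 23.5 | 8.7 | 5.9 | 2.7 | 0.5 | 0.56 | 6.69 | 0.99
2006 | 32.0 | 26.8 | 24.4 | 8.5 | 5.0 | 2.2 | 1.1 | 31.8 | 27.4 | 23.4 | 8.6 | 5.8 | 2.6 | 0.5 | 0.60 | 6.68 | 0.98
2007 | 32.3 | 26.8 | 23.8 | 8.6 | 5.0 | 2.3 | 1.2 | 31.8 | 27.1 | 23.3 | 8.6 | 5.9 | 2.7 | 0.5 | 0.66 | 6.68 | 0.99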

NHIMBS: National Health Insurance Medical Benefit Survey.

KS value: Kolmogorov-Smirnov value; SD: standard deviation.

Table 4 shows the results of the KS test for goodness-of-fit. For all claims combined, per-claim costs were compatible with a log-normal distribution in 5 of 13 years (1995, 1996, 1997, 2001, and 2005). At the disease-specific level, most disease categories were compatible with log-normal distributions; notably, all disease categories were compatible in 1995 and 1996. Hypertension had the largest number of noncompatible years (11 of 13 years), reflecting its large sample size, followed by genitourinary diseases (9 of 13 years), a category that includes dialysis, which has an exceptionally high per-claim cost. Overall compatibility improved when hypertension and/or genitourinary diseases were excluded (shown for reference in Table 4). Without these 2 categories, per-claim costs were compatible with a log-normal distribution in all 13 years.
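The year-by-year verdicts in the ALL row of Table 4 can be recomputed from published figures alone. This Python sketch compares each KS value in Table 3 (given in percentage points) with the critical value 1.63/√n at α = 0.01. It assumes, since the text does not state it explicitly, that n is the regular-plus-elderly sample plus one fifth of the retiree sample from Table 1; `rejects_lognormal` is a hypothetical helper name.

```python
import math

# Sample sizes from Table 1: (regular+elderly sample, retiree sample).
SAMPLES = {
    1995: (47572, 36707), 1996: (49297, 36303), 1997: (48647, 39472),
    1998: (52711, 41694), 1999: (55406, 43426), 2000: (59429, 47450),
    2001: (62331, 49243), 2002: (63893, 50398), 2003: (65420, 54600),
    2004: (64424, 60264), 2005: (65289, 69212), 2006: (65900, 78171),
    2007: (65436, 86610),
}
# KS values (percentage points) for all outpatient claims, from Table 3.
KS_PERCENT = {
    1995: 0.30, 1996: 0.34, 1997: 0.47, 1998: 0.68, 1999: 0.70, 2000: 0.64,
    2001: 0.56, 2002: 0.78, 2003: 0.61, 2004: 0.71, 2005: 0.56, 2006: 0.60,
    2007: 0.66,
}


def rejects_lognormal(year):
    """True if the KS value exceeds the alpha = 0.01 critical value for that year."""
    reg_eld, retiree = SAMPLES[year]
    n = reg_eld + retiree / 5            # assumed n: retiree claims deflated by 5
    critical = 1.63 / math.sqrt(n)
    return KS_PERCENT[year] / 100 > critical


significant = sorted(y for y in SAMPLES if rejects_lognormal(y))
print(significant)
```

With these inputs the sketch flags 1998, 1999, 2000, 2002, 2003, 2004, 2006, and 2007, the same 8 years marked in the ALL row of Table 4, which is consistent with that choice of n.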
Table 4.

Results of Kolmogorov-Smirnov (KS) test for goodness-of-fit (α = 0.01)

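Diagnostic category | ICD10 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | No. of *
ALL | ALL |  |  |  | * | * | * |  | * | * | * |  | * | * | 8
ALL except hypertension^b |  |  |  |  |  |  |  |  |  |  | * |  | * | * | 3
ALL except genitourinary diseases^b |  |  |  |  |  |  | * | * | * | * |  |  |  |  | 4
ALL except hypertension and genitourinary diseases^b |  |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
gastrointestinal infection | A00–09 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
tuberculosis | A15–19 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
sexually transmitted disease | A50–64 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
other infectious diseases | rest of A00–B99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
malignant tumor | C00–97 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
nonmalignant tumor | D00–48 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
hematopoietic diseases | D50–89 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
diabetes | E10–14 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
other endocrine diseases | rest of E00–90 |  |  |  |  |  | * |  | * |  |  |  |  |  | 2
schizophrenia | F20–29 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
other psychiatric diseases | rest of F00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
neurological diseases | G00–99 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
cataract | H25–26 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
other ophthalmic diseases | rest of H00–59 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
diseases of the ear | H60–95 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
hypertension | I10–15 |  |  | * | * | * | * | * | * | * | * | * | * | * | 11
ischemic heart diseases | I20–25 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
cerebrovascular diseases | I60–69 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
other circulatory diseases | rest of I00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
acute upper respiratory infection | J00–06 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
pneumonia | J12–18 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
acute bronchitis | J20–21 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
chronic sinusitis | J32 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
chronic obstructive pulmonary diseases | J40–44 |  |  |  |  |  |  |  |  |  |  |  | * | * | 2
asthma | J45–46 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
other respiratory diseases | rest of J00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
stomach and duodenal diseases | K25–29 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
liver diseases | K70–77 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
other gastrointestinal diseases | rest of K00–93 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
skin diseases | L00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
musculoskeletal diseases | M00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
genitourinary diseases | N00–99 |  |  |  | * | * |  | * | * | * | * | * | * | * | 9
pregnancy and delivery | O00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
perinatal conditions | P00–96 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
congenital anomalies | Q00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
unspecified conditions | R00–99 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
bone fractures | ^a |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
other injuries | rest of S00–T98 |  |  |  |  |  |  |  |  |  |  |  |  |  | 0
No. of * |  | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 9 | 9 |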

*: significantly different from a normal distribution (of log-transformed per-claim costs) at significance level 0.01.

a: S02, S12, S22, S32, S42, S52, S62, S72, S82, S92, T02, T08, T10, T12.

b: for reference.

Tables 5 and 6 show the exponentiated m and σ (exp(m) and exp(σ)), ie, the geometric means and standard ratios, for all disease categories and years. Geometric means of per-claim costs generally decreased (from 1002 yen in 1995 to 799 yen in 2007 for all claims), which may reflect a reduction in drug costs due to the increasing separation of prescribing and dispensing. In contrast, the standard ratio remained nearly constant, between 2.55 and 2.70, throughout the study period. Notably, the standard ratio of per-claim costs is close to Napier's constant (e ≈ 2.718). If the geometric mean of per-claim costs is 1000 yen and the standard ratio is 2.7, about 68% of claims can be expected to fall within the range of 1000/2.7 to 1000 × 2.7, or 370 to 2700 yen.
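The rule of thumb follows from the log-normal model: if log per-claim cost is normal with mean m and SD σ, then exp(m − σ) to exp(m + σ) covers about 68.3% of claims, and these bounds are simply the geometric mean divided and multiplied by the standard ratio. A minimal Python sketch (the helper name `one_sd_interval` is hypothetical) reproduces the 370 to 2700 yen example.

```python
import math


def one_sd_interval(geo_mean, std_ratio, z=1.0):
    """Interval geo_mean / std_ratio**z .. geo_mean * std_ratio**z and the share of
    claims it covers if log per-claim cost is normally distributed."""
    lo = geo_mean / std_ratio ** z
    hi = geo_mean * std_ratio ** z
    coverage = math.erf(z / math.sqrt(2.0))  # P(|Z| < z) for a standard normal Z
    return lo, hi, coverage


lo, hi, coverage = one_sd_interval(1000.0, 2.7)
print(f"{lo:.0f} to {hi:.0f} yen, covering about {coverage:.1%} of claims")
```

Setting `z=2.0` gives the analogous interval covering about 95% of claims, under the same log-normal assumption.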
Table 5.

Exponentiated optimal means of log-transformed per-claim costs (= geometric means; in Japanese yen)

Diagnostic category | ICD10 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007
ALL | ALL | 1002.3 | 1041.4 | 1044.9 | 969.8 | 930.7 | 948.4 | 930.8 | 882.9 | 835.6 | 813.6 | 807.9 | 795.7 | 798.8
gastrointestinal infection | A00–09 | 681.4 | 689.8 | 715.5 | 668.7 | 620.5 | 662.7 | 665.6 | 622.8 | 695.9 | 631.6 | 630.2 | 577.1 | 614.4
tuberculosis | A15–19 | 1465.7 | 1798.0 | 1397.2 | 1465.6 | 1358.4 | 1403.6 | 1220.1 | 1295.5 | 1235.8 | 833.8 | 1306.1 | 1103.5 | 1248.9
sexually transmitted disease | A50–64 | 859.6 | 1086.5 | 914.6 | 748.1 | 866.7 | 1242.9 | 1037.7 | 1043.9 | 866.1 | 806.2 | 898.4 | 802.6 | 769.3
other infectious diseases | rest of A00–B99 | 739.7 | 756.3 | 844.2 | 820.5 | 823.5 | 756.3 | 763.7 | 735.9 | 696.3 | 674.7 | 710.0 | 709.9 | 699.5
malignant tumor | C00–97 | 2011.7 | 1917.1 | 1871.5 | 1620.5 | 1615.8 | 1454.0 | 1458.8 | 1462.0 | 1481.8 | 1431.5 | 1497.1 | 1382.0 | 1538.5
nonmalignant tumor | D00–48 | 1271.5 | 1313.7 | 1287.6 | 1242.0 | 1100.6 | 1091.9 | 1083.2 | 996.5 | 1205.6 | 1008.0 | 992.4 | 921.4 | 952.3
hematopoietic diseases | D50–89 | 998.7 | 936.0 | 1069.5 | 959.7 | 815.6 | 968.4 | 870.4 | 876.3 | 837.1 | 743.8 | 872.5 | 808.3 | 865.7
diabetes | E10–14 | 1594.9 | 1583.5 | 1604.8 | 1448.8 | 1388.4 | 1424.3 | 1406.5 | 1372.5 | 1285.7 | 1264.8 | 1237.0 | 1285.1 | 1276.0
other endocrine diseases | rest of E00–90 | 1260.1 | 1406.5 | 1301.9 | 1177.8 | 1107.4 | 1104.8 | 1106.7 | 1072.2 | 1031.6 | 958.7 | 947.7 | 921.4 | 914.5
schizophrenia | F20–29 | 1338.1 | 1307.8 | 1237.5 | 1248.4 | 1205.9 | 1151.4 | 1256.1 | 1186.7 | 1160.6 | 1044.6 | 1033.1 | 1022.7 | 948.0
other psychiatric diseases | rest of F00–99 | 1000.3 | 1063.3 | 1033.8 | 998.7 | 925.6 | 953.4 | 892.0 | 916.5 | 891.4 | 828.2 | 795.7 | 737.1 | 700.5
neurological diseases | G00–99 | 1017.4 | 1036.7 | 1077.2 | 945.5 | 872.7 | 923.5 | 925.2 | 756.4 | 779.9 | 752.3 | 777.3 | 759.4 | 733.4
cataract | H25–26 | 535.5 | 530.5 | 544.6 | 519.2 | 524.3 | 537.8 | 533.4 | 495.8 | 500.6 | 469.0 | 477.7 | 517.7 | 520.2
other ophthalmic diseases | rest of H00–59 | 574.7 | 602.4 | 588.5 | 594.1 | 594.8 | 586.7 | 585.1 | 555.7 | 568.3 | 551.5 | 563.0 | 537.8 | 528.9
diseases of the ear | H60–95 | 729.1 | 787.8 | 802.7 | 720.5 | 731.2 | 729.3 | 706.4 | 678.5 | 643.3 | 638.8 | 621.3 | 630.3 | 594.0
hypertension | I10–15 | 1308.8 | 1333.7 | 1359.9 | 1210.2 | 1184.2 | 1240.0 | 1236.8 | 1169.6 | 1065.2 | 989.7 | 986.8 | 975.4 | 947.0
ischemic heart diseases | I20–25 | 1517.1 | 1591.5 | 1556.4 | 1418.1 | 1327.3 | 1347.6 | 1274.3 | 1172.7 | 1077.9 | 1011.6 | 1024.4 | 1030.7 | 986.7
cerebrovascular diseases | I60–69 | 1711.2 | 1751.3 | 1796.6 | 1499.3 | 1334.6 | 1281.4 | 1166.9 | 1102.0 | 1009.7 | 936.3 | 950.4 | 961.7 | 902.3
other circulatory diseases | rest of I00–99 | 1296.5 | 1336.1 | 1289.6 | 1203.0 | 1165.3 | 1173.3 | 1156.1 | 1000.4 | 939.9 | 904.0 | 906.8 | 875.6 | 905.0
acute upper respiratory infection | J00–06 | 550.1 | 586.0 | 600.4 | 598.0 | 625.7 | 605.9 | 598.4 | 577.7 | 565.1 | 589.0 | 563.1 | 559.5 | 556.0
pneumonia | J12–18 | 1506.7 | 1941.8 | 1386.2 | 1243.8 | 1537.7 | 1488.0 | 1213.8 | 1363.6 | 1161.1 | 1194.6 | 1301.2 | 1295.7 | 1407.8
acute bronchitis | J20–21 | 691.0 | 740.3 | 771.2 | 714.9 | 720.4 | 708.9 | 703.7 | 633.0 | 660.2 | 669.0 | 645.8 | 608.9 | 614.6
chronic sinusitis | J32 | 729.9 | 774.3 | 795.0 | 765.2 | 741.0 | 741.9 | 685.5 | 700.7 | 637.4 | 601.4 | 658.1 | 587.9 | 603.6
chronic obstructive pulmonary diseases | J40–44 | 859.2 | 965.3 | 1051.4 | 984.8 | 921.0 | 917.1 | 907.8 | 951.7 | 829.1 | 877.7 | 751.8 | 756.6 | 758.1
asthma | J45–46 | 1105.8 | 1124.6 | 1104.3 | 1001.5 | 1020.9 | 969.2 | 975.3 | 853.1 | 830.6 | 865.4 | 810.9 | 761.4 | 778.7
other respiratory diseases | rest of J00–99 | 660.9 | 735.6 | 704.0 | 668.5 | 626.7 | 632.9 | 614.2 | 605.8 | 563.2 | 600.0 | 533.5 | 605.8 | 596.5
stomach and duodenal diseases | K25–29 | 1281.0 | 1263.6 | 1244.4 | 1176.1 | 1105.5 | 1127.7 | 1104.5 | 1052.7 | 969.6 | 937.6 | 955.2 | 942.9 | 915.9
liver diseases | K70–77 | 1573.4 | 1618.0 | 1526.6 | 1417.7 | 1398.7 | 1400.2 | 1384.9 | 1375.3 | 1425.5 | 1282.8 | 1240.5 | 1272.4 | 1252.9
other gastrointestinal diseases | rest of K00–93 | 956.5 | 999.9 | 1010.9 | 898.7 | 857.9 | 882.7 | 895.3 | 827.0 | 723.5 | 689.4 | 697.9 | 718.0 | 704.2
skin diseases | L00–99 | 488.3 | 506.8 | 506.5 | 513.2 | 489.8 | 486.1 | 474.5 | 431.2 | 452.0 | 411.7 | 422.7 | 425.1 | 391.1
musculoskeletal diseases | M00–99 | 1195.7 | 1184.8 | 1203.1 | 1105.4 | 1059.4 | 1077.1 | 1044.4 | 937.9 | 934.6 | 899.8 | 915.8 | 883.3 | 885.4
genitourinary diseases | N00–99 | 1180.3 | 1217.2 | 1212.8 | 1121.6 | 1071.2 | 1032.3 | 1001.4 | 947.9 | 884.4 | 833.8 | 845.4 | 859.0 | 832.2
pregnancy and delivery | O00–99 | 859.0 | 963.1 | 980.9 | 923.5 | 927.2 | 992.5 | 850.1 | 793.2 | 861.5 | 831.3 | 958.3 | 857.2 | 947.4
perinatal conditions | P00–96 | 728.0 | 585.9 | 578.5 | 564.8 | 887.0 | 689.5 | 636.7 | 615.1 | 687.1 | 525.5 | 767.3 | 955.7 | 931.1
congenital anomalies | Q00–99 | 744.0 | 808.0 | 856.0 | 695.4 | 823.3 | 676.6 | 693.1 | 831.0 | 578.9 | 701.4 | 650.9 | 632.3 | 594.4
unspecified conditions | R00–99 | 780.8 | 715.0 | 731.0 | 746.6 | 723.9 | 745.5 | 733.8 | 635.4 | 631.1 | 641.2 | 652.1 | 698.1 | 703.3
bone fractures | ^a | 1219.3 | 1272.7 | 1250.0 | 1142.5 | 1247.7 | 1218.3 | 1115.7 | 1096.1 | 1096.3 | 979.9 | 1111.1 | 1108.7 | 1070.6
other injuries | rest of S00–T98 | 795.7 | 855.4 | 884.0 | 865.3 | 822.4 | 875.0 | 839.8 | 804.1 | 784.1 | 768.8 | 803.3 | 778.4 | 807.4

a: S02, S12, S22, S32, S42, S52, S62, S72, S82, S92, T02, T08, T10, T12.

Table 6.

Exponentiated optimal standard deviations of log-transformed per-claim costs (= standard ratio)

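Diagnostic category | ICD10 | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007
ALL | ALL | 2.67 | 2.60 | 2.61 | 2.56 | 2.58 | 2.55 | 2.58 | 2.56 | 2.62 | 2.64 | 2.68 | 2.67 | 2.70
gastrointestinal infection | A00–09 | 2.71 | 2.28 | 2.39 | 2.47 | 2.29 | 2.29 | 2.29 | 2.17 | 2.32 | 2.06 | 2.41 | 2.18 | 2.33
tuberculosis | A15–19 | 2.46 | 2.59 | 2.66 | 2.58 | 2.86 | 2.71 | 2.74 | 3.39 | 3.62 | 3.44 | 2.81 | 3.07 | 3.59
sexually transmitted disease | A50–64 | 2.31 | 2.99 | 2.57 | 2.28 | 2.92 | 2.22 | 1.94 | 1.95 | 1.98 | 2.13 | 2.00 | 1.94 | 2.33
other infectious diseases | rest of A00–B99 | 3.06 | 2.98 | 2.82 | 2.84 | 2.71 | 2.81 | 2.73 | 2.80 | 2.79 | 2.92 | 2.77 | 2.70 | 2.79
malignant tumor | C00–97 | 3.34 | 3.39 | 3.19 | 3.68 | 3.61 | 3.25 | 3.51 | 3.91 | 4.02 | 4.12 | 3.80 | 3.94 | 4.02
nonmalignant tumor | D00–48 | 3.03 | 3.21 | 3.22 | 3.15 | 3.35 | 3.06 | 3.20 | 3.22 | 3.06 | 3.24 | 3.18 | 2.92 | 3.38
hematopoietic diseases | D50–89 | 2.47 | 3.05 | 2.87 | 2.61 | 2.77 | 2.75 | 2.63 | 2.64 | 2.38 | 2.93 | 2.74 | 2.53 | 2.66
diabetes | E10–14 | 2.38 | 2.29 | 2.32 | 2.30 | 2.38 | 2.26 | 2.29 | 2.26 | 2.30 | 2.26 | 2.28 | 2.20 | 2.21
other endocrine diseases | rest of E00–90 | 2.29 | 2.29 | 2.23 | 2.21 | 2.23 | 2.12 | 2.21 | 2.19 | 2.29 | 2.24 | 2.28 | 2.40 | 2.29
schizophrenia | F20–29 | 2.33 | 2.28 | 2.20 | 2.26 | 2.28 | 2.23 | 2.37 | 2.56 | 2.45 | 2.63 | 2.74 | 2.91 | 2.98
other psychiatric diseases | rest of F00–99 | 2.30 | 2.31 | 2.21 | 2.34 | 2.23 | 2.17 | 2.28 | 2.32 | 2.31 | 2.27 | 2.44 | 2.48 | 2.54
neurological diseases | G00–99 | 2.87 | 2.97 | 3.03 | 2.79 | 2.95 | 2.82 | 3.09 | 3.21 | 3.09 | 3.17 | 3.42 | 3.20 | 3.39
cataract | H25–26 | 2.13 | 2.15 | 2.03 | 2.05 | 2.04 | 2.05 | 2.02 | 1.97 | 1.99 | 2.07 | 2.12 | 1.95 | 1.97
other ophthalmic diseases | rest of H00–59 | 1.97 | 1.87 | 1.95 | 1.87 | 1.79 | 1.87 | 1.86 | 1.90 | 1.84 | 1.86 | 1.85 | 1.91 | 1.93
diseases of the ear | H60–95 | 2.50 | 2.65 | 2.50 | 2.64 | 2.53 | 2.44 | 2.34 | 2.49 | 2.45 | 2.54 | 2.51 | 2.47 | 2.47
hypertension | I10–15 | 2.14 | 2.10 | 2.09 | 2.05 | 2.08 | 2.08 | 2.11 | 2.14 | 2.19 | 2.25 | 2.32 | 2.35 | 2.36
ischemic heart diseases | I20–25 | 2.37 | 2.36 | 2.31 | 2.15 | 2.36 | 2.35 | 2.55 | 2.49 | 2.69 | 2.59 | 2.80 | 2.82 | 2.75
cerebrovascular diseases | I60–69 | 2.51 | 2.37 | 2.44 | 2.54 | 2.63 | 2.70 | 2.70 | 2.87 | 3.04 | 3.15 | 3.14 | 3.04 | 3.20
other circulatory diseases | rest of I00–99 | 2.60 | 2.61 | 2.53 | 2.67 | 2.57 | 2.59 | 2.69 | 2.64 | 2.79 | 2.78 | 2.78 | 2.91 | 2.88
acute upper respiratory infection | J00–06 | 2.01 | 2.03 | 2.06 | 2.06 | 2.02 | 2.07 | 2.04 | 1.94 | 1.99 | 1.96 | 1.94 | 2.04 | 2.00
pneumonia | J12–18 | 2.73 | 2.58 | 2.49 | 2.85 | 2.94 | 2.58 | 2.70 | 2.83 | 2.92 | 2.72 | 2.97 | 2.73 | 2.71
acute bronchitis | J20–21 | 2.04 | 2.07 | 2.14 | 2.00 | 1.94 | 1.94 | 2.01 | 1.95 | 1.92 | 1.89 | 1.96 | 2.07 | 2.09
chronic sinusitis | J32 | 2.48 | 2.38 | 2.47 | 2.30 | 2.55 | 2.39 | 2.51 | 2.30 | 2.33 | 2.45 | 2.45 | 2.41 | 2.48
chronic obstructive pulmonary diseases | J40–44 | 2.70 | 2.46 | 2.77 | 2.73 | 2.79 | 2.33 | 2.84 | 2.61 | 2.86 | 3.27 | 3.22 | 3.24 | 3.28
asthma | J45–46 | 2.51 | 2.57 | 2.52 | 2.52 | 2.50 | 2.50 | 2.55 | 2.60 | 2.56 | 2.54 | 2.71 | 2.67 | 2.71
other respiratory diseases | rest of J00–99 | 2.82 | 2.44 | 2.52 | 2.55 | 2.48 | 2.57 | 2.88 | 3.11 | 3.12 | 3.02 | 3.08 | 2.81 | 2.88
stomach and duodenal diseases | K25–29 | 2.36 | 2.41 | 2.33 | 2.39 | 2.42 | 2.39 | 2.45 | 2.40 | 2.45 | 2.46 | 2.59 | 2.59 | 2.65
liver diseases | K70–77 | 2.45 | 2.43 | 2.40 | 2.39 | 2.45 | 2.51 | 2.41 | 2.35 | 2.56 | 2.53 | 2.73 | 2.48 | 2.46
other gastrointestinal diseases | rest of K00–93 | 3.07 | 2.92 | 2.93 | 3.01 | 2.98 | 2.90 | 3.09 | 3.15 | 3.09 | 3.32 | 3.08 | 3.34 | 3.30
skin diseases | L00–99 | 2.71 | 2.54 | 2.48 | 2.32 | 2.32 | 2.31 | 2.39 | 2.32 | 2.25 | 2.33 | 2.39 | 2.33 | 2.43
musculoskeletal diseases | M00–99 | 2.53 | 2.51 | 2.48 | 2.49 | 2.52 | 2.55 | 2.53 | 2.51 | 2.42 | 2.47 | 2.45 | 2.50 | 2.50
genitourinary diseases | N00–99 | 3.46 | 3.52 | 3.56 | 3.59 | 3.92 | 3.89 | 3.86 | 3.79 | 3.84 | 4.55 | 4.50 | 3.87 | 4.55
pregnancy and delivery | O00–99 | 1.97 | 2.14 | 2.04 | 2.00 | 2.08 | 2.06 | 2.09 | 2.44 | 2.20 | 2.08 | 2.14 | 2.19 | 2.04
perinatal conditions | P00–96 | 2.81 | 3.72 | 2.72 | 1.90 | 2.29 | 2.40 | 1.97 | 3.04 | 1.78 | 2.40 | 2.90 | 2.43 | 2.35
congenital anomalies | Q00–99 | 2.63 | 3.54 | 2.73 | 3.24 | 4.23 | 3.27 | 3.10 | 2.67 | 3.70 | 2.68 | 3.20 | 2.24 | 3.60
unspecified conditions | R00–99 | 2.74 | 2.97 | 2.97 | 2.70 | 2.72 | 2.73 | 2.79 | 2.58 | 2.93 | 2.88 | 2.87 | 2.79 | 2.76
bone fractures | ^a | 2.74 | 2.74 | 2.60 | 2.66 | 2.71 | 2.61 | 2.96 | 2.68 | 2.58 | 2.71 | 2.70 | 2.63 | 2.81
other injuries | rest of S00–T98 | 2.38 | 2.31 | 2.28 | 2.17 | 2.32 | 2.33 | 2.47 | 2.29 | 2.21 | 2.31 | 2.27 | 2.33 | 2.39

a: S02, S12, S22, S32, S42, S52, S62, S72, S82, S92, T02, T08, T10, T12.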

Table 7 shows the overall decline in intercategory variance relative to overall variance: in 1995, intercategory variance was 19.5% of overall variance, but by 2007 it had declined to only 10%. This means that disease categories explain a smaller share of the differences in per-claim costs than before; Figure 1 shows the trend line (Y = −0.0065X + 0.1659, R² = 0.901). The declining trend reversed in 2002, when hospitals and clinics were mandated to choose principal diagnoses; however, the reversal was only temporary, and the declining trend appears consistent. Another reversal occurred in 2007, but it is too early to determine whether it is temporary.
Table 7.

Estimated variances of log-transformed per-claim costs

Year | Overall V (A) | Intercategory V (B) | Intracategory V | %intercategory V (B/A)
1995 | 53 023 | 8692 | 44 472 | 19.5%
1996 | 51 777 | 8315 | 44 595 | 18.6%
1997 | 52 011 | 8094 | 43 767 | 18.5%
1998 | 53 941 | 7064 | 47 376 | 14.9%
1999 | 57 383 | 6789 | 50 860 | 13.3%
2000 | 60 250 | 7440 | 53 551 | 13.9%
2001 | 64 965 | 7623 | 58 774 | 13.0%
2002 | 64 950 | 8162 | 59 730 | 13.7%
2003 | 70 608 | 7285 | 63 281 | 11.5%
2004 | 72 096 | 6873 | 67 171 | 10.2%
2005 | 76 907 | 7028 | 70 856 | 9.9%
2006 | 78 663 | 6876 | 71 910 | 9.6%
2007 | 81 517 | 7635 | 76 409 | 10.0%

V: variance.

Figure 1.

Trend in %intercategory variance in overall variance of per-claim cost of outpatient claims


DISCUSSION

This study demonstrated a consistent decline in the intercategory variance of per-claim costs. If the differences in per-claim costs among disease categories are held constant, the declining intercategory variance can be interpreted as declining accuracy of classification or, in other words, increasing misclassification. Until 2001, disease classification was conducted rather arbitrarily, with no explicit criteria, by nonprofessionals at insurers. Starting in 2002, hospitals and clinics were required to specify principal diagnoses, which, it was hoped, would enhance the accuracy of classification. The change in classifiers did increase intercategory variance, as suggested by this author’s previous study,[17] but the effect was short-lived and does not appear to have altered the overall declining trend. This finding argues against the common assumption that classification is accurate when doctors choose principal diagnoses.

The KS test showed that all disease categories followed log-normal distributions in 1995 and 1996, but that goodness-of-fit deteriorated in later years as more categories departed from a log-normal distribution, as indicated by increasing KS values. At the same time, this study revealed that the standard ratio of per-claim costs remained stable and close to Napier's constant (e ≈ 2.718). This finding should prove to be a useful rule of thumb for the analysis of health insurance claims: about 68% of claims fall between the geometric mean divided by 2.718 and the geometric mean multiplied by 2.718. What, then, is the cause of the decline in accuracy? The most probable cause is the increasing number of diagnoses per claim, as suggested by this author in 1996.[13] The average number of diagnoses in a claim has consistently increased, as shown in Figure 2. The increase in intercategory variance in 2002 can be explained by the sudden reduction in the number of diagnoses due to the revised rule exempting diagnoses for inexpensive drugs.
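The interpretation above, that falling intercategory variance signals rising misclassification when cost differences among categories are held fixed, can be illustrated with a small simulation. This Python sketch is hypothetical and is not part of the original analysis: it draws log costs for 3 invented categories, reassigns a growing fraction of claims to a randomly chosen category (a deliberately simple stand-in for real misclassification, such as choosing among multiple diagnoses), and recomputes the intercategory share of the total sum of squares.

```python
import random
import statistics


def intercategory_share(values, labels):
    """Intercategory sum of squares divided by the total sum of squares."""
    grand = statistics.fmean(values)
    total = sum((v - grand) ** 2 for v in values)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    inter = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups.values())
    return inter / total


def relabel(labels, rate, rng):
    """Give a fraction `rate` of claims a category drawn at random (hypothetical error model)."""
    cats = sorted(set(labels))
    return [rng.choice(cats) if rng.random() < rate else lab for lab in labels]


rng = random.Random(2010)
# Three invented categories with different mean log per-claim costs.
true_means = {"A": 6.3, "B": 6.9, "C": 7.5}
labels, values = [], []
for lab, mu in true_means.items():
    for _ in range(3000):
        labels.append(lab)
        values.append(rng.gauss(mu, 0.8))

shares = []
for rate in (0.0, 0.25, 0.5):
    share = intercategory_share(values, relabel(labels, rate, rng))
    shares.append(share)
    print(f"misclassification rate {rate:.2f}: intercategory share {share:.3f}")
```

Under uniform random relabeling, each category mean shrinks toward the grand mean by a factor of (1 − rate), so the intercategory sum of squares falls roughly by (1 − rate)² while the overall variance, which depends only on the costs themselves, is unchanged.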
Whatever the causes, disease classification by principal diagnoses is becoming progressively less accurate in discriminating per-claim costs. With the rapid computerization of claims, there is a need for a statistical method that can objectively quantify all diagnoses. Such a method was described by this author in a previous study.[18]
Figure 2.

Trend in average number of diagnoses in an outpatient claim

REFERENCES

Impact of smoking habit on medical care use and its costs: a prospective observation of National Health Insurance beneficiaries in Japan. Y Izumi; I Tsuji; T Ohkubo; A Kuwahara; Y Nishino; S Hisamichi. Int J Epidemiol. 2001-06.

Estimation of disease-specific costs in health insurance claims: a comparison of three methods. Etsuji Okamoto; Eiichi Hata. Nihon Koshu Eisei Zasshi. 2004-11.

Accuracy of diabetes diagnosis in health insurance claims data in Taiwan. Cheng-Ching Lin; Mei-Shu Lai; Ci-Yong Syu; Shuan-Chuan Chang; Fen-Yu Tseng. J Formos Med Assoc. 2005-03.

Accuracy of administrative data for identifying patients with pneumonia. Dominik Aronsky; Peter J Haug; Charles Lagor; Nathan C Dean. Am J Med Qual. 2005 Nov-Dec.

The accuracy of myocardial infarction diagnosis in medical insurance claims. Korean Research Group for Cardiovascular Disease Prevention and Control. S Y Ryu; J K Park; I Suh; S H Jee; J Park; C B Kim; K S Kim. Yonsei Med J. 2000-10.

Impact of walking upon medical care expenditure in Japan: the Ohsaki Cohort Study. Ichiro Tsuji; Kohko Takahashi; Yoshikazu Nishino; Takayoshi Ohkubo; Shinichi Kuriyama; Yoko Watanabe; Yukiko Anzai; Yoshitaka Tsubono; Shigeru Hisamichi. Int J Epidemiol. 2003-10.

Do individualized health promotional programs reduce health care expenditure? A systematic review of controlled trials in the "Health-Up" model projects of the National Health Insurance. Etsuji Okamoto. Nihon Koshu Eisei Zasshi. 2008-12.

Agreement of diagnosis and its date for hematologic malignancies and solid tumors between medicare claims and cancer registry data. Soko Setoguchi; Daniel H Solomon; Robert J Glynn; E Francis Cook; Raisa Levin; Sebastian Schneeweiss. Cancer Causes Control. 2007-04-19.

[Reliability of health insurance claim statistical data based on the principal diagnosis method]. Shinichi Tanihara; Zentaro Yamagata; Hiroshi Une. Nihon Eiseigaku Zasshi. 2008-01.

Accuracy of Medicare claims-based diagnosis of acute myocardial infarction: estimating positive predictive value on the basis of review of hospital records. Yuka Kiyota; Sebastian Schneeweiss; Robert J Glynn; Carolyn C Cannuscio; Jerry Avorn; Daniel H Solomon. Am Heart J. 2004-07.