
Linkage rate between data from health checks and health insurance claims in the Japan National Database.

Etsuji Okamoto

Abstract

BACKGROUND: Japan's National Database (NDB) includes data on health checks and health insurance claims, is linkable using hash functions, and is available for research use. However, the linkage rate between health check and health insurance claims data has not been investigated.
METHODS: Linkage rate was evaluated by comparing observed medical and pharmaceutical charges among health check recipients in fiscal year (FY) 2009 (N = 21 588 883) with expected charges from the same population when record linkage was complete. Using the NDB, observed charges were estimated from the first published result of linking health check recipients in FY2009 and their health insurance claims in FY2010. Expected charges were estimated by combining 3 publicly available datasets, including data from the Medical Care Benefit Survey and an ad-hoc report by the Japan Health Insurance Association.
RESULTS: Only 14.9% of expected charges were linked by the NDB. The linkage rate was higher for women than for men (18.2% vs 12.4%) and for elderly adults as compared with younger adults (>25% vs <10%).
CONCLUSIONS: The linkage rate in the NDB was so low that any research linking health check and health insurance claims will not be reliable. Causes for the low linkage rate include differences between health check and health insurance claims data in name format (eg, insertion of a space between family and given names) and date of birth (Japanese vs Gregorian calendar). Investigation of the causes for the low linkage rate and measures for improvement are urgently needed.


Year:  2013        PMID: 24317344      PMCID: PMC3872528          DOI: 10.2188/jea.je20130075

Source DB:  PubMed          Journal:  J Epidemiol        ISSN: 0917-5040            Impact factor:   3.211


INTRODUCTION

In 2008, the National Database (NDB) was created in Japan for the “development, implementation, and evaluation” of the Health Care Cost Containment Plan (HCCCP), as set forth by Section 16 of the Elderly Health Care Security Act. Data from regular health checks and guidance have been collected since fiscal year (FY) 2008, and health insurance claims data have been collected since April 2009. The NDB has grown into one of the largest databases in the world; as of June 2012 it held approximately 5 billion health insurance claims and 66 million health check and guidance records.[1]

Personally identifiable data in the NDB are irreversibly encrypted using hash functions. Because Japan does not have unique personal identifiers, two 32-digit hash functions are generated: one from the insurer ID, beneficiary ID, date of birth, and sex, and the other from name, date of birth, and sex. By combining the 2 hash functions, the NDB is intended to maximize record linkage of health insurance claims with health check data from the same person.[2] Unfortunately, linkage by dual hash functions is by no means complete. Mistyping of a name, inclusion of a space between family and given names, or a change in insurer or beneficiary ID results in a completely different hash function, thereby compromising the accuracy of record linkage. Indeed, the accuracy of such record linkage in the NDB has not been fully investigated.

The NDB is available for research use, and many research projects using the NDB are underway.[3] However, as a prerequisite of scientifically sound analysis, researchers must first ensure the accuracy of record linkage. The author evaluated the linkage rate between health check data and health insurance claims in the NDB by comparing the medical and pharmaceutical charges observed for health check recipients in FY2009, ascertained through record linkage in the NDB, with the expected charges for the same population, estimated using publicly available data. If record linkage is complete, the observed and expected charges should match or at least be similar.
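The fragility of hash-based linkage described above can be illustrated with a minimal sketch. The NDB's actual hashing program is not specified in this article, so the use of SHA-256 truncated to 32 hexadecimal digits, the `|` field separator, the helper name `link_key`, and the sample record are all assumptions made purely for illustration.

```python
import hashlib

def link_key(*fields: str) -> str:
    """Hypothetical linkage key: first 32 hex digits of SHA-256 over the joined fields."""
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()[:32]

# The same fictional person as recorded in two sources (formats are illustrative).
claims_key = link_key("YAMADA TARO", "S35.04.01", "M")  # space in name, Japanese-era date
check_key = link_key("YAMADATARO", "1960-04-01", "M")   # no space, Gregorian date
print(claims_key == check_key)  # False: the keys cannot be matched

# If both sources were normalized to one format before hashing, the keys would agree.
normalized_claims_key = link_key("YAMADA TARO".replace(" ", ""), "1960-04-01", "M")
print(normalized_claims_key == check_key)  # True
```

Because a hash is irreversible, a mismatch of this kind cannot be detected or repaired after the fact; any normalization would have to happen before the key is generated.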

METHODS

Data source

Four publicly available datasets were used, all of which are available on the internet. The first 3 were used to estimate expected charges and the last one was used to estimate observed charges.

[1] Report on Health Checks and Guidance Regarding Metabolic Syndrome in FY2009[4]

The FY2009 Report on Health Checks and Guidance Regarding Metabolic Syndrome compiled administrative reports from 3453 insurers. It lists the number of beneficiaries “eligible for health checks”, which is defined as “beneficiaries as of April 1, 2009” and excludes those who quit in the middle of the fiscal year. For evaluation of insurer performance, the number of beneficiaries eligible for health checks is used as the denominator to calculate the percentage of health check recipients. Because insurers are held responsible only for beneficiaries eligible throughout the fiscal year, those who changed health insurance in the middle of the fiscal year are excluded from the denominator. However, in this study, the population as of October 1, 2009 was used as the denominator because the present study does not seek to evaluate insurer performance. Hence, the percentages of health check recipients reported in this study (males: 42.0%, females: 32.6%) are lower than those in the report (males: 46.5%, females: 36.4%) (Table 1).
Table 1. Percentages of individuals receiving health checks in fiscal year 2009

| Age | Males: population as of Oct 2009 (in thousands), A | Males: eligible for health check | Males: recipients of health check, N(+) | Males: % receiving health check, R = N(+)/A | Females: population as of Oct 2009 (in thousands), A | Females: eligible for health check | Females: recipients of health check, N(+) | Females: % receiving health check, R = N(+)/A |
|---|---|---|---|---|---|---|---|---|
| 40–44 | 4323 | 4 056 351 | 2 208 376 | 51.1% | 4258 | 3 851 465 | 1 380 709 | 32.4% |
| 45–49 | 3932 | 3 685 567 | 2 054 324 | 52.2% | 3894 | 3 552 116 | 1 315 061 | 33.8% |
| 50–54 | 3863 | 3 542 461 | 1 903 919 | 49.3% | 3877 | 3 473 490 | 1 293 219 | 33.4% |
| 55–59 | 4517 | 4 011 840 | 1 975 968 | 43.7% | 4616 | 3 989 531 | 1 419 481 | 30.8% |
| 60–64 | 4603 | 4 078 432 | 1 565 725 | 34.0% | 4810 | 4 375 575 | 1 486 006 | 30.9% |
| 65–69 | 4005 | 3 484 940 | 1 214 405 | 30.3% | 4380 | 3 922 897 | 1 481 643 | 33.8% |
| 70–74 | 3199 | 2 840 267 | 1 019 997 | 31.9% | 3712 | 3 346 803 | 1 270 050 | 34.2% |
| Total | 28 442 | 25 699 858 | 11 942 714 | 42.0% | 29 547 | 26 511 877 | 9 646 169 | 32.6% |

Source: Report of health check and guidance against metabolic syndrome in fiscal year 2009.

Source: http://www.mhlw.go.jp/bunya/shakaihosho/iryouseido01/dl/info03_h21_03.pdf.
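The effect of the choice of denominator can be checked with a few lines of arithmetic. The sketch below uses only the male totals from Table 1; the variable names are illustrative and not part of the original report.

```python
# Male totals from Table 1.
recipients = 11_942_714    # N(+), health check recipients in FY2009
eligible = 25_699_858      # beneficiaries eligible as of April 1, 2009
population = 28_442_000    # A, population as of Oct 2009 (reported in thousands)

rate_report = recipients / eligible    # denominator used in the administrative report
rate_study = recipients / population   # denominator used in this study

print(f"report: {rate_report:.1%}, this study: {rate_study:.1%}")
# report: 46.5%, this study: 42.0%
```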


[2] Analysis of data on health checks and medical charges in FY2008[5]

The Japan Health Insurance Association (JHIA) linked health check data and health insurance claims for 11 705 320 beneficiaries aged 35 to 74 years (the total was 9 618 145 when limited to individuals aged 40–74 years) in FY2008 and compared per capita charges between health check recipients and nonrecipients by sex and 5-year age group (Table 2).
Table 2. Per capita charges for medical and pharmaceutical claims by recipients and nonrecipients of health checks (annual charges in yen)

| Age | Males: recipients, P(+) | Males: nonrecipients, P(−) | Males: ratio, r = P(−)/P(+) | Females: recipients, P(+) | Females: nonrecipients, P(−) | Females: ratio, r = P(−)/P(+) |
|---|---|---|---|---|---|---|
| 40–44 | 68 460 | 83 017 | 1.21 | 80 391 | 93 937 | 1.17 |
| 45–49 | 89 120 | 112 220 | 1.26 | 92 159 | 107 617 | 1.17 |
| 50–54 | 117 000 | 148 787 | 1.27 | 109 917 | 128 035 | 1.16 |
| 55–59 | 149 394 | 197 421 | 1.32 | 128 347 | 154 232 | 1.20 |
| 60–64 | 191 084 | 257 593 | 1.35 | 158 692 | 195 420 | 1.23 |
| 65–69 | 235 556 | 339 828 | 1.44 | 199 147 | 258 205 | 1.30 |
| 70–74 | 332 376 | 512 231 | 1.54 | 290 377 | 403 938 | 1.39 |
| Total | 129 273 | 188 335 | 1.46 | 114 226 | 147 770 | 1.29 |

Source: Japan Health Insurance Association: Analysis of data on health checks and medical charges in fiscal year 2008.

Source: http://www.kyoukaikenpo.or.jp/~/media/Files/honbu/cat740/2506/250611/250611003.xls.


[3] Medical Care Benefit Survey, FY2011[6]

The Medical Care Benefit Survey (MCBS) is a population survey of all health insurance claims submitted from May 2011 through April 2012 and is conducted by the Japan Ministry of Health, Labour and Welfare (MHLW). The FY2011 rather than the FY2010 MCBS was used because the MCBS included sex-specific data for the first time in FY2011. Because there was no fee schedule revision between FY2010 and FY2011, the estimates of charges should not be biased by the choice of year. Unlike the NDB, which covers only electronically submitted claims, the MCBS includes all claims, including those submitted on paper, and thus provides the best estimate of per capita charges for the entire insured population. Since the MCBS is a survey of health insurance, it does not cover claims under the Livelihood Assistance Act for the indigent population. The MCBS also does not include Seamen’s Insurance, because the insurer did not submit the relevant data. In addition, some health insurance societies and mutual aid associations did not submit data and were thus excluded from the numerator and denominator. The survey report included the age-specific number of beneficiaries as the denominator, but no sex-specific numbers were available. Therefore, age- and sex-specific numbers of beneficiaries were estimated by applying sex ratios for the population as of October 1, 2011 (Table 3).
Table 3. Age- and sex-specific per capita medical and pharmaceutical charges for the entire population in fiscal year 2011

| Age | Males: no. of beneficiaries, N | Males: medical and pharmaceutical charges in yen, C | Males: per capita charges, P (C/N) | Females: no. of beneficiaries, N | Females: medical and pharmaceutical charges in yen, C | Females: per capita charges, P (C/N) |
|---|---|---|---|---|---|---|
| 40–44 | 4 006 878 | 453 169 049 750 | 113 098 | 3 926 809 | 473 955 768 170 | 120 697 |
| 45–49 | 3 321 980 | 503 331 494 490 | 151 516 | 3 287 134 | 491 671 766 320 | 149 575 |
| 50–54 | 3 138 226 | 624 634 900 530 | 199 041 | 3 139 869 | 584 106 152 970 | 186 029 |
| 55–59 | 3 465 152 | 908 033 409 780 | 262 047 | 3 518 876 | 811 274 890 500 | 230 549 |
| 60–64 | 4 780 875 | 1 650 648 766 370 | 345 261 | 4 960 455 | 1 411 115 393 690 | 284 473 |
| 65–69 | 3 459 967 | 1 643 229 952 330 | 474 927 | 3 777 606 | 1 421 169 561 630 | 376 209 |
| 70–74 | 3 019 459 | 1 953 067 625 520 | 646 827 | 3 483 921 | 1 825 207 400 040 | 523 895 |
| Total | 25 192 537 | 7 736 115 198 770 | 307 080 | 26 094 670 | 7 018 500 933 320 | 268 963 |

Source: Medical Care Benefit Survey, fiscal year 2011

http://www.e-stat.go.jp/SG1/estat/GL02020101.do?method=xlsDownload&fileId=000006435815&releaseCount=1.


[4] Per capita medical and pharmaceutical charges for health check recipients in FY2009[7]

A report submitted by the MHLW to the Seventh Meeting of the Committee on Health Checks and Guidance on February 24, 2012 used hash functions to link health check data in FY2009 and health insurance claims data in FY2010 on an individual basis and was the first published evidence of the accuracy of record linkage in the NDB. In FY2009, 21 588 883 beneficiaries (11 942 714 males and 9 646 169 females) underwent health checks. Of them, 2 685 509 beneficiaries (1 172 510 males and 1 512 999 females; 9.8% and 15.7%, respectively) were linked with FY2010 health insurance claims (medical, pharmaceutical, and diagnosis-procedure–combination [DPC]—a system of per diem payment for acute hospitals that is part of medical claims). The medical and pharmaceutical charges contained in the linked health insurance claims totaled 716 128 080 857 yen. Because the NDB contains only electronically submitted claims, the computerization rate of claims must be considered, to ensure fair comparison with the MCBS, which also contains claims submitted on paper. According to the Social Insurance Payment Fund, the computerization rate in FY2010 was 92.0% for medical claims and 99.9% for pharmaceutical claims, for an overall rate of 94.8%[8] (463 225 000 medical and 281 613 000 pharmaceutical claims were submitted electronically out of 503 627 000 medical and 281 842 000 pharmaceutical claims in FY2010). The observed charges were inflated by multiplying values by the inverse of the computerization rate (Table 4).
Table 4. Observed medical and pharmaceutical charges for health check recipients linked to health insurance claims

| Age | Males: no. of health check recipients linked to health insurance claims, n(+) | Males: per capita medical and pharmaceutical charges, p(+) | Males: observed charges, c(+) (n(+) * p(+)) | Females: no. of health check recipients linked to health insurance claims, n(+) | Females: per capita medical and pharmaceutical charges, p(+) | Females: observed charges, c(+) (n(+) * p(+)) |
|---|---|---|---|---|---|---|
| 40–44 | 96 704 | 14 419 | 13 943 766 739 | 89 837 | 13 499 | 12 126 810 616 |
| 45–49 | 97 584 | 17 113 | 16 699 638 200 | 94 102 | 15 335 | 14 430 862 278 |
| 50–54 | 119 628 | 20 626 | 24 674 486 348 | 120 840 | 16 688 | 20 165 611 842 |
| 55–59 | 155 984 | 24 559 | 38 308 534 776 | 163 497 | 18 647 | 30 487 271 483 |
| 60–64 | 107 668 | 25 308 | 27 248 692 485 | 241 881 | 21 153 | 51 165 276 230 |
| 65–69 | 299 373 | 34 212 | 102 421 567 712 | 424 601 | 28 157 | 119 553 750 511 |
| 70–74 | 295 569 | 40 104 | 118 534 614 534 | 378 241 | 33 409 | 126 367 197 103 |
| Total | 1 172 510 | 29 154 | 341 831 300 794 | 1 512 999 | 24 739 | 374 296 780 063 |

Source: Per capita medical and pharmaceutical charges for health check recipients in fiscal year 2009.

Source: http://www.mhlw.go.jp/stf/shingi/2r98520000023mfn-att/2r98520000023mkh.pdf.
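The adjustment for paper claims can be reproduced from the figures quoted in the text. The following sketch assumes only the claim counts reported by the Social Insurance Payment Fund and the total linked charges stated above; the variable names are illustrative.

```python
# Claims submitted in FY2010 (electronic vs all), as quoted in the text.
medical_electronic, medical_all = 463_225_000, 503_627_000
pharma_electronic, pharma_all = 281_613_000, 281_842_000

# Overall computerization rate across medical and pharmaceutical claims.
overall_rate = (medical_electronic + pharma_electronic) / (medical_all + pharma_all)
print(f"overall computerization rate: {overall_rate:.1%}")  # 94.8%

# Linked charges are divided by the rounded rate (0.948) to allow for paper claims.
linked_charges = 716_128_080_857           # yen, total linked charges in the NDB
inflated_charges = linked_charges / 0.948  # about 755.4 billion yen
print(f"inflated observed charges: {inflated_charges:,.0f} yen")
```

This adjustment assumes that charges per claim are similar for electronic and paper claims, which the article does not verify.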


Statistical analysis

Accuracy of the record linkage in the NDB was evaluated by comparing (1) the observed medical and pharmaceutical charges for health check recipients in data source [4] with (2) the expected medical and pharmaceutical charges of the same population estimated from data sources [1], [2], and [3]. It is expressed as c(+)/C(+) using the following notation:

N: number of beneficiaries obtained from data source [3]
N(+): number of health check recipients obtained from data source [1]
N(−): number of nonrecipients (= N − N(+))
n(+): number of health check recipients whose health insurance claims were linked using hash functions obtained from data source [4]
C: medical and pharmaceutical charges of all beneficiaries obtained from data source [3]
C(+): medical and pharmaceutical charges for health check recipients
C(−): medical and pharmaceutical charges for nonrecipients (= C − C(+))
c(+): medical and pharmaceutical charges for health check recipients whose health insurance claims were linked (ie, the observed medical and pharmaceutical charges for health check recipients)
P: per capita medical and pharmaceutical charges for all beneficiaries (= C/N)
P(+): per capita medical and pharmaceutical charges for health check recipients (= C(+)/N(+))
P(−): per capita medical and pharmaceutical charges for nonrecipients of health checks (= C(−)/N(−))
p(+): per capita medical and pharmaceutical charges for health check recipients whose health insurance claims were linked using hash functions obtained from data source [4] (= c(+)/n(+))

Subclassification was denoted by using the following subscripts:

i: sex (2 categories: males, females)
j: 5-year age group (7 categories: age 40–44, 45–49…70–74 years)
k: metabolic syndrome status (3 categories: no metabolic syndrome, risk of metabolic syndrome, metabolic syndrome)

Observed charges (c(+))

Observed medical and pharmaceutical charges for health check recipients, c(+), were calculated from data source [4], using the following formula (the results were inflated by the inverse of 0.948 to adjust for computerization of claims):

$$c(+) = \sum_{i}\sum_{j} n(+)_{ij} \times p(+)_{ij}$$

Expected charges (C(+))

Expected medical and pharmaceutical charges, C(+), were estimated as:

$$C(+) = N(+) \times P(+)$$

N(+) was obtained from data source [1]. P(+) had to be estimated from per capita charges for the entire population, obtained from data source [3]. Because bedridden people and hospitalized patients cannot receive health checks, the per capita charges for health check recipients (P(+)) should be lower than those for nonrecipients (P(−)). Let r denote the ratio between per capita charges for nonrecipients over recipients, which was obtained from data source [2]:

$$r = P(-)/P(+)$$

Let R denote the percentage of those receiving health checks (= N(+)/N), as indicated in data source [1]. Then,

$$C = C(+) + C(-) = N(+)\,P(+) + N(-)\,P(-)$$

Hence,

$$P = C/N = R\,P(+) + (1 - R)\,P(-)$$

Since P(−) = r * P(+),

$$P = P(+)\,(R + r - Rr), \qquad P(+) = \frac{P}{R + r - Rr}$$

Using this formula, the per capita charges for health check recipients (P(+)) can be estimated. Then, C(+) is obtained as follows:

$$C(+) = N(+) \times \frac{P}{R + r - Rr}$$
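As a numerical check of the estimator P(+) = P/(R + r − Rr), the sketch below applies it to males aged 40–44 using values from Tables 1 to 3. The function name is illustrative, and R and r are recomputed from the tabulated counts rather than taken from the rounded percentages, so small rounding differences from Table 5 are to be expected.

```python
def per_capita_recipients(P: float, R: float, r: float) -> float:
    """Estimate P(+) from overall per capita charges P, participation R, and ratio r."""
    return P / (R + r - R * r)

# Males aged 40-44.
P = 113_098                   # per capita charges, entire population (Table 3)
R = 2_208_376 / 4_323_000     # N(+)/A from Table 1 (A is reported in thousands)
r = 83_017 / 68_460           # P(-)/P(+) from Table 2

p_plus = per_capita_recipients(P, R, r)   # close to 102 443 yen (Table 5)
expected = 2_208_376 * p_plus             # C(+) = N(+) * P(+), about 226 billion yen

# Consistency check: the weighted mean of P(+) and P(-) = r * P(+) recovers P.
recovered_P = R * p_plus + (1 - R) * r * p_plus
print(round(p_plus), round(expected), round(recovered_P))
```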

RESULTS

The results are summarized in Table 5 and the Figure.
Table 5. Observed and expected medical and pharmaceutical charges for health check recipients

| Age | No. of health check recipients, N(+) | % receiving health check, R | Ratio of charges of nonrecipients to recipients, r | Per capita charges for entire population (in yen), P | Per capita charges for health check recipients (in yen), P/(R + r − Rr) | Expected charges for health check recipients, N(+) * P/(R + r − Rr) | Observed charges for health check recipients inflated by computerization rate, c(+)/0.948 | Observed/Expected |
|---|---|---|---|---|---|---|---|---|
| MALES | | | | | | | | |
| 40–44 | 2 208 376 | 51.1% | 1.21 | 113 098 | 102 443 | 226 231 655 926 | 14 708 614 704 | 6.5% |
| 45–49 | 2 054 324 | 52.2% | 1.26 | 151 516 | 134 827 | 276 978 115 882 | 17 615 652 110 | 6.4% |
| 50–54 | 1 903 919 | 49.3% | 1.27 | 199 041 | 174 938 | 333 067 001 956 | 26 027 939 186 | 7.8% |
| 55–59 | 1 975 968 | 43.7% | 1.32 | 262 047 | 221 915 | 438 495 975 016 | 40 409 846 810 | 9.2% |
| 60–64 | 1 565 725 | 34.0% | 1.35 | 345 261 | 280 776 | 439 617 687 810 | 28 743 346 503 | 6.5% |
| 65–69 | 1 214 405 | 30.3% | 1.44 | 474 927 | 362 972 | 440 795 159 536 | 108 039 628 388 | 24.5% |
| 70–74 | 1 019 997 | 31.9% | 1.54 | 646 827 | 472 625 | 482 076 124 377 | 125 036 513 222 | 25.9% |
| Subtotal | 11 942 714 | 42.0% | 1.46 | 307 080 | 242 744 | 2 899 018 186 819 | 360 581 540 922 | 12.4% |
| FEMALES | | | | | | | | |
| 40–44 | 1 380 709 | 32.4% | 1.17 | 120 697 | 108 359 | 149 612 721 709 | 12 791 994 321 | 8.6% |
| 45–49 | 1 315 061 | 33.8% | 1.17 | 149 575 | 134 620 | 177 033 700 183 | 15 222 428 563 | 8.6% |
| 50–54 | 1 293 219 | 33.4% | 1.16 | 186 029 | 167 616 | 216 764 167 320 | 21 271 742 449 | 9.8% |
| 55–59 | 1 419 481 | 30.8% | 1.20 | 230 549 | 202 297 | 287 156 175 368 | 32 159 569 075 | 11.2% |
| 60–64 | 1 486 006 | 30.9% | 1.23 | 284 473 | 245 248 | 364 439 955 855 | 53 971 810 369 | 14.8% |
| 65–69 | 1 481 643 | 33.8% | 1.30 | 376 209 | 314 494 | 465 967 106 619 | 126 111 551 172 | 27.1% |
| 70–74 | 1 270 050 | 34.2% | 1.39 | 523 895 | 416 691 | 529 218 233 237 | 133 298 731 121 | 25.2% |
| Subtotal | 9 646 169 | 32.6% | 1.29 | 268 963 | 224 549 | 2 166 033 679 871 | 394 827 827 071 | 18.2% |
| Total | 21 588 883 | 37.2% | 1.38 | 287 686 | 232 281 | 5 065 051 866 690 | 755 409 367 993 | 14.9% |
Figure. Summary of results.

The NDB linked only 0.755 trillion yen of the 5.065 trillion yen in charges expected for health check recipients in FY2009. Thus, in terms of charges, the NDB was able to link only 14.9% of health insurance claims. There was an obvious sex difference: the linkage rate was higher for women than for men (18.2% vs 12.4%, respectively). In addition, there was an age difference: the linkage rate was higher for elderly adults than for younger adults. Adults aged 65 years or older had about 25% of their charges linked (24.5% to 27.1%), whereas adults younger than 65 years had between 6.4% and 14.8% linked, and fewer than 10% in most age groups.
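The headline linkage rates follow directly from the subtotals in Table 5. The short sketch below recomputes them as observed (inflated) charges divided by expected charges; the dictionary layout and names are illustrative only.

```python
# (observed charges inflated by 1/0.948, expected charges) in yen, from Table 5 subtotals.
subtotals = {
    "males": (360_581_540_922, 2_899_018_186_819),
    "females": (394_827_827_071, 2_166_033_679_871),
}

# Linkage rate per sex = observed / expected.
rates = {sex: observed / expected for sex, (observed, expected) in subtotals.items()}

# Overall rate uses summed charges, not the mean of the two sex-specific rates.
total_observed = sum(observed for observed, _ in subtotals.values())
total_expected = sum(expected for _, expected in subtotals.values())
overall_rate = total_observed / total_expected

for sex, rate in rates.items():
    print(f"{sex}: {rate:.1%}")
print(f"overall: {overall_rate:.1%}")
```

The same ratio can be computed for any sex and age stratum, which is how the per-row values in the last column of Table 5 are obtained.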

DISCUSSION

The present results are alarming. The linkage rate of 14.9% was far lower than that of the Japan Medical Data Center (JMDC) database (88.5% with 1 hash function and 98.0% with 2 hash functions combined)[9] and might bias the findings of any research linking health check and health insurance claims data. The NDB was created for the “development, implementation and evaluation” of the HCCCP, which emphasizes health care cost containment through prevention of metabolic syndrome. However, the low linkage rate of the NDB makes it incapable of fulfilling that task.

The reasons for the low linkage rate and for the sex and age differences are not clear. One possibility is that the formats for names and dates of birth are inconsistent between health insurance claims and health check data. A space must be inserted between family and given names on health insurance claims but not in health check data. Date of birth is recorded using the Japanese calendar on health insurance claims but using the Gregorian calendar in health check data.

The advantage of this study is that it is based entirely on publicly available datasets, thanks to the recent availability of detailed data. One such development is the availability of per capita charges for health check recipients and nonrecipients from the Japan Health Insurance Association. The fact that male nonrecipients of health checks incur 1.46 times the charges of recipients sheds new light on the conventional wisdom that municipalities with higher health check participation have lower per capita health care charges. Another development was the Medical Care Benefit Survey, which serves as a “mirror site” of the NDB. Interestingly, different sections of the MHLW collect the same health insurance claims data based on different legal requirements.[3] These dual databases gave the author a valuable opportunity to obtain both observed and expected charges and to compare them.

This study did have limitations, however. Although it revealed the low linkage rate of the NDB, the reasons for this low linkage remain unclear. Investigation of the low linkage rate and identification of measures for improvement are thus urgently needed. Hash function encryption is performed by the Prefectural Federations of National Health Insurance and by prefectural branches of the Social Insurance Payment Fund, using an encryption program distributed by the MHLW (not by individual health insurers). The author suspects that the encryption algorithm is flawed, although this would not fully explain the observed sex and age differences. Since hash functions are irreversible, it is not possible to investigate causes within the NDB. A future field test involving health insurers of sufficient enrollment size may be useful. By comparing the original, personally identifiable data (health insurance claims and health check data) with the encrypted data generated by the encryption program, it would be possible to identify the causes for the low linkage rate. Once these causes are identified, the encryption algorithm should be revised, and the old data, back to April 2009, should be recollected before they are lost; it is not too late to address the problem.

Suggestions for researchers

The NDB is available for research use, and publications based on NDB data are already appearing. However, researchers and reviewers must carefully consider the linkage rate achieved with hash functions; it should never be assumed that linkage is complete. Just as the response rate is required when reporting a questionnaire survey, the linkage rate should be reported when using NDB data, particularly when linking the same individual across time or linking health check and health insurance claims data. This study provides a method for evaluating linkage rate. Researchers who use NDB data should refer to its mirror site, the MCBS. Because the MCBS covers the same health insurance claims as the NDB (in fact, the coverage of the MCBS is greater because it includes health insurance claims submitted on paper), researchers should be able to compare health care charges on a sex- and age-specific basis. As a matter of policy, researchers are prohibited from cross-linking the NDB with any other individual-level data. However, this does not preclude comparisons with or references to other publicly available aggregate data. Researchers are reminded that part of the NDB is publicly available. The Social Insurance Claims Survey has collected data directly from the NDB for hospitals and pharmacies since 2011. Pharmacy MEDIAS had collected electronic pharmacy claims since 2004 and was replaced by the NDB in April 2012.[10] Finally, the JHIA independently provides aggregate data on health insurance claims. By comparing those publicly available aggregate data, researchers may be able to evaluate linkage rate and any potential bias related to it.

1.  Development of a database of health insurance claims: standardization of disease classifications and anonymous record linkage.

Authors:  Shinya Kimura; Toshihiko Sato; Shunya Ikeda; Mitsuhiko Noda; Takeo Nakayama
Journal:  J Epidemiol       Date:  2010-08-07       Impact factor: 3.211
