Mee Kyoung Kim, Kyungdo Han, Seung-Hwan Lee.
Abstract
Recently, medical research using big data has become very popular, and its value has become increasingly recognized. The Korean National Health Information Database (NHID) is a representative big data resource that combines information collected by the National Health Insurance Service for claims and reimbursement of health care services with results from the general health examinations provided to all Korean adults. This database has several strengths and limitations. Given its large size, the variety of laboratory data and questionnaires obtained from medical check-ups, its longitudinal nature, and the long-term accumulation of data since 2002, carefully designed studies may provide valuable information that is difficult to obtain from other forms of research. However, because the data were not collected for research purposes, consideration of possible bias and careful interpretation when defining causal relationships are also important. Since the NHID became publicly available, research and publications based on this database have increased explosively, especially in the field of diabetes and metabolism. This article reviews the history, structure, and characteristics of the Korean NHID. It also introduces recent trends in big data research using this database, commonly used operational diagnoses, and representative studies. We expect further progress and expansion of big data research using the Korean NHID.
Keywords: Database; Diabetes mellitus; Korea; Metabolism; National health programs
Year: 2022 PMID: 35929173 PMCID: PMC9353560 DOI: 10.4093/dmj.2022.0193
Source DB: PubMed Journal: Diabetes Metab J ISSN: 2233-6079 Impact factor: 5.893
Fig. 1. Operational structure of the National Health Insurance System (NHIS). Reproduced from Kim et al. [4]. HIRA, Health Insurance Review & Assessment Service.
Number of eligible individuals and actual examinees of the health examination over the 10 years from 2011 to 2020
| Variable | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of eligible individuals | 15,249,528 | 15,673,188 | 15,775,891 | 16,456,214 | 17,356,727 | 17,633,406 | 17,818,302 | 19,593,149 | 21,716,582 | 21,446,220 |
| No. of actual examinees | | | | | | | | | | |
| Total no. (%) | 11,070,569 (72.6) | 11,419,350 (72.9) | 11,381,295 (72.1) | 12,301,581 (74.8) | 13,213,329 (76.1) | 13,709,413 (77.7) | 13,987,129 (78.5) | 15,076,899 (76.9) | 16,098,417 (74.1) | 14,544,980 (67.8) |
| Sex | | | | | | | | | | |
| Men | 6,117,787 | 6,277,362 | 6,258,804 | 6,716,277 | 7,152,110 | 7,360,929 | 7,470,196 | 8,106,914 | 8,395,046 | 7,659,607 |
| Women | 4,952,782 | 5,141,988 | 5,122,491 | 5,585,304 | 6,061,219 | 6,348,484 | 6,516,933 | 6,969,985 | 7,703,371 | 6,885,373 |
| Age, yr | | | | | | | | | | |
| ≤19 | 22,066 | 25,852 | 30,395 | 28,855 | 27,898 | 27,698 | 25,498 | 21,548 | 16,162 | 13,126 |
| 20–24 | 292,806 | 289,877 | 310,544 | 320,157 | 331,153 | 348,864 | 340,926 | 337,873 | 544,396 | 525,980 |
| 25–29 | 959,981 | 861,405 | 879,338 | 886,824 | 906,928 | 974,937 | 972,343 | 1,008,398 | 1,144,773 | 1,095,797 |
| 30–34 | 1,161,993 | 1,181,946 | 1,206,389 | 1,232,766 | 1,271,907 | 1,203,259 | 1,166,903 | 1,195,162 | 1,340,699 | 1,235,064 |
| 35–39 | 1,070,355 | 1,083,236 | 1,020,708 | 1,139,037 | 1,193,888 | 1,231,963 | 1,267,513 | 1,335,464 | 1,385,978 | 1,209,388 |
| 40–44 | 1,238,902 | 1,274,646 | 1,304,791 | 1,330,964 | 1,446,585 | 1,426,743 | 1,411,857 | 1,839,238 | 1,919,130 | 1,656,855 |
| 45–49 | 1,329,572 | 1,361,423 | 1,371,396 | 1,512,407 | 1,653,299 | 1,729,097 | 1,751,848 | 1,732,167 | 1,767,840 | 1,520,351 |
| 50–54 | 1,661,191 | 1,759,631 | 1,647,344 | 1,801,231 | 1,885,250 | 1,895,002 | 1,907,258 | 1,965,960 | 2,055,588 | 1,848,045 |
| 55–59 | 1,062,443 | 1,152,283 | 1,204,758 | 1,337,416 | 1,492,845 | 1,586,881 | 1,644,551 | 1,662,173 | 1,648,391 | 1,497,048 |
| 60–64 | 972,055 | 1,053,108 | 1,004,503 | 1,162,690 | 1,285,409 | 1,456,209 | 1,551,359 | 1,597,421 | 1,780,520 | 1,614,717 |
| 65–69 | 333,237 | 322,477 | 318,096 | 362,290 | 451,578 | 455,019 | 490,695 | 868,891 | 860,339 | 900,574 |
| 70–74 | 594,159 | 648,373 | 638,440 | 684,102 | 687,162 | 756,759 | 764,036 | 778,593 | 850,860 | 757,803 |
| 75–79 | 226,827 | 245,203 | 267,378 | 293,277 | 334,352 | 343,885 | 387,835 | 416,163 | 414,459 | 372,947 |
| 80–84 | 114,366 | 128,812 | 139,344 | 165,798 | 193,357 | 217,859 | 241,803 | 253,020 | 292,102 | 234,836 |
| ≥85 | 30,616 | 31,078 | 37,871 | 43,767 | 51,718 | 55,238 | 62,704 | 64,828 | 77,180 | 62,449 |
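
The percentages in the "Total no. (%)" row are the number of actual examinees divided by the number of eligible individuals. As a minimal sketch, the Python below recomputes that rate for three of the years using counts copied from the table above. The names `COUNTS`, `RATES`, and `participation_rate` are illustrative and do not come from the source.

```python
# Counts copied from the table above: year -> (eligible, actual examinees).
COUNTS = {
    2011: (15_249_528, 11_070_569),
    2017: (17_818_302, 13_987_129),
    2020: (21_446_220, 14_544_980),
}


def participation_rate(eligible: int, examinees: int) -> float:
    """Return examinees as a percentage of eligible individuals, to one decimal."""
    return round(100 * examinees / eligible, 1)


# Recompute the rate for each year listed in COUNTS.
RATES = {year: participation_rate(*pair) for year, pair in COUNTS.items()}

for year, rate in RATES.items():
    print(year, rate)
```

The recomputed values match the published 72.6%, 78.5%, and 67.8% for 2011, 2017, and 2020.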
Variables included in the Korean National Health Information sample cohort database
| Table | Variables |
|---|---|
| Qualification table | Year of construction, individual unique number, age, sex, location, type of subscription, deciles of contribution, type of disability, severity of disability, eligibility of medical check-up, sample type |
| Birth and death table | Year of birth, date of death, cause of death |
| Treatment table | |
| Statement (T20) | Start date of medical care, medical subject code, principal diagnosis, additional diagnosis, first date of hospitalization, route of hospitalization, official injury, operation (yes/no), days of medical care, days of hospital visit, days of total prescription, result of medical care, medical expenses (cost paid by insurer, cost paid by beneficiaries) |
| Treatment details (T30) | Start date of medical care, classification and item of specification, code of medical care classification, dosage and frequency of medication or procedure, type of medical expense, unit price, total cost, drug classification |
| Type of disease (T40) | Start date of medical care, medical subject code, principal diagnosis, additional diagnosis, ruled-out diagnosis |
| Prescription details (T60) | Start date of medical care, code of medication, drug classification, dosage, total days of administration, cost of medication |
| Medical check-up table | Anthropometry, blood pressure, vision, hearing ability, blood test (fasting glucose, lipid levels, hemoglobin, creatinine, estimated glomerular filtration rate, aspartate aminotransferase, alanine aminotransferase, gamma-glutamyl transferase), chest radiography, electrocardiogram, past medical history, family history, questionnaires (smoking, alcohol consumption, exercise) |
| Clinic table | Institution classification code, address of institution, subject type, numbers of doctors, nurses, beds for admission, beds for operation, and beds for emergency room |
| Elderly long-term nursing table | General information and rating result of application, claim specification, status of long-term nursing facility |
Variables and questionnaires included in the health examination database, by year of health examination
| Classification | Variable | 2002–2008 | 2009–2017 | 2018–2019 |
|---|---|---|---|---|
| Health examination | ||||
| Obesity | Height | ○ | ○ | ○ |
| Weight | ○ | ○ | ○ | |
| Body mass index | ○ | ○ | ○ | |
| Waist circumference | ○ | ○ | |
| Hypertension | Systolic blood pressure | ○ | ○ | ○ |
| Diastolic blood pressure | ○ | ○ | ○ | |
| Sensory | Vision | ○ | ○ | ○ |
| Hearing ability | ○ | ○ | ○ | |
| Diabetes | Fasting glucose | ○ | ○ | ○ |
| Hypertension, dyslipidemia, atherosclerosis | Total cholesterol | ○ | ○ | ○ |
| Triglyceride | ○ | ○ | ||
| HDL-cholesterol | ○ | ○ | ||
| LDL-cholesterol | ○ | ○ | ||
| Anemia | Hemoglobin | ○ | ○ | ○ |
| Kidney disease | Urine glucose | ○ | ||
| Urine occult blood | ○ | |||
| Urine pH | ○ | |||
| Urine protein | ○ | ○ | ○ | |
| Chronic kidney disease | Serum creatinine | ○ | ○ | |
| Estimated glomerular filtration rate | ○ | ○ | |
| Liver disease | Aspartate aminotransferase | ○ | ○ | ○ |
| Alanine aminotransferase | ○ | ○ | ○ | |
| Gamma-glutamyl transferase | ○ | ○ | ○ | |
| Pulmonary disease | Chest radiography | ○ | ○ | ○ |
| Cardiac disease | Electrocardiogram | ○ | ||
| Questionnaire | ||||
| Past medical history | ○ | ○ | ○ | |
| Family history | ○ | ○ | ○ | |
| Smoking | Smoking status | ○ | ○ | ○ |
| Daily smoking amount | ○ | |||
| Average daily smoking amount (ex-smoker) | ○ | ○ | ||
| Average daily smoking amount (current smoker) | ○ | ○ | ||
| Smoking duration | ○ | |||
| Smoking duration (ex-smoker) | ○ | ○ | ||
| Smoking duration (current smoker) | ○ | ○ | ||
| Alcohol consumption | Drinking frequency | ○ | ○ | |
| Days of drinking per week | ○ | |||
| Amount of drinking per time | ○ | |||
| Amount of drinking per day | ○ | ○ | ||
| Type of alcohol | ○ | |||
| Maximum amount of drinking per day | ○ | |||
| Exercise | Exercise frequency per week | ○ | ||
| Days of strenuous exercise per week | ○ | ○ | ||
| Time of strenuous exercise per day | ○ | |||
| Days of moderate intensity exercise per week | ○ | ○ | ||
| Time of moderate intensity exercise per day | ○ | |||
| Days of walking exercise per week | ○ | |||
| Days of strength training per week | ○ | |||
| Hepatitis B | Hepatitis B | ○ | ○ | |
HDL, high-density lipoprotein; LDL, low-density lipoprotein.
Waist circumference measurement started in 2008.
Estimated glomerular filtration rate was not measured in 2010 to 2011.
Past medical history: year of onset and whether cured, for pulmonary tuberculosis, hepatitis, liver disease, hypertension, cardiac disease, stroke, diabetes, cancer, and other disease.
Past medical history and medical treatment of stroke, cardiac disease, hypertension, diabetes, dyslipidemia, pulmonary tuberculosis, cancer, and other disease.
Family history of liver disease, hypertension, stroke, cardiac disease, diabetes, and cancer.
Family history of hypertension, stroke, cardiac disease, diabetes, cancer, and other disease.
Fig. 2. The number of publications using the National Health Information Database from 2008 to 2021.
The operational definitions of commonly used outcomes and covariates in the field of diabetes and metabolism research
| Condition | ICD-10 codes | Additional definitions | General health check-up results |
|---|---|---|---|
| Type 2 diabetes mellitus | E11–E14 | Recording as either principal diagnosis or 1st to 4th additional diagnosis at least once a year and prescription of anti-diabetic drugs | Fasting blood glucose ≥126 mg/dL |
| Dyslipidemia | E78 | Recording at least once a year and prescription of lipid-lowering agents (statin, ezetimibe, fenofibrate) | Total cholesterol ≥240 mg/dL |
| Hypertension | I10–I11 | Recording at least once a year and prescription of antihypertensive agents | Systolic blood pressure ≥140 mm Hg or diastolic blood pressure ≥90 mm Hg |
| Myocardial infarction | I21, I22 | Recording at admission ≥1 | |
| Ischemic stroke | I63, I64 | Recording at admission ≥1 with claims for the imaging studies (brain CT or MRI) | |
| Heart failure | I50 | Recording at admission or outpatient clinic ≥1 | |
| Chronic kidney disease | N18, N19 | Recording at admission ≥1 or outpatient clinic ≥2 | eGFR <60 mL/min/1.73 m² |
| End-stage renal disease | N18–N19, Z49, Z94.0, Z99.2 | Dialysis ≥30 days or kidney transplantation during hospitalization | |
ICD-10, International Classification of Diseases, 10th Revision; CT, computed tomography; MRI, magnetic resonance imaging; eGFR, estimated glomerular filtration rate.
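
The type 2 diabetes row in the table above can be read as a rule over one person's records for one year. The Python below is a minimal sketch of that reading, not the authors' implementation. The `PersonYear` record and its field names are hypothetical and do not reflect the actual NHID schema. The sketch assumes that the claims-based criterion and the check-up criterion are alternatives (either one suffices), and that `diagnosis_codes` has already been limited to the principal and 1st to 4th additional diagnoses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# ICD-10 code prefixes for E11-E14, as listed in the table above.
T2DM_PREFIXES = ("E11", "E12", "E13", "E14")
FASTING_GLUCOSE_CUTOFF_MG_DL = 126.0


@dataclass
class PersonYear:
    """Hypothetical one-person, one-year summary; field names are illustrative."""
    # Assumed pre-filtered to principal and 1st-4th additional diagnoses.
    diagnosis_codes: List[str] = field(default_factory=list)
    antidiabetic_prescribed: bool = False
    # None when no health check-up result is available for the year.
    fasting_glucose_mg_dl: Optional[float] = None


def meets_t2dm_definition(record: PersonYear) -> bool:
    """Apply the claims-based or check-up-based criterion (assumed alternatives)."""
    has_code = any(
        code.upper().startswith(T2DM_PREFIXES) for code in record.diagnosis_codes
    )
    claims_based = has_code and record.antidiabetic_prescribed
    checkup_based = (
        record.fasting_glucose_mg_dl is not None
        and record.fasting_glucose_mg_dl >= FASTING_GLUCOSE_CUTOFF_MG_DL
    )
    return claims_based or checkup_based


if __name__ == "__main__":
    print(meets_t2dm_definition(PersonYear(["E11.9"], True)))        # code plus drug
    print(meets_t2dm_definition(PersonYear(["E11.9"], False)))       # code only
    print(meets_t2dm_definition(PersonYear([], False, 131.0)))       # check-up only
```

The dyslipidemia and hypertension rows share this shape, so the same pattern would apply with a different code prefix, drug flag, and check-up threshold.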