Data resource profile: the National Health Insurance Research Database (NHIRD).

Liang-Yu Lin1,2, Charlotte Warren-Gash1, Liam Smeeth1, Pau-Chung Chen2,3,4,5,6.   

Abstract

Electronic health records (EHRs) can provide researchers with extraordinary opportunities for population-based research. The National Health Insurance system of Taiwan was established in 1995 and covers more than 99.6% of the Taiwanese population; this system's claims data are released as the National Health Insurance Research Database (NHIRD). All data from primary outpatient departments and inpatient hospital care settings after 2000 are included in this database. After a change and update in 2016, the NHIRD is maintained and regulated by the Data Science Centre of the Ministry of Health and Welfare of Taiwan. Datasets for approved research are released in three forms: sampling datasets comprising 2 million subjects, disease-specific databases, and full population datasets. These datasets are de-identified and contain basic demographic information, disease diagnoses, prescriptions, operations, and investigations. Data can be linked to government surveys or other research datasets. While only a small number of validation studies with small sample sizes have been undertaken, they have generally reported positive predictive values of over 70% for various diagnoses. Currently, patients cannot opt out of inclusion in the database, although this requirement is under review. In conclusion, the NHIRD is a large, powerful data source for biomedical research.

Keywords:  Database; Electronic health records; Information storage and retrieval; National Health Insurance Research Database; Taiwan

Year:  2018        PMID: 30727703      PMCID: PMC6367203          DOI: 10.4178/epih.e2018062

Source DB:  PubMed          Journal:  Epidemiol Health        ISSN: 2092-7193


INTRODUCTION

The increasing availability, size, and detail of electronic health records (EHRs) offer unprecedented opportunities for research. The advantages of EHRs include increased statistical power, speed, wide breadth, relatively low cost, representative population coverage, completeness of follow-up, and the ability to assess interventions in routine clinical care [1]. Linking EHRs to disease registries and other resources can further extend their utility. Meanwhile, randomised controlled trials (RCTs) control for known and unknown confounding factors; therefore, they are regarded as the gold standard for measuring the efficacy of interventions [2]. However, the true effectiveness of exposures may be influenced by many factors in a real-world setting, leading to a gap between efficacy and effectiveness. Consequently, real-world data, collected in non-RCT settings, are essential to bridging this gap [3]. As an important source of real-world data, EHRs have become a practical tool in medical research. By utilising EHRs, researchers can measure treatment effects, demonstrate trends in disease incidence and prevalence, and further explore possible disease aetiologies. Among national EHR databases all over the world, the National Health Insurance Research Database (NHIRD) of Taiwan is unique. This large database, which contains data from 23 million residents of Taiwan, was previously described by Chen et al. [4]. However, the NHIRD was updated in 2016. This database changed its regulatory administration, was integrated with other datasets for further linkage, and released its full population dataset. The NHIRD now provides greater flexibility for scientific research. In this article, we introduce the latest version of the NHIRD, demonstrate its key features for research, and describe its strengths and weaknesses.

BASIC DATA RESOURCES

National Health Insurance programme of Taiwan

To increase the affordability and accessibility of health care, the Taiwanese government initiated a single-payer health insurance system, known as National Health Insurance (NHI), in 1995. NHI has contracts with most healthcare facilities in Taiwan, and it is mandatory for physicians to upload the claims data from each visit to the National Health Insurance Ministry. Notably, the primary care system in Taiwan is different from that of many other countries. Referrals from general practitioners are not required to receive specialist care; therefore, patients with non-emergency health concerns can either visit local private or public clinics or go directly to specialists at hospital outpatient departments [5]. In 2017, 93% of healthcare facilities in Taiwan were contracted with NHI; the exceptions included some self-pay private clinics [6]. As a programme that provides universal health coverage, NHI covers all necessary medical expenses, including outpatient visits, inpatient care, prescriptions, treatment with traditional Chinese medicine, dental services, operations, and investigations such as X-rays or magnetic resonance imaging. NHI coverage reached 92% when the programme was established; by the end of 2014, NHI covered 99.9% of the Taiwanese population [7,8].

History of insurance data usage and governance

In 2000, the anonymous and encrypted sampling dataset from this national insurance system was first released for use in research, under the regulation and maintenance of the National Health Research Institutes of Taiwan. From 2000 to 2013, the National Health Research Institutes made available to researchers general sampling datasets with 1 million subjects, as well as disease-specific sampling datasets. In 2016, these insurance data were moved to the Data Science Centre of the Ministry of Health and Welfare of Taiwan, where data are regulated and managed by the government [9]. The regulatory structure of the NHIRD is illustrated in Figure 1 [10]. The claims data from the NHI are stored and processed at the Data Science Centre, along with other governmental surveys and datasets. At the Data Science Centre, the NHIRD and other datasets are compared with the Household Registry Record from the Ministry of the Interior for quality control. Variables, such as sex and dates, are examined to ensure accuracy and consistency across different years. All data are de-identified and encrypted to protect participants’ privacy [11].
Figure 1.

Administrative structure of the National Health Insurance Research Database [10].

Research using the National Health Insurance Research Database data

The NHIRD is a powerful resource for observing chronic diseases and assessing the effects of treatments. For instance, hepatitis B virus (HBV) and hepatitis C virus (HCV) infections are relatively prevalent in Taiwan. Previous studies using the NHIRD demonstrated that the use of statins, medicines that lower low-density lipoprotein levels in the blood, was associated with a decreased incidence of hepatocellular carcinoma (HCC) in HBV and HCV patients [8,9]. Another study showed that the use of nucleoside analogues as antiviral treatments for chronic hepatitis B reduced HCC recurrence in HBV patients receiving liver resection [12]. Furthermore, the availability of data linkage makes it possible to conduct population-based studies of rare diseases. Using NHIRD data, Kuo et al. [13] demonstrated an increased heritable risk of systemic lupus erythematosus (SLE) and other autoimmune diseases among families of SLE patients. Since 2014, more than 300 published studies have used NHIRD data each year (Figure 2). To date, over 2,700 peer-reviewed studies have been published using NHIRD data, covering such topics as general medicine, multidisciplinary science, psychiatry, clinical neurology, oncology, and public environmental and occupational health.
Figure 2.

Publications using the National Health Insurance Research Database from 2000 to 2018.

MEASUREMENTS

Practice and patient data

The basic structure of the NHIRD data is shown in Figure 3. The de-identified data contain demographic variables, including the insured person's registration location, sex, and age, as well as investigations, diagnoses, prescriptions, and details of each outpatient visit or inpatient admission. Disease diagnoses are coded using the International Classification of Diseases, Ninth Revision. Each subject in the dataset is coded with an encrypted identifier, which can be used to link future patient data. Detailed laboratory test results and medical notes are not included in this database.
Figure 3.

Data structure of the National Health Insurance Research Database.

Data release

The NHIRD data are released in three forms. The first form is a general dataset containing 2 million patients. These 2 million subjects are drawn from the full database population by stratified random sampling on age, sex, and region of registration. They were sampled at three different time points: 2000, 2005, and 2010 (Supplementary Material 1). Each dataset contains claims data including diagnoses, prescriptions, investigation items, and treatments that the subjects received from 2000 to 2016. For the datasets sampled in 2005 and 2010, two additional datasets are available: from 2005 to 2016 and from 2010 to 2016. In addition to the complete claims data, these sampling datasets also contain data from cause of death datasets, cancer registry datasets, major illness datasets, and hospital information datasets. The general 2-million-patient sampling dataset is considered to be nationally representative. The second form of NHIRD data consists of disease-specific databases. These databases contain the complete claims data of all patients with a certain health condition. For instance, all patients with a diabetes diagnosis from 2002 to 2015 are included in the diabetes database. As of 2018, 13 disease-specific databases are available for research (Table 1). These datasets can also be linked to cancer registry data and cause of death data.
Table 1.

Disease-specific databases of the National Health Insurance Research Database

Database name                        Year                          No. of new cases
Colorectal Cancer Health             2002-2015                     175,405
Breast Cancer Health                 2002-2015                     136,476
Prostate Cancer Health               2002-2015                     53,937
Systemic Lupus Erythematosus Health  2002-2015                     29,637
Hypertension Health                  2002-2015                     3,342,827
Brain Tumour Health                  2002-2015                     10,267
Chronic Kidney Disease Health        2002-2015                     1,066,892
End-Stage Renal Disease Health       2002-2015                     134,228
Diabetes Mellitus Health             2002-2015                     1,720,602
Injury                               2000-2015                     25,925,939
Triple-High*                         2001-2015                     6,558
Disability Process                   1996, 1999, 2003, 2007, 2011  6,935
Maternal and Child Health            2004-2014                     2,171,765

*Triple-High: hypertension, hyperglycaemia, hyperlipidaemia.

The third form of NHIRD data is the full population dataset, which has been available for research since 2016. The full population dataset covers the entire Taiwanese population from 2000 to 2016, which comprises approximately 23 million people. Researchers can apply for complete claims data, including inpatient and outpatient records, investigations, and treatment, which can be linked with hospital information, birth certificate applications, death records, the cancer registry dataset, and the major illness datasets. Furthermore, the full population data can also be linked with individual datasets, a feature that will be introduced later. These released datasets are a valuable source for epidemiological research.
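
The stratified random sampling used to draw the general dataset can be illustrated with a minimal Python sketch. The field names (age_group, sex, region), the toy population, and the choice of drawing the same fraction from every stratum (proportional allocation) are assumptions made for illustration; the source does not describe the exact allocation scheme or software used.

```python
import random
from collections import defaultdict


def stratified_sample(population, strata_keys, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation).

    This is an illustrative assumption, not the documented NHIRD procedure.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        # A stratum is one combination of the stratification variables.
        strata[tuple(person[k] for k in strata_keys)].append(person)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        n = round(len(members) * fraction)
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical toy population of 1,200 people in 12 equally sized strata.
population = [
    {
        "id": i,
        "sex": "F" if i % 2 else "M",
        "age_group": ("0-39", "40-64", "65+")[i % 3],
        "region": ("north", "south")[(i // 6) % 2],
    }
    for i in range(1200)
]

sample = stratified_sample(population, ("age_group", "sex", "region"), fraction=0.1)
```

Because every stratum contributes in proportion to its size, the age, sex, and regional composition of the sample mirrors that of the population it was drawn from.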

Data linkage

Since 2016, under the authorisation and regulation of the Ministry of Health and Welfare, NHIRD data can be more widely linked with other public surveys at the Data Science Centre. These datasets include governmental surveys, disease registries, health surveys, social reporting system data, and welfare registry data. Detailed descriptions of these databases are given in Table 2. These databases and the NHIRD can be linked through an encrypted personal ID using deterministic record linkage. Due to privacy issues, this data linkage can only be processed by researchers at the Data Science Centre. Accessing some sensitive data, such as the domestic violence database, requires special authorisation from other administrative departments. In addition to linking governmental data, with the informed consent of study subjects, researchers are also allowed to link their own research databases with the NHIRD. For instance, the Taiwan Biobank Database is a national cohort containing biological samples and comprehensive examinations of 200,000 adult volunteers that will be linked to the NHIRD [14]. Such data linkages can help researchers discover possible interactions among genes, environmental factors, and diseases.
Table 2.

Databases available for linkage

Name of database                                                                Year
Health data
 Taiwan Cancer Registry                                                         2007-2012
 Cause of death data                                                            1971-2014
 Birth certificate applications                                                 2001-2013
 Traffic accident data                                                          2003-2014
 “Triple-high Status” Survey                                                    2006-2007
 Taiwan Birth Cohort Study                                                      2005
 “Knowledge, Attitude, and Practice of Contraception” Survey                    1965-2008
 Taiwan Youth Health Survey File                                                2006-2010
 Rare disease data                                                              2012
 Artificial reproductive data                                                   1998-2012
 Cancer screening – Pap smear data                                              2004-2013
 Colorectal cancer screening                                                    2010-2013
 Breast cancer screening                                                        2004-2013
 Oral mucosal screening                                                         2010-2013
 Taiwan Healthy Behaviour Risk Factor Surveillance Survey File                  2007-2012
Social surveys
 National Aboriginal Population Profile                                         2006-2012
 Personal data for the sampled NHI claims cohorts                               2000, 2005
 National Health Interview Survey                                               2001-2009
 Taiwan Longitudinal Study on Aging                                             1998-2011
 Taiwan Smoking Behaviour Survey                                                2004-2009
Welfare databases
 The Juvenile Condition Survey in Taiwan-Fuchien Area                           2003
 Report of the Home Care Subsidy User Condition Survey                          2007
 The Satisfaction with Home Care Services Survey                                2011
 The Low-Income and Middle-Income Family Living Condition Survey                2013
 Taiwan Longitudinal Study on Aging                                             2009-2013
 Physically and Mentally Disabled Citizens Living and Demand Assessment Survey  2011
 Single Parent Family Condition Survey                                          2010
 Women’s Living Conditions Survey                                               1998-2011
 Disabled Population Profile                                                    2014
 Low-income and middle-low-income household data                                2014
 Family violence data                                                           2011-2014
 Reported data of protection of children and youths                             2011-2014
 Reported data of sexual assault                                                2011-2014

NHI, National Health Insurance.
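
Deterministic record linkage through an encrypted personal ID, as described in this section, can be sketched in Python as follows. The keyed hash (HMAC-SHA256), the secret key, the field names, and the toy records are all hypothetical; the source does not state which encryption method the Data Science Centre uses. The point of the sketch is only that two datasets pseudonymised with the same key can be joined by exact match without exposing the original ID.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be held only by the data custodian.
SECRET_KEY = b"hypothetical-key-held-by-the-data-centre"


def pseudonymise(personal_id: str) -> str:
    """Replace a personal ID with a keyed hash (an assumed scheme, for illustration)."""
    return hmac.new(SECRET_KEY, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deterministic_link(claims, survey):
    """Join two datasets on exact equality of the encrypted identifier.

    Assumes at most one survey record per person.
    """
    survey_by_id = {row["enc_id"]: row for row in survey}
    linked = []
    for claim in claims:
        match = survey_by_id.get(claim["enc_id"])
        if match is not None:
            extra = {k: v for k, v in match.items() if k != "enc_id"}
            linked.append({**claim, **extra})
    return linked


# Toy records with made-up IDs and values.
claims = [
    {"enc_id": pseudonymise("A123"), "icd9": "250"},
    {"enc_id": pseudonymise("B456"), "icd9": "401"},
]
survey = [
    {"enc_id": pseudonymise("A123"), "smoker": True},
    {"enc_id": pseudonymise("C789"), "smoker": False},
]

linked = deterministic_link(claims, survey)
```

Only the person present in both toy datasets appears in the linked output; records without an exact identifier match are left unlinked, which is the defining behaviour of deterministic (as opposed to probabilistic) linkage.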

Strengths and weaknesses

Strengths

The NHIRD is a nationally representative cohort that contains detailed registry and claims data from all 23 million residents of Taiwan. This huge database provides researchers with powerful and generalisable real-world evidence for biomedical studies. For instance, a molecular epidemiological study suggested that aristolochic acid (AA), an ingredient in Chinese herbal remedies, was associated with HCC in Taiwan and other Asian countries [15]. Similar findings were later obtained using NHIRD data: Chen et al. [16] analysed NHIRD data and found that using Chinese herbs containing AA increased the risk of HCC among patients with HBV infection. In addition, since the update in 2016, the NHIRD can be linked with other datasets, which increases the power and potential to study specific population subgroups, rare conditions, and factors that are not usually contained in clinical databases, such as living conditions, violence, or detailed lifestyle data. Payment and reimbursement data are also valuable for health economic analyses.

Weaknesses

The NHIRD has several limitations. First, the NHIRD lacks comprehensive validation, although some validation studies of the clinical diagnoses in the NHIRD have been conducted. Some of these validation studies used national disease registries as the reference standard, which is the more convincing approach. Other studies used hospital-based records to validate the diagnoses found in the NHIRD and have reported relatively high positive predictive values (over 70%) (Supplementary Material 2). However, the samples of these studies were small and drawn from a limited number of hospitals. Therefore, the samples may not be regarded as nationally representative. To improve the accuracy of the NHIRD, the Ministry of Health and Welfare of Taiwan recently initiated a national validation project using existing registry data [17]. However, until this new evidence of the database’s validity is reported, researchers should interpret results from the NHIRD with care. Second, consent from the participants included in the NHIRD is a controversial issue. By law, all residents in Taiwan are required to have NHI, and their data are included in the NHIRD; there is currently no way for participants to opt out of this national cohort. However, in 2017, the Supreme Administrative Court upheld the legitimacy of using the NHIRD data for research [18]. People’s ability to opt out of inclusion in the NHIRD remains under discussion [19]. Finally, records of self-pay healthcare and out-of-pocket payments, such as for cosmetic surgery, are not included in the NHIRD. This may narrow the scope of research using the NHIRD, and researchers must be aware of the effects of these non-included variables.
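
The positive predictive value reported by such validation studies is the proportion of database diagnoses that are confirmed by the reference standard, that is, true positives divided by all database-flagged cases. The Python sketch below computes it and adds a Wilson score interval to show why small validation samples warrant caution. The counts are invented for illustration and are not taken from any NHIRD validation study.

```python
import math


def positive_predictive_value(true_positives, false_positives):
    """PPV = confirmed cases / all cases flagged by the database."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged cases; PPV is undefined")
    return true_positives / flagged


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical validation result: 85 of 100 database diagnoses confirmed.
ppv = positive_predictive_value(true_positives=85, false_positives=15)

# Same underlying proportion, but very different sample sizes (made-up numbers).
small_study = wilson_interval(17, 20)
large_study = wilson_interval(1700, 2000)
```

With the same observed proportion, the interval from the 20-case study is much wider than the one from the 2,000-case study, which is one way to quantify the concern about small validation samples raised above.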

Data access

Researchers can access NHIRD data after ethical and scientific review processes. Prior to applying, researchers must obtain approval from an institutional review board. Notably, the applicant must be Taiwanese or be affiliated with a Taiwanese research institute. Applicants should submit their research proposal to the Data Science Centre. Proposals should include the specific methods and variables required for the analyses. The cost of accessing data depends on the number of variables requested and the length of time for which the data are required; for example, accessing one variable for 1 year would cost 200 New Taiwan dollars. After receiving an application, the Ministry of Health and Welfare reviews the legitimacy of the proposal, which is later reviewed by a scientific committee consisting of three experts. If one of the committee members disagrees with the proposed use of the data, then the researchers must submit a revised proposal to a higher advisory committee for a second review. After receiving approval, researchers must go to the branches of the Data Science Centre to perform their data analyses. The analyses of NHIRD data are complicated, and there is no structured training course for using the NHIRD. Therefore, a mock dataset containing 100,000 subjects is provided by the Data Science Centre to help researchers write their statistical analysis syntax. When researchers enter the Data Science Centre, they are allowed to use the provided computers and software, including SAS, Stata, R, and SPSS, to conduct their data analyses [20].

Ethics and confidentiality

Ethical review board approval is mandatory when applying to use NHIRD data. There are 27 institutional review boards capable of issuing approvals, and all are supervised and regulated by the Ministry of Health and Welfare [21]. To protect individuals’ confidentiality, all datasets in the Data Science Centre are pseudonymised. Personal IDs, birth dates, and names are encrypted, and this de-identification process was approved by an independent third-party organisation [20]. To further secure the participants’ privacy, NHIRD datasets cannot be accessed outside the Data Science Centre, meaning that researchers must analyse these datasets on site. When accessing the Data Science Centre, researchers are not allowed to bring any recording devices, including paper and pen. In addition, their statistical analysis syntax needs to be reviewed by the Data Science Centre prior to using the computers and software provided. The analysed results are also examined by the Data Science Centre before they are exported. To prevent re-identification, results based on fewer than 3 subjects cannot be exported [22].
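
The export rule for small cells can be expressed as a simple check on a table of counts before release. The Python sketch below is a hypothetical illustration of such a check, not the Data Science Centre's actual review procedure; the cell labels and counts are invented, and the treatment of zero counts as suppressed is an assumption.

```python
# Threshold taken from the rule stated above: fewer than 3 subjects may not be exported.
MIN_CELL_SIZE = 3


def suppress_small_cells(counts, threshold=MIN_CELL_SIZE):
    """Return a copy of the table with every cell below the threshold masked as None.

    Zero counts are masked as well; whether the real rule does so is an assumption.
    """
    return {
        cell: (n if n >= threshold else None)
        for cell, n in counts.items()
    }


# Hypothetical cross-tabulation of exposure by outcome.
raw_counts = {
    ("exposed", "event"): 42,
    ("exposed", "no event"): 1310,
    ("unexposed", "event"): 2,
    ("unexposed", "no event"): 980,
}

exportable = suppress_small_cells(raw_counts)
```

In this toy table only the cell with 2 subjects is masked. A real disclosure review would also need to consider whether a masked cell can be recovered from row or column totals.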

CONCLUSION

The NHIRD of Taiwan contains a large quantity of claims data and has the potential for multiple data linkages. Although more validation research is needed, and regulatory work to protect privacy is ongoing, this nationwide cohort is a valuable resource for medical research.
  6 in total

1.  Bridging the efficacy-effectiveness gap: a regulator's perspective on addressing variability of drug response.

Authors:  Hans-Georg Eichler; Eric Abadie; Alasdair Breckenridge; Bruno Flamion; Lars L Gustafsson; Hubert Leufkens; Malcolm Rowland; Christian K Schneider; Brigitte Bloechl-Daum
Journal:  Nat Rev Drug Discov       Date:  2011-07-01       Impact factor: 84.694

2.  Nationwide Population Science: Lessons From the Taiwan National Health Insurance Research Database.

Authors:  Ann W Hsing; John P A Ioannidis
Journal:  JAMA Intern Med       Date:  2015-09       Impact factor: 21.873

3.  Association between nucleoside analogues and risk of hepatitis B virus–related hepatocellular carcinoma recurrence following liver resection.

Authors:  Chun-Ying Wu; Yi-Ju Chen; Hsiu J Ho; Yao-Chun Hsu; Ken N Kuo; Ming-Shiang Wu; Jaw-Town Lin
Journal:  JAMA       Date:  2012-11-14       Impact factor: 56.272

4.  Herbal medicine containing aristolochic acid and the risk of hepatocellular carcinoma in patients with hepatitis B virus infection.

Authors:  Chi-Jen Chen; Yao-Hsu Yang; Meng-Hung Lin; Chuan-Pin Lee; Yu-Tse Tsan; Ming-Nan Lai; Hsiao-Yu Yang; Wen-Chao Ho; Pau-Chung Chen
Journal:  Int J Cancer       Date:  2018-05-07       Impact factor: 7.396

5.  Aristolochic acids and their derivatives are widely implicated in liver cancers in Taiwan and throughout Asia.

Authors:  Alvin W T Ng; Song Ling Poon; Mi Ni Huang; Jing Quan Lim; Arnoud Boot; Willie Yu; Yuka Suzuki; Saranya Thangaraju; Cedric C Y Ng; Patrick Tan; See-Tong Pang; Hao-Yi Huang; Ming-Chin Yu; Po-Huang Lee; Sen-Yung Hsieh; Alex Y Chang; Bin T Teh; Steven G Rozen
Journal:  Sci Transl Med       Date:  2017-10-18       Impact factor: 17.956

6.  Familial Aggregation of Systemic Lupus Erythematosus and Coaggregation of Autoimmune Diseases in Affected Families.

Authors:  Chang-Fu Kuo; Matthew J Grainge; Ana M Valdes; Lai-Chu See; Shue-Fen Luo; Kuang-Hui Yu; Weiya Zhang; Michael Doherty
Journal:  JAMA Intern Med       Date:  2015-09       Impact factor: 21.873
