HeeChel Kim, Hong-Woo Chun, Seonho Kim, Byoung-Youl Coh, Oh-Jin Kwon, Yeong-Ho Moon.
Abstract
The issue of public health in Korea has attracted significant attention given the aging of the country's population, which has created many types of social problems. The approach proposed in this article aims to address dementia, one of the most significant symptoms of aging and a public health care issue in Korea. The Korean National Health Insurance Service Senior Cohort Database contains personal medical data of every citizen in Korea. Medical history patterns differ in many ways between individuals with dementia and normal controls. The approach used in this study involved examination of personal medical history features from personal disease history, sociodemographic data, and personal health examinations to develop a prediction model. The prediction model used a support-vector machine learning technique and was evaluated with 10-fold cross-validation. The experimental results demonstrated promising performance (80.9% F-measure). The proposed approach supported the significant influence of personal medical history features during an optimal observation period. It is anticipated that a biomedical "big data"-based disease prediction model may help diagnose diseases more accurately.
Keywords: aging; big data; dementia; machine learning; public health; support vector machine
Year: 2017 PMID: 28867810 PMCID: PMC5615520 DOI: 10.3390/ijerph14090983
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
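The abstract states that the model was evaluated with 10-fold cross-validation. The sketch below illustrates only the fold-partitioning step. The stratification by class, the random seed, and the function name `stratified_k_fold` are assumptions made for illustration, not details taken from the article, and the classifier itself is omitted.

```python
import random
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[Hashable], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Indices of each class are shuffled and dealt round-robin into the folds,
    so every fold keeps roughly the same class balance as the full data set.
    """
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        )
        yield train, test


# Hypothetical balanced label list: 1 = dementia group, 0 = normal controls.
example_labels = [1] * 850 + [0] * 850
example_folds = list(stratified_k_fold(example_labels, k=10))
```

In each round a classifier would be fitted on the training indices and scored on the held-out fold, and the ten scores would then be averaged.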
Figure 1. Dementia prediction process. DB, database; ML, machine learning.
Figure 2. Personal medical data collection of Korea National Health Insurance Service (KNHIS) cooperating with other institutions.
Comparison of the composition ratios of the KNHIS-SC DB and the resident registration population by sex, age, and region (as of 2013).
| Item | Classification | KNHIS-SC DB (%) | Population Statistics Based on Resident Registration (%) |
|---|---|---|---|
| Sex | Male | 38 | 39 |
| | Female | 62 | 61 |
| | Sum | 100 | 100 |
| Age | 70–74 | 36 | 42 |
| | 75–79 | 32 | 30 |
| | 80+ | 32 | 28 |
| | Sum | 100 | 100 |
| Region | Seoul | 18 | 18 |
| | Gyeonggi-do | 21 | 23 |
| | Gyeongsang-do | 29 | 28 |
| | Jeolla-do | 15 | 14 |
| | Gangwon-do | 4 | 4 |
| | Chungcheong-do | 12 | 12 |
| | Jeju Island | 1 | 1 |
| | Sum | 100 | 100 |
KNHIS-SC DB population.
| Year | Total | Age 60–64 | Age 65–69 | Age 70–74 | Age 75–79 | Age 80–85 | Age 85+ |
|---|---|---|---|---|---|---|---|
| 2002 | 558,147 | 196,116 | 147,361 | 97,657 | 61,217 | 35,215 | 20,581 |
| 2003 | 557,195 | 161,261 | 157,155 | 105,339 | 66,978 | 41,351 | 25,111 |
| 2004 | 539,278 | 122,144 | 164,629 | 111,768 | 71,595 | 42,769 | 26,373 |
| 2005 | 521,967 | 85,776 | 167,293 | 120,415 | 76,092 | 44,336 | 28,055 |
| 2006 | 504,417 | 46,113 | 173,449 | 128,909 | 80,206 | 45,795 | 29,945 |
| 2007 | 487,460 | - | 186,057 | 135,631 | 84,926 | 47,983 | 32,863 |
| 2008 | 470,005 | - | 151,331 | 142,687 | 89,766 | 50,578 | 35,643 |
| 2009 | 452,631 | - | 114,559 | 149,864 | 96,030 | 54,763 | 37,415 |
| 2010 | 436,395 | - | 80,560 | 153,026 | 104,029 | 58,905 | 39,875 |
| 2011 | 422,171 | - | 43,571 | 160,221 | 112,412 | 63,101 | 42,866 |
| 2012 | 405,614 | - | - | 172,606 | 118,931 | 67,529 | 46,548 |
| 2013 | 388,493 | - | - | 140,503 | 125,795 | 71,993 | 50,202 |
Figure 3. Distribution of the Korea National Health Insurance Service Senior Cohort (KNHIS-SC) database, as of 2013. DM, dementia group; NC, normal controls.
KNHIS-SC DB contents.
| Contents |
|---|
| Demographic information (sex, age, area of residence) |
| Death-related information (date of death, cause of death) |
| Types of health insurance (health insurance subscribers/medical benefits) |
| Socio-economic level and other information (income quintile, disability registration information) |
| Medical institution information |
| Medical care benefit costs |
| Information on medical subjects and medical diseases |
| Details of medical examination, treatment, surgery and other acts, treatment materials, etc. |
| Detailed disease history |
| Inpatient/outpatient drug prescriptions |
| Major test results such as body measurement, blood test |
| Results of interview about history, lifestyle |
| Balance, bone density test, depression, cognitive function test result |
| Medical utilization, medical institution type and establishment division, medical institution, local information |
| Information on the number of beds, doctors, equipment, etc. |
| Long-term care application and judgment result |
| Doctor’s note |
| Billing statement |
| Basic information on long-term care facilities |
Normal/abnormal criteria of GHE-DB features.
| No. | Feature | Normal | Abnormal |
|---|---|---|---|
| 1 | Body mass index (kg/m²) | 0–29 | 30–300 |
| 2 | Waist circumference (cm) | Male: 50–90, Female: 50–85 | Male: 90–130, Female: 85–130 |
| 3 | Blood pressure highest (mmHg) | 60–139 | 140–400 |
| 4 | Blood pressure lowest (mmHg) | 40–89 | 90–250 |
| 5 | Blood glucose before meals (mg/dL) | 25–125 | 126–999 |
| 6 | Total cholesterol (mg/dL) | 40–229 | 230–999 |
| 7 | Hemoglobin (g/dL) | Male: 12–16.5, Female: 10–15.5 | Male: 0–12, Female: 0–10 |
| 8 | Urine protein | Negative | Positive |
| 9 | Serum GOT (U/L) | 0–50 | 51–999 |
| 10 | Serum GPT (U/L) | 0–45 | 46–999 |
| 11 | Gamma GTP (U/L) | Male: 11–77, Female: 8–45 | Male: 78–999, Female: 46–999 |
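The normal/abnormal criteria above amount to a range lookup per feature. The following is a minimal sketch of such a lookup for the numeric features. The feature keys, the `classify` function, and the handling of edge cases are assumptions: a value on a boundary shared by both ranges is treated as normal, and a value that falls in neither listed range returns `None`. Urine protein is categorical and is left out.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]

# (normal, abnormal) ranges transcribed from the table above, in the table's units.
# Keys are hypothetical feature names; "M"/"F" hold sex-specific ranges,
# "any" applies to both sexes.
CRITERIA: Dict[str, Dict[str, Tuple[Range, Range]]] = {
    "bmi": {"any": ((0, 29), (30, 300))},
    "waist": {"M": ((50, 90), (90, 130)), "F": ((50, 85), (85, 130))},
    "bp_high": {"any": ((60, 139), (140, 400))},
    "bp_low": {"any": ((40, 89), (90, 250))},
    "glucose": {"any": ((25, 125), (126, 999))},
    "cholesterol": {"any": ((40, 229), (230, 999))},
    "hemoglobin": {"M": ((12, 16.5), (0, 12)), "F": ((10, 15.5), (0, 10))},
    "got": {"any": ((0, 50), (51, 999))},
    "gpt": {"any": ((0, 45), (46, 999))},
    "ggt": {"M": ((11, 77), (78, 999)), "F": ((8, 45), (46, 999))},
}


def classify(feature: str, value: float, sex: str = "any") -> Optional[str]:
    """Return "normal", "abnormal", or None if the value is in neither range."""
    by_sex = CRITERIA[feature]
    ranges = by_sex.get("any") or by_sex.get(sex)
    if ranges is None:
        raise ValueError(f"{feature!r} needs sex 'M' or 'F', got {sex!r}")
    (normal_lo, normal_hi), (abnormal_lo, abnormal_hi) = ranges
    # The normal range is checked first, so shared boundaries count as normal.
    if normal_lo <= value <= normal_hi:
        return "normal"
    if abnormal_lo <= value <= abnormal_hi:
        return "abnormal"
    return None


systolic_class = classify("bp_high", 150)            # "abnormal"
hemoglobin_class = classify("hemoglobin", 11, "F")   # "normal"
```

Returning `None` rather than forcing a label keeps the sketch faithful to the table, which does not say how values outside both ranges were handled.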
Known dementia-related diseases.
| Disease Classification | Disease |
|---|---|
| | (1) Parkinson’s disease, (2) Huntington’s disease, (3) Pick’s disease, (4) Progressive palsy, (5) Multiple system atrophy, (6) Genetic disorder, (7) Motor neuron disease, (8) Multiple sclerosis |
| | (1) Thiamine (B1): Wernicke’s encephalopathy, (2) Vitamin B12: Pernicious anemia, (3) Nicotinic acid: Pellagra |
| | (1) Hypothyroidism, (2) Lack of adrenal function and Cushing’s syndrome, (3) Hypothyroidism and hypertrophy, (4) Loss of renal function, (5) Liver failure, (6) Loss of lung function |
| | (1) Primary brain tumor, (2) Paraneoplastic limbic encephalitis, (3) Metastatic brain tumor |
| | (1) Human immunodeficiency virus, (2) Neurosyphilis, (3) Parvovirus, (4) Prion disease, (5) Tuberculosis, (6) Fungi, (7) Protozoa, (8) Sarcoidosis, (9) Whipple’s disease |
| | (1) Chronic subdural hematoma, (2) Anoxic syndrome, (3) Encephalitis, (4) Normal pressure hydrocephalus |
| | (1) Drug and drug addiction, (2) Alcoholism, (3) Heavy metal poisoning, (4) Organic toxins |
| | (1) Depression, (2) Schizophrenia, (3) Conversion reaction |
| | (1) Vasculitis, (2) CADASIL, (3) Acute intermittent porphyria, (4) Repeated seizures |
Figure 4. Data sampling process.
Features of baseline experiment.
| Year | DB | Features |
|---|---|---|
| 2013 | PIE-DB | (1) Sex, (2) Age, (3) Income quintile |
| 2013 | GHE-DB | (1) Height, (2) Weight, (3) Body mass index, (4) Waist, (5) Blood pressure highest, (6) Blood pressure lowest, (7) Blood sugar before meals, (8) Total cholesterol, (9) Hemoglobin, (10) Urine protein, (11) Serum GOT, (12) Serum GPT, (13) Gamma GTP, (14) History of personal illness: Stroke, Heart disease, High blood pressure, Diabetes, Hyperlipidemia, Phthisis, Cancer, (15) History of family illness: Stroke, Heart disease, High blood pressure, Diabetes, Cancer |
| 2013 | MT-DB | Personal disease history (diagnoses by year) |
Features of longitudinal data-based experiment set.
| DB | Features (in addition to the baseline features) |
|---|---|
| PIE-DB | Increase/decrease compared to 2013 [Income quintile] |
| PIE-DB | Class change compared to 2013 [Income quintile] |
| GHE-DB | Increase/decrease compared to 2013 [(1) Height, (2) Weight, (3) Body mass index, (4) Waist, (5) Blood pressure highest, (6) Blood pressure lowest, (7) Blood sugar before meals, (8) Total cholesterol, (9) Hemoglobin, (10) Urine protein, (11) Serum GOT, (12) Serum GPT, (13) Gamma GTP, (14) History of personal illness: Stroke, Heart disease, High blood pressure, Diabetes, Hyperlipidemia, Phthisis, Cancer, (15) History of family illness: Stroke, Heart disease, High blood pressure, Diabetes, Cancer] |
| GHE-DB | Class change compared to 2013 [(1) Body mass index, (2) Waist, (3) Blood pressure highest, (4) Blood pressure lowest, (5) Blood sugar before meals, (6) Total cholesterol, (7) Hemoglobin, (8) Urine protein, (9) Serum GOT, (10) Serum GPT, (11) Gamma GTP, (12) History of personal illness: Stroke, Heart disease, High blood pressure, Diabetes, Hyperlipidemia, Phthisis, Cancer, (13) History of family illness: Stroke, Heart disease, High blood pressure, Diabetes, Cancer] |
| MT-DB | Personal disease history (diagnoses by year) |
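The longitudinal features compare each earlier year with 2013 in two ways: whether a value went up or down, and whether its class changed. A minimal sketch of how such features could be derived is given below. The data layout, the feature naming scheme, and the sign convention (an earlier value above the 2013 value yields +1) are all assumptions, since the table does not specify them.

```python
from typing import Dict, Tuple

# One year's record: feature name -> (measured value, class label).
Record = Dict[str, Tuple[float, str]]


def longitudinal_features(
    history: Dict[int, Record], reference_year: int = 2013
) -> Dict[str, int]:
    """Derive trend and class-change features relative to the reference year.

    For every non-reference year and every feature also present in the
    reference year, two integer features are produced:
      <name>_<year>_trend          +1 / 0 / -1 (above / equal / below reference)
      <name>_<year>_class_changed  1 if the class label differs, else 0
    """
    reference = history[reference_year]
    features: Dict[str, int] = {}
    for year in sorted(history):
        if year == reference_year:
            continue
        for name, (value, label) in history[year].items():
            if name not in reference:
                continue  # nothing to compare against
            ref_value, ref_label = reference[name]
            features[f"{name}_{year}_trend"] = (value > ref_value) - (value < ref_value)
            features[f"{name}_{year}_class_changed"] = int(label != ref_label)
    return features


# Hypothetical two-year history for one person.
example_history = {
    2011: {"bp_high": (150, "abnormal"), "cholesterol": (200, "normal")},
    2013: {"bp_high": (135, "normal"), "cholesterol": (200, "normal")},
}
example_features = longitudinal_features(example_history)
```

Each observation period in the experiments would then correspond to restricting `history` to the years in that period before deriving the features.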
Longitudinal model 1 (All MT-DB features with primary disease group).
| Measure | Baseline, 2013 | Model 1, 2003–2013 | Model 1, 2005–2013 | Model 1, 2007–2013 | Model 1, 2009–2013 | Model 1, 2011–2013 |
|---|---|---|---|---|---|---|
| No. of features | 55 | 366 | 314 | 262 | 210 | 158 |
| True positives | 614 | 638 | 613 | 630 | 648 | 648 |
| False positives | 317 | 285 | 280 | 282 | 274 | 292 |
| True negatives | 533 | 565 | 570 | 568 | 576 | 558 |
| False negatives | 236 | 212 | 237 | 220 | 202 | 202 |
| Accuracy (%) | 67.5 | 70.8 | 69.6 | 70.5 | 72.0 | 70.9 |
| Precision (%) | 66.0 | 69.1 | 68.6 | 69.1 | 70.3 | 68.9 |
| Recall (%) | 72.2 | 75.1 | 72.1 | 74.1 | 76.2 | 76.2 |
| F-measure (%) | 69.0 | 72.0 | 70.3 | 71.5 | 73.1 | 72.4 |
Longitudinal model 2 (Best combination of MT-DB features).
| Measure | Baseline, 2013 | Model 2, 2003–2013 | Model 2, 2005–2013 | Model 2, 2007–2013 | Model 2, 2009–2013 | Model 2, 2011–2013 |
|---|---|---|---|---|---|---|
| No. of features | 55 | 709 | 559 | 409 | 259 | 113 |
| True positives | 614 | 623 | 625 | 633 | 619 | 611 |
| False positives | 317 | 78 | 79 | 82 | 69 | 67 |
| True negatives | 533 | 772 | 771 | 768 | 781 | 783 |
| False negatives | 236 | 227 | 225 | 217 | 231 | 239 |
| Accuracy (%) | 67.5 | 82.1 | 82.1 | 82.4 | 82.4 | 82.0 |
| Precision (%) | 66.0 | 88.9 | 88.8 | 88.5 | 90.0 | 90.1 |
| Recall (%) | 72.2 | 73.3 | 73.5 | 74.5 | 72.8 | 71.9 |
| F-measure (%) | 69.0 | 80.3 | 80.4 | 80.9 | 80.5 | 80.0 |
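The percentages in the two result tables can be reproduced from the four count rows, reading them in order as true positives, false positives, true negatives, and false negatives. That reading is an interpretation, although it matches every reported percentage checked here. The sketch below recomputes the metrics for the baseline column and for the 2007–2013 column of Longitudinal Model 2; the function name `metrics` is hypothetical.

```python
from typing import Dict


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure in percent, rounded to 0.1."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": round(100 * accuracy, 1),
        "precision": round(100 * precision, 1),
        "recall": round(100 * recall, 1),
        "f_measure": round(100 * f_measure, 1),
    }


# Counts copied from the baseline column (2013).
baseline = metrics(tp=614, fp=317, tn=533, fn=236)
# -> accuracy 67.5, precision 66.0, recall 72.2, f_measure 69.0

# Counts copied from Longitudinal Model 2, 2007–2013.
best = metrics(tp=633, fp=82, tn=768, fn=217)
# -> accuracy 82.4, precision 88.5, recall 74.5, f_measure 80.9
```

The second result is the 80.9% F-measure quoted in the abstract. In every column the counts sum to 1700, with 850 cases in each class.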
Best feature combination.
| DB | Primary Disease Group | Type | Features |
|---|---|---|---|
| MT-DB | E; Endocrine, nutritional and metabolic diseases (8) | Known | (1) Other disorders of pancreatic internal secretion, (2) Vitamin D deficiency, (3) Other disorders of thyroid, (4) Malnutrition-related diabetes mellitus |
| | | Newly detected | (1) Hyperfunction of pituitary gland, (2) Hypofunction and other disorders of pituitary gland, (3) Other disorders of adrenal gland, (4) Unspecified protein-energy malnutrition |
| MT-DB | F; Mental and behavioural disorders (13) | Known | (1) Dementia in Alzheimer’s disease, (2) Vascular dementia, (3) Mental and behavioural disorders due to use of alcohol, (4) Acute and transient psychotic disorders, (5) Unspecified nonorganic psychosis, (6) Unspecified dementia, (7) Bipolar affective disorder, (8) Depressive episode, (9) Delirium, not induced by alcohol and other psychoactive substances, (10) Eating disorders, (11) Psychological and behavioural factors associated with disorders or diseases classified elsewhere, (12) Other mental disorders due to brain damage and dysfunction and to physical disease, (13) Schizophrenia |
| MT-DB | G; Diseases of the nervous system (17) | Known | (1) Parkinson’s disease, (2) Secondary parkinsonism, (3) Parkinsonism in diseases classified elsewhere, (4) Alzheimer’s disease, (5) Other degenerative diseases of nervous system NEC, (6) Epilepsy, (7) Status epilepticus, (8) Transient cerebral ischaemic attacks and related syndromes, (9) Vascular syndromes of brain in cerebrovascular diseases, (10) Disorders of other cranial nerves, (11) Hemiplegia, (12) Paraplegia and tetraplegia, (13) Other paralytic syndromes, (14) Hydrocephalus, (15) Other disorders of brain, (16) Other disorders of nervous system, NEC, (17) Other disorders of nervous system in diseases classified elsewhere |
| MT-DB | I; Diseases of the circulatory system (7) | Known | (1) Hypertensive renal disease, (2) Subsequent myocardial infarction, (3) Cerebral infarction, (4) Cerebrovascular disorders in diseases classified elsewhere, (5) Sequelae of cerebrovascular disease, (6) Aortic aneurysm and dissection, (7) Stroke, not specified as haemorrhage or infarction |
| MT-DB | N; Diseases of the genitourinary system (8) | Known | (1) Acute nephritic syndrome, (2) Chronic kidney disease, (3) Glomerular disorders in diseases classified elsewhere |
| | | Newly detected | (1) Calculus of lower urinary tract, (2) Urethral stricture, (3) Other disorders of male genital organs, (4) Inflammatory disease of uterus, except cervix, (5) Polyp of female genital tract |
| MT-DB | M; Diseases of the musculoskeletal system and connective tissue (3) | Newly detected | (1) Kyphosis and lordosis, (2) Spinal osteochondrosis, (3) Psoriatic and enteropathic arthropathies |
| MT-DB | R; Symptoms, signs and abnormal clinical and laboratory findings, NEC (13) | Known | (1) Faecal incontinence, (2) Abnormalities of gait and mobility, (3) Unspecified urinary incontinence, (4) Somnolence, stupor and coma, (5) Other symptoms and signs involving cognitive functions and awareness, (6) Other symptoms and signs involving general sensations and perceptions, (7) Symptoms and signs involving appearance and behavior |
| | | Newly detected | (1) Ascites, (2) Retention of urine, (3) Voice disturbances, (4) Malaise and fatigue, (5) Enlarged lymph nodes, (6) Systemic Inflammatory Response Syndrome |
| MT-DB | S; Injury, poisoning and certain other consequences of external causes (6) | Known | (1) Fracture of skull and facial bones, (2) Open wound of thorax, (3) Injury of other and unspecified intrathoracic organs, (4) Open wound of forearm, (5) Fracture at wrist and hand level, (6) Injury of muscle and tendon at hip and thigh level |
| GHE-DB | - | - | (1) Total cholesterol, (2) Hemoglobin, (3) Serum GOT, (4) Serum GPT, (5) Gamma GTP |