Kyuhee Lee1,2, Jinhyong Lee2, Sangwon Hwang1,2, Youngtae Kim2, Yeongjae Lee2, Erdenebayar Urtnasan1,2, Sang Baek Koh3, Hyun Youk2,4.
Abstract
We propose a method for data provision, validation, and service expansion to support the spread of a lifelog-based digital healthcare platform. The platform is an operational cloud-based system, implemented in 2020, for which a tool has been launched that can validate and de-identify personal information in a data acquisition system dedicated to a center. The data acquired by the platform can be processed into products of statistical analysis and artificial intelligence (AI)-based deep learning modules. Application programming interfaces (APIs) were developed to open the data so that they can be linked programmatically. Under a standardized policy, a series of procedures was performed from data collection through external sharing. The platform collected 321.42 GB of data covering 146 data types. The reliability and consistency of the data were evaluated by an information system audit institution, with a defects ratio of approximately 0.03%. We present definitions and examples of the APIs developed in 17 functional units for data opening. In addition, the suitability of the de-identification tool was confirmed by evaluating the reduction in re-identification risk using quasi-identifiers. We present specific methods for data verification, personal information de-identification, and service provision to ensure the sustainability of future digital healthcare platforms for precision medicine. Building on data reliability, the platform can contribute to the diffusion of digital healthcare by linking data with external organizations and research environments in safe zones.
Keywords: diffusion of digital healthcare; digital healthcare; healthcare platform; precision medicine
Year: 2022 PMID: 35629225 PMCID: PMC9147795 DOI: 10.3390/jpm12050803
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Lifelog-based digital healthcare platform.
Example of an API for product search.
| Section | Field Name | Data Type | Description | Input |
|---|---|---|---|---|
| Header | X-CKAN-API-Key | string | Information retrieval with the authentication key | Authentication ID |
| Parameters | q | string | Inquiry by adding conditions for each column | "fields_name":value |
| | fq | list | Applying filters per column | "fields_name":value |
| | sort | string | Sorting the result | "sort":"score desc, metadata_modified desc" |
| | rows | int *a | The number of rows in the query result | The number of lists to be displayed |
| | start | int | The page in the result | The page number to be displayed |
| | include_private | bool | Whether to retrieve private datasets | "include_private":true |
| | use_default_schema | bool | Use of the default schema | "use_default_schema":true |
| | include_drafts | bool | Retrieval of draft data | "include_drafts":true |
| Result | success | bool | Success or failure of API call | res["success"] |
| | result | dict *b | Retrieved results | res["result"] |
| | result.count | int | The number of data in the result | res["result"]["count"] |
| | result.search_facets | dict | The number of retrieved information by conditions | res["result"]["search_facets"] |
| | result.result | int | The item list in result | res["result"]["result"] |
*a: integer, *b: dictionary (data structure).
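As a minimal sketch of how the fields in this table could be combined into a request, the Python snippet below builds (but does not send) a product search call and reads the result fields from a decoded response. The base URL, the `/api/3/action/` path (the stock CKAN convention), the lowercase `true`/`false` encoding of booleans, and the repeated-parameter encoding of the `fq` list are assumptions, as are the helper names and the sample response. Note that stock CKAN treats `start` as a row offset, whereas the table describes it as a page; the sketch passes the value through unchanged.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; the source does not give the platform's address.
BASE_URL = "https://ckan.example.org"


def build_package_search(api_key, q="*:*", fq=None, sort=None, rows=10, start=0,
                         include_private=False, include_drafts=False,
                         use_default_schema=False):
    """Build a GET request for package_search without sending it."""
    params = {
        "q": q,
        "rows": rows,
        "start": start,
        # Booleans are sent as lowercase strings (an assumption)
        "include_private": str(include_private).lower(),
        "include_drafts": str(include_drafts).lower(),
        "use_default_schema": str(use_default_schema).lower(),
    }
    if fq:
        params["fq"] = list(fq)  # encoded as repeated fq=... pairs (an assumption)
    if sort:
        params["sort"] = sort
    url = f"{BASE_URL}/api/3/action/package_search?{urlencode(params, doseq=True)}"
    return Request(url, headers={"X-CKAN-API-Key": api_key}, method="GET")


def summarize(res):
    """Read the 'Result' fields of the table from a decoded JSON response."""
    if not res["success"]:
        raise RuntimeError("API call failed")
    result = res["result"]
    return result["count"], result["result"]


req = build_package_search(
    "my-key",
    q="title:ECG",
    fq=["organization:wonju"],
    sort="score desc, metadata_modified desc",
    rows=5,
)

# Made-up response, shaped after the 'Result' rows of the table
sample = {
    "success": True,
    "result": {"count": 1, "search_facets": {}, "result": [{"name": "ecg-12-lead"}]},
}
count, items = summarize(sample)
```

Building the request separately from sending it keeps the parameter handling easy to inspect; passing `req` to `urllib.request.urlopen` would issue the call against a real server.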
The dataset list of medical data centers.
| Data Centers | Data Sources | 2020 Cases | 2020 Capacity (GB) | 2021 Cases | 2021 Capacity (GB) |
|---|---|---|---|---|---|
| Yonsei Wonju Health System | Metabolic syndrome’s lifelog | 53,210 | 4.50 | 67,695 | 7.76 |
| | 12-lead ECG | 40,642 | 1.23 | 2,541,855 | 1.90 |
| | Cohort study | 11,364 | <0.01 | 82,635 | 0.01 |
| | Diabetic patient’s lifelog | - | - | 123,194 | 4.52 |
| | COPD patient’s lifelog | - | - | 358 | 0.04 |
| | Integration data | - | - | 800 | <0.01 |
| Korea University Medicine | CDM data | 849,210,000 | 94.40 | 36,251,389 | 38.33 |
| | inPHR data | 70,000 | <0.01 | - | - |
| | CDM extension data | - | - | 1,505,000 | 0.02 |
| Kangwon National University Hospital | Lifelog data | 625,846 | 0.19 | 56,639,191 | 86.39 |
| | Clinical information data | 6,683,156 | 2.00 | 2,619,241,352 | 0.24 |
| | Clinical support data | 9,179,470 | 2.80 | 7,765,402,405 | 17.98 |
| | Health insurance and other data | 16,513,279 | 5.00 | 1,483,360,512 | 27.58 |
| | Clinical and lifelog data of newcomers | 40,000 | <0.01 | 369,460 | <0.01 |
| | Nutritional images | 5,000 | 10.00 | 25,000 | 0.07 |
| | Diabetic patient’s lifelog | - | - | 138,529 | <0.01 |
| | Newcomers’ data | - | - | 9,045,808 | 0.07 |
| | Visit and health checkup data | - | - | 511,632 | 0.05 |
| | Cohort’s clinical data | - | - | 1,179,569,762 | 6.00 |
| Hallym University Chuncheon | Smart health data in Kangwon | 45,600 | 2.31 | 1,390,391 | 17.93 |
| | Healthy life data in Inje-Yangu | 68,500 | 3.89 | 1,598,230 | 28.73 |
| | Healthy life data in Seoul | 75,600 | 2.35 | 1,202,102 | 13.70 |
| | Chatbot data for dementia | 500 | 1.17 | 9,277 | 8.75 |
| | Mild cognitive disorder | - | - | 320 | 0.04 |
| | Telemedicine services | - | - | 22,245 | 37.59 |
| | Dementia data | - | - | 80 | <0.01 |
| The Korean Audiological Society | Auditory test data | 19,000 | <0.01 | 56,400 | 0.02 |
The dataset list of lifelog data centers.
| Data Centers | Data Sources | 2020 Cases | 2020 Capacity (GB) | 2021 Cases | 2021 Capacity (GB) |
|---|---|---|---|---|---|
| Bagel labs | Morphotype data | 206,000 | 0.02 | 167,000 | 0.04 |
| | Morphotype analysis data | 298,000 | 0.03 | 247,000 | 0.08 |
| Huray Positive | Self-recorded data | 664,130 | 0.01 | 301,988 | <0.01 |
| | Intervention data | 2,781 | <0.01 | 1,769 | <0.01 |
| Goodoc | Medical service data | 6,562,939 | 1.37 | 11,642,068 | 2.10 |
| | Registry service data | 8,170,880 | 0.64 | 10,722,939 | 1.73 |
| | Medical consulting data | 7,034,037 | 0.64 | 11,632,131 | 1.86 |
| | Insurance service data | 105 | <0.01 | 115 | <0.01 |
| | Vaccination | - | - | 3,953 | 0.02 |
| K-weather | Life-air data for house | 76,039,210 | 0.37 | 222,370,000 | 2.39 |
| | Life-air data for school | 110,532,228 | 0.55 | 173,980,000 | 2.58 |
| | Life-air data for crowd facilities | 14,331,629 | 0.07 | 44,400,000 | 0.63 |
| | Health environment index | 432,960 | 0.01 | 1,050,000 | 0.05 |
| | Lifelog data of a vulnerable social group | 6,785,432 | 0.03 | 189,110,000 | 2.16 |
| | Clinical trials in Wonju | - | - | 370,530,000 | 4.11 |
| I-SENS | Chronic disease analysis data | 523,504 | 0.05 | 440,158 | 0.10 |
| Healthmax | Metabolic syndrome’s data | 11,207,155 | 0.82 | 4,794,343 | 4.14 |
| LG U Plus * | Lifelog on communication | - | - | 15,597,222 | 1.66 |
| Health Bridge * | Lifelog under stress | - | - | 9,262 | <0.01 |
* Data center newly added in 2021.
Figure 2. Data contribution in cases and capacity.
Defects ratio and Six-Sigma value in data validation.
| Evaluation Factors | 2020 | 2021 |
|---|---|---|
| The number of opportunities | 906,084,543 | 82,727,257,835 |
| The number of defects | 111,704 | 27,203,636 |
| DPO (defects per opportunity) | 1.23 × 10⁻⁴ | 3.28 × 10⁻⁴ |
| DPMO (defects per million opportunities) | 123 | 329 |
| Defects ratio | 0.01% | 0.03% |
| Data consistency | 99.99% | 99.70% |
| Six-Sigma | 5.17 | 4.91 |
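The relationship between the rows of this table can be sketched in a few lines of Python. The snippet derives DPO, DPMO, and a sigma level from the tabulated counts of opportunities and defects. It assumes the conventional 1.5-sigma shift, which the source does not state explicitly but which yields sigma levels close to the tabulated 5.17 and 4.91; the function name is illustrative.

```python
from statistics import NormalDist


def six_sigma_summary(opportunities, defects, shift=1.5):
    """Derive DPO, DPMO, defects ratio (%), and sigma level from raw counts."""
    dpo = defects / opportunities
    dpmo = dpo * 1_000_000
    # Sigma level: z-score of the defect-free yield plus the assumed 1.5-sigma shift
    sigma = NormalDist().inv_cdf(1 - dpo) + shift
    return {"dpo": dpo, "dpmo": dpmo, "defect_ratio_pct": dpo * 100, "sigma": sigma}


# Opportunity and defect counts taken from the table above
counts = {
    2020: (906_084_543, 111_704),
    2021: (82_727_257_835, 27_203_636),
}

for year, (opportunities, defects) in counts.items():
    s = six_sigma_summary(opportunities, defects)
    print(year, f"DPO={s['dpo']:.2e}", f"DPMO={s['dpmo']:.0f}",
          f"defects ratio={s['defect_ratio_pct']:.2f}%", f"sigma={s['sigma']:.2f}")
```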
The result of de-identification.
| Parameters | Description | Records at Risk (%), Before | Records at Risk (%), After | Highest Risk (%), Before | Highest Risk (%), After | Success Risk (%), Before | Success Risk (%), After | De-Identification Method |
|---|---|---|---|---|---|---|---|---|
| WNJU_BLOD_ID | Patient ID | - | - | - | - | - | - | Encryption |
| INDVDL_FLNM | Patient name | - | - | - | - | - | - | Remove |
| BRDT | Birthday | 100 | 0 | 100 | 0.51 | 100 | 0.51 | Masking |
| ADDR | Address | 100 | 0 | 100 | 4 | 100 | 4 | Masking |
| MBL_NO | Mobile | 100 | 0 | 100 | 1.58 | 100 | 1.02 | Masking |
| AGE | Age | 15.30 | 0 | 100 | 5.23 | 16.83 | 3.06 | Interval |
| TC | Total cholesterol | 94.89 | 0 | 100 | 5.26 | 53.57 | 1.53 | Interval |
| ALBMN | Albumin | 100 | 0 | 100 | 5.32 | 92.34 | 1.57 | Interval |
| AST | AST | 17.85 | 0 | 100 | <0.1 | 19.89 | <0.1 | Interval |
| ALT | ALT | 40 | 0 | 100 | 0.22 | 27.04 | <0.1 | Interval |
| GGTP | γ-GTP | 100 | 0 | 100 | 0.51 | 97.95 | 0.51 | Interval |
| LDL | LDL | 97.44 | 0 | 100 | 0.51 | 53.06 | 0.51 | Interval |
| HDL | HDL | 30.61 | 0 | 100 | 20 | 23.97 | 2.04 | Interval |
| Cr | Creatinine | 100 | 0 | 100 | 8.33 | 97.96 | 2.55 | Interval |
| BUN | Blood urea nitrogen | 86.73 | 0 | 100 | 16.66 | 53.06 | 1.02 | Interval |
| WBC | White blood cell count | 100 | 1.53 | 100 | 33.33 | 98.46 | 4.08 | Interval |
| PLT | Platelet | 100 | 0 | 100 | 20 | 69.38 | 1.53 | Interval |
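To illustrate why masking and interval generalization lower the figures in this table, the sketch below applies both to made-up values and recomputes three risk measures for a single quasi-identifier. The definitions used here are one common prosecutor-model reading (records at risk: share of records in groups smaller than k; highest risk: 1 / smallest group size; success risk: number of groups / number of records). They are an assumption, and the tool used in the paper may define these measures differently. All function names and sample values are hypothetical.

```python
from collections import Counter


def mask(value, keep=3, fill="*"):
    """Masking: keep the first `keep` characters and blank out the rest."""
    return value[:keep] + fill * max(len(value) - keep, 0)


def to_interval(value, width=10):
    """Interval generalization: replace an integer with a bin label such as '20-29'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def risk_summary(quasi_ids, k=2):
    """Assumed prosecutor-style risk figures (in %) for one quasi-identifier column."""
    sizes = Counter(quasi_ids)  # size of each group of identical values
    n = len(quasi_ids)
    at_risk = sum(1 for v in quasi_ids if sizes[v] < k)
    return {
        "records_at_risk": 100 * at_risk / n,
        "highest_risk": 100 / min(sizes.values()),
        "success_risk": 100 * len(sizes) / n,
    }


ages = [23, 27, 31, 35, 38, 42, 44, 47, 52, 58]  # made-up sample, all values unique
before = risk_summary(ages)
after = risk_summary([to_interval(a) for a in ages])

print("before:", before)
print("after: ", after)
print(mask("010-1234-5678"))  # made-up mobile number
```

In this toy sample every raw age is unique, so each record is fully exposed before generalization; binning into decades merges records into groups of two or three, which is the mechanism behind the drop from "Before" to "After" in the table.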
The list of developed APIs.
| API Name | Description | URL | Example of the Output |
|---|---|---|---|
| ckan.logic.action.create.package_create (POST) | Creation of packages | | { "help": "http://API url", "success": true, {"author": , |
| ckan.logic.action.get.package_search (GET) | Searching for data list and information | | { "help": "http://API url", "success": {"author": "yj", |
| ckan.logic.action.get.package_show (GET) | Information of the specific package | | { "help": "http://API url", "success": {"author": "yj", |
| ckan.logic.action.create.package_patch (POST) | Updating the information of the specific package | | { "help": "http://API url", "success": {"author": "yj", |
| ckan.logic.action.delete.package_delete (POST) | Deletion of the package | | { "help": "http://API url", "success": true, "result":null } |
| ckan.logic.action.create.resource_create (POST) | Registration of package | | {"help":"url": http://API url:8080/dataset/ |
| ckan.logic.action.patch.resource_patch (POST) | Updating meta information of attached files in the package | | { "help": "http://API url", "success": true, {"author": , |
| ckan.logic.action.delete.resource_delete (POST) | Deletion of the file in the package | | { "help": "http://API url", "success": true, "result":null } |
| ckan.logic.action.get.statistics_list (GET) | Retrieval of the statistic | | { "help": "http://API url", "success": true, {"author": , |
| ckan.logic.action.create.schema_create (POST) | Registration | | { "help": "http://API url?name=schema_create","success": true, "result": { "success": "data insert success" } } |
| ckan.logic.action.get.schema_search (GET) | Retrieval | | { "help": "http://API url", "success": true, {"author": |
| ckan.logic.action.get.schema_delete (POST) | Deletion | | { "help": "http://API url?name=schema_delete","success": true, "result": { "success": "data delete success" } } |
| ckan.logic.action.create.species_create (POST) | Registration of data items | | { "help": "http://API url?name=schema_create","success": true, "result": { "success": "LI012000025" } } |
| ckan.logic.action.get.species_list (GET) | Retrieval of data items | | { "help": "http://API url", "success": true, {"author": |
| ckan.logic.action.patch.species_patch (POST) | Updating the information | | { "help": "http://API url?name=schema_patch", |
| ckan.logic.action.delete.species_delete (POST) | Deletion of the data item | | { "help": "http://API url?name=speices_delete","success": true, "result": { "success": "delete success" } } |
| ckan.logic.action.get.organization_list (GET) | Retrieval of the organization list | | { "help": "http://API url", "success": true, {"author":, |
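The API names in this list follow CKAN's dotted action naming, and the output examples share a `help` / `success` / `result` envelope. As a hedged sketch, the helper below turns a list entry into an action name, an HTTP method, and a URL, and unwraps a decoded response. The base URL is hypothetical, and the assumption that every action, including the platform-specific `schema_*` and `species_*` ones, is served under the stock CKAN path `/api/3/action/<name>` is not confirmed by the source.

```python
import re

# Hypothetical base URL; the source does not give the platform's address.
BASE_URL = "https://ckan.example.org"


def parse_api_entry(entry):
    """Split 'ckan.logic.action.get.package_show (GET)' into (action, method)."""
    match = re.fullmatch(r"\s*([\w.]+)\s*\((GET|POST)\)\s*", entry)
    if match is None:
        raise ValueError(f"unrecognized API entry: {entry!r}")
    dotted, method = match.groups()
    # The action name is the last component of the dotted path
    return dotted.rsplit(".", 1)[-1], method


def action_url(entry):
    """Build the assumed action URL for one entry of the API list."""
    action, _ = parse_api_entry(entry)
    return f"{BASE_URL}/api/3/action/{action}"


def unwrap(response):
    """Return the 'result' payload of a decoded response, or raise on failure."""
    if not response.get("success"):
        raise RuntimeError(response.get("error", "API call failed"))
    return response["result"]


action, method = parse_api_entry("ckan.logic.action.get.package_show (GET)")
url = action_url("ckan.logic.action.delete.package_delete (POST)")
deleted = unwrap({"help": "http://API url", "success": True, "result": None})
```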