Miye Wang1, Sheyu Li2, Tao Zheng1, Nan Li1, Qingke Shi1, Xuejun Zhuo1, Renxin Ding1, Yong Huang1.
Abstract
BACKGROUND: With the advent of data-intensive science, the full integration of big data science and health care will bring a cross-field revolution to the medical community in China. The concept of big data represents not only a technology but also a resource and a method. Big data are regarded as an important strategic resource at both the national level and the medical institutional level; thus, great importance has been attached to the construction of a big data platform for health care.
Keywords: big data; big data platform in health care; data application; data governance; data integration; data quality control; data science; data security; health care; heterogeneous; medical informatics; multisource
Year: 2022 PMID: 35416792 PMCID: PMC9047713 DOI: 10.2196/36481
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Big data platform architecture.
Figure 2. The relationship between the degree of data structuration and storage and computing capability.
Figure 3. Flowchart of master index governance.
Governance strategy of enterprise master patient index.
| ID number | Name | Telephone number | Result |
| Equal | Equal | Equal | Accept |
| Equal | Equal | Unequal | Accept |
| Equal | Unequal | Equal | Accept |
| Equal | Unequal | Unequal | Denied |
| Unequal | Equal | Equal | Accept |
| Unequal | Equal | Unequal | Denied |
| Unequal | Unequal | Equal | Denied |
| Unequal | Unequal | Unequal | Denied |
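Read row by row, the table above is equivalent to a simple rule: two records are accepted as the same patient when at least two of the three identifiers agree. The sketch below encodes that reading. The field names (`id_number`, `name`, `telephone`) and the use of exact string equality are assumptions made for illustration; the source does not describe how values are normalized or compared.

```python
from typing import Mapping

# Hypothetical field names standing in for the three identifiers in the table.
FIELDS = ("id_number", "name", "telephone")


def empi_accept(record_a: Mapping[str, str], record_b: Mapping[str, str]) -> bool:
    """Return True when at least two of the three identifiers are equal.

    This reproduces the Accept/Denied column of the table; exact equality
    is an assumption, not a documented part of the platform.
    """
    equal_count = sum(record_a[f] == record_b[f] for f in FIELDS)
    return equal_count >= 2


if __name__ == "__main__":
    a = {"id_number": "A1", "name": "Zhang San", "telephone": "100"}
    b = {"id_number": "A1", "name": "Zhang San", "telephone": "200"}
    c = {"id_number": "A1", "name": "Li Si", "telephone": "300"}
    print(empi_accept(a, b))  # ID number and name agree
    print(empi_accept(a, c))  # only the ID number agrees
```

Expressing the rule as a count keeps it in step with the table: any change to the accepted combinations would show up as a change to the threshold or to per-field weights.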
Example of the WCH-BDP master data reference standard.
| Classification of master data | Number of reference standards | Example |
| Classification of diseases | 5 | ICD-10GB/ |
| Basic industry information | 6 | GB 11714-1997 Rules of coding for the representation of organizationGB/ |
| Health informatics | 20 | GB/ |
| Personal information | 12 | GB/ |
| Information technology | 3 | GB/ |
Figure 4. Metadata governance architecture.
Data security strategy of the big data platform.
| Security classification | Security level | Descriptions | Examples | Precautions |
| Level 1 | Most confidential | Hypersensitive information | Financial data, personal authentication data | A specified environment for a specified single individual to use |
| Level 2 | Confidential | Highly sensitive information | Credit data, personal health privacy data | A specified environment for a specified single role to use |
| Level 3 | Secret | General sensitive information | Personal information during diagnosis and treatment, contract information, employee management data | Use after the role group is authorized |
| Level 4 | Internal use only | Information that is not publicly disclosed | Organizational structure, basic information of employees, general data after desensitization | Use after internal authorization |
| Level 5 | Open to the public | Data that can be publicly disclosed | Summarized results obtained by statistical analysis | Public access or use |
Data desensitization and encryption policies of the big data platform.
| Name of strategy | Scope of data involved | Design of desensitization and encryption policy |
| Digital data | Operating revenue, key quantity... | Fuzzy rounding method or fuzzy percentage method |
| Structured data of fixed length | Identification card number, telephone number, name... | Replace or encrypt from the starting bit to the end of the specified length range |
| Text data of varying length | Address, electronic medical record, descriptive diagnosis (infectious disease) | Locate sensitive content, then replace or encrypt the sensitive characters |
| Image data | Radiological, ultrasonic and pathological imaging data | An encryption algorithm is applied to image files for desensitization, and watermarking is configured |
| File data | Genomics, molecular protein data | Locate sensitive content and rename the sensitive characters |
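Two of the policies above lend themselves to a short illustration: fuzzy rounding for numeric data and positional replacement for fixed-length structured data. The sketch below is one possible reading, not the platform's implementation. The source does not define "fuzzy rounding", so rounding to the nearest multiple of a chosen base is an assumption, and the function names, the mask character, and the sample values are all hypothetical.

```python
def fuzzy_round(value: float, base: int = 1000) -> int:
    """Round a numeric value to the nearest multiple of `base`.

    Assumed interpretation of the "fuzzy rounding method": exact figures
    are hidden while the order of magnitude is preserved.
    """
    if base <= 0:
        raise ValueError("base must be positive")
    return int(round(value / base)) * base


def mask_fixed(text: str, start: int, length: int, mask_char: str = "*") -> str:
    """Replace `length` characters of `text`, beginning at index `start`.

    Mirrors "replace from the starting bit to the end of the specified
    length range"; the range is clipped to the end of the string.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    end = min(len(text), start + length)
    if start >= end:
        return text
    return text[:start] + mask_char * (end - start) + text[end:]


if __name__ == "__main__":
    print(fuzzy_round(123456))              # 123000
    print(mask_fixed("13812345678", 3, 4))  # 138****5678
```

Replacement with a mask character is irreversible; where authorized users must recover the original value, the encryption option named in the table would be used in place of masking.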
Data quality standard system.
| Dimension | Content of the dimension | Quality indicator | Rule |
| Consistency | Check whether the data value is in the dictionary domain | Consistency rate | Higher than 90% |
| Integrality | Check the completeness of the required data | Percentage of integrality | Higher than 80% |
| Relevance | Check the relationship between key data | Degree of relevance | Higher than 95% |
| Timeliness | Check the logical validity of time-type data | Timeliness ratio | Higher than 80% |
| Uniqueness | Check whether data duplication exists | Repetitive rate | Lower than 0.01% |
| Stability | Check whether the fluctuation of data volume is abnormal | Fluctuation ratio | Lower than 20% |
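The rules in the table above can be read as threshold checks on precomputed indicators. The following is a minimal sketch under that reading. The dictionary keys, the function name, and the sample values are hypothetical, and how each indicator is computed from the underlying data is not specified in the source, so the indicators are taken as given fractions between 0 and 1.

```python
# Thresholds transcribed from the table: "min" means the indicator must be
# higher than the bound, "max" means it must be lower than the bound.
QUALITY_RULES = {
    "consistency": ("min", 0.90),
    "integrality": ("min", 0.80),
    "relevance": ("min", 0.95),
    "timeliness": ("min", 0.80),
    "uniqueness": ("max", 0.0001),  # repetitive rate lower than 0.01%
    "stability": ("max", 0.20),     # fluctuation ratio lower than 20%
}


def evaluate_quality(indicators: dict) -> dict:
    """Return a pass/fail flag per dimension for the supplied indicators."""
    results = {}
    for dimension, (kind, bound) in QUALITY_RULES.items():
        value = indicators[dimension]
        results[dimension] = value > bound if kind == "min" else value < bound
    return results


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the platform.
    sample = {
        "consistency": 0.97,
        "integrality": 0.75,
        "relevance": 0.99,
        "timeliness": 0.88,
        "uniqueness": 0.00005,
        "stability": 0.12,
    }
    for dimension, passed in evaluate_quality(sample).items():
        print(f"{dimension}: {'pass' if passed else 'fail'}")
```

Keeping the thresholds in a single mapping means a rule change touches one line, and strict comparisons match the "higher than" and "lower than" wording of the table.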
Top-5 subject fields in terms of number of data rows.
| Number | Subject field | Tables, n | Data variables, n | Rows (10,000×n) | Systems involved |
| 1 | Medical record | 10 | 476 | 369824.53 | Hospital information system, online diagnosis and treatment system |
| 2 | Medical technology | 6 | 502 | 115073.35 | Electrocardiogram system, radiology information system, echocardiography system, endoscopic system, dynamic electrocardiogram system, pathology information system, ultrasonic system, laboratory information system, interventional surgical workstation, medical technology reservation information system, outpatient information system, physical examination information system |
| 3 | Fee | 8 | 440 | 60855.24 | Hospital information system, physical examination information system, online diagnosis and treatment system |
| 4 | Medical advice | 3 | 395 | 21366.64 | Hospital information system, online diagnosis and treatment system |
| 5 | Staff | 13 | 802 | 18138.94 | Hospital information system, physical examination information system, electronic data capture, human resource system |
List of data assets.
| Number | Directory field | Data variables, n | Type of variables | Example |
| 1 | Demographic | 91 | Structured | Gender, age, occupation, present address, nationality, height, weight, blood type |
| 2 | Basic medical information | 410 | Structured | Appointment date, visit date, clinic department, clinic type, supervising doctor, the department transferred to, admission date, discharge date, discharge state |
| 3 | Medical record information | 123 | Unstructured | Admission note, progress note, discharge record |
| 4 | Clinical diagnostic information | 47 | Structured | Clinic diagnosis, emergency diagnosis, admitting diagnosis, discharge diagnosis, medical insurance diagnosis, pathological diagnosis |
| 5 | Surgical and operational information | 138 | Structured | Name of operation, name of procedure, surgeon, surgical grade, anesthesia grading, incision type, level of healing |
| 6 | Diagnosis and treatment information | 166 | Structured | Type of medical order, drug name, usage, dosage, frequency, execution time of medical order |
| 7 | Laboratory test data | 147 | Structured | White blood cell count, red blood cell count, sodium level, uric acid level, blood glucose level, creatinine level |
| 8 | Imaging results | 124 | Unstructured | Magnetic resonance imaging, computed tomography, x-ray, ultrasound, digital radiography |
| 9 | Nursing record information | 53 | Unstructured | Admission assessment, daily records, nursing records |
| 10 | Physiological monitoring data | 50 | Structured | Vital signs |
| 11 | Scale evaluation data | 50 | Structured | Mood index, pressure ulcer assessment, risk assessment of falling out of bed |
| 12 | Medical cost information | 65 | Structured | Name of charge items, amount of charge items, settlement time |
| 13 | Patient label information | 24 | Unstructured | Patients with lower test results after surgery, patients with higher blood pressure after medication |
Changes between the traditional and the present data service.
| | Traditional data services | Data services based on the platform |
| Data visualization | Data not visible | Users can visually view the available data catalog |
| Data retrieval | Data engineers write retrieval code based on experience | Users can customize the search format and output format through the search engine and preview the results |
| Data approval | Data are available after ethical review and clinical study program approval | Data are available after ethical review and clinical study program approval |
| Data mining | Users analyze data on their own computers | Development environments and tools, such as R, SPSS, and Python, can be used on the platform, and data mining algorithms can call on the computing power provided by the platform |
| Data download and access | Users download data that data engineers have extracted and encrypted | The platform creates accounts with different permissions for registered users. In a network environment after security authentication, authorized users can log in to the big data platform unified portal through virtual desktop infrastructure. The platform provides each authorized user with private storage space of different capacities. Users can directly store their research results in this space, or install the software developed by our college on their personal computers to transfer the research results to personal computers |
Comparison of data service capabilities.
| | Before the launch | After the launch |
| Number of business systems covered | 8 | 27 |
| Data dimension | 803 | 1488 |
| Data volume (billion) | 6.8 | 16.49 |
| Number of monthly services | 166 | 8561 |
| Time per request (hour) | 4.5 | 0.15 |