Data governance system of the National Clinical Research Center for Child Health in China.

Jing Li1,2, Gang Yu1,2,3, Wen Ding2,4, Jian Huang1,2, Zheming Li1,2, Zhu Zhu1,2, Dejian Wang5, Jie Zhang5, Jing Wang5, Jianwei Yin6.   

Abstract

BACKGROUND: Since the national big data strategy was unveiled at the fifth plenary session of the 18th CPC (Communist Party of China) Central Committee, the big data industry has been flourishing in China. Various successful industrial data governance systems have emerged with the rapid development of big data technologies and data management theories. City Brain and Enterprise Data Middle Platform are considered the best data governance systems in urban and corporate governance, respectively. However, in the health and medical sectors, issues of data operation occur frequently due to a lack of systematic data governance. These problems need to be urgently addressed, as health and medical data have been defined as national fundamental strategic resources. Clinical researchers have an increasing demand for data analysis.
METHODS: The Medical Data Governance System (MDGS) was therefore designed to improve data quality and provide simple, convenient data analysis tools for the National Clinical Research Center for Child Health. The MDGS consists of the Medical Data Platform (MDP) and the Operation Management System (OMS). The MDP comprises an acquisition layer, a middle platform, and an application layer, which together continuously improve data quality and significantly shorten data analysis time. The OMS covers organization construction, management regulations, and technical standards, which guarantee the sustainable operation of the MDGS. The MDGS was established to advance the state of the art and the state of practice of data governance in the health and medical sectors in China.
RESULTS: With the first phase of the MDGS, the quantity and quality of research projects increased, research transformation sped up, and researchers' job satisfaction improved.
CONCLUSIONS: Based on our preliminary achievements, it is necessary and feasible to establish the MDGS. It is important to have a comprehensive requirement study, top-level design, refined planning, phase-by-phase implementation, and continual optimization. 2021 Translational Pediatrics. All rights reserved.

Keywords:  Data governance; Medical Data Platform (MDP); artificial intelligence modeling (AI modeling); clinical research; visual programming

Year:  2021        PMID: 34430439      PMCID: PMC8349965          DOI: 10.21037/tp-21-272

Source DB:  PubMed          Journal:  Transl Pediatr        ISSN: 2224-4336


Introduction

With the expansion of the big data industry in China, the Enterprise Data Middle Platform and the City Brain are booming and are considered the best data governance systems in corporate and urban governance, respectively (1-7). In the health and medical sectors, however, data operation problems occur frequently because systematic data governance is lacking. These problems include irregular data usage, data security risks, restrictions on data operation, data heterogeneity, ethical disputes over data utilization, and costly data acquisition (8,9). They need to be addressed urgently, as health and medical data are national fundamental strategic resources (10). Although medical informatics has progressed rapidly in recent years in China, medical data are rarely managed systematically, which directly affects data quality and clinical research efficiency. There is strong demand among clinical researchers for data analysis. Coding languages and software are obstacles for clinical researchers when they start a research project, and off-the-shelf data analysis products are difficult to navigate, especially those for artificial intelligence (AI) modelling. Learning data analysis software takes significant time and energy, and doctors and nurses often have limited time for it. Some hospitals have built their own clinical research platforms (11-13); however, these are best suited to pilot studies that focus only on data acquisition, data preprocessing, and disease database building, and they provide neither a top-level design of data governance nor convenient tools for researchers. There is a need to construct a comprehensive system to govern medical data that considers both technical support capability and operation guarantee measures. The concept of data management can be traced back to the 1980s, when database techniques and data storage first emerged.
In the Data Management Association International (DAMA) Guide to the Data Management Body of Knowledge (DMBOK), data management is defined as planning, controlling, and providing data assets (14). Among the 10 areas of DAMA’s data management system, data governance is the highest-level planning activity and includes data strategy development, data policy improvement, and data architecture design. It places emphasis on data users, their usage modes, their access authority, and other compliance matters. It also underlines the fundamental work that precedes the lifecycle management of data assets, especially the related guarantee measures. Data standards and data value management were added in the DAMA-DMBOK2 in terms of data assets, underlining the upgrade of organization structure and management system to guarantee workflow, security, and validity (15,16).
The Children’s Hospital Zhejiang University School of Medicine is one of the best children’s hospitals in China (17) and has been designated the National Clinical Research Center for Child Health and the Regional Children’s Health Center (18,19). The hospital has 1,300 beds and almost 3,000 employees. There are about 3.5 million outpatient visits and 81,000 inpatient visits every year. All hospital employees are committed to providing the highest quality of health care to children. The blueprint for the Medical Data Governance System (MDGS) of the hospital was created in early 2019. Phase I of the system was completed and commenced operation at the end of October 2019. Preliminary success has been achieved so far.

Methods

System architecture

Two subsystems constitute the MDGS at The Children’s Hospital Zhejiang University School of Medicine. The first is the Medical Data Platform (MDP), which can be divided into the data acquisition layer, middle platform layer, and application layer. The other is the Operation Management System (OMS), which can be divided into organization building, management principles and regulations, and technical standards. The system architecture of the MDGS is shown in Figure 1.
Figure 1

System architecture of the Medical Data Governance System. AI, artificial intelligence; PM, project management; SOP, standard operating procedure; RDR, research data repository; DB, data base; BI, business intelligence; EMR, electronic medical record; LIS, laboratory information system; HIS, hospital information system; PACS, picture archiving and communication system; HIT, healthcare information technology; EHR, electronic health record.

The MDP

As the one-stop platform for clinical researchers, the MDP provides compliant, multidimensional, high-quality medical data together with effective, convenient, and visual research tools. The MDP has 3 main parts: the data acquisition layer, the middle platform, and the application layer.

Data acquisition layer

Data source

The data acquisition layer acquires medical data required by research projects. The obtained data contain various categories, such as clinical data from health information technology systems (e.g., electronic medical records, hospital information systems, laboratory information systems, and picture archiving and communication systems), omics data from researchers (e.g., genomics, metabonomics, proteomics, immunomics, and ultrasomics), and data from other sources (e.g., biobank, wearables, electronic health records, epidemiology, climate, and environment).

Acquisition and preprocessing

Different techniques [e.g., database batch push, application programming interface (API) transmission, and file and table uploads] can be adopted for data acquisition according to the actual conditions. The data are gathered and stored as “raw data” in the data acquisition layer. Raw data are then processed by a privacy protection module, which deletes from each data entry the patient’s private information that is not needed and encrypts what is left.
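
The privacy protection step can be illustrated with a minimal sketch. The field names and the key are hypothetical, since the article does not list which fields are deleted or how the remainder is protected. The sketch also substitutes keyed hashing (HMAC-SHA256, a one-way pseudonymization) for the encryption mentioned in the text, so that it runs with the Python standard library alone.

```python
import hashlib
import hmac

# Hypothetical field lists; the article does not specify them.
DROP_FIELDS = {"name", "phone", "address"}   # private data the research does not need
PSEUDONYMIZE_FIELDS = {"patient_id"}         # identifiers kept only in protected form


def desensitize(record: dict, secret_key: bytes) -> dict:
    """Drop unneeded private fields and replace kept identifiers with keyed hashes."""
    cleaned = {}
    for field, value in record.items():
        if field in DROP_FIELDS:
            continue
        if field in PSEUDONYMIZE_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
        else:
            cleaned[field] = value
    return cleaned


raw = {"patient_id": "P0001", "name": "Example", "phone": "000", "diagnosis": "asthma"}
safe = desensitize(raw, b"demo-key")
```

Because the same key always maps the same identifier to the same digest, records of one patient can still be linked after desensitization. A production module would more likely use reversible encryption with managed keys.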

Backup and recovery

After preprocessing, data are defined as desensitized data and duplicated in the database. Each copy of the desensitized data is supervised by the system, so that a lost copy can be recovered from the other.
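
The two-copy scheme might look like the following sketch. The class name `MirroredStore` and the use of SHA-256 checksums to detect a damaged copy are assumptions made for illustration; the article only states that the copies are supervised and that one can be recovered from the other.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Keeps two copies of each entry plus a checksum recorded at write time."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.checksums = {}

    def put(self, key: str, data: bytes) -> None:
        self.primary[key] = data
        self.mirror[key] = data
        self.checksums[key] = checksum(data)

    def get(self, key: str) -> bytes:
        """Return a verified copy and repair the other copy from it."""
        expected = self.checksums[key]
        pairs = ((self.primary, self.mirror), (self.mirror, self.primary))
        for source, target in pairs:
            data = source.get(key)
            if data is not None and checksum(data) == expected:
                target[key] = data  # overwrite the other copy with the good one
                return data
        raise IOError(f"both copies of {key!r} are lost or corrupted")


store = MirroredStore()
store.put("entry-1", b"desensitized record")
del store.primary["entry-1"]          # simulate loss of one copy
recovered = store.get("entry-1")      # served from the mirror, primary repaired
```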

Middle platform

As the core layer of the MDP, the middle platform manages data quality, research database, and system configuration.

Data quality management

The quality of the data in the data acquisition layer is insufficient for clinical research in terms of completeness, accuracy, and consistency. A closed-loop mechanism is therefore designed for data quality improvement. The data quality management module can discover and solve most data quality problems. The workflow of data quality improvement is shown in Figure 2.
Figure 2

Workflow of data quality management. AI, artificial intelligence.

Problem discovery, problem locating, problem solving, and solution verifying together constitute the process group of data quality management. Most data quality management processes are triggered automatically by the system according to established criteria. When a data quality problem is discovered by researchers, the process can also be initiated manually. Once the process is launched, the metadata management module locates the root cause of the problem and visualizes it on the lineage diagrams. Depending on the scenario, problems are addressed by the system or by platform administrators. The solutions must then be verified by system administrators or researchers. All the processes are recorded in the system log and shown on the problem reports. The data quality management module works continuously, so that the completeness, accuracy, and consistency of the medical data on the data platform improve constantly.
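
The discover, locate, solve, and verify loop can be sketched as a small rule engine. The `Rule` structure, the two example rules, and the status labels are hypothetical. The real module also relies on metadata lineage to locate root causes, which this sketch reduces to a record index and a rule name.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]                  # True when the record passes
    fix: Optional[Callable[[dict], dict]] = None   # automatic repair, if one exists


def run_quality_loop(records: list, rules: list) -> list:
    """Discover, locate, solve, and verify problems; return a problem report."""
    report = []
    for index, record in enumerate(records):
        for rule in rules:
            if rule.check(record):                 # discovery: nothing wrong
                continue
            entry = {"record": index, "rule": rule.name, "status": "needs_admin"}
            if rule.fix is not None:
                candidate = rule.fix(record)       # solving
                if rule.check(candidate):          # verifying the solution
                    records[index] = record = candidate
                    entry["status"] = "fixed"
            report.append(entry)                   # logged for the problem report
    return report


rules = [
    Rule("weight_present", lambda r: r.get("weight_kg") is not None),
    Rule(
        "sex_coded",
        lambda r: r.get("sex") in {"F", "M"},
        fix=lambda r: {**r, "sex": str(r.get("sex", "")).strip().upper()[:1]},
    ),
]
records = [{"sex": " female", "weight_kg": 18.2}, {"sex": "M", "weight_kg": None}]
report = run_quality_loop(records, rules)
```

Problems with no automatic fix stay in the report as `needs_admin`, which mirrors the split in the text between problems solved by the system and problems handed to platform administrators.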

Research data repository (RDR)

High-quality medical data are stored in the RDR for research projects. The workflow of data processing from raw data to the RDR is shown in Figure 3. The data directory is created automatically by the system. Databases and knowledge graphs for specific diseases can also be generated. Metadata (including, but not limited to, data category, quantity, data source, and update time) of the medical data are analyzed statistically and visualized by the business intelligence module.
Figure 3

Dataflow of the Medical Data Governance System. AI, artificial intelligence; RDR, research data repository.

System configuration

With the system configuration module, system administrators can set up different authorizations for different roles and accounts. The graphical user interface (GUI) can also be personalized for clinical researchers, system administrators, and decision makers of the National Clinical Research Center.
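
Role-based authorization of this kind can be sketched as a lookup from role to permitted actions. The three roles come from the text, while the permission names are hypothetical placeholders.

```python
# Hypothetical permission names; only the three roles are taken from the text.
ROLE_PERMISSIONS = {
    "clinical_researcher": {"read_rdr", "use_toolbox", "submit_project"},
    "system_administrator": {"read_rdr", "manage_accounts", "resolve_quality_issue"},
    "decision_maker": {"view_dashboard"},
}


def is_allowed(role: str, permission: str) -> bool:
    """Return True when the role has been granted the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())


can_submit = is_allowed("clinical_researcher", "submit_project")
can_manage = is_allowed("clinical_researcher", "manage_accounts")
```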

Application layer

The application layer provides one-stop services for clinical researchers, including project management, a toolbox, and a search engine.

Project management

The research project management process consists of the following 4 steps: applying, approving, executing, and closing. The lifecycle is shown in Figure 4.
Figure 4

Lifecycle of research project management.

Project applications can be submitted after clinical researchers decide to initiate a research project and create a project plan. Applications need to demonstrate the scope and schedule of the project, refine the objectives, define the course of action required to attain the objectives, and clarify the data demand and the expected research fruits. Project applications need approval first from the ethics committee and then from the data management committee. Government permissions are required for some specific research projects, according to national regulations on the management of human genetic resources. After project approval, researchers can commence their research by accessing the data and tools on the platform. The data platform monitors and controls data usage and records it in the system logs. When the closing process is performed, researchers are required to summarize the project, submit the final report and research fruits, and apply for acceptance. The data platform generates a data audit report that contains a data usage analysis of the project. The data platform is continuously optimized based on the data audit reports. The data management committee arranges to accept the project, evaluate the research results, and offer fruit transformation services.
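
The four-step lifecycle and the ordered two-committee approval can be sketched as a small state machine. The class and method names are hypothetical, and the `rejected` and `closed` end states are assumptions added so that the example terminates cleanly.

```python
class ResearchProject:
    """Tracks a project through applying, approving, executing, and closing."""

    # Approval order stated in the text: ethics first, then data management.
    REVIEW_ORDER = ("ethics_committee", "data_management_committee")

    def __init__(self, title: str):
        self.title = title
        self.step = "applying"
        self.approvals = []

    def submit(self) -> None:
        self._require("applying")
        self.step = "approving"

    def approve(self, reviewer: str) -> None:
        self._require("approving")
        expected = self.REVIEW_ORDER[len(self.approvals)]
        if reviewer != expected:
            raise ValueError(f"expected review by {expected}, got {reviewer}")
        self.approvals.append(reviewer)
        if len(self.approvals) == len(self.REVIEW_ORDER):
            self.step = "executing"

    def reject(self) -> None:
        self._require("approving")
        self.step = "rejected"      # assumed end state

    def start_closing(self) -> None:
        self._require("executing")
        self.step = "closing"

    def accept(self) -> None:
        self._require("closing")
        self.step = "closed"        # assumed end state after acceptance

    def _require(self, step: str) -> None:
        if self.step != step:
            raise RuntimeError(f"project is in step {self.step!r}, not {step!r}")


project = ResearchProject("example study")
project.submit()
project.approve("ethics_committee")
project.approve("data_management_committee")
```

Enforcing the order inside `approve` means an application cannot reach the data management committee before it has passed the ethics inspection.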

Toolbox

Based on visual programming, all the tools are visualized as icons. Rather than coding, researchers only need to drag and drop icons from the toolbox and connect them with arrows. The data are then processed according to this dataflow graph, and the intermediate result can be shown after each processing step. A series of research tools for clinical researchers is assembled in the toolbox, comprising statistics tools, AI algorithms, and data modeling tools. Various algorithms and data modeling tools can be used [e.g., ANN (artificial neural network), RBF (radial basis function), CART (classification and regression trees), Naïve Bayes, Apriori, correlation analysis, clustering, abnormal value management, and intelligent recommendation].
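
Behind such a drag-and-drop canvas, the icons and arrows form a directed acyclic graph that is executed in dependency order. The following is a minimal sketch of that idea, not the platform's actual engine. The function `run_dataflow` and the three example tools are hypothetical, and it uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
import statistics
from graphlib import TopologicalSorter


def run_dataflow(nodes: dict, edges: list, source_data):
    """Run each tool after its upstream tools and keep every intermediate result.

    nodes maps a tool name to a function; edges lists (upstream, downstream) pairs.
    Tools without an upstream tool receive source_data.
    """
    upstream = {name: [] for name in nodes}
    for src, dst in edges:
        upstream[dst].append(src)
    results = {}
    for name in TopologicalSorter(upstream).static_order():
        inputs = [results[u] for u in upstream[name]] or [source_data]
        results[name] = nodes[name](*inputs)
    return results


nodes = {
    "drop_missing": lambda rows: [r for r in rows if r is not None],
    "mean": lambda rows: statistics.mean(rows),
    "maximum": lambda rows: max(rows),
}
edges = [("drop_missing", "mean"), ("drop_missing", "maximum")]
results = run_dataflow(nodes, edges, [3.0, None, 5.0])
```

Returning the full `results` dictionary, rather than only the final output, corresponds to showing the intermediate result after each processing step.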

Search engine

A powerful search engine is designed to search for related papers, data resources on the platform, and projects undertaken by other colleagues. The search engine provides unified access to the databases of references, data directories, and project introductions. It supports multidimensional, bilingual, and fuzzy query.
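
Fuzzy querying over a unified catalog can be illustrated with the standard-library `difflib` module. This is only a sketch of tolerant title matching. The catalog entries are invented examples, and the article does not describe the matching method the search engine uses.

```python
import difflib

# Invented catalog entries spanning the three resource types named in the text.
catalog = [
    {"kind": "project", "title": "bone age assessment"},
    {"kind": "dataset", "title": "chronic cough diagnosis"},
    {"kind": "paper", "title": "hip joint development"},
]


def fuzzy_search(query: str, entries: list, cutoff: float = 0.6, limit: int = 5) -> list:
    """Return entries whose titles are close to the query, best match first."""
    by_title = {entry["title"].lower(): entry for entry in entries}
    matches = difflib.get_close_matches(
        query.lower(), list(by_title), n=limit, cutoff=cutoff
    )
    return [by_title[title] for title in matches]


hits = fuzzy_search("bone age asessment", catalog)  # misspelled on purpose
```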

The OMS

Operation management is often overlooked by administrators of information systems, which leads to unachieved objectives or even negative effects. The OMS is designed to guarantee smooth operation of the MDGS. It mainly consists of organization construction, management principles and regulations, and technical standards.

Organization construction

The following 3 organizations were constructed and play important roles in research project management: ethics committee, data management committee, and technical service team.

Ethics committee

The ethics committee decides whether the research is ethical. It inspects the project application and ensures the safety, health, and rights of research participants. The ethics committee consists of medical specialists, legal experts, and non-medical staff. Committee members are elected from the candidate list at the hospital board meeting. Research project applications are first submitted to the ethics committee and are rejected if they cannot pass the ethics inspection. The ethics committee works independently and is supervised by the National Clinical Research Center and research participants.

Data management committee

The data management committee is the decision maker on data-relevant issues, such as data security, data utilization, and data platform construction. It consists of the National Clinical Research Center administrators and senior experts from the Department of Data and Information, clinical departments, and the Department of Research and Education. Committee members from the National Clinical Research Center are assigned by the board of the center. Committee members from the Department of Data and Information and the Department of Research and Education are elected at their respective department meetings. After ethics committee approval, research project applications are submitted to the data management committee, which assesses the rationality and availability of the project data requirements. The committee reports to the national center and is supervised by the platform users.

Technical service team

The technical service team is responsible for the construction, operation, and maintenance of the MDP, as well as services pertaining to technical training, account management, and data analysis for researchers. It consists of specialists from independent software vendors (ISVs) and the Department of Data and Information. Team members specialize in medical AI, data engineering, bioinformatics, system architecture, software engineering, computer networking, information security, and project management. Team members are selected by the data management committee. The team reports to the data management committee and is supervised by platform users.

Management principles and regulations

A series of principles and regulations was released to instruct researchers in MDP usage, research project management, fruit transformation, and other relevant fields.

MDGS guide

The MDGS guide instructs users how to operate the system, safeguards researchers’ interests, and guarantees data security. It contains information on system introduction, rights and responsibilities of different roles, system operation mechanisms, management regulations, and a user manual. It is released by the data management committee and requires compliance by all the users.

Research project management regulations

Regulations on research project management instruct clinical researchers on how to apply for, execute, and close projects. They contain project management principles, management workflows, criteria of project approval, and data utilization specifications. They were released by the Department of Research and Education and the data management committee, and require compliance by all the users.

Research stimulation and fruit transformation policies

Research stimulation and fruit transformation policies were designed to encourage researchers and maximize the value of research fruits. They contain stimulation principles, stimulation methods, fruit evaluation standards, and fruit transformation mechanisms. They were released by the Department of Research and Education and the data management committee, and apply to all researchers.

Technical standards

The technical standards illustrate the objectives, requirements, and technical paths of the MDP. They include clinical data standards, specifications of data governance and utilization, and MDP construction standards.

Clinical data standards

Clinical data standards specify the models of the medical data, covering data structures, data operations, and data constraints. They were drafted by clinical experts and the data management committee, and were released by the data management committee.

Data governance and utilization specifications

Data governance and utilization specifications focus on the targets, principles, processes, and supervision of data governance and data utilization. They also include the standards of data classification and data quality evaluation. They were released by the data management committee.

MDP construction standards

The MDP construction standards focus on system architecture, function planning, database design, data flow, interface specification, and implementation plans. They were released by the data management committee and require compliance by all ISVs.

Results

After 1 year of construction, phase I of the MDGS was completed with good results. With the sustained improvement of researcher satisfaction, the quantity and quality of research projects increased significantly, as did research grants, and research transformation made considerable progress. To date, 126 research projects have been supported by the MDGS, 14 of which are multicenter clinical studies (11 nationwide, 3 provincial). Benefits of the MDGS include 23 National Natural Science Foundation of China projects approved, 278 SCI (Science Citation Index) papers published, and 39 patents obtained during 2020. The number of research achievements in 2020 was significantly higher than in 2019, as shown in Table 1.
Table 1

Annual research achievements of the Children’s Hospital of Zhejiang University School of Medicine

Category          2019   2020   Growth rate (%)
NSFC projects       12     23   91.67
Other projects      72     91   26.39
SCI papers         179    278   55.31
Patents             33     39   18.18

NSFC, National Natural Science Foundation of China; SCI, Science Citation Index.
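
The growth rates in Table 1 follow from (2020 value − 2019 value) / 2019 value × 100. The short check below recomputes them from the counts reported in the table; the variable and function names are illustrative only.

```python
# Counts for 2019 and 2020 as reported in Table 1.
achievements = {
    "NSFC projects": (12, 23),
    "Other projects": (72, 91),
    "SCI papers": (179, 278),
    "Patents": (33, 39),
}


def growth_rate(previous: int, current: int) -> float:
    """Year-over-year growth in percent, rounded to two decimals."""
    return round((current - previous) / previous * 100, 2)


rates = {name: growth_rate(*counts) for name, counts in achievements.items()}
```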

The science and technology evaluation metrics (STEM) are published by the Chinese Academy of Medical Sciences every year to evaluate the research strength of hospitals and medical colleges in China. Based on the STEM released in 2020, The Children’s Hospital Zhejiang University School of Medicine rose from third to second in the children’s hospital rankings, and from 97th to 68th among all hospitals (20), as shown in Table 2.
Table 2

Science and technology evaluation metrics rankings of The Children’s Hospital Zhejiang University School of Medicine

Ranking                        Rank   Previous rank   Rose
Children’s hospital ranking       2               3      1
Hospital ranking                 68              97     29

Among the accepted research projects, AI-assisted skeletal age inspection has already been turned into a product (21). The first all-in-one machine for skeletal age inspection in children has been released around the world. It has much lower radiation and much higher diagnostic speed, and is considered a great benefit for teenagers with obesity diabetes. The research project on AI-based evaluation of children’s hip joint development is in the process of transformation. The intelligent diagnostic system for children’s hip joint development, based on medical imaging, is expected to be released by the end of 2021 (22). Research on AI-assisted diagnosis of chronic cough in children based on multimodal fusion is in progress. The prototype of the Children Asthma Diagnosis System has been designed and tested, and it is expected to benefit pediatricians. With higher enthusiasm and a better intellectual atmosphere, more and more clinical staff are initiating research projects on the MDGS.

Discussion

In Phase I of the MDGS, basic functions of the MDP were completed, management organizations were established, and initial regulations and standards were released. Phase II is designed to acquire more data to enrich the RDR, introduce more AI tools to enhance the toolbox, strengthen training for clinical staff to improve data quality at the source, optimize management regulations and update technical standards, and employ more talent. The information security system will be enhanced if the platform needs to connect to the Internet in the future. Phase II has been implemented as planned so far, and we are confident of having a more powerful MDGS soon. With another 2 years of hard work, we hope our MDGS will become one of the best practices in the field (23).

Conclusions

Based on the achievements of MDGS Phase I, it is necessary and feasible to establish an MDGS for national medical centers, and both the MDP and the OMS are important for smooth system operation. To create an effective MDGS, it is critical to have top-level design, refined planning, phase-by-phase implementation, and continual optimization.