| Literature DB >> 30964441 |
Jacob McPadden1, Thomas JS Durant2,3, Dustin R Bunch2, Andreas Coppi3, Nathaniel Price4, Kris Rodgerson4, Charles J Torre4, William Byron4, Allen L Hsiao1, Harlan M Krumholz3,5,6, Wade L Schulz2,3.
Abstract
BACKGROUND: Health care data are increasing in volume and complexity. Storing and analyzing these data to implement precision medicine initiatives and data-driven research have exceeded the capabilities of traditional computer systems. Modern big data platforms must be adapted to the specific demands of health care and designed for scalability and growth.
Keywords: big data; computational health care; data science; medical informatics computing; monitoring, physiologic
Year: 2019 PMID: 30964441 PMCID: PMC6477571 DOI: 10.2196/13043
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Baikal platform architecture. Cluster services are monitored, deployed, and provisioned by the Ambari management console. Workflow management and configuration synchronization are handled by Zookeeper and Oozie. Data storage frameworks include the Hadoop Distributed File System (HDFS) and a nonrelational database (Elasticsearch). Kafka messaging queues are used for incoming data, with subsequent ingest and processing handled by Storm, Sqoop, and NiFi. Analytics can be performed by Spark and Hive. Kerberos and Ranger are used to secure cluster applications. Lastly, Docker Swarm is used to deploy custom applications that can be run within the data science platform. YARN: Yet Another Resource Negotiator.
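The core pattern in this architecture is decoupling producers (device feeds) from downstream processing through a message queue. A minimal stdlib sketch of that pattern, using `queue.Queue` and a worker thread in place of Kafka and Storm/NiFi (the source names and payloads are illustrative assumptions, not the platform's actual schema):

```python
import json
import queue
import threading

ingest_queue = queue.Queue()  # stand-in for a Kafka topic
stored = []                   # stand-in for HDFS/Elasticsearch storage

def produce(source: str, payload: dict) -> None:
    """Emulate a device feed publishing a JSON message to the queue."""
    ingest_queue.put(json.dumps({"source": source, "payload": payload}))

def consume() -> None:
    """Emulate a stream-processing worker draining the queue and persisting messages."""
    while True:
        raw = ingest_queue.get()
        if raw is None:  # sentinel: shut down the worker
            break
        stored.append(json.loads(raw))

worker = threading.Thread(target=consume)
worker.start()
produce("adult_monitor", {"hr": 72})
produce("ventilator", {"peep": 5})
ingest_queue.put(None)
worker.join()
print(len(stored))  # 2
```

The queue lets producers keep acquiring data at device speed even when downstream processing stalls, which is the property Kafka provides at cluster scale.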
Average storage requirements for adult and pediatric patient monitoring and ventilator monitoring. Signal frequency and storage size are averaged from 3 independent samples, each covering a complete 24-hour per-bed monitoring period.
| Source | Signals per 24 h, mean (SD) | Storage size per 24 h (MB), mean (SD) | Estimated annual storage (GB) |
| Adult monitor | 291,252 (84,568) | 17.1 (5.0) | 6.2 |
| Pediatric monitor | 223,387 (29,543) | 12.7 (1.8) | 4.6 |
| Adult ventilator | 3,504,162 (236,672) | 231.5 (30.6) | 84.5 |
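The annual estimates in the table follow directly from the daily mean storage size (365 days, decimal gigabytes). A quick check of the arithmetic:

```python
def est_annual_gb(daily_mb: float) -> float:
    """Estimate annual per-bed storage in decimal GB from mean daily MB."""
    return round(daily_mb * 365 / 1000, 1)

print(est_annual_gb(17.1))   # adult monitor -> 6.2
print(est_annual_gb(12.7))   # pediatric monitor -> 4.6
print(est_annual_gb(231.5))  # adult ventilator -> 84.5
```

All three values reproduce the table's right-hand column.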
Figure 2. System architecture for continuous patient monitoring. A growing number of clinical data sources (A) transmit data to aggregation servers, which forward Health Level 7 (HL7) messages to an emissary service (B), where data are normalized and securely forwarded in standardized JSON format to the Baikal system (C) for denormalization, processing, and storage in the Hadoop Distributed File System (HDFS). Traditional historical databases (D) are individually prepared for ingestion into the Baikal system and storage in HDFS. The resulting data lake allows end users to perform integrated, distributed analytics.
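The emissary's normalization step maps pipe-delimited HL7 segments into JSON documents. A minimal sketch of that mapping, assuming a generic segment/field layout rather than the paper's actual schema:

```python
import json

def hl7_segment_to_dict(segment: str) -> dict:
    """Split one pipe-delimited HL7 segment into its label and fields.
    This generic layout is an illustrative assumption."""
    fields = segment.split("|")
    return {"segment": fields[0], "fields": fields[1:]}

def normalize(message: str) -> str:
    """Map a raw HL7 message (carriage-return-separated segments) to JSON."""
    segments = [hl7_segment_to_dict(s) for s in message.strip().split("\r") if s]
    return json.dumps({"segments": segments})

raw = "MSH|^~\\&|MONITOR|ICU|BAIKAL|YALE\rOBX|1|NM|HR||72|bpm"
doc = json.loads(normalize(raw))
print(doc["segments"][0]["segment"])  # MSH
```

A real emissary would also validate message structure and attach routing metadata before forwarding to the queue.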
Figure 3. Comparison of storage and read/write efficiency. Avro increases storage space and write time modestly while significantly reducing read time. The addition of Snappy compression increases write time minimally while significantly decreasing storage space and maintaining minimal read time. The resulting combination optimizes for single archival write with multiple read usage. CSV: comma-separated values. Error bars represent standard error.
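Avro and Snappy require third-party libraries, but the storage-versus-CPU tradeoff being benchmarked here can be illustrated with stdlib gzip on CSV data. This is a sketch of the measurement pattern, not a reproduction of the paper's benchmark:

```python
import csv
import gzip
import io

# Build a repetitive CSV payload resembling high-frequency monitor output.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["timestamp", "signal", "value"])
for i in range(10_000):
    writer.writerow([i, "heart_rate", 72])
raw_bytes = buf.getvalue().encode()

# Compress and compare sizes; monitor data is highly compressible
# because signal names and values repeat.
compressed = gzip.compress(raw_bytes)
ratio = len(compressed) / len(raw_bytes)
print(f"raw={len(raw_bytes)} B, gzip={len(compressed)} B, ratio={ratio:.2f}")
```

The same harness, extended with wall-clock timing of serialization and deserialization, yields the write-time/read-time comparison shown in the figure.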
Figure 4. System architecture for laboratory data monitoring. Health Level 7 (HL7) observation and result messages generated by the laboratory information system and laboratory middleware systems are received by the clinical integration engine, Cloverleaf (A). HL7 messages are received and validated by a custom emissary service (B) and mapped to JSON documents, which are submitted to a Kafka message queue for downstream processing (C). Custom Python (version 2.7) scripts are executed in NiFi to denormalize messages and calculate quality improvement metrics. Raw HL7 messages are stored in the Hadoop Distributed File System (HDFS). Processed messages and quality improvement metrics are routed to Elasticsearch (D) for real-time analysis and Kibana (E) for visualization.
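A representative quality-improvement metric of the kind computed in these scripts is laboratory turnaround time. A minimal Python 3 sketch (the choice of metric and the HL7-style timestamp format are assumptions for illustration):

```python
from datetime import datetime

def turnaround_minutes(received: str, resulted: str) -> float:
    """Compute lab turnaround time in minutes from HL7-style
    YYYYMMDDHHMMSS timestamps (field choice is illustrative)."""
    fmt = "%Y%m%d%H%M%S"
    delta = datetime.strptime(resulted, fmt) - datetime.strptime(received, fmt)
    return delta.total_seconds() / 60

print(turnaround_minutes("20190101080000", "20190101084530"))  # 45.5
```

Metrics like this, computed per message in the NiFi flow, are what get routed to Elasticsearch for real-time dashboards in Kibana.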