Reed McEwan, Genevieve B Melton, Benjamin C Knoll, Yan Wang, Gretchen Hultman, Justin L Dale, Tim Meyer, Serguei V Pakhomov.
Abstract
Many design considerations must be addressed in order to provide researchers with full text and semantic search of unstructured healthcare data such as clinical notes and reports. Institutions looking to provide this functionality must also address the big data aspects of their unstructured corpora. Because these systems are complex and demand a non-trivial investment, there is an incentive to make the system capable of servicing future needs as well, further complicating the design. We present architectural best practices as lessons learned in the design and implementation of NLP-PIER (Patient Information Extraction for Research), a scalable, extensible, and secure system for processing, indexing, and searching clinical notes at the University of Minnesota.
Year: 2016 PMID: 27570663 PMCID: PMC5001745
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Figure 1. Clinical NLP Platform Architecture. High-level system architecture of the six components comprising the clinical notes processing architecture of NLP-PIER. Generalized functional names of each component are italicized along the top of the diagram. Firewall rules restrict component access (dotted lines). Component labels represent institutional- and technology-specific names.
Figure 2. Index Aliases. Schematic representing index aliases created to facilitate security and application-side joins. Shaded boxes represent alias boundaries across actual indexes. D represents an individual note. R, and R represent groups of notes associated with authorized ICS requests and serve as search contexts within PIER. NOTES represents the entire corpus across all indexes.
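The alias scheme described in the Figure 2 caption can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the NLP-PIER implementation: the index names, note IDs, note texts, and the `Alias` class are all hypothetical, and a real deployment would presumably rely on the alias feature of its search engine rather than Python dictionaries.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical physical indexes: index name -> {note_id: note text}.
INDEXES = {
    "notes_a": {"D1": "chest pain, rule out MI", "D2": "follow-up visit"},
    "notes_b": {"D3": "chest x-ray unremarkable", "D4": "discharge summary"},
}


@dataclass(frozen=True)
class Alias:
    """A named view over one or more indexes, optionally limited to a note set."""
    name: str
    indexes: Tuple[str, ...]
    # None means every note in the listed indexes is visible.
    allowed_notes: Optional[FrozenSet[str]] = None

    def search(self, term):
        """Return sorted IDs of notes visible through this alias that contain term."""
        hits = []
        for index_name in self.indexes:
            for note_id, text in INDEXES[index_name].items():
                if self.allowed_notes is not None and note_id not in self.allowed_notes:
                    continue  # note lies outside this alias boundary
                if term.lower() in text.lower():
                    hits.append(note_id)
        return sorted(hits)


# NOTES spans the whole corpus; a request alias is a restricted search context.
NOTES = Alias("NOTES", ("notes_a", "notes_b"))
REQUEST_1 = Alias("request_1", ("notes_a", "notes_b"), frozenset({"D1", "D4"}))

print(NOTES.search("chest"))      # ['D1', 'D3']
print(REQUEST_1.search("chest"))  # ['D1']
```

The same query returns fewer notes through the request alias because the alias boundary, not the query, decides which notes are visible. That is the sense in which an alias can act as both a security boundary and a search context.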
Summary metrics of the overall notes processing system, including patient and encounter counts and data footprints.
| Documents | | Clinical Metrics | | Storage Footprints | |
|---|---|---|---|---|---|
| Total Notes | 100 M | Distinct Patients | 2 M | Index Storage, Total | 600 GB |
| Average Note Size | ∼2 kB | Encounters | 45 M | Index Storage, Notes | 240 GB |
| Annotations | 7 B | Service Dates | 4 M | Index Storage, NLP Annotations | 360 GB |
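A few derived ratios help put these figures in context. The sketch below uses only the rounded values reported in the table and assumes decimal units (1 kB = 1,000 bytes, 1 GB = 10^9 bytes), so the results are rough estimates rather than measurements from the system.

```python
# Rounded figures copied from the summary table; decimal units are an assumption.
total_notes = 100e6
avg_note_bytes = 2e3
annotations = 7e9
distinct_patients = 2e6
encounters = 45e6
index_total_bytes = 600e9
index_notes_bytes = 240e9
index_annotations_bytes = 360e9

# Approximate raw text volume: notes times average note size.
raw_text_bytes = total_notes * avg_note_bytes

# Ratios derived from the reported values.
notes_index_overhead = index_notes_bytes / raw_text_bytes
annotations_per_note = annotations / total_notes
bytes_per_annotation = index_annotations_bytes / annotations
notes_per_patient = total_notes / distinct_patients
encounters_per_patient = encounters / distinct_patients

print(f"Raw note text:            ~{raw_text_bytes / 1e9:.0f} GB")
print(f"Notes index / raw text:   ~{notes_index_overhead:.1f}x")
print(f"Annotations per note:     ~{annotations_per_note:.0f}")
print(f"Index bytes / annotation: ~{bytes_per_annotation:.0f}")
print(f"Notes per patient:        ~{notes_per_patient:.0f}")
print(f"Encounters per patient:   ~{encounters_per_patient:.1f}")
```

Under these assumptions the raw note text is roughly 200 GB, so the notes index is about 1.2 times the size of the text it covers, and each note carries on the order of 70 annotations at about 51 bytes of index storage apiece. The two component footprints (240 GB and 360 GB) sum to the reported 600 GB total.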