Andrew Wen¹, Sunyang Fu¹, Sungrim Moon¹, Mohamed El Wazir², Andrew Rosenbaum², Vinod C Kaggal³, Sijia Liu¹, Sunghwan Sohn¹, Hongfang Liu¹, Jungwei Fan¹.
Abstract
Data is foundational to high-quality artificial intelligence (AI). Because a substantial amount of clinically relevant information is embedded in unstructured data, natural language processing (NLP) plays an essential role in extracting information that can support decision making, administrative reporting, and research. Here, we share several desiderata for the development and use of NLP systems, derived from two decades of experience implementing clinical NLP at the Mayo Clinic, to inform the healthcare AI community. Illustrated through a framework we developed as an example implementation, the desiderata emphasize the importance of a user-friendly platform, efficient collection of domain expert input, seamless integration with clinical data, and a highly scalable computing infrastructure.
Keywords: Health care; Medical research
Year: 2019 PMID: 31872069 PMCID: PMC6917754 DOI: 10.1038/s41746-019-0208-8
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1: NLPaaS architecture diagram.
Fig. 2: The NLPaaS clinical concept and cohort definition interfaces. (a) Clinical concept definition interface; (b) cohort definition interface.
NLPaaS pilot usage metrics, 1 May 2019 through 30 September 2019. Cluster statistics and the resulting workstation estimates are based on a calculated average of 256 executor threads (16 executor nodes × 16 cores).
| Metric Name | Value |
|---|---|
| Number of projects | 61 |
| Number of jobs | 246 |
| Number of pilot users | 13 |
| Number of unique concepts (across all projects) | 269 |
| Average number of unique concepts per project | 5.0 |
| Average number of documents per job | 6,624,651.1 |
| Average number of jobs run per project | 4.0 |
| Average job runtime (cluster) | 1.0 h |
| Average project runtime (cluster; avg job runtime × avg number of jobs per project) | 3.9 h |
| Average document throughput (cluster) | 6,896,784.1 documents per hour |
| Total job runtime (cluster) | 236.3 h (9.8 days) |
| Estimated equivalent average job runtime (quad-core workstation) | 61.5 h (2.6 days) |
| Estimated equivalent average project runtime (quad-core workstation) | 247.9 h (10.3 days) |
| Estimated equivalent total job runtime (quad-core workstation) | 15,122.8 h (630.1 days) |
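The derived rows of the table can be reproduced from the base counts, which also explains why the average project runtime reads 3.9 h rather than 1.0 h × 4.0 = 4.0 h: the product uses unrounded inputs (236.3 h / 246 jobs ≈ 0.961 h per job, and 246 jobs / 61 projects ≈ 4.03 jobs per project). The sketch below is a hypothetical illustration, not the authors' method. It assumes that a quad-core workstation runs 4 threads and that runtime scales perfectly linearly with thread count, so the 256-thread cluster is treated as 256 / 4 = 64 times faster. The names `derive_metrics`, `CLUSTER_THREADS`, and `WORKSTATION_THREADS` are invented for this example.

```python
# Hypothetical sketch: reproduce the derived rows of the pilot-metrics table
# from its base counts. Assumes runtime scales linearly with thread count.

CLUSTER_THREADS = 16 * 16   # 16 executor nodes x 16 cores = 256 (from the caption)
WORKSTATION_THREADS = 4     # assumption: a quad-core workstation runs 4 threads


def derive_metrics(projects: int, jobs: int, total_cluster_hours: float) -> dict:
    """Derive per-job, per-project, and workstation-equivalent runtimes in hours."""
    avg_job_h = total_cluster_hours / jobs
    avg_jobs_per_project = jobs / projects
    # Product of the unrounded averages (equal to total_cluster_hours / projects).
    avg_project_h = avg_job_h * avg_jobs_per_project
    # Assumption: perfect linear scaling, so the workstation is 256 / 4 = 64x slower.
    slowdown = CLUSTER_THREADS / WORKSTATION_THREADS
    return {
        "avg_job_h": avg_job_h,
        "avg_jobs_per_project": avg_jobs_per_project,
        "avg_project_h": avg_project_h,
        "ws_avg_job_h": avg_job_h * slowdown,
        "ws_avg_project_h": avg_project_h * slowdown,
        "ws_total_h": total_cluster_hours * slowdown,
    }


if __name__ == "__main__":
    metrics = derive_metrics(projects=61, jobs=246, total_cluster_hours=236.3)
    for name, value in metrics.items():
        print(f"{name}: {value:.1f}")
```

Rounded to one decimal place, this gives 1.0 h, 4.0 jobs, 3.9 h, 61.5 h, and 247.9 h, matching the table. The total workstation estimate comes out at about 15,123.2 h rather than 15,122.8 h, because the table's 236.3 h total is itself rounded before being multiplied by 64.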