Visar Berisha, Chelsea Krantsevich, P. Richard Hahn, Shira Hahn, Gautam Dasarathy, Pavan Turaga, Julie Liss.
Abstract
Digital health data are multimodal and high-dimensional. A patient's health state can be characterized by a multitude of signals including medical imaging, clinical variables, genome sequencing, conversations between clinicians and patients, and continuous signals from wearables, among others. This high-volume, personalized data stream aggregated over patients' lives has spurred interest in developing new artificial intelligence (AI) models for higher-precision diagnosis, prognosis, and tracking. While the promise of these algorithms is undeniable, their dissemination and adoption have been slow, owing partially to unpredictable AI model performance once deployed in the real world. We posit that one of the rate-limiting factors in developing algorithms that generalize to real-world scenarios is the very attribute that makes the data exciting: their high-dimensional nature. This paper considers how the large number of features in vast digital health data can challenge the development of robust AI models, a phenomenon known as "the curse of dimensionality" in statistical learning theory. We provide an overview of the curse of dimensionality in the context of digital health, demonstrate how it can negatively impact out-of-sample performance, and highlight important considerations for researchers and algorithm designers.
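One concrete symptom of the curse of dimensionality is distance concentration: as the number of features grows, the gap between a point's nearest and farthest neighbors shrinks relative to the nearest distance, so distance-based reasoning carries less information. The following sketch (my own illustration, not code from the paper; the function name and dimensions are arbitrary choices) demonstrates this with uniformly sampled points:

```python
# Illustration of distance concentration, one symptom of the curse of
# dimensionality: in high dimensions, all pairwise distances look alike.
import numpy as np

def distance_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Return (max - min) / min over distances from one query point."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Contrast is large with 2 features but collapses with 1000 features:
low = distance_contrast(200, 2)     # large relative spread of distances
high = distance_contrast(200, 1000) # distances concentrate near one value
```

With only 2 features the nearest neighbor is much closer than the farthest; with 1000 features the two are nearly indistinguishable, which is one reason models that rely on local structure degrade as features are added without more samples.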
Year: 2021 PMID: 34711924 PMCID: PMC8553745 DOI: 10.1038/s41746-021-00521-5
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 A high-level block diagram for clinical AI model development.
During model training, algorithm designers collect a large training dataset consisting of data from different modalities, each acquired according to some predefined protocol. These data are used to engineer a feature set and train a model to automate a clinical decision of interest. The final model and feature set are selected using a cross-validation procedure on a held-out test set. After model deployment, real-world model performance can be monitored and the original model can be iteratively updated and re-deployed.
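The training-time loop described above can be sketched in miniature. This is not the authors' pipeline; it is a minimal, assumed example in which the "model" is a single-feature threshold rule, the "feature engineering" is synthetic data, and model selection is done by cross-validated accuracy:

```python
# Minimal sketch of the Fig. 1 training loop: collect data, evaluate candidate
# model configurations by cross-validation, and select the best one.
import numpy as np

def k_fold_accuracy(X, y, threshold, k=5):
    """Mean accuracy of a one-feature threshold rule across k folds."""
    folds = np.array_split(np.arange(len(y)), k)
    accs = []
    for fold in folds:
        pred = (X[fold] > threshold).astype(int)
        accs.append((pred == y[fold]).mean())
    return float(np.mean(accs))

# Synthetic "training dataset": one feature, two groups.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.3, 0.1, 50), rng.normal(0.7, 0.1, 50)])
y = np.concatenate([np.zeros(50, int), np.ones(50, int)])

# "Model selection": pick the threshold with the best cross-validated accuracy.
thresholds = np.linspace(0.1, 0.9, 17)
best = max(thresholds, key=lambda t: k_fold_accuracy(X, y, t))
```

In a real deployment, the selected configuration would then be monitored on incoming data and periodically retrained, as the caption describes.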
Fig. 2 The two scenarios considered in the example problem in the text.
Under the first scenario (a), type-to-token ratio is the only relevant feature for distinguishing between healthy controls and patients with mild cognitive impairment (MCI). Under the second scenario (b), both type-to-token ratio and lexical density are relevant features for separating these two groups.
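For readers unfamiliar with the two lexical features named in the caption, here is a sketch of how they are commonly computed from a transcript. This is my own illustration, not the authors' feature-extraction code, and the stopword-based tagging of content words is a crude heuristic chosen purely for demonstration:

```python
# Type-to-token ratio: unique words divided by total words (lexical diversity).
def type_to_token_ratio(words: list[str]) -> float:
    return len(set(words)) / len(words)

# Lexical density: content words divided by total words. A real implementation
# would use part-of-speech tagging; this stopword list is a stand-in.
STOPWORDS = {"the", "a", "an", "is", "was", "and", "or", "to", "of", "in", "it"}

def lexical_density(words: list[str]) -> float:
    content = [w for w in words if w.lower() not in STOPWORDS]
    return len(content) / len(words)

sample = "the cat sat on the mat and the cat slept".split()
```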
Fig. 3 Two classifiers learned from the 2-d samples in Fig. 2.
Both classifiers achieve approximately the same performance on the available data; however, they would treat most of the samples from the blind spot differently. Model (a) would classify them as MCI, whereas model (b) would classify them as healthy.
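The blind-spot phenomenon in the caption can be reproduced with a toy construction (my own, not the paper's models): two decision rules that classify every available sample identically yet disagree in a region the data never covered. The boundaries and points below are arbitrary assumptions for illustration:

```python
# Two classifiers over the features (type-to-token ratio x, lexical density y).
def model_a(x, y):
    # Vertical boundary: uses only the first feature.
    return "healthy" if x > 0.5 else "MCI"

def model_b(x, y):
    # Diagonal boundary: combines both features.
    return "healthy" if x + y > 1.0 else "MCI"

# Toy training data clustered in two corners of feature space.
train = [(0.2, 0.2, "MCI"), (0.3, 0.3, "MCI"),
         (0.8, 0.8, "healthy"), (0.7, 0.8, "healthy")]

# Both models label every available sample correctly...
agree_on_train = all(model_a(x, y) == lab == model_b(x, y)
                     for x, y, lab in train)

# ...but a point in the uncovered "blind spot" receives opposite labels.
blind = (0.9, 0.05)
```

Nothing in the training data can arbitrate between the two models, which is exactly why deployed performance on out-of-distribution patients is unpredictable.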
Fig. 4 The reported accuracy vs. total sample size for 51 classifiers from the meta-analyses in refs. [7,8].
This analysis considers two types of models: (1) speech-based models for classifying between a control group and patients with a diagnosis of Alzheimer’s disease (Con vs. AD; blue plot) and (2) speech-based models for classifying between a control group and patients with other forms of cognitive impairment (Con vs. CI; red plot). The total sample size is the sum of the number of subjects in the control group and the clinical group. The y-axis is on a linear scale and the x-axis is on a log scale, as it spans multiple orders of magnitude.