Elyas Sabeti, Anders Høst-Madsen.
Abstract
The aim of atypicality is to extract small, rare, unusual, and interesting pieces from big data. This complements statistics about typical data in giving insight into the data. Finding such "interesting" parts requires universal approaches, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data; the current paper extends it to real-valued data using minimum description length (MDL). We develop the information-theoretic methodology for a number of "universal" signal processing models, and finally apply them to recorded hydrophone data and heart rate variability (HRV) signals.
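As an illustration of a codelength-based criterion, the sketch below flags a real-valued segment as atypical when coding it with a model fitted to the segment itself takes fewer bits than coding it with a fixed typical model. It is a minimal sketch under stated assumptions, not the paper's method: the typical model is assumed to be Gaussian with known mean and standard deviation, the alternative is a Gaussian with estimated parameters charged at the common (k/2) log2 n MDL rate, and the names `typical_codelength`, `atypical_codelength`, `is_atypical`, `tau`, and `min_var` are hypothetical. Codelengths are differential (the quantization constant is the same for both codes and cancels), so they may be negative.

```python
import math

LOG2E = math.log2(math.e)

def typical_codelength(x, mu, sigma):
    """Bits to code x under a fixed Gaussian N(mu, sigma^2) (assumed typical model)."""
    n = len(x)
    sq = sum((v - mu) ** 2 for v in x)
    return 0.5 * n * math.log2(2 * math.pi * sigma ** 2) + LOG2E * sq / (2 * sigma ** 2)

def atypical_codelength(x, tau=5.0, min_var=1e-12):
    """Bits to code x with a Gaussian fitted to x itself, plus MDL overhead.

    tau is a hypothetical fixed cost (in bits) for signalling that the
    segment is coded separately; min_var guards against log(0).
    """
    n = len(x)
    mean = sum(x) / n
    var = max(sum((v - mean) ** 2 for v in x) / n, min_var)
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * var)
    param_bits = 0.5 * 2 * math.log2(n)  # two parameters: mean and variance
    return data_bits + param_bits + tau

def is_atypical(x, mu=0.0, sigma=1.0, tau=5.0):
    """Flag x as atypical if the fitted code is shorter than the typical code."""
    return atypical_codelength(x, tau) < typical_codelength(x, mu, sigma)

# A segment that resembles the assumed typical model N(0, 1).
ordinary = [0.3, -1.1, 0.8, -0.2, 1.4, -0.9, 0.1, -0.6,
            1.0, -0.4, 0.5, -1.3, 0.7, -0.1, 0.2, -0.8]
# A segment with a shifted mean and very small spread.
shifted = [5.0, 5.1, 4.9, 5.05, 4.95, 5.02, 4.98, 5.0,
           5.1, 4.9, 5.03, 4.97, 5.0, 5.01, 4.99, 5.0]

print(is_atypical(ordinary))
print(is_atypical(shifted))
```

The parameter cost and the penalty `tau` keep ordinary segments from being flagged merely because a fitted model always matches the data at least as well as a fixed one.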
Keywords: atypicality; big data; codelength; minimum description length
Year: 2019; PMID: 33266935; PMCID: PMC7514700; DOI: 10.3390/e21030219
Source DB: PubMed; Journal: Entropy (Basel); ISSN: 1099-4300; Impact factor: 2.524
Figure 1. Redundancy comparison between ordinary predictive minimum description length (O.P. MDL) and our proposed sufficient statistic method for and .
Figure 2. Precision vs. recall probability for all six days for which manual detections are available.
Figure 3. Detected atypical segments of Holter monitoring heart rate variability (HRV): “S” stands for supraventricular arrhythmia and “V” stands for ventricular contraction, based on annotations provided by PhysioNet [60].