Christian Bock, Thomas Gumbsch, Michael Moor, Bastian Rieck, Damian Roqueiro, Karsten Borgwardt.
Abstract
Motivation: Most modern intensive care units record the physiological and vital signs of patients. These data can be used to extract signatures, commonly known as biomarkers, that help physicians understand the biological complexity of many syndromes. However, most biological biomarkers suffer from either poor predictive performance or weak explanatory power. Recent developments in time series classification focus on discovering shapelets, i.e. subsequences that are most predictive in terms of class membership. Shapelets have the advantage of combining a high predictive performance with an interpretable component: their shape. Currently, most shapelet discovery methods do not rely on statistical tests to verify the significance of individual shapelets. Therefore, identifying associations between the shapelets of physiological biomarkers and patients that exhibit certain phenotypes of interest enables the discovery and subsequent ranking of physiological signatures that are interpretable, statistically validated and accurate predictors of clinical endpoints.
Year: 2018 PMID: 29949972 PMCID: PMC6022601 DOI: 10.1093/bioinformatics/bty246
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Schematic illustration of a shapelet, a time series motif, and its occurrences in a data set of time series that belong to one of two phenotypic classes (left: y = 1, right: y = 0). The shapelet is enriched in one class (y = 1). Note that the decision whether a shapelet occurs depends on a distance threshold.
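The occurrence decision mentioned in the caption can be illustrated with a short sketch. It assumes that the distance between a shapelet and a time series is the minimum Euclidean distance over all contiguous windows of the shapelet's length, a common choice for shapelets that is not stated in this record. The function names and the threshold are hypothetical.

```python
import math


def min_subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any equally
    long contiguous window of the series (assumed distance measure)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((s - w) ** 2 for s, w in zip(shapelet, window)))
        best = min(best, dist)
    return best


def shapelet_occurs(shapelet, series, threshold):
    """A shapelet 'occurs' if its best match is within the threshold."""
    return min_subsequence_distance(shapelet, series) <= threshold
```

Under these assumptions, the same shapelet can count as present or absent in a given time series depending solely on the chosen threshold.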
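Table 1. A 2 × 2 contingency table as used by our method
| | Class label y = 1 | Class label y = 0 | Row totals |
|---|---|---|---|
| Shapelet occurs | a | b | r |
| Shapelet does not occur | d | c | n − r |
| Column totals | n1 | n − n1 | n |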
Note: When calculating the minimum attainable p-value for a shapelet, only the values r (the number of time series in one part of the partition), n1 (the number of time series with a positive label), and n (the total number of time series) are required.
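The dependence on r, n1 and n alone can be illustrated with a short sketch. It assumes a two-sided Fisher's exact test, which this record does not name as the test used, and it takes r to be the number of time series in which the shapelet occurs. The function names are hypothetical. With the margins fixed, the count of positively labelled series among those r fully determines the table, so the smallest p-value over all admissible tables can be computed before any individual cell is known.

```python
from fractions import Fraction
from math import comb


def table_probability(a, r, n1, n):
    """Hypergeometric probability of a positives among r series,
    given n1 positives among n series in total."""
    return Fraction(comb(n1, a) * comb(n - n1, r - a), comb(n, r))


def min_attainable_pvalue(r, n1, n):
    """Smallest two-sided Fisher p-value over all tables with margins r, n1, n.

    Assumption: Fisher's exact test; the source does not specify the test.
    """
    if not (0 <= r <= n and 0 <= n1 <= n):
        raise ValueError("margins must satisfy 0 <= r <= n and 0 <= n1 <= n")
    a_low = max(0, r + n1 - n)   # smallest admissible cell count
    a_high = min(r, n1)          # largest admissible cell count
    probs = [table_probability(a, r, n1, n) for a in range(a_low, a_high + 1)]
    # Two-sided p-value of a table: total probability of all tables
    # that are at most as likely as that table.
    p_values = [sum(q for q in probs if q <= p) for p in probs]
    return min(p_values)
```

Exact fractions are used so that equally likely tables are treated as ties without floating-point tolerance; `float(min_attainable_pvalue(r, n1, n))` converts the result if needed.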
Number of statistically significant shapelets after adjusting for multiple hypothesis testing
| Vital sign | S3M: significant shapelets | S3M: significance threshold | gRSF: significant shapelets | gRSF: Bonferroni correction factor |
|---|---|---|---|---|
| Heart rate | 200 | 2.51 × 10⁻¹⁰ | 0 | 1.28 × 10⁻¹⁵ |
| Respiratory rate | 514 | 4.47 × 10⁻¹⁰ | 0 | 1.33 × 10⁻¹⁵ |
| Systolic blood pressure | 58 | 2.55 × 10⁻⁹ | 0 | 4.35 × 10⁻¹⁴ |
Note: Our proposed method S3M returns many significant shapelets, in contrast to the baseline competitor gRSF, which does not yield any significant shapelets. We denote the significance threshold reached by our method as , and the Bonferroni correction factor by α.
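For orientation, the standard Bonferroni procedure referred to in the note can be sketched as follows. This is a generic illustration, not the thresholding used by S3M. The target family-wise error rate `fwer`, the function names and the example p-values are assumptions, since this record states neither the error rate nor the number of tests.

```python
def bonferroni_threshold(fwer, num_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if num_tests <= 0:
        raise ValueError("num_tests must be positive")
    return fwer / num_tests


def significant_indices(p_values, fwer=0.05):
    """Indices of hypotheses whose p-value is at or below the corrected threshold."""
    if not p_values:
        return []
    threshold = bonferroni_threshold(fwer, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for four candidate shapelets.
example = significant_indices([0.001, 0.02, 0.2, 0.004], fwer=0.05)
```

Because the threshold shrinks in proportion to the number of tests, a method that tests very many candidate shapelets needs extremely small p-values to report any of them as significant.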
Fig. 2. Following Section 3.2, we generate the coordinates from the contingency table of each shapelet such that the axes represent a relative measure of the degree to which a shapelet is present in cases (x1) and absent in controls (x0). This results in a point cloud of all shapelets (gray). The statistically significant shapelets (red) identified by the proposed S3M method form a distinct subset. Their coordinates indicate that they are predominantly present in cases and absent in controls.
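Section 3.2 is not part of this record, so the exact definition of the coordinates is not available here. One plausible reading, shown purely as an assumption, takes x1 as the fraction of cases in which the shapelet occurs and x0 as the fraction of controls in which it does not. The cell names follow the a, b, d, c notation of the contingency table, and the function name is hypothetical.

```python
def shapelet_coordinates(a, b, d, c):
    """Map a contingency table to (x1, x0).

    Assumed definitions (not confirmed by the source):
      x1 = a / (a + d): share of cases (y = 1) in which the shapelet occurs
      x0 = c / (b + c): share of controls (y = 0) in which it is absent
    """
    cases = a + d
    controls = b + c
    if cases == 0 or controls == 0:
        raise ValueError("both classes must contain at least one time series")
    return a / cases, c / controls
```

Under this reading, a shapelet that is present in nearly all cases and absent in nearly all controls lands close to (1, 1).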
The contingency tables of the statistically most significant shapelets identified by S3M for the three datasets
| (a) Heart rate | (b) Respiratory rate | (c) Systolic blood pressure |
Note: Each table follows the notation from Table 1, i.e. a, b in the top and d, c in the bottom row.
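As a sketch of how such a table could be filled, the snippet below counts the four cells from per-series occurrence flags and class labels. It assumes that the top row holds the time series in which the shapelet occurs and the left column holds those with label y = 1. The function name and inputs are hypothetical.

```python
def contingency_counts(occurs, labels):
    """Return (a, b, d, c): top row a, b and bottom row d, c.

    occurs: iterable of booleans, True if the shapelet occurs in the series
    labels: iterable of class labels, 1 for cases and 0 for controls
    """
    occurs = list(occurs)
    labels = list(labels)
    if len(occurs) != len(labels):
        raise ValueError("occurs and labels must have the same length")
    a = b = c = d = 0
    for present, y in zip(occurs, labels):
        if present and y == 1:
            a += 1      # shapelet occurs, case
        elif present:
            b += 1      # shapelet occurs, control
        elif y == 1:
            d += 1      # shapelet absent, case
        else:
            c += 1      # shapelet absent, control
    return a, b, d, c
```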
Fig. 3. The three most statistically significant shapelets that our algorithm extracted for the three datasets (a-c). Each shapelet is shown within the context of the time series it is extracted from. The x-axis depicts the hour since ICU admission.
Classification accuracy of S3M versus gRSF on the test set (average over 10 repetitions)
| Vital sign | S3M | # Shapelets | gRSF | # Shapelets |
|---|---|---|---|---|
| Heart rate | 0.70 | 1 | | 3030 |
| Respiratory rate | 0.71 | 1 | | 3406 |
| Systolic blood pressure | | 1 | 0.74 | 971 |
Bold values are used to highlight the best predictive performance (accuracy) of the compared methods. Note: The proposed S3M method only uses one shapelet, whereas gRSF constructs a decision tree based on multiple shapelets.
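How a single shapelet can act as a classifier is sketched below. This record does not describe how S3M turns its shapelet into predictions, so the rule used here is an assumption: a time series is predicted as a case (y = 1) when its distance to the shapelet is at or below the threshold. The function names and example values are hypothetical.

```python
def predict_from_distances(distances, threshold):
    """Predict 1 (case) when the shapelet matches within the threshold, else 0."""
    return [1 if dist <= threshold else 0 for dist in distances]


def accuracy(predicted, actual):
    """Fraction of predictions that agree with the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, y in zip(predicted, actual) if p == y)
    return correct / len(actual)


# Hypothetical shapelet-to-series distances and true labels for four patients.
predictions = predict_from_distances([0.2, 1.5, 0.4, 2.0], threshold=0.5)
score = accuracy(predictions, [1, 0, 0, 0])
```

Such a rule has a single shapelet and a single threshold as its only parameters, which is what keeps the resulting prediction easy to inspect.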