Fayzan F Chaudhry, Matteo Danieletto, Eddye Golden, Jerome Scelza, Greg Botwin, Mark Shervey, Jessica K De Freitas, Ishan Paranjpe, Girish N Nadkarni, Riccardo Miotto, Patricia Glowe, Greg Stock, Bethany Percha, Noah Zimmerman, Joel T Dudley, Benjamin S Glicksberg.
Abstract
Sleep quality has been directly linked to cognitive function, quality of life, and a variety of serious diseases across many clinical domains. Standard methods for assessing sleep involve overnight studies in hospital settings, which are uncomfortable, expensive, not representative of real sleep, and difficult to conduct on a large scale. Recently, numerous commercial digital devices have been developed that record physiological data, such as movement, heart rate, and respiratory rate, which can act as a proxy for sleep quality in lieu of standard electroencephalogram recording equipment. The sleep-related output metrics from these devices include sleep staging and total sleep duration and are derived via proprietary algorithms that utilize a variety of these physiological recordings. Each device company makes different claims of accuracy and measures different features of sleep quality, and it is still unknown how well these devices correlate with one another and perform in a research setting. In this pilot study of 21 participants, we investigated whether sleep metric outputs from self-reported sleep metrics (SRSMs) and four sensors, specifically Fitbit Surge (a smart watch), Withings Aura (a sensor pad that is placed under a mattress), Hexoskin (a smart shirt), and Oura Ring (a smart ring), were related to known cognitive and psychological metrics, including the n-back test and Pittsburgh Sleep Quality Index (PSQI). We analyzed correlations between multiple device-related sleep metrics. Furthermore, we investigated relationships between these sleep metrics and cognitive scores across different timepoints and SRSMs through univariate linear regressions. We found that correlations for sleep metrics between the devices across the sleep cycle were almost uniformly low, but still significant (P < 0.05). For cognitive scores, we found that the Withings latency was statistically significant for the afternoon and evening timepoints at P = 0.016 and P = 0.013, respectively. We did not find any significant associations between SRSMs and PSQI or cognitive scores. Additionally, Oura Ring's total sleep duration and efficiency in relation to the PSQI measure were statistically significant at P = 0.004 and P = 0.033, respectively. These findings can hopefully be used to guide future sensor-based sleep research.
Keywords: Fitbit; Hexoskin; Oura; Withings; biosensors; cognition; sleep; wearables
Year: 2020 PMID: 32138289 PMCID: PMC7085707 DOI: 10.3390/s20051378
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Study structure and data collection for our pilot sleep study. (A) Illustration of sleep study monitoring procedure and data collection strategies. (B) Example data showing a comparison of sleep staging of a single night for one study participant for all four devices.
Figure 2. (A) A correlation matrix of total sleep duration (TSD) (in seconds) by device and self-reported estimation (i.e., self-reported sleep metrics (SRSMs)) with p value significance indication (* p < 0.1; ** p < 0.05; *** p < 0.01). Each point represents data from each night for each participant. (B) A REM sleep (in seconds) correlation matrix across the Oura, Hexoskin, and Withings devices with p value significance indication (same as above). The Fitbit was excluded, as it does not track REM vs. NREM sleep. The plots on the diagonals of A and B reflect the distribution of the sleep metric of interest (TSD and REM, respectively) for each individual device. The plots in the bottom left of A and B show the trend line with 95% confidence intervals between devices. (C) A correlation matrix of overall sleep stages (awake, NREM, and REM) between Oura, Hexoskin, and Withings devices (Fitbit does not differentiate between NREM and REM) with p value significance indication (same as above).
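The pairwise device comparison summarized in Figure 2 can be illustrated with a minimal sketch. It assumes nightly values are paired by night, with missing nights dropped, and it substitutes a permutation test for the p value because the exact test used in the study is not stated here. All function names and the sample values are hypothetical and are not data from the study.

```python
import math
import random


def paired_nights(x, y):
    """Keep only nights on which both sources reported a value (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for the correlation (an assumed stand-in test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def stars(p):
    """Significance markers following the figure legend."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# Hypothetical nightly TSD values in seconds for two devices; None marks a missing night.
device_a = [27000, 25200, None, 30600, 21600, 28800, 26100]
device_b = [26400, 24000, 27000, 31200, None, 27900, 25500]
x, y = paired_nights(device_a, device_b)
r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.2f}, p = {p:.3f} {stars(p)}")
```

Repeating this for every pair of devices (and the SRSMs) would fill in the off-diagonal cells of a matrix like the one in panel A.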
Summary of the study population. Each participant’s gender (M/F/O), baseline assessment of sleep quality according to the Pittsburgh Sleep Quality Index (PSQI) (with higher values indicative of poorer sleep), age, SF-36 scores (a measure of general health along eight axes), and MEQ score (optimal time of day) are included.
| ID | Gender | Age | PSQI | MEQ | SF-36: Physical Functioning | SF-36: Role Limitations (Physical) | SF-36: Role Limitations (Emotional) | SF-36: Energy | SF-36: Emotional Well-Being | SF-36: Social Functioning | SF-36: Pain | SF-36: General Health |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | F | 23 | 1 | 50 | 100 | 100 | 100.0 | 50 | 68 | 87.5 | 100.0 | 55 |
| 2 | F | 26 | 4 | 47 | 90 | 100 | 66.7 | 45 | 72 | 100.0 | 100.0 | 60 |
| 3 | F | 27 | 5 | 52 | 100 | 100 | 100.0 | 45 | 56 | 87.5 | 90.0 | 50 |
| 4 | F | 27 | 2 | 36 | 100 | 100 | 100.0 | 65 | 80 | 75.0 | 100.0 | 55 |
| 5 | F | 27 | 4 | 58 | 100 | 100 | 100.0 | 50 | 76 | 87.5 | 90.0 | 55 |
| 6 | F | 28 | 3 | 52 | 100 | 100 | 100.0 | 55 | 76 | 75.0 | 100.0 | 60 |
| 7 | F | 28 | 3 | 40 | 90 | 100 | 33.3 | 50 | 72 | 87.5 | 67.5 | 55 |
| 8 | F | 29 | 12 | 35 | 100 | 100 | 0 | 15 | 36 | 50.0 | 67.5 | 55 |
| 9 | F | 31 | 4 | 49 | 95 | 100 | 100.0 | 60 | 84 | 100.0 | 100.0 | 55 |
| 10 | F | 39 | 4 | 49 | 60 | 50 | 100.0 | 45 | 44 | 87.5 | 77.5 | 55 |
| 11 | F | 41 | 5 | 53 | 100 | 100 | 100.0 | 95 | 96 | 100.0 | 100.0 | 60 |
| 12 | M | 25 | 10 | 55 | 100 | 100 | 66.7 | 85 | 76 | 100.0 | 100.0 | 60 |
| 13 | M | 29 | 5 | 52 | 100 | 100 | 100.0 | 50 | 88 | 100.0 | 100.0 | 50 |
| 14 | M | 29 | 4 | 41 | 100 | 100 | 100.0 | 50 | 76 | 100.0 | 100.0 | 60 |
| 15 | M | 31 | 3 | 56 | 95 | 100 | 100.0 | 65 | 80 | 75.0 | 90.0 | 50 |
| 16 | M | 34 | 12 | 73 | 100 | 100 | 66.7 | 50 | 52 | 62.5 | 100.0 | 55 |
| 17 | M | 35 | 6 | 52 | 100 | 100 | 100.0 | 75 | 80 | 100.0 | 100.0 | 50 |
| 18 | M | 37 | 3 | 61 | 90 | 100 | 66.7 | 50 | 80 | 87.5 | 90.0 | 55 |
| 19 | M | 39 | 8 | 72 | 100 | 100 | 100.0 | 80 | 88 | 100.0 | 100.0 | 55 |
| 20 | M | 41 | 6 | 55 | 95 | 100 | 100.0 | 50 | 84 | 100.0 | 80.0 | 55 |
| 21 | M | 41 | 9 | 52 | 95 | 100 | 66.7 | 35 | 52 | 87.5 | 70.0 | 60 |
| MIN | | 23 | 1 | 35 | 60 | 50 | 0 | 15 | 36 | 50 | 67.5 | 50 |
| MEDIAN | | 29 | 4 | 52 | 100 | 100 | 100 | 50 | 76 | 87.5 | 100 | 55 |
| MAX | | 41 | 12 | 73 | 100 | 100 | 100 | 95 | 96 | 100 | 100 | 60 |
Summary metrics of device data and SRSMs. All values are in hours except wakeups (number of occurrences) and efficiency (unitless). Sleep efficiency tracks the percentage of time in bed spent asleep. TSD is total sleep duration, which is similar to start-end duration; related features, including latency, were also used.
| Device | Metric | n | Mean | St. Dev | Min | Pctl (25) | Pctl (75) | Max |
|---|---|---|---|---|---|---|---|---|
| Fitbit | Efficiency | 129 | 94.70 | 15.70 | 31.00 | 94.00 | 97.00 | 193.00 |
| | TSD All | 129 | 7.47 | 1.47 | 3.78 | 6.50 | 8.43 | 11.40 |
| | TSD | 129 | 7.58 | 1.58 | 1.78 | 5.98 | 7.93 | 10.75 |
| | Start-End | 129 | 7.58 | 1.73 | 3.78 | 6.50 | 8.48 | 15.87 |
| | Wakeups | 129 | 1.60 | 1.20 | 0.00 | 1.00 | 2.00 | 8.00 |
| Hexoskin | Efficiency | 114 | 92.40 | 4.40 | 70.30 | 91.10 | 95.30 | 97.80 |
| | TSD | 114 | 6.72 | 1.31 | 3.45 | 5.78 | 7.81 | 9.69 |
| | Start-End | 135 | 7.57 | 1.42 | 3.93 | 6.57 | 8.58 | 11.43 |
| | REM | 123 | 2.15 | 0.57 | 0.69 | 1.77 | 2.53 | 4.12 |
| | Latency | 114 | 0.29 | 0.26 | 0.07 | 0.12 | 0.38 | 1.56 |
| Oura | Efficiency | 127 | 89.70 | 14.40 | 24.00 | 84.00 | 93.00 | 164.00 |
| | TSD | 128 | 7.69 | 1.72 | 0.42 | 6.73 | 8.75 | 13.48 |
| | Start-End | 130 | 10.67 | 11.63 | 4.62 | 6.97 | 9.55 | 117.60 |
| | REM | 127 | 2.17 | 1.11 | 0.00 | 1.29 | 2.81 | 6.38 |
| | Deep | 127 | 1.12 | 0.58 | 0.00 | 0.73 | 1.44 | 2.58 |
| | Wakeups | 127 | 2.40 | 1.90 | 0.00 | 1.00 | 4.00 | 7.00 |
| | Latency | 127 | 0.26 | 0.25 | 0.01 | 0.11 | 0.30 | 1.58 |
| Withings | Efficiency | 141 | 84.10 | 20.50 | 20.50 | 74.80 | 90.10 | 179.80 |
| | TSD All | 141 | 8.99 | 2.89 | 0.53 | 7.45 | 10.12 | 27.03 |
| | TSD | 141 | 6.97 | 1.75 | 0.33 | 5.95 | 8.15 | 10.97 |
| | Start-End | 141 | 9.30 | 4.45 | 0.42 | 7.08 | 9.73 | 34.55 |
| | REM | 141 | 1.40 | 0.46 | 0.00 | 1.15 | 1.67 | 2.63 |
| | Deep | 141 | 1.74 | 0.58 | 0.00 | 1.42 | 2.15 | 3.67 |
| | Light | 141 | 3.83 | 0.98 | 0.33 | 3.22 | 4.45 | 6.03 |
| | Wakeups | 141 | 2.40 | 2.60 | 0.00 | 0.00 | 3.00 | 13.00 |
| | Latency | 141 | 0.32 | 0.36 | 0.00 | 0.08 | 0.42 | 2.37 |
| | Wakeup Duration | 141 | 1.38 | 2.14 | 0.03 | 0.53 | 1.50 | 17.48 |
| SRSMs | Start-End | 122 | 7.34 | 1.45 | 4.50 | 6.35 | 8.24 | 12.33 |
| | TSD | 122 | 6.91 | 1.56 | 3.00 | 6.00 | 7.78 | 15.00 |
| | Latency | 122 | 0.24 | 0.23 | 0.02 | 0.08 | 0.33 | 2.00 |
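The table caption describes sleep efficiency as the percentage of time in bed spent asleep. A minimal sketch of that definition follows; the function name and arguments are hypothetical, and the devices' proprietary algorithms may compute the metric differently.

```python
def sleep_efficiency(hours_asleep, hours_in_bed):
    """Percentage of time in bed spent asleep (assumed textbook-style definition)."""
    if hours_in_bed <= 0:
        raise ValueError("hours_in_bed must be positive")
    return 100.0 * hours_asleep / hours_in_bed


# Hypothetical night: 7 hours asleep during 8 hours in bed.
print(sleep_efficiency(7.0, 8.0))  # 87.5
```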
Results of multiple univariate linear models for PSQI (left) and cognitive scores across all timepoints (right). For the PSQI-related models, the independent variables were the means of device data for each participant, and the dependent variable was PSQI. Higher PSQI values indicate worse sleep quality; thus, positive correlations suggest a relation to poorer sleep quality. For the cognitive score-related models, the independent variables were the means of device data for each participant, and the dependent variables were the cognitive scores. We show the p values of each univariate regression for cognitive score by timepoint. Please see Supplemental S6–S8 for more statistics related to these regressions. All units are in hours with the exception of wakeups (number of occurrences) and efficiency (a standardized metric).
| Device | Feature | PSQI: Coefficient | PSQI: Std. Error | PSQI: p value | PSQI: R² | Cognitive score p value: Morning | Cognitive score p value: Afternoon | Cognitive score p value: Evening |
|---|---|---|---|---|---|---|---|---|
| Fitbit | TSD | −0.273 | 0.544 | 0.622 | 0.014 | 0.825 | 0.511 | 0.610 |
| | Wakeups | 1.570 | 1.005 | 0.136 | 0.119 | 0.329 | 0.672 | 0.857 |
| Withings | TSD | −0.125 | 0.498 | 0.804 | 0.004 | 0.110 | 0.497 | 0.409 |
| | Latency | −2.080 | 2.83 | 0.472 | 0.0291 | 0.869 | 0.016 ** | 0.013 ** |
| | Efficiency | 0.010 | 0.060 | 0.869 | 0.002 | 0.315 | 0.148 | 0.194 |
| | Wakeups | 0.352 | 0.427 | 0.421 | 0.036 | 0.888 | 0.361 | 0.378 |
| | REM | 0.260 | 1.962 | 0.896 | 0.001 | 0.342 | 0.617 | 0.557 |
| Oura | TSD | −1.004 | 0.305 | 0.004 *** | 0.376 | 0.265 | 0.197 | 0.221 |
| | Latency | −7.311 | 4.445 | 0.117 | 0.131 | 0.366 | 0.499 | 0.563 |
| | Efficiency | −0.092 | 0.040 | 0.033 ** | 0.228 | 0.285 | 0.332 | 0.301 |
| | Wakeups | 0.168 | 0.491 | 0.736 | 0.006 | 0.226 | 0.184 | 0.289 |
| | REM | −0.526 | 0.715 | 0.471 | 0.029 | 0.656 | 0.732 | 0.713 |
| Hexoskin | TSD | 0.187 | 0.702 | 0.793 | 0.004 | 0.206 | 0.289 | 0.235 |
| | Latency | 1.249 | 4.444 | 0.782 | 0.004 | 0.995 | 0.481 | 0.718 |
| | Efficiency | −0.226 | 0.272 | 0.417 | 0.037 | 0.530 | 0.527 | 0.798 |
| | REM | 0.397 | 1.833 | 0.831 | 0.003 | 0.128 | 0.186 | 0.180 |
| SRSM | TSD | −0.725 | 0.558 | 0.210 | 0.086 | 0.725 | 0.361 | 0.273 |
| | Latency | 1.846 | 4.033 | 0.653 | 0.012 | 0.935 | 0.210 | 0.261 |
| Observations | | 20 | | | | 16 | 19 | 18 |
Note: * p < 0.1; ** p < 0.05; *** p < 0.01.
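One of the univariate models described in the caption above can be sketched as an ordinary least-squares fit of PSQI on each participant's mean device metric. This is a minimal illustration under stated assumptions, not the study's analysis code: the helper names, the handling of missing nights, and the sample values are all hypothetical. The p value is omitted; it would follow from the ratio of coefficient to standard error under a t distribution with n − 2 degrees of freedom, which a statistics library can supply.

```python
import math
from statistics import mean


def participant_means(nightly):
    """Average each participant's nightly values, skipping missing nights (None)."""
    return {
        pid: mean(v for v in values if v is not None)
        for pid, values in nightly.items()
        if any(v is not None for v in values)
    }


def univariate_ols(x, y):
    """Fit y = intercept + coefficient * x by ordinary least squares."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return {
        "coefficient": slope,
        "intercept": intercept,
        # Standard error of the slope with n - 2 residual degrees of freedom.
        "std_error": math.sqrt(ss_res / (n - 2) / sxx),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical nightly TSD (hours) and baseline PSQI; not data from the study.
nightly_tsd = {
    "p1": [7.5, 8.0, None],
    "p2": [6.0, 6.5, 6.2],
    "p3": [8.5, None, 9.0],
    "p4": [5.5, 6.0, 5.8],
}
psqi = {"p1": 4, "p2": 8, "p3": 2, "p4": 9}

means = participant_means(nightly_tsd)
ids = sorted(means)
fit = univariate_ols([means[i] for i in ids], [psqi[i] for i in ids])
print(fit)
```

Averaging per participant first keeps one observation per person, which matches the observation counts reported in the table rather than the number of recorded nights.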
In this collection of univariate linear models, the participants’ summary data are the independent variables, and cognitive score is the dependent variable. We present the p values of each univariate regression for cognitive score by timepoint. Please see Supplemental S9–S11 for more statistics related to these regressions. These metrics all represent standardized scores.
| Feature | Morning (p value) | Afternoon (p value) | Evening (p value) |
|---|---|---|---|
| PSQI | 0.531 | 0.083 * | 0.057 ** |
| MEQ | 0.529 | 0.057 | 0.120 |
| Emotional Role Limitations | 0.665 | 0.005 *** | 0.003 *** |
| Energy | 0.700 | 0.018 ** | 0.010 *** |
| General Health | 0.769 | 0.823 | 0.961 |
| Physical | 0.014 ** | 0.745 | 0.597 |
| Social | 0.170 | 0.004 *** | 0.002 *** |
| Well-being | 0.078 * | 0.005 *** | 0.001 *** |
| Observations | 16 | 19 | 18 |
Note: * p < 0.1; ** p < 0.05; *** p < 0.01.
Figure 3. Plot of missing sleep-related data including SRSMs. Due to various device preferences, missing data are asymmetric across devices.
Figure 4. Average missing data for n-back tests by timepoint (morning, afternoon, and evening) and MEQ groupings (early, intermediate, or late).