Brinnae Bent, Benjamin A Goldstein, Warren A Kibbe, Jessilyn P Dunn.
Abstract
As wearable technologies are being increasingly used for clinical research and healthcare, it is critical to understand their accuracy and determine how measurement errors may affect research conclusions and impact healthcare decision-making. Accuracy of wearable technologies has been a hotly debated topic in both the research and popular science literature. Currently, wearable technology companies are responsible for assessing and reporting the accuracy of their products, but little information about the evaluation method is made publicly available. Heart rate measurements from wearables are derived from photoplethysmography (PPG), an optical method for measuring changes in blood volume under the skin. Potential inaccuracies in PPG stem from three major areas: (1) diverse skin types, (2) motion artifacts, and (3) signal crossover. To date, no study has systematically explored the accuracy of wearables across the full range of skin tones. Here, we explored heart rate and PPG data from consumer- and research-grade wearables under multiple circumstances to test whether and to what extent these inaccuracies exist. We saw no statistically significant difference in accuracy across skin tones, but we saw significant differences between devices and between activity types; notably, absolute error during activity was, on average, 30% higher than during rest. Our conclusions indicate that different wearables are all reasonably accurate at rest and at prolonged elevated heart rate, but that differences exist between devices in responding to changes in activity. This has implications for researchers, clinicians, and consumers in drawing study conclusions, combining study results, and making health-related decisions using these devices.
Keywords: Biomedical engineering; Imaging and sensing; Technology
Year: 2020 PMID: 32047863 PMCID: PMC7010823 DOI: 10.1038/s41746-020-0226-6
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Reported accuracy, outliers, evaluation process, and factors that affect performance by each device manufacturer.
| Company | Reported accuracy/outliers | Reported evaluation process | Reported factors that affect performance |
|---|---|---|---|
| APPLE | For a small percentage of users, various factors may make it impossible to get any heart rate reading at all | – | Skin perfusion, tattoos, rhythmic movements |
| FITBIT | – | 5000+ hours of activity, exercise, and sleep used to iterate on their heart rate technology; over 50 prototype iterations since 2010 | – |
| GARMIN | Skin tone may affect heart rate accuracy but “Garmin designs our watches to work on all skin tones… the sensor may have to work harder [when more melanin is present in the skin] to find the pulse which can require slightly more battery power” | – | Wearing a watch too tightly, participating in activities that cause flexing of the wrist, tattoos |
| XIAOMI | – | – | – |
| EMPATICA | – | Provides information about algorithms used to calculate HR but not evaluation | – |
| BIOVOTION | HR is within ±5 bpm under motion. Mean absolute difference (MAD) = 3 bpm and mean absolute relative difference (MARD) = 3% under motion | The proprietary algorithms of the Everion are constantly tested and evaluated in our Algorithmics Lab. Biovotion is dedicated to delivering high quality and accuracy data to empower consumers to take control of their health. At Biovotion everybody is testing the devices under all kinds of conditions and we are working hard to improve the algorithms | Skin perfusion, tattoos, motion |
Device manufacturers sometimes report some expected sources of error, but the reporting and evaluation methods are inconsistent,[14–22] as shown in this table.
Fig. 1 Graphical abstract of research.
Graphical abstract of research study presented. We present a full characterization of HR accuracy across skin tones, clinical metrics of HRV accuracy across skin tones, and HR during activity, rest, deep breathing, and typing for six wearable devices representing both consumer wearables and research-grade wearables. HR metrics are compared to the clinical-grade electrocardiogram (ECG) as the standard for heart rate measurement.
Results of mixed effects comprehensive and marginal models.
| Mixed effects model | Mean error (p value) |
|---|---|
| Comprehensive model | <2.20e−16*** |
| Marginal model: skin tone | 0.634 |
| Marginal model: activity condition | <2.20e−16*** |
| Marginal model: device | <2.20e−16*** |
| Marginal model: type of device | 3.44e−05*** |
| Interaction model: skin tone and device | 2.80e−05*** |
p Values show results of likelihood ratio tests between models and null models and interaction models.
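The likelihood ratio tests behind the p values in this table can be illustrated with a minimal sketch. It assumes the two nested models have already been fitted elsewhere and that only their maximized log-likelihoods are available; the function names, the example log-likelihood values, and the use of the asymptotic chi-square reference distribution are assumptions for illustration, not details reported by the study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the closed-form series for integer degrees of freedom, so no
    third-party library is needed.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * total


def likelihood_ratio_test(loglik_null, loglik_full, extra_params):
    """Return (statistic, p value) for a nested-model comparison.

    extra_params is the number of parameters the full model adds to the
    null model (hypothetical input; depends on how the factor is coded).
    """
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(max(stat, 0.0), extra_params)


# Hypothetical log-likelihoods, for illustration only.
stat, p_value = likelihood_ratio_test(-105.0, -100.0, extra_params=1)
print(f"LRT statistic = {stat:.2f}, p = {p_value:.4g}")
```

A small p value indicates that adding the factor (for example, device or activity condition) improves the fit beyond what chance would explain; a large one, as reported for skin tone, indicates no detectable improvement.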
Fig. 2 Error in heart rate across skin tones and devices at rest and during activity.
Mean error in heart rate (bpm) across skin tones and devices (a) at rest and (b) during physical activity. The green horizontal line represents no error (no difference from the true measurement of HR from ECG). Mean absolute error in heart rate (bpm) across skin tones and devices (c) at rest and (d) during physical activity. Error is calculated as the difference between the ECG and wearable reported heart rate at every simultaneous measurement. Fitzpatrick skin tones 1–6 are represented with an approximately equal number of participants in each skin tone. Error bars represent the 95% confidence interval. Mean absolute error across devices and across skin tones (e) at rest and (f) during activity. Error bars represent the 95% confidence interval.
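The error metrics named in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the sign convention (wearable minus ECG), the normal-approximation confidence interval, the function name, and the sample values are all hypothetical choices and are not taken from the study.

```python
import math
import statistics


def error_summary(ecg_bpm, wearable_bpm):
    """Summarize paired heart rate errors for simultaneous measurements."""
    if len(ecg_bpm) != len(wearable_bpm) or len(ecg_bpm) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    # Assumed sign convention: positive means the wearable reads high.
    errors = [w - e for e, w in zip(ecg_bpm, wearable_bpm)]
    mean_error = statistics.fmean(errors)
    mean_abs_error = statistics.fmean([abs(x) for x in errors])
    # Normal-approximation 95% CI for the mean error (an assumed method).
    half_width = 1.96 * statistics.stdev(errors) / math.sqrt(len(errors))
    return {
        "mean_error": mean_error,
        "mean_abs_error": mean_abs_error,
        "ci95": (mean_error - half_width, mean_error + half_width),
    }


# Hypothetical paired readings in bpm, for illustration only.
summary = error_summary([60, 62, 64, 66], [61, 60, 66, 65])
print(summary)
```

The mean error keeps the sign, so over- and under-estimates can cancel; the mean absolute error does not, which is why the two are plotted in separate panels.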
Fig. 3 Error in heart rate across all devices and analysis of missing values across consumer devices.
(a) Mean absolute error in heart rate (bpm) across devices during rest (teal) and activity (orange). This shows the magnitude of the difference in HR from the ECG but not its sign. The green horizontal line represents no error (no difference from the true measurement of HR from ECG). Error bars show the 95% confidence interval. ** indicates a significant difference in error between baseline and activity with a Bonferroni multiple hypothesis corrected p value of 0.0042. (b) Mean relative error in heart rate (bpm) across devices during rest (teal) and during activity (orange), showing the relative differences from the ECG. The green horizontal line represents no error (no difference from the true measurement of HR from ECG). Error bars show the 95% confidence interval. ** indicates a significant difference in error between baseline and activity with a Bonferroni multiple hypothesis corrected p value of 0.0042. (c) Analysis of missing values across skin tones for rest and activity for consumer wearables. Research-grade wearables (Empatica, Biovotion) down-sample and/or interpolate to exactly a 1 Hz sampling rate, so we could not calculate missingness values for those devices. Missingness is calculated from the expected sampling rate (reported sampling rate for Apple Watch and Garmin and study average sampling rate for Garmin and Miband, which do not report sampling rate). Positive missingness indicates the percentage of expected values that are missing. Negative missingness indicates a greater than expected sampling rate (more values than expected).
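The missingness calculation described in panel c can be sketched as below. The function name, its parameters, and the example numbers are hypothetical; the only details taken from the caption are that missingness is measured against the expected sampling rate and that negative values mean more samples than expected.

```python
def missingness_percent(n_observed, duration_s, expected_rate_hz):
    """Percentage of expected samples that are missing.

    Positive: fewer samples than expected. Negative: more samples than
    expected (a higher-than-expected sampling rate).
    """
    expected = duration_s * expected_rate_hz
    if expected <= 0:
        raise ValueError("duration and expected rate must be positive")
    return 100.0 * (expected - n_observed) / expected


# Hypothetical 5-minute recordings at an expected 1 Hz, for illustration only.
print(missingness_percent(270, 300, 1.0))  # fewer samples than expected
print(missingness_percent(330, 300, 1.0))  # more samples than expected
```

For devices that do not report a sampling rate, `expected_rate_hz` would have to be estimated, for example from the study-wide average, as the caption describes.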