What possibly affects nighttime heart rate? Conclusions from N-of-1 observational data.

Igor Matias¹, Eric J Daza², Katarzyna Wac¹.

Abstract

Background: Heart rate (HR), especially at nighttime, is an important biomarker for cardiovascular health. It is known to be influenced by overall physical fitness, as well as daily life physical or psychological stressors like exercise, insufficient sleep, excess alcohol, certain foods, socialization, or air travel causing physiological arousal of the body. However, the exact mechanisms by which these stressors affect nighttime HR are unclear and may be highly idiographic (i.e. individual-specific). A single-case or "n-of-1" observational study (N1OS) is useful in exploring such suggested effects by examining each subject's exposure to both stressors and baseline conditions, thereby characterizing suggested effects specific to that individual. Objective: Our objective was to test and generate individual-specific N1OS hypotheses of the suggested effects of daily life stressors on nighttime HR. As an N1OS, this study provides conclusions for each participant, thus not requiring a representative population.
Methods: We studied three healthy, nonathlete individuals, collecting the data for up to four years. Additionally, we evaluated model-twin randomization (MoTR), a novel Monte Carlo method facilitating the discovery of personalized interventions on stressors in daily life.
Results: We found that physical activity can increase the nighttime heart rate amplitude, whereas there were no strong conclusions about its suggested effect on total sleep time. Self-reported states such as exercise, yoga, and stress were associated with increased (for the first two) and decreased (last one) average nighttime heart rate. Conclusions: This study implemented the MoTR method evaluating the suggested effects of daily stressors on nighttime heart rate, sleep time, and physical activity in an individualized way: via the N-of-1 approach. A Python implementation of MoTR is freely available.

Keywords: Auto experimentation; causal inference; endogeneity; longitudinal; n-of-1 trial; nighttime heart rate; resting heart rate; self-reporting; stress; wearables

Year: 2022 PMID: 36046637 PMCID: PMC9421014 DOI: 10.1177/20552076221120725

Source DB: PubMed Journal: Digit Health ISSN: 2055-2076

Introduction

Background

The emergence and ubiquitous availability of personal miniaturized technologies, including self-tracking mobile and wearable devices, enable continuous, longitudinal data collection and facilitate “self-knowledge through numbers,” fulfilling the vision put forward by the “quantified-self” founders.[1,2] Motivated individuals leverage these technologies, as well as self-reporting tools, to track their behaviors, including those related to physical activity, sleep, alcohol consumption, foods, the presence of psychological stress, air travel, and more. Additionally, these technologies enable capturing certain physiological signals like body temperature (temp), respiration rate (RR), heart rate (HR), heart rate variability (HRV), or galvanic skin response (GSR) corresponding to the physical or psychological state of the individual.[3,4] At its simplest, an individual can track a single behavior and use the self-tracking data for self-experimentation, changing that behavior in the desired direction, like walking more steps or sleeping enough. However, these technologies can also enable more complex interventions and, if paired with disciplined scientific approaches to data analysis, they can provide more robust personalized insights.[5,6] They can also help detect or even predict health issues by means of more advanced measurements like an electrocardiogram (ECG). When wearable ECG signals are combined with artificial intelligence algorithms, illness prediction is possible, transforming these ubiquitous and accessible devices into a powerful source of self-information. This study employs an n-of-1 observational study (N1OS) design and integrates data from two different technological touchpoints: a consumer-grade behavior and physiology tracking device, and an electronic self-reporting tool.
We use the data to characterize nonathlete individuals and test our main hypotheses on the correlation of daily stressors with nighttime HR, an important health concern in the context of cardiovascular health. The nighttime HR is specifically defined as a nighttime resting heart rate when the body returns to a baseline, and no daily-life stressors are present. We will sometimes use the term “correlation” interchangeably with the broader and more statistically accurate term “association” for ease of understanding. However, note that the statistical definition of “correlation” is narrower than is commonly meant; i.e. a non-linear statistical association or dependence is not a statistical correlation. Additionally, we evaluate the analytic impact of model-twin randomization (MoTR) on our inferences and conclusions. MoTR (“motor”) is a new causal inference method that artificially emulates an n-of-1 randomized trial (i.e. the gold standard due to randomization) from the N1OS dataset. It does so by first modeling the outcome of interest as a function of the exposure of interest, along with an individual's assumed recurring confounders (i.e. daily observed variables thought to influence or affect both the exposure and the outcome). MoTR then randomly shuffles (i.e. permutes) the exposures, which were originally only observed, thereby simulating an n-of-1 randomized trial. This allows us to infer more accurately a suggested effect of daily stressors beyond just correlation. Note that this study is not a case report, that is, an observational study of a single participant. Unlike a case report, which has limited internal validity, our study uses MoTR to improve the veracity of findings of possible causal effects. In this way, an N1OS enables the discovery of findings for a given individual that are hard to achieve with standard group-based observational study designs, and MoTR adjusts these findings to suggest possible interventions.
These causal inference methods also facilitate the subsequent design of an n-of-1 randomized trial to test the discovered effects. The operational objective of this paper is to establish the feasibility of the N1OS design augmented with MoTR for generating and evaluating hypotheses about the idiographic (i.e. individual-specific) recurring average effect of an exposure (e.g. daily stressors) on the self-tracked outcome of each participant (e.g. nighttime HR). The analogous nomothetic (i.e. group-level) effects in randomized controlled trials (RCTs) are called “average causal effects” or “average treatment effects” (ATEs). We chose daily life stressors like physical activity, insufficient sleep, excess alcohol consumption, certain foods, presence of psychological stress, and air travel as the exposure variables because they have a profound and acute effect on several aspects of health in the short as well as the long term, especially when repeated, and because they are behaviors that may be commonly tracked on current consumer devices or with minimal self-reporting effort. The nighttime HR is our selected health biomarker because it is affected by daily stressors in nonathletes and is also an important outcome measure of cardiovascular health.[13-16] This hypothesis exploration is based on the relevant literature on the importance of managing daily life stressors for short-term and long-term health outcomes. Nonathletic individuals were chosen intentionally, with an eye toward preventing chronic disease involving the cardiovascular system. The additional objective was to evaluate the MoTR method for generating and testing such idiographic hypotheses, potentially facilitating personalized management of stressors in daily life. As a result, we demonstrate an observational study design and analysis plan to contribute to and help guide rigorous self-tracking and n-of-1 study designs.
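The two MoTR steps described above (model the outcome given exposure and confounders, then permute the exposure) can be illustrated with a deliberately simplified sketch. This is not the authors' published Python implementation: the simulated data, the single confounder, the ordinary least squares outcome model, and all function names are assumptions made only for illustration.

```python
import numpy as np

def simulate_days(n_days, true_effect, rng):
    """Simulate hypothetical daily data in which one confounder drives
    both the binary exposure and the outcome."""
    confounder = rng.normal(size=n_days)
    exposure = (confounder + rng.normal(size=n_days) > 0).astype(float)
    outcome = 60 + true_effect * exposure + 3 * confounder + rng.normal(size=n_days)
    return exposure, confounder, outcome

def fit_outcome_model(exposure, confounder, outcome):
    """Ordinary least squares: outcome ~ intercept + exposure + confounder."""
    design = np.column_stack([np.ones_like(exposure), exposure, confounder])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

def model_then_permute(exposure, confounder, coef, n_iter, rng):
    """Average exposed-minus-unexposed difference in model-predicted
    outcomes over random permutations of the observed exposure."""
    diffs = []
    for _ in range(n_iter):
        shuffled = rng.permutation(exposure)
        predicted = coef[0] + coef[1] * shuffled + coef[2] * confounder
        diffs.append(predicted[shuffled == 1].mean() - predicted[shuffled == 0].mean())
    return float(np.mean(diffs))

rng = np.random.default_rng(0)
true_effect = 2.0  # assumed effect used only to generate the toy data
exposure, confounder, outcome = simulate_days(1000, true_effect, rng)

# Naive comparison of exposed and unexposed days, which ignores confounding.
naive = outcome[exposure == 1].mean() - outcome[exposure == 0].mean()

coef = fit_outcome_model(exposure, confounder, outcome)
adjusted = model_then_permute(exposure, confounder, coef, 500, rng)
print(f"naive: {naive:.2f}  model-then-permute: {adjusted:.2f}  true: {true_effect}")
```

In this toy setting the naive difference is inflated by the confounder, while the model-then-permute estimate lands close to the simulated effect. That only holds because the confounder is measured and the model is correctly specified, which are the two assumptions the paper stresses.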

N-of-1 study designs: Experimental and observational

An n-of-1 study, also known as a single-subject or single-case study, is a scientific study focused on a single individual. Such studies are used to better understand the individual-specific effect of an intervention or association of exposure with an outcome, for example how behavioral changes in a specific person causally affect or associate with daily-life stressors and nighttime HR the following night. There are two types of n-of-1 studies. An n-of-1 randomized trial (N1RT) is a randomized crossover design in which a participant acts as their own baseline (i.e. control), and is randomly assigned to an active treatment over multiple treatment periods. For example, a participant may be randomized to the sequence of treatment periods denoted ABBA, where A and B represent an active and a baseline treatment, respectively.[12,17] The N1RT design requires experimentation (i.e. comparing the intervention results with a baseline) and has been increasingly used in clinical trials and biomedical research. A related study design, the single-case experimental design, has been extensively applied in psychology and education.[5,18-21] The target quantity that can be inferred using an N1RT is a recurring average effect called an “average period treatment effect” (APTE). In an N1RT, a period is defined as a recurring time interval during which treatment or intervention is randomly assigned. Treatment levels are not required to change every period, but they must be randomized. For example, for intervention levels A and B each randomized twice with equal probability, the sequences ABBA, AABB, and BABA all have equal probability. The APTE is the n-of-1 analog to the population ATE of RCTs (defined above). An n-of-1 observational study (N1OS) is a scientific study design involving a single individual without any structured randomization, akin to ecological or epidemiological studies. While an N1RT generally carries more internal validity (i.e.
cause–effect relationship) than an N1OS, the latter generally carries more external or ecological validity (i.e. generalizability to real-world situations) than an N1RT. The would-be “intervention” and “baseline” are derived from real-world data, without randomization of the individual to the condition. In an N1OS, a period is defined as a recurring time interval during which exposure is observed (e.g. a day after sleeping for a certain amount of time). As with N1RT treatments, exposure levels are not required to change every period; however, they are generally not randomized, being observed as they naturally occur. “Exposure” is the epidemiological term for the would-be randomized treatment in a corresponding N1RT design; i.e. we wish to infer a reasonable or plausible causal effect (i.e. APTE) of the exposure on the outcome that we would otherwise more definitively infer by randomizing the exposure. N1OS designs offer several opportunities for health psychology and behavioral medicine; they can be used to describe changes in naturally occurring phenomena (e.g. behaviors) over time. They can also enable testing the hypotheses related to the relationships between variables, such as those specified in behavioral theories.[24-27] Furthermore, N1OS can be used to design highly personalized, data-driven interventions based on the unique predictive relationships identified at the individual level. Note that while causal inference methods for observational data like MoTR can actually help estimate possible treatment effects (which prediction/correlation methods generally cannot), none of these methods—not even MoTR—can definitively estimate an average causal effect. This is because estimating an APTE in an N1OS requires measuring all (or at least the strongest) treatment-effect confounders, and also correctly modeling the relationship between the exposure, outcome, and all confounders. 
These two strong assumptions highlight that “there is no free lunch” in trying to infer a causal effect without randomization. To convincingly estimate an APTE, the exposure must be manipulated or randomized; otherwise, these two crucial assumptions must hold—assumptions that generally cannot be tested simply by updating or trying new models using the same dataset (i.e. without manipulating the exposure directly to produce a new dataset). At best, we can and will assume that we have observed the strongest (not necessarily all) confounders, and that our chosen model is reasonably correct (i.e. correctly approximates the “true” causal model). This relaxed assumption allows for bias, but is much more realistic. With our observational data, the most we can do is hope that our set of confounders is complete enough, and that our models are correct enough, to keep the true, unknown bias in estimating the APTE small.
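The N1RT randomization example earlier in this subsection (levels A and B each assigned to two of four periods) can be checked by enumeration. The sketch below assumes that "each randomized twice" means exactly two A periods and two B periods placed uniformly at random; under that assumption there are six possible sequences, each with probability 1/6.

```python
from itertools import permutations

# Two active (A) and two baseline (B) periods, as in the text's example.
levels = "AABB"

# Distinct orderings of the four periods. Each is equally likely when the
# two A's and two B's are placed uniformly at random (an assumption here).
sequences = sorted({"".join(p) for p in permutations(levels)})
probability = 1 / len(sequences)

for seq in sequences:
    print(seq, round(probability, 4))
```

ABBA, AABB, and BABA, the three sequences named in the text, all appear in the enumerated set.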

Relationship to longitudinal studies

N1OS is related to longitudinal studies that use common statistical approaches like mixed- or random-effects modeling or generalized estimating equations. However, these two study designs differ fundamentally in their analytic goals concerning levels of inference. In a longitudinal study, the analytic goal is to infer the average trend over time across a group of participants, i.e. it is a nomothetic goal. However, repeated measurements for each study participant may induce within-individual autocorrelation that reduces the overall information on the group-level trend. This may increase the variance of the trend estimator, which is a “nuisance” to reaching the goal; hence the common need to deal with “statistical nuisance parameters” in longitudinal studies. In an N1OS, the analytic goal is to infer a recurring average association over a set of repeated measurements within one participant; that is, it is an idiographic goal. A group-level association may be useful as a starting point to help specify individual-level a priori hypotheses or as a starting value in some iterative analytic approaches. However, the entire approach of conducting an N1OS assumes that the within-individual average meaningfully differs from the group average, perhaps even for a large group of similar individuals. Hence, N1OS a priori hypotheses ideally rely on the participant's own experiences, opinions, and beliefs, and their past self-tracked data if available. That is its core principle. If group-based findings in the scientific literature are deemed useful for structuring the idiographic a priori hypotheses, these can and should also be used. However, an N1OS by design privileges the participant's own prior beliefs above any group-based findings, some of which may inform those prior beliefs. The process of each participant designing their own a priori hypotheses resembles prior elicitation in Bayesian modeling, but where the study participant is the “domain expert” of their N1OS.

Daily-life stressors: exercise, sleep, alcohol and food, psychological stress, aircraft travels, and nighttime HR

The growing body of research indicates the importance of HR (including nighttime HR) as a prognostic factor and potential therapeutic target in populations at large. The resting HR shows a clear circadian rhythm, being substantially higher during waking hours, but the variations are relatively small, around 10 ± 6 beats/min. Additionally, HR changes with posture, being some 3 beats/min higher in the sitting compared with the supine position. In this work, the nighttime HR considered is specifically defined as HR while sleeping, when there are no daily life stressors. Research shows that although it may be difficult to define an optimal HR for a given individual, it seems desirable to maintain low nighttime HR. High nighttime HR has direct detrimental effects on the progression of cardiovascular diseases (CVD). Studies have specifically found a continuous increase in the risk of CVD with nighttime HR above 60 beats/min,[33-35] which is very important, especially given the increasing prevalence of CVDs leading to premature deaths.[36,37] When considering a desirable or optimal HR for an individual, demographic and measurement factors must also be considered. Namely, HR has been reported to decrease with age, although this has not been seen in all studies, and HR is higher in women than in men. The nighttime HR is influenced by stressors experienced in the preceding day, like exercise or mental stress. On the one hand, research shows that higher overall activity level and athletic capacity lead to lower heart rate.[40,41] A systematic meta-review by Reimers et al. shows that endurance training, yoga, and strength training in particular, when conducted at least 2 times a week for at least 4 weeks, decrease heart rate. On the other hand, with acute exercise exertion like multiple hours of running or biking, the resulting nighttime HR is shown to be higher.[13,43] Additionally, a systematic meta-review by Kredlow et al.
shows that the overall activity levels and acute exercise have small beneficial effects on sleep duration. Additionally, nighttime HR is influenced by insufficient and variable time of sleep, excess alcohol, certain foods (e.g. greasy), presence of psychological stress,[15,45-47] or air travel, which decreases oxygen saturation in the blood. When considering the nighttime HR and the factors influencing it, it is also important to understand the minimally important differences in their values, which may be considered relevant from the clinical perspective. Concerning HR itself, the research has focused mostly on evaluating changes in resting heart rate (measured when awake and calm) in longitudinal observational studies. Chen et al. show that an increase of 1 beat per minute (BPM) in 10 years was associated with a 3% higher risk for all-cause death, 1% higher risk for CVD, and 2% higher risk for coronary heart disease. An increase of 5 BPM is associated with a higher risk of cardiovascular disease, heart failure, and overall all-cause mortality. A further increase of 10 BPM relates to an increased probability of mortality, while a resting HR above 60 BPM increases this risk almost exponentially. Summarizing these findings, we consider the minimally important difference in nighttime HR to be 1 BPM from the clinical relevance perspective. When considering the minimally important differences in total sleep time (TST), the research results are as follows. Overall, it is recommended for an adult to sleep between 7 and 9 hours a day.[51-53] Furthermore, it is important to note that most of the sleep-related variables in longitudinal observational studies are self-reported. Sleeping less than 5 hours relates to an increase in the risk of chronic illness, while sleeping less than 6 hours compared to 7–8 hours was correlated with more fat accumulation.
For patients with knee osteoarthritis, a cut-off point of 382 minutes (6.5 hours) of TST has been found important for their health; more sleep corresponded to better disease management. A decrease of TST of 23 minutes or about 0.6 minutes per year for 36 years of follow-up has been assigned to the effects of aging. Patients with poorer overall health were found to sleep 39–46 minutes less than comparable healthy populations. Summarizing these findings, we consider minimally important differences in TST as 23 minutes from the clinical relevance perspective. When considering the minimally important differences in steps and step length, the research results are as follows. A value of 121 steps/day has been indicated as a result of a minimally important change in physical intervention studies—RCTs evaluated via a systematic literature review. An increase in 226 daily steps has been found for one of the intervention groups (RCT) in a study involving CVD patients. Similarly, in an RCT with chronic obstructive pulmonary disease patients, an improvement of 427 steps or deterioration of 456 steps a day has been found clinically significant for their health outcomes. In a similar study that value was 600 steps. Overall research results show that walking an additional 1000 steps per day can help to achieve better health outcomes in cancer patients, in fibromyalgia patients, and lower the risk of all-cause mortality in the general population. For every increase of 2000 steps per day in the general population, the risk of chronic illness is decreased across multiple health outcomes. As for the stride length, there exist fewer research results linking it to the health outcomes in longitudinal studies, likely because of the challenge in its instrumentation to measure it accurately in daily life environments. Boyer et al. focused on the in-lab assessment of millimeter changes in stride length in the context of an assessment of the impact of injuries on patients’ mobility. 
Hannik et al. assessed stride length with 1-cm accuracy in the context of geriatric care, while Rampp et al. achieved 1.5 cm accuracy in a similar research context. From the clinical relevance perspective, we summarize these findings and consider minimally important differences in steps per day as 121, and stride length as 1 cm.

Organization of the document

The remainder of our paper is organized as follows. We present our study design, the devices used and their accuracy, our hypotheses, and our analysis plan in the Materials and methods section, particularly how to collect and analyze data across devices and time within the individuals contributing to this N1OS. We report our main findings in the Results section, along with findings that can better inform a future N1OS or even N1RT designs in the same context of managing stressors and health outcomes. We summarize our findings in the Discussion section and reflect on our findings and experiences in this study, indicating the potential future work areas.

Materials and methods

The section below presents the resources and methods applied while conducting the study. Hence, in the subsection “Study Design,” we present the description and organization of the different data types used, as well as general statistical principles and an overall modeling approach. The subsection “Participants and Collected Data Summary” provides information about the three participants (IM, EJD, and KW) and the mean values for all the exposure values. In the subsection “Accuracy of sleep duration, steps, distance, and heart rate monitored with Fitbit and Apple Watch,” we discuss the validity of the wearable devices used to collect data. Finally, in the subsection “Statistical Analysis Plan,” we present our a priori hypotheses and the statistical planning of this research.

Study design

Exploratory N1OS study goals and approach

This is an exploratory N1OS. It is not a confirmatory study, which would have the goal of replicating fairly well-known relationships between well-defined variables (i.e. testing/confirming discovered or formerly reproduced scientific hypotheses). Instead, the goal of this study is to characterize largely unknown relationships between variables that are not yet well-defined in the scientific literature; that is, its goal is to suggest scientific hypotheses to be tested or confirmed in future studies. With respect to our hypotheses, prior work investigated a number of the relationships that we relied on in forming our a priori hypotheses in Subsection “A Priori Hypotheses.” True to our exploratory goals, these hypotheses are broad in scope and do not specify exact quantities, but rather directions (e.g. increasing or introducing X causes Y to decrease). We created our a priori hypotheses based on both the findings of that prior work and our own experiences and reflections. This approach reflects the N1OS core principle mentioned in Section “Relationship to longitudinal studies.” The participant is also the study's “principal investigator” and domain expert, the domain being their own past health history and experiences. Other information (e.g. the scientific literature) only serves to supplement their own understanding of this domain, how to create a priori hypotheses, and how to assess exploratory hypotheses.

Estimating credibility and true quantity discernibility

In this study, we depart from common statistical practice in one important way that we hope improves our scientific communication. The term “significant” is largely misunderstood as meaning “scientifically, clinically, or practically important.” Statistical significance is unrelated to scientific significance but has been ubiquitously misunderstood as implying it. This well-documented and long-standing phenomenon is called the “significance fallacy,”[72-75] a key contributor to the replication crisis in biomedicine and psychology. Hence, leading statistical authorities have strongly recommended abandoning the phrase “statistical significance” entirely,[76-79] necessitating a search for another phrase to describe the amount of statistical evidence in research findings. Instead, we will proceed as follows. We will continue to describe a P value in terms of its “statistical significance.” However, we will describe its corresponding estimate in terms of “statistical discernibility.” For example, if an estimated effect of 2 has a P value of .001, we might describe the estimate as being statistically discernible for the true, unknown effect (or simply say the estimate is “discernible”). That is, there is sufficient statistical evidence that 2 is a statistically valid estimate of the true, unknown value. If that estimate of 2 had a P value of .83, we might say 2 is not statistically discernible as the true effect. A previous publication successfully used this lexical strategy. Our hope in taking this approach is to avoid committing the common error of making scientifically unsupported claims (i.e. based on the statistical qualities of an estimate, rather than on the size and direction of the estimate itself). For example, we might incorrectly claim that “there was a significant effect of getting more sleep on step count the next day, i.e. more sleep causes an increase of 2 steps (p = .001)”, when in fact the true finding is, “there was a discernible effect of getting more sleep on step count the next day; i.e. more sleep causes a credible increase of 2 steps (p = .001), but this small increase may not be practically significant.”
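The distinction between statistical discernibility and practical importance can be made concrete with a small helper. This is only an illustrative sketch: the function name, the wording of the labels, and the 0.05 threshold are assumptions and are not taken from the paper. The 121 steps/day value is the minimally important difference for daily steps adopted earlier in the text.

```python
def describe_estimate(estimate, p_value, minimal_important_difference, alpha=0.05):
    """Label an estimate by discernibility (P value) and by practical
    importance (size relative to a minimally important difference).
    The alpha threshold of 0.05 is an assumed convention."""
    if p_value >= alpha:
        return "not statistically discernible"
    if abs(estimate) >= minimal_important_difference:
        return "discernible and practically important"
    return "discernible, but may not be practically significant"

# The text's example: an increase of 2 steps with p = .001,
# judged against a minimally important difference of 121 steps/day.
print(describe_estimate(2, 0.001, 121))
# The same estimate with p = .83.
print(describe_estimate(2, 0.83, 121))
```

Separating the two checks keeps the size and direction of the estimate in view, instead of letting a small P value stand in for importance.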

Modeling approach

For each participant, we fit Granger models over that participant's analysis period (i.e. time frame of available data). These are time series linear models of the kind fit in prior work, combinations of which might together comprise a vector autoregression. “Granger” refers to so-called “Granger causality,” which by causal inference definition is in fact only an association/correlation/prediction of one time series with another time series, not a causal effect of one time series on another, as we are attempting to estimate in this study. Predictors include lagged values of both the dependent variable (DV) and independent variables (IDVs). Our own DVs and IDVs resemble those of that prior work, as detailed below. We did not fit any generalized additive models as its authors did, as they found that these did not perform notably better than their linear models. We also included calendar-based control variables in our models, following the examples in Table 1 (e.g. weekend indicator). To enforce the temporal order needed to conduct causal inference, we made sure all model DVs occurred after their IDVs and control variables; that is, generally no overlap in time is allowed between any model predictor and its corresponding outcome.
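A minimal sketch of such a lagged linear model is given below, assuming a single lag of one day, one IDV, and a weekend indicator as the only control variable. The simulated series, coefficient values, and function name are hypothetical; the study's actual lag structure and variable set are not reproduced here.

```python
import numpy as np

def lagged_design(dv, idv, weekend):
    """Build predictors from day t-1 and the outcome from day t, so that
    every predictor precedes its outcome in time."""
    predictors = np.column_stack([
        np.ones(len(dv) - 1),  # intercept
        dv[:-1],               # lagged dependent variable
        idv[:-1],              # lagged independent variable
        weekend[:-1],          # calendar-based control from the same lagged day
    ])
    return predictors, dv[1:]

# Simulate a toy daily series with known (assumed) coefficients.
rng = np.random.default_rng(1)
n_days = 2000
idv = rng.normal(size=n_days)
weekend = (np.arange(n_days) % 7 >= 5).astype(float)
dv = np.zeros(n_days)
dv[0] = 20.0
for t in range(1, n_days):
    dv[t] = 10 + 0.5 * dv[t - 1] + 0.3 * idv[t - 1] + 0.2 * weekend[t - 1] + rng.normal()

predictors, outcome = lagged_design(dv, idv, weekend)
coef, *_ = np.linalg.lstsq(predictors, outcome, rcond=None)
print(dict(zip(["intercept", "dv_lag1", "idv_lag1", "weekend_lag1"], coef.round(2))))
```

The estimated coefficients describe how well lagged values predict the outcome. As the text notes, this is association or prediction and not, by itself, a causal effect.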

Table 1.

Types of data used in the study.

Variable	CB	WM	SR	Used for hyp. type(s)	Type	Units/values
Weekend	X			A and B	Binary	0 or 1
Year	X			B	Discrete	0 to 4
Month	X			A and B	Discrete	0 to 11
Season	X			A and B	Discrete	0 to 3
TST		X		A	Continuous	Seconds
SAT		X		A	Continuous	Steps per second
Step length		X		A	Continuous	Meters
Nighttime HR		X		A and B	Continuous	Beats per minute
DIF-HR		X		A and B	Continuous	Beats per minute
Socializing			X	B	Binary	0 or 1
Yoga			X	B	Binary	0 or 1
Exercise			X	B	Binary	0 or 1
Fasting			X	B	Binary	0 or 1
Tired/sick/stress			X	B	Binary	0 or 1
Holiday			X	A	Binary	0 or 1
Vacations			X	B	Binary	0 or 1
Short air travel			X	B	Binary	0 or 1

Abbreviations: CB: calendar-based; WM: wearable-measured; and SR: self-reported control variables.

Participants and collected data definition

The participants of this study were its three authors: Igor Matias (IM), Eric J. Daza (EJD), and Katarzyna Wac (KW). The data were collected via self-reports and personal wearables used by all three authors for different periods. Seventeen types of data are organized into three main categories: calendar-based (CB) control variables; self-reported (SR); and wearable-measured (WM). Table 1 shows how they are split and their main characteristics. Two main categories of hypotheses to test were defined according to the time frame of the available data. Type A hypotheses included data from all three individuals (IM, EJD, and KW) from 14 August 2020 until 8 January 2021 (148 days per person), with only WM data for the first two (IM and EJD), and with CB, SR, and WM for KW. Type B hypotheses included data only from one participant (KW) from 13 February 2017 until 13 August 2020 (1278 days), including CB, SR, and WM. The CB data type is defined as follows. Weekend is a binary variable, with “1” for Saturday or Sunday and “0” for any other day of the week. Year is a discrete variable between “0” and “4”, representing the years 2017, 2018, 2019, 2020, and 2021, respectively. Month is a discrete variable between “0” and “11” for every month of the year, chronologically ordered from January to December. Season is a discrete variable between “0” and “3”, representing “Summer,” “Autumn,” “Winter,” and “Spring,” respectively, according to the astronomical seasons. For the WM category, we defined five variables as follows. TST is a continuous variable for the total seconds of sleep during the main nighttime sleep period, excluding naps. Steps per awake time (SAT) is defined as a daily average, calculated as the incremental steps that day divided by the seconds between the waking time and going to bed that night (akin to average daily walking speed). We used SAT instead of total steps per day, as the number of steps is dependent on the awake time each day.
Step length is defined as a daily average (measured in meters), calculated as the total distance logged that day divided by the total number of steps. Nighttime HR is calculated as the average HR during that night's sleep. Difference HR (DIF-HR) is defined as the difference between the maximum and the minimum heart rate registered during sleep on the following night, and it helps characterize the maximum range in nighttime HR. The third category, SR, included seven binary variables defined as follows. Socializing is defined as “1” when socializing in the evening (which in most cases implied eating out, hence consumption of non-routine foods and potentially of moderate amounts of alcoholic drinks) and “0” otherwise. Yoga is defined as “1” when practicing yoga during the afternoon/evening and “0” when not. Exercise is defined as “1” when any acute physical exercise was practiced during the day (e.g. gym session, running, or long biking) and “0” otherwise. Fasting is defined as “1” when the participant fasted (since dinner the day before, for a full day) and “0” when not. Tired/Sick/Stress is defined as “1” when having a tiring day, feeling sick, or experiencing high stress levels during the day, and “0” otherwise. Holiday (for Type A) or Vacations (for Type B) is defined as “1” when going on vacation or having a non-working day, such as a weekend, and “0” otherwise. Short air travel is defined as “1” when traveling by air within the same continent during the daytime and “0” when not traveling or traveling for longer periods or at nighttime. The SR variables were collected daily, using manual annotation of personal notes.
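The calendar-based and wearable-measured definitions above can be sketched as a small derivation function. The record layout, field names, and example values are hypothetical and chosen only to make the arithmetic visible; Season is omitted because its astronomical boundary dates are not given in the text.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean

@dataclass
class DayRecord:
    """One day of hypothetical raw data for a single participant."""
    day: date
    wake_time: datetime
    bed_time: datetime
    steps: int
    distance_m: float
    sleep_hr_bpm: list  # HR samples recorded during that night's sleep

def derive_variables(rec):
    """Compute the CB and WM variables as defined in the text."""
    awake_seconds = (rec.bed_time - rec.wake_time).total_seconds()
    return {
        "weekend": 1 if rec.day.weekday() >= 5 else 0,  # Saturday or Sunday
        "year": rec.day.year - 2017,                    # 0 stands for 2017
        "month": rec.day.month - 1,                     # 0 stands for January
        "sat": rec.steps / awake_seconds,               # steps per awake second
        "step_length": rec.distance_m / rec.steps,      # meters per step
        "nighttime_hr": mean(rec.sleep_hr_bpm),         # average HR during sleep
        "dif_hr": max(rec.sleep_hr_bpm) - min(rec.sleep_hr_bpm),
    }

example = DayRecord(
    day=date(2020, 8, 15),
    wake_time=datetime(2020, 8, 15, 7, 0),
    bed_time=datetime(2020, 8, 15, 23, 0),
    steps=7200,
    distance_m=5400.0,
    sleep_hr_bpm=[50, 55, 62],
)
variables = derive_variables(example)
print(variables)
```

Dividing steps by awake seconds instead of using the raw daily total reflects the stated reason for SAT: a longer waking day allows more steps regardless of how active the person was.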

Participants and collected data summary

As described above, the participants of this study were its three authors: IM, EJD, and KW. On the last day of the experiment (8 January 2021), IM was a 24-year-old male with a normal body mass index (BMI), EJD was a 41-year-old male with a normal BMI, and KW was a 41-year-old female with a normal BMI. All the participants were healthy (i.e. no unusual medical history) and were not experiencing any notable work- or family-related stresses, nor disturbances or abnormalities in walking, sleeping, or any cardiovascular aspects. For the Type A hypotheses’ time frame, the mean value of SAT for IM, EJD, and KW was 0.07 ± 0.03, 0.10 ± 0.05, and 0.22 ± 0.09 steps per second awake, respectively. In the same way, the mean value of TST for IM, EJD, and KW was 25854.55 ± 3530.30 seconds (7 hours, 10 minutes, 54.55 seconds ± 58 minutes, 50.30 seconds), 28683.21 ± 5127.09 seconds (7 hours, 58 minutes, 03.21 seconds ± 1 hour, 25 minutes, 27.09 seconds), and 30358.39 ± 3232.53 seconds (8 hours, 25 minutes, 58.39 seconds ± 53 minutes, 52.53 seconds), respectively. For both A and B hypotheses’ time frames (1426 days in total), the numbers of positive days were 369 for socializing (25.88%), 68 for yoga (4.77%), 125 for exercise (8.77%), 22 for fasting (1.54%), 150 for tired/sick/stress (10.52%), 280 for holiday (A) and vacations (B) combined (19.64%), and 117 for short air travel (8.21%).

Accuracy of sleep duration, steps, distance, and heart rate monitored with Fitbit and Apple watches

All wearable-measured (WM) data were collected using a Fitbit Charge 2™ (FC2), Charge 3™ (FC3), Charge 4™ (FC4) (Fitbit, Inc., San Francisco, CA, USA), and an Apple Watch (AW) Series 5™ (Apple Computer, Inc., Cupertino, CA, USA). All of them connect via Bluetooth™ to a smartphone; the last one (AW) is only fully compatible with iOS™ devices. Within the Type A hypotheses’ time frame, IM used an AW Series 5, EJD an FC3, and KW an FC4. As for Type B's time frame, KW used an FC2 until 17 April 2020, changing to an FC4 afterward. Although all devices can measure all the WM data this study needed, these devices use different sensors and components. It is therefore important to discuss the accuracy of each one of them.

As for sleep, because this study did not consider the sleep stages, we only evaluate the accuracy of the total sleep time assessment for the devices used. As validated by de Zambotti et al., FC2 overestimated TST by 9 min when compared with polysomnography (p < .05). A separate evaluation of FC3 reached the opposite conclusion, with an underestimation of TST of about 11 minutes. For FC4, no evaluation studies were found. Last, AW (no version specified by the literature) overestimated TST by 4.65 minutes, as tested by Roomkham et al.

As for the steps, an evaluation of FC2 against an ActiGraph GT3X™ found an overestimation of 2451.3 ± 2085.4 steps per day, using the average over seven days of comparison, which is 32.2 ± 40.7% above the comparison measurement, with a correlation of r = 0.58, p = .02. A study based on a 24-minute exercise at different speeds reported an error of 1.07 steps for the AW compared with the manual count obtained from video recordings, with a total error of 0.034% and a correlation of r = 0.96, p < .001.

The evaluation results for HR are as follows. To validate the HR measured with an FC2 and an AW Series 3, one study compared both with a gold-standard electrocardiograph and found that both devices slightly underestimated HR across 24 hours. While sleeping, FC2 showed a mean absolute error (MAE) of 2.15 BPM and a mean absolute percentage error (MAPE) of 3.36%, whereas AW Series 3 had an MAE of 1.96 BPM (MAPE of 3.12%). Another study compared FC3 with other well-known wearable devices such as Polar H10™ and documented an underestimation of HR by 7 BPM by the FC3, although this study did not follow the same gold-standard approach as the first one.

Finally, for the measurement of distance traveled during the day, one study compared an FC device with others available at the time, placing FC among the best with a MAPE lower than 5.6%. A study following a similar approach compared AW Series 4 with other brands’ devices and documented an overall MAPE below 5%, ranging from 0.9% to 4.1%.

Statistical analysis plan

A Priori hypotheses

We investigated a total of eleven a priori hypotheses of two types (A and B, defined in Subsection “Participants and Collected Data Definition”) and divided them into three groups. We tested an association between an exposure (i.e. IDV) and an outcome (i.e. DV) for each hypothesis. The exposures are SAT, TST, socializing, yoga, exercise, fasting, tired/sick/stress, vacations, and short air travel. The outcomes are TST, step length, DIF-HR, and nighttime HR. All outcomes were log-transformed and treated as continuous variables. All exposures were treated as binary variables indicating the presence (enumerated as 1) relative to the absence (enumerated as 0) of the exposure or as having a high (1) versus low (0) exposure value. We dichotomized continuous exposures in keeping with the traditional Neyman-Holland-Rubin potential-outcomes approach that compares average outcomes between only two treatment levels.[90-92] We assigned a threshold for each exposure per participant to separate their high and low values. These thresholds were set as the observed per-participant mean value of the exposure over each participant's entire analysis period (see Subsection “Participants and collected data definition” for values). After dividing all the hypotheses into two types (A and B), we assigned them to three different groups (Steps-TST, Diff-HR, and Nighttime HR). The first group, Steps-TST, included two hypotheses (H1 and H2) with TST and step length as outcomes. The second group, Diff-HR, included two other hypotheses (H3 and H4) in which the outcome was the DIF-HR. The third group, Nighttime HR, included the remaining seven hypotheses (H5, H6, H7, H8, H9, H10, and H11) having nighttime HR as the outcome. Table 2 illustrates the splitting of the hypotheses across the two types (A and B) and three groups (Steps-TST, Diff-HR, and Nighttime HR). In this paper, the Steps-TST hypotheses were more conjectural in nature (i.e. stemming from curiosity); in contrast, we had stronger prior beliefs about our remaining nine nighttime heart rate hypotheses (i.e. Diff-HR and Nighttime HR).
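The two preprocessing steps described above (dichotomizing a continuous exposure at the participant's own mean and log-transforming the outcome) can be sketched as follows. This is an illustration under stated assumptions: the function names and data are hypothetical, and treating values strictly above the mean as "high" is an assumption, since the text does not say how a value exactly equal to the mean is classified.

```python
import math
from statistics import mean


def dichotomize_at_mean(values):
    """Return (high/low flags, threshold) using the per-participant mean.

    Assumption: a value counts as high (1) only if it is strictly above
    the mean; anything else is low (0).
    """
    threshold = mean(values)
    flags = [1 if v > threshold else 0 for v in values]
    return flags, threshold


def log_transform(values):
    """Natural-log transform of a strictly positive outcome."""
    return [math.log(v) for v in values]


# Hypothetical SAT values (steps per second awake) for one participant.
sat_values = [0.05, 0.07, 0.12, 0.04, 0.09]
high_sat, sat_threshold = dichotomize_at_mean(sat_values)

# Hypothetical nighttime HR values (BPM), log-transformed as the outcome.
log_hr = log_transform([58.0, 61.0, 55.0, 60.0, 57.0])
```

Because the threshold is computed per participant, the same SAT value can be "high" for one person and "low" for another.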

Table 2.

A priori hypotheses.

Hyp.	Exposure	Exp. Change	Outcome	Out. change	Type	Group
H1	SAT	Increase	TST	Increase	A	Steps-TST
H2	TST	Increase	Step length	Increase	A	Steps-TST
H3	SAT	Increase	DIF-HR	Decrease	A	Diff-HR
H4	Socializing	Presence	DIF-HR	Increase	B	Diff-HR
H5	SAT	Increase	Nighttime HR	Decrease	A	Nighttime HR
H6	Yoga	Presence	Nighttime HR	Decrease	B	Nighttime HR
H7	Exercise	Presence	Nighttime HR	Decrease	B	Nighttime HR
H8	Fasting	Presence	Nighttime HR	Decrease	B	Nighttime HR
H9	TSS	Presence	Nighttime HR	Increase	B	Nighttime HR
H10	Vacations	Presence	Nighttime HR	Increase	B	Nighttime HR
H11	Short air travel	Presence	Nighttime HR	None	B	Nighttime HR

Abbreviations: TST: total sleep time; SAT: steps per awake time; and TSS: tired/sick/stress.

We specified the two Steps-TST hypotheses as follows. Hypothesis 1 (H1) was that an increase in SAT was associated with an average increase in TST the next night. Hypothesis 2 (H2) was that an increase in TST was associated with a longer average step length the day after.

We specified the two Diff-HR hypotheses as follows. Hypothesis 3 (H3) was that an increase in SAT was associated with an average decrease in DIF-HR. Hypothesis 4 (H4) was that socializing was associated with an average increase in DIF-HR afterward. As in Subsection “Participants and Collected Data Definition,” we define DIF-HR afterward as the difference between the highest and the lowest HR during the sleep period after the socializing event. For example, if socializing refers to a social event in the evening of a given day, the DIF-HR refers to the night of that same day (recall that it is defined while sleeping), after the evening ends.

We specified the seven Nighttime HR hypotheses as follows. Hypothesis 5 (H5) was that an increase in SAT was associated with an average decrease in nighttime HR the following night. We expected to have different levels of association for H3 and H5 (the outcome being DIF-HR for the first and nighttime HR for the other). Hypotheses 6–8 (H6 to H8) were that yoga, exercise, and fasting were all associated with an average decrease in nighttime HR the night after. Hypothesis 9 (H9) was that a tiring day, sickness, or high-stress levels during the day were collectively associated with an average increase in nighttime HR. Hypothesis 10 (H10) was that going on vacation was associated with an average increase in nighttime HR, as the possible sources of stress of vacations were distinct from those on non-vacation stressful days (i.e. due to different physical activities, different sleep hours, sleeping in a different bed, alcohol intake, traveling, among others). Hypothesis 11 (H11) was that short air travel would not be associated with a meaningful average change in nighttime HR.

We also included variables in each model to account for suggested effect modification by the CB variables. These are specified as interaction terms between an IDV and each CB variable included in a model. We did so in case the average daily effect of an IDV on a DV might vary based on a CB variable. For example, in H1, suppose that taking more steps increases IM's average TST the following night more in the Summer than in the Fall. This might be because IM has fewer scheduled early workdays in the Summer, allowing him to sleep longer after an active day with many steps, or even due to better weather conditions during Summer.
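To make the interaction terms concrete, the sketch below builds one row of a design matrix containing an intercept, the binary IDV, two CB variables, and the IDV-by-CB products. It is only an illustration: the function name is hypothetical, and dummy-coding Season with level 0 as the reference is an assumption, since the text does not say whether CB variables enter the model as numeric or categorical terms.

```python
def design_row(idv, weekend, season, n_seasons=4):
    """One model row: intercept, main effects, and IDV x CB interactions.

    Assumption: Season (coded 0..3) is dummy-coded with level 0 as the
    reference category, giving n_seasons - 1 indicator columns.
    """
    season_dummies = [1 if season == s else 0 for s in range(1, n_seasons)]
    main_effects = [idv, weekend] + season_dummies
    # Each interaction is the product of the IDV with one CB column, so it
    # is nonzero only on days with high/present exposure.
    interactions = [idv * weekend] + [idv * d for d in season_dummies]
    return [1] + main_effects + interactions


# High exposure on a weekday in season 2.
row_high = design_row(idv=1, weekend=0, season=2)
# Low exposure on a weekend day in the reference season.
row_low = design_row(idv=0, weekend=1, season=0)
```

With this coding, the coefficient on an interaction column describes how much the exposure's association with the outcome shifts in that season relative to the reference season.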

Causal hypotheses via MoTR

Thus far, all hypotheses have been assumptions of statistical association or correlation, not causation. In this paper, we went further and employed the MoTR method to simulate an N1RT after adjusting for other assumed confounders. MoTR allows us to change these hypotheses of association to hypothesized effects that can be statistically tested. MoTR is a Monte Carlo approach to estimating the APTE that works as follows. It takes as its input a model fit to a dataset, randomly shuffles the exposures (IDVs previously dichotomized as in Subsection “A priori hypotheses”), and then sequentially predicts the outcome (DV) for all time points (or “periods” in APTE parlance) in the study period. The average outcomes under high and low exposures are compared, yielding an APTE estimate with a P value (and, thereby, confidence interval). Because many random sequences of exposures are possible given the longitudinal datasets, MoTR repeats this procedure many times by randomly shuffling exposures differently each time. This creates multiple Monte Carlo simulation runs. The final mean APTE estimate and P value were reported once the APTE estimate stabilized (after a minimum of 1000 runs), or at run 10,000 (to set a computational time limit on the MoTR algorithm), whichever occurred first. (See the Supplementary Materials and Formulas for details on these convergence criteria.)
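The procedure described above can be sketched in a few lines. This is a deliberately simplified illustration and not the authors' released implementation: `motr_effect`, `toy_model`, and all numbers are hypothetical, the fitted model is stood in for by a plain function of the current exposure and the previous outcome, the number of runs is fixed instead of using the convergence criteria, and no P value is computed.

```python
import random
from statistics import mean


def motr_effect(predict, exposures, y_start, n_runs=1000, noise_sd=0.0, seed=0):
    """Average high-minus-low difference in predicted outcomes over runs."""
    if not (0 in exposures and 1 in exposures):
        raise ValueError("need at least one high (1) and one low (0) period")
    rng = random.Random(seed)
    run_estimates = []
    for _ in range(n_runs):
        shuffled = list(exposures)
        rng.shuffle(shuffled)  # re-randomize the order of exposures
        previous = y_start
        high, low = [], []
        for x in shuffled:  # predict sequentially, period by period
            y = predict(x, previous) + rng.gauss(0.0, noise_sd)
            (high if x == 1 else low).append(y)
            previous = y
        run_estimates.append(mean(high) - mean(low))
    return mean(run_estimates)


def toy_model(exposure, previous_outcome):
    """Hypothetical stand-in for a model fit to one participant's data."""
    return 30.0 + 3.0 * exposure + 0.5 * previous_outcome


observed_exposures = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
effect = motr_effect(toy_model, observed_exposures, y_start=60.0, n_runs=200)
```

Shuffling keeps the observed number of high and low periods but breaks any link between exposure and past outcomes, which is what lets the difference in predicted means be read as a suggested effect under the fitted model.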

Exploratory, testing, and confirmatory phases

In general, we conducted four types of analyses. Exploratory A and Exploratory B analyses were first conducted as this was the main goal of our paper. We then proceeded to the Testing and Confirmatory phases. We conducted a few Confirmatory analyses based on loosely defined a priori hypotheses. This was done to demonstrate how to apply MoTR in a confirmatory study. These types were separated according to their input and the main goal, as represented in Tables 3 and 4.

Table 3.

Exploratory and confirmatory phases planning.

Exploratory A
	Controlling for	“Weekend,” “Holiday,” “Month,” interactions IDV*“Weekend,” IDV*“Holiday,” and IDV*“Month” (A-MONTH)			“Weekend,” “Holiday,” “Season,” interactions IDV*“Weekend,” IDV*“Holiday,” and IDV*“Season” (A-SEASON)
	Days of lag	1 to 10 days	1 to 10 days	1 to 10 days	1 to 10 days	1 to 10 days	1 to 10 days
	Participants	KW	EJD	IM	KW	EJD	IM
	Hypotheses	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5
	Total hyp.	40	40	40	40	40	40
		120			120
		240
Exploratory B
	Controlling for	“Weekend,” “Vacations,” “Year,” “Month,” interactions IDV*“Weekend,” IDV*“Vacations,” IDV*“Year,” and IDV*“Month” (B-MONTH)			“Weekend,” “Vacations,” “Year,” “Season,” interactions IDV*“Weekend,” IDV*“Vacations,” IDV*“Year,” and IDV*“Season” (B-SEASON)
	Days of lag	1 to 10 days			1 to 10 days
	Participants	KW			KW
	Hypotheses	H4, 6, 7, 8, 9, 10, 11			H4, 6, 7, 8, 9, 10, 11
	Total hyp.	70			70
		140
Confirmatory
	Controlling for	“Weekend,” “Holiday,” and “Month” (C-MONTH)			“Weekend,” “Holiday,” and “Season” (C-SEASON)
	Days of lag	1	1	1	1	1	1
	Participants	KW	EJD	IM	KW	EJD	IM
	Hypotheses	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5	H1, 2, 3, 5
	Total hyp.	4	4	4	4	4	4
		12			12
		24

Table 4.

Testing phase planning. Refer to the Subsection “Results selection criteria” for details on the two criteria used.


Criteria	MDE						MBL
Controlling for	B-MONTH			B-SEASON			B-MONTH		B-SEASON
Days of lag	6	10	10	6	10	4	3	4	3	3
Participants	KW	KW	KW	KW	KW	KW	KW	KW	KW	KW
Hypothesis	H4	H6	H9	H4	H6	H9	H4	H9	H4	H9
Total hyp.	3			3			2		2
	6						4
	10

We define lag as the number of days before the study day (the day for which the DV was obtained for each hypothesis) whose DV data were also considered. The lag, therefore, defines the number of earlier days for which DV data were used. For example, using a lag of two days means the hypothesis considered data from the DV on the study day (t) and each of the two days before (t-1 and t-2), thus enabling the analysis of the variation of the DV before the exposure.

There are 240 hypotheses for Exploratory A, 140 for Exploratory B, 10 hypotheses for the Testing phase, and 24 for the Confirmatory phase. These are defined as follows:

Exploratory A Phase: this phase was intended to explore the impact on analytic results of bigger lags (from 1 to 10 days), as well as the changes in DV produced by controlling for interactions within IDVs. It was applied to all Type A hypotheses (H1, H2, H3, and H5) one at a time.

Exploratory B Phase: similarly to Exploratory A, this phase aimed at exploring the impact on analytic results of lags bigger than one day (lag from 1 to 10 days), but on a longer period of days and only from one participant (KW), non-overlapping with the respective participant's data on Type A hypotheses. It was applied to all Type B hypotheses (H4, H6, H7, H8, H9, H10, and H11) one at a time.

Testing Phase: the goal of this phase was to assess the accuracy of using a model fitted on 1278 days of KW's data (Type B time frame) to predict the 148 days after (Type A time frame). Using the machine learning model fitted for KW's data on the Exploratory B phase (one repetition for each of the Type B hypotheses), we predicted the DV values for each study day on Type A's data. The Testing phase included several tasks as follows. (1) Using the model fit with data from part B of KW's data during Exploratory B, we predicted the DV values of part A and added Gaussian noise to each prediction using the standard deviation (SD) of the model residuals from part B. (2) We estimated the association of the IDV with the DV, or “naive effect estimate,” by comparing the means of the noisy predicted DV values between high/low or present/absent exposure (and its t test P value); hence, we refer to this as the “naive method.” (3) We also assessed the fit of that same model on KW's data from part A, using the mean squared error (MSE) of its residuals. (4) Then we predicted the DV values (with noise, as before) of part A using the MoTR method, now calculating the hypothetical suggested effect of the IDV on the DV (and its t test P value). In the end, we compared both the fit of the model on observed data in parts B and A, measured as their MSEs, and the difference between the naive effect estimate of the IDV in part A and its hypothetical suggested effect estimated using MoTR.

Confirmatory Phase: this phase aimed to assess the suggested effect (or not) of each hypothesis’ IDV on the DV without using any lag bigger than 1 day (i.e. lag 1 only) and without using interactions within the IDVs as controls. It was applied to all Type A hypotheses (H1, H2, H3, and H5) one at a time.

We chose to compare model fit between parts B and A using the MSE rather than R-squared because this metric expresses the same qualitative information as the R-squared in how well the model explains random variation in the DV. However, the MSE also preserves the original scale of the DV, such that it conveys this added information that is masked when calculating the R-squared. The dataset processing, programming language libraries used, and the original code used to deploy the MoTR method are described in the Supplementary Materials and Formulas at the end of this paper. The data flow between hypotheses and phases is represented in Figure 1.
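One way to read the lag definition is that, for a lag of k days, each study day t is paired with the DV values from days t-1 through t-k. The sketch below builds such rows; it reflects that reading only, and the function name and heart rate values are hypothetical.

```python
def lagged_rows(dv, lag):
    """Pair each day's DV with the DV values from the `lag` days before it.

    Returns a list of (past_values, current_value) tuples, where
    past_values is ordered from t-1 back to t-lag. The first `lag` days
    are skipped because they lack a full history.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    rows = []
    for t in range(lag, len(dv)):
        past = [dv[t - k] for k in range(1, lag + 1)]
        rows.append((past, dv[t]))
    return rows


# Hypothetical nighttime HR series (BPM) over five consecutive days.
nightly_hr = [58, 60, 57, 59, 61]
rows_lag2 = lagged_rows(nightly_hr, lag=2)
# The first usable day is day index 2, with history [60, 58].
```

A larger lag gives each model more history per day but leaves fewer usable days, which matters most in the 148-day Type A window.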

Figure 1.

Representation of the data flow between the different hypotheses and phases of the methods. The orange arrows represent the hypotheses pool used in each phase (e.g. the Testing phase only considered the hypotheses selected by Exploratory B from the entire pool). (WM = wearable-measured, CB = calendar-based, SR = self-reported, KW = Katarzyna Wac, EJD = Eric J. Daza, IM = Igor Matias).

Results

Missing data imputation

Of the 148 total days of data per person used in Type A hypotheses, TST, SAT, sleep HR, and DIF-HR were missing on 2 days (1.35%) for KW's data and 4 days (2.70%) for IM's data, with no missing data for EJD. From the 1278 days of data used in Type B hypotheses, TST was missing on 28 days (2.19%), SAT was missing on 46 days (3.60%), and sleep HR and DIF-HR were missing on 30 days (2.35%). We considered the data to be missing at random.[93,94] The missing data were imputed using linear interpolation for the missing values only, keeping the original values that were not missing in the interval.
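Linear interpolation of missing days can be sketched as below. This is an illustration and not the authors' code: the function name and values are hypothetical, missing days are assumed to be marked with `None`, and gaps at the very start or end of the series are left unfilled because they have no neighbor on one side.

```python
def interpolate_missing(values):
    """Fill interior None entries by linear interpolation between the
    nearest observed neighbors; observed values are kept unchanged."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


# Hypothetical nighttime HR series (BPM) with missing nights.
series = [60.0, None, 64.0, None, None, None, 56.0]
imputed = interpolate_missing(series)
# imputed == [60.0, 62.0, 64.0, 62.0, 60.0, 58.0, 56.0]
```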

Results selection criteria

Although we calculated results for all lags (i.e. 1 to 10) for all the hypotheses in Table 2, only the models with the most interesting results are presented in detail and discussed. Each Exploratory A and B hypothesis model included only one of 10 possible lags, chosen using the following criteria. (The Confirmatory hypotheses had only one model each, with a lag of one day; hence, we report all their results, and no results selection was done for the Confirmatory hypotheses.) We used the following two criteria to select models with the most interesting results.

Most discernible effect (MDE): We selected the model with the lowest P value after applying the MoTR method.

MDE, best-fitting model, and largest confounding influence (MBL): We selected the model that jointly met three criteria: (1) the lowest P value after the MoTR method was applied (MDE); (2) the smallest value of each model's Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and F statistic P value; and (3) the largest value of the confounding influence, defined as the absolute difference between the mean differences in outcomes under the two different exposure levels (e.g. low and high), before and after applying MoTR. Please refer to the Supplementary Materials and Formulas section for results and discussion using this criterion.

MDE criterion

The MDE criterion selects the model that produces the most statistically discernible suggested effect among all candidate models. The selected model has the greatest statistical evidence (i.e. lowest t test P value) that there is a true mean difference in its predicted noisy outcomes between the two IDV levels (e.g. low/high exposure), after following the MoTR procedure to randomize the IDV. Recall that randomizing the IDV makes this mean difference an estimate of a suggested causal effect of the IDV on the DV—not just an association or correlation between IDV and outcome. Note that the model selected using the MDE criterion does not necessarily produce the largest estimated suggested effect. The correct interpretation is that the selected model has the most statistical evidence for the existence of an effect of the IDV on the DV, which may be large or small. However, it can only ever be a suggested effect; the model may or may not resemble the true data-generating model needed to calculate the true, unknown effect (if any). Recall that this is the main limiting assumption of the MoTR method, and indeed, of all models used in causal inference for observational studies (wherein the exposure or IDV was never randomized).
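In code, the MDE criterion is a minimum over candidate models keyed on the MoTR P value, not on the size of the effect. The sketch below uses the two H1 results for IM under A-MONTH that are reported in Table 7, with effects converted to seconds; the dictionary layout and function name are hypothetical, and the real selection ran over all ten lags.

```python
def select_mde(candidates):
    """Return the candidate model with the lowest MoTR t test P value."""
    return min(candidates, key=lambda c: c["p_value"])


# H1, IM's data, controlling for A-MONTH (values from Table 7).
h1_a_month = [
    {"lag": 1, "effect_s": 394.288, "p_value": 0.048},
    {"lag": 2, "effect_s": 1286.901, "p_value": 0.026},
]
selected = select_mde(h1_a_month)
# selected["lag"] == 2
```

Here the selected model also happens to have the larger effect, but the criterion would pick a small effect over a large one whenever the small effect had the lower P value.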

Procedure

After the first round of model selection, we still had 16 results for the Exploratory A phase (i.e. eight for MDE and eight for MBL), and 28 for Exploratory B (i.e. 14 for MDE and 14 for MBL). Thus, we performed a second round of selection of results. In this second round, for each hypothesis, we selected the model with both P ≤ .05 and an estimated suggested effect greater than the minimally important difference in the outcome from the clinical perspective (defined in Subsection “Daily-life stressors: exercise, sleep, alcohol and food, psychological stress, aircraft travels, and nighttime HR”) and, at the same time, greater than the device's error as defined in Subsection “Accuracy of sleep duration, steps, distance, and heart rate monitored with Fitbit and Apple Watch.” Therefore, we consider a suggested effect of at least 11 min (660 s) for TST, at least 0.035 (KW and EJD) or 0.039 (IM) meters for step length (i.e. 5% of the mean from all three individuals), and at least 2 BPM for heart rate data. These inclusion criteria are summarized in Table 5.

Table 5.

Results inclusion criteria: minimally important difference in exposure/outcome considered.

DV	Minimum effect of IDV on the DV
TST	660 s (11 minutes)
Step length—KW	0.035 m
Step length—EJD	0.035 m
Step length—IM	0.039 m
HR	2 BPM

Abbreviations: TST: total sleep time; and HR: heart rate.

Following the second round of selection described above, applying the MDE criterion resulted in the inclusion of results for H1 and H3 (only from IM's data) for the Exploratory A phase, and results for H4, H6, H7, and H9 for Exploratory B (KW's data). MBL resulted in the inclusion of results for H1 and H3 (only from IM's data) for the Exploratory A phase and results for H4 and H9 for Exploratory B (KW's data).
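The second-round inclusion rule combines a P value cutoff with the minimum effects in Table 5. The sketch below encodes that rule strictly; it is illustrative only. The names are hypothetical, comparing the absolute value of the effect against the minimum is an assumption (negative effects were also reported), and the authors state elsewhere that they retained a few results with P values slightly above .05, which this strict version would exclude.

```python
# Minimum suggested effects from Table 5 (seconds, meters, BPM).
MIN_EFFECT = {
    "TST": 660.0,
    "step_length_KW": 0.035,
    "step_length_EJD": 0.035,
    "step_length_IM": 0.039,
    "HR": 2.0,
}


def passes_second_round(dv, effect, p_value, alpha=0.05):
    """True if the result meets both the P value and minimum-effect rules.

    Assumption: the size of the effect is judged by its absolute value.
    """
    return p_value <= alpha and abs(effect) >= MIN_EFFECT[dv]


# H1 for IM under A-MONTH: lag 2 (selected) and lag 1 (not selected).
keep_lag2 = passes_second_round("TST", 1286.901, 0.026)
keep_lag1 = passes_second_round("TST", 394.288, 0.048)
# H3 for IM under A-SEASON: effect above 2 BPM but P value of .057.
keep_h3_season = passes_second_round("HR", 2.53, 0.057)
```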

Exploratory phases results using MDE criterion

Exploratory A—suggested effect of SAT on TST (H1)

For H1, only IM's data yielded results with a t test P value below .05 and a suggested effect greater than 660 seconds (11 minutes). KW's data led to P values higher than .10 and EJD's to P values higher than .23, so both were discarded. Following the selection process previously described, Table 6 presents the metrics of the lag with the lowest t test P value for each type of control. When controlling for A-MONTH, the lowest t test P value was .026 for two days of lag, with a suggested effect of 1286.901 seconds more (+21 minutes and 26.901 seconds) of TST when IM's SAT was higher than his daily mean. When controlling for A-SEASON, the lowest t test P value was .027, very similar to the one obtained when controlling for A-MONTH, but for one day of lag instead of two. In this case the suggested effect has the opposite sign: when IM's SAT values are above the mean, TST changes by −1101.077 seconds (−18 minutes and 21.077 seconds), instead of increasing as before.
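The conversion between seconds and the “minutes:seconds” notation used for the TST effects can be checked with a short helper. The function name is hypothetical; the inputs are the effect sizes reported in the text and tables, and an ASCII minus sign is used in the output.

```python
def format_min_sec(seconds):
    """Format a signed duration in seconds as sign + minutes:seconds.mmm."""
    sign = "-" if seconds < 0 else "+"
    # Work in whole milliseconds to avoid floating-point drift.
    total_ms = round(abs(seconds) * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    return f"{sign}{minutes}:{remainder_ms / 1000:06.3f}"


a_month_lag2 = format_min_sec(1286.901)    # "+21:26.901"
a_season_lag1 = format_min_sec(-1101.077)  # "-18:21.077"
```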

Table 6.

Selected results from Exploratory A and Exploratory B phases using MDE criterion. Time values are in the format “minutes:seconds.”

Exploratory A
	H1	IM's data	Controlling for	A-MONTH	A-SEASON
			lag	2 days	1 day
			IDV effect on DV	+ 21:26.901	−18:21.077
			t test P value	.026	.027
	H3	IM's data	Controlling for	A-MONTH	A-SEASON
			lag	1 day	6 days
			IDV effect on DV	+ 5.61 BPM	+ 2.53 BPM
			t test P value	< .001	.057
Exploratory B
	H4	KW's data	Controlling for	B-MONTH	B-SEASON
			lag	6 days	6 days
			IDV effect on DV	+ 2.84 BPM	+ 4.99 BPM
			t test P value	.004	.002
	H6	KW's data	Controlling for	B-MONTH	B-SEASON
			lag	10 days	10 days
			IDV effect on DV	+ 4.58 BPM	+ 10.90 BPM
			t test P value	.051	.020
	H7	KW's data	Controlling for	B-MONTH	B-SEASON
			lag	2 days	1 day
			IDV effect on DV	+ 3.36 BPM	+ 2.523 BPM
			t test P value	.051	.016
	H9	KW's data	Controlling for	B-MONTH	B-SEASON
			lag	10 days	9 days
			IDV effect on DV	−4.00 BPM	−6.63 BPM
			t test P value	.001	.001

Because the suggested effect for H1 reverses depending on whether we control for month or season (the only difference between A-MONTH and A-SEASON), Table 7 directly compares each of the two selected results with its counterpart (labeled as “not selected”) under the other control type, that is, the result using the same lag size. Although the counterpart results do not surpass the minimum of 660 seconds required to be considered plausible, the sign of the suggested effect stays the same within each control type: always positive when controlling for A-MONTH and always negative when controlling for A-SEASON. These results are discussed further in later sections of this article.

Table 7.

Comparison of the selected results for H1 (Exploratory A), using the MDE criterion, with its correspondents (same days of lag) on the other control type. Time values are in the format “minutes:seconds.”

IM's data	Controlling for	A-MONTH	A-SEASON
	lag	2 days	2 days (not selected)
	IDV effect on DV	+ 21:26.901	−6:26.480
	t test P value	.026	.037
IM's data	Controlling for	A-MONTH	A-SEASON
	lag	1 day (not selected)	1 day
	IDV effect on DV	+ 6:34.288	−18:21.077
	t test P value	.048	.027

Exploratory A: Suggested effect of SAT on DIF-HR (H3)

As for H1, only IM's data yielded results for H3 with a t test P value below .05, this time with a suggested effect higher than 2 BPM. KW's data led to P values higher than .07 and EJD's to P values higher than .20, so both were discarded. As with H1, Table 6 presents the metrics of the lag with the lowest t test P value for each type of control. Although the t test P value when controlling for A-SEASON is slightly above .05, we still consider it. Thus, whether controlling for A-MONTH or A-SEASON, when IM's SAT values are above the daily mean, the difference between the highest and the lowest HR during sleep increases (5.61 BPM controlling for A-MONTH with one day of lag, 2.53 BPM controlling for A-SEASON with six days of lag).

Exploratory B: Suggested effect of socializing on DIF-HR (H4)

While for the Exploratory A phase we screened the results from all three data sources (KW, EJD, and IM), the only data used for the Exploratory B phase came from KW, as previously described in this article. For the first selected results, that is, for the suggested effect of socializing on DIF-HR, the lowest t test P value was obtained when considering six days of lag for both control types. The suggested effect was positive under both control types: when KW's data reported socializing, the DIF-HR afterward increased by 2.84 BPM (controlling for B-MONTH) and 4.99 BPM (controlling for B-SEASON), as shown in Table 6.

Exploratory B: Suggested effect of yoga on nighttime HR (H6)

As for H4, for the hypothesis of yoga affecting the nighttime HR (H6), the results showed a positive suggested effect with 10 days of lag on both control types, as detailed in Table 6. Practicing yoga was followed by a change in HR during sleep of +4.58 BPM and +10.90 BPM when controlling for B-MONTH and B-SEASON, respectively. As for H3 above, we considered the value obtained when controlling for B-MONTH even though its P value is slightly above .05.

Exploratory B: Suggested effect of exercise on nighttime HR (H7)

As shown in Table 6, the suggested effect of exercise on nighttime HR was positive, like the suggested effect of yoga. While controlling for B-MONTH, we found a positive suggested effect of 3.36 BPM with two days of lag. Controlling for B-SEASON revealed a possible positive effect of 2.52 BPM with one day of lag.

Exploratory B: Suggested effect of tired/sick/stress on nighttime HR (H9)

In contrast to socializing, yoga, and exercise, a tired/sick/stress state during KW's day showed a negative suggested effect on the average nighttime HR, with 10 days of lag and nine days of lag while controlling for B-MONTH and B-SEASON, respectively, as shown in Table 6. The strongest suggested effect was found while controlling for B-SEASON, with a change of −6.63 BPM, compared with −4.00 BPM when controlling for B-MONTH. Table 8 presents a comparison between the a priori hypotheses from both Exploratory phases (A and B) and the results obtained following the MDE criterion.

Table 8.

Comparison of the a priori hypotheses and the results obtained using the MDE criterion. For H1 there were different results when controlling for A-MONTH (increase) and A-SEASON (decrease).

	A priori				Results with MDE
Hyp./participant	Exposure	Exp. Change	Outcome	Out. change	Out. change	Result
H1/IM	SAT	Increase	TST	Increase	Inc./Dec.	Inconclusive
H3/IM	SAT	Increase	DIF-HR	Decrease	Increase	Not supported
H4/KW	Socializing	Presence	DIF-HR	Increase	Increase	Supported
H6/KW	Yoga	Presence	Nighttime HR	Decrease	Increase	Not supported
H7/KW	Exercise	Presence	Nighttime HR	Decrease	Increase	Not supported
H9/KW	TSS	Presence	Nighttime HR	Increase	Decrease	Not supported

Abbreviations: TST: total sleep time; SAT: steps per awake time; and TSS: tired/sick/stress.


Testing phase for the results using MDE criterion

As described in the Subsection “Exploratory, Testing, and Confirmatory Phases,” the Testing phase was intended to assess the accuracy of the models fitted with part B of KW's data (from Type B time frame) for predicting data from part A (from Type A time frame). Because the models used were selected according to the two criteria used (MDE and MBL), this first subsection presents the results only for the models obtained from the results chosen using MDE (H4, H6, H7, and H9). Table 9 shows the metrics for hypotheses H4, H6, and H9. The testing phase could not be applied to H7 because KW's data did not include any positive values for Exercise in part A of the data.
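The two fit metrics used in the Testing phase can behave very differently on new data: R² becomes negative whenever a model's predictions miss by more than the outcome's own spread around its mean, even when the MSE is small in absolute terms. The toy calculation below illustrates this with hypothetical log-scale values; it is not study data, and the function names are illustrative.

```python
from statistics import mean


def mse(observed, predicted):
    """Mean squared error of the residuals."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


def r_squared(observed, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    center = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - center) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical log-transformed outcomes with very little day-to-day spread.
observed_log = [4.0, 4.1, 4.2]
# Hypothetical predictions that are close but biased upward.
predicted_log = [4.2, 4.2, 4.2]

fit_mse = mse(observed_log, predicted_log)
fit_r2 = r_squared(observed_log, predicted_log)
# fit_mse is small, yet fit_r2 is negative because the residual sum of
# squares exceeds the total sum of squares around the observed mean.
```

This is why a small, stable MSE and a negative out-of-sample R² are not contradictory when the outcome is log-transformed and varies little.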

Table 9.

Testing phase's results for the hypotheses selected using the MDE criterion.

H4	KW's data	Controlling for	B-MONTH	B-SEASON
		lag	6 days	6 days
		R² in B	0.312	0.290
		MSE in B	0.001	0.001
		IDV effect on DV (naïve method)	0.12 BPM	0.12 BPM
		t test P value (naïve method)	.885	.885
		R² in A	−0.143	−0.078
		R² in B − R² in A	0.455	0.368
		MSE in A	0.001	0.001
		MSE in B − MSE in A	0.000	0.001
		IDV effect on DV (MoTR)	−0.43 BPM	−0.72 BPM
		t test P value (MoTR)	.221	.224
H6	KW's data	Controlling for	B-MONTH	B-SEASON
		lag	10 days	10 days
		R² in B	0.291	0.281
		MSE in B	0.001	0.001
		IDV effect on DV (naïve method)	−1.23 BPM	−1.23 BPM
		t test P value (naïve method)	.082	.082
		R² in A	−0.364	−0.598
		R² in B − R² in A	0.655	0.879
		MSE in A	0.001	0.001
		MSE in B − MSE in A	0.000	0.000
		IDV effect on DV (MoTR)	−1.50 BPM	−2.27 BPM
		t test P value (MoTR)	.670	.660
H9	KW's data	Controlling for	B-MONTH	B-SEASON
		lag	10 days	4 days
		R² in B	0.302	0.283
		MSE in B	0.001	0.002
		IDV effect on DV (naïve method)	−1.35 BPM	−1.31 BPM
		t test P value (naïve method)	.134	.141
		R² in A	0.014	−0.090
		R² in B − R² in A	0.288	0.373
		MSE in A	0.001	0.001
		MSE in B − MSE in A	0.001	0.001
		IDV effect on DV (MoTR)	−1.31 BPM	−0.89 BPM
		t test P value (MoTR)	.576	.422

Suggested effect of socializing on DIF-HR (H4)

The difference between the MSE of the model in part B and the MSE in part A is almost null for both control types (B-MONTH and B-SEASON). However, the suggested effect of the IDV and the t test P value differ between the naive method and the MoTR method, which requires additional attention. The t test P value is notably smaller (approximately four times) when using the MoTR method, although it is always above .05. The largest suggested effect of Socializing on DIF-HR is obtained when controlling for B-SEASON and using the MoTR method (−0.72 BPM). However, none of the calculated suggested effects is above the minimum suggested effect defined in Table 5. Moreover, the direction differs between methods: the suggested effect is positive with the naive method and negative with the MoTR method.

Suggested effect of yoga on nighttime HR (H6)

The suggested effect of yoga on nighttime HR is negative with both the naive method and the MoTR method. Its magnitude is always below the minimum suggested effect defined in Table 5, except when controlling for B-SEASON and using the MoTR method. The fitted model has an MSE of 0.001 in both part A and part B, so the difference in fit is virtually nonexistent. However, the t test P value is always higher than .05, so none of the results is statistically discernible; the naive method gave lower P values than the MoTR method.

Suggested effect of tired/sick/stress on nighttime HR (H9)

As with H6, the suggested effect of tired/sick/stress on nighttime HR was always below the minimum suggested effect defined in Table 5, and it was approximately the same for the naive method and the MoTR method when controlling for B-MONTH. The suggested effect was smaller when controlling for B-SEASON and using the MoTR method, although the lag differed between the two control types. The MSE differed by approximately 0.001 between part B and part A (cf. Table 9).

In Table 9, note that the R-squared values for part B are positive both for H9 controlling for B-MONTH and for H6 controlling for B-SEASON. This is expected because, in each case, the outcomes are predicted using the model fitted to the same data. However, while the R-squared value for part A is positive for H9 controlling for B-MONTH, it is negative for H6 controlling for B-SEASON. This is because the R-squared formula (see Supplementary Materials and Formulas) relies on the empirical overall mean of the observed outcomes in the target dataset for which the predicted outcomes are calculated.

To elaborate, predicted outcomes are created using a model fitted to a dataset with a certain empirical mean (e.g. part B). If the empirical mean of the new target dataset (e.g. part A) differs from that of the original dataset, then the R-squared value computed from the original model's predictions can be negative: the mean of the predicted values will resemble that of the original dataset's outcomes, whereas the new dataset's overall mean outcome will not. Figures 2 and 3 show this difference in empirical means between the target dataset's predicted and observed values for the two example cases mentioned above.
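The mechanism behind a negative R-squared can be illustrated with a minimal sketch. It assumes the usual definition of R-squared (one minus the ratio of the residual sum of squares to the total sum of squares around the observed mean); the function name `r_squared` and all numeric values are hypothetical and are not study data.

```python
def r_squared(observed, predicted):
    """R-squared = 1 - SS_res / SS_tot, where SS_tot is taken around the
    mean of the observed outcomes in the target dataset (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical nighttime HR values (BPM), for illustration only.
part_b_observed = [60.0, 61.0, 62.0, 63.0, 64.0]
part_b_predicted = [60.5, 60.5, 62.0, 63.5, 63.5]  # model fitted on part B

# Part A has the same spread but a lower overall mean than part B.
part_a_observed = [55.0, 56.0, 57.0, 58.0, 59.0]
# The predictions stay centred on part B's mean because the model was fitted there.
part_a_predicted = part_b_predicted

r2_b = r_squared(part_b_observed, part_b_predicted)  # positive: same data as the fit
r2_a = r_squared(part_a_observed, part_a_predicted)  # negative: shifted mean

print(f"R-squared in B: {r2_b:.3f}")
print(f"R-squared in A: {r2_a:.3f}")
```

In this toy case the predictions follow the shape of part A well, yet the constant offset makes the residual sum of squares exceed the total sum of squares, so R-squared falls below zero.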

Figure 2.

Scatter plot comparing the observed (original) with the predicted DV values for H6, controlling for B-SEASON with 10 days of lag. Results were selected using the MDE criterion.

Figure 3.

Scatter plot comparing the observed (original) with the predicted DV values for H9, controlling for B-MONTH with 10 days of lag. Results were selected using the MDE criterion.

Confirmatory phase results

As shown in Table 10, most of the results obtained during the Confirmatory phase are not statistically discernible at the .05 significance level. Hence, our data do not allow reliable conclusions for 19 of these 24 results.

Table 10.

Confirmatory phase's results for all the studied hypotheses. Time values are in the format “minutes:seconds.”

H1	KW's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−13:35.953	−8:54.645
		t test P value	.142	.153
	EJD's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	+ 00:22.307	−00:19.555
		t test P value	.512	.538
	IM's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−23:24.859	−16:25.250
		t test P value	.027	.022
H2	KW's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	+ 0.014 m	+ 0.006 m
		t test P value	.014	.683
	EJD's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−0.002 m	−0.002 m
		t test P value	.543	.655
	IM's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−0.003 m	−0.006 m
		t test P value	.442	.460
H3	KW's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−517.69 BPM	+ 9.20 BPM
		t test P value	.116	.169
	EJD's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	+ 1.92 BPM	−0.59 BPM
		t test P value	.472	.482
	IM's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	+ 1.66 BPM	+ 0.72 BPM
		t test P value	.290	.128
H5	KW's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−2.23 BPM	+ 2.99 BPM
		t test P value	.126	.168
	EJD's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	+ 0.26 BPM	+ 1.75 BPM
		t test P value	.558	.520
	IM's data	Controlling for	C-MONTH	C-SEASON
		IDV effect on DV	−0.09 BPM	−0.06 BPM
		t test P value	.017	.017

Nevertheless, if we apply the same filtering criteria defined in Subsection “Results Selection Criteria,” we can only use one group of results: H1 using IM's data. It shows a negative suggested effect of the SAT (when greater than the daily average) on the TST of −1404.86 seconds (−23 minutes and 24.86 seconds) when controlling for C-MONTH and of −985.25 seconds (−16 minutes and 25.25 seconds) when controlling for C-SEASON.

We also highlight the result obtained for H3 with KW's data while controlling for C-MONTH, as the suggested effect obtained with the MoTR method (−517.69 BPM) is impossible in real life. The most plausible explanation is that every MoTR implementation used noise simulation, with each value randomly generated from a normal distribution whose mean was the mean of the data on which the model had previously been fitted. This specific simulation may therefore have generated too many outliers, or one extremely distant outlier, which increased the noise fed into one or more simulations. Importantly, this simulation was repeated while producing the results, so we could be sure that the value was not caused by a code or compiler error; this was possible because the MoTR implementation uses a fixed initial seed for all randomly generated values.
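The correspondence between the “minutes:seconds” values in Table 10 and the values in seconds quoted in the text can be checked with a small helper. This is only an illustrative sketch; the function name `to_seconds` is hypothetical and is not part of the MoTR implementation.

```python
def to_seconds(value: str) -> float:
    """Convert a signed 'minutes:seconds' string (as in Table 10) to seconds.

    Accepts the typographic minus sign (U+2212) and optional spaces
    after the sign, e.g. '+ 00:22.307'.
    """
    text = value.replace(" ", "").replace("\u2212", "-")
    sign = -1.0 if text.startswith("-") else 1.0
    minutes, seconds = text.lstrip("+-").split(":")
    return sign * (int(minutes) * 60 + float(seconds))


# H1 with IM's data, as reported in Table 10.
im_h1_month = to_seconds("−23:24.859")   # controlling for C-MONTH
im_h1_season = to_seconds("−16:25.250")  # controlling for C-SEASON

print(round(im_h1_month, 2), round(im_h1_season, 2))
```

Rounded to two decimals, the two values match the −1404.86 and −985.25 seconds quoted for H1.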

Discussion

Discussion of the results using the MDE criterion

Of all the a priori hypotheses from the Exploratory A phase, only the results of two met the inclusion criteria defined in Subsection “Results Selection Criteria,” and both were based on IM's data (H1 and H3). One of these hypotheses, H3, was not supported. The other, H1, was supported when controlling for A-MONTH but not when controlling for A-SEASON. This may indicate that the month to which the data refer influences the suggested effect of SAT on the TST. A possible explanation is that the data used for this type of hypothesis (A) span six months (August to January) and three seasons (Summer, Autumn, and Winter), of which Summer and Winter are represented by only approximately one and two months, respectively, whereas Autumn is fully represented with three months of data. If we assume that the month influences the outcome being studied (TST), then we might be looking at a version of Simpson's Paradox, in which the effect within each month is lost when the months are combined into seasons. This would explain why we get a positive suggested effect when controlling for the month but a negative suggested effect when controlling for the season, that is, a combined version of the months. Under this interpretation, the correct result is the one obtained when controlling for A-MONTH; that is, the hypothesis can be supported. Additionally, this observed difference might be caused by different distributions of physical activity (steps) throughout the daytime, which may depend on the season or weather conditions, for example.

From the Exploratory B phase, only H4 was supported, showing that KW's socializing events could have positively affected the DIF-HR; that is, the difference between the maximum and minimum heart rate registered during the following sleep period increased. The largest suggested effect was obtained with six days of lag, that is, considering the history of the past six days. This means that KW's DIF-HR values depend on the preceding days. A possible cause is that socialization happened on a weekend day (the sixth day) after a stressful preceding week in which DIF-HR was slightly higher than the usual mean DIF-HR, so that the socialization event resulted in a following DIF-HR notably higher than usual. In the case of days with usual DIF-HR followed by one day of socialization, the resulting nighttime DIF-HR may not be notably higher than usual; the body “metabolizes” the socialization exposure well.

All the other hypotheses (H6, H7, and H9) were not supported: the MoTR method always revealed a change in the outcome variable opposite to that of the initial hypothesis. However, if we compare the MoTR results with the suggested effect direction shown by the original data (cf. Table 11), H7 and H9 are supported by the original data but not by MoTR. A possible reason is that the MoTR method more carefully evaluates potential causation between the IDV and the DV, whereas the original data alone can suggest misleading conclusions about that causation.
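The Simpson's Paradox explanation discussed above for H1 can be illustrated with a toy example. The sketch below uses a simple stratified difference in means, which is not the MoTR method, and entirely hypothetical records; the names `mean_difference` and `stratified_difference` are illustrative assumptions.

```python
from statistics import mean

# Hypothetical records: (month, season, high_steps_day, sleep_minutes).
# Within each month, high-step days have 10 more minutes of sleep, but the
# month with shorter sleep overall contains most of the high-step days.
records = (
    [("Oct", "Autumn", False, 480)] * 4 + [("Oct", "Autumn", True, 490)]
    + [("Nov", "Autumn", False, 400)] + [("Nov", "Autumn", True, 410)] * 4
)


def mean_difference(rows):
    """Mean outcome on exposed days minus mean outcome on unexposed days."""
    exposed = [r[3] for r in rows if r[2]]
    unexposed = [r[3] for r in rows if not r[2]]
    return mean(exposed) - mean(unexposed)


def stratified_difference(rows, key_index):
    """Size-weighted average of the within-stratum mean differences."""
    strata = {}
    for r in rows:
        strata.setdefault(r[key_index], []).append(r)
    total = len(rows)
    return sum(len(s) / total * mean_difference(s) for s in strata.values())


by_month = stratified_difference(records, 0)   # stratify by month
by_season = stratified_difference(records, 1)  # months pooled into one season

print(f"Controlling for month:  {by_month:+.1f} min")
print(f"Controlling for season: {by_season:+.1f} min")
```

Here the within-month difference is positive, whereas pooling the two months into a single season reverses the sign, mirroring the direction change described for A-MONTH versus A-SEASON.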

Table 11.

	A priori	Results with MDE	Results with naïve method
Hypothesis	Outcome change	Outcome change	Outcome change

H4	Increase	Increase	Decrease
H6	Decrease	Increase	Increase almost always (decrease with 10 days of lag)
H7	Decrease	Increase	Decrease
H9	Increase	Decrease	Increase

Comparison between the outcome change predicted by the a priori hypotheses, the change obtained from the results selected using the MDE criterion, and the change observed using the naive method for H4, H6, H7, and H9 (note: only applies to participant KW). Refer to Table 8 for details on the hypotheses. For all the hypotheses, the naive method resulted in the same suggested effect direction, whether controlling for B-MONTH or B-SEASON.

Regarding the results selection criteria, some results may have been discarded erroneously because the minimum suggested effects (cf. Table 5) were set to the highest possible device error reported in the literature. Because multiple devices were used to collect the data, it is possible that some data originating from a more accurate device were still rejected by the minimum value inclusion thresholds defined in this study.

The main goal of the Testing phase was to test whether the model used in MoTR could be applied to future data, that is, data on which the model had never been trained or fitted. In this analysis, we assessed two results: (1) the model fitting and (2) the model estimates for the new data. First, as shown in Table 9, the R-squared values in A were almost always negative, the exception being H9 controlling for B-MONTH. This means that the model estimates for the new data (part A) were shifted relative to the observed values of part A, suggesting that even if the model captures the pattern of the new data, its estimates must be adjusted to fall in the right range of values. The smallest differences between the R-squared in B and the R-squared in A are for H9 (controlling for B-MONTH and B-SEASON) and H4 (controlling for B-SEASON), all below 0.400. These three results were selected based on an arbitrary decision to include half of the total number of results, although this threshold can be shifted to suit the researcher's needs.

Second, to evaluate the model estimates for the new data, we compare the direction of the suggested effect measured by MoTR in the new data with the result shown in Table 8. Considering only H4 and H9, selected as detailed above, the suggested effect of the IDV on the DV for H4 was a decrease of the DV, that is, the opposite direction to the suggested effect estimated by the model for the type B data (on which it was trained). Note that the suggested effect calculated using the naive method for H4 is an increase of the DV, in the same direction as the results in data of type B. For H9, both the naive and the MoTR methods estimated a negative suggested effect of the IDV, that is, a decrease of the DV, consistent with the results in type B data. However, all the P values of the H4 and H9 Testing phase results were higher than .05, making those results not statistically discernible.

From the fourth and last phase of results, the Confirmatory phase, only H2 and H5 were supported, although for different participants (H2 for KW and H5 for IM) and never with a discernible result, that is, always with an estimated suggested effect below the minimum suggested effect inclusion criteria (cf. Table 5: 0.035 m for KW's step length and 2 BPM for all HR values). Because this is an observational study and not an interventional one, causality cannot be firmly concluded but only hypothesized, using the presented MoTR method to do so.
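The selection of Testing phase results by the gap between the R-squared in B and the R-squared in A can be reproduced from the values in Table 9. The following is a minimal sketch; the variable names and the strict "below 0.400" comparison are assumptions made for illustration.

```python
# R-squared values transcribed from Table 9:
# (hypothesis, control) -> (R-squared in B, R-squared in A)
table9_r2 = {
    ("H4", "B-MONTH"): (0.312, -0.143),
    ("H4", "B-SEASON"): (0.290, -0.078),
    ("H6", "B-MONTH"): (0.291, -0.364),
    ("H6", "B-SEASON"): (0.281, -0.598),
    ("H9", "B-MONTH"): (0.302, 0.014),
    ("H9", "B-SEASON"): (0.283, -0.090),
}

GAP_THRESHOLD = 0.400  # threshold mentioned in the text; adjustable

# Gap between in-sample (part B) and out-of-sample (part A) fit.
gaps = {key: round(r2_b - r2_a, 3) for key, (r2_b, r2_a) in table9_r2.items()}

# Keep the results whose gap is below the threshold.
selected = sorted(key for key, gap in gaps.items() if gap < GAP_THRESHOLD)

for key in selected:
    print(key, gaps[key])
```

With the threshold at 0.400, the sketch keeps H4 (B-SEASON) and H9 (B-MONTH and B-SEASON), which are the three results named in the text.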

Potential unobserved confounding variables

Additionally, there is a possible explanation for not obtaining any statistically discernible result for KW's or EJD's data in the Exploratory A phase. Because the data of type A were collected from 14 August 2020 until 8 January 2021 (cf. Subsection “Participants and Collected Data Definition”), the lockdown due to the COVID-19 pandemic may have been a confounder that this study did not account for. The three participants experienced three different situations: KW had a relatively active daily life, walking to work every day as usual; EJD was in lockdown for approximately half of the data collection time; and IM was in lockdown during almost all the days of data collection. This may help explain why EJD's data did not show the consistency needed for the results to be statistically discernible. As for KW's data, although she did not stop the normal daily physical activity involved in commuting to work every day, there may have been context changes influencing the measurements made during that period for which this study did not account, such as changes in social interaction, travel, or exercise patterns.

Finally, we acknowledge some possible confounders that this study did not account for when applying the methods. The country where KW lived might have impacted her data, as in study period B (February 2017–August 2020) she was moving every few months between Denmark, Switzerland, and the United States of America (mostly during Summer). These moves may have influenced her overall lifestyle patterns, including sleep, steps taken per day, and exercise, as well as patterns of nutritional intake, which influence the metabolism and hence the HR patterns. For EJD's and IM's data, it is possible that external factors influenced the collected data in addition to lockdown, namely alcohol intake, late meals, and visual and psychological stimulation (e.g. watching movies, working until late, or using mobile devices before or in bed), which were not measured and can influence HR levels and the TST.

Applicability of the method

The N-of-1 Observational Study method presented here provides a new tool to interpret self-collected data and correlate it with daily-life stressors. Specifically, the MoTR method is presented as a new tool to assess potential causality using intensive longitudinal data.[96-98] One use case of the MoTR method is to help develop N1RTs for diagnosis or intervention/treatment. After applying it to data, researchers can use this method to select the findings with the largest statistically discernible differences from the naive method. These differences indicate possible confounding variables that can change decisions on a future intervention. Finally, MoTR can also inform the discussion of a possible intervention plan by identifying the highest potential causality between intensive longitudinal data and the outcome expected to be affected during a study. Thus, this novel method is intended to be used with one's own data. However, the best practice would be to (a) use occasional self-screening mental health questionnaires and (b) work with a health expert to analyze the data and draw conclusions from them.

Conclusions and future work areas

Self-tracking devices are nowadays very common and used for a multitude of purposes. Most are used to track sleep, heart rate, and physical activity data only to enable a generic self-perception of one's daily behaviors and state of the body. They are ubiquitous, simple, and useful tools for drawing conclusions about an individual's behavior patterns. Additionally, if used longitudinally, they can also enable the acquisition of datasets that can further help understand how certain behaviors and external factors (such as stressors) impact the physiology and functioning of the body. Therefore, this study implemented the MoTR method to evaluate the suggested effects of daily stressors on nighttime heart rate, sleep time, and physical activity in an individualized way: via the N-of-1 approach. For one of the three participants (IM), we found that physical activity can increase the nighttime heart rate amplitude, whereas there are no strong conclusions about its suggested effect on TST. For one of the other participants (KW), socializing, yoga, and self-reported exercise may have increased the nighttime heart rate. In contrast, being tired/sick/stressed (a self-reported state) may have decreased the nighttime heart rate.

Our study had the following limitations. The collection interval for data of type A might have been too short to accurately evaluate the suggested effect of the selected IDVs (only approximately five months long, under changing seasons and variable COVID-19 conditions). Only daily self-reported and wearable-collected data were available, thus losing any detail that intraday sampling might have provided. The data regarding physical activity (steps) carried no information about the time of day. Thus, the results in this paper focus only on the possible effect of the aggregate total number of steps rather than the steps taken during, for example, the morning, afternoon, or night periods. This might also have caused the differences between the two groups of control data when estimating the effect of SAT on TST. The self-reported data (except stress) were coded by the user (KW) at the time of this study's deployment (early/mid-2021), possibly containing a bias based on unclear calendar notes/events for 2017–2020. Specifically, when collecting stress data, KW did not anticipate its use in this study, using it instead for momentary week-to-week management of health and work/life balance. Thus, a minimal bias is also expected in these data. The lockdown during the COVID-19 pandemic might have influenced the measured suggested effects of the IDVs, especially for one of the three participants (IM), who was in an almost full lockdown during the collection period for data of type A. For example, this lockdown might have changed the sleep-wake regimen the participant followed before the lockdown started. Additionally, many other confounders likely influenced the variables measured in both types of data (self-reported and wearable-collected).

Future studies should focus on collecting, analyzing, and modeling intraday data for the hypotheses stated in this study. A longer data collection period would be beneficial too. We will also assess multilevel models for this same approach. Data collected before the COVID-19 lockdown could help understand this paper's suggested effects by comparing the conclusions before and after that moment. Finally, a randomized controlled trial should also be conducted to test the causality of the variables suggested by the methods presented here. Additionally, in a future study, rather than using the AIC, BIC, and F statistics to perform model selection in order to meet the MBL criterion (cf. Subsection “Results Selection Criteria” and Supplementary Materials and Formulas), we may instead apply k-fold cross-validation. For example, we might conduct a leave-one-out cross-validation by first calculating each model's predicted residual sum-of-squares (PRESS) statistic, and then selecting the model with the lowest PRESS statistic, since a lower PRESS indicates smaller out-of-sample prediction errors. We would then fit this model on all training data and use its estimated coefficients to predict values in any new (i.e. test) dataset.

Supplemental material, sj-docx-1-dhj-10.1177_20552076221120725 for What possibly affects nighttime heart rate? Conclusions from N-of-1 observational data by Igor Matias, Eric J. Daza and Katarzyna Wac in Digital Health
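The leave-one-out model selection proposed for future work can be sketched as follows. This is a minimal illustration with two hypothetical candidate models (a mean-only model and a simple linear regression) and made-up data; it is not the study's model set. A lower PRESS indicates smaller leave-one-out prediction errors, so the sketch selects the minimum.

```python
def fit_mean(xs, ys):
    """Candidate 1: predict the training mean regardless of x."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def press(fit, xs, ys):
    """PRESS: sum of squared errors when each point is predicted by a
    model fitted on all the other points (leave-one-out)."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - model(xs[i])) ** 2
    return total


# Hypothetical training data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

candidates = {"mean_only": fit_mean, "linear": fit_line}
scores = {name: press(fit, xs, ys) for name, fit in candidates.items()}

best = min(scores, key=scores.get)      # lowest PRESS wins
final_model = candidates[best](xs, ys)  # refit on all training data

print(scores, best)
```

The refitted `final_model` can then be applied to any new (test) dataset, as described in the text.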
