Keenan Hawekotte1, Susan E Luczak2, I G Rosen1. 1. Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA. 2. Department of Psychology, University of Southern California, Los Angeles CA 90089, USA.
Abstract
The posterior distribution (PD) of random parameters in a distributed parameter-based population model for biosensor measured transdermal alcohol is estimated. The output of the model is transdermal alcohol concentration (TAC), which, via linear semigroup theory can be expressed as the convolution of blood or breath alcohol concentration (BAC or BrAC) with a filter that depends on the individual participant or subject, the biosensor hardware itself, and environmental conditions, all of which can be considered to be random under the presented framework. The distribution of the input to the model, the BAC or BrAC, is also sequentially estimated. A Bayesian approach is used to estimate the PD of the parameters conditioned on the population sample's measured BrAC and TAC. We then use the PD for the parameters together with a weak form of the forward random diffusion model to deconvolve an individual subject's BrAC conditioned on their measured TAC. Priors for the model are obtained from simultaneous temporal population observations of BrAC and TAC via deterministic or statistical methods. The requisite computations require finite dimensional approximation of the underlying state equation, which is achieved through standard finite element (i.e., Galerkin) techniques. The posteriors yield credible regions, which remove the need to calibrate the model to every individual, every sensor, and various environmental conditions. Consistency of the Bayesian estimators and convergence in distribution of the PDs computed based on the finite element model to those based on the underlying infinite dimensional model are established. Results of human subject data-based numerical studies demonstrating the efficacy of the approach are presented and discussed.
The posterior distribution (PD) of random parameters in a distributed parameter-based population model for biosensor measured transdermal alcohol is estimated. The output of the model is transdermal alcohol concentration (TAC), which, via linear semigroup theory can be expressed as the convolution of blood or breath alcohol concentration (BAC or BrAC) with a filter that depends on the individual participant or subject, the biosensor hardware itself, and environmental conditions, all of which can be considered to be random under the presented framework. The distribution of the input to the model, the BAC or BrAC, is also sequentially estimated. A Bayesian approach is used to estimate the PD of the parameters conditioned on the population sample's measured BrAC and TAC. We then use the PD for the parameters together with a weak form of the forward random diffusion model to deconvolve an individual subject's BrAC conditioned on their measured TAC. Priors for the model are obtained from simultaneous temporal population observations of BrAC and TAC via deterministic or statistical methods. The requisite computations require finite dimensional approximation of the underlying state equation, which is achieved through standard finite element (i.e., Galerkin) techniques. The posteriors yield credible regions, which remove the need to calibrate the model to every individual, every sensor, and various environmental conditions. Consistency of the Bayesian estimators and convergence in distribution of the PDs computed based on the finite element model to those based on the underlying infinite dimensional model are established. Results of human subject data-based numerical studies demonstrating the efficacy of the approach are presented and discussed.
Historically, researchers and clinicians interested in tracking alcohol
consumption and metabolism in the field would require data from either a
drinker’s self-report or from having them use a breath alcohol analyzer.
Because both methods require active participation by the subject, the data they
produce are often plagued by inaccuracies. Self-report often leads to
misrepresentation as (1) subjects may deviate from naturalistic behaviors due to the
reporting requirement seeming unnatural, and (2) alcohol directly impairs
subjects’ ability to be an active participant [1]. Using a breath alcohol analyzer correctly requires specialized
training and can produce erroneous measurements due to mouth alcohol and/or a
reading based on a shallow breath by the subject. Dating back to the 1930’s,
ethanol, the type of alcohol in alcoholic beverages, has been known to be excreted
from the human body through the skin [2-5]. This is due to the
fact that water and ethanol are highly miscible [6] and the ethanol finds its way into all of the water in the body. More
recently, this observation paved the way for the development of a device to measure
the amount of alcohol excreted transdermally through the skin [7-9]. The
benefits derived from such a device include the availability of near continuous
measurements and the ability to collect them passively (i.e., without the active
participation of the subject). This gives researchers and clinicians the potential
to continuously observe naturalistic drinking behavior and patterns. There is also
the possibility of making these devices available on the consumer market (e.g.,
wearable body system monitoring technology like Fitbits, Apple watches, etc.). In
addition, the ideas we discuss here may also be applicable to the monitoring of
other substances once the appropriate sensor hardware has been developed.The challenge in using transdermal alcohol sensors is that they provide
transdermal alcohol concentration (TAC), whereas alcohol researchers and clinicians
have always based their studies and treatments on measurements of breath alcohol
concentration (BrAC) and blood alcohol concentration (BAC). Thus, a means to
reliably and accurately convert TAC to BrAC or BAC would be desirable. At levels up
to approximately 0.08 (see, for example, [13,
14]) BrAC correlates well with BAC via a
simple linear relationship based on an empirical relationship known as
Henry’s law [10, 11]: BAC =
ρ × BrAC,
where the constant ρ is
known as the partition coefficient of ethanol in blood and breath.More generally, according to Henry’s law, when a liquid is in contact
with a gas, the concentrations, C and
C, of a compound present in both
the liquid and the gas will come to equilibrium according to the linear relationship
C =
ρC,
where the empirical determined constant
ρ is known as the
partition coefficient for the that compound in that liquid and gas. Not
surprisingly, the partition coefficient,
ρ, is temperature
dependent and of course its actual value will vary depending on the choice of units
for C and
C. It has been shown (see, for
example, [12]) that at 34°C, the
partition coefficient for ethanol in blood and air is
ρ = 2157 ± 9.6
for men and ρ = 2195 ±
10.9 for women, at 37°C, the partition coefficient for ethanol in blood and
air is ρ = 1783 ± 8.1
for men and ρ = 1830 ±
7.8 for women. Using a regression model, Jones [12] found that at 37°C the partition coefficient for ethanol in
water and air is ρ = 2133, in
blood and air is ρ = 1756,
and between plasma (all of the components of blood with the exception of the oxygen
carrying red blood cells) and air is
ρ = 2022. All of
these values are for the case when the concentration of ethanol in air is given in
units of grams per liter, and in water, blood, or plasma in units of grams per
deciliter. We note that it is generally accepted that a BrAC reading of 0.08 percent
alcohol corresponds to .008 grams of ethanol per 210 liters of breath and a BAC of
0.08 grams of ethanol per 100 milliliters (equal to 1 deciliter (dL) or 0.1 liters
(L)) of blood.Unfortunately, however, the correlation between TAC and BrAC/BAC, on the
other hand, can vary due to a number of confounding factors. These factors include,
but are not limited to, stable features of the skin like its thickness, tortuosity,
and porosity, particularly as they apply to the epidermal layer of the skin, which
does not have an active blood supply. Environmental factors such as ambient
temperature and humidity can also affect both perspiration and vasodilation, and can
thus alter skin conductance, blood flow, the amount of alcohol passing below the
skin in the blood, and the amount and rate of alcohol diffusing through the skin.
One would also expect there to be manufacturing and operational variations among
different TAC sensors.Earlier attempts to investigate the relationship between TAC and BrAC/BAC
have used deterministic models [15-21]. Some utilized
regression-based models [16], whereas others
utilized first principles physics-based models that on occasion included modeling
the transport of alcohol all the way from ingestion to excretion through the skin
[22,23]. In our group’s initial efforts, we modeled the transport of
alcohol from the blood in the dermal layer through the epidermal layer and its
eventual measurement by the sensor using a one-dimensional diffusion equation [15, 21].
The parameters in the diffusion equation model then had to be fit or tuned (i.e.,
calibrated) to each individual subject, the environmental conditions, and the device
through the use of simultaneous BrAC/TAC training data collected in the laboratory
or clinic through a procedure known as an alcohol challenge. Once the model was fit,
it could then be used to deconvolve BrAC from TAC collected in the field. This
two-pass approach and the related studies were relatively successful [15, 19–21, 24, 25]. However,
this calibration procedure is quite burdensome to researchers, clinicians, subjects
and patients, and because the models were fit to a single uni-modal drinking
episode, unaccounted for variation and uncertainty in the relationship between BrAC
and TAC frequently arose, making it difficult to accurately convert TAC collected in
the field to BrAC [26, 27].More recently, to eliminate the need for calibration, deconvolution of BrAC
from TAC was effected using population models fit to BrAC/TAC training data from
drinking episodes across a cohort of subjects, devices, and environmental conditions
[24,25,28]. These population models
took the form of the deterministic transport models but now the parameters appearing
in the model equations were considered to be random. Then in fitting the models,
instead of estimating the actual values of these parameters, it was their joint
distributions that were estimated. Once the models were fit, they could be used to
deconvolve an estimate of the BrAC input, and by making use of the distribution of
the population parameters, conservative error bands could also be generated which
quantified the uncertainty in the estimated BrAC [24, 25]. The results in these
studies were based on a naive pooled data statistical model and a non-linear least
squares estimator.In this paper, we seek to build on the approach described in the previous
paragraph by now using a Bayesian approach to account for the underlying uncertainty
and variation in the alcohol diffusion and measurement process. We obtain posterior
distributions for the transport model parameters conditioned on the training
BrAC/TAC data and regularized by prior distributions based on deterministic fits.
Being Bayesian based, our approach yields credible sets for the estimated parameters
and what we shall refer to as conservative credible or error bands
for the deconvolved estimated BrAC. What is meant by the term conservative credible
band will be made precise later.An outline of the remainder of the paper is as follows. In the next section
of the paper we provide a description of our method including a derivation of a new
abstract parabolic hybrid PDE/ODE model for the transdermal transport of alcohol
through he epidermal layer of the skin and its capture in the vapor collection bay
of the sensor. Then using linear semigroup theory we obtain an input/output model in
the form of a discrete time convolution. A discussion of finite dimensional
approximation and convergence issues related to the use of our model to carry out
the requisite computations is also included. Then in the results section of the
paper we first construct our Bayesian estimator and present two theoretical results
related to it: convergence of the finite dimensional approximation and consistency.
We then show how our population model based on Bayesian estimates for the random
parameters can be used as part of a deconvolution scheme that yields estimated BrAC
curves and conservative credible or error bands from a biosensor provided TAC
signal. In this section we also present and discuss a sample of our numerical
findings demonstrating the efficacy of our approach. Our numerical studies were
based on human subject data collected in the Luczak laboratory in the Department of
Psychology at USC. A final section contains some discussion of our theoretical and
numerical results along with a few concluding remarks and avenues for possible
future research.
Methods
A distributed parameter model for the transdermal transport of
alcohol
As in [21] and [24], and making use of an idea recently introduced in
[28], we model the alcohol biosensor
problem described in Section 1 using a one
dimensional diffusion equation to describe alcohol transport through the
epidermal layer of the skin coupled with an inflow/outflow compartment model to
describe the perspiration vapor collection chamber of the TAC biosensor.The epidermal layer of the skin sits atop the dermal layer. The dermal
layer has an active blood supply while the epidermal layer does not. The latter
consists of both dead (the stratum corneum layer which is closest to the
surface) and living (the deeper layers closer to the dermal layer) cells
surrounded by interstitial fluid. Not having an active blood supply, the cells
in the epidermal layer obtain nourishment primarily from O2 that
diffuses in from the environment beyond the skin.The SCRAM TAC biosensor (see fig. 1
in section 3.4 below) has a perspiration
vapor collection chamber on the bottom of the sensor that sits atop, and is in
direct contact with, the stratum corneum layer of the skin’s epidermal
layer. Perspiration in vapor form collects in the chamber. A small pump extracts
a sample of the vapor from the collection chamber approximately once every 30
minutes. This sample is then electro-chemically analyzed based on an
oxidation-reduction (redox) reaction in much the same way that a fuel cell
produces a current (and heat and water) from hydrogen and oxygen. In the TAC
sensor, ethanol molecules in the sample are oxidized producing electrons in the
form of an electrical current. This current is converted into the TAC
measurement based on an a priori bench calibration.
Figure 1.
SCRAM Systems transdermal continuous alcohol monitoring device.
To make this more precise, we let Λ denote the thickness of the
epidermal layer (units: cm) of the skin at the location of the sensor and let
η denote the depth in the epidermal layer (units:
cm), 0 ≤ η ≤ Λ,
η = 0 denoting the skin surface and
η = Λ denoting the boundary between the
epidermal and dermal layers. Let t denote time (units: hrs) and
let x(t, η) denote the
concentration of ethanol at time t and depth
η in the epidermal layer (units: grams per
milliliter of interstitial fluid). Let denote the concentration of ethanol in the TAC
sensor collection chamber at time t (units: grams per
milliliter of air), and let u(t) denote the
BrAC at time t (units: grams per milliliter of air). Let
y(t) denote the TAC at time
t (units: grams per milliliter of air), and let
(units: grams per milliliter of air) and
φ0 (units: grams per milliliter of
interstitial fluid) denote the initial conditions for and x, respectively. We will
assume that there is no ethanol in either the epidermal layer or the TAC
biosensor collection chamber at time t = 0 so
and φ0 = 0.
Let T denote the duration of the drinking episode (units: hrs).
Then, With these definitions, our model takes the form
where α > 0 denotes the
effective diffusivity of ethanol in the interstitial fluid
in the epidermal layer (units: cm2/hr), β
> 0 denotes the effective linear flow rate at which
capillary blood plasma from the dermal layer replenishes the interstitial fluid
in the epidermal layer (units: cm/hr), and
ρ denotes the
partition coefficient for ethanol in plasma and air with respect to the
concentration units of grams per milliliter of plasma and grams per milliliter
of air at 37°C (normal body temperature).In modeling the TAC collection chamber, we assume that the inflow of
ethanol is proportional to the flux out of (i.e., from right to left) the
epidermal layer at the surface of the skin (i.e., at η =
0), , with constant of proportionality
γ (units: cm−1), and the outflow
is simply proportional to the concentration of ethanol in the collection chamber
(i.e., a simple linear model) with constant of proportionality
δ (units: hr−1. Finally, the
output gain, θ, represents the bench calibration factor
for the TAC sensor that converts the concentration or ethanol in the collection
chamber into a TAC (units: dimensionless).Since the thickness of the epidermal layer, Λ, is in general
difficult to measure and can be mathematically difficult to estimate
computationally due to it determining the spatial domain of the diffusion
equation, it is desirable to transform the system eq. (2.1) to a domain of fixed length,
Λ = 1. We make the standard change of variable thus rendering η
dimensionless. For t ≥ 0, We also set
. Then, recalling our assumption of zero initial
conditions, the following hybrid ordinary-partial differential equation
input/output system results
where , , , and q4 =
δ. We note that since the only observable and
observed quantities are BrAC, u, and TAC, y,
the physiological interpretations of the variables and parameters in between
that define our model in the form of an input/output map from BrAC to TAC are of
little interest to us. Although we have relied on first principles modeling to
derive the system of equations given in eq. (2.2), our motivation was not to gain a deeper understanding of
the transdermal transport of ethanol. Rather, it was to be able to keep the
dimension of the space of unknown parameters as low as possible by capturing the
underlying physics and physiology of the transport process, albeit in a greatly
simplified form. Indeed, our primary objective here is to first fit the
parameters (or, more precisely, their distributions) in the model to observed
input/output BrAC/TAC training pairs and to then use the resulting population
model to obtain an estimate for the BrAC and associated error bars corresponding
to a given TAC signal collected in the field from a member of the cohort or
population that provided the data which was used to train the model.Let = [q1,
q2] denote the
unknown, un-measurable, and, in general, subject-dependent physiological
parameters. The parameters q3 and
q4 are device (i.e., hardware) dependent
parameters and as such, we consider them to be bench-measurable empirically in
the lab. We do note however, with simple changes of variable, the theory and
methods we develop below apply, and their distributions could also be estimated
along with those of q1 and
q2 with the same techniques we use here to
estimate the distributions of q1 and
q2. In addition, q3
and q4 could also be estimated using a deterministic
scheme such as a regularized nonlinear least squares approach. For clarity and
ease of exposition, we will focus our attention here on the development of a
population model for a cohort of subjects by estimating the distribution of the
un-measurable physiological parameters q1 and
q2.We consider this system on a finite-time horizon, 0 ≤
T < ∞, and we assume zero-order hold input,
u(t) =
u, t
∈ [kτ, (k +
1)τ), k = 0, 1, 2, ..., where
τ denotes the sampling time of the biosensor. We set
x =
x(η)
= x(kτ, η),
w =
w(kτ), and
y =
y(kτ), k = 0, 1, .
. . , K, where we assume T =
Kτ. For k = 0, 1, 2, ... we
consider the system eq. (2.2) on
the interval [kτ, (k +
1)τ] and make the change of variable:
v(t, η) =
x(t, η) −
ξ(η)u
where . It is then easily verified that
w and v satisfy the following hybrid
system
with initial conditions
v(kτ, ·) =
x(kτ, ·) −
ξ(·)u
= x −
ξu and
w(kτ) =
w on [0, 1].We then use linear semigroup theory to rewrite the system eq. (2.3) in state space form in an
appropriately chosen Hilbert space and subsequently obtain a discrete time
evolution system for (w,
x), k = 0, 1,
2, ....K which is equivalent to eq. (2.2). Let Q denote a
compact subset of the positive orthant of , and for =
[q1, q2,
q3,
q4] ∈
Q we define the Hilbert spaces
with respective corresponding inner products
and , and respective norms
|·|, and
∥·∥. Note that the
Sobolev Embedding Theorem [29] yields
that the norm induced by the V inner product is equivalent to
the standard H1 norm on V. It is
not difficult to show that V is densely and continuously
embedded in H and
that we have the Gelfand triple of dense and continuous embeddings
V ↪ H
↪ V*.Then based on the weak formulation of the system eq. (2.3), for each
∈ Q define the
bilinear form by for , , where , , and =
[q1, q2,
q3,
q4] ∈
Q. Standard arguments can be used to argue that the form
a(, ·, ·)
satisfies the following three properties. Note that in (1)–(3) above, Q compact implies
that the constants α0,
λ0, and
μ0 may all be chosen independent of
∈ Q.
Furthermore, properties (1)–(3) immediately yield that the form
a(, ·, ·)
defines a bounded, elliptic (i.e., λ0 = 0)
operator given by for , . If we define the set which is independent of
for
∈ Q, we obtain the closed, densely defined linear
operator given by , . The operator is regularly dissipative and (see, for example,
[30]) is the infinitesimal generator
of a holomorphic semigroup of bounded linear operators on
H and
V*. Moreover, the system eq. (2.3) then has a state space form where
and . Then for time step τ
> 0 and k = 0, 1, . . . , K, letting
, , and , it follows that
where for we have used the fact that the operator
A()−1
commutes with the semigroup generated by
A(). Note that the
operator is in fact an element in
H and
that (0, ξ) ∈ V, but that
is only an element in
H. From
eq. (2.5) the state space
form of our discrete time model is
where , k = 0, 1, 2, . . . and the
operator is given by , where (θ,
ψ) ∈
H. Note
that the ellipticity of guarantees the existence of
A()−1.Boundedness There exists a constant
α0 > 0 such that
, , ,Coercivity There exists constants
and
μ0 > 0 such that
, .Continuity For all , , we have that
is a continuous mapping from
Q into .From eq. (2.6) it is
immediately clear that if we assume zero initial conditions,
, the output y can be written
as a discrete time convolution of the input, u, with a filter,
h(), as
where for ∈
Q, , i = 0, 1, 2, . . .. Using the
Trotter-Kato semigroup approximation theorem (see, for example, [31]), the following result can be shown (see, for
proof, [32]).
Lemma 2.1.
For Q a compact subset of the positive orthant or
, K = Kτ for
constant time step τ > 0, and
h
as defined in
eq. (2.7), we have
that the mapping
↦
h()
from Q into
is continuous, uniformly in
and i, for
∈ Q and i ∈
{0, 1, 2, . . . , K}.Now although the input/output model given in eq. (2.7) is a standard convolution in
, the filter,
{h(q)}
involves the semigroup
{e
: t ≥ 0} which is defined on the infinite
dimensional Hilbert space
H.
Consequently, finite dimensional approximation is required. For
n = 1, 2, . . . let denote the standard linear B-splines on the
interval [0, 1] defined with respect to the uniform mesh
, . Set
and let denote the orthogonal projection of
H on to
V along
(V)⊥.
Standard arguments from the theory of splines (see, for example, [33]) can be used to argue that
, as n → ∞,
for all (θ, ψ) ∈
H, and that
, as n → ∞,
for all with the convergence uniform in
for
∈ Q.For n = 1, 2, ... and k = 0, 1, 2,
... we set , and we approximate the operator
A() using a Galerkin
approach. That is, we define the operator by restricting the form
a(, ·,
·) to V ×
V. We then set
where . The matrix representations for these
operators with respect to the basis are then given by , , and , where , , , and . Letting , we consider the discrete time dynamical
system in V given by
or equivalently in given by the system where
, , and , we obtain that
for k = 0, 1, 2, . . . , K
where , i = 0, 1, 2, . . . ,
K − 1.Using linear semigroup theory (see, for example, [21, 34,
35]) and in particular the
Trotter-Kato semigroup approximation theorem (see, for example, [36] and [31]) the following results can be established
(for proof, see [32]).
Theorem 2.1.
For Q a compact subset of the positive orthant of
[0, 1] defined with respect to the uniform
mesh
on to V
along
(V)⊥,
and
eq. (2.9), we have
that
, as n → ∞,
for all (θ,
ψ) ∈
H,
that
, as n → ∞,
for all
, and that
, as n → ∞,
with the convergence in all cases uniform in
for
∈ Q.
Theorem 2.2.
Under the same hypotheses as → ∞
uniformly in k for k ∈ {0, 1, 2, . . . ,
K} and uniformly in
for
∈ Q.Finally, we will assume that we have training data,
, from R participants or
subjects where without loss of generality (i.e., by padding with zeros) we
have assumed that all training input/output datasets have the same number,
K, of observations. In this case, for
i = 1 , . . . , R we have,
where and , for j = 0, 1, . . . ,
K − 1. This formulation facilitates the
estimation of the population parameters . If
one wishes to find the parameters for a
specific individual, the methods outlined in Section 2.2 can still be applied by letting the indices
i = 1, . . . , R refer to different
measured BrAC/TAC events each with k = 0, . . . ,
K denoting the measurement times for the desired
individual subject.
Bayesian estimation of dynamical system parameters
In this section we develop a Bayesian framework to estimate the unknown
parameters = [q1,
q2] in the system
eq. (2.2). To illustrate our
approach, for simplicity but without loss of generality, we have assumed that
the sensor parameters q3 and
q4 have been bench-measured and are therefore
known and concentrate our effort on estimating the physiological
subject-dependent parameters q1 and
q2. All of what follows below can easily be
extended to estimating all four of the parameters in the model. Our underlying
statistical model incorporating noise is based on the observation of
as in eq. (2.12) and is given by
where are our measured TAC values, and
are the i.i.d. noise terms corresponding to
person i at time jτ with
σ > 0, τ > 0.
Commonly, as we will assume in Section 3.2
and beyond, . In order to be able to carry out the requisite
computations, we consider the approximating statistical model based on eq. (2.12)
where once again the are assumed to be the measured TAC values. We
consider to be a random vector on some probability space
{Ω, Σ, P} with support Q and
assume that the prior distribution of is given by the push forward measure
π0. That is for A
⊂ Q, .We assume independence across both i (individuals) and
j (sampling times for each individual), for each
i and j we have (commonly distributed N(0,
σ2)) and similarly,
(again commonly distributed
N(0, σ2)). Letting
φ denote the density of , for ∈
Q the likelihood and the approximating likelihood functions
are given respectively by (see, for example, [37-40])An application of Bayes’ Theorem (see, for example, Theorem 1.31
in [41]) yields that the posterior
distribution of or the conditional distribution of
conditioned on the data,
, is a push forward measure
that is absolutely continuous with respect to
π0 and whose Radon-Nikodym derivative, or
density, for ∈ Q is
given by
In this way, for A ⊂ Q,
we have . If, in addition, we have
π0 ≪ λ
with density where λ denotes
Lebesgue measure on Q, then π ≪
λ with conditional density f given
byAnalogously, in the case of the approximating likelihood eq. (2.15), eq. (2.16), eq. (2.17), eq. (2.18), and eq. (2.19) respectively become
and
Results
Convergence in distribution
Consider the random variable with posterior distribution described by the
measure π given in eq. (2.15) and eq. (2.16), and let denote the random variable
but with posterior distribution
π given by eq. (2.20) and eq. (2.21). In this section we establish that
as n → ∞; that
is that converges in distribution to
. Recall that due to the physical constraints
based on our model for the alcohol biosensor problem, eq. (2.2), we require that the parameters
lie in the interior of the positive
orthant of .
Theorem 3.1.
For Q a compact set in the interior of the positive orthant
of 0
with compact support Q and a density that is continuous on Q, and a
noise distribution with bounded density function φ and support on
given by
eq. (2.20)
and
eq. (2.21)
converges in distribution to the random variable
with posterior distribution π given by
eq. (2.15)
and
eq. (2.16).
Proof.
For S a subset Q, the triangle
inequality yields
where φ is the normal density
describing the distribution of the noise term in eq. (2.13), and Z
and Z are as in eq. (2.16) and eq. (2.21),
respectively.Focusing first on the limit of |1/Z −
1/Z| as
n → ∞, by Lemma 2.1 we have that the
are continuous in
for
∈ Q,
i ∈ {0, 1, . . . , R}, and
j ∈ {0, 1, . . . , K}.
Since Q is compact, the are bounded and thus 0 <
Z < ∞. By Theorem 2.2, since uniformly in
for
∈ Q as n → ∞, 0
< Z <
∞ for n large enough. Again by Theorem 2.2 it follows from eq. (2.16), eq. (2.21), and the Bounded
Convergence Theorem that Z
→ Z as n → ∞ and
therefore that |1/Z −
1/Z| → 0 as
n → ∞. Then, essentially the same
arguments yield that , from which it then immediately follows
that , and therefore from eq. (3.1) that
as n → ∞
and the theorem has been proved. □For the push forward measures π and
π from eq. (2.15) and eq. (2.20), respectively,
we are commonly interested in their respective expected values and the
convergences there within. Since Q is compact, a direct
invocation of the Portmanteau theorem (see, [41]) establishes the following corollary
which guarantees the convergence of all moments described by
π to those of
π.
Corollary 3.1.
Under the hypotheses of
Theorem 3.1, and for any
continuous function
we have that
as n → ∞.
Consistency
In this section we demonstrate the strong consistency of the posterior
distribution with respect to the parameters, , by
imposing stronger assumptions on the distribution of the noise terms
in eq.
(2.13) and on the prior, π0, by
restricting Q to a rectangle in the positive orthant of
, and by applying the framework summarized in
[42].As in [42], we show consistency
of the posterior distribution π as in eq. (2.17), rather than consistency of a
point estimator based on the posterior distribution. As such, for prior
π0 over Q, posterior
as in eq. (2.15), and i.i.d. noise for σ > 0, we
also consider that for ∈
Q assumed known we have random variables
as determined by eq. (2.13) for i = 1, 2, . .
. , R and j = 1, 2, . . . ,
K. Further, we have that are independent in i and
j, but are non-identically distributed (i.n.i.d).For clarity and brevity we consider the random vector
with values in and independent entries derived from the matrix
equivalent to eq. (2.13) and
eq. (2.14), namely
=
+
=
+
and
=
+ =
+ , with
noise vectors , observed TAC vectors , theoretical TAC vectors
and analogous
,
BrAC data vectors and analogous
,
and kernel matrices H and
H with entries
and , respectively. In this way, for every
i ∈ {1, 2, . . . , R}, by
independence in j we have the family of joint distributions,
, representing the possible densities of
for φ the noise density
and as in eq. (2.12). We are interested in the scenario where the number of
subjects R → ∞.With our reframing, for A ⊂ Q
by independence in i we may rewrite eq. (2.15) as
where for all i we have data vectors
, and for
our purposes we will be interested in the equivalent form on the right-hand side
of the equation above where 0 ∈
Q is the true value of our parameters
[q1,
q2].We first formalize the results discussed in Section 7 of [42] that handle the i.n.i.d case of
posterior consistency. As such, we say that our posterior distributions
{π(·|{})}
as in eq. (3.2) are
strongly consistent at
0 if
{π(U|{})}
→ 1 a.s for every neighborhood U of
0 as R
→ ∞, where with the probability distribution generated by
with data samples
{}.For this we show that for sets A ⊂
Q with 0 ∉
A,
J({V})
→ 0 and
J({})
→ ∞ as R → ∞ in some appropriate
manner to be made precise below. For
J({})
→ 0 we take the same approach as expressed in [42] and thus state the following definition below
without motivation, where we note that for any two densities f,
g on some space their ,
denoted Aff(f, g), is given by
.
Definition 3.1.
Let A ⊂ Q and
δ > 0. The set A and
0 are
strongly
δ
separated if for every probability measure
ν on A,
where for as in the work surrounding eq. (3.2), and is the marginal density of
given by for any R = 1, 2, . . ..
We will say that A and
0 are strongly
separated if they are strongly δ
separated for some δ > 0.From these definitions we provide the following theorem without
proof. For proof see Sections 3 and 7 of [42] (or, for examples, [43])
Theorem 3.2.
Let π0
be a prior over parameter space Q,
be independent but not identically distributed data with
distribution generated by fi,
for
∈ Q,
0 ∈ Q the true
value of the parameters [q1,
q2],
and A ⊂ Q with
0 ∉ A. If
such that
then for some β0 > 0,
a.s.
as R → ∞ for
J({})
as in
eq. (3.2).For some δ > 0,
all
A’s are
strongly δ separated from
0
for the model
⟼
fi,,
and,To show
J({})
→ ∞ as R → ∞ we utilize the
approach as outlined in the proof Theorem 1 of Appendix A.2 in [44] (specifically the proof of (8) in
Appendix A.2). For a direct proof see [32]. For a similar approach see [45].
Theorem 3.3.
Let π0
be a prior over parameter space Q,
be independent but not identically distributed data with
distribution generated by fi,
for
∈ Q, and
q0 ∈ Q the true value of the
parameters [1,
2].
For
∈ Q define
, , and
. If there exists a set B
⊆ Q with π0(B)
> 0 such that
Then
a.s.
as R → ∞ for
J({})
as in
eq. (3.2).∀ ∈ B,
andFor every ε > 0,
.Before moving on to our main theorem we apply the following theorem
and subsequent corollary to prove a lemma that will be of use to us later.
For proof of the following see Theorem 5.3 of [46].
Theorem 3.4.
For parameters
∈ Q a subset of the positive
orthant of
, and
-dependent semigroup
{T(·; ) :
t > 0} with infinitesimal generator
A() defined in terms of
a
-dependent sesquilinear form
on a Hilbert space V satisfying items 1 and 2 in
Then, the semigroup T(·;
) is (Fréchet)
differentiable in
in the interior of Q, where for t > 0,
, and acting on
δ ∈ Q
the derivative is given by
for R(λ,
) =
(A() −
λI)−1
the resolvent of A(),
and the obtuse sector
with γ ∈ (0, 1),
α0
as in item 1, and λ0,
μ0
as in item 2.() The
map
↦
σ()
is affine, in the sense that for any u,
v ∈ V,
σ(,
u, v) =
σ0(u,
v) +
σ1(,
u, v) where
σ0
is independent of
and the map
↦
σ1(,
·, ·) is linear, and() For
any
,
with metric
d(·, ·)
we have the bound ,
v ∈ V.The following corollary is an immediate consequence of the work in
[6]. Specifically that the map
↦
R(λ,
) is analytic as a map from
Q to , and from eq. (3.3) the map
depends continuously on
. Note that
Σγ is independent of
as the constants
α0 and
λ0,
μ0 from items 1 and 2, respectively,
of Section 2.1 are independent of
.
Corollary 3.2.
Under the same hypotheses as Theorem
3.4, we have that the map is continuous in the operator norm on
for in the interior of Q.
Lemma 3.1.
For Q a rectangle in the positive orthant of
, Hilbert spaces
H
and V as in
eq. (2.4), bilinear
form
as in
Section 2.1
with q3
and q4
assumed known, and induced infinitesimal generator
A() as in
Section 2.1, then the generated
holomorphic semigroup of bounded linear operators
on H
and V* is (Fréchet) differentiable and
Lipschitz in
in the interior of Q. Further, for i = 1, . . . , R
and j = 1, . . . , K,
and
as in
eq. (2.12)
and
eq. (2.12)
are (Fréchet) differentiable and Lipschitz in
with Lipschitz constants independent of i and j.First, by item 3 of Section
2.1 we have that for ∈
Q, ↦
a(, ·,
·) is continuous in . Second,
notice that for , with and , we have , where
↦ a0(,
ψ, φ) and
↦
a1(,
ψ, φ) are clearly
linear in for
∈ Q. Hence
the bilinear form a(,
·, ·) is affine and continuous in
Thus, by Theorem 3.4, we have that the semigroup
generated by a(,
·, ·),
{e
: t ≥ 0} =
{T(t,
) : t ≥ 0},
is (Fréchet) differentiable in
for ∈ Q. Denote
the derivative in and
acting on
δ
∈ Q by .Moreover for t ∈ (0, T]
we have for 1,
2 ∈
Q and line segment S =
{s1 + (1
− s)2
: 0 ≤ s ≤ 1},
with the operator norm, where the first
inequality follows from the Mean Value Theorem on Banach spaces (see
Theorem 4 of Section 3.2 in [47]), and the second and third inequalities follow from the
compactness of Q, Corollary 3.2, and the continuity of the map
.Further, under zero-order hold the differentiability and
Lipschitz properties of remain. Considering
from Section 2.1, , we find that it is a sum and product
of -differentiable and
-Lipschitz terms and thus is
differentiable and Lipschitz in . Since
h and y as in eq. (2.12) are a composition and sum
of -differentiable terms they remain
differentiable. Further, using eq. (3.4) we have the following Lipschitz bound for all
j ∈ {0, 1, . . . , K} and
, in the interior of Q,
for C1 the max of the
operator norms for , , over Q, and
, the max Lipschitz constants of
and over all k and
Q. The final inequality above follows from eq. (3.4) by noticing that
for all , and that by identification the supremum
over V* is larger than that over
H.
For and as in eq. (2.9) by a repetition of the
above arguments we maintain differentiability in
, and thus
h and
y as in eq. (2.12) are
differentiable and Lipschitz in .Lastly, for all i = 1, . . . ,
R, j = 1, . . . ,
K we have that for ,
where are BrAC values bounded by definition
to be in [0, 1], is the Lipschitz constant from eq. (3.5), and
K + 1 is the fixed upper bound on the number of
temporal observations, j. Hence, the Lipschitz constant
for is independent of (i,
j). By noticing that the previous statement holds
for with a repetition of the work leading
to eq. (3.6), our lemma
has been proved. □A direct consequence of Lemma
3.1 is that for all i = 1, 2, . . . ,
R with and as in the statement of Theorem 3.3, we have
where σ > 0 is the
standard deviation of the N(0,
σ2) noise density, and
is the Lipschitz constant from eq. (3.6) that is
independent of i (and j). Thus, for
any δ* > 0, i ∈
{1, . . . , R}, and , we have that by the relationship between total
variation and Kullback-Leibler distances, for as in eq. (3.7). If we let
* ∈ Q be
such that and consider the set
, then is strongly separated from
0 (see Definition 3.1). This follows from the
relationship between Affinity and total variation distance (via the
Hellinger distance) as well as by noticing that for any density ν
on , the marginal density of
1 satisfies
(for full example, see [32] or Example 3.5 in [42]). If this holds for all
i then and
0 are strongly
separated with
independent of i.With this example in mind we note that items 1 and 2 of Theorem 3.2 are satisfied if the
following special condition is met: where π0 is the prior over
Q. This follows from the fact that if special item
1 holds then we may take an ε*-neighborhood of
0,
. As discussed above, since
is independent of i,
U is non-empty and contains the set
. Now set , and by compactness cover
Q with a finite number of disjoint sets
A determined by the
balls with model
↦
f,
where represents a finite set of points in
Q chosen so that . From these
A’s we have
that the finite subset that intersect with
Uc must cover
Uc. This finite subset of
A’s
subsequently satisfies the assumptions of Theorem 3.2. Specifically, the strong separation condition
is satisfied as per the discussion leading up to special item 1 by
noticing that on each A we
have , and the convergent sum condition is
satisfied by the fact that the
A’s can be
considered (made) mutually exclusive with union contained in
Q. We now state and prove our main theorem.For every δ* > 0,
there exist sets A1,
A2, . . . with
L1 diameter less than
δ*, , and
for the mappings
Theorem 3.5.
For Q a rectangle in the interior of the positive orthant of
0
with compact support Q and a density that is continuous on Q, i.i.d
noise distributed as N(0,
σ2) for σ
> 0, data
drawn from independent but not identically distributed distributions
generated by fi,
as in
eq. (3.2), Hilbert
spaces H
and V as in
eq. (2.4), bilinear
form
as in
Section 2.1
with q3
and q4
assumed known, induced infinitesimal generator
A() as in
0 ∈ Q, we have that
our posterior
π(·|{})
as in
eq. (3.2)
is consistent for
0
as R → ∞.For any set A ⊂ Q with
0 ∉
A we will use the form of
π(A|{})
as in eq. (3.2) and
handle the numerator, J
and denominator, J separately.First, as Q is compact, for any
δ > 0 we may cover Q
by a finite number of sets
A, i
= 1, 2, . . . , γ where each
A is a subset of
an L1 ball in Q. That is,
for every i and ,
we have that . For R large enough,
if on each A we consider
the model ↦
f
for i ∈ {1, . . . , R} and
f
the density of the random variable with
assumed known, then special item 1 is satisfied for prior
π0. Hence, by Theorem 3.2 we have that for some
β0 > 0,
a.s. as R →
∞.Now for Λ,
, and S
as in the statement of Theorem 3.3, for i = 1, 2, .
. . , R and ∈
Q, we have and
for as determined by eq. (3.7).Thus, for 0,
∈ Q we
find that . Further, by the bounds above we have
that for every ε > 0 and
i, is non-empty and our choice in such
does not depend on
i. Hence the set is non-empty. Thus, for
B = Q we satisfy the assumptions
of Theorem 3.3 and therefore find
that ∀β > 0,
eJ({})
→ ∞ a.s. as k →
∞.So for any set A ⊂ Q
with q0 ∉ A, from
eq. (3.2) we have
that
π(A|{})
→ 0 a.s. as R → ∞
and thus the theorem has been proved. □From Lemma 3.1 we find
that we maintain the differentiability and Lipschitz properties of the
finite-dimensional semigroup as in eq. (2.9) and respective kernel as in
eq. (2.12). Thus,
with a straightforward rewriting of eq. (3.2) and eq. (3.2) in terms of the
finite-dimensional posterior eq. (2.20), and repetition of the work following Lemma 3.1 through the proof of Theorem 3.5 we have the following
corollary.
Corollary 3.3.
Under the same hypotheses as (·|{})
as in
eq. (2.20)
is consistent at
0
as R → ∞.
Deconvolution of BrAC from TAC
In this section we consider the problem of using the biosensor measured
TAC signal to estimate BrAC. We do this by deconvolving it; to wit we invert the
convolution given in eq. (2.12)
subject to a positivity constraint and regularization to mitigate the inherent
ill-posedness of the inversion. Recall that the convolution given in eq. (2.12) was found by solving
the finite-dimensional discrete time system eq. (2.10) derived from eq. (2.2). We employ the method originally
described in [25], wherein the problem is
formulated as a constrained, regularized, optimization problem (see, for
example, [48]).We first briefly summarize the treatment in [25] and then follow by showing how our work is able
to make direct use of this theory. Let and be Hilbert spaces forming a Gelfand Triple,
. For an admissible set Q, a
compact subset of the positive orthant of , with ∈
Q, let A()
be an abstract parabolic operator defined by a sesquilinear form
(i.e., one that satisfies items 1 to 3 in Section 2.1) that when restricted to
generates a holomorphic semigroup on
. For bounded operators
and consider the input/output system where
, , and y(t) =
C()x(t)
where on the interval [0, T], u ∈
L2(0, T) is the input,
y the output, and x is the state
variable.For sampling interval τ > 0 and
zero-order hold input u(t) =
u, t ∈
[jτ, (j +
1)τ), j = 0, 1, 2, . . . the
corresponding sampled-time system is given by
where , and , ,
j = 0, 1, 2, . . . , and for all j,
are zero-order hold input values.Now let be a random variable with support the parameter
space Q. For the probability measure of
, define the Bochner spaces
, , and . It is easily shown that the spaces
and form a Gelfand triple . Define by for . Then, as in Section 2.1, the form satisfies items 1 to 3 and therefore defines a
linear map that when restricted to
, generates an analytic semigroup on
, .Assume further that the map ↦
B() is in
and that the map
↦
C() is in
with respect to the measure
. (Note that since the domain space in
and the co-domain space in
are both , it follows that in fact the mapping
, and by the Riesz Representation Theorem, that
effectively the mapping ). Then define bounded linear operators
and by and , respectively, for and . It can then be shown [49,50] that
for the solution to the input/output system from
above with agrees with the solution to the system where
, , and for —almost every
∈ Q. Then with
sampling interval τ > 0 as in eq. (3.8), and zero-order hold input
, , j = 0, 1, 2, . . . , this
system becomes
for j = 0, 1, 2, . . . where
, , and .Now, with eq. (3.9),
note that is obtained by zero-order hold sampling a
continuous time signal. That is, the input to eq. (3.9) is with at least continuous on [0, T].
We seek an estimate for the input based on this model, wherein the input
estimate is a function of both time and the random
parameters . For optimization purposes (more precisely, to
be able to include regularization) we require additional regularity. Given the
time interval [0, T], let and let be a compact subset of S(0,
T).The input estimation or deconvolution problem is then given by
where is a norm on S(0,
T) that will be defined below, are measured output values, the term
serves as regularization, and
, for k = 1, 2, . . . ,
K with for j = 1, 2, . . . ,
K the zero-order hold input to the discrete time system
eq. (3.9), and convolution
filter (which is equal to , by the Riesz Representation Theorem) where
, , and .Solving eq. (3.10)
requires finite dimensional approximations. For index M, let
define an approximating family of closed
subsets of , where each subset is contained within a
corresponding finite dimensional subspace, of S(0, T).
Further we require that for each there exists a sequence
with such that in S(0, T) as
M → ∞. For index N, let
be an element of an approximating family of
finite-dimensional subspaces of , and let be the orthogonal projection of
onto . We also require of the spaces
that for each , in as N → ∞.We next specify finite-dimensional operators , , and that define the finite-dimensional system
analogous to eq. (3.9). That is,
let be given by for , , , , and . In this way we obtain a doubly-indexed
sequence of approximating finite-dimensional optimization or deconvolution
problems given by
where L = (M,
N), , andUsing the approximation properties of the subspaces
and (that is, that for each
there exists a sequence
with and as M → ∞, and
that for each , as N → ∞), and
the corresponding operators , , and , it can be shown that 1) for each multi-index
L, eq.
(3.11) admits a solution , and 2) there exists a subsequence of
, with strongly as k →
∞ with a solution of eq. (3.10). Further, if in addition
is assumed to be a closed and convex subset of
S(0, T), for each M,
is a closed and convex subset of
, and the optimization problem given in eq. (3.10) admits a unique (with
respect to sampling) solution, then the sequence of solutions to eq. (3.11),
converges strongly, or in S(0,
T) to the unique solution of eq. (3.10), . For the proofs of these results see Section 5
of [25]To numerically carry out the requisite computations to actually
determine for given values of M,
N and L = (M,
N), we continue to apply the results in [25] while also connecting them to our treatment in
Sections 2.1 and 2.2 above. We assume that the feasible parameter set
Q is a compact rectangle in the positive orthant of
, we set and as in eq. (2.4), and we identify the operators in eq. (3.8) with those in eq. (2.6). Our distribution over
, , is the finite-dimensional posterior
for fixed n as in eq. (2.20) and we proceed with the
Bochner spaces and to achieve eq. (3.9).For the state variables
x(η,
) we have that η
∈ [0, 1] and ∈ Q
= [a1, b1] ×
[a2, b2] for 0
< a <
b, i = 1, 2.
Further, for the inputs we have that t ∈ [0,
T] and ∈
Q. Let n be as in eq. (2.8) and m a positive
integer, and we discretize [0, 1] and [0, T] using the sets of
linear B-splines, and , respectively, on the uniform meshes,
and , respectively. Further, for positive integers
m1 and m2, we
discretize Q with the 0th-order B-splines
, i = 1,2 defined with respect
to the uniform grids , on
[a,
b], i =
1, 2.Then for multi-indices N = (n,
m1, m2) and
M = (m, m1,
m2) we define the approximating subspaces
and as follows using tensor products. That is, let
, and , where and . Standard approximation theoretic arguments
(see, for example, [33]) can be used to
argue that the subspaces defined in above satisfy the required approximation
assumptions on and . Then and , can be written as and , respectively.Then with the bases for and as chosen above, it is an elementary exercise
to determine the matrix representations for the operators
, , , , and . It then follows that eq. (3.9) takes a matrix system where for
k = 1, 2, . . . , K,
and with the coefficients of the basis elements
, the coefficients of the basis elements
as in the previously mentioned approximating
subspaces, a matrix with entries , a matrix with entries , a matrix with entries , and given by [1, 0, . . . , 0]. From here the
matrix representation of (with L = (M,
N) in place of N due to the joint
dependence on the multi-indices M and N) can
be found using this matrix system.We note that the optimization problem eq. (3.11) is a constrained problem, in that
of the previously stated matrix system are to
be non-negative. With a proper placement of into the block matrix , the approximating deconvolution problem eq. (3.11) is now given by
where we have that
where is the dimensional column vector of the coefficients
of , and is the column vector of measured output values
followed by zeros. Further, for i = 1, 2 are matrices with
entries given by the inner products of the basis elements for the
subspaces as determined by the regularization term
below. Note that the regularization term
is derived from a weighted inner product on
S[0, T] and thus corresponds to a squared
norm on S(0, T) given by
.The values of the regularization weights used in eq. (3.13), for i = 1, 2 are chosen
optimally. Indeed, in order to find , BrAC-TAC input-output training data pairs,
are used to optimize
(r1, r2) via the
following scheme:
where , are the predicted BrAC values found by finding
the minimum of J( · ;
r1, r2) given by
eq. (3.13) with
r1 and r2 candidate
values for the regularization weights from a specified feasible set in the
positive orthant of , and are the TAC values found by using
as input to eq. (3.12).
Numerical results
All of the data used in the studies detailed below, unless otherwise
specifically stated (e.g., as in Section
3.4.2), were collected in USC IRB approved human subject experiments
designed and run by researchers in the laboratory of one of the authors (S. E.
L.) as part of a National Institutes of Health (NIH) funded investigation (see,
[51]). These experiments were carried
out in controlled environments wherein 40 participants completed one to four
drinking episodes, with viable data recorded in 146 drinking episodes. BrAC was
obtained using Alco-sensor IV breath analyzer devices from Intoximeters, Inc,
St. Louis, MO, and participants each wore two SCRAM (Secure Continuous Remote
Alcohol Monitoring) devices manufactured by Alcohol Monitoring Systems (AMS) in
Littleton, Colorado (see Figure 1)
simultaneously placed on the participants’ left and right arms for TAC.
For each separate SCRAM device, participants started their readings with a TAC
and BrAC of 0.000, consumed alcohol (equivalent across all sessions per
participant) in one of three different drinking patterns (single: over 15
minutes; dual: over two 15-min periods spaced 30-minutes apart; or steady: over
60 minutes), and then ended their session when their TAC and BrAC had returned
to 0.000. We note that the placement of the two sensors challenges the
independence assumption from Section 2.2,
but for experimental purposes we will include all of the data as independently
measured drinking episodes with this caveat in mind. In addition, we did not
focus on any specific drinking pattern as including all possible patterns is in
line with real-world, varying drinking patterns and may improve the
generalizability of our model. In the calculations of Sections 3.4.1 and 3.4.2, as in eq.
(2.5), time is discretized by a constant sampling time
τ of 5 minutes and is subject to our zero-order hold
assumption. While this challenges the implications of our zero order hold
assumption, namely that τ = .0833 hours implies that
subjects’ BAC is constant for 5 minutes, this restriction is needed as
computational complexity becomes unstable as τ
decreases. In order to achieve this sampling time, we first linearly interpolate
all of the data (both BrAC and TAC), and then re-sample at our desired rate of
τ = 5. For Section
3.4.3, a τ will be discussed. Further, in all
sections we assume a truncated multivariate normal (tMVN) prior
π0 (as in eq. (2.20)) on
with mean μ and
covariance matrix Σ which varies from example to example.Unfortunately, the USC IRB approved experiments for collecting human
subject data were not designed around the problem of estimating the sensor
collection chamber inflow and outflow parameters, q3
and q4 as in eq. (2.3), nor do the authors have the laboratory facilities or
expertise to determine them experimentally. Moreover, since the approach
developed in sections 2.1 and 2.2, and in particular our underlying hybrid
PDE/ODE model given in 2.3, are relatively novel, no values for
q3 and q4 are
available from either the manufacturers of the sensors or the current
literature. Consequently, for the purposes of this study we have chosen values
for q3 and q4
arbitrarily as q3 = q4 =
1. However, we note that the precise values chosen for
q3 and q4 had no
perceptible qualitative effect on the results to be presented below. Finally we
note that all computational work was done in Python 3.7.2 and includes ported
MATLAB code from the work of [15,21,24,25], in particular with
respect to the creation of the finite-dimensional, discrete-time kernel as in
eq. (2.12). Ported code was
verified against the original code through the use of unit tests.
Convergence in distribution
We used surface plots as well as Metropolis Hastings (MH) Markov
Chain Monte Carlo (MCMC) methods to validate our convergence in distribution
results. Throughout the results described here we have that from eq. (2.14) for all sample
times the i.i.d. noise ε is distributed as
N(0, 0.0052), and prior
π0 as in eq. (2.20) is distributed as the optimal
distribution found in Section 6 of [25]. Specifically, the prior is a tMVN random variable with
mean, and covariance matrix,
with the feasible parameter set,
Q, taken to be Q = [0.01, 2.2877]
× [0.01, 2.1410] for =
[q1,
q2]. The
choice of 0.005 for the standard deviation of ε was
made to limit the role of noise in our subsequent sampling algorithms so
that we may focus on the role of the dimension of our approximating system
in the resulting posterior distribution. In addition, when comparing this
choice in standard deviation to the peak TAC values of our training dataset,
we had a typical peak TAC to noise ratio of 20. For computational reasons,
we limit ourselves to measurements from a random subgroup of
R = 3 subject drinking episode measurements. Figure 2 contains the resulting surface
plots for n values of 1, 3, and 25. Further, Table 1 contains the means and credible regions
for n values of 1, 2, 3, and 25 as determined by respective
1000 sample (1100 draws with a 100 draw burn-in period) MH MCMC sampling
runs. The MCMC sample size chosen here was due to computational complexities
and runtimes. Figure 3 displays
deconvolution results for a randomly chosen, non-training drinking episode
for different values for the dimension of the approximating system,
n. This figure used the method from Section 3.3 along with the resulting posteriors as
shown in Figure 2 and Table 1.
Figure 2.
Posterior distribution surface plots for varying finite dimensional
approximations of the kernel from eq.
(2.12).
Table 1.
90% credible circles for MCMC sampled posteriors with noise
distribution .
n Dimension
1
2
3
25
Mean (q1,
q2)
(0.7185, 0.8512)
(0.6829, 0.8651)
(0.6776, 0.8686)
(0.6719, 0.8716)
Credible Circle Radius
0.1173
0.1097
0.1029
0.1289
Figure 3.
Deconvolutions for differing approximating dimension values,
n, associated with posteriors from Figure 2 and Table
1.
Consistency
We again used surface plots as well as MH MCMC sampling methods to
verify our consistency results. For these studies we have assumed that the
noise ε is now distributed as N(0,
0.0252) while our prior π0
from eq. (2.20) is still the
optimal distribution found in Section 6 of [25]. That is, π0 is a tMVN
with and covariance matrix,
with bounds [0.01, 2.2877] and [0.01,
2.1410] for q1 and
q2, respectively. The choice for a
distribution for the random noise is meant to simulate a more realistic
situation where little is assumed known about any external effects that play
a role in perturbing the sensor measurements. Consequently the data is
assumed noisy. When comparing this choice for the standard deviation of the
noise process to the peak TAC values of our training dataset, we had a
typical peak TAC to noise ratio of 8.To test Theorem 3.5, we
generated 276 TAC values using subject-measured BrAC values via eq. (2.14) with a
predetermined 0 value of [1, 1],
n = 24, and noise variance of 0.0252. Table 2 displays the calculated means
and 90% credible circle radii for the posterior distribution eq. (2.22) for increasing
amounts of idealized (BrAC, TAC) data pairs (R) all
generated using the “true”
0 value previously stated. To
calculate these values, MH MCMC samples were drawn with a sample of size
1400 (1500 data points with a 100 sample burn-in phase) where the MCMC
sample size was increased from that of Section 3.4.1 due to the increase in noise variance.
Table 2.
90% credible circles for MCMC sampled posteriors from idealized TAC
data with noise distributed and prior as determined in Section 3.4.2.
R
1
11
26
101
276
Mean (q1,
q2)
(0.683, 1.023)
(0.734, 1.031)
(0.776, 1.025)
(0.877, 1.011)
(0.942, 1.003)
Cred. Circle Radius
0.2854
0.1963
0.1677
0.1590
0.0787
We now investigate the results of Section 3.2 with respect to the field-measured (BrAC, TAC) data
pairs. Note that we no longer are able to know the true value of the
parameters, 0. Surface plots for
increasing amounts of subject drinking episode measurements,
R = 1, 26, 76, and 101 are contained within Figure 4. Table 3 displays the calculated means and 90% credible circle
radii for increasing numbers of subjects, and thus data (corresponding to
R as in Section
3.2) included in determination of the prior. To calculate these
values, for each R, we again used 1400 MH MCMC samples
(1500 draws with a 100 sample burn-in phase) generated according to our
chosen prior.
Figure 4.
Posterior distribution surface plots for varying amounts of collected
data, m. All images use noise and prior distributions as stated
in Section 3.4.2.
Table 3.
90% credible circles for MCMC sampled posteriors with noise distributed
and prior distribution equivalent to the one in
Section 6 of [24], namely a tMVN with
, , and bounds
[0.01, 2.2877] and [0.01, 2.1410].
R
1
11
26
101
Mean (q1,
q2)
(0.913, 1.251)
(1.426, 1.629)
(1.900, 1.551)
(2.183, 1.231)
Credible Circle Radius
0.2824
0.1497
0.1560
0.1254
Deconvolution
As in Section 3.3, we rely on
the treatment in [25] for
deconvolving BrAC from TAC using a distribution for
over Q. The chosen
distribution was the posterior eq.
(2.22) with n = 3. To determine the posterior we
elected to investigate the case where a non-informative, or what is more
aptly described as an uneducated, prior was used. Thus we chose a prior of a
tMVN random variable with bounds [0.01, 10] × [0.01, 10], and
parameters and . The noise used was distributed as
. Note that this choice in prior also
highlights the effects of data on the posterior by not providing any initial
information to the posterior. When comparing this choice in noise standard
deviation to the peak TAC values of our training dataset, we had a typical
peak TAC to noise ratio of 8. Further, for the subspaces from Section 3.3 we set our discretization to
be n = 3, time discretization as m = 1300,
and discretized Q with m1 =
m2 = 20. As with Section 3.4.2, the noise distribution is meant to
simulate a situation where little is known about external effects that play
a role in determining noise, and so the data is assumed noisy.In all of the numerical results presented and discussed in this
section, the test dataset used consisted of five drinking episodes from four
different participants. These drinking episodes were chosen heuristically so
that the test dataset had two drinking episodes with peak BrAC greater than
peak TAC, two drinking episodes with peak BrAC less than peak TAC, and one
drinking episode with peak BrAC within 0.015 of peak TAC (deemed,
“close”). The remaining drinking episodes were used as
training data with the added restriction that whenever the desired number of
training sets to be used was not too large, BrAC/TAC pairs from any
participant who had a dataset included in the selected test data, would be
excluded from being among the data used for determining the posterior. The
primary exception to this restriction being Figure 6c, wherein we allowed all data that wasn’t the
current test data point to be included in the training set.
Figure 6.
BrAC deconvolution given TAC and predicted
values for a single test data drinking
episode derived using varying amounts of training data, R, with
conservative credible regions shaded in gray. All sub-figures use the same test
TAC data from a single right arm session from BT333. Prior used was tMVN with
bounds [0.01, 10] × [0.01, 10], , and . Associated data are contained in Table 4.
By linearly interpolating the BrAC and TAC data for each subject in
all test and training datasets, we are able to re-sample our data with
sampling interval τ = 45 seconds, and the time
discretization m = 1300 previously mentioned. The
associated participant IDs, TAC device placement (left vs. right arm), type
of drinking pattern used (single, dual, or steady), and number of subjects
used in posterior distribution determination (R) are
labeled in Figures 5 and 6. As in eq. (3.14), we utilized all available non-test subject drinking
episode measurements (R = 136) to determine population
parameters to be (4.7733, 1.7020).
Figure 5.
BrAC deconvolution given TAC and predicted
values for varying test data
participants’ right arm data with conservative credible regions shaded in
gray. Across subfigures, all training data remained constant with
R = 25. Prior used was tMVN with bounds [0.01, 10] ×
[0.01, 10], , and .
Figure 5, shows varying
deconvolution attempts for three test data participants, whereas Figure 6 shows deconvolution attempts for
the same test data participant reading (BT333), with varying
amounts of training subject data within the posterior eq. (2.22), R = 25, 75,
145. In both Figures 5 and 6, gray bands represent 90% error regions
that are determined by sampling respective parameter posterior distributions
and utilizing these samples with (3.13) to determine estimated BrAC values.
It follows that these error regions contain the 90% credible regions for the
pointwise BrAC values as functions of the population parameters appearing in
the model. This is the basis for our referring to them in what follows as
conservative credible bands.
Discussion
Bayesian estimation of model parameters
Figure 2 illustrates rapid
convergence in dimensionality of our spatial dimensions as n
grows, thus bolstering the results of Theorem
3.1. Within two steps (n = 3), we have a graph that
visually differs from that of n = 25 in ways barely
perceptible. Paired with the credible circles in Table 1, these provide evidence that after n = 3
the mean and radius of the credible circles stay
consistent. Thus one can choose a computationally efficient n
value that minimizes data lost when projecting eq. (2.6) into finite dimensions, eq. (2.11).For the consistency results, Table
2 exemplifies the theoretical prediction in Theorem 3.5 that as the amount of subject data
R grows, the posterior distribution better predicts the
true value by localizing the true parameter
0 in mean with higher
confidence (smaller credible circles). This increasing confidence is backed by
the decreasing variance results shown in Figure
4. Notice that although the variance decreases, the mean is allowed
to shift as more data are incorporated, as evident from comparing Figure 4c to Figure
4d. This shifting mean is permitted by the theoretical results and is
likely due to the incorporation of 70 extra data points. Table 3 displays the shifting of the mean as more
data are incorporated while quantitatively displaying a decreasing 90% credible
circle radius, as expected.As a final note, recall that TAC data were collected simultaneously
from both the right and left arms of participants. For an investigation into
this see [32].In Figure 5a, the deconvolved mean
BrAC curve more closely resembles the overall curve of the measured TAC values
rather than the desired BrAC, with its increased values towards the latter part
of the curve. This is to be expected as the measured TAC plays a role in the
Bayesian step, but notice that the severity of the increase in the mean value
curve is attenuated when compared to that of the TAC curve (red vs. yellow
curves at the five hour mark). A similar phenomenon also appears in Figure 6a. For Figures 6a to 6c, as the number
of subject drinking episodes R increases, we find that the mean
curve grows towards the actual BrAC curve, an expected convergence phenomenon
given the theoretical consistency results from Section 3.2.Lastly, the 90% conservative credible bands about the deconvolved BrAC
curves appear to always have a lower bound of zero. For the upper bound, the
extreme case is shown in Figure 5c. These
wide ranges in BrAC values allow us to capture the true BrAC value with high
probability, but also leave us capturing far more area under the curve than
needed. Thus, there are times when our two-step method would falsely signal that
the TAC device wearer is far more inebriated than they actually are. This
incorrect signaling might be due in part to the quantitative inaccurate readings
in Figure 5c, wherein the TAC curve is
greater than the BrAC curve. If our (training) data are mainly composed of the
other cases (TAC following BrAC at an attenuated rate), then the algorithm will
learn to “guess up” when turning the TAC back into BrAC. Lastly,
this phenomenon may be due to the use of an uninformed prior as the credible
regions in Table 3 do not approach zero.
Hence, in the future use of an informed prior may be preferable.
Concluding remarks
We believe that the i.n.i.d. assumption from Section 2.2 (specifically Section 3.2) may not reflect the realities of the data
collection method wherein two sensors are worn simultaneously on
participants’ left and right arms. We are currently investigating the
elimination of this i.n.i.d assumption. However, the results from Section 3.4 are quite reasonable and are
extremely useful when seeking to use this approach computationally in practice.
Further investigation is needed regarding the traveling mean exhibited in the
numerical results and how it is related to the non-inclusion of other covariate
data (age, height, weight, etc.). This investigation may also be aided by
attempting to combine the results of Sections
3.1 and 3.2 and let both the
approximating dimension of the kernel, as well as the amount of training data,
go to infinity simultaneously.We also believe that the packaging of all error sources into a single
random variable in Section 2.2 may yield
larger uncertainties than formulations where many additive errors are
considered. Namely, mixed-effects formulations may be utilized in order to
separate errors and might lower overall uncertainty. However, the results from
Section 3.4 are again quite
reasonable, and the usage of mixed-effects formulations can be left as a design
choice when considering the main goals and implementations of the PDE model from
Section 2.1.When our approach and results are optimized for use in actual practice,
some care will have to be taken in regard to the sampling methods used in Sections 3.4.1 and 3.4.2. If Markov Chain Monte Carlo methods are still
the method of choice, then issues such as sample size, convergence of the
chains, and randomized chain starting points will need to be taken into account.
In addition, a laboratory protocol will be need to be developed to estimate the
sensor-dependent values of q3 and
q4 that appear in eq. (2.3). As far as the numerical results
presented in Section 3.4 are concerned in
regard to the values chosen for q3 and
q4, they primarily serve to reinforce the
theoretical results in Sections 3.1 and
3.2.Finally, Of primary interest is the direct inversion of BrAC,
u, given TAC as in eq. (2.12) without the need for a two-step process like that of the
method used in this paper. We believe that a hierarchical model paired with a
Gaussian Process framework may reduce the problem down to a single step (see,
[52]). In such a framework, we place
a prior on , as well as a function space prior
over u. In this way, we obtain a method that statistically
deconvolves BrAC from TAC while providing a distribution from which we may
derive error bars on the estimated BrAC values. We are also currently examining
the inclusion of another hierarchical Bayesian model that incorporates
covariates in both priors placed over and
u. We believe that this will improve the accuracy of our
predictions by allowing the use of all available subject and environment
data.
Table 4.
Data associated with posterior determination and deconvolution used in
Figure 6.
Authors: Donald M Dougherty; Nathalie Hill-Kapturczak; Yuanyuan Liang; Tara E Karns; Sharon E Cates; Sarah L Lake; Jillian Mullen; John D Roache Journal: Drug Alcohol Depend Date: 2014-07-11 Impact factor: 4.492
Authors: Donald M Dougherty; Nora E Charles; Ashley Acheson; Samantha John; R Michael Furr; Nathalie Hill-Kapturczak Journal: Exp Clin Psychopharmacol Date: 2012-06-18 Impact factor: 3.157
Authors: Nathalie Hill-Kapturczak; John D Roache; Yuanyuan Liang; Tara E Karns; Sharon E Cates; Donald M Dougherty Journal: Psychopharmacology (Berl) Date: 2014-06-13 Impact factor: 4.530
Authors: Tara E Karns-Wright; John D Roache; Nathalie Hill-Kapturczak; Yuanyuan Liang; Jillian Mullen; Donald M Dougherty Journal: Alcohol Alcohol Date: 2016-08-13 Impact factor: 2.826