
An opponent process for alcohol addiction based on changes in endocrine gland mass.

Abstract

Consuming addictive drugs is often initially pleasurable, but escalating drug intake eventually recruits physiological anti-reward systems called opponent processes that cause tolerance and withdrawal symptoms. Opponent processes are fundamental for the addiction process, but their physiological basis is not fully characterized. Here, we propose an opponent-process mechanism centered on the endocrine stress response, the hypothalamic-pituitary-adrenal (HPA) axis. We focus on alcohol addiction, where the HPA axis is activated and secretes β-endorphin, causing euphoria and analgesia. Using a mathematical model, we show that slow changes in the functional mass of HPA glands act as an opponent process for β-endorphin secretion. The model explains hormone dynamics in alcohol addiction and experiments on alcohol preference in rodents. The opponent process is based on fold-change detection (FCD) where β-endorphin responses are relative rather than absolute; FCD confers vulnerability to addiction but has adaptive roles for learning. Our model suggests gland mass changes as potential targets for intervention in addiction.


Keywords: Human Physiology; Mathematical Biosciences; Systems Biology

Year: 2021 PMID: 33665551 PMCID: PMC7903339 DOI: 10.1016/j.isci.2021.102127

Source DB: PubMed Journal: iScience ISSN: 2589-0042

Introduction

Drug addiction is a process in which the individual becomes increasingly occupied by drug-seeking and drug-taking behavior, escalates drug taking over time, and has difficulties quitting. Addiction has several affective stages. While the contribution of each of these stages to the addiction process is debated (Wise and Koob, 2014), it is acknowledged that the stages are shared among many addictive drugs, including alcohol, opiates, and cocaine. The first stage is the initiation stage, which is related to positive reinforcement by drug administration. Next, the phenomenon of hedonic tolerance sets in: increasing amounts of drug are needed to produce the same effect. This leads to the maintenance stage in which the drug is taken in part to avoid the negative effects of withdrawal. Changes in learning and memory systems result in compulsive drug-seeking habits and increase subsequent risk of relapse (Hyman et al., 2006; Milton and Everitt, 2012; Robbins, 2002). Once drug use is stopped, the withdrawal stage occurs, characterized by persistent negative affect (Koob, 2011; Koob and Le Moal, 2001). This provides negative reinforcement against cessation of drug use. Withdrawal gradually gives way to recovery (Figure 1A).

Figure 1

Opponent-process theory explains the affective stages of addiction

(A) According to the opponent-process theory of drug addiction (Koob and Le Moal, 2001; Koob and Volkow, 2010), the intake of an addictive drug activates two processes: a fast, primary process that causes the pleasurable effects of the drug and a slow “anti-reward” opponent process which results in negative affect. Repeated intake over long periods of time desensitizes the primary process and sensitizes the opponent process, resulting in allostasis of mood and sustained negative affect after drug withdrawal.

(B) Schematic illustrations of the effects of single and repeated drug intake on mood.

(C) The opponent process framework can explain the stages of addiction to various drugs: initiation (weeks), tolerance that leads to maintenance (weeks to decades), and withdrawal (weeks to months).

(D) Regulation of β-endorphin by alcohol. β-endorphin is secreted from POMC neurons in response to CRH from the hypothalamus, denoted H. CRH also causes pituitary corticotrophs to secrete ACTH, which is cleaved from the same peptide as β-endorphin. ACTH causes the adrenal gland to secrete cortisol, which, in turn, inhibits the secretion of CRH and the pituitary secretion of ACTH and β-endorphin. Cortisol is therefore a candidate component of an opponent process mechanism for the rewarding effects of β-endorphin.

(E and F) This is supported by data indicating that cortisol is elevated after long-term alcoholism (E, data are obtained from the study by [Stalder et al., 2010], and data are represented as mean ± SEM), and that alcohol preference is reduced in adrenalectomized animals and restored by cortisol replacement (F, data are obtained from the study by [Fahlke et al., 1994], and data are represented as boxplot with median line and whiskers for the lowest and highest values).

These affective changes have been studied within the opponent-process theory of addiction (Koob and Le Moal, 1997; Koob and Volkow, 2016; Solomon, 1980) (Figures 1A, 1B, and 1C). This theory posits that chronic drug use activates anti-reward processes called opponent processes. Opponent processes are slow secondary processes that are activated by the drug and that antagonize the primary, pleasurable process (Figure 1A). After initial drug use, the opponent process causes a slight and slow-to-decay undershoot in mood. Repeated, extensive drug use sensitizes the secondary process and desensitizes the primary process (a phenomenon known as “reward allostasis”), leading to deficient primary hedonic responses and exaggerated negative affect upon drug withdrawal. This explains the transition from the initial euphoric stage of drug use to the maintenance stage, where the drug is taken to avoid the negative effects of withdrawal that are due to the opponent process. An important challenge for addiction research is to identify the molecular mechanisms of these opponent processes. The difficulty lies in the complexity of the physiological circuits that underlie addiction. These reward-processing circuits control pleasure, pain, and reinforcement (Berridge and Robinson, 2016; Kelley and Berridge, 2002; Koob and Le Moal, 1997; Koob and Volkow, 2016; Solomon, 1980). Addictive drugs including alcohol, cocaine, and opiates affect the levels of specific neurotransmitters and other secreted molecules, which, in turn, affect the brain regions involved in reward processing (Koob and Volkow, 2016). Since these circuits are intricate and involve feedback over multiple timescales, mathematical models are essential for analyzing their relation with addiction dynamics and for pointing to potential opponent processes.
Most proposals for opponent processes involve neurotransmitter circuits, primarily dopamine, a key neuromodulator of learning and motivation (Berridge and Robinson, 1998; Glimcher, 2011; Schultz, 1998; Schultz et al., 1997; Wise, 2004). One such model was developed by Gutkin et al. to study nicotine addiction (Gutkin et al., 2006). Gutkin et al. analyzed the interaction between nicotine and nicotine receptors expressed in dopamine-secreting neurons. They proposed a model in which, on the fast timescale, nicotine causes dopamine to be secreted and upregulates the phasic dopamine response, while on a much slower timescale, chronic nicotine administration activates an opponent process that downregulates tonic dopamine responses. Upon nicotine withdrawal, this slow process causes an undershoot in tonic dopamine. Other proposed opponent processes include the recruitment of stress-associated neurotransmitters such as dynorphin and corticotropin-releasing hormone (CRH) in the amygdala and various other neurochemical systems (reviewed in Koob and Volkow, 2016). Since opponent processes are important for understanding addiction, it is useful to propose additional physiological systems that can provide opponent process properties. Here, we propose a new class of opponent processes, based not on neurotransmitters but instead on endocrine circuits. For this purpose, we model the dynamics of an endocrine system that plays an important role in alcohol addiction: the secretion of endogenous opioids following HPA axis activation (Figure 1D). Endogenous opioids control subjective reward, euphoria and pain sensitivity (Darcq and Kieffer, 2018; Drews and Zimmer, 2010; Gerrits et al., 2003; Gianoulakis, 2004; Kiefer et al., 2002; Kuzmin et al., 1997; Mitchell et al., 2012; Roberts et al., 2000; Roth-Deri et al., 2008; Trigo et al., 2009; Van Ree, 1996).
They are secreted in response to stress after activation of the HPA axis (Nikolarakis et al., 1986; Tsigos and Chrousos, 2002) and affect dopamine release (Spanagel et al., 1992). In particular, β-endorphin, which causes euphoria and analgesia by binding to the same receptor as morphine, has been tightly linked with addiction to alcohol and other substances that activate the HPA axis (Gianoulakis, 2004; Kiefer et al., 2002; Roth-Deri et al., 2008; Trigo et al., 2009). Rodents that lack β-endorphin or its receptor (the mu-opioid receptor) show diminished alcohol self-administration (Hall et al., 2001; Racz et al., 2008; Roberts et al., 2000), and deficiency in β-endorphin during the weeks after alcohol withdrawal is associated with withdrawal anxiety in patients (Kiefer et al., 2002). The importance of β-endorphin for alcohol addiction raises the question of what its opponent process may be. One candidate is the secretion of the stress hormone cortisol from the adrenal gland. Cortisol inhibits β-endorphin secretion both directly and by inhibition of CRH secretion. This was shown by experiments employing adrenalectomy, as well as by pharmacological manipulations such as the administration of dexamethasone and of the glucocorticoid synthesis inhibitor metyrapone (Guillemin et al., 1977; Hargreaves et al., 1987; Holaday et al., 1979; Kreek et al., 1984; Lim et al., 1982; Nakao et al., 1978; Rivier et al., 1982; Young, 1989). Cortisol is greatly elevated during periods of excessive alcohol consumption (Figure 1E; Esel et al., 2001; Stalder et al., 2010) and returns to baseline over a few weeks after alcohol withdrawal (von Bardeleben et al., 1989; Esel et al., 2001; Kiefer et al., 2002; Marchesi et al., 1997).
Abolishing cortisol secretion by the removal of the adrenal glands diminishes alcohol preference in rodents, and alcohol preference in adrenalectomized animals is rescued by corticosterone replacement (Figure 1F) (Fahlke, 2000; Fahlke et al., 1994; Hansen et al., 1995; Lamblin and De Witte, 1996). Similar results were reported for cocaine (Goeders, 2002; Goeders and Guerin, 1996). Since adrenalectomy removes the negative feedback from β-endorphin secretion (Young, 1989), one may hypothesize that cortisol secretion is an opponent process for β-endorphin. However, cortisol has a half-life of only about 1 h, similar to that of β-endorphin (Foley et al., 1979; McKay and Cidlowski, 2003). This is much faster than the timescale of days to weeks expected for an opponent process, suggesting that adrenal secretion of cortisol cannot by itself be the opponent process for β-endorphin secretion. Here, we propose that the opponent process for β-endorphin secretion is structural change of the HPA glands following repeated alcohol intake, namely the growth of the total functional mass of the hormone-secreting cells in the adrenal and pituitary glands. We show this by analyzing the dynamics of β-endorphin secretion following acute and repeated alcohol intake, using a recently developed mathematical model of the HPA axis (Karin et al., 2020). The important aspect of this model is that it includes the changes in the total cell mass of the HPA glands due to cell proliferation and growth. This growth is caused by the HPA hormones, which act as growth factors for specific HPA glands. Changes in gland masses are well documented in conditions associated with HPA activation in humans (Amsterdam et al., 1987; Carey et al., 1984; Doppman et al., 1988; Ludescher et al., 2008; Nemeroff et al., 1992; Rubin et al., 1995, 1996), including enlarged adrenal glands in chronic alcohol abuse (Carsin-Vu et al., 2016). 
Adrenal weight in rodents also increases after alcohol administration (Adams and Hirst, 1984; Đikić et al., 2011; Mendelson et al., 1971). The gland growth mechanism adds a timescale of weeks to the hour timescale of the HPA hormones. The model shows that while alcohol initially increases β-endorphin levels, persistent HPA activation by alcohol causes the HPA glands to grow, causing increased cortisol secretion which renormalizes β-endorphin levels and suppresses them upon withdrawal. The gland masses thus provide an opponent process, explaining data on β-endorphin and HPA hormones during alcohol addiction and withdrawal, as well as the effect of adrenalectomy and cortisol replacement on alcohol preference in rodents. The main message of this study is thus that a weeks-scale feedback loop in which HPA gland masses change over time provides an opponent process. Using the model, we analyze the fundamental reason for the HPA opponent process: fold-change detection (FCD) for β-endorphin dynamics in response to drug inputs. FCD is a property of biological circuits (Adler and Alon, 2018; Shoval et al., 2010) where the output of the circuit responds to “relative” changes in the input, rather than absolute changes. The model makes several predictions that can be tested experimentally. We show using a minimal model of reward optimization that FCD circuits that control subjective reward are uniquely fragile to addiction because they maintain sensitivity to reward at increasing levels of drug intake. In addition to this fragility, FCD control of subjective reward has two beneficial functions for reward-based learning. The first is the ability to learn across many orders of magnitude of rewards. The second benefit is that FCD implements an important concept from reinforcement learning, called potential-based reward shaping (Ng et al., 1999), which helps an individual to learn by adding auxiliary rewards that guide exploration. 
Thus, FCD has beneficial properties for learning, but has fragility to addiction.

Results

A model for drug-induced regulation of β-endorphin

To characterize the HPA opponent process, we model the regulation of β-endorphin (Figure 1D, see Table 1 for model parameters). β-endorphin is an endogenous opioid that is secreted in response to hypothalamic CRH from corticotroph cells in the pituitary gland and from Pro-opiomelanocortin (POMC) neurons. Alcohol taking causes CRH secretion and hence β-endorphin secretion (Čupić et al., 2017; Lee et al., 2001, 2004; Rivier, 1996; Rivier et al., 1984; de Waele and Gianoulakis, 1993). In addition to causing the secretion of β-endorphin, CRH also activates the HPA endocrine cascade which results in the secretion of the stress hormone cortisol. Cortisol, in turn, provides negative feedback on CRH and β-endorphin. This feedback acts on the timescale of minutes to hours and cannot by itself provide slow opponent properties on the timescale of days to weeks.

Table 1

Parameter values for HPA model

Parameter	Value
w1	0.17/min (Andersen et al., 2013)
w2	0.035/min (Andersen et al., 2013)
w3	0.009/min (Andersen et al., 2013)
w4	0.0019/min (Foley et al., 1979)
wC	0.099/day
wA	0.049/day
KGR	8
nGR	3 (Andersen et al., 2013)

To address the slow timescale, we recently developed a mathematical model of the HPA axis on the timescale of weeks (Karin et al., 2020). We showed that these dynamics can explain the week-long changes in responses to CRH tests after withdrawal from prolonged alcohol abuse (Karin et al., 2020). Here, we add β-endorphin to this model and use it to study addiction. The week timescale in the model is due to growth of the total mass of the pituitary corticotroph cells that secrete ACTH and β-endorphin and the adrenal cortex cells that secrete cortisol. Hereafter, we call such total cell masses “gland masses” for brevity, although the glands also contain other cell types. The mass of the gland affects the amount of hormone it secretes for a given input signal: twice the mass is assumed to result in twice the hormone secretion. These cells constantly turn over, with a turnover time of weeks. Their major growth factors are the HPA hormones themselves: CRH causes corticotroph mass to grow (Gertz et al., 1987; Westlund et al., 1985; Carey et al., 1984; Horvath, 1988; Schteingart et al., 1986; O'Brien et al., 1992; Asa et al., 1992; Bruhn et al., 1984; Young and Akil, 1985; Gulyas et al., 1991; McNicol et al., 1988), and ACTH causes the adrenal cortex mass to grow (Dallman, 1984; Lotfi and de Mendonca, 2016; Swann, 1940; Ulrich-Lai et al., 2006). To describe the dynamics of β-endorphin following alcohol intake, we developed equations as described in Transparent Methods (Figure 2A). In the model, alcohol taking is described as an input that is additive with the basal circuit input, so the total input is the sum of the basal input and the alcohol-induced input (Figure 2A). Persistent HPA activation by alcohol increases the hormone levels, and since these hormones act as growth factors for the glands, the gland masses grow on the timescale of weeks.

Figure 2

Opponent process for β-endorphin secretion based on gland mass changes in the HPA axis

(A) We model HPA control of β-endorphin by adding β-endorphin to our recent model of the HPA axis (Karin et al., 2020) that incorporated week-timescale interactions for the gland masses (red arrows). CRH causes the growth of pituitary corticotroph cells, and ACTH causes the growth of the adrenal cortex cells. The model equations and parameters are provided in Transparent Methods.

(B) The model provides opponent process properties, where the primary process is the secretion of β-endorphin following HPA activation, and the opponent process is changes in cortisol secretion due to change in adrenal mass following prolonged HPA activation. The results generalize to other mood-enhancing factors that are secreted in response to alcohol intake and that are inhibited by cortisol.

(C) A prolonged increase in alcohol taking causes β-endorphin to initially increase (primary process) but then return to baseline (opponent process). The reason for this is the growth of the gland masses, which increases the negative feedback of cortisol on β-endorphin. The return of β-endorphin to baseline develops over the timescale of weeks, which is much slower than the half-life of β-endorphin. Stopping alcohol intake after months leads to a drop in β-endorphin levels which lasts for a few weeks, together with an undershoot in pituitary corticotroph mass. Shown is pituitary β-endorphin; similar secretion patterns occur for β-endorphin secreted from POMC neurons.

To test whether the glands can implement an opponent process for β-endorphin secretion (Figure 2B), we consider a months-long input pulse that corresponds to a prolonged increase in average alcohol taking (Figure 2C, similar results are obtained by simulating discrete drinking episodes). Initially, β-endorphin and cortisol levels increase because the drug activates the HPA axis.
The increase in β-endorphin after drug intake is the “a-process” in the opponent-process model and has a timescale of a few hours. This occurs without significant change in the gland masses. Over the next several weeks, gland masses adjust, as described previously (Karin et al., 2020). A change in gland mass is assumed to cause a proportional change in the rate of secretion of the hormone produced by the gland. This is because we consider the secretion rate of each hormone as a product of three factors: the effect of the signals (agonists like CRH and antagonists like glucocorticoids), the maximal secretion capacity per unit biomass, and the total biomass of the cells. These factors are separated by timescales: signals can change over minutes, maximal secretion capacity over hours (due to protein expression changes), and total cell biomass over days to weeks (due to hypertrophy and hyperplasia). Thus, at a given level of signals and at a given maximal secretion capacity per unit biomass, the production of the hormone is proportional to the total cell biomass: doubling the biomass doubles secretion. The model takes into account both the fast changes in signals and the slow changes in gland masses. The separation into these factors allows us to consider the effects of gland mass changes over weeks and disentangle them from faster effects. It also allows us to follow changes in the mass of two cell types (pituitary corticotrophs and adrenal cortex cells) that change at the same time. With these considerations, we see that HPA activation causes growth of pituitary corticotroph mass over weeks, due to the action of CRH as a growth factor. The enlarged biomass enhances β-endorphin secretion. In parallel, HPA activation makes the adrenal mass grow because of increased levels of its growth factor, ACTH. Growth of the adrenal mass causes increased secretion of glucocorticoids, which eventually suppresses β-endorphin secretion.
This acts to renormalize β-endorphin levels back to baseline (mathematically, this is due to the integral feedback in Equations 4 and 5 in Transparent Methods). Pituitary corticotroph mass also adapts back to baseline (Figure 2C). Such a growth in adrenal mass and adaptation of corticotroph mass is consistent with a recent single-cell study of murine HPA under chronic stress (Lopez et al., 2021). The change in the adrenal mass is the slow “b-process” in the opponent-process model. It causes suppression of β-endorphin on a timescale of days to weeks. The same effects cause an undershoot of β-endorphin which lasts for weeks to months upon alcohol withdrawal after prolonged intake. The undershoot is caused by the enlarged adrenal cortex which returns to its baseline size over several weeks, together with an undershoot in pituitary corticotroph mass (Figure 2C). These dynamics are qualitatively similar to those predicted by Gutkin et al. for the effect of nicotine on tonic dopamine through nicotinic receptors (Gutkin et al., 2006) but arise due to different physiological processes. These opponent-process stages of gland growth and shrinkage occur regardless of model parameters, as long as the turnover time of the gland cells is much slower than the half-lives of the hormones. The opponent process is thus a robust prediction of the model. The model explains several important observations regarding alcohol addiction. It explains the large increase in average cortisol levels seen after prolonged drinking (Stalder et al., 2010), the acute elevation of β-endorphin levels after alcohol administration (Frias et al., 2002; Marinelli et al., 2003), and the prolonged drop seen in β-endorphin levels after alcohol withdrawal (Aguirre et al., 1990; Esel et al., 2001; Inder et al., 1995; Kiefer et al., 2002; Marchesi et al., 1997; Vescovi et al., 1992). 
More generally, it captures the dynamics of dysregulation of HPA hormones during alcohol withdrawal (von Bardeleben et al., 1989; Karin et al., 2020). These effects are due in the model to the slow changes in the gland masses.
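
The mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the equations of Transparent Methods: hormones are taken to be at quasi-steady state (they equilibrate within hours), GR feedback is neglected, all variables are normalized so that the drug-free steady state equals 1, β-endorphin is assumed to follow the same secretion law as ACTH, and wC and wA from Table 1 are read as the turnover rates of corticotroph and adrenal mass. The function names and the input pulse are hypothetical.

```python
import math

# Assumed reading of Table 1: gland-mass turnover rates (per day).
W_C = 0.099  # pituitary corticotroph mass
W_A = 0.049  # adrenal cortex mass


def hormones(u, P, A):
    """Quasi-steady-state hormone levels for input u and gland masses P, A.

    Assumed reduced circuit (GR feedback neglected):
      CRH      x1 = u / x3      (cortisol inhibits CRH)
      ACTH     x2 = P * x1      (scales with corticotroph mass)
      cortisol x3 = A * x2      (scales with adrenal mass)
    Solving these three relations gives x3 = sqrt(A * P * u).
    """
    x3 = math.sqrt(A * P * u)
    x1 = u / x3
    x2 = P * x1
    beta = P * x1  # beta-endorphin, assumed co-secreted with ACTH
    return x1, x2, x3, beta


def simulate(u_of_t, t_end, dt=0.05):
    """Forward-Euler integration of the slow gland-mass dynamics (time in days)."""
    P, A = 1.0, 1.0  # drug-free steady state
    trace = []
    for i in range(int(round(t_end / dt)) + 1):
        t = i * dt
        u = u_of_t(t)
        x1, x2, x3, beta = hormones(u, P, A)
        trace.append((t, u, P, A, x3, beta))
        # Each hormone acts as growth factor for the gland downstream of it.
        P += dt * W_C * P * (x1 - 1.0)  # CRH drives corticotroph growth
        A += dt * W_A * A * (x2 - 1.0)  # ACTH drives adrenal growth
    return trace


def alcohol_pulse(t):
    """Hypothetical input: basal input 1, doubled between day 30 and day 210."""
    return 2.0 if 30.0 <= t < 210.0 else 1.0


trace = simulate(alcohol_pulse, 400.0)

if __name__ == "__main__":
    for t, u, P, A, x3, beta in trace[::600]:  # one row every 30 days
        print(f"day {t:5.0f}  u={u:.0f}  P={P:.2f}  A={A:.2f}  "
              f"cortisol={x3:.2f}  beta-endorphin={beta:.2f}")
```

In this sketch, β-endorphin jumps by a factor of √2 when the input doubles, relaxes back toward 1 over weeks as the adrenal mass approaches 2 and the corticotroph mass returns toward 1, and drops to about 1/√2 on withdrawal before recovering. Cortisol stays elevated for as long as the input is elevated.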

Opponent-process behavior is due to a fold-change detection property

The opponent-process behavior of β-endorphin is due to a systems-level feature of the HPA model called fold-change detection (FCD) (Figure 3A). This property has been extensively studied in systems biology. A system with FCD has an output whose entire dynamics depends only on relative changes in input, rather than absolute changes (Adler and Alon, 2018; Shoval et al., 2010). All systems with FCD also have the property of exact adaptation, where the output adapts precisely back to baseline after step changes in its input (Ferrell, 2016; Shoval et al., 2010).
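
A numerical check can make the FCD property concrete. The sketch below uses an assumed reduced form of the circuit (quasi-steady-state hormones, no GR feedback, normalized units, wC and wA from Table 1 read as gland turnover rates), which may differ from the equations in Transparent Methods; `beta_response` is a hypothetical helper. It starts from a state adapted to a basal input and applies a step of a given fold.

```python
import math

W_C = 0.099  # per day, assumed corticotroph turnover rate (Table 1, wC)
W_A = 0.049  # per day, assumed adrenal turnover rate (Table 1, wA)


def beta_response(u_base, fold, days=60.0, dt=0.05):
    """Beta-endorphin time course after the input steps from u_base to fold * u_base.

    The circuit starts fully adapted to u_base, which in this reduced model
    means corticotroph mass P = 1 and adrenal mass A = u_base.
    """
    P, A = 1.0, u_base
    u = u_base * fold
    out = []
    for _ in range(int(round(days / dt))):
        x3 = math.sqrt(A * P * u)  # cortisol at quasi-steady state
        x1 = u / x3                # CRH, inhibited by cortisol
        beta = P * x1              # beta-endorphin (same law as ACTH, assumed)
        out.append(beta)
        P += dt * W_C * P * (x1 - 1.0)
        A += dt * W_A * A * (beta - 1.0)
    return out


# Same twofold step, very different absolute input levels.
low = beta_response(u_base=1.0, fold=2.0)
high = beta_response(u_base=5.0, fold=2.0)
max_diff = max(abs(a - b) for a, b in zip(low, high))

if __name__ == "__main__":
    print(f"largest difference between the two responses: {max_diff:.2e}")
```

Because the adapted adrenal mass scales with the basal input, the two trajectories coincide up to floating-point rounding: only the fold change of the input matters, not its absolute size.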

Figure 3

FCD control of subjective reward provides the hallmarks of the opponent process model and is prone to addiction

(A) Fold-change detection is a property of biological circuits where, starting from steady-state conditions, the circuit responds to relative rather than absolute changes in input.

(B) A pulse input (dark arrow) causes a double-lobed response with a fast positive response followed by a slow negative response, so that in the linear regime the overall integral is zero. This property entails that positive hedonic responses (“a-process”) are balanced by subsequent negative hedonic responses (“b-process”). Repeated pulses desensitize the positive response and sensitize the negative response, explaining reward allostasis.

(C) FCD circuits maintain sensitivity to drug intake by keeping the system away from saturation. We demonstrate this with a schematic illustration, using a Hill function which maps between β-endorphin levels and subjective reward. Initially, the individual increases drug intake to maximize subjective reward. This causes β-endorphin to increase so that subjective reward rises closer to saturation (arrow marked (i)). Then, due to FCD, β-endorphin secretion adapts back to baseline over weeks, and subjective reward returns to baseline (ii). The circuit maintains its sensitivity relative to the new baseline so that a larger drug increase is required to get the same rise in subjective reward. Finally, when the individual ceases to consume the drug, β-endorphin and subjective reward fall (iii), until β-endorphin re-adapts and returns to baseline over weeks (iv).

(D) Four prototypical circuits for the control of subjective reward, FCD (red lines) where subjective reward depends on the fold change of drug input, negative or positive feedback (green and orange lines, respectively), where a slow process Z feeds back on subjective reward, and activation (blue lines), where the drug input translates directly into subjective reward (equations in Transparent Methods).

(E) Simulations of drug intake that rises linearly with time. In all circuits except FCD, drug-induced reward rises with the increase in drug intake. Withdrawal mood drops without bound only for the FCD circuit.

(F) The FCD circuit (red line) maintains preference for an increase in drug intake even at high levels of drug intake, whereas other circuits show diminishing preference at high intake levels.

The HPA model shows that the dynamic response of CRH depends only on relative changes in input, rather than absolute changes (see Transparent Methods for the proof). Because β-endorphin is secreted in response to CRH, it also has the FCD property (see Transparent Methods for the proof). The FCD property holds at low to moderate levels of HPA axis activation, when the low-affinity glucocorticoid receptor (GR) activation is weak. At higher levels of HPA activation, GR feedback weakens FCD and causes reduced responses to the same fold changes in input. The reason for FCD is that gland masses grow in response to an increase in input on the timescale of weeks. The increased gland mass results in higher levels of cortisol. Cortisol levels thus rise proportionally to the input level. Since cortisol inhibits CRH and β-endorphin secretion, the latter are “normalized” by the input and show exact adaptation back to their baseline (Figure 2B), as well as fold-change responses (Figure 3A). The FCD property provides the essential hallmarks of an opponent process (Figure 3B), as defined by Koob and Volkow (2010).
These hallmarks concern the hedonic responses to single and repeated episodes of drug intake. After drug intake, a positive hedonic response occurs and matches the intake's duration and intensity. The negative hedonic response, which is due to the opponent process, follows the positive response, with a slow and prolonged decay. Repeated exposure to the drug reduces the positive response and increases the negative response, so the overall average hedonic response remains constant (this is referred to as reward allostasis). All of these hallmarks are captured by FCD (Figure 3B). The positive response of an FCD circuit to a pulse input is always followed by a slow trailing negative response because the overall integral of the response must be zero (Figure 3B, proof in Transparent Methods). Because FCD circuits adapt to the average input level, repeated exposure to drug intake desensitizes the positive response and sensitizes the negative response, leading to reward allostasis (Figure 3B). We therefore propose that FCD, a concept from systems biology that is common in sensory circuits, aligns with the essential features of the opponent-process theory.

To make these notions more precise, we attempted to connect these FCD properties to the reward aspects of the opponent-process theory more formally. For this purpose, we need to operationally define three variables in quantitative terms: subjective reward, drug reward, and withdrawal mood. We distinguish between the basal input to the HPA axis and the input due to drug intake at time t, which together make up the total input to the HPA axis, u. We take the generic assumption that the effect of β-endorphin on subjective reward is a saturating (Hill) function, which describes the saturation of the opioid receptors, or downstream saturation of subjective reward in the brain. Thus, the instantaneous subjective reward is a Hill function of the level of β-endorphin, which in turn depends on the HPA axis input u.
The total subjective reward R from taking a drug is the subjective reward accumulated over a time period after taking the drug. As is customary in the reinforcement learning literature, R is given by the discounted integral over the instantaneous reward, in which a “future discounting” factor down-weights later rewards. Individuals adjust drug intake over time to maximize R. We assume that the drug reward is the total subjective reward and that the mood during withdrawal from the drug is the total subjective reward without the drug. With these definitions, the changes in subjective reward in the addiction process are illustrated in Figure 3C. Initially, drug use, which transiently increases β-endorphin levels, causes an increase in subjective reward (Figure 3C, arrow (i)). This is the initiation phase. After a few weeks, adaptation due to FCD brings β-endorphin and subjective reward back to baseline, despite the increase in drug intake (Figure 3C, arrow (ii)). The circuit therefore produces tolerance to the subjective reward of the drug, leading to escalation of drug intake. Withdrawal from the drug causes a drop in subjective reward (Figure 3C, arrow (iii)) which resolves after several weeks (Figure 3C, arrow (iv)). We conclude that the β-endorphin circuit, in which β-endorphin responds to fold changes in input, shows the hallmarks of the opponent-process model of addiction.
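The fold-change property and the double-lobed pulse response described above can be illustrated with a minimal sketch. It does not reproduce the HPA model of this study (whose equations are in the Transparent Methods); it assumes instead a generic two-variable FCD motif, sometimes called nonlinear integral feedback, in which a slow variable `x` stands in for gland mass and a fast output `y` stands in for β-endorphin. The function names, the timescale ratio `eps`, and the input shapes are all hypothetical choices made for illustration.

```python
def simulate_fcd(u_of_t, x0, t_end, dt=0.001, eps=0.2):
    """Euler-integrate a generic two-variable FCD circuit (illustrative only).

    dx/dt = eps * x * (y - 1)   slow variable, standing in for gland mass
    dy/dt = u / x - y           fast output, standing in for beta-endorphin

    eps is an assumed timescale ratio, not a parameter from the paper.
    """
    x = x0
    y = u_of_t(0.0) / x0  # equals the adapted baseline y = 1 when x0 == u(0)
    ts, ys = [], []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        ts.append(t)
        ys.append(y)
        u = u_of_t(t)
        dx = eps * x * (y - 1.0)  # both derivatives use the old state
        dy = u / x - y
        x += dt * dx
        y += dt * dy
    return ts, ys


def lobe_areas(ts, ys):
    """Areas of the response above and below the baseline y = 1."""
    dt = ts[1] - ts[0]
    pos = sum(max(y - 1.0, 0.0) for y in ys) * dt
    neg = sum(min(y - 1.0, 0.0) for y in ys) * dt
    return pos, neg


if __name__ == "__main__":
    # Same 2-fold step from two different baselines (1 -> 2 and 5 -> 10).
    _, y_low = simulate_fcd(lambda t: 1.0 if t < 1.0 else 2.0, 1.0, 40.0)
    _, y_high = simulate_fcd(lambda t: 5.0 if t < 1.0 else 10.0, 5.0, 40.0)
    gap = max(abs(a - b) for a, b in zip(y_low, y_high))
    print("largest difference between the two step responses:", gap)

    # A transient pulse: fast positive lobe, slow negative lobe.
    ts, y_pulse = simulate_fcd(lambda t: 2.0 if 1.0 <= t < 2.0 else 1.0, 1.0, 80.0)
    pos, neg = lobe_areas(ts, y_pulse)
    print("positive lobe area:", pos, "negative lobe area:", neg)
```

In this motif the logarithm of `x` integrates `y - 1`, so once `x` has returned to its pre-pulse value the positive and negative lobes must cancel. This mirrors the balance between the a-process and the b-process, under the stated assumptions.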

FCD circuits are more vulnerable to addiction than other circuits

FCD thus seems to provide the hallmarks for addiction, by providing a fast response and a slow opponent process. We asked whether FCD is special in this regard or whether other common circuit designs in biology are also prone to addiction. To test this, we compare different prototypical circuits (Figure 3D) in which a drug induces the secretion of a factor which increases subjective reward. We compare the FCD circuit with other circuits that have a slow component with either negative or positive feedback on the secreted factor. We also consider a simple activation topology without a slow component. To make a “mathematically controlled comparison” (Savageau, 1972), we provide all circuits with the same timescale parameters (Transparent Methods). Simulations show that the FCD circuit is unique in the sense that drug reward becomes dissociated from drug intake level and remains constant and that withdrawal mood drops without bound (Figure 3E). These properties can be shown analytically for all circuits (Transparent Methods). Therefore, FCD best captures the properties of the opponent-process model. To intuitively understand why FCD is especially fragile to addiction, consider the following explanation. At any level of drug intake, the FCD circuit shows a constant (positive) preference for increasing drug intake by 2-fold and a constant (negative) preference for decreasing drug intake by 2-fold (Figure 3F). This is in contrast with the other circuits, where preference for increasing drug intake and dislike for decreasing drug intake both diminish at high levels of drug intake, due to saturation effects. The reason for this is the FCD dynamics presented in Figure 3C, arrow (ii): as drug intake increases, FCD constantly pushes the sensitivity to further intake away from saturation, thus motivating further increases in intake and demotivating decreases in intake. 
This may explain the fragility of FCD to addiction since increasing drug intake does not diminish preference for further increases in drug intake.
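The contrast between FCD and a saturating activation circuit can be sketched numerically. The example below is an illustration under stated assumptions, not the comparison performed in the Transparent Methods: it uses the same generic FCD motif as a stand-in for the β-endorphin circuit, a Hill function with hypothetical parameters `K = 1` and `n = 1` for subjective reward, and the peak gain in instantaneous reward after a 2-fold increase in intake as a simple proxy for preference.

```python
def hill(b, K=1.0, n=1):
    """Saturating map from a secreted factor to subjective reward (assumed form)."""
    return b**n / (K**n + b**n)


def fcd_peak_gain(u0, fold=2.0, eps=0.2, dt=0.001, t_end=20.0):
    """Peak reward gain after intake jumps from u0 to fold * u0 in an FCD circuit.

    Generic motif (illustrative, not the paper's HPA model):
        dx/dt = eps * x * (y - 1),   dy/dt = u / x - y
    The circuit starts fully adapted to u0, so x = u0 and y = 1.
    """
    x, y = u0, 1.0
    u = fold * u0
    peak = hill(y)
    for _ in range(int(round(t_end / dt))):
        dx = eps * x * (y - 1.0)
        dy = u / x - y
        x += dt * dx
        y += dt * dy
        peak = max(peak, hill(y))
    return peak - hill(1.0)


def activation_gain(u0, fold=2.0):
    """Reward gain when the drug input maps directly onto the Hill function."""
    return hill(fold * u0) - hill(u0)


if __name__ == "__main__":
    for u0 in (1.0, 10.0, 100.0):
        print(u0, fcd_peak_gain(u0), activation_gain(u0))
```

Under these assumptions the FCD gain is the same at every baseline intake, whereas the activation gain shrinks as intake approaches saturation. This matches the qualitative pattern described for Figure 3F.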

FCD may be advantageous for learning and reward shaping

Finally, we ask what might be the selective advantage of an FCD circuit for reward. We therefore analyze possible advantages of FCD in circuits that control subjective reward. We focus on the role of reward in the learning of useful behaviors, a major task for cognitive systems. We demonstrate two advantages that FCD confers on learning of behaviors. Behavior is learned by adjusting actions according to outcomes such that actions that lead to rewarding outcomes are repeated more often (that is, they are reinforced); actions that lead to aversive outcomes are avoided (Dickinson, 1994; Herrnstein, 1970; Thorndike, 1927). This principle, known as “Thorndike's law of effect”, is a cornerstone of our understanding of animal learning. It has been mathematically analyzed in the field of reinforcement learning (Dayan and Daw, 2008; Sutton and Barto, 2018). From the point of view of learning, FCD control of subjective reward is initially puzzling. Algorithms for reinforcement learning, such as the widely used Q-learning algorithm for optimal action choice, often assume a direct correspondence between the value of an input and its translation into subjective reward. FCD, on the other hand, causes the subjective reward to depend also on the background level of input, breaking the direct correspondence between the input stimulus and its rewarding properties. Here, we point out that FCD can be crucial for reinforcement learning when rewards span several orders of magnitude. By rescaling inputs according to the background input, FCD allows learning despite large differences in reward values. This FCD feature provides a wide dynamic range over decades of input. An analogous behavior over multiple scales is provided by FCD in systems such as the E. coli chemotaxis circuit (Adler and Alon, 2018). 
The importance of such scale-invariant sensing for learning has been demonstrated for artificial algorithms such as neural networks (Ioffe and Szegedy, 2015; Santurkar et al., 2018; Sola and Sevilla, 1997). Scale invariance makes gradients more reliable and predictable, reducing the likelihood of vanishing or exploding gradients, and reduces sensitivity to hyper-parameters and initialization (Santurkar et al., 2018). In fact, artificial reinforcement learning algorithms have been shown to benefit from FCD-like processes. Van Hasselt et al. added adaptive normalization of the target values of a deep Q-network, which allowed the algorithm to learn to play computer games with varying magnitudes of reward (van Hasselt et al., 2016). This adaptive normalization is similar to FCD since the rewards were effectively normalized by their background level. This raises the possibility that scale invariance and FCD may also be important for biological reinforcement learning. In addition to scale invariance, FCD has a second important benefit: it provides a natural implementation of a well-established concept in reinforcement learning, “potential-based reward shaping” (Devlin and Kudenko, 2011, 2012; Laud, 2004; Ng et al., 1999; Skinner, 2019). Reward shaping is the addition of auxiliary rewards in order to guide exploration and behavior toward desirable objectives. The motivation behind reward shaping is that “real” rewards, such as access to food or sexual contact, may be sparsely achieved. This sparsity limits the extent to which an agent can learn how to achieve these rewards. As an example, consider a task presented by Ng et al. (Ng et al., 1999; Randløv and Alstrøm, 1998), where an agent needs to learn to ride a bicycle from point A to a distant point B. To learn this, the agent receives a reward upon reaching B. Since reaching B is difficult, the agent gets only sparse feedback for its performance.
To address this, the agent is also provided with an auxiliary reward when it approaches B (Randløv and Alstrøm, 1998). However, this leads to a problem since the agent can now receive reward by simply riding in loops around B, gaining the proximity reward again and again, without ever reaching B. To solve this problem and provide auxiliary rewards that preserve the learning of optimal behavior, Ng et al. suggested that the agent must be provided with “potential-based auxiliary rewards” (Ng et al., 1999). The intuition is that potential functions have a net effect of zero on any loop. Their net effect depends only on the start and end points. This concept was extended by Devlin and Kudenko (Devlin and Kudenko, 2012) to dynamical potential-based auxiliary rewards. In the standard notation of reinforcement learning, the auxiliary reward for a transition from state s at time t to state s′ at time t′ is given by F(s, t, s′, t′) = γΦ(s′, t′) − Φ(s, t), where γ is a discounting factor and Φ is a real-valued potential function. Potential-based reward shaping does not have a problem of loops that can distract the agent. Moreover, potential-based reward shaping preserves the learning of optimal behavior (or “policy invariance”) (Ng et al., 1999). FCD control of subjective reward can provide a physiological implementation of potential-based reward shaping (Figure 4). The reason for this is that the output of the FCD circuit tracks the logarithmic derivative of its input (Adler and Alon, 2018; Adler et al., 2014; Lang and Sontag, 2016). Thus, FCD provides a potential function equal to the log of the input (see Transparent Methods). Such circuits can therefore be thought of as circuits to guide exploration, rather than to learn values.
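The loop argument can be made concrete with a short sketch. It assumes the standard potential-based form of Ng et al. (1999), in which the auxiliary reward for a transition is the discounted potential of the next state minus the potential of the current state, and it takes the potential to be the logarithm of the input, as suggested above for FCD. The function names and the example path are hypothetical, and the naive proximity reward is a deliberately simple stand-in for the kind of auxiliary reward that can be exploited by looping.

```python
import math


def shaped_return(inputs, gamma=1.0):
    """Discounted sum of potential-based rewards along a path of input levels.

    The potential is Phi = log(input). Each step contributes
    gamma * Phi(next) - Phi(current), discounted by gamma**k.
    """
    phis = [math.log(u) for u in inputs]
    total = 0.0
    for k in range(len(phis) - 1):
        total += gamma**k * (gamma * phis[k + 1] - phis[k])
    return total


def naive_return(inputs):
    """Naive auxiliary reward: pay only for steps that increase the input."""
    total = 0.0
    for u, u_next in zip(inputs, inputs[1:]):
        total += max(0.0, math.log(u_next) - math.log(u))
    return total


if __name__ == "__main__":
    loop = [1.0, 2.0, 4.0, 2.0, 1.0]  # a closed path: ends where it starts
    print("potential-based return on the loop:", shaped_return(loop))
    print("naive return on the loop:", naive_return(loop))
```

With no discounting, each potential-based reward is the log fold change between successive inputs, so the sum telescopes and vanishes on any closed path, while the naive reward accumulates on every lap. With discounting, the sum still depends only on the start and end points of the path.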

Figure 4

FCD outputs the logarithmic derivative of the input, and therefore provides a physiological implementation of potential-based reward shaping

(A) The negative feedback circuit output (green) and activation circuit output (blue) track the input stimulus (cyan), whereas the FCD circuit output (red) tracks its logarithmic derivative (orange). The logarithmic derivative of the input stimulus is shown scaled and shifted for clarity (see Transparent Methods).

(B) Physiological circuits like those discussed in this study convert environmental stimuli to subjective reward. Circuits where the reward corresponds directly to the input stimuli (such as direct activation, negative, or positive feedback) facilitate learning of the input stimuli. On the other hand, circuits like FCD, where the subjective reward tracks the derivative of the input stimuli, facilitate reward shaping, where input stimuli are not directly learned but instead guide exploration toward rewarding behavior.


Discussion

We propose an opponent process in alcohol addiction based on dynamical changes in the gland masses of the HPA axis. The gland-mass changes due to chronic alcohol intake cause the initial rise in β-endorphin to settle back down to baseline within weeks and to undershoot for weeks after withdrawal. These dynamics contribute to the physiological basis of the hedonic tolerance and withdrawal stages of addiction. The present opponent process may also apply to other addictive drugs that activate the HPA axis, including cocaine and nicotine (Armario, 2010). This suggests that, in addition to neurotransmitter circuits, endocrine gland mass changes may be important for addiction. Of particular mechanistic importance in the model is the growth of the adrenal cortex caused by chronic activation of the HPA axis. This growth increases cortisol secretion, which suppresses β-endorphin secretion. Consistent with the model, cortisol levels are about 3-fold higher in people with alcohol abuse disorder during active drinking periods (Stalder et al., 2010); cortisol returns to normal levels after several weeks of abstinence (von Bardeleben et al., 1989; Esel et al., 2001; Marchesi et al., 1997). The model explains why adrenalectomy diminishes alcohol preference in rodents (Fahlke, 2000; Fahlke et al., 1994; Goeders, 2002; Goeders and Guerin, 1996; Hansen et al., 1995; Lamblin and De Witte, 1996). However, the role of HPA gland masses in alcohol abuse disorder or other drug addictions has not, to the best of our knowledge, been directly studied, except for the study by Carsin-Vu et al. (Carsin-Vu et al., 2016) which provides evidence for adrenal enlargement in alcohol abuse disorder. While our model focused on the role of β-endorphin, it can potentially be generalized to other factors important for alcohol addiction.
Any secreted factor that is stimulated by alcohol and inhibited by glucocorticoids or is secreted as a response to hypothalamic CRH is predicted by the model to have similar long-term dynamics to β-endorphin (Transparent Methods). One potentially relevant factor is dopamine transmission, which is affected by glucocorticoids through interactions with several neural systems (Butts et al., 2011; Piazza et al., 1996; Piazza and Le Moal, 1996). Additional factors include enkephalins which preferentially bind the delta opioid receptor (Froehlich et al., 1991; Marinelli et al., 2005) and are implicated in alcohol addiction (Figure 2B). Future extensions of the present model may include these factors. The HPA-activating effect of drugs such as alcohol provides reward for mild to moderate stimulation. However, at high levels, limiting mechanisms begin to act. One mechanism is the inhibition of CRH secretion by high levels of cortisol through the low-affinity GR. Another mechanism is secretion of dynorphins, endogenous opioids which are secreted as a result of prolonged CRH secretion (Nikolarakis et al., 1986). In contrast to β-endorphin, dynorphins have a dysphoric rather than a euphoric effect. This dysphoric effect can presumably prevent over-activation of the HPA axis from being rewarding. Dynorphin secretion was hypothesized to play a role in dysphoria that follows drug withdrawal (Koob and Volkow, 2016). The present model infers that increased glucocorticoids during alcohol abuse act to inhibit β-endorphin. This inference is based on experiments done in other contexts, using pharmacological interventions and adrenalectomy. To the best of our knowledge, this inhibition effect has not been directly tested in the context of alcohol abuse disorder. Such experiments will be an important test for the model. 
The key mathematical feature that underlies the proposed opponent process is that β-endorphin responds to relative changes in average drug intake, rather than absolute changes, a feature called FCD. FCD provides the essential hallmarks of the opponent-process theory of addiction. We find that FCD control of reward is especially fragile to addiction. In our model for β-endorphin dynamics, FCD is implemented physiologically by the growth of endocrine glands. FCD in reward pathways can also be implemented in principle by other physiological mechanisms (Adler et al., 2017). For example, adaptation and relative responses to rewards (both hallmarks of FCD) are well documented for midbrain dopamine neurons (Tobler et al., 2005). FCD can occur at the cellular level by biochemical mechanisms such as receptor modification (Lazova et al., 2011) or by gene regulation circuits such as incoherent feedforward loops (Goentoro et al., 2009). Regardless of the precise implementation, FCD promotes addiction by maintaining sensitivity to increasing drug intake levels, keeping them away from reward saturation. In this way, wanting the drug is never sated until limiting mechanisms kick in. The relevant timescale for the development of drug tolerance, as well as the affective withdrawal, will depend on the timescale of adaptation in the FCD circuit. While this may be days to weeks for endocrine gland turnover, or possibly for epigenetic modifications, faster circuits (such as transcriptional circuits and protein-protein interactions) may adapt on a timescale of minutes to hours and are therefore less likely to be important for the development of addiction on the timescale of weeks. It will be interesting to test whether the nicotine receptor system analyzed by Gutkin et al. (Gutkin et al., 2006) also can show FCD (their original model did not generally have FCD). 
A recent study demonstrated how approximate FCD may be implemented in receptor-based mammalian signaling systems (Lyashenko et al., 2020). In their work, which focused on the pAkt pathway, Lyashenko et al. demonstrated that the adaptation mechanism for FCD can be implemented using receptor endocytosis. Endocytosis occurs generally in neurotransmitter pathways and may thus be a relevant mechanism for FCD. In addition to its vulnerability to addiction, FCD control of subjective reward also has selective benefits for learning behaviors that promote fitness. One benefit is to allow rapid learning when reward magnitude spans a large range. This is relevant when the reward baseline level is unknown or fluctuating. A second benefit is a physiological implementation of potential-based reward shaping. This is crucial when the input to the circuit represents a cue or proxy that is correlative with a “real” reward but does not have value by itself. Not all behaviorally relevant circuits have FCD: for example, mechanical pain does not adapt to background level (and hence cannot have FCD which entails exact adaptation), whereas pain mediated by interaction with the capsaicin receptor does (Holzer, 1991; Nolano et al., 1999; Winter et al., 1995; Yao and Qin, 2009). It would be fascinating to compare the design principles of different physiological circuits that control subjective reward. In summary, we propose an opponent process for addiction based on gland mass changes in the HPA axis. To test this, further experiments are important. Monitoring gland masses using imaging during the stages of the addiction process can help to test this proposal. If gland masses turn out to be important for the week-scale dynamics of addiction, they might serve as relevant targets for intervention. Interventions that suppress gland mass changes at the right time, perhaps using HPA agonists and antagonists, are predicted to interfere with the addiction process and to reduce withdrawal symptoms.

Limitations of the study

The study used a mathematical model of the regulation of hormones by alcohol to propose a mechanism for the development and maintenance of alcohol addiction. As with all models, establishing its validity requires experimental testing. Important experimental tests for the model include measuring dynamical changes in endocrine glands during alcohol addiction and withdrawal and dynamical measurements of alcohol-induced secretion of endorphins as a function of endocrine gland masses.

Resource availability

Lead contact

Uri Alon (uri.alon@weizmann.ac.il).

Material availability

This study did not generate any new material.

Data and code availability

The full Python code to generate all the simulations for all the figures is available at https://github.com/omerka-weizmann/hpa_addiction.

Methods

All methods can be found in the accompanying Transparent Methods supplemental file.

111 in total

Review 1. Activation of the hypothalamic-pituitary-adrenal axis by addictive drugs: different pathways, common outcome.

Authors: Antonio Armario
Journal: Trends Pharmacol Sci Date: 2010-05-25 Impact factor: 14.819

2. Opposing tonically active endogenous opioid systems modulate the mesolimbic dopaminergic pathway.

Authors: R Spanagel; A Herz; T S Shippenberg
Journal: Proc Natl Acad Sci U S A Date: 1992-03-15 Impact factor: 11.205

3. Adaptive coding of reward value by dopamine neurons.

Authors: Philippe N Tobler; Christopher D Fiorillo; Wolfram Schultz
Journal: Science Date: 2005-03-11 Impact factor: 47.728

4. Importance of delta opioid receptors in maintaining high alcohol drinking.

Authors: J C Froehlich; M Zweifel; J Harts; L Lumeng; T K Li
Journal: Psychopharmacology (Berl) Date: 1991 Impact factor: 4.530

5. The acute effect of ethanol on adrenal cortex in female rats--possible role of nitric oxide.

Authors: Dragoslava Dikić; Mirela Budeč; Sanja Vranješ-Durić; Vesna Koko; Sanja Vignjević; Olivera Mitrović
Journal: Alcohol Alcohol Date: 2011-05-24 Impact factor: 2.826

Review 6. Theoretical frameworks and mechanistic aspects of alcohol addiction: alcohol addiction as a reward deficit disorder.

Authors: George F Koob
Journal: Curr Top Behav Neurosci Date: 2013

7. The incoherent feedforward loop can provide fold-change detection in gene regulation.

Authors: Lea Goentoro; Oren Shoval; Marc W Kirschner; Uri Alon
Journal: Mol Cell Date: 2009-12-11 Impact factor: 17.970

8. Ectopic secretion of corticotropin-releasing factor as a cause of Cushing's syndrome. A clinical, morphologic, and biochemical study.

Authors: R M Carey; S K Varma; C R Drake; M O Thorner; K Kovacs; J Rivier; W Vale
Journal: N Engl J Med Date: 1984-07-05 Impact factor: 91.245

9. Effects of single and repeated exposures to ethanol on hypothalamic beta-endorphin and CRH release by the C57BL/6 and DBA/2 strains of mice.

Authors: J P de Waele; C Gianoulakis
Journal: Neuroendocrinology Date: 1993-04 Impact factor: 4.914

10. Adrenal gland volume in major depression: relationship to basal and stimulated pituitary-adrenal cortical axis function.

Authors: R T Rubin; J J Phillips; J T McCracken; T F Sadow
Journal: Biol Psychiatry Date: 1996-07-15 Impact factor: 13.382
