
The Devils in the DALY: Prevailing Evaluative Assumptions.

Carl Tollef Solberg¹, Preben Sørheim², Karl Erik Müller³, Espen Gamlund⁴, Ole Frithjof Norheim⁵, Mathias Barra⁶.

Abstract

In recent years, it has become commonplace among the Global Burden of Disease (GBD) study authors to regard the disability-adjusted life year (DALY) primarily as a descriptive health metric. During the first phase of the GBD (1990–1996), it was widely acknowledged that the DALY had built-in evaluative assumptions. However, from the publication of the 2010 GBD and onwards, two central evaluative practices (time discounting and age-weighting) have been omitted from the DALY model. After this substantial revision, the emerging view now appears to be that the DALY is primarily a descriptive measure. Our aim in this article is to argue that the DALY, despite changes, remains largely evaluative. Our analysis focuses on the understanding of the DALY by comparing the DALY as a measure of disease burden in the two most significant phases of GBD publications, from their beginning (1990–1996) to the most recent releases (2010–2017). We identify numerous assumptions underlying the DALY and group them as descriptive or evaluative. We conclude that while the DALY model arguably has become more descriptive, it remains, by necessity, largely evaluative.


Year: 2020 PMID: 33391391 PMCID: PMC7765634 DOI: 10.1093/phe/phaa030

Source DB: PubMed Journal: Public Health Ethics ISSN: 1754-9973 Impact factor: 1.940

Introduction

The Global Burden of Disease (GBD) study is ‘a systematic, scientific effort to quantify the comparative magnitude of health loss due to diseases, injuries, and risk factors by age, sex and geographies for specific points in time’ (Murray : 1). The GBD study quantifies disease burden via a measure called disability-adjusted life years (DALYs). During the first phase of the GBD study (1990–1996), it was widely acknowledged that the DALY model included evaluative assumptions (Murray, 1994, 1996; Murray and Lopez, 1996; Murray and Acharya, 1997). However, with the publication of the 2010 GBD study, two central evaluative practices—time discounting and age-weighting—were discontinued. After a substantial revision, the emerging view now appears to be that the DALY is primarily a descriptive measure of overall disease burden (Murray et al., 2012a,b; Salomon ; Murray and Lopez, 2013; Knudsen ). In brief, the DALY has seemingly undergone a transition from being a measure of burden ‘based on explicit and transparent value choices’ (Murray and Lopez, 1996: 8) to ‘a major step toward a replicable scientific approach to global descriptive epidemiology’ (Murray : 2065—our italics). This emerging terminology obscures the fact that several critical evaluative assumptions remain embedded within the DALY model. As we will see, both the magnitude and distribution of disease burden rely heavily on these evaluative assumptions. The GBD study was conceived to inform policymakers (Murray, 1994; Murray and Lopez, 1996). Since then, the DALY model has become increasingly popular in the global health community. Major organizations and institutions, as well as national health authorities, use this measure, and DALY-publications regularly appear in high-ranking academic journals. By using DALY data from the GBD study, epidemiological trends are monitored, disability groups are ranked according to their disease burden, and health inequity is quantified. 
Our aim in this article is to argue that the DALY model, despite the recent modifications, remains largely evaluative. We will restrict our analysis to the assumptions underlying the DALY model rather than the assumptions underlying the GBD study as such. Moreover, we will focus on the practices and corresponding rationales that have been suggested by the DALY architects themselves (e.g. Murray, 1994, 1996; Murray and Lopez, 2013). This article proceeds as follows: first, we motivate and review the basics of the DALY model. Next, we examine and contrast the core DALY assumptions during the two most significant phases of the GBD: 1990–1996 and 2010–2017. For our purposes, we classify these assumptions into two categories: descriptive and evaluative. We conclude that evaluative assumptions are ubiquitous and that while the DALY model has arguably become somewhat more descriptive, it necessarily remains largely evaluative.

The Disability-Adjusted Life Year

The DALY model can be viewed as a natural extension of earlier efforts (1940–1950) to evaluate deaths instead of simply counting them. The initial ambition was to analyze disease burden beyond the descriptive crude death rates (CDRs) by instead evaluating deaths according to the prematurity of their occurrence (see e.g. Dempsey, 1947; Robinson, 1948; Haenszel, 1950). Two decades later, health- and quality-adjusted life years (HALYs and QALYs, respectively) and similar models undertook to include the burden that morbidity confers on people while they are still alive, in addition to the disease burden resulting directly from death. The DALY model was launched in 1993 with the first GBD study, and its raison d'être was to obtain a universally applicable measure integrating morbidity and mortality. According to the DALY model, morbidity is measured by assigning disability weights (DWs) to health conditions, where 0 represents the absence of disability, 0 ≤ DW ≤ 1 quantifies the burden that a particular health condition incurs and 1 is the highest possible DW, defined as a loss ‘equivalent to death’ (Salomon : 712). After a condition has been assigned its DW, the years lived with disability (YLDs), which account for morbidity, are calculated as the product of the condition’s duration and its DW. Years of life lost (YLLs), relative to a reference life expectancy, account for mortality. Finally, YLDs + YLLs = DALYs = disease burden. A simplified example of the calculation of lifetime DALYs for an individual may be helpful. Imagine a person who suffers from A = severe anxiety disorder (DW 0.5) during 8 years of her young adulthood (see Figure 1). This suffering generates duration × DW = 8 years × 0.5 YLDs/year = 4.0 YLDs. Later in life, the person endures B = chronic neck pain (DW 0.3), which lasts for 15 years, generating 15 years × 0.3 YLDs/year = 4.5 YLDs, until she dies at the age of 70. 
For 70-year-olds, the DALY reference life expectancy is 88.9 years, meaning that the person lost 88.9 years − 70 years = 18.9 years, which are the YLLs generated by her (premature) death (Murray ). Accordingly, the aggregate lifetime DALY amount is 4.0 YLDs + 4.5 YLDs + 18.9 YLLs = 27.4 DALYs. See Figure 1 for an illustration.
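The arithmetic of this worked example can be retraced in a few lines of Python. This is only a minimal sketch of the simplified lifetime calculation described above; the names `Episode` and `lifetime_dalys` are hypothetical helpers introduced for illustration and do not come from the GBD study.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One spell of morbidity (hypothetical helper, not a GBD construct)."""
    label: str
    duration_years: float
    disability_weight: float  # 0 = no disability, 1 = 'equivalent to death'

    def ylds(self) -> float:
        # YLDs = duration x disability weight
        return self.duration_years * self.disability_weight


def lifetime_dalys(episodes, age_at_death, reference_age_at_death):
    """Return (YLDs, YLLs, DALYs) for one individual."""
    ylds = sum(e.ylds() for e in episodes)
    # YLLs = distance between the reference age and the actual age at death
    ylls = max(reference_age_at_death - age_at_death, 0.0)
    return ylds, ylls, ylds + ylls


# Values taken from the example in the text.
episodes = [
    Episode("severe anxiety disorder", 8, 0.5),
    Episode("chronic neck pain", 15, 0.3),
]
ylds, ylls, dalys = lifetime_dalys(episodes, age_at_death=70,
                                   reference_age_at_death=88.9)
print(round(ylds, 1), round(ylls, 1), round(dalys, 1))  # 8.5 18.9 27.4
```

Every number that enters the sum (the two DWs, the durations and the reference age) is an input chosen before any counting takes place, which is where the evaluative assumptions discussed below enter.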

Figure 1.

An illustration of the DALY.

The GBD employs this type of calculation to arrive at estimates of the overall disease burden in a population. However, in the second phase of the GBD (2010–2017), the DALY estimates are (usually) given as annual figures and quantify the total amount of disease burden generated during that year.

Descriptive versus Evaluative Epidemiology

Historically, epidemiologists were chiefly concerned with the mortality in a population, and mortality rates were the standard measures of disease burden (in addition to incidences and prevalences). This initial approach included the CDR and age-specific death rates. The CDR is the number of fatalities per year per 1000 people. Age-specific death rates are CDRs that are restricted to a predefined age bracket. The child mortality rate, for example, reports the CDR for those between 1 and 5 years of age (Porta, 2014). Such descriptive mortality measures have several virtues. In general, they are simple, transparent, and inherently universal (Haenszel, 1950). Evaluative measures, in contrast, will, by necessity, rely on contested value assumptions. No measure of mortality, health or well-being serves equally well for all purposes. For instance, descriptive mortality measures do not highlight the fact that individual deaths are postponed rather than prevented. Moreover, they say nothing about whether some deaths are worse than others. For some purposes, we need evaluative measures instead (Gamlund and Solberg, 2019). In the philosophy of science, several frameworks have been suggested for how values influence the scientific process (Longino, 1990; Lacey 1999; Douglas, 2009; 2016). Moreover, there is a growing literature on the philosophy of measurement (see e.g. Cartwright and Bradburn, 2011; Cartwright and Runhardt, 2014; Tal, 2017). Furthermore, for philosophers of science, it is common to regard most scientific disciplines as more value-laden than what is acknowledged by the practitioners within those disciplines. Social and ethical values permeate scientific endeavors at many different levels (Chang and Cartwright, 2014). With regard to the DALY, one framework suggested by Hausman and McPherson (2006) could serve as an illustration of the intuitions we draw on. 
Admittedly, Hausman and McPherson (2006: 9, Figure 1.2.1) describe their framework as displaying ‘exaggerated contrasts between facts and values’. According to their caricatured framework, factual claims are characterized by disagreement that can easily be resolved, hypotheses that can be determined as true or false, and hypotheses that are independent of evaluative claims. In contrast, evaluative claims are characterized by little agreement and are not easily resolved, hypotheses that cannot easily be determined as true or false, and hypotheses that are dependent on factual claims (Hausman and McPherson, 2006: 9). However, it is difficult, or even impossible, to tease apart every evaluative aspect from every descriptive aspect of the assumptions within the DALY. While this might sometimes make it difficult to classify an assumption as ‘clearly evaluative’ or ‘predominantly descriptive’ in a manner that would be universally acceptable, we believe one can often rely on intuitions to discern that some assumptions contain largely evaluative, as opposed to descriptive, aspects. A rigorous account of descriptive assumptions, as opposed to evaluative ones, remains elusive, and we offer no formal criteria to distinguish descriptive from evaluative assumptions. Instead, we rely on what we regard as shared intuitions concerning the classification of the various aspects of the DALY discussed below. Even if some readers disagree with some of our intuitions, we believe that the overall case we present will be compelling.

The Devils in the DALY

Certain assumptions in the DALY model have been discussed frequently, such as discounting, age-weighting, DWs and choice of reference life expectancy (Anand and Hanson, 1997; Murray and Acharya, 1997; Fox-Rushby and Hanson, 2001; Lyttkens, 2003; Arnesen and Nord, 1999; Arnesen and Kapiriri, 2004; Voigt, 2012). Aside from the assumptions above, Murray (1994, 1996) also addressed the incidence versus prevalence approach and comorbidity at an early stage. That the DALY is assumed to be a measure of health rather than well-being has also been subject to debate (Broome, 2002; Hausman, 2015). Perhaps the most systematic and rigorous work on the assumptions behind the DALY to date has been conducted by the DALY architect himself, Christopher J. L. Murray, and by the philosopher S. Andrew Schroeder (Schroeder, 2012, 2017, 2019). In addition to the assumptions mentioned above, Schroeder (2016) adds the grouping of disabilities to the list. In the present exposition, we expand on this list with a discussion of four further assumptions: intrapersonal and interpersonal aggregation, individual versus societal burden, gradualism versus non-gradualism and commensurability.

Health versus Well-Being

The health versus well-being debate lies at the core of the descriptive versus evaluative question. One of the strengths under which the current DALY is marketed is that it measures health simpliciter (Voigt and King, 2014). The discussion of this controversial issue began when the first GBD was launched (Anand and Hanson, 1997; Murray and Acharya, 1997); to date, it has not been satisfactorily resolved (Schroeder, 2016, 2019). Still, the DALY authors continue to describe the DALY as an objective and descriptive measure of health. Indeed, without health simpliciter, much of the mission behind the DALY may be undermined: one of the great advantages of the DALY is that it can be universally applied (conditional on measuring health simpliciter). In contrast, most health jurisdictions mandate that national tariffs for QALYs are used for health technology assessment analyses since it is empirically recognized that preferences do vary between cultures and within populations. Since the architects of the DALY strive to measure health descriptively, their methodology has (as previously mentioned) been revised to remove evaluative components (Salomon ). Whether the DALY should measure health or well-being is already thoroughly discussed in the literature (e.g. Broome, 2002; Hausman, 2015). It is unclear whether it is desirable to ignore the impact of health, let alone if ill health is a robust and meaningful construct when considered in isolation of the individuals who are burdened by it (Arnesen and Nord, 1999; Broome, 2002; Voigt and King, 2014). Lastly, the health versus well-being debate also illustrates the need for a careful analysis of that which we tentatively call the descriptive versus evaluative distinction in health measurement and epidemiology.

Disability Weights

The quantification of the burden through the so-called DWs is central to the DALY model. The DW is the device for modeling the assumption that some conditions are worse than others. The measuring of the burden of morbidity is carried out by assigning DWs to different conditions: 0 represents the absence of disability, 1 represents the maximal possible disability and intermediate values represent degrees of disability. For example, for the DWs set in 2017, severe multiple sclerosis was assigned a DW of 0.72, while a symptomatic tension-type headache was given a DW of 0.04 (Global Burden of Disease Collaborative Network, 2018). These numbers mean that the burden associated with severe multiple sclerosis is considered to be 18 times greater than the burden associated with a tension-type headache, per unit time. All work on the DWs in the DALY has relied on one or more of the standard preference-based evaluation methods: the person trade-off (PTO), the standard gamble and the time trade-off. In the early GBD studies, the PTO method was central (Arnesen and Nord, 1999). The current practice (as of 2020) is based on utility theory and draws on discrete choice methodology combined with the PTO method. Nevertheless, the DALY is currently referred to as a non-preference-based measure: the explanation provided is that respondents are asked to set DWs not based on their own preferences, but instead ‘to state which of the two individuals they would deem as being healthier than the other’ (Salomon : 713). However, there are reasons to believe that a change of wording alone cannot transform the inherent evaluative nature of such choice tasks into an objective measurement of health qua health (Voigt and King, 2014). A preference for something over something else implies the ranking of one alternative above another. Some preferences are simple and deeply rooted in our biology, such as preferring pleasurable sensations over pain or the taste of healthy nutrients over poisons. 
Such preferences tend to appear instinctive and require no conscious choice of our own. They may reasonably be referred to as descriptive. However, we should not confuse such physiological preferences with deliberative preferences. The rank order ‘healthier than’ encodes a high degree of complexity, since this ranking is (supposedly) not an automated input-output procedure. Rather, it relies on deliberative thinking informed by both intuitions and arguments, usually a long-term perspective, and the processing of information, experience, and context. Thus, it is hard to escape the view that the DWs are evaluative by their very construction. Second, expert panels set the DWs initially, but, since the 2010 GBD, the public’s involvement in establishing DWs has been imperative. The choice of involving the public can be based on at least two rationales: accuracy and legitimacy. The DALY architects were arguably mostly concerned with legitimacy (Salomon ). Hence, the move from the expert toward the public view implied the prioritization of public over technocratic legitimacy. To add to this, we can easily imagine that if ordinary people (instead of technocrats) are asked about their views regarding different conditions, this may be informative as to what is important for them. That being said, the YLL construct—which generates the largest share of the total disease burden worldwide—remains completely uninformed by the public’s views. Third, the question remains of whether some conditions are more burdensome than death. Recall that a DW of 1 has been interpreted as the maximum possible burden. No ‘worse than death’ values—that is, DWs above 1—have ever been used in the DALY model. Comorbidity is also accounted for in such a way that no compound health condition can ever be worse than death (Burstein ). This characteristic suggests the evaluative assumption that life is better than death—no matter what. 
In summary, the DWs remain evaluative because people are asked how good or bad different conditions seem to them. We doubt that any system for setting DWs can avoid being evaluative (at least, such a system would no longer measure something that matters). Consequently, the YLD-component of the DALY is inherently evaluative as well.
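The ceiling on compound conditions can be illustrated with a short Python sketch. The text above only states that comorbid conditions never add up to more than death; the multiplicative rule used here, combined DW = 1 − ∏(1 − DW_i), is one commonly described way of guaranteeing this and is treated as an assumption, not as a documented detail of the GBD method. The function name `combined_dw` is hypothetical.

```python
from math import prod


def combined_dw(weights):
    """Combine disability weights so that the result never exceeds 1.

    Assumed multiplicative rule: 1 - product of (1 - DW_i).
    """
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError(f"disability weight out of range: {w}")
    return 1.0 - prod(1.0 - w for w in weights)


# The two 2017 weights quoted in the text: severe multiple sclerosis and
# symptomatic tension-type headache.
print(round(combined_dw([0.72, 0.04]), 4))

# Simple addition would give 2.7 here; the multiplicative rule stays below 1.
print(combined_dw([0.9, 0.9, 0.9]) < 1.0)
```

Whatever rule is chosen, capping the result at 1 builds in the judgment that no state of health, however compounded, is worse than death.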

Discounting

During the first phase of the GBD (1990–1996), future DALYs were discounted by a fixed rate of 3% per year. This practice reduced the YLLs attributed to premature deaths. Because future years were discounted (combined with age-weighting), death at birth was not counted as 86 YLLs but rather as approximately 32 YLLs. This discounting practice also implied that if two individuals each lost 20 YLLs and one person lost 40 YLLs, the latter loss would carry less weight since these YLLs were more distant. Three reasons were offered for this discounting practice: (i) the future is shrouded in uncertainty; (ii) health interventions are likely to improve in the future; and (iii) people tend to prefer goods in the near rather than the far future (Murray, 1996; Murray and Acharya, 1997). Reason (i) can be seen as descriptive as it is based on an epistemic concern about how to forecast the future. In isolation, this issue does not concern any value-theoretical questions. That is to say, it is simply true and indisputable that the future is shrouded by epistemic uncertainty. Reason (ii) can also be seen as descriptive. On the one hand, one may argue that the very concept of improvement itself relates to something evaluative as the improvement implies that something is better than it was before. Moreover, that something resembling health interventions is likely to improve may involve value concerns. Such an improvement may include ‘better health’, ‘better quality of life’, or ‘better lives’—all evaluative concepts. On the other hand, there is a correct answer as to whether health interventions will improve in the future, and reasonable agreement on the assumption that health interventions will improve. Thus, given a narrow definition of evaluative, we may determine reason (ii) to be descriptive. Reason (iii), however, seems to be evaluative, even in a narrow sense. We can indeed describe people’s preferences—doing so is a descriptive endeavor. 
Moreover, we can provide true empirical answers to questions about what preferences people have. However, recall that deliberative preferences have an evaluative rather than a descriptive nature. More specifically, reason (iii) implies the judgment that goods in the near future are evaluated and ranked as better than goods in the far future, which is evaluative. Scholars have heavily debated such discounting (Anand and Hanson, 1997), and from 2010 GBD and onwards, it was omitted to make the DALY more descriptive. Nevertheless, this omission was grounded in yet another evaluative rationale, namely that every life year should count equally, independently of when in life it occurs (Murray ).
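A short Python sketch can make the effect of discounting concrete. It assumes continuous discounting, so that a stream of L future years at rate r is worth (1 − e^(−rL))/r years today; the text specifies only the 3% rate, so this functional form is an assumption. Because the figure of roughly 32 YLLs quoted above also includes age-weighting, the sketch is not expected to reproduce it exactly.

```python
import math


def discounted_years(years: float, rate: float = 0.03) -> float:
    """Present value of `years` future life years (assumed continuous form)."""
    if rate == 0:
        return years
    return (1.0 - math.exp(-rate * years)) / rate


# Discounting alone shrinks 86 lost years to a much smaller figure.
print(round(discounted_years(86), 1))

# Two losses of 20 years versus one loss of 40 years: undiscounted they are
# equal (40 years each), but the single long loss counts for less once
# distant years are discounted.
two_short = 2 * discounted_years(20)
one_long = discounted_years(40)
print(round(two_short, 1), round(one_long, 1))
```

Setting `rate=0` recovers the post-2010 practice in which every lost year counts as one full year.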

Age-Weighting

Another characteristic of the first phase of the GBD studies (1990–1996) was age-weighting, which meant that less weight was attributed to years lived at very young and very old ages. In this marginal age-weighting, life years between the ages of 15 and 40 were given the highest relative value, while the years in the very first phase of life, as well as in the final period, were given the least relative value. Three main rationales were offered for this age-weighting: (i) well-being: people themselves may value life years differently at different life-stages; (ii) productivity: one may attribute a higher value to the most productive years of an individual’s life; and (iii) well-being interdependence: the belief that some people play a unique role in providing well-being for others, such as children and elderly parents (Murray and Acharya, 1997). We see that (i–iii) are arguably evaluative: (i) concerns preferences for prudential value (see the DW discussion above), while (ii–iii) highlight instrumental value assumptions relating to particular people. From the 2010 GBD and onwards, this age-weighting was omitted based on the same evaluative premise that led to the omission of the discounting practice: that every life year for every person around the world should count equally (Murray ).
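The shape of such an age-weighting curve can be sketched in Python. The functional form w(x) = C·x·e^(−βx), with β = 0.04 and C = 0.1658, is the one commonly associated with the early GBD publications; neither the formula nor the constants appear in the text above, so both should be read as assumptions made for illustration.

```python
import math

BETA = 0.04   # assumed age-weighting parameter
C = 0.1658    # assumed normalizing constant


def age_weight(age: float) -> float:
    """Relative value of a year lived at `age` (assumed functional form)."""
    return C * age * math.exp(-BETA * age)


# The curve rises steeply in childhood, peaks in young adulthood and then
# declines, so years lived at very young and very old ages count for less.
for age in (1, 10, 25, 40, 80):
    print(age, round(age_weight(age), 2))

peak_age = max(range(0, 101), key=age_weight)
print("peak at age", peak_age)
```

Under these assumed constants the peak lies at 1/β = 25 years, inside the 15 to 40 range mentioned above. Dropping age-weighting amounts to replacing `age_weight` with a constant 1.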

Choice of Reference Life Expectancy

The largest part of the total disease burden is generated by deaths rather than morbidity. Hence, it is crucial to know how these deaths are evaluated. Several of the assumptions in the YLL component of the DALY will have high elasticity: specifically, small changes in the assumptions may incur large changes in the overall disease burden. The YLL component of the DALY is based on the concept of potential years of life lost (PYLL), a mortality measure that originated in the late 1940s (Dempsey, 1947; Robinson, 1948; Haenszel, 1950). In the second phase of the GBD, YLLs are calculated in the following way. If an individual dies at age 20, her YLLs are calculated as the temporal distance between her age of death and a reference life expectancy for her age group. According to the life table used in the 2010 GBD, this death would generate 86.4 − 20 = 66.4 YLLs. Similarly, a 10-year-old would lose 86.3 − 10 = 76.3 YLLs, an 80-year-old 91.0 − 80.0 = 11.0 YLLs, and a stillborn child 0 − 0 = 0 YLLs (Murray ). This practice implies that an individual generates more YLLs the younger she is at the time of death, reaching a maximum immediately after birth. The YLLs have been counted from the time of birth throughout the history of the GBD, but the choice of reference life expectancy has varied (see e.g. Murray, 1994 versus Murray ). Several questions need to be answered in order to create a reference life expectancy. To begin with, when do individuals begin to accrue YLLs? This issue was debated when the PYLL, the precursor of the YLL, was developed in the 1940s. Most authors suggested counting from birth (Dempsey, 1947; Haenszel, 1950), but opinions ranged from including stillbirths (Robinson, 1948) to counting from age one (Romeder and McWhinnie, 1977). The DALY model starts counting YLLs at birth, thus excluding all stillbirths. Discussion of this issue, however, is largely absent in the GBD literature. From the 1990 GBD until the 2017 GBD, stillbirths were excluded (i.e. generating 0 DALYs).
Until the 2010 GBD, the death of 10-year-olds was measured as the greatest possible amount of DALYs lost due to the combined effect of age-weighting and discounting. Since the 2010 GBD, however, neonatal deaths have been attributed the maximal possible burden (approximately 86 DALYs). In the 2017 GBD, stillbirths were counted in the mortality statistics but not in the DALY count that generates disease burden (Wang ). As of 2020, stillbirths do not generate any disease burden in the GBD study. The question of the age at which we should begin to count DALYs is an important aspect of the GBD, and even small changes in the lower age limit would result in a large difference in the total disease burden. Since a discussion of reasons for setting a lower age limit is lacking in the DALY literature, we are led to imagine candidate reasons for this practice. This issue ultimately relates to the following question: what is disease burden? Providing an answer to this question is not trivial. One approach is to argue that disease burden is usually something we experience, and since embryos and fetuses (usually) cannot have experiences, they cannot be subjected to any disease burden. However, such an approach seems unreasonable given that the majority of the total disease burden is a direct result of YLLs, which occur when individuals have died and therefore cannot experience anything at all. Another strategy could be to argue that individuals can be harmed by their own death only after they are born. Such an approach would involve value-theoretical considerations regarding the harm of death. A third approach is to argue that the lower age limit is set at birth because this is in line with ordinary norms and sensible in a practical sense. However, even this third approach involves choices that are value-laden and subject to reasonable disagreement. 
In relation to this, Murray’s claim that every life year should count equally for everyone disregards the discontinuous jump in disease burden between fetuses and neonates. Furthermore, the question remains as to what the upper limit for YLLs should be. In the first GBD, the reference life expectancy used for measuring YLLs was 80.0 years at birth for men and 82.5 years at birth for women (Murray, 1994). This female reference life expectancy was based on Japan’s, which had the highest life expectancy at the time (Murray and Acharya, 1997). The decrement of 2.5 years for males was arbitrarily estimated. This sex difference was later omitted. From the 2010 GBD, age-specific reference life expectancies are synthesized by choosing the lowest national age-specific death rate recorded (Murray ). The use of these so-called synthetic reference life expectancies means that the longevity used in YLL computations is estimated as if individuals always enjoyed the lowest recorded national age-specific death rate. Another important issue concerns the question of whether the reference life expectancy should be local or universal. The initial approach suggested in the PYLL used national life expectancies (e.g. Dempsey, 1947). Throughout all of the GBD studies, however, the reference life expectancy has been universal. This means that death at, say, age 60 is attributed the same number of DALYs regardless of the nationality of that individual. Murray explicitly mentions that the universal application of synthetic life expectancy is grounded in an ‘egalitarian nature’. That is to say, if one were to count YLLs from local life expectancy, then preventing the death of a 40-year-old woman in a high-income country (with high life expectancy) would lead to a larger reduction in the global burden of disease than preventing the death of a 40-year-old in a developing country (with low life expectancy; Murray, 1996). Finally, should the reference life expectancy be fixed or progressive? 
Some of the very first PYLL measures were fixed, which meant that one calculated the PYLL for all individuals based on life expectancy at birth (Dempsey, 1947). In all of the GBD studies, however, the reference life expectancy has been progressive. This means that one uses statistical tables that show life expectancy for each age, adjusted so that the older a person becomes, the higher her expected age at death will be. This progressive approach is reflected in the YLL so that, while remaining life expectancy falls for each year a person ages, it does not fall by a full year. The use of such progressive instead of fixed life-expectancy models implies that more YLLs are attributed to the elderly, more YLLs are generated in total, and the DALY acquires a slightly less egalitarian flavor. There are a few things to note about the four assumptions above: they have competing alternatives, and there is reasonable disagreement on the alternatives. Taken together, we see that even though life expectancy is descriptive in an empirical sense, the choice of one system of reference life expectancy over another as a way of calculating YLLs is evaluative. There are reasons to believe that we cannot choose systems of reference life expectancies in a value-neutral way (see Anand and Reddy, 2019).
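The difference between a progressive and a fixed reference can be sketched in Python using only the reference ages quoted in this article. The table below is therefore a small excerpt, not the full GBD life table, and the fixed reference of 86.0 years at birth is an approximation taken from the 'approximately 86' figure mentioned above. The function names are hypothetical.

```python
# Reference age at death by age at death, as quoted in the text
# (2010 GBD life table; excerpt only).
REFERENCE_AGE_AT_DEATH = {10: 86.3, 20: 86.4, 70: 88.9, 80: 91.0}

# Approximate reference life expectancy at birth (assumed: 'approximately 86').
FIXED_REFERENCE_AT_BIRTH = 86.0


def yll_progressive(age_at_death: int) -> float:
    """YLLs against an age-specific (progressive) reference."""
    return REFERENCE_AGE_AT_DEATH[age_at_death] - age_at_death


def yll_fixed(age_at_death: int) -> float:
    """YLLs against a single reference fixed at birth, floored at zero."""
    return max(FIXED_REFERENCE_AT_BIRTH - age_at_death, 0.0)


for age in sorted(REFERENCE_AGE_AT_DEATH):
    print(age, round(yll_progressive(age), 1), round(yll_fixed(age), 1))
```

For the ages in this excerpt the progressive reference assigns at least as many YLLs as the fixed one, and the gap widens with age, which is the sense in which the progressive model attributes more YLLs to the elderly.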

The Incidence versus Prevalence Approach

Theoretically, the two components of the DALY (the YLD and the YLL) can be measured both as incidence and as prevalence parameters. Because death rates are incidence rates, the YLL has been accounted for by incidence rate by default (Murray, 1994, 1996), but this default is not self-evident. In contrast, both incidence and prevalence rates make immediate sense for the YLD. The first phase of the GBD, from 1990 onwards, used an incidence perspective for both the YLD and the YLL. Recall that DALYs are calculated for 1 year at a time. The issue of the incidence versus prevalence approach has to do with the year in which morbidity is assigned. Under an incidence approach, all DALYs associated with a diagnostic incident in a given year, including expected future DALYs, are assigned to that year. This incidence practice means that if a person is diagnosed with chronic obstructive pulmonary disease in 2005 (moderate DW 0.23), and she is expected to live for a further 10 years, then 10 × 0.23 = 2.30 DALYs are attributed to the disease burden for the year 2005. Three reasons were initially given for the practice of having a pure incidence perspective. First, quantifying incidence YLDs is more consistent with incidence YLLs. Second, an incidence perspective for YLDs identifies the impact of health interventions more rapidly. Third, with the prevalence YLD alternative, there is a risk of uncritical reading (Murray, 1994, 1996). The second phase of the GBD (2010–2017) saw the use of a prevalence perspective of the YLD. This prevalence perspective implies that YLDs are accounted for one year at a time, instead of all at once. In the chronic obstructive pulmonary disease example above, this practice means that only 1 × 0.23 = 0.23 DALYs are accounted for in 2005, and the same will be the case for each of the next 9 years. There are at least two rationales given for the switch from incidence to prevalence YLDs. First, incidence YLDs rely on strong assumptions about an uncertain future. 
Second, under a falling incidence, future years may come out better than they should (if DALYs should describe health care needs) because the need for health care services might still be high (Murray ; Schroeder, 2016). In this case, the rationales behind the choice between an incidence and prevalence YLD can be seen as descriptive as they concern epistemic forecasting rather than value theory.
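The two accounting perspectives can be contrasted in a short Python sketch built on the chronic obstructive pulmonary disease example. The function names are hypothetical, and the sketch assumes that the condition lasts exactly the expected 10 years with a constant DW.

```python
def incidence_ylds(onset_year: int, duration_years: int, dw: float) -> dict:
    """All expected YLDs are booked in the year of diagnosis."""
    return {onset_year: duration_years * dw}


def prevalence_ylds(onset_year: int, duration_years: int, dw: float) -> dict:
    """YLDs are booked one year at a time while the condition lasts."""
    return {onset_year + k: dw for k in range(duration_years)}


# Diagnosis in 2005, DW 0.23, ten further years lived with the condition.
by_incidence = incidence_ylds(2005, 10, 0.23)
by_prevalence = prevalence_ylds(2005, 10, 0.23)

print(by_incidence)                        # one entry, for 2005
print(min(by_prevalence), max(by_prevalence), len(by_prevalence))
print(round(sum(by_prevalence.values()), 2))
```

Both perspectives arrive at the same total over the ten years; they differ only in which calendar year the burden is assigned to, which is why the choice matters for annual burden figures.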

Individual versus Societal Burden

The question remains of whether the DALY is concerned with morbidity and mortality for those who are sick and dying, for their dependents, or society itself. The DALY has always been primarily concerned with an individual burden. However, in the first phase of the GBD, the DALY also referred to a societal burden. Murray suggested two possible rationales for this concern: first, the human capital approach, where the value of the time at each age should be proportional to the productivity at that age. Second, social values were attributed to age groups that normally act as caregivers for their children and parents. As mentioned, these two rationales were supposed to favor age-weighting (Murray, 1994, 1996). When the GBD also measured societal burden, there seemed to be no principled reasons for a sharp cutoff in the count’s lower limit. A concern for societal burden may imply a gradual increase with regard to disease burden generated. This is because societal burden refers to how all but the deceased are affected by a person’s premature death or disability. According to this line of thinking, it is not unreasonable to claim that stillbirths also incur at least a minimal degree of societal burden. From the GBD 2010 and onwards, societal burden was excluded. In the words of Murray , 14) ‘Burden should be assessed individual by individual’. This omission led some authors to conclude that the GBD is now almost value-free (Murray ; Salomon ; Salomon ; Knudsen ). The choice to omit societal burden seems reasonable and represents a step toward a slightly more descriptive DALY. However, even the individual burden itself implies something of disvalue to the individual, and it is hard to see that such a concern does not rely on value theory or other evaluative approaches. Even if the omission of societal burden can be seen as non-evaluative, individual disease burden is itself an evaluative concept.

Gradualism versus Non-Gradualism

From the 1990 GBD until the 2010 GBD, the combined effect of age-weighting and time discounting gave a gradual curve for disease burden throughout individual lives. This combined effect implied that the highest possible number of YLLs was incurred when 10-year-olds died (Murray ). Importantly, this gradual function was a result of the combined effect of individual and societal burden. It is hard to say whether this precise implication—that the death of 10-year-olds incurred the greatest number of DALYs—was intended or not. It is hard to see how a human capital approach would only matter from birth onwards. When age-weighting and time discounting were omitted, the GBD-curve showed a non-gradual function with a sharp discontinuous boundary at birth. Moreover, this non-gradual curve represents an individual burden only. This latter view implies that the age at which death becomes a burden is also when death generates the greatest possible burden. In the current GBD, the burden of death is assumed to be the greatest at birth, where neonatal deaths incur around 86 DALYs each (Murray ). Rationales for gradualism versus non-gradualism are lacking in the GBD literature but can be found elsewhere (McMahan, 2002; Millum, 2015; Solberg and Gamlund, 2016). The choice of whether gradualism or non-gradualism should apply to the DALY may, arguably, be classified as both descriptive and evaluative. The choice is descriptive if we assume that there is a true answer as to when the worst time to die is. However, it is also a value-theoretical question and therefore evaluative in this sense. Moreover, if there is no true answer as to when the worst time to die is, how we evaluate deaths in the DALY remains an open question that requires reasons that are directly value-laden. Thus, we hold that the challenge of gradualism versus non-gradualism in the DALY is an evaluative concern.
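A rough numerical sketch may help to show how the two curves differ. It is not the GBD's own computation. The age-weighting function C·x·exp(−βx) with C = 0.1658 and β = 0.04 is the form commonly cited for the first-phase GBD and is not given in this article, and the fixed horizons of 82 and 86 years are simplifying assumptions that stand in for the actual reference life tables.

```python
import math

# Assumed first-phase parameters (not stated in this article).
C = 0.1658     # age-weighting scaling constant
BETA = 0.04    # age-weighting shape parameter
R = 0.03       # discount rate, 3% p.a. as listed in Table 1


def age_weight(x):
    """Relative value of a year lived at age x: C * x * exp(-BETA * x)."""
    return C * x * math.exp(-BETA * x)


def yll(age_at_death, horizon, rate=R, weighted=True, steps_per_year=100):
    """YLLs for a death at `age_at_death`, integrating the lost years up to
    a fixed `horizon` age with the midpoint rule."""
    span = horizon - age_at_death
    if span <= 0:
        return 0.0
    n = int(round(span * steps_per_year))
    dx = span / n
    total = 0.0
    for i in range(n):
        x = age_at_death + (i + 0.5) * dx
        weight = age_weight(x) if weighted else 1.0
        # Discount each lost year back to the time of death.
        total += weight * math.exp(-rate * (x - age_at_death)) * dx
    return total


# First phase: age-weighting plus discounting, hypothetical 82-year horizon.
gradual = {a: yll(a, 82) for a in range(0, 82)}
peak_age = max(gradual, key=gradual.get)

# Second phase: no weighting, no discounting, hypothetical 86-year horizon.
non_gradual = {a: yll(a, 86, rate=0.0, weighted=False) for a in range(0, 86)}
worst_age = max(non_gradual, key=non_gradual.get)

print("gradual curve peaks at age", peak_age)
print("non-gradual curve peaks at age", worst_age)
```

Under these assumptions the weighted and discounted curve rises from roughly 33 YLLs at birth to a peak at age 10, whereas the unweighted curve is highest at birth and falls linearly from there. The exact figures depend on the assumed horizon and parameters.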

Aggregation (Intra- and Interpersonal)

There are two forms of aggregation in the DALY. The first is intrapersonal aggregation, that is, the aggregation of burden across time within an individual’s life. In the GBD study, the individual is the fundamental unit for the disease burden (Murray ). Several assumptions need to be in place for intrapersonal aggregation to make sense, and these have been discussed elsewhere (Broome, 2004; Hirose, 2015). The current GBD indirectly assumes that we begin to exist from the moment of birth. Additionally, it is assumed that burdens can accumulate within the lives of individuals (as illustrated in Figure 1). This kind of intrapersonal aggregation entails the idea that some relevant property (e.g. the brain, our bodies or sentience) grounds an individual’s identity throughout her life. In philosophy, this property is called personal identity. There is little consensus among philosophers about what constitutes the grounds for personal identity (Parfit, 1984; McMahan, 2002; Olson, 2007). Still, there are reasons to believe that the DALY, to make sense, must rely on the assumption that personal identity is acquired at birth and continues until death, as currently defined, occurs. The second is interpersonal aggregation, that is, the aggregation of burden across people, which the DALY presupposes at least for estimating DWs. Interpersonal aggregation is an assumption of the DALY that is seldom articulated, even though the founders of the DALY are probably aware of it. An important aspect of interpersonal aggregation is additive aggregation. Regardless of what position one adopts on the issue of distribution, the choice of employing a straightforward additive aggregation formula is a value choice.

Commensurability

Closely related to the aggregation of burden is the issue of commensurability. There are at least two assumptions regarding commensurability in the GBD. First and foremost, YLDs quantify disease burden—an inherently multidimensional construct. More precisely, in the 2015 GBD, the YLD quantifies the burden of 235 distinct conditions. The underlying assumption here is that these conditions are commensurable as individual burdens. However, several of these conditions are not intuitively comparable. Compare, for instance, mild low back pain, moderate hearing loss and amputation of one arm or severe dementia. We should ask what these conditions have in common. To answer this question, we will need an account that unifies these conditions. A second concern is the paramount assumption that it makes sense to aggregate YLDs and YLLs. The idea is that YLDs and YLLs can be measured on a cardinal ratio scale, which captures the assumption that YLDs and YLLs are commensurable as an individual burden (Murray ; Murray and Evans, 2003). However, it is not entirely clear that the YLL measures an individual burden in the first place, as we do not experience—or even exist—while ‘being dead’. If the YLL is, in fact, not an individual burden, then the YLD and the YLL will be incommensurable qua individual burden. This may be a severe problem for the DALY as the YLL sets the reference frame for the YLD and because the majority of the total disease burden consists of YLLs. The best candidate for a justification of commensurability is perhaps the fact that the DALY presupposes a counterfactual account of harm. Reference to counterfactual harm is probably the best candidate for explaining how the YLD is a multidimensional concept, as well as how the YLD and the YLL are commensurable. Note that harms and benefits directly concern how our well-being is affected, and so the concern about commensurability is strongly related to evaluative concerns (Solberg ). 
There are reasons to question whether purely descriptive concepts of morbidity and mortality can be commensurable at all. If the YLD and the YLL in the DALY are to be seen as commensurable, then one will have to admit that there are evaluative aspects involved in this measure. However, this concern about commensurability is absent in the GBD literature. Schroeder has, however, responded to this concern. He argues that if we grant that the DALY is best understood as an index, then, the concern for YLD-YLL commensurability may matter less (Schroeder, 2018). Whether or not Schroeder is right is an open question, but the very assumption of equivalence in value between YLD-YLL is, in our view, an evaluative matter.

Summary of Assumptions

In summary, even if we grant a narrow definition of evaluative assumptions, most of the assumptions that we have discussed are evaluative. We have provided strong reasons in support of the view that the DALY measure should still be regarded as an evaluative endeavor. See Table 1 for a summary of the DALY assumptions that we have explored.

Table 1.

DALY assumptions in the GBD

Assumption	GBD 1990–1996	GBD 2010–2017	Descriptive
1. Health versus well-being	Practice: Proxy of well-being, with an attempted rewording to ameliorate the situation	Practice: Health	No
2. Disability weights	Practice: No condition worse than death; Expert panels; The PTO method	Practice: No condition worse than death; Public involvement; Utility like-choice tasks	No
3. Discounting	Practice: Yes, 3% p.a. Reasons: Uncertainty; Improvement; Time preferences	Practice: No, 0% p.a. Reasons: Every life year should count equally	No
4. Age-weighting	Practice: Yes. Reasons: Well-being; Productivity; Well-being interdependence	Practice: No. Reasons: Every life year should count equally	No
5. Choice of reference life expectancy	Practice: Birth as lower limit; 80 years for men and 82 years for women as upper limit; Age-adjusted (life table); Universal (except sex difference)	Practice: Birth as lower limit; Synthetic life-tables; Age-adjusted (life table); Universal; No sex difference	No
6. The incidence versus prevalence approach	Practice: Incidence YLD; Incidence YLL. Reasons: Consistency; Rapidity; Uncritical reading	Practice: Prevalence YLD; Incidence YLL. Reasons: Incidence YLDs rest on strong future assumptions; if incidence is falling, future years may come out too good	Yes
7. Individual versus societal burden	Practice: Individual (majority) and societal burden	Practice: Individual burden	No
8. Gradualism versus non-gradualism	Practice: Gradualism	Practice: Non-gradualism	No
9. Aggregation (inter- and intrapersonal)	Practice: Both	Practice: Both	No

Our categorization of the assumptions of the first phase of the GBD (1990–1996), contrasted with those of its second phase (2010–2017). We believe that most underlying assumptions and their corresponding rationales are evaluative rather than descriptive.

Why the DALY is Primarily Evaluative

All measures of morbidity known to us, such as QALYs and DALYs, erect their scales on evaluative judgments. It is hard to imagine any way to circumvent evaluative judgments, and the burden of justification lies with those who claim that this can be done. Thus, there are reasons to believe that descriptive mortality measures are unsuitable for direct comparison with evaluative morbidity measures without further evaluative adjustments. According to this line of reasoning, the DALY is not, in this publicly accessible sense, descriptive, since it is erected on a scaffold of evaluative assumptions. Moreover, the DALY measures disease burden, and burden is a normative term: it connotes something negative that one wants to discard. This is another sense in which the DALY is evaluative. The closeness between disease burden and the monitoring of global health, discussions on inequalities in health, and the aim of prioritization between major health programs should at least remind us that the motivation behind the construction of the DALY is inherently ethical. If the DALY were descriptive in this stronger sense, then learning that, say, a viral pandemic generated n DALYs would be less interesting. However, DALYs are not intended to be seen as purely descriptive observations but rather as information that motivates policymakers to act. It is important to remark that we see nothing wrong with using an evaluative measure such as the DALY to measure disease burden. What we take issue with is the air of authority that comes with claiming objectivity and a descriptive model. The assumptions underlying the DALY that we have identified are not neutral, and they demand scrutiny and continuous reassessment. Accordingly, the DALY should not be able to evade the ongoing critical discourse regarding its axiological foundation.

Conclusion

We acknowledge that the revisions made in conjunction with the second phase of the GBD (2010–2017) have made the DALY slightly more descriptive of individual disease burden. Nevertheless, as we have argued, the DALY still differs from descriptive endeavors such as crude prevalence or incidence metrics. Many evaluative assumptions will, by necessity, remain embedded in the DALY construct. Modifying these assumptions may affect both the size and the distribution of disease burden across the globe. Our exposition has been a call for more transparency as well as continued scholarly and public scrutiny of the DALY. We conclude that the DALY is primarily evaluative and encourage scholars to continue to seek a firmer ethical foundation for this influential measure.

Conflict of Interest

O.F.N. has been a co-author on several GBD-articles since 2010.
