
Joint models of tumour size and lymph node spread for incident breast cancer cases in the presence of screening.

Gabriel Isheden¹, Linda Abrahamsson¹, Therese Andersson¹, Kamila Czene¹, Keith Humphreys¹.

Abstract

Continuous growth models show great potential for analysing cancer screening data. We recently described such a model for studying breast cancer tumour growth based on modelling tumour size at diagnosis, as a function of screening history, detection mode, and relevant patient characteristics. In this article, we describe how the approach can be extended to jointly model tumour size and number of lymph node metastases at diagnosis. We propose a new class of lymph node spread models which are biologically motivated and describe how they can be extended to incorporate random effects to allow for heterogeneity in underlying rates of spread. Our final model provides a dramatically better fit to empirical data on 1860 incident breast cancer cases than models in current use. We validate our lymph node spread model on an independent data set consisting of 3961 women diagnosed with invasive breast cancer.


Keywords: Breast cancer; joint modelling; lymph node spread; random effects modelling; screening; tumour growth model


Year: 2019 PMID: 30606087 PMCID: PMC6745622 DOI: 10.1177/0962280218819568

Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021

1 Introduction

Since its popularisation in medical statistics, the multi-state Markov model has been the primary tool to model breast cancer progression using epidemiological or breast cancer screening data.[1-5] In recent years, however, several research groups have developed alternatives based on continuous processes. Bartoszynski et al.[6] estimated tumour growth with an exponential growth function, explaining individual variation in growth rates with gamma distributed random effects. Plevritis et al.[7] described some extensions of the model. Both of these early models based inference on tumour sizes of breast cancer cases in non-screened populations. Weedon-Fekjaer et al.[5,8] fitted a continuous tumour growth model to data collected from a screened population. They presented a parsimonious model, containing only four parameters, that described both tumour growth and screening sensitivity as continuous functions, enabling them to condition on screening history and mode of detection. An alternative approach that also uses screening data was described by Abrahamsson and Humphreys.[9] The approach is based on specifying three underlying processes: tumour growth, screening sensitivity, and symptomatic detection. The authors essentially extended the model of Bartoszynski et al.[6] and derived probability distributions for tumour sizes, conditioned on screening history and mode of detection. Isheden and Humphreys[10] derived a number of mathematical results that simplified and reduced the computational complexity of the model. The Markov model requires many parameters when the number of disease states is large. As a consequence, the model is not well suited for quantifying the role of individual risk factors on breast cancer progression. Continuous growth models, on the other hand, have few parameters, and can easily be modified to estimate tumour progression at an individual level. 
For example, Abrahamsson et al.[11] modelled tumour growth rate as a function of BMI and time to symptomatic detection as a function of breast size. Abrahamsson and Humphreys[9] and Isheden and Humphreys[10] have estimated screening sensitivity as a continuous function of tumour and mammographic image-based covariates. For data collected in the absence of screening, Plevritis et al.[7] extended the exponential growth model of Bartoszynski et al.[6] with two disease states, representing regional lymph node spread and metastatic spread. In the presence of screening, no model that we know of has modelled both tumour growth and tumour spread as continuous time processes. The aim of this article is to develop such a joint process, drawing on the literature on the modelling and simulation of lymph node spread. In 2000, the U.S. National Cancer Institute established the Cancer Intervention and Surveillance Modeling Network (CISNET).[12] CISNET is a consortium of six research groups, from Georgetown University, University of Texas MDACC, Dana-Farber Cancer Institute, Erasmus MC, Stanford University, and University of Wisconsin. Its purpose is to develop simulation-based modelling approaches for investigating the impact of breast cancer interventions, with a focus on prevention, screening, and treatment. Each group models the natural history of breast cancer as part of its investigation, and the last three groups model breast cancer tumour growth as a continuous time growth process. All groups model localised tumour stages, regionally spread stages, and distant metastatic stages, but only the University of Wisconsin group models breast tumour spread as a continuous time process.
To model breast tumour spread, the Wisconsin group uses a model proposed by Shwartz.[13] This model assumes that tumour volume V grows exponentially with an individually assigned growth rate. It also assumes that the instantaneous rate of additional lymph node spread at time t depends on V(t), the tumour volume at time t, and on the growth rate at time t, through three constants b1, b2 and b3. The group has modified the growth component slightly. They assume an exponential Gompertz function with decelerating doubling time. In fitting the model to observed breast cancer incidence data, the group found the overall model fit to be inadequate. When simulating lymph node progression and calibrating against U.S. breast cancer surveillance data, they found that the Shwartz model produced too much lymph node spread for large tumours and too little for small tumours. Consequently, to improve fit, they made two ad hoc adjustments to the model. First, they reduced lymph node spread for large tumours by simulating spread based on tumour diameters 25% smaller than those predicted by the growth model. Second, they assumed that 1% of all invasive tumours had four lymph nodes involved at tumour onset and that 2% had five or more lymph nodes involved. Related tumour spread models have been proposed by Hanin and Yakovlev[14] and, as mentioned, by Plevritis et al.[7] Hanin and Yakovlev based their model on that of Shwartz.[13] They assumed that tumours grow exponentially and that the rate of additional lymph node spread is proportional to tumour volume alone. They also introduced a number of additional assumptions and provided a detailed mathematical description of the model. Plevritis et al.[7] described a simpler model, in which the hazard of a localised tumour spreading to the lymph nodes is proportional to the volume of the tumour. They too relied on exponential tumour growth.
We draw on two observations. First, the CISNET University of Wisconsin group had to introduce additional assumptions to fit the Shwartz model to data. Second, the Shwartz model is a generalisation of the models of Hanin and Yakovlev and of Plevritis et al. From these, we conclude that existing models can be improved upon. In this article, we support this claim by showing that the Shwartz model has two inherent weaknesses. The first weakness is that the model implies that slow growing tumours have more lymph node spread than fast growing tumours of the same size. The second weakness is that the model implies either an unrealistically high degree of lymph node spread for large tumours or an unrealistically low degree of spread for small tumours. These observations call for new statistical models of regional lymph node spread.

This article is structured as follows. We present the models of Shwartz[13] and Hanin and Yakovlev,[14] and describe these weaknesses in detail. We then propose several models of regional lymph node spread. The first is based on Shwartz[13] but does not suffer from the first weakness. The second is a new model that addresses both weaknesses. It assumes that the instantaneous rate of additional lymph node spread at time t, $\lambda(t)$, is proportional to two quantities. These are the number of times the tumour cells have divided by time t, $D(t)$, and the rate of cell division in the tumour at time t, $dD(t)/dt$. Based on this, we propose a class of models in which every model avoids the weaknesses mentioned earlier. We show how these models can be modified to incorporate random effects that allow for heterogeneity in the underlying rates of spread. We then describe a joint likelihood for tumour size and number of affected lymph nodes, given a patient's screening history and mode of detection.
We use this likelihood to jointly estimate the tumour growth and lymph spread parameters from data on 1860 incident cases of breast cancer, collected from a population in which screening is offered. We show that our new models fit the data better than the Shwartz-based model. Beyond showing that our new approaches provide better models of the mean, we show that incorporating random effects in the lymph node spread models improves fit further, and dramatically so. We validate the lymph spread model on an independent data set of 3961 women diagnosed with invasive breast cancer between January 2001 and December 2008 in the Stockholm-Gotland healthcare region in Sweden. We conclude with a discussion of the implications, strengths and weaknesses of the new models.

2 Traditional models of regional lymph node spread

Shwartz[13] described a joint process for tumour growth and regional lymph node spread. Given an inverse growth rate r (assumed to vary across individuals), he assumed that tumour volume grows exponentially from time t = 0, starting at an initial volume V0 corresponding to a small sphere:

$V(t) = V_0\, e^{t/r}$. (1)

He further assumed that the additional number of affected lymph nodes at time t follows an inhomogeneous Poisson process, whose intensity function depends on tumour volume and growth rate through the constants b1, b2 and b3. The model included additional assumptions.

In what follows, we show that these first two assumptions alone have a specific consequence. A tumour of volume V with inverse growth rate r has a higher expected lymph node spread than a faster growing tumour of the same size. Integrating the intensity function proposed by Shwartz[13] gives the intensity measure Λ(t). Given r and the tumour growth model (1), the time t is determined by the volume, $t = r \ln(V(t)/V_0)$. Substituting this into the first term of Λ(t), which depends on t directly, expresses the intensity measure in terms of V(t) and r. For an inhomogeneous Poisson process, the intensity measure equals the expected number of events. Therefore, Λ(t) is the expected number of affected lymph nodes at time t, given r, and writing V = v gives the expected number of affected lymph nodes given R = r and V = v.

From this expression we can identify the first weakness. For a given tumour volume, the expected number of affected lymph nodes is large when r is large (slow growth) and small when r is small (fast growth). This property is not supported by empirical evidence. If slow growing tumours had comparatively more lymph node spread, then, because of length-biased sampling, screen detected cancers would show more lymph node involvement than interval cancers. Empirical data show that this is not true (see end of Section 5). Hanin and Yakovlev[14] used the same tumour growth function, but assumed that the intensity function was $\lambda(t) = \gamma V(t)$.
Following the steps of the above argument for their model, the expected number of affected lymph nodes, given tumour volume and growth rate, is

$E[N \mid R = r, V = v] = \int_0^{t} \gamma V_0 e^{s/r}\, ds = \gamma\, r\, (v - V_0)$. (2)

It follows that their model also suffers from the first weakness. Both models, but especially the model of Hanin and Yakovlev, exhibit a second weakness: the rate of additional lymph node spread increases enormously with tumour volume. We illustrate this by comparing the expected number of lymph nodes for tumours of diameters 5 mm and 30 mm. Volume scales with the cube of the diameter, and (30/5)^3 = 216. Plugging these diameters into equation (2), for fixed values of r and γ, therefore gives an expectation more than 200 times larger for the bigger tumour. Such an extreme difference is not supported by clinical data. In our data, the mean number of affected lymph nodes for 30 mm tumours (2.15) is less than nine times that for 5 mm tumours (0.25). Shwartz[13] found that, when simulating cohorts of symptomatic cancers based on his model, he produced too few affected lymph nodes in tumours with diameters smaller than 1 mm. The CISNET University of Wisconsin group observed the same problem with their modified approach. They also found that the model generated too many affected lymph nodes in large tumours. Strictly speaking, the second weakness need not apply to the Shwartz model, but the findings of the two groups show that it clearly does. Together, these weaknesses mean that new models of lymph node spread are needed.
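To make the two weaknesses concrete, the following Python sketch evaluates equation (2) numerically. It is a minimal illustration under assumptions: tumours are spheres, and V0 is set to the volume of a 0.5 mm sphere purely for illustration. The values of r and γ are arbitrary because they cancel in the ratio. The helper names `sphere_volume` and `hy_expected_nodes` are hypothetical.

```python
import math


def sphere_volume(d_mm: float) -> float:
    """Volume (mm^3) of a sphere with diameter d_mm (mm)."""
    return math.pi * d_mm ** 3 / 6.0


def hy_expected_nodes(v: float, r: float, gamma: float, v0: float) -> float:
    """Expected affected nodes under lambda(t) = gamma * V(t), with V(t) = v0 * exp(t / r).

    Integrating gamma * v0 * exp(s / r) from 0 to t gives gamma * r * (V(t) - v0).
    """
    return gamma * r * (v - v0)


V0 = sphere_volume(0.5)   # illustrative initial volume (assumption)
R, GAMMA = 0.3, 1e-3      # arbitrary values; they cancel in the ratio below

ratio = (hy_expected_nodes(sphere_volume(30.0), R, GAMMA, V0)
         / hy_expected_nodes(sphere_volume(5.0), R, GAMMA, V0))
print(f"model ratio, 30 mm vs 5 mm: {ratio:.1f}")         # about 216
print(f"observed ratio in the data: {2.15 / 0.25:.1f}")   # 8.6

# First weakness: at a fixed volume, a slower tumour (larger r) has more expected nodes.
v10 = sphere_volume(10.0)
print(hy_expected_nodes(v10, 0.6, GAMMA, V0) > hy_expected_nodes(v10, 0.3, GAMMA, V0))
```

The model ratio is driven almost entirely by the cube of the diameter ratio. The observed ratio in the data is an order of magnitude smaller.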

3 Joint processes of tumour growth and lymph node spread

In this section, we present joint processes of tumour growth, time to symptomatic detection, and lymph node spread. The joint processes share the same models for tumour growth, variation in tumour growth, and time to symptomatic detection, but differ in their models of lymph node spread. All of them, however, are based on the same general framework: an inhomogeneous Poisson process whose intensity function depends on tumour volume. The first, called model A, is a variation of the Shwartz model.[13] In model A we assume that the intensity function is proportional to the first derivative of tumour volume, $dV(t)/dt$. Model B is biologically inspired. It assumes that the intensity function is proportional to the product of the number of times the tumour cells have divided and the rate of cell division in the tumour. Lastly, we present a larger class of models based on model B. In what follows, we describe the shared models and modelling assumptions, and give detailed descriptions of the proposed lymph spread models.

3.1 Shared processes and assumptions

We use the original tumour growth process that Schwartz[15] described in 1961. The other shared models we adopt from Bartoszynski et al.,[6] Hanin and Yakovlev,[14] Plevritis et al.,[7] Abrahamsson and Humphreys,[9] and Isheden and Humphreys.[10] We assume that the tumour is monoclonal and originates from a spherical cell of diameter 10 µm, with corresponding volume $V(0) = \pi (0.01\,\mathrm{mm})^3/6$. The tumour grows exponentially at a constant cell reproductive rate, here represented by the inverse growth rate r. The volume of the tumour t years after its onset is

$V(t) = V(0)\, e^{t/r}$. (3)

We explain individual variation in growth rate with a gamma distribution of shape τ1 and rate τ2:

$f_R(r) = \frac{\tau_2^{\tau_1}}{\Gamma(\tau_1)}\, r^{\tau_1 - 1} e^{-\tau_2 r}, \quad r > 0$. (4)

Lastly, we assume that the tumour can be detected with non-zero probability, either symptomatically or via screening, from size V0, corresponding to a spherical tumour of diameter 0.5 mm. We also assume that the rate of symptomatic detection at any time t with $V(t) \ge V_0$ is proportional to the size of the tumour:

$\lambda_{\mathrm{sd}}(t) = \eta\, V(t)$. (5)

These processes are assumed to be independent of lymph node spread. That is, the tumour does not grow faster or slower as it spreads, and symptomatic detection is not triggered by lymph node metastases.

We begin with the two models of spread, which we call A and B. We then work stepwise, developing model B into a class of models and then showing how these can be extended to incorporate random effects. In all models, we assume that spread occurs one cell at a time. We also assume that secondary tumours (in the lymph nodes) have the same cell reproductive rate as the primary tumour. We only model spread that eventually becomes clinically detectable, i.e. lymph node spread that is detectable by the physician once the primary tumour has been detected. Secondary tumours need to grow to size V0 to be clinically detectable (in theory, a volume different from V0 could be used here). This means that there is a time lag t0 between the time of tumour spread and the time at which the secondary tumour becomes a clinically detectable lymph node metastasis. For a fixed inverse growth rate r, the time lag is defined by $V(0)\, e^{t_0/r} = V_0$, that is,

$t_0 = r \ln\big(V_0 / V(0)\big)$. (6)

Since secondary tumours first become detectable at size V0, a lymph node metastasis that is clinically detectable at time t must have been seeded between times 0 and $t - t_0$, so that its volume is at least V0 at time t. Thus, if the intensity measure of the spread process is Λ(t), the number of clinically detectable lymph nodes at time t is Poisson distributed with mean $\Lambda(t - t_0)$.
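The construction above can be checked numerically for any intensity function. The following is a minimal Python sketch, assuming only the growth model (3), a 10 µm starting cell and a 0.5 mm detection threshold. Given a spread intensity λ(t), it returns the Poisson mean Λ(t − t0) of clinically detectable nodes at time t. The trapezoidal rule stands in for the closed-form integrals, and the helper names are hypothetical.

```python
import math
from typing import Callable

V_CELL = math.pi * 0.01 ** 3 / 6.0   # V(0): volume (mm^3) of a 10 micrometre cell
V0 = math.pi * 0.5 ** 3 / 6.0        # detectability threshold: 0.5 mm sphere


def time_lag(r: float) -> float:
    """t0 = r * ln(V0 / V(0)): time for a one-cell secondary tumour to reach V0."""
    return r * math.log(V0 / V_CELL)


def intensity_measure(rate: Callable[[float], float], upper: float, steps: int = 20_000) -> float:
    """Trapezoidal approximation of Lambda(upper), the integral of rate from 0 to upper."""
    if upper <= 0.0:
        return 0.0
    h = upper / steps
    total = 0.5 * (rate(0.0) + rate(upper))
    total += sum(rate(i * h) for i in range(1, steps))
    return total * h


def detectable_node_mean(rate: Callable[[float], float], t: float, r: float) -> float:
    """Poisson mean of clinically detectable nodes at time t: Lambda(t - t0)."""
    return intensity_measure(rate, t - time_lag(r))


# Example: a constant spread rate of 2 per year and inverse growth rate r = 0.2 years.
r = 0.2
print(time_lag(r))                                  # roughly 2.35 years
print(detectable_node_mean(lambda s: 2.0, 5.0, r))  # equals 2 * (5 - t0)
```

Any spread seeded later than t − t0 has not yet reached V0 and so does not contribute. This is why the integral stops at t − t0 rather than at t.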

3.2 Model A

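The first lymph node spread model is an inhomogeneous Poisson process whose intensity function is proportional to the rate of tumour growth:

$\lambda_A(t) = c\, \frac{dV(t)}{dt}$,

where c > 0 is a constant. The intensity measure at time $t - t_0$ is $\Lambda_A(t - t_0) = c\,[V(t - t_0) - V(0)]$. Equations (3) and (6) give $V(t - t_0) = V(t)\, V(0)/V_0$. Introducing $\gamma = c\, V(0)$, the number of clinically detectable lymph node metastases at time T = t, given R = r, follows

$N \mid T = t, R = r \sim \mathrm{Poisson}\big(\gamma\, [V(t)/V_0 - 1]\big)$.

Given R = r and the tumour growth function (3), the time T = t is uniquely determined by the volume of the tumour. Writing V = v, the probability of N = n clinically detectable lymph nodes, given R = r and V = v, is

$P(N = n \mid R = r, V = v) = \frac{[\gamma (v/V_0 - 1)]^n}{n!}\, e^{-\gamma (v/V_0 - 1)}$. (7)

The right-hand side of equation (7) is independent of r. For model A, the probability of N = n clinically detectable lymph nodes, given volume V = v, is therefore

$P(N = n \mid V = v) = \frac{[\gamma (v/V_0 - 1)]^n}{n!}\, e^{-\gamma (v/V_0 - 1)}$. (8)

Note that for the two lymph spread models described in Section 2, N is not conditionally independent of r given v.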
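As a sanity check on model A, this Python sketch computes the Poisson mean in two ways. The first integrates $c\, dV/dt$ up to $t - t_0$ in the time domain. The second uses the closed form $\gamma(v/V_0 - 1)$ with $\gamma = c\,V(0)$. The sketch confirms that, at a fixed volume, the result does not depend on r. The constant c and the helper names are hypothetical, chosen only for illustration.

```python
import math

V_CELL = math.pi * 0.01 ** 3 / 6.0   # V(0): volume of the initial 10 micrometre cell (mm^3)
V0 = math.pi * 0.5 ** 3 / 6.0        # detectability threshold (0.5 mm sphere)


def model_a_mean(v: float, gamma: float) -> float:
    """Closed-form Poisson mean gamma * (v / V0 - 1), as in the model A probabilities."""
    return gamma * (v / V0 - 1.0)


def model_a_mean_via_time(v: float, r: float, c: float) -> float:
    """Same mean from the time domain: Lambda(t - t0) with lambda(t) = c * dV/dt."""
    t = r * math.log(v / V_CELL)            # time at which volume v is reached under (3)
    t0 = r * math.log(V0 / V_CELL)          # time lag (6)
    v_lagged = V_CELL * math.exp((t - t0) / r)
    return c * (v_lagged - V_CELL)          # integral of c * dV/ds from 0 to t - t0


def poisson_pmf(n: int, mu: float) -> float:
    """Poisson probability of n events with mean mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


v = math.pi * 10.0 ** 3 / 6.0   # a 10 mm tumour
c = 1e-4 / V_CELL               # hypothetical constant, so that gamma = c * V(0) = 1e-4
for r in (0.1, 0.3, 1.0):       # different growth rates, same mean at fixed v
    print(r, model_a_mean_via_time(v, r, c), model_a_mean(v, c * V_CELL))
print([round(poisson_pmf(n, model_a_mean(v, 1e-4)), 4) for n in range(4)])
```

The growth rate cancels because both the tumour and its secondary deposits grow at the same rate r. Under this assumption, the lag t0 shifts volume by the fixed factor $V(0)/V_0$.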

3.3 Model B

We now derive a biologically inspired model of lymph node spread. We base the model on two observations:

A) Lymphatic fluid is hostile to tumour cells. It contains little oxygen and few nutrients, and tumour cells in the lymphatic system are under constant attack by the immune system. To survive, tumour cells entering the lymphatic system need to be highly mutated.

B) Cell migration and cell proliferation share some common growth factors, such as hepatocyte growth factor/scatter factor.[16] Thus, the rate of cancer spread may be related to the rate of cell division.

Based on A) and B), we assume that the rate of lymph node spread is proportional to the average number of mutations in the cancer cells and to the rate of cancer cell division. The first of these quantities is not observable. However, assuming a constant rate of mutation per cell division, the average number of mutations in the cancer cells is proportional to the number of times the cells in the tumour have divided. In summary, the second spread model is an inhomogeneous Poisson process with intensity function

$\lambda_B(t) = c\, D(t, r)\, \frac{\partial D(t, r)}{\partial t}$,

where c > 0 is a constant. D(t, r) is the number of times the cells in the tumour have divided and ∂D(t, r)/∂t is the rate of cell division in the tumour, both at time t and for inverse growth rate r. We assume that the cancer cells form a spherical, densely packed tumour and resist cell death. The number of times the cells in the tumour have divided is then obtained by describing tumour volume as a doubling process. This gives the number of cell divisions from the relation

$V(t) = V(0)\, 2^{D(t, r)}$, i.e. $D(t, r) = \log_2\!\big(V(t)/V(0)\big) = \frac{t}{r \ln 2}$. (9)

Using equations (3), (6), and (9), the intensity measure at time $t - t_0$ is

$\Lambda_B(t - t_0) = \int_0^{t - t_0} c\, D(s, r)\, \frac{\partial D(s, r)}{\partial s}\, ds = \frac{c}{2}\, D(t - t_0, r)^2 = \frac{c}{2} \left[\log_2 \frac{V(t)}{V_0}\right]^2$.

Introducing γ = c/2, the number of clinically detectable lymph node metastases at time T = t, given R = r, follows

$N \mid T = t, R = r \sim \mathrm{Poisson}\big(\gamma\, [\log_2(V(t)/V_0)]^2\big)$.

Given R = r and the tumour growth function (3), the time T = t is uniquely determined by the volume of the tumour. Writing V = v, the probability of N = n clinically detectable lymph nodes, given R = r and V = v, is

$P(N = n \mid R = r, V = v) = \frac{\mu_B(v)^n}{n!}\, e^{-\mu_B(v)}, \quad \mu_B(v) = \gamma\, [\log_2(v/V_0)]^2$. (10)

As in model A, the right-hand side of equation (10) is independent of r. Therefore, for model B, the probability of N = n clinically detectable lymph nodes, given volume V = v, is

$P(N = n \mid V = v) = \frac{\mu_B(v)^n}{n!}\, e^{-\mu_B(v)}$. (11)
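A short Python sketch of model B follows, assuming the same 10 µm starting cell and 0.5 mm threshold as above. It computes D(t, r) from the doubling relation (9), checks the closed-form mean $\gamma[\log_2(v/V_0)]^2$ against the time-domain expression for several growth rates, and revisits the 30 mm versus 5 mm comparison from Section 2. The constant c and all function names are hypothetical.

```python
import math

V_CELL = math.pi * 0.01 ** 3 / 6.0   # V(0), the initial cell volume (mm^3)
V0 = math.pi * 0.5 ** 3 / 6.0        # detectability threshold (0.5 mm sphere)


def sphere_volume(d_mm: float) -> float:
    return math.pi * d_mm ** 3 / 6.0


def divisions(t: float, r: float) -> float:
    """D(t, r) = log2(V(t) / V(0)) = t / (r ln 2) under exponential growth (3)."""
    return t / (r * math.log(2.0))


def model_b_mean(v: float, gamma: float) -> float:
    """Closed-form Poisson mean gamma * log2(v / V0)^2."""
    return gamma * math.log2(v / V0) ** 2


def model_b_mean_via_time(v: float, r: float, c: float) -> float:
    """Lambda(t - t0) = (c / 2) * D(t - t0, r)^2 for lambda(t) = c * D * dD/dt."""
    t = r * math.log(v / V_CELL)
    t0 = r * math.log(V0 / V_CELL)
    return 0.5 * c * divisions(t - t0, r) ** 2


c = 0.02  # hypothetical constant, so gamma = c / 2 = 0.01
v = sphere_volume(10.0)
for r in (0.1, 0.3, 1.0):
    print(r, model_b_mean_via_time(v, r, c), model_b_mean(v, c / 2))

# Second weakness revisited: the 30 mm / 5 mm contrast in expected nodes is modest.
ratio_b = model_b_mean(sphere_volume(30.0), 0.01) / model_b_mean(sphere_volume(5.0), 0.01)
print(round(ratio_b, 2))  # 3.16
```

Because the mean grows with the square of a logarithm of volume, the 30 mm to 5 mm ratio is about 3.2 rather than over 200. The empirical ratio of about 8.6 reported in Section 2 lies between the two.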

3.4 A new class of models for lymph node spread

Based on model B, we here define a class of mathematically tractable models for lymph node spread. Assume that the intensity function is proportional to the kth power of the number of cell divisions in the tumour and to the rate of cell division in the tumour. Under the same assumptions as in model B, we can then derive closed forms for $P(N = n \mid R = r, V = v)$ and $P(N = n \mid V = v)$. These functional forms are harder to motivate biologically. However, if model fit were better for a higher power k, it could imply that lymph node spread depends on higher powers of tumour mutation. Alternatively, it could imply that breast cancer tumours mutate at an accelerating rate (referred to as genomic instability[17]). We define the new model class, similarly to model B, by assuming that lymph node spread follows an inhomogeneous Poisson process with intensity function

$\lambda_k(t) = c\, D(t, r)^k\, \frac{\partial D(t, r)}{\partial t}$,

where k > −1, t is time, r is the inverse growth rate of the tumour, D(t, r) is the number of times the cells in the tumour have divided, and ∂D(t, r)/∂t is the rate of cell division in the tumour. It follows that the intensity measure at time $t - t_0$ is

$\Lambda_k(t - t_0) = \frac{c}{k + 1}\, D(t - t_0, r)^{k+1} = \gamma \left[\log_2 \frac{V(t)}{V_0}\right]^{k+1}$,

where γ = c/(k + 1). The condition k > −1 ensures that this integral is finite, and model B is the special case k = 1. As before, the probability of N = n clinically detectable lymph nodes, given R = r and V = v, is

$P(N = n \mid R = r, V = v) = \frac{\mu_k(v)^n}{n!}\, e^{-\mu_k(v)}, \quad \mu_k(v) = \gamma\, [\log_2(v/V_0)]^{k+1}$, (12)

and the probability of N = n clinically detectable lymph nodes, given volume V = v, is

$P(N = n \mid V = v) = \frac{\mu_k(v)^n}{n!}\, e^{-\mu_k(v)}$. (13)
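The power k controls how steeply expected spread rises with tumour size. The following Python sketch, assuming the 0.5 mm threshold used above, evaluates the Poisson mean $\gamma[\log_2(v/V_0)]^{k+1}$ and its probabilities, and prints the 30 mm to 5 mm contrast for a few values of k. The function names are hypothetical, and γ only sets the overall scale.

```python
import math

V0 = math.pi * 0.5 ** 3 / 6.0  # detectability threshold (0.5 mm sphere)


def sphere_volume(d_mm: float) -> float:
    return math.pi * d_mm ** 3 / 6.0


def class_mean(v: float, gamma: float, k: float) -> float:
    """Poisson mean gamma * log2(v / V0)^(k + 1) for the model class; requires v > V0."""
    if k <= -1.0:
        raise ValueError("k must be greater than -1 for the intensity measure to be finite")
    return gamma * math.log2(v / V0) ** (k + 1.0)


def class_pmf(n: int, v: float, gamma: float, k: float) -> float:
    """P(N = n | V = v) for the model class (Poisson with mean class_mean)."""
    mu = class_mean(v, gamma, k)
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


# How the 30 mm / 5 mm contrast in expected nodes grows with k (k = 1 is model B).
for k in (0.0, 1.0, 2.0):
    ratio = class_mean(sphere_volume(30.0), 1.0, k) / class_mean(sphere_volume(5.0), 1.0, k)
    print(k, round(ratio, 2))
```

Larger k sharpens the dependence on size while keeping it logarithmic in volume. This is one way the data can choose between flatter and steeper spread curves.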

3.5 Random effects modelling of lymph node spread

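So far we have concentrated on developing new models of the mean number of affected lymph nodes. Breast cancer is, however, a heterogeneous disease. Just as tumours grow at different speeds in different women, it seems reasonable that lymph node spread occurs at different rates in different women. We therefore derive a Poisson process in which the constant factor s is gamma distributed. As before, we assume that

$\lambda(t \mid s) = s\, D(t, r)^k\, \frac{\partial D(t, r)}{\partial t}$.

It follows that the intensity measure at time $t - t_0$ is $s\, h(V(t))$, where

$h(v) = \frac{1}{k + 1} \left[\log_2 \frac{v}{V_0}\right]^{k+1}$.

Now, the probability of N = n clinically detectable lymph nodes, given S = s, R = r and V = v, is

$P(N = n \mid S = s, R = r, V = v) = \frac{[s\, h(v)]^n}{n!}\, e^{-s\, h(v)}$.

Suppose that s is gamma distributed with shape γ1 and inverse scale γ2, $f_S(s) = \gamma_2^{\gamma_1} s^{\gamma_1 - 1} e^{-\gamma_2 s}/\Gamma(\gamma_1)$. It then follows that the probability of N = n clinically detectable lymph nodes, given R = r and V = v, is

$P(N = n \mid R = r, V = v) = \int_0^\infty \frac{[s\, h(v)]^n}{n!}\, e^{-s\, h(v)} f_S(s)\, ds = \frac{\Gamma(n + \gamma_1)}{\Gamma(\gamma_1)\, n!} \left(\frac{\gamma_2}{\gamma_2 + h(v)}\right)^{\gamma_1} \left(\frac{h(v)}{\gamma_2 + h(v)}\right)^{n}$. (15)

As before, (15) does not involve r. Therefore, the probability of N = n clinically detectable lymph nodes, given V = v, is

$P(N = n \mid V = v) = \frac{\Gamma(n + \gamma_1)}{\Gamma(\gamma_1)\, n!}\, p^{\gamma_1} (1 - p)^n$. (16)

This is a negative binomial distribution with size parameter γ1 and success probability $p = \gamma_2/(\gamma_2 + h(v))$.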
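The gamma-Poisson mixture can be verified numerically. This Python sketch, assuming the 0.5 mm threshold and placeholder values for k, γ1 and γ2, computes the negative binomial probabilities in closed form. It then compares them with a direct midpoint-rule integration of the Poisson probability against the gamma density of s. The helper names are hypothetical. The numerical check assumes γ1 ≥ 1, so that the integrand has no singularity at zero.

```python
import math

V0 = math.pi * 0.5 ** 3 / 6.0  # detectability threshold (0.5 mm sphere)


def h(v: float, k: float) -> float:
    """h(v) = log2(v / V0)^(k + 1) / (k + 1): Poisson mean per unit of the random factor s."""
    return math.log2(v / V0) ** (k + 1.0) / (k + 1.0)


def nb_pmf(n: int, v: float, k: float, g1: float, g2: float) -> float:
    """P(N = n | V = v): negative binomial with size g1 and p = g2 / (g2 + h(v))."""
    hv = h(v, k)
    log_p = math.log(g2 / (g2 + hv))
    log_1mp = math.log(hv / (g2 + hv))
    return math.exp(math.lgamma(n + g1) - math.lgamma(g1) - math.lgamma(n + 1)
                    + g1 * log_p + n * log_1mp)


def mixture_pmf_numeric(n: int, v: float, k: float, g1: float, g2: float, steps: int = 50_000) -> float:
    """Integrate Poisson(n; s * h(v)) * Gamma(s; shape g1, rate g2) over s (midpoint rule)."""
    hv = h(v, k)
    s_max = g1 / g2 + 40.0 * math.sqrt(g1) / g2   # far into the gamma tail
    ds = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        log_pois = n * math.log(s * hv) - s * hv - math.lgamma(n + 1)
        log_gam = g1 * math.log(g2) - math.lgamma(g1) + (g1 - 1.0) * math.log(s) - g2 * s
        total += math.exp(log_pois + log_gam)
    return total * ds


v10 = math.pi * 10.0 ** 3 / 6.0   # a 10 mm tumour
k, g1, g2 = 1.0, 2.0, 20.0        # placeholder values, not estimates
for n in (0, 3, 8):
    print(n, nb_pmf(n, v10, k, g1, g2), mixture_pmf_numeric(n, v10, k, g1, g2))
```

The mixture keeps the mean at $\gamma_1 h(v)/\gamma_2$ but inflates the variance by $\mathrm{mean}^2/\gamma_1$. This overdispersion is what lets the random effects model capture heterogeneity in spread.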

4 Likelihood for incident cases in the presence of screening

To jointly estimate the parameters of the processes, we derive a likelihood function for incident breast cancer cases collected in the presence of screening. This approach requires a model for the sensitivity of mammography screening. The outcome of a screening test depends primarily on two factors: tumour size and mammographic density. Mammographic density reflects the different tissues in the breast. Fatty tissue appears dark on a mammogram, whereas fibroglandular tissue is bright. Since tumours also appear bright, they can be concealed in fibroglandular regions. A widely used measure of mammographic density is percentage density. It is measured as the fraction of pixels within the breast region on the mammogram that have an intensity above a particular threshold.

For screening sensitivity, we adopt a model from Abrahamsson and Humphreys.[9] We assume that the probability of a positive screening test, given a tumour in the breast, is a function β(d, m) of the tumour diameter d and the percentage density m of the breast (17). Implicitly, we assume that screening test sensitivity is independent of lymph node spread. We can combine this model for screening sensitivity with the other models to write the joint likelihood of tumour size and number of affected lymph nodes, conditional on screening history and mode of detection. Two assumptions support using incident cases only: stable disease assumptions,[14,10] and tumour growth rate being independent of screening attendance. Under these, optimising this likelihood with incident cases alone has been shown to yield unbiased parameter estimates.[10] The stable disease assumptions are that:

(i) the rate of births in the population is constant across calendar time,
(ii) the distribution of age at tumour onset is constant across calendar time, and
(iii) the distribution of time to symptomatic detection is constant across calendar time.

Together, these assumptions imply a constant incidence of breast cancer in the population. We discuss these assumptions in the light of our analysis in Section 7.
Pathologists tend to round small tumour diameters to the nearest millimetre, and larger tumour diameters to the nearest 5 or 10 mm. We therefore divide tumour sizes into 24 diameter intervals (in millimetres) and express the likelihood in terms of these discrete size categories. Each likelihood contribution is schematically written as

P(size interval, number of affected lymph nodes | medical history).

Here medical history denotes the time of tumour detection, the mode of detection, the number and time points of previous screening visits, and percentage mammographic density. Conceptually, any type of medical history could be included, such as previous use of hormone replacement therapy. The likelihood is treated somewhat differently for screen detected and symptomatically detected cancers. For the sake of clarity, we omit mammographic density from the likelihood expressions. In the following sections, we express the likelihood mathematically using the following notation:

A – There is a tumour in the woman's breast at time t.
$B_0$ – A tumour is screen detected at time t.
$B_1, \dots, B_n$ – No tumour is detected at screenings 1 through n prior to detection (at $t_1, \dots, t_n$ years before time t).
$C_i$ – The tumour is in size interval i at time t.
$C_l^{\tau}$ – The tumour is in size interval l at τ years before time t.
D – The tumour is symptomatically detected at time t.
$N_j$ – The number of lymph nodes affected at time t is j.
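Several quantities below condition on a single volume per size interval, namely the geometric mean of the interval's bounds. The following Python sketch, assuming spherical tumours, shows one convenient property of this choice. The geometric mean of the volume bounds equals the volume of a sphere whose diameter is the geometric mean of the diameter bounds, so it does not matter whether bounds are given as diameters or volumes. The interval bounds in the example are hypothetical; the 24 intervals used in the analysis are not listed in this section.

```python
import math


def sphere_volume(d_mm: float) -> float:
    """Volume (mm^3) of a sphere of diameter d_mm."""
    return math.pi * d_mm ** 3 / 6.0


def conditioning_volume(d_low: float, d_high: float) -> float:
    """Geometric mean of the volume bounds of a diameter interval [d_low, d_high)."""
    return math.sqrt(sphere_volume(d_low) * sphere_volume(d_high))


# Hypothetical interval bounds in mm, for illustration only.
bounds = [0.5, 5.0, 10.0, 15.0, 20.0, 30.0]
for lo, hi in zip(bounds, bounds[1:]):
    v = conditioning_volume(lo, hi)
    # Same value as the volume of a sphere with the geometric-mean diameter sqrt(lo * hi).
    print(f"[{lo}, {hi}) mm -> v = {v:.2f} mm^3, equivalent diameter {math.sqrt(lo * hi):.2f} mm")
```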

4.1 Likelihood for screen detected cases

Given that a tumour is screen detected, and given the n previous negative screens, the probability of the tumour being in size interval i with j affected lymph nodes is $P(C_i, N_j \mid B_0, B_1, \dots, B_n)$. We rewrite this probability algebraically and use the independence of the screening test and lymph node status to obtain equation (18). In this equation, q indexes the size interval of the tumour at the time of the first previous screen, t1 years before detection. The value q = 0 represents a tumour that is too small to be clinically detectable, i.e. one with a diameter of less than 0.5 mm. When there is no screen prior to detection, we omit the last summation from the product.

4.2 Likelihood for symptomatic cases

Given that a tumour is symptomatically detected, the probability of the tumour being in size interval i with j affected lymph nodes is $P(C_i, N_j \mid D, B_1, \dots, B_n)$. As for screen detected cases, we rewrite this probability algebraically. Here, however, we also use the independence of nodal involvement and symptomatic detection, which gives equation (19). As before, when there is no screen prior to detection, we omit the last summation from the product.
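For symptomatic cases, the size term can be computed from the distribution of the volume at symptomatic detection. Assume the detection rate is $\eta V(t)$ from V0 onwards. Given r, the survival probability to volume v is then $\exp(-\eta r (v - V_0))$. Averaging over the gamma distribution of r (shape τ1, rate τ2) gives $P(V_{\mathrm{sd}} > v) = (1 + \eta (v - V_0)/\tau_2)^{-\tau_1}$, the closed form attributed to Plevritis et al. The Python sketch below implements this result and checks it against a direct numerical integration over r. It assumes spherical tumours and a 0.5 mm threshold. The parameter values are placeholders, not estimates, and the function names are hypothetical.

```python
import math

V0 = math.pi * 0.5 ** 3 / 6.0  # detectability threshold (0.5 mm sphere)


def sphere_volume(d_mm: float) -> float:
    return math.pi * d_mm ** 3 / 6.0


def surv_symptomatic(v: float, tau1: float, tau2: float, eta: float) -> float:
    """P(V_sd > v): gamma(tau1, tau2) average of exp(-eta * r * (v - V0))."""
    if v <= V0:
        return 1.0
    if math.isinf(v):
        return 0.0
    return (1.0 + eta * (v - V0) / tau2) ** (-tau1)


def surv_symptomatic_numeric(v: float, tau1: float, tau2: float, eta: float, steps: int = 100_000) -> float:
    """Same quantity by integrating over r numerically (midpoint rule), as a check."""
    r_max = tau1 / tau2 + 40.0 * math.sqrt(tau1) / tau2
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        log_pdf = tau1 * math.log(tau2) - math.lgamma(tau1) + (tau1 - 1.0) * math.log(r) - tau2 * r
        total += math.exp(log_pdf - eta * r * (v - V0))
    return total * dr


def prob_interval_symptomatic(d_low: float, d_high: float, tau1: float, tau2: float, eta: float) -> float:
    """P(C_i | D): probability that symptomatic detection occurs in diameter interval [d_low, d_high)."""
    return (surv_symptomatic(sphere_volume(d_low), tau1, tau2, eta)
            - surv_symptomatic(sphere_volume(d_high), tau1, tau2, eta))


tau1, tau2, eta = 2.0, 4.0, 5e-4   # placeholder parameters
bounds = [0.5, 5.0, 10.0, 20.0, 30.0, 50.0, math.inf]
probs = [prob_interval_symptomatic(lo, hi, tau1, tau2, eta) for lo, hi in zip(bounds, bounds[1:])]
print([round(p, 3) for p in probs], round(sum(probs), 6))
```

The interval probabilities telescope, so over a partition that starts at V0 they sum to one.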

4.3 Calculating the likelihood

The likelihoods in equations (18) and (19) are the joint probabilities of a tumour belonging to size interval i and having j affected lymph nodes. They are conditional on mode of detection, number and times of previous negative screens, and mammographic density. Density is omitted from the likelihood expressions for simplicity, but is included in our calculations for the analyses presented in the next section. The likelihood contains seven different quantities. We express them in terms of models (3) to (5), the screening sensitivity (17), and the lymph node models (7) and (8), (10) and (11), (12) and (13), or (15) and (16).

The first quantity is $P(B_0 \mid C_i)$, the probability of a positive screen given the size of the tumour. This is the screening sensitivity, which we model using equation (17). This quantity adjusts for the fact that the size distribution of screen detected tumours differs from that of symptomatic tumours.

The second quantity is $P(C_i \mid D)$, the probability of being in size interval i given symptomatic detection, which equals $P(C_i \cap D)/P(D)$. Without further information, P(D) is constant. This quantity is therefore the probability that symptomatic detection occurs while the tumour is in size interval i. Based on equations (1), (4), and (5), Plevritis et al.[7] showed that the volume at symptomatic detection, $V_{\mathrm{sd}}$, satisfies

$P(V_{\mathrm{sd}} > v) = \left(1 + \frac{\eta\, (v - V_0)}{\tau_2}\right)^{-\tau_1}, \quad v \ge V_0$, (20)

where η is the proportionality constant of the symptomatic detection rate (5). The proof is based on integrating (5) from V0 to infinity. Our value of V0 is the same as in Plevritis et al., and the tumour starts growing before reaching this value, so we can use equation (20) to calculate $P(C_i \mid D)$. Note that this factor conditions only on the tumour being symptomatically detected. It does not take previous negative screens into account; these are instead accounted for by quantities five and seven.

The third quantity is $P(C_i \mid A)$, the probability that the tumour is in size interval i, given that there is a clinically detectable tumour in the breast.
Isheden and Humphreys[10] derived an expression that this third quantity satisfies. The expression involves $c_i^{\mathrm{l}}$ and $c_i^{\mathrm{u}}$, the lower and upper bounds of tumour size interval i, and $\bar{c}_i$, some average value between $c_i^{\mathrm{l}}$ and $c_i^{\mathrm{u}}$. We calculate this quantity with $\bar{c}_i$ taken as the geometric mean of $c_i^{\mathrm{l}}$ and $c_i^{\mathrm{u}}$. As with quantity two, this quantity does not take into account previous negative screens or the current positive screen. Those factors are instead adjusted for by quantities one, five, and six.

The fourth quantity is $P(N_j \mid C_i)$, the probability of having j affected lymph nodes when the tumour is in size interval i. To calculate this probability we use equation (8) for model A, equation (11) for model B, equation (13) for the class of lymph spread models, and equation (16) for the random effects model. These equations condition on a single value of the volume. To approximate the probability conditional on a tumour size interval, we condition on the geometric mean of the upper and lower bounds of size interval i. This factor describes only the number of lymph nodes conditional on the tumour size interval. The screening history is adjusted for by quantities one, five, six, and seven.

The fifth quantity is $P(B_1, \dots, B_n \mid C_q^{t_1}, C_i)$, the probability of n negative previous screens. It is conditional on the size of the tumour at the first previous negative screen and the size at detection. This probability is calculated as

$P(B_1, \dots, B_n \mid C_q^{t_1}, C_i) = \prod_{m=1}^{n} P(B_m \mid d_m)$,

where $P(B_m \mid d_m)$ is the probability of a negative screen at the mth screen prior to detection, calculated from equation (17). Here $d_m$ is the diameter of the tumour at that screen. The tumour sizes at the previous screens are calculated by projecting backwards along the growth trajectory that passes through the midpoints of intervals q and i. This is one of four quantities that adjust for the screening history in the likelihood.

The sixth quantity is $P(C_q^{t_1} \mid C_i, N_j)$, the probability of being in size interval q at time point t1, given that the tumour is found in size interval i with j affected lymph nodes.
The sixth quantity is calculated by marginalising over the growth rate:

$P(C_q^{t_1} \mid C_i, N_j) = \int_0^\infty P(C_q^{t_1} \mid C_i, R = r)\, f(r \mid C_i, N_j)\, dr$.

We approximate $P(C_q^{t_1} \mid C_i, R = r)$ by 1 if a tumour in size interval i, growing with inverse growth rate r, passes through the qth size interval t1 years before detection, and by 0 otherwise. For models A and B and for the class of lymph spread models, it holds that $f(r \mid C_i, N_j) \approx f(r \mid V = v)$, where v is the geometric mean of the upper and lower bounds of size interval i. In all three cases, this follows from two facts. First, N is conditionally independent of R given V. Second, Theorem 3 in Isheden and Humphreys[10] gives the conditional density of r given tumour volume v. This quantity accounts for the tumour growth rate when adjusting for the screening history in the likelihood for screen detected tumours.

The seventh quantity is $P(C_q^{t_1} \mid D, C_i, N_j)$, the probability of being in size interval q at time point t1, given that the tumour is symptomatically detected in size interval i with j affected lymph nodes. It is calculated in the same way as the sixth quantity. It accounts for the tumour growth rate when adjusting for the screening history in the likelihood for symptomatic tumours.

For the four lymph node spread models described here, the likelihood is separable: it factors into a size component and a nodes component. Consider, for example, the likelihood for screen detected cases. Once $f(r \mid C_i, N_j) = f(r \mid C_i)$, the number of nodes j enters only through $P(N_j \mid C_i)$. The log-likelihood is therefore the sum of two terms. One involves only the growth, detection and sensitivity parameters, and the other involves only the lymph node parameters. The same holds for the likelihood for symptomatic cases.

For the models of Hanin and Yakovlev[14] and Shwartz,[13] the likelihood calculation is not as straightforward. The quantity $P(C_q^{t_1} \mid C_i, N_j)$ has to be calculated by marginalising over the growth rate using $f(r \mid C_i, N_j)$. In both cases $f(r \mid C_i, N_j) \neq f(r \mid C_i)$, which means that the likelihood does not separate into two components that can be optimised independently. For example, an explicit expression for $f(r \mid V = v, N = j)$ can be derived for Hanin and Yakovlev's lymph node spread model, i.e. with $\lambda(t) = \gamma V(t)$. For their model, $P(N_j \mid C_i)$ is calculated by integrating the probability of j affected nodes given r and v over the conditional distribution of r given v. The likelihood that we describe in this section is complex. It relies on several approximations that are needed mainly to account for discretisation in the estimation procedure.
To verify that we implemented the methods correctly, we performed a simulation study, the aim of which was to show that the true parameter values can be recovered accurately by maximising the likelihood. The results are shown in Appendix 1.

5 Joint modelling of tumour size and lymph node spread – a study of incident invasive breast cancer in post-menopausal women

We illustrate the joint approach by fitting models A and B to 1860 cases of incident invasive breast cancer from a case-control study of post-menopausal breast cancer[18] known as CAHRES. The study invited all Swedish-born women aged 50–74 who were diagnosed with invasive breast cancer in Sweden between October 1993 and March 1995. The participation rate was 84% (n = 3345). In extensions of the study, analog mammographic images were retrieved from mammography screening units and radiology departments managing mammography screening in Sweden. Information on tumour size, screening history, and mode of detection was collected from the Swedish Cancer Registry and the Stockholm-Gotland Breast Cancer Registry. The collection of these data has been described previously by Rosenberg et al.[19,20] and Eriksson et al.[21] We excluded women from our analysis if they had missing lymph node status, lacked written consent, had a previous or other cancer diagnosis, had a non-invasive breast cancer diagnosis, were diagnosed outside the study period, were pre-menopausal, had unknown age at menopause, lacked screening information, lacked images, had a missing mode of diagnosis, or had missing tumour size. After these exclusions, 1860 cases were available for analysis. Descriptive information on these cases is presented in Table 1.

Table 1.

Comparison of screening and symptomatically detected cancers in CAHRES.

	Screening	Symptomatic
Number of cases	1133	727
Tumour size in mm, median (quartiles)	12 (9, 18)	20 (13, 26)
Percentage density, median (quartiles)	13.6 (6.8, 23.3)	15.7 (8.6, 28.1)
Time since last negative screen in years[a], median (quartiles)	2.0 (1.8, 2.1)	1.4 (1.0, 2.0)
Number of previous screens
Cases with no previous screen	133	197
Cases with one previous screen	214	105
Cases with two previous screens	658	247
Cases with three or more previous screens	128	178
Number of affected lymph nodes
Cases with no affected lymph nodes	890	438
Cases with one affected lymph node	103	104
Cases with two affected lymph nodes	45	55
Cases with three or more affected lymph nodes	95	130

[a] Among cases with at least one negative screen.

We fitted model A and model B by maximising the likelihood over the parameters τ₁, τ₂, η, β₁, β₂, β₃, and σ_A or σ_B, respectively. Parameter estimates are given in Table 2. For each model, we used 200 non-parametric bootstrap replicates to estimate 95% coverage intervals, using the percentile method. The two models have the same number of parameters, and their optimised negative log-likelihoods (7700.5 for model A, 7342.3 for model B) show that model B provides a much better fit to the data than model A. Model-based estimates of expected lymph node spread as a function of tumour size are plotted alongside observed numbers in Figure 1; neither model fits the data well. In the figure, each circle represents the observed average number of affected lymph nodes within a tumour size interval, and the bars through each circle represent 95% confidence intervals, obtained via bootstrapping.
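The percentile bootstrap used for the coverage intervals can be sketched generically: resample cases with replacement, re-estimate, and read off the 2.5th and 97.5th percentiles of the replicate estimates. In the joint model each replicate would require refitting the full likelihood; the sketch below substitutes a toy estimator (the sample mean), and the function names and linear-interpolation quantile rule are assumptions, not the authors' code.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def _quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    h = (len(sorted_vals) - 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def percentile_bootstrap_ci(data: Sequence, estimator: Callable[[list], float],
                            n_boot: int = 200, level: float = 0.95,
                            seed: int = 1) -> Tuple[float, float]:
    """Non-parametric percentile bootstrap interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate resamples n cases with replacement and re-estimates.
    reps = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    return _quantile(reps, alpha), _quantile(reps, 1.0 - alpha)


# Toy usage: 200 replicates of the sample mean of 50 values.
values = [float(x) for x in range(1, 51)]
lo, hi = percentile_bootstrap_ci(values, lambda xs: sum(xs) / len(xs), n_boot=200, seed=0)
```

Resampling whole cases (rather than residuals) is what makes the procedure non-parametric; a fixed seed makes the interval reproducible.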

Table 2.

Parameter estimates in joint models of tumour size and lymph node spread, together with bootstrapped 95% coverage intervals, based on 1860 post-menopausal breast cancer cases (CAHRES).

Parameter	Model A	Model B
τ₁	2.12 (1.65, 3.03)	2.17 (1.55, 3.12)
τ₂	4.26 (2.76, 7.51)	4.40 (2.93, 8.05)
-log(η)	8.09 (7.53, 8.55)	8.09 (7.53, 8.58)
β₁	-4.70 (-5.16, -4.40)	-4.70 (-5.24, -4.41)
β₂	0.58 (0.48, 0.78)	0.58 (0.50, 0.81)
β₃	-2.10 (-3.88, -0.93)	-2.11 (-3.77, -0.77)
σ_A	0.00017 (0.00014, 0.00020)	–
σ_B	–	0.010 (0.0093, 0.012)
-log L(θ)	7700.5	7342.3

Figure 1.

Model-based estimates of expected lymph node spread as a function of tumour size, based on CAHRES. Circles and bars represent averages and 95% confidence intervals of numbers of lymph nodes affected within each tumour size interval. Model A (dotted) produces excessive spread at large tumour sizes, while model B (solid) underestimates spread at large tumour sizes.

We attempted to jointly fit the tumour growth model with the lymph spread models of Hanin and Yakovlev and of Shwartz. In both cases, the models did not converge, and we were not able to obtain parameter estimates. Their lymph spread models are not consistent with the data (see Section 2). Had we been able to get these joint models to converge, the analysis in Section 2 indicates that the lymph node models of Hanin and Yakovlev and of Shwartz would have overestimated the number of affected lymph nodes at large tumour sizes and underestimated it at small tumour sizes, to the same extent as model A does. Although model B underestimates the number of affected lymph nodes at larger tumour sizes, overall it provides a better fit to the data than model A and the models of Hanin and Yakovlev and of Shwartz. Next, we experimented with other functional forms from our new class of lymph node spread models. In Table 3, we display estimates of the parameter σ_C from equations (12) and (13) for k = 1 to 6, together with optimised log-likelihood values from fitting the joint models of tumour size and spread to the 1860 breast cancer cases.

Table 3.

Parameter estimates and log-likelihood values for different functional forms of the Poisson lymph node spread model (CAHRES).

Parameter	k = 1	k = 2	k = 3	k = 4	k = 5	k = 6
σ_C	1.03·10⁻²	9.61·10⁻⁴	8.80·10⁻⁵	7.88·10⁻⁶	6.90·10⁻⁷	5.88·10⁻⁸
-log L(θ)	7342.3	7201.6	7107.5	7056.6	7046.2	7074.1

Among integer values of k, the best model fit was achieved for k = 5, with a log-likelihood difference of 296.1 compared with the spread component of model B (k = 1). Varying both σ_C and k, we obtained an optimal value of k = 4.75 and constructed a 95% confidence interval for k from the profile likelihood. In Figure 2, we plot estimates of expected lymph node spread based on the model with k = 5, alongside those based on model A.
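A profile-likelihood interval for k can be sketched as follows, assuming the usual likelihood-ratio construction: for each fixed k, minimise the negative log-likelihood over the remaining parameters (here σ_C), then keep the values of k whose deviance from the minimum is within the 95% χ² (1 df) critical value. The quadratic profile below is a made-up illustration, not the CAHRES fit, and the sketch assumes a unimodal profile so that the accepted grid points form a single interval.

```python
import math
from typing import Callable, Sequence, Tuple

CHI2_1DF_95 = 3.841458820694124  # 95% quantile of the chi-squared distribution, 1 df


def profile_ci(grid: Sequence[float], profile_nll: Callable[[float], float],
               crit: float = CHI2_1DF_95) -> Tuple[float, float, float]:
    """Grid-based profile-likelihood estimate and confidence interval.

    profile_nll(k) should return the negative log-likelihood minimised over all
    other parameters with k held fixed. The interval is the set of grid values
    with 2 * (nll(k) - min nll) <= crit (inverting the likelihood-ratio test).
    """
    values = [profile_nll(k) for k in grid]
    best = min(range(len(grid)), key=values.__getitem__)
    inside = [k for k, v in zip(grid, values) if 2.0 * (v - values[best]) <= crit]
    return grid[best], min(inside), max(inside)


# Hypothetical profile, for illustration only.
def toy_profile_nll(k: float) -> float:
    return 7000.0 + 2.0 * (k - 4.75) ** 2


grid = [1.0 + 0.01 * i for i in range(701)]
k_hat, k_lo, k_hi = profile_ci(grid, toy_profile_nll)
```

The grid resolution limits the precision of both the estimate and the interval end points; a root-finder on the deviance could refine the end points if needed.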

Figure 2.

Model-based estimates of expected lymph node spread as a function of tumour size (CAHRES). Circles and bars represent averages and 95% confidence intervals of numbers of lymph nodes affected within each tumour size interval. The spread component of model A (dotted) produces excessive spread in large tumours, whereas, in terms of expected numbers of affected lymph nodes, the spread model with k = 5 (solid) fits well at all tumour sizes.

We fitted joint models of tumour size and lymph node spread based on the random effects models described by equations (15) and (16), for k = 1 to 6; see Table 4. Allowing for heterogeneity in rates of spread improved model fit tremendously for all considered integer values of k: improvements in optimised log-likelihood values ranged from 1359.8 to 1617.9, and differences in model fit across values of k also diminished greatly. Varying γ₁, γ₂, and k, we obtained an estimate of 4.11 for k. In Figure 3, we plot estimates of the expected number of affected lymph nodes based on the random effects lymph spread model with k = 4. These estimates pass through all the 95% confidence intervals except one, which they come very close to. In Figure 4, we plot the observed numbers of lymph nodes (bars) within two size categories, along with the model-predicted probabilities at the end points of the intervals. The random effects model accounts for overdispersion relative to the Poisson model. We note that if model A were also represented in this plot, whether or not it allowed for overdispersion, it would overestimate the mean number of affected lymph nodes for large tumour sizes.
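Why a random effect on the rate of spread produces overdispersion can be shown with the standard gamma-Poisson mixture, which yields a negative binomial count (the label used for the random effects model in Table 6). This is a generic sketch: the mapping between the paper's γ₁, γ₂ and the gamma shape and mean, and the dependence of the mean on tumour size through k, are not reproduced, and the numbers used are purely illustrative.

```python
import math


def poisson_pmf(n: int, mu: float) -> float:
    return math.exp(-mu + n * math.log(mu) - math.lgamma(n + 1))


def gamma_poisson_pmf(n: int, mu: float, shape: float) -> float:
    """P(N = n) when N | lam ~ Poisson(lam) and lam ~ Gamma(shape, scale=mu/shape).

    The marginal distribution is negative binomial with mean mu and
    variance mu + mu**2 / shape, i.e. overdispersed relative to the Poisson.
    """
    log_p = (math.lgamma(n + shape) - math.lgamma(shape) - math.lgamma(n + 1)
             + shape * math.log(shape / (shape + mu))
             + n * math.log(mu / (shape + mu)))
    return math.exp(log_p)


def moments(pmf, n_max: int = 400):
    """Total probability, mean and variance, truncating the support at n_max."""
    probs = [pmf(n) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return sum(probs), mean, var


# Illustrative values only: mean of 2 affected nodes, gamma shape 0.5.
total_p, mean_p, var_p = moments(lambda n: poisson_pmf(n, 2.0))
total_nb, mean_nb, var_nb = moments(lambda n: gamma_poisson_pmf(n, 2.0, 0.5))
```

Both distributions have mean 2, but the mixture has variance 2 + 2²/0.5 = 10 instead of 2, with more mass at zero and in the upper tail, which is the pattern a plain Poisson model cannot reproduce.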

Figure 3.

Model-based estimates of expected lymph node spread as a function of tumour size (CAHRES). Circles and bars represent averages and 95% confidence intervals of numbers of lymph nodes affected within each tumour size interval. The spread component of model A (dotted) is plotted alongside the random effects spread model with k = 4 (solid).

Figure 4.

Observed and predicted numbers of affected lymph nodes (CAHRES). The bars represent the observed numbers of affected lymph nodes, within tumour size interval 10–15 mm (left) and 35–45 mm (right), in the CAHRES dataset. Circles represent predicted probabilities from the Poisson model with k = 5, estimated on the CAHRES data set, and dots represent predicted probabilities from the random effects Poisson model with k = 4, also estimated on the CAHRES data set.

Finally, for CAHRES, we divided the data set into screen detected and symptomatically detected cases, and plotted 95% confidence intervals for the average number of affected lymph nodes, along with the model-based estimates obtained from fitting model A and the random effects Poisson model with k = 4; see Figure 5. Although the estimates based on our selected model intersect all but one of the confidence intervals, there is some suggestion (at large tumour sizes) that the model could be underestimating the expected number of affected lymph nodes in symptomatic cases and overestimating it in screening cases.

Figure 5.

Model-based estimates of expected lymph node spread as a function of tumour size (CAHRES). Left: circles and bars represent averages and 95% confidence intervals of numbers of lymph nodes affected within each tumour size interval for screen detected cancers. Right: the corresponding quantities for symptomatically detected cancers. In both panels, the spread component of model A (dotted) is plotted alongside the random effects spread model with k = 4 (solid).

6 Validation study of the random effects lymph node spread model

We attempted to validate our lymph node spread model using an independent data set, known as Libro-1, of women diagnosed with invasive breast cancer between January 1, 2001 and December 31, 2008 in the Stockholm-Gotland healthcare region in Sweden. These women were identified through the Regional Breast Cancer Register. Information on diagnosis and tumour characteristics was available, but information on the timing and number of screening rounds was not. Women were excluded if they were younger than 50 years, underwent diagnostic operations, were pre-operatively diagnosed with in situ breast cancer but pathology reports showed an invasive component, had incorrect dates of diagnosis, had more than 63 days between diagnosis and surgery, or had missing tumour size or lymph node status. In 2007, the registers changed the definitions and procedures for evaluating lymph node spread; to keep the data set comparable to CAHRES, we excluded women who were categorised according to the new standard. The final data set consisted of 3961 women. In Figure 6, we plot 95% confidence intervals of the number of affected lymph nodes within tumour size intervals, based on the Libro-1 data (bars), along with the expected number of lymph nodes as a function of tumour diameter, based on the random effects model with k = 4 estimated from the CAHRES data. We also fitted the random effects model to the Libro-1 data for k = 1 to 6; see Table 5. Among integer values of k, model fit was again best at k = 4. Estimates of expected numbers of affected lymph nodes for this model are plotted as the solid line in Figure 6. In Figure 7, we plot the observed numbers of lymph nodes (bars) within two size categories for the Libro-1 data, along with the model-predicted probabilities at the end points of the intervals, based on the random effects models fitted to the CAHRES data and to the Libro-1 data (i.e. self-trained), and on the Poisson model with k = 5 estimated from CAHRES without random effects. Even with parameter values obtained from the CAHRES data, the random effects (k = 4) lymph node spread model seems to fit the Libro-1 data on numbers of affected lymph nodes extremely well.

Figure 6.

Model-based estimates of expected lymph node spread as a function of tumour size based on the random effects Poisson model (k = 4), estimated on CAHRES (dotted line) and Libro-1 (solid line), along with 95% confidence intervals of average lymph node spread obtained from Libro-1.

Table 5.

Parameter estimates and log-likelihood values for different functional forms of the random effects lymph node spread model (Libro-1).

Parameter	k = 1	k = 2	k = 3	k = 4	k = 5	k = 6
log(γ₁)	–1.31	–1.24	–1.20	–1.20	–1.21	–1.24
log(γ₂)	3.39	5.84	8.25	10.63	12.98	15.30
-log L(θ)	4984.8	4938.9	4914.7	4911.6	4927.6	4959.8

Figure 7.

Observed and predicted numbers of affected lymph nodes (Libro-1). The bars represent the observed numbers of affected lymph nodes, within tumour size interval 10–15 mm (left) and 35–45 mm (right), in the Libro-1 dataset. Circles represent predicted probabilities from the Poisson model with k = 5, estimated on the CAHRES data set, dots represent predicted probabilities from the random effects Poisson model with k = 4, also estimated on the CAHRES data set, and crosses represent estimated probabilities from the random effects Poisson model with k = 4, estimated on the Libro-1 data set.

When estimating γ₁, γ₂, and k from the Libro-1 data, we obtained an estimate of 3.65 for k. The 95% confidence intervals for k estimated from Libro-1 and from CAHRES overlapped, and both included k = 4.

7 Discussion

Continuous growth models offer an interesting alternative to multi-state Markov models for studying the natural history of breast cancer. Previously proposed continuous growth models have components for tumour growth, time to symptomatic detection, and screening sensitivity. The aim of this article has been to add a component for lymph node spread. We began by reviewing the literature on breast cancer lymph spread models. We identified two models, one from Hanin and Yakovlev[14] and one from Shwartz,[13] the latter also used by the CISNET University of Wisconsin group. Both models are Poisson processes with intensity functions that depend on tumour volume. In this paper, we show that these models have two weaknesses: first, slow growing tumours spread more quickly than fast growing tumours; second, the rate of additional lymph node spread grows excessively with increasing tumour volume. To avoid these two weaknesses, we have improved upon the existing models and developed new models of lymph node spread in a step-by-step fashion, focusing first on the mean structure and then extending the lymph node spread model to incorporate random effects. The first step was to construct model A, which avoids an inverse relation between tumour growth rate and lymph node spread. This was done by removing the terms in the intensity function of Shwartz's[13] model that produced the inverse relationship. We were able to estimate the parameters of model A jointly with the tumour growth model (see Table 2), which was not the case for the models of Hanin and Yakovlev[14] or Shwartz. Since we could not make those models converge, and since model A removes the inverse relationship between tumour growth rate and lymph node spread, we consider model A an improvement on the models of Hanin and Yakovlev and of Shwartz. In the second step, we created model B.
At this step, we addressed the second weakness. Model A assumes a linear relationship between the expected number of affected lymph nodes and tumour volume. Because tumour growth is assumed to be exponential in time, this linear relationship implies that the number of affected lymph nodes grows exponentially with time. To decrease the rate of spread in the model, we introduce a logarithmic term: we assume that the intensity function depends on the number of cell divisions rather than on the volume itself, which is proportional to the logarithm of the tumour volume divided by the volume of a single cell. We found that model B was an improvement on model A, although it overestimated spread at small tumour sizes and underestimated spread at large tumour sizes. Model B removes the exponential spread behaviour of previous models in the literature and provides a basis on which to build further. We tested different shapes of the spread function by introducing a class of lymph spread models. These models differ in shape through a factor k, with model B as the special case k = 1. Within this class, we found that k = 5 provided a good fit; in terms of expected values, this model fitted well across all tumour volumes. By extending the lymph node spread models to allow for random effects, we were able to incorporate heterogeneity in rates of lymph node spread. This extension turned out to be extremely important and corrected for overdispersion relative to the classical Poisson models. In fitting the overdispersed random effects model, we obtained a point estimate of k = 4.11 using the CAHRES data and k = 3.65 using the Libro-1 data. The 95% confidence intervals for k estimated from the two data sets overlapped and included k = 4, a value that provided a good fit in both data sets. The analyses in this paper rely on the assumptions of a stable disease population and of screening attendance being independent of tumour growth rate.
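The argument that a linear dependence on volume implies exponential spread in time, while a logarithmic term removes it, can be made explicit with a short derivation. As an illustration only, it assumes that the expected number of affected nodes is directly proportional to the stated quantity (volume for a model A-type mean, number of doublings since a single cell for a model B-type mean); the exact forms are equations (8) and (11), which are not reproduced here. Here r is the inverse growth rate and V_cell the volume of a single cell.

```latex
% Exponential growth with inverse growth rate r
V(t) = V_0 \, e^{t/r}

% Mean proportional to volume: exponential in t
\mu_A(t) = \sigma_A V(t) = \sigma_A V_0 \, e^{t/r}

% Mean proportional to the number of doublings since a single cell: linear in t
D(t) = \log_2 \frac{V(t)}{V_{\mathrm{cell}}}
     = \log_2 \frac{V_0}{V_{\mathrm{cell}}} + \frac{t}{r \ln 2},
\qquad
\mu_B(t) = \sigma_B \, D(t)
```

Under these assumptions each additional unit of time multiplies μ_A by e^{1/r} but only adds a constant σ_B/(r ln 2) to μ_B.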
For the joint analysis of size and lymph node spread, we have worked with CAHRES, a nationwide study with an 84% participation rate. The study invited all Swedish-born women aged 50 to 74 who were diagnosed with invasive breast cancer in Sweden between October 1993 and March 1995. In the absence of screening, a population satisfying the stable disease assumptions will exhibit a constant incidence of breast cancer.[10] Once a screening programme has run for a number of years, we again expect a constant incidence if the stable disease assumptions hold. Of the 26 counties in Sweden, 22 had implemented screening programmes by, and in many cases well before, 1990,[22] and data from the Swedish Cancer Registry show that breast cancer incidence was approximately constant between 1991 and 1997.[23] In the current study, all women were post-menopausal at diagnosis. It is unlikely that a large fraction of them took part in extra surveillance for breast cancer, so the assumption that screening attendance is independent of tumour growth rate is likely to be reasonable. The joint model of tumour growth and lymph node spread has two main areas of application. The first is the evaluation of screening programmes, which can be done via microsimulation. Several research groups[12,24] have used Markov models to simulate the natural history of breast cancer. As the number of disease states increases, these models become impractical, especially if the objective is to simulate screening options based on individual risk factors; for this purpose, continuous growth models present a strong alternative. The second area of application is the study of factors behind growth and spread.
Abrahamsson et al.[11] used continuous growth models to regress the log inverse growth rate on BMI, and the log of the hazard proportionality constant in the model for time to symptomatic detection on breast size, and Isheden and Humphreys[10] studied in detail the relationship between mammographic density, tumour size, and screening sensitivity. For the new sub-model for lymph node spread, we are currently working on extensions to study associations with observable factors, both traditional breast cancer risk factors and tumour characteristics/subtypes. As we have pointed out, our models assume several well known biological properties of cancer. The fact that the k = 4 model fits better than the k = 1 model, however, suggests that there may be a degree of genomic instability as breast cancer cells divide. Finally, we point out that we have not been able to specify a tractable model in which fast growing tumours spread more rapidly than slow growing ones. It is possible that an alternative model with this characteristic would also provide a good fit to incidence data on tumour size and lymph node spread.

Table 4.

Parameter estimates and log-likelihood values for different functional forms of the random effects lymph node spread model (CAHRES).

Parameter	k = 1	k = 2	k = 3	k = 4	k = 5	k = 6
log(γ₁)	–1.58	–1.51	–1.46	–1.43	–1.43	–1.44
log(γ₂)	3.13	5.58	8.00	10.38	12.73	15.05
−logL(θ)	5724.4	5702.1	5688.5	5683.5	5686.4	5696.4

Table 6.

Biases, standard errors, and coverages of 95% confidence intervals based on 500 randomly generated cohorts.

Model	Parameter	True value	Bias (%)	Standard error	Coverage of 95% CI
All models	τ₁	2.36	+2.2%	0.008	95.2%
	τ₂	4.16	+3.5%	0.023	95.4%
	-log(η)	8.36	+0.2%	0.004	94.4%
	β₀	–5.2	−7.6%	0.006	21.2%[a]
	β₁	0.560	−4.9%	0.001	81.0%
Model A	σ_A	0.000170	+0.1%	1·10⁻⁷	94.0%
Model B	σ_B	0.010	0.0%	7·10⁻⁶	93.8%
Extended Poisson	σ_C	7.88·10⁻⁶	0.0%	6·10⁻⁹	95.8%
Negative binomial	log(γ₁)	–1.43	+0.2%	0.002	94.0%
	log(γ₂)	10.38	+0.3%	0.003	94.6%

The coverage of β0 is highly dependent on the parametrisation of the model for screening sensitivity. Changing the location of the model, we can achieve 95% coverage probability, as explained in the text below.
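The summaries in Table 6 can be computed from the simulated cohorts roughly as follows. This is a sketch under stated assumptions: bias is taken relative to the true value, (mean − true)/true, which fixes a sign convention for negative parameters, and the "Standard error" column is read as the Monte Carlo standard error of the mean estimate, SD/√n; both readings are assumptions. The function name and the five toy replicates are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def simulation_summary(true_value: float, estimates: Sequence[float],
                       cis: Sequence[Tuple[float, float]]) -> dict:
    """Bias, Monte Carlo standard error and CI coverage across simulated cohorts.

    bias_pct : 100 * (mean estimate - true value) / true value   (assumed convention)
    mc_se    : sample SD of the estimates / sqrt(number of cohorts)
    coverage : fraction of confidence intervals containing the true value
    """
    n = len(estimates)
    mean_est = statistics.fmean(estimates)
    return {
        "bias_pct": 100.0 * (mean_est - true_value) / true_value,
        "mc_se": statistics.stdev(estimates) / math.sqrt(n),
        "coverage": sum(lo <= true_value <= hi for lo, hi in cis) / n,
    }


# Toy usage with five hypothetical replicates of a parameter whose true value is 1.0.
summary = simulation_summary(
    1.0,
    [1.0, 1.2, 0.8, 1.1, 0.9],
    [(0.9, 1.1), (1.1, 1.3), (0.7, 0.9), (0.95, 1.25), (0.85, 1.05)],
)
```

With 500 cohorts, as in Table 6, a coverage estimate near 95% has a Monte Carlo standard error of about √(0.95 · 0.05 / 500) ≈ 0.01, so values between roughly 93% and 97% are consistent with nominal coverage.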

References

M Schwartz. A biomathematical approach to clinical tumor growth. Cancer, 1961.

Harald Weedon-Fekjaer, Steinar Tretli, Odd O Aalen. Estimating screening test sensitivity and tumour progression using tumour size and time since previous screening. Stat Methods Med Res, 2010.

Louise Eriksson, Kamila Czene, Lena Rosenberg, Keith Humphreys, Per Hall. The influence of mammographic density on breast tumor characteristics. Breast Cancer Res Treat, 2012.

Sylvia K Plevritis, Peter Salzman, Bronislava M Sigal, Peter W Glynn. A natural history model of stage progression applied to breast cancer. Stat Med, 2007.

M Toi, T Taniguchi, T Ueno, M Asano, N Funata, K Sekiguchi, H Iwanari, T Tominaga. Significance of circulating hepatocyte growth factor level as a prognostic indicator in primary breast cancer. Clin Cancer Res, 1998.

Per-Henrik Zahl, Peter C Gøtzsche, Jan Mæhlen. Natural history of breast cancers detected in the Swedish mammography screening programme: a cohort study. Lancet Oncol, 2011.

C Forastero, L I Zamora, D Guirado, A M Lallena. A Monte Carlo tool to simulate breast cancer screening programmes. Phys Med Biol, 2010.

Lena U Rosenberg, Fredrik Granath, Paul W Dickman, Kristjana Einarsdóttir, Sara Wedrén, Ingemar Persson, Per Hall. Menopausal hormone therapy in relation to breast cancer characteristics and prognosis: a cohort study. Breast Cancer Res, 2008.

S W Duffy, H H Chen, L Tabar, N E Day. Estimation of mean sojourn time in breast cancer screening using a Markov chain model of both entry to and exit from the preclinical detectable phase. Stat Med, 1995.

Douglas Hanahan, Robert A Weinberg. Hallmarks of cancer: the next generation. Cell, 2011.
