
Augmenting disease maps: a Bayesian meta-analysis approach.

Farzana Jahan¹, Earl W Duncan¹, Susanna M Cramb², Peter D Baade², Kerrie L Mengersen¹.

Abstract

Analysis of spatial patterns of disease is a significant field of research. However, access to unit-level disease data can be difficult for privacy and other reasons. As a consequence, estimates of interest are often published at the small area level as disease maps. This motivates the development of methods for analysing these ecological estimates directly. Such analyses can widen the scope of research by drawing more insights from published disease maps or atlases. The present study proposes a hierarchical Bayesian meta-analysis model that analyses the point and interval estimates from an online atlas. The proposed model is illustrated by modelling the published cancer incidence estimates available in the online Australian Cancer Atlas (ACA), with the aim of revealing patterns of incidence for the 20 cancers included in the ACA across major cities, regional and remote areas. The model results are validated using observed areal data created from unit-level data on cancer incidence in each of 2148 small areas. The meta-analysis models are found to generate patterns of cancer incidence by urban/rural status of small areas that are similar to those already known or revealed by analysis of the observed data. The proposed approach can be generalized to other online disease maps and atlases.


Keywords: cancer atlas; cancer incidence; disease atlas; geographical patterns; online estimates; small area estimates

Year: 2020; PMID: 32968502; PMCID: PMC7481717; DOI: 10.1098/rsos.192151

Source DB: PubMed; Journal: R Soc Open Sci; ISSN: 2054-5703; Impact factor: 2.963

Introduction

A major field of research in spatial epidemiology involves the construction of disease maps showing the spatial distribution of disease [1]. Disease mapping is usually undertaken using observational disease data (spatially aggregated count data) to produce small area estimates of the incidence, prevalence, mortality or relative risk of a specific disease [2]. Sometimes disease maps are published in the form of an atlas, for example, the Surveillance Atlas of Infectious Diseases [3], the Atlas of Heart Disease and Stroke [4], the Environment and Health Atlas of England and Wales [5], the U.S. Atlas of Cancer Mortality [6], the Atlas of Cancer in Queensland [7], the Cancer Atlas of the United Kingdom and Ireland [8] and the Australian Cancer Atlas [9]. Since observational disease data are usually subject to privacy and confidentiality constraints, atlases report modelled disease rates for the small areas of a country or region. If these reported estimates could be used to extract important information about the spatial distribution of a disease and the influence of different risk factors on incidence and/or survival, this would widen the scope of research in spatial epidemiology. Meta-analysis is a popular method for quantitatively combining results from different studies [10], particularly when those results are in the form of aggregate or summary data [11,12]. The present study aims to develop a Bayesian hierarchical meta-analysis model that uses published disease estimates from atlases to identify specific patterns in disease incidence and yields inferences similar to those obtained using the raw data. The proposed model is illustrated here for a specific disease, cancer. Cancer is the second leading cause of death in the world [13] and an important research topic in spatial epidemiology [14]. 
In the twenty-first century, cancer is expected to become the single most significant obstacle to increasing life expectancy in every country of the world [15]. In 2018, there were an estimated 18.1 million new cases of cancer and an estimated 9.6 million deaths from cancer worldwide [15]. A major field of research in cancer epidemiology is the assessment of spatial patterns of cancer, and substantial research has focused on these patterns in different parts of the world [16-23]. The most common sources of spatially referenced cancer data are population-based cancer registries, supplemented with additional population data, health survey data, and environmental and remote sensing data [24]. Most previous studies modelling geographical patterns of cancer incidence used observed population-based counts of new cancer cases in each region or area over specified periods, obtained from a population or cancer registry and adjusted for the size of the population in each area [20,25,26]. Models for spatial patterns often incorporate spatial smoothing. Among the many smoothing methods, Bayesian methods [27] are very popular, for example using Gaussian Markov random fields [28], especially conditional autoregressive (CAR) models, as priors for the spatial effects [29]. Common CAR formulations for spatial smoothing in disease mapping include the Besag–York–Mollié (BYM) model [30], the Leroux model [31] and the Cressie model [32]. Researchers often wish to undertake further analysis of the disease patterns in these atlases. As an alternative to accessing the underlying observational data, the reported estimates in the atlases can be further modelled using a Bayesian hierarchical meta-analysis model to gain additional insights. 
Many studies in the literature perform meta-analysis of disease estimates (standardized incidence or mortality ratios, SIR/SMR) across different published studies [33-36]. However, to the best of our knowledge, meta-analysis models using estimates derived from areal data from a single database, particularly to analyse the outputs of a cancer atlas, have not yet been explored. The proposed Bayesian hierarchical meta-analysis approach is not a traditional application of meta-analysis to estimates from different published studies; rather, it applies meta-analysis to estimates from a single study in order to quantify the ecological effect of covariates using model-based estimates. The proposed model is applied to the spatially smoothed point and interval estimates available in the Australian Cancer Atlas (ACA) for 20 different cancers in 2148 small areas across the country (Statistical Area level 2, SA2, defined by the Australian Bureau of Statistics (ABS) using the Australian Statistical Geography Standard (ASGS) 2011 boundaries [37]). The primary question used to illustrate the approach is whether cancer incidence varies across urban, regional and remote areas of Australia. Each SA2 belongs to one of three remoteness categories according to its geography and accessibility to services. In cancer epidemiology, the relationship between cancer incidence, survival or mortality and geographic remoteness is well researched [38-41]. For example, geographical remoteness was found to be significantly associated with the risk of advanced colorectal cancer in Queensland for those diagnosed with colon cancer [42]. A classification tree approach has also confirmed a significant association between remoteness and the incidence of several cancers [41]. Cancer disparities across remoteness categories have also been studied in [43-46]. 
All of these studies examined the influence of remoteness on cancer outcomes using population-based cancer data. In the present study, this well-researched and important question is therefore chosen because it provides external comparisons against which the validity of the proposed approach can be assessed. The meta-analysis approach is proposed because it opens up other avenues for extracting insights when only summary data are published and the original data are unavailable, as is often the case for health data subject to privacy and confidentiality constraints. The original authors could be approached to undertake follow-on analyses, but this may not always be possible: for example, they could refuse, or lack the time or funding to carry out the request. Moreover, even if the original data were available, analysis of primary data often requires some domain knowledge. The proposed approach provides a statistically valid methodology for modelling published point estimates, taking into account their associated uncertainty, in a straightforward manner. We show that this can facilitate new insights, in our case an enhanced understanding of the spatial distribution of cancer. The remainder of the paper is organized as follows: §2 describes the data, §3 describes the methods, §4 reports the results, §5 outlines some possible model extensions and §6 provides a discussion.

Data

The present study uses publicly available small area estimates from the Australian Cancer Atlas (ACA) [9]. The ACA is a freely accessible, interactive online platform showing the spatial variation in standardized incidence and excess deaths for 20 cancers across Australia (see table 13 for a list of the cancers). A key feature of the ACA is its use of Bayesian spatial models to generate point estimates and 95% credible intervals for the standardized incidence ratios (SIRs) and excess hazard ratios (EHRs) of each cancer in each of the 2148 geographical areas (SA2s) covering Australia. To generate the SIR estimates, unit-level data on each cancer over different time periods were modelled using Bayesian spatial models. For 14 cancer types (oesophageal, stomach, liver, pancreatic, cervical, uterine, ovarian, kidney, brain, thyroid, non-Hodgkin lymphoma, leukaemia, myeloma and head and neck), data from a 10-year period (2005–2014) were used. For the remaining six cancer types (bowel, lung, melanoma, breast, prostate and all cancers combined), data from a 5-year period (2010–2014) were used.

Table 13.

Pairwise comparison and average ranks of regions for all cancers (males and females).

cancer	sex	P(major cities > regional)	P(major cities > remote)	P(regional > remote)	average rank: major cities	average rank: regional	average rank: remote
all cancers	male	0.00	0.992	0.999	2	1	3
all cancers	female	0.012	0.943	0.985	2	1	3
bowel cancer	male	0.000	0.145	0.974	3	1	2
bowel cancer	female	0.000	0.468	0.994	3	1	2
brain cancer	male	0.578	0.892	0.878	1	2	3
brain cancer	female	0.716	0.826	0.777	1	2	3
breast cancer	female	1.000	0.999	0.992	1	2	3
cervical cancer	female	0.000	0.000	0.000	3	2	1
head and neck cancer	male	0.000	0.000	0.000	3	2	1
head and neck cancer	female	0.000	0.000	0.000	3	2	1
kidney cancer	male	0.605	0.995	0.994	1	2	3
kidney cancer	female	0.007	0.784	0.935	2	1	3
leukaemia	male	0.257	0.735	0.788	2	1	3
leukaemia	female	0.936	0.476	0.302	2	3	1
liver cancer	male	1.000	0.456	0.000	2	3	1
liver cancer	female	1.000	0.129	0.000	2	3	1
lung cancer	male	0.000	0.000	0.0001	3	2	1
lung cancer	female	0.001	0.006	0.061	3	2	1
melanoma	male	0.000	1.000	1.000	2	1	3
melanoma	female	0.000	0.432	1.000	3	1	2
myeloma	male	0.652	0.955	0.942	1	2	3
myeloma	female	0.998	0.992	0.941	1	2	3
non-Hodgkin lymphoma	male	1.000	1.000	0.999	1	2	3
non-Hodgkin lymphoma	female	1.000	0.999	0.937	1	2	3
oesophageal cancer	male	0.000	0.000	0.0002	3	2	1
oesophageal cancer	female	0.000	0.000	0.0001	3	2	1
ovarian cancer	female	0.996	0.955	0.816	1	2	3
pancreatic cancer	male	0.766	0.573	0.487	1	2	3
pancreatic cancer	female	0.587	0.725	0.698	1	2	3
prostate cancer	male	0.440	1.000	1.000	2	1	3
stomach cancer	male	1.000	1.000	0.953	1	2	3
stomach cancer	female	1.000	0.999	0.904	1	2	3
thyroid cancer	male	1.000	1.000	0.878	1	2	3
thyroid cancer	female	1.000	1.000	0.898	1	2	3
uterine cancer	female	0.769	0.069	0.046	2	3	1

Statistical Areas level 2 (SA2s) are medium-sized general-purpose areas designed to represent communities that interact together socially and economically [37]. There are 2196 SA2 regions (ASGS 2011 boundaries [37]) covering the whole of Australia without gaps or overlaps. In the ACA, SA2s with zero or nominal population and far-flung islands are excluded. To address the research question, we focused on the SIR and on the 2011 remoteness status of each SA2. Remoteness was obtained from the Remoteness Index provided by the Australian Bureau of Statistics (ABS), which classifies small areas in Australia into five categories based on their relative access to services: Major city, Inner regional, Outer regional, Remote and Very remote. In the present study, these five categories were combined into three classes: 1 = Major Cities (1242 SA2s), 2 = Inner/Outer Regional (810 SA2s) and 3 = Remote/Very Remote (96 SA2s).
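For illustration, the recoding of the five remoteness categories into the three study classes can be written as a lookup table. This Python sketch uses the category labels as they are written in the text; the exact label strings in the ABS files are an assumption, and the helper names are hypothetical. It also checks that the three reported class sizes add up to the 2148 SA2s in the ACA.

```python
from collections import Counter

# Five remoteness labels as written in the text (assumed spellings) mapped to the
# three classes used in this study: 1 = major cities, 2 = regional, 3 = remote.
REMOTENESS_TO_CLASS = {
    "Major city": 1,
    "Inner regional": 2,
    "Outer regional": 2,
    "Remote": 3,
    "Very remote": 3,
}


def collapse_remoteness(category: str) -> int:
    """Return the three-class code for one of the five remoteness labels."""
    return REMOTENESS_TO_CLASS[category]


def class_sizes(categories):
    """Count SA2s per three-class code, given one remoteness label per SA2."""
    return Counter(collapse_remoteness(c) for c in categories)


# Class sizes reported in the text; together they cover the 2148 SA2s in the ACA.
REPORTED_SIZES = {1: 1242, 2: 810, 3: 96}
assert sum(REPORTED_SIZES.values()) == 2148
```

A plain dictionary keeps the mapping explicit and raises a `KeyError` for any unexpected label rather than silently assigning it to a class.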

Methods

Bayesian meta-analysis Model

Meta-analysis can be accomplished by applying fixed effects or random effects models to aggregate or summary data, or to individual data, published in different studies on the same subject [11,12]. There has been an active literature on Bayesian approaches to the meta-analysis of different types of data since hierarchical Bayesian models began to be used to cast random effects models [47]. For more details on meta-analysis using Bayesian inference, including the choice of prior distributions, Bayesian computation and interpretation with examples, see Koricheva et al. [48]. The Bayesian spatial model used to generate the estimates reported in the ACA is described in §3.2. Since this model already incorporates spatial smoothing, which also protects the identity of individuals, no additional smoothing terms are included in the proposed meta-analysis model, although we revisit this in the section on model extensions. In what follows, we adopt the approach taken in the ACA and model each cancer individually.

Model specification

The data available for modelling are the published posterior mean standardized incidence ratios (SIRs) and corresponding 95% credible intervals for each of N = 2148 statistical areas (SA2s). Using notation based on [49], let $Y_{ij}$ be the modelled log(SIR) for the ith SA2 (i = 1, 2, …, 2148) in the jth remoteness region (j = 1, 2, 3; 1 = major cities, 2 = regional and 3 = remote). Similarly, let $S_{ij}$ be the associated standard deviation of $Y_{ij}$, obtained as $S_{ij} = (\log(\mathrm{UCL}_{ij}) - \log(\mathrm{LCL}_{ij}))/(2 \times 1.96)$, where $\mathrm{LCL}_{ij}$ and $\mathrm{UCL}_{ij}$ are, respectively, the lower and upper bounds of the corresponding published 95% credible interval for the SIR. The proposed hierarchical Bayesian meta-analysis model for each cancer can be formulated as

$$Y_{ij} \sim \mathrm{N}(\mu_{ij}, \sigma^2_{ij}), \tag{3.1}$$

where $\mu_{ij}$ denotes the true value of the log(SIR) in the ith SA2 inside the jth region, with associated variance $\sigma^2_{ij}$. The standard deviations $S_{ij}$ are used to formulate the prior for the variance parameter $\sigma^2_{ij}$, shown in equation (3.5). Next, $\mu_{ij}$ is modelled as

$$\mu_{ij} \sim \mathrm{N}(\theta_j, \tau_j^{-1}), \tag{3.2}$$

where $\theta_j$ is the region-specific mean of the log(SIR) for the specific cancer and $\tau_j^{-1}$ is the corresponding variance. The region-specific mean $\theta_j$ is in turn modelled as

$$\theta_j \sim \mathrm{N}(\mu_0, \tau_0^{-1}), \tag{3.3}$$

where $\mu_0$ is the overall mean log(SIR) and $\tau_0^{-1}$ is the associated variance. The priors of the hierarchical model are specified as

$$\mu_0 \sim \mathrm{N}(0, v_{\mu}), \tag{3.4}$$

$$\sigma^2_{ij} \sim \text{Scaled-Inv-}\chi^2(\nu, S^2_{ij}), \tag{3.5}$$

$$\tau_j \sim \mathrm{N}^{+}(0, v_{\tau}), \tag{3.6}$$

$$\tau_0 \sim \mathrm{N}^{+}(0, v_{\tau_0}), \tag{3.7}$$

where ν is the degrees of freedom for $\sigma^2_{ij}$; $\tau_j$ and $\tau_0$ are the precision parameters associated with the region-specific variance and the overall variance; $\mathrm{N}^{+}$ denotes the zero-mean normal distribution truncated from below at zero (a half-normal distribution); and $v_{\mu}$, $v_{\tau}$ and $v_{\tau_0}$ denote prior variances fixed through the sensitivity analysis. The model is fitted independently for each sex and each of the 20 cancers, so no cancer-specific or sex-specific indices are needed. The choices of priors and associated hyperparameters were made on the basis of the results of the sensitivity analysis. 
A total of 76 combinations of priors, using different distributions and different hyperparameter values, were compared, and the model above was found to perform reasonably well in terms of convergence (assessed by visual diagnostics and the Gelman–Rubin statistic [50]) and posterior predictive checks (posterior predictive distribution plots with the observed values of summary measures, Bayesian posterior predictive p-values, and visual comparisons between replicated data drawn from the posterior predictive distribution and the estimates from the ACA). For the prior on the overall mean, $\mu_0$, normal priors with many different choices of variance were compared. For the region-specific and overall variances, after comparing the conventional priors (inverse gamma, half-normal on the standard deviation and so on), we found that placing half-normal priors on the corresponding precision parameters, $\tau_j$ and $\tau_0$, gave better model performance for the atlas output data; hence the choices made above. We set ν = 2, since a small value of ν reflects little a priori knowledge about the within-study variation. For more details of the sensitivity analysis, see the electronic supplementary material.
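The conversion from a published SIR and its 95% credible interval to the model inputs Y (the log SIR) and S (its standard deviation) can be sketched in Python as follows. The helper name and the example atlas entry are hypothetical; the formula for S is the one stated in the model specification, which implicitly treats the log(SIR) posterior as roughly normal, so that the interval spans about ±1.96 standard deviations on the log scale.

```python
import math
from typing import Tuple

Z_95 = 1.96  # standard normal quantile for a central 95% interval, as in the text


def log_sir_and_sd(sir: float, lcl: float, ucl: float) -> Tuple[float, float]:
    """Turn a published SIR and its 95% credible interval into (Y, S).

    Y = log(SIR); S = (log(UCL) - log(LCL)) / (2 * 1.96). This treats the
    interval as symmetric on the log scale (approximately normal log(SIR)).
    """
    if not 0 < lcl <= sir <= ucl:
        raise ValueError("expected 0 < LCL <= SIR <= UCL")
    y = math.log(sir)
    s = (math.log(ucl) - math.log(lcl)) / (2 * Z_95)
    return y, s


if __name__ == "__main__":
    # Hypothetical atlas entry: SIR 1.20 with 95% credible interval (1.00, 1.44).
    y, s = log_sir_and_sd(1.20, 1.00, 1.44)
    print(f"Y = {y:.4f}, S = {s:.4f}")  # prints: Y = 0.1823, S = 0.0930
```

Applied to every SA2 for one cancer and sex, this yields the vectors of Y and S that the hierarchical model takes as data.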

Model implementation

The proposed model was fitted in R (version 3.5.3) [51] using the package R2jags v. 0.5-7 [52], and the MCMC output of the JAGS model was summarized in R using the coda package [53]. The JAGS code for the model is given in appendix A. Three parallel MCMC chains, each of 500 000 iterations with a burn-in period of 400 000 iterations, were run to fit the proposed model. Convergence was examined using visual diagnostics for the parameters of interest. Posterior samples are drawn for $\mu_{ij}$, $\theta_j$, $\tau_j$, $\mu_0$ and $\tau_0$. In this study, we focus on the region-specific means of log(SIR), $\theta_j$, the corresponding precision parameters $\tau_j$, and the overall mean $\mu_0$ together with its precision parameter $\tau_0$. The region-specific means $\theta_j$, or equivalently $\exp(\theta_j)$, the mean SIR in the jth region, are the key parameters for describing the pattern of incidence of each cancer in major cities, regional and remote areas.
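To illustrate what the MCMC step involves, here is a minimal Gibbs sampler in Python for a simplified version of the hierarchy. It is a sketch, not the authors' JAGS code, and it makes three loudly stated simplifications so that every update is a closed-form draw: the within-area variance is fixed at S² instead of receiving a prior, τ0 is held fixed, and τj gets a conjugate gamma prior in place of the half-normal prior used in the paper. The function name, default hyperparameters and synthetic data are hypothetical.

```python
import math
import random
import statistics


def gibbs_meta(y, s, region, n_regions, n_iter=1500, burn=500,
               tau0=10.0, kappa=0.01, a=1.0, b=0.01, seed=1):
    """Simplified Gibbs sampler (hypothetical) for
        Y_i ~ N(mu_i, s_i^2),  mu_i ~ N(theta_j, 1/tau_j),
        theta_j ~ N(mu0, 1/tau0),  mu0 ~ N(0, 1/kappa),  tau_j ~ Gamma(a, rate=b).
    Simplifications vs the paper: s_i^2 is treated as known, tau0 is fixed and
    tau_j has a conjugate gamma prior. Returns post-burn-in draws of theta.
    """
    rng = random.Random(seed)
    members = [[i for i, r in enumerate(region) if r == j] for j in range(n_regions)]
    mu = list(y)
    theta = [0.0] * n_regions
    tau = [1.0] * n_regions
    mu0 = 0.0
    draws = []
    for it in range(n_iter):
        # mu_i | rest: precision-weighted mix of the atlas value and its region mean
        for i, j in enumerate(region):
            prec = 1.0 / s[i] ** 2 + tau[j]
            mean = (y[i] / s[i] ** 2 + tau[j] * theta[j]) / prec
            mu[i] = rng.gauss(mean, 1.0 / math.sqrt(prec))
        for j, idx in enumerate(members):
            # theta_j | rest: combines the area-level mu_i with the overall mean mu0
            prec = len(idx) * tau[j] + tau0
            mean = (tau[j] * sum(mu[i] for i in idx) + tau0 * mu0) / prec
            theta[j] = rng.gauss(mean, 1.0 / math.sqrt(prec))
            # tau_j | rest: conjugate gamma update (gammavariate takes shape, scale)
            ss = sum((mu[i] - theta[j]) ** 2 for i in idx)
            tau[j] = rng.gammavariate(a + len(idx) / 2.0, 1.0 / (b + ss / 2.0))
        # mu0 | rest: normal prior N(0, 1/kappa) updated by the region means
        prec = kappa + n_regions * tau0
        mu0 = rng.gauss(tau0 * sum(theta) / prec, 1.0 / math.sqrt(prec))
        if it >= burn:
            draws.append(list(theta))
    return draws


if __name__ == "__main__":
    # Synthetic example: three regions with assumed true means (not ACA values).
    rng = random.Random(0)
    true_theta = [-0.06, 0.03, 0.17]
    region = [0] * 120 + [1] * 60 + [2] * 20
    s = [0.15] * len(region)
    y = [rng.gauss(rng.gauss(true_theta[j], 0.15), 0.15) for j in region]
    draws = gibbs_meta(y, s, region, n_regions=3)
    print([round(statistics.mean(d[j] for d in draws), 3) for j in range(3)])
```

The sampler cycles through the same levels as the model: area-level log(SIR)s, region means and precisions, then the overall mean. A production fit would also check convergence across several chains, as described above.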

Model inferences

A range of measures was computed to identify patterns of cancer incidence by remoteness. Using the posterior samples of the region-specific means of log(SIR), $\theta_j$ (j = 1, 2, 3), the regions were ranked in descending order in each MCMC iteration, and for each cancer we calculated summaries of the ranks with 95% credible intervals and the probability distribution of ranks for each region. The same posterior samples were used to compare region-specific incidence probabilistically: for each cancer, ordering the values of $\theta_j$ in each MCMC iteration enables calculation, for males and females, of the probability that major cities have a higher mean log(SIR) than regional and remote areas, and of the probability that regional areas have a higher mean log(SIR) than remote areas. These probabilities provide additional support for the rank-based comparisons of the observed patterns of cancer incidence by remoteness. In addition to the pairwise probabilistic comparisons, the point and interval estimates of the posterior mean differences were calculated for each of the three pairs (major cities and regional, major cities and remote, regional and remote) for all cancers by sex.
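The rank summaries and pairwise probabilities described above can be computed from stored MCMC draws of the three region-specific means. The Python sketch below assumes the draws are available as a list of (θ1, θ2, θ3) tuples; the function name and toy draws are hypothetical.

```python
from itertools import combinations
from statistics import median

REGIONS = ("major cities", "regional", "remote")


def rank_summaries(draws):
    """Summarise ranks of region-specific means from posterior draws.

    draws: sequence of MCMC draws, each a tuple (theta_1, theta_2, theta_3).
    In every draw the regions are ranked in descending order (rank 1 = highest
    mean log(SIR)). Returns (rank_prob, median_rank, pairwise) where
    rank_prob[j][r - 1] = P(region j has rank r), median_rank[j] is the posterior
    median rank and pairwise[(a, b)] = P(theta_a > theta_b).
    """
    n, k = len(draws), len(draws[0])
    counts = [[0] * k for _ in range(k)]
    ranks = [[] for _ in range(k)]
    for theta in draws:
        # sorted() is stable, so exact ties (rare for continuous draws) keep index order
        order = sorted(range(k), key=lambda j: theta[j], reverse=True)
        for r, j in enumerate(order, start=1):
            counts[j][r - 1] += 1
            ranks[j].append(r)
    rank_prob = [[c / n for c in row] for row in counts]
    median_rank = [median(rj) for rj in ranks]
    pairwise = {(a, b): sum(t[a] > t[b] for t in draws) / n
                for a, b in combinations(range(k), 2)}
    return rank_prob, median_rank, pairwise


if __name__ == "__main__":
    toy = [(0.0, 0.1, 0.2), (0.0, 0.3, 0.2), (0.0, 0.1, 0.2), (0.5, 0.1, 0.2)]
    rank_prob, median_rank, pairwise = rank_summaries(toy)
    for j, name in enumerate(REGIONS):
        print(name, rank_prob[j], median_rank[j])
    print("P(major cities > remote) =", pairwise[(0, 2)])
```

Each pairwise probability is simply the share of draws in which one region's mean exceeds the other's, which is why it says nothing about the size of the difference.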

Model validation

The results of the proposed Bayesian meta-analysis model for all 20 cancers were verified by comparison with those obtained from unit-level analysis of cancer registry data for each cancer in the 2148 SA2s in Australia. Following the choices adopted in the ACA, the Leroux model [31] was fitted to the observed incidence data for each cancer. Let $Z_{ij}$ be the number of reported cancer diagnoses in the ith SA2 in the jth region, modelled as

$$Z_{ij} \sim \mathrm{Poisson}\big(E_{ij}\exp(\mu_{ij})\big), \tag{3.8}$$

where $E_{ij}$ is the expected number of cancers in the ith SA2 and jth region, calculated by multiplying the SA2 population in each 5-year age group by the ratio of the number of cancers diagnosed in Australia to the Australian population in that age group, then summing over all age groups. Under this model, the log(SIR), $\mu_{ij}$, is modelled as

$$\mu_{ij} = \sum_{m=1}^{3} \beta_m I_{im} + R_i, \tag{3.9}$$

where $I_{im} \in \{0, 1\}$ indicates whether the ith SA2 lies in region m (m = 1, 2, 3 for major cities, regional and remote areas, respectively) and $\beta_m$ is the associated coefficient. Note that the model adopted in the ACA does not include these remoteness indicators; they are included here in order to validate the proposed meta-analysis model. The term $R_i$ accounts for spatial autocorrelation between the SA2s and has the conditional distribution [31]

$$R_i \mid \mathbf{R}_{-i} \sim \mathrm{N}\left(\frac{\rho \sum_{k} w_{ik} R_k}{\rho \sum_{k} w_{ik} + 1 - \rho},\ \frac{\sigma^2_R}{\rho \sum_{k} w_{ik} + 1 - \rho}\right), \tag{3.10}$$

where $\mathbf{R}_{-i} = \{R_1, \ldots, R_{i-1}, R_{i+1}, \ldots, R_N\}$ and $\sigma^2_R$ is the variance of the spatial random effects. That is, the expected value of each random effect is a weighted average of the random effects of neighbouring areas, with weight ρ, and a global mean of 0, with weight (1 − ρ). If the spatial dependence parameter ρ equals 1, the Leroux prior reduces to the intrinsic CAR prior [30]; if ρ equals 0, the areas are spatially independent and no spatial smoothing occurs. The term $w_{ik}$ is the (i, k)th element of the N × N spatial adjacency matrix W, equal to 1 if areas i and k are considered neighbours and 0 otherwise. Neighbours are generally defined by shared boundaries, although islands are assigned neighbours predominantly on the basis of the closest mainland access points. 
An inverse gamma prior is specified for the variance of the spatial random effects, and a uniform (0, 1) prior is chosen for the spatial dependence parameter, ρ. For further details, see https://atlas.cancer.org.au/methodology/model-for-cancer-diagnosis/. In the Leroux model applied to the observed incidence data (equations (3.8)–(3.10)), the regions (major cities, regional and remote) were used as covariates and a spatial random effects term (R) was included to account for spatial autocorrelation. In the proposed meta-analysis model, the regions enter through a level of the hierarchy (see equations (3.1)–(3.7)), and no spatial term is included explicitly, since spatial smoothing is already built into the reported ACA estimates. To compare the proposed meta-analysis model with the Leroux model fitted to the observed (real) areal incidence data, the relative difference between the posterior estimates of the region-specific means from the meta-analysis model and those from the unit-record model was calculated. The posterior distributions of the ranks of the region-specific mean SIRs (for major cities, regional and remote areas) were also calculated for both models.
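Two ingredients of the validation model, the expected count and the Leroux conditional distribution of a spatial random effect, can be written as small Python functions. These are illustrative sketches, not the ACA code: the function names, the placeholder age-group labels and the name `sigma2` for the spatial variance are hypothetical. The conditional moments follow the standard Leroux form [31], whose denominator ρ Σk wik + 1 − ρ reproduces the ρ = 1 (intrinsic CAR) and ρ = 0 (independence) limits described above.

```python
from typing import Dict, List, Sequence, Tuple


def expected_count(sa2_pop: Dict[str, float],
                   national_cases: Dict[str, float],
                   national_pop: Dict[str, float]) -> float:
    """Indirectly standardised expected count for one SA2: the sum over age groups
    of SA2 population x (national number of cases / national population)."""
    return sum(pop * national_cases[age] / national_pop[age]
               for age, pop in sa2_pop.items())


def leroux_conditional(i: int, R: Sequence[float], W: List[List[int]],
                       rho: float, sigma2: float) -> Tuple[float, float]:
    """Conditional mean and variance of R_i given the other R_k under the Leroux prior.

    W is a 0/1 adjacency matrix (W[i][k] = 1 if areas i and k are neighbours).
    mean = rho * sum_k w_ik R_k / d,  variance = sigma2 / d,
    where d = rho * sum_k w_ik + 1 - rho. d > 0 needs rho < 1 or at least one
    neighbour (the text notes that islands are given mainland neighbours).
    """
    others = [k for k in range(len(R)) if k != i]
    w_sum = sum(W[i][k] for k in others)
    weighted = sum(W[i][k] * R[k] for k in others)
    d = rho * w_sum + 1.0 - rho
    return rho * weighted / d, sigma2 / d


if __name__ == "__main__":
    # Four hypothetical areas in a line: 0-1, 1-2, 2-3 are neighbours.
    W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    R = [0.0, 0.2, 0.4, -0.3]
    for rho in (0.0, 0.5, 1.0):
        print(rho, leroux_conditional(1, R, W, rho, sigma2=0.5))
```

With ρ = 1 the conditional mean of area 1 equals the average of its two neighbours (0.2), and with ρ = 0 it is 0 with the full variance, matching the two limiting cases.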

Results

The proposed model was applied to the outputs for each of the 20 cancers included in the ACA, for both males and females where applicable. §4.1 reports detailed results from fitting the proposed model to two cancers, lung cancer and thyroid cancer, for males and females. The findings for all cancers are then summarized in §4.2, and §4.3 compares the outputs of the meta-analyses with those obtained by analysing the raw incidence data in order to validate the proposed modelling approach.

Extended results for two cancers

Lung cancer

The summary estimates from the posterior samples of the parameters of interest for lung cancer (males and females), obtained by fitting the proposed meta-analysis model with MCMC (§3.1.2), are given in table 1. The overall fitted mean log(SIR), given by μ0, is 0.049 for males (95% CI: −0.362, 0.459), i.e. an overall SIR of 1.05 (95% CI: 0.696, 1.582). The corresponding estimates for females are an overall mean log(SIR) of 0.013 (95% CI: −0.388, 0.415), i.e. an overall SIR of 1.01 (95% CI: 0.678, 1.514). The estimates θ1, θ2 and θ3 are the region-specific means of the log(SIR) in major cities, regional and remote areas, respectively. These summaries indicate that the standardized incidence of lung cancer is relatively higher than the Australian average in remote and regional areas, and lower than the Australian average in major cities.
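The back-transformation from the log scale is the exponential function, and because exp is monotone the interval endpoints (quantiles) transform exactly. A minimal Python check using the male μ0 summary from table 1 (the function name is hypothetical):

```python
import math


def to_sir_scale(mean_log, lower_log, upper_log):
    """Exponentiate a log(SIR) posterior mean and 95% interval.

    Interval endpoints (quantiles) transform exactly because exp is monotone.
    exp(mean_log) is the SIR at the posterior mean of log(SIR); by Jensen's
    inequality it is not the posterior mean of the SIR itself.
    """
    return math.exp(mean_log), math.exp(lower_log), math.exp(upper_log)


if __name__ == "__main__":
    # mu_0 for lung cancer, males (table 1): 0.049 (-0.362, 0.459)
    print([round(v, 3) for v in to_sir_scale(0.049, -0.362, 0.459)])  # prints [1.05, 0.696, 1.582]
```

The same call with the female summary, `to_sir_scale(0.013, -0.388, 0.415)`, gives a lower bound of about 0.678.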

Table 1.

Summary of posterior samples for lung cancer.

parameter	males: mean	males: 0.025 quantile	males: 0.975 quantile	females: mean	females: 0.025 quantile	females: 0.975 quantile
deviance	−1578.190	−1698.225	−1455.656	−2042.063	−2153.995	−1928.675
μ₀	0.049	−0.362	0.459	0.013	−0.388	0.415
τ₁	28.618	25.518	31.921	39.242	35.668	42.984
τ₂	31.828	28.249	35.541	35.810	32.125	39.641
τ₃	14.205	10.495	35.541	14.965	11.151	19.058
τ₀	11.943	2.129	18.266	12.387	2.200	26.940
θ₁	−0.058	−0.074	−0.042	−0.030	−0.044	−0.016
θ₂	0.034	0.016	0.053	0.006	−0.012	0.023
θ₃	0.172	0.100	0.245	0.063	−0.006	0.133

For a more detailed comparison of geographical disparities in lung cancer incidence, the region-specific means $\theta_j$ were ranked in descending order (the region with the highest mean log(SIR) is ranked 1) in each iteration of the MCMC chains generated for fitting the proposed model. The summaries of the posterior distribution of ranks for each region and the probability distribution of ranks per region are shown in tables 2 and 3.

Table 2.

Summary of posterior distribution of ranks for region-specific mean for lung cancer incidence.

remoteness category	median rank (male)	median rank (female)	95% credible interval (male)	95% credible interval (female)
major cities	3	3	(2,3)	(2,3)
inner and outer regional	2	2	(2,3)	(2,3)
remote	1	1	(1,1)	(1,1)

Table 3.

Probability distribution of region-specific ranks for lung cancer.

rank	major cities: male	major cities: female	regional: male	regional: female	remote: male	remote: female
1	0.000	0.000	0.000	0.021	1.000	0.979
2	0.000	0.001	0.999	0.979	0.000	0.021
3	1.000	0.999	0.000	0.021	0.000	0.001

From tables 2 and 3, areas within major cities are most likely to have the lowest incidence of lung cancer, and remote areas are most likely to have the highest incidence, for both sexes. This is supported by the pairwise comparisons in table 4, the pairwise posterior mean differences of log(SIR) in table 5 and the posterior relative risks of the SIRs for regional and remote areas relative to major cities in table 6.
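The pairwise posterior mean differences and relative risks can be computed directly from MCMC draws of the region-specific means. The Python sketch below (hypothetical names) reports the posterior mean of θa − θb with an equal-tailed 95% interval obtained by linear interpolation between order statistics, and summarises the relative risk draws exp(θj − θ1) of region j against major cities in the same way. The text does not state exactly how the relative risk intervals in table 6 were summarised, so this is one reasonable choice rather than a reproduction.

```python
import math
from statistics import mean


def quantile(sorted_vals, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarise(values):
    """Posterior mean and equal-tailed 95% interval of a list of draws."""
    v = sorted(values)
    return mean(v), quantile(v, 0.025), quantile(v, 0.975)


def pairwise_difference(draws, a, b):
    """Summary of theta_a - theta_b over MCMC draws (each draw a tuple of theta_j)."""
    return summarise([t[a] - t[b] for t in draws])


def relative_risk(draws, j, ref=0):
    """Summary of exp(theta_j - theta_ref), the SIR ratio of region j to the reference."""
    return summarise([math.exp(t[j] - t[ref]) for t in draws])


if __name__ == "__main__":
    # Toy draws (not ACA output): regional mean spread around 0.1 above major cities.
    draws = [(0.0, 0.1 + 0.001 * k, 0.0) for k in range(-50, 51)]
    print(pairwise_difference(draws, 0, 1))  # major cities - regional
    print(relative_risk(draws, 1))           # regional vs major cities
```

Summarising the transformed draws, rather than exponentiating a summary of the differences, keeps the interval on the relative risk scale consistent with the draws themselves.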

Table 4.

Pairwise comparison of region specific mean for lung cancer.

pair	probability (male)	probability (female)
major cities > regional	0.000	0.001
major cities > remote	0.001	0.006
regional > remote	0.172	0.058

Table 5.

Posterior mean differences (pairwise) with 95% credible intervals for lung cancer.

pair	males: posterior mean difference	males: 95% CI	females: posterior mean difference	females: 95% CI
major cities–regional	−0.092	(−0.117, −0.068)	−0.036	(−0.057, −0.013)
regional–remote	−0.138	(−0.212, −0.064)	−0.057	(−0.129, 0.015)
major cities–remote	−0.230	(−0.303, −0.157)	−0.093	(−0.164, −0.022)

Table 6.

Posterior relative risks for lung cancer in regional and remote Australia (baseline: major cities).

remoteness category	males: relative risk	males: 95% credible interval	females: relative risk	females: 95% credible interval
regional	1.096	(1.093, 1.099)	1.037	(1.032, 1.040)
remote	1.259	(1.190, 1.333)	1.097	(1.083, 1.162)

It is also well established in the literature that lung cancer is associated with low socio-economic status (SES), with higher risk in low-SES neighbourhoods [54,55]. In Australia, lung cancer risk is higher in remote areas than in major cities and regional areas [56]. The outputs of the proposed meta-analysis models for lung cancer by remoteness category are therefore consistent with the existing literature. Figures 1 and 2 show the SIRs of lung cancer (for males and females) from the ACA and from the fitted Bayesian hierarchical meta-analysis models. The SIRs from the atlas are more variable than those from the fitted model because the effect of remoteness has been removed from the estimates in our fitted model. The atlas boxplots are obtained by grouping all small-area estimates by remoteness category, without remoteness in the model, whereas the posterior SIRs of the proposed model come from a model in which remoteness category is included as a covariate to obtain region-specific SIRs for major cities, regional and remote areas.

Figure 1.

Boxplots of SIRs per region from ACA and fitted model for lung and thyroid cancers (males).

Figure 2.

Boxplots of SIRs per region from ACA and fitted model for lung and thyroid cancers (females).


Thyroid cancer

The summary estimates from the posterior samples of the parameters of interest for thyroid cancer (males and females) are given in table 7. The overall fitted mean log(SIR) is −0.104 for males (95% CI: −0.514, 0.307), i.e. an overall SIR of 0.901 (95% CI: 0.598, 1.359). The corresponding estimates for females are −0.206 (95% CI: −0.636, 0.221), i.e. an overall SIR of 0.814 (95% CI: 0.529, 1.247). The estimates θ1, θ2 and θ3 are the region-specific means of the log(SIR) in major cities, regional and remote areas, respectively. These summaries indicate that the standardized incidence of thyroid cancer is relatively higher than the Australian average in major cities and lower than the Australian average in regional and remote areas.

Table 7.

Summary of posterior samples for thyroid cancer.

parameter	males: mean	males: 0.025 quantile	males: 0.975 quantile	females: mean	females: 0.025 quantile	females: 0.975 quantile
deviance	−1617.290	−1733.616	−1498.474	−940.021	−1077.563	−800.930
μ₀	−0.104	−0.514	0.307	−0.206	−0.636	0.221
τ₁	28.658	25.660	31.818	14.801	13.111	16.652
τ₂	33.874	30.209	37.731	18.963	16.187	22.041
τ₃	16.103	12.252	20.306	14.793	11.035	18.893
τ₀	12.088	2.112	26.621	11.248	1.849	24.984
θ₁	0.012	−0.003	0.028	−0.001	−0.020	0.019
θ₂	−0.143	−0.162	−0.125	−0.285	−0.309	−0.261
θ₃	−0.183	−0.250	−0.117	−0.334	−0.405	−0.263

The summaries of the posterior distribution of ranks of each region and the probability distribution of ranks per region for thyroid cancer are shown in tables 8 and 9.

Table 8.

Summary of posterior distribution of ranks for region-specific means for thyroid cancer incidence.

remoteness category	median rank (male)	median rank (female)	95% credible interval (male)	95% credible interval (female)
major cities	1	1	(1,3)	(1,3)
inner and outer regional	2	2	(1,3)	(1,3)
remote	3	3	(1,3)	(1,3)

Table 9.

Probability distribution of region-specific ranks for thyroid cancer.

rank	major cities: male	major cities: female	regional: male	regional: female	remote: male	remote: female
1	1.000	0.999	0.000	0.058	0.000	0.000
2	0.000	0.001	0.878	0.942	0.122	0.102
3	0.000	0.000	0.122	0.000	0.898	0.898

From tables 8 and 9, thyroid cancer is most likely to have the highest incidence in areas within major cities for both sexes, which is supported by the pairwise comparisons in table 10, the pairwise posterior mean differences of log(SIR) in table 11 and the posterior relative risks of the SIRs for regional and remote areas relative to major cities in table 12. Figures 1 and 2 also depict the SIRs of thyroid cancer (for males and females) from the ACA and the fitted Bayesian hierarchical meta-analysis models.

Table 10.

Pairwise comparison of region-specific mean for thyroid cancer.

	probability
pairs	male	female
major cities > regional	1.000	1.000
major cities > remote	1.000	1.000
regional > remote	0.878	0.898

Table 11.

Posterior Mean Differences (pairwise) with 95% credible intervals for thyroid cancer.

	males		females
	posterior	95% CI	posterior	95% CI
pairs	mean difference		mean difference
major cities–regional	0.155	(0.131, 0.180)	0.285	(0.254, 0.315)
regional–remote	0.040	(−0.028, 0.110)	0.049	(−0.026, 0.125)
major cities–remote	0.196	(0.128, 0.264)	0.333	(0.260, 0.407)

Table 12.

Posterior relative risks for thyroid cancer in regional and remote Australia (baseline: major cities).

	relative risk (95% credible interval)
remoteness category	males		females
regional	0.856	(0.8536, 0.8585)	0.753	(0.7490, 0.7557)
remote	0.823	(0.7812, 0.8647)	0.717	(0.6801, 0.7544)

The probabilities and the posterior mean differences can appear to disagree. For example, for males the probability that the regional mean exceeds the remote mean is 0.878 (table 10), yet the 95% credible interval for this difference, (−0.028, 0.110), includes zero (table 11). The width of the credible intervals for the posterior mean differences reflects the amount of variation in these estimates. The probabilities, in contrast, were calculated as the proportion of times the fitted mean for major cities was larger than the corresponding means for regional and remote areas (and similarly for the other pairs); they do not take into account the magnitude of the differences or the uncertainty of the obtained probabilities. We therefore inferred differences between regions from the credible intervals for the posterior mean differences.
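The distinction between a posterior probability and a credible interval can be made concrete with a short sketch. It assumes draws of two region-specific means are available; here they are hypothetical independent normal draws with means and spreads loosely based on the male regional and remote estimates above, not the study's MCMC output. The probability only counts how often one draw exceeds the other, while the interval describes the spread of the differences:

```python
import random


def simulate_draws(mean, sd, n_draws=4000, seed=2):
    """Hypothetical stand-in for MCMC draws of one region-specific mean theta_j."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def pairwise_summary(draws_a, draws_b):
    """P(theta_a > theta_b), posterior mean of theta_a - theta_b and its 95% interval."""
    diffs = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(diffs)
    prob = sum(d > 0 for d in diffs) / n  # ignores how large the difference is
    mean_diff = sum(diffs) / n
    interval = (diffs[int(0.025 * (n - 1))], diffs[int(0.975 * (n - 1))])
    return prob, mean_diff, interval


# Loosely mimics the male regional and remote rows above (assumed values).
regional = simulate_draws(-0.143, 0.0095, seed=2)
remote = simulate_draws(-0.183, 0.034, seed=3)
prob, mean_diff, (lo, hi) = pairwise_summary(regional, remote)
print(f"P(regional > remote) = {prob:.3f}")
print(f"mean difference = {mean_diff:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With these stand-in draws the probability is well above 0.5 while the interval still straddles zero, the same kind of apparent mismatch discussed above.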

Summary of outputs for Bayesian meta-analysis model

A summary of outputs for all 20 cancers included in the ACA is reported in this section. Table 13 shows, for each of the 20 cancers (by sex where applicable), the pairwise probabilities that major cities have a higher SIR than regional and remote areas, as well as the probabilities that regional areas have a higher SIR than remote areas. For instance, on average, the SIR in major cities is almost certainly larger than the SIR in regional and remote areas for eight cancer/sex groups (for males: non-Hodgkin lymphoma, stomach; for females: breast, myeloma, non-Hodgkin lymphoma, ovarian, stomach, thyroid). Conversely, on average, major cities are unlikely, or substantially less likely, to have higher SIRs than regional and remote areas for 10 cancer/sex groups (for males: bowel, head and neck, lung, oesophageal; for females: bowel, cervical, head and neck, lung, melanoma, oesophageal). Between these extremes, for leukaemia (females), the probability that the SIR is larger in major cities than in regional areas is 0.476, which indicates little difference between the SIRs in major cities and regional areas, on average. A similar interpretation applies to pancreatic cancer (for males: P(major cities > remote) = 0.573, P(regional > remote) = 0.487; for females: P(major cities > regional) = 0.587), bowel cancer (for females: P(major cities > regional) = 0.587) and kidney cancer (for males: P(major cities > regional) = 0.605). A probability close to 0.5 therefore does not indicate a substantial difference between the pair under comparison. The ranking of the three regions can still show a pattern, but in these cases the pattern is suggestive rather than substantial. In addition to the ranks and probabilities, table 14 shows the relative risks for each cancer by sex. These relative risks quantify the average increase or decrease in the SIRs of regional and remote areas in Australia compared with those of major cities.
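The relative risks appear to be ratios of region-specific means on the SIR scale: for example, exp(−0.155) ≈ 0.856 reproduces the male regional entry for thyroid cancer in tables 11 and 12. Assuming that definition (RR = exp(θⱼ − θ₁), with major cities as the baseline), the sketch below transforms each posterior draw before summarising. The draws are hypothetical independent normals loosely based on the male thyroid estimates, not the study's posterior samples.

```python
import math
import random


def relative_risk(draws_region, draws_baseline):
    """Posterior mean and 95% interval of exp(theta_region - theta_baseline), draw by draw."""
    rr = sorted(math.exp(r - b) for r, b in zip(draws_region, draws_baseline))
    n = len(rr)
    return sum(rr) / n, (rr[int(0.025 * (n - 1))], rr[int(0.975 * (n - 1))])


rng = random.Random(4)
# Hypothetical draws loosely based on the male thyroid estimates above.
major = [rng.gauss(0.012, 0.008) for _ in range(4000)]
regional = [rng.gauss(-0.143, 0.0095) for _ in range(4000)]
rr_mean, (rr_lo, rr_hi) = relative_risk(regional, major)
print(f"RR regional vs major cities = {rr_mean:.3f} ({rr_lo:.3f}, {rr_hi:.3f})")
```

Transforming draw by draw keeps the interval on the relative-risk scale. With these stand-in draws the point estimate lands near 0.856, but the interval will not match the one reported in table 12.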

Table 14.

Relative risks for cancers by remoteness categories and sex (reference: major cities).

cancers	relative risk (95% credible intervals)
	males				females
	regional		remote		regional		remote
all	1.038	(1.036, 1.043)	0.931	(0.897, 0.978)	1.016	(1.013, 1.018)	0.961	(0.922, 1.001)
bowel	1.088	(1.086, 1.093)	1.033	(0.985, 1.079)	1.076	(1.072, 1.079)	1.002	(0.958, 1.048)
brain	0.989	(0.995, 1.000)	0.957	(0.921, 1.011)	0.995	(0.992, 0.999)	0.974	(0.931, 1.019)
breast					0.969	(0.966, 0.972)	0.908	(0.869, 0.947)
cervical					1.054	(1.048, 1.059)	1.243	(1.173, 1.318)
head and neck	1.469	(1.270, 1.281)	1.677	(1.585, 1.775)	1.116	(1.111, 1.121)	1.280	(1.213, 1.349)
kidney	0.998	(0.994, 1.000)	0.923	(0.879, 0.969)	1.026	(1.022, 1.031)	0.974	(0.924, 1.027)
leukaemia	1.006	(1.001, 1.009)	0.982	(0.938, 1.028)	0.985	(0.981, 0.989)	1.002	(0.953, 1.054)
liver	0.824	(0.821, 0.829)	1.005	(0.936, 1.083)	0.840	(0.836, 0.844)	1.054	(0.979, 1.134)
lung	1.096	(1.093, 1.099)	1.259	(1.190, 1.333)	1.037	(1.032, 1.040)	1.097	(1.083, 1.162)
melanoma	1.125	(1.128, 1.126)	0.834	(0.789, 0.875)	1.241	(1.246, 1.242)	1.021	(0.957, 1.062)
myeloma	0.996	(0.992, 1.000)	0.945	(0.897, 0.996)	0.973	(0.969, 0.977)	0.926	(0.882, 0.973)
nH lymphoma	0.959	(0.956, 0.963)	0.868	(0.830, 0.908)	0.949	(0.946, 0.952)	0.909	(0.870, 0.950)
oesophageal	1.239	(1.235, 1.245)	1.403	(1.332, 1.480)	1.239	(1.235, 1.246)	1.404	(1.331, 1.481)
ovarian					0.979	(0.976, 0.982)	0.955	(0.913, 0.997)
pancreatic	0.993	(0.990, 0.997)	0.995	(0.949, 1.042)	0.998	(0.994, 1.000)	0.984	(0.940, 1.029)
prostate	1.001	(0.998, 1.005)	0.790	(0.747, 0.835)
stomach	0.934	(0.931, 0.938)	0.882	(0.838, 0.929)	0.906	(0.903, 0.909)	0.867	(0.824, 0.912)
thyroid	0.856	(0.854, 0.859)	0.823	(0.781, 0.865)	0.754	(0.749, 0.756)	0.717	(0.680, 0.754)
uterine					0.993	(0.989, 0.997)	1.048	(0.997, 1.101)

Table 13.

Pairwise comparison and average ranks of regions for all cancers (males and females).

Figures 3 and 4 show the fitted values of SIRs obtained from the proposed meta-analysis model for major cities, regional and remote areas for each cancer for males and females, respectively; the horizontal line represents the Australian average. These figures show, at a glance, the large variation in the SIRs by remoteness category. A comparison of these fitted values with the SIRs from the atlas is shown in figures 5 and 6. The comparison confirms that the fitted values reproduce the patterns of cancer incidence by remoteness present in the atlas data: for each cancer, the regions with the highest and lowest SIRs are the same, although the spread of the values differs.
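The agreement criterion used above (the same highest- and lowest-SIR regions, regardless of spread) can be sketched as a small check. The region-level summaries below, for example medians read off box plots, are hypothetical values, not ACA estimates:

```python
def extreme_regions(sir_by_region):
    """Return (region with highest SIR, region with lowest SIR)."""
    highest = max(sir_by_region, key=sir_by_region.get)
    lowest = min(sir_by_region, key=sir_by_region.get)
    return highest, lowest


def same_pattern(fitted, atlas):
    """True when both sources agree on the highest- and lowest-SIR regions."""
    return extreme_regions(fitted) == extreme_regions(atlas)


# Hypothetical region-level SIR summaries (not values from the ACA).
atlas = {"major cities": 1.10, "regional": 0.95, "remote": 0.90}
fitted = {"major cities": 1.02, "regional": 0.88, "remote": 0.86}
print(same_pattern(fitted, atlas))
```

The check compares only the ordering at the extremes, so two sources with quite different spreads can still agree, as in figures 5 and 6.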

Figure 3.

Box plots of Bayesian meta-analysis fitted SIR for males by region and cancer type.

Figure 4.

Box plots of Bayesian meta-analysis fitted SIR for females by region and cancer type.

Figure 5.

Box plots of SIRs per region from the ACA and the fitted model for all cancers for males.

Figure 6.

Box plots of SIRs per region from the ACA and the fitted model for all cancers for females.


Validation of outputs of Bayesian meta-analysis model

In this section, the region-specific estimates obtained from the proposed Bayesian hierarchical meta-analysis model fitted to the ecological data (equations (3.1)–(3.7)) are compared with those from the Leroux model fitted to the observed areal incidence data with remoteness categories as covariates (equations (3.8)–(3.10)). For each cancer, the posterior distributions of the ranks of the region-specific mean SIRs (for major cities, regional and remote areas) revealed consistently the same patterns. For instance, according to both models, brain cancer is most likely to have a higher SIR in major cities than in regional and remote areas, and melanoma is most likely to have a higher SIR in regional areas than in major cities and remote areas (table 13). The differences in the posterior estimates of mean SIR for each cancer between the unit record model and the Bayesian hierarchical meta-analysis model are displayed in figures 7 and 8 for males and females, respectively. Table 15 shows the relative differences between the posterior estimates of the region-specific means from the meta-analysis model and the unit record model.

Figure 7.

Actual differences in posterior mean SIR between the unit record model and meta-analysis model for males by region and cancer type.

Figure 8.

Actual differences in posterior mean SIR between the unit record model and meta-analysis model for females by region and cancer type.

Table 15.

Relative differences in posterior means for SIRs by region.

	relative differences (%)
	males			females
cancers	major cities	regional	remote	major cities	regional	remote
all	−1.15	1.28	−1.66	−0.83	1.71	−1.16
bowel	−0.52	−2.77	3.49	−0.92	−1.13	4.53
brain	−0.63	−1.05	27.64	−0.19	−0.14	16.02
breast				−0.09	0.79	9.11
cervical				−2.47	1.19	−6.79
head and neck	−6.92	2.85	8.15	0.66	−7.79	−27.16
kidney	−1.66	4.46	8.30	0.27	−3.17	3.99
leukaemia	0.27	−1.07	−0.18	−0.39	1.36	−3.15
liver	0.72	−3.65	3.02	−2.67	9.03	−25.35
lung	−1.38	−1.15	3.32	−2.10	3.60	−19.26
melanoma	0.202	−7.64	−15.20	−0.89	−8.28	−16.64
myeloma	−1.24	0.59	31.41	−1.02	1.73	25.37
nH lymphoma	0.24	−1.03	14.41	−1.16	3.44	14.47
oesophageal	−0.84	−8.01	−12.66	−2.51	1.65	6.09
ovarian				−0.81	1.59	15.63
pancreatic	−1.46	3.45	1.90	−0.60	1.05	1.74
prostate	1.21	−2.92	−15.41
stomach	0.10	0.49	13.50	0.00	0.00	0.00
thyroid	4.38	−9.75	4.34	0.00	0.00	0.00
uterine				1.45	−3.17	−4.48

The relative differences between the posterior means for major cities and regional areas are small: less than 5% for most cancers and below 10% in every case (table 15). However, the relative differences between the posterior mean estimates for remote areas are much larger. This is because of the smaller number of remote areas and the greater variability between the estimated SIRs in this category. See appendix A.2 for further details.
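Table 15 reports relative differences in per cent, but the exact formula is not restated in this section. The sketch below assumes the definition 100 × (meta-analysis estimate − unit record estimate) / unit record estimate; if the opposite ordering was used, the signs flip. The posterior means are hypothetical, not values from table 15.

```python
def relative_difference(meta, unit):
    """Percentage difference of a meta-analysis estimate relative to the unit record estimate.

    Assumed definition: 100 * (meta - unit) / unit.
    """
    return 100.0 * (meta - unit) / unit


# Hypothetical posterior mean SIRs for one cancer (not values from table 15).
unit_record = {"major cities": 1.00, "regional": 1.05, "remote": 1.20}
meta = {"major cities": 0.99, "regional": 1.06, "remote": 1.38}
for region in unit_record:
    print(region, round(relative_difference(meta[region], unit_record[region]), 2))
```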

Model extensions

The proposed Bayesian hierarchical meta-analysis models have been applied to ACA data to identify the relationship between remoteness category and the incidence of 20 different cancer types by sex. The proposed model could be extended in a number of ways; two examples are the inclusion of a spatial component and the inclusion of continuous covariates.

Inclusion of a spatial component

In the present study, we have modelled spatially smoothed estimates of log(SIR) from the ACA. Since the estimated SIRs were already the results of a Bayesian spatial model (with a Leroux prior for the spatial term), we did not include a spatial component in our proposed hierarchical Bayesian meta-analysis model. However, further investigation of the residuals of the fitted models for the 20 cancer types by sex gave Moran's I values ranging from 0.15380 to 0.89485, with corresponding standard deviations from 0.01447 to 0.01450 and all p-values less than 0.0001 (see appendix A.3, table 22). To improve model performance, a spatial term with a spatial prior could be added to the proposed meta-analysis model. This would be a straightforward extension of the specified model (equations (3.1)–(3.7)): equation (3.1) of the present model is replaced by equation (5.1), in which the model includes an unstructured error component and a spatial random effect ψ having a spatial prior. Any suitable spatial prior could be chosen to model the spatial component [30,31]. For illustration, the proposed Bayesian meta-analysis model with a spatial component is fitted for two cancer types, pancreatic and liver cancer (males), which have Moran's I values of 0.15380 and 0.89485, respectively (table 22). The updated model code adding a spatial component is available in appendix A.4.
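Moran's I, used above to check residual spatial autocorrelation, compares each area's deviation from the mean with those of its neighbours. A minimal sketch of the global statistic follows. It assumes a simple binary contiguity weight matrix on a toy chain of four areas (a real analysis would need the SA2 neighbourhood structure), and it computes only the statistic, not the standard deviation or p-value reported in table 22.

```python
def morans_i(values, weights):
    """Global Moran's I for values on n areas with an n x n spatial weights matrix.

    I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    return n * numerator / (s0 * denominator)


def chain_adjacency(n):
    """Binary contiguity weights for areas arranged in a line (a toy neighbourhood)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


W = chain_adjacency(4)
print(morans_i([1, 2, 3, 4], W))    # smoothly varying residuals: positive I
print(morans_i([1, -1, 1, -1], W))  # alternating residuals: negative I
print(-1 / (4 - 1))                 # expected value of I under no autocorrelation
```

Values well above the null expectation of −1/(n − 1), such as those in table 22, indicate that neighbouring residuals tend to be similar.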

Table 22.

Measures of spatial autocorrelation among the residuals of the fitted models.

cancer	sex	observed Moran’s I	s.d.	p-value^a
all	male	0.6964	0.01449	<0.0001
all	female	0.7117	0.01449	<0.0001
bowel	male	0.7113	0.01449	<0.0001
bowel	female	0.7455	0.01449	<0.0001
brain	male	0.6783	0.01449	<0.0001
brain	female	0.8135	0.01448	<0.0001
breast	female	0.6789	0.01449	<0.0001
cervical	female	0.7859	0.01450	<0.0001
head and neck	male	0.7064	0.01449	<0.0001
head and neck	female	0.7573	0.01449	<0.0001
kidney	male	0.7784	0.01447	<0.0001
kidney	female	0.7051	0.01449	<0.0001
leukaemia	male	0.8812	0.01450	<0.0001
leukaemia	female	0.6556	0.01449	<0.0001
liver	male	0.8948	0.01450	<0.0001
liver	female	0.8448	0.01450	<0.0001
lung	male	0.6644	0.01449	<0.0001
lung	female	0.6559	0.01449	<0.0001
melanoma	male	0.7984	0.01449	<0.0001
melanoma	female	0.8525	0.01449	<0.0001
myeloma	male	0.4846	0.01449	<0.0001
myeloma	female	0.84847	0.01449	<0.0001
non-Hodgkin lymphoma	male	0.8166	0.01449	<0.0001
non-Hodgkin lymphoma	female	0.8079	0.01449	<0.0001
oesophageal	male	0.7592	0.01449	<0.0001
oesophageal	female	0.7410	0.01450	<0.0001
ovarian	female	0.7459	0.01449	<0.0001
pancreatic	male	0.1538	0.01449	<0.0001
pancreatic	female	0.7649	0.01449	<0.0001
prostate	male	0.7382	0.01449	<0.0001
stomach	male	0.7688	0.01449	<0.0001
stomach	female	0.8589	0.01450	<0.0001
thyroid	male	0.8941	0.01450	<0.0001
thyroid	female	0.8568	0.01449	<0.0001
uterine	female	0.8144	0.01448	<0.0001

^a Null hypothesis: no spatial autocorrelation is present and the data are randomly distributed.

Meta-analysis models with spatial component for liver and pancreatic cancer (males)

The Bayesian hierarchical meta-analysis model with a spatial component introduced at the first level, as shown in equation (5.1), was fitted in R using the R2WinBUGS package. We fitted the model for liver and pancreatic cancer (males); the results are summarized in this section. The posterior estimates of the model parameters and the corresponding 95% credible intervals for the two cancers are presented in table 16. The posterior estimates of the region-specific means θⱼ, j = 1, 2, 3, for the three regions (major cities, regional and remote areas) are ranked, and the ranks are summarized in table 17. The ranks show that major cities most likely have the highest incidence of liver and pancreatic cancer. The median ranks identify the same region with the lowest incidence as the model without the spatial component (table 13). For the region with the highest incidence, the median ranks differ between the two models, but the 95% credible intervals of the ranks agree. The probabilities of major cities having a larger incidence than regional and remote areas are shown in table 18, followed by the posterior mean differences with 95% credible intervals in table 19.

Table 16.

Summary of posterior samples for liver and pancreatic cancer (males).

	liver			pancreatic
		0.025th	0.975th		0.025th	0.975th
	mean	quantile	quantile	mean	quantile	quantile
deviance	−640.163	−1032.000	−341.800	−3138.183	−3274.000	−3004.000
μ₀	−0.090	−0.497	0.319	−0.011	−0.409	0.381
τ₁	37.356	33.640	41.210	53.158	49.185	57.110
τ₂	32.383	28.670	36.420	44.294	40.395	48.250
τ₃	12.782	9.062	16.850	17.756	13.760	22.020
τ₀	12.296	2.135	27.200	12.472	2.236	27.261
θ₁	−0.036	−0.059	−0.013	−0.009	−0.021	0.003
θ₂	−0.171	−0.202	−0.140	−0.014	−0.030	0.002
θ₃	−0.067	−0.170	0.039	−0.011	−0.073	0.050

Table 17.

Summary of posterior distribution of ranks for region-specific mean for liver and pancreatic cancer (males).

	liver		pancreatic
remoteness category	median ranks	95% credible interval	median ranks	95% credible interval
major cities	1	(1,2)	1	(1,2)
inner and outer regional	3	(2,3)	2	(1,3)
remote	2	(1,2)	3	(1,3)

Table 18.

Pairwise comparison of region-specific means.

	probability
pairs	liver	pancreatic
major cities > regional	1.000	0.691
major cities > remote	0.717	0.534
regional > remote	0.026	0.463

Table 19.

Posterior mean differences (pairwise) with 95% credible intervals.

	liver		pancreatic
	posterior	95% CI	posterior	95% CI
pairs	mean difference		mean difference
major cities–regional	0.136	(0.096, 0.175)	0.006	(−0.012, 0.024)
regional–remote	−0.104	(−0.193, −0.016)	−0.003	(−0.056, 0.049)
major cities–remote	0.031	(−0.064, 0.125)	0.003	(−0.051, 0.056)

The posterior relative risks of liver and pancreatic cancer for the extended model (table 20) and for the model without a spatial component (table 14) are also similar. Hence, adding a spatial component to the Bayesian hierarchical meta-analysis model did not change the inference for the two cancers modelled here. Figures 9 and 10 show the posterior mean SIRs for each region under both models. As expected, the magnitudes differ because of the additional spatial smoothing, but the observed pattern remains the same for both cancers.

Table 20.

Posterior relative risks for liver and pancreatic cancer for males in regional and remote Australia (baseline: major cities).

	relative risk (95% credible interval)
remoteness category	liver		pancreatic
regional	0.872	(0.867, 0.879)	0.995	(0.991, 0.999)
remote	0.966	(0.889, 1.046)	0.998	(0.949, 1.048)

Figure 9.

Posterior mean SIR of liver cancer (males) for meta-analysis models with and without spatial component.

Figure 10.

Posterior mean SIR of pancreatic cancer (males) for meta-analysis models with and without spatial component.

The relative differences in the estimated log(SIR) for the three remoteness regions between the meta-analysis model with a spatial component and the model using observed incidence data are shown in table 21. The relative differences for the model without a spatial component (table 15) and for the extended model with a spatial term are very close, and all are small (less than 5%). The extended model could also be fitted to cancers with larger relative differences, such as myeloma, brain and prostate cancer for males and liver, head and neck and ovarian cancer for females, to check whether adding a spatial component reduces the relative difference from the estimates obtained using the observed data.

Table 21.

Relative differences in posterior means for SIRs by region.

	relative differences (%)
cancers	major cities	regional	remote
liver	−1.07	1.58	4.06
pancreatic	1.48	2.16	1.78

The Moran's I values of the residuals after fitting the meta-analysis model with a spatial component for liver and pancreatic cancer (males) are 0.8857 (s.d. 0.1449) and 0.2051 (s.d. 0.0144), respectively. These are still statistically significant and very similar to those obtained by fitting the meta-analysis model without the spatial component (see appendix A.3, table 22). Hence, adding another spatial component to the proposed meta-analysis model, which was modelling already spatially smoothed estimates of log(SIR) for each cancer, did not remove the spatial autocorrelation from the model residuals. From this further investigation, we recommend using the proposed Bayesian hierarchical meta-analysis model without the additional spatial component when the estimated cancer incidence data are themselves the results of spatial models, as in the case of the ACA data. However, if the estimates were not obtained using a spatial random effects model, then including a spatial component in the meta-analysis model might be useful or even essential. Note also that these observations are based on this particular dataset; further research is needed before more general statements can be made.

Adding continuous covariates

In the proposed Bayesian hierarchical meta-analysis model, only one covariate, remoteness category, is included, as a level of the hierarchy. The model can be modified to include continuous covariates by replacing the region-specific mean θ in equation (3.2) with a mean expressed through the linear predictor Xβ, where X is a design matrix comprising continuous covariates and β is the vector of corresponding regression coefficients. The resulting model can then be written as a meta-regression model with normal priors on the regression coefficients [48].

Discussion

This study has proposed and illustrated a methodology for using information provided in the form of disease maps in atlases. The demonstrated benefits of the proposed approach include the ability to gain further insights from atlases without access to the original data, which are sometimes unavailable for privacy and other reasons. The utility of this approach has been illustrated by a substantive analysis of the point and interval estimates of SIRs available in the ACA for 2148 small areas across Australia. This study aimed to determine whether a meta-analysis approach using modelled estimates could obtain inferences similar to those obtained using observed areal incidence data in identifying differences in cancer incidence by remoteness of an area. The output of the proposed meta-analysis model demonstrated patterns of cancer incidence by remoteness category for most of the cancer types. Some cancers were more likely to occur in remote areas (head and neck, liver, lung and oesophageal cancer for males and females; cervical and uterine cancer for females), some were more likely to occur in regional areas (all cancers combined, bowel cancer and melanoma for both sexes; kidney cancer for females; leukaemia and prostate cancer for males), and some were more likely to have greater incidence in major cities (brain, myeloma, non-Hodgkin lymphoma, pancreatic, stomach and thyroid cancer for both sexes; kidney cancer for males; leukaemia and ovarian cancer for females). Using the estimated SIRs reported in the ACA, the proposed model identified these patterns of cancer incidence for the different regions (major cities, regional and remote areas) of Australia probabilistically. These probabilistic inferences could be useful to policy-makers by supplementing the information in the ACA.
The meta-analysis method used in our study is, in effect, averaging SIR estimates for each SA2 that are themselves already averages; this means that there will be some measurement error compared with average SIR estimates based on observed areal incidence data. However, despite slight differences in time periods, the general consistency between the remoteness patterns in cancer incidence reported in our study and those reported by the Australian Institute of Health and Welfare (AIHW) in the CIMAR (Cancer Incidence and Mortality Across Regions) books using observed areal incidence data [38] is encouraging. This is particularly so for results with very high (∼1) or very low (∼0) probabilities of being higher or lower than other remoteness categories. There is greater uncertainty when the probabilities are closer to 0.5, and such patterns should be interpreted with much greater caution. For example, our model reported that the incidence of kidney cancer for males in major cities could be higher than that in regional areas with a probability of 0.605, which still leaves a substantial probability of almost 40% (1 − 0.605 = 0.395) that regional areas have higher incidence (as was reported by the AIHW [38]). The remoteness patterns from our proposed meta-analysis model for Australia are also similar to those reported in the Atlas of Cancer in Queensland [57]. Its authors considered 13 cancers and four geographical regions: major cities, inner regional, outer regional and remote areas. Cancers with higher incidence in remote areas were lung, oesophageal and cervical cancer; cancers with higher incidence in major cities were breast and kidney cancer; and cancers with higher incidence in regional (inner or outer) areas were melanoma, prostate cancer and non-Hodgkin lymphoma. The output of the proposed meta-analysis model was also validated by modelling the observed areal incidence data from 2148 SA2s around Australia.
The meta-analysis approach was able to reveal the original regional patterns of cancer incidence found by the unit record model in the majority of cases. While insights about the different patterns of cancer incidence by remoteness have been widely published [38,57], the rationale for using this methodology to regenerate these patterns was to provide evidence validating the proposed modelling approach. The purpose was to confirm whether the proposed model could reveal the same patterns as those observed using the raw data, so that this methodology can later be applied to obtain additional insights into other ecological factors that are not already known. There are limitations to the proposed approach. First, the posterior mean of the overall average for each cancer from the proposed meta-analysis model sometimes differs from the Australian average of 1 (for lung cancer: 1.05 for males and 1.01 for females; for thyroid cancer: 0.901 for males and 0.814 for females). However, these differences are not substantial, as the 95% credible interval around the overall fitted mean contains 1 in all cases. The mean is not exactly equal to the Australian average because the proposed meta-analysis model is fitted to the SIRs, which are themselves ratios, and the average of ratios is not, in general, equal to the ratio of averages [58]. Second, to apply the proposed methodology to online atlases, both point estimates and interval (or other uncertainty) estimates need to be available. Moreover, the proposed model might not provide accurate numerical estimates in the presence of outliers; this may explain the more limited validation of the proposed model for remote areas.
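The point about ratios [58] can be illustrated with two hypothetical small areas whose observed and expected counts are equal in total, so that the pooled SIR is exactly 1. An unweighted average of the area SIRs, and its log-scale analogue, both exceed 1 because the area with few expected cases carries the same weight as the larger one. The counts are invented for illustration, and the equal weighting is a deliberate simplification, not the weighting used by the proposed model.

```python
import math

# Hypothetical observed (O) and expected (E) counts for two small areas.
observed = [2, 38]
expected = [1, 39]

sirs = [o / e for o, e in zip(observed, expected)]
average_of_ratios = sum(sirs) / len(sirs)
ratio_of_averages = sum(observed) / sum(expected)
# Back-transformed mean of log(SIR), the analogue for a model on the log scale.
geometric_mean = math.exp(sum(math.log(s) for s in sirs) / len(sirs))

print(round(average_of_ratios, 3), round(ratio_of_averages, 3), round(geometric_mean, 3))
```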
We also acknowledge that, because we are analysing summary statistics rather than individual-level data [62,63], the proposed methodology could be subject to Simpson's paradox [59-61], in which patterns observed within groups of data reverse or disappear when the groups are combined. In the spatial context, Simpson's paradox can be considered a form of the modifiable areal unit problem, in which results differ depending on the spatial units chosen [64]. Our analysis has the advantage of retaining the original areas, and in the case study presented these are the smallest areas available. The results can be considered valid provided they are not extrapolated to the individual level. The proposed Bayesian hierarchical meta-analysis model is somewhat sensitive to the choice of priors and hyperparameters, which are needed at the lower levels of the hierarchy, i.e. for the overall mean, overall precision and region-specific precision parameters. Sensible choices of hyperparameters and prior distributions must therefore be made for the overall mean and precision (or variance) parameters to ensure good model performance. Details of a comprehensive sensitivity analysis exploring this issue are reported in the electronic supplementary material. The modelling approach proposed in this study can be applied to other published disease atlases that provide point and interval estimates of disease rates. This study explored the behaviour of cancer incidence according to the urban/rural status of small areas; similar analyses can be carried out to identify the influence of different socio-demographic, economic, ecological or environmental factors on specific disease incidence or mortality. Further ecological modelling using the proposed approach could therefore answer important additional research questions and complement the information already published in atlases and the associated literature.
The proposed model can also be extended relatively straightforwardly to a multivariate framework. Another possible extension of the proposed approach is to combine estimates from other atlases together with ACA and pool the effect of remoteness. There will be some additional challenges in appropriately describing the heterogeneity arising within and between the different studies, and accommodating different definitions of the remoteness categories from the atlases. This is a worthy objective for a future study.

36 in total

1. Multilevel models for meta-analysis, and their application to absolute risk differences.

Authors: S G Thompson; R M Turner; D E Warn
Journal: Stat Methods Med Res Date: 2001-12 Impact factor: 3.021

2. Posterior predictive model checks for disease mapping models.

Authors: H S Stern; N Cressie
Journal: Stat Med Date: 2000 Sep 15-30 Impact factor: 2.373

3. Simpson's paradox in meta-analysis.

Authors: J A Hanley; G Thériault
Journal: Epidemiology Date: 2000-09 Impact factor: 4.822

4. Meta-analysis of standardized incidence and mortality rates of childhood leukaemia in proximity to nuclear facilities.

Authors: P J Baker; D G Hoel
Journal: Eur J Cancer Care (Engl) Date: 2007-07 Impact factor: 2.520

5. A flexible parametric approach to examining spatial variation in relative survival.

Authors: Susanna M Cramb; Kerrie L Mengersen; Paul C Lambert; Louise M Ryan; Peter D Baade
Journal: Stat Med Date: 2016-08-08 Impact factor: 2.373

6. Disease mapping and regression with count data in the presence of overdispersion and spatial autocorrelation: a Bayesian model averaging approach.

Authors: Mohammadreza Mohebbi; Rory Wolfe; Andrew Forbes
Journal: Int J Environ Res Public Health Date: 2014-01-09 Impact factor: 3.390

7. Identification of area-level influences on regions of high cancer incidence in Queensland, Australia: a classification tree approach.

Authors: Susanna M Cramb; Kerrie L Mengersen; Peter D Baade
Journal: BMC Cancer Date: 2011-07-24 Impact factor: 4.430