Estimating the burden of α-thalassaemia in Thailand using a comprehensive prevalence database for Southeast Asia.

Carinna Hockham1,2, Supachai Ekwattanakit3, Samir Bhatt4, Bridget S Penman5, Sunetra Gupta2, Vip Viprakasit3,6, Frédéric B Piel7.   

Abstract

Severe forms of α-thalassaemia, haemoglobin H disease and haemoglobin Bart's hydrops fetalis, are an important public health concern in Southeast Asia. Yet information on the prevalence, genetic diversity and health burden of α-thalassaemia in the region remains limited. We compiled a geodatabase of α-thalassaemia prevalence and genetic diversity surveys and, using geostatistical modelling methods, generated the first continuous maps of α-thalassaemia mutations in Thailand and sub-national estimates of the number of newborns with severe forms in 2020. We also summarised the current evidence-base for α-thalassaemia prevalence and diversity for the region. We estimate that 3595 (95% credible interval 1,717-6,199) newborns will be born with severe α-thalassaemia in Thailand in 2020, which is considerably higher than previous estimates. Accurate, fine-scale epidemiological data are necessary to guide sustainable national and regional health policies for α-thalassaemia management. Our maps and newborn estimates are an important first step towards this aim. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
© 2019, Hockham et al.

Keywords:  Thalassaemia; epidemiology; genetic diversity; global health; human; newborn prevalence; spatial distribution

Year:  2019        PMID: 31120421      PMCID: PMC6533055          DOI: 10.7554/eLife.40580

Source DB:  PubMed          Journal:  Elife        ISSN: 2050-084X            Impact factor:   8.140


Introduction

α-thalassaemia is one of the commonest monogenic disorders of humans, spanning much of the malaria belt, including the Mediterranean, sub-Saharan Africa, Asia and the Pacific. It is estimated that up to 5% of the world’s population carries at least one α-thalassaemia variant, with some populations (e.g. in India and Papua New Guinea) reporting gene frequencies of close to 80% (Piel and Weatherall, 2014). Central to its elevated frequency is the malaria protection afforded by the underlying genetic mutations, which have been favoured by natural selection in populations with historically high rates of malaria (Flint et al., 1998; Premawardhena et al., 2017; Weatherall and Clegg, 2001a; May et al., 2007). Due to recent population migrations, α-thalassaemia is now common in other parts of the world, as illustrated by the inclusion of haemoglobin H (HbH) disease (a form of α-thalassaemia) in the newborn screening programme in California (Vichinsky, 2013; Hoppe, 2009). Humans typically possess four copies of the α-globin gene. In an individual with α-thalassaemia, at least one of these four copies is absent or dysfunctional. The resulting deficit in α-globin affects the balance between α-globin and β- or γ-globin chains that is necessary to produce normal adult haemoglobin (HbA) and normal foetal haemoglobin (HbF), respectively (Weatherall and Clegg, 2001b). The severity of α-thalassaemia is inversely related to the number of functional copies of the α-globin gene. A deficit of three or more α-globin genes leads to the production of γ-globin tetramers, called Hb Bart’s, in the foetus or β-globin tetramers, called HbH, in adults. Due to their very high oxygen affinity, neither tetramer is capable of transporting oxygen efficiently (Galanello and Cao, 2011). Furthermore, the instability of HbH leads to the production of inclusion bodies in red blood cells and a variable degree of haemolytic anaemia. 
To date, 121 α-globin gene mutations have been identified (HbVar, http://globin.bx.psu.edu, accessed 07 July 2018). These include: (i) double gene deletions that remove both α-globin copies in a gene pair (α0-thalassaemia), (ii) single gene deletions that remove one α-globin copy (α+-thalassaemia), and (iii) non-deletional (ND) mutations that in some way inactivate the affected gene (αND-thalassaemia). While deletions constitute the vast majority of these α-thalassaemia variants, non-deletional variants are typically associated with more severe phenotypes (Chui et al., 2003; Fucharoen and Viprakasit, 2009; Lal et al., 2011). However, even amongst non-deletional variants, considerable phenotypic variability is observed (Singer, 2009). Because the geographical distribution of β-thalassaemia largely overlaps with the distribution of α-thalassaemia, it is important to note that their co-inheritance often leads to a reduced imbalance between α-globin and β-globin chains, resulting in a milder thalassaemia phenotype (Weatherall et al., 1981; Wainscoat et al., 1983; Kan and Nathan, 1970; Law et al., 2003). From a clinical perspective, α-thalassaemia is mostly a burden in Southeast Asia where α0-thalassaemia variants (e.g. --SEA, --THAI) are common and result in HbH disease when inherited with α+-thalassaemia (e.g. -α3.7 or -α4.2) or αND-thalassaemia (e.g. Hb Constant Spring, or Hb CS, or Hb Paksé), or in Hb Bart’s hydrops fetalis when inherited from both parents (Weatherall et al., 2006; Chui, 2005). Previously, HbH disease was considered to be relatively benign; however, recent evidence suggests a spectrum of mild to severe forms of HbH disease, with the worst affected individuals requiring lifelong transfusion (Chui et al., 2003; Fucharoen and Viprakasit, 2009; Lal et al., 2011). 
Hb Bart’s hydrops fetalis, the most severe form of α-thalassaemia, associated with an absence of any functional α-globin genes, is almost always fatal in utero or soon after birth, although intrauterine interventions and perinatal intensive care can lead to survival (Songdej et al., 2017). In this context, there is a growing demand for a better understanding of the epidemiology of α-thalassaemia such that burden estimates can be calculated to guide public health decisions and assess the need for new pharmacological treatments. However, whilst several narrative reviews of the epidemiology of α-thalassaemia in Southeast Asian countries are available (Chui, 2005; Fucharoen and Winichagoon, 1987; Fucharoen and Winichagoon, 2011), a comprehensive review for the whole region has not been performed, making the current evidence-base patchy and incohesive. In addition, there appears to be a substantial amount of data that are available only in local data sources, which is not being accessed by the international community. Estimates of the number of newborns with severe forms of α-thalassaemia published by Modell and Darlison currently represent the only source of information on the epidemiology of thalassaemias and other inherited haemoglobin disorders at national and regional levels (Modell and Darlison, 2008). However, various inconsistencies have been identified in these α-thalassaemia estimates (Piel and Weatherall, 2014). Furthermore, they do not include most of the surveys conducted in the genomic era, which has allowed accurate diagnosis through DNA testing. Finally, haemoglobinopathies often present remarkably heterogeneous geographical distributions (Piel et al., 2013a; Piel et al., 2013b). As shown for other genetic conditions (e.g. sickle-cell anaemia), these variations can be captured by generating continuous allele frequency maps interpolated from population surveys using geostatistical techniques. 
Combined with high-resolution demographic and birth rate data, these maps allow sub-national newborn estimates to be calculated (Piel et al., 2013a; Piel et al., 2013b; Howes et al., 2012). The aims of this study are therefore three-fold: (i) to compile a geodatabase of published evidence for the distribution of α-thalassaemia and its common genetic variants in Southeast Asia, (ii) to generate the first model-based continuous maps of α-thalassaemia in Thailand and calculate refined estimates of the annual number of newborns affected by severe forms of α-thalassaemia, and (iii) to comprehensively evaluate and summarise the compiled evidence-base for the whole region.

Results

The database

Our keyword searches yielded a total of 868 unique potential sources of data on α-thalassaemia prevalence and/or genetic diversity in 10 Southeast Asian countries: Brunei Darussalam, Cambodia, Indonesia, Lao People’s Democratic Republic, Malaysia, Myanmar, Philippines, Singapore, Thailand and Vietnam (Figure 1—figure supplement 1). A further 74 potential data sources were identified by one of the authors (SE) from local Thai journals and independently double-checked for inclusion into the study (CH). Of all these sources, 75 met our inclusion criteria and were included in the final database. Due to some sources reporting estimates for more than one population, data were available for 106 individual population samples: 58 from the online literature search and 48 from the local literature. A detailed description of the database is provided in Appendix 5.
Figure 1—figure supplement 1.

A map of the countries included in this study.

Here, we defined the Southeast Asian region according to the member states of the Association of Southeast Asian Nations (ASEAN) (http://asean.org/asean/asean-member-states/).

Forty-six surveys provided data on α-thalassaemia gene frequency alone, two provided data only relating to genetic diversity and 58 provided data on both. Four surveys were reported at the national level (two from Malaysia, one from Singapore and one from Thailand), and were retained for the regional analysis. The spatial and temporal distributions of identified surveys are shown in Figure 1—figure supplement 2. The country for which the highest number of surveys was published in the international literature (i.e. excluding surveys obtained through local searches) was Thailand. No published surveys were identified for Brunei Darussalam or the Philippines. Within countries, surveys predominated in certain areas. For instance, in Thailand, the northern and northeastern parts of the country contained >75% of all surveys. Data for southern Thailand came exclusively from Thai journals (n = 4) (Figure 1—figure supplement 2). The total number of individuals sampled was 133,649 (the population of the region is estimated to be 647,483,729 in 2017), with a mean sample size of 1261. Further details on the final database are provided in Appendix 5.
Figure 1—figure supplement 2.

Spatial and temporal distributions of the α-thalassaemia surveys included in the final database.

In both panels, the shape of the data points indicates the type of data provided by the survey, the colour indicates whether the survey was found in our online literature search or in local journals, and size represents the sample size of the survey. In (A) a spatial jitter of up to 0.30 latitude and longitude decimal degree coordinates was applied to allow visualisation of spatially duplicated data points.

Prevalence surveys varied considerably with regards to the α-thalassaemia alleles and/or genotypes that were tested for or reported upon; whilst some reported allele frequencies for α0-, α+- and αND-thalassaemia, others provided data for only one or two of these. To maximise use of the available allele frequency data, whilst avoiding the incorporation of potentially biased estimates for overall α-thalassaemia allele frequency, we generated separate maps for each of the three major forms of α-thalassaemia – that is, α0-, α+- and αND-thalassaemia (Figure 1A,B and C, respectively). α0-thalassaemia was the most extensively studied form (n = 97), followed by αND-thalassaemia (n = 49) and then α+-thalassaemia (n = 47).
Figure 1.

Descriptive maps of the observed allele frequencies in the database.

(A) α0-thalassaemia, (B) α+-thalassaemia and (C) αND-thalassaemia. A spatial jitter of up to 0.30 latitude and longitude decimal degree coordinates was applied to allow visualisation of spatially duplicated data points. Colour intensity indicates allele frequency; circle size represents survey sample size. Surveys that could only be mapped at the national level are indicated by a black star.

Source data for Figure 1A, a map of the observed α0-thalassaemia allele frequencies in the database.

Data were obtained through a review of the published literature using a rigorous inclusion/exclusion protocol. In the figure, a spatial jitter of up to 0.30 latitude and longitude decimal degree coordinates was applied to allow visualisation of spatially duplicated data points.

Source data for Figure 1B, a map of the observed α+-thalassaemia allele frequencies in the database.

Data were obtained through a review of the published literature using a rigorous inclusion/exclusion protocol. In the figure, a spatial jitter of up to 0.30 latitude and longitude decimal degree coordinates was applied to allow visualisation of spatially duplicated data points.

Source data for Figure 1C, a map of the observed αND-thalassaemia allele frequencies in the database.

Data were obtained through a review of the published literature using a rigorous inclusion/exclusion protocol. In the figure, a spatial jitter of up to 0.30 latitude and longitude decimal degree coordinates was applied to allow visualisation of spatially duplicated data points.

A map of our current knowledge of the global distribution, gene frequency and genetic diversity of α-thalassemia.

Only the most common variants for α+-thalassemia (-α3.7 and -α4.2) and α0-thalassemia (--MED and --SEA) are shown for each region. The variants that appear in parentheses are those for which the data used to make this map are limited.

Continuous allele frequency maps for Thailand

Data for Thailand and its neighbouring countries (Cambodia, Lao PDR, Malaysia and Myanmar) formed the evidence-base for a Bayesian geostatistical model and are presented in Figure 2—figure supplement 1. The total number of data points available for α0-, α+- and αND-thalassaemia was 88, 37 and 42, respectively. The data were used to generate 1 km x 1 km maps of allele frequencies of α0-, α+- and αND-thalassaemia in Thailand (Figure 2). One hundred realisations of the model were performed to generate a posterior predictive distribution (PPD) for each 1 km x 1 km pixel. The mean of the PPD is displayed, along with the 95% credible interval as a measure of model uncertainty.
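A per-pixel summary of this kind can be sketched in Python as follows. The sketch takes a set of model realisations for a single pixel and returns their mean together with a 95% interval read from the 2.5th and 97.5th percentiles. The function names, the linear-interpolation percentile rule and the simulated input values are assumptions made for illustration; they are not taken from the study's code.

```python
import random
from statistics import mean


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 1]) of a pre-sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarise_pixel(realisations):
    """Summarise the realisations for one pixel: mean and 95% interval."""
    ordered = sorted(realisations)
    ci_low = percentile(ordered, 0.025)
    ci_high = percentile(ordered, 0.975)
    return {
        "mean": mean(ordered),
        "ci_low": ci_low,
        "ci_high": ci_high,
        "ci_width": ci_high - ci_low,  # one possible scalar measure of uncertainty
    }


# Hypothetical input: 100 simulated allele frequencies for one pixel.
rng = random.Random(0)
simulated = [rng.betavariate(3, 97) for _ in range(100)]
summary = summarise_pixel(simulated)
print(summary)
```

Applying `summarise_pixel` to every pixel in turn would give one surface for the mean and one for the interval, which is the general shape of output described above.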
Figure 2—figure supplement 1.

Maps of the observed allele frequencies used to construct the models and generate the predicted continuous allele frequency maps for Thailand in Figure 2.

(A) α0-thalassaemia, (B) α+-thalassaemia and (C) αND-thalassaemia. A variable spatial jitter was applied to allow visualisation of spatially duplicated data points. Colour intensity indicates allele frequency.

Figure 2.

Maps of the mean of, and uncertainty in, the predicted α-thalassaemia allele frequencies in Thailand.

Panels A to C display the mean of the posterior predictive distribution (PPD) of 100 realisations of the geostatistical model. Panels D to F display the 95% credible interval of the PPD. Each row corresponds to a different α-thalassaemia form: α0-thalassaemia (A and D); α+-thalassaemia (B and E) and αND-thalassaemia (C and F). Figure 2—figure supplement 1 shows the observed data used to construct the models and Figure 2—figure supplement 2 displays the province names for reference.

Figure 2—figure supplement 2.

A reference map of Thailand provinces.

The Bangkok Metropolitan Region, which includes Bangkok City and surrounding provinces, is shaded in red, with Bangkok City shaded a darker red.

The maps for α0- and α+-thalassaemia indicate clear spatial heterogeneity in allele frequencies, with ranges of 0.57–4.46% and 2.43–15.03%, respectively (Figure 2A,B). Heterogeneity is greatest in the north of the country. For α0-thalassaemia, while large parts of the northernmost provinces of Chiang Rai, Phayao and Nan have predicted allele frequencies of up to 2%, allele frequencies for the neighbouring provinces of Chiang Mai, Lampang and Phrae are often twice as high (see Figure 2—figure supplement 2 for a reference map of Thailand provinces). The allele is also predicted at frequencies of up to 4% in the northeast of the country, along a belt across most of the north of the country and in Chonburi and Rayong provinces in central Thailand. Allele frequencies below 1% are predicted throughout the southern zone. α+-thalassaemia has its highest predicted allele frequencies across the whole of the north and northeastern zones. Predicted allele frequencies of αND-thalassaemia range only between 1.57% and 1.65%. Model uncertainty is greatest in areas where data are scarce (e.g. southern Thailand and along the border with Myanmar) or where there is heterogeneity in the available data (e.g. in Chiang Mai and the surrounding area). Overall, uncertainties are higher for α+-thalassaemia than for α0-thalassaemia or αND-thalassaemia, which is partly due to the wider range of observed frequencies for this form. For α0-thalassaemia, the highest level of uncertainty is 9% and is found in Chumphon and Ranong provinces in southern Thailand and Kanchanaburi and Tak in the westernmost part of the country. For α+-thalassaemia, the highest uncertainty (up to 23%) is observed in the northeastern zone and in the north. 
Uncertainty for αND-thalassaemia is patchy and ranges from 1.5% in central and northern Thailand to 2.5% in southern and northeastern Thailand. The results of the 10-fold cross-validation procedure reveal an average mean absolute error of the predictions of 0.93%, 4.10% and 2.30%, for α0-, α+- and αND-thalassaemia, respectively. The average correlation between the observed and predicted values is 0.74 (0.62–0.83), 0.71 (0.49–0.85) and 0.47 (0.17–0.69), respectively.
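The two validation statistics reported here, mean absolute error and the correlation between observed and predicted values, can be computed for a held-out fold as in the minimal Python sketch below. The use of the Pearson coefficient is an assumption (the type of correlation is not stated in this section), and the observed and predicted values are hypothetical.

```python
from math import sqrt


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_correlation(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


# Hypothetical held-out fold: allele frequencies in percent.
observed = [1.0, 2.5, 4.0, 3.0]
predicted = [1.5, 2.0, 3.5, 3.5]

mae = mean_absolute_error(observed, predicted)
corr = pearson_correlation(observed, predicted)
print(f"MAE = {mae:.2f} percentage points, r = {corr:.2f}")
```

In a 10-fold procedure, these two statistics would be computed once per held-out fold and then averaged across the ten folds.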

Estimates of number of affected newborns in Thailand

Estimates of the number of newborns born with a severe form of α-thalassaemia (i.e. Hb Bart’s hydrops fetalis and HbH disease) in Thailand in 2020 were generated by pairing our allele frequency predictions to high-resolution demographic data for the country. We estimate that the number of Hb Bart’s hydrops fetalis births in the country will be 423 (CI: 184–761) in 2020. The number of new cases of HbH disease is estimated to be 3,172, including 2,674 (CI: 1,296–4,491) deletional and 498 (CI: 237–947) non-deletional cases. The highest absolute burden of hydrops fetalis is predicted in Bangkok City (57 [CI: 13–151]) (Figure 2—figure supplement 2), with its high population density, and Udon Thani (23 [CI: 6–66]) in the northeastern zone, where some of the highest α0-thalassaemia allele frequencies are predicted. Other provinces with a comparatively high burden include: Chiang Mai in the north of the country; Khon Kaen, Sakon Nakhon and Ubon Ratchathani in the northeast; and Chon Buri, Samut Prakan and Nonthaburi close to Bangkok City. The estimated number of hydrops fetalis births in these provinces ranges between 10 and 19. For HbH disease, the highest burden is predicted in northeast Thailand for both the deletional and non-deletional forms. Bangkok City is predicted to have the highest burden of HbH disease (301 [CI: 94–639] for deletional HbH disease and 68 [CI: 25–148] for non-deletional HbH disease). To directly compare estimates generated using our methodology with those previously published by Modell and Darlison, we also calculated estimates using population and birth rate data for 2003 (Appendix 3). Modell and Darlison estimated 1,017 and 2,515 births to be affected by Hb Bart’s hydrops fetalis and HbH disease, respectively, in 2003. Using population data from the same year paired with our model-based maps, and assuming no consanguinity, we estimate 709 and 5,469 newborns to be born with Hb Bart’s hydrops fetalis and HbH disease in the country. 
As Modell and Darlison included a population coefficient of consanguinity (F) in their calculations, we examined the effect that this would have on our estimates. We found that they do not change considerably (951 and 5,409) when a value of F of 0.1, a high value for the region (www.consang.net), is incorporated. Our estimates are therefore consistent with those by Modell and Darlison for Hb Bart’s hydrops fetalis. However, they suggest that the burden of HbH disease in Thailand may have previously been underestimated. Moreover, whilst Modell and Darlison did not estimate the burden of non-deletional forms of HbH disease, our estimates suggest that 15% of the 5,469 neonatal cases were of non-deletional types, which are usually associated with more severe phenotypes.
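To make the role of F concrete, the Python sketch below shows one standard way of turning allele frequencies into expected numbers of affected births: Hardy-Weinberg proportions adjusted by an inbreeding coefficient. The exact formulae used in this study are given in Appendix 3 and are not reproduced in this section, so the formulation here is an assumption. The allele frequencies and number of births are hypothetical, and the sketch treats a single area rather than pairing a frequency surface with gridded birth data.

```python
def severe_genotype_frequencies(p0, p_plus, p_nd, F=0.0):
    """Expected genotype frequencies for severe alpha-thalassaemia.

    Assumes inbreeding-adjusted Hardy-Weinberg proportions:
      homozygote   i/i: p_i**2 + F * p_i * (1 - p_i)
      heterozygote i/j: 2 * p_i * p_j * (1 - F)
    """
    return {
        # alpha0 inherited from both parents
        "hb_barts_hydrops": p0 ** 2 + F * p0 * (1 - p0),
        # alpha0 inherited with alpha+ (deletional HbH disease)
        "hbh_deletional": 2 * p0 * p_plus * (1 - F),
        # alpha0 inherited with alphaND (non-deletional HbH disease)
        "hbh_non_deletional": 2 * p0 * p_nd * (1 - F),
    }


def expected_cases(births, p0, p_plus, p_nd, F=0.0):
    """Multiply genotype frequencies by the number of births in the area."""
    freqs = severe_genotype_frequencies(p0, p_plus, p_nd, F)
    return {genotype: births * freq for genotype, freq in freqs.items()}


# Hypothetical area: 100,000 births and illustrative allele frequencies.
no_consanguinity = expected_cases(100_000, p0=0.03, p_plus=0.10, p_nd=0.016)
with_consanguinity = expected_cases(100_000, p0=0.03, p_plus=0.10, p_nd=0.016, F=0.1)
print(no_consanguinity)
print(with_consanguinity)
```

Under this formulation, a positive F raises the expected number of homozygous (hydrops fetalis) births and lowers the expected number of compound heterozygous (HbH disease) births, which matches the direction of the changes reported above.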

Overall distribution of α-thalassaemia across Southeast Asia

In our database for all of Southeast Asia, the number of surveys that tested for all three forms of α-thalassaemia was 40. Amongst these, the overall α-thalassaemia gene frequency ranged from 0% in populations from peninsular Malaysia to 35.4% in Preah Vihar, Cambodia (Munkongdee et al., 2016). A higher allele frequency of 49% was also reported in the So ethnic group from Khammouane Province in Lao PDR, although the sample size for this study was small (n = 50) (Sengchanh et al., 2005). Appendix 5—table 2 shows the range of allele frequencies observed for the different α-thalassaemia forms (α0-, α+- and αND-thalassaemia) in each country.
Appendix 5—table 2.

Observed allele frequency ranges for different α-thalassaemia forms.

Allele frequency range (%)

Country | α0-thalassaemia | α+-thalassaemia | αND-thalassaemia
Brunei Darussalam | No surveys identified | No surveys identified | No surveys identified
Cambodia | 0.80–1.10 | 10.30–26.30 | 2.44–4.20
Indonesia | No surveys identified | 2.91 | No surveys identified
Lao PDR | 0.00–6.19 | 4.60–40.00 | 2.28–9.00
Malaysia | 0.00–1.92 | 0.00–16.80 | 0.00–16.25
Myanmar | 0.93 | 20.58 | No surveys identified
Philippines | No surveys identified | No surveys identified | No surveys identified
Singapore | 0.86–0.90 | 1.88–3.04 | 0.04
Thailand | 0.00–9.29 | 2.98–21.43 | 0.00–7.30
Vietnam | 0.00–2.66 | 1.59–14.4 | 2.07–14.43

For α0-thalassaemia, the highest allele frequencies were observed in Thailand (Figure 1A, Figure 1—source data 1). In Lao PDR, surveys along the Lao PDR-Thailand border near Vientiane reported allele frequencies between 4.03% and 7.28%, whilst the survey among the aforementioned So ethnic group reported an absence of the α0-thalassaemia allele. The highest reported allele frequencies in Cambodia and Vietnam were 1.10% and 2.66%, respectively, with the majority of studies reporting even lower frequencies. However, data were sparse in the two countries (n = 4 in each). Allele frequencies of up to 1.53% were observed in southern Thailand, whilst the few surveys carried out in Myanmar (n = 1), Malaysia (n = 11) and Singapore (n = 2) reported allele frequencies of around 1%. In Malaysia, the highest allele frequency of α0-thalassaemia was 1.92% from a study carried out in newborns in Kuala Lumpur, the capital city (Alauddin et al., 2017). α+-thalassaemia, the most prevalent form, reached allele frequencies of 26% in Cambodia (Figure 1B, Figure 1—source data 2) (Munkongdee et al., 2016). The surveys revealed a clear north-to-south decline in the distribution of α+-thalassaemia across the region, with a single high allele frequency estimate of 16.8% observed in Sabah in Malaysian Borneo (Tan et al., 2010). High allele frequencies (≥10%) were observed in all four surveys in Cambodia. In Vietnam, the reported α+-thalassaemia allele frequency ranged from 1.59% to 14.4%. The observed allele frequency of αND-thalassaemia ranged between 0% in various locations across Malaysia and 16.25% in central Peninsular Malaysia (Figure 1C, Figure 1—source data 3). Within Thailand, the highest reported allele frequencies of around 7% were observed in Khon Kaen in the northeast and Chachoengsao in central Thailand. 
In Cambodia and Vietnam, αND-thalassaemia allele frequencies of up to 8% and 14.3% were reported, respectively, in surveys in which α0-thalassaemia was found to be absent, whilst in other parts of these countries, the two forms were found to co-exist at similar allele frequencies (e.g. around 2.5% in Binh Phuoc and Khanh Hoa provinces in Vietnam).

Genetic diversity of α-thalassaemia across Southeast Asia

Maps of the genetic diversity of α-thalassaemia across Southeast Asia are shown in Figures 3–6. Figure 3 (Figure 3—source data 1) displays surveys that included all three α-thalassaemia forms (α0-, α+- and αND-thalassaemia), allowing relative proportions of each of the forms to be calculated without giving specific variant details. Figures 4–6 (Figure 4—source data 1, Figure 5—source data 1, Figure 6—source data 1) display surveys that provided information on the frequencies of specific α-thalassaemia variants (e.g. --SEA, -α3.7, etc.). For these, the variants that were tested for differ between surveys. Some surveys are included in both Figure 3 and Figures 4–6. For the latter figures, the region has been divided to improve visualisation of the data.
Figure 3.

Map showing the proportions of α0-, α+- and αND-thalassaemia in Southeast Asia.

Three surveys were mapped at the national level (indicated by a white star). The size of the pie charts reflects survey size.

Figure 6.

Map showing the allele frequencies of specific α-thalassaemia variants in Malaysia, Singapore and Indonesia.

The y-axis scale is the same across all bar charts, ranging from 0 to 1. The variants that were tested for in each survey are indicated above each bar. α0-thalassaemia mutations are shown in red, α+-thalassaemia mutations in blue and αND-thalassaemia mutations in green. Empty spaces along the x-axis indicate an absence of the corresponding mutation in the survey sample. The sample size of the survey is given under each plot. Bar charts are connected to their spatial location by a black line. Data points are coloured by country, using the same colour scale as that in Figure 1—figure supplement 1.

'NA' indicates those variants that were not tested for in the survey. In total, 14 surveys included genetic diversity information for specific α-thalassaemia variants in these countries.

Figure 4.

Map showing the allele frequencies of specific α-thalassaemia variants in Thailand.

Given the high number of surveys in northeast Thailand, this region has been magnified. The y-axis scale is the same across all bar charts, ranging from 0 to 1. The variants that were tested for in each survey are indicated above each bar. α0-thalassaemia mutations are shown in red, α+-thalassaemia mutations in blue and αND-thalassaemia mutations in green. Empty spaces along the x-axis indicate an absence of the corresponding mutation in the survey sample. The sample size of the survey is given under each plot. Bar charts are connected to their spatial location by a black line.

Source data for Figure 3, a map showing the proportions of α0-, α+- and αND-thalassaemia in Southeast Asia.

Source data for Figure 4, a map showing the proportions of specific α-thalassaemia variants in Thailand.

Map showing the allele frequencies of specific α-thalassaemia variants in Myanmar, Lao PDR, Cambodia and Vietnam.

The y-axis scale is the same across all bar charts, ranging from 0 to 1. The variants that were tested for in each survey are indicated above each bar. α0-thalassaemia mutations are shown in red, α+-thalassaemia mutations in blue and αND-thalassaemia mutations in green. Empty spaces along the x-axis indicate an absence of the corresponding mutation in the survey sample. The sample size of the survey is given under each plot. Bar charts are connected to their spatial location by a black line. Data points are coloured by country, using the same colour scale as that in Figure 1—figure supplement 1.

Source data for Figure 5, a map showing the proportions of specific α-thalassaemia variants in Cambodia, Lao PDR, Myanmar and Vietnam.

'NA' indicates those variants that were not tested for in the survey. In total, 13 surveys included genetic diversity information for specific α-thalassaemia variants in these countries.

Source data for Figure 6, a map showing the proportions of specific α-thalassaemia variants in Indonesia, Malaysia and Singapore.

α+-thalassaemia most consistently constituted the highest proportion of α-thalassaemia, although there were some surveys in which αND-thalassaemia was the predominant form (e.g. in central Vietnam and in parts of Malaysia). In Figure 3, areas where the observed relative proportion of α0-thalassaemia was greatest include: Chiang Mai and Phayao provinces in north Thailand, Kalasin in northeast Thailand, Vientiane in Lao PDR, Kuala Lumpur and Selangor in Malaysia, Singapore and Jakarta in Indonesia. The α0-thalassaemia allele was absent in the survey from Malaysian Borneo as well as in central Vietnam and central Lao PDR. In certain areas, α0- and αND-thalassaemia together accounted for the majority of α-thalassaemia (e.g. ~75% in Kalasin in Thailand, ~60% in Kuala Lumpur and Jakarta and ~53% in Khon Kaen and Chachoengsao in Thailand and Vientiane in Lao PDR). Some of these areas also correspond to where the highest frequencies of these alleles are found, for example, northeast Thailand and the Thailand-Lao PDR border.

Among surveys that tested for specific α-thalassaemia variants, the most commonly tested variant throughout the region was --SEA (n = 44), followed by -α3.7 (n = 36) (Figures 4–6). Most of the studies in Thailand only tested for a subset of the mutations considered in this study; only one survey in Bangkok City tested for the whole suite. More than in other countries, surveys in Thailand tested specifically for α0- or αND-thalassaemia mutations (n = 7 and 9, respectively). Throughout the region, --SEA was the dominant α0-thalassaemia mutation, and in the majority of surveys -α3.7 was the dominant α+-thalassaemia mutation and Hb CS the dominant αND-thalassaemia mutation. The only exceptions were in Java in Indonesia, where -α3.7 and -α4.2 were found at similar frequencies, and in Kelantan in Malaysia, where Hb Adana was the only αND-thalassaemia mutation identified. The -(α)20.5 and --MED mutations were not detected in any of the surveys, whilst the --FIL mutation was found in 2 of the 16 surveys in which it was tested for and --THAI in 9 of the 31 surveys in which it was included. Consistent with Figure 3, α0-thalassaemia variants contributed minimally to α-thalassaemia mutations in Myanmar, Cambodia and Vietnam but were found at higher frequencies in surveys along the Thailand-Lao PDR border. In Vietnam, Malaysia, Indonesia and Singapore, the frequency of α0-thalassaemia varied considerably, with it being absent in some areas and a predominant form in others. This is also true for αND-thalassaemia.

Discussion

α-thalassaemia is a neglected public health problem whose burden has, to date, been largely overlooked, but for which morbidity is expected to increase in the coming decades as a result of the epidemiological transition, whereby acute infectious disease is replaced by chronic disease as the predominant cause of morbidity and mortality (Piel and Weatherall, 2014; Weatherall, 2011). Moreover, country reports (e.g. from Malaysia) indicate a shift in the age distribution of thalassaemia patients towards older ages (Ibrahim, 2012). As the burden increases, there will be greater demand for resources, including healthcare facilities and staff, genetic counselling and drugs, to treat and manage affected patients. This is particularly true for countries in Southeast Asia as well as the Mediterranean, where severe forms of α-thalassaemia (i.e. α0-thalassaemia) are found.

Comparison with existing maps and population estimates

The model-based maps for Thailand presented here are, to our knowledge, the first spatially continuous maps of the distribution of α-thalassaemia in any country. Our newborn estimates represent the first evidence-based estimates of specific forms of α-thalassaemia disease amongst newborns since 2003 (although the study in which they were reported was published in 2008) (Modell and Darlison, 2008) and the first estimates at sub-national level. Importantly, whilst there are currently no estimates of the number of stillbirths that will occur in Thailand in 2020, our estimate of the number of Hb Bart’s hydrops fetalis births represents more than 10% of the 3697 stillbirths estimated for 2015 (Blencowe et al., 2016). Comparisons between previous newborn estimates and those generated in this study using our updated database and 2003 demographic data revealed an almost two-fold difference for deletional HbH disease (2515 compared to 4694 in the present study). Reasons for such discrepancies most likely relate to: (i) differences in the inclusion criteria used in the generation of our map and therefore our calculation of newborn estimates, (ii) the quantity of survey data used, and (iii) the statistical methods employed. For instance, spatial specificity was not a consideration in the study by Modell and Darlison, who used a single allele frequency estimate extrapolated to the whole country. As such, the newborn calculations in the present study represent a methodological advance over previous efforts to assess the burden of α-thalassaemia. We related fine-scale allele frequency data to birth count data of equally high resolution, allowing location-specific estimates to be generated that could be aggregated to province level. Moreover, the use of model-based maps in our calculations enabled the measurement of uncertainty in our predictions. Finally, by including allele frequency data on αND-thalassaemia, we were able to estimate the burden of the more severe non-deletional HbH disease. Our newborn estimates for 2020 are considerably lower than those for 2003. This reduction is due to the lower number of births in Thailand in 2020 as a result of a decreasing birth rate and population size (World population prospects, 2017). It would be interesting to quantify how improvements in the prevention of thalassaemias will affect these estimates in the future.

Our descriptive maps represent the first detailed cartographic representations of α-thalassaemia allele frequency estimates in Southeast Asia, which take into account the specific geographical location of the surveys in which they were observed. Until now, available maps (e.g. Figure 1—figure supplement 3) provided only a crude overview of overall α-thalassaemia gene frequency, without any distinction between different α-thalassaemia forms, and extrapolated to the entire region, thereby masking sub-national, and even international, variation in allele frequencies (Piel and Weatherall, 2014).
Figure 1—figure supplement 3.

A map of our current knowledge of the global distribution, gene frequency and genetic diversity of α-thalassemia.

Only the most common variants for α+-thalassemia (-α3.7 and -α4.2) and α0-thalassemia (--MED and --SEA) are shown for each region. The variants that appear in parentheses are those for which the data used to make this map are limited.

The maps are broadly consistent with early narrative reviews of the gene frequency of α-thalassaemia in the region (Fucharoen and Winichagoon, 1987), showing a clear north-to-south trend of decreasing allele frequencies of α0- and α+-thalassaemia and a patchier distribution of αND-thalassaemia. However, our maps also demonstrate a severe lack of data on the allele frequency of α-thalassaemia across large parts of Southeast Asia, including in Myanmar, northern Lao PDR, northern Vietnam, Indonesia, Philippines and Brunei. This impedes our ability to assess the fine-scale burden of α-thalassaemia, making efficient public health planning for its control difficult, and limits our ability to track progress in the prevention and management of the disorder.

Patterns of genetic variation and their public health implications

The pattern of genetic diversity observed in this study indicates variable distributions of mild and severe α-thalassaemia forms. Reasons for this are unclear. However, high variant heterogeneity has been observed for other genetic disorders (e.g. G6PD deficiency) in Southeast Asia (Howes et al., 2013), which might suggest a similar underlying cause. In their global study, Howes et al. noted that G6PD variants were most diverse in East Asia and the West Pacific, where P. falciparum parasites show strong population structure with lower genetic relatedness between populations in the region. Indeed, P. falciparum has been shown to display genetically structured populations within Thailand alone (Pumpaibool et al., 2009). It is possible that the evolutionary dynamics between P. falciparum and haemoglobin variants, including α-thalassaemia, are more complex than we currently appreciate. The observed spatial distributions of the different α-thalassaemia forms and variants have important implications for the design of newborn screening programmes with regard to the preferred diagnostic algorithm and the allocation of treatment and management services. Areas with the highest proportions of co-occurring severe α-thalassaemia forms (i.e. α0-thalassaemia and αND-thalassaemia) may experience a higher prevalence of the severe non-deletional form of HbH disease. Furthermore, the predominance of Hb CS in surveys from Malaysia and Vietnam suggests that the health burden of α-thalassaemia in these areas may be greater than previously thought. Hb CS is a mutation at the termination codon of the α2-globin gene, which, in a normal individual, accounts for three-quarters of overall α-globin production (Liebhaber and Kan, 1981; Orkin et al., 1981). As a result, α2-globin gene mutations, such as Hb CS, tend to cause a more severe phenotype (Chui, 2005).

Model strengths and limitations

The reliability of the model-based maps is intrinsically linked to the quality, quantity and spatial coverage of the data upon which the models are based. We were unable to generate continuous maps for the whole of the Southeast Asian region as data were sparse in large areas. Whilst we are aware that unpublished surveys are likely to be available for most countries of the region, obtaining local data for all of the countries was beyond the scope of this study. Nevertheless, we have demonstrated that substantial additional survey data can be identified in locally published sources and, as a result, highlighted the enormous value of future collaborations to collate local data in other regions. For Thailand, limitations relating to data sparsity, uneven survey distributions and heterogeneity in allele frequency can be quantified in the presented uncertainty intervals. Areas where there is little data or where observed allele frequencies are highly heterogeneous within a small geographical area will have more uncertain predictions, whilst a large amount of data for which there is little heterogeneity will lead to more precise predictions. We identified a lack of data in the southern part of Thailand, which is reflected in larger uncertainty estimates. Other predictions with high associated uncertainty include those along the Thailand-Myanmar border, particularly the southern tail of Myanmar, where no α-thalassaemia prevalence surveys are found. This highlights the arbitrary nature of country borders in mapping studies. Spatial smoothing is an important component of most geostatistical models. For the modelling approach used here, the range function (i.e. the extent of spatial autocorrelation) is defined by a parameter within the SPDE framework and takes a prior distribution. The smoothing in the approximate posterior therefore balances over- and under-fitting and is necessary to ensure that the model predicts adequately without fitting the idiosyncrasies of the data. 
As a result, the model does not predict allele frequencies that fully reflect heterogeneity between nearby surveys. Although extensive variation in allele frequencies between different ethnic groups in similar geographic locations has been observed in Thailand (Kulaphisit et al., 2017) and other countries (e.g. Sri Lanka), this could not be reflected in our predicted allele frequencies. For example, allele frequencies of around 3.65% for the Hb CS mutation have been reported in the Khmer ethnic group in Surin and Buriram provinces, whilst our model predicts maximum allele frequencies of 1.65% here. This smoothing process can similarly explain why the highest observed αND-thalassaemia frequency of 7% in Khon Kaen was not reproduced in the predicted maps. In fact, our model-based predictions for αND-thalassaemia are remarkably homogenous and the average correlation between the observed and predicted frequencies is low (0.47). This is because the close-range heterogeneity in the observed data, coupled with the absence of a long-range trend in frequency (as is observed for α0- and α+-thalassaemia), makes it difficult for the model to discern a signal. It is likely that other factors influence the allele frequencies of the different α-thalassaemia forms, but have not been considered in this mapping study, including ethnicity, consanguinity, historic rates of malaria (both Plasmodium falciparum and P. vivax) (Douglas et al., 2012) and population migration patterns. Furthermore, there is bound to be uncertainty in the geolocation of some of the surveys included in the study due to the lack of details published or available. This uncertainty could not be accounted for. Finally, whilst the inclusion criterion of molecular methods should help to improve the reliability of allele frequency estimates, they are not 100% sensitive (Old and Henderson, 2010) and do not cover all possible α-thalassaemia mutations, which may lead to some error in the reported allele frequencies. 
Whilst we have calculated the burden of α-thalassaemia in terms of the number of newborns born with severe forms in 2020, there are other aspects of the disease burden that would be worth considering pending the availability of more data, for example, milder forms and their coinheritance with β-thalassaemia, DALY losses from α-thalassaemia, maternal complications (some of which can be life-threatening) (Chui, 2005; Ratanasiri et al., 2009), psychological effects and, in the case of HbH disease, survival data allowing the calculation of all-age population estimates. Furthermore, the estimates presented here do not include compound disorders, such as EA Bart’s and EF Bart’s diseases (HbH disease with heterozygous and homozygous forms of βE, another clinically important structural β-globin variant, respectively), and therefore remain underestimates of the overall burden of α-thalassaemia disorders in Thailand (Galanello, 2013). Finally, the visualisation of our burden estimates is subject to the modifiable areal unit problem, whereby the presentation of estimates at the province level likely masks pockets of high burden (Wong, 2009).

Future prospects and conclusions

The allele frequency, distribution and genetic variant profile of α-globin forms are only a part of their epidemiological complexity. An improved understanding of the natural history of α-thalassaemia and the factors that modify its clinical outcome will be imperative for establishing better estimates of its burden. This is particularly pertinent in the Southeast Asian region, where the disorder co-exists with β-thalassaemia, including the commonest haemoglobin variant, Hb E. Many studies have shown a positive epistatic interaction between α- and β-thalassaemia, whereby their co-inheritance results in the amelioration of the associated blood disorder (Fucharoen and Weatherall, 2012; Viprakasit et al., 2004). A detailed assessment of current knowledge on the allele frequency of α-thalassaemia and the magnitude of its health burden is needed to develop suitable prevention and control programmes. This study provides a detailed overview of the existing data on the gene frequency and genetic diversity of α-thalassaemia in Southeast Asia. We show that our knowledge of the allele frequency and distribution of this highly complex disease remains limited. Because of the remarkable geographic heterogeneities in the gene frequency of α-thalassaemia, interventions have to be tailored to the specific characteristics of the local population (e.g. prevalence of the disorder in the population, ethnic makeup and consanguinity) and the local health care system. As the epidemiological transition in these countries continues (Weatherall, 2011; Bundhamcharoen et al., 2011; Dhillon et al., 2012), it will become increasingly important to regularly update regional and national maps of α-thalassaemia gene frequency and newborn estimates such that health and demographic changes can be properly quantified (Piel and Weatherall, 2014). Our findings provide a baseline for such endeavours.

Materials and methods

Compiling a geodatabase of α-thalassaemia allele frequency and genetic diversity

A comprehensive search of three major online bibliographic databases (PubMed, ISI Web of Knowledge and Scopus) was performed to identify published surveys of α-thalassaemia prevalence and/or genetic diversity in Southeast Asia (Figure 7). The 10 member states of the Association of Southeast Asian Nations (ASEAN) were used to define the region under study: Brunei Darussalam, Cambodia, Indonesia, Lao PDR, Malaysia, Myanmar, Philippines, Singapore, Thailand and Vietnam (Figure 1—figure supplement 1). In addition, for Thailand, articles published in national journals (in Thai), which are not included in international bibliographic databases, were manually searched for local surveys. Consistent and pre-defined sets of inclusion criteria for prevalence/allele frequency data and genetic diversity data, outlined in Appendix 1, were used to identify relevant surveys. Data extracted from Thai journals were independently validated against the inclusion criteria by two of the authors (SE and CH).
Figure 7.

A schematic overview of the methodology used in this study and a breakdown of the data types analysed.

Pink diamonds indicate the database and input data; green boxes denote model processes and data visualisation steps; blue rods represent study outputs.

Modelling continuous maps of α-thalassaemia allele frequency in Thailand

We employed a Bayesian geostatistical framework to model the allele frequencies of α0-, α+- and αND-thalassaemia, respectively, in Thailand, where a substantially higher number of surveys were identified. We included data from Thailand and its neighbouring countries (Myanmar, Lao PDR, Cambodia and Malaysia) in order to preclude the possibility of a border effect. Three surveys that were reported only at the national level (one in Thailand and two in Malaysia) were excluded for this part of the analysis. Only geographical location was included as a predictor of α-thalassaemia allele frequency (Figure 7). For each of the three main forms of α-thalassaemia, a model was fitted using a Bayesian Stochastic Partial Differential Equation (SPDE) approach with Integrated Nested Laplace Approximation (INLA) algorithms developed by Rue et al. (Rue et al., 2009), available in an R-package (www.r-inla.org). The observed allele frequency data were transformed through an empirical logit to facilitate approximation by the Gaussian likelihood. The fitted model was then used to generate predictions at a resolution of 1 km x 1 km for α-thalassaemia allele frequencies for all unsampled locations in Thailand. Uncertainty estimates, measured as the 95% credible interval, for the predictions were calculated using 100 conditionally simulated realisations of the model to generate a posterior predictive distribution (PPD) for each 1 km x 1 km pixel. Full details of the modelling process and model validation procedures, which involved a 10-fold cross validation, are provided in Appendix 2.
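Two of the steps described above, the empirical logit transform and the summary of simulated realisations as a 95% credible interval, can be illustrated with a short sketch. This is not the authors' R-INLA implementation (which is documented in Appendix 2). The 0.5 continuity correction is a common form of the empirical logit and is assumed here, as are the function names, the plain inverse-logit back-transform and the simulated draws, which merely stand in for model realisations.

```python
import math
import random


def empirical_logit(n_alleles, n_chromosomes):
    """Empirical logit of an allele count; the 0.5 offsets keep it finite at 0 or n."""
    return math.log((n_alleles + 0.5) / (n_chromosomes - n_alleles + 0.5))


def inverse_logit(x):
    """Map a logit-scale value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def credible_interval(realisations, level=0.95):
    """Equal-tailed interval from simulated values, with ranks rounded outward."""
    ordered = sorted(realisations)
    tail = (1.0 - level) / 2.0
    lower_idx = int(math.floor(tail * (len(ordered) - 1)))
    upper_idx = int(math.ceil((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lower_idx], ordered[upper_idx]


# Hypothetical pixel: 100 simulated logit-scale values around one observed survey.
rng = random.Random(0)
centre = empirical_logit(n_alleles=30, n_chromosomes=1000)
draws = [inverse_logit(rng.gauss(centre, 0.2)) for _ in range(100)]
low, high = credible_interval(draws)
```

With only 100 realisations per pixel, the interval bounds are order statistics near the tails, so they are noisier than the central estimate.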

Refining estimates of the annual number of neonates affected by severe disease forms

To generate estimates of the annual number of newborns affected by Hb Bart’s hydrops fetalis syndrome (--/--) and deletional and non-deletional HbH disease (-α/-- and ααND/--, respectively) in Thailand in 2020, we paired the predicted allele frequency maps generated using our Bayesian geostatistical framework with high-resolution birth count data. First, we combined the three allele frequency maps to estimate the frequency of each genotype in each pixel, assuming Hardy-Weinberg proportions for a four-allele system (Equation 1 and Table 1) (Hardy, 1908; Weinberg, 1908):

$$(p + q + r + s)^2 = p^2 + q^2 + r^2 + s^2 + 2pq + 2pr + 2ps + 2qr + 2qs + 2rs = 1 \qquad (1)$$

where p is the allele frequency of α0-thalassaemia (--), q is the allele frequency of α+-thalassaemia (-α), r is the allele frequency of αND-thalassaemia (ααND) and s is the allele frequency of the wild-type α-globin haplotype (αα).
Table 1.

A breakdown of the genotypes for the three clinically important forms of α-thalassaemia – Hb Bart’s hydrops fetalis, deletional HbH disease and non-deletional HbH disease – and the Hardy-Weinberg equilibrium (HWE) proportions used for their calculation.

To compare our model output with previous newborn estimates for Hb Bart’s hydrops fetalis and deletional HbH disease, we paired our allele frequency maps with 2003 demographic and birth data and included a measure of consanguinity in our calculations.

| Genotype | Disorder | HWE proportions | Inclusion of population coefficient of consanguinity (F) |
|---|---|---|---|
| --/-- | Hb Bart’s hydrops fetalis | p² | p² + Fp(1 − p) |
| -α/-- | Deletional HbH disease | 2pq | 2pq(1 − F) |
| ααND/-- | Non-deletional HbH disease | 2pr | 2pr(1 − F) |
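The proportions in Table 1 translate directly into code. The sketch below is illustrative only: the function name, the dictionary keys and the example allele frequencies are hypothetical, and setting F to zero recovers the plain Hardy-Weinberg proportions used in the main analysis.

```python
def severe_genotype_frequencies(p, q, r, F=0.0):
    """Expected frequencies of the three severe genotypes listed in Table 1.

    p, q, r: allele frequencies of alpha0- (--), alpha+- (-a) and
    alphaND-thalassaemia (aaND); the wild-type frequency is s = 1 - p - q - r.
    F: population coefficient of consanguinity (0 gives plain HWE proportions).
    """
    s = 1.0 - p - q - r
    if min(p, q, r) < 0.0 or s < -1e-12:
        raise ValueError("allele frequencies must be non-negative and sum to at most 1")
    return {
        "hb_barts_hydrops": p * p + F * p * (1.0 - p),  # --/--
        "deletional_hbh": 2.0 * p * q * (1.0 - F),      # -a/--
        "non_deletional_hbh": 2.0 * p * r * (1.0 - F),  # aaND/--
    }


# Hypothetical pixel (illustrative allele frequencies, no consanguinity).
freqs = severe_genotype_frequencies(p=0.03, q=0.10, r=0.02)
```

Consanguinity raises the expected frequency of the homozygous genotype (--/--) and lowers that of the two compound heterozygous genotypes, which is why the adjustment moves the Hb Bart’s and HbH estimates in opposite directions.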
To calculate birth counts, the 2015–2020 crude birth rate for Thailand was downloaded from the 2017 United Nations (UN) world population prospects (World population prospects, 2017) and multiplied by a high-resolution predicted 2020 population surface, adjusted to UN population estimates, obtained from the WorldPop project (www.worldpop.org.uk, last accessed 23 January 2018) (Tatem, 2017). The predicted genotype frequencies were then paired with the birth count data over 100 conditionally simulated realisations of the geostatistical model, and areal estimates at province level were calculated, together with 95% credible intervals; their calculation is described in Appendix 3. We also applied our maps to 2003 demographic data and incorporated consanguinity into our calculations (Table 1) (Vieira et al., 2013), in order to compare estimates generated using our method more directly with previous estimates (Modell and Darlison, 2008). We used the online global database of consanguinity estimates (www.consang.net) to identify an upper limit for the coefficient of consanguinity for Thailand (F = 0.1) (Bittles and Black, 2015). However, due to important variations of this coefficient between ethnic groups and the lack of reliable or high-resolution data for consanguinity, we chose not to include it in our main calculations.
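The pairing of genotype frequencies with birth counts, and the aggregation to province level, might look as follows in outline. The actual procedure is described in Appendix 3. The function names, the use of the median as the point estimate, the interval rule and the toy numbers are all assumptions made for illustration.

```python
import math
import statistics


def births_per_pixel(population, crude_birth_rate_per_1000):
    """Annual births in a pixel: population times the crude birth rate (per 1,000 people)."""
    return population * crude_birth_rate_per_1000 / 1000.0


def province_estimate(freq_realisations, pixel_births, level=0.95):
    """Summarise affected newborns for one province.

    freq_realisations: one list of per-pixel genotype frequencies per model realisation.
    pixel_births: annual births in the same pixels, in the same order.
    Returns (median, lower, upper) of the province totals across realisations.
    """
    totals = sorted(
        sum(f * b for f, b in zip(freqs, pixel_births))
        for freqs in freq_realisations
    )
    tail = (1.0 - level) / 2.0
    lower = totals[int(math.floor(tail * (len(totals) - 1)))]
    upper = totals[int(math.ceil((1.0 - tail) * (len(totals) - 1)))]
    return statistics.median(totals), lower, upper


# Hypothetical province of two pixels and three realisations (the study used 100).
births = [births_per_pixel(10_000, 10.0), births_per_pixel(20_000, 10.0)]
realisations = [[0.0010, 0.0020], [0.0012, 0.0018], [0.0008, 0.0022]]
median, low, high = province_estimate(realisations, births)
```

Summing within each realisation before taking percentiles keeps the spatial correlation between pixels in the province-level interval, which summing per-pixel interval bounds would not.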

Summarising the current evidence-base for α-thalassaemia gene frequency in Southeast Asia

Cartographic representations of the identified prevalence surveys were generated using ArcGIS 10.4.1 (ESRI Inc, Redlands, CA, USA). The descriptive maps reflect the spatial distribution of the prevalence surveys, along with their respective sample sizes and observed α0-, α+- and αND-thalassaemia allele frequencies. Other features of the database, including the temporal distribution of the surveys, the identity of the populations studied (e.g. community, pregnant women, newborns, etc.) and the contribution of local Thai surveys to the evidence-base, were also examined.

Mapping α-thalassaemia genetic diversity

Maps of the genetic diversity of α-thalassaemia across Southeast Asia were also generated (Figure 6). Given the heterogeneity in the reporting of different α-thalassaemia genotypes, we divided the genetic diversity data into two subtypes: (i) those surveys that only distinguished between the different α-thalassaemia forms (α0-, α+-, and αND-thalassaemia), and (ii) those surveys that contained detailed count data for a range of common mutations. We focused on the 11 mutations that are most commonly reported in Southeast Asia or are part of standard multiplex polymerase chain reaction (PCR) methods: -α3.7, -α4.2, --SEA, --THAI, --MED, --FIL, -(α)20.5, Hb Adana (HBA2:c.179G > A), Hb CS (HBA2:c.427T > C), Hb Paksé (HBA2:c.429A > T) and Hb Quong Sze (HBA2:c.377T > C) (Liu et al., 2000). An ‘Other’ category was used for other, rarer α-thalassaemia mutations. For the first data subtype, only surveys that tested for all three α-thalassaemia forms were mapped and the relative proportions of the different forms in the study sample were displayed using pie charts. For the second, we followed the approach of Howes et al. (2013): the variant proportions were displayed using bar charts in which all variants that were explicitly tested for were included on the x-axis. This allowed information regarding the suite of variants that were tested for in the survey to be displayed, as well as an unambiguous representation of the absence of a variant in the study sample.

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses. [Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed.]
Thank you for submitting your article "Estimating the burden of α-thalassaemia in Thailand using a comprehensive prevalence database for Southeast Asia" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Eduardo Franco as the Senior Editor. The two reviewers have opted to remain anonymous. The Reviewing Editor has highlighted the concerns that require revision and/or responses, and we have included the separate reviews below for your consideration. If you have any questions, please do not hesitate to contact us.

Summary: This is important, novel work. The analyses show a visualized map of α-thalassemia prevalence and genotypic diversity, which can be of a large benefit for future prevention policy in Thailand. A major strength of this work is an extensive review of relevant prevalence and genotype reports. Overall, this represents a valuable and much-needed contribution to the literature about this neglected genetic disorder.

Major concerns:

The model's smoothing effect appears to be quite strong, avoiding the more extreme points such as the 7% frequency of ND-thalassaemia quoted in paragraph four of subsection “Overall distribution of α-thalassaemia across Southeast Asia”. This ND-thalassaemia model also had relatively low correlation statistics. The authors need to discuss this further, and why the range in the predicted mean map is so homogenous for the ND-thalassaemias?

Figures 4-6: why did you select to map proportions and not frequencies? Frequencies would seem a more informative metric. These maps show considerable variation between surveys very close together geographically (e.g. Vietnam and peninsular Malaysia), which would merit some mention in the Discussion.

The authors need to clarify if this model underestimates the burden of α-thalassemia diseases in Thailand due to the overlapping prevalence of α- and β-thalassemia.

The authors should clarify why Bangkok is predicted to have the highest burden of both hydrops fetalis and HbH disease (deletional and non-deletional), despite not having the highest allele frequency of α0-thalassemia (Figure 1).

The influence of consanguinity is interesting. Authors ran their predictions for 2003 both with and without an adjustment coefficient, finding a relatively small difference. Can the authors explain their decision to not include this adjustment factor in their main predictions for 2020?

Separate reviews (please respond to each point):

Reviewer #1:

The highlight of this work was, for the first time, employment of geostatistical model to estimate burden of severe form of α-thalassemia in Thailand, and to the lesser extent in other Southeast Asian countries, in 2020. The analyses also show visualized map of α-thalassemia prevalence and genotypic diversity, which can be of a large benefit for future prevention policy in Thailand. The strength of this work is an extensive review of relevant prevalence and genotype reports, not only from those published in international database, but also from local journals. Criteria for selection of unbiased surveys was clearly stated in the supplementary method 1.

The limitation of this work was that these maps represent a prediction or estimation of prevalence. Therefore accuracy of the estimation depends largely upon availability of published data from individual area of the region. The authors, however, clearly stated this limitation in the discussion. Overlapping prevalence of α- and β-thalassemia are also commonly observed in Thailand. This includes e.g., AEBart's disease or AEBart's with CS, clinical phenotype of which can be more severe compare to those with deletional or non-deletional HbH alone. Therefore this model remain underestimates burden of α-thalassemia diseases in Thailand in this context. Future work on model of β-thalassemia prevalence will be very helpful.

Other specific comment: In the Result subsection “Estimates of number of affected newborns in Thailand”: The authors should clarify why Bangkok is predicted to have highest burden of both hydrops fetalis and HbH disease (deletional and non-deletional), despite not having the highest allele frequency of α0-thalassemia (Figure 1).

Reviewer #2:

Hockham and colleagues present an impressive description of the available data about the burden of α-thalassaemia in Thailand and the broader ASEAN region. They make a strong case for the need for this analysis, and then detail a very thorough description of the available information, together with modelled surfaces of α-thalassaemia allele frequencies in Thailand. The scarcity of evidence outside Thailand constrained the modelling analysis to this country alone. From these maps, estimates of newborns affected by different forms of α-thalassaemia in Thailand are projected for 2020. Overall, this presentation of the evidence, and extension thereof through the modelling analysis, represents a valuable and much-needed contribution to the literature about this neglected genetic disorder – congratulations to the authors on this commendable undertaking and their clear distillation of a highly complex disorder.

The Introduction also quotes frequencies of close to 80% in Southeast Asia (citing reference 1), while values of this magnitude are not discussed in the section "Overall distribution of α-thalassaemia across Southeast Asia". If the survey reporting 80% did not meet the inclusion criteria for the present study, perhaps take out from the Introduction as could be misleading and confusing.

Supplementary methods 1, Survey inclusion criteria: the maps risk being misleading if ethnic groups indigenous to an area but not representative are included, given they may today be a minority group. Were any such surveys in fact included in the mapping dataset?

Figures 1 and 2 need to be swapped around (data points first, then the continuous maps).

Does the Ministry of Health reporting system include any statistics on α-thalassaemia disorders that could be compared to? Or are the necessary diagnostics not sufficiently accessible for figures to be helpful?

Minor Comments:

Introduction paragraph three: brief reference to the variation in the ND phenotypes would be helpful. E.g. "non-deletional variants are variable and typically associated with…"

Reference 22: Author name is "Darlison", not "Darlinson".

Reference 25 (Introduction paragraph five): this reference contains no geostatistical modelling. Suggest citing this instead: https://www.ncbi.nlm.nih.gov/pubmed/23152723

Results section paragraph two: was a minimum sample size imposed? (maybe c/ref to the supplementary files here)

Subsection “Estimates of number of affected newborns in Thailand”: what limits were used to define the spatial extent of Bangkok?

Subsection “Overall distribution of α-thalassaemia across Southeast Asia”: Supplementary file 2 doesn't currently include this data.

Subsection “Genetic diversity of α-thalassaemia across Southeast Asia”: "only distinguished between them" is not very clear; could change to something like "and maps relative proportions of each allele type without giving specific variant details".

Discussion, line three: suggest some additional detail in relation to the "epidemiological transition" – such as adding "from infectious to non-communicable causes of disease and death".

Discussion, paragraph one, last sentence: "high prevalence" is difficult to see in the maps in Figure 2. Perhaps adjust the wording.

Discussion, paragraph two, final sentence: are there no national statistics on the numbers of stillbirths?

Discussion, subsection “Model strengths and limitations”, third paragraph: malaria is listed as a possible influence of allele frequencies – should this be "historic" rates of malaria?
Given that the diagnostics used here are molecular, this should not have any haematological impact of malaria infection on the diagnostic outcomes.

Subsection “Refining estimates of the annual number of neonates affected by severe disease forms”: state the value of the consanguinity coefficient, and list its source. A reference to explain how F is applied in the Hardy-Weinberg equations would be useful – e.g.: https://www.ncbi.nlm.nih.gov/pubmed/23950147

Figure 2: the specific locations of the points in panel A are hard to see, but would be helpful in exploring the modelled surfaces. Perhaps include a simpler map just showing the survey locations in the Supplementary information for a comparison with the modelled outputs. Or try scaling the data to resize the points?

Figure 2: would also include the star meaning in the text legend.

Figure 3: Legend needs to include some reference sample size pie charts; the pie charts are also quite difficult to distinguish by size, possible to widen the size range?

Figure 7: the model input indicates that "transformed allele frequency" was input into the model – what transformation was applied and why?

Major concerns:

The model's smoothing effect appears to be quite strong, avoiding the more extreme points such as the 7% frequency of ND-thalassaemia quoted in paragraph four of subsection “Overall distribution of α-thalassaemia across Southeast Asia”.

We have carefully rechecked the model’s smoothing effect in light of this comment. The smoothing effect reflects the quantity, size and heterogeneity of surveys included in our study. To expand on this, we have provided more details in paragraph three in the Model strengths and limitations section in the Discussion as follows:

“Spatial smoothing is an important component of our geostatistical model. For the modelling approach used here, the range function (i.e.
the extent of spatial autocorrelation) is defined by a parameter within the SPDE framework and takes a prior distribution. The smoothing in the approximate posterior therefore balances over- and under-fitting and is necessary to ensure that the model predicts adequately without fitting the idiosyncrasies of the data. As a result, the model does not predict allele frequencies that fully reflect heterogeneity between nearby surveys. Although extensive variation in allele frequencies between different ethnic groups in similar geographic locations has been observed in Thailand (36) and other countries (e.g. Sri Lanka), this could not be reflected in our predicted allele frequencies. For example, allele frequencies of around 3.65% for the Hb CS mutation have been reported in the Khmer ethnic group in Surin and Buriram provinces, whilst our model predicts maximum allele frequencies of 1.65% here. This smoothing process can similarly explain why the highest observed αND-thalassaemia frequency of 7% in Khon Kaen was not reproduced in the predicted maps; the smoothing process is also taking into account the presence of multiple nearby surveys of larger sample size reporting lower frequencies, thereby masking the extreme value of 7%.” This ND-thalassaemia model also had relatively low correlation statistics. The authors need to discuss this further, and why the range in the predicted mean map is so homogenous for the ND-thalassaemias? We spent a substantial amount of time exploring alternative models for the ND-thalassaemia map, for example Gaussian models that used a higher number of nodes in the triangular mesh as well as models that used a betabinomial distribution for the observed data. However, the original model presented in the manuscript remained the best-performing model. 
We have added a few sentences to the end of the above Discussion paragraph to comment on the homogeneity in the predictions and the low correlation statistic as follows: “In fact, our model-based predictions for αND-thalassaemia are remarkably homogenous and the average correlation between the observed and predicted frequencies is low (0.47). This is because the close-range heterogeneity in the observed data, coupled with the absence of a long-range trend in frequency (as is observed for α0- and α+-thalassaemia), makes it difficult for the model to discern a signal.” Figures 4-6: why did you select to map proportions and not frequencies? Frequencies would seem a more informative metric. We have mapped both the proportions (Figure 3) and the frequencies (Figures 4–6), and corrected the legends accordingly. We have also amended the main text to reduce any ambiguity as follows: “Figure 3 (Figure 3—source data 1) displays surveys that included all three α-thalassaemia forms (α0-, α+- and αND-thalassaemia), allowing relative proportions of each of the forms to be calculated without giving specific variant details.” These maps show considerable variation between surveys very close together geographically (e.g. Vietnam and peninsular Malaysia), which would merit some mention in the Discussion. We fully agree and we have added the following paragraph under the section Patterns of genetic variation and their public health implications in the Discussion: “The pattern of genetic diversity observed in this study indicates variable distributions of mild and severe α-thalassaemia forms. Reasons for this are unclear. However, high variant heterogeneity has been observed for other genetic disorders (e.g. G6PD deficiency) in Southeast Asia, (34) which might suggest a similar underlying cause. In their global study, Howes et al. noted that G6PD variants were most diverse in East Asia and the West Pacific, where P. 
falciparum parasites show strong population structure with lower genetic relatedness between populations in the region. Indeed, P. falciparum has been shown to display genetically structured populations within Thailand alone. (35) It is possible that the evolutionary dynamics between P. falciparum and haemoglobin variants, including α-thalassaemia, are more complex than we currently appreciate.”

The authors need to clarify if this model underestimates the burden of α-thalassemia diseases in Thailand due to the overlapping prevalence of α- and β-thalassemia.

Although this was briefly mentioned in the Discussion under Model strengths and limitations, we have amended the sentence to clarify, as suggested:

“Furthermore, the estimates presented here do not include compound disorders, such as EA Bart’s and EF Bart’s diseases (HbH disease with heterozygous and homozygous forms of βE, another clinically important structural β-globin variant, respectively), and therefore remain underestimates of the overall burden of α-thalassaemia disorders in Thailand.”

The authors should clarify why Bangkok is predicted to have the highest burden of both hydrops fetalis and HbH disease (deletional and non-deletional), despite not having the highest allele frequency of α0-thalassemia (Figure 1).

To clarify this point, we have amended the relevant sentence in the Results section as follows:

“The highest absolute burden of hydrops fetalis is predicted in Bangkok city (57 [CI: 13 – 151]), with its high population density, and Udon Thani (23 [CI: 6 – 66]) in the northeastern zone, where some of the highest α0-thalassaemia allele frequencies are predicted.”

The influence of consanguinity is interesting. Authors ran their predictions for 2003 both with and without an adjustment coefficient, finding a relatively small difference. Can the authors explain their decision to not include this adjustment factor in their main predictions for 2020?
This is a recurring challenge associated with consanguinity data. We have added the following clarification to the Materials and methods section: “We used the online global database of consanguinity estimates (www.consang.net) to identify an upper limit for the coefficient of consanguinity for Thailand (F = 0.01) (52). However, due to important variations of this coefficient between ethnic groups and the lack of reliable or high-resolution data for consanguinity, we chose not to include it in our main calculations.” Separate reviews (please respond to each point): Reviewer #1: […] The strength of this work is an extensive review of relevant prevalence and genotype reports, not only from those published in international database, but also from local journals. Criteria for selection of unbiased surveys was clearly stated in the supplementary method 1. The limitation of this work was that these maps represent a prediction or estimation of prevalence. Therefore accuracy of the estimation depends largely upon availability of published data from individual area of the region. The authors, however, clearly stated this limitation in the discussion. Overlapping prevalence of α- and β-thalassemia are also commonly observed in Thailand. This includes e.g., AEBart's disease or AEBart's with CS, clinical phenotype of which can be more severe compare to those with deletional or non-deletional HbH alone. Therefore this model remain underestimates burden of α-thalassemia diseases in Thailand in this context. Future work on model of β-thalassemia prevalence will be very helpful. Other specific comment: In the Result subsection “Estimates of number of affected newborns in Thailand”: The authors should clarify why Bangkok is predicted to have highest burden of both hydrops fetalis and HbH disease (deletional and non-deletional), despite not having the highest allele frequency of α0-thalassemia (Figure 1). 
As described above, we have amended the relevant sentence in the Results section as follows: “The highest absolute burden of hydrops fetalis is predicted in Bangkok city (57 [CI: 13 – 151]), with its high population density, and Udon Thani (23 [CI: 6 – 66]) in the northeastern zone, where some of the highest α0-thalassaemia allele frequencies are predicted.”

Reviewer #2:

Hockham and colleagues present an impressive description of the available data about the burden of α-thalassaemia in Thailand and the broader ASEAN region. They make a strong case for the need for this analysis, and then detail a very thorough description of the available information, together with modelled surfaces of α-thalassaemia allele frequencies in Thailand. The scarcity of evidence outside Thailand constrained the modelling analysis to this country alone. From these maps, estimates of newborns affected by different forms of α-thalassaemia in Thailand are projected for 2020. Overall, this presentation of the evidence, and extension thereof through the modelling analysis, represents a valuable and much-needed contribution to the literature about this neglected genetic disorder – congratulations to the authors on this commendable undertaking and their clear distillation of a highly complex disorder.

The Introduction also quotes frequencies of close to 80% in Southeast Asia (citing reference 1), while values of this magnitude are not discussed in the section "Overall distribution of α-thalassaemia across Southeast Asia". If the survey reporting 80% did not meet the inclusion criteria for the present study, perhaps take out from the Introduction as could be misleading and confusing.

Thank you very much for highlighting this. This was an error – in fact reference 1 indicates that these very high frequencies have been reported in India and Papua New Guinea, not in Southeast Asia.
We have corrected this in the Introduction as follows: “It is estimated that up to 5% of the world’s population carries at least one α-thalassaemia variant, with some populations (e.g. in India and Papua New Guinea) reporting gene frequencies of close to 80% (1)”.

Supplementary methods 1, Survey inclusion criteria: the maps risk being misleading if ethnic groups indigenous to an area but not representative are included, given they may today be a minority group. Were any such surveys in fact included in the mapping dataset?

We agree that this sentence is misleading and have changed it as follows: “Moreover, surveys that were carried out in specific ethnic groups were only included if the population being investigated was representative of the study area. Representativeness was determined based on information provided in the reference.”

Figures 1 and 2 need to be swapped around (data points first, then the continuous maps).

Thank you for pointing this out. We have swapped Figures 1 and 2 as suggested.

Does the Ministry of Health reporting system include any statistics on α-thalassaemia disorders that could be compared to? Or are the necessary diagnostics not sufficiently accessible for figures to be helpful?

Our Thai collaborators confirmed that statistics on severe α-thalassaemia cases born each year are not available for Thailand. Commonly reported figures are based on an old study from 1988 (Fucharoen and Winichagoon, 1988). Prenatal screening data from the Ministry of Health are neither publicly accessible nor usable in their current format, as they do not distinguish between severe forms of α-thalassaemia and β-thalassaemia.

Minor Comments:

Introduction paragraph three: brief reference to the variation in the ND phenotypes would be helpful. E.g. "non-deletional variants are variable and typically associated with…"

We have added the following sentence to the Introduction: “However, even amongst non-deletional variants, considerable phenotypic variability is observed (13).” We have also added a reference: Variable clinical phenotypes of α-thalassemia syndromes (Singer, 2009).

Reference 22: Author name is "Darlison", not "Darlinson"

Thank you for pointing this out. We have rectified the mistake.

Reference 25 (Introduction paragraph five): this reference contains no geostatistical modelling. Suggest citing this instead: https://www.ncbi.nlm.nih.gov/pubmed/23152723

Thank you for pointing this out. We have amended the reference as per your suggestion.

Results section paragraph two: was a minimum sample size imposed? (maybe c/ref to the supplementary files here)

We have referred the readers to the Supplementary Results section (now Appendix 5) for further details by adding the following statement in the main text: “Further details on the final database are provided in Appendix 5.”

Subsection “Estimates of number of affected newborns in Thailand”: what limits were used to define the spatial extent of Bangkok?

When we refer to Bangkok, we are referring specifically to Bangkok city and not Bangkok metropolis, which includes some of the neighbouring provinces. We have changed all instances of “Bangkok” to “Bangkok city”. We have also included an extra supplement to Figure 2 (Figure 2—figure supplement 2) that provides a reference map of Thailand provinces and referred to the map in relevant places in the main text.

Subsection “Overall distribution of α-thalassaemia across Southeast Asia”: Supplementary file 2 doesn't currently include this data.

We have added all of the countries to the table.
We have also amended the sentence to make it clearer that we are referring to the major α-thalassaemia forms and not specific variants: “Appendix 5—table 2 shows the range of allele frequencies observed for the different α-thalassaemia forms (α0-, α+- and αND-thalassaemia) in each country.” Subsection “Genetic diversity of α-thalassaemia across Southeast Asia”: "only distinguished between them" is not very clear; could change to something like "and maps relative proportions of each allele type without giving specific variant details". We have amended this section as follows to clarify: “Figure 3 (Figure 3—source data 1) displays surveys that included all three α-thalassaemia forms (α0-, α+- and αND-thalassaemia), allowing relative proportions of each of the forms to be calculated without giving specific variant details. Figures 4–6 display surveys that provided information on the frequencies of specific α-thalassaemia variants (e.g. --SEA, -α3.7, etc.). For these, the variants that were tested for differ between surveys. Some surveys are included in both Figure 3 and Figures 4–6. For the latter figures, the region has been divided to improve visualisation of the data.” Discussion, line three: suggest some additional detail in relation to the "epidemiological transition" – such as adding "from infectious to non-communicable causes of disease and death". We agree that this may be helpful to some readers. We have amended the sentence as follows: “α-thalassaemia is a neglected public health problem whose burden has, to date, been largely overlooked, but for which morbidity is expected to increase in the coming decades as a result of the epidemiological transition, whereby acute infectious disease is replaced by chronic disease as the predominant cause of morbidity and mortality.” Discussion, paragraph one, last sentence: "high prevalence" is difficult to see in the maps in Figure 2. Perhaps adjust the wording. We have adjusted the wording accordingly. 
The sentence now reads: “This is particularly true for countries in Southeast Asia as well as the Mediterranean area, where severe forms of α-thalassaemia (i.e. α0-thalassaemia) are found.”

Discussion, paragraph two, final sentence: are there no national statistics on the numbers of stillbirths?

We have double-checked with our Thai collaborators and co-authors. The only reference that we could find is the one cited.

Discussion, subsection “Model strengths and limitations”, third paragraph: malaria is listed as a possible influence of allele frequencies – should this be "historic" rates of malaria? Given that the diagnostics used here are molecular, this should not have any haematological impact of malaria infection on the diagnostic outcomes.

We agree with this comment. We have changed the sentence to reflect this: “It is likely that other factors influence the allele frequencies of the different α-thalassaemia forms, but have not been considered in this mapping study, including ethnicity, consanguinity, historic rates of malaria (both Plasmodium falciparum and P. vivax) (37) and population migration patterns.”

Subsection “Refining estimates of the annual number of neonates affected by severe disease forms”: state the value of the consanguinity coefficient, and list its source. A reference to explain how F is applied in the Hardy-Weinberg equations would be useful – e.g.: https://www.ncbi.nlm.nih.gov/pubmed/23950147

As suggested, we have added the value of the consanguinity coefficient used and a reference in the Materials and methods section. The revised text is as follows: “We used the online global database of consanguinity estimates (www.consang.net) to identify an upper limit for the coefficient of consanguinity for Thailand (F = 0.01) (53).
However, due to the lack of reliable or high-resolution data for consanguinity, we chose not to include it in our main calculations.” Figure 2: the specific locations of the points in panel A are hard to see, but would be helpful in exploring the modelled surfaces. Perhaps include a simpler map just showing the survey locations in the Supplementary information for a comparison with the modelled outputs. Or try scaling the data to resize the points? We have included these maps in Figure 2—figure supplement 1 and referred to them in the appropriate section of the Results (subsection “Continuous allele frequency maps for Thailand” and in the legend for Figure 2). Figure 2: would also include the star meaning in the text legend. We have added the following sentence to the legend of Figure 2: “Surveys that could only be mapped at the national level are indicated by a black star.” Figure 3: Legend needs to include some reference sample size pie charts; the pie charts are also quite difficult to distinguish by size, possible to widen the size range? We have added reference sample pie charts to the legend and increased the range size. We have also removed the log scale. Figure 7: the model input indicates that "transformed allele frequency" was input into the model – what transformation was applied and why? The transformation was only mentioned in the Supplementary Information (now Appendix 2). We have added the following to the Materials and methods section: “The observed allele frequency data were transformed through an empirical logit to facilitate approximation by the Gaussian likelihood.”
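Two of the calculations mentioned in the responses above, the empirical logit transform and the adjustment of Hardy-Weinberg homozygote frequencies by a consanguinity coefficient F, can be sketched as follows. This is an illustrative sketch only and not the authors' code: the function names, the 0.5 continuity correction in the empirical logit, the example allele frequency and the number of births are all assumptions; only F = 0.01 is taken from the text.

```python
import math


def empirical_logit(allele_count, alleles_sampled):
    """Empirical logit of an observed allele frequency.

    Adds 0.5 to both counts (an assumed, commonly used correction) so the
    transform stays finite when the count is 0 or equals the total.
    """
    return math.log((allele_count + 0.5) / (alleles_sampled - allele_count + 0.5))


def inverse_logit(x):
    """Map a value on the logit scale back to a frequency in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def homozygote_frequency(q, F=0.0):
    """Expected homozygote frequency for an allele with frequency q.

    F = 0 gives the plain Hardy-Weinberg value q**2; F > 0 adds the
    standard inbreeding term F * q * (1 - q).
    """
    return q * q + F * q * (1.0 - q)


# Hypothetical inputs, for illustration only (not values from the study).
q_alpha0 = 0.03   # assumed alpha0-thalassaemia allele frequency
births = 100_000  # assumed number of births in some area

without_f = births * homozygote_frequency(q_alpha0)
with_f = births * homozygote_frequency(q_alpha0, F=0.01)  # F as quoted in the text

print(f"expected homozygous newborns, F = 0:    {without_f:.1f}")
print(f"expected homozygous newborns, F = 0.01: {with_f:.1f}")
print(f"empirical logit of 0 in 200 alleles:    {empirical_logit(0, 200):.3f}")
```

Because the added term scales with q rather than q squared, the relative effect of F in this formula is larger for rarer alleles, so the size of the adjustment depends on the allele frequencies it is applied to.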
Appendix 5—table 1.

Summary of the α-thalassaemia dataset characteristics according to the type of data provided (allele frequency data or genetic variant data), and overall.

Numbers correspond to individual surveys that met the study inclusion criteria. As some sources reported more than one survey from multiple locations or in multiple population groups, the number of surveys is greater than the number of references in Supplementary files 1 and 2. Some surveys reported data on both α-thalassaemia prevalence and genetic diversity and are therefore included twice in these columns, but once in the overall column.

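| | Allele frequency data | Genetic diversity data | Overall |
|---|---|---|---|
| Total surveys | 104 | 60 | 106 |
| Number of countries | 8 | 8 | 8 |
| Publication time | | | |
| 1959–1969 | 0 | 0 | 0 |
| 1970–1979 | 0 | 0 | 0 |
| 1980–1989 | 2 | 2 | 2 |
| 1990–1999 | 8 | 5 | 8 |
| 2000–2009 | 46 | 15 | 47 |
| 2010–2017 | 47 | 38 | 48 |
| N/A | 1 | 0 | 1 |
| Spatial extent | | | |
| Admin 0 centroids | 4 | 4 | 4 |
| Admin 1 centroids | 27 | 12 | 27 |
| Admin 2 centroids | 10 | 8 | 9 |
| Admin 3 centroids | 3 | 3 | 3 |
| Points | 46 | 22 | 47 |
| Multiple centroids/points | 14 | 11 | 16 |
| Total individuals sampled | 132,157 | 32,237 | 133,649 |
| Survey count by sample size | | | |
| ≤50 | 3 | 2 | 4 |
| 51–250 | 22 | 18 | 22 |
| 251–500 | 38 | 23 | 38 |
| 501–750 | 21 | 10 | 21 |
| 751–1000 | 3 | 1 | 3 |
| >1000 | 17 | 6 | 18 |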
