
Admixture mapping analysis reveals differential genetic ancestry associated with Chagas disease susceptibility in the Colombian population.

Desiré Casares-Marfil¹, Beatriz Guillen-Guio², Jose M Lorenzo-Salazar³, Héctor Rodríguez-Pérez², Martin Kerick¹, Mayra A Jaimes-Campos⁴, Martha L Díaz⁴, Elkyn Estupiñán^1,4, Luis E Echeverría⁵, Clara I González⁴, Javier Martín¹, Carlos Flores^2,3,6, Marialbert Acosta-Herrera¹.

Abstract

Chagas disease is an infection caused by the parasite Trypanosoma cruzi, endemic in Latin America. Leveraging the three-way admixture between Native American (AMR), European (EUR) and African (AFR) populations in Latin Americans, we aimed to better understand the genetic basis of Chagas disease by performing an admixture mapping study in a Colombian population. A two-stage study was conducted, and subjects were classified as seropositive or seronegative for T. cruzi. In stage 1, global and local ancestries were estimated using reference data from the 1000 Genomes Project (1KGP), and local ancestry associations were tested with logistic regression models. The AMR ancestry showed a protective association with Chagas disease within the major histocompatibility complex region [Odds ratio (OR) = 0.74, 95% confidence interval (CI) = 0.66-0.83, lowest P-value = 4.53 × 10⁻⁸]. The fine mapping assessment on imputed genotypes, combining data from stage 1 with stage 2 data from an independent Colombian cohort, revealed nominally associated variants in high linkage disequilibrium with the top signal (rs2032134, OR = 0.93, 95% CI = 0.90-0.97, P-value = 3.54 × 10⁻⁴) in the previously associated locus. To assess ancestry-specific adaptive signals, a selective sweep scan in an AMR reference population from 1KGP together with an in silico functional analysis highlighted the Tripartite Motif family and the human leukocyte antigen genes, which have a crucial role in the immune response against pathogens. Furthermore, these analyses emphasized macrophages, neutrophils and eosinophils as key players in the defense against T. cruzi. This first admixture mapping study in Chagas disease provided novel insights into the host immune response in the pathogenesis of this neglected disease.

Year: 2021 PMID: 34302177 PMCID: PMC8643504 DOI: 10.1093/hmg/ddab213

Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150

Introduction

Chagas disease (ICD-10-CM B57) is caused by the protozoan parasite Trypanosoma cruzi and is endemic in Latin American countries (1). Because of migratory movements, this infection affects ~7 million people in endemic and non-endemic areas (www.who.int/health-topics/chagas-disease#tab=tab_1). The main transmission vectors are members of the Reduviidae family, although other transmission mechanisms such as oral and congenital have been described (2). The infection starts with an acute phase characterized by inflammatory and nonspecific clinical symptoms, followed by an asymptomatic or a chronic phase, the latter with cardiac or digestive involvement (2). The critical role played by the host’s innate and adaptive immune responses during the acute phase is well known (2,3). The differential susceptibility to the infection, together with the high exposure to the parasite in endemic areas, suggests that the host genetic component is involved in the pathogenesis of the disease (2,4). To elucidate this role, previous candidate gene assessments and genome-wide association studies (GWAS) have been carried out, supporting the existence of this genetic contribution to Chagas disease and to its chronic cardiac form (5–7). Nevertheless, this genetic contribution is complex and our knowledge of it is incomplete, as many other genetic polymorphisms are expected to contribute to the risk, including population-specific variants that are not well covered by GWAS (8). Latin American populations are commonly modeled as recently admixed populations from three main continental sources: Native Americans (AMR), Europeans (EUR) and Africans (AFR) (9). Specifically, the Colombian population has been described to show one of the most disparate admixture proportions from these three continental ancestries among all Hispanic populations (10,11). 
Furthermore, the interindividual variation in ancestry proportions has been correlated with the risk of several diseases, including infectious diseases such as malaria (12,13). Functional assessments of prioritized loci suggest that they contain ancestry-specific alleles related to the immune system, which could be driven by the high exposure to several microbial pathogens (13). This scenario is well suited to admixture mapping studies, which identify genomic susceptibility regions where the local genetic ancestry of affected individuals differs from that of non-affected ones. Thus, admixture mapping scans for disease associations with the varying local ancestries in admixed populations (14,15). This approach is complementary to the traditional GWAS, and one of its main advantages is that it requires smaller sample sizes for a given statistical power because of the reduced multiple testing penalty (14). In Chagas disease, previous studies have assessed global ancestry proportions in Brazilian populations, either as a complementary analysis to their main objective (16) or to assess their association with the disease (17), but without evaluating local ancestries. To better understand the host genetics involved in the risk of T. cruzi infection, we carried out the first admixture mapping study of Chagas disease in the Colombian population.

Results

This study has a two-stage case–control design and samples were classified as seropositive (cases) and seronegative (controls) for parasite antigens. Demographic characteristics are summarized in Table 1. Cases and controls from the stage 1 samples were matched in terms of global ancestries (Supplementary Fig. 1). Local ancestry blocks were estimated for the 471 342 positions corresponding to the genotyped single nucleotide polymorphisms (SNPs) in stage 1. Local ancestry scores per individual were averaged and compared with their global ancestries, showing a high correlation between them (r = 0.89–0.96) and indicating consistent global and local ancestry proportions in the population under study. Cases and controls were also matched in terms of local ancestries (Supplementary Table 1), and supervised and unsupervised analyses provided consistent results, supporting the selection of the representatives of the parental populations in the ancestry assessment (Supplementary Fig. 2).
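The consistency check between global and averaged local ancestry can be illustrated with a minimal Python sketch. It is not the authors' script (the Methods state that R was used); the toy dosages, the global proportions and the function names are hypothetical, and local ancestry is assumed to be expressed as a per-position dosage between 0 and 2 copies.

```python
from math import sqrt


def mean_local_ancestry(dosages):
    """Average per-position ancestry dosages (0-2 copies) into a 0-1 proportion."""
    return sum(dosages) / (2 * len(dosages))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: AMR dosage at 5 positions for 4 individuals.
local_amr = [
    [2, 2, 1, 2, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 2, 1, 1, 1],
]
# Hypothetical genome-wide (global) AMR proportions for the same individuals.
global_amr = [0.78, 0.42, 0.12, 0.61]

averaged_local = [mean_local_ancestry(d) for d in local_amr]
r = pearson_r(global_amr, averaged_local)
print(f"r = {r:.2f}")
```

In a real analysis the same comparison would be repeated for each of the three ancestries, which is presumably how a range of r values arises.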

Table 1

Demographical characteristics and sample size of the Colombian collections

	Stage 1		Stage 2
	Seropositive	Seronegative	Seropositive	Seronegative
Pre QC sample size	998	659	122	532
Post QC sample size	913	592	122	512
Sex, females (%)	503 (55)	357 (60)	65 (53)	315 (62)
Age (mean ± SD)	62.7 ± 16.2	49.0 ± 17.6	64.7 ± 11.2	50.0 ± 15.6

QC, quality controls.

Admixture mapping results based on local ancestry estimations were not inflated by population stratification for any of the ancestries after genomic inflation (λ) correction (λAMR = 1.00, λEUR = 1.00, λAFR = 1.00). These results revealed genome-wide significant associations of AMR and EUR ancestries with Chagas disease at chromosome 6 positions 30 079 993-30 332 160 (build hg19) (Fig. 1). AFR ancestry showed no association with serological status. These results revealed a differential susceptibility to the infection associated with two of the ancestries in this particular region, where AMR ancestry was present in higher proportion among seronegative individuals and was therefore associated with a protective effect. The associated region is located in the major histocompatibility complex (MHC) locus, where the leading signal corresponds to rs115833233 [AMR ancestry odds ratio (OR) = 0.74, 95% confidence interval (CI) = 0.66–0.83, P-value = 4.53 × 10⁻⁸], which is located in the untranslated region (UTR) of exon 5 of the Tripartite Motif Containing 40 (TRIM40) gene. The association of AMR ancestry was independent of the genotypes at that position because conditioning on the allele dosage of rs115833233 did not change the results. Thus, the admixture mapping peak was not explained by the genetic variation at that SNP (Table 2).

Figure 3

Selective sweep scan results for Native Americans (AMR) from the 1000 Genomes Project. Y and X axes represent the iSAFE scores and hg19 genomic positions, respectively. (A) iSAFE scores within the MHC region (chr6:28 477 895-33 389 603). Top-ranking variants included in the significant admixture mapping region (chr6:30 079 993-30 332 160) are highlighted in red. (B) Zoom in of the admixture mapping significant region, where the top 25 variants according to their iSAFE score are represented. Genetic variants are depicted as colored dots to reflect their LD with the variant with the highest iSAFE score, based on pairwise r² values in the AMR population. Dot size correlates with the composite variant-to-gene (V2G) score from Open Targets Genetics (https://genetics.opentargets.org/).

Table 2

Joint single nucleotide polymorphism-ancestry analysis in the discovery stage

Factor	OR (95% CI)	P-value
AMR ancestry	0.74 (0.66–0.83)	4.53 × 10⁻⁰⁸
Allele dosage rs115833233	1.00 (0.89–1.13)	0.99
AMR ancestry (conditioned on rs115833233)	0.73 (0.65–0.82)	2.15 × 10⁻⁰⁸

AMR, Native American ancestry; CI, confidence interval; OR, odds ratio.

Figure 1

Manhattan plots of the admixture mapping results based on local ancestry estimates of Native American (AMR; top), European (EUR; middle) and African (AFR; bottom) ancestries. Y and X axes refer to the –log10 transformed P-values and hg19 chromosomal positions, respectively. The horizontal line indicates the significance threshold (P-value = 3.07 × 10⁻⁶).

To identify potential variants explaining the result, imputed genotype data from the significant admixture mapping region were assessed in the stage 1 samples and in independent samples from stage 2. This analysis identified 23 variants in high linkage disequilibrium (LD, r² > 0.8) with nominal significance and consistent direction of effects in the two stages (Table 3). The leading variant was rs2032134, located in the intergenic region between the Ribonuclease P/MRP Subunit P21 (RPP21) and TRIM39 genes (OR for the C allele = 0.93, 95% CI = 0.90–0.97, P-value = 3.54 × 10⁻⁴) (Fig. 2). We confirmed that this association depended on the AMR ancestry because the association was no longer significant after adjusting the models for the ancestry score (OR for the C allele = 0.99, 95% CI = 0.95–1.03, P-value = 0.766). In addition, we confirmed that the C allele was associated with local AMR ancestry: when cases and controls were stratified by this ancestry proportion, the allele frequency was higher among carriers of local AMR ancestry, in both cases and controls, than among individuals bearing other ancestries at that position (Supplementary Fig. 3). Functional analysis showed that rs2032134 is an expression quantitative trait locus (eQTL) for different human leukocyte antigen (HLA) members in different tissues, most significantly for HLA-C in whole blood (P-value = 5.1 × 10⁻³⁵) according to eQTLGen. Evidence of long-distance interactions between rs2032134 and another member of the Tripartite Motif (TRIM) family, TRIM31, was observed in macrophages and neutrophils (18), as well as in lymphoblastoid cell lines (19). Further functional assessments for rs2032134 and its best LD-proxies are summarized in Table 4.

Table 3

Association testing results and allele frequencies on imputed data for the admixture mapping associated region

		Stage 1			Stage 2			Meta-analysis
SNP_ID(*EA)	CHR:BP	EA frequency (cases/controls)	OR (95% CI)	P-value	EA frequency (cases/controls)	OR (95% CI)	P-value	OR (95% CI)	P-value
rs2032134*C	6:30360509	0.81/0.86	0.95 (0.90–0.99)	2.57 × 10⁻²	0.75/0.84	0.92 (0.87–0.97)	3.67 × 10⁻³	0.93 (0.90–0.97)	3.54 × 10⁻⁴

EA, effect allele; BP, base pair; CHR, chromosome; CI, confidence interval; OR, odds ratio; SNP, single nucleotide polymorphism.

Figure 2

Regional plots of the results from admixture (top) and fine mapping (bottom) analyses, showing the −log10 transformed P-value on the y-axis and the hg19 genomic position on the x-axis. Top: Region centered on the significant variants for the AMR ancestry association. The horizontal line indicates the significance threshold (P-value = 3.07 × 10⁻⁶). Estimated recombination rates (light blue line) are plotted on the right y-axis. Bottom: Meta-analysis results from stages 1 and 2 within the admixture mapping significant region (chr6:30 079 993-30 332 160) and its proxies (r² > 0.8), indicating the associated variant with the lowest P-value in the region (rs2032134). The results for the remaining single nucleotide polymorphisms (SNPs) are color coded to reflect their degree of LD with the indicated SNP based on pairwise r² values in AMR. The horizontal lines indicate the significant (solid line, P-value = 3.23 × 10⁻⁴) and suggestive (broken line, P-value = 6.46 × 10⁻³) thresholds.

Table 4

In silico functional assessment of the fine mapping and selective sweep analyses top variants

Chr	SNP	Function	Nearest gene	eQTLs^a	C-HiC genes	PheWAS
6	rs2032134	Intergenic	RPP21/TRIM39	HLA-C, HCG18, HLA-E, HLA-G, TRIM10	TRIM31 in GM12878^b, macrophages^c and neutrophils^c	-
6	rs9261440	Intergenic	TRIM31/TRIM40	HLA-F, HLA-G, HLA-A, HLA-L, TRIM26, TRIM10	TRIM10 in GM12878^b	Hematological measurement, eosinophil count

^a eGenes in whole blood.

^b Lymphoblastoid cell line from Mifsud et al.

^c Macrophages and neutrophils from Javierre et al.

Queried databases were Open Targets Genetics, eQTLGen, HaploReg and Capture HiC Plotter.

Chr, chromosome; C-HiC, capture Hi-C; eQTL, expression quantitative trait loci; PheWAS, phenome-wide association study.

To assess adaptive signals within the significant admixture mapping region, we performed a selective sweep analysis in this region using a reference population. The iSAFE scores were high, with the top-ranking variants located at positions 30.087–30.102 Mb (iSAFE score ≥ 0.331). The top 25 variants ranked by the iSAFE score mapped within (n = 5; only rs28400887 was coding, but synonymous) or near (n = 20) the TRIM31 gene, with the furthest SNP located 21.1 kb from the gene (Fig. 3). Given the excellent performance of iSAFE, which prioritizes the most likely favored variant 94% of the time (20), we performed a functional assessment of the top-scoring variant (rs9261440; iSAFE score = 0.340), assuming it to be the variant driving the selective sweep. According to the Open Targets Genetics portal, rs9261440 is an eQTL for some HLA class I genes in whole blood, the most significant being HLA-F (P-value = 5.4 × 10⁻¹⁴³), HLA-G (P-value = 2.4 × 10⁻²⁸⁰) and HLA-A (P-value = 2.4 × 10⁻¹⁵⁶), and also for the TRIM family member TRIM26 (P-value = 7.9 × 10⁻⁷⁹). Additionally, HaploReg indicates that this variant is an eQTL of HLA-L and TRIM10 in whole blood (21). In agreement with this, phenome-wide association study (PheWAS) data from Open Targets Genetics indicate that this variant is associated with hematological measurements, particularly with the eosinophil count in the UK Biobank (P-value = 7.6 × 10⁻⁴³).

Discussion

We used an admixture mapping strategy to leverage varying local genomic ancestries in Colombians to identify loci associated with differential susceptibility to Chagas disease. We found significant associations of AMR and EUR ancestries within the MHC locus with the development of the infection, pointing to the role of the immune response in disease risk. Additionally, fine mapping assessments and a selective sweep scan of this region prioritized variants with potential functional implications in the disease, highlighting how powerful this strategy is for identifying regions that had been overlooked in other genomic studies.

Several studies of bioarchaeological material confirmed that American trypanosomiasis existed long before European settlement, when ancestral populations domesticated plants and animals in the process of sedentarization. This provided the vector with food availability and a more rapid domiciliation (22,23). Host–parasite co-evolution is considered one of the most important generators of biological diversity in the genome, as is the case for selective sweeps at loci with a functional role in their interaction (24). It occurs when parasites trigger host adaptations, which in turn lead the parasite to adapt again to this new environment in their hosts (24). This co-evolution has been described for T. cruzi (3,25) and, based on the observed protective association of the AMR ancestry, one can speculate that the long-term exposure of ancestral populations with larger proportions of AMR ancestry may have provided a more efficient immune response to the parasite, as previously hypothesized, with an adequate defense against pathogens that are endemic to the New World (13). This response could be responsible for parasite clearance, precluding the establishment of a serological response. 
This ancestry-specific effective response against parasites and other infectious agents has been previously hypothesized to explain the enrichment of African alleles within the HLA-B locus in a Colombian population (12). Another study correlated the EUR and AFR ancestries with differential immune response to bacterial infections and hypothesized that this could be related to the differential pathogen exposure of each population and different selective pressures after the human migrations out of Africa (26). Local adaptations have shaped human genetic variation together with drift and migrations, and admixed populations are likely to have a larger number of genetic variants that have functional effects (13,27). Therefore, it is particularly striking that the selective sweep scan within the significant admixture mapping region revealed variants associated with eosinophil counts, because eosinophils have cytotoxic functions to fight parasitic infections and, along with macrophages, monocytes and neutrophils, are the innate immune cells responsible for the control of the initial infection by T. cruzi (28,29). Interestingly, blood cell traits have been reported to differ among ancestries and are subject to different selective pressures (30).

Several genes of the TRIM family mapped within the significant admixture mapping region, and the top variant was located in the TRIM40 gene (Fig. 2). More importantly, the variants prioritized both by the fine mapping and the selective sweep scan are also eQTLs of TRIM genes in whole blood. TRIM family members are E3 ubiquitin ligases that play an important role in immune signaling pathways, and the expansion of this multigene family suggests a key regulatory role during the immune response against pathogens (31,32). 
When pathogens are recognized by the immune system through pattern-recognition receptors, several immune responses are initiated, including the production of interferons (IFNs), which leads to the expression of TRIM proteins, among others (31). The upregulation of TRIM genes in response to IFN-γ has been reported in human monocytes and macrophages (33). The expression of IFN-γ is induced by interleukin 18 (IL-18), a pro-inflammatory cytokine mainly produced by macrophages (3). Remarkably, genetic variants of the gene encoding IL-18 have been associated with T. cruzi infection in previous candidate gene studies and showed suggestive association in a recently published GWAS (7,34,35).

Because the top variant from the fine mapping analysis mapped to an intergenic region, we explored its potential relation with nearby genes through an in silico functional analysis. This strategy indicated that the variant is an eQTL significantly associated with HLA-C and HLA-G expression in whole blood. HLA genes are well known for their role in the modulation of the immune response during T. cruzi infection (6). A previous study associated the HLA-C*03 allele with susceptibility to the chronic cardiac form of the disease in a Venezuelan population (36). Regarding HLA-G, several alleles from the HLA-G 3′UTR region were tested in a Brazilian population, with evidence of association with different clinical forms of the disease (37). Additionally, lower gene expression and plasma concentrations of HLA-G have been described in the chronic phase of T. cruzi and Plasmodium falciparum infections (38). Further long-distance chromosome interaction analysis of this variant revealed a significant interaction with the promoter of TRIM31 in macrophages, neutrophils and the lymphoblastoid cell line GM12878. This gene is a member of the TRIM family that acts as a regulator of NLR pyrin domain-containing 3 (NLRP3) expression (39). 
NLRP3 is an inflammasome component that is well known to be activated by pathogen-associated molecular patterns. Activation of the inflammasome is crucial for the control of intracellular protozoan parasitic infections, such as that caused by T. cruzi, and for the production of nitric oxide during acute infection in mouse models (40,41). Interestingly, TRIM31 has also been prioritized by the selective sweep scan as a locus under putative positive selection in the AMR population.

Our study has some major limitations that need to be taken into consideration. One of them relates to the identification of the causal agent of the selective signal. The local adaptive signal maps within the MHC locus, which is a well-recognized target of selection. However, the causal factor underlying a selective sweep has been identified in only a few instances in humans, such as skin pigmentation (42), lactase persistence (43) and adaptation to altitude (44). Therefore, we can neither infer nor guarantee that T. cruzi is the causal factor driving this putative adaptive signal. Another limitation is the limited statistical power, especially in the fine mapping, where we only had 80% power to detect variants with an allele frequency >20% and a minimum effect size of 1.4, which is considerably larger than expected for common variants in complex traits (8). Larger sample sizes would improve the statistical power of the study. A further limitation concerns the analyzed genetic variants, as only common SNPs were assessed in the fine mapping, and structural or less frequent genetic variants underlying the admixture mapping signal of Chagas susceptibility remain unexplored. These types of genetic variants may be assessed by next-generation sequencing approaches. 
Moreover, despite the advantages of admixture mapping studies for identifying susceptibility loci, we cannot guarantee that the results are generalizable to other populations with AMR ancestry, considering also the variability of the infectious agent throughout the American continent. Finally, the scarce representation of AMR populations in reference databases reduces the precision with which the exact functional implication of the associated variants can be determined, given the differences in LD structure among populations as well as other potential phenotypic features in different cell types.

To our knowledge, this is the first admixture mapping analysis carried out in Chagas disease in the Colombian population. This assessment allows us to associate the AMR local ancestry at the MHC locus with protection from Chagas disease and highlights the role of the immune response during the acute phase of the infection.

Material and Methods

Ethical considerations

The protocols used in the study followed the Declaration of Helsinki principles and informed consent was obtained from all individuals included in the study design. The Industrial University of Santander and Cardiovascular Foundation (Colombia) Ethics Committee approved this study (Act No. 15/2005).

Study population and genotyping

All donors were recruited by the health care system from the Industrial University of Santander and Cardiovascular Foundation in the provinces of Guanentina, Comunera and Garcia Rovira, which are the provinces with the highest prevalence of Chagas disease in the Santander department in Colombia (45). We used a two-stage case–control design where individuals were classified as seropositive (cases) or seronegative (controls) for T. cruzi antigens according to an indirect hemagglutination commercial test (Chagatest, Wiener, Argentina) and enzyme-linked immunosorbent assays (Test ELISA Chagas III, Grupo Bios, Chile; Chagas ELISA IgG + IgM, Vircell, Spain). Samples from stage 1 comprised 1576 individuals (933 classified as cases and 643 as controls) as described elsewhere (46). Further sample recruitment composed stage 2, including 654 independent samples (122 cases and 532 controls) classified according to the same criteria. Genomic DNA was isolated from blood samples using the QIAamp Midi DNA Kit (QIAGEN, Germany) following the manufacturer’s recommendations. All samples were genotyped with the Global Screening Array Platform (Illumina Inc., San Diego, CA, USA) as described elsewhere (7). As part of the quality controls (QCs) of genotyped data, individuals and variants with a missing genotype rate > 5%, SNPs with different call rates between cases and controls (P-value < 0.05), and SNPs with large deviations from Hardy–Weinberg equilibrium (HWE) in the control group (P-value ≤ 1 × 10⁻⁶) were removed. QCs were performed using PLINK v.1.9 (47).
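The per-SNP filters described above can be sketched in Python as follows. This is only an illustration of the logic, not the PLINK implementation: it uses a 1-degree-of-freedom chi-square approximation for the HWE test, whereas PLINK's default HWE test is an exact test, and the function names and toy genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P-value for deviation from Hardy-Weinberg equilibrium.

    Takes the genotype counts of one biallelic SNP (e.g. in controls).
    Assumes at least one genotyped individual.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.05, hwe_threshold=1e-6):
    """Keep a SNP if its missing rate is <= 5% and its HWE P-value is > 1e-6."""
    total = n_aa + n_ab + n_bb + n_missing
    if n_missing / total > max_missing:
        return False
    return hwe_chisq_pvalue(n_aa, n_ab, n_bb) > hwe_threshold


# Hypothetical genotype counts in controls.
print(passes_snp_qc(250, 500, 250, n_missing=10))   # counts in perfect HWE
print(passes_snp_qc(500, 0, 500, n_missing=0))      # no heterozygotes at all
print(passes_snp_qc(250, 500, 250, n_missing=100))  # too many missing calls
```

The differential call-rate test between cases and controls is left out of this sketch.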

Admixture mapping analysis

Reference data from the 1KGP Phase 3 (48) were used as representatives of the parental populations in the downstream stage 1 global and local ancestry assessments. Briefly, we used data from AMR (n = 85), EUR (n = 503) and AFR (n = 504). Regarding the AMR population, only Peruvians from Lima (PEL) were included because this population has been described to have the highest proportion of AMR ancestry after the Maya population (49). All EUR subpopulations were taken into account (Utah residents, Finnish, British, Iberian and Toscani populations). In the case of AFR, both the African Caribbean individuals from Barbados and individuals with African ancestry from the Southwest US were excluded, as these populations have undergone recent admixture events (50). Using PLINK, we intersected the autosomal SNPs from stage 1 genotyped data (493 271 SNPs post QC) with those from the 1KGP dataset, removing from the latter those variants with a missing genotype rate >5% or with large deviations from HWE (P-value ≤ 1 × 10⁻⁶) in at least one population. After the intersection of datasets and QCs, a total of 471 342 SNPs were kept for further analysis.

Global ancestry assessment

Individual global ancestries were obtained with ADMIXTURE v1.3.0 (51), which calculates average ancestry proportions across the genome. We estimated the best number of ancestral populations (K) using a 10-time cross-validation with random seeds. The best fit was obtained for K = 3, i.e. assuming three ancestral populations corresponding to the AMR, EUR and AFR ancestries.

Local ancestry assessment and association testing

Local ancestries were estimated with ELAI v1.0 (52) assuming a three-way admixture and 10 generations since the last admixture event, as indicated by previous studies (9,53). As QC steps, a correlation was calculated between individual global ancestries and local ancestries averaged across the genome using R v3.6.1. Principal components were calculated using PLINK and plotted along with those from the representative parental populations from 1KGP. For the admixture mapping, each of the three local ancestries was tested separately for case–control association using the EMMAX mixed model (54) implemented in EPACTS (https://genome.sph.umich.edu/wiki/EPACTS). Because related individuals were included, EMMAX calculates the kinship matrix to include it as a covariate in the model, in addition to age and sex. We controlled for the type I error rate by calculating λ using an in-house script for R v3.6.1. Significance was adjusted by the number of ancestry blocks across the genome and the number of generations since the admixture using the R package STEAM (9). Based on that, the significance threshold for the admixture mapping study was established at P-value < 3.07 × 10⁻⁶, similar to estimates reported in independent studies in Latin American populations (9).
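The in-house λ script is not shown in the article. A common definition, sketched below in Python as an assumption about what such a script computes, is the median of the observed 1-degree-of-freedom chi-square statistics divided by the median of the null chi-square distribution (about 0.455). The function name and the toy P-values are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.455.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided association P-values.

    Each P-value (0 < p <= 1) is converted back to a 1-df chi-square
    statistic, and the median statistic is compared with its null expectation.
    """
    chi2_stats = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical example: evenly spread P-values mimic a well-calibrated scan.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(f"lambda = {genomic_inflation(null_like):.2f}")
```

Values close to 1 indicate no inflation, in line with the λ values reported for the three ancestries.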

Fine mapping

A fine mapping assessment was performed to elucidate the most likely genetic variant(s) underlying the admixture mapping study results, combining data from the two stages. Briefly, genotypic data from stage 2 were subjected to the same QCs that were used in stage 1. After this, imputation of both stages was performed with the Michigan Imputation Server using the admixed American population from 1KGP phase 3 as reference panel. Imputed variants were filtered by their minor allele frequency (MAF) and the imputation quality metric Rsq. Those variants satisfying both a MAF > 1% and Rsq > 0.3 were kept for the study. Association testing of imputed allele dosages was performed in each stage separately using the EMMAX mixed model implemented in EPACTS, including age, sex and the kinship matrix as covariates. Summary-level statistics of each stage were meta-analyzed using METASOFT v2.0 (55). A random- or fixed-effect size meta-analysis was selected for each variant based on the results of Cochran’s Q-test of heterogeneity. Significant and suggestive thresholds for the assessed region were established at P-value < 3.23 × 10⁻⁴ and P-value < 6.46 × 10⁻³, respectively, according to the estimates of GEC software (56) based on the LD structure from the stage 1 samples. To confirm the association of the specific allele with the local AMR ancestry, samples were stratified according to their ancestry scores and allele frequencies were recalculated in cases and controls separately. The statistical power of the fine mapping analysis was estimated using the Power Calculator for Two Stage Association Studies software CaTS (57).
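The two-stage meta-analysis step can be illustrated with a minimal inverse-variance fixed-effect sketch in Python. It is not METASOFT and not the authors' pipeline: the function names are hypothetical, and the standard errors are back-calculated from the rounded ORs and 95% CIs reported for rs2032134 in Table 3, which is an approximation introduced here. The heterogeneity P-value shown is valid only for two studies (1 degree of freedom).

```python
from math import erfc, exp, log, sqrt

Z95 = 1.959964  # two-sided 95% normal quantile


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    return log(odds_ratio), (log(ci_high) - log(ci_low)) / (2 * Z95)


def fixed_effect_meta(stage1, stage2):
    """Inverse-variance fixed-effect meta-analysis of two (beta, se) pairs."""
    studies = [stage1, stage2]
    weights = [1 / se ** 2 for _, se in studies]
    beta = sum(w * b for w, (b, _) in zip(weights, studies)) / sum(weights)
    se = sqrt(1 / sum(weights))
    # Cochran's Q; with two studies it follows a chi-square with 1 df.
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, studies))
    return {
        "or": exp(beta),
        "ci": (exp(beta - Z95 * se), exp(beta + Z95 * se)),
        "p": erfc(abs(beta / se) / sqrt(2)),  # two-sided normal P-value
        "q": q,
        "q_p": erfc(sqrt(q / 2)),             # heterogeneity P-value, 1 df
    }


# Stage 1 and stage 2 estimates for rs2032134*C as reported in Table 3.
stage1 = log_or_and_se(0.95, 0.90, 0.99)
stage2 = log_or_and_se(0.92, 0.87, 0.97)
result = fixed_effect_meta(stage1, stage2)
print(f"OR = {result['or']:.2f}, "
      f"95% CI = {result['ci'][0]:.2f}-{result['ci'][1]:.2f}, "
      f"P = {result['p']:.2e}, Q P-value = {result['q_p']:.2f}")
```

Because the inputs are rounded, the pooled estimate can only be expected to land close to the published meta-analysis values, not to match them exactly. When the heterogeneity test is significant, a random-effects model would be used instead, as described above.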

Selective sweep analysis

Given the high rate of adaptive signals in the genome triggered by parasites, and that admixture serves as a mechanism driving adaptive evolution in humans (13,24), we used iSAFE (20) v1.0.4 to provide evidence of a selective sweep embedded in the admixture mapping region and to pinpoint the most likely favored variant. iSAFE exploits the evolutionary contributions hidden in the flanking regions surrounding the region under selection to provide a ranking of variants (iSAFE-score) based on their contribution to the overall signal of selection. For the analysis, we used 1KGP data from unrelated subjects from PEL population (n = 77) and from a random selection of 10% Yoruba individuals drawn from the reference (n = 91), which represents a non-target or outgroup population. iSAFE was executed enabling the IgnoreGaps flag and the default MaxFreq value (0.95). Ancestral fasta sequences for Homo sapiens (GRCh37) were downloaded from ENSEMBL release 75 (http://ftp.ensembl.org/pub/release-75/fasta/ancestral_alleles/).

In silico functional analysis

In silico functional analyses were performed to assess the biological consequences of the leading associated variants in the fine mapping and of those variants prioritized in the selective sweep analysis. The Open Targets Genetics portal (https://genetics.opentargets.org/) was used for functional annotation, to assess trait associations based on PheWAS and to retrieve the evidence of eQTLs in relevant tissues based on eQTLGen database (https://www.eqtlgen.org/). Additionally, long-distance physical interactions and regulatory genomic regions were considered using Capture HiC Plotter (58) and HaploReg v4.1 (59).

59 in total

1. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets.

Authors: Miao-Xin Li; Juilian M Y Yeung; Stacey S Cherny; Pak C Sham
Journal: Hum Genet Date: 2011-12-06 Impact factor: 4.132

Review 2. Host-parasite co-evolution and its genomic signature.

Authors: Dieter Ebert; Peter D Fields
Journal: Nat Rev Genet Date: 2020-08-28 Impact factor: 53.242

3. Genome-wide Significance Thresholds for Admixture Mapping Studies.

Authors: Kelsey E Grinde; Lisa A Brown; Alexander P Reiner; Timothy A Thornton; Sharon R Browning
Journal: Am J Hum Genet Date: 2019-02-14 Impact factor: 11.025

Review 4. Chagas disease.

Authors: José A Pérez-Molina; Israel Molina
Journal: Lancet Date: 2017-06-30 Impact factor: 79.321