Fine-mapping analysis of a chromosome 2 region linked to resistance to Mycobacterium tuberculosis infection in Uganda reveals potential regulatory variants.

Robert P Igo¹, Noémi B Hall¹, LaShaunda L Malone², Jacob B Hall^1,3, Barbara Truitt¹, Feiyou Qiu¹, Li Tao⁴, Ezekiel Mupere^1,5,6, Audrey Schnell¹, Thomas R Hawn⁷, William S Bush¹, Moses Joloba⁸, W Henry Boom^2,5, Catherine M Stein^9,10,11.

Abstract

Tuberculosis (TB) is a major public health burden worldwide, and more effective treatment is sorely needed. Consequently, uncovering causes of resistance to Mycobacterium tuberculosis (Mtb) infection is of special importance for vaccine design. Resistance to Mtb infection can be defined by a persistently negative tuberculin skin test (PTST–) despite close and sustained exposure to an active TB case. While susceptibility to Mtb is, in part, genetically determined, relatively little work has been done to uncover genetic factors underlying resistance to Mtb infection. We examined a region on chromosome 2q previously implicated in our genomewide linkage scan by a targeted, high-density association scan for genetic variants enhancing PTST– in two independent Ugandan TB household cohorts (n = 747 and 471). We found association with SNPs in neighboring genes ZEB2 and GTDC1 (peak meta p = 1.9 × 10−5) supported by both samples. Bioinformatic analysis suggests these variants may affect PTST– by regulating the histone deacetylase (HDAC) pathway, supporting previous results from transcriptomic analyses. An apparent protective effect of PTST– against body-mass wasting suggests a link between resistance to Mtb infection and healthy body composition. Our results provide insight into how humans may escape latent Mtb infection despite heavy exposure.

Year: 2018 PMID: 30100616 PMCID: PMC6374218 DOI: 10.1038/s41435-018-0040-1

Source DB: PubMed Journal: Genes Immun ISSN: 1466-4879 Impact factor: 2.676

Introduction

Tuberculosis (TB) is one of the most devastating communicable diseases in world health, with approximately 10.4 million incident cases and 1.4 million deaths from TB in 2015[1]. About nine in ten people initially infected with Mycobacterium tuberculosis (Mtb), however, do not go on to develop active disease. Hence, TB pathogenesis follows a two-step process[2], starting with initial infection (latent Mtb infection, LTBI), diagnosed by the tuberculin skin test (TST) or interferon-gamma release assays (IGRA), and followed by progression to symptomatic disease in a subset of infected persons. While a role for human genetic susceptibility to TB has been well-established[3, 4], genetic susceptibility to Mtb infection has been less studied. Genomewide analyses of TB, whether for genetic linkage[5-9] or, more recently, for association[10-17], have not provided consistent evidence for major TB susceptibility loci. Difficulty of replication may stem in part from the clinical definitions used for TB[18]. Few studies have examined latent Mtb infection[7, 17, 19–21]; yet, while few, these studies reveal consistency in genomic loci identified. Within the framework of a household contact study[22, 23], we have focused on the persistent TST negative (PTST–) phenotype, which measures relative resistance to Mtb infection over an extended period of time, despite heavy exposure within the household[24]. The hypothesis that this resistance may be influenced by innate and adaptive immune factors is an ongoing area of investigation[25-27]. Studying resistance may reveal genetic insights into mechanisms underlying TB pathogenesis[16, 21]. Resistance to latent Mtb infection has particular relevance to the design of a preinfection vaccine[26], since reducing the pool of latently infected individuals will reduce the incidence of TB. Analysis of PTST– individuals could identify critical biologic mechanisms underlying resistance to Mtb infection.
We previously found evidence for genetic linkage of regions on chromosome 2q and 5p with the PTST– phenotype[7]. We then found genetic association with the PTST– phenotype at an existing candidate locus, SLC6A3, within the chromosome 5 region[28] that had been identified by another group examining TST reactivity[17, 19, 20]. Now, we present a fine-mapping association scan for PTST– across the other major linkage region, on chromosome 2q, with two independent samples from Ugandan households ascertained through an index case with TB[22, 23, 29]. Heritability analysis suggested that approximately half of the genetic variation in PTST– is due to loci on chromosome 2q. A meta-analysis combining the two samples identified two loci of interest, one linked to histone de-acetylase regulation, and another linked to body mass composition, uncovering new biologic pathways underlying resistance to Mtb infection.

Results

Sample Description

Our study sample comprises two independently-recruited cohorts of Ugandan households ascertained through a proband with active TB (summarized in Table 1)[22, 29]. Sample 1 (n = 165 active TB, 501 LTBI, 81 PTST–) was genotyped for a fine-mapping panel with markers spaced approximately every 10 kb across the chromosome 2 linkage peak observed in an overlapping sample (overlap n = 103 TB, 277 LTBI and 45 PTST–)[7], and subsequently imputed to the Illumina HumanOmni5 panel. In addition, genotyping for haplotype tagging SNPs from several candidate loci that may affect resistance to Mtb infection[28] was performed. Sample 2 (n = 201 TB, 237 LTBI, 33 PTST–), genotyped for the Illumina HumanOmni5 BeadChip (average marker distance ~ 600 bp), includes a greater proportion of participants with active TB and a smaller proportion of PTST–. In both samples, PTST– are, on average, much younger than non-PTST–, but did not differ significantly in sex ratio or in proportion of HIV+ individuals.

Table 1

Characteristics of the two Ugandan PTST– samples.

	PTST–	Non-PTST–	Total	p
Sample 1
n	81	666	747	—
Active TB	0 (0.0%)	165 (24.8%)	165 (22.1%)	—
Female	38 (46.9%)	276 (41.4%)	314 (42.0%)	0.41
Age, y	9.3 ± 8.8	17.8 ± 13.5	16.9 ± 13.3	< 0.001
HIV+	4 (5.4%)	80 (13.0%)	84 (12.2%)	0.061
Sample 2
n	33	438	471	—
Active TB	0 (0.0%)	201 (45.9%)	201 (42.7%)	—
Female	16 (48.5%)	215 (49.1%)	231 (49.0%)	0.99
Age, y	11.5 ± 12.5	21.6 ± 13.4	20.9 ± 13.5	< 0.001
HIV+	3 (9.4%)	56 (12.9%)	59 (12.7%)	0.78

Values are presented either as n (% of total sample) or as mean ± SD. Non-PTST–, LTBI plus active TB; p, p value for test of differences between PTST– and non-PTST– individuals, by 2 × 2 χ2 test for sex, Wilcoxon rank-sum test for age, and Fisher’s exact test for HIV status.

Genetic Association Analysis

We focused on SNPs that showed association in both samples, thus demonstrating internal replication. We tested genetic association in both samples by means of logistic regression, with adjustment for relatedness, and combined results for markers tested in both cohorts, after correction for population structure by genomic control (Supplementary Figure 1; see Supplementary Table 1 for complete results; the most significantly associated SNPs within each sample are listed in Supplementary Tables 2 and 3).
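The genomic control step mentioned above can be illustrated with a minimal sketch. It shows only the basic form of the correction (estimate λ from the median test statistic of presumed-null markers, then divide each 1-df chi-square statistic by λ), which is an assumption made for illustration and not necessarily the exact variant applied in this study. The function names and the example z-scores are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Estimate lambda as median(z^2) divided by the chi-square(1 df) median."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


def gc_corrected_p(z, lam):
    """Two-sided p value after dividing the 1-df chi-square statistic by lambda."""
    lam = max(lam, 1.0)  # do not inflate statistics when lambda is below 1
    chi2_adj = (z * z) / lam
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2_adj / 2.0))


# Hypothetical z-scores from a sparse set of presumed-null markers.
null_z = [0.2, -0.5, 0.7, -0.9, 1.3, -0.1, 0.6, -1.8, 0.75, 2.1]
lam = genomic_control_lambda(null_z)
p_raw = gc_corrected_p(3.0, 1.0)
p_adj = gc_corrected_p(3.0, lam)
```

With λ above 1, the corrected p value is larger than the uncorrected one, which is the intended conservative behavior.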

ZEB2/GTDC1 association peaks

The leading combined association result, for rs7568133 (145.2 Mb; ORmeta = 2.12, 95% CI = (1.50, 3.00) for the A allele; p = 1.9 × 10−5; Figure 1A, Table 2), follows from nominally significant associations in both study samples (p = 0.00062 and 0.0085 in Samples 1 and 2, respectively) (Table 2; Figures 1 and 2). The effect of this variant is consistent between Samples 1 (OR = 2.00, 95% CI = [1.35, 2.98]) and 2 (OR = 2.54, 95% CI = [1.27, 5.11]; p value from Cochran’s Q test for heterogeneity = 0.56), and the minor allele frequencies are similar (0.495 vs. 0.478 in Samples 1 and 2, respectively). rs7568133 falls within the large intron 2 of the DNA-binding transcriptional repressor gene, zinc finger E-box-binding homeobox 2 (ZEB2; Figure 2A). This variant alters several potential DNA-binding motifs, and is an enhancer mark in primary monocytes, but is not listed as an expression quantitative trait locus (eQTL) in the GTEx database (Supplementary Table 4).
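As a worked check, the combined estimate for rs7568133 can be approximately recovered from the per-sample odds ratios and confidence intervals quoted above, using a generic inverse-variance-weighted fixed-effect calculation. This is a minimal sketch, not the authors' script: the function names are hypothetical, standard errors are back-calculated from the rounded 95% CIs, and no genomic control adjustment is applied, so small differences from the reported values are expected.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z95)
    return beta, se


def fixed_effect_meta_two_studies(est1, est2):
    """Inverse-variance-weighted fixed-effect meta-analysis of two studies.

    Each estimate is a (beta, se) pair on the log-odds scale.
    """
    estimates = [est1, est2]
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    se = math.sqrt(1.0 / total_w)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p value
    # Cochran's Q; with two studies it has 1 df, so P(Q > q) = erfc(sqrt(q / 2)).
    q = sum(w * (b - beta) ** 2 for w, (b, _) in zip(weights, estimates))
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - Z95 * se),
        "ci_high": math.exp(beta + Z95 * se),
        "p": p,
        "q": q,
        "q_p": math.erfc(math.sqrt(q / 2.0)),
    }


sample1 = log_or_and_se(2.00, 1.35, 2.98)
sample2 = log_or_and_se(2.54, 1.27, 5.11)
result = fixed_effect_meta_two_studies(sample1, sample2)
```

The pooled odds ratio, interval, and heterogeneity p value in `result` can be compared directly with the meta-analysis figures in the text and in Table 2.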

Figure 1

Manhattan plots of association results from (A) the meta-analysis, (B) Sample 1 and (C) Sample 2. Genotyped and imputed markers are represented as black and blue dots, respectively.

Table 2

Most significant meta-analysis results.

				Sample 1					Sample 2				Meta-analysis

Marker	Gene	Position	Alleles	RAF	Info	OR	95% CI	p	RAF	OR	95% CI	p	OR	95% CI	p
rs2028211	BIN1	127,901,657	C/A	0.1460	0.78	1.43	(0.80, 2.53)	0.22	0.1470	3.92	(2.18, 7.07)	5.4E-06	2.33	(1.55, 3.52)	5.3E-05
rs7568133	ZEB2	145,204,976	A/G	0.4952	0.82	2.00	(1.35, 2.98)	0.00062	0.4775	2.55	(1.27, 5.11)	0.0085	2.12	(1.50, 3.00)	1.9E-05
rs10169306	AC023469.1	151,926,472	A/G	0.0635	0.84	1.57	(0.84, 2.95)	0.16	0.0687	4.26	(2.24, 8.11)	1.0E-05	2.56	(1.63, 4.02)	4.3E-05
rs58110523	FMNL2	153,292,235	C/T	0.0486	0.84	2.85	(1.43, 5.66)	0.0028	0.0483	3.71	(1.48, 9.29)	0.0052	3.13	(1.80, 5.42)	4.8E-05
rs74762979	ARL6IP6	153,804,898	G/A	0.0957	0.88	1.88	(1.07, 3.31)	0.028	0.0763	5.25	(2.39, 11.55)	3.7E-05	2.66	(1.68, 4.22)	2.9E-05
rs114101795	ARL6IP6	153,825,388	A/G	0.0957	0.88	1.88	(1.07, 3.30)	0.028	0.0787	5.46	(2.45, 12.13)	3.2E-05	2.68	(1.69, 4.25)	2.8E-05
rs79513402	ARL6IP6	153,829,134	C/T	0.0957	0.88	1.88	(1.07, 3.30)	0.029	0.0774	5.29	(2.40, 11.66)	3.6E-05	2.66	(1.68, 4.22)	2.9E-05
rs78089492	AC092684.1	164,826,751	G/A	0.0737	0.92	2.54	(1.43, 4.52)	0.0015	0.0655	2.56	(1.27, 5.15)	0.0082	2.55	(1.64, 3.98)	3.6E-05

Gene, gene that contains or is nearest to marker; Alleles, effect/other allele, where the reference is the minor allele in the sample; RAF, reference allele frequency; Info, IMPUTE2 information quality score.

Figure 2

LocusZoom plots of the chromosome 2 region surrounding rs7568133, for (A) the meta-analysis, (B) Sample 1 only and (C) Sample 2 only. In A and B, genotyped and imputed markers are represented by squares and circles, respectively. In B, markers selected for haplotype-based analysis (see Online Resource 1, Supplementary Figure 2) are marked with asterisks. The LD structure shown is that of the 1000 Genomes 2014 AFR population.

The overall region of association at ZEB2 is primarily driven by strong associations in Sample 1 (Figure 2B), and extends about 200 kb to overlap with the glycosyltransferase-like domain-containing 1 (GTDC1) gene (Figure 2). Three markers within the association peak are among the five most significantly associated markers for Sample 1 (Supplementary Table 2). Of these, rs13390689 and rs79319398 are in introns: GTDC1 intron 3 and ZEB2 intron 2, respectively. Of particular interest is rs79319398, which lies within a DNA region associated with numerous enhancer histone-modification sites in monocytes, neutrophils, and B- and T-lymphocytes, and is predicted to affect three regulatory motifs (Supplementary Table 4). This variant also has a CADD score of 14.59, placing it in the top 3.5% of all possible genetic variants for deleteriousness. Though not within a gene, rs7580080, 7.8 kb downstream of ZEB2, is also connected in epigenomic studies to histone modification marks in monocytes, neutrophils and hematopoietic stem cells, and has a CADD score of 17.66 (Supplementary Table 4). The association peaks in ZEB2 and GTDC1 appear to be independent. Associated variants in this region (Figure 2B, marked with asterisks) mostly lie within a single LD block within GTDC1 (Supplementary Figure 2); however, rs7568133 is independent of this block and in only weak LD with one other associated SNP in ZEB2, rs79319398. A conditional association analysis of variants in the GTDC1/ZEB2 region, in which allele dosages of rs7568133 were included as a covariate, confirmed the independence of this SNP from the associated SNPs in GTDC1, in that adjusting for the rs7568133 genotype did not reduce the significance of association of the GTDC1 markers by more than an order of magnitude (data not shown).
Haplotypes of the 11 SNPs in the LD block, which are highly correlated (r2 near 1.0), and of two SNPs in ZEB2, rs7568133 and rs79319398, which are in complete LD (D′ = 1) but not strongly correlated (Supplementary Figure 2), were tested for association with PTST– in Samples 1 and 2, but were not found to be more significantly associated than the best single markers (data not shown). Because HIV status is a potential confounder for TST results, we conducted a sensitivity analysis in which HIV+ individuals were omitted. Although the p values for some of the highly associated SNPs in Table 2 were slightly less significant, most likely on account of the smaller number of available individuals, the ORs were very similar (data not shown), indicating that HIV status is not a major cause of misclassification.

FMNL2/ARL6IP6 association peak

In addition, several SNPs in ADP ribosylation factor-like 6 interacting protein 6 (ARL6IP6) were associated in both samples and had meta p-values < 3 × 10−5 (Table 2), although evidence of association was stronger from Sample 2 (Supplementary Table 3). However, the CADD scores for these SNPs were not notable (Phred-scaled score < 3), and none of them was listed in the GTEx database as an eQTL (Supplementary Table 4). Because the SNPs were not potentially pathogenic, and also because this gene did not have a potential connection to Mtb biology, it was not considered further as a candidate gene. Other strongly associated markers (p < 10−4) from the meta-analysis have greatest support from Sample 2 (Table 2). Only one, rs58110523, is within a gene (intronic to FMNL2; p < 0.01 in both individual samples), but it resides in a region with very little association in Sample 1 (Figure 1). These variants, like the index variant rs7568133, change multiple binding motifs but are not linked to many epigenetic marks or transcription factor binding sites (Supplementary Table 4).

Regional heritability analysis

From our variance components analysis for region-specific heritability, we estimated that the chromosome 2 linkage interval explains 9.22% of PTST– risk in Sample 2 (SE = 5.59%; one-sided p = 0.044). The remainder of the autosomal genome explains 12.5% of the overall risk (SE = 6.17%; one-sided p = 0.0084); thus, the whole genome accounts for 21.7% of the risk.
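The arithmetic behind these partitions is simple enough to spell out. The short sketch below, with hypothetical variable names, adds the two reported estimates and computes the share of the total genomic contribution that falls in the chromosome 2 interval.

```python
# Reported variance-component estimates for Sample 2, in percent of PTST- risk.
region_pct = 9.22   # chromosome 2 linkage interval
rest_pct = 12.5     # remainder of the autosomal genome

total_pct = region_pct + rest_pct      # whole-genome contribution
region_share = region_pct / total_pct  # fraction of the genetic component in the interval

print(f"total = {total_pct:.1f}%, share in chr2 interval = {region_share:.1%}")
```

The share comes out a little above 40% of the total genetic contribution.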

Body mass composition

Because body-mass wasting is correlated to Mtb infection susceptibility[30-34], and because GTDC1 has been associated with obesity-related phenotypes in a previous genomewide methylation scan[35], we examined the relationship between body mass parameters and PTST– status in our cohort. A smaller proportion of PTST– participants displayed evidence of body-mass wasting than non-PTST– participants by all three measured criteria: BMI, lean mass and fat mass (Table 3). However, only the lean-mass measurement showed a statistically significant difference (5.1% of PTST–s vs. 17.9% of non-PTST–s; p = 0.039). The non-PTST– subjects have a higher prevalence of lean-mass wasting than the PTST– subjects.

Table 3

Body mass wasting in PTST– and non-PTST– Ugandan individuals.

	PTST–	Non-PTST–	p
BMI	6 (12.2%)	322 (21.0%)	0.14
Lean mass	2 (5.1%)	218 (17.9%)	0.039
Fat mass	8 (20.0%)	301 (24.6%)	0.51

Criteria for wasting were body-mass index (BMI) < 18.5 kg/m2, fat mass index < 1.8 kg/m2 for men and < 3.9 kg/m2 for women, and lean mass index < 16.7 kg/m2 for men and < 14.6 kg/m2 for women[30]. All tested participants were HIV-negative aged 15 years or older. Values are shown as N (%). p, p value by Fisher’s exact test; values below 0.05 are italicized.
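The lean-mass comparison in Table 3 can be revisited with a small, self-contained implementation of the two-sided Fisher's exact test. This is a sketch under a loud assumption: the table reports counts and percentages but not denominators, so the group sizes used here (39 PTST– and 1218 non-PTST– with lean-mass measurements) are inferred from the rounded percentages and may not match the actual data. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# Rows: PTST-, non-PTST-. Columns: lean-mass wasting, no wasting.
# ASSUMPTION: denominators 39 and 1218 are inferred from 5.1% and 17.9%.
p_lean = fisher_exact_two_sided(2, 39 - 2, 218, 1218 - 218)
```

The resulting `p_lean` can be compared with the reported p = 0.039; exact agreement depends on whether the inferred denominators are right.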

Discussion

We conducted a fine-mapping study exploring genetic variation associated with the PTST– phenotype in two Ugandan household samples over a segment of chromosome 2q with previous evidence for genetic linkage[7]. We measured disproportionate overall heritability attributable to the region and, more specifically, identified markers associated with the PTST– phenotype in both cohorts and in the meta-analysis. Even though the risk for PTST– attributable to the chromosome 2 linkage region was only borderline significant (p = 0.044), this 51-Mb segment accounted for approximately 9% of the overall risk for PTST– and more than 40% of the total genomic risk. These results support the hypothesis that at least one major locus underlying PTST– lies in this chromosomal region. The most significant association result from the meta-analysis, rs7568133, implicates the genes ZEB2 and GTDC1 on 2q22.3. Though this result falls just short of regionwide significance (ca. 8 × 10−6), this marker is associated in both samples with p < 0.01 and has good agreement in effect size. rs7568133 alters five potential regulatory motifs, and thus, although it is not upstream of either gene, it may function in regulation of gene expression. ZEB2 contains a binding motif that potentially disrupts histone deacetylase 2 (HDAC2)[36], a gene implicated by gene-set enrichment analysis of differences in transcriptional response to Mtb infection by monocyte-derived macrophages from PTST– and non-PTST– individuals[37]. The macrophage has a central role in Mtb pathogenesis, from recognition to killing, a key component of the innate immune response thought to influence PTST–[25-27]. Thus, our results suggest that genetic variation in macrophage response may influence resistance to Mtb infection.
Several SNPs with strong association in Sample 1 occur within enhancer histone marks and DNaseI-hypersensitivity sites found in numerous types of immune-system cells, implying that ZEB2 may be under active transcription in these cell types. Moreover, three of these variants have CADD scores greater than 14 (Supplementary Table 1), suggesting that these variants may truly be pathogenic. Together, these findings support a role for the HDAC innate immunity pathway in relative resistance to Mtb infection that may be genetically regulated. The nearby gene, GTDC1, is involved with obesity and lipid metabolism. This gene may be of interest for TB pathology because resistance to Mtb infection is correlated with maintenance of body weight. Several previous studies reported that body mass composition is a risk factor both for development of active TB and for the speed of recovery from active TB[30-34]. Here, we examined for the first time whether body composition was associated with resistance to Mtb infection. Body composition results (Table 3) show significantly less lean-mass wasting in PTST– vs. non-PTST–, despite the modest sample size. This leads to the hypothesis that GTDC1 is a risk locus for lean mass wasting, which in turn influences risk for Mtb infection. The ideal way to explore such a hypothesis is through Mendelian randomization analysis, which we are unable to perform in this dataset because too few individuals have both genotype and bioelectrical impedance data. This will be the subject of future research. Hypocholesterolemia, a consequence of body-mass wasting, may increase susceptibility to Mtb infection through reduced activity of macrophages[38].
Moreover, previous studies in mice suggest that hypercholesterolemia, whether induced by a high-cholesterol diet or by knockout of apolipoprotein E (ApoE), impairs the immune response to Mtb infection, with much greater susceptibility in ApoE−/− mice[39, 40]. In contrast, hypercholesterolemic mice lacking LDL-R did mount a robust immune response to Mtb, although, like the ApoE−/− mice, the inflammatory response to Mtb was destructively exaggerated[40, 41], and statin drugs appear to increase resistance of human macrophages to Mtb infection[42]. Finally, methylation of GTDC1 was found to be associated with waist circumference in a European American cohort, but the result was not successfully replicated[35]. The chromosome 2 region featured in the present study has also been recently replicated in its association with Mtb infection in a cohort of HIV-infected individuals[21]. There, a different extreme phenotype approach was taken, by focusing on individuals that were especially susceptible to Mtb infection because they were immunosuppressed and living in TB-endemic settings. In addition, the associated SNPs from this analysis explained the original linkage result. This, in combination with our region-specific heritability estimate, provides evidence for at least one associated locus in this region. rs7568133 is 14 Mb from the major 2q linkage peak for the PTST– phenotype reported earlier[7], with greatest LOD score at microsatellite marker ATA27H09 (D2S1353, 2q24.1 at chr2:159,558,931–159,559,082), and a secondary LOD score peak at GATA4E11 (D2S410, 2q14.1 at chr2:116,240,929–116,241,085). Six of the eight most significantly associated markers from the meta-analysis are within 10 Mb of ATA27H09, suggesting that the linkage signal was not caused by a single genetic variant of large effect. This investigation has several limitations and strengths. First, the power is restricted by the number of available PTST– individuals. 
Family relationships within the sample reduce the number of effective independent individuals and family-based association requires a more complex association test. Together, these constraints prevent detection of causal variants with uncommon alleles (frequency < 0.05) unless they are of large (quasi-Mendelian) effect. Next, the use of Sample 2 HumanOmni5 genotypes as a reference for imputation of Sample 1 untyped variants potentially compromises the independence of the two samples. However, the differences in the major results from the individual samples (Supplementary Table 1; Supplementary Table 2) suggest that the induced correlation, if any, was slight. Our earlier report[23] shows a difference in level of exposure to TB index cases between PTST– and LTBI individuals; however, this association is limited to children aged 5 to 15. Finally, the study samples are different in two respects: the average age for both PTST– and non-PTST– individuals is greater in Sample 2, and more non-PTST– have active TB in Sample 2. These limitations attributable to small sample size are partly due to the observational nature of the study, whereby some subjects were lost to follow-up prior to the end of the 2-year observation period, therefore excluding them from analysis. In conclusion, we observed an association between PTST– and ZEB2, further supporting a role for differential regulation of the HDAC pathway in individuals resistant to Mtb infection. These variants are likely functional based on high CADD score and presence of enhancer histone marks. Evidence for GTDC1 was weaker, but further suggests a role for body composition in differential trajectories in TB pathogenesis. Deep resequencing, replication, and functional studies are needed to clarify the roles of these genes in Mtb infection.

Materials and methods

Study samples and phenotypes

All procedures performed in studies involving human participants were in accordance with the principles of the Declaration of Helsinki. All study protocols were reviewed and approved by the National HIV/AIDS Research Committee, the Uganda National Council of Science and Technology, and the institutional review board at the University Hospitals Case Medical Center, Cleveland, OH, USA. Informed consent was obtained from all participants. Participants in Sample 1, the initial sample for fine mapping, were recruited as reported previously[7, 23, 28]. Briefly, index cases with culture-positive pulmonary TB and their household members were enrolled and evaluated for TB symptoms and reactivity to TST. Participants were classified as PTST– if they tested TST– at recruitment and remained TST– over 24 months of follow-up. Because the TSTs were at least 3 months apart, boosting was unlikely to increase the chances of observing a TST conversion, as we demonstrate elsewhere[23]. The sample after quality control totalled 747 individuals with a PTST– phenotype[28]. This sample overlapped with the sample previously studied by linkage analysis[7]: 360 individuals belonged to both samples. Individuals in Sample 2, the follow-up sample, were recruited later in the same study, but are independent from Sample 1. A total of 471 individuals from Sample 2 passed quality controls (see below). HIV-negative individuals at least 15 years of age were measured for body-mass wasting by three related measures: body-mass index (BMI) and its two components, fat mass index (FMI) and lean mass index (LMI)[43]. The overall sample for body-mass composition comprised 232 PTST– and 1553 non-PTST– individuals from the household contact study[22, 23], including 236 individuals genotyped in Sample 1 (41 PTST–, 195 non-) and 253 individuals in Sample 2 (23 PTST–, 230 non-). Not all individuals had available measurements for all three measures.
FMI and LMI were estimated by means of bioelectrical impedance analysis[30]. Criteria for wasting were body-mass index (BMI) < 18.5 kg/m2, fat mass index < 1.8 kg/m2 for men and < 3.9 kg/m2 for women, and lean mass index < 16.7 kg/m2 for men and < 14.6 kg/m2 for women[30]. The datasets analyzed in the current study are not publicly available, because the Ugandan participants did not consent to broad data sharing. However, individual-level data may be requested through a data access committee, chaired by Dr. Sudha Iyengar (ski@case.edu). All genetic association results (summary statistics) are available in Supplementary Table 1 (Online Resource 2).
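The wasting criteria stated in this subsection translate directly into a small classifier. The sketch below uses the cutoffs given in the text; the function name, the dictionary layout, and the example participant are hypothetical, and all indices are assumed to be in kg/m2.

```python
# Sex-specific cutoffs (kg/m2) for fat mass index and lean mass index.
WASTING_CUTOFFS = {
    "male": {"fmi": 1.8, "lmi": 16.7},
    "female": {"fmi": 3.9, "lmi": 14.6},
}
BMI_CUTOFF = 18.5  # same cutoff for both sexes


def wasting_flags(sex, bmi, fmi, lmi):
    """Return whether each of the three wasting criteria is met."""
    cut = WASTING_CUTOFFS[sex]
    return {
        "bmi": bmi < BMI_CUTOFF,
        "fat_mass": fmi < cut["fmi"],
        "lean_mass": lmi < cut["lmi"],
    }


# Hypothetical participant: a woman with FMI 4.0 and LMI 14.0 (BMI = 18.0).
flags = wasting_flags("female", bmi=18.0, fmi=4.0, lmi=14.0)
```

In this example the participant meets the BMI and lean-mass criteria but not the fat-mass criterion, which illustrates why the three measures are tabulated separately.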

Genotypes and quality control (QC)

The first phase of the study, conducted on Sample 1, focused on fine mapping a genomic region previously implicated by linkage analysis, 146–176 Kosambi cM on chromosome 2q[7]. We selected single-nucleotide polymorphisms (SNPs) within map position range chr2: 116,623,530–170,141,754, in GRCh37 coordinates, to cover the 1-LOD support interval (an approximate 95% confidence interval for location) underneath the linkage peak for the PTST– phenotype, at approximately 10-kilobasepair (kb) intervals for genotyping by means of the Illumina (San Diego, CA) iSelect platform. One informative SNP (minor allele frequency (MAF) ≥ 0.1) was selected within each 10-kb window with maximum Illumina assay design score. Of 4,672 SNPs attempted, 3,626 were successfully genotyped on Sample 1 and processed using Illumina Genome Studio, and 3,478 passed marker QC (call rate ≥ 0.9, minor allele frequency ≥ 0.005, p > 10−6 from exact test of deviation from Hardy-Weinberg proportions (HWP) in unrelated subjects, as tested by PLINK [44]). Sample QC for the primary analysis has been described[28]. Samples with call rate < 0.95 over the fine-mapping panel were omitted (total n = 34), as were all Mendelian incompatible genotypes within families. DNA samples in Sample 2 were typed for 4,310,364 markers on the Illumina HumanOmni5 Beadchip, version 1.0. Genotypes were called using Illumina GenomeStudio. Analysis was restricted to the region of chromosome 2 genotyped for Sample 1. Samples were required to have call rate ≥ 0.98, and samples with 10th percentile of GenCall scores < 0.42 over all markers passing initial QC (call rate ≥ 0.90, p > 10−6 for deviation from HWP) were subject to manual inspection of fluorescence intensity data (B allele frequency) plotted against map position of at least one autosome. Before analysis, markers were subject to a more stringent QC (call rate ≥ 0.98, MAF ≥ 0.01). 
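The marker QC thresholds described above can be expressed as a simple filter. This is a minimal sketch, assuming per-marker summary statistics (call rate, minor allele frequency, and Hardy-Weinberg exact-test p value) have already been computed by a tool such as PLINK; the `MarkerStats` class, the function name, and the example markers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkerStats:
    name: str
    call_rate: float
    maf: float
    hwe_p: float  # exact-test p value for deviation from HWP, precomputed


def passes_marker_qc(m, min_call_rate=0.9, min_maf=0.005, min_hwe_p=1e-6):
    """Apply the Sample 1 fine-mapping thresholds by default."""
    return (m.call_rate >= min_call_rate
            and m.maf >= min_maf
            and m.hwe_p > min_hwe_p)


# Hypothetical markers for illustration only.
markers = [
    MarkerStats("snp_a", call_rate=0.995, maf=0.21, hwe_p=0.43),
    MarkerStats("snp_b", call_rate=0.92, maf=0.008, hwe_p=0.10),
    MarkerStats("snp_c", call_rate=0.99, maf=0.15, hwe_p=1e-9),
]

kept_sample1 = [m.name for m in markers if passes_marker_qc(m)]
# Stricter call-rate and MAF thresholds used for Sample 2 before analysis.
kept_sample2 = [m.name for m in markers
                if passes_marker_qc(m, min_call_rate=0.98, min_maf=0.01)]
```

Keeping the thresholds as parameters makes it easy to apply the initial and the more stringent pre-analysis filters with the same function.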
Genetic sex was verified by means of X-chromosome heterozygosity and percentage of successfully called Y-chromosome genotypes. Relationships and unintentional (non-)duplicates were checked by means of PLINK’s --genome function[44] applied to a sparse set (pairwise r2 < 0.1 between markers) of common polymorphisms (MAF ≥ 0.05). Unreported relationships more distant than second-degree were classified as unrelated. We augmented the fine-mapping marker panel for Sample 1 by imputation[45]. Because none of the 1000 Genomes Phase 3 populations is representative of our Ugandan genomes, we used a set of Ugandan genomes typed for the HumanOmni5 panel, including Sample 2 as a subset, as a reference for imputation. Haplotypes of 44,542 common variants (MAF ≥ 0.5%) spanning the fine-mapping region ± 500 kb were determined by means of SHAPEIT2[46], including the available parent-offspring duos and trios for more accurate phasing. Haplotypes from a subset of 302 unrelated individuals from the HumanOmni5-genotyped sample composed the reference data set for imputation into the discovery cohort (n = 892, including some without a PTST– phenotype). Genotypes from the Sample 1 fine-mapping panel were prephased using SHAPEIT2 before imputation in 5-Mb segments with 500 kb overlap using IMPUTE2[45]. Imputation yielded an augmented panel of 40,335 SNPs with IMPUTE2 imputation quality score[47] ≥ 0.5. We carried out a principal components analysis (PCA) on the 471 Sample 2 individuals passing QC to detect ancestry outliers and to correct for population structure during association analysis (see below). A genome-wide panel of 160,884 common (MAF ≥ 0.05) independent (pairwise r2 < 0.1) variants passing marker QC from the HumanOmni5 panel was chosen, excluding four genomic regions with extensive linkage disequilibrium, which can create artifactual principal components (PCs)[48]: Chr. 2, 135–137 megabasepairs (Mb) (lactase gene LCT), Chr. 6, 27–35 Mb (HLA region), Chr. 8, 6–16 Mb (inversion polymorphism), and Chr. 17, 40–45 Mb (extensive LD in admixed populations). We calculated principal components (PCs) by two different methods: first, using EIGENSOFT[49], which assumes that all individuals are unrelated; and second, using PCAiR[50], which performs PCA on an optimal unrelated subset of the sample and which uses genotype loadings to project PCs for relatives. Although the PCAiR approach is more valid for family data, we used EIGENSOFT PCs in association analyses because they resulted in a smaller genomic control (GC) parameter value (see below). To confirm African ancestry, a second PCA was conducted with addition of 119 unrelated individuals from the HapMap CEU, YRI, CHB and JPT samples, using a panel of 130,718 markers in common between the Omni5 PCA panel described above and the 1000 Genomes Phase 1, version 3 data set.

Statistical methods

Association analysis on the imputed Sample 1 genotype data was conducted by means of logistic regression, using the generalized estimating equations (GEE) model implemented in the gee package in R to allow for correlations within families. The number of minor alleles from genotyped markers, or the allele dosage data (expected number of minor alleles) from imputation, were used as a genotype predictor under an additive model (on the logarithmic scale). The “exchangeable” correlation structure was specified, in which all relatives within a family were assumed equally correlated; if this model failed to converge to a stable estimate, the “independence” structure was used, which provides a still valid, albeit less powerful, approach. For this study, because of the limited sample size and the complexity of the statistical model, families were defined by grouping individuals connected by first-degree relationships; within-household correlations owing to common household environment, and correlations between more distant family members, were not modeled. Only imputed SNPs with MAF ≥ 0.03 were tested for association, after it was discovered that the GEE algorithm had difficulty converging with some of the rarer imputed variants, even under the “independence” correlation structure, whereas models on genotyped SNPs with MAF ≥ 0.01 converged well. We chose GEE for association analysis, instead of a generalized linear mixed model (GLMM) adjusting for all relationships, not only because of model complexity but also because there were very few second- and third-degree relative pairs in either sample, and because complex correlations due to sharing the same household, the same bed, etc., are difficult to model by GLMM but are estimated from the data by GEE. With only a regional marker map, we were unable to assess inflation of test statistics by means of the genomic control parameter λ[51]. 
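The genotype predictor used in the additive model can be made concrete with a short sketch. For an imputed marker, the allele dosage is the expected number of minor alleles given the three posterior genotype probabilities; under an additive model on the log-odds scale, each additional allele multiplies the odds by the per-allele odds ratio. The function names and the example probabilities are hypothetical, and the input layout (three probabilities per individual) is an assumption about the imputation output.

```python
import math


def allele_dosage(p_hom_major, p_het, p_hom_minor):
    """Expected number of minor alleles from posterior genotype probabilities."""
    total = p_hom_major + p_het + p_hom_minor
    if not math.isclose(total, 1.0, abs_tol=1e-3):
        raise ValueError("genotype probabilities should sum to 1")
    return p_het + 2.0 * p_hom_minor


def additive_odds_multiplier(per_allele_or, dosage):
    """Odds multiplier implied by an additive (log-linear) genetic model."""
    return per_allele_or ** dosage


# Hypothetical imputed genotype: 10% major homozygote, 60% het, 30% minor homozygote.
dosage = allele_dosage(0.1, 0.6, 0.3)
# With a per-allele OR of 2.00, two copies of the allele correspond to 4-fold odds.
two_copy_multiplier = additive_odds_multiplier(2.00, 2.0)
```

For genotyped markers the dosage reduces to the observed allele count (0, 1, or 2), so the same predictor definition covers both cases.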
However, the value of λ over the region of linkage was only 1.007, and the quantile-quantile plot of the results is consistent with no genome-wide inflation (see Results). Because we expect this region to harbor truly associated variants, which would tend to raise λ rather than lower it, we are confident that the type I error was well controlled. Following Sobota et al.[52], we estimated the number of effective independent tests by isolating a set of low-dependence markers, using PLINK’s --indep-pairwise function with an r² threshold of 0.2. This approach yielded 6,246 effective independent tests, giving a nominal p value threshold of 8.0 × 10⁻⁶ for a region-wide significance level of 0.05; for the Omni5 marker set, it yielded 6,103 independent tests and a nominal threshold of p = 8.1 × 10⁻⁶.

Association analysis on Sample 2 was carried out in similar fashion, except that the fourth PC from the EIGENSOFT PCA was included as a predictor to adjust for population structure. The first 20 PCs were evaluated for association with the PTST– phenotype in Sample 2. PCs 3 and 4 were each significant when included singly, but PC 3 had a nonsignificant effect in the presence of PC 4, and thus only PC 4 was included as a covariate in association analysis. A sparse genome-wide scan of about 10,000 SNPs from the Sample 2 Omni5 panel, excluding regions implicated in TB susceptibility (the HLA region, but also all genes mentioned in two previous reports[28, 53]), was conducted to obtain an estimate of the genomic control (GC) parameter λ for genome-wide inflation of test statistics[51]. Because λ from the final analysis was greater than 1.05, we corrected association p values for genome-wide inflation by the method of Bacanu et al.[54]. The cause of the overall inflation of test statistics was uncertain, and we examined two possibilities. First, inflation might have been caused by including markers with low minor allele counts, but the value of λ was not reduced by raising the minimum MAF to several values from 0.03 to 0.10. 
Second, we compared λ from association analyses adjusting for PC 4 from EIGENSOFT, and adjusting for four PCs from PCAiR that were significantly associated with PTST–, and found that adjusting for EIGENSOFT PC 4 resulted in a smaller value of λ[54].

We conducted meta-analysis by the inverse-variance-weighted fixed-effect method, and calculated Cochran’s Q statistic and I² to assess effect heterogeneity between the two samples, using a custom script for the statistical software R.

Haplotypes of 13 SNPs in genes GTDC1 and ZEB2 were determined for both Samples 1 and 2 by means of SHAPEIT2, with the --duohmm option to make use of parent/child relationships. Haplotypes from two sets of SNPs showing linkage disequilibrium, a set of 11 and a set of two, were used for haplotype-based association analysis. Best-guess haplotypes from these two sets were counted, and haplotypes with sample frequencies between 3% and 20% were tested for association with PTST– vs. all other pooled haplotypes, under the same GEE regression model used for single-marker association testing.

We used the restricted maximum likelihood (REML) estimation approach in the GCTA software package[55] to partition genetic variance in Sample 2 into the part explained by the specific region on Chr. 2 (position 117,911,357 to 168,853,091) and, separately, the part explained by all other genotyped SNPs across the genome. Briefly, we filtered SNPs (excluding SNPs with call rate < 0.95 or minor allele frequency < 0.05), generated separate genetic relationship matrices for the region on Chr. 2 and the rest of the genome, then performed REML estimation using the expectation maximization fitting method to estimate the proportion of “risk” for being PTST– explained by each of the two genetic partitions (i.e., the region on Chr. 2 and all other SNPs). REML analysis was adjusted for age, sex, and HIV status. 
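The fixed-effect meta-analysis and heterogeneity statistics described above can be illustrated with a minimal sketch. The authors used a custom R script that is not reproduced here; the Python function below, its name `fixed_effect_meta`, and the example inputs are assumptions for illustration only. It combines per-sample effect estimates (for example, log odds ratios) by inverse-variance weighting and reports Cochran's Q and I².

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-sample effect estimates (e.g. log odds ratios).
    ses:   per-sample standard errors of those estimates.
    """
    # Each sample is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)

    # Pooled effect and its standard error.
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)

    # Two-sided p value from the standard normal distribution.
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1

    # I^2: percentage of variation attributed to heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": z, "p": p, "Q": q, "I2": i2}


# Hypothetical per-sample estimates, not values from the study.
result = fixed_effect_meta(betas=[0.50, 0.30], ses=[0.10, 0.20])
print(result)
```

With only two samples, Q has a single degree of freedom, so I² is a coarse indicator and is best read alongside the per-sample effect directions.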
Differences in proportions in body-mass wasting between PTST– and non-PTST– subsets in the sample measured for body mass composition were evaluated by Fisher’s exact test.
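Returning to the multiple-testing threshold and genomic-control steps described earlier in this section, the arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold is taken to be the significance level divided by the number of effective tests (which matches the reported 8.0 × 10⁻⁶ for 6,246 tests), and the correction shown is the usual genomic-control rescaling of 1-df chi-square statistics by λ. The exact variant of the Bacanu et al. procedure applied by the authors is not specified in the text, and all function names are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def regionwide_threshold(alpha, n_effective_tests):
    """Nominal p value threshold for a given number of effective tests."""
    return alpha / n_effective_tests


def genomic_control_lambda(chisq_stats):
    """Ratio of the observed median 1-df chi-square statistic to its null median."""
    return statistics.median(chisq_stats) / CHI2_1DF_MEDIAN


def chisq1_pvalue(x):
    """Upper-tail p value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def gc_corrected_pvalues(chisq_stats, min_lambda=1.0):
    """Rescale statistics by lambda (never below min_lambda) and return p values."""
    lam = max(genomic_control_lambda(chisq_stats), min_lambda)
    return [chisq1_pvalue(x / lam) for x in chisq_stats]


# Threshold for the 6,246 effective tests reported in the text.
print(regionwide_threshold(0.05, 6246))
```

In practice λ would be estimated from a set of markers assumed to be null, such as the sparse genome-wide scan described above, and then applied to the statistics from the region of interest.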

Annotating strongly associated variants

We explored the likely effects of the most significantly associated genetic variants using information from several well-known databases. We obtained Combined Annotation-Dependent Depletion (CADD) scores[56], a measure of deleteriousness based on evolutionary conservation and on numerous measures of regulatory importance and predicted protein effects, from the CADD Web site (http://cadd.gs.washington.edu/). We report CADD scores as PHRED-like scores, in which a score of 10x indicates pathogenicity within the top 100 × 10⁻ˣ percent of possible variants genome-wide (for example, a score of 20 corresponds to the top 1%). We extracted specific information on chromatin structure, effects on DNA regulatory motifs, and association results from other GWAS and expression quantitative trait locus (eQTL) studies from the HaploReg v4.1[57] Web site (http://archive.broadinstitute.org/mammals/haploreg/haploreg.php), specifying the ChromHMM (Core 15-state model) algorithm for chromatin structure determination. We searched the GTEx database[58] for evidence of eQTL activity of prominent PTST–associated markers in 53 human tissues. Finally, we acquired a measure of overall evidence for a regulatory role from RegulomeDB ([59]; http://www.regulomedb.org/).

Supplementary Materials (Microsoft Word format, .doc):
Supplementary Figure 1. Quantile-quantile plots of analyses for (A) Sample 1 and (B) Sample 2.
Supplementary Figure 2. Linkage disequilibrium structure of markers selected for haplotype-based association analysis in the GTDC1/ZEB2 region.
Supplementary Figure 3. LocusZoom plots of chromosome 2 regions containing the most significantly associated markers of (A) Sample 1 and (B) Sample 2.
Supplementary Table 2. Major association results from Sample 1.
Supplementary Table 3. Major association results from Sample 2.

Supplementary Table 1 (Microsoft Excel format, .xls): Full results from chromosome 2 association analyses and meta-analysis.

Supplementary Table 4 (Microsoft Excel format, .xls): Summary of annotation data on strongly associated markers.
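The PHRED-like scaling of CADD scores mentioned in the annotation paragraph above can be made concrete with a small sketch. The function names are hypothetical and the conversion simply restates the definition given in the text (a score of 10x corresponds to the top 100 × 10⁻ˣ percent of variants); it does not query the CADD service.

```python
import math


def cadd_phred_to_top_fraction(phred):
    """Fraction of possible variants ranked at least as deleterious as this score."""
    return 10.0 ** (-phred / 10.0)


def top_fraction_to_cadd_phred(fraction):
    """PHRED-like score corresponding to a given top fraction of variants."""
    return -10.0 * math.log10(fraction)


# Scores of 10, 20 and 30 map to the top 10%, 1% and 0.1% respectively.
for score in (10, 20, 30):
    print(score, cadd_phred_to_top_fraction(score) * 100.0)
```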

57 in total

1. The power of genomic control.

Authors: S A Bacanu; B Devlin; K Roeder
Journal: Am J Hum Genet Date: 2000-05-08 Impact factor: 11.025

2. Hypocholesterolemia: a major risk factor for developing pulmonary tuberculosis?

Authors: Carlos Pérez-Guzmán; Mario H Vargas
Journal: Med Hypotheses Date: 2006-02-24 Impact factor: 1.538

3. Improved whole-chromosome phasing for disease and population genetic studies.

Authors: Olivier Delaneau; Jean-Francois Zagury; Jonathan Marchini
Journal: Nat Methods Date: 2013-01 Impact factor: 28.547

4. Age, sex, and nutritional status modify the CD4+ T-cell recovery rate in HIV-tuberculosis co-infected patients on combination antiretroviral therapy.

Authors: Amara E Ezeamama; Ezekiel Mupere; James Oloya; Leonardo Martinez; Robert Kakaire; Xiaoping Yin; Juliet N Sekandi; Christopher C Whalen
Journal: Int J Infect Dis Date: 2015-04-21 Impact factor: 3.623

5. Genome-wide SNP-based linkage analysis of tuberculosis in Thais.

Authors: S Mahasirimongkol; H Yanai; N Nishida; C Ridruechai; I Matsushita; J Ohashi; S Summanapan; N Yamada; S Moolphate; C Chuchotaworn; A Chaiprasert; W Manosuthi; P Kantipong; S Kanitwittaya; T Sura; S Khusmith; K Tokunaga; P Sawanpanyalert; N Keicho
Journal: Genes Immun Date: 2008-10-09 Impact factor: 2.676

6. A genome wide association study of pulmonary tuberculosis susceptibility in Indonesians.

Authors: Eileen Png; Bachti Alisjahbana; Edhyana Sahiratmadja; Sangkot Marzuki; Ron Nelwan; Yanina Balabanova; Vladyslav Nikolayevskyy; Francis Drobniewski; Sergey Nejentsev; Iskandar Adnan; Esther van de Vosse; Martin L Hibberd; Reinout van Crevel; Tom H M Ottenhoff; Mark Seielstad
Journal: BMC Med Genet Date: 2012-01-13 Impact factor: 2.103

7. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants.

Authors: Lucas D Ward; Manolis Kellis
Journal: Nucleic Acids Res Date: 2011-11-07 Impact factor: 16.971

Review 8. Immunological mechanisms of human resistance to persistent Mycobacterium tuberculosis infection.

Authors: Jason D Simmons; Catherine M Stein; Chetan Seshadri; Monica Campo; Galit Alter; Sarah Fortune; Erwin Schurr; Robert S Wallis; Gavin Churchyard; Harriet Mayanja-Kizza; W Henry Boom; Thomas R Hawn
Journal: Nat Rev Immunol Date: 2018-09 Impact factor: 108.555

9. Genome scan of M. tuberculosis infection and disease in Ugandans.

Authors: Catherine M Stein; Sarah Zalwango; LaShaunda L Malone; Sungho Won; Harriet Mayanja-Kizza; Roy D Mugerwa; Dmitry V Leontiev; Cheryl L Thompson; Kevin C Cartier; Robert C Elston; Sudha K Iyengar; W Henry Boom; Christopher C Whalen
Journal: PLoS One Date: 2008-12-31 Impact factor: 3.240

10. Zeb2 recruits HDAC-NuRD to inhibit Notch and controls Schwann cell differentiation and remyelination.

Authors: Lai Man Natalie Wu; Jincheng Wang; Andrea Conidi; Chuntao Zhao; Haibo Wang; Zachary Ford; Liguo Zhang; Christiane Zweier; Brian G Ayee; Patrice Maurel; An Zwijsen; Jonah R Chan; Michael P Jankowski; Danny Huylebroeck; Q Richard Lu
Journal: Nat Neurosci Date: 2016-06-13 Impact factor: 24.884
