Assessing a transmission network of Mycobacterium tuberculosis in an African city using single nucleotide polymorphism threshold analysis.

Edriss Yassine¹, Ronald Galiwango², Willy Ssengooba³,⁴, Fred Ashaba⁵, Moses L Joloba⁵, Sarah Zalwango⁵, Christopher C Whalen², Frederick Quinn¹.

Abstract

Tuberculosis (TB) is the leading cause of death in humans from a single infectious agent worldwide, with approximately two billion people latently infected with the bacterium Mycobacterium tuberculosis. The currently accepted method for controlling the disease is Tuberculosis Directly Observed Treatment, Short-course (TB-DOTS). Because this program is not preventive and individuals may transmit disease before diagnosis, a better understanding of disease transmission is essential. Using whole-genome sequencing and single nucleotide polymorphism (SNP) analysis, we analyzed the genomes of 145 M. tuberculosis clinical isolates from active TB cases in the Rubaga Division of Kampala, Uganda. These isolates grouped into M. tuberculosis complex (MTBC) lineages 1, 2, 3, and 4, with most isolates grouping into lineage 4. Possible transmission pairs separated by ≤12 SNPs were identified in lineages 1, 3, and 4, with most transmission occurring in lineages 3 and 4. Furthermore, codon changes caused by specific SNPs in prominent virulence genes, including plcA and plcB, could indicate potentially important modifications in protein function. Combining this analysis with corresponding epidemiological data may provide a blueprint for integrating public health interventions to decrease TB transmission in a region.

Keywords: Mycobacterium tuberculosis; single nucleotide polymorphism; social network; transmission; tuberculosis

Year: 2021 PMID: 34180596 PMCID: PMC8209283 DOI: 10.1002/mbo3.1211

Source DB: PubMed Journal: MicrobiologyOpen ISSN: 2045-8827 Impact factor: 3.139

INTRODUCTION

Tuberculosis (TB) in humans is caused primarily by infection with Mycobacterium tuberculosis (Mtb). Most TB disease arises when the bacilli are transmitted person‐to‐person via the aerosol route as an individual with active infection coughs, sneezes, or speaks. Once a nearby individual inhales the mycobacteria‐containing droplets, infection is typically established in the lungs; however, the bacteria can disseminate to other organs such as the kidneys, spine, and brain (Gupta et al., 2011; Yates et al., 2016). The World Health Organization (WHO) estimates that in 2018 there were 10 million new TB cases and 1.5 million deaths (WHO, 2019). Apart from COVID‐19, TB is today the world's leading cause of death from a single infectious agent. An estimated two billion individuals may be latently infected, of whom approximately 5%–10% are at risk of reactivation TB during their lifetime (WHO, 2019). Although the overall outlook for disease control has been reported to be improving, with incidence and mortality rates declining by 2% and 3%, respectively, since 2000, progress remains short of the goals set by the WHO End TB Strategy (WHO, 2015, 2019). In most parts of the world, public health organizations routinely screen household contacts for M. tuberculosis transmission (Buu et al., 2010; Warria et al., 2020), because the household was long thought to be the primary setting for dissemination. More recent epidemiological studies, however, show that M. tuberculosis transmission is more likely to occur outside the household (Buu et al., 2010; Yates et al., 2016). Outbreak investigations show that M. tuberculosis can be transmitted in social settings (Auld et al., 2018; Pinho et al., 2020) and at other community events (Cavalcante et al., 2010; Verver et al., 2004), although how often transmission occurs in these non‐household settings is not known.
A more robust understanding of the transmission process would therefore help to identify infected individuals early in the disease course, preventing onward transmission and subsequent disease (Meertens et al., 2013). The genome of M. tuberculosis provides a useful means of determining diversity within the species. Eight global M. tuberculosis complex (MTBC) lineages have currently been identified: 1, Indo‐Oceanic; 2, East Asian (Beijing); 3, East African Indian; 4, Euro‐American; 5, West Africa I; 6, West Africa II; 7, Ethiopia‐Horn of Africa; and 8, African Great Lakes (Coll et al., 2014; Semuto Ngabonziza et al., 2020). Lineage matters for implementing control measures because different lineages may correlate with different epidemiologic and disease outcomes (Ford et al., 2013; Hernández‐Pando et al., 2003). Whole‐genome sequencing (WGS) allows researchers to examine an organism's genome down to the single nucleotide, and WGS has evolved from primarily a research tool into a clinical tool that aids the diagnosis and surveillance of diseases, including TB (Meehan et al., 2019). Pertinent to this study, WGS of M. tuberculosis has also allowed investigators to determine genetic diversity within the species, identify genomic variants potentially involved in pathogenesis (Sharma et al., 2017), and reveal transmission patterns based on the detection of single nucleotide polymorphisms (SNPs). A SNP is a variation of a single nucleotide base at one position in a DNA sequence. Generally, a variant is considered a SNP when more than 1% of the population lacks the common nucleotide at that position, through substitution or deletion (Jayakanthan et al., 2019). SNPs can occur in both coding and non‐coding regions and, depending on the substitution, may or may not change the amino acid sequence. Examples of single SNP differences in M. tuberculosis that cause important differences in gene function include changes in katG, mabA, and Rv1772 that lead to resistance to isoniazid, one of the primary TB drugs (Ramaswamy et al., 2003).

There is no shortage of studies that have used WGS and SNP‐based threshold analysis to assess TB transmission patterns. In a well‐known study in the United Kingdom, Walker et al. used these methods to determine the number of SNPs between genomes that would suggest transmission of disease between individuals (Walker et al., 2013). Lee et al. (2015) used WGS to detect the reemergence, in an outbreak in a small Arctic village, of several M. tuberculosis strains previously thought to have been controlled. Roetzer et al. (2013) used WGS and SNP threshold analysis in a longitudinal study to confirm the superiority of this method for determining transmission and improving surveillance. Uganda is one of the 30 high TB burden countries identified by the WHO, with 86,000 new cases and an incidence of 200 per 100,000 population in 2018 (WHO, 2019; Verver et al., 2004). In this study, we used WGS and SNP analysis of M. tuberculosis isolates collected from active TB cases within a Ugandan social network study (Sekandi et al., 2015) to assess disease transmission, comparing the number of SNPs among the isolates with the SNP threshold method. These transmission data can be combined with epidemiological data to identify possible transmission hotspots within Ugandan social networks. In addition, we identified SNP differences in key virulence genes that could potentially enhance or limit transmission.
Thus, in addition to providing an improved understanding of TB transmission within a population, SNP data such as these could be used to develop improved diagnostic tests, identify new targets for novel drug and vaccine development, and ultimately improve implementation of future public health intervention efforts to decrease the TB disease burden.

MATERIALS AND METHODS

Study design

This cross‐sectional transmission study was conducted in the Rubaga Division, in the western part of Kampala, Uganda. According to the Uganda Bureau of Statistics' National Population and Housing Census 2014, Rubaga has a population of approximately 380,000 (UBOS, 2017). Tuberculosis is a growing problem in this area of the city: the prevalence of smear‐positive TB is estimated at 1025 per 100,000 individuals, and a third of cases are also HIV‐positive (Sekandi et al., 2014). Study details, including the sampling strategy and study population demographics, are described by Kakaire et al. (2020). Briefly, adults (aged 15 years and older, the definition used by most African countries) who presented with TB symptoms and resided in the Rubaga Division underwent a clinical examination, and two sputum samples were examined by acid‐fast staining. Individuals were included in the study if they showed clinical symptoms of pulmonary TB and had two positive sputum smears. Drug resistance of the isolates was beyond the scope of this analysis.

Growth and DNA isolation of clinical isolates

Culturing and manipulation of M. tuberculosis isolates were performed in the College of American Pathologists (CAP)‐accredited Mycobacteriology (BSL‐3) Laboratory in the Department of Medical Microbiology, Makerere University College of Health Sciences, Kampala, Uganda. Isolates were cultured, and frozen bacterial stocks were prepared for research use. Clinical isolates were sub‐cultured on Middlebrook 7H10 agar (Becton and Dickinson) and incubated at 37°C in 5% CO2, with growth observed daily for four weeks. The bacteria were harvested and suspended in absolute ethanol (Sigma Aldrich) for inactivation. Chromosomal DNA was then extracted following the ZR Fungal/Bacterial DNA Microprep kit (Zymo Research) protocol with one modification: because a bead‐beater instrument was not available, bacterial cells in ZR BashingBead Lysis tubes were lysed by shaking on a vortexer for 5 min. After elution, the DNA concentration of each sample was measured using a NanoDrop spectrophotometer. The DNA extracts were then shipped at ambient temperature to the Department of Infectious Diseases, University of Georgia, College of Veterinary Medicine, Athens, Georgia.
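The concentration measurement can be illustrated with a short Python sketch. It assumes the conventional conversion for double‐stranded DNA (an absorbance of 1.0 at 260 nm over a 10 mm path corresponds to roughly 50 ng/μl) and the usual A260/A280 purity check; the exact calculation performed by the NanoDrop software used here is not described in the source, the function names are hypothetical, and the readings are invented.

```python
# Sketch: dsDNA concentration and purity from UV absorbance.
# ASSUMPTION: conventional dsDNA factor (A260 of 1.0 over 10 mm ~ 50 ng/ul);
# the NanoDrop software's own calculation is not described in the source.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, path_length_mm: float = 10.0) -> float:
    """Return an estimated dsDNA concentration (ng/ul) from absorbance at 260 nm."""
    if path_length_mm <= 0:
        raise ValueError("path length must be positive")
    # Normalise the reading to a 10 mm path before applying the conversion factor.
    a260_10mm = a260 * (10.0 / path_length_mm)
    return a260_10mm * DSDNA_NG_PER_UL_PER_A260


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values near 1.8 are generally taken to indicate clean DNA."""
    if a280 == 0:
        raise ValueError("A280 must be non-zero")
    return a260 / a280


# Hypothetical readings, not measurements from this study.
print(dsdna_concentration(0.4))                      # about 20 ng/ul
print(dsdna_concentration(0.4, path_length_mm=1.0))  # about 200 ng/ul
print(round(purity_ratio(0.4, 0.22), 2))
```

A ratio well below 1.8 would usually prompt a check for protein or phenol carry‐over before library preparation.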

Sterility testing

Sterility testing of DNA samples was performed before WGS following the Centers for Disease Control and Prevention protocol. Each DNA sample was resuspended in 20 μl of PBS, and 1 μl of each sample was spotted onto Middlebrook 7H10 agar (Becton and Dickinson) Petri dishes. One microliter of Mycobacterium bovis BCG was used as a positive growth control. Plates were incubated at 37°C in 5% CO2 for six weeks and observed for growth. After the DNA samples were confirmed negative for growth, the remainder of each sample was transferred to 96‐well plates and stored at −20°C until processed for DNA sequencing.

Whole‐genome sequencing (WGS) and single nucleotide polymorphism (SNP) analyses

Sequencing libraries were prepared using Nugen Ultralow V2 or Nextera XT V2 following the manufacturers' recommended protocols. The libraries were sequenced on a NextSeq 500 using mid‐output V2 chemistry (2 × 150 bp) or on a MiSeq using V2 chemistry (2 × 250 bp). SNP analysis was conducted using BioNumerics 7.6.3 (Applied Maths NV). Reference‐guided assemblies were created using BioNumerics Reference Mapper 1.2.3 (Pouseele and Supply, 2015), with M. tuberculosis H37Rv (NCBI NC_000962.3) as the reference genome for alignment. Base‐calling settings were as follows: minimum total coverage = 3, minimum forward coverage = 1, minimum reverse coverage = 1, single base threshold = 0.75, double base threshold = 0.85, triple base threshold = 0.95, and gap threshold = 0.5. Isolates with an average genome coverage below 50× were to be re‐sequenced (no sequences fell into this category). Reference‐guided assemblies were compared using the BioNumerics 7.6.3 SNP analysis filters. To be retained in the analysis, a SNP had to meet the following criteria: a total coverage of at least five reads, no ambiguous bases (bases other than A, T, C, or G), no gaps, and no other called SNP within 12 base pairs. Non‐informative SNPs were also excluded from further analysis. The number of high‐quality SNPs between two isolates was recorded as their SNP distance. Isolates were assigned to lineages by the presence of predefined SNPs unique to each lineage. For the SNP threshold method, we applied the Walker et al. cut‐off of ≤12 SNPs as the determinant of relatedness between two isolates (Walker et al., 2013). Although Walker et al. classified isolate pairs separated by 6–12 SNPs as indeterminate, the ≤12 SNP threshold was chosen to encompass all possibly linked pairs of isolates.
If necessary, the principal investigators can filter out indeterminate pairs by comparing the SNP data with the separately collected epidemiological data. Sequences from a total of 143 isolates were analyzed using the BioNumerics pipeline.
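The SNP filters, SNP distance, and threshold steps described above can be sketched in Python as follows. This is not the BioNumerics implementation: the input layout, the reading of "a total coverage of five reads" as a minimum depth, the inclusive 12 bp proximity window, and the interpretation of non‐informative SNPs as sites where all isolates share one base are assumptions, and the three isolates and their calls are hypothetical. The Walker et al. (2013) bands are shown next to the inclusive ≤12 SNP cut‐off used in this study.

```python
from itertools import combinations

# ASSUMED filter parameters, read from the Methods text (not BioNumerics internals).
VALID_BASES = set("ACGT")  # ambiguity codes and "-" gaps fail this check
MIN_DEPTH = 5              # "a total coverage of five reads", read as a minimum
PROXIMITY_BP = 12          # SNPs within 12 bp of another called SNP are dropped


def retained_positions(calls):
    """Return sorted SNP positions passing all filters.

    calls: {isolate_id: {position: (base, depth)}} (hypothetical input layout).
    """
    candidates = sorted({pos for per_isolate in calls.values() for pos in per_isolate})
    # Coverage, ambiguity and gap filters: every isolate needs a clean, covered call.
    clean = [
        pos for pos in candidates
        if all(
            pos in per_isolate
            and per_isolate[pos][0] in VALID_BASES
            and per_isolate[pos][1] >= MIN_DEPTH
            for per_isolate in calls.values()
        )
    ]
    # Proximity filter, checked against every candidate SNP position.
    spaced = [
        pos for pos in clean
        if all(abs(pos - other) > PROXIMITY_BP for other in candidates if other != pos)
    ]
    # ASSUMPTION: "non-informative" = all isolates carry the same base at the site.
    return [pos for pos in spaced if len({calls[i][pos][0] for i in calls}) > 1]


def snp_distances(calls):
    """Pairwise SNP distance = number of retained positions where two isolates differ."""
    kept = retained_positions(calls)
    dist = {}
    for a, b in combinations(sorted(calls), 2):
        d = sum(calls[a][pos][0] != calls[b][pos][0] for pos in kept)
        dist[(a, b)] = dist[(b, a)] = d
    return dist


def walker_band(snps):
    """Walker et al. (2013) bands as summarised in the Methods."""
    if snps <= 5:
        return "consistent with recent transmission"
    if snps <= 12:
        return "indeterminate"
    return "recent transmission unlikely"


def possible_transmission_pair(snps, threshold=12):
    """Inclusive cut-off used in this study to capture all possibly linked pairs."""
    return snps <= threshold


# Hypothetical calls for three isolates at seven candidate positions.
calls = {
    "A": {100: ("A", 30), 500: ("G", 25), 505: ("T", 25), 900: ("C", 40),
          1500: ("T", 30), 2000: ("T", 30), 3000: ("G", 30)},
    "B": {100: ("A", 28), 500: ("A", 22), 505: ("T", 20), 900: ("C", 35),
          1500: ("T", 3), 2000: ("C", 30), 3000: ("G", 30)},
    "C": {100: ("G", 31), 500: ("G", 27), 505: ("C", 30), 900: ("N", 33),
          1500: ("C", 30), 2000: ("C", 30), 3000: ("G", 30)},
}
print(retained_positions(calls))
dist = snp_distances(calls)
for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, dist[pair], walker_band(dist[pair]), possible_transmission_pair(dist[pair]))
```

In this toy input, positions 500 and 505 are dropped by the proximity filter, 900 by the ambiguous N call, 1500 by low depth, and 3000 as non‐informative, leaving two usable sites.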

Network analysis

Transmission networks were created using R statistical software (Vienna, Austria) and the data visualization package qgraph (Epskamp et al., 2012). SNP distance matrices output by the BioNumerics pipeline were supplied to qgraph, and the desired output settings (colors and SNP ranges) were selected to create the transmission networks.
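qgraph is an R package, so the following Python sketch does not reproduce its layout or drawing. It only illustrates, under that caveat, how a pairwise SNP distance matrix can be turned into an edge list colored by the SNP ranges given in the Figure 6 caption (0–12 red, 13–50 blue, 51–100 green, >100 black). The function names are hypothetical, and the fourth isolate in the example, "X", is an invented placeholder added to the lineage 3, cluster 2 distances from Table A1.

```python
from itertools import combinations

# SNP ranges and colours as given in the Figure 6 caption.
COLOUR_BANDS = [(12, "red"), (50, "blue"), (100, "green")]


def edge_colour(snps):
    """Map a pairwise SNP distance to its Figure 6 colour class."""
    for upper, colour in COLOUR_BANDS:
        if snps <= upper:
            return colour
    return "black"  # more than 100 SNPs


def network_edges(ids, matrix):
    """Convert a symmetric SNP distance matrix into (id_a, id_b, snps, colour) edges."""
    edges = []
    for i, j in combinations(range(len(ids)), 2):
        snps = matrix[i][j]
        edges.append((ids[i], ids[j], snps, edge_colour(snps)))
    return edges


# Lineage 3, cluster 2 distances (Table A1) plus a hypothetical isolate "X".
ids = ["20060", "20061", "18346", "X"]
matrix = [
    [0, 0, 2, 60],
    [0, 0, 2, 61],
    [2, 2, 0, 140],
    [60, 61, 140, 0],
]
for edge in network_edges(ids, matrix):
    print(edge)
```

The resulting edge list could then be passed to any graph‐drawing tool; filtering it to red edges gives the possible transmission pairs.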

Mycobacterium tuberculosis gene SNP search

SNPs present in the M. tuberculosis isolates were identified using UNIX command‐line tools. Once the position of each SNP was obtained, the specific codon changes were visualized using the Integrative Genomics Viewer (IGV) (Broad Institute, MA).
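The step of placing each SNP within a gene can be sketched in Python as an interval lookup. The sketch assumes non‐overlapping gene intervals in 1‐based genome coordinates; the gene names and coordinates in the example are hypothetical placeholders, not H37Rv annotations, and this is not the command‐line workflow used in the study. It also shows why strand matters: for a gene on the reverse strand, codon 1 starts at the gene's highest coordinate, so codon numbers are counted down from the end.

```python
import bisect
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    name: str
    strand: str  # "+" (forward) or "-" (reverse)


def build_index(genes):
    """Sort genes by start so they can be searched with bisect (assumes no overlaps)."""
    ordered = sorted(genes, key=lambda g: g.start)
    return ordered, [g.start for g in ordered]


def gene_at(index, position) -> Optional[Gene]:
    """Return the gene whose interval contains position, or None if intergenic."""
    ordered, starts = index
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and ordered[i].start <= position <= ordered[i].end:
        return ordered[i]
    return None


def codon_number(gene, position):
    """1-based codon index of position within gene, counted from the start codon."""
    if gene.strand == "+":
        return (position - gene.start) // 3 + 1
    # Reverse-strand genes are read from their highest coordinate downwards.
    return (gene.end - position) // 3 + 1


# Hypothetical genes and coordinates, NOT H37Rv annotations.
index = build_index([Gene(2000, 2999, "geneY", "-"), Gene(1000, 1899, "geneX", "+")])
for pos in (1003, 1950, 2996):
    gene = gene_at(index, pos)
    print(pos, gene.name if gene else "intergenic", codon_number(gene, pos) if gene else "-")
```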

RESULTS

MTBC lineages

Of the 143 sequences analyzed, 30 were excluded from further analysis for the following reasons: 25 sequences did not meet the inclusion criteria described in the Methods section; one failed the de novo assembly process, so the pipeline could not assemble the sequenced fragments; three contained mixed genomic material from more than one bacterial species; and one showed general sequencing failure. After these exclusions, 113 isolates were included in the final SNP analysis (Figure 1).

FIGURE 1

UPGMA rooted tree of the 113 isolates included in the analysis, separated into color‐coded MTBC lineages using the BioNumerics SNP analysis pipeline. Branch numbers indicate the SNP distance between isolates. L1 includes 2 isolates; L2, one isolate; L3, 23 isolates; L4, 87 isolates. UPGMA, unweighted pair group method with arithmetic mean

Of the 113 isolates analyzed, two isolates, 17918 and 20850, grouped into MTBC lineage 1, Indo‐Oceanic. SNP analysis determined that the two isolates were identical, with 0 SNPs between them, indicating a possible transmission pair originating from the same individual. A single isolate, 28272, grouped into MTBC lineage 2, East Asian (Beijing). No second isolate formed a transmission pair with it, suggesting that this strain was not linked to any other in the sampled population. A total of 23 isolates grouped into MTBC lineage 3, East African Indian, separating into two transmission clusters of interest (Figure 2). Isolates 16294, 20695, 20839, 19621, 20918, and 22199 formed cluster 1, and isolates 20060, 20061, and 18346 formed cluster 2 (Table A1). The number of SNPs between isolates is shown in Table A1, Figure 3, and Figure 4. Isolates within the two clusters are separated by ≤12 SNPs, which may indicate that the isolates within each cluster were transmitted from a single individual.
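The UPGMA procedure behind the rooted trees can be illustrated with a minimal textbook implementation in Python; it is a sketch, not the BioNumerics code. At each step the two closest clusters are merged, and the distance from the merged cluster to every other cluster is the size‐weighted average of the original distances. Ties are broken alphabetically, an arbitrary choice made here. The example uses the three lineage 3, cluster 2 distances from Table A1.

```python
def upgma(labels, distances):
    """Return UPGMA merge steps as (cluster_a, cluster_b, merge_distance).

    distances: {(a, b): snps} for every unordered pair of labels.
    The node height in the ultrametric UPGMA tree is merge_distance / 2.
    """
    d = {frozenset(pair): float(v) for pair, v in distances.items()}
    sizes = {label: 1 for label in labels}  # number of leaves under each cluster
    merges = []
    while len(sizes) > 1:
        # Closest pair of clusters; ties broken alphabetically (arbitrary choice).
        closest = min(d, key=lambda pair: (d[pair], sorted(pair)))
        a, b = sorted(closest)
        merges.append((a, b, d[closest]))
        new = f"({a},{b})"
        for other in sizes:
            if other not in (a, b):
                # Size-weighted average: the "arithmetic mean" in UPGMA.
                d[frozenset((new, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        # Drop every distance that involves one of the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return merges


labels = ["20060", "20061", "18346"]
distances = {("20060", "20061"): 0, ("20060", "18346"): 2, ("20061", "18346"): 2}
for step in upgma(labels, distances):
    print(step)
```

Here the two identical isolates merge first at 0 SNPs, and 18346 joins them at an average distance of 2 SNPs.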

FIGURE 2

TABLE A1

MTBC lineage 3 clusters showing possible transmission pairs with the number of SNPs between isolates

Cluster number	Isolate pair IDs	Number of SNPs between pairs
1	16294 20695	0
	19621 20918	0
	16294 19621	1
	16294 20918	1
	20695 19621	1
	20695 20918	1
	16294 20839	1
	20695 20839	1
	16294 22199	1
	20695 22199	1
2	20060 20061	0
	20060 18346	2
	20061 18346	2
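One way to derive clusters from pairwise SNP distances is single‐linkage clustering: any two isolates separated by ≤12 SNPs are joined, and clusters are the connected groups that result. The Python sketch below does this with a union‐find structure. It is an illustrative assumption rather than the authors' method (their clusters were identified from UPGMA trees), but applied to the pairs listed in Table A1 it recovers the six‐isolate cluster 1 and the three‐isolate cluster 2.

```python
def transmission_clusters(pairs, threshold=12):
    """Single-linkage clusters: isolates joined whenever a pair has <= threshold SNPs.

    pairs: iterable of (isolate_a, isolate_b, snps). Returns clusters, largest first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, snps in pairs:
        root_a, root_b = find(a), find(b)  # also registers unseen isolates
        if snps <= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for isolate in list(parent):
        groups.setdefault(find(isolate), set()).add(isolate)
    return sorted(groups.values(), key=len, reverse=True)


# Pairs from Table A1 (MTBC lineage 3).
table_a1 = [
    ("16294", "20695", 0), ("19621", "20918", 0), ("16294", "19621", 1),
    ("16294", "20918", 1), ("20695", "19621", 1), ("20695", "20918", 1),
    ("16294", "20839", 1), ("20695", "20839", 1), ("16294", "22199", 1),
    ("20695", "22199", 1), ("20060", "20061", 0), ("20060", "18346", 2),
    ("20061", "18346", 2),
]
for cluster in transmission_clusters(table_a1):
    print(sorted(cluster))
```

Single linkage chains isolates together, so two members of one cluster can differ by more than 12 SNPs as long as intermediate isolates connect them; that is one reason indeterminate pairs still need epidemiological review.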

FIGURE 3

MST of MTBC lineage 3, cluster 1. Numbers between branches indicate SNP distance. Two samples in one circle indicate identical isolates with 0 SNPs. An MST is a subnetwork that shows the strongest connections from a larger set of weighted connections (van Dellen et al., 2018). MSTs are used in epidemiology to delineate the most likely chain of transmission during events such as an outbreak. Here, it is used to infer possible transmission between our isolates. MST, minimum spanning tree

FIGURE 4

MST of MTBC lineage 3, cluster 2. Numbers between branches indicate SNP distance. Two samples in one circle indicate identical isolates with 0 SNPs. An MST is a subnetwork that shows the strongest connections from a larger set of weighted connections (van Dellen et al., 2018). MSTs are used in epidemiology to delineate the most likely chain of transmission during events such as an outbreak. Here, it is used to infer possible transmission between our isolates. MST, minimum spanning tree

UPGMA rooted tree of the 23 isolates grouped into MTBC lineage 3, separated into possible transmission clusters. Branch numbers indicate the SNP distance between isolates. Cluster 1 consists of 6 isolates, and cluster 2 consists of 3 isolates. White circles are isolates that did not group into clusters. UPGMA, unweighted pair group method with arithmetic mean

A total of 87 isolates grouped into MTBC lineage 4, Euro‐American, separating into 19 clusters (Figure 5). Transmission pairs separated by ≤12 SNPs were identified in 17 of these 19 clusters. Clusters with two or more non‐identical isolates are represented as a minimum spanning tree (MST) or a neighbor‐joining tree (NJT) (Figures A1, A2, A3, A4, A5, A6, A7, A8). The number of SNPs between isolates in each cluster is listed in Table A2 and shown in the respective trees. Cluster 13 contained the greatest number of isolate pairs in lineage 4.

FIGURE 5

UPGMA rooted tree of isolates grouped into MTBC lineage 4 and separated into possible transmission clusters. Branch numbers indicate the SNP distance between isolates. There were 19 clusters of interest identified, each represented by a different color. White circles are isolates that did not group into clusters. Clusters 2 and 9 contained only pairs with >12 SNPs and therefore no transmission pairs. UPGMA, unweighted pair group method with arithmetic mean

FIGURE A1

MST of MTBC lineage 4, cluster 1. Numbers between branches indicate SNP distance. MST, minimum spanning tree

FIGURE A2

MST of MTBC lineage 4, cluster 3. Numbers between branches indicate SNP distance. MST, minimum spanning tree

FIGURE A3

MST of MTBC lineage 4, cluster 4. Numbers between branches indicate SNP distance. MST, minimum spanning tree

FIGURE A4

MST of MTBC lineage 4, cluster 6. Numbers between branches indicate SNP distance. MST, minimum spanning tree

FIGURE A5

MST of MTBC lineage 4, cluster 13. Numbers between branches indicate SNP distance. Two samples in one circle indicate identical isolates with 0 SNPs. MST, minimum spanning tree

FIGURE A6

MST of MTBC lineage 4, cluster 15. Numbers between branches indicate SNP distance. MST, minimum spanning tree

FIGURE A7

NJT of MTBC lineage 4, cluster 17. Numbers between branches indicate SNP distance. NJT, neighbor‐joining tree

FIGURE A8

MST of MTBC lineage 4, cluster 18. Numbers between branches indicate SNP distance. MST, minimum spanning tree

TABLE A2

MTBC lineage 4 clusters showing possible transmission pairs with the number of SNPs between isolates

Cluster number	Isolate pair IDs	Number of SNPs between pairs
1	19891 27889	4
3	16607 21779	6
	16607 16608	2
4	19034 26720	7
5	22466 22468	0
6	14956 19895	2
7	15545 15547	0
8	23229 26963	0
10	13577 13578	0
	13577 13579	0
	13578 13579	0
11	20574 20603	0
12	19595 19801	0
	19595 19832	0
	19801 19832	0
13	19077 20606	0
	19077 16732	9
	20606 16732	9
	16732 15634	8
14	14158 14159	0
15	17778 17782	3
16	17549 17551	0
17	18673 20148	12
18	17085 14774	1
19	20253 20634	0
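The minimum spanning trees in Figures 3, 4, and A1–A8 can be illustrated with Kruskal's algorithm, sketched below in Python (an assumed choice; the software used to draw the MSTs is not stated). Edges are added in order of increasing SNP distance unless they would close a cycle. Because Table A2 lists only some pairs for cluster 13, missing pairs are treated here as absent edges; with two tied 9‐SNP edges the tree is not unique, and this sketch keeps whichever tied edge appears first in the input.

```python
def kruskal_mst(edges):
    """Minimum spanning tree (or forest) by Kruskal's algorithm.

    edges: iterable of (u, v, snps). Ties keep input order because sorted() is stable.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, snps in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge does not close a cycle
            parent[root_u] = root_v
            tree.append((u, v, snps))
    return tree


# Lineage 4, cluster 13 pairs from Table A2.
cluster_13 = [
    ("19077", "20606", 0),
    ("19077", "16732", 9),
    ("20606", "16732", 9),
    ("16732", "15634", 8),
]
mst = kruskal_mst(cluster_13)
for edge in mst:
    print(edge)
print("total SNPs on tree:", sum(snps for _, _, snps in mst))
```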

Isolates visualized as a network

Isolates from each lineage form distinct networks, connected according to the number of SNPs between isolate pairs. The isolates in lineage 3 (Figure 6A) and lineage 4 (Figure 6B) both form identifiable networks that can be visualized by the number of SNPs separating the isolates. Each node is connected to another if they are associated with each other within the network. Possible transmission pairs separated by ≤12 SNPs are highlighted in red to show where they fit in the transmission network. Sixteen of the 23 lineage 3 isolates and 67 of the 87 lineage 4 isolates were included in their respective transmission networks.

FIGURE 6

Pairwise SNP matrix visualized as a network colored by number of SNPs: 0–12 SNPs (red), 13–50 SNPs (blue), 51–100 SNPs (green) and >100 SNPs (black). Each node represents an individual isolate. Each node is connected to another if they are associated with each other within the network. Transmission networks were created using R statistical software and data visualization package qgraph. (a) Isolates grouping in MTBC lineage 3 forming a network. (b) Isolates grouping in MTBC lineage 4 forming a network

SNPs present in Mycobacterium tuberculosis virulence genes

The genomes of the isolates in this study were examined for SNPs in commonly identified MTBC virulence‐associated genes described by Forrellad et al. (2013). Mycobacterium tuberculosis genes containing SNPs in 50% or more of the isolates are listed in Table 1, and those with SNPs in fewer than 50% of the isolates in Table A3. All isolates (100%) contained at least one SNP relative to the M. tuberculosis H37Rv reference genome in the genes htrA2, ctpV, pks12, and pstA1. In addition, more than half of all isolates contained SNPs in the genes mce1, plcA, plcB, pks7, dosT, and pks5. SNPs in these genes could contribute to the pathogenicity of these isolates, whether through transmission or the establishment of disease. Furthermore, SNPs present in all isolates in this cohort lead us to hypothesize that they may confer some survival advantage. These top 10 genes were further evaluated to determine the specific SNP(s) in each gene and to assess potential changes in strain virulence associated with the mutation(s). The largest number and greatest diversity of SNPs were found in plcA and plcB, which encode phospholipase C (Table 2); SNPs in the other genes of interest are listed in Table A4. Both phospholipase C genes are encoded on the reverse strand of the M. tuberculosis genome, and therefore the SNP positions occur early in the protein‐coding regions. The SNP in plcB at position 2630173 generates a nonsense mutation (a serine codon changed to a stop codon). Of the 8 SNPs in plcA, half are synonymous; however, some of the non‐synonymous mutations change amino acid charge, which could alter protein folding or side‐chain interactions.
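The synonymous, non‐synonymous, and nonsense labels in Table 2 can be checked against the standard genetic code with a short Python sketch. The helper names are hypothetical, and this is not the IGV‐based procedure used in the study. For reverse‐strand genes such as plcA and plcB, a triplet read off the H37Rv forward strand has to be reverse‐complemented to obtain the codon; the example then classifies the eight plcA codon changes from Table 2 and the plcB change at position 2630173.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Coding-strand sequence for a triplet read off the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_codon_change(ref_codon, alt_codon):
    """Label a single-codon change; '*' denotes a stop codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "non-synonymous"


# plcA codon changes as listed in Table 2.
plca_changes = [("CCG", "CGG"), ("ATG", "ACG"), ("GTG", "GCG"), ("AGC", "AAC"),
                ("GGG", "GGA"), ("TAA", "TAG"), ("CAA", "CAG"), ("CCG", "CCC")]
labels = [classify_codon_change(ref, alt) for ref, alt in plca_changes]
print(labels.count("synonymous"), "of", len(labels), "plcA changes are synonymous")
print(classify_codon_change("TCA", "TGA"))  # plcB position 2630173
print(reverse_complement("CGG"))  # forward-strand CGG read as a reverse-strand codon
```

Note that the TAA → TAG change counts as synonymous here because both codons are stops, which matches its label in Table 2.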

TABLE 1

Mycobacterium tuberculosis virulence genes containing SNPs among 50% or more of the study isolates relative to reference strain H37Rv

Gene name	Rv number	Description	Number of isolates containing SNP	Percentage of isolates containing SNP
htrA2	Rv0983	Serine protease and chaperone	113	100
ctpV	Rv0969	Copper efflux p‐type ATPase	113	100
pks12	Rv2048c	Polyketide synthase	113	100
pstA1	Rv0930	Inorganic phosphate ABC transporter	113	100
mce1	Rv0166	Mammalian cell entry protein	105	93
plcA	Rv2351c	Phospholipase C	92	81
plcB	Rv2350c	Phospholipase C	87	77
pks7	Rv1661	Polyketide synthase	66	58
dosT	Rv2027c	Transcriptional regulator	63	56
pks5	Rv1527c	Polyketide synthase	62	55
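The percentages in Table 1 and Table A3 are the number of isolates carrying a SNP divided by the 113 analyzed isolates, and the 50% cut‐off determines which table a gene appears in. A minimal Python sketch of that arithmetic follows (function names hypothetical), rounding to the nearest whole percent as in the tables.

```python
TOTAL_ISOLATES = 113  # isolates retained for the final SNP analysis


def percent_with_snp(n_isolates, total=TOTAL_ISOLATES):
    """Whole-number percentage of isolates carrying at least one SNP in a gene."""
    return round(100 * n_isolates / total)


def table_for(n_isolates, total=TOTAL_ISOLATES):
    """Genes with SNPs in at least half of the isolates go to Table 1, others to Table A3."""
    return "Table 1" if 2 * n_isolates >= total else "Table A3"


for gene, count in [("mce1", 105), ("pks5", 62), ("fadD26", 29)]:
    print(gene, percent_with_snp(count), table_for(count))
```

With 113 isolates, a gene needs SNPs in at least 57 isolates to reach the 50% cut‐off.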

TABLE A3

Mycobacterium tuberculosis virulence genes containing SNPs among less than 50% of the study isolates relative to reference strain H37Rv

Gene name	Rv number	Description	Number of isolates containing SNP	Percentage of isolates containing SNP
fadD26	Rv2930	Fatty acid CoA synthase	29	26
RD1	Rv3868	Esx1 component	26	23
dosR	Rv3133c	Transcriptional regulator	17	15
pknD	Rv0931c	Protein kinase D	11	10
pknE	Rv1743	Serine/Threonine kinase E	11	10
sigC	Rv2069	Sigma factor C	8	7
erp	Rv3810	Exported repetitive protein	4	4
esxB	Rv3874	esx1 component	4	4
mce2	Rv0586	Mammalian cell entry protein	4	4
esxD	Rv3874	Esx1 component	4	4
sodC	Rv0432	Superoxide dismutase C	3	3
acg	Rv2032	unknown	3	3
ahpC	Rv2428	Alkyl hydroperoxide reductase C	2	2
mce4	Rv3501c	Mammalian cell entry protein	2	2
pcaA	Rv0470c	Mycolic acid synthase	1	1
hspX	Rv2031c	Alpha Crystallin protein	1	1
mce3	Rv1964	Mammalian cell entry protein	1	1
hbhA	Rv0475	Heparin‐binding hemagglutinin protein	0	0
esxA	Rv3875	Esx1 component	0	0
katG	Rv1908c	Catalase peroxidase enzyme	0	0

TABLE 2

Mycobacterium tuberculosis virulence genes containing the highest number of SNPs from study isolates showing SNP codon‐specific changes

Gene	Rv#	SNP	Position	Codon change	AA change
plcA	Rv2351c	C → G	2631556	CCG → CGG	Pro → Arg
		T → C	2631565	ATG → ACG	Met → Thr
		T → C	2631574	GTG → GCG	Val → Ala
		G → A	2631583	AGC → AAC	Ser → Asn
		G → A	2631599	GGG → GGA	Synonymous
		A → G	2631620	TAA → TAG	Synonymous
		A → G	2631971	CAA → CAG	Synonymous
		G → C	2631977	CCG → CCC	Synonymous
plcB	Rv2350c	C → G	2630158	ACC → AGC	Thr → Ser
		A → G	2630161	GAT → GGT	Asp → Gly
		C → G	2630173	TCA → TGA	Ser → Stop
		C → G	2630176	ACA → AGA	Thr → Arg
		G → A	2630182	CGA → CAA	Arg → Gln
		T → A	2630184	TGT → AGT	Cys → Ser
		C → T	2630188	GCT → GTT	Ala → Val
		T → G	2630206	GTC → GGC	Val → Gly
		G → A	2630211	GGC → AGC	Gly → Ser
		A → G	2630215	AAG → AGG	Lys → Arg

TABLE A4

Mycobacterium tuberculosis virulence genes of interest other than plcA and plcB, showing SNP codon‐specific changes

Gene	Rv#	SNP	Position	Codon change	AA change
htrA2	Rv0983	T → C	1100234	CCT → CCC	Synonymous
ctpV	Rv0969	C → A	1079927	ACC → ACA	Synonymous
pks12	Rv2048c	G → C	2296042	GGT → CGT	Gly → Arg
		G → T	2297287	TGC → TTC	Cys → Phe
		A → G	2300237	CCA → CCG	Synonymous
		A → T	2300546	CGA → CGT	Synonymous
		T → G	2300552	TGT → TGG	Synonymous
pstA1	Rv0930	C → T	1037911	GCG → GTG	Ala → Val
		T → C	1037012	AAT → AAC	Synonymous
mce1	Rv0166	C → T	196642	ACC → ATC	Thr → Ile
pks7	Rv1661	T → G	1875544	GTT → GGT	Val → Gly
dosT	Rv2027c	C → T	2273627	CCC → CCT	Synonymous
pks5	Rv1527c	G → A	1724120	AGG → AGA	Synonymous


DISCUSSION

In this study, genome sequence comparisons showed that possible transmission relationships exist among numerous M. tuberculosis isolates collected from patients presenting with pulmonary TB symptoms in a defined geographic region, the Rubaga Division of Kampala, Uganda. One hundred and thirteen isolates were included in the SNP analysis and grouped into distinct MTBC lineages. Using a threshold of ≤12 SNPs as indicative of a transmission pair, the SNP analysis found transmission pairs in every lineage containing at least two isolates, with lineage 4 having the highest frequency of transmission, the most transmission pairs, and the most isolates. This is expected, as lineage 4 is the dominant lineage in Uganda, followed by lineage 3 (Wampande et al., 2015). When the pairwise SNP matrix is visualized as a network (Figure 6), clear relationships among the isolates can be seen from their connections. These data show possible transmission of M. tuberculosis isolates between individuals; moreover, once combined with epidemiological data, the transmission networks identified could allow public health interventions to be targeted in this region at social gatherings and other establishments frequented by people transmitting TB. This type of study also allows SNPs in specific genes to be correlated with functional differences in the resulting products, and thus with altered virulence phenotypes, including transmission efficiency. Although our overall understanding of many M. tuberculosis virulence factors is limited, some gene functions are fairly well defined, and this type of analysis can add to that understanding. For example, the multiple mutations in plcA and plcB, especially the nonsense mutation in plcB, raise the question of whether these changes confer a survival advantage on the bacteria.
The plcABCD genes encode phospholipase C enzymes that contribute to pathogenesis by cleaving phospholipids during intracellular replication and trafficking in acute infection (Talarico et al., 2005). These enzymes have also been shown to have sphingomyelinase activity, catalyzing the hydrolysis of sphingomyelin, which can interfere with the host inflammatory response and aid infection (Castro‐Garza et al., 2016). Alteration or inactivation of these genes, as observed in our study isolates, could potentially modify virulence to decrease lung damage and prolong a less severe disease stage in the host. This concept has been illustrated in Pseudomonas aeruginosa: wild‐type infection caused significant lung function impairment and rapid death of the host animal (Wargo et al., 2011), whereas infection with a phospholipase C mutant strain had less severe effects, potentially permitting longer co‐survival of pathogen and host. Several future studies can build on the data generated here. For example, the project protocol required patients to provide at least two sputum samples, but the relationship between isolates from samples of the same person was not analyzed. Future studies should therefore analyze SNP differences between isolates collected from the same patient to determine within‐patient differences in M. tuberculosis genomes. This could help determine whether a person carries more than one M. tuberculosis strain during infection in that region, or whether transmission occurred from multiple individuals. One limitation of this study is that it was conducted in a single Division of Kampala, Rubaga. Rubaga was chosen for several reasons. First, Sekandi et al. established that Rubaga was an area of high tuberculosis disease burden (Sekandi et al., 2014).
Second, given the high disease burden, high levels of transmission were also expected. Third, the principal investigators have an established working relationship with the local community, the community health system, and political leaders. Lastly, because of this established relationship, the investigators have the trust of the community. Given this geographical limitation, we suggest expanding this type of analysis beyond the Rubaga Division to identify more transmission networks where interventions can be incorporated, and to make the data generalizable to other regions and potentially to the entire country. Currently, few countries have the capacity to sequence the whole genome of every M. tuberculosis isolate to better define transmission patterns and inform national public health policy. Additionally, the minimum number or percentage of isolates that must be sequenced in a region or country to build the most accurate transmission model has not been determined. Thus, in most TB‐endemic and non‐endemic areas of the world, smaller studies like this one are generating local transmission models while more expansive future programs are planned (Gurjav et al., 2016).

CONFLICT OF INTEREST

None declared.

AUTHOR CONTRIBUTIONS

Edriss Yassine: Conceptualization (lead); Data curation (lead); Formal analysis (lead); Investigation (lead); Methodology (lead); Software (lead); Validation (lead); Visualization (lead); Writing‐original draft (lead); Writing‐review & editing (lead). Ronald Galiwango: Data curation (supporting). Willy Ssengooba: Project administration (supporting); Resources (supporting); Supervision (supporting). Fred Ashaba: Project administration (supporting); Resources (supporting); Supervision (supporting). Moses Joloba: Funding acquisition (supporting). Sarah Zalwango: Project administration (supporting). Christopher Whalen: Conceptualization (supporting); Funding acquisition (lead); Investigation (supporting); Methodology (lead); Project administration (lead); Resources (lead); Supervision (lead); Writing‐review & editing (supporting). Frederick Quinn: Funding acquisition (supporting); Methodology (supporting); Project administration (lead); Resources (lead); Supervision (lead); Writing‐review & editing (supporting).

ETHICS STATEMENT

The study was approved by the University of Georgia Institutional Review Board, the Higher Degrees Research and Ethics Committee at Makerere University School of Public Health, and the Uganda National Council for Science and Technology.

REFERENCES

1. Insertion- and deletion-associated genetic diversity of Mycobacterium tuberculosis phospholipase C-encoding genes among 106 clinical isolates from Turkey.

Authors: Sarah Talarico; Riza Durmaz; Zhenhua Yang
Journal: J Clin Microbiol Date: 2005-02

2. Mycobacterium tuberculosis: immune evasion, latency and reactivation.

Authors: Antima Gupta; Akshay Kaul; Anthony G Tsolaki; Uday Kishore; Sanjib Bhakta
Journal: Immunobiology Date: 2011-07-18

3. Virulence factors of the Mycobacterium tuberculosis complex.

Authors: Marina A Forrellad; Laura I Klepp; Andrea Gioffré; Julia Sabio y García; Hector R Morbidoni; María de la Paz Santangelo; Angel A Cataldi; Fabiana Bigi
Journal: Virulence Date: 2012-10-17

4. Investigating extradomiciliary transmission of tuberculosis: An exploratory approach using social network patterns of TB cases and controls and the genotyping of Mycobacterium tuberculosis.

Authors: Suani T R Pinho; Susan M Pereira; José G V Miranda; Tonya A Duarte; Joilda S Nery; Maeli G de Oliveira; M Yana G S Freitas; Naila A De Almeida; Fabio B Moreira; Raoni B C Gomes; Ligia Kerr; Carl Kendall; M Gabriela M Gomes; Theolis C B Bessa; Roberto F S Andrade; Mauricio L Barreto
Journal: Tuberculosis (Edinb) Date: 2020-10-24

5. Proportion of tuberculosis transmission that takes place in households in a high-incidence area.

Authors: Suzanne Verver; Robin M Warren; Zahn Munch; Madalene Richardson; Gian D van der Spuy; Martien W Borgdorff; Marcel A Behr; Nulda Beyers; Paul D van Helden
Journal: Lancet Date: 2004-01-17

6. An acidic sphingomyelinase Type C activity from Mycobacterium tuberculosis.

Authors: Jorge Castro-Garza; Francisco González-Salazar; Frederick D Quinn; Russell K Karls; Laura Hermila De La Garza-Salinas; Francisco J Guzmán-de la Garza; Javier Vargas-Villarreal
Journal: Rev Argent Microbiol Date: 2016-03-03

7. Where is tuberculosis transmission happening? Insights from the literature, new tools to study transmission and implications for the elimination of tuberculosis.

Authors: Sara C Auld; N Sarita Shah; Ted Cohen; Neil A Martinson; Neel R Gandhi
Journal: Respirology Date: 2018-06-05

8. Four Degrees of Separation: Social Contacts and Health Providers Influence the Steps to Final Diagnosis of Active Tuberculosis Patients in Urban Uganda.

Authors: Juliet N Sekandi; Sarah Zalwango; Leonardo Martinez; Andreas Handel; Robert Kakaire; Allan K Nkwata; Amara E Ezeamama; Noah Kiwanuka; Christopher C Whalen
Journal: BMC Infect Dis Date: 2015-08-21

9. Prevention praised, cure preferred: results of between-subjects experimental studies comparing (monetary) appreciation for preventive and curative interventions.

Authors: Ree M Meertens; Vivian M J Van de Gaar; Maitta Spronken; Nanne K de Vries
Journal: BMC Med Inform Decis Mak Date: 2013-12-18