Edriss Yassine1, Ronald Galiwango2, Willy Ssengooba3,4, Fred Ashaba5, Moses L Joloba5, Sarah Zalwango5, Christopher C Whalen2, Frederick Quinn1. 1. Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, GA, USA. 2. Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, USA. 3. Makerere University Lung Institute, College of Health Sciences, Makerere University, Kampala, Uganda. 4. Mycobacteriology (BSL-3) Laboratory, Department of Medical Microbiology, Makerere University, Kampala, Uganda. 5. Uganda-CWRU Research Collaboration, Makerere University and Mulago Hospital, Kampala, Uganda.
Abstract
Tuberculosis (TB) is the leading cause of death in humans by a single infectious agent worldwide with approximately two billion humans latently infected with the bacterium Mycobacterium tuberculosis. Currently, the accepted method for controlling the disease is Tuberculosis Directly Observed Treatment Shortcourse (TB-DOTS). This program is not preventative and individuals may transmit disease before diagnosis, thus better understanding of disease transmission is essential. Using whole-genome sequencing and single nucleotide polymorphism analysis, we analyzed genomes of 145 M. tuberculosis clinical isolates from active TB cases from the Rubaga Division of Kampala, Uganda. We established that these isolates grouped into M. tuberculosis complex (MTBC) lineages 1, 2, 3, and 4, with the most isolates grouping into lineage 4. Possible transmission pairs containing ≤12 SNPs were identified in lineages 1, 3, and 4 with the prevailing transmission in lineages 3 and 4. Furthermore, investigating DNA codon changes as a result of specific SNPs in prominent virulence genes including plcA and plcB could indicate potentially important modifications in protein function. Incorporating this analysis with corresponding epidemiological data may provide a blueprint for the integration of public health interventions to decrease TB transmission in a region.
INTRODUCTION
Tuberculosis (TB) in humans is caused primarily by infection with Mycobacterium tuberculosis (Mtb). Most TB disease arises when the bacilli are transmitted person‐to‐person via the aerosol route from an individual with an active infection who is coughing, sneezing, or speaking. Once the mycobacteria‐containing droplets are inhaled by an individual nearby, the infection that follows is typically established in the lungs; however, the bacteria can disseminate to other organs such as the kidneys, spine, and brain (Gupta et al., 2011; Yates et al., 2016).

The World Health Organization (WHO) estimates that in 2018, there were 10 million new TB cases and 1.5 million deaths (WHO, 2019). Apart from COVID‐19, TB is the leading cause of death in the world today due to a single infectious agent. An estimated two billion individuals may be latently infected, with approximately 5%–10% being at risk for reactivation TB in their lifetime (WHO, 2019). Although the overall outlook for disease control has been reported to be trending positively, with incidence and mortality rates declining by 2% and 3%, respectively, since the year 2000, we are still below the goals set forth by the WHO End TB Strategy (WHO, 2015, 2019).

In most parts of the world, public health organizations routinely screen for M. tuberculosis transmission among household contacts (Buu et al., 2010; Warria et al., 2020), which was long thought to be the primary means of dissemination. More recent epidemiological studies show that M. tuberculosis transmission is more likely to occur outside of the household (Buu et al., 2010; Yates et al., 2016). Outbreak investigations show that transmission of M. tuberculosis bacilli can occur in social settings (Auld et al., 2018; Pinho et al., 2020) and at other events in the community (Cavalcante et al., 2010; Verver et al., 2004), although the actual frequency of transmission in these settings outside of the household is not known. 
Thus, a more robust understanding of the transmission process would help to identify infected individuals early in the disease course, preventing transmission and subsequent disease (Meertens et al., 2013).

The genome of M. tuberculosis provides a useful means of determining species‐specific diversity. Currently, eight global M. tuberculosis complex (MTBC) lineages have been identified: 1‐Indo‐Oceanic, 2‐East Asian (Beijing), 3‐East African Indian, 4‐Euro‐American, 5‐West Africa I, 6‐West Africa II, 7‐Ethiopia‐Horn of Africa, and 8‐African Great Lakes (Coll et al., 2014; Semuto Ngabonziza et al., 2020). Lineages are important for implementing control measures because it has been shown that different lineages may correlate with different epidemiologic and potential disease outcomes (Ford et al., 2013; Hernández‐Pando et al., 2003).

Whole‐genome sequencing (WGS) has given researchers the ability to examine an organism's genetic structure down to the single nucleotide, and the use of WGS has evolved from being primarily a research tool to being used clinically to aid in the diagnosis and surveillance of diseases including tuberculosis (Meehan et al., 2019). Pertinent to this study, M. tuberculosis WGS also has allowed investigators to determine genetic diversity within the species, identify genomic variances potentially involved in pathogenesis (Sharma et al., 2017), and highlight transmission patterns based on the detection of single nucleotide polymorphisms (SNPs). A SNP is a nucleotide base variation at a single position in a DNA sequence. Generally, a SNP is considered valid when more than 1% of the population does not carry that specific nucleotide at the position through deletion or substitution (Jayakanthan et al., 2019). SNPs can be found in both coding and non‐coding regions of sequences and may or may not change the amino acid sequence depending on the nucleotide substitution. Examples of single SNP differences in M. tuberculosis that result in important gene function differences include modifications to katG, mabA, and Rv1772 and the subsequent development of resistance to one of the primary TB drugs, isoniazid (Ramaswamy et al., 2003).

There is no shortage of studies that have used WGS and SNP‐based threshold analysis to assess TB transmission patterns. Famously, Walker et al. used these methods in their study in the United Kingdom to determine the number of SNPs between genomes that would infer possible transmission of disease between individuals (Walker et al., 2013). Lee et al. (2015) used WGS to determine the reemergence of several M. tuberculosis strains in an outbreak in a small village in the Arctic that was previously thought to have been controlled. Furthermore, Roetzer et al. (2013) used WGS and SNP threshold analysis in their longitudinal study to confirm the superiority of this method in the determination of transmission and improved surveillance.

Uganda is one of the 30 high TB burden countries identified by the WHO with 86,000 new cases and an incidence rate of 200/100,000 in 2018 (WHO, 2019; Verver et al., 2004). In this study, using WGS and SNP analysis of M. tuberculosis isolates collected from active TB cases within a Ugandan social network study (Sekandi et al., 2015), we assessed transmission of disease by comparing the number of SNPs among the isolates using the SNP threshold method. The transmission data presented can be combined with epidemiological data to determine possible transmission hotspots within Ugandan social networks. In addition, we identified SNP differences in key virulence genes that could potentially be involved in enhancing or limiting transmission. 
Thus, in addition to providing an improved understanding of TB transmission within a population, SNP data such as these could be used to develop improved diagnostic tests, identify new targets for novel drug and vaccine development, and ultimately improve implementation of future public health intervention efforts to decrease the TB disease burden.
MATERIALS AND METHODS
Study design
This cross‐sectional transmission study was conducted in the Rubaga Division of Kampala, Uganda, located in the western part of the city. According to the Uganda Bureau of Statistics’ National Population and Housing Census 2014, Rubaga has a population of approximately 380,000 individuals (UBOS, 2017). Tuberculosis is a growing problem in this area of the city with the prevalence of positive TB smear tests estimated to be 1025 per 100,000 individuals, and a third of cases also being HIV‐positive (Sekandi et al., 2014). Study details, including sampling strategy and study population demographics, can be found in the manuscript by Kakaire et al. (2020). Briefly, adults (15 years of age and older, as defined by a majority of African countries) presenting with TB symptoms and residing in the Rubaga Division were given a clinical test, and acid‐fast staining was performed on two sputum samples. Individuals were included in the study if they showed clinical symptoms of pulmonary TB in addition to two positive sputum smears. Drug resistance of the isolates was beyond the scope of this analysis.
Growth and DNA isolation of clinical isolates
Culturing and manipulation of M. tuberculosis isolates were performed in the College of American Pathologists (CAP)‐accredited Mycobacteriology (BSL‐3) Laboratory in the Department of Medical Microbiology, Makerere University College of Health Sciences, Kampala, Uganda. Isolates were cultured and frozen bacterial stocks were made for research use. Clinical isolates were sub‐cultured on Middlebrook 7H10 agar (Becton and Dickinson) and incubated at 37°C in 5% CO2. Growth was observed daily for four weeks. The bacteria were harvested and suspended in absolute ethanol (Sigma Aldrich) for inactivation by suffocation. Subsequently, chromosomal DNA was extracted using the protocol outlined in the ZR Fungal/Bacterial DNA Microprep kit (Zymo Research) with a slight modification. Because a bead‐beater instrument was not available, bacterial cells in ZR BashingBead Lysis tubes were attached to a vortexer and shaken for 5 min for lysis. After elution of each sample, the DNA concentrations were measured using a Nanodrop spectrophotometer. The DNA extracts were then shipped at ambient temperature to the Department of Infectious Diseases, University of Georgia, College of Veterinary Medicine, Athens, Georgia.
Sterility testing
Sterility testing of DNA samples was performed prior to WGS following the Centers for Disease Control and Prevention protocol. Each DNA sample was resuspended in 20 μl of PBS. Middlebrook 7H10 agar (Becton and Dickinson) Petri dishes were spotted with 1 μl of each sample. One microliter of Mycobacterium bovis BCG was used as a positive growth control. Plates were incubated at 37°C in 5% CO2 for six weeks and observed for growth. After the DNA samples were confirmed negative for growth, the remainder of each DNA sample was transferred to 96‐well plates and stored at −20°C until processed for DNA sequencing.
Whole‐genome sequencing (WGS) and single nucleotide polymorphism (SNP) analyses
Sequencing libraries were prepared using Nugen Ultralow V2 or Nextera XT V2 following the manufacturer's recommended protocol. The libraries were sequenced on a NextSeq 500 using mid output V2 chemistry (2 × 150 bp) or on a MiSeq using V2 chemistry (2 × 250 bp). SNP analysis was conducted using BioNumerics 7.6.3 (Applied Maths NV). Reference‐guided assemblies were created using BioNumerics Reference Mapper 1.2.3 (Pouseele and Supply, 2015) with M. tuberculosis H37Rv (NCBI NC_000962.3) used as the reference genome for alignment. The settings for base calling were as follows: minimum total coverage = 3, minimum forward coverage = 1, minimum reverse coverage = 1, single base threshold = 0.75, double base threshold = 0.85, triple base threshold = 0.95, and gap threshold = 0.5. Isolates found with an average coverage of the genome of less than 50 were re‐sequenced (no sequences fell into this category). Reference‐guided assemblies were compared using BioNumerics 7.6.3 SNP analysis filters. For a SNP to be retained in the analysis, it had to meet the following criteria: have a total coverage of five reads, not contain ambiguous bases (bases not defined as A, T, C, G), not contain gaps, and not be within 12 base pairs of adjoining called SNPs. Non‐informative SNPs were also excluded from further analysis. The number of high‐quality SNPs determined to be present between two isolates was recorded as the SNP distance. Isolates were grouped into lineages by the presence of pre‐defined SNPs that are unique to that particular lineage. Using the SNP threshold method, we used the Walker et al. limit of ≤12 SNPs as the determinant of relatedness between two isolates (Walker et al., 2013). Although Walker et al. established isolates containing 6–12 SNPs as indeterminate, a threshold of ≤12 SNPs was chosen to encompass all possibly linked pairs of isolates. 
Any indeterminate pairs can be filtered out by comparing the SNP data to the separate epidemiological data by the principal investigators, should the need arise. Sequences from a total of 143 isolates were analyzed using the BioNumerics pipeline.
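The retention criteria and the threshold rule described above can be illustrated with a minimal Python sketch. This is not the BioNumerics pipeline used in the study; the function names, the dictionary-based input format, and the toy data are hypothetical. Two interpretations are assumptions: "a total coverage of five reads" is read as a minimum of five reads, and "within 12 base pairs" is read as a distance of 12 bp or less to any other called SNP.

```python
from itertools import combinations

VALID_BASES = frozenset("ACGT")  # anything else (N, gap characters) is rejected


def filter_snp_positions(calls, min_coverage=5, min_spacing=12):
    """Return the sorted positions of SNP calls that pass all three checks.

    calls: dict mapping reference position -> (called base, total coverage).
    Assumption: spacing is checked against every called SNP, before filtering.
    """
    positions = sorted(calls)
    kept = []
    for i, pos in enumerate(positions):
        base, coverage = calls[pos]
        if coverage < min_coverage or base not in VALID_BASES:
            continue
        near_prev = i > 0 and pos - positions[i - 1] <= min_spacing
        near_next = i + 1 < len(positions) and positions[i + 1] - pos <= min_spacing
        if near_prev or near_next:
            continue
        kept.append(pos)
    return kept


def snp_distance(profile_a, profile_b):
    """Count shared positions at which two isolates carry different bases."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for pos in shared if profile_a[pos] != profile_b[pos])


def linked_pairs(profiles, threshold=12):
    """Return (id_a, id_b, distance) for every pair at or below the threshold."""
    pairs = []
    for id_a, id_b in combinations(sorted(profiles), 2):
        distance = snp_distance(profiles[id_a], profiles[id_b])
        if distance <= threshold:
            pairs.append((id_a, id_b, distance))
    return pairs
```

With three toy isolates, one differing from the first at a single position and one differing at 13 positions, only the first pair would be reported as possibly linked under the ≤12 SNP threshold.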
Network analysis
Transmission networks were created using R statistical software (Vienna, Austria) and data visualization package qgraph (Epskamp et al., 2012). SNP distance matrices outputted by the BioNumerics pipeline were supplied into qgraph and desired output settings (color and SNP ranges) were selected to create the transmission network.
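As an illustration of how a pairwise SNP distance matrix can be turned into a colored edge list before plotting, the following Python sketch bins each pair by SNP distance. It is not the qgraph call used in the study, which was run in R; the function names and the list-of-lists matrix format are hypothetical, and the color ranges are those given in the Figure 6 legend.

```python
def edge_color(snp_distance):
    """Map a pairwise SNP distance to the color range of the Figure 6 legend."""
    if snp_distance <= 12:
        return "red"      # 0-12 SNPs: possible transmission pair
    if snp_distance <= 50:
        return "blue"     # 13-50 SNPs
    if snp_distance <= 100:
        return "green"    # 51-100 SNPs
    return "black"        # >100 SNPs


def matrix_to_edges(isolate_ids, matrix):
    """Convert a symmetric distance matrix into (id_a, id_b, distance, color) edges.

    Only the upper triangle is read, so each pair appears once.
    """
    edges = []
    for i in range(len(isolate_ids)):
        for j in range(i + 1, len(isolate_ids)):
            distance = matrix[i][j]
            edges.append((isolate_ids[i], isolate_ids[j], distance, edge_color(distance)))
    return edges
```

An edge list of this form could then be handed to any graph-drawing package; the choice of package does not affect the binning.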
Mycobacterium tuberculosis gene SNP search
SNPs present in M. tuberculosis isolates were identified using UNIX command line tools. When the position of each SNP was attained, specific codon mutations were visualized using Integrative Genomics Viewer (IGV) (Broad Institute, MA).
RESULTS
MTBC lineages
Of the 143 sequences analyzed, a total of 30 were excluded from further analysis due to the following: Twenty‐five sequences did not meet the inclusion criteria described in the Methods section. One failed the de novo assembly process, and thus the pipeline was not able to assemble the sequenced fragments due to errors. Three contained mixed genomic material from more than one bacterial species. Lastly, one presented with general sequencing failure. After exclusion, a total of 113 isolates were included in the final SNP analysis (Figure 1).
FIGURE 1
UPGMA rooted tree of the 113 isolates included in the analysis separated into color‐coded MTBC lineages using the Bionumerics SNP analysis pipeline. Branch numbers indicate the SNP distance between isolates. L1 includes 2 isolates; L2, one isolate; L3, 23 isolates; L4, 87 isolates. UPGMA, unweighted pair group method with arithmetic mean
Of the 113 isolates analyzed, 2 isolates, 17918 and 20850, grouped into MTBC lineage 1, Indo‐Oceanic. SNP analysis determined that the two isolates were identical with 0 SNPs occurring between them; indicating a possible transmission pair from the same individual.

A single isolate, 28272, grouped into MTBC lineage 2, East Asian (Beijing). A second isolate forming a transmission pair was not identified thus indicating this was an isolated strain within the sampled population.

A total of 23 isolates were grouped into MTBC lineage 3, East African Indian, separating into two transmission clusters of interest (Figure 2). Isolates 16294, 20695, 20839, 19621, 20918, and 22199 formed cluster 1 and isolates 20060, 20061, and 18346 formed cluster 2 (Table A1). The number of SNPs between each isolate can be seen in Table A1, Figure 3, and Figure 4. All samples from the two clusters contain ≤12 SNPs which may indicate that isolates within the clusters were transmitted from a single individual.
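One way to see how pairwise links chain into clusters is to treat each pair at or below the threshold as an edge and collect connected components. The Python sketch below does this for the lineage 3 pairs listed in Table A1. It is an illustration only: the clusters in this study were read from the UPGMA trees, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def cluster_isolates(pairs, threshold=12):
    """Group isolates into clusters by linking every pair with distance <= threshold.

    pairs: iterable of (isolate_a, isolate_b, snp_distance).
    Returns a list of clusters, each a sorted list of isolate IDs.
    """
    graph = defaultdict(set)
    for a, b, distance in pairs:
        if distance <= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:  # depth-first walk over linked isolates
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


# Pairs and SNP distances as listed in Table A1 (lineage 3).
TABLE_A1_PAIRS = [
    ("16294", "20695", 0), ("19621", "20918", 0), ("16294", "19621", 1),
    ("16294", "20918", 1), ("20695", "19621", 1), ("20695", "20918", 1),
    ("16294", "20839", 1), ("20695", "20839", 1), ("16294", "22199", 1),
    ("20695", "22199", 1), ("20060", "20061", 0), ("20060", "18346", 2),
    ("20061", "18346", 2),
]
```

Applied to these pairs, the grouping yields one component of six isolates and one of three, matching the membership reported for clusters 1 and 2.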
FIGURE 2
UPGMA rooted tree of the 23 isolates grouped into MTBC lineage 3 separated into possible transmission clusters. Branch numbers indicate the SNP distance between isolates. Cluster 1 consists of 6 total isolates, and cluster 2 consists of 3 isolates. White circles are isolates that did not group into clusters. UPGMA, unweighted pair group method with arithmetic mean
TABLE A1
MTBC lineage 3 clusters showing possible transmission pairs with the number of SNPs between isolates
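| Cluster number | Isolate pair IDs | Number of SNPs between pairs |
| --- | --- | --- |
| 1 | 16294, 20695 | 0 |
| 1 | 19621, 20918 | 0 |
| 1 | 16294, 19621 | 1 |
| 1 | 16294, 20918 | 1 |
| 1 | 20695, 19621 | 1 |
| 1 | 20695, 20918 | 1 |
| 1 | 16294, 20839 | 1 |
| 1 | 20695, 20839 | 1 |
| 1 | 16294, 22199 | 1 |
| 1 | 20695, 22199 | 1 |
| 2 | 20060, 20061 | 0 |
| 2 | 20060, 18346 | 2 |
| 2 | 20061, 18346 | 2 |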
FIGURE 3
MST of MTBC lineage 3, cluster 1. Numbers between branches indicate SNP distance. Two samples in one circle indicate identical isolates with 0 SNPs. An MST is a subnetwork that shows the strongest connections from a larger set of weighted connections (van Dellen et al., 2018). MSTs are used in epidemiology to delineate the most likely chain of transmission during events such as an outbreak. Here, it is used to infer possible transmission between our isolates. MST, minimum spanning tree
FIGURE 4
MST of MTBC lineage 3, cluster 2. Numbers between branches indicate SNP distance. Two samples in one circle indicate identical isolates with 0 SNPs. An MST is a subnetwork that shows the strongest connections from a larger set of weighted connections (van Dellen et al., 2018). MSTs are used in epidemiology to delineate the most likely chain of transmission during events such as an outbreak. Here, it is used to infer possible transmission between our isolates. MST, minimum spanning tree
There were a total of 87 isolates grouped into MTBC lineage 4 Euro‐American, separating into 19 clusters (Figure 5). Of the 19 clusters, transmission pairs containing ≤12 SNPs in 17 of the 19 clusters were identified. Clusters with two or more, non‐identical, isolates can be represented as a minimum spanning tree (MST) or a neighbor‐joining tree (NJT) (Figures A1, A2, A3, A4, A5, A6, A7, A8). The number of SNPs between isolates in each cluster can be found in Table A2 and their respective phylogenetic trees. The greatest number of isolate pairs in lineage 4 was found in cluster 13.
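To make the MST representation concrete, the Python sketch below builds a minimum spanning tree with Kruskal's algorithm from the lineage 4, cluster 13 pairs listed in Table A2. It is an illustration rather than the BioNumerics implementation; the function name is hypothetical, and only the pairs listed in Table A2 are used as candidate edges, so isolate pairs not listed there are assumed to be unavailable.

```python
def minimum_spanning_tree(edges):
    """Kruskal's algorithm: return the edges of a minimum spanning tree.

    edges: iterable of (node_a, node_b, weight). Ties keep the input order.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for a, b, weight in sorted(edges, key=lambda edge: edge[2]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Cluster 13 pairs and SNP distances as listed in Table A2.
CLUSTER_13 = [
    ("19077", "20606", 0),
    ("19077", "16732", 9),
    ("20606", "16732", 9),
    ("16732", "15634", 8),
]

mst = minimum_spanning_tree(CLUSTER_13)
total_snps = sum(weight for _, _, weight in mst)
```

Because isolates 19077 and 20606 are identical, either can connect to 16732 at the same cost of 9 SNPs, which is why identical isolates are drawn in a single circle in the MST figures.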
FIGURE 5
UPGMA rooted tree of isolates grouped into MTBC lineage 4 and separated into possible transmission clusters. Branch numbers indicate the SNP distance between isolates. There were 19 clusters of interest identified, each represented by a different color. White circles are isolates that did not group into clusters. Clusters 2 and 9 contained pairs that had >12 SNPs and therefore no transmission pairs. UPGMA, unweighted pair group method with arithmetic mean
FIGURE A1
MST of MTBC lineage 4, cluster 1. Numbers between branches indicate SNP distance. MST, minimum spanning tree
FIGURE A2
MST of MTBC lineage 4, cluster 3. Numbers between branches indicate SNP distance. MST, minimum spanning tree
FIGURE A3
MST of MTBC lineage 4, cluster 4. Numbers between branches indicate SNP distance. MST, minimum spanning tree
FIGURE A4
MST of MTBC lineage 4, cluster 6. Numbers between branches indicate SNP distance. MST, minimum spanning tree
FIGURE A5
MST of MTBC lineage 4, cluster 13. Numbers between branches indicate SNP distance. Two samples in one circle indicate identical isolates with 0 SNPs. MST, minimum spanning tree
FIGURE A6
MST of MTBC lineage 4, cluster 15. Numbers between branches indicate SNP distance. MST, minimum spanning tree
FIGURE A7
NJT of MTBC lineage 4, cluster 17. Numbers between branches indicate SNP distance. MST, minimum spanning tree; NJT, Neighbor‐Joining Tree
FIGURE A8
MST of MTBC lineage 4, cluster 18. Numbers between branches indicate SNP distance. MST, minimum spanning tree
TABLE A2
MTBC lineage 4 clusters showing possible transmission pairs with the number of SNPs between isolates
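| Cluster number | Isolate pair IDs | Number of SNPs between pairs |
| --- | --- | --- |
| 1 | 19891, 27889 | 4 |
| 3 | 16607, 21779 | 6 |
| 3 | 16607, 16608 | 2 |
| 4 | 19034, 26720 | 7 |
| 5 | 22466, 22468 | 0 |
| 6 | 14956, 19895 | 2 |
| 7 | 15545, 15547 | 0 |
| 8 | 23229, 26963 | 0 |
| 10 | 13577, 13578 | 0 |
| 10 | 13577, 13579 | 0 |
| 10 | 13578, 13579 | 0 |
| 11 | 20574, 20603 | 0 |
| 12 | 19595, 19801 | 0 |
| 12 | 19595, 19832 | 0 |
| 12 | 19801, 19832 | 0 |
| 13 | 19077, 20606 | 0 |
| 13 | 19077, 16732 | 9 |
| 13 | 20606, 16732 | 9 |
| 13 | 16732, 15634 | 8 |
| 14 | 14158, 14159 | 0 |
| 15 | 17778, 17782 | 3 |
| 16 | 17549, 17551 | 0 |
| 17 | 18673, 20148 | 12 |
| 18 | 17085, 14774 | 1 |
| 19 | 20253, 20634 | 0 |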
Isolates visualized as a network
Using the data generated in this study, it is observed that isolates from each lineage form distinct networks connected based on the number of SNPs between isolate pairs. The isolates in lineage 3 (Figure 6A) and lineage 4 (Figure 6B) both form identifiable networks and can be visualized based on the number of SNPs separating the isolates. Each node is connected to another if they are associated with each other within the network. Possible transmission pairs containing ≤12 SNPs are highlighted in red to indicate where they fit in the transmission network. Sixteen of the 23 “lineage 3” isolates and 67 of the 87 “lineage 4” isolates were included in the analysis from their respective transmission networks.
FIGURE 6
Pairwise SNP matrix visualized as a network colored by number of SNPs: 0–12 SNPs (red), 13–50 SNPs (blue), 51–100 SNPs (green) and >100 SNPs (black). Each node represents an individual isolate. Each node is connected to another if they are associated with each other within the network. Transmission networks were created using R statistical software and data visualization package qgraph. (a) Isolates grouping in MTBC lineage 3 forming a network. (b) Isolates grouping in MTBC lineage 4 forming a network
SNPs present in Mycobacterium tuberculosis virulence genes
The genomes of the isolates in this study were examined for the presence of SNPs in commonly identified MTBC virulence‐associated genes described by Forrellad et al. (2013). Mycobacterium tuberculosis genes containing SNPs among 50% or more of the isolates are listed in Table 1 and those in less than 50% of the isolates in Table A3. All isolates (100%) contained at least one SNP in genes htrA2, ctpV, pks12, and pstA1 when compared to the M. tuberculosis H37Rv reference genome. Furthermore, more than half of all isolates also contained SNPs in the genes mce1, plcA, plcB, pks7, dosT, and pks5. These data suggest that SNPs in these genes may contribute to the pathogenicity of these isolates, whether it be transmission or the establishment of disease. Furthermore, if certain SNPs are present in all isolates in this cohort then we can hypothesize that they may present some survival advantage. These top 10 genes were further evaluated to determine the specific SNP(s) present in each gene and to assess any potential changes in strain virulence that could be associated with the mutation(s). Of the SNPs found, the largest number and greatest diversity occurred in the genes plcA and plcB, which encode phospholipase C (Table 2); SNPs in the other genes of interest are listed in Table A4. Both phospholipase C genes are translated in the reverse orientation in the M. tuberculosis genome, and therefore, the SNP positions occur early in the protein‐coding regions. The SNP in plcB at position 2630173 generates a nonsense mutation (from a serine to a stop codon). Of the 8 SNPs present in plcA, half were found to be synonymous; however, some of the non‐synonymous mutations translate to changes in amino acid charge that can potentially cause modifications in protein folding or alterations in side‐chain interactions.
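The distinction between synonymous, non‐synonymous, and nonsense changes can be checked mechanically by translating the reference and variant codons. The Python sketch below does this with the standard genetic code; it is an illustration only and not the IGV‐based inspection used in the study. The function name is hypothetical, and codons are assumed to be written in the coding orientation, as they appear in Table 2.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Codon changes taken from Table 2.
examples = {
    ("CCG", "CGG"): classify_codon_change("CCG", "CGG"),  # plcA, Pro -> Arg
    ("GGG", "GGA"): classify_codon_change("GGG", "GGA"),  # plcA, Gly -> Gly
    ("TCA", "TGA"): classify_codon_change("TCA", "TGA"),  # plcB, Ser -> Stop
}
```

Here "missense" corresponds to the non‐synonymous changes described in the text that replace one amino acid with another, while "nonsense" marks the introduction of a premature stop codon, as at plcB position 2630173.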
TABLE 1
Mycobacterium tuberculosis virulence genes containing SNPs among 50% or more of the study isolates relative to reference strain H37Rv
| Gene name | Rv number | Description | Number of isolates containing SNP | Percentage of isolates containing SNP |
| --- | --- | --- | --- | --- |
| htrA2 | Rv0983 | Serine protease and chaperone | 113 | 100 |
| ctpV | Rv0969 | Copper efflux p‐type ATPase | 113 | 100 |
| pks12 | Rv2048c | Polyketide synthase | 113 | 100 |
| pstA1 | Rv0930 | Inorganic phosphate ABC transporter | 113 | 100 |
| mce1 | Rv0166 | Mammalian cell entry protein | 105 | 93 |
| plcA | Rv2351c | Phospholipase C | 92 | 81 |
| plcB | Rv2350c | Phospholipase C | 87 | 77 |
| pks7 | Rv1661 | Polyketide synthase | 66 | 58 |
| dosT | Rv2027c | Transcriptional regulator | 63 | 56 |
| pks5 | Rv1527c | Polyketide synthase | 62 | 55 |
TABLE A3
Mycobacterium tuberculosis virulence genes containing SNPs among less than 50% of the study isolates relative to reference strain H37Rv
| Gene name | Rv number | Description | Number of isolates containing SNP | Percentage of isolates containing SNP |
| --- | --- | --- | --- | --- |
| fadD26 | Rv2930 | Fatty acid CoA synthase | 29 | 26 |
| RD1 | Rv3868 | Esx1 component | 26 | 23 |
| dosR | Rv3133c | Transcriptional regulator | 17 | 15 |
| pknD | Rv0931c | Protein kinase D | 11 | 10 |
| pknE | Rv1743 | Serine/Threonine kinase E | 11 | 10 |
| sigC | Rv2069 | Sigma factor C | 8 | 7 |
| erp | Rv3810 | Exported repetitive protein | 4 | 4 |
| esxB | Rv3874 | Esx1 component | 4 | 4 |
| mce2 | Rv0586 | Mammalian cell entry protein | 4 | 4 |
| esxD | Rv3874 | Esx1 component | 4 | 4 |
| sodC | Rv0432 | Superoxide dismutase C | 3 | 3 |
| acg | Rv2032 | Unknown | 3 | 3 |
| ahpC | Rv2428 | Alkyl hydroperoxide reductase C | 2 | 2 |
| mce4 | Rv3501c | Mammalian cell entry protein | 2 | 2 |
| pcaA | Rv0470c | Mycolic acid synthase | 1 | 1 |
| hspX | Rv2031c | Alpha Crystallin protein | 1 | 1 |
| mce3 | Rv1964 | Mammalian cell entry protein | 1 | 1 |
| hbhA | Rv0475 | Heparin‐binding hemagglutinin protein | 0 | 0 |
| esxA | Rv3875 | Esx1 component | 0 | 0 |
| katG | Rv1908c | Catalase peroxidase enzyme | 0 | 0 |
TABLE 2
Mycobacterium tuberculosis virulence genes containing the highest number of SNPs from study isolates showing SNP codon‐specific changes
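| Gene | Rv# | SNP | Position | Codon change | AA change |
| --- | --- | --- | --- | --- | --- |
| plcA | Rv2351c | C → G | 2631556 | CCG → CGG | Pro → Arg |
| plcA | Rv2351c | T → C | 2631565 | ATG → ACG | Met → Thr |
| plcA | Rv2351c | T → C | 2631574 | GTG → GCG | Val → Ala |
| plcA | Rv2351c | G → A | 2631583 | AGC → AAC | Ser → Asn |
| plcA | Rv2351c | G → A | 2631599 | GGG → GGA | Synonymous |
| plcA | Rv2351c | A → G | 2631620 | TAA → TAG | Synonymous |
| plcA | Rv2351c | A → G | 2631971 | CAA → CAG | Synonymous |
| plcA | Rv2351c | G → C | 2631977 | CCG → CCC | Synonymous |
| plcB | Rv2350c | C → G | 2630158 | ACC → AGC | Thr → Ser |
| plcB | Rv2350c | A → G | 2630161 | GAT → GGT | Asp → Gly |
| plcB | Rv2350c | C → G | 2630173 | TCA → TGA | Ser → Stop |
| plcB | Rv2350c | C → G | 2630176 | ACA → AGA | Thr → Arg |
| plcB | Rv2350c | G → A | 2630182 | CGA → CAA | Arg → Gln |
| plcB | Rv2350c | T → A | 2630184 | TGT → AGT | Cys → Ser |
| plcB | Rv2350c | C → T | 2630188 | GCT → GTT | Ala → Val |
| plcB | Rv2350c | T → G | 2630206 | GTC → GGC | Val → Gly |
| plcB | Rv2350c | G → A | 2630211 | GGC → AGC | Gly → Ser |
| plcB | Rv2350c | A → G | 2630215 | AAG → AGG | Lys → Arg |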
TABLE A4
Mycobacterium tuberculosis virulence genes containing the highest number of SNPs from study isolates showing SNP codon‐specific changes
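| Gene | Rv# | SNP | Position | Codon change | AA change |
| --- | --- | --- | --- | --- | --- |
| htrA2 | Rv0983 | T → C | 1100234 | CCT → CCC | Synonymous |
| ctpV | Rv0969 | C → A | 1079927 | ACC → ACA | Synonymous |
| pks12 | Rv2048c | G → C | 2296042 | GGT → CGT | Gly → Arg |
| pks12 | Rv2048c | G → T | 2297287 | TGC → TTC | Cys → Phe |
| pks12 | Rv2048c | A → G | 2300237 | CCA → CCG | Synonymous |
| pks12 | Rv2048c | A → T | 2300546 | CGA → CGT | Synonymous |
| pks12 | Rv2048c | T → G | 2300552 | TGT → TGG | Synonymous |
| pstA1 | Rv0930 | C → T | 1037911 | GCG → GTG | Ala → Val |
| pstA1 | Rv0930 | T → C | 1037012 | AAT → AAC | Synonymous |
| mce1 | Rv0166 | C → T | 196642 | ACC → ATC | Thr → Ile |
| pks7 | Rv1661 | T → G | 1875544 | GTT → GGT | Val → Gly |
| dosT | Rv2027c | C → T | 2273627 | CCC → CCT | Synonymous |
| pks5 | Rv1527c | G → A | 1724120 | AGG → AGA | Synonymous |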
DISCUSSION
In this study, we showed that possible transmission relationships do exist between numerous M. tuberculosis isolates collected from patients presenting with pulmonary TB symptoms in a defined geographic region (the Rubaga Division of Kampala, Uganda) based on genome sequence comparisons. One hundred and thirteen isolates were included in the SNP analysis and grouped into distinct MTBC lineages. According to the SNP analysis, using a threshold of ≤12 SNPs as indicative of a transmission pair, transmission pairs were found in every lineage containing at least two isolates, with lineage 4 having the highest frequency, the most transmission pairs, and the most isolates. This should be expected as lineage 4 is the dominant lineage present in Uganda, followed by lineage 3 (Wampande et al., 2015).

When a pairwise SNP matrix is generated to visualize the isolates (Figure 6), clear relationships can be seen in the connections of the isolates to each other. These data not only show possible transmission of M. tuberculosis isolates between individuals, but the transmission networks identified, once combined with epidemiological data, will allow public health interventions to be implemented in this region for social gatherings and other establishments that are frequented by individuals transmitting TB.

This type of study also allows the correlation of SNPs in specific genes that may translate into functional differences in the resulting products and thus alterations in virulence phenotypes including transmission efficiency. Even though our overall understanding of many virulence factors expressed by M. tuberculosis is limited, some gene functions are fairly well defined and this type of analysis can add to that understanding. For example, multiple mutations in the genes plcA and plcB, especially a nonsense mutation in the latter, raise the question of a survival advantage to the bacteria. 
The plcABCD gene family encodes phospholipase C enzymes that play a role in pathogenesis by cleaving phospholipids during intracellular replication and trafficking during acute infection (Talarico et al., 2005). The gene products have also been shown to have sphingomyelinase activity, which catalyzes the hydrolysis of sphingomyelin and can interfere with the host inflammatory response, aiding the infection (Castro‐Garza et al., 2016). Alteration and/or inactivation of these genes, as observed in our study isolates, could potentially modify virulence to decrease lung damage and prolong a less severe disease stage for the host. An example of this concept was shown for Pseudomonas aeruginosa: wild‐type infection caused significant lung function impairment and rapid death of the host animal (Wargo et al., 2011), whereas the effects of infection with a phospholipase C mutant strain were less severe, potentially permitting longer co‐survival of pathogen and host.
Several future studies can be performed based on the data generated in this study. For example, the project protocol required patients to give a minimum of two sputum samples; however, the relationship between the isolates found in samples from the same person was not analyzed. Future studies should therefore analyze SNP differences between isolates collected from the same patient to determine within‐patient differences in the M. tuberculosis genomes from these infected individuals. This could help determine whether a person carries more than one strain of M. tuberculosis during infection in that region or whether transmission occurred from multiple individuals.
One limitation of this study is that it was conducted in a single Division of Kampala, Rubaga. Rubaga was chosen for this study for several reasons. First, Sekandi et al. established that Rubaga is an area of high tuberculosis disease burden (Sekandi et al., 2014).
Second, given the high disease burden, high levels of transmission are also expected. Third, the principal investigators have an established working relationship with the local community, the community health system, and political leaders. Lastly, because of this established relationship, the investigators have the trust of the community. Given this geographical limitation, we suggest that this type of analysis be expanded beyond the Rubaga Division to identify more transmission networks where interventions can be incorporated and to make the data generalizable to more regions and potentially to the entire country.
Currently, few countries have the capability to whole‐genome sequence every M. tuberculosis isolate to better define transmission patterns and thus inform national public health policy. Additionally, the minimum number or percentage of isolates that must be sequenced in a region or country to determine the most accurate transmission model has not been established. Thus, in most TB endemic and non‐endemic areas of the world, smaller studies like this one are generating local transmission models as we plan for more expansive future programs (Gurjav et al., 2016).
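To make the ≤12 SNP criterion discussed above concrete, the following is a minimal Python sketch of how candidate transmission pairs might be flagged from pairwise SNP distances. It is an illustration only and not the pipeline used in this study: the function names (`snp_distance`, `transmission_pairs`), the data layout (each isolate as a mapping from reference position to variant base), and the toy isolates are all hypothetical, and positions absent from a mapping are assumed to match the reference.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Threshold taken from the text: <=12 SNPs suggests a possible transmission pair.
SNP_THRESHOLD = 12

# Hypothetical layout: reference position -> variant base called at that position.
Isolate = Dict[int, str]


def snp_distance(a: Isolate, b: Isolate) -> int:
    """Count positions at which two isolates differ.

    A position missing from one isolate is assumed to match the reference,
    so it differs from any variant call in the other isolate.
    """
    positions = set(a) | set(b)
    return sum(1 for p in positions if a.get(p) != b.get(p))


def transmission_pairs(
    isolates: Dict[str, Isolate], threshold: int = SNP_THRESHOLD
) -> List[Tuple[str, str, int]]:
    """Return (isolate, isolate, distance) for every pair within the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(isolates), 2):
        d = snp_distance(isolates[name_a], isolates[name_b])
        if d <= threshold:
            pairs.append((name_a, name_b, d))
    return pairs


# Toy data (invented for illustration; not study isolates).
toy_isolates = {
    "iso_A": {100: "T", 250: "G"},
    "iso_B": {100: "T", 250: "G", 900: "C"},
    "iso_C": {p: "A" for p in range(1000, 1020)},
}

if __name__ == "__main__":
    for name_a, name_b, d in transmission_pairs(toy_isolates):
        print(f"{name_a} - {name_b}: {d} SNPs")
```

In this toy example only iso_A and iso_B fall within the threshold. A real analysis would also need to handle lineage assignment, missing or low-quality calls, and the epidemiological context before any pair is interpreted as transmission.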
The study was approved by the University of Georgia Institutional Review Board, the Higher Degrees Research and Ethics Committee at Makerere University School of Public Health, and the Uganda National Council for Science and Technology.
Authors: Marina A Forrellad; Laura I Klepp; Andrea Gioffré; Julia Sabio y García; Hector R Morbidoni; María de la Paz Santangelo; Angel A Cataldi; Fabiana Bigi Journal: Virulence Date: 2012-10-17 Impact factor: 5.882
Authors: Suani T R Pinho; Susan M Pereira; José G V Miranda; Tonya A Duarte; Joilda S Nery; Maeli G de Oliveira; M Yana G S Freitas; Naila A De Almeida; Fabio B Moreira; Raoni B C Gomes; Ligia Kerr; Carl Kendall; M Gabriela M Gomes; Theolis C B Bessa; Roberto F S Andrade; Mauricio L Barreto Journal: Tuberculosis (Edinb) Date: 2020-10-24 Impact factor: 3.131
Authors: Suzanne Verver; Robin M Warren; Zahn Munch; Madalene Richardson; Gian D van der Spuy; Martien W Borgdorff; Marcel A Behr; Nulda Beyers; Paul D van Helden Journal: Lancet Date: 2004-01-17 Impact factor: 79.321
Authors: Jorge Castro-Garza; Francisco González-Salazar; Frederick D Quinn; Russell K Karls; Laura Hermila De La Garza-Salinas; Francisco J Guzmán-de la Garza; Javier Vargas-Villarreal Journal: Rev Argent Microbiol Date: 2016-03-03 Impact factor: 1.852
Authors: Juliet N Sekandi; Sarah Zalwango; Leonardo Martinez; Andreas Handel; Robert Kakaire; Allan K Nkwata; Amara E Ezeamama; Noah Kiwanuka; Christopher C Whalen Journal: BMC Infect Dis Date: 2015-08-21 Impact factor: 3.090
Authors: Ree M Meertens; Vivian M J Van de Gaar; Maitta Spronken; Nanne K de Vries Journal: BMC Med Inform Decis Mak Date: 2013-12-18 Impact factor: 2.796