Literature DB >> 35987561

Genomic characteristics of two most widely used BCG vaccine strains: Danish 1331 and Pasteur 1173P2.

Mahla Asadian1, Seyed Mehdi Hassanzadeh2, Azadeh Safarchi3, Masoumeh Douraghi4.   

Abstract

BACKGROUND: Bacillus Calmette-Guérin (BCG) refers to a group of vaccine strains with unique genetic characteristics. BCG is the only available vaccine for preventing tuberculosis (TB). Genetic and biochemical variations among the BCG vaccine strains have been considered as one of the significant parameters affecting the variable protective efficacy of the vaccine against pulmonary tuberculosis. To track genetic variations, here two vaccine strains (Danish 1331 and Pasteur 1173P2) popularly used according to the BCG World Atlas were subjected to a comparative analysis against the Mycobacterium tuberculosis H37Rv, Mycobacterium bovis AF2122/97, and Mycobacterium tuberculosis variant bovis BCG str. Pasteur 1173P2 reference genomes. Besides, the presence or absence of the experimentally verified human T cell epitopes was examined.
RESULTS: Only two variants were identified in BCG Danish 1331 that have not been reported previously in any BCG strains with the complete submitted genome yet. Furthermore, we identified a DU1-like 14,577 bp region in BCG Danish 1331; The duplication which was previously seemed to be exclusive to the BCG Pasteur. We also found that 35% of the T cell epitopes are absent from both strains, and epitope sequences are more conserved than the rest of the genome.
CONCLUSIONS: We provided a comprehensive catalog of single nucleotide polymorphisms (SNPs) and short insertions and deletions (indels) in BCG Danish 1331 and BCG Pasteur 1173P2. These findings may help determine the effect of genetic variations on the variable protective efficacy of BCG vaccine strains.
© 2022. The Author(s).

Entities:  

Keywords:  BCG vaccine; Danish 1331; Genomic analysis; Pasteur 1173P2

Mesh:

Substances:

Year:  2022        PMID: 35987561      PMCID: PMC9392950          DOI: 10.1186/s12864-022-08826-9

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   4.547


Background

Bacillus Calmette-Guérin (BCG), an attenuated derivative of Mycobacterium bovis (M. bovis), is the only vaccine used against tuberculosis. It is obtained through 230 consecutive in vitro passages over 13 years at the Pasteur Institute of Lille in 1921 [1]. The vaccine has been used to immunize more than four billion people over a century, which has made BCG the most widely used vaccine [1]. In 1924, the primary BCG vaccine was distributed to different countries, and the continuous subcultures under different conditions led to the emergence of various vaccine strains. Until the development of the seed lot system in the 1960s, BCG vaccine strains were exposed to more than 1000 passages in different laboratories that resulted in genotypic and phenotypic differences among them [2, 3]. Although the role of BCG has been proved in the prevention of disseminated forms of tuberculosis in children [4, 5], its protective efficacy is highly variable against pulmonary tuberculosis in adults (0%-80%) [6, 7]. Several factors, such as the usage of different vaccine strains with genetic differences, contact with non-tuberculosis mycobacteria (NTMs), host genetic factors, and diversities in the circulating Mycobacterium tuberculosis (M. tuberculosis) strains are associated with variability in protection [8-11]. While there is evidence about the evolution of the BCG vaccine strains since 1921 [1, 2], the genetic differences among these strains as one of the most important factors that might affect immunogenicity, viability, virulence, and thus variable efficacy have been little investigated. The genetic events may be shared among all strains and appear to be involved in attenuation of the primary strain or may be observed in each of the vaccine strains exclusively and be responsible for their over-attenuation [12-18]. Therefore, there is a need to examine the frequency of these genetic differences and their relationship with the phenotype of vaccines used by comparing the whole genome sequences of the BCG vaccine strains. In 2009 and 2010, to standardize the vaccine production, Danish 1331, Tokyo 172–1, Russia BCG-1, and Moreau-RJ strains were introduced by the WHO Expert Committee on Biological Standardization (ECBS) as BCG reference strains [19, 20]. However, the chosen vaccine strain is different in the various countries and there is insufficient evidence to prove which strain provides the best protective efficacy against tuberculosis [21]. According to the latest update of the BCG World Atlas in 2020, Danish 1331 (16.6%), Pasteur 1173P2 (9.2%) and, Tokyo 172 (7.3%) strains are the most globally used strains for vaccine production, respectively [21]. This study aimed to analysis the complete genome sequences of the two most widely used BCG vaccine strains, including the reference strain introduced by WHO (Danish 1331) and the BCG strain used in Iran (Pasteur 1173P2). The main hypothesis of this study was to identify the new genetic variations in these two BCG strains that have not been reported so far in comparison to the reference strains.

Results

Genomic features for BCG Pasteur 1173P2 and BCG Danish 1331

The quality control of the assemblies estimated the total length of the contigs to be 4.2 Mbp for Pasteur 1173P2 and 4.3 Mbp for Danish 1331, and the GC content approximately 65% for both strains. A total of 4060 genes for Pasteur 1173P2 and 4037 for Danish 1331 were identified through scaffolds annotation (Fig. 1a-b). Total coding sequences (CDSs) and tRNA genes were found 4006 and 50 for Pasteur 1173P2 and 3982 and 51 for Danish 1331, respectively. Three genes for rRNA (5S, 16S, and 23S) and one for tmRNA (ssrA) were identified in both strains (Fig. 1a-b). In comparison to M. tuberculosis H37Rv, 58 and 31 genes encoding PPE and PE family proteins were recognized in both strains, respectively. In addition, 58 genes encoding proteins of the PE_PGRS subfamily were identified in Pasteur 1173P2 and 59 in Danish 1331. Among the regions of difference (RDs) that can differentiate between these two late strains, RD14 and N-RD18 were identified in Pasteur 1173P2 while not exist in Danish 1331. Moreover, we identified a DU1-like 14,577 bp region in Danish 1331.
Fig. 1

General genomic features of the BCG Pasteur 1173P2 and BCG Danish 1331. a Circular representation of the BCG Pasteur 1173P2 contigs using Proksee (https://proksee.ca). The scale is shown in megabases on the black central circle. Moving inward, two outer violet circles show forward and reverse strand CDSs, respectively. Some genes are shown on the outer violet circle with the Proksee's default. The tRNAs (orange arrows), rRNAs (light blue arrows), tmRNA (red arrow) and two CRISPR sequences (light green arrows adjacent each other) are shown in CDSs circles. The next circle shows GC content (dark blue) followed by the GC skew (dark green and pink). b Circular representation of the BCG Danish 1331 contigs using Proksee (https://proksee.ca). The scale is shown in megabases on the black central circle. Moving inward, two outer dark blue circles show forward and reverse strand CDSs, respectively. Some genes are shown on the outer dark blue circle with the Proksee's default. The tRNAs (orange arrows), rRNAs (light blue arrows), tmRNA (red arrow) and two CRISPR sequences (light green arrows) are shown in CDSs circles. The next circle shows GC content (dark green) followed by the GC skew (violet and pink). Category 0 > Virulence, detoxification, adaptation. Category 1 > Lipid metabolism. Category 2 > Information pathways. Category 3 > Cell wall and cell processes. Category 5 > Insertion sequences and phages. Category 6 > PE/PPE. Category 7 > Intermediary metabolism and respiration. Category 8 > Unknown. Category 9 > Regulatory proteins. Category 10 > Conserved hypothetical proteins. Category 16 > Conserved hypothetical with an orthologous in M. bovis

General genomic features of the BCG Pasteur 1173P2 and BCG Danish 1331. a Circular representation of the BCG Pasteur 1173P2 contigs using Proksee (https://proksee.ca). The scale is shown in megabases on the black central circle. Moving inward, two outer violet circles show forward and reverse strand CDSs, respectively. Some genes are shown on the outer violet circle with the Proksee's default. The tRNAs (orange arrows), rRNAs (light blue arrows), tmRNA (red arrow) and two CRISPR sequences (light green arrows adjacent each other) are shown in CDSs circles. The next circle shows GC content (dark blue) followed by the GC skew (dark green and pink). b Circular representation of the BCG Danish 1331 contigs using Proksee (https://proksee.ca). The scale is shown in megabases on the black central circle. Moving inward, two outer dark blue circles show forward and reverse strand CDSs, respectively. Some genes are shown on the outer dark blue circle with the Proksee's default. The tRNAs (orange arrows), rRNAs (light blue arrows), tmRNA (red arrow) and two CRISPR sequences (light green arrows) are shown in CDSs circles. The next circle shows GC content (dark green) followed by the GC skew (violet and pink). Category 0 > Virulence, detoxification, adaptation. Category 1 > Lipid metabolism. Category 2 > Information pathways. Category 3 > Cell wall and cell processes. Category 5 > Insertion sequences and phages. Category 6 > PE/PPE. Category 7 > Intermediary metabolism and respiration. Category 8 > Unknown. Category 9 > Regulatory proteins. Category 10 > Conserved hypothetical proteins. Category 16 > Conserved hypothetical with an orthologous in M. bovis

Genetic variations in comparison to M. tuberculosis H37Rv

The mean nucleotide variations (SNPs and Indels below 100 bp) was 549.7 for Pasteur 1173P2 and 559.9 for Danish 1331 per megabase of contigs length. In comparing the genome sequence of M. tuberculosis H37Rv with those of Pasteur 1173P2 and Danish 1331, a total of 2289 SNPs was identified (Table S1). The 2155 SNPs were shared in both strains, while 56 were found in Pasteur 1173P2 and 78 in Danish 1331. From 1992 SNPs occurred within 1328 genes, 63.8% are non-synonymous. Non-synonymous SNPs (nSNP) were found in 28 genes that resulted in translation shift compared to their homolog in M. tuberculosis H37Rv. Substitution of X404Q in surface-associated esterase LipC leads to a longer protein and nonsense mutations in ugpB (Rv2833c) and lpdA (Rv3303c) result in the lack of functional proteins. Analysis of strain-specific SNPs revealed only a nonsense mutation in galE2 (Rv0501) and an nSNP in Rv1830 in the Danish 1331. Consideration of genes encoding regulatory proteins in both strains revealed a mutation in virS (Rv3082c) that could affect the transcription of the mymA operon (Rv3089-Rv3083). Mutations were also found in crp (Rv3676), pstP (Rv0018c), and fhaA -as a mycobacterial core gene- (Rv0020c). Members of the mycobacterial serine/threonine protein kinases family (PknH, PknF, PknL, and PknK) and two-component systems (SenX3-RegX3, MprAB, KdpDE, etc.) were also identified in the regulators containing variants. pks12, as the largest open reading frame (ORF) in the mycobacterial genome, showed the most nSNPs among genes. In total, 47 nSNPs were detected in genes specifically required for mycobacterial in vivo survival in both strains (Table 1). Non-synonymous mutations were also identified in sigma factors involved in the initiation of replication. We found the substitution of guanine residue by adenine in the initiation codon of sigK (Rv0445c) and a nonsense mutation in Rv3687c encoding the anti-sigma factor antagonist RsfB. SNPs were also found in loci encoding ribosomal proteins in both strains. These mutations, which cause allelic differences between M. tuberculosis and BCG, lead to an altered amino acid at only four loci.
Table 1

Required genes for mycobacterial in vivo growth that contain non-synonymous SNPs in Pasteur 1173P2 and Danish 1331

NoH37Rv locus tagGene nameProductNon-synonymous amino acid substitutionPasteur 1173P2Danish 1331
1Rv0101nrpPeptide synthetase NrpL1365M +  + 
2Rv0169mce1AMce family protein Mce1AS313A, P359S +  + 
3Rv0170mce1BMce family protein Mce1BI179T +  + 
4Rv0171mce1CMce family protein Mce1CE212D +  + 
5Rv0176-Mce associated transmembrane proteinN285S, S291A +  + 
6Rv0218-Transmembrane proteinC316R, D413N +  + 
7Rv0490senX3Two component sensor histidine kinase SenX3F109S +  + 
8Rv0636hadB(3R)-hydroxyacyl-ACP dehydratase subunit HadBT54A +  + 
9Rv0643cmmaA3Methoxy mycolic acid synthase MmaA3G98D +  + 
10Rv1028ckdpDSensor protein KdpDN776D, P368S, G295D, P83S +  + 
11Rv1109c-Hypothetical proteinA147T +  + 
12Rv1128c-Hypothetical proteinE270G +  + 
13Rv1204c-Hypothetical proteinL484I +  + 
14Rv1224tatBSec-independent protein translocase protein TatBW8G +  + 
15Rv1244lpqZLipoprotein LpqZQ242K +  + 
16Rv1338murIGlutamate racemaseR154L +  + 
17Rv1371-Membrane proteinI368V +  + 
18Rv1460-Transcriptional regulatorI198F, A266V +  + 
19Rv1640clysXBifunctional lysine-tRNA ligase/phosphatidylglycerol lysyltransferaseD944G, D769E +  + 
20Rv2048cpks12Polyketide synthaseA4047S, P3095L, S2964R H2147Q, G1865S +  + 
21Rv2072ccobLPrecorrin-6Y C(5,15)-methyltransferaseL205P +  + 
22Rv2275-Cyclo(L-tyrosyl-L-tyrosyl) synthaseE261A +  + 
23Rv2359zurZinc uptake regulation proteinH64R +  + 
24Rv2374chrcAHeat-inducible transcription repressor HrcAR79Q + -
25Rv2388chemNOxygen-independent coproporphyrinogen III oxidaseA184T +  + 
26Rv2483cplsCBifunctional L-3 phosphoserine phosphatase/1-acyl-sn-glycerol-3-phosphate acyltransferaseC189G +  + 
27Rv2502caccD1Acetyl/propionyl-CoA carboxylase subunit betaF343L, G77S +  + 
27Rv2692ceoCTRK system potassium uptake protein CeoCI133V +  + 
29Rv2696c-Hypothetical proteinD164N +  + 
30Rv2702ppgKPolyphosphate glucokinaseI203T +  + 
31Rv2813-Hypothetical proteinI76V +  + 
32Rv2845cproSProline–tRNA ligaseA232T, H177R +  + 
33Rv2936drrADaunorubicin ABC transporter ATP-binding protein DrrAH309D +  + 
34Rv2981cddlAD-alanine–D-alanine ligaseT365A +  + 
35Rv3042cserB2Phosphoserine phosphatase SerBG116E, A70S +  + 
36Rv3061cfadE22Acyl-CoA dehydrogenase FadE22S497C, K488E +  + 
37Rv3087-Diacylglycerol O-acyltransferaseL447V +  + 
38Rv3114-Hypothetical proteinS11P +  + 
39Rv3277-Transmembrane proteinS272L +  + 
40Rv3335c-Integral membrane proteinA86V +  + 
41Rv3371-Diacylglycerol O-acyltransferaseR339G, I368V +  + 
42Rv3497cmce4CMce family protein Mce4CT46P +  + 
43Rv3551-CoA-transferase subunit alphaA7S +  + 
44Rv3563fadE32Acyl-CoA dehydrogenase FadE32Q105R, W275S +  + 
45Rv3616cespAESX-1 secretion-associated protein EspAT192I, A4V +  + 
46Rv3805caftBTerminalbeta-(1- > 2)-arabinofuranosyltransferaseI327V +  + 
47Rv3868eccA1ESX-1 secretion system protein EccA1A243V +  + 
48Rv3910-Peptidoglycan biosynthesis proteinV480A +  + 
Required genes for mycobacterial in vivo growth that contain non-synonymous SNPs in Pasteur 1173P2 and Danish 1331 A total of 222 indels below 100 bp were detected, of which 199 occurred in both strains, 14 in Danish 1331 and nine in Pasteur 1173P2 (Table S2). In total, deletions accounted for 53.1% and insertions 46.9%. Of the 154 indels in the genes, 115 led to frameshift mutations and abnormally short and long proteins. Danish 1331 was found as a phoR mutant due to a 10 bp deletion, whereas this loss was not detected for Pasteur 1173P2. Indels identified in both strains included frameshift insertions in mce1R (Rv0165c) and pknD (Rv0931c) encoding transcriptional regulators belonging to the GntR family and serine/threonine protein kinases. As well, the phoT (Rv0820) and pstB (Rv0933), which encode members of the ABC transporter complex PstSCAB, nrdZ (Rv0570), recB (Rv0630c), treZ (Rv1562c), and stp (Rv2333c) have shifted frames. Both strains were identified as sigM mutants. Furthermore, in the study of MIRU-VNTR loci, we found that locus 580 located in the intergenic region of the genes encoding the components of the SenX3-RegX3 two-component regulatory system had two and three 77 bp repeats in Pasteur 1173P2 and Danish 1331, respectively. Assessment of the distribution of the variants in the functional classification of genes encoding a protein in M. tuberculosis using TubercuList (http://genolist.pasteur.fr/TubercuList/) showed that SNPs rate in genes involved in intermediate metabolism and respiration is higher in both strains (Fig. 2a). Most indels were detected in Pasteur 1173P2 and Danish 1331 in genes encoding proteins involved in cellular processes and conserved hypotheticals, respectively (Fig. 2b).
Fig. 2

a SNPs rate in the functional classification of genes encoding a protein in M. tuberculosis. b Indels rate in the functional classification of genes encoding a protein in M. tuberculosis. Category 0 > Virulence, detoxification, adaptation. Category 1 > Lipid metabolism. Category 2 > Information pathways. Category 3 > Cell wall and cell processes. Category 5 > Insertion sequences and phages. Category 6 > PE/PPE. Category 7 > Intermediary metabolism and respiration. Category 9 > Regulatory proteins. Category 10 > Conserved hypothetical proteins. Category 16 > Conserved hypothetical with an orthologous in M. tuberculosis

a SNPs rate in the functional classification of genes encoding a protein in M. tuberculosis. b Indels rate in the functional classification of genes encoding a protein in M. tuberculosis. Category 0 > Virulence, detoxification, adaptation. Category 1 > Lipid metabolism. Category 2 > Information pathways. Category 3 > Cell wall and cell processes. Category 5 > Insertion sequences and phages. Category 6 > PE/PPE. Category 7 > Intermediary metabolism and respiration. Category 9 > Regulatory proteins. Category 10 > Conserved hypothetical proteins. Category 16 > Conserved hypothetical with an orthologous in M. tuberculosis

Genetic variations in comparison to M. bovis AF2122/97

In comparison to the M. bovis AF2122/97 genomic sequence, a set of 728 high-quality SNPs was identified (Table S3). Of these, 645 SNPs occurred in genes and 83 SNPs located in intergenic regions. 672 SNPs were found in both strains, 34 SNPs in Danish 1331 and 22 in Pasteur 1173P2 alone. The 65.6% of SNPs in genes were associated with non-synonymous amino acid substitutions and 34.4% with synonymous. Ten nSNPs cause a change in protein length; Of them, seven SNPs result in the longer protein and three lead to the shorter ones. Several SNPs, including mutation in galE2 (Mb0513) in Danish 1331, were also identified compared to the M. tuberculosis H37Rv. In addition, mutations in lprL (Mb0609) and fadB3A (Mb1742) lead to longer products than their homolog in M. bovis. The missense mutations in mmaA3 (Mb0662c) and pykA (Mb1643) were also identified in both strains. Out of 74 indels found, 62 in both strains, five in the Pasteur 1173P2, and seven in the Danish 1331 were identified (Table S4). The 55.4% of indels were insertions and 44.6% deletions. Fifty-seven indels occurred in genes, 16 in intergenic regions, and one insertion in a pseudogene. Among the indels in the genes, 42 were frameshift mutations with one to ten base pair insertions or deletions. Pasteur 1173P2-specific indels including 10 and 15 bp deletions and a single base pair insertion in the genes, and two frameshift insertions in the intergenic regions, which were identified when compared to the M. tuberculosis H37Rv. Moreover, the Danish 1331-specific indels were three insertions including one and 5 bp with three deletions including one and 10 bp in the genes and a 3 bp insertion in the intergenic region. Some of these were common with the identified indels compared to the M. tuberculosis H37Rv. Analyzes also showed frameshifts in fusA2a (Mb0125C), ugpB (MB2857C), ugpAa (Mb2860C), and glpK (Mb3722C) (Involved in glycerol catabolism) compared to M. bovis AF2122/97. Using BoviList (http://genolist.pasteur.fr/BoviList/), it was shown that the rate of SNPs and indels in the genes involved in the intermediate metabolism and respiration, and the genes encoding proteins involved in the cell wall and cell processes is higher in both strains, respectively (Fig. 3a-b).
Fig. 3

a SNPs rate in the functional classification of genes encoding a protein M. bovis. b Indels rate in the functional classification of genes encoding a protein in M. bovis

a SNPs rate in the functional classification of genes encoding a protein M. bovis. b Indels rate in the functional classification of genes encoding a protein in M. bovis

Genetic variations in comparison to M. tuberculosis variant bovis BCG str. Pasteur 1173P2

A total of 37 variants (30 SNPs and 7 Indels) were identified (Table S5). The identified SNPs were three (a synonymous SNP [sSNP] and two SNPs in the intergenic region) for the Pasteur 1173P2 and 30 (14 nSNPs and seven sSNPs in the genes, and nine in the intergenic regions) for Danish 1331. Most of the SNPs identified in Danish 1331 compared to Pasteur 1173P2 were also found as Pasteur-SNPs compared to M. bovis AF2122/97. A missense mutation was observed in RS02785, encoding the oxidoreductase of the SDR family in Danish 1331. An insertion was shared between two strains, which was in frame. Three indels in the genes showed frameshift mutations in Danish 1331. One and 10 bp deletions in RS20015 and phoR, respectively, along with a 10 bp insertion in RS08265, were also detected in Danish 1331 compared to M. bovis AF2122/97.

Presence or absence of antigenic epitopes

Here, we examined 486 experimentally verified human T cell epitopes in the M. tuberculosis H37Rv. Our findings showed that 172 epitopes (35.4%) were absent in both BCG strains (Table S6). Moreover, the Pasteur 1173P2 has lost four additional epitopes relative to Danish 1331, which is associated with the RD14. Most of the absent epitopes from both strains (133 epitopes, 77.3%) are related to the RD1, which was deleted from all BCG strains between 1908 and 1921 [1]. The 23 deleted epitopes were located in RD2 containing the immunogenic protein Mpt64, which is absent only in late strains (obtained after 1927) [1]. Other missing epitopes belonged to RD7 (n = 4), RD3 and RD10 (n = 3), RD13 (n = 2) and RD4, RD5, RD11, and IS1081 (n = 1). A total of 42 antigens in both strains, four in Pasteur 1173P2 and one in Danish 1331, have at least a genetic variation outside their epitopic regions (Table 2). While the antigenic epitopes appear to be conserved amino acid sequences, we found four amino acid substitutions (three nSNPs and one sSNP) in six epitopes in both strains (Table 3). A transmembrane protein (Protein ID: NP_216249.1) had the highest variant-carrying epitopes. A missense mutation in Rv1733c substitutes an uncharged hydrophilic amino acid with a positively charged one in position 68 of the protein chain.
Table 2

M. tuberculosis antigens containing variants in non-epitope sequences in two BCG strains

Epitope IDH37Rv locus tagGene nameProductGenetic variationPasteur 1173P2Danish 1331
2190Rv0171mce1CMce family protein Mce1CnSNPa +  + 
4002Rv3018cPPE46PPE family protein PPE46nSNP + -
4520Rv2627c-Hypothetical proteinnSNP +  + 
9474Rv2608PPE42PPE family protein PPE42nSNP +  + 
18,059Rv1291c-Hypothetical proteinsSNPb +  + 
24,566Rv1037cesxIESAT-6 like protein EsxInSNP + -
32,710Rv3467-Hypothetical proteinnSNPs +  + 
32,860Rv0670endEndonuclease IVsSNP +  + 
39,011Rv3804cfbpADiacylglycerol acyltransferase/mycolyltransferase Ag85AsSNP +  + 
45,757Rv0170mce1BMce family protein Mce1BnSNP +  + 
55,156Rv1945-Hypothetical proteinsSNPs & nSNPs +  + 
55,188Rv1641infCInitiation factor IF-3nSNP +  + 
55,191Rv3689-Transmembrane proteinnSNP +  + 
55,192Rv3378c-Diterpene synthasesSNP & nSNP +  + 
55,315Rv2823c-CRISPR-associated protein Cas10/Csm1Insertion +  + 
55,334Rv2476cgdhNAD-dependent glutamate dehydrogenasenSNP & Deletion +  + 
57,680Rv0174mce1FMce family protein Mce1FsSNPs +  + 
60,095Rv3714c-Hypothetical proteinsSNP +  + 
64,663Rv0169mce1AMce family protein Mce1AnSNPs +  + 
92,817Rv1886cfbpBDiacylglycerol acyltransferase/mycolyltransferase Ag85BnSNP +  + 
93,270Rv3839-Hypothetical proteinnSNP +  + 
99,857Rv2770cPPE44PPE family protein PPE44sSNP & nSNP +  + 
99,866Rv2770cPPE44PPE family protein PPE44sSNP & nSNP + -
118,590Rv2600-Integral membrane proteinDeletion +  + 
120,392Rv1866-Hypothetical proteinsSNP & nSNP +  + 
120,408Rv3883cmycP1Membrane-anchored mycosinsSNP & nSNP +  + 
120,481Rv0934pstS1Phosphate ABC transporter substrate-binding lipoprotein PstSnSNP +  + 
120,887Rv3736-AraC/XylS family transcriptional regulatorsSNP & nSNP +  + 
121,059Rv0755cPPE12PPE family protein PPE12sSNP & nSNP +  + 
125,165Rv1361cPPE19PPE family protein PPE19sSNPs & nSNP +  + 
126,028Rv3296lhrATP-dependent helicasenSNPs +  + 
126,912Rv0024-NLP/P60 family proteinDeletion +  + 
140,543Rv2006otsB1Trehalose-6-phosphate phosphatase OtsBsSNPs & nSNP +  + 
140,561Rv1997ctpFCation transporter ATPase FsSNP & nSNP +  + 
140,576Rv2780aldL-alanine dehydrogenaseDeletion +  + 
140,597Rv3499cmce4AMce family protein Mce4AsSNP + -
140,615Rv2531cadiAmino acid decarboxylasesSNP +  + 
140,617Rv2813-Hypothetical proteinnSNP +  + 
144,870Rv1769-Hypothetical proteinsSNP- + 
161,402Rv0787-Hypothetical proteinnSNP +  + 
168,735Rv1789PPE26PPE family protein PPE26sSNP +  + 
196,087Rv3343cPPE54PPE family protein PPE54sSNPs & nSNPs +  + 
229,047Rv1009rpfBResuscitation-promoting factor RpfBnSNPs +  + 
738,104Rv3616cespAESX-1 secretion-associated protein EspAnSNPs +  + 
851,000Rv1626-Two-component system transcriptional regulatorsSNP +  + 
857,468Rv3792aftAArabinofuranosyltransferasesSNP +  + 
1,081,150Rv0442cPPE10PPE family protein PPE10nSNPs +  + 

aNon-synonymous SNP

bSynonymous SNP

Table 3

M. tuberculosis antigens containing variants in epitope sequences in two BCG strains

Epitope IDH37Rv locus tagGene nameProductEpitope sequenceGenetic variationAmino acid substitutionPasteur 1173P2Danish 1331
20,707Rv3497cmce4CMce family protein Mce4CGKTYDAYFTDAGGITPGnSNPaThr > Pro +  + 
106,585Rv2628-Hypothetical proteinKVQSATIYQVTDRSHnSNPSer > Leu +  + 
120,511Rv0956purNPhosphoribosylglycinamide formyltransferase PurNETLHERIKVTERRLLVAAVAALATHsSNPbHis > His +  + 
153,959Rv1733c-Transmembrane proteinAAAGTAVQDSRSHVYAHQAQnSNPGln > His +  + 
155,973Rv1733c-Transmembrane proteinTVSLLTIPFAAAAGTAVQDSnSNPGln > His +  + 
434,619Rv1733c-Transmembrane proteinIPFAAAAGTAVQDSRSHVYAHQAQTRHPnSNPGln > His +  + 

aNon-synonymous SNP

bSynonymous SNP

M. tuberculosis antigens containing variants in non-epitope sequences in two BCG strains aNon-synonymous SNP bSynonymous SNP M. tuberculosis antigens containing variants in epitope sequences in two BCG strains aNon-synonymous SNP bSynonymous SNP Fifty-four antigenic epitopes from M. bovis AF2122/97 were also examined in both strains (Table S7). Our findings revealed that nine epitopes (16.6%) were deleted from both strains related to RD1. Of the 45 antigens present, only one non-polar to polar amino acid substitution was found to be outside the epitope sequence of Eis N-acetyltransferase.

Discussion

There is sufficient evidence that BCG strains have undergone variations in their genomes since they were derived from the parental BCG. Genetic differences among the BCG strains have been considered as one of the factors associated with the vaccine variable efficacy [22], but screening of these differences is a prerequisite for demonstrating the variation in the vaccine efficacy. In this study, we attempted to extract the genetic differences between BCG Pasteur 1173P2 and BCG Danish 1331 with M. tuberculosis H37Rv and M. bovis AF2122/97 based on whole genome sequencing (WGS) data. According to the latest update of the BCG World Atlas in 2020 [21], Danish 1331 is the most common BCG vaccine strain used in 27 countries (16.6%) worldwide. Pasteur 1173P2 is in the second place with a frequency of 15 countries (9.2%). However, there is no evidence as to which vaccine is superior to the others, and therefore different BCG strains are used in immunization programs around the world. In 2007, WGS of the Pasteur 1173P2 was performed using the Sanger sequencing (ABI3700) of the pUC19, pMAQ1b, and M13 libraries, and a whole-genome shotgun library prepared in the pCDNA2.1 [14]. The WGS of the Danish 1331 was also recently performed by combining a second (Illumina MiSeq) and third (PacBio) generation technologies [23]. Consistent with Brosch’s study [14], we reported 2211 and 694 SNPs in BCG Pasteur 1173P2 compared to the M. tuberculosis H37Rv and M. bovis AF2122/97, respectively. However, another study using NimbleGen detected 1010 SNPs between BCG Pasteur and M. tuberculosis H37Rv; Of them, 945 were correctly identified compared to the whole genome sequence [24]. In addition, the study showed that the NimbleGen method has a limited ability to identify SNPs. Unlike the previous study that reported 42 SNPs in Danish 1331 compared to the Pasteur 1173P2 [23], we found 30 SNPs. This discrepancy is probably due to the inability of short read-based sequencing techniques to identify variants in the PE_PGRS genes. In addition, large indels (i.e., RDs) cannot be accurately examined using short reads alone [23]. The ideal to identify variants and RDs is to use hybrid sequencing platforms that create long read along with short reads, which can minimize variants due to the sequencing errors and provide the correct report for genomic variations and large differences [23]. Due to the presence of inherently repetitive structures in the mycobacterium genome, the use of short reads alone may falsely identify these structures as large indels [23]. Therefore, to prevent incorrect reports, we only examined the RDs and duplications that were previously reported. Deletion of MB0097c-MB0098c, as a Danish determinant, was identified by the current study as described by Abdallah et al. [25]. Inconsistent, this deletion was not detected by Borgers in the Danish 1331 sequence [23]. We also identified this deletion in the Pasteur 1173P2 with different length, which was not reported by Abdallah et al. [25]. Furthermore, we detected RD Denmark/Glaxo, which truncated PPE33 (Rv1809) and removed Rv1810 (equivalent to MB1840 from M. bovis) for Danish 1331, as described by Abdallah et al. [25]. Consistent with the previous study [23], we identified a DU1-like region with a length less than Pasteur 1173P2 DU1. Borgers et al. reported that only Danish 1331 deposited as the WHO reference at the National Institute for Biological Standards and Control (NIBSC) contains this duplication [23]. DU1 has also been reported with different lengths in BCG China and Birkhaug [24]. Moreover, a triploid 7.2 kb DU1-like sequence that covered six genes and crossed the oriC region has been identified in a clinical BCG strain (BCG 3281) [26]. The presence of oriC in these duplications may indicate that this region is prone to duplicate. The effect of dnaA-dnaN (located in DU1) copies on the biology of BCG strains is not well elucidated [13]. Bedwell et al. identified two separate genetic populations in a commercial preparation of the BCG Copenhagen vaccine (a.k.a. BCG Danish 1331) which differed in the copy number of 77 bp repeat in the senX3-regX3 region (2 and 3 repeats) [27]. As reported previously by Borgers’s study [23], we also identified only three 77 bp repeats for BCG Danish 1331. In contrast, Magdalena et al. reported two repeats for a BCG Danish vaccine strain provided by M. Lagranderie (Institut Pasteur, Paris, France) [28]. This finding may indicate that the different strains of BCG Danish are in circulation. As the previous studies [27, 28], two repeats were reported at senX3-regX3 region for Pasteur 1173P2 by the present study. We described a range of variants in BCG Pasteur 1173P2 and BCG Danish 1331 in this study. All detected variants (SNPs and indels below 100 bp) were reported in BCG strains with the complete genome in the NCBI (variable from one strain to all). Examination of strain-specific variants in other strain (Pasteur 1173P2 and Danish 1331 compared to each other) using contigs alignment with the reference genome showed that: 1) the identified variant in one strain is adjacent to the contigs border in the other strain and is not detectable. 2) The variant in one strain is associated with a deleted region in another. 3) The variant in one strain is intact in the other one. We found the single nucleotide substitutions C592620A and G2074890T (position in M. tuberculosis H37Rv) in Danish 1331. While the former leads to an amber termination codon (TAG) at position 323 of the amino acid sequence and premature termination, the latter does not alter the structure of the protein produced. SNP in galE2 (C592620A), which is involved in the galactose metabolism (http://genolist.pasteur.fr/TubercuList/), has not been reported in other vaccine strains. In addition to Danish, which shows a 10 bp deletion in phoR, other studies have reported that the BCG Glaxo, Sweden, Birkhaug, and Frappier are defective in phoR by frameshift mutations with different base pairs [24]. While several studies have reported that phoP plays an important role in M. tuberculosis virulence, the role of phoR in virulence is less understood [29]. A previous study reported mutations in nine genes required for mycobacterial in vivo survival in 13 BCG strains [30], all identified in this study as well. Except for the identified common SNPs between the 13 BCG strains, we found that the kdpD, lysX, pks12, and espA genes in Pasteur 1173P2 and Danish 1331 carry SNPs that are not present in the other strains. Considering the role of these genes, Pasteur 1173P2 and Danish 1331 may be less resistant to environmental conditions [30], which requires further investigation. In screening T cell epitopes, we found that several epitopes were lost in Pasteur 1173P2 and Danish 1331 compared to M. tuberculosis H37Rv. These findings differ from the results of Zhang et al., which reported 295 epitopes in 13 BCG strains, 117 lost epitopes through deletion of RD1, and 28 of RD2 [31]. A study of 491 experimentally confirmed human T cell epitopes in 21 Mycobacterium tuberculosis Complex (MTBC) strains reported that these epitopes are highly conserved compared to the remaining MTBC genomes [32]. They found that this observation contradicts studies in pathogenic viruses, bacteria, and protozoa that showed the genes encoding antigens are highly variable to escape host immunity [33-35]. Although we showed that most of the present epitopes in two strains were as highly conserved as the M. tuberculosis strains, we identified amino acid substitutions in six T cell epitopes associated with four antigens compared to the M. tuberculosis H37Rv. Replacements in 15 epitopes of nine antigens have been reported by Copin et al., four of which are shared with the present study, including PstS1, Fibronectin-binding protein B/antigen 85B, ESX-1 secretion-associated protein EspA and diacylglycerol acyltransferase/mycolyltransferase Ag85A proteins [36]. All these were reported by Copin et al. as antigens containing variants in their epitopes [36]. Inconsistent, they were identified with variants in their non-epitopic regions in the current study. Moreover, Zhang et al. did not identify mutations in the epitopes of 13 BCG strains [31]. We noted that all absent epitopes in both strains were associated with RDs previously described, and no new missing epitope was identified. We also found that M. tuberculosis epitopes present in the BCG strains are highly conserved, probably due to the low rate of genetic variations during in vitro evolution that causes sequence diversity in these regions [36].

Conclusions

We presented a complete report of the variants (SNPs and short indels) in two of the most widely used BCG vaccines compared to the three reference strains using WGS. These findings may be helpful in a more accurate understanding of phenotypic differences following genetic differences between vaccine strains. In addition, this study may provide evidence for investigating the protective efficacy induced by different BCG strains. This study revealed that the present experimentally confirmed human T cell epitopes are highly conserved in both BCG strains and do not reflect any ongoing evolutionary. However, further studies are required to find the association of genetic variations with vaccine variable efficacy and determine the most effective vaccine strain.

Methods

Vaccine strains, DNA isolation and whole genome sequencing

Two strains of the BCG vaccine, Pasteur 1173P2 and Danish 1331, were prepared from Pasteur Institute (Paris, France) and Statens Serum Institute (Copenhagen, Denmark), respectively. The culture of freeze-dried seed lots was carried out on the Sauton broth medium, which is routinely used to produce the vaccine. Genomic DNA preparation was performed according to the protocol described previously [37]. The identity of the vaccine strains was determined using specific primers of DU1 and RD14 for BCG Pasteur 1173P2 [14, 16] and DU2-III for BCG Danish 1331 [14]. Genomic libraries were constructed using a modified Nextera Flex protocol (Hackflex) [38] and sequenced on an Illumina NovaSeq 6000 instrument using an S4 flow cell to generate 150 bp paired-end reads with an average coverage of 133-fold at ithree institute (University of Technology Sydney, Sydney, Australia).

Genome assembly, alignment, and variant calling

Raw sequence reads were trimmed as a part of the Shovill pipeline (https://github.com/tseemann/shovill) using Trimmomatic (v0.39) [39] to remove adapter sequences and low-quality ends (Phred score < 20) and assessed for quality by FastQC (v0.11.9) [40]. The reads were de novo assembled to construct contigs using Shovill and SPAdes genome assembler (v3.13.1) [41] and put in the GenBank under BioProject No. PRJNA691088 (Pasteur 1173P2 BioSample No.: SAMN17277795; Danish 1331 BioSample No.: SAMN17277808). Filtration was performed [42] to maintain contigs with a size over 1000 bp before evaluating the quality of the assemblies with QUAST (v.5.0.2) [43]. The generated contigs were aligned and rearranged with the M. tuberculosis H37Rv (Accession No.: NC_000962.3), M. bovis AF2122/97 (Accession No.: NC_002945.4), and M. tuberculosis variant bovis BCG str. Pasteur 1173P2 (Accession No.: NC_008769.1) as the reference genomes by Mauve aligner (v20150226) using progressiveMauve algorithm (v20150226) [44]. Annotation was performed using Prokka (v1.12) [45] and Rapid Annotation using Subsystem Technology (RAST) [46, 47]. Raw reads were also mapped to three references mentioned above by Burrows-Wheeler Aligner (BWA) (v0.7.17) [48], followed by BamQC with Qualimap (v2.2.1) [49]. Variant calling was conducted using SAMtools (v0.1.19) [50] to generate VCF output files and Mauve [44]. Raw VCF files were filtered considering the quality score and read depth of more than 30 and 0.75, respectively [25]. The presence or absence of experimentally confirmed human T cell epitopes in the BCG genomes was assessed by retrieving M. tuberculosis T cell epitopes from the Immune Epitopes DataBase (IEDB) [51]. Additional file 1: Table S1. SNPs compared to M. tuberculosis H37RV. Table S2. Indels compared to M. tuberculosis H37Rv. Table S3. SNPs compared to M. tuberculosis variant bovis AF212297. Table S4. Indels compared to M. tuberculosis variant bovis AF212297. Table S5. SNPs & Indels compared to M. tuberculosis variant bovis BCG str. Pasteur 1173P2. Table S6. Antigenic epitopes compared to M. tuberculosis H37Rv. Table S7. Antigenic epitopes compared to M. tuberculosis variant bovis AF212297.
  49 in total

1.  Detection of deleted genomic DNA using a semiautomated computational analysis of GeneChip data.

Authors:  H Salamon; M Kato-Maeda; P M Small; J Drenkow; T R Gingeras
Journal:  Genome Res       Date:  2000-12       Impact factor: 9.043

2.  The effect of bacille Calmette-Guérin vaccine strain and route of administration on induced immune responses in vaccinated infants.

Authors:  Virginia Davids; Willem A Hanekom; Nazma Mansoor; Hoyam Gamieldien; Sebastian J Gelderbloem; Anthony Hawkridge; Gregory D Hussey; E Jane Hughes; Jorge Soler; Rose Ann Murray; Stanley R Ress; Gilla Kaplan
Journal:  J Infect Dis       Date:  2006-01-13       Impact factor: 5.226

Review 3.  BCG vaccines: their mechanisms of attenuation and impact on safety and protective efficacy.

Authors:  Jun Liu; Vanessa Tran; Andrea S Leung; David C Alexander; Baoli Zhu
Journal:  Hum Vaccin       Date:  2009-02-20

4.  Preventing tuberculosis with bacillus Calmette-Guérin vaccine: a meta-analysis of the literature.

Authors:  T F Brewer
Journal:  Clin Infect Dis       Date:  2000-09       Impact factor: 9.079

5.  Effect of BCG vaccination on childhood tuberculous meningitis and miliary tuberculosis worldwide: a meta-analysis and assessment of cost-effectiveness.

Authors:  B Bourdin Trunz; Pem Fine; C Dye
Journal:  Lancet       Date:  2006-04-08       Impact factor: 79.321

6.  The establishment of sub-strain specific WHO Reference Reagents for BCG vaccine.

Authors:  Belinda Dagg; Jason Hockley; Peter Rigsby; Mei M Ho
Journal:  Vaccine       Date:  2014-10-11       Impact factor: 3.641

7.  Genomic analysis of a Mycobacterium bovis bacillus [corrected] Calmette-Guérin strain isolated from an adult patient with pulmonary tuberculosis..

Authors:  Xuming Li; Liping Chen; Yongqiang Zhu; Xia Yu; Jun Cao; Rui Wang; Xinyan Lv; Jin He; Aizhen Guo; Hairong Huang; Huajun Zheng; Siguo Liu
Journal:  PLoS One       Date:  2015-04-13       Impact factor: 3.240

8.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

9.  The RAST Server: rapid annotations using subsystems technology.

Authors:  Ramy K Aziz; Daniela Bartels; Aaron A Best; Matthew DeJongh; Terrence Disz; Robert A Edwards; Kevin Formsma; Svetlana Gerdes; Elizabeth M Glass; Michael Kubal; Folker Meyer; Gary J Olsen; Robert Olson; Andrei L Osterman; Ross A Overbeek; Leslie K McNeil; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D Pusch; Claudia Reich; Rick Stevens; Olga Vassieva; Veronika Vonstein; Andreas Wilke; Olga Zagnitko
Journal:  BMC Genomics       Date:  2008-02-08       Impact factor: 3.969

10.  The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).

Authors:  Ross Overbeek; Robert Olson; Gordon D Pusch; Gary J Olsen; James J Davis; Terry Disz; Robert A Edwards; Svetlana Gerdes; Bruce Parrello; Maulik Shukla; Veronika Vonstein; Alice R Wattam; Fangfang Xia; Rick Stevens
Journal:  Nucleic Acids Res       Date:  2013-11-29       Impact factor: 16.971

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.