Katlyn Borgers1,2, Jheng-Yang Ou3,4, Po-Xing Zheng3,4, Petra Tiels1,2, Annelies Van Hecke1,2, Evelyn Plets1,2, Gitte Michielsen1,2, Nele Festjens1,2, Nico Callewaert5,6, Yao-Cheng Lin3,4.
Abstract
BACKGROUND: Mycobacterium bovis bacillus Calmette-Guérin (M. bovis BCG) is the only vaccine available against tuberculosis (TB). In an effort to standardize vaccine production, three substrains, i.e. BCG Danish 1331, Tokyo 172-1 and Russia BCG-1, were established as the WHO reference strains. Reference genomes exist for both BCG Tokyo 172-1 and Russia BCG-1, but not for BCG Danish. In this study, we set out to determine the completely assembled genome sequence for BCG Danish and to establish a workflow for genome characterization of engineering-derived vaccine candidate strains.
Keywords: BCG; Complete genomic sequence; Genetic differences; Live vaccines; Next-generation sequencing; Tandem duplications; Tuberculosis
Year: 2019 PMID: 31286858 PMCID: PMC6615170 DOI: 10.1186/s12864-019-5909-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Genome analysis pipeline
Fig. 2 Organization of the BCG Danish 1331 (07/270) genome, focusing on the DU1 and DU2. a Circular representation of the BCG Danish chromosome. The scale is shown in megabases on the outer black circle. Moving inward, the next two circles show forward (dark blue) and reverse (yellow) strand CDS (coding sequence). The next circle shows 3 rRNAs (5S, 16S and 23S; orange), 45 tRNAs (black), 1 tmRNA (ssrA; green) and 1 ncRNA (rnpB; dark green), followed by 42 SNPs (red) detected between BCG Danish and Pasteur. The subsequent circle shows DU2-III (dark blue), DU1-Danish (purple) and RD (light blue, names of RD in black) that are typical for BCG Danish. The two inner circles represent G + C content and GC skew. b Organization of the two tandem duplications in BCG Danish and confirmation by PCR. The DU2 is made up of two repeats (R1 and R2), as is the DU1-Danish (R3 and R4). The primer pairs (1–8) used to validate their organization are indicated. c Visual representation of the oriC with position and size of DU1-China, -Danish, -Pasteur and -Birkhaug. The table indicates which substrains have the DU1. d Copy-number analysis of genes (indicated in grey in subfigure c) in and surrounding the DU1 region for Pasteur 1173 ATCC 35734, Pasteur 1721 and Danish 1331 NIBSC 07/270. The represented data are averages (± SD) of four technical replicates
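The two innermost circles of panel a plot G + C content and GC skew. As a minimal sketch of how such tracks are commonly computed, the function below uses the standard definition of GC skew, (G − C) / (G + C), over non-overlapping windows. The window size, the function name `gc_content_and_skew` and the toy sequence are assumptions for illustration; the caption does not state the parameters used for the figure.

```python
def gc_content_and_skew(seq, window=1000):
    """Return (start, gc_content, gc_skew) for each non-overlapping window.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), set to 0.0 when the window has no G or C
    The window size is an illustrative assumption, not taken from the paper.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc = g + c
        content = gc / len(chunk)
        skew = (g - c) / gc if gc else 0.0
        results.append((start, content, skew))
    return results


# Toy sequence (hypothetical): a G-rich window followed by a C-rich window.
toy = "GGGCATAT" + "CCCGATAT"
for start, content, skew in gc_content_and_skew(toy, window=8):
    print(start, content, skew)
```

In bacterial chromosomes the sign of the GC skew typically switches near the origin and terminus of replication, which is why this track is often shown alongside the oriC region.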
Fig. 3 DU1 duplication detection in BCG strains. Tiling array data (a) from Leung et al. 2008 [15] and Illumina sequencing data (b) for BCG Danish 1331 (this study) as well as published genome data from Pan et al. 2011 [16–19], Abdallah et al. 2015 [12] and Festjens et al. 2019 [20] were reanalyzed for the presence of a DU1 in the region of the oriC. These references were chosen as they contain BCG Danish or BCG Pasteur genome sequencing data. The graphs in (a) depict the ratio of the reference (M. tb H37Rv) probe intensity (Cy5) divided by the test (BCG strain) probe intensity as originally presented in Leung et al. 2008 [14]. The graphs in (b) depict the ratio of mean whole genome read coverage divided by the mean read coverage in 500 bp window size. Detection of a DU1-like duplication in BCG Pasteur 1173P2 [15], Birkhaug [12, 15], Danish 1331 07/270 (this study) [21] and BCG China [15, 16] sequencing data, indicated in grey. No detection of DU1-duplication for other BCG Pasteur [12, 20], Danish [12, 17] and China [12] sequencing data
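The ratio plotted in panel b can be sketched as follows, assuming a per-base read depth list is already available (for example, exported from an alignment). The function name `window_coverage_ratio` and the toy depth values are hypothetical; only the 500 bp window size and the direction of the ratio (genome-wide mean divided by window mean) come from the caption.

```python
def window_coverage_ratio(depth, window=500):
    """Return (start, ratio) per non-overlapping window.

    ratio = mean genome-wide depth / mean depth within the window,
    following the direction of the ratio described in the caption.
    A duplicated region therefore gives a ratio below the baseline.
    """
    genome_mean = sum(depth) / len(depth)
    ratios = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        window_mean = sum(chunk) / len(chunk)
        # Guard against windows with no coverage at all.
        ratio = genome_mean / window_mean if window_mean else float("inf")
        ratios.append((start, ratio))
    return ratios


# Toy per-base depth (hypothetical): one 500 bp window at twice the baseline.
toy_depth = [10] * 1000 + [20] * 500 + [10] * 1000
for start, ratio in window_coverage_ratio(toy_depth):
    print(start, round(ratio, 2))
```

In this toy input the duplicated window is a large share of the sequence, so it shifts the genome-wide mean noticeably. In a genome of several megabases a duplication of a few tens of kilobases would barely move the mean, and unduplicated windows would sit close to 1.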
Fig. 4 BCG Danish 1331 sapM KO has lost the DU2 to form the sapM KO locus. a Illustration of the outrecombination of the DU2 duplicated genomic region in the formation of the BCG Danish 1331 sapM KO from BCG Danish 1331 WT, containing two sapM loci, due to the presence of the sapM locus in the DU2. b Genomic organization of the sapM region for BCG Danish WT and sapM KO. The organization of the DU2 is indicated. †: truncated sapM. c Copy-number analysis of selected genes (indicated in grey in subfigure b) in and surrounding the DU2 via qPCR on gDNA for BCG Danish 1331 WT and sapM KO. The represented data are averages (± SD) of four technical replicates
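The copy-number panels report averages (± SD) of four technical replicates. A minimal sketch of that summary step is given below. The replicate values are hypothetical placeholders, not data from the study, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_replicates(values):
    """Return (mean, sample SD) for a list of technical replicate values."""
    return mean(values), stdev(values)


# Hypothetical relative copy numbers from four technical replicates.
replicates = [1.9, 2.1, 2.0, 2.0]
avg, sd = summarize_replicates(replicates)
print(f"{avg:.2f} ± {sd:.2f}")
```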
Fig. 5 Refined genealogy of BCG vaccine strains. The year when the strain was obtained per geographical location is indicated where possible (indigo). The scheme shows regions of difference (RD), insertions (Ins), deletions (‘∆’), indels and tandem duplications (DU), which differentiate the different BCG strains (Additional file 2: Table S8). The blue dashed squares indicate the different DU2-forms, which classify the BCG strains into four major lineages. When the DU1 is not found in all substrains of a certain strain, this is indicated on the scheme. According to the literature, two different substrains of BCG are named BCG China or Beijing [8]. Therefore, the scheme contains two ‘BCG China’ strains: BCG China [8] and BCG China* [7, 14]. Adapted from references [8, 11, 14, 28, 29]. Concerning reference [8], only the RD and deleted genes that could be verified on the assembled genomes [12] are included
Genes (and genome feature) common to all DU1-like duplications (DU1-Pasteur, -Birkhaug, -China and -Danish and the DU1-like triplication identified in the clinical isolate BCG 3281)
| Gene/feature | Product/Function | Functional category |
|---|---|---|
| Rv3921c (BCG3979c) | Probable conserved transmembrane protein | cell wall and cell processes |
| Rv3922c (BCG3980c) | Possible hemolysin | virulence, detoxification, adaptation |
|  | Ribonuclease P protein component RnpA or RNaseP. RNaseP catalyzes the removal of the 5′-leader sequence from PRE-tRNA to produce the mature 5′ terminus. It can also cleave other RNA substrates such as 4.5S RNA. The protein component plays an auxiliary but essential role in vivo by binding to the 5′-leader sequence and broadening the substrate specificity of the ribozyme. | information pathways |
|  | 50S ribosomal protein L34 RpmH. Involved in translation mechanism. This protein is one of the early assembly proteins of the 50S ribosomal subunit. | information pathways |
|  | Chromosomal replication initiator protein DnaA. Plays an important role in the initiation and regulation of chromosomal replication. Binds to the | information pathways |
|  | Sequence in the genome at which replication is initiated. Contains non-overlapping MtrA- and DnaA-binding boxes. Is located in the | – |
|  | DNA polymerase III (β-chain) DnaN (DNA nucleotidyltransferase). DNA polymerase III is a complex, multichain enzyme responsible for most of the replicative synthesis in bacteria. This DNA polymerase also exhibits 3′ to 5′ exonuclease activity. The β-chain is required for initiation of replication. Once it is clamped onto DNA, it slides freely (bidirectionally and ATP-independently) along duplex DNA. | information pathways |
|  | DNA replication and repair protein RecF (ssDNA binding protein) is involved in DNA metabolism and recombination; it is required for DNA replication and normal SOS inducibility. Binds preferentially to linear ssDNA. It also seems to bind ATP. | information pathways |
Gene information was extracted from Mycobrowser [36]
List of M. bovis BCG strains for which high per-bp coverage complete genomes are available
| Strain | Method | DU2 type | DU2 resolved | Accession numbers | Reference | Year of publication |
|---|---|---|---|---|---|---|
| BCG Pasteur 1173P2 |  | IV | yes | PRJEA18059, AM408590 | [ | 2007 |
| BCG Tokyo 172 | SOLiD (Agencourt Bioscience Corporation) sequencing of | I | yes (2x) | PRJDA31211, AP010918 | [ | 2009 |
| BCG Moreau RDJ |  | I | no | PRJEA70285, AM412059 | [ | 2011 |
| BCG Tice ATCC 35743 | Illumina genome sequencing, | IV | no | PRJNA63839, CP003494 | [ | 2011 |
| BCG Mexico 1931 | Roche 454 pyrosequencing, | IV | yes | PRJNA45811, CP002095 | [ | 2011 |
| BCG Korea 1168P | Roche 454 (GS-FLX) and Illumina (HiSeq) sequencing, | IV | yes (2x) | PRJNA170028, CP003900 | [ | 2013 |
| BCG Russia 368 | Roche 454 (GS Junior) sequencing on | I | yes (2x) | PRJNA256163, CP009243 | [ | 2014 |
| BCG 3281 (clinical isolate) | Roche 454 (GS-FLX) and Illumina (Hiseq2500) sequencing, | III | yes | PRJNA251957, CP008744 | [ | 2015 |
| BCG Russia BCG-1 | Roche 454 (GS-FLX) sequencing of | I | yes (2x) | PRJNA306822, CP013741 | [ | 2016 |
| BCG Danish 1331 (07/270) | PacBio (RSII) (long read) sequencing (235x coverage) and Illumina (MiSeq) (short read) sequencing | III | yes | PRJNA494982, CP039850 | this study | 2019 |
For each strain, we indicate the method used to create the assembled genome, the type of DU2 present in the strain, whether the DU2 was resolved in the genome assembly, the BioProject and genome assembly accession numbers, the reference to the study in which the genome assembly method was published, and the year of publication. In the ‘method’ description, we have put labor/capital-intensive aspects in bold, illustrating that our approach, which relies solely on massively parallel sequencing, is the only one that provides both high per-bp accuracy (allowing for SNP calling) and complete resolution of the assembly across large repeat regions