James Gallant, Jomien Mouton, Roy Ummels, Corinne Ten Hagen-Jongman, Nastassja Kriel, Arnab Pain, Robin M Warren, Wilbert Bitter, Tiaan Heunis, Samantha L Sampson.
Abstract
Mycobacterium tuberculosis is a facultative intracellular pathogen responsible for causing tuberculosis. The harsh environment in which M. tuberculosis survives requires this pathogen to continuously adapt in order to maintain an evolutionary advantage. However, the apparent absence of horizontal gene transfer in M. tuberculosis imposes restrictions in the ways by which evolution can occur. Large-scale changes in the genome can be introduced through genome reduction, recombination events and structural variation. Here, we identify a functional chimeric protein in the ppe38-71 locus, the absence of which is known to have an impact on protein secretion and virulence. To examine whether this approach was used more often by this pathogen, we further develop software that detects potential gene fusion events from multigene deletions using whole genome sequencing data. With this software we could identify a number of other putative gene fusion events within the genomes of M. tuberculosis isolates. We were able to demonstrate the expression of one of these gene fusions at the protein level using mass spectrometry. Therefore, gene fusions may provide an additional means of evolution for M. tuberculosis in its natural environment whereby novel chimeric proteins and functions can arise.
Year: 2020 PMID: 33575588 PMCID: PMC7671302 DOI: 10.1093/nargab/lqaa033
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 1. Deletions in the ppe38–71 operon are more prevalent in M. tuberculosis lineage 2 isolates. Deletions are represented as Circos plots displaying whole genome sequence alignment by genomic coordinates (outer track), strain (first track), read density (middle track) and average coverage (inner track) for clinical isolates of M. tuberculosis representing (A) lineage 2 and (B) lineage 4. Averages were determined per strain and only within the genomic coordinates spanning the deletion. (C) Occurrence of ppe38–71 deletion as a percent calculated from each subpopulation of lineage 2 and lineage 4 representing 90 clinical isolates, respectively. Deletions were determined by inspecting coverage in the ppe38–71 operon. (D) Western blot of cell-free supernatants obtained from M. tuberculosis clinical isolates, M. tuberculosis CDC1551 and control strains targeting PE-PGRS proteins. Red dots indicate clinical isolates that were predicted by whole genome sequencing to have a ppe38–71 deletion.
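The per-strain average coverage described in this caption can be illustrated with a minimal sketch. It assumes per-base read depths are already available as a plain list for one strain. The function names and the `min_fraction` threshold are hypothetical and are not taken from the study, where deletions were determined by inspecting coverage.

```python
from statistics import mean


def region_mean_coverage(depths, start, end):
    """Mean read depth over the 1-based, inclusive coordinates [start, end]."""
    return mean(depths[start - 1:end])


def has_deletion(depths, start, end, min_fraction=0.1):
    """Flag a candidate deletion for one strain.

    Assumption (not from the source): a region counts as deleted when its
    mean coverage is below `min_fraction` of the genome-wide mean coverage.
    """
    genome_mean = mean(depths)
    if genome_mean == 0:
        return False  # no usable coverage, so nothing can be called
    return region_mean_coverage(depths, start, end) < min_fraction * genome_mean


# Toy example: 100 covered bases, 20 uncovered bases, 100 covered bases.
toy_depths = [50] * 100 + [0] * 20 + [50] * 100
deleted = has_deletion(toy_depths, 101, 120)
intact = has_deletion(toy_depths, 1, 100)
```

Restricting the average to the coordinates spanning the deletion, as the caption states, keeps the signal from being diluted by well-covered flanking sequence.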
Figure 2. Mycobacterium tuberculosis lineage 4 strain with a ppe38–71 deletion produces a functional chimeric protein. (A) Specific clinical isolates used for further investigation of the ppe38–71 operon. M. tuberculosis H37Rv was used as a control for the NGS and aligned to CDC1551; full-length ppe38–71 is detected in contrast to the published reference. S3651 and S507 represent clinical isolates with a ppe38 deletion and S3388 represents a clinical isolate without deletion. (B) Schematic representation of targeted de novo assembly and contig ordering of S507 indicating a transposon insertion between ppe71 and ppe38, thereby disrupting the ppe71 reading frame. (C) Schematic representation of targeted de novo assembly in the ppe38–71 operon of S3651. No transposon insertion was found and the reading frame of ppe71 is intact causing a gene fusion. (D) Western blot of CDC1551 reference, Δppe38–71, ppe38–71 complemented strain, S507 and S3651 cell-free supernatant probed for PE-PGRS proteins, PPE38 and ESAT-6 as the loading control. S3651 secreted PE-PGRS proteins and expressed PPE38 similarly to the wild type.
Figure 3. Systematic detection of potential fusion events from M. tuberculosis large sequence polymorphisms. (A) The distribution of multigene deletions that fall within an open reading frame and have the same orientation as predicted by our software compared to the distribution of SVs found across 180 isolates of lineage 2 and lineage 4. The numbers on the graph represent the mean of the distribution. (B) Most abundant annotations associated with potential gene fusions across all clinical isolates and separated by lineage. Annotation terms were sourced from the Mycobrowser functional categories. PE/PPE proteins constituted the majority of identifications associated with gene fusions. Potential gene fusions falling in this category were removed from further consideration as alignment failures in these areas are highly prevalent. (C) Occurrence of specific multigene deletions that fall within open reading frames. Each of these were manually inspected for closed reading frames and annotated as either fused or truncated.
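The selection criteria named in this caption (both deletion breakpoints inside open reading frames, genes in the same orientation, reading frame preserved) can be sketched as follows. This is an illustrative sketch, not the authors' software: the `Gene` class, the function names and the toy gene names are hypothetical, and the frame check assumes every annotated gene has a length that is a multiple of three.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive genomic coordinate
    end: int     # 1-based, inclusive genomic coordinate
    strand: str  # "+" or "-"


def gene_at(genes, pos) -> Optional[Gene]:
    """Return the first gene whose coordinates contain `pos`, if any."""
    for gene in genes:
        if gene.start <= pos <= gene.end:
            return gene
    return None


def fusion_candidate(genes, del_start, del_end):
    """Classify a deletion spanning [del_start, del_end] (1-based, inclusive).

    Returns (left_gene, right_gene, in_frame) in genomic order, or None when
    the deletion cannot fuse two genes under the criteria sketched here.
    """
    left = gene_at(genes, del_start)
    right = gene_at(genes, del_end)
    # Both breakpoints must fall inside two different genes.
    if left is None or right is None or left.name == right.name:
        return None
    # The two genes must share the same orientation.
    if left.strand != right.strand:
        return None
    # Bases kept from each gene; the fusion stays in frame when their sum is
    # a multiple of three (given gene lengths that are multiples of three).
    kept_left = del_start - left.start
    kept_right = right.end - del_end
    return left.name, right.name, (kept_left + kept_right) % 3 == 0


# Toy annotation with hypothetical gene names.
toy_genes = [
    Gene("geneA", 1, 300, "+"),
    Gene("geneB", 601, 900, "+"),
    Gene("geneC", 1000, 1299, "-"),
]
in_frame_call = fusion_candidate(toy_genes, 151, 750)
frameshift_call = fusion_candidate(toy_genes, 151, 751)
opposite_strand_call = fusion_candidate(toy_genes, 151, 1100)
```

An out-of-frame result corresponds to the "truncated" annotation in panel (C) only loosely: the caption notes that each candidate was manually inspected, which a simple modulo check does not replace.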
Figure 4. Rv2623/28 and Rv0071/74 are gene fusions that have formed as a result of large deletions. (A) Circos plot depicting the genomic region of Rv2623–Rv2628 (outer track) from clinical isolates S5218, S5527, S507 and H37Rv. Middle and inner tracks display read density in the region and average coverage, respectively. (B) Circos plot of Rv0071–Rv0074 (outer track) as well as the read density (middle track) and average coverage (inner track) in the region. (C) Polymerase chain reaction of wild-type and deleted Rv2623–Rv2628 regions. (D) Chromatograms from capillary electrophoresis displaying the deletion breakpoints (red arrow) from clinical isolate S5218 (wild-type Rv2623) and S507 (Rv2623/Rv2628 fusion protein). In the Circos tracks, M. tuberculosis H37Rv is representative of the reference genotype.
Figure 5. Targeted de novo assemblies and tandem mass spectrometry identify Rv2623/28 as chimeric protein. (A) Schematic illustration of Rv2623–Rv2628 de novo assembly and contig ordering from clinical isolate S507. (B) Schematic illustration of de novo assembly and contig ordering from S507 displaying the deletion region and in-frame translation of Rv0071/74. Black indicates out-of-bounds genes and grey indicates deleted genes. Tandem mass spectra of peptides representing (C) the Rv2623/28 fusion junction from S507 and (D) the wild-type Rv2623 of S5218 in the same location. False discovery rate cut-off for assigning peptides was set at 0.01.