Afif Elghraoui, Samuel J Modlin, Faramarz Valafar.
Abstract
BACKGROUND: The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analysis, however, relies heavily on the accuracy of the sequences. While the H37Rv reference genome has been corrected several times to date, that of H37Ra has remained unmodified since its original publication.
Keywords: Comparative genomics; De novo assembly; H37Ra; H37Rv; Mycobacteria; Reference genomes; Sequencing errors; Single-molecule sequencing; Tuberculosis; Virulence
Year: 2017 PMID: 28415976 PMCID: PMC5393005 DOI: 10.1186/s12864-017-3687-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Sequencing Coverage and GC-content by Genome Position. GC-content and coverage are shown in 1 kb windows. The coverage plot refers to reads mapped to our assembly during the final polishing round. Reads with mapping quality values less than 10 were not used in polishing and are not counted here. Imposing linearity in the contig despite circularity of the genome creates mapping difficulties at the contig edges, resulting in irregularities in apparent sequencing coverage at these sites.
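The windowed quantities described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the `(start, end, mapq)` read representation, and the use of non-overlapping windows are all assumptions made for the example. Only the 1 kb window size and the mapping-quality cutoff of 10 come from the caption.

```python
def gc_content_windows(sequence, window=1000):
    """GC fraction of consecutive, non-overlapping windows (last one may be shorter)."""
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / len(chunk))
    return fractions


def windowed_coverage(reads, genome_length, window=1000, min_mapq=10):
    """Mean depth per window, ignoring reads with mapping quality below min_mapq.

    `reads` is a hypothetical list of (start, end, mapq) tuples with half-open,
    0-based coordinates on a linear contig, assuming 0 <= start < end <= genome_length.
    """
    n_windows = -(-genome_length // window)  # ceiling division
    covered_bases = [0] * n_windows
    for start, end, mapq in reads:
        if mapq < min_mapq:
            continue  # caption: reads with MAPQ < 10 are not counted
        for w in range(start // window, (end - 1) // window + 1):
            w_start = w * window
            w_end = min(w_start + window, genome_length)
            covered_bases[w] += min(end, w_end) - max(start, w_start)
    depths = []
    for w, bases in enumerate(covered_bases):
        w_len = min((w + 1) * window, genome_length) - w * window
        depths.append(bases / w_len)
    return depths
```

Because the contig is treated as linear, reads spanning the origin of the circular genome cannot be represented in this scheme, which is consistent with the edge irregularities the caption mentions.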
Fig. 2 Example Classification of Genes Based on Variant Comparisons. Considering the profile of H37Ra-specific variants (those with respect to H37Rv not also appearing in CDC1551), a given gene (blue arrow) is categorized as “supported”, “contradicted”, or “adjusted” by our H37Ra assembly as a result of comparison with the hitherto reference sequence NC_009525.1. The illustration shows examples of the different variant profiles a gene could have and their resulting classifications. Genes in the “supported” and “contradicted” categories are strictly those where our assembly either fully matches the H37Ra reference (supported) or the H37Rv reference (contradicted). Multiple factors may cause a gene to be classified as “adjusted”. Such genes may have variant profiles not fully meeting the criteria of “supported” or “contradicted”, or they may have novel H37Ra-specific variants observed only in our assembly.
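The classification rule in this caption can be expressed as a small set comparison. The sketch below is a simplified reading of the caption, not the authors' implementation. It assumes each variant is a hashable tuple such as `(position, ref_allele, alt_allele)` given relative to H37Rv. The function names and the extra `"unaffected"` label for genes with no H37Ra-specific variants in either sequence are hypothetical.

```python
def ra_specific(variants_vs_h37rv, cdc1551_variants):
    """H37Ra-specific variants: those relative to H37Rv that do not also appear in CDC1551."""
    return frozenset(variants_vs_h37rv) - frozenset(cdc1551_variants)


def classify_gene(reference_variants, assembly_variants):
    """Classify one gene from its H37Ra-specific variant profiles.

    reference_variants: profile in the earlier H37Ra reference (NC_009525.1)
    assembly_variants:  profile in the new assembly
    """
    ref = frozenset(reference_variants)
    asm = frozenset(assembly_variants)
    if not ref and not asm:
        return "unaffected"    # hypothetical label; such genes are outside the caption's scope
    if ref and asm == ref:
        return "supported"     # assembly fully matches the H37Ra reference
    if ref and not asm:
        return "contradicted"  # assembly fully matches H37Rv in this gene
    return "adjusted"          # partial agreement, or novel variants only in the assembly


# Usage with made-up coordinates, for illustration only
old_profile = ra_specific({(100, "A", "G"), (250, "C", "T")}, {(250, "C", "T")})
print(classify_gene(old_profile, {(100, "A", "G")}))
```

Representing profiles as sets keeps the three categories mutually exclusive. A real comparison would also need to normalize how equivalent indels are represented before testing equality.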
Status of Genes Previously Reported as Affected by H37Ra-specific Mutations
| Locus Tag | Gene name | Description | Notes | Citation |
|---|---|---|---|---|
| (a) Genes with all High-Confidence Variants Contradicted by our Assembly | | | | |
| Rv0037c | | Probable conserved integral membrane protein | | |
| Rv0124 | | PE-PGRS family protein PE_PGRS2 | | |
| Rv0189c | | Probable dihydroxy-acid dehydratase IlvD (dad) | | [ |
| Rv0279c | | PE-PGRS family protein PE_PGRS4 | | ∙ |
| Rv0383c | | Possible conserved secreted protein | masks sequencing error in H37Rv | [ |
| Rv0578c | | PE-PGRS family protein PE_PGRS7 | | ∙ |
| Rv0880 | | Possible MarR-family transcriptional regulatory protein | | [ |
| Rv0977 | | PE-PGRS family protein PE_PGRS16 | | ∙ |
| Rv1068c | | PE-PGRS family protein PE_PGRS20 | | ∙ |
| Rv1091 | | PE-PGRS family protein PE_PGRS22 | | [ |
| Rv1095 | | Probable PHOH-like protein PhoH2 | | |
| Rv1196 | | PPE family protein PPE18 | | [ |
| Rv1386 | | PE family protein PE15 | | ∙ |
| Rv1450c | | PE-PGRS family protein PE_PGRS27 | | ∙ |
| Rv1802 | | PPE family protein PPE30 | SNV instantiates CTGGAG motif | ∙ |
| Rv1929c | | Conserved hypothetical protein | | |
| Rv2048c | | Polyketide synthase Pks12 | | |
| Rv2068c | | Class a beta-lactamase BlaC | | |
| Rv2069 | | RNA polymerase sigma factor, ECF subfamily, SigC | | [ |
| Rv2098c | | PE-PGRS family protein PE_PGRS36 | Likely pseudogene | ∙ |
| Rv2202c | | Adenosine kinase | Synonymous mutation | |
| Rv2396 | | PE-PGRS family protein PE_PGRS41 | | ∙ |
| Rv2649 | | Probable transposase for IS6110 | | |
| Rv2733c | | Conserved hypothetical alanine, arginine-rich protein | | |
| Rv2734 | | Conserved hypothetical protein | | |
| Rv2825c | | Conserved hypothetical protein | | |
| Rv3031 | | Conserved protein | Synonymous mutation | |
| Rv3191c | | Probable transposase | labeled intergenic in H37RaJH | |
| Rv3192 | | Conserved hypothetical alanine and proline-rich protein | labeled intergenic in H37RaJH | |
| Rv3303c | | NAD(P)H quinone reductase LpdA | tandem repeat copy number variation | [ |
| Rv3350c | | PPE family protein | | ∙ |
| Rv3388 | | PE-PGRS family protein PE_PGRS52 | | [ |
| Rv3389c | | Probable 3-hydroxyacyl-thioester dehydratase HtdY | | |
| Rv3507 | | PE-PGRS family protein PE_PGRS53 | | [ |
| Rv3595c | | PE-PGRS family protein PE_PGRS59 | | [ |
| Rv3611 | | Hypothetical arginine and proline rich protein | One deletion also at | |
| (b) Genes with Different H37Ra-specific Variant Profiles in our Assembly | | | | |
| Rv1764 | | Putative transposase of insertion element IS6110 | disrupted by IS6110 in our assembly | |
| Rv3343c | | PPE family protein | tandem repeat copy number variation | [ |
| Rv3508 | | PE-PGRS family protein PE_PGRS54 | | [ |
| Rv3514 | | PE-PGRS family protein PE_PGRS57 | | [ |
| (c) Genes with High-Confidence Variant Profiles Fully Confirmed by our Assembly | | | | |
| Rv0010c | | Probable conserved membrane protein | | |
| Rv0039c | | Possible conserved transmembrane protein | | |
| Rv0101 | | Probable peptide synthetase Nrp (peptide synthase) | | [ |
| Rv0635 | | (3R)-hydroxyacyl-ACP dehydratase subunit HadA | | [ |
| Rv0637 | | (3R)-hydroxyacyl-ACP dehydratase subunit HadC | | [ |
| Rv0757 | | Member of Two-component response complex PhoPR | | [ |
| Rv0878c | | PPE family protein PPE13 | | [ |
| Rv1005c | | Probable para-aminobenzoate synthase component I | | [ |
| Rv1006 | | Unknown protein | | |
| Rv1021 | | NTP Pyrophosphohydrolase, MazG | | [ |
| Rv1755c | | Probable phospholipase C 4 (fragment) PlcD | | [ |
| Rv1759c | | PE-PGRS family protein Wag22 | | [ |
| Rv2352c | | PPE family protein PPE38 | exact, adjacent duplication of PPE38 | [ |
| Rv3879c | | ESX-1 secretion-associated protein EspK | | |
| (d) Genes with Variant Profiles Erroneously Declared as H37Ra-specific | | | | |
| Rv2421c† | | Probable nicotinate-nucleotide adenylyltransferase NadD | SNV instantiates CTGGAG motif | [ |
| Rv3053c | | Probable glutaredoxin electron transport component of NRDEF NrdH | | [ |
Studies [8, 25, 38, 49, 58] considered all of these genes. Studies [36, 59] (indicated by ∙ in the table) considered all the PE_PPE genes among the set.
†: one or more variants affecting this gene reported as sequencing errors in H37Rv [8]
Fig. 3 Visualization of the Reduced Set of H37Ra-specific Variants and Their Effect on Phenotype. Our assembly contradicts many variants previously thought to be H37Ra-specific, reducing the number of genes that may contribute to H37Ra’s virulence attenuation. Several of these genes have been reassigned function since the first published assembly of the H37Ra genome [5], which is reflected in the figure. a The set of genes identified to carry H37Ra-specific polymorphisms in the original H37Ra genome publication [5] and their contribution to phenotype as understood at that time. 56 genes were affected, the majority of which were PE_PPE genes or were of unknown function. b The set of genes with H37Ra-specific variants confirmed by our assembly is reduced markedly, particularly in PE_PPE genes, highlighting the strength of single-molecule sequencing in resolving GC-rich and repetitive stretches of DNA. The number of genes with functions not yet characterized was also reduced significantly. *Though in a few instances this was because the genes’ function was characterized between 2008 and now, most of the reduction was due to our assembly showing that these genes matched H37Rv and are therefore not H37Ra-specific. **For lpdA, the altered copy number in H37Ra was found not to be specific to the avirulent phenotype. However, the observed altered expression of lpdA in H37Ra may be due to altered regulation from PhoP. Blue: the H37Ra-specific variant(s) in these genes have been shown to confer a phenotypic change in H37Ra relative to H37Rv in wet-lab studies. For these genes, the mechanisms affected by the H37Ra-specific variant are illustrated in detail (see Fig. 4 for hadC and phoP). For other genes, their general function is described or briefly illustrated.
Genes with Variants in H37Ra Unique to our Assembly
| Locus Tag | Gene name | Variant | Notes |
|---|---|---|---|
| Rv0279c (a, b) | | Two substitutions | Both mutations are not specific to H37Ra |
| Rv0383c (a, b) | | A459399C - 84bp upstream of Rv0383c | Potential sequencing error in H37Rv [ |
| Rv1450c (a, b) | | 208bp inframe insertion | |
| Rv1764 | | insertion of IS6110 | |
| Rv3303c (a, b) | | 174bp insertion 12bp upstream | Tandem repeat CNV |
| Rv3343c (a) | | 1728bp insertion | Tandem duplication with respect to H37Rv |
| Rv3508 (a) | | multiple variants | |
| Rv3514 (b) | | multiple variants | Only two are H37Ra-specific |
The mutations in this table are with respect to the H37Rv reference (NC_000962.3), so variants with respect to the current H37Ra reference sequence (NC_009525.1) that cause agreement with the H37Rv sequence do not appear here.
(a) Gene previously implicated as affected by H37Ra-specific mutations [5].
(b) One or more mutations affecting this gene are also present in at least one of the sequences CDC1551 (NC_002755.2), H37RvBroad (NC_018143.2), H37RvSiena (NZ_CP007027.1), and H37RvTMC102 (NZ_CP009480.1).
Fig. 4 Cell Wall Differences in H37Ra and H37Rv. a State of knowledge following publication of H37RaJH. At this time it was known that the SNP in the DNA-binding site of phoP abrogated synthesis of sulfolipids (yellow) and acyltrehaloses (purple and red) of the mycomembrane outer leaflet, while two SNPs in pks12, both of which were refuted in our assembly, were thought to cause the observed lack of phthiocerol dimycocerosates (blue) in H37Ra. b Current state of knowledge. Advances were made in understanding the inner leaflet. A single-nucleotide frameshift deletion in the now annotated hadC gene was shown by Slama and colleagues [33] to alter the mycolic acid profile in three distinct ways: i. Lower proportion of oxygenated mycolic acids (K-MA and Me-MA; green and blue carbon skeletons, respectively) to α-MAs (orange carbon skeleton). There are seven Me-MAs depicted in H37Rv compared to three in H37Ra, reflecting the proportions reported by Slama and colleagues [33]. ii. Extra degree of unsaturation (red circles) in H37Ra mycolic acids due to truncation of the HadC protein in H37Ra. iii. Shorter chain lengths of mycolic acids in H37Ra. Note that Me-MAs have larger loops in H37Rv than in H37Ra, and that the α-MAs are shorter in H37Ra than in H37Rv. Carbon chain lengths are based on results reported by Slama and colleagues. The folding geometry of the mycolic acids is depicted in panel b, as described by Groenewald and colleagues [56], and inspired by the illustration style of Minnikin and colleagues [57].
Available Finished Assemblies for the Reference Strains M. tuberculosis H37Rv and H37Ra
| Strain | Name | ATCC identifier | Accession | Technology | Last updated |
|---|---|---|---|---|---|
| H37Rv | F1 (a) [ | 27294 | CP010329.1 | Pacific Biosciences (P4-C2) | 02/2016 |
| | H37RvSiena | unspecified | NZ_CP007027.1/CP007027.1 | Illumina | 01/2015 |
| | H37RvTMC102 | 27294D-2 | NZ_CP009480.1/CP009480.1 | Illumina | 09/2014 |
| | H37RvBroad | unspecified | NC_018143.2/CP003248.2 | 454/Sanger/Illumina | 10/2013 |
| | H37Rv [ | 25618 (b) | NC_000962.3/AL123456.3 | Sanger | 02/2013 |
| H37Ra | H37RaSD [present study] | 25177 | CP016972.1 | Pacific Biosciences (P6-C4) | 08/2016 |
| | F28 (a) [ | 25177 | NZ_CP010330.1/CP010330.1 | Pacific Biosciences (P4-C2) | 02/2016 |
| | H37RaJH [ | 25177 | NC_009525.1/CP000611.1 | Sanger | 05/2007 |
Unreferenced entries were direct database submissions and do not have an associated publication.
(a) The unconventional names for these samples were not explained by Zhu and colleagues [13]. The name F28 in particular is already known from the literature to refer to a family of clinical isolates [63].
(b) The ATCC number was unspecified by Cole and colleagues [62]. However, the ATCC catalog entry for this strain identifies it as the source for the sequence.