Paula J Gomez-Gonzalez, Nuria Andreu, Jody E Phelan, Paola Florez de Sessions, Judith R Glynn, Amelia C Crampin, Susana Campino, Philip D Butcher, Martin L Hibberd, Taane G Clark.
Abstract
Human tuberculosis disease (TB), caused by Mycobacterium tuberculosis (Mtb), is a complex disease with a spectrum of outcomes. Genomic, transcriptomic and methylation studies have revealed differences between Mtb lineages that are likely to affect transmission, virulence and drug resistance. However, so far no studies have integrated sequence-based genomic, transcriptomic and methylation characterisation across a common set of samples, which is critical to understand how DNA sequence and methylation affect RNA expression and, ultimately, Mtb pathogenesis. Here we perform such an integrated analysis across 22 M. tuberculosis clinical isolates, representing ancient (lineage 1) and modern (lineages 2 and 4) strains. The results confirm the presence of lineage-specific differential gene expression, linked to specific SNP-based expression quantitative trait loci (eQTLs): 10 eQTLs involve SNPs in promoter regions or transcriptional start sites, and 12 involve potential functional impairment of transcriptional regulators. Methylation status was also found to have a role in transcription, with evidence of differential expression in 50 genes across lineage 4 samples. Lack of methylation was associated with three novel variants in mamA, likely to cause loss of function of this enzyme. Overall, our work shows the relationship of DNA sequence and methylation to RNA expression, and differences between ancient and modern lineages. Further studies are needed to verify the functional consequences of the identified mechanisms of gene expression regulation.
Year: 2019 PMID: 30914757 PMCID: PMC6435705 DOI: 10.1038/s41598-019-41692-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Phylogenetic tree of the 22 Karonga strains. Maximum-likelihood phylogenetic tree of the 22 isolates analysed, covering lineages 1 (L1), 2 (L2) and 4 (L4).
Figure 2. Gene expression differences between modern (lineages 2 and 4) and ancient (lineage 1) strains. A heatmap showing the 105 genes differentially expressed between ancient and modern strains, constructed with the gene expression distances between rows. Rows and columns are ordered based on row or column means. Over-expressed genes are coloured in red and under-expressed genes in green. Ancient strains (n = 8) are shown to the left of the white vertical line and modern strains (n = 14) to the right. Lineage 1 is shown in violet, lineage 2 in blue and lineage 4 in red.
Putative functional SNPs associated with expression (cis-eQTLs with allele frequencies >5%; adjusted p < 0.05).
| Transcript differentially expressed | Annotation | SNP | SNP position: gene | SNP position: distance (bp) from start codon | SNP position: promoter (P)/TSS | Regulation | Strain | Allele frequency** (ancient) | Allele frequency** (modern) |
|---|---|---|---|---|---|---|---|---|---|
| SNPs in upstream region | | | | | | | | | |
| | 1 | G226676A | IGR | −105 | — | Up | 1 | 0.973 | 0 |
| | — | T392261C | | −12 | — | Up | 1,2 | 0.978 | 0.324 |
| | 6 | T454295C | | −126 | — | Up | 1,2,4.1,4.3.4,4.8,4.9 | 1 | 0.994 |
| | 4 | T655986G | IGR | −37 | P | Up | 1,2 | 0.976 | 0.324 |
| | 6 | A690450C | | −51 | — | Up | 1,2 | 0.976 | 0.324 |
| | 3 | T769663G | IGR | −66 | P | Down | 4.3.3 | 0 | 0.050 |
| | 3 | C1069871T | IGR | −12 | P | Up | 1.1.3 | 0.220 | 0 |
| | 3 | T1224367C | IGR | −18 | P | Down | 1,2,4.1,4.3,4.8 | 1 | 0.976 |
| | 1 | A1694547C | IGR | −3 | — | Up | 1 | 0.973 | 0 |
| | 4 | T2177073C | IGR | −14 | TSS/P | Down | 1 | 0.973 | 0 |
| | 3 | C2282058T | | −41 | — | Up | 1.2.2* | 0.157 | 0 |
| | 1 | A2421816G | | −151 | — | Down | 1,2 | 0.977 | 0.323 |
| | 7 | A2424864G | IGR | −18 | TSS | Down | 1 | 0.973 | 0 |
| | 1 | C3025431T | IGR | −103 | P | Up | 1 | 0.971 | 0 |
| | 5 | T3137237C | IGR | −13 | P | Up | 1 | 0.973 | 0 |
| | 5 | G3920109T | | −47 | — | Up | 1 | 0.971 | 0 |
| | 2 | T4137190C | IGR | −16 | — | Down | 1 | 0.973 | 0 |
Table showing the candidate transcripts differentially expressed due to SNPs in upstream intergenic regions (IGRs) or within the upstream gene. Annotation of the transcript differentially expressed: 1 – Conserved hypotheticals, 2 – Cell wall and cell processes, 3 – Intermediary metabolism and respiration, 4 – Lipid metabolism, 5 – Virulence, detoxification, adaptation, 6 – Regulatory proteins, 7 – PE/PPE, 8 – Information pathways. The distance of the SNP from the start codon of the transcript is shown as negative when the SNP is upstream and positive when it is located within the gene. TSS = Transcriptional Start Site. *Only one or two samples from the lineage out of the 3 analysed. **Allele frequency refers to the fraction of strains harbouring the SNP in a larger data set (n = 6,218)[50]; "—" when not available.
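The sign convention described in this caption can be illustrated with a minimal Python sketch. It assumes that a SNP label such as `T655986G` reads as reference base, genome coordinate, alternate base, and that the distance is simply the SNP coordinate minus the start-codon coordinate on the gene's own strand. The helper names `parse_snp` and `signed_distance` and the start-codon coordinate used in the example are hypothetical and are not taken from the study.

```python
import re
from typing import NamedTuple


class Snp(NamedTuple):
    ref: str        # reference base
    position: int   # 1-based genome coordinate
    alt: str        # alternate base


def parse_snp(label: str) -> Snp:
    """Split a label such as 'T655986G' into (ref, position, alt)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"unrecognised SNP label: {label!r}")
    return Snp(match.group(1), int(match.group(2)), match.group(3))


def signed_distance(snp_pos: int, start_codon_pos: int, strand: str = "+") -> int:
    """Negative when the SNP is upstream of the start codon, positive when within the gene.

    Assumption: for a minus-strand gene, 'upstream' means a higher genome coordinate.
    """
    if strand == "+":
        return snp_pos - start_codon_pos
    if strand == "-":
        return start_codon_pos - snp_pos
    raise ValueError("strand must be '+' or '-'")


snp = parse_snp("T655986G")
hypothetical_start = 656023  # made-up coordinate, for illustration only
distance = signed_distance(snp.position, hypothetical_start)
print(snp, distance)
```

With the made-up start coordinate above, the sketch yields a negative distance, which under this convention places the SNP upstream of the transcript.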
Non-synonymous variants in transcriptional regulatory genes with eQTL associations, with potential functional impairment.
| Gene | Mutation | Family | Lineage of strains carrying mutation | Allele frequency (ancient) | Allele frequency (modern) |
|---|---|---|---|---|---|
| | S21G | whiB | 1.2.2** | 0.021 | 0 |
| | G217D | | 4.9** | 0 | 0.001 |
| | L186R* | MarR | 4.9** | 0 | 0 |
| | P36L* | tetR | 4.9** | 0 | 0 |
| | C41STOP | LuxR | 1.2.2** | 0.021 | 0 |
| | S24L | tetR | 1 | 0.973 | 0 |
| | E23K | | 1.2.2** | 0.019 | 0 |
| | P302R* | LysR | 1 | 0.973 | 0 |
| | L475R* | LuxR/UhpA | 4.1.1.3 | 0 | 0.003 |
| | P91Q | | 1 | 0.973 | 0 |
| | T118A | | 4.9** | 0 | 0.001 |
| | Q121R | | 1 | 0.973 | 0 |
| | R233H* | ArsR | 1,2 | 0.978 | 0.334 |
| | A140T | | 2 | 0.003 | 0.114 |
| | P227L* | | 4.1.1.3 | 0 | 0.003 |
| | E246K* | | 4.1.2 | 0 | 0.009 |
| | G169R* | | 2 | 0.003 | 0.147 |
| | E234G* | LuxR | 2 | 0.003 | 0.111 |
| | E303K* | | 4.1.2 | 0 | 0.009 |
| | V37G* | | 1,2,4.1,4.3,4.8 | 1 | 0.974 |
| | G60S* | KDPD/KDPE | 2 | 0.003 | 0.111 |
| | R11T | | 1.2.2** | 0.148 | 0 |
| | A70S | | 4.1.2 | 0 | 0.009 |
| | C110Y | | 1 | 0.973 | 0 |
| | D208N | | 1.1.3 | 0.230 | 0 |
| | D218N | | 1.2.2** | 0.021 | 0 |
| | P405Q | | 1,2,4.1,4.3,4.8 | 1 | 0.974 |
| | E189G* | | 4.3 | 0.014 | 0.281 |
| | V59A | CRP/FNR | 1 | 0.974 | 0 |
| | A125S | | 1.1.3* | 0.072 | 0 |
| | R154S | | 1.2.2** | 0.019 | 0 |
| | L57R | | 1 | 0.970 | 0 |
| | D148Y* | tetR | 1.1.3** | — | — |
| | A262E | | 1,2,4.1,4.3,4.8 | 0.998 | 0.973 |
| | C155R | tetR | 1,2 | 0.977 | 0.323 |
| | H64R* | | 1 | 0.973 | 0 |
| | D184Y* | LuxR | 1.2.2** | 0.018 | 0 |
| | A110V | | 2 | 0.003 | 0.148 |
| | Q131STOP | | 1 | 0.973 | 0 |
| | G420D | GntR | 4.1.2 | 0 | 0.009 |
| | L316R* | AraC/XylS | 1 | 0.973 | 0 |
| | P17Q | tetR | 1 | 0.973 | 0 |
| | T154A | tetR | 4.1.1.3 | 0.003 | 0.049 |
| | S2L | whiB | 1.1.3 | 0.223 | 0 |
| | G144R* | AraC/XylS | 1 | 0.971 | 0 |
| | G71D | whiB | 1.2.2** | 0.014 | 0 |
Table showing non-synonymous mutations in transcriptional regulatory genes identified as potential eQTLs. *Sorting Intolerant From Tolerant (SIFT) predicted score (p value) < 0.05, considered to have functional impact; for the other mutations, the SIFT software was unable to predict functional effects. **Only one or two samples available from the lineage. Allele frequency refers to the fraction of strains harbouring the SNP in a larger data set (n = 6,218)[26].
cis-eQTLs located in upstream intergenic regions linked with methylation in Lineage 4 strains.
| Gene | Position | Strand | Motif | Distance from start codon (bp) | Promoter | Regulation in non-methylated samples |
|---|---|---|---|---|---|---|
| | 657533 | − | CTGGAG | −63 | − | Down |
| | 1002711 | + | CTCCAG | −101 | − | Down |
| | 1543277 | + | CTCCAG | −82 | − | Up |
| | 1938088 | + | CTCCAG | −58 | P, TSS | Up |
| | 3710411 | − | CTCCAG | −163 | − | Up |
| | 3710411 | − | CTCCAG | −32 | − | Up |
| | 3710408 | + | CTGGAG | −25 | − | Down |
| | 4093563 | + | CTGGAG | −69 | − | Down |
Table showing genes differentially expressed potentially due to the lack of methylation in the upstream region. Shown are the name of the gene, the position of the eQTL (methylation site), the strand, the motif, the distance of the methylated base from the start codon of the transcript (negative values indicate upstream positions), the prediction of promoter or TSS (P = promoter region, TSS = Transcriptional Start Site), and the type of regulation of the gene in non-methylated samples.
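The two motifs listed in this table, CTGGAG and CTCCAG, are reverse complements of each other, which is consistent with a single recognition sequence read from either strand. The short Python sketch below checks this. The function names `reverse_complement` and `same_site` are illustrative and do not come from the study's pipeline.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def same_site(motif_a: str, motif_b: str) -> bool:
    """True if the two motifs are identical or reverse complements of each other."""
    a, b = motif_a.upper(), motif_b.upper()
    return a == b or reverse_complement(a) == b


print(reverse_complement("CTGGAG"))
print(same_site("CTGGAG", "CTCCAG"))
```

A check like this can help when motif calls reported on opposite strands need to be grouped under one recognition sequence.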
Figure 3. Venn diagram showing the overlap of genes differentially expressed (from the 3,987 investigated) associated with the different eQTL types (cis, trans and modified). The numbers represent the number of genes differentially expressed associated with the different types of eQTLs: cis-eQTLs, SNPs in promoter regions, transcriptional start sites (TSS), upstream (up to −200 bp) or within the gene; tr-eQTLs, potentially impairing non-synonymous SNPs located in transcriptional regulators; and mod-eQTLs, methylated bases located either within the gene or upstream including promoter regions and TSS.