Jonathan S Abrahams, Michael R Weigand, Natalie Ring, Iain MacArthur, Joss Etty, Scott Peng, Margaret M Williams, Barret Bready, Anthony P Catalano, Jennifer R Davis, Michael D Kaiser, John S Oliver, Jay M Sage, Stefan Bagby, M Lucia Tondella, Andrew R Gorringe, Andrew Preston.
Abstract
Bacterial genetic diversity is often described solely using base-pair changes, despite a wide variety of other mutation types likely being major contributors. Tandem duplications/amplifications are thought to be widespread among bacteria, but because of their often-intractable size and instability, comprehensive studies of these mutations are rare. We define a methodology to investigate amplifications in bacterial genomes that uses read depth of genome sequence data as a proxy for copy number. We demonstrate the approach with Bordetella pertussis, whose insertion sequence element-rich genome provides extensive scope for amplifications to occur. Analysis of data for 2430 B. pertussis isolates identified 272 putative amplifications, of which 94 % were located at 11 hotspot loci. We demonstrate limited phylogenetic connection for the occurrence of amplifications, suggesting unstable and sporadic characteristics. Genome instability was further described in vitro using long-read sequencing via the Nanopore platform, which revealed that clonally derived laboratory cultures rapidly produced heterogeneous populations. We extended this research to analyse a population of 1000 isolates of another important pathogen, Mycobacterium tuberculosis. We found 590 amplifications in M. tuberculosis and, as in B. pertussis, these occurred primarily at hotspots. Genes amplified in B. pertussis include those involved in motility and respiration, whilst in M. tuberculosis, functions included intracellular growth and regulation of virulence. Using publicly available short-read data, we predicted previously unrecognized, large amplifications in B. pertussis and M. tuberculosis. This reveals the unrecognized and dynamic genetic diversity of B. pertussis and M. tuberculosis, highlighting the need for a more holistic understanding of bacterial genetics.
Keywords: B. pertussis; amplifications; duplications; genetic diversity; genome structure
Year: 2022 PMID: 35143385 PMCID: PMC8942028 DOI: 10.1099/mgen.0.000761
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
qPCR primers and probes used, with the optimal concentration determined for each
| Name | Experiment | Sequence (5′ to 3′) | Concentration (nM) |
|---|---|---|---|
| CNV_fw | Copy number determination | TCTGGGGAGTCGAAAGCAAT | 300 |
| CNV_rv | Copy number determination | TCTTGAGGGTGGCGAAGAAT | 900 |
| CNV_probe | Copy number determination | FAM-ACGCCCCTTGCTGACGTCGC-BHQ | 200 |
| BP283_fw | Copy number determination | CAGGCACAGCACTATTGCG | 500 |
| BP283_RV | Copy number determination | GACGATTACCAGCGAGATTACGA | 300 |
| BP283_probe | Copy number determination | FAM-CCGCCATCGCAACCGTCGCATTCA-BHQ | 200 |
| CNV_fw | Gene expression | TCTGGGGAGTCGAAAGCAAT | 300 |
| CNV_rv | Gene expression | TCTTGAGGGTGGCGAAGAAT | 300 |
| RecA_fw | Gene expression | CGCGTCAAGGTGGTCAAGA | 300 |
| RecA_rv | Gene expression | CTGCCATACATGATGTCGAACTC | 300 |
| RecA probe | Gene expression | FAM-TGGCGCCGCCGTTCAAGC-BHQ | 250 |
| 1736F | Gene expression | AAGACAAGCCCAAGCAATCG | 300 |
| 1736R | Gene expression | TCACCACGCCATTGTTCGT | 300 |
| 1736 probe | Gene expression | FAM-CGAGTACGCCTCCGATGCCACG-BHQ | 250 |
| 1740F | Gene expression | TGCGCAATCACTCCTCCAT | 300 |
| 1740R | Gene expression | AAGTCACGACATCGAGAAATTCAA | 300 |
| 1744F | Gene expression | ATGCCGGATTCGACGACTT | 300 |
| 1744R | Gene expression | CGGTCCTGGCGGATTTTC | 300 |
Fig. 1. Schematic overview of the prediction of amplifications from sequencing read depth. In the theoretical example (purple box, left), the query strain contains a tandem amplification of gene 1, whilst genes 2 and 3 are at single copy (a). Short reads from the query strain are generated (b) and mapped to the reference genome, which contains all genes at single copy (c). Reads from both copies of gene 1 in the query strain map to this locus in the reference sequence, and thus twice as many reads map to this gene as to genes 2 and 3. These data must be processed to avoid technical bias; the pipeline converts read coverage data into estimates of copy number (d). In an example with real data (red box, right), the strain SAMN08200079 was analysed. Read coverage was plotted to reveal an amplification at ~1.4 Mb (e, analogous to theoretical graph c), which was statistically analysed using our pipeline (f, analogous to theoretical graph d).
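The idea behind panels (c) and (d) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that per-gene mean read depth is already available and simply divides each value by the median depth across genes, omitting the bias correction and statistical analysis that the caption mentions. The function name `estimate_copy_number` and the depth values are hypothetical.

```python
from statistics import median


def estimate_copy_number(gene_depths):
    """Divide each gene's mean read depth by the median depth across genes.

    Assumption: most genes are at single copy, so the median depth
    approximates the depth of a single-copy gene.
    """
    baseline = median(gene_depths.values())
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return {gene: depth / baseline for gene, depth in gene_depths.items()}


# Toy depths mirroring the theoretical example: gene 1 is duplicated,
# so twice as many reads map to it as to genes 2 and 3.
depths = {"gene1": 100.0, "gene2": 50.0, "gene3": 50.0}
copy_numbers = estimate_copy_number(depths)
print(copy_numbers)
```

With these toy values, gene 1 receives an estimate of two copies and genes 2 and 3 one copy each. Real coverage data are noisier, which is why the caption stresses processing before copy number is estimated.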
Fig. 2. Heatmap of gene-by-gene copy-number states (Y axis: gene index) for nine isolates, A to I, belonging to network 1 (X axis). The colour scale (Z axis) indicates the copy number of each gene; its legend is on the far right.
Eleven hotspot loci contained 254/272 (93 %) of the predicted amplifications and are described here. Columns are, from left to right: the hotspot name (ordered by frequency), the number of strains in which each hotspot was observed, the median number of genes in each amplification, the median start gene, the median end gene and the mean copy number. The median start and end columns give locus tags in the reference genome, B1917. Further detail is given in Table S3.
| Hotspot name | Frequency | Median length (genes) | Median start | Median end | Mean copy no. |
|---|---|---|---|---|---|
| | 102 | 106 | B1917_RS12140 | B1917_RS12755 | 1.6 |
| | 57 | 82 | B1917_RS15100 | B1917_RS15490 | 1.7 |
| | 21 | 80 | B1917_RS07175 | B1917_RS07660 | 1.68 |
| | 18 | 20 | B1917_RS00010 | B1917_RS00130 | 1.35 |
| | 13 | 67 | B1917_RS19230 | B1917_RS19625 | 1.93 |
| | 11 | 75 | B1917_RS05505 | B1917_RS05935 | 1.6 |
| | 8 | 49 | B1917_RS04185 | B1917_RS04430 | 1.88 |
| | 8 | 74 | B1917_RS09665 | B1917_RS10290 | 1.82 |
| | 7 | 13 | B1917_RS19965 | B1917_RS10580 | 2.49 |
| | 6 | 23 | B1917_RS19465 | B1917_RS19565 | 1.32 |
| | 3 | 45 | B1917_RS01035 | B1917_RS01300 | 1.63 |
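As a quick consistency check, the frequency column can be summed and compared with the total of 272 predicted amplifications. The short sketch below uses only the values in the table; the variable names are illustrative.

```python
# Frequencies of the 11 hotspot loci, in table order
hotspot_frequencies = [102, 57, 21, 18, 13, 11, 8, 8, 7, 6, 3]
total_amplifications = 272  # all predicted amplifications in B. pertussis

in_hotspots = sum(hotspot_frequencies)
share = in_hotspots / total_amplifications
print(f"{in_hotspots}/{total_amplifications} = {share:.1%}")
# prints "254/272 = 93.4%"
```

The sum reproduces the 254 amplifications quoted in the table caption.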
Fig. 3. Quantification by qPCR of the amplification copy number in eight clones of UK54. UK54 was predicted to have a 16 kb amplification at a copy number of 4.1. Eight clones of UK54 (X axis) were screened at this locus and their copy number was determined (Y axis). For clarity, the exact copy number determined is noted in white text in each bar. A range of copy numbers was observed, from 2.17 to 51.21.
Fig. 4. Nanopore sequencing of UK54 clone 8 revealed reads with between one and seven copies of the amplified locus (X axis) in varying frequencies (Y axis) within a single culture. No reads were observed that spanned the full locus, so the true copy number for these reads is unknown.
Fig. 5. DNA copy number and RNA expression of the gene B1917_RS10525 were quantified in clones 2, 4 and 8 using qPCR and RT-qPCR, respectively. Expression is shown as a fold change relative to clone 2. Error bars represent standard deviation. The results show that copy number corresponds to gene expression levels.
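A fold change relative to a calibrator sample, here clone 2, is commonly computed with the 2^-ΔΔCt method. The sketch below assumes that method, a reference gene measured alongside the target (the primer table lists recA primers under gene expression, but its use as the reference is an assumption here) and 100 % amplification efficiency. The source does not state which calculation the authors used, and the Ct values are invented for illustration.

```python
def fold_change(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method (assumes 100 % efficiency).

    ct_target / ct_reference: Ct values in the sample of interest.
    ct_target_cal / ct_reference_cal: Ct values in the calibrator sample.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: the target amplifies two cycles earlier in the
# sample than in the calibrator, with the reference gene unchanged.
example = fold_change(22.0, 20.0, 24.0, 20.0)
print(example)  # 4.0
```

By construction, the calibrator compared with itself gives a fold change of 1, which is why clone 2 serves as the baseline in the figure.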
Fig. 6. Heatmap of genes ~3550 to ~3750 (Y axis) in 12 isolates of M. tuberculosis (X axis). Estimates of copy number are colour coded. Multiple regions of increased copy number can be seen, displaying a hotspot-like effect.