Duah Alkam, Thidathip Wongsurawat, Intawat Nookaew, Anthony R Richardson, David Ussery, Mark S Smeltzer, Piroon Jenjaroenpun.
Abstract
As transposon sequencing (TnSeq) assays have become widespread in microbiology, it is worth scrutinizing their potential drawbacks. TnSeq data consist of millions of nucleotide sequence reads generated by PCR amplification of transposon-genomic junctions. Reads mapping to the junctions are enumerated, providing information on the number of transposon insertion mutations in each individual gene. Here we explore the possibility that PCR amplification of transposon insertions in a TnSeq library skews the results by introducing bias into the detection and/or enumeration of insertions. We compared the detection and frequency of mapped insertions when altering the number of PCR cycles, and when including a nested PCR, in the enrichment step. Additionally, we present nCATRAs, a novel, amplification-free TnSeq method in which insertions are enriched via CRISPR/Cas9-targeted transposon cleavage and subsequently sequenced on an Oxford Nanopore MinION. nCATRAs enriched transposons and transposon-genomic junctions over background genomic DNA such that they accounted for 54% and 23% of reads, respectively. These PCR-based and PCR-free experiments demonstrate that, overall, PCR amplification does not significantly bias TnSeq results: insertions in the majority of genes represented in our library were similarly detected regardless of PCR cycle number and of whether PCR amplification was employed at all. However, detection of a small subset of genes previously described as essential is sensitive to the number of PCR cycles. We conclude that PCR-based enrichment of transposon insertions in a TnSeq assay is reliable, but researchers interested in profiling putative essential genes should carefully weigh the number of amplification cycles used in their library preparation protocols.
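The enumeration step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes reads have already been mapped and reduced to the genomic coordinate of each transposon-genomic junction, and that genes are supplied as hypothetical (name, start, end) intervals. For each gene it reports both the number of distinct insertion sites and the number of reads supporting them, since either quantity can be called an "insertion count".

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical annotation record: (gene name, 1-based start, inclusive 1-based end).
Gene = Tuple[str, int, int]


def count_insertions_per_gene(
    junction_positions: Iterable[int], genes: List[Gene]
) -> Dict[str, Tuple[int, int]]:
    """Return {gene: (unique insertion sites, reads supporting them)}.

    Each element of junction_positions is the genomic coordinate of one read's
    transposon-genomic junction (an assumed, already-mapped input).
    """
    reads_per_site = Counter(junction_positions)  # site -> number of reads
    result: Dict[str, Tuple[int, int]] = {}
    for name, start, end in genes:
        # Linear scan for clarity; a sorted index would scale better genome-wide.
        sites = [pos for pos in reads_per_site if start <= pos <= end]
        result[name] = (len(sites), sum(reads_per_site[pos] for pos in sites))
    return result


if __name__ == "__main__":
    genes = [("geneA", 1, 100), ("geneB", 201, 300)]
    # Three reads at site 10, one at 55, two at 250; site 150 is intergenic.
    positions = [10, 10, 10, 55, 250, 250, 150]
    print(count_insertions_per_gene(positions, genes))
    # {'geneA': (2, 4), 'geneB': (1, 2)}
```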
In addition, nCATRAs is comparable to traditional PCR-based methods (Kendall's correlation coefficient = 0.896-0.897), although the latter remain superior owing to their accessibility and high sequencing depth.
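Kendall's rank correlation asks how often two methods order a pair of genes the same way. The sketch below computes Kendall's tau-b, the tie-corrected variant; which variant was used for the reported values is an assumption here. It uses a plain O(n²) loop over gene pairs for readability; for real datasets, `scipy.stats.kendalltau` computes tau-b by default and is much faster.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between paired per-gene values (O(n^2) sketch)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: contributes to neither term below
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    not_tied_in_x = concordant + discordant + ties_y_only
    not_tied_in_y = concordant + discordant + ties_x_only
    denom = math.sqrt(not_tied_in_x * not_tied_in_y)
    return (concordant - discordant) / denom if denom else float("nan")


if __name__ == "__main__":
    pcr25 = [120, 5, 40, 0, 300]   # hypothetical per-gene counts
    nested = [90, 2, 35, 1, 310]
    print(kendall_tau_b(pcr25, nested))  # 1.0 (identical gene ranking)
```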
Keywords: Cas9; Nanopore; PCR-bias; PCR-free; TnSeq; nCATS; transposon
Year: 2021 PMID: 34596508 PMCID: PMC8627206 DOI: 10.1099/mgen.0.000655
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 4. NESTED identifies fewer insertion counts in candidate essential genes. (a) The insertion counts per gene of libraries prepared by either the PCR25 or NESTED method were fitted to a Gaussian mixture model (GMM), separately for the exponential-phase (top) and stationary-phase (bottom) datasets. (b) Violin plots showing the distribution of insertion counts of genes belonging to either the ‘low-counts’ (left) or ‘high-counts’ (right) group. Four asterisks represent P < 2.2e-16 (Wilcoxon signed-rank test). (c) Kendall correlation plots of insertion counts in the ‘low-counts’ genes (left) and ‘high-counts’ genes (right). Three asterisks represent P < 2.2e-16. (d) Log2 fold change of insertion counts in ‘low-counts’ genes (red) and ‘high-counts’ genes (green) assessed by PCR25 relative to NESTED. For (a–d), each plot represents insertion counts from a pool of three TnSeq libraries grown independently to the noted growth phase and prepared via the same method, as outlined in Fig. 3(a). (e) UpSetR plot showing the intersections of genes that fall into each of the indicated categories. Categories include genes described as essential by Grosser2018 [10], Fey2013 [32] and Valentino2014 [11]. Low_counts: genes with low insertion counts, as grouped by the GMM described in (a). Intersections of ‘low-counts’ with at least one other group are highlighted in pink and add up to 421 genes. The ‘low-counts’ genes not previously described as essential by any of the indicated groups are highlighted in blue and add up to 89 genes.
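Panel (a) splits genes into ‘low-counts’ and ‘high-counts’ groups with a Gaussian mixture model. A minimal pure-Python sketch of that idea fits a two-component, one-dimensional mixture by expectation-maximization and assigns each gene to the component with the higher posterior probability. The log10(count + 1) transform, the min/max initialization, the fixed iteration count and the pseudocount in the log2 fold-change helper (for panel (d)) are all assumptions for illustration; in practice a library implementation such as scikit-learn's `GaussianMixture` would normally be used.

```python
import math
from typing import List


def _log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a one-dimensional normal distribution."""
    return -((x - mean) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)


def gmm_two_groups(counts: List[int], n_iter: int = 200) -> List[str]:
    """Label each gene 'low-counts' or 'high-counts' with a 2-component 1D GMM.

    Counts are transformed as log10(count + 1); this transform, the min/max
    initialisation and the fixed iteration count are illustrative assumptions.
    """
    xs = [math.log10(c + 1) for c in counts]
    mu = sum(xs) / len(xs)
    start_var = sum((x - mu) ** 2 for x in xs) / len(xs) or 1.0
    means = [min(xs), max(xs)]
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    var_floor = 1e-6  # stops a component collapsing onto a single point
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each gene,
        # computed in log space to avoid underflow.
        resp = []
        for x in xs:
            logp = [math.log(w) + _log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixture weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = max(nk / len(xs), 1e-300)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_floor, sum(r[k] * (x - means[k]) ** 2
                                              for r, x in zip(resp, xs)) / nk)
    low = 0 if means[0] <= means[1] else 1  # component with the smaller mean
    return ["low-counts" if r[low] >= 0.5 else "high-counts" for r in resp]


def log2_fold_change(a: float, b: float, pseudocount: float = 1.0) -> float:
    """log2((a + pc) / (b + pc)); the pseudocount (assumed) avoids log of zero."""
    return math.log2((a + pseudocount) / (b + pseudocount))
```

Working in log space, both for the counts and for the E-step densities, keeps zero-insertion genes and very highly covered genes on a comparable scale and avoids numerical underflow.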
Fig. 3. Number of PCR cycles determines sensitivity of insertion detection and sequencing coverage of insertions. (a) Schematic describing three methods for TnSeq library preparation that vary in the number of PCR cycles and in the inclusion of a nested PCR during the transposon insertion enrichment step. Created with BioRender. (b) Sequencing reads were pooled from all libraries (both exponential and stationary phase) prepared via the same PCR-based method described in (a). Random subsets of different sizes were drawn from each pool, and the number of unique insertions (left) and the number of genes with at least one mapped insertion (right) were determined for each subset. The dashed lines indicate the maximum number of unique insertions and genes in the genome. The numbers in round brackets indicate the number of sequencing reads and the number of unique insertions from each method, respectively. (c) Kendall correlation plots of normalized transposon insertion counts per gene between PCR25 and NESTED. Each sample represents a pool of three TnSeq libraries grown independently and prepared via the same method outlined in (a). TnSeq libraries grown to exponential (left) and stationary (right) phase are presented separately. Each dot represents a gene and is coloured semi-transparent grey, so darker areas indicate many overlapping dots. Triple asterisks represent P < 2.2e-16. Axes are presented in log scale.
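The subsampling in Fig. 3(b) amounts to a rarefaction curve: draw random subsets of increasing size from the pooled reads and count the distinct insertion sites and genes they hit. The sketch below assumes each read has already been reduced to a hypothetical (insertion coordinate, gene or None) pair; the subset sizes and the random seed are arbitrary choices.

```python
import random
from typing import List, Optional, Sequence, Tuple

# One pooled read reduced to (insertion coordinate, gene hit or None if intergenic).
Read = Tuple[int, Optional[str]]


def rarefaction(
    reads: Sequence[Read], subset_sizes: Sequence[int], seed: int = 0
) -> List[Tuple[int, int, int]]:
    """For each requested size, sample reads without replacement and report
    (reads sampled, unique insertion sites, genes with >= 1 insertion)."""
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    pool = list(reads)
    curve = []
    for n in subset_sizes:
        k = min(n, len(pool))  # cannot sample more reads than exist
        sample = rng.sample(pool, k)
        sites = {pos for pos, _ in sample}
        genes = {gene for _, gene in sample if gene is not None}
        curve.append((k, len(sites), len(genes)))
    return curve


if __name__ == "__main__":
    # Hypothetical pooled reads: site 10 sequenced twice, one intergenic read.
    reads = [(10, "geneA"), (10, "geneA"), (55, "geneA"), (150, None), (250, "geneB")]
    for n, n_sites, n_genes in rarefaction(reads, [1, 3, 5]):
        print(n, n_sites, n_genes)
```

If a method's curve is still rising at the largest subset size, that library has not been sequenced to saturation.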
The nucleotide sequence of the complete USA300 LAC genome was determined using both Illumina and Oxford Nanopore Technologies methods, and the short and long reads from the two platforms were combined to assemble a closed circular genome de novo. We also sequenced and assembled the cryptic plasmid carried by this strain. The chromosome is 2,874,400 bp and the cryptic plasmid 3,125 bp; on average, both contain one TA dinucleotide every ~10 bp. The table lists the number of annotated features of each type and the TA dinucleotides they contain; TA dinucleotides are the target sites of the Mariner transposon used to create the transposon mutant library.
| Features annotated | Chromosome: no. of features | Chromosome: TA sites | Plasmid: no. of features | Plasmid: TA sites |
|---|---|---|---|---|
| CDSs | 2803 | 223337 | 3 | 170 |
| tRNAs | 59 | 185 | 0 | 0 |
| rRNAs | 19 | 1609 | 0 | 0 |
| ncRNAs | 3 | 67 | 0 | 0 |
| Intergenic regions | 2460 | 50208 | 4 | 135 |
| Total | 5344 | 275406 | 7 | 305 |
CDS, Coding Sequence.
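The TA-site counts and the ~10 bp average spacing can be checked with a short sketch. Because TA is its own reverse complement, scanning one strand finds every Mariner target site, and two TA dinucleotides can never overlap, so `str.count` is exact. The optional `circular` flag (an illustrative addition) catches a site spanning the end and start of a closed circular sequence such as this chromosome or plasmid.

```python
def count_ta_sites(seq: str, circular: bool = False) -> int:
    """Count TA dinucleotides on one strand.

    TA is palindromic (its own reverse complement), so one strand sees every
    site, and TA occurrences cannot overlap, so str.count is exact.
    """
    s = seq.upper()
    n = s.count("TA")
    # On a circular molecule, one more site can span the last and first bases.
    if circular and len(s) > 1 and s[-1] == "T" and s[0] == "A":
        n += 1
    return n


def bp_per_ta_site(length_bp: int, ta_sites: int) -> float:
    """Average spacing (bp) between TA sites."""
    return length_bp / ta_sites


if __name__ == "__main__":
    print(count_ta_sites("ATATAGCTA"))  # 3
    # Sequence lengths from the caption, TA totals from the table.
    print(round(bp_per_ta_site(2_874_400, 275_406), 1))  # 10.4 (chromosome)
    print(round(bp_per_ta_site(3_125, 305), 1))  # 10.2 (plasmid)
```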
Fig. 1. nCATRAs: a novel amplification-free transposon enrichment and sequencing method. (a) Schematic showing the steps of the nCATRAs method (adapted from [17]). The Cas9 nuclease is simultaneously targeted to two adjacent sites (in the same reaction tube) at the 5’ end of the transposon (region of interest), releasing the transposon-genomic junctions, which are subsequently sequenced using Oxford Nanopore Technologies (ONT). (b) The resulting sequencing reads mapped to the transposon show enrichment at the Cas9 cut sites, quantified in (c). Sequencing of the TnSeq library without nCATRAs enrichment, denoted ‘WGS’ (whole-genome sequencing), emphasizes the necessity of enriching for the transposon, as only 0.17% of the reads mapped to the transposon (sum of 0.04% junctions and 0.13% transposon sequence without an attached genomic region). The nCATRAs method substantially enriched for the transposon: a total of 54% of the reads (sum of 23% junctions and 31% transposon sequence without an attached genomic region) mapped to the Cas9 cut sites on the transposon. The plots represent reads pooled from three experimental replicates. Panel (a) was created with BioRender.
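The percentages in Fig. 1(c) are read-level proportions. The sketch below assumes each read has already been classified from its alignments into hypothetical categories (‘junction’, ‘transposon_only’ or ‘genomic’); the example counts are made up to mirror the reported nCATRAs percentages and are not the study's data.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical read categories: "junction" = transposon plus flanking genome,
# "transposon_only" = transposon sequence with no genomic region attached.
TRANSPOSON_LABELS = ("junction", "transposon_only")


def read_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of reads per category plus the transposon-derived total."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to summarise")
    pct = {label: 100.0 * n / total for label, n in counts.items()}
    pct["transposon_total"] = sum(pct.get(label, 0.0) for label in TRANSPOSON_LABELS)
    return pct


if __name__ == "__main__":
    # Made-up classification of 100 reads, not the study's data.
    reads = ["junction"] * 23 + ["transposon_only"] * 31 + ["genomic"] * 46
    pct = read_percentages(reads)
    print(pct["junction"], pct["transposon_only"], pct["transposon_total"])
    # 23.0 31.0 54.0
```

Using the caption's own totals, 54% against 0.17% corresponds to roughly a 318-fold increase in the transposon-derived fraction of reads.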
Fig. 2. The novel amplification-free TnSeq method (nCATRAs) is comparable to traditional PCR-based methods. (a) Distribution and frequency of mapped transposon insertions in samples prepared by the PCR-based PCR25 (outer, blue track) or NESTED (middle, pink track) methods, or by the PCR-free nCATRAs method (inner, purple track). Each track represents a pool of three libraries grown to stationary phase and prepared by the same method. Scales were adjusted per track to illustrate the differences among the samples. (b) Kendall correlation of the normalized insertion counts in ‘low-counts’ genes (left) and ‘high-counts’ genes (right) between nCATRAs and the PCR-based methods. The correlation plots are separated by the gene groups defined in Fig. 4(a). (c) Genes with zero insertions detected by nCATRAs were extracted from the PCR25 (blue) and NESTED (pink) datasets, and the distribution of their insertion counts under each PCR-based method was plotted. The dashed lines indicate the 90th percentiles: 90% of these genes have insertion counts <15 (NESTED) and <100 (PCR25).
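Fig. 2(c) can be outlined as: take the genes with zero nCATRAs insertions, look up their counts in a PCR-based dataset, and report a percentile. The sketch uses the nearest-rank percentile definition, which is an assumption (plotting tools often interpolate instead), and hypothetical gene names and counts.

```python
import math
from typing import Dict, List


def nearest_rank_percentile(values: List[float], q: float) -> float:
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    if not values:
        raise ValueError("empty input")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def pcr_counts_of_undetected_genes(
    ncatras_counts: Dict[str, int], pcr_counts: Dict[str, int]
) -> List[int]:
    """PCR-based insertion counts for genes with zero nCATRAs insertions."""
    return [pcr_counts.get(gene, 0) for gene, n in ncatras_counts.items() if n == 0]


if __name__ == "__main__":
    # Hypothetical per-gene counts for illustration only.
    ncatras = {"g1": 0, "g2": 4, "g3": 0, "g4": 0}
    pcr25 = {"g1": 12, "g2": 350, "g3": 3, "g4": 90}
    undetected = pcr_counts_of_undetected_genes(ncatras, pcr25)
    print(undetected, nearest_rank_percentile(undetected, 90))
    # [12, 3, 90] 90
```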