Transposon insertion sequencing is a high-throughput technique for assaying large libraries of otherwise isogenic transposon mutants, providing insight into gene essentiality, gene function and genetic interactions. We previously developed the Transposon Directed Insertion Sequencing (TraDIS) protocol for this purpose, which uses shearing of genomic DNA followed by specific PCR amplification of transposon-containing fragments and Illumina sequencing. Here we describe an optimized high-yield library preparation and sequencing protocol for TraDIS experiments and a novel software pipeline for analysis of the resulting data. The Bio-Tradis analysis pipeline is implemented as an extensible Perl library which can either be used as is, or as a basis for the development of more advanced analysis tools. This article can serve as a general reference for the application of the TraDIS methodology.
AVAILABILITY AND IMPLEMENTATION: The optimized sequencing protocol is included as supplementary information. The Bio-Tradis analysis pipeline is available under a GPL license at https://github.com/sanger-pathogens/Bio-Tradis
CONTACT: parkhill@sanger.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Steady improvements in high-throughput sequencing technologies have resulted in an increasing number of sequenced bacterial genomes, revealing extensive genetic diversity both within and between species. Associated sequencing-based technologies, such as RNA-seq, ChIP-seq and RIP-seq, provide insight into the effects of this variation on gene expression and regulation; however, none provides direct information on cell survival, and hence on how this genetic variation may affect the fitness of the bacterium (Gray ). Transposon insertion sequencing (TIS) bridges this gap between sequence and fitness by allowing direct measurement of survival dynamics within a population of single transposon mutants, using sequencing reads flanking transposon insertions as a read-out of mutant frequency within the population (Barquist ; Van Opijnen and Camilli, 2013). We previously developed a method for this purpose, called Transposon Directed Insertion Sequencing (TraDIS; Langridge ). TraDIS uses fragmentation of genomic DNA followed by specific PCR amplification of transposon-containing fragments to selectively enrich for transposon-flanking sequences, and can be adapted for any transposon of interest through a simple redesign of sequencing primers. TraDIS has since been applied to a range of target organisms and transposons under a wide variety of both in vivo and in vitro growth conditions. These include Tn5-based libraries in Salmonella (Barquist ; Chaudhuri et al., 2013; Langridge ) and Escherichia (Dziva et al., 2013; Eckert ) and Mariner-based libraries in Clostridia (Dembek ) and Mycobacteria (Weerdenburg ).
2 Library preparation and sequencing
We have made a number of refinements to the TraDIS sequencing protocol since its initial publication (Langridge ), described in more detail in the supplement. We have redesigned TraDIS adapters and primers using a splinkerette approach (Devon ; Rad ; Uren ), which increases enrichment of genuine transposon-chromosome junctions by preventing hybridization of the reverse primer until the transposon-specific forward primer has generated a complementary strand. We have substituted a magnetic bead-based fragment size selection for gel-based size selection to increase yield and allow for easier automation (Bronner ). Finally, we have substituted Kapa Hifi DNA polymerase for Taq polymerase, as this enzyme has been shown to have minimal amplification biases (Quail ), and reduced the number of cycles of PCR amplification to provide a more accurate representation of input.
TraDIS sequencing primers are designed to begin sequencing within the transposon sequence, so as to provide a short 8–10 base ‘transposon tag’ at the beginning of each read to verify that each read originates from a genuine transposon-chromosome junction. This poses a challenge for Illumina sequencing machines, as the base-calling algorithms assume a complex sample for the purposes of calibration. We have developed HiSeq and MiSeq recipes that use ‘dark cycles’, during which chemistry is run but no imaging is performed, to read through this transposon tag before imaged sequencing commences on the complex chromosomal DNA (see supplement). Once the first read is completed, the DNA is denatured and the transposon-specific sequencing primer is re-annealed for a separate short 10–12 cycle transposon read. This requires a PhiX (or other complex library) spike-in of 5–10% to prevent sequencing failure due to a lack of fluorescence in some channels. Using this protocol we routinely achieve results of >90% of sequencing reads both containing an intact transposon tag and mapping uniquely to the source genome. We have applied this method to Tn5-, Tn917-, Himar1- and Mu-based mutant libraries, and it should be adaptable to any transposon of interest assuming a suitable priming site exists (see supplement for design parameter details).
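As an illustration of how a transposon tag can be used to verify reads, the following minimal Python sketch keeps only reads that begin with an expected tag and trims the tag from the sequence and quality strings. It is not the Bio-Tradis implementation: the function names, the mismatch tolerance and the tag sequence in the demonstration are hypothetical, chosen only to make the example concrete.

```python
from typing import Iterable, Iterator, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filter_and_trim(
    records: Iterable[FastqRecord],
    tag: str,
    max_mismatches: int = 0,  # hypothetical tolerance parameter
) -> Iterator[FastqRecord]:
    """Yield reads that start with `tag`, with the tag removed.

    Reads shorter than the tag, or whose leading bases differ from the
    tag at more than `max_mismatches` positions, are discarded.
    """
    tag_len = len(tag)
    for header, seq, qual in records:
        prefix = seq[:tag_len]
        if len(prefix) < tag_len:
            continue  # read too short to contain the full tag
        if hamming(prefix, tag) <= max_mismatches:
            # Trim sequence and quality together so they stay aligned.
            yield header, seq[tag_len:], qual[tag_len:]


if __name__ == "__main__":
    example_tag = "TAAGAGACAG"  # illustrative tag, not taken from the protocol
    reads = [
        ("@read1", "TAAGAGACAGACGTACGT", "I" * 18),  # intact tag
        ("@read2", "GGGGGGGGGGACGTACGT", "I" * 18),  # no tag
    ]
    for record in filter_and_trim(reads, example_tag):
        print(record)
```

Trimming the tag before mapping matters because the tag is transposon sequence and would otherwise count as mismatches against the chromosome.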
3 The Bio-Tradis analysis pipeline
To support the use of this improved TraDIS protocol, we have developed a portable processing and analysis pipeline implemented in the Perl and R languages. The functionality provided is similar to that in other recently published TIS analysis pipelines (DeJesus ; Solaimanpour ); however, our command-line driven approach has been designed with a production environment in mind, where many sequencing libraries may be processed simultaneously. We provide tools for each step of analysis, from the raw unaligned fastq files produced by the sequencer through to predictions of gene essentiality and fitness effects. The main pipeline script, bacteria_tradis, filters reads in fastq format for transposon tags, removes these tags, then maps the modified reads using the SMALT short read mapper (https://www.sanger.ac.uk/resources/software/smalt/), with support for multiple contigs and/or replicons, such as plasmids. Default k-mer, step size and percent identity parameters are set depending on input read length, though these can be manually specified by the user. The mapped bam file is then processed to produce plot files, containing insertion counts per nucleotide, suitable for visualization in the Artemis genome browser (Carver ) and for further analysis. The mapping, processing, and data manipulation steps are implemented as self-contained Perl modules that could easily be used as a foundation for the development of more sophisticated analyses.
Additional scripts are provided to process these plot files in conjunction with genome annotations in EMBL-Bank format to produce annotated tab-delimited files containing various statistics, including read counts and unique insertion sites per gene. Two basic analysis scripts for this gene-level data, written in R, are available. One, tradis_essentiality.R, produces predictions of gene essentiality within a high-density transposon library based on the empirically observed bimodal distribution of insertion sites over genes when normalized for gene length (Barquist ; Langridge ). The second, tradis_comparisons.R, applies the edgeR package (Robinson ) to identify significant differences in read counts, and hence mutant frequencies, between experimental conditions (Dembek ), providing insight into the relative contribution of all mutagenized genes to fitness under the assayed condition.
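The gene-level statistics described above can be sketched in a few lines of Python. The example below assumes the per-nucleotide insertion counts are already held in memory as a simple list, one value per genome position, and that genes are given as 1-based inclusive coordinates; the `Gene` and `GeneStats` types and the `gene_statistics` function are hypothetical names for illustration and do not reflect the Bio-Tradis file formats or code. It computes read counts, unique insertion sites, and unique insertion sites normalized for gene length (often called an insertion index in the TraDIS literature). It does not attempt the essentiality call itself.

```python
from typing import Dict, Iterable, NamedTuple, Sequence


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class GeneStats(NamedTuple):
    read_count: int         # total reads mapped within the gene
    insertion_sites: int    # positions with at least one read
    insertion_index: float  # insertion sites divided by gene length


def gene_statistics(
    counts: Sequence[int], genes: Iterable[Gene]
) -> Dict[str, GeneStats]:
    """Summarise per-nucleotide insertion counts over each gene."""
    stats: Dict[str, GeneStats] = {}
    for gene in genes:
        if gene.start < 1 or gene.end < gene.start or gene.end > len(counts):
            raise ValueError(f"gene {gene.name} lies outside the counts array")
        # Convert 1-based inclusive coordinates to a Python slice.
        window = counts[gene.start - 1 : gene.end]
        length = gene.end - gene.start + 1
        sites = sum(1 for c in window if c > 0)
        stats[gene.name] = GeneStats(sum(window), sites, sites / length)
    return stats


if __name__ == "__main__":
    # Toy 10-base "genome": insertions at positions 2 and 5 only.
    toy_counts = [0, 3, 0, 0, 1, 0, 0, 0, 0, 0]
    toy_genes = [Gene("geneA", 1, 5), Gene("geneB", 6, 10)]
    for name, s in gene_statistics(toy_counts, toy_genes).items():
        print(name, s)
```

Normalizing by gene length is what allows genes of different sizes to be compared on the same scale; in a dense library, genes with an insertion index near zero form the lower mode of the bimodal distribution mentioned above.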
4 Summary
We have described recent refinements to the TraDIS method for the sequencing and analysis of dense transposon libraries. These include an optimized sequencing protocol, and processing and analysis tools that can rapidly provide insight into the contribution of genomic regions to organismal fitness. It is our hope that making these tools more accessible will accelerate their application to an ever wider variety of bacteria and experimental conditions.
Authors: Sabine E Eckert; Francis Dziva; Roy R Chaudhuri; Gemma C Langridge; Daniel J Turner; Derek J Pickard; Duncan J Maskell; Nicholas R Thomson; Mark P Stevens Journal: J Bacteriol Date: 2011-01-28 Impact factor: 3.490
Authors: Francis Dziva; Heidi Hauser; Thomas R Connor; Pauline M van Diemen; Graham Prescott; Gemma C Langridge; Sabine Eckert; Roy R Chaudhuri; Christa Ewers; Melha Mellata; Suman Mukhopadhyay; Roy Curtiss; Gordon Dougan; Lothar H Wieler; Nicholas R Thomson; Derek J Pickard; Mark P Stevens Journal: Infect Immun Date: 2012-12-28 Impact factor: 3.441
Authors: Eveline M Weerdenburg; Abdallah M Abdallah; Farania Rangkuti; Moataz Abd El Ghany; Thomas D Otto; Sabir A Adroub; Douwe Molenaar; Roy Ummels; Kars Ter Veen; Gunny van Stempvoort; Astrid M van der Sar; Shahjahan Ali; Gemma C Langridge; Nicholas R Thomson; Arnab Pain; Wilbert Bitter Journal: Infect Immun Date: 2015-02-17 Impact factor: 3.441
Authors: Marcin Dembek; Lars Barquist; Christine J Boinett; Amy K Cain; Matthew Mayho; Trevor D Lawley; Neil F Fairweather; Robert P Fagan Journal: MBio Date: 2015-02-24 Impact factor: 7.867
Authors: Lars Barquist; Gemma C Langridge; Daniel J Turner; Minh-Duy Phan; A Keith Turner; Alex Bateman; Julian Parkhill; John Wain; Paul P Gardner Journal: Nucleic Acids Res Date: 2013-03-06 Impact factor: 16.971
Authors: Roy R Chaudhuri; Eirwen Morgan; Sarah E Peters; Stephen J Pleasance; Debra L Hudson; Holly M Davies; Jinhong Wang; Pauline M van Diemen; Anthony M Buckley; Alison J Bowen; Gillian D Pullinger; Daniel J Turner; Gemma C Langridge; A Keith Turner; Julian Parkhill; Ian G Charles; Duncan J Maskell; Mark P Stevens Journal: PLoS Genet Date: 2013-04-18 Impact factor: 5.917