Aldert Zomer, Peter Burghout, Hester J Bootsma, Peter W M Hermans, Sacha A F T van Hijum.
Abstract
High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditionally) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map transposon insertion sites by mutant-specific amplification and sequence readout of the DNA flanking each insertion site, and that assign a measure of essentiality based on the number of reads per insertion-site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy-to-use, automated tool for transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open-source, web-based software tool for genomics researchers who use transposon insertion sequencing. It accurately predicts (conditionally) essential genes and offers the flexibility of different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests and various visualizations to examine the results, while requiring only minimal input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS and HITS datasets, and we show that the various pre- and post-processing steps that ESSENTIALS applies to the sequence reads and count data considerably improve the sensitivity and specificity of predicted gene essentiality.
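The counting step described above can be illustrated with a minimal Python sketch. It assumes the reads have already been mapped to insertion positions (a dictionary of position to read count) and that genes are given as hypothetical `(name, start, end)` tuples with 1-based inclusive coordinates. The total-count scaling shown is only one of several possible normalization methods and is not necessarily what ESSENTIALS uses.

```python
def reads_per_gene(site_counts, genes):
    """Sum the reads of every insertion site that lies inside each gene.

    site_counts: {insertion position: number of reads mapped there}
    genes: list of (name, start, end), 1-based inclusive (hypothetical format)
    Sites outside all genes (intergenic) are ignored; a site inside
    overlapping genes is counted for each of them.
    """
    totals = {name: 0 for name, _, _ in genes}
    for pos, n_reads in site_counts.items():
        for name, start, end in genes:
            if start <= pos <= end:
                totals[name] += n_reads
    return totals


def normalize_total(gene_counts, scale=1_000_000):
    """Total-count scaling: rescale so the per-gene counts sum to `scale`."""
    total = sum(gene_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in gene_counts}
    return {gene: c * scale / total for gene, c in gene_counts.items()}


# Hypothetical annotation and mapped insertion sites.
genes = [("geneA", 1, 900), ("geneB", 1001, 2500), ("geneC", 2601, 3000)]
sites = {10: 5, 500: 15, 1200: 30, 2550: 100, 2700: 50}
counts = reads_per_gene(sites, genes)
print(counts)                        # {'geneA': 20, 'geneB': 30, 'geneC': 50}
print(normalize_total(counts, 100))  # {'geneA': 20.0, 'geneB': 30.0, 'geneC': 50.0}
```

The intergenic site at position 2550 contributes to no gene, which is why only 100 of the 200 reads end up in the gene table.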
Year: 2012 PMID: 22900082 PMCID: PMC3416827 DOI: 10.1371/journal.pone.0043012
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Effect of statistical methods on the prediction of essential genes based on two datasets.
| Experiment | Applied processing step | Essential genes detected | Predictive value (AUC) | Std. error | P |
|---|---|---|---|---|---|
| Essential | | 288 | 0.9517 | 2.23E-02 | 1.11E-22 |
| | | 305 | 0.9588 | 1.83E-02 | 6.36E-23 |
| | | 358 | 0.9996 | 7.12E-04 | 3.20E-38 |
| | | 359 | 0.9996 | 7.12E-04 | 6.08E-38 |
| | | 342 | 1.0000 | 0.00E+00 | 1.26E-37 |
| | | 339 | 1.0000 | 0.00E+00 | 3.36E-40 |
| Essential for tobramycin resistance | | 185 | 0.6774 | 8.06E-02 | 1.27E-02 |
| | | 180 | 0.6747 | 8.07E-02 | 1.29E-02 |
| | | 173 | 0.6667 | 8.06E-02 | 1.08E-02 |
| | | 174 | 0.6640 | 8.10E-02 | 1.06E-02 |
| | | 190 | 0.6640 | 8.10E-02 | 1.09E-02 |
| | | 121 | 0.7634 | 7.40E-02 | 2.16E-03 |
| | Literature | 117 | 0.7406 | 7.56E-02 | 2.66E-03 |
The predictive value of each method was assessed using ROC curves and Welch's t-test.
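The ROC and Welch analysis can be sketched in plain Python as follows. The assumptions are ours: genes are scored by their log2 fold change (more negative meaning more likely essential), the reference essential/nonessential labels define the two groups, and the t-test compares the fold changes of those groups. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`; scikit-learn's `roc_auc_score` gives the same AUC if essential genes are labelled 1 and the fold changes are negated.

```python
import math
from statistics import mean, variance


def roc_auc(essential, nonessential):
    """ROC AUC computed as the Mann-Whitney probability that a randomly
    chosen essential gene has a lower log2 fold change than a randomly
    chosen nonessential gene (ties count as one half)."""
    score = 0.0
    for e in essential:
        for ne in nonessential:
            if e < ne:
                score += 1.0
            elif e == ne:
                score += 0.5
    return score / (len(essential) * len(nonessential))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


essential = [-5.0, -4.0, -3.0]         # made-up log2 fold changes
nonessential = [-1.0, 0.0, 1.0, -3.0]
print(roc_auc(essential, nonessential))  # 11.5 / 12, about 0.958
print(welch_t(essential, nonessential))  # negative t: essential genes lower
```

The pairwise loop is quadratic in the number of genes, which is fine for a few thousand genes; a rank-based formula is the usual alternative for larger inputs.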
Cut-offs for S. pneumoniae R6 were detected automatically by ESSENTIALS, whereas for P. aeruginosa PAO1 a cut-off of 2.5-fold underrepresentation of reads per gene in the challenge condition was used to facilitate comparison with the literature data from Gallagher et al.
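A fixed cut-off of 2.5-fold underrepresentation corresponds to a log2 fold change of −log2(2.5), about −1.32. The sketch below applies such a cut-off to normalized per-gene counts from a control and a challenge condition. The pseudocount of 1 and the dictionary inputs are assumptions of this illustration, and the automatic cut-off detection used for S. pneumoniae R6 is not shown.

```python
import math

# A 2.5-fold underrepresentation expressed on the log2 scale (about -1.32).
CUTOFF = -math.log2(2.5)


def log2_fold_change(control, challenge, pseudocount=1.0):
    """log2(challenge / control) of normalized read counts; the pseudocount
    (an assumption of this sketch) keeps genes without reads finite."""
    return math.log2((challenge + pseudocount) / (control + pseudocount))


def underrepresented(control_counts, challenge_counts, cutoff=CUTOFF):
    """Genes whose log2 fold change is at or below the cut-off.
    Genes missing from the challenge sample are treated as having 0 reads."""
    hits = []
    for gene, ctrl in control_counts.items():
        lfc = log2_fold_change(ctrl, challenge_counts.get(gene, 0.0))
        if lfc <= cutoff:
            hits.append(gene)
    return sorted(hits)


control = {"g1": 99, "g2": 99, "g3": 99, "g4": 99}  # made-up counts
challenge = {"g1": 19, "g2": 49, "g3": 99}
print(underrepresented(control, challenge))  # ['g1', 'g4']
```

Here g2 drops only 2-fold (log2 fold change −1), so it stays above the 2.5-fold cut-off.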
Figure 1. Box-whisker plots of gene essentiality data.
Box-whisker plots showing the sample minimum, lower quartile, median, upper quartile and sample maximum of (A) fold change data of essential (E) and nonessential (NE) genes for growth of S. pneumoniae and (B) fold change data of essential (E) and nonessential (NE) genes for tobramycin resistance of P. aeruginosa PAO1, as calculated by ESSENTIALS after the various processing steps and, for P. aeruginosa PAO1, also for the fold change data presented by Gallagher et al. [10]. A significant difference between the essential and nonessential gene distributions is indicated by an asterisk (p<0.01).
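The five statistics drawn in each box can be computed as in this short sketch. The quartile convention is an assumption: the caption does not state which one the plotting software used, and different conventions can shift the quartiles slightly for small samples.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Sample minimum, lower quartile, median, upper quartile, sample maximum.

    Quartiles use the 'inclusive' method of statistics.quantiles, which
    interpolates linearly between order statistics; other conventions
    (for example Tukey hinges) can give slightly different quartiles.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


print(five_number_summary([1, 2, 3, 4, 5]))  # (1, 2.0, 3, 4.0, 5)
```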
Figure 2. Read count as a function of genomic position (per 1 kb).
Read counts from a single Tn-seq gene essentiality experiment in S. pneumoniae R6, plotted against genomic position before (A) and after (B) genomic location correction using Loess. Each dot represents 1 kb of sequence. The Loess regression, performed with R's loess function, is plotted as a black line.
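The genomic location correction can be illustrated with a small, self-contained loess smoother. This is a plain-Python reimplementation for illustration only: it fits a local straight line (degree 1) with tricube weights and no robustness iterations, whereas R's `loess()` defaults to `span = 0.75` and `degree = 2`. The span of 0.3 and the choice to divide each bin by the fitted trend and rescale to the genome-wide mean are assumptions, not necessarily what ESSENTIALS does.

```python
import math


def loess_fit(x, y, span=0.3):
    """Locally weighted linear regression (degree 1, tricube weights,
    no robustness iterations). Returns the fitted value at each x."""
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        dmax = max(sorted(dist)[k - 1], 1e-12)  # distance to k-th nearest point
        w = [(1 - (d / dmax) ** 3) ** 3 if d < dmax else 0.0 for d in dist]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def correct_position_bias(counts_per_kb, span=0.3):
    """Divide each 1 kb bin by the loess trend and rescale to the overall
    mean, so a positional trend is flattened while the average is kept."""
    positions = list(range(len(counts_per_kb)))  # bin index along the genome
    trend = loess_fit(positions, counts_per_kb, span)
    overall = sum(counts_per_kb) / len(counts_per_kb)
    # Bins where the trend is not positive are left unchanged.
    return [c * overall / t if t > 0 else c for c, t in zip(counts_per_kb, trend)]


counts = [100 + 2 * i for i in range(50)]  # made-up linear positional bias
corrected = correct_position_bias(counts)
print(round(min(corrected), 6), round(max(corrected), 6))  # 149.0 149.0
```

Because a local linear fit reproduces a purely linear trend exactly, every bin in this toy example is mapped to the mean count of 149; real data would keep its gene-level variation around the flattened trend.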
The use of ESSENTIALS on data generated by various transposon sequencing techniques.
| Strain | Condition | Literature N | Literature log2 FC cut-off | ESSENTIALS N | ESSENTIALS log2 FC cut-off | Overlap N | Method | Ref |
|---|---|---|---|---|---|---|---|---|
| | essential | 396 | NA | 423 | −4.1 | 357 | Tn-seq | |
| | essential | 356 | NA | 335 | −3.71 | 323 | TraDIS | |
| | bile salt | 169 | −1.40 | 229 | −1.40 | 161 | TraDIS | |
| | essential | 358 | −4.32 | 383 | −3.2 | 344 | HITS | |
| Rd | | 141 | −1.79 | 130 | −1.79 | 100 | HITS | |
Optimal fold change (FC) underrepresentation cut-offs were detected by ESSENTIALS. N: number of essential genes; NA: not available (a different method was used to determine gene essentiality in these studies).
A minimum normalized average read count of 50 reads per gene was required; FC cut-offs were the same as used in the literature reference to facilitate comparison.
Although the authors state in their methods that a log2 fold change cut-off of −2 together with p<1×10⁻⁵ (adjusted p<2.5×10⁻⁴) was used, only the p-value cut-offs were applied, resulting in a log2 fold change cut-off of −1.4 (personal communication, Julian Parkhill).
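The N and Overlap columns of the table can be reproduced from two gene lists once the read-count filter and fold-change cut-off described in these notes are applied. The sketch below assumes the normalized average read counts and log2 fold changes are already available as dictionaries; which samples enter the average is not specified here and is left to the caller. The function names are hypothetical.

```python
def essential_calls(mean_counts, log2_fc, cutoff, min_reads=50):
    """Genes with a normalized average read count of at least `min_reads`
    and a log2 fold change at or below `cutoff`."""
    return {
        gene
        for gene, lfc in log2_fc.items()
        if mean_counts.get(gene, 0) >= min_reads and lfc <= cutoff
    }


def overlap(literature_genes, predicted_genes):
    """Number of genes called essential in both gene lists."""
    return len(set(literature_genes) & set(predicted_genes))


mean_counts = {"a": 120, "b": 30, "c": 80, "d": 60}  # made-up values
log2_fc = {"a": -2.0, "b": -3.0, "c": -0.5, "d": -1.5}
called = essential_calls(mean_counts, log2_fc, cutoff=-1.4)
print(sorted(called))                    # ['a', 'd']
print(overlap(["a", "b", "x"], called))  # 1
```

Gene b has a strong fold change but is excluded by the 50-read minimum, which is the purpose of the filter: genes with too few reads give unreliable fold changes.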
Figure 3. Simplified flowchart of the ESSENTIALS procedure.
Links to sequence read files are uploaded, and parameters can optionally be changed, via the FG-web interface, which works in most web browsers. Session management allows users to run multiple analyses at the same time: processes are queued, and their progress can be checked via web pages that can be bookmarked.
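The queued, multi-run behaviour described for the web interface can be sketched with a first-in, first-out job queue in which each submitted analysis receives an identifier that could be embedded in a bookmarkable status page. All names here (`AnalysisQueue`, `submit`, `status`, `run_next`) and the example URLs are hypothetical and do not describe FG-web's actual implementation; a production service would run the worker in a separate process or thread.

```python
import uuid
from collections import deque


class AnalysisQueue:
    """Hypothetical FIFO job queue with per-job status lookup."""

    def __init__(self):
        self._pending = deque()
        self._status = {}
        self._results = {}

    def submit(self, reads_url, params=None):
        """Queue an analysis and return an ID usable in a status URL."""
        job_id = uuid.uuid4().hex
        self._pending.append((job_id, reads_url, dict(params or {})))
        self._status[job_id] = "queued"
        return job_id

    def status(self, job_id):
        return self._status.get(job_id, "unknown")

    def result(self, job_id):
        return self._results.get(job_id)

    def run_next(self, analysis):
        """Run the oldest queued job with `analysis(url, params)`."""
        if not self._pending:
            return None
        job_id, url, params = self._pending.popleft()
        self._status[job_id] = "running"
        try:
            self._results[job_id] = analysis(url, params)
            self._status[job_id] = "finished"
        except Exception:
            self._status[job_id] = "failed"
        return job_id


queue = AnalysisQueue()
first = queue.submit("https://example.org/run1.fastq")
second = queue.submit("https://example.org/run2.fastq", {"min_reads": 50})
queue.run_next(lambda url, params: len(url))
print(queue.status(first), queue.status(second))  # finished queued
```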