Michael A DeJesus, Elias R Gerrick, Weizhen Xu, Sae Woong Park, Jarukit E Long, Cara C Boutte, Eric J Rubin, Dirk Schnappinger, Sabine Ehrt, Sarah M Fortune, Christopher M Sassetti, Thomas R Ioerger.
Abstract
For decades, identifying the regions of a bacterial chromosome that are necessary for viability has relied on mapping integration sites in libraries of random transposon mutants to find loci that are unable to sustain insertion. To date, these studies have analyzed subsaturated libraries, necessitating the application of statistical methods to estimate the likelihood that a gap in transposon coverage is the result of biological selection and not the stochasticity of insertion. As a result, the essentiality of many genomic features, particularly small ones, could not be reliably assessed. We sought to overcome this limitation by creating a completely saturated transposon library in Mycobacterium tuberculosis. In assessing the composition of this highly saturated library by deep sequencing, we discovered that a previously unknown sequence bias of the Himar1 element rendered approximately 9% of potential TA dinucleotide insertion sites less permissive for insertion. We used a hidden Markov model of essentiality that accounted for this unanticipated bias, allowing us to confidently evaluate the essentiality of features that contained as few as 2 TA sites, including open reading frames (ORFs), experimentally identified noncoding RNAs, methylation sites, and promoters. In addition, several essential regions that did not correspond to known features were identified, suggesting uncharacterized functions that are necessary for growth. This work provides an authoritative catalog of essential regions of the M. tuberculosis genome and a statistical framework for applying saturating mutagenesis to other bacteria.
IMPORTANCE: Sequencing of transposon-insertion mutant libraries has become a widely used tool for probing the functions of genes under various conditions. The Himar1 transposon is generally believed to insert with equal probabilities at all TA dinucleotides, and therefore its absence in a mutant library is taken to indicate biological selection against the corresponding mutant. Through sequencing of a saturated Himar1 library, we found evidence that TA dinucleotides are not equally permissive for insertion. The insertion bias was observed in multiple prokaryotes and influences the statistical interpretation of transposon insertion (TnSeq) data and characterization of essential genomic regions. Using these insights, we analyzed a fully saturated TnSeq library for M. tuberculosis, enabling us to generate a comprehensive catalog of in vitro essentiality, including ORFs smaller than those found in any previous study, small (noncoding) RNAs (sRNAs), promoters, and other genomic features.
Year: 2017 PMID: 28096490 PMCID: PMC5241402 DOI: 10.1128/mBio.02133-16
Source DB: PubMed Journal: MBio Impact factor: 7.867
FIG 1 Cumulative fraction of TA sites represented as independent TnSeq data sets (black line). The gray bars show the saturation level of the individual data sets.
Insertion count statistics of TA sites
| TA site category | No. (%) of sites | % saturation | Mean read count (nonzero sites) |
|---|---|---|---|
| All | 74,602 | 84.3 | 182.27 |
| Those in putative nonessential regions | 67,992 | 92.5 | 182.27 |
| Those in high-coverage regions | 57,452 | 96.5 | 189.54 |
| Those in high-coverage (HC) regions not matching the nonpermissive (NP) sequence motif | 52,672 | 98.8 | 192.18 |
| All matching the NP motif | 6,659 (9%) | 59.7 | 50.00 |
| All not matching the NP motif | 67,943 (91%) | 96.1 | 184.94 |
Nonessential regions are defined as regions not containing a run of 4 or more unoccupied sites.
High-coverage regions are based on labeling by the segmentation algorithm.
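The two summary columns of the table and the footnote's definition of nonessential regions can be illustrated with a short sketch. It assumes the insertion counts are available as a list of integers ordered by the genomic position of each TA site. The function names are hypothetical, and reading the footnote as "drop every site that lies inside a run of 4 or more consecutive unoccupied sites" is an interpretation, not the authors' code. The segmentation algorithm used for high-coverage regions is not reproduced here.

```python
from itertools import groupby

def saturation(counts):
    """Percent of TA sites that carry at least one insertion."""
    if not counts:
        return 0.0
    occupied = sum(1 for c in counts if c > 0)
    return 100.0 * occupied / len(counts)

def mean_nonzero(counts):
    """Mean read count over occupied (nonzero) sites only."""
    nonzero = [c for c in counts if c > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def putative_nonessential_mask(counts, min_run=4):
    """True for sites that are NOT inside a run of >= min_run unoccupied sites.

    Assumption: this is one reading of the table footnote, not the paper's code.
    """
    mask = []
    for is_empty, group in groupby(counts, key=lambda c: c == 0):
        n = len(list(group))
        keep = not (is_empty and n >= min_run)
        mask.extend([keep] * n)
    return mask

# Toy data (made up for illustration): one run of five empty sites in the middle.
counts = [12, 0, 30, 0, 0, 0, 0, 0, 7, 0, 0, 5]
mask = putative_nonessential_mask(counts)
kept = [c for c, k in zip(counts, mask) if k]

print(round(saturation(counts), 1))  # saturation over all sites
print(round(saturation(kept), 1))    # saturation after removing long empty runs
print(mean_nonzero(counts))
```

Removing the long empty run raises the measured saturation of the toy data, which mirrors the direction of the difference between the "All" row and the "putative nonessential regions" row.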
FIG 2 (a) Logo plot of log2 of nucleotide frequencies surrounding TA sites in the set of 1,746 unoccupied sites found in high-coverage regions (nonpermissive set). (b) Logo plot of log2 of nucleotide frequencies surrounding TA sites in the permissive set.
FIG 3 (a) Number of libraries representing sites matching (GC)GNTANC(GC) occupied by at least 1 insertion (red), compared to distribution over all TA sites (blue). (b) Box plot of nonzero insertion counts at sites matching the NP sequence motif versus sites not matching the motif. The boxes show the 25% to 75% interquartile range, while the whiskers show the majority of the range of insertion counts, except for the most extreme outliers.
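A minimal sketch of how TA sites might be classified against the motif (GC)GNTANC(GC) follows. It assumes that "(GC)" means "G or C" at that position and that N is any base. The function name and the toy sequence are hypothetical. Under that reading the motif is its own reverse complement, so scanning one strand is sufficient.

```python
import re

# Assumption: "(GC)" in the motif is read as "G or C"; N is any of A, C, G, T.
# A lookahead is used so that overlapping motif occurrences are all reported.
NP_MOTIF = re.compile(r"(?=([GC]G[ACGT]TA[ACGT]C[GC]))")
TA_SITE = re.compile(r"(?=TA)")

def classify_ta_sites(seq):
    """Return (np_sites, permissive_sites) as lists of 0-based TA positions."""
    seq = seq.upper()
    # The TA dinucleotide starts 3 bases into the 8-base motif.
    np_positions = {m.start() + 3 for m in NP_MOTIF.finditer(seq)}
    all_ta = [m.start() for m in TA_SITE.finditer(seq)]
    np_sites = [p for p in all_ta if p in np_positions]
    permissive = [p for p in all_ta if p not in np_positions]
    return np_sites, permissive

# Toy sequence (made up): the TA at position 7 sits inside CGCTAGCG.
np_sites, permissive = classify_ta_sites("AATACGCTAGCGTTTAAA")
print(np_sites, permissive)
```

Combining these position lists with per-site insertion counts would give the occupancy and read-count comparisons between motif-matching and non-matching sites.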
Statistics for TnSeq data sets analyzed to determine transposon bias at sites matching NP motif
| Organism | Tn | Study | %GC | No. of TA sites | No. of NP sites | % density | % density at NP sites |
|---|---|---|---|---|---|---|---|
| | | This study | 66 | 74,602 | 6,659 | 84 | 60 |
| | | Murray et al. ( | 67 | 44,708 | 1,672 | 94 | 43 |
| | | Gawronski et al. ( | 38 | 131,954 | 814 | 53 | 6 |
| | | Chao et al. ( | 47 | 192,681 | 4,439 | 56 | 7 |
| | Tn | Fels et al. ( | 63 | 61,769 | 2,687 | 7 | 8 |
| | Tn | Sarmiento et al. ( | 33 | 133,503 | 514 | 5 | 6 |
| | Tn | Langridge et al. ( | 53 | 233,259 | 9,289 | 3 | 4 |
| | Tn | Pechter et al. ( | 65 | 72,385 | 3,844 | 3 | 3 |
Although the Tn5 transposon can insert at many different sites, the analysis was restricted to TA dinucleotides to investigate if Tn5 had difficulty inserting in sites matching the NP motif as well.
FIG 4 Mean read count at sites with at least one insertion for data sets made with the Himar1 transposon (A) and the Tn5 transposon (B). In the Himar1 data sets, read counts at nonpermissive sites (white bars), which match the nonpermissive motif identified in this study, were significantly lower than at sites that do not match the motif (permissive sites; gray bars). In contrast, when the analysis of the Tn5 data sets is limited to insertions at TA dinucleotides, the mean read counts are similar for permissive and nonpermissive sites. Error bars show the standard errors of the means.
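The quantities plotted in this figure, the mean read count over occupied sites and its standard error, can be sketched as below. The function name and the toy counts are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed for the standard error because the source does not state which estimator was used.

```python
import math
import statistics

def mean_and_sem(read_counts):
    """Mean and standard error of the mean over sites with at least one insertion.

    Assumption: SEM = sample standard deviation / sqrt(n), with n the number
    of occupied sites.
    """
    nonzero = [c for c in read_counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two occupied sites to estimate the SEM")
    mean = statistics.mean(nonzero)
    sem = statistics.stdev(nonzero) / math.sqrt(len(nonzero))
    return mean, sem

# Toy counts (made up): zeros are ignored, as in the figure.
mean, sem = mean_and_sem([0, 10, 20, 30, 0])
print(mean, round(sem, 3))
```

Applying the function separately to the motif-matching and non-matching sites of one data set would give the pair of bars shown for that data set.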
Essentiality of non-ORF genomic features
| Feature | Total no. | No. with ≥2 TA sites | No. essential |
|---|---|---|---|
| sRNA | 62 | 48 | 7 |
| tRNA | 45 | 35 | 21 |
| rRNA and other structural RNAs | 5 | 5 | 5 |
| DNA methylation site | 362 | 55 | 0 |
| Predicted rho-independent terminator | 148 | 73 | 2 |
| 5′UTR | 1,558 | 1,003 | 39 |
| Promoter region | 2,060 | 1,841 | 57 |