Troy P Hubbard, Jonathan D D'Gama, Gabriel Billings, Brigid M Davis, Matthew K Waldor.
Abstract
Transposon insertion sequencing (TIS) is a widely used technique for conducting genome-scale forward genetic screens in bacteria. However, few methods enable comparison of TIS data across multiple replicates of a screen or across independent screens, including screens performed in different organisms. Here, we introduce a post hoc analytic framework, comparative TIS (CompTIS), which utilizes unsupervised learning to enable meta-analysis of multiple TIS data sets. CompTIS first implements screen-level principal-component analysis (PCA) and clustering to identify variation between the TIS screens. This initial screen-level analysis facilitates the selection of related screens for additional analyses, reveals the relatedness of complex environments based on growth phenotypes measured by TIS, and provides a useful quality control step. Subsequently, PCA is performed on genes to identify loci whose corresponding mutants lead to concordant/discordant phenotypes across all or in a subset of screens. We used CompTIS to analyze published intestinal colonization TIS data sets from two vibrio species. Gene-level analyses identified both pan-vibrio genes required for intestinal colonization and conserved genes that displayed species-specific requirements. CompTIS is applicable to virtually any combination of TIS screens and can be implemented without regard to either the number of screens or the methods used for upstream data analysis.
IMPORTANCE Forward genetic screens are powerful tools for functional genomics. The comparison of similar forward genetic screens performed in different organisms enables the identification of genes with similar or different phenotypes across organisms. Transposon insertion sequencing is a widely used method for conducting genome-scale forward genetic screens in bacteria, yet few bioinformatic approaches have been developed to compare the results of screen replicates and different screens conducted across species or strains. Here, we used principal-component analysis (PCA) and hierarchical clustering, two unsupervised learning approaches, to analyze the relatedness of multiple in vivo screens of pathogenic vibrios. This analytic framework reveals both shared pan-vibrio requirements for intestinal colonization and strain-specific dependencies. Our findings suggest that PCA-based analytics will be a straightforward widely applicable approach for comparing diverse transposon insertion sequencing screens.
Keywords: PCA; Tn-seq; Vibrio cholerae; host-pathogen interactions; in vivo screen; principal-component analysis; vibrio pathogenesis
Year: 2019 PMID: 30787116 PMCID: PMC6382967 DOI: 10.1128/mSphere.00031-19
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
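The screen-level step described in the abstract (PCA across screens followed by hierarchical clustering) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the genes-by-screens log2(fold change) matrix layout, the correlation distance, and average linkage are all assumptions, and plain SciPy clustering stands in for the bootstrapped pvclust analysis named in the figure legends.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def screen_level_pca(l2fc, n_components=2):
    """PCA with screens as observations and genes as features.

    l2fc: array of shape (n_genes, n_screens) holding log2 fold changes
    (assumed layout). Returns screen coordinates and variance explained.
    """
    X = l2fc.T                       # one row per screen
    Xc = X - X.mean(axis=0)          # center each gene across screens
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return coords, explained[:n_components]


def cluster_screens(l2fc, n_clusters=2):
    """Agglomerative clustering of screens (assumed: correlation distance,
    average linkage). Returns the square distance matrix and cluster labels."""
    dist = pdist(l2fc.T, metric="correlation")
    tree = linkage(dist, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return squareform(dist), labels


# Synthetic example: 1 in vitro screen plus 4 noisy in vivo replicates.
rng = np.random.default_rng(0)
n_genes = 200
in_vitro = rng.normal(size=n_genes)
in_vivo_signal = rng.normal(size=n_genes)
replicates = [in_vivo_signal + 0.2 * rng.normal(size=n_genes) for _ in range(4)]
screens = np.column_stack([in_vitro] + replicates)   # shape (200, 5)

coords, explained = screen_level_pca(screens)
dist_matrix, labels = cluster_screens(screens)
print("PC coordinates per screen:\n", coords)
print("variance explained:", explained)
print("cluster labels:", labels)
```

In this toy setup the four replicates fall into one cluster and the in vitro screen into another, which is the kind of quality-control check the screen-level analysis is meant to provide.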
FIG 1. Screen-level comparative TIS analysis of V. parahaemolyticus screens. (A) Screen-level PCA of 5 V. parahaemolyticus TIS screens (1 in vitro, 4 in vivo biological replicates); units shown on axes are arbitrary values in principal component space. (B) Hierarchical agglomerative clustering with bootstrapping; values at each node represent approximately unbiased values calculated via pvclust; distal small intestine (dSI) 1 to 4 represent the 4 in vivo replicates. (C) Distance matrix of clustering.
FIG 2. Gene-level comparative TIS analysis of V. parahaemolyticus screens. (A) Variance explained by each principal component in gene-level PCA of V. parahaemolyticus genes across 4 in vivo screens (distal small intestine [dSI]). (B) Principal component 1 coefficients. (C) PC1 score distribution across all genes. (D) Heatmap of log2(fold change) values for each in vivo replicate for the genes with the lowest 0.5% of gene-level PC1 scores. Genes in the T3SS2 gene cluster, a critical colonization factor, are highlighted.
FIG 3. PCA-based analyses of in vivo TIS data from 3 pathogenic vibrio strains. (A) Screen-level PCA of V. cholerae C6706 (Vc Peru), V. cholerae H1 (Vc Haiti), and V. parahaemolyticus (Vp) in vivo screens; units shown on axes are arbitrary values in principal component space. (B) Hierarchical agglomerative clustering with bootstrapping; values at each node represent approximately unbiased values calculated via pvclust. (C) Distance matrix of clustering. (D) Variance explained by each principal component of gene-level PCA of all conserved vibrio genes across 11 in vivo screens. (E) Principal component 1 and 2 coefficients.
FIG 4. Gene-level PC1 and PC2 identify genes required for colonization by all strains and by specific strains, respectively. (A) Heatmap of log2(fold change) values of genes with the lowest 1% of gene-level PC1 scores across 11 in vivo vibrio screens. (B) Categories highly represented among the genes with the lowest 1% of PC1 scores (25 genes total). (C) Heatmap of a subset of genes with discordant log2(fold change) (L2FC) values across strains, selected from genes with the lowest 1% or highest 1% of gene-level PC2 scores.
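The gene-level analysis summarized in the legends (PC coefficients per screen, PC1 scores per gene, and selection of genes in the lowest percentile of PC1 scores) can be sketched as below. This is a hedged illustration on synthetic data rather than the published pipeline: the function names, the gene identifiers, and the sign convention (each component is oriented so its coefficients sum to a positive value, so a low PC1 score corresponds to low log2 fold change across screens) are assumptions.

```python
import numpy as np


def gene_level_pca(l2fc):
    """PCA with genes as observations and screens as features.

    l2fc: array of shape (n_genes, n_screens) of log2 fold changes
    (assumed layout). Returns per-gene scores, per-screen coefficients
    (one row per component), and variance explained.
    """
    Xc = l2fc - l2fc.mean(axis=0)            # center each screen across genes
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Assumed convention: flip each component so its coefficients sum > 0.
    signs = np.sign(Vt.sum(axis=1))
    signs[signs == 0] = 1
    Vt = Vt * signs[:, None]
    scores = Xc @ Vt.T
    explained = S**2 / np.sum(S**2)
    return scores, Vt, explained


def lowest_pc1_genes(scores, gene_ids, fraction=0.005):
    """Return gene IDs whose PC1 score lies in the lowest `fraction`."""
    cutoff = np.quantile(scores[:, 0], fraction)
    return [g for g, s in zip(gene_ids, scores[:, 0]) if s <= cutoff]


# Synthetic example: 1,000 genes, 4 replicate screens, 5 genes with a
# strong concordant colonization defect (hypothetical gene names).
rng = np.random.default_rng(1)
data = rng.normal(0.0, 0.3, size=(1000, 4))
data[:5] -= 5.0
gene_ids = [f"gene_{i}" for i in range(1000)]

scores, coeffs, explained = gene_level_pca(data)
hits = lowest_pc1_genes(scores, gene_ids, fraction=0.005)
print("PC1 coefficients:", coeffs[0])
print("variance explained:", explained)
print("lowest 0.5% by PC1 score:", hits)
```

When PC1 coefficients share the same sign across screens, as in this toy example, extreme PC1 scores flag genes with concordant phenotypes; components with mixed-sign coefficients would instead highlight discordant genes, as described for PC2 in the FIG 4 legend.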