Jesse Stombaugh1, Abel Licon1, Žaklina Strezoska1, Joshua Stahl1, Sarah Bael Anderson1, Michael Banos1, Anja van Brabant Smith1, Amanda Birmingham1, Annaleen Vermeulen2.
Abstract
RNA interference screening using pooled, short hairpin RNA (shRNA) is a powerful, high-throughput tool for determining the biological relevance of genes for a phenotype. Assessing an shRNA pooled screen's performance is difficult in practice; one can estimate the performance only by using reproducibility as a proxy for power or by employing a large number of validated positive and negative controls. Here, we develop an open-source software tool, the Power Decoder simulator, for generating shRNA pooled screening experiments in silico that can be used to estimate a screen's statistical power. Using the negative binomial distribution, it models both the relative abundance of multiple shRNAs within a single screening replicate and the biological noise between replicates for each individual shRNA. We demonstrate that this simulator can successfully model the data from an actual laboratory experiment. We then use it to evaluate the effects of biological replicates and sequencing counts on the performance of a pooled screen, without the necessity of gathering additional data. The Power Decoder simulator is written in R and Python and is available for download under the GNU General Public License v3.0.
Keywords: Monte Carlo simulations; RNA interference; pooled screening; power analysis; shRNA library
Year: 2015 PMID: 25777298 PMCID: PMC4543901 DOI: 10.1177/1087057115576715
Source DB: PubMed Journal: J Biomol Screen ISSN: 1087-0571
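The two-level negative binomial model described in the abstract (one level for the spread of shRNA abundances within a replicate, one for the noise between replicates of each shRNA) can be illustrated with a minimal sketch. This is not the Power Decoder code: the function name `simulate_counts`, its parameters, the dispersion values, and the mean/dispersion parameterisation are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_counts(n_shrnas, n_reps, mean_count, abundance_disp, rep_disp, seed=0):
    """Hypothetical two-level negative binomial simulation of pooled-screen counts."""
    rng = np.random.default_rng(seed)

    def nb(mean, disp, size=None):
        # NumPy parameterises the negative binomial by (n, p). Convert from a
        # mean and a dispersion, where variance = mean + disp * mean**2.
        n = 1.0 / disp
        p = n / (n + mean)
        return rng.negative_binomial(n, p, size=size)

    # Level 1: relative abundance of the shRNAs within a single replicate.
    baseline = nb(mean_count, abundance_disp, size=n_shrnas)
    baseline = np.maximum(baseline, 1)  # keep every per-shRNA mean positive

    # Level 2: replicate-to-replicate noise around each shRNA's own mean.
    counts = nb(baseline[:, None], rep_disp, size=(n_shrnas, n_reps))
    return baseline, counts


# Illustrative values only; none of these numbers come from the paper.
baseline, counts = simulate_counts(
    n_shrnas=1000, n_reps=3, mean_count=200, abundance_disp=0.5, rep_disp=0.05
)
```

In a sketch like this, an engineered enrichment or depletion would be imposed by multiplying selected entries of `baseline` by the desired fold change before drawing the second-level counts.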
Figure 1. Differential enrichment and depletion of short hairpin RNAs (shRNAs) in engineered screens. MA plots of representative examples of normalized data from experimental shRNA pooled screens with engineered twofold enrichment and depletion of shRNAs in which transductions were performed at (A) 100 and (B) 500 independent shRNA integrations on average. The shRNAs with significantly (p* ≤ 0.05) higher and lower abundance in T1 in the next-generation sequencing count data are in red and blue, respectively. Power values listed are mean ± standard deviation over 30 normalizations.
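For readers unfamiliar with MA plots, the coordinates are conventionally the log2 ratio (M) and the mean log2 abundance (A) of the two conditions. The sketch below uses that standard definition for T0 and T1 counts; the helper name `ma_values` and the pseudocount of 1 are assumptions, and the exact normalization used for the published figures is not given in this record.

```python
import numpy as np


def ma_values(t0, t1, pseudocount=1.0):
    """Return (M, A) for normalized counts at time points T0 and T1."""
    # The pseudocount (an assumption) avoids taking log2 of zero counts.
    t0 = np.asarray(t0, dtype=float) + pseudocount
    t1 = np.asarray(t1, dtype=float) + pseudocount
    m = np.log2(t1) - np.log2(t0)          # log2 fold change, T1 over T0
    a = 0.5 * (np.log2(t1) + np.log2(t0))  # mean log2 abundance
    return m, a


# A twofold enriched and a twofold depleted shRNA give M of about +1 and -1.
m, a = ma_values([99, 199], [199, 99])
```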
Figure 2. Modeled next-generation sequencing (NGS) screen data compared with actual experimental NGS screen data. Kernel density estimate plots for the distributions of NGS counts for representative examples of normalized actual (red) and simulated (blue) T0 data generated by fitting parameters to the negative binomial distribution for (A) Screen 100_2x and (C) Screen 500_2x. Cumulative distributions of the same actual and simulated T0 count distributions for (B) Screen 100_2x and (D) Screen 500_2x.
Figure 3. Differential enrichment and depletion of short hairpin RNAs (shRNAs) in simulated screens. MA plots for two representative examples of simulated data from shRNA pooled screens with in silico twofold enrichment and depletion of shRNAs based on (A) Screen 100_2x and (B) Screen 500_2x. The shRNAs with significantly (p* ≤ 0.05) higher and lower abundance in T1 in the simulated next-generation sequencing count data are in red and blue, respectively. Power values listed are mean ± standard deviation over 900 simulations.
Figure 4. Simulated and actual powers for both high- and low-noise screens. The power of each experiment with (A) two and (B) three replicates for actual (red) and simulated (blue) data. Error bars are the standard deviations of 30 normalizations for actual experiments and 900 simulations for simulated experiments. (C) The correlation between simulated and actual power for three-replicate experiments.
Comparison of Actual and Simulated Screen Data Analysis.
| Experiment | Number of Replicates | Actual Power Average (%) | Actual Power σ (%) | Actual Specificity Average (%) | Actual Specificity σ (%) | Simulated Power Average (%) | Simulated Power σ (%) | Simulated Specificity Average (%) | Simulated Specificity σ (%) |
|---|---|---|---|---|---|---|---|---|---|
| Screen 100_2x | 2 | 12.93 | 0.34 | 99.22 | 0.03 | 15.24 | 1.72 | 99.82 | 0.07 |
| Screen 100_2x | 3 | 27.60 | 0.45 | 98.95 | 0.01 | 30.43 | 1.97 | 99.66 | 0.10 |
| Screen 100_4x | 2 | 65.49 | 0.60 | 98.50 | 0.03 | 71.77 | 1.67 | 99.27 | 0.14 |
| Screen 100_4x | 3 | 76.10 | 0.55 | 98.68 | 0.01 | 84.37 | 1.25 | 99.15 | 0.15 |
| Screen 500_1.5xPCR_100 | 2 | 0.94 | 0.17 | 99.62 | 0.03 | 4.21 | 1.16 | 99.95 | 0.04 |
| Screen 500_1.5xPCR_100 | 3 | 3.34 | 0.38 | 99.78 | 0.02 | 11.19 | 1.60 | 99.87 | 0.06 |
| Screen 500_2xPCR_100 | 2 | 33.73 | 0.46 | 98.82 | 0.02 | 38.43 | 2.15 | 99.58 | 0.11 |
| Screen 500_2xPCR_100 | 3 | 34.79 | 0.69 | 99.16 | 0.02 | 50.10 | 1.93 | 99.46 | 0.12 |
| Screen 500_1.5x | 2 | 28.96 | 0.83 | 99.31 | 0.03 | 41.18 | 2.16 | 99.58 | 0.11 |
| Screen 500_1.5x | 3 | 58.35 | 0.92 | 98.95 | 0.02 | 63.19 | 1.75 | 99.33 | 0.14 |
| Screen 500_2x | 2 | 83.45 | 0.60 | 98.73 | 0.02 | 87.24 | 1.09 | 99.13 | 0.15 |
| Screen 500_2x | 3 | 93.26 | 0.39 | 98.60 | 0.01 | 95.52 | 0.67 | 99.00 | 0.16 |
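The power and specificity columns above can be read as the usual confusion-matrix ratios. The sketch below assumes that power is the percentage of shRNAs with an engineered change that are called significant, and that specificity is the percentage of unchanged shRNAs that are not called; this record does not spell out the definitions, and the function name and toy data are hypothetical.

```python
def power_and_specificity(called, truth):
    """Return (power %, specificity %) from per-shRNA boolean calls and truth."""
    pairs = list(zip(called, truth))
    tp = sum(1 for c, t in pairs if c and t)          # changed and called
    fn = sum(1 for c, t in pairs if not c and t)      # changed but missed
    tn = sum(1 for c, t in pairs if not c and not t)  # unchanged, not called
    fp = sum(1 for c, t in pairs if c and not t)      # unchanged but called
    power = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return power, specificity


# Toy example: 2 truly changed shRNAs (1 detected), 3 unchanged (1 false call).
called = [True, False, False, True, False]
truth = [True, True, False, False, False]
power, specificity = power_and_specificity(called, truth)
```

Averaging these two values over repeated simulations (900 in the table) or repeated normalizations (30 for the actual data) would give the mean and σ entries reported per row.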
Figure 5. (A) Power as a function of replicate number. Box plots represent powers derived from DESeq analysis of 900 simulated next-generation sequencing (NGS) experiments of Screen 100_2x per replicate level. For comparison, the actual power of the Screen 500_2x using two biological replicates is also plotted. (B) Power as a function of sequencing coverage. Box plots represent powers derived from DESeq analysis of 900 simulated NGS experiments per coverage. This was done at increments of 100,000 counts per simulation or ~18 sequences per shRNA.
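The relationship between total counts and per-shRNA coverage in panel (B) is simple division. The sketch below works backward from the caption's figures; the library size it produces is an inference from "100,000 counts" and "~18 sequences per shRNA", not a number stated in this record, and both helper names are hypothetical.

```python
def reads_per_shrna(total_counts, library_size):
    """Average sequencing coverage per shRNA for a given read budget."""
    return total_counts / library_size


def implied_library_size(total_counts, coverage_per_shrna):
    """Library size implied by a read budget and an average coverage."""
    return total_counts / coverage_per_shrna


# 100,000 counts at ~18 sequences per shRNA implies roughly 5,556 shRNAs.
library_size = implied_library_size(100_000, 18)

# Under that inferred size, five increments (500,000 counts) give ~90 per shRNA.
coverage_at_500k = reads_per_shrna(500_000, library_size)
```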