Zhiyin Dai1, Julie M Sheridan2, Linden J Gearing3, Darcy L Moore3, Shian Su1, Sam Wormald4, Stephen Wilcox4, Liam O'Connor4, Ross A Dickins3, Marnie E Blewitt3, Matthew E Ritchie3.
Abstract
Pooled library sequencing screens that perturb gene function in a high-throughput manner are becoming increasingly popular in functional genomics research. Irrespective of the mechanism by which loss of function is achieved, via either RNA interference using short hairpin RNAs (shRNAs) or genetic mutation using single guide RNAs (sgRNAs) with the CRISPR-Cas9 system, there is a need to establish optimal analysis tools to handle such data. Our open-source processing pipeline in edgeR provides a complete analysis solution for screen data that begins with the raw sequence reads and ends with a ranked list of candidate genes for downstream biological validation. We first summarize the raw data contained in a fastq file into a matrix of counts (samples in the columns, genes in the rows), with options for allowing mismatches and small shifts in sequence position. Diagnostic plots, normalization and differential representation analysis can then be performed using established methods to prioritize results in a statistically rigorous way, with the choice of either the classic exact testing methodology or generalized linear modeling that can handle complex experimental designs. A detailed users' guide that demonstrates how to analyze screen data in edgeR and a point-and-click implementation of this workflow in Galaxy are also provided. The edgeR package is freely available from http://www.bioconductor.org.
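The counting step described in the abstract can be illustrated with a minimal Python sketch. This is not the edgeR implementation (edgeR is an R package); it only shows the general idea of assigning each read to a sample and a hairpin/guide while tolerating a limited number of mismatches. The function names (`hamming`, `best_match`, `count_reads`), the fixed sequence positions and the toy sequences are all hypothetical, and position shifts are not handled here.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(seq, candidates, max_mismatch):
    """Return the name of the unique closest reference sequence, or None.

    A read segment is assigned only if exactly one reference is closest
    and its distance is within max_mismatch (an illustrative rule).
    """
    hits = sorted(
        (hamming(seq, ref), name)
        for name, ref in candidates.items()
        if len(ref) == len(seq)
    )
    hits = [h for h in hits if h[0] <= max_mismatch]
    if not hits:
        return None
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # ambiguous: two references are equally close
    return hits[0][1]


def count_reads(reads, barcodes, hairpins, barcode_span, hairpin_span,
                max_mismatch=0):
    """Build a hairpin-by-sample count table from raw read sequences.

    barcode_span and hairpin_span are (start, end) positions in the read;
    these fixed locations are an assumption of this sketch.
    """
    counts = {h: {s: 0 for s in barcodes} for h in hairpins}
    for read in reads:
        sample = best_match(read[barcode_span[0]:barcode_span[1]],
                            barcodes, max_mismatch)
        hairpin = best_match(read[hairpin_span[0]:hairpin_span[1]],
                             hairpins, max_mismatch)
        if sample is not None and hairpin is not None:
            counts[hairpin][sample] += 1
    return counts


# Toy data: 4-base sample barcodes followed by 8-base hairpin sequences.
barcodes = {"S1": "AAAA", "S2": "CCCC"}
hairpins = {"H1": "GGGGTTTT", "H2": "TTTTGGGG"}
reads = [
    "AAAAGGGGTTTT",  # exact match: sample S1, hairpin H1
    "CCCCTTTTGGGG",  # exact match: sample S2, hairpin H2
    "AAATGGGGTTTT",  # one barcode mismatch: counted only if allowed
]
strict = count_reads(reads, barcodes, hairpins, (0, 4), (4, 12))
tolerant = count_reads(reads, barcodes, hairpins, (0, 4), (4, 12),
                       max_mismatch=1)
```

In this toy example the third read is discarded under exact matching and assigned to sample S1 when one mismatch is allowed, which is the kind of trade-off the mismatch option controls.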
Year: 2014 PMID: 24860646 PMCID: PMC4023662 DOI: 10.12688/f1000research.3928.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Summary of the raw data, workflow and diagnostic plots from edgeR.
(A) Structure of the amplicons sequenced in a typical shRNA-seq screen. Each amplicon will contain sample and hairpin specific sequences at predetermined locations. In sgRNA-seq screens, the amplicon sequences have a similar structure, with the sgRNA sequence replacing the hairpin. After sequencing, the raw data is available in a fastq file. (B) The main steps and functions used in an analysis of shRNA/sgRNA-seq screen data in edgeR are shown. (C) Example of a multidimensional scaling (MDS) plot showing the relationships between replicate dimethyl sulfoxide (DMSO) and Nutlin treated samples (data from Sullivan et al. (2012) [4]). MDS plots provide a quick display of overall variability in the screen and can highlight inconsistent samples. (D) Plot of log2 fold-change versus hairpin abundance (log2 CPM) for the same data. Hairpins with a false discovery rate < 0.05 from an exact test analysis in edgeR (highlighted in red) may be prioritized for further validation.
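The two axes of panel D can be made concrete with a small Python sketch. It assumes a simple definition of counts per million (CPM) with a fixed pseudocount to avoid taking the logarithm of zero; edgeR's own calculations differ in detail, and the function names and toy counts below are hypothetical.

```python
import math


def log2_cpm(counts, pseudocount=0.5):
    """Log2 counts per million for one sample, given {hairpin: count}.

    The pseudocount is an illustrative choice, not edgeR's exact rule.
    """
    lib_size = sum(counts.values())
    return {
        hairpin: math.log2((count + pseudocount) / lib_size * 1e6)
        for hairpin, count in counts.items()
    }


def log2_fold_change(treated, control, pseudocount=0.5):
    """Per-hairpin difference in log2 CPM between two samples."""
    t = log2_cpm(treated, pseudocount)
    c = log2_cpm(control, pseudocount)
    return {hairpin: t[hairpin] - c[hairpin] for hairpin in control}


# Toy counts for two hairpins in a control and a treated sample.
control = {"H1": 100, "H2": 300}
treated = {"H1": 200, "H2": 200}
lfc = log2_fold_change(treated, control, pseudocount=0)
```

Because both toy libraries have the same total size, hairpin H1 doubling from 100 to 200 reads gives a log2 fold-change of 1, while H2 dropping from 300 to 200 reads gives a negative value.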
Figure 2. Screenshots of the Galaxy tool for analyzing pooled genetic sequencing screens using edgeR.
(A) From the main screen, the user selects the appropriate input files and analysis options. (B) The results of an analysis are summarized in an HTML page that includes various diagnostic plots. (C) Output also includes a table of ranked results at the hairpin/guide and gene-level (where appropriate) as well as barcode plots (D) that highlight the ranks of hairpins/guides targeting a specific gene relative to all other hairpins/guides in the data set.