Abstract
RNA-seq analysis is becoming a standard method for global gene expression profiling. However, performing RNA-seq analysis with open and standard pipelines remains challenging for non-experts because of the large size of the raw data files and the hardware requirements of the alignment step. Here we introduce a reproducible, open-source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. It supports the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes downregulated by ZIKV, our analysis uncovers a significant overlap between the upregulated genes and genes whose knockout in mice induces defects in brain morphology. This result may point to the molecular processes associated with the microcephaly phenotype observed in newborns of mothers infected with the virus during pregnancy. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.
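The abstract does not say how the small-molecule predictions are computed. Purely as an illustration of the mimic/reverse idea, the sketch below scores hypothetical reference signatures, each a pair of up- and down-regulated gene sets, by direction-aware overlap with a query signature: same-direction overlap counts toward mimicking and opposite-direction overlap counts toward reversal. The scoring rule and the names `signature_score` and `rank_small_molecules` are assumptions for illustration, not the pipeline's own method.

```python
# Hypothetical sketch: rank reference signatures as mimicking or reversing
# a query signature. The overlap-based score is an illustrative assumption,
# not the method used by the pipeline described above.


def signature_score(query_up, query_down, ref_up, ref_down):
    """Return a score in [-1, 1]: positive means mimic, negative means reverse.

    Same-direction overlaps add to the score, opposite-direction overlaps
    subtract from it; the result is normalized by the query signature size.
    """
    query_up, query_down = set(query_up), set(query_down)
    ref_up, ref_down = set(ref_up), set(ref_down)
    n = len(query_up) + len(query_down)
    if n == 0:
        return 0.0
    same = len(query_up & ref_up) + len(query_down & ref_down)
    opposite = len(query_up & ref_down) + len(query_down & ref_up)
    return (same - opposite) / n


def rank_small_molecules(query_up, query_down, references, mode="reverse"):
    """Rank hypothetical references {name: (up_genes, down_genes)}.

    mode="mimic" lists the highest scores first; mode="reverse" the lowest.
    """
    if mode not in ("mimic", "reverse"):
        raise ValueError("mode must be 'mimic' or 'reverse'")
    scored = [
        (name, signature_score(query_up, query_down, up, down))
        for name, (up, down) in references.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=(mode == "mimic"))


if __name__ == "__main__":
    # Toy reference signatures with made-up gene and drug names.
    refs = {
        "drug_A": ({"G1", "G2"}, {"G3"}),  # same direction as the query
        "drug_B": ({"G3"}, {"G1", "G2"}),  # opposite direction
    }
    print(rank_small_molecules({"G1", "G2"}, {"G3"}, refs, mode="reverse"))
```

A real implementation would score against a large library of drug-induced expression signatures and would typically weight genes by effect size rather than treating them as plain sets.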
Keywords: RNA-seq; Systems biology; bioinformatics pipeline; gene expression analysis
Year: 2016 PMID: 27583132 PMCID: PMC4972086 DOI: 10.12688/f1000research.9110.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Principal Component Analysis (PCA) of the samples in the first two principal component space.
ZIKV-infected and mock-treated cells are colored orange and blue, respectively. Dot shapes indicate the sequencing platform: squares for MiSeq and circles for NextSeq.
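The caption does not state how the expression matrix was preprocessed before PCA. The sketch below assumes log2-transformed counts per million (CPM) with a pseudocount of 1, centers each gene across samples, and obtains the principal components from a singular value decomposition with NumPy. The function name `pca_scores` and the genes-by-samples input layout are illustrative assumptions.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """Project samples onto their first principal components.

    counts: genes x samples matrix of raw read counts (assumed layout).
    Returns (scores, explained): a samples x n_components array of PC
    coordinates and the fraction of variance explained by each component.
    """
    counts = np.asarray(counts, dtype=float)
    # Counts per million: scale each sample (column) to a total of 1e6.
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    # Assumed transform: log2 with a pseudocount of 1.
    logged = np.log2(cpm + 1.0)
    # Put samples in rows and center every gene across samples.
    x = logged.T
    x = x - x.mean(axis=0)
    # Right singular vectors are the principal axes; U * S gives the scores.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    # Random toy data standing in for 8 samples (not the study's data).
    rng = np.random.default_rng(0)
    toy_counts = rng.poisson(100, size=(200, 8))
    pcs, frac = pca_scores(toy_counts)
    print(pcs.shape, frac)
```

Plotting the first column of `scores` against the second, colored by infection status, would give a plot of the kind described in Figure 1.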
Figure 2. Hierarchical clustering heatmap of the 800 genes with the largest variance.
The counts per million (CPM) of the 800 genes with the largest variance across the eight samples were log-transformed and z-score normalized across samples. Blue indicates low expression and red indicates high expression.
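The normalization described in the caption can be sketched with NumPy as follows. The caption leaves some details open, so this sketch assumes that variance is ranked on CPM values before the log transform and that the transform is log2 with a pseudocount of 1; the function name `heatmap_matrix` is hypothetical.

```python
import numpy as np


def heatmap_matrix(counts, n_genes=800, pseudocount=1.0):
    """Prepare a genes x samples matrix for a clustered heatmap.

    counts: genes x samples matrix of raw read counts (assumed layout).
    Returns (z, top): z-scored log-CPM values for the selected genes and
    the row indices of those genes in the input.
    """
    counts = np.asarray(counts, dtype=float)
    # Counts per million, computed per sample (column).
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    n_genes = min(n_genes, cpm.shape[0])
    # Assumption: rank genes by CPM variance across samples.
    top = np.argsort(cpm.var(axis=1))[::-1][:n_genes]
    # Assumption: log2 transform with a pseudocount.
    logged = np.log2(cpm[top] + pseudocount)
    # z-score each gene across samples; guard against constant rows.
    mean = logged.mean(axis=1, keepdims=True)
    std = logged.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    return (logged - mean) / std, top
```

The resulting matrix could then be clustered, for example with `scipy.cluster.hierarchy.linkage`, and drawn with a blue-to-red color map.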
Figure 3. Bar plots of the top enriched gene sets from the (a) ChEA and (b) KEGG libraries for the downregulated genes after ZIKV infection.
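The record does not specify the enrichment statistic behind these bar plots. One common choice for gene set enrichment of this kind is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below in pure Python. The names `hypergeom_sf` and `enrich` and the toy gene sets are hypothetical, and the sketch assumes that the background size is at least as large as every gene set and the query.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable.

    population: background gene count; successes: genes in the set;
    draws: genes in the query list. math.comb returns 0 when the lower
    argument exceeds the upper one, which handles impossible overlaps.
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enrich(query, gene_sets, background_size):
    """Return (term, overlap, p_value) tuples sorted by p-value."""
    query = set(query)
    results = []
    for term, genes in gene_sets.items():
        genes = set(genes)
        overlap = len(query & genes)
        p = hypergeom_sf(overlap, background_size, len(genes), len(query))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Made-up gene sets; real libraries such as ChEA or KEGG are far larger.
    sets = {"cell_cycle": {"A", "B", "C", "D"}, "other": {"X", "Y"}}
    print(enrich({"A", "B", "C"}, sets, background_size=100))
```

With many gene set libraries, the raw p-values would normally be corrected for multiple testing before ranking terms.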
Figure 4. Workflow of the different steps carried out in the pipeline.