Taemook Kim, Hogyu David Seo, Lothar Hennighausen, Daeyoup Lee, Keunsoo Kang.
Abstract
Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data in a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER applications. All the applications are installed on the user's computer when the program starts. Once installed, it can automatically retrieve original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the Gene Expression Omnibus (GEO) data repository. The downloaded files can then be sequentially processed to generate BAM and BigWig files, which are used for advanced analyses and visualization. Currently, it can process NGS data from popular model organisms, including human (Homo sapiens), mouse (Mus musculus), dog (Canis lupus familiaris), plant (Arabidopsis thaliana), zebrafish (Danio rerio), fruit fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and budding yeast (Saccharomyces cerevisiae). With the processed files from Octopus-toolkit, users can conduct meta-analyses of various data sets, motif searches for DNA-binding proteins, and the identification of differentially expressed genes and/or protein-binding sites with a few commands. Overall, Octopus-toolkit facilitates the systematic and integrative analysis of available epigenomic and transcriptomic NGS big data.
Year: 2018 PMID: 29420797 PMCID: PMC5961211 DOI: 10.1093/nar/gky083
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Octopus-toolkit workflow. (A) Detailed information on data types, input, and output for Octopus-toolkit is shown. Programs associated with Octopus-toolkit and their purposes are described (dashed-line box). (B) An example of its use is depicted. (C) The graphical user interface of Octopus-toolkit is shown. The only input required for Octopus-toolkit is an accession number for an epigenomic or transcriptomic NGS data set (GSE accession number) or for a single NGS sample (GSM accession number) (black box). Multiple NGS data sets can be sequentially processed by providing a list of GSE (or GSM) accession numbers as a text file. Octopus-toolkit runs all the steps once the Run icon (dotted-line box) is clicked.
Figure 2. UCSC browser snapshot of RNA-seq and STAT5 ChIP-seq performed in mouse mammary gland (GSE48685) and liver (GSE31578) tissues. (A) Each track indicates either an RNA-seq or a ChIP-seq sample. Peaks on the ChIP-seq tracks represent binding (or enrichment) of proteins, while peaks on the RNA-seq tracks indicate relative expression levels of genes. (B) Line plot and heatmap generated by Octopus-toolkit show that STAT5 binding differs between mammary gland and liver tissues.
Figure 3. De novo motif prediction on ESA1-, GCN5- and SET1-binding sites. HOMER was used to predict DNA-binding motifs of proteins significantly associated with each histone-modifying protein. The top three motifs are shown (left panel). Significance of the motifs at each time point is shown (right panel).
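The Figure 1 caption notes that multiple data sets can be queued by supplying a text file of GSE (or GSM) accession numbers. The following Python sketch shows one way such a list could be checked before it is handed to the tool. It is an illustration only: the one-accession-per-line layout, the `#` comment convention, the case-insensitive matching, and the function name `parse_accession_list` are all assumptions made here, not a description of the file format Octopus-toolkit actually expects.

```python
import re

# GEO series and sample accessions are the prefix GSE or GSM followed by digits.
ACCESSION_RE = re.compile(r"^(GSE|GSM)\d+$", re.IGNORECASE)


def parse_accession_list(text):
    """Split text into (valid, rejected) accession entries.

    Assumed layout (hypothetical): one accession per line; blank lines
    and lines starting with '#' are skipped.
    """
    valid, rejected = [], []
    for raw in text.splitlines():
        token = raw.strip()
        if not token or token.startswith("#"):
            continue
        if ACCESSION_RE.match(token):
            normalized = token.upper()
            if normalized not in valid:  # drop duplicates, keep first-seen order
                valid.append(normalized)
        else:
            rejected.append(token)
    return valid, rejected


# The two series accessions below are the ones cited in the Figure 2 caption.
example = "GSE48685\nGSE31578\n\nnot-an-accession\ngse48685\n"
valid, rejected = parse_accession_list(example)
print(valid)
print(rejected)
```

Validating the list up front is useful mainly because each accession triggers a long download-and-align run, so a malformed entry is cheaper to catch before processing starts.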