
piPipes: a set of pipelines for piRNA and transposon analysis via small RNA-seq, RNA-seq, degradome- and CAGE-seq, ChIP-seq and genomic DNA sequencing.

Bo W Han¹, Wei Wang², Phillip D Zamore¹, Zhiping Weng³.

Abstract

MOTIVATION: PIWI-interacting RNAs (piRNAs), 23-36 nt small silencing RNAs, repress transposon expression in the metazoan germ line, thereby protecting the genome. Although high-throughput sequencing has made it possible to examine the genome and transcriptome at unprecedented resolution, extracting useful information from gigabytes of sequencing data still requires substantial computational skills. Additionally, researchers may analyze and interpret the same data differently, generating results that are difficult to reconcile. To address these issues, we developed a coordinated set of pipelines, 'piPipes', to analyze piRNA and transposon-derived RNAs from a variety of high-throughput sequencing libraries, including small RNA, RNA, degradome or 7-methyl guanosine cap analysis of gene expression (CAGE), chromatin immunoprecipitation (ChIP) and genomic DNA-seq. piPipes can also produce figures and tables suitable for publication. By facilitating data analysis, piPipes provides an opportunity to standardize computational methods in the piRNA field.

AVAILABILITY AND IMPLEMENTATION: piPipes is implemented in Bash, C++, Python, Perl and R. piPipes is free, open-source software distributed under the GPLv3 license and is available at http://bowhan.github.io/piPipes/.

CONTACT: Phillip.Zamore@umassmed.edu or Zhiping.Weng@umassmed.edu

SUPPLEMENTARY INFORMATION: Supplementary information, including flowcharts and example figures for each pipeline, is available at Bioinformatics online.

Year: 2014 PMID: 25342065 PMCID: PMC4325541 DOI: 10.1093/bioinformatics/btu647

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

piRNAs, a class of 23–36 nt small silencing RNAs, suppress transposon expression in the metazoan germ line and, in some animals, the adjacent gonadal somatic cells (Luteijn and Ketting, 2013). By preventing transposition, the piRNA pathway ensures that genetic information passes faithfully to the next generation. Disruption of the piRNA pathway typically leads to transposon mobilization, double-stranded DNA breaks and sterility. High-throughput sequencing technologies have been widely deployed in the study of piRNAs. Small RNA-seq reveals the identity and abundance of piRNAs (Brennecke et al.); RNA-seq detects and quantifies mRNA and transposon transcripts (Reuter et al.); degradome-seq (also termed RACE-seq) detects the cleavage products of PIWI proteins guided by piRNAs (Reuter et al.); chromatin immunoprecipitation (ChIP)-seq detects chromatin modifications directed by piRNAs or transcription factor-binding events that regulate piRNA precursor or target transcription (Sienski et al.); and genomic DNA sequencing detects new transposition events caused by transposons that escape piRNA repression, as well as background differences between experimental strains and the assembled genome (Khurana et al.; Sienski et al.). Correctly extracting biological knowledge from such voluminous data requires significant computational expertise and effort. The repetitive nature of transposon sequences adds another layer of complexity. Moreover, different laboratories use diverse methods to analyze and interpret data (e.g. in how they treat reads that map to multiple locations in a reference genome). To provide a standardized set of tools to analyze these diverse data types, we developed piPipes, a collection of integrated pipelines for small RNA-seq, RNA-seq, degradome- and cap analysis of gene expression-seq (CAGE-seq), ChIP-seq and genome-seq analyses.

2 METHODS

piPipes comprises five pipelines designed to analyze small RNA-seq, RNA-seq, degradome- and CAGE-seq, ChIP-seq or genome-seq data. The small RNA-seq pipeline reports the abundance, length distribution, nucleotide composition and 5′-to-5′ distance (‘Ping-Pong’ signature) of piRNAs assigned to genomic annotations, including individual transposon families and piRNA clusters, the initial sources of piRNA precursor transcripts. The RNA-seq pipeline reports the normalized abundance of transcripts from both genes and transposons. The degradome-seq pipeline offers methods to identify piRNA-directed cleavage products. This pipeline can also be used to analyze data from any long RNA sequencing method designed to define RNA 5′ ends, e.g. CAGE-seq. The ChIP-seq pipeline applies the widely used peak-calling algorithm MACS2 (Zhang et al.), focusing on piRNA clusters and transposons. The genome-seq pipeline detects novel transposition events as well as structural variation.

Supplementary Figure S1 illustrates the general piPipes workflow, using the small RNA-seq pipeline as an example. First, all reads aligned to ribosomal RNA (rRNA) sequences are removed. The remaining reads are then mapped to microRNA (miRNA) hairpin sequences to quantify the abundance and 5′- and 3′-end heterogeneity of mature miRNAs. Reads that do not match rRNAs or miRNAs are then mapped to the reference genome. piPipes next assigns reads to different genomic features (e.g. transposons, piRNA clusters and genes) by their coordinates. To achieve maximal speed, piPipes parallelizes this step on multiple threads using ParaFly software from the Trinity package (Grabherr et al.). For the reads assigned to each genomic feature, piPipes draws publication-quality graphs of length distribution and nucleotide composition, as well as the distance between the 5′ ends of two small RNAs from opposite strands of the same locus, a standard method for detecting piRNA ‘Ping-Pong’ amplification or siRNA phasing (Fig. 1A and Supplementary Fig. S1B).
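The 5′-to-5′ distance analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not piPipes' own code: the function name `five_prime_overlap_histogram`, the layout of the input tuples and the scoring of each pair by the product of read counts are all assumptions made for the example. It also assumes that every read is at least `max_overlap` nt long, which holds for 23–36 nt piRNAs with the default of 20.

```python
from collections import Counter, defaultdict


def five_prime_overlap_histogram(reads, max_overlap=20):
    """Score 5'-to-5' overlaps between reads on opposite strands.

    `reads` is an iterable of (chrom, five_prime, strand, count) tuples
    (a hypothetical layout). `five_prime` is the genomic coordinate of
    the read's 5'-most base; for minus-strand reads this is the
    rightmost aligned base. Returns {overlap_length: score}.
    """
    plus = defaultdict(Counter)   # chrom -> {5' position: read count}
    minus = defaultdict(Counter)
    for chrom, five_prime, strand, count in reads:
        target = plus if strand == "+" else minus
        target[chrom][five_prime] += count

    hist = {k: 0 for k in range(1, max_overlap + 1)}
    for chrom, plus_ends in plus.items():
        minus_ends = minus.get(chrom)
        if not minus_ends:
            continue
        for p, n_plus in plus_ends.items():
            for k in hist:
                # A minus-strand read overlapping the plus-strand read by
                # k nt at their 5' ends has its 5' end at position p + k - 1.
                n_minus = minus_ends.get(p + k - 1, 0)
                hist[k] += n_plus * n_minus
    return hist


if __name__ == "__main__":
    example = [
        ("chr2L", 100, "+", 3),
        ("chr2L", 109, "-", 2),  # overlaps the plus-strand read by 10 nt
        ("chr2L", 104, "-", 1),  # overlaps the plus-strand read by 5 nt
    ]
    for overlap, score in five_prime_overlap_histogram(example).items():
        if score:
            print(overlap, score)
```

When Ping-Pong amplification is active, such a histogram typically peaks at an overlap of 10 nt.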
Furthermore, piPipes generates a table that summarizes the number of unique and multiple mappers, counted either as species (distinct sequences) or as reads.

The RNA-seq pipeline also starts with rRNA removal. The remaining reads are then mapped to the genome using STAR (Dobin et al.). piPipes quantifies transcript abundance from the genomic alignments using both Cufflinks (Trapnell et al.) and HTSeq-count (Anders et al.). In addition, reads are mapped directly to the transcriptome using Bowtie2, followed by quantification with eXpress (Roberts and Pachter, 2013).

Degradome-seq and CAGE-seq share the same pipeline because both methods aim to characterize the 5′ ends of RNAs. This pipeline discards reads that can only be mapped to the genome via soft clipping of their 5′ ends (i.e. the prefixes of these reads do not map to the genome). The alignment procedure is otherwise similar to that used for RNA-seq data. The nucleotide composition for each genomic feature is calculated as in the small RNA pipeline (Supplementary Fig. S3B).

The ChIP-seq pipeline aligns the ChIP and input libraries to the genome using Bowtie2. piPipes calls peaks using MACS2 (Zhang et al.), which supports both narrow peaks (such as those of transcription factors) and broad peaks (such as those of histone 3 trimethyl lysine 9, H3K9me3). Transcription start site, transcription end site and metagene analyses of different genomic features are implemented with bwtool (Pohl and Beato, 2014).

The genome-seq pipeline applies different algorithms, including BreakDancer (Chen et al.), RetroSeq (Hormozdiari et al.; Keane et al.) and TEMP (Zhuang et al.), to discover transposon insertion, deletion and other structural variation events (Supplementary Fig. S5). piPipes uses a Circos plot (Zhang et al.) to represent the variant loci discovered by each algorithm across different chromosomes (Fig. 1D).
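The rule used for degradome-seq and CAGE-seq reads, discarding alignments whose 5′ ends are soft clipped, could be expressed as follows. This is only a sketch under stated assumptions, not the piPipes implementation: the function names and the `(name, flag, cigar)` tuple layout are hypothetical, while the CIGAR operations and FLAG bits follow the SAM format. Because SAM reports CIGAR strings in reference orientation, the 5′ end of a reverse-strand read corresponds to the last CIGAR operation rather than the first.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def five_prime_soft_clipped(cigar, is_reverse):
    """Return True if the read's 5' end is soft clipped.

    Hard clips (H) are ignored because they may flank a soft clip.
    """
    ops = [op for _, op in CIGAR_ELEMENT.findall(cigar) if op != "H"]
    if not ops:
        return False
    five_prime_op = ops[-1] if is_reverse else ops[0]
    return five_prime_op == "S"


def reads_with_intact_five_prime(alignments):
    """Yield names of mapped reads whose 5' end aligns to the genome.

    `alignments` is an iterable of (name, flag, cigar) tuples, a
    hypothetical layout taken from the matching SAM columns.
    """
    for name, flag, cigar in alignments:
        if flag & FLAG_UNMAPPED:
            continue
        if not five_prime_soft_clipped(cigar, bool(flag & FLAG_REVERSE)):
            yield name


if __name__ == "__main__":
    example = [
        ("read1", 0, "50M"),     # fully aligned, forward strand
        ("read2", 0, "3S47M"),   # 5' soft clip on the forward strand
        ("read3", 16, "3S47M"),  # reverse strand: this clip is at the 3' end
        ("read4", 16, "47M3S"),  # reverse strand: this clip is at the 5' end
    ]
    print(list(reads_with_intact_five_prime(example)))
```

Soft clipping at the 3′ end is tolerated here because only the 5′ end defines the cleavage site or cap position.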

Fig. 1. Gallery of piPipes figures. (A) Barplot representing length distribution of Drosophila w ovary small RNAs assigned to sense (blue) and antisense (red) strands of transposons. (B) Scatterplot comparing w to aub ovary RNA-seq reads assigned to mRNA (NM; red), non-coding RNA (NR; green) and transposons (blue). (C) Metagene plot of H3K9me3 ChIP-seq of piRNA clusters from flies in which piwi mRNA was depleted by double-stranded RNA-triggered RNAi driven by a triple Gal4 driver (SRX215630). (D) Circos plot representing the locations of, from the periphery to the center, cytological position, piRNA clusters, SV discovered by TEMP (tiles), retroSeq (tiles) and VariationHunter (links) using genomic sequencing of 2–4-day-old ovaries.

The small RNA-seq, RNA-seq and ChIP-seq pipelines can each be run in two modes, allowing analysis of a single sample or a pair of samples. The dual-sample mode uses the output from the single-sample mode and performs pair-wise comparison, as illustrated by balloonplots and scatterplots (Supplementary Fig. S1C and D). The comparison can be performed on miRNA, piRNA or mRNA. Figure 1B shows a scatterplot of mRNA abundance in an RNA-seq dataset analyzed by the RNA-seq pipeline in the dual-sample mode. The dual-sample mode of the RNA-seq pipeline also uses Cuffdiff (Trapnell et al.) to perform differential analysis on genic transcripts. In the dual-sample mode, the ChIP-seq pipeline uses MACS2 to identify differentially enriched loci (Supplementary Fig. S4).

18 in total

1. STAR: ultrafast universal RNA-seq aligner.

Authors: Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras
Journal: Bioinformatics Date: 2012-10-25 Impact factor: 6.937

2. Miwi catalysis is required for piRNA amplification-independent LINE1 transposon silencing.

Authors: Michael Reuter; Philipp Berninger; Shinichiro Chuma; Hardik Shah; Mihoko Hosokawa; Charlotta Funaya; Claude Antony; Ravi Sachidanandam; Ramesh S Pillai
Journal: Nature Date: 2011-11-27 Impact factor: 49.962

3. Adaptation to P element transposon invasion in Drosophila melanogaster.

Authors: Jaspreet S Khurana; Jie Wang; Jia Xu; Birgit S Koppetsch; Travis C Thomson; Anetta Nowosielska; Chengjian Li; Phillip D Zamore; Zhiping Weng; William E Theurkauf
Journal: Cell Date: 2011-12-23 Impact factor: 41.582

4. Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery.

Authors: Fereydoun Hormozdiari; Iman Hajirasouliha; Phuong Dao; Faraz Hach; Deniz Yorukoglu; Can Alkan; Evan E Eichler; S Cenk Sahinalp
Journal: Bioinformatics Date: 2010-06-15 Impact factor: 6.937

5. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila.

Authors: Julius Brennecke; Alexei A Aravin; Alexander Stark; Monica Dus; Manolis Kellis; Ravi Sachidanandam; Gregory J Hannon
Journal: Cell Date: 2007-03-08 Impact factor: 41.582

6. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation.

Authors: Ken Chen; John W Wallis; Michael D McLellan; David E Larson; Joelle M Kalicki; Craig S Pohl; Sean D McGrath; Michael C Wendl; Qunyuan Zhang; Devin P Locke; Xiaoqi Shi; Robert S Fulton; Timothy J Ley; Richard K Wilson; Li Ding; Elaine R Mardis
Journal: Nat Methods Date: 2009-08-09 Impact factor: 28.547

7. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Authors: Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter
Journal: Nat Biotechnol Date: 2010-05-02 Impact factor: 54.908

8. Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors: Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal: Nat Biotechnol Date: 2011-05-15 Impact factor: 54.908

9. Transcriptional silencing of transposons by Piwi and maelstrom and its impact on chromatin state and gene expression.

Authors: Grzegorz Sienski; Derya Dönertas; Julius Brennecke
Journal: Cell Date: 2012-11-15 Impact factor: 41.582

10. Model-based analysis of ChIP-Seq (MACS).

Authors: Yong Zhang; Tao Liu; Clifford A Meyer; Jérôme Eeckhoute; David S Johnson; Bradley E Bernstein; Chad Nusbaum; Richard M Myers; Myles Brown; Wei Li; X Shirley Liu
Journal: Genome Biol Date: 2008-09-17 Impact factor: 13.583
