Tao Wang, Guanghua Xiao, Yongjun Chu, Michael Q Zhang, David R Corey, Yang Xie.
Abstract
The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP-RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses.
Year: 2015 PMID: 25958398 PMCID: PMC4477666 DOI: 10.1093/nar/gkv439
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Number of related scientific articles found by Google Scholar by searching for each of the key terms in the given year interval. Since ‘iCLIP’ could have many other meanings, it is searched together with ‘CLIP-Seq’.
Features of the three genome-wide CLIP platforms, as well as the major considerations for data analysis
| | HITS-CLIP | PAR-CLIP | iCLIP |
|---|---|---|---|
| Ribonucleoside analog treatment | No | Yes (4-SU, 6-SG) | No |
| Cross-linking | UV light cross-linking | Ribonucleoside analog treatment and UV light cross-linking | UV light cross-linking |
| UV light wavelength | 254 nm | 365 nm | 254 nm |
| Adaptor ligations | Inter(molecular)/Inter | Inter/Inter | Inter/Intra |
| Diagnostic sites | No definite type of mutations | T→C or G→A | Pattern of cDNA truncations |
| PCR duplicates | Estimated by similarity in read sequence and alignment positions | Estimated by similarity in read sequence and alignment positions | Found by random barcodes |
| Advantages | Broad applications (from cultured cells, animal tissues and plants) | Enhanced UV-crosslinking efficiency; high signal-to-noise ratio at determining true binding sites | Broad applications; high signal-to-noise ratio at determining true binding sites |
| Disadvantages | Low characteristic mutation ratios | Potential toxicity of ribonucleoside analogs fed to cells | Technically more challenging |
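The 'PCR duplicates' row contrasts two strategies: estimating duplicates from read sequence and alignment position (HITS-CLIP, PAR-CLIP) versus identifying them with random barcodes (iCLIP). The following is a minimal sketch of that difference, not the procedure of any specific pipeline. The `Read` record, its fields, and both function names are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical read record; the field names are assumptions for this sketch.
Read = namedtuple("Read", ["chrom", "strand", "start", "sequence", "barcode"])


def collapse_by_position(reads):
    """Keep one read per (chrom, strand, start, sequence).

    Mimics the HITS-CLIP/PAR-CLIP style estimate: reads with identical
    sequence and alignment position are treated as PCR duplicates.
    """
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.strand, r.start, r.sequence)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def collapse_by_barcode(reads):
    """Keep one read per (chrom, strand, start, barcode).

    Mimics the iCLIP style: reads at the same position are duplicates
    only if they also carry the same random barcode.
    """
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.strand, r.start, r.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


reads = [
    Read("chr1", "+", 100, "ACGT", "AAC"),
    Read("chr1", "+", 100, "ACGT", "AAC"),  # same position and barcode: PCR duplicate
    Read("chr1", "+", 100, "ACGT", "GTT"),  # same position, different barcode
    Read("chr2", "-", 500, "TTGA", "AAC"),
]

print(len(collapse_by_position(reads)))  # position-based estimate
print(len(collapse_by_barcode(reads)))   # barcode-based count
```

In this toy input the position-based estimate discards the third read even though its barcode marks it as an independent molecule, which illustrates why random barcodes allow duplicates to be found rather than estimated.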
Sequencing reads statistics for some genome-wide CLIP studies
| Experiment | # Total sequencing reads (million by default) | # Unique sequencing reads (million by default) | # Uniquely mapped reads (million by default) | # Replicate | Method to handle replicates | Year | Citation |
|---|---|---|---|---|---|---|---|
| HITS-CLIP | 26 (all replicates combined) | ∼1.8 of all mapped reads | Unclear whether mapping allows non-unique alignment | 5 | Biologic complexity | 2009 | ( |
| PAR-CLIP | 4.1–33 (all replicates combined) | 0.65–7.0 | 20–70% of sequencing reads after adaptor removal | 1–7 | Pooled | 2010 | ( |
| iCLIP | 6.5 (all replicates combined) | 0.6 out of 4.2M uniquely mapped reads | 4.2 | 3 | Pooled | 2010 | ( |
| PAR-CLIP | 22–24 (all replicates combined) | Not reported | 2.6–4.1 | 2 | Pooled | 2011 | ( |
| iCLIP | 113 (all replicates combined) | 33 out of 43M uniquely mapped reads | 43 | 3 | Focus on binding sites reproduced in all replicates | 2012 | ( |
| HITS-CLIP | 36–37 (second replicate) | 0.95–1.5 out of 11M–15M uniquely mapped reads | 11–15 | 2 | Analyze the second replicate | 2012 | ( |
| PAR-CLIP | 60 (all replicates combined) | 1.1 | 0.32 | 4 | Pooled | 2013 | ( |
| HITS-CLIP | 72 | 0.35 | 0.22 out of 0.35M unique sequencing reads | 1 | NA | 2014 | ( |
| HITS-CLIP | 250–340 (each protein) | 0.87–2.3 | Not reported | 4–5 | Pooled | 2014 | ( |
| iCLIP | 169–433 (all replicates combined) | 0.16–9.6 out of all mapped reads | 12–48% | 2 | Pooled | 2015 | ( |
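For rows that report unique reads as a share of uniquely mapped reads, the fraction surviving duplicate removal can be worked out directly. The sketch below does this for the two iCLIP rows that give both numbers as single values (0.6 out of 4.2M, and 33 out of 43M). The function name `unique_fraction` and the row labels are assumptions introduced here, and the ratio is only a rough indicator of library complexity, not a statistic reported in the table.

```python
def unique_fraction(unique_millions, mapped_millions):
    """Fraction of uniquely mapped reads that remain after collapsing duplicates."""
    if mapped_millions <= 0:
        raise ValueError("mapped_millions must be positive")
    return unique_millions / mapped_millions


# (unique reads, uniquely mapped reads), both in millions, taken from the table rows above.
rows = {
    "iCLIP 2010": (0.6, 4.2),
    "iCLIP 2012": (33, 43),
}

fractions = {name: unique_fraction(u, m) for name, (u, m) in rows.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
# iCLIP 2010: 14.3%
# iCLIP 2012: 76.7%
```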
Mapping software used in genome-wide CLIP analysis
| Aligner | Title/Citation | Example studies |
|---|---|---|
| Bowtie | Ultrafast and memory-efficient alignment of short DNA sequences to the human genome ( | ( |
| Novoalign | ( | |
| BLAT | BLAT—the BLAST-like alignment tool ( | ( |
| Gsnap | Fast and SNP-tolerant detection of complex variants and splicing in short reads ( | ( |
| BWA | Fast and accurate short-read alignment with Burrows–Wheeler transform ( | ( |
| RMAP | Updates to the RMAP short-read mapping software ( | ( |
Figure 2. Summary of the analysis software, pipelines and databases for CLIP-Seq analysis mentioned in this review.
Summary of genome-wide CLIP analysis software programs and databases
| Software/Database | Type | Comment | Citation |
|---|---|---|---|
| CLIPZ | Database | Can carry out simple bioinformatics analysis | ( |
| StarBase v2 | Database | Contains CLASH datasets as well | ( |
| doRiNA | Database | Focuses on miRNA biology | ( |
| CLIPdb | Database | Contains uniformly identified binding sites of publicly available genome-wide CLIP datasets | ( |
| PARalyzer | Software | Peak-finding algorithm for PAR-CLIP dataset only | ( |
| Piranha | Software | Peak-finding and differential binding detection algorithm | ( |
| dCLIP | Software | Differential binding detection algorithm | ( |
| PIPE-CLIP | Software | Peak-finding algorithm | ( |
| wavClusteR | Software | Peak-finding algorithm for PAR-CLIP dataset only | ( |
| PARma | Software | Differential binding detection algorithm for AGO PAR-CLIP dataset only | ( |
| MiClip | Software | Peak-finding algorithm wrapped as an R package | ( |
| PAR-CLIP HMM | Software | Peak-finding algorithm employing Hidden Markov Model | ( |
| GraphProt | Software | Peak-finding algorithm that can handle both RNAcompete and genome-wide CLIP data flexibly | ( |
| Pyicos | Software | Peak-finding algorithm that can handle ChIP-Seq, genome-wide CLIP and RNA-Seq data flexibly | ( |
| miRTarCLIP | Software | Peak-finding algorithm that employs a novel C to T reversion strategy in PAR-CLIP dataset analysis | ( |