Silvia Bottini, David Pratella, Valerie Grandjean, Emanuela Repetto, Michele Trabucchi.
Abstract
Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-seq) is a technique used to identify RNAs directly bound to RNA-binding proteins across the entire transcriptome in cell or tissue samples. Recent technological and computational advances permit the analysis of many CLIP-seq samples simultaneously, allowing us to reveal the comprehensive network of RNA-protein interactions and to integrate it with other genome-wide analyses. Therefore, the design and quality management of CLIP-seq analyses are of critical importance for extracting clean and biologically meaningful information from CLIP-seq experiments. Applying the CLIP-seq technique to the Argonaute 2 (Ago2) protein, the main component of the microRNA (miRNA)-induced silencing complex, reveals the direct binding sites of miRNAs, thus providing insight into the roles played by miRNAs. In this review, we summarize and discuss the most recent computational methods for CLIP-seq analysis and their impact on Ago2/miRNA-binding site identification and prediction, with regard to human pathologies.
Year: 2018 PMID: 28605404 PMCID: PMC6291801 DOI: 10.1093/bib/bbx063
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Main steps of the bioinformatics workflow to analyze CLIP-seq data with the main software or pipeline to use.
Number of reads obtained using different preprocessing and mapping tools on the in-house Ago2 HITS-CLIP data set from P19 stem cells
| Replicate | # of reads before preprocessing | Preprocessing program | # of reads after preprocessing | Mapping tool | Unique mapping, # of reads (%) | Multiple mapping, # of reads (%) | No mapping, # of reads (%) |
|---|---|---|---|---|---|---|---|
| 1 | 38 865 698 | PRINSEQ | 3 783 764 | Novoalign | 2 120 447 (56.0) | 1 300 950 (34.4) | 34 565 (0.9) |
| 1 | 38 865 698 | PRINSEQ | 3 783 764 | STAR | 1 142 926 (30.21) | 1 579 235 (41.74) | 1 061 603 (28.05) |
| 1 | 38 865 698 | FASTX-Toolkit | 933 985 | Novoalign | 596 192 (63.8) | 315 845 (33.8) | 7 004 (0.7) |
| 1 | 38 865 698 | FASTX-Toolkit | 933 985 | STAR | 372 062 (39.84) | 472 457 (50.59) | 89 466 (9.58) |
| 1 | 38 865 698 | CIMS | 2 467 666 | Novoalign | 1 355 710 (54.9) | 784 179 (31.8) | 25 516 (1.0) |
| 1 | 38 865 698 | CIMS | 2 467 666 | STAR | 787 349 (31.91) | 1 297 371 (52.57) | 382 946 (15.51) |
| 2 | 34 094 384 | PRINSEQ | 3 924 075 | Novoalign | 2 196 430 (56.0) | 1 321 658 (33.7) | 34 607 (0.9) |
| 2 | 34 094 384 | PRINSEQ | 3 924 075 | STAR | 1 311 574 (33.42) | 2 036 793 (51.91) | 575 708 (14.67) |
| 2 | 34 094 384 | FASTX-Toolkit | 957 572 | Novoalign | 634 824 (65.1) | 316 995 (32.1) | 6 529 (0.7) |
| 2 | 34 094 384 | FASTX-Toolkit | 957 572 | STAR | 423 300 (43.39) | 461 865 (47.34) | 72 407 (7.42) |
| 2 | 34 094 384 | CIMS | 2 584 470 | Novoalign | 1 425 553 (55.2) | 801 740 (31.0) | 25 557 (1.0) |
| 2 | 34 094 384 | CIMS | 2 584 470 | STAR | 871 764 (33.73) | 1 314 533 (50.86) | 398 173 (15.41) |
| 3 | 32 904 107 | PRINSEQ | 3 668 439 | Novoalign | 2 001 844 (54.6) | 1 264 702 (34.5) | 35 261 (1.0) |
| 3 | 32 904 107 | PRINSEQ | 3 668 439 | STAR | 1 152 648 (31.42) | 1 956 392 (53.33) | 559 399 (15.25) |
| 3 | 32 904 107 | FASTX-Toolkit | 888 186 | Novoalign | 562 070 (63.3) | 300 647 (33.8) | 6 308 (0.7) |
| 3 | 32 904 107 | FASTX-Toolkit | 888 186 | STAR | 358 485 (40.36) | 442 132 (49.78) | 87 569 (9.86) |
| 3 | 32 904 107 | CIMS | 2 359 084 | Novoalign | 1 264 546 (53.6) | 750 678 (31.8) | 25 402 (1.1) |
| 3 | 32 904 107 | CIMS | 2 359 084 | STAR | 740 554 (31.39) | 1 242 835 (52.68) | 375 695 (15.92) |
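The percentages in the table can be recomputed from the read counts. Below is a minimal Python sketch, assuming each percentage is the category's read count divided by the number of reads after preprocessing. The helper name `mapping_percentages` is hypothetical, and the counts are copied from the Replicate 1, PRINSEQ, STAR row. The sketch divides by the preprocessed read count rather than by the sum of the three categories, because the Novoalign rows do not sum to 100%.

```python
def mapping_percentages(reads_after_preprocessing, unique, multiple, unmapped):
    """Express each mapping category as a percentage of the preprocessed reads.

    Hypothetical helper: assumes the table's percentages use the number of
    reads after preprocessing as the denominator.
    """
    if reads_after_preprocessing <= 0:
        raise ValueError("reads_after_preprocessing must be positive")
    counts = {"unique": unique, "multiple": multiple, "unmapped": unmapped}
    return {
        name: round(100.0 * n / reads_after_preprocessing, 2)
        for name, n in counts.items()
    }


# Replicate 1, PRINSEQ preprocessing, STAR mapping (counts from the table).
star_rep1 = mapping_percentages(
    reads_after_preprocessing=3_783_764,
    unique=1_142_926,
    multiple=1_579_235,
    unmapped=1_061_603,
)
print(star_rep1)
```

Under this assumption the recomputed values should land close to the tabulated percentages, up to rounding in the last digit.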
Main characteristics of first- and second-generation algorithms to predict miRNA-binding sites
| Generation | Program name | Type of resource | Binding site position: 5ʹ UTR | Binding site position: CDS | Binding site position: 3ʹ UTR | Binding site type: seed | Binding site type: NC/SL | Feature: conservation | Feature: free energy | Feature: accessibility | Feature: CLIP data |
|---|---|---|---|---|---|---|---|---|---|---|---|
| First | Targetscan | WS | No | No | Yes | Yes | No | Yes | No | No | No |
| First | PicTar | WS | No | No | Yes | Yes | SL | No | Yes | No | No |
| First | PITA | WS/SA | No | No | Yes | Yes | No | Yes | Yes | Yes | No |
| First | miRanda | D/SA | No | No | Yes | Yes | No | Yes | Yes | No | No |
| First | RNAhybrid | WS/SA | No | No | Yes | Yes | Yes | No | Yes | No | No |
| First | RNA22 | WS/SA | Yes | Yes | Yes | Yes | SL | No | Yes | No | No |
| Second | TargetSpy | WS/D | No | No | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Second | MIRZA | SA/WS | No | No | Yes | Yes | SL | No | Yes | No | Yes |
| Second | DIANA-micro-T-CDS | WS/SA | No | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Second | STarMir | WS | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Second | miRDB | D/WS | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes |
| Second | miRTar2GO | D/WS | No | No | Yes | Yes | No | Yes | Yes | No | Yes |
| Second | chimiRic | SA | No | No | Yes | Yes | Yes | No | Yes | No | Yes |
WS, web server; D, database; SA, stand-alone software; CDS, protein-coding sequence; NC, noncanonical; SL, seed-like.
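To pick a predictor by capability, the table can be encoded as records and filtered. The following is a minimal Python sketch, not part of the original review. The `Predictor` record and the `select` helper are hypothetical names, and only a subset of the columns (generation, binding site position, CLIP data) is transcribed from the table.

```python
from typing import NamedTuple


class Predictor(NamedTuple):
    """One row of the table, restricted to a few columns (hypothetical record)."""
    name: str
    generation: int  # 1 = first generation, 2 = second generation
    utr5: bool       # predicts sites in the 5' UTR
    cds: bool        # predicts sites in the CDS
    utr3: bool       # predicts sites in the 3' UTR
    clip: bool       # uses CLIP data as a feature


# Values transcribed from the table above.
PREDICTORS = [
    Predictor("Targetscan", 1, False, False, True, False),
    Predictor("PicTar", 1, False, False, True, False),
    Predictor("PITA", 1, False, False, True, False),
    Predictor("miRanda", 1, False, False, True, False),
    Predictor("RNAhybrid", 1, False, False, True, False),
    Predictor("RNA22", 1, True, True, True, False),
    Predictor("TargetSpy", 2, False, False, True, True),
    Predictor("MIRZA", 2, False, False, True, True),
    Predictor("DIANA-micro-T-CDS", 2, False, True, True, True),
    Predictor("STarMir", 2, True, True, True, True),
    Predictor("miRDB", 2, True, True, True, True),
    Predictor("miRTar2GO", 2, False, False, True, True),
    Predictor("chimiRic", 2, False, False, True, True),
]


def select(predictors, **required):
    """Return names of predictors whose fields match all required values."""
    return [
        p.name
        for p in predictors
        if all(getattr(p, field) == value for field, value in required.items())
    ]


# Tools that predict CDS sites and also use CLIP data.
cds_with_clip = select(PREDICTORS, cds=True, clip=True)
print(cds_with_clip)

# First-generation tools that predict CDS sites.
first_gen_cds = select(PREDICTORS, generation=1, cds=True)
print(first_gen_cds)
```

Keyword filters keep the query readable and make it easy to add further columns (for example conservation or accessibility) as extra fields.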