Shun Liu, Allen Zhu, Chuan He, Mengjie Chen.
Abstract
The REPIC (RNA EPItranscriptome Collection) database records about 10 million peaks called from publicly available m6A-seq and MeRIP-seq data using our unified pipeline. These data were collected from 672 samples of 49 studies, covering 61 cell lines or tissues in 11 organisms. REPIC allows users to query N6-methyladenosine (m6A) modification sites by specific cell lines or tissue types. In addition, it integrates m6A/MeRIP-seq data with 1418 histone ChIP-seq and 118 DNase-seq data tracks from the ENCODE project in a modern genome browser to present a comprehensive atlas of m6A methylation sites, histone modification sites, and chromatin accessibility regions. REPIC is accessible at https://repicmod.uchicago.edu/repic.
Keywords: Database; Genome browser; Tissue specificity; m6A modification
Year: 2020 PMID: 32345346 PMCID: PMC7187508 DOI: 10.1186/s13059-020-02012-4
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Summary of comparison between REPIC and published databases
| Item | REPIC | RMBase v2.0* | MeT-DB v2.0 | CVm6A |
|---|---|---|---|---|
| Species | 11 | 13 | 7 | 2 |
| Cell/tissue | 61 | 45 | 40 | 31 |
| Data set | 49 | 39 | 26 | 23 |
| Sample | 672 | 524 | 437 | 130 |
| Peak set | 339 | NA | 185 | 69 |
| De novo data processing | ✓ | ✓** | ✓ | ✓ |
| Pipeline supported | ✓ | NA | NA | NA |
| Peak calling tools | exomePeak, MeTPeak, MACS2 | exomePeak** | exomePeak | MeTPeak |
| Cell/tissue-based query | ✓ | NA | NA | ✓ |
| Genome browser | ✓ | ✓ | ✓ | ✓ |
| Intergenic m6A query | ✓ | NA | NA | NA |
| RNA modification type | m6A | 5+*** | m6A | m6A |
| Epigenomic data | 1536 | NA | NA | NA |
NA: not available
*Statistics from five modification types (m1A, m5C, m6A, Nm, and Ψ)
**Only m6A/MeRIP-seq and m1A-seq data were considered
***More than five RNA modification types
Fig. 1 a Overall design of the REPIC database. b Schema of the customized pipeline for m6A-seq or MeRIP-seq data processing
Fig. 2 The quality of m6A-seq or MeRIP-seq reads mapping. Boxplots depicting the distribution of reads mapped to a rRNAs and b genomes in the input and IP samples, respectively. The y-axis in a and b represents the percentage of reads mapped to rRNAs and non-rRNA reads mapped to genomes, respectively. Both left-side panels show the whole range of the ratios and the right-side panels of a and b zoom in on the ranges of 0–5% and 75–100%, respectively
Fig. 3 Evaluation of similarities of peak sets generated by three peak calling tools. Scatter plots showing the distributions of the Jaccard Index and Simpson Index from comparisons of a exomePeak versus MACS2, b MeTPeak versus MACS2, and c exomePeak versus MeTPeak across all samples. Paired-end and single-end sequencing types are represented by triangles and circles, respectively. Species are indicated by colors
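To make the two similarity measures in Fig. 3 concrete, the sketch below computes them for two toy peak sets on a single chromosome. It is an illustration under stated assumptions, not the authors' pipeline: the source does not say whether the indices were computed on base pairs or on peak counts, so base-pair coverage is assumed here, and the Simpson Index is assumed to be the overlap coefficient (shared length divided by the length of the smaller set). The function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def total_length(intervals):
    """Total number of base pairs covered by a disjoint interval list."""
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Base pairs shared by two sorted, disjoint interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def jaccard_and_simpson(peaks_a, peaks_b):
    """Return (Jaccard, Simpson) for two peak sets, measured in base pairs.

    Assumption: Simpson Index = |A ∩ B| / min(|A|, |B|) (overlap coefficient).
    """
    a, b = merge(peaks_a), merge(peaks_b)
    len_a, len_b = total_length(a), total_length(b)
    shared = intersection_length(a, b)
    union = len_a + len_b - shared
    smaller = min(len_a, len_b)
    jaccard = shared / union if union else 0.0
    simpson = shared / smaller if smaller else 0.0
    return jaccard, simpson


# Toy peak sets standing in for the output of two peak callers.
tool_a = [(100, 200), (300, 400)]
tool_b = [(150, 250), (300, 350)]
print(jaccard_and_simpson(tool_a, tool_b))
```

Under this definition the Simpson Index is never smaller than the Jaccard Index for the same pair, and it reaches 1 whenever one peak set is fully contained in the other, so reporting both separates "one tool calls a subset of the other" from "the tools largely disagree".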
Fig. 4 Cell- or tissue-specific m6A modifications. a Heatmap depicting the Pearson correlation of different human cell lines and tissues of the top 2000 genes ranked by coefficients of variation (CVs) of fold enrichment levels of m6A peaks at stop codon regions (± 200 bp around the stop codons). The dendrogram was constructed using complete linkage based on Euclidean distances. Each row label represents the sample information in the format of “input_IP”. b t-SNE plot displaying grouping patterns of different cell/tissue samples in a lower-dimensional space for the same data in a. Each dot represents a sample. Cell/tissue types are indicated by colors
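The gene selection and correlation steps behind Fig. 4a can be sketched as follows. This is a minimal illustration on invented toy values, not the authors' code: the gene names, the enrichment numbers, and the function names are hypothetical, and the population standard deviation is assumed for the CV (the source does not specify which estimator was used; with the same number of samples per gene the ranking is the same either way). Clustering and t-SNE are omitted.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD assumed)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def top_genes_by_cv(enrichment, n):
    """Return the n genes whose fold enrichment varies most across samples."""
    ranked = sorted(
        enrichment,
        key=lambda gene: coefficient_of_variation(enrichment[gene]),
        reverse=True,
    )
    return ranked[:n]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def sample_correlation(enrichment, genes, i, j):
    """Correlate samples i and j over the selected genes."""
    x = [enrichment[g][i] for g in genes]
    y = [enrichment[g][j] for g in genes]
    return pearson(x, y)


# Hypothetical fold enrichment at stop codon regions: gene -> one value per sample.
enrichment = {
    "geneA": [1.0, 2.0, 9.0],
    "geneB": [4.0, 4.0, 4.0],
    "geneC": [2.0, 4.0, 1.0],
    "geneD": [5.0, 5.5, 5.0],
}

selected = top_genes_by_cv(enrichment, 3)  # the figure uses the top 2000
print(selected)
print(sample_correlation(enrichment, selected, 0, 1))
```

Ranking by CV rather than by raw variance keeps highly enriched genes from dominating the selection simply because their values are larger.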
Fig. 5 Screenshots of the web interfaces of the REPIC database. a The Home page. b The Search page. c Taking the query region near gene NANOG as an example, we show a visualization of m6A peaks, histone modifications, and chromatin accessibility in the genome browser