Bowen Song, Yujiao Tang, Zhen Wei, Gang Liu, Jionglong Su, Jia Meng, Kunqi Chen.
Abstract
Known as the "fifth RNA nucleotide", pseudouridine (Ψ or psi) is the first-discovered and most abundant RNA modification occurring at the uridine site, and it plays a prominent role in a number of biological processes. Thousands of Ψ sites have been identified within different biological contexts thanks to advances in high-throughput sequencing technology; nevertheless, the transcriptome-wide distribution, biomolecular functions, regulatory mechanisms, and disease relevance of pseudouridylation remain largely elusive. We report here a web server, PIANO, for pseudouridine (Ψ) site identification and functional annotation. PIANO was built upon a high-accuracy predictor that takes advantage of both conventional sequence features and 42 additional genomic features. When tested on six independent datasets generated from four independent Ψ-profiling technologies (Ψ-seq, RBS-seq, Pseudo-seq, and CeU-seq) as benchmarks, PIANO achieved an average AUC of 0.955 and 0.838 under the full transcript and mature mRNA models, respectively, marking a substantial improvement in accuracy over the existing in silico Ψ-site prediction methods, i.e., PPUS (0.713 and 0.707), iRNA-PseU (0.713 and 0.712), and PseUI (0.634 and 0.652). In addition, the PIANO web server systematically annotates the predicted Ψ sites with post-transcriptional regulatory mechanisms (miRNA targets, RBP-binding regions, and splice sites) in its prediction report to help users explore the potential machinery of Ψ. Moreover, a concise query interface was built for 4,303 known Ψ sites, which is currently the largest collection of experimentally validated human Ψ sites. The PIANO website is freely accessible at http://piano.rnamd.com.
Keywords: RNA modification; Web-server; functional annotation; genome-derived feature; pseudouridine sites
Year: 2020 PMID: 32226440 PMCID: PMC7080813 DOI: 10.3389/fgene.2020.00088
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Base-resolution dataset used for Ψ-site prediction.
| Dataset | Cell line | Treatment | Technique | Site # | Source |
|---|---|---|---|---|---|
| H1 | HEK293 | | Ψ-Seq | 652 | |
| H2 | HeLa | | RBS-Seq | 322 | |
| H3 | HEK293T | | CeU-Seq | 1555 | |
| H4 | HEK293T | H2O2 | | 460 | |
| H5 | HEK293T | Heat Shock (HS) | | 421 | |
| H6 | HeLa | | Pseudo-Seq | 156 | |
The experimentally validated human Ψ sites used in this project are also available from the PIANO website (http://piano.rnamd.com), where they are annotated with various types of post-transcriptional regulation.
Figure 1. Negative and Positive Data. Negative sites were randomly selected from unmodified U sites located on the same transcripts as the positive sites.
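The negative-site selection described in the caption can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the input layout (`transcripts`, `positives`), the function name `sample_negatives`, the 1:1 negative-to-positive `ratio`, and the fixed `seed` are all assumptions introduced here.

```python
import random


def sample_negatives(transcripts, positives, ratio=1, seed=0):
    """Draw unmodified U sites from the transcripts that carry positive sites.

    transcripts: dict of transcript ID -> sequence (hypothetical input layout)
    positives:   dict of transcript ID -> set of 0-based positions of known Ψ sites
    ratio:       negatives drawn per positive (assumed; not stated in the source)
    """
    rng = random.Random(seed)
    negatives = {}
    for tx_id, pos_sites in positives.items():
        # Treat DNA-style T as U so both alphabets are accepted.
        seq = transcripts[tx_id].upper().replace("T", "U")
        # Candidates: every U on this transcript that is not a known Ψ site.
        candidates = [i for i, base in enumerate(seq)
                      if base == "U" and i not in pos_sites]
        k = min(len(candidates), ratio * len(pos_sites))
        negatives[tx_id] = sorted(rng.sample(candidates, k))
    return negatives
```

Restricting the candidates to transcripts that already carry a positive site keeps the negatives in a comparable expression and coverage context, which is the point of the design described in the caption.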
Performance evaluation of Ψ-site predictors.
| Mode | Method | Ψ-Seq (AUC) | RBS-Seq (AUC) | CeU-Seq (AUC) | Pseudo-Seq (AUC) | Average AUC |
|---|---|---|---|---|---|---|
| Full | PIANO | 0.957 | 0.978 | 0.914 | 0.972 | 0.955 |
| Full | iRNA-PseU | 0.679 | 0.727 | 0.721 | 0.708 | 0.713 |
| Full | PPUS | 0.700 | 0.721 | 0.724 | 0.705 | 0.713 |
| Full | PseUI | 0.631 | 0.710 | 0.610 | 0.585 | 0.634 |
| Mature | PIANO | 0.859 | 0.770 | 0.864 | 0.857 | 0.838 |
| Mature | iRNA-PseU | 0.753 | 0.582 | 0.760 | 0.751 | 0.712 |
| Mature | PPUS | 0.749 | 0.575 | 0.757 | 0.748 | 0.707 |
| Mature | PseUI | 0.666 | 0.651 | 0.652 | 0.639 | 0.652 |
The table presents the performance of different Ψ-site predictors on independent human datasets generated with different technologies, which serve as benchmarks, and it is summarized from and . Only Ψ sites not previously used as training data were considered during performance evaluation, so the training and testing sites did not overlap. Because existing datasets overwhelmingly relied on polyA selection in RNA library preparation, intronic Ψ sites are likely to be underrepresented in the data; the performances were therefore evaluated under two modes: full transcript and mature mRNA. In the mature mRNA mode, only positive and negative sites located on mature mRNA transcripts are considered, as previously described (Chen K, et al., 2019). PIANO achieved a higher AUC than every competing approach on each benchmark shown, under both modes.
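As a rough illustration of how an AUC can be computed under the two evaluation modes, the sketch below uses the rank-based definition of AUC (the probability that a randomly chosen positive site scores higher than a randomly chosen negative site). The record layout `(score, is_psi, on_mature_mrna)` and the function names are hypothetical and are not taken from the PIANO code.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def evaluate(sites, mode="full"):
    """Compute AUC under the 'full' or 'mature' mode.

    sites: list of (score, is_psi, on_mature_mrna) tuples (assumed layout).
    In 'mature' mode only sites located on mature mRNA are kept.
    """
    if mode == "mature":
        sites = [s for s in sites if s[2]]
    pos = [score for score, is_psi, _ in sites if is_psi]
    neg = [score for score, is_psi, _ in sites if not is_psi]
    return auc(pos, neg)


# Toy data for illustration only; these are not values from the paper.
demo_sites = [
    (0.9, True, True), (0.8, True, False), (0.4, True, True),
    (0.7, False, True), (0.3, False, False), (0.2, False, True),
]
full_auc = evaluate(demo_sites, mode="full")      # 8 of 9 pairs ranked correctly
mature_auc = evaluate(demo_sites, mode="mature")  # 3 of 4 pairs ranked correctly
```

The pairwise loop is quadratic in the number of sites, which is fine for a toy example; a sort-based implementation would be preferable for transcriptome-scale data.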
PUS-specific substrate prediction.
| Method | Full transcript: TruB2 | Full transcript: PUS7 | Full transcript: TruB1 | Mature mRNA: TruB2 | Mature mRNA: PUS7 | Mature mRNA: TruB1 |
|---|---|---|---|---|---|---|
| PIANO | 0.981 | 0.966 | 0.973 | 0.837 | 0.960 | 0.910 |
| iRNA-PseU | 0.812 | 0.829 | 0.838 | 0.719 | 0.812 | 0.731 |
| PPUS | 0.806 | 0.824 | 0.824 | 0.733 | 0.816 | 0.739 |
| PseUI | 0.853 | 0.870 | 0.840 | 0.805 | 0.861 | 0.786 |
Figure 2. Interface and output of the PIANO web server for Ψ-site prediction and functional annotation. (A) When predicting human Ψ sites, the PIANO web server supports two types of input: genomic ranges on a human genome assembly and FASTA sequences. As the prediction process may take some time, users are strongly encouraged to provide an email address, to which a notification will be sent when the job is finished. (B) The basic information of each putative Ψ site, such as gene symbol, likelihood ratio, confidence level, and the number of post-transcriptional regulations associated with the putative site. (C) The source and detailed information of each putative Ψ site. If the input file contains any experimentally validated Ψ sites collected in PIANO, those sites will be annotated with additional information. (D) The details of the site-relevant RBP information. (E) A graph visualizing the positions of predicted Ψ sites on a user-provided FASTA sequence. (F) An overview of the prediction result.
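To make the FASTA input mode in panel (A) more concrete, the sketch below parses FASTA text and lists every uridine (U, or T in DNA-style input) as a candidate site together with its flanking sequence. It is a hypothetical pre-processing step written for illustration: the function names, the `flank` width, and the 1-based position convention are assumptions and do not describe the server's actual implementation.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of record name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = (line[1:].split() or ["unnamed"])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def candidate_sites(seq, flank=10):
    """Yield (1-based position, sequence window) for each U in the sequence.

    flank is the number of bases kept on each side (an assumed value);
    windows are truncated at the ends of the sequence.
    """
    seq = seq.upper().replace("T", "U")
    for i, base in enumerate(seq):
        if base == "U":
            left = max(0, i - flank)
            yield i + 1, seq[left:i + flank + 1]


demo = read_fasta(">tx1 demo\nACGT\nTTA\n")
sites = list(candidate_sites(demo["tx1"], flank=1))
```

Each candidate window would then be scored by a predictor; the scoring step is outside the scope of this sketch.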