Kevin Regan, Abolfazl Saghafi, Zhijun Li.
Abstract
Background: Splice junctions are key to the conversion of pre-messenger RNA into mature messenger RNA in the many multi-exon genes that undergo alternative splicing. Because a very high percentage of multi-exon genes undergo alternative splicing, identifying splice junctions is an attractive research topic with important implications. Objective: The aim of this paper is to develop a deep learning model capable of identifying splice junctions in RNA sequences using 13,666 unique sequences of primate RNA.
Keywords: LSTM; RNA-seq; Splice junction; classification; deep learning; neural networks
Year: 2021 PMID: 35283668 PMCID: PMC8844938 DOI: 10.2174/1389202922666211011143008
Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.689
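The abstract describes classifying RNA sequences with a deep learning model (an LSTM, per the keywords). A minimal sketch of how such sequences are commonly prepared for a recurrent network follows. The four-letter alphabet, the all-zero handling of ambiguous symbols, and the names `ALPHABET`, `LABELS`, and `one_hot_encode` are assumptions made for illustration; this record does not state the encoding the authors used.

```python
from typing import Dict, List

# Assumed nucleotide alphabet; the record does not specify the encoding.
ALPHABET = "ACGT"

# Class names as they appear in the comparison table (EI, IE, N);
# the integer ids are an arbitrary, hypothetical choice.
LABELS: Dict[str, int] = {"EI": 0, "IE": 1, "N": 2}


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Return one row per nucleotide with a 1 in that nucleotide's column.

    Symbols outside ALPHABET (for example ambiguity codes) are encoded
    as an all-zero row, which is one possible convention among several.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows: List[List[int]] = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Each sequence becomes a (length x 4) matrix, one time step per base.
    for row in one_hot_encode("ACGT"):
        print(row)
```

Each encoded sequence is a list of time steps, which is the shape a recurrent layer typically consumes, paired with one integer label per sequence.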
Summary of recent splice junction identification models for Homo sapiens in literature.
| Article | Method | Data | Results |
|---|---|---|---|
| Mapleson | An RNA sequence mapping process called Portcullis | ~76 million simulated human training dataset [13], combined with real data from PRJEB4208 [14]. | Up to 98.17% f-score for |
| Zhang | DeepSplice using Convolutional Neural Networks | 2,880 (28,800) true (false) acceptor sites; 2,796 (27,960) true (false) donor sites for | auROC of 0.983 (0.974) on donor (acceptor) splice sites and auPRC of 0.863 (0.800) on donor (acceptor) splice sites |
| Zuallaert | SpliceRover using Convolutional Neural Networks | 1,324 (5,553) true (false) acceptor sites; 1,324 (4,922) true (false) donor sites from NN269 [16]. | 96.12% (95.35%) accuracy, 0.9899 (0.9829) auROC, 93.96% (93.31%) f-score for detecting acceptor (donor) sites |
| Van Moerbeke | A linear mixed model, Random Effects for the Identification of Differential Splicing (REIDS) | In total, 33,516 genes were measured using 298,281 exons and 249,475 junctions, each represented by eight probes on average from HJAY [17]. | REIDS analytical framework detected between 65–77% of the validated exon probe sets |
| Zhao | Assembling Splice Junctions Analysis (ASJA) | RNA-seq datasets from twelve normal tissues, seven cancerous tissues and seven matched adjacent tissues from GEO were used [18], making up a total of 322,675 linear junctions and 81,484 back-splice junctions. | The sensitivity of ASJA for known linear junctions is 97.3%. For novel linear junctions, the sensitivity is 89.8%, comparing the known splice of 2-pass without annotation with the gold standard |
| Wang | SpliceFinder using Convolutional Neural Networks | 10,000 donor sites, 10,000 acceptor sites, and 10,000 non-splice-sites which were randomly selected from Ensembl [19]. | Up to 90.25% accuracy |
| Lee | LSTM and Gated Recurrent Unit (GRU) | Dataset contains consolidated epigenomes from the Roadmap Epigenomics Consortium and the ENCODE Consortium [20]. | Up to 86% f-score; above 80% precision-recall curve metric |
| Amilpur | EDeepSSP using Convolutional Neural Networks | 2,880 (238,431) true (false) acceptor sites; 2,796 (180,975) true (false) donor sites from HS3D dataset [15]. | 0.9870 (0.9887) on auPRC and 0.9873 (0.9891) on auROC for acceptor (donor) site detection |
| Albaradei | Splice2Deep using Ensemble of Convolutional Neural Networks | A total of 250,400 (250,400) true (false) acceptor sites; 248,150 (248,150) true (false) donor sites for | Accuracy (f-score) of 96.91% (96.91%) for acceptor site detections, 97.38% (96.38%) for donor site detection |
| Dasari | InterSSPP using Convolutional Neural Networks | 2,880 (238,431) true (false) acceptor sites; 2,796 (180,975) true (false) donor sites from HS3D [15]. 1,324 (5,553) true (false) acceptor sites; 1,324 (4,922) true (false) donor sites from NN269 [16]. | HS3D: 0.9946 (0.9945) on auPRC and 0.9947 (0.9891) on auROC for acceptor (donor) site detection. NN269: 0.9922 (0.9891) on auPRC and 0.9923 (0.9894) on auROC for acceptor (donor) site detection |
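The table above compares models by accuracy, f-score, auROC, and auPRC, and several of the datasets are heavily imbalanced (for example 2,880 true versus 238,431 false acceptor sites in HS3D). The sketch below shows how accuracy and f-score are computed from confusion counts and why they can diverge on imbalanced data. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from any of the cited studies.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, f-score (F1), and accuracy from counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for a single "site vs. non-site" decision.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 true sites among 1,000 candidates.
    metrics = binary_metrics(tp=50, fp=10, fn=50, tn=890)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With these made-up counts the accuracy is 0.94 while recall is only 0.50, which illustrates why results on imbalanced splice-site data are often reported with f-score or auPRC alongside accuracy.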
Performance comparison with recent LSTM models.
| Article | Data | Results |
|---|---|---|
| Developed model | 13,666 cases with 3,470 instances (25.39%) of class EI, 3,550 instances (25.98%) of class IE, and 6,646 instances (48.63%) of class N from HS3D [ | With 10-fold-CV, average accuracy (f-score) of 91.31% (91.27%), average auROC (auPRC) for one-on-one comparisons was 0.9820 (0.9649). |
| Wang | 10,000 donor sites, 10,000 acceptor sites, and 10,000 non-splice-sites randomly selected from Ensembl [ | auROC score of 0.960 (0.942) on donor (acceptor) splice site classification and an auPRC score of 0.803 (0.721) on donor (acceptor) splice site classification. |
| Lee | Dataset contains consolidated epigenomes from the Roadmap Epigenomics Consortium and the ENCODE Consortium [ | Up to 86% f-score; above 80% precision-recall curve metric. |
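The developed model's row reports class counts and a 10-fold cross-validation. As a rough sketch, the snippet below recomputes the class shares from the reported counts and deals sample indices into ten folds. Stratifying the folds by class is an assumption, since this record does not say how the folds were built, and the names `class_shares` and `stratified_folds` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Class counts as reported in the table (EI, IE, N).
CLASS_COUNTS: Dict[str, int] = {"EI": 3470, "IE": 3550, "N": 6646}


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


def stratified_folds(labels: Sequence[str], k: int = 10) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class round-robin.

    This keeps the class proportions nearly equal across folds. It is an
    assumed scheme, not necessarily the one used in the study.
    """
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    return folds


if __name__ == "__main__":
    print(class_shares(CLASS_COUNTS))
    labels = [lab for lab, n in CLASS_COUNTS.items() for _ in range(n)]
    folds = stratified_folds(labels, k=10)
    print([len(fold) for fold in folds])
```

Each fold would serve once as the held-out test set, and the per-fold accuracy and f-score would then be averaged, matching the "average" figures the table reports.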