Sambit K Mishra, Viraj Muthye, Gaurav Kandoi.
Abstract
Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that regulates protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may sometimes have very similar sequences, but they can have markedly different effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, such as in the response to environmental stress, the regulation of gene expression, and human heritable and plant diseases. The mRNA isoforms of the same gene can have dramatically different functions. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. Recent years have, however, seen the development of several computational methods aimed at predicting biological functions at the mRNA isoform level. These methods use a wide array of proteo-genomic data to develop machine learning-based mRNA isoform function prediction tools. In this review, we discuss the computational methods developed for predicting biological function at the level of the individual mRNA isoform.
Keywords: RNA-seq; alternative splicing; deep learning; gene ontology; mRNA isoforms; machine learning; multiple instance learning; recommender systems
Year: 2020 PMID: 32784445 PMCID: PMC7460821 DOI: 10.3390/ijms21165686
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Common alternative splicing events. The mechanisms of the most common alternative splicing events are presented: exon skipping, intron retention, alternative 5′ splice site selection, and alternative 3′ splice site selection. Several other alternative splicing events are not shown here. The colored bars (blue, orange, and green) represent exons, while the black lines connecting them represent intronic segments. Exon skipping: the most prevalent mechanism in vertebrates and invertebrates, in which specific exons in the pre-mRNA are skipped in the mature mRNA transcript. Intron retention: common in lower metazoans and plants, a process in which an intron is retained in the mature mRNA. Alternative 3′/5′ acceptor/donor sites: a process involving exons that are flanked by competing splice sites on one end (3′/5′) and a fixed splice site on the opposite end, resulting in an alternative region that is either included or excluded in the mature mRNA.
Figure 2. Overview of the multiple instance learning (MIL) framework. Each gene (black ellipse) is considered a bag, and each mRNA isoform (circle or square within the ellipse) is considered an instance of that bag. A gene associated with a function is a positive bag, and all of its instances (mRNA isoforms) associated with that function are called “witnesses”.
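The bag-and-instance relationship described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any method reviewed here: the per-isoform scores, the gene and isoform names, and the 0.5 threshold are all hypothetical, and taking the maximum over instances is the standard MIL assumption (a bag is positive if at least one instance is positive).

```python
from typing import Dict, List

# Hypothetical per-isoform scores for a single function (illustrative values only).
# Each gene is a bag; each mRNA isoform is an instance in that bag.
isoform_scores: Dict[str, Dict[str, float]] = {
    "geneA": {"geneA.1": 0.91, "geneA.2": 0.12},
    "geneB": {"geneB.1": 0.08, "geneB.2": 0.20, "geneB.3": 0.15},
}


def bag_score(instances: Dict[str, float]) -> float:
    """Standard MIL assumption: the bag score is the score of its best instance."""
    return max(instances.values())


def witnesses(instances: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Instances whose score reaches the (assumed) threshold are the candidate witnesses."""
    return sorted(name for name, score in instances.items() if score >= threshold)


gene_scores = {gene: bag_score(inst) for gene, inst in isoform_scores.items()}
gene_witnesses = {gene: witnesses(inst) for gene, inst in isoform_scores.items()}

for gene in isoform_scores:
    print(gene, gene_scores[gene], gene_witnesses[gene])
```

In this toy input, geneA behaves as a positive bag with a single witness, while geneB has no isoform above the threshold and would be treated as a negative bag.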
Frequently used terminologies and their contextual definitions in the field of mRNA isoform function prediction.
| Terminology | Definitions |
|---|---|
| Alternative splicing | A transcriptional regulatory mechanism that leads to the production of multiple mature mRNA isoforms from a single gene. |
| mRNA isoforms | Mature mRNA products of the same gene which usually differ in their sequences and may perform different functions. |
| Multiple Instance Learning (MIL) | A weakly supervised learning framework where labels are available at the level of gene instead of the individual mRNA isoforms and the goal is to find the specific mRNA isoforms responsible for a gene’s function. |
| RNA-seq | A high-throughput way of measuring the expression of genes and mRNA isoforms. |
| Gene Ontology (GO) term | A controlled vocabulary term that refers to a specific function performed by genes and gene products. |
| mRNA isoform–isoform interaction (III) functional network | mRNA isoform level functional networks where an edge between two mRNA isoforms suggests the involvement of both mRNA isoforms in the same function. |
Figure 3. Deep learning approach for mRNA isoform function prediction. In these methods, gene and mRNA isoform level features are used as input to a deep neural network consisting of multiple hidden layers. The outputs of the network are the predicted functions at the gene and mRNA isoform levels.
Figure 4. An overview of recommendation system approaches for mRNA isoform function prediction. In recommendation system-based approaches, the mRNA isoform level features and the features of biological function terms, such as GO terms, are projected into a latent space; a decomposition unit then associates the two to produce the final mRNA isoform level function recommendations.
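The latent-space idea in this caption can be illustrated with a minimal sketch. It assumes that mRNA isoforms and function terms have already been embedded as vectors of the same length, and it scores each isoform-term pair by a dot product, as in a generic latent-factor recommender. The vectors, the term names, and the `recommend` helper are hypothetical; the reviewed methods learn their factors from data and may combine them differently.

```python
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two latent vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def recommend(
    isoform_vec: Sequence[float],
    term_vecs: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank function terms for one isoform by latent-space affinity."""
    scored = [(term, dot(isoform_vec, vec)) for term, vec in term_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical 2-dimensional latent factors (illustrative values only).
isoform_latent = [1.0, 0.0]
term_latent = {
    "term_1": [0.5, 0.2],
    "term_2": [0.1, 0.9],
    "term_3": [0.8, 0.0],
}

recommendations = recommend(isoform_latent, term_latent, top_k=2)
print(recommendations)
```

The ranked list plays the role of the final recommendations: terms whose latent vectors align most closely with the isoform's vector come first.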
Summary of methods reviewed based on their input data type and approach.
| Method | Approach | Input Data Type | Input Data Description | Performance (GO Biological Process Terms) | Limitations |
|---|---|---|---|---|---|
| isoPred | MIL with support vector machine (SVM) as a base learner | RNA-seq; GO | 19,209 genes and 24,274 mRNA isoforms from mouse | Area Under the Receiver Operating Characteristic curve (AUROC): 0.68–0.76 (multiple mRNA isoform genes); AUROC: 0.62–0.68 (single mRNA isoform genes) | Only RNA-Seq input; Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
| iMILP | MIL with label propagation | RNA-seq; GO | 31,454 human mRNAs | AUROC: 0.67 | Only RNA-Seq input; Genes annotated to sibling GO terms used as negative set; no tissue, cell, sex, or age specificity |
| IsoFunc | MIL with SVM as base learner | RNA-seq; GO | 11,946 genes and 59,297 mRNA isoforms from human | AUROC: 0.64 | Only RNA-Seq input; Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
| WLRM | MIL with weighted logistic regression | RNA-seq; GO | 11,946 genes and 59,297 mRNA isoforms from human | AUROC: 0.6–0.85 | Only RNA-Seq input; Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
| IIIDB | Network-based | RNA-seq; domain–domain interactions; GO; protein–protein interaction (PPI) | 31,454 mRNA isoforms from human | Data not available | Only RNA-Seq input; Subcellular localization as negative set; no tissue, cell, sex, or age specificity; limited to existing PPIs |
| Mouse Splice Isoform Network | Network-based; MIL with Bayesian network | RNA-Seq; Exon array; Protein docking; pseudo-amino acid composition; GO; Pathways | Data not available | AUROC: 0.62 | Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
| TENSION | Network-based; Random Forest | RNA-Seq; mRNA Sequence; Protein Sequence; PPI; GO; Pathways | 21,813 genes and 75,826 mRNA isoforms from mouse | AUROC: 0.94 | No cell, sex, or age specificity |
| DeepIsoFun | Deep learning | RNA-Seq; GO | 19,532 genes and 47,393 mRNA isoforms from human | AUROC: 0.74 | Only RNA-Seq input; Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
| DIFFUSE | Deep learning | RNA-Seq; mRNA sequence; Protein sequence; GO | 19,303 genes and 39,375 mRNA isoforms from human | AUROC: 0.84 | Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
| mFRecSys | Recommendation system | RNA-Seq; mRNA sequence; Protein sequence; PPI; GO; Pathways | 21,813 genes and 75,826 mRNA isoforms from mouse | AUROC: 0.99 | Limited tissue-specificity; No cell, sex, or age specificity |
| DisoFun | Recommendation system | RNA-Seq; PPI; GO | 11,868 genes and 25,939 mRNA isoforms from human | AUROC: 0.71 | Only RNA-Seq input; Random unannotated genes as negative set; no tissue, cell, sex, or age specificity |
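The AUROC values in the performance column can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it that way in plain Python, counting ties as half; the function name and the toy labels and scores are illustrative assumptions, and the reviewed methods differ in how they define their positive and negative sets.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = annotated to the function, 0 = in the negative set.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(auroc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the range of values reported in the table.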