Jin Li, Ching-San Tseng, Antonio Federico, Franjo Ivankovic, Yi-Shuian Huang, Alfredo Ciccodicola, Maurice S Swanson, Peng Yu.
Abstract
Although the number of publicly deposited RNA-Seq datasets has increased over the past few years, incomplete annotation of the associated metadata limits their potential use. Because of the importance of RNA splicing in diseases and biological processes, we constructed a database called SFMetaDB by curating datasets related to RNA splicing factors. Our effort focused on RNA-Seq datasets in which splicing factors were knocked down, knocked out or overexpressed, yielding 75 datasets corresponding to 56 splicing factors. These datasets can be used in differential alternative splicing analysis to identify potential targets of these splicing factors, and in other functional studies. Surprisingly, only ∼15% of all splicing factors have been studied by loss- or gain-of-function experiments using RNA-Seq. In particular, splicing factors with domains from a few dominant Pfam domain families have not been studied. This suggests a significant gap that needs to be addressed to fully elucidate the splicing regulatory landscape. Indeed, mouse models are already available for ∼20 of the unstudied splicing factors, and studying these splicing factors in vitro and in vivo using RNA-Seq could be a fruitful research direction. Database URL: http://sfmetadb.ece.tamu.edu/
Year: 2017 PMID: 29220461 PMCID: PMC5737203 DOI: 10.1093/database/bax071
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. A use case of SFMetaDB for the splicing factor Mbnl1. This use case demonstrates the advantage of SFMetaDB over ArrayExpress. For the same keyword, Mbnl1, SFMetaDB returned five relevant datasets that can be used for downstream alternative splicing analyses. In contrast, ArrayExpress returned 13 datasets, 8 of which could not be used for downstream alternative splicing analyses of Mbnl1. (a) The result page in SFMetaDB for the query Mbnl1. (b) The description page of the dataset GSE39911 in GEO. (c) The result page in ArrayExpress for the query Mbnl1. (d) The description page of the dataset E-GEOD-76222 in ArrayExpress.
Figure 2. The occurrence of Pfam domain families in splicing factors. The known RNA splicing factors are annotated in UniProt according to the Pfam domain families of the protein domains found in these factors. A splicing factor may have multiple domains that belong to multiple Pfam families, and a Pfam domain family may contain domains from multiple splicing factors. The Pfam annotations were retrieved for each of 353 splicing factors, and the number of splicing factors was calculated for each Pfam family. For the 56 splicing factors that have curated datasets in SFMetaDB, the number of splicing factors was also calculated for the associated Pfam families. In the dodged bar plots, the Pfam domain families are ranked by the number of splicing factors that contain domains in the given family. Of the 217 Pfam domain families annotated in UniProt, 26 have ≥3 splicing factors annotated. The Pfam domain family with the largest number of splicing factors is RRM_1 (PF00076). It contains 87 splicing factors, 25 of which have been studied according to our curation results. However, the splicing factors in the remaining Pfam domain families have received relatively little attention in RNA-Seq analyses, and they may be promising candidates for future studies.
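The per-family counts described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `count_factors_per_family`, the input layout (a mapping from splicing factor to its Pfam families) and the toy names `SF_A`, `FAM_1`, etc. are all hypothetical placeholders, and no real UniProt or SFMetaDB data are used.

```python
from collections import Counter

def count_factors_per_family(factor_to_pfam, studied):
    """Count splicing factors per Pfam family, overall and among studied factors.

    factor_to_pfam: dict mapping a splicing factor name to a list of Pfam families
    studied: set of splicing factors that have curated datasets
    Returns a list of (family, n_all, n_studied), ranked by n_all (descending).
    """
    n_all = Counter()
    n_studied = Counter()
    for factor, families in factor_to_pfam.items():
        # A factor with several domains of one family is counted once for it.
        for family in set(families):
            n_all[family] += 1
            if factor in studied:
                n_studied[family] += 1
    # Rank families by total factor count; break ties by family name.
    ranked = sorted(n_all, key=lambda fam: (-n_all[fam], fam))
    return [(fam, n_all[fam], n_studied[fam]) for fam in ranked]

# Hypothetical toy annotations (placeholders, not real data).
toy_annotations = {
    "SF_A": ["FAM_1", "FAM_2"],
    "SF_B": ["FAM_1"],
    "SF_C": ["FAM_1", "FAM_3"],
    "SF_D": ["FAM_2"],
}
toy_studied = {"SF_A", "SF_D"}

table = count_factors_per_family(toy_annotations, toy_studied)
for family, total, curated in table:
    print(f"{family}\tall={total}\tstudied={curated}")
```

Each row of `table` would correspond to one pair of dodged bars in a plot of this kind: one bar for all annotated splicing factors in the family and one for those with curated datasets.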