Aarthi Ramakrishnan1, Sarath Chandra Janga2,3,4.
Abstract
BACKGROUND: RNA-binding proteins (RBPs) are crucial in modulating RNA metabolism in eukaryotes, thereby controlling an extensive network of RBP-RNA interactions. Although previous studies on the conservation of RBP targets have been carried out in lower eukaryotes such as yeast, relatively little is known about the extent of conservation of the binding sites of RBPs across mammalian species.
Keywords: CLIP-seq; Evolution of binding sites; Gene expression dynamics; Gene regulatory network; Genotype-phenotype; Network evolution; Post-transcriptional control; Protein-RNA interactions; RNA binding proteins
Year: 2019 PMID: 31888461 PMCID: PMC6936122 DOI: 10.1186/s12864-019-6330-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Flowchart showing the steps employed to study differences in the extent of conservation of binding sites of RBPs across species. BED files containing binding site coordinates of human RBPs (60 files; one for each RBP) were downloaded from CLIPdb [28]. Multiple Alignment Format (MAF) files (22 files; one for each human chromosome) were downloaded from the UCSC genome browser [29]; these contain multiple alignments of the whole genomes of 46 vertebrate species arranged in a series of blocks. If the start and end coordinates of a binding site of an RBP from the BED file fell within the human genome coordinates of a block in a MAF file, the block was extracted. Otherwise, the binding site was ignored. The percentage of species in which each binding site of an RBP was conserved was computed from its corresponding MAF block. Repeating this procedure for all RBPs revealed the extent of conservation of binding sites for each RBP. To study the factors contributing to the differences in the extent of conservation of binding sites of RBPs, various RBP-centric and RBP-target level features were examined: (A) Phylogenetic relationship of RBPs belonging to the same family, and their extent of conservation of binding sites. (B) A multivariate analysis to uncover the RBP-centric features that could influence the extent of conservation of binding sites. (C) The extent of conservation of binding sites depending on their location along the length of a gene. (D) Gene set enrichment to identify the phenotypes associated with an RBP’s post-transcriptional network, with target genes ranked by the percentage of species in which their binding sites for the RBP were conserved.
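The containment test and the per-site percentage described in this caption can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline. It assumes that a species counts as conserving a site whenever it has a sequence row in the enclosing MAF block, that the human row is included in the count, that the reference rows lie on the + strand, and that the reference assembly prefix is `hg19`. The names `MafBlock`, `parse_maf` and `percent_conserved` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MafBlock:
    chrom: str     # reference chromosome, e.g. "chr1"
    start: int     # zero-based start on the reference genome
    end: int       # exclusive end on the reference genome
    species: set   # assembly names that have a sequence row in the block


def parse_maf(text, ref_prefix="hg19"):
    """Parse MAF text into blocks, keeping only reference coordinates and species.

    Assumption: reference rows are on the + strand, so the MAF start field
    is a plain zero-based genomic coordinate.
    """
    blocks, rows = [], []

    def flush():
        # Close the current block, if it contains a reference row.
        ref = next((r for r in rows if r[0].startswith(ref_prefix + ".")), None)
        if ref is not None:
            src, start, size = ref
            chrom = src.split(".", 1)[1]
            species = {r[0].split(".")[0] for r in rows}
            blocks.append(MafBlock(chrom, start, start + size, species))
        rows.clear()

    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["a"]:        # an "a" line opens a new alignment block
            flush()
        elif fields[:1] == ["s"]:      # an "s" line: src, start, size, strand, ...
            rows.append((fields[1], int(fields[2]), int(fields[3])))
    flush()
    return blocks


def percent_conserved(chrom, site_start, site_end, blocks, n_species=46):
    """Return the percentage of species present in the block that fully
    contains the BED interval, or None when no single block contains it."""
    for block in blocks:
        if (block.chrom == chrom
                and block.start <= site_start
                and site_end <= block.end):
            return 100.0 * len(block.species) / n_species
    return None
```

A site that straddles two blocks returns `None`, which mirrors the caption's rule that such binding sites are ignored. A stricter definition of conservation, for example one based on sequence identity within the block, would replace the `len(block.species)` count.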
Fig. 2 Boxplots showing the extent of conservation of binding sites for each of the 60 human RBPs. Each box plot corresponds to the distribution of the extent of conservation of experimentally identified binding sites of an RBP across 46 species. Box plots are arranged in increasing order of the median extent of conservation of binding sites. Circles in the boxplots correspond to outliers.
Fig. 3 Heatmap showing the conservation of binding sites of RBPs across species. The columns in the heatmap represent species, arranged by their evolutionary distances using the R package ape [58], whereas the rows represent RBPs, clustered by Euclidean distance and Ward’s method using the function hclust in R. Each cell in the heatmap corresponds to the average extent of conservation of all the binding sites of an RBP in a specific species. While humans and chimps exhibit more than 80% conservation of binding sites of RBPs, lamprey exhibits the least, with less than 40% conservation. The RBPs NUDT21, EIF4A3 and LIN28A showed a high extent of conservation of binding sites across species, whereas the RBPs HNRNPC, HNRNPM and CPSF2 exhibited the least.
Fig. 4 Phylogenetic relationship of RBPs belonging to the same family and their extent of conservation of binding sites across species. RBPs belonging to the same family exhibit varying extents of conservation of binding sites. Box plots represent the extent of conservation of binding sites for the corresponding RBPs in the phylogenetic tree. Comparisons between the phylogenetic trees and the extent of conservation of their binding site profiles are shown for members of four RBP families: (a) HNRNP family, (b) CPSF family, (c) IGF2BP family, (d) AGO family.
Table 1 RBP-centric features employed to uncover the predictor variables likely to explain the variations in the extent of conservation of binding sites for RBPs
| Variable | Name of Feature | Description |
|---|---|---|
| Response | Median extent of conservation of binding sites of RBPs across species | For each RBP, the median extent of conservation of binding sites was calculated by computing the median of the percentage of species each binding site was conserved in. |
| Predictors | Tissue Specificity Index (TSI) | TSI for each RBP was found using the TSI formula as described in a previous study [ |
| | Number of binding sites | For each RBP, the total number of binding sites from the BED file that mapped to a block in the MAF file was considered. |
| | Length of transcript | The length of transcript for each RBP was obtained from Ensembl Biomart [ |
| | Number of protein-protein interactions | For each RBP, the number of interacting partners was calculated with data obtained from BioGRID [ |
| | Median protein level expression of RBPs across tissues | Protein level expression across 17 adult tissues was calculated for each RBP from the protein level expression matrix available on Human Proteome Map [ |
| | Median transcript level expression of RBPs across tissues | Transcript level expression of RBPs across 16 tissues was calculated using Human BodyMap 2.0 data from Ensembl [ |
| | Number of RNA-binding domains | Number of RNA binding domains for each RBP was obtained from a previous study on human RBPs [ |
| | Number of Paralogs | The number of paralogs for each RBP was obtained from Ensembl [ |
| | Number of sub-cellular compartments | For each RBP, the number of sub-cellular compartments that it is present in was found from UniProt [ |
| | Conservation of RBPs | The number of species that each RBP was conserved in was obtained from a previous study [ |
| | Number of RBP-RBP interactions | For each RBP, the number of interacting RBPs was computed using data from BioGRID [ |
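The response variable in the table above, the median across binding sites of the percentage of species each site is conserved in, can be sketched as below. The function name `median_conservation` and the input layout (a mapping from RBP name to a list of per-site percentages) are assumptions made for illustration only.

```python
from statistics import median


def median_conservation(site_percentages):
    """Map each RBP to the median of its per-binding-site conservation percentages.

    site_percentages: dict of RBP name -> list of percentages, one per binding
    site that mapped to a MAF block. RBPs with no mapped sites are skipped.
    """
    return {
        rbp: median(values)
        for rbp, values in site_percentages.items()
        if values
    }
```

Calling it on a dictionary with one list of percentages per RBP yields one value per RBP, which would then serve as the response in the multivariate analysis.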
Attribute importance from RReliefF feature selection analysis for RBP-centric features described in Table 1
| Features | Attribute Importance |
|---|---|
| Number of binding sites | 0.0114 |
| Median protein level expression of RBPs across tissues | 0.0048 |
| Number of protein-protein interactions | 0.0043 |
| Number of RBP-RBP interactions | 0.0026 |
| Median transcript level expression of RBPs across tissues | 0.0009 |
| Tissue Specificity Index (TSI) | 0.0005 |
| Number of RNA-binding domains | −0.0025 |
| Length of transcript | −0.0033 |
| Number of sub-cellular compartments | −0.0139 |
| Conservation of RBPs | −0.0145 |
| Number of Paralogs | −0.0195 |
Fig. 5 Heatmap showing the extent of conservation of binding sites classified by their occurrence in the 5′, 3′ or middle region of a gene. After each gene in the human genome was divided into 3 equal segments, namely the 5′, middle and 3′ regions, binding sites of RBPs were mapped onto these genic classes. The heatmap shows the median extent of conservation of binding sites of RBPs occurring in the genic classes indicated on the X-axis. Darker colors in the heatmap, as indicated by the scale bar, correspond to a higher median extent of conservation of the binding sites.
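The assignment of a binding site to a genic class can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: the site is assigned by its midpoint, the gene is split into thirds of its genomic span, and the 5′ and 3′ labels are swapped for genes on the minus strand. The function name `genic_class` is hypothetical.

```python
def genic_class(gene_start, gene_end, strand, site_start, site_end):
    """Classify a binding site as "5'", "middle" or "3'" of a gene.

    Coordinates are zero-based and half-open, as in BED files.
    Assumption: the site is assigned by its midpoint, and thirds are
    measured along the gene's genomic span in transcription direction.
    Returns None when the midpoint lies outside the gene.
    """
    length = gene_end - gene_start
    if length <= 0:
        raise ValueError("gene_end must be greater than gene_start")

    midpoint = (site_start + site_end) / 2
    if not gene_start <= midpoint < gene_end:
        return None

    # Fractional position from the genomic start of the gene.
    fraction = (midpoint - gene_start) / length
    if strand == "-":
        # On the minus strand, transcription runs from gene_end to gene_start.
        fraction = 1 - fraction

    if fraction < 1 / 3:
        return "5'"
    if fraction < 2 / 3:
        return "middle"
    return "3'"
```

Grouping each RBP's per-site conservation percentages by the returned class and taking the median of each group would give the three values per RBP that a heatmap of this kind displays.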
Fig. 6 Heatmap showing the Human Phenotype Ontology (HPO) gene sets associated with the binding site conservation profiles of RBPs. The heatmap shows the most significant (corrected p-value < 0.05) HPO gene sets that were enriched for genes with highly conserved binding sites. Enriched HPO gene sets were identified by performing a modified gene set enrichment analysis, which uses the extent of conservation of RBP binding sites, as described in the Materials and Methods. Binding sites of NUDT21 yielded the highest number of significant HPO associations.