Umberto Perron1, Paolo Provero1,2, Ivan Molineris3.
Abstract
BACKGROUND: In recent years long non-coding RNAs (lncRNAs) have been the subject of increasing interest. Thanks to many recent functional studies, the existence of a large class of lncRNAs with potential regulatory functions is now widely accepted. Although an increasing number of lncRNAs are being characterized and shown to be involved in many biological processes, the functions of the vast majority of lncRNA genes are still unknown. Therefore, computational methods that can exploit the increasing amount of publicly available data to predict lncRNA functions could be very useful.
Keywords: Coexpression; Disease gene prediction; Functional prediction; lncRNA
Year: 2017 PMID: 28361701 PMCID: PMC5374551 DOI: 10.1186/s12859-017-1535-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Summarized representation of predicted GO terms that are more typical of lincRNAs than PCGs and vice versa. (a) GO biological process terms typical of lncRNAs; (b) GO cellular component terms typical of lncRNAs; (c) GO biological process terms typical of PCGs; (d) GO cellular component terms typical of PCGs. Bubble color represents how specific the term is for PCGs or lincRNAs (brighter is more specific); bubble size indicates the frequency of the GO term in the whole GO database. Highly similar GO terms are linked by edges in the graph, where the line width indicates the degree of similarity.
The table displays a set of 37 genes that were manually annotated with one term from the generic GO slim starting from the functional description reported in lncRNAdb
| ENSG | GOslim | ENSG | GOslim |
|---|---|---|---|
| ENSG00000130600 | GO:0009790 | ENSG00000230590 | GO:0000228 |
| ENSG00000130600 | GO:0040007 | ENSG00000230590 | GO:0005694 |
| ENSG00000130600 | GO:0000988 | ENSG00000231265 | GO:0048870 |
| ENSG00000130600 | GO:0006412 | ENSG00000231265 | GO:0016301 |
| ENSG00000130600 | GO:0008283 | ENSG00000236790 | GO:0048856 |
| ENSG00000153363 | GO:0008219 | ENSG00000241684 | GO:0048870 |
| ENSG00000153363 | GO:0040007 | ENSG00000241684 | GO:0000988 |
| ENSG00000176840 | GO:0008219 | ENSG00000241743 | GO:0003677 |
| ENSG00000177410 | GO:0040007 | ENSG00000241743 | GO:0000228 |
| ENSG00000177410 | GO:0008283 | ENSG00000241743 | GO:0005694 |
| ENSG00000177410 | GO:0030154 | ENSG00000244306 | GO:0030154 |
| ENSG00000204092 | GO:0008283 | ENSG00000244306 | GO:0048870 |
| ENSG00000204092 | GO:0007049 | ENSG00000244306 | GO:0006397 |
| ENSG00000204092 | GO:0003723 | ENSG00000244306 | GO:0007165 |
| ENSG00000214548 | GO:0040007 | ENSG00000245532 | GO:0030674 |
| ENSG00000214548 | GO:0021700 | ENSG00000245532 | GO:0005634 |
| ENSG00000214548 | GO:0030154 | ENSG00000245532 | GO:0043234 |
| ENSG00000214548 | GO:0000988 | ENSG00000245532 | GO:0005198 |
| ENSG00000214548 | GO:0006259 | ENSG00000245532 | GO:0065003 |
| ENSG00000223403 | GO:0000988 | ENSG00000245910 | GO:0006412 |
| ENSG00000223403 | GO:0009790 | ENSG00000245910 | GO:0005840 |
| ENSG00000223403 | GO:0040007 | ENSG00000247556 | GO:0009790 |
| ENSG00000223403 | GO:0048856 | ENSG00000247556 | GO:0048646 |
| ENSG00000223573 | GO:0030154 | ENSG00000247844 | GO:0008283 |
| ENSG00000223573 | GO:0006397 | ENSG00000247844 | GO:0048870 |
| ENSG00000223573 | GO:0003723 | ENSG00000248323 | GO:0008283 |
| ENSG00000223573 | GO:0003729 | ENSG00000249669 | GO:0030154 |
| ENSG00000223850 | GO:0008283 | ENSG00000249669 | GO:0000988 |
| ENSG00000223850 | GO:0006397 | ENSG00000249859 | GO:0005578 |
| ENSG00000224177 | GO:0005856 | ENSG00000249859 | GO:0008283 |
| ENSG00000224177 | GO:0005198 | ENSG00000249859 | GO:0008219 |
| ENSG00000225127 | GO:0040007 | ENSG00000249859 | GO:0030154 |
| ENSG00000225127 | GO:0021700 | ENSG00000250366 | GO:0009790 |
| ENSG00000225407 | GO:0009790 | ENSG00000250366 | GO:0048856 |
| ENSG00000225407 | GO:0000988 | ENSG00000251002 | GO:0006259 |
| ENSG00000225407 | GO:0005634 | ENSG00000251002 | GO:0005634 |
| ENSG00000225407 | GO:0000228 | ENSG00000251002 | GO:0002376 |
| ENSG00000225407 | GO:0051276 | ENSG00000251164 | GO:0008283 |
| ENSG00000225506 | GO:0030154 | ENSG00000253352 | GO:0008219 |
| ENSG00000225783 | GO:0006397 | ENSG00000253352 | GO:0000988 |
| ENSG00000225783 | GO:0030154 | ENSG00000253352 | GO:0000228 |
| ENSG00000225783 | GO:0048856 | ENSG00000253438 | GO:0006950 |
| ENSG00000225783 | GO:0003723 | ENSG00000253438 | GO:0006412 |
| ENSG00000225783 | GO:0030154 | ENSG00000253438 | GO:0006259 |
| ENSG00000229140 | GO:0008283 | ENSG00000255733 | GO:0002376 |
| ENSG00000229140 | GO:0030154 | ENSG00000258399 | GO:0009790 |
| ENSG00000229140 | GO:0008219 | ENSG00000258399 | GO:0000988 |
| ENSG00000229807 | GO:0030234 | ENSG00000258399 | GO:0000228 |
| ENSG00000229807 | GO:0003677 | ENSG00000258609 | GO:0030154 |
| ENSG00000229807 | GO:0000228 | ENSG00000258609 | GO:0008219 |
| ENSG00000229807 | GO:0005694 | | |
Fig. 2 ROC curves comparing the performance of our approach (rank prod) and the standard enrichment method (Fisher). Different cutoffs are applied to the Pearson correlation that measures gene coexpression. The small inset box is a zoom of the region of small FPS. Blurred thick curves result from the overlap of semi-transparent curves derived from 100 random samplings for each cutoff and for our method. Dark thin lines result from averaging these data.
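The "standard enrichment method (Fisher)" named in this caption can be illustrated with a minimal sketch. It assumes that a gene's coexpressed partners are those whose Pearson correlation is at or above a cutoff, and that enrichment for an annotation term is scored with a one-sided Fisher exact test (the hypergeometric upper tail). The gene names, expression profiles, and the function names `pearson`, `fisher_enrichment`, and `enrichment_for_gene` are hypothetical and are not taken from the paper.

```python
from math import comb, sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fisher_enrichment(n_total, n_annotated, n_neighbours, n_overlap):
    """One-sided Fisher exact P-value: probability of drawing at least
    n_overlap annotated genes when n_neighbours genes are taken from
    n_total genes, n_annotated of which carry the term."""
    upper = min(n_annotated, n_neighbours)
    tail = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_neighbours - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_neighbours)

def enrichment_for_gene(target, profiles, annotated, cutoff):
    """Test whether genes coexpressed with `target` (Pearson r >= cutoff)
    are enriched for the gene set `annotated`."""
    others = [g for g in profiles if g != target]
    neighbours = [g for g in others
                  if pearson(profiles[target], profiles[g]) >= cutoff]
    overlap = sum(1 for g in neighbours if g in annotated)
    n_annot = sum(1 for g in others if g in annotated)
    return fisher_enrichment(len(others), n_annot, len(neighbours), overlap)

# Toy expression profiles (made-up values, five samples per gene).
profiles = {
    "lnc1": [1, 2, 3, 4, 5],
    "g1": [2, 4, 6, 8, 10],
    "g2": [1, 2, 3, 5, 4],
    "g3": [5, 4, 3, 2, 1],
    "g4": [3, 1, 4, 1, 5],
}
p_value = enrichment_for_gene("lnc1", profiles, {"g1", "g2"}, cutoff=0.8)
```

With a hard cutoff, the result depends on where the threshold is placed, which is presumably why the figure reports several cutoffs for the Fisher method.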
Log(odds) and associated P-values for different tissues in logistic models
| Predictor | Samples | Univariate Log(odds) | Univariate AUC | Univariate P-value | Bivariate Log(odds) | Bivariate AUC | Bivariate P-value | Multivariate Log(odds) | Multivariate P-value |
|---|---|---|---|---|---|---|---|---|---|
| Adipose_Tissue | 159 | –3.9 | 0.75 | <1e-256 | –1.7 | 0.8 | 2.4e-214 | –0.69 | 1e-11 |
| Adrenal_Gland | 52 | –3.6 | 0.72 | <1e-256 | –1.7 | 0.79 | 5.7e-149 | –0.49 | 1.2e-07 |
| Bladder | 11 | –1.9 | 0.65 | 7.8e-206 | –0.59 | 0.77 | 3.1e-24 | –0.05 | 0.53 |
| Blood | 245 | –3.4 | 0.75 | <1e-256 | –1.2 | 0.79 | 2.6e-115 | –0.33 | 0.00018 |
| Blood_Vessel | 263 | –4.2 | 0.75 | <1e-256 | –1.6 | 0.8 | 2.5e-198 | –0.18 | 0.11 |
| Brain | 357 | –3.4 | 0.72 | <1e-256 | –1.6 | 0.79 | 3.5e-108 | –0.32 | 0.00029 |
| Breast | 66 | –3.2 | 0.71 | <1e-256 | –1.2 | 0.79 | 5.5e-131 | –0.03 | 0.76 |
| Cervix_Uteri | 9 | –2.5 | 0.65 | 5.8e-290 | –0.85 | 0.78 | 4.1e-42 | –0.16 | 0.1 |
| Colon | 74 | –3.4 | 0.71 | <1e-256 | –1.2 | 0.79 | 2.3e-124 | –0.25 | 0.018 |
| Esophagus | 227 | –3.8 | 0.74 | <1e-256 | –1.4 | 0.79 | 1.1e-155 | –0.29 | 0.0073 |
| Fallopian_Tube | 6 | –1.8 | 0.62 | 3.1e-182 | –0.76 | 0.77 | 1e-16 | –0.01 | 0.93 |
| Heart | 133 | –4.1 | 0.78 | <1e-256 | –1.7 | 0.81 | 8.5e-298 | –0.64 | 1.7e-10 |
| Kidney | 8 | –2.2 | 0.74 | <1e-256 | –0.89 | 0.79 | 1.7e-119 | –0.47 | 5.6e-16 |
| Liver | 34 | –3.3 | 0.73 | <1e-256 | –1.4 | 0.79 | 8.5e-166 | –0.42 | 1.3e-06 |
| Lung | 133 | –3.8 | 0.74 | <1e-256 | –1.5 | 0.79 | 2.4e-188 | –0.28 | 0.0061 |
| Muscle | 157 | –3.9 | 0.78 | <1e-256 | –1.7 | 0.81 | 2.4e-294 | –0.98 | 2.6e-27 |
| Nerve | 114 | –3.7 | 0.72 | <1e-256 | –1.4 | 0.79 | 4.2e-148 | 0.12 | 0.29 |
| Ovary | 35 | –2.8 | 0.68 | <1e-256 | –1.2 | 0.78 | 3.6e-83 | –0.08 | 0.4 |
| Pancreas | 65 | –3.8 | 0.79 | <1e-256 | –2 | 0.82 | <1e-256 | –1.1 | 2.5e-43 |
| Pituitary | 22 | –1.8 | 0.66 | 7.3e-261 | –0.69 | 0.78 | 4.1e-47 | –0.08 | 0.33 |
| Prostate | 42 | –3.1 | 0.7 | <1e-256 | –1.1 | 0.78 | 1.9e-92 | 0.04 | 0.7 |
| Salivary_Gland | 5 | –0.46 | 0.6 | 2.9e-27 | 0.09 | 0.77 | 0.21 | 0.37 | 2.3e-08 |
| Skin | 322 | –3.7 | 0.72 | <1e-256 | –1.6 | 0.79 | 1.5e-129 | –0.57 | 7.4e-10 |
| Small_Intestine | 17 | –2.5 | 0.66 | <1e-256 | –0.81 | 0.78 | 6.6e-40 | –0.03 | 0.76 |
| Spleen | 34 | –2.7 | 0.67 | <1e-256 | –1 | 0.78 | 5.4e-72 | 0.03 | 0.7 |
| Stomach | 81 | –3.7 | 0.74 | <1e-256 | –1.6 | 0.79 | 1e-152 | –0.23 | 0.028 |
| Testis | 60 | –3 | 0.67 | <1e-256 | –1.2 | 0.78 | 6.2e-78 | –0.35 | 0.00012 |
| Thyroid | 120 | –3.6 | 0.72 | <1e-256 | –1.4 | 0.79 | 1.1e-140 | 0.19 | 0.071 |
| Uterus | 36 | –2.7 | 0.68 | <1e-256 | –0.99 | 0.78 | 9.1e-70 | 0.06 | 0.54 |
| Vagina | 34 | –3 | 0.7 | <1e-256 | –1.2 | 0.78 | 1.4e-93 | 0.12 | 0.25 |
| AS | 2921 | –3.7 | 0.77 | <1e-256 | | | | –1.2 | 5.9e-51 |
Bivariate models include two predictors: the indicated TS plus AS; the AUC refers to the entire model. The multivariate model includes all the predictors; in this case the AUC is 0.85.
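As a rough illustration of the quantities in this table, the sketch below shows how log(odds) coefficients turn predictor values into a probability in a logistic model, and how an AUC can be computed for the resulting scores. The coefficients, predictor values, and labels are invented for the example; the function names `logistic` and `auc` are hypothetical and do not come from the paper.

```python
from math import exp

def logistic(intercept, coefs, predictors):
    """Predicted probability of a logistic model; each coefficient is the
    change in log(odds) per unit of the corresponding predictor."""
    z = intercept + sum(b * x for b, x in zip(coefs, predictors))
    return 1.0 / (1.0 + exp(-z))

def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up bivariate example: two predictors per gene-term pair
# (standing in for one TS score and the AS score), with invented
# negative log(odds) coefficients.
intercept, coefs = 0.5, [-2.0, -1.0]
pairs = [(0.1, 0.2), (0.9, 0.3), (0.3, 0.4), (0.8, 0.9)]
labels = [1, 0, 1, 0]
scores = [logistic(intercept, coefs, p) for p in pairs]
model_auc = auc(scores, labels)
```

A negative log(odds) means that a larger predictor value lowers the predicted probability, so with such coefficients the pairs with the smallest predictor values receive the highest scores.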
Sequential analysis of deviance (anova): each step compares the current model with the next, more complex model obtained by adding one variable (TS)
| Predictor | Df | Deviance | Resid. Df | Resid. Dev | Pr(>Chi) |
|---|---|---|---|---|---|
| All tissues | 1 | 5534.7 | 23920 | 24919 | < 2.2e-16*** |
| Pancreas | 1 | 1694.8 | 23919 | 23224 | < 2.2e-16*** |
| Muscle | 1 | 617.6 | 23918 | 22606 | < 2.2e-16*** |
| Kidney | 1 | 131.7 | 23917 | 22475 | < 2.2e-16*** |
| Adipose Tissue | 1 | 208.0 | 23916 | 22267 | < 2.2e-16*** |
| Heart | 1 | 124.2 | 23915 | 22142 | < 2.2e-16*** |
| Skin | 1 | 79.4 | 23914 | 22063 | < 2.2e-16*** |
| Salivary Gland | 1 | 27.6 | 23913 | 22035 | 1.476e-07*** |
| Adrenal Gland | 1 | 54.2 | 23912 | 21981 | 1.844e-13*** |
| Liver | 1 | 38.0 | 23911 | 21943 | 7.127e-10*** |
| Testis | 1 | 18.6 | 23910 | 21925 | 1.582e-05*** |
| Blood | 1 | 35.0 | 23909 | 21890 | 3.280e-09*** |
| Brain | 1 | 14.5 | 23908 | 21875 | 0.0001425*** |
| Lung | 1 | 12.6 | 23907 | 21862 | 0.0003821*** |
| Esophagus | 1 | 19.8 | 23906 | 21843 | 8.546e-06*** |
| Colon | 1 | 12.9 | 23905 | 21830 | 0.0003299*** |
| Stomach | 1 | 5.5 | 23904 | 21824 | 0.0185580* |
| Thyroid | 1 | 3.2 | 23903 | 21821 | 0.0756208. |
| Cervix Uteri | 1 | 2.0 | 23902 | 21819 | 0.1569939 |
| Blood Vessel | 1 | 2.1 | 23901 | 21817 | 0.1460208 |
| Vagina | 1 | 1.2 | 23900 | 21816 | 0.2668693 |
| Nerve | 1 | 1.0 | 23899 | 21815 | 0.3103205 |
| Pituitary | 1 | 1.0 | 23898 | 21814 | 0.3151511 |
| Ovary | 1 | 0.6 | 23897 | 21813 | 0.4379597 |
| Bladder | 1 | 0.3 | 23896 | 21813 | 0.5573609 |
| Uterus | 1 | 0.3 | 23895 | 21812 | 0.5560578 |
| Prostate | 1 | 0.2 | 23894 | 21812 | 0.6973397 |
| Spleen | 1 | 0.1 | 23893 | 21812 | 0.7249901 |
| Small Intestine | 1 | 0.1 | 23892 | 21812 | 0.7569913 |
| Breast | 1 | 0.1 | 23891 | 21812 | 0.7576411 |
| Fallopian Tube | 1 | 0.0 | 23890 | 21812 | 0.9245252 |
Each of these comparisons is done via a likelihood ratio test. Once the best 17 TS have been included as predictors, adding further ones does not significantly improve the model (Signif. codes: 0 ‘***’ 0.001 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1)
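The likelihood ratio test used at each step can be sketched as follows. When one predictor is added (Df = 1), the drop in residual deviance is compared against a chi-square distribution with one degree of freedom, whose upper tail has the closed form erfc(sqrt(x / 2)). The function names and the residual deviances in the usage lines are hypothetical; because the table reports rounded deviances, P-values recomputed from it will only approximately match the Pr(>Chi) column.

```python
from math import erfc, sqrt

def lrt_pvalue_1df(deviance_drop):
    """P-value of a likelihood ratio test for one added predictor.
    Under the null hypothesis the deviance drop follows a chi-square
    distribution with 1 degree of freedom, whose survival function
    is erfc(sqrt(x / 2))."""
    return erfc(sqrt(deviance_drop / 2.0))

def sequential_deviance(resid_deviances):
    """Given the residual deviances of nested models, each adding one
    predictor to the previous one, return (drop, P-value) per step."""
    steps = []
    for smaller, larger in zip(resid_deviances, resid_deviances[1:]):
        drop = smaller - larger
        steps.append((drop, lrt_pvalue_1df(drop)))
    return steps

# Made-up residual deviances for three nested models.
steps = sequential_deviance([100.0, 90.0, 89.0])
```

In this toy run the first added predictor lowers the deviance by 10 and the second by only 1, so only the first step would pass a 0.05 threshold.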
Functions predicted for lncRNAs implicated in cancer. The ten best MSigDBh functions are reported; none of the P-values (Wilcoxon rank sum test) is significant per se after multiple testing correction
| MSigDBh functions | Raw P-value |
|---|---|
| E2F TARGETS | 0.0029 |
| G2M CHECKPOINT | 0.0036 |
| DNA REPAIR | 0.0055 |
| MITOTIC SPINDLE | 0.006 |
| SPERMATOGENESIS | 0.0067 |
| WNT BETA CATENIN SIGNALING | 0.035 |
| MYC TARGETS V1 | 0.044 |
| HEME METABOLISM | 0.093 |
| UNFOLDED PROTEIN RESPONSE | 0.13 |
| UV RESPONSE UP | 0.13 |
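For reference, a Wilcoxon rank sum test of the kind named in the caption can be sketched as below. The caption does not state which two groups of scores were compared, so the inputs here are generic samples `x` and `y`. The sketch uses the normal approximation without tie or continuity correction, which is a simplification and may differ from the implementation the authors used; the function names are hypothetical.

```python
from math import erfc, sqrt

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, no tie
    or continuity correction). Returns (U statistic of x, P-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])
    u = r1 - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return u, erfc(abs(z) / sqrt(2.0))

# Made-up scores for two small groups.
u_stat, p_value = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

For samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the sketch short.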
Log(odds) and associated P-values for different single species (SSs) in logistic models
| Predictor | Samples | Univariate Log(odds) | Univariate AUC | Univariate P-value | Bivariate Log(odds) | Bivariate AUC | Bivariate P-value | Multivariate Log(odds) | Multivariate P-value |
|---|---|---|---|---|---|---|---|---|---|
| ggallus | 17 | –2.6 | 0.67 | 2.2e-217 | –1.9 | 0.72 | 5.9e-106 | –1.3 | 1.1e-51 |
| ggorilla | 12 | –2.9 | 0.65 | 3.7e-162 | –0.99 | 0.78 | 4e-19 | –0.0092 | 0.95 |
| hsapiens | 59 | –4.3 | 0.7 | 1.4e-300 | –1.9 | 0.74 | 7.3e-54 | –1.7 | 2.6e-31 |
| mdomestica | 20 | –3.1 | 0.68 | 1.5e-225 | –1.8 | 0.73 | 1.2e-77 | –1 | 3.3e-23 |
| mmulatta | 14 | –2.7 | 0.65 | 1e-175 | –1.1 | 0.8 | 2.1e-25 | 0.38 | 0.002 |
| mmusculus | 49 | –4 | 0.69 | 6.4e-275 | –2.1 | 0.78 | 1.9e-74 | –1.4 | 1.2e-24 |
| oanatinus | 19 | –2.9 | 0.68 | 2.5e-229 | –1.9 | 0.78 | 3.9e-92 | –1.4 | 1.3e-46 |
| pabelii | 10 | –2.8 | 0.64 | 3.7e-142 | –1.2 | 0.81 | 1.7e-27 | –0.53 | 2.2e-05 |
| ptroglodytes | 28 | –3.1 | 0.67 | 1.5e-204 | –1.2 | 0.77 | 1.6e-27 | –0.36 | 0.01 |
| xtropicalis | 13 | –2.7 | 0.65 | 5.5e-176 | –1.8 | 0.78 | 4.4e-79 | –1.5 | 5.3e-56 |
Bivariate models include two predictors: the indicated species plus AS. Note that AS and SSs derive from different expression datasets, GTEx and Necsulea respectively. The bivariate model that includes hsapiens (from Necsulea) and AS shows that the contribution of hsapiens to the prediction is significant even though it derives from the same species as AS. The multivariate model considers all SSs but not AS; the AUC in this case is 0.77, the same AUC that we obtain with AS alone.
Fig. 3 Schematic workflow of our annotation algorithm. The correlation rank (CR) among tissue- or species-specific expression profiles is used to generate complete weighted single-tissue or single-species gene networks (STN or SSN). Previously known functional annotations linked to human genes are then used along with our gene networks to compute a functional prediction score (FPS) between each gene and every annotation term; in the case of SSN it is necessary to consider homology relations between human genes and those of the considered species. The information is then combined using logistic models. The models give a list of predictions as output; each one consists of a score associated with a gene ID and an ontology term.
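The first two steps of this workflow can be sketched under explicit assumptions, since this section does not give the exact definitions of CR and FPS. The sketch assumes that the CR of gene B with respect to gene A is B's position when all genes are sorted by decreasing Pearson correlation with A, and that a score between a gene and a term can be summarized as the geometric mean of the ranks of the genes annotated to that term (a rank-product-style statistic, chosen here only because Fig. 2 labels the approach "rank prod"). Gene names, profiles, and function names are hypothetical.

```python
from math import exp, log, sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_ranks(target, profiles):
    """Assumed CR: rank every other gene by decreasing Pearson
    correlation with `target` (1 = most strongly coexpressed)."""
    others = [g for g in profiles if g != target]
    others.sort(key=lambda g: pearson(profiles[target], profiles[g]),
                reverse=True)
    return {g: rank for rank, g in enumerate(others, start=1)}

def rank_product_score(ranks, annotated):
    """Hypothetical score for one gene-term pair: geometric mean of the
    ranks of the annotated genes (lower means closer coexpression).
    Returns None when no annotated gene is present in the network."""
    hits = [ranks[g] for g in annotated if g in ranks]
    if not hits:
        return None
    return exp(sum(log(r) for r in hits) / len(hits))

# Toy single-tissue expression profiles (made-up values).
profiles = {
    "lnc1": [1, 2, 3, 4, 5],
    "g1": [2, 4, 6, 8, 10],
    "g2": [1, 2, 3, 5, 4],
    "g3": [5, 4, 3, 2, 1],
    "g4": [3, 1, 4, 1, 5],
}
ranks = correlation_ranks("lnc1", profiles)
score = rank_product_score(ranks, {"g1", "g2"})
```

Unlike a correlation cutoff, a rank-based score uses every gene in the complete network, so no threshold has to be chosen. In the full workflow one such score per tissue or species would then enter the logistic models as a predictor.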