Xingli Guo, Lin Gao, Qi Liao, Hui Xiao, Xiaoke Ma, Xiaofei Yang, Haitao Luo, Guoguang Zhao, Dechao Bu, Fei Jiao, Qixiang Shao, RunSheng Chen, Yi Zhao.
Abstract
Growing evidence demonstrates that long non-coding RNAs (lncRNAs) play key roles in diverse biological processes. There is a critical need to annotate the functions of the increasing number of available lncRNAs. In this article, we apply a global network-based strategy to tackle this issue for the first time. We develop a bi-colored network-based global function predictor, the long non-coding RNA global function predictor ('lnc-GFP'), to predict probable functions for lncRNAs at large scale by integrating gene expression data and protein interaction data. The performance of lnc-GFP is evaluated on protein-coding and lncRNA genes. Cross-validation tests on protein-coding genes with known function annotations indicate that our method can achieve a precision of up to 95% with a suitable parameter setting. Among the 1713 lncRNAs in the bi-colored network, all 1625 (94.9%) lncRNAs in the maximum connected component are functionally characterized. For the lncRNAs expressed in mouse embryonic stem cells and neuronal cells, the putative functions inferred by our method closely match those reported in the literature.
Year: 2012 PMID: 23132350 PMCID: PMC3554231 DOI: 10.1093/nar/gks967
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Principles of lnc-GFP. (A) The coding–non-coding bi-colored network is represented as a graph. (B) Function T is used to compute the previous knowledge score between an unannotated lncRNA v and the given function category f. (C) Function S is used to compute the final association score between v and f based on the genes known to be annotated with f. The computation not only simulates the iterative propagation of the ‘function flow’ on the network but also applies a local constraint given by the previous knowledge score.
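The iterative propagation described in panel C can be illustrated with a generic label-propagation loop that mixes network flow with a prior score. This is a minimal sketch under stated assumptions, not the published lnc-GFP formula: the update rule (new score = alpha × weighted neighbour average + (1 − alpha) × prior), the parameter `alpha`, the stopping rule and every name in the code are hypothetical choices made for illustration. The caption only states that scores are propagated iteratively and constrained by the previous knowledge score.

```python
def propagate_scores(adjacency, prior, alpha=0.5, n_iter=100, tol=1e-9):
    """Iteratively propagate function scores over a weighted graph.

    adjacency: dict mapping node -> dict of neighbour -> edge weight
    prior:     dict mapping node -> prior score for one function category
               (for example 1.0 for genes annotated with it, else 0.0)
    alpha:     assumed mixing weight between network flow and the prior
    """
    nodes = list(adjacency)
    scores = {v: prior.get(v, 0.0) for v in nodes}
    for _ in range(n_iter):
        updated = {}
        for v in nodes:
            neighbours = adjacency[v]
            total = sum(neighbours.values())
            # Weighted average of the neighbours' current scores.
            flow = (
                sum(w * scores.get(u, 0.0) for u, w in neighbours.items()) / total
                if total > 0
                else 0.0
            )
            # Local constraint: pull each node back towards its prior score.
            updated[v] = alpha * flow + (1 - alpha) * prior.get(v, 0.0)
        delta = max((abs(updated[v] - scores[v]) for v in nodes), default=0.0)
        scores = updated
        if delta < tol:
            break
    return scores


# Toy bi-colored network (hypothetical): two coding genes and one lncRNA.
toy_network = {
    "geneA": {"lnc1": 1.0},
    "geneB": {"lnc1": 1.0},
    "lnc1": {"geneA": 1.0, "geneB": 1.0},
}
# Only geneA is annotated with the function category of interest.
toy_prior = {"geneA": 1.0, "geneB": 0.0, "lnc1": 0.0}
toy_scores = propagate_scores(toy_network, toy_prior)
```

In this toy case the unannotated lncRNA ends up with a score between those of the annotated and the unannotated coding gene, which is the qualitative behaviour a flow-based predictor relies on when ranking candidate functions.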
Figure 2. Coding–non-coding bi-colored biological network. (A) A maximum connected subnetwork of the mouse bi-colored network is shown; red nodes represent protein-coding genes and green nodes represent lncRNAs. A blue line represents co-expression between two genes, a light blue line represents both co-expression and protein interaction between two genes, and a black line represents protein interaction between two genes. (B) The distribution of ‘co-expression’ edges and ‘protein interaction’ edges in the bi-colored network. (C) The degree distribution of the bi-colored network. Here, k is the degree and P(k) denotes the probability that a node has degree k. (D) Superior performance of our bi-colored network.
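The degree distribution in panel C can be computed from an edge list as the fraction of nodes with each degree. The following is a minimal sketch; the function name, the toy edge list and the choice to ignore self-loops and duplicate edges are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list.

    P(k) is the fraction of nodes whose degree equals k. Self-loops are
    skipped and duplicate edges are counted once (assumed conventions).
    """
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    counts = Counter(len(adjacent) for adjacent in neighbours.values())
    n_nodes = len(neighbours)
    return {k: count / n_nodes for k, count in sorted(counts.items())}


# Hypothetical toy network mixing coding genes and one lncRNA.
toy_edges = [
    ("geneA", "lnc1"),
    ("geneB", "lnc1"),
    ("geneB", "geneC"),
    ("lnc1", "geneC"),
]
toy_distribution = degree_distribution(toy_edges)
```

Plotting P(k) against k on log-log axes is the usual way to inspect whether such a network has a heavy-tailed degree distribution.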
Figure 3. Performance of lnc-GFP. (A) The performance of lnc-GFP in cross-validation tests. (B) The performance of lnc-GFP in noisy bi-colored networks with a fraction of the edges randomized. (C) The performance of lnc-GFP in noisy bi-colored networks with a fraction of the edges deleted.
Figure 4. LncRNAs involved in diverse GO BPs. Here, rank denotes the rank threshold. For a given rank threshold, the numbers of lncRNAs and GO BPs involved in the predicted ‘lnc2go’ associations are shown above the bars.