Chaohan Xu1, Rui Qi1, Yanyan Ping1, Jie Li1, Hongying Zhao1, Li Wang1, Michael Yifei Du2, Yun Xiao1,3, Xia Li1.
Abstract
LncRNAs have emerged as a major class of regulatory molecules involved in normal cellular physiology and disease, but our knowledge of them remains very limited, and discovering novel disease-related lncRNAs in cancers has become a major research challenge. Based on the assumption that diseases with similar phenotype associations share similar molecular mechanisms, we presented a pan-cancer network-based prioritization approach that systematically identifies disease-specific risk lncRNAs by integrating disease phenotype associations. We applied this strategy to approximately 2800 tumor samples from 14 cancer types to prioritize disease risk lncRNAs. Our approach yielded an average area under the ROC curve (AUC) of 80.66%, with the highest AUC (98.14%) for medulloblastoma. When evaluated using leave-one-out cross-validation (LOOCV) for prioritization of disease candidate genes, it achieved an average AUC of 97.16%. Moreover, we demonstrated the robustness of this approach and the importance of the information it integrates, including disease phenotype associations, known disease genes and the number of cancer types. Taking glioblastoma multiforme as a case study, we identified the candidate lncRNA gene SNHG1 as a novel disease risk factor for diagnosis and prognosis. In summary, we provided a novel lncRNA prioritization approach that integrates pan-cancer phenotype associations and could help researchers better understand the important roles of lncRNAs in human cancers.
Keywords: disease phenotype association; identification and prioritization; pan cancer; risk lncRNA
Year: 2017 PMID: 28076842 PMCID: PMC5355324 DOI: 10.18632/oncotarget.14510
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Heat maps of co-expression relationships in the fourteen cancer types
Clustering maps showing the existence of gene-gene or lncRNA-gene co-expression relationships among different cancer types, which can be used to construct disease-specific sub-networks.
Figure 2. Evaluation of the performance of our lncRNA prioritization approach
A. The ROC curves of gene prioritization results by LOOCV. B. The ROC curves of lncRNA prioritization results. C. Top 100 ranks of known disease genes (left) and lncRNAs (right) after prioritization.
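As a rough illustration of how an AUC can be obtained from a prioritization result, the sketch below computes it as the fraction of (known, unknown) candidate pairs in which the known disease gene receives the higher score, counting ties as half. This is the standard rank-based definition of AUC, not necessarily the exact procedure used in the paper; the function name `auc_from_scores` and the toy scores are hypothetical.

```python
def auc_from_scores(scores, positives):
    """Rank-based AUC for one prioritization result.

    scores    : dict mapping candidate name -> prioritization score
    positives : set of candidates treated as known disease genes/lncRNAs
    (Both inputs are illustrative; they are not data from the paper.)
    """
    pos = [s for name, s in scores.items() if name in positives]
    neg = [s for name, s in scores.items() if name not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative candidate")
    # Count pairs where the positive outranks the negative; ties count as 0.5.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores: "a" and "c" play the role of known genes.
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
toy_auc = auc_from_scores(toy_scores, {"a", "c"})
print(toy_auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

In a leave-one-out setting, each known disease gene would be withheld from the seed set in turn, scored together with the other candidates, and the resulting scores pooled or averaged before computing the AUC.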
Figure 3. Evaluation of the influence of disease phenotype associations
A. The phenotype association network of the fourteen diseases. B. The ROC curves of lncRNA prioritization results generated by excluding disease phenotype associations. C. The ROC curves generated by randomly selecting disease phenotype associations with 1000 repetitions. D. Comparison of lncRNA prioritization results obtained by using, excluding or permuting disease phenotype associations.
Figure 4. The prioritization results in the case study of GBM
A. The GO and KEGG enrichment analysis results for top 17 non-disease candidate genes and top 20 candidate lncRNAs of GBM. B. Survival analysis results of three candidate lncRNAs in GSE7696 and GSE16011.
Figure 5. The workflow of prioritization of risk lncRNAs through integration of disease phenotype associations
A. Array-based expression data collection and re-annotation. B. Construction of the gene and lncRNA co-expression pan-cancer network (GLCPN). C. Application of the random walk method to predict scores for all candidates according to known disease genes. D. Integration of prediction scores by disease phenotype associations and generation of disease candidate lncRNA lists for prioritization.
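Steps C and D of this workflow can be illustrated with a minimal sketch. It assumes a random walk with restart over a weighted co-expression network, seeded at the known disease genes, followed by a phenotype-similarity-weighted average of the per-cancer scores. The restart probability, the weighting scheme, the function names and the toy network are all assumptions made for illustration; the paper's actual parameters and integration formula are not given in this text.

```python
def build_adjacency(edges):
    """Build a symmetric weighted adjacency dict from (u, v, weight) tuples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    restart=0.5 is an arbitrary illustrative value, not the paper's setting.
    """
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    # Initial distribution: probability mass spread evenly over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue
            # Move to each neighbour in proportion to the edge weight.
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


def integrate_scores(scores_by_cancer, similarity, target):
    """Similarity-weighted average of per-cancer scores (hypothetical scheme).

    The target cancer has weight 1; every other cancer is weighted by its
    phenotype similarity to the target (0 if no association is listed).
    """
    weights = {c: 1.0 if c == target else similarity.get((target, c), 0.0)
               for c in scores_by_cancer}
    total = sum(weights.values())
    return {n: sum(weights[c] * scores_by_cancer[c].get(n, 0.0)
                   for c in scores_by_cancer) / total
            for n in scores_by_cancer[target]}


# Toy chain network: lnc1 - gene1 - gene2 - lnc2 (all names are made up).
adj = build_adjacency([("gene1", "lnc1", 1.0),
                       ("gene1", "gene2", 1.0),
                       ("gene2", "lnc2", 1.0)])
scores_a = random_walk_with_restart(adj, seeds={"gene1"})  # cancer_A seed
scores_b = random_walk_with_restart(adj, seeds={"gene2"})  # cancer_B seed
integrated = integrate_scores({"cancer_A": scores_a, "cancer_B": scores_b},
                              similarity={("cancer_A", "cancer_B"): 0.5},
                              target="cancer_A")
ranked_lncrnas = sorted(["lnc1", "lnc2"], key=integrated.get, reverse=True)
print(ranked_lncrnas)
```

In this toy network, `lnc1` is directly co-expressed with the seed gene of `cancer_A` and therefore scores higher than `lnc2`. Integrating the scores from the phenotypically similar `cancer_B` raises the score of `lnc2`, which is the kind of borrowed evidence the phenotype associations are meant to provide.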