Renchu Guan, Xu Wang, Mary Qu Yang, Yu Zhang, Fengfeng Zhou, Chen Yang, Yanchun Liang.
Abstract
The war on cancer is progressing globally but slowly as researchers around the world continue to seek and discover more innovative and effective ways of curing this catastrophic disease. Organizing biological information, representing it, and making it accessible, known as biocuration, is an important aspect of biomedical research and discovery. However, because sophisticated biocuration is highly resource-dependent, it continues to lag behind the biomedical data that are continually being generated. Pathway analysis, another critical aspect of cancer research, has proven to be an efficient method for gaining insight into the biology underlying cancer. We propose a deep-learning-based model, Stacked Denoising Autoencoder Multi-Label Learning (SdaMLL), to facilitate the discovery of multiple gene functions and the completion of pathways. SdaMLL captures intermediate representations that are robust to partial corruption of the input pattern and generates low-dimensional codes superior to those produced by conventional dimension reduction tools. Experimental results indicate that SdaMLL outperforms existing classical multi-label algorithms. Moreover, we discovered new functions for some genes, such as Fused in Sarcoma (FUS, which may be part of transcriptional misregulation in cancer) and p27 (which we expect to become a member of the viral carcinogenesis pathway), that can be used to complete the related pathways. We provide a visual tool (https://www.keaml.cn/gpvisual) to view the new gene functions in cancer pathways.
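To make "robust to partial corruption of the input pattern" concrete, here is a minimal sketch of a stacked denoising autoencoder in plain Python. Each layer is shown a randomly masked copy of its input and learns to reconstruct the clean version; the clean codes of one layer then become the training data for the next. The masking noise, sigmoid units, squared-error loss, untied weights, and plain SGD are all assumptions chosen for brevity, since the abstract does not state SdaMLL's actual configuration, and `DenoisingAutoencoder` and `train_stack` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def corrupt(x, noise, rng):
    # Assumed masking noise: each input component is zeroed with probability `noise`.
    return [0.0 if rng.random() < noise else v for v in x]


class DenoisingAutoencoder:
    """One hypothetical DAE layer: sigmoid encoder and decoder with untied weights."""

    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / math.sqrt(n_in)
        self.W = [[rng.uniform(-s, s) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[rng.uniform(-s, s) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ci)
                for row, ci in zip(self.V, self.c)]

    def train_step(self, x, noise, lr, rng):
        """One SGD step: encode a corrupted copy, reconstruct the CLEAN input."""
        x_tilde = corrupt(x, noise, rng)
        h = self.encode(x_tilde)
        z = self.decode(h)
        # Gradient of 0.5 * sum((z - x)^2) w.r.t. the decoder pre-activations.
        dz = [(zi - xi) * zi * (1.0 - zi) for zi, xi in zip(z, x)]
        # Backpropagate to the hidden pre-activations (uses V before it is updated).
        dh = [sum(self.V[i][j] * dz[i] for i in range(len(dz))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for i, dzi in enumerate(dz):
            for j, hj in enumerate(h):
                self.V[i][j] -= lr * dzi * hj
            self.c[i] -= lr * dzi
        for j, dhj in enumerate(dh):
            for k, xk in enumerate(x_tilde):
                self.W[j][k] -= lr * dhj * xk
            self.b[j] -= lr * dhj
        return 0.5 * sum((zi - xi) ** 2 for zi, xi in zip(z, x))


def train_stack(data, layer_sizes, noise=0.3, lr=0.5, epochs=200, seed=0):
    """Greedy layer-wise training; returns the layers and the final low-dimensional codes."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for x in inputs:
                dae.train_step(x, noise, lr, rng)
        layers.append(dae)
        # Clean (uncorrupted) codes become the next layer's training data.
        inputs = [dae.encode(x) for x in inputs]
    return layers, inputs
```

For example, `layers, codes = train_stack(rows, [64, 16])` would compress each gene row to 16 values; those layer sizes are arbitrary placeholders. Pure Python keeps the sketch dependency-free but is only illustrative; a real feature matrix would call for an array or deep-learning library.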
Year: 2018 PMID: 29321535 PMCID: PMC5762767 DOI: 10.1038/s41598-017-17842-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Feature matrix generation flowchart. The upper half illustrates the process of extracting all gene names from the pathways in KEGG[12], and the lower half shows how we select articles that contain descriptions of gene function.
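A rough sketch of the Figure 1 flow, under loudly stated assumptions: pathway membership is given as a plain dict of gene-symbol lists (rather than parsed from KEGG), an article is "selected" for a gene when the gene symbol appears as a whole token in its text, and each gene's feature row is a bag-of-words count vector summed over its selected articles. The caption does not say how the features are actually computed, so `collect_genes`, `select_articles`, and `feature_matrix` are hypothetical helpers.

```python
import re
from collections import Counter


def collect_genes(pathways):
    """Union of gene symbols over all pathways (upper half of Figure 1)."""
    return sorted({g for genes in pathways.values() for g in genes})


def tokenize(text):
    # Lower-cased alphanumeric tokens; symbols containing '-' or '/' would need another matcher.
    return re.findall(r"[a-z0-9]+", text.lower())


def select_articles(genes, articles):
    """Map each gene to the IDs of articles that mention its symbol as a whole token."""
    hits = {g: [] for g in genes}
    for article_id, text in articles.items():
        tokens = set(tokenize(text))
        for g in genes:
            if g.lower() in tokens:
                hits[g].append(article_id)
    return hits


def feature_matrix(genes, articles, hits):
    """Rows = genes, columns = vocabulary terms, cells = term counts over the gene's articles."""
    per_gene = {
        g: Counter(t for a in hits[g] for t in tokenize(articles[a])) for g in genes
    }
    vocab = sorted({t for counts in per_gene.values() for t in counts})
    return vocab, [[per_gene[g][t] for t in vocab] for g in genes]
```

A gene with no selected articles gets an all-zero row; whether such genes are kept or dropped is another detail the caption leaves open.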
Figure 2. The architecture of SdaMLL. Each row of the feature matrix represents a gene. The stacked denoising autoencoder maps each row vector to a representation that is robust to various kinds of noise. The output of the autoencoder is then provided as the input to BP-MLL, which predicts the assignment of genes to pathways.
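Figure 2 feeds the autoencoder codes into BP-MLL, the backpropagation multi-label learner of Zhang and Zhou. BP-MLL trains its network with a pairwise error that, for each instance, averages exp(−(c_k − c_l)) over every pair of a relevant label k and an irrelevant label l, which pushes relevant pathways above irrelevant ones. The sketch below implements that per-instance error and its gradient, then, as a deliberate simplification, trains a single linear layer with it instead of BP-MLL's hidden-layer network and learned threshold; `train_linear` and `rank_pathways` are hypothetical names.

```python
import math


def bpmll_loss(scores, labels):
    """Per-instance BP-MLL error: mean of exp(-(c_p - c_q)) over relevant p, irrelevant q."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return 0.0  # undefined when all or no labels are relevant; treated as zero here
    return sum(math.exp(-(cp - cq)) for cp in pos for cq in neg) / (len(pos) * len(neg))


def bpmll_grad(scores, labels):
    """Gradient of bpmll_loss with respect to each output score."""
    pos = [j for j, y in enumerate(labels) if y]
    neg = [j for j, y in enumerate(labels) if not y]
    grad = [0.0] * len(scores)
    if not pos or not neg:
        return grad
    norm = len(pos) * len(neg)
    for p in pos:
        for q in neg:
            e = math.exp(-(scores[p] - scores[q])) / norm
            grad[p] -= e  # raising a relevant score lowers the error
            grad[q] += e  # raising an irrelevant score raises it
    return grad


def _scores(W, x):
    xb = list(x) + [1.0]  # append a constant input for the bias
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def train_linear(codes, label_vectors, n_labels, lr=0.2, epochs=300):
    """Simplified stand-in for BP-MLL: one linear layer (no hidden units) trained with SGD."""
    W = [[0.0] * (len(codes[0]) + 1) for _ in range(n_labels)]
    for _ in range(epochs):
        for x, y in zip(codes, label_vectors):
            xb = list(x) + [1.0]
            g = bpmll_grad(_scores(W, x), y)
            for j in range(n_labels):
                for i, v in enumerate(xb):
                    W[j][i] -= lr * g[j] * v
    return W


def rank_pathways(W, x):
    """Pathway indices ordered from most to least likely for code vector x."""
    s = _scores(W, x)
    return sorted(range(len(W)), key=lambda j: -s[j])
```

Each entry of `label_vectors` is a 0/1 list with one position per pathway. Because the sketch only ranks pathways, turning the ranking into a predicted label set would still need a threshold, which BP-MLL learns separately.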
Experimental results for all multi-label classification algorithms. For BP-MLL and SdaMLL, the runs with the highest average precision are reported.
| Algorithm | Average precision | Ranking loss | Coverage |
|---|---|---|---|
| KNN | 0.333 | 0.662 | 95.25 |
| Decision trees | 0.385 | 0.644 | 100.763 |
| SdaMLL | | | |
| BP-MLL | 0.529 | 0.306 | 2.608 |
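The table reports three standard multi-label ranking metrics. The sketch below computes them from per-pathway scores using the usual definitions: rank 1 is the top-scored label; ranking loss is the fraction of (relevant, irrelevant) label pairs that are ordered wrongly; coverage is how far down the ranking one must go to reach every relevant label, minus one; and average precision averages, over each relevant label, the fraction of labels ranked at or above it that are relevant. Tie handling and the −1 in coverage vary between implementations (scikit-learn's `coverage_error`, for example, omits the −1), so the exact convention behind the reported numbers is an assumption here.

```python
def ranks(scores):
    """Rank 1 = highest score; ties broken by label index (an assumption)."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    r = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        r[j] = position
    return r


def multilabel_metrics(score_rows, label_rows):
    """Average precision, ranking loss and coverage averaged over instances."""
    ap = rl = cov = 0.0
    n = 0
    for scores, labels in zip(score_rows, label_rows):
        rel = [j for j, y in enumerate(labels) if y]
        irr = [j for j, y in enumerate(labels) if not y]
        if not rel or not irr:
            continue  # metrics are undefined when all or no labels are relevant
        n += 1
        r = ranks(scores)
        # Ranking loss: mis-ordered (relevant, irrelevant) pairs; ties count as errors.
        rl += sum(scores[p] <= scores[q] for p in rel for q in irr) / (len(rel) * len(irr))
        # Coverage: depth of the lowest-ranked relevant label, minus one.
        cov += max(r[p] for p in rel) - 1
        # Average precision: precision at the rank of each relevant label.
        ap += sum(sum(r[p2] <= r[p] for p2 in rel) / r[p] for p in rel) / len(rel)
    if n == 0:
        raise ValueError("no instance has both relevant and irrelevant labels")
    return {"average_precision": ap / n, "ranking_loss": rl / n, "coverage": cov / n}
```

Higher average precision is better, while lower ranking loss and lower coverage are better.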
Figure 3. Comparison between BP-MLL and SdaMLL. ↑ denotes that a larger value of the metric indicates better performance; ↓ denotes that a smaller value indicates better performance.
Figure 4. Illustration of new gene functions. (a) Network connecting genes, articles, and pathways. Yellow points are pathways, light blue points are PubMed articles, and blue points are functional genes. (b) Detailed description of the relations. All relationships are listed as {gene, pathway, article} triples. (c) Predicted results: an illustration of the predicted gene functions. (d) Visualization of the predicted genes in KEGG. The completion results for the KEGG pathways[12] are listed at the bottom of the web page.
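Panel (b) stores every relation as a {gene, pathway, article} triple. One simple way to turn such triples into pathway-completion candidates, as displayed in panel (d), is to group them by pathway and keep only genes that are not already listed as members. The sketch assumes each triple records a predicted gene–pathway link together with one supporting article, and that current memberships are available as a dict of sets; `Relation` and `completion_candidates` are hypothetical names, not part of the authors' tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    gene: str
    pathway: str
    article: str  # assumed: an identifier of the supporting article, e.g. a PubMed ID


def completion_candidates(relations, known_members):
    """Group triples by pathway, keeping genes that are not yet known members.

    Returns {pathway: {gene: sorted list of supporting article IDs}}.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for rel in relations:
        if rel.gene not in known_members.get(rel.pathway, set()):
            grouped[rel.pathway][rel.gene].add(rel.article)
    return {p: {g: sorted(arts) for g, arts in genes.items()} for p, genes in grouped.items()}
```

Keeping the supporting articles alongside each candidate gene mirrors the gene, article, and pathway links drawn in panel (a).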