Lingpeng Kong (1), Yuanyuan Chen (2), Fengjiao Xu (2), Mingmin Xu (1), Zutan Li (1), Jingya Fang (1), Liangyun Zhang (3), Cong Pian (4).
1. College of Agriculture, Nanjing Agricultural University, Jiangsu, 210095, Nanjing, China.
2. Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China.
3. Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China. zlyun@njau.edu.cn.
4. Department of Mathematics, College of Science, Nanjing Agricultural University, Nanjing, 210095, China. piancong@njau.edu.cn.
Abstract
BACKGROUND: Large-scale gene expression profiling has been successfully applied to discovering functional connections among diseases, genetic perturbations, and drug actions. To reduce the cost of generating ever-larger collections of expression profiles, a low-cost, high-throughput reduced-representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although L1000 relies on a set of ~1000 carefully chosen landmark genes that capture ~80% of the information in the whole genome, inferring target genes from these landmark genes is not satisfactorily robust. More efficient computational methods are therefore still needed to mine the most influential genes in the genome. RESULTS: Here, we propose a deep learning-based computational framework to mine a subset of genes that covers more genomic information. Specifically, an AutoEncoder is first constructed to learn the non-linear relationships among genes, and DeepLIFT is then applied to calculate gene importance scores. Using this data-driven approach, we obtained a new landmark gene set. The results show that our landmark genes predict target genes more accurately and robustly than those of L1000 on two metrics: mean absolute error (MAE) and Pearson correlation coefficient (PCC). This indicates that the landmark genes detected by our method contain more genomic information. CONCLUSIONS: We believe that the proposed framework is well suited to the analysis of large-scale biological data. Furthermore, the landmark genes inferred in this study can be used to expand collections of gene expression profiles at scale, facilitating research into functional connections.
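The pipeline summarized in the abstract (learn a reconstruction model, score genes, select landmarks, then evaluate target prediction with MAE and PCC) can be illustrated with a minimal sketch. This is not the authors' implementation, and several simplifying assumptions are made: the data are synthetic, the autoencoder is linear and fitted in closed form via SVD (its optimum spans the PCA subspace) rather than a trained non-linear network, and gene importance is an input-times-weight attribution, which matches DeepLIFT's contributions only for a linear model with a zero reference. The names `fit_linear_autoencoder`, `importance_scores`, `mae`, and `mean_pcc` are hypothetical helpers introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (assumption, not L1000 data):
# 300 samples x 40 genes driven by 5 latent factors plus noise.
n_samples, n_genes, n_latent, n_landmarks = 300, 40, 5, 10
latent = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_genes))
X = latent @ loadings + 0.3 * rng.normal(size=(n_samples, n_genes))

# Train/test split, z-scoring each gene with training statistics only.
X_train, X_test = X[:200], X[200:]
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd


def fit_linear_autoencoder(X, n_hidden):
    """Closed-form linear autoencoder: encoder V, decoder V.T (PCA subspace)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:n_hidden].T              # shape (genes, hidden)
    return V, V.T


def importance_scores(X, enc, dec):
    """Mean absolute input-times-weight contribution of each input gene."""
    R = enc @ dec                    # R[i, j]: weight of input gene i on output gene j
    return np.abs(X).mean(axis=0) * np.abs(R).sum(axis=1)


def mae(y_true, y_pred):
    """Mean absolute error over all samples and target genes."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_pcc(y_true, y_pred):
    """Pearson correlation per target gene (column), averaged over genes."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return float(np.mean((yt * yp).sum(axis=0) / denom))


# Score genes, keep the top-scoring ones as landmarks, treat the rest as targets.
enc, dec = fit_linear_autoencoder(X_train, n_latent)
scores = importance_scores(X_train, enc, dec)
order = np.argsort(scores)[::-1]
landmarks, targets = order[:n_landmarks], order[n_landmarks:]

# Infer target genes from landmark genes with ordinary least squares.
coef, *_ = np.linalg.lstsq(X_train[:, landmarks], X_train[:, targets], rcond=None)
pred = X_test[:, landmarks] @ coef

test_mae = mae(X_test[:, targets], pred)
test_pcc = mean_pcc(X_test[:, targets], pred)
print(f"MAE = {test_mae:.3f}, mean PCC = {test_pcc:.3f}")
```

Lower MAE and higher PCC on held-out samples indicate that the selected landmarks carry more information about the remaining genes. Under this setup, comparing two candidate landmark sets amounts to repeating the last step with a different index array in place of `landmarks`.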
Keywords:
AutoEncoder; Deep learning; DeepLIFT; Landmark genes