Xinpeng Guo (1, 2), Yafei Song (2), Shuhui Liu (1), Meihong Gao (1), Yang Qi (1), Xuequn Shang (3). (1) School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China. (2) School of Air and Missile Defense, Air Force Engineering University, Xi'an, 710051, People's Republic of China. (3) School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China. shang@nwpu.edu.cn.
Abstract
BACKGROUND: Genome-wide association studies (GWAS), which link genotype to phenotype, are an effective means of associating an individual's genetic background with a disease or trait. However, single-omics data provide only limited information on biological mechanisms, so integrating multi-omics data is necessary to predict genotype-phenotype associations more accurately. Typically, gene expression data are integrated to analyze the effect of single nucleotide polymorphisms (SNPs) on phenotype. Such integration mainly follows two approaches, multi-staged analysis and meta-dimensional analysis, which ignore intra-omics and inter-omics associations, respectively. Moreover, both approaches require all omics data to come from a single sample set, and the large number of SNP features demands a large sample size for model building, yet multi-omics data from a single, large sample set are difficult to obtain. RESULTS: To address this problem, we propose a genotype-phenotype association method based on multi-omics data from small samples. Its workflow is as follows: cluster genes using a protein-protein interaction network and gene expression data; screen the gene clusters with group lasso; obtain the SNP clusters corresponding to the selected gene clusters from expression quantitative trait locus (eQTL) data; integrate each SNP cluster, its corresponding gene cluster, and the phenotype into a three-layer network block; analyze and predict with each block; and take the average of the block predictions as the final prediction. CONCLUSIONS: We compare this method with others on two datasets and find that it performs better in both cases. Our method can effectively address prediction from small-sample multi-omics data and provides a valuable resource for further studies that fuse more omics data.
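Two steps of the workflow above are concrete enough to sketch: screening gene clusters with group lasso, and averaging per-block predictions. The sketch below is illustrative only and makes several assumptions not stated in the abstract: the gene clusters (from the PPI network and expression data) are given as column-index arrays, group lasso is fitted by plain proximal gradient descent with the usual sqrt(cluster size) weights, the SNP columns linked to each selected cluster (via eQTL data) are supplied in a hypothetical `snp_blocks` mapping, and closed-form ridge regression stands in for each three-layer network block. The function names are hypothetical helpers, not the authors' code.

```python
import numpy as np


def group_lasso(X, y, groups, lam, n_iter=1000):
    """Group lasso fitted by proximal gradient descent (ISTA).

    Minimises (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) * ||b_g||_2,
    where `groups` is a list of column-index arrays, one per gene cluster.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step 1/L, where L = ||X||_2^2 / n is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in groups:
            # Block soft-thresholding shrinks a whole cluster, possibly to exactly zero.
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[g]
        beta = z
    return beta


def selected_clusters(beta, groups):
    """Indices of gene clusters whose coefficient block survived screening."""
    return [k for k, g in enumerate(groups) if np.linalg.norm(beta[g]) > 0]


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda X_new: (X_new - x_mean) @ w + y_mean


def block_average_predict(S_train, y_train, S_test, snp_blocks, alpha=1.0):
    """Fit one model per SNP block and average the block predictions.

    `snp_blocks` (hypothetical) maps each selected gene cluster to the SNP
    columns linked to it. Ridge regression is only a stand-in for the
    three-layer (SNP -> gene -> phenotype) network block.
    """
    preds = [ridge_fit(S_train[:, cols], y_train, alpha)(S_test[:, cols])
             for cols in snp_blocks.values()]
    return np.mean(preds, axis=0)
```

On toy data where only the first of three gene clusters drives the phenotype, `selected_clusters(group_lasso(X, y, groups, lam=0.1), groups)` is expected to keep just that cluster; its eQTL-linked SNP columns would then form one entry of `snp_blocks`. Averaging block predictions keeps each block's model small, which is the point of the design when samples are scarce.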