Xinpeng Guo (1,2), Jinyu Han (3), Yafei Song (2), Zhilei Yin (1), Shuaichen Liu (4), Xuequn Shang (1)
Abstract
Motivation: A central goal of current biology is to establish a complete functional link between genotype and phenotype, the so-called genotype-phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. This gives us new opportunities to uncover the mechanisms linking single-nucleotide polymorphisms (SNPs), genes, and phenotypes, but multi-omics analysis still faces several challenges: 1) when the sample size is large enough, the number of omics types is often too small to meet the requirements of multi-omics analysis; 2) the internal correlations within each omics layer, such as the correlations between genes in genomics, are often unclear; and 3) when a large number of traits (p) is analyzed, the sample size (n) is often much smaller than p (n << p), which hinders the application of machine learning methods to the classification of disease outcomes.
Keywords: SNP; eQTL; expression quantitative trait loci; gene; genotype-phenotype; graph-embedded deep neural network
Year: 2022 PMID: 36046233 PMCID: PMC9421127 DOI: 10.3389/fgene.2022.921775
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1 G-EDNN model diagram based on expression quantitative trait loci (eQTL) data. The intergenic correlation network is embedded into a DNN to form the final G-EDNN model.
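Embedding a gene network into a DNN is commonly done, as in GEDFN (one of the baselines in Figure 2), by masking the input-to-first-hidden-layer weights with the network's adjacency matrix plus self-loops, so each hidden node sees only one gene and its neighbours. The sketch below shows that masking idea in plain Python. It is an assumption about how such an embedding can work, not the authors' G-EDNN code; the function names `graph_embedded_layer` and `count_active_weights` and the one-hidden-node-per-gene layout are hypothetical.

```python
def graph_embedded_layer(x, weights, bias, adjacency):
    """Forward pass of one graph-embedded layer (hypothetical, GEDFN-style sketch).

    x:         input vector of length p (one value per gene)
    weights:   p x p list of lists; weights[i][j] links input gene i to hidden node j
    bias:      length-p list, one bias per hidden node
    adjacency: p x p 0/1 matrix of an assumed intergenic correlation network

    The weight matrix is effectively multiplied element-wise by (A + I):
    hidden node j only receives input from gene j and its network neighbours.
    """
    p = len(x)
    hidden = []
    for j in range(p):
        total = bias[j]
        for i in range(p):
            # Mask is 1 for the self-connection or where the network has an edge.
            mask = 1 if (i == j or adjacency[i][j]) else 0
            total += x[i] * weights[i][j] * mask
        hidden.append(max(0.0, total))  # ReLU activation
    return hidden


def count_active_weights(adjacency):
    """Number of unmasked input-to-hidden weights, i.e. nonzero entries of A + I."""
    p = len(adjacency)
    return sum(1 for i in range(p) for j in range(p) if i == j or adjacency[i][j])


if __name__ == "__main__":
    # Toy 3-gene chain network: gene 0 - gene 1 - gene 2.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    w = [[1.0] * 3 for _ in range(3)]
    print(graph_embedded_layer([1.0, 2.0, 4.0], w, [0.0] * 3, adj))  # [3.0, 7.0, 6.0]
    print(count_active_weights(adj), "of", 3 * 3, "weights are trainable")
```

With a sparse network the number of trainable first-layer weights drops from p × p to roughly p plus the number of edges, which is one reason graph embedding is attractive when n << p.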
FIGURE 2 Comparison of the performance of G-EDNN with DNN, E-DNN, GEDFN, and other methods using receiver operating characteristic (ROC) curves. (A) shows the results for GSE28127, and (B) shows the results for GSE95496. The legend gives the AUC value of each method on each data set. G-EDNN has the highest AUC value, indicating that it outperforms the other methods on these data sets.
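The AUC values in the legends summarize each ROC curve as a single number. A minimal way to compute it, sketched here independently of the authors' evaluation code, uses the Mann-Whitney interpretation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` is illustrative.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    labels: 0/1 class labels; scores: predicted scores (higher = more likely class 1).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive correctly ranked above negative
            elif sp == sn:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is O(n_pos × n_neg), which is fine for cohort sizes like those in GEO series; for large data sets a rank-based or library implementation is preferable.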
FIGURE 3 ROC curves comparing G-EDNN performance at different sample sizes. The left panel shows the results for GSE28127, and the right panel shows the results for GSE95496. The legend gives each sample size and the corresponding AUC value. The AUC decreases as the sample size decreases.
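The caption does not say how the smaller samples were drawn. One common protocol, assumed here and not confirmed by the source, is class-stratified random subsampling: draw n samples while preserving the case/control ratio, then retrain and re-evaluate the AUC at each size. The hypothetical helper `stratified_subsample` below sketches only the sampling step.

```python
import random


def stratified_subsample(labels, n, seed=0):
    """Return sorted indices of a class-stratified random subsample of size n."""
    total = len(labels)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and len(labels)")
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    # Proportional quota per class, rounded down ...
    quotas, remainders = {}, []
    for y, idxs in by_class.items():
        exact = n * len(idxs) / total
        quotas[y] = int(exact)
        remainders.append((exact - quotas[y], y))
    # ... then hand out the leftover slots by largest fractional part.
    leftover = n - sum(quotas.values())
    for _, y in sorted(remainders, key=lambda t: t[0], reverse=True)[:leftover]:
        quotas[y] += 1
    chosen = []
    for y, idxs in by_class.items():
        chosen.extend(rng.sample(idxs, quotas[y]))
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0] * 60 + [1] * 40
    idx = stratified_subsample(labels, 50, seed=1)
    print(len(idx), sum(labels[i] for i in idx))  # 50 samples, 20 of class 1
```

Fixing the seed makes each subsample reproducible; repeating the draw with several seeds per size would give a sense of the AUC's variability at small n.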
PS values of the top 20 pathways.
| Rank | SNP | Gene | PS | Presence in PhenoScanner |
|---|---|---|---|---|
| 1 | rs6564261 | CFDP1 | 383.11 | Yes |
| 2 | rs11915851 | ITIH3 | 341.02 | No |
| 3 | rs17304995 | RFT1 | 302.16 | Yes |
| 4 | rs35671032 | PRKCD | 274.37 | No |
| 5 | rs1178032 | CENPB | 270.12 | No |
| 6 | rs59895335 | PRKCE | 259.24 | Yes |
| 7 | rs10781976 | BCAR1 | 257.39 | Yes |
| 8 | rs113487987 | DPYD | 252.41 | Yes |
| 9 | rs76214357 | ITIH1 | 250.26 | Yes |
| 10 | rs8100824 | LRRC25 | 243.75 | No |
| 11 | rs116793674 | MYL7 | 243.05 | No |
| 12 | rs12652555 | ERAP1 | 238.23 | No |
| 13 | rs56063308 | MAP3K7 | 238.05 | Yes |
| 14 | rs217361 | TMED4 | 237.97 | No |
| 15 | rs118052674 | CENPB | 225.43 | No |
| 16 | rs72697033 | RFX3 | 223.58 | Yes |
| 17 | rs1041608 | WDR5 | 221.85 | No |
| 18 | rs117104394 | NISCH | 220.17 | No |
| 19 | rs1471483 | MMRN1 | 220.02 | No |
| 20 | rs117259301 | VPS16 | 218.36 | Yes |