Ching Lee Koo, Mei Jing Liew, Mohd Saberi Mohamad, Abdul Hakim Mohamed Salleh.
Abstract
A major statistical and computational challenge in genetic epidemiology is to identify and characterize genes that interact with other genes and with environmental factors to influence complex multifactorial diseases. These gene-gene interactions are also referred to as epistasis, a phenomenon that traditional statistical methods cannot handle well because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have therefore been applied to identify susceptibility genes in common multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Finally, the paper discusses the strengths and weaknesses of each method in detecting gene-gene interactions in complex human disease.
Year: 2013 PMID: 24228248 PMCID: PMC3818807 DOI: 10.1155/2013/432375
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Types of gene-gene interactions [1].
Figure 2: Structure of biological neuron.
Figure 3: Basic neural model.
Figure 4: Classification of input generated by perceptron.
Figure 5: Neural network with one hidden layer of neurons.
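
As a rough illustration of the perceptron classification and the hidden-layer network named in the figure captions above, the following sketch trains a single perceptron on two toy problems. It is not code from the paper: the function names `predict` and `train_perceptron`, the learning rate, the epoch count, and the AND/XOR data are all assumptions chosen for illustration. The XOR case stands in for a purely interactive (epistasis-like) pattern that no single linear unit can classify, which is one motivation for adding a hidden layer.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum of the inputs exceeds 0, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, lr=1, epochs=20):
    """Classic perceptron update rule (illustrative settings, not tuned)."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            # Move the decision boundary only when the sample is misclassified.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


X = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Linearly separable toy problem: logical AND.
y_and = [0, 0, 0, 1]
w_and, b_and = train_perceptron(X, y_and)
and_predictions = [predict(w_and, b_and, x) for x in X]

# Pure interaction pattern: logical XOR (no single line separates it).
y_xor = [0, 1, 1, 0]
w_xor, b_xor = train_perceptron(X, y_xor)
xor_predictions = [predict(w_xor, b_xor, x) for x in X]

print("AND:", and_predictions)
print("XOR:", xor_predictions, "(cannot match", y_xor, ")")
```

The single unit learns AND but cannot reproduce XOR however long it trains, because no linear boundary separates the two XOR classes.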
Summary of studies detecting gene-gene interactions using neural network methods.
| No. | Author | Dataset | Description |
|---|---|---|---|
| (1) | Ritchie et al. [ | Epistasis model. | GPNN and BPNN were used to model gene-gene interactions using simulated data. The simulated data contain functional and nonfunctional SNPs that model the interaction between genes. |
| (2) | Tomita et al. [ | Childhood allergic asthma (CAA). | An artificial neural network was used with the parameter decreasing method to analyse susceptible SNPs among Japanese people. |
| (3) | Keedwell and Narayanan [ | Artificial data experiments, rat spinal cord, and yeast. | A genetic algorithm implemented together with neural networks discovers gene-gene interactions in temporal gene expression datasets by elucidating the regulatory connections and interactions between genes, proteins, and other gene products. |
| (4) | Motsinger et al. [ | Parkinson's disease. | GPNN was used to optimize the architecture of the neural network. This method can enhance the identification of gene combinations associated with Parkinson's disease. |
| (5) | Ritchie et al. [ | Alzheimer's disease, breast disease, colorectal disease, and prostate disease. | GPNN, which optimizes the architecture of the neural network, was used to detect gene-gene and gene-environment interactions in studies of human disease using a simulated dataset. |
| (6) | Motsinger-Reif et al. [ | Epistasis model. | GENN was used to discover gene-gene interactions in high-dimensional genetic epidemiological data affected by noise (for instance, genotyping error, missing data, phenocopy, and genetic heterogeneity). |
| (7) | Günther et al. [ | Two-locus disease models, multiplicative and epistasis model. | NNs were used in a simulation study to model different kinds of two-locus disease models by constructing six neural networks. |
| (8) | Turner et al. [ | Simulated human. | ATHENA was used to discover gene-gene interactions that influence complex human traits by integrating alternative tree-based crossover, back propagation, and domain knowledge. |
| (9) | Hardison and Motsinger-Reif [ | Genetic models. | QTGENN applies the GENN method to quantitative traits in various types of simulated genetic models. It was applied successfully to single-locus and two-locus models. |
Figure 6: Linear SVM with maximum-margin hyperplane.
Figure 7: Input space is mapped into feature space using the kernel method.
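
To make the mapping from input space to feature space in the kernel-method caption above more concrete, here is a minimal sketch. It is not taken from the paper: the degree-2 polynomial kernel, the explicit feature map `phi`, and the hand-picked weight vector are assumptions used only for illustration. It checks that the kernel value equals an inner product in the mapped space, and that XOR-labelled points (not linearly separable in the input space) are separated by a linear function of the mapped features.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (1 + x.z)^2 for 2-D inputs."""
    return (1 + x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product reproduces poly_kernel."""
    r2 = math.sqrt(2)
    return (1.0, r2 * x[0], r2 * x[1], x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1])


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, 1, 1, -1]

# Kernel trick: the kernel computes the feature-space inner product directly.
kernel_matches = all(
    math.isclose(poly_kernel(x, z), dot(phi(x), phi(z)))
    for x in points
    for z in points
)

# Hand-picked (not learned) weights in feature space implementing
# f(x) = x1 + x2 - 2*x1*x2 - 0.5, which is linear in phi(x).
w = (-0.5, 1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0, -math.sqrt(2))
separated = [1 if dot(w, phi(x)) > 0 else -1 for x in points]

print("kernel equals feature-space inner product:", kernel_matches)
print("feature-space classification:", separated)
```

An SVM would learn such weights by maximizing the margin; they are fixed by hand here only to show that a separating hyperplane exists after the mapping.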
Summary of studies detecting gene-gene interactions using support vector machine methods.
| No. | Author | Dataset | Description |
|---|---|---|---|
| (1) | Matchenko-Shimko and Dubé [ | Simulated disease. | Both SVM and an artificial neural network (ANN) were used to preselect combinations of SNPs in order to test the importance of potential interactions between genes in complex disease. |
| (2) | Chen et al. [ | Real prostate cancer genotyping. | SVM was applied with different combinatorial optimization methods: recursive feature addition, recursive feature elimination, local search, and genetic algorithm. |
| (3) | Özgür et al. [ | Prostate cancer. | An automatic method was proposed to extract known gene-disease associations and infer unknown ones using literature mining based on dependency parsing and support vector machines. |
| (4) | Shen et al. [ | Parkinson disease. | The authors employed a two-stage method using SVM with an L1 penalty to detect gene-gene interactions in complex human disease. |
| (5) | Ban et al. [ | Type 2 diabetes mellitus-related genes. | SVM was used to predict the importance of gene-gene interactions in T2D in Korean cohort studies. |
| (6) | Missiuro [ | | SVM was used in this research to detect interactions between genes in kinase families for |
| (7) | Fang and Chiu [ | COGA (genetics of alcoholism). | SVM-based PGMDR was introduced to study gene-gene and gene-covariate interactions in the presence or absence of main effects of genes. |
| (8) | Zhang et al. [ | Human cancer. | The binary matrix shuffling filter (BMSF), an efficient search scheme, was integrated with SVM to classify cancer tissue samples. |
| (9) | Marvel and Motsinger-Reif [ | Disease model, M1 and M2. | GESVM was applied to large datasets to select important features, parameters, or kernels in SVM. |
Summary of studies detecting gene-gene interactions using random forest methods.
| No. | Author | Dataset | Description |
|---|---|---|---|
| (1) | Lunetta et al. [ | H2M2, H4M2, H8M2, H16M2, H4M4, and H8M4. | RF was used as a screening procedure to identify top-ranked, truly associated SNPs that can cause disease without losing any interactions. |
| (2) | Jiang et al. [ | Three simulated disease models. | RF was used to distinguish cases from controls and to obtain the Gini importance, which measures the contribution of each SNP to classification performance. |
| (3) | Schwarz et al. [ | Crohn's disease. | RJ, a new method built on RF, was developed to enable fast processing of high-dimensional genome-wide data in the analysis of gene-gene interactions. |
| (4) | Liu et al. [ | NARAC1 and NARAC2. | RF was used to detect gene-gene interactions that contribute to RA susceptibility and to identify SNPs that classify subjects as anti-cyclic citrullinated protein positive RA patients or healthy controls. |
| (5) | Winham et al. [ | Five models. | Focuses on identifying rare gene-gene interactions and on how effectively RF detects gene-gene interaction effects in high-dimensional data. |
| (6) | Pan et al. [ | Bladder cancer. | MINGRF was proposed to improve the performance of RF in terms of accuracy and computational time. |
| (7) | Staiano et al. [ | Familial combined hyperlipidemia (FCH). | RF was used to identify gene-gene interactions involved in FCH. FCH increases patients' plasma triglyceride and/or total cholesterol levels and hence increases the risk of coronary heart disease. |
| (8) | Chen and Ishwaran [ | Colon cancer and ovarian cancer. | RSF was used as a new pathway hunting approach to detect gene correlation and genomic interactions in high-dimensional genomic data. |
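
Row (2) of the table above refers to Gini importance as a measure of each SNP's contribution to classification. As a hedged sketch of the underlying quantity, the code below computes the decrease in Gini impurity produced by a single split on one SNP. In a random forest this decrease would be accumulated over every node that splits on the SNP and averaged over trees; that aggregation is omitted here. The function names, the genotype coding (0, 1, 2 copies of the minor allele), and the toy data are assumptions for illustration, not data or code from the cited studies.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(genotypes, labels, threshold):
    """Impurity decrease when splitting samples on genotype <= threshold."""
    left = [y for g, y in zip(genotypes, labels) if g <= threshold]
    right = [y for g, y in zip(genotypes, labels) if g > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


# Toy data (hypothetical): 1 = case, 0 = control.
status = [1, 1, 1, 1, 0, 0, 0, 0]
snp_a = [2, 2, 1, 1, 0, 0, 0, 0]  # carriers are all cases
snp_b = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to case/control status

decrease_a = gini_decrease(snp_a, status, threshold=0)
decrease_b = gini_decrease(snp_b, status, threshold=0)

print("SNP A impurity decrease:", decrease_a)
print("SNP B impurity decrease:", decrease_b)
```

In this toy example the informative SNP removes all impurity at the split, while the uninformative SNP removes none, which is the contrast a Gini-based ranking relies on.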
Strengths and weaknesses of neural network, support vector machine, and random forest methods for detecting gene-gene interactions.
| Methods | Author | Strengths | Weaknesses |
|---|---|---|---|
| Neural network | Musani et al. [ | (i) NN is able to model the relationship between disease and single nucleotide polymorphisms (SNPs). | (i) Presence of black box |
| Support vector machine | Chen et al. [ | (i) SVM can deal with high-dimensional datasets. | (i) Presence of black box |
| Random forest (RF) | Upstill-Goddard et al. [ | (i) RF can uncover interactions among genes that do not exhibit strong main effects. | (i) Presence of black box |
| Random jungle (RJ) | | (i) RJ is able to analyze data on a genome-wide scale. | (i) If the main effects are weak, RJ fails to detect interactions. |