
Detecting Susceptibility to Breast Cancer with SNP-SNP Interaction Using BPSOHS and Emotional Neural Networks.

Xiao Wang, Qinke Peng, Yue Fan.

Abstract

Studies of the association between diseases and informative single nucleotide polymorphisms (SNPs) have received great attention. However, most of them use the whole set of informative SNPs and fail to consider SNP-SNP interactions, even though these interactions have been demonstrated in biological experiments. In this paper, we use a binary particle swarm optimization with hierarchical structure (BPSOHS) algorithm to improve the effectiveness of PSO in identifying SNP-SNP interactions. Furthermore, to use these SNP interactions in susceptibility analysis, we propose an emotional neural network (ENN) that treats SNP interactions as an emotional tendency. Unlike the conventional architecture, and in analogy with the emotional brain, this architecture provides a dedicated path for the emotional value, through which the SNP interactions can be considered more quickly and directly. The ENN lets us combine prior knowledge about SNP interactions with other influencing factors. Finally, the experimental results show that the proposed BPSOHS_ENN algorithm can detect informative SNP-SNP interactions and predict breast cancer risk with much higher accuracy than existing methods.

Year:  2016        PMID: 27294121      PMCID: PMC4879248          DOI: 10.1155/2016/5164347

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

Breast cancer is a major cause of death among women. Mutations in some genes (e.g., BRCA1 and BRCA2) are known to cause breast cancer [1]. However, only 5% of breast cancer cases carry these mutations, so these markers are of no use for most women. Fortunately, beyond these rare mutations, increasing evidence shows that breast cancer risk can be measured by SNPs, which are among the most common types of genetic variation in humans [2-4]. Furthermore, with the development of SNP microarrays and genome-wide association studies (GWAS), research on breast cancer and SNPs has become more popular. For a complex disease like breast cancer, the effect of an individual SNP is small, so researchers generally focus on the joint genetic effect of SNP combinations that may increase susceptibility to the cancer. However, most studies consider only the whole set of disease-related SNPs and treat all SNPs in the dataset equally [5]. In doing so, they fail to consider small-scale interactions among SNPs, although increasing evidence shows that SNP-SNP interactions exist [6-8]. A method that uses SNP-SNP interactions in breast cancer susceptibility identification is therefore needed. With the development of microarrays, one of the most important challenges in detecting SNP-SNP interactions is the combinatorial growth of candidate combinations as the number of SNPs increases. To identify these interactions more effectively, researchers have tried many different algorithms; for example, [9] uses the Genetic Algorithm (GA), [10] uses the Particle Swarm Optimization (PSO) algorithm, and [11] uses Polymorphism Interaction Analysis (PIA). These methods can detect SNP interactions in high-dimensional datasets; however, because of their randomly generated initial values and stochastic optimization process, they generally need many iterations and are easily trapped in local optima.
An improved algorithm for this problem is therefore essential. Here, we propose BPSOHS to improve the performance of identifying SNP interactions. Inspired by decision-making in real society, we divide the particle swarm into a “leader” part and a “follower” part, which have different evolutionary strategies. Our earlier work showed that this method is faster than other swarm intelligence algorithms and converges more easily to the globally optimal solution [12]. With this improved PSO, after encoding and matching, we can better identify the SNP interactions related to the cancer. The identified SNP-SNP interactions are an important tendency factor for breast cancer [13]. However, there is still no specific method for using this kind of information in cancer susceptibility analysis. ENN is a novel method inspired by the emotional processing of the human brain and is usually used in face recognition [14, 15]. In this paper, by designing the emotional value based on the SNP interactions, we extend the scope of ENN and use it for susceptibility analysis. In this novel ENN, the SNP interaction features are regarded as the prior tendency of the classification and influence its result, just as emotions do when people make choices. We thus propose a pipeline for the selection and use of SNP-SNP interactions. First, the BPSOHS algorithm is used to identify the informative SNP combinations related to the cancer. Then, we transform these new features into an additional vector for each sample. Finally, by treating the vector as the tendency emotional value in the ENN, the network can consider the SNP-SNP interactions more directly and quickly. From the output of the network, our method can measure the breast cancer risk for each sample.
The case-control study of 10000 people suggests that our pipeline can detect useful SNP interactions and consider them effectively in the analysis of susceptibility to breast cancer.

2. Material and Methods

2.1. Dataset Preparation

The dataset used in this paper was obtained from the breast cancer case-control study in [16]. It has 5000 controls and 5000 cases, with the SNPs selected by biological research. For convenience in the following discussion, we assume that the dataset has m samples and each sample has n SNPs. Each SNP takes a value in Σ = {1, 2, 3}, where 1 represents the major homozygous sites, 2 represents the minor homozygous sites, and 3 represents the heterozygous sites. Then $S_i = (s_{i1}, s_{i2}, \ldots, s_{in})$ denotes an individual sample and $D = (S_1, S_2, \ldots, S_m)$ represents the whole dataset.

2.2. Particle Swarm Optimization

PSO is a swarm algorithm developed by Kennedy and Eberhart [17]. In basic PSO, each solution corresponds to the position of a particle, and the velocity and direction of each particle are adjusted according to both its own best position and the best particle in the swarm. The idea is that particles “fly” through a multidimensional search space looking for the global optimum. Using both individual memory and global memory, a particle calculates and moves to its next position. The basic elements of PSO are described as follows:
$$v_{ij}^{t+1} = v_{ij}^{t} + c_1 r_1 \left(p_{ij} - x_{ij}^{t}\right) + c_2 r_2 \left(p_{gj} - x_{ij}^{t}\right),$$
$$x_{ij}^{t+1} = x_{ij}^{t} + v_{ij}^{t+1},$$
where $v_{ij}$ and $x_{ij}$ are the velocity and position of the ith particle in the jth bit, $r_1$, $r_2$ are random numbers within [0,1], and $c_1$, $c_2$ are the acceleration constants. $p_{ij}$ is the best position of the ith particle in the jth bit, and $p_{gj}$ is the jth bit of the best position of the best particle in the swarm. The velocity $v_{ij}$ is kept within $[-V_{max}, V_{max}]$ so that the particle stays within the range of the possible solution space.
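The update described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Kennedy-Eberhart rule without an inertia weight (the text defines only r1, r2, c1, c2, and V max); the function name `pso_step` and the default values for `c1`, `c2`, and `v_max` are hypothetical and are not taken from the paper.

```python
import random


def pso_step(x, v, p_best, g_best, c1=2.0, c2=2.0, v_max=4.0, rng=random):
    """One velocity/position update for a single particle.

    x, v, p_best, g_best are equal-length lists of floats.
    c1, c2, v_max are illustrative defaults, not values from the paper.
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # fresh random numbers in [0, 1)
        vj = (v[j]
              + c1 * r1 * (p_best[j] - x[j])   # pull toward the particle's own best
              + c2 * r2 * (g_best[j] - x[j]))  # pull toward the swarm's best
        vj = max(-v_max, min(v_max, vj))       # clamp to [-V_max, V_max]
        new_v.append(vj)
        new_x.append(x[j] + vj)
    return new_x, new_v
```

A particle that already sits on both best positions keeps its velocity, so `pso_step([1.0], [0.5], [1.0], [1.0])` moves it by exactly that velocity.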

2.3. Binary Particle Swarm Optimization with Hierarchical Structure

The basic PSO is designed only for continuous problems. To make it more widely applicable, in 1997 Kennedy and Eberhart proposed BPSO (binary particle swarm optimization) for discrete problems [18]. Unlike conventional PSO, in BPSO each bit of a particle can only take the state 0 or 1, as in the following function:
$$x_{ij} = \begin{cases} 1, & \text{if } \mathrm{rand}() < S(v_{ij}), \\ 0, & \text{otherwise.} \end{cases}$$
In this function, $S(v_{ij}) = 1/(1 + e^{-v_{ij}})$ denotes the probability of $x_{ij}$ choosing 1, and rand() is a random real number within [0,1]. In the basic PSO and BPSO algorithms, the particles are treated equally. However, this works very differently from real society. Sociological research points out that the leaders of a group tend to have a stronger say in decision-making, and most people prefer to follow their leaders, so the particles can also be given a status in the swarm [19]. Inspired by this idea, we propose BPSOHS [12] with two kinds of particles, “leader” particles and “follower” particles, which can be regarded as the “leaders” and “followers” in a society. At the beginning, K particles in the swarm are defined as “leaders,” while the others are “followers.” According to the fitness function, particles can switch between these two statuses, to make sure that the better particles are “leaders.” Thus, at the tth iteration, the followers move toward the leaders according to an update formula in which $S^{-1}(x) = \ln(x/(1 - x))$, L indicates the “leader,” F indicates the “follower,” and $\alpha_{LIF}$ is a parameter that limits the followers' speed. According to these formulas, when we update the position of a particle, we need to consider both its own position and the decisions of the “leaders.”
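As a minimal sketch of the binary step described above, the following Python code implements the sigmoid S(v), its inverse S⁻¹(x) = ln(x/(1 − x)), and the stochastic bit assignment of BPSO. It does not implement the leader/follower update of BPSOHS, whose formula is not reproduced in this text; the function names are hypothetical.

```python
import math
import random


def sigmoid(v):
    """S(v) = 1 / (1 + e^(-v)): probability that a bit is set to 1."""
    return 1.0 / (1.0 + math.exp(-v))


def logit(p):
    """S^{-1}(p) = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def bpso_bits(velocity, rng=random):
    """Draw each bit as 1 with probability S(v_j), else 0."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]
```

Large positive velocities make a bit almost certainly 1 and large negative velocities make it almost certainly 0, which is why the velocity clamp controls how much randomness remains in the search.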

2.4. Encoding Schemes and Performance Measurement

In BPSOHS, the dth SNP in the ith particle is represented by two bits: $X_{id} = (S_{id1}, S_{id2})$. $S_{id1}$ and $S_{id2}$ can each be 0 or 1, so a SNP can be represented as follows:
$$X_{id} = \begin{cases} (0,0), & \text{SNP not selected}, \\ (0,1), & \text{genotype 1}, \\ (1,0), & \text{genotype 2}, \\ (1,1), & \text{genotype 3}. \end{cases}$$
In this way, the combination $(S_{id1}, S_{id2})$ can represent the four different states of a SNP. The ith particle can be described by $X_i = (X_{i1}, X_{i2}, \ldots, X_{in})$. For example, if $X_i$ = {(0,0), (1,1), (0,0), (0,1)}, this particle selects the 2nd and 4th SNPs with genotypes 3 and 1, respectively. Thus, the phenotype of samples can be described by the particles. In the BPSOHS process, the value of the fitness function represents the importance of a particle. In the susceptibility analysis, we employ the SNP-SNP interactions to influence the tendency of the classification; thus, the ratio difference between the cases and controls is useful. Following [9], we measure the importance of SNP interactions with a fitness formula in which n(·) denotes the number of elements in a dataset, case ∩ C is the subset of the breast cancer case group with the specific interaction C, and control ∩ C is the subset of the control group with the specific interaction C. Using this fitness function and BPSOHS, we obtain some SNP-SNP interactions in significant association with breast cancer; the details of some of them are shown in Table 1.
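The two-bit encoding can be illustrated with a short Python sketch that decodes a particle and counts the samples carrying the encoded combination. The mapping of (0,0), (0,1), and (1,1) follows the example in the text; the assignment (1,0) → genotype 2 is an assumption made by elimination, and the helper names `decode_particle` and `count_matches` are hypothetical.

```python
# Assumed bit-pair mapping: (0,0) means the SNP is not selected; (0,1) and (1,1)
# follow the example in the text (genotypes 1 and 3); (1,0) -> 2 is inferred.
BITS_TO_GENOTYPE = {(0, 0): None, (0, 1): 1, (1, 0): 2, (1, 1): 3}


def decode_particle(particle):
    """Return {snp_index (1-based): genotype} for the SNPs the particle selects."""
    selected = {}
    for d, bits in enumerate(particle, start=1):
        genotype = BITS_TO_GENOTYPE[tuple(bits)]
        if genotype is not None:
            selected[d] = genotype
    return selected


def count_matches(samples, combo):
    """Count samples (lists of genotypes 1/2/3) that carry every genotype in combo."""
    return sum(all(s[d - 1] == g for d, g in combo.items()) for s in samples)


# The example particle from the text: 2nd SNP with genotype 3, 4th SNP with genotype 1.
combo = decode_particle([(0, 0), (1, 1), (0, 0), (0, 1)])
```

Applying `count_matches` separately to the case group and the control group gives the two counts that a case-versus-control fitness measure would compare.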
Table 1

SNP interactions about breast cancer.

SNP (gene) (chromosome/position) | SNP type | Case number | Control number | Difference | Odds ratio | 95% CI | P value
rs3020314 (ESR1) (6/152270672); rs500760 (PGR) (11/100909991) | CC-AA | 1230 | 1404 | 174 | 0.836 | 0.7644–0.9135 | 0.001
rs3020314 (ESR1) (6/152270672); rs2017591 (STS) (X/7158114) | CC-TC | 984 | 1152 | 168 | 0.818 | 0.7436–0.9008 | 0.003
rs3020314 (ESR1) (6/152270672); rs2077647 (ESR1) (6/152129077) | CT-GG | 1055 | 1213 | 158 | 0.835 | 0.7602–0.9170 | 0.008
rs500760 (PGR) (11/100909991); rs2017591 (STS) (X/7158114) | AG-CC | 1326 | 1476 | 150 | 0.862 | 0.7896–0.9404 | 0.001
rs2077647 (ESR1) (6/152129077); rs2017591 (STS) (X/7158114) | AG-TC | 1117 | 1263 | 146 | 0.851 | 0.7762–0.9333 | 0.040
rs3020314 (ESR1) (6/152270672); rs660149 (PGR) (11/100934314); rs11571171 (PGR) (11/100974887) | CT-CC-TT | 602 | 509 | 93 | 1.208 | 1.0657–1.3687 | 0.003
rs3020314 (ESR1) (6/152270672); rs500760 (PGR) (11/100909991); rs2017591 (STS) (X/7158114) | CC-AA-TC | 571 | 699 | 128 | 0.793 | 0.7048–0.8929 | 0.012
rs6269 (COMT) (22/19949952); rs2175898 (ESR1) (6/152196952); rs660149 (PGR) (11/100934314) | AA-AG-CC | 1035 | 1158 | 123 | 0.866 | 0.7877–0.9522 | 0.003
rs3020314 (ESR1) (6/152270672); rs1543404 (ESR1) (6/152428838); rs2747652 (ESR1) (6/152437016); rs9340799 (ESR1) (6/152163381); rs1709182 (ESR1) (6/152175357); rs9478249 (ESR1) (6/152194431); rs660149 (PGR) (11/100934314); rs11571171 (PGR) (11/100974887); rs858518 (SHBG) (17/7533025); rs858524 (SHBG) (17/7511287); rs2017591 (STS) (X/7158114) | CT-TC-CT-AG-TT-TG-CC-TT-TC-AG-TT | 7 | 1 | 6 | 7.01 | 0.8800–151.68 | 0.034
Besides these interactions among 2 or 3 SNPs, we also find some interactions involving more SNPs. For example, the combination rs3020314 (CT)-rs1543404 (TC)-rs2747652 (CT)-rs9340799 (AG)-rs1709182 (TT)-rs9478249 (TG)-rs660149 (CC)-rs11571171 (TT)-rs858518 (TC)-rs858524 (AG)-rs2017591 (TT) appears 7 times in the case group and only once in the control group.
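The odds ratios for the 2- and 3-SNP rows of Table 1 can be checked from the case and control counts alone. The sketch below assumes 5000 cases and 5000 controls (as stated in Section 2.1) and a Woolf-type (log odds ratio) 95% confidence interval; the text does not say which interval method the authors used, so this choice and the function name `odds_ratio_ci` are assumptions.

```python
import math


def odds_ratio_ci(case_with, control_with, n_case=5000, n_control=5000, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval from a 2x2 table.

    case_with / control_with: number of cases / controls carrying the combination.
    The interval method is an assumption; it is not stated in the paper.
    """
    a, b = case_with, n_case - case_with            # cases with / without
    c, d = control_with, n_control - control_with   # controls with / without
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# First row of Table 1 (rs3020314 CC with rs500760 AA): 1230 cases, 1404 controls.
# Table 1 reports an odds ratio of 0.836 with 95% CI 0.7644 to 0.9135 for this row.
or_, lo, hi = odds_ratio_ci(1230, 1404)
```

For very sparse combinations such as the 11-SNP row (7 cases, 1 control), a log-based interval is unreliable, so this sketch should not be expected to reproduce the interval reported for that row.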

2.5. Emotional Neural Networks Based on SNP-SNP Interactions

Neural networks are popular in bioinformatics and disease susceptibility analysis. However, basic fully connected feedforward neural networks treat all inputs at the same level. Although the weights adjust according to the importance of the input data, this architecture still fails to use the internal structure of the data. Emotional neural networks are a novel method for dealing with this problem. According to related work [20], brain research indicates that there is a short path for emotional signals in the brain, which makes the feedback more direct and quick. A simple model of the emotional brain is shown in Figure 1.
Figure 1

Simple modeling of emotional brain.

Following the work of Lotfi and Akbarzadeh-T [14], we can use the neural network in Figure 2 to simulate this architecture. The long path, from the thalamus through the sensory cortex and the orbitofrontal cortex (OFC) to the amygdala, is the path for the general input. The short path, from the thalamus directly to the amygdala, is the path for the tendency emotional input.
Figure 2

Emotional neural networks for susceptibility analysis to breast cancer.

In the related works on ENN architecture [21, 22], the input of the sensory cortex and the amygdala is the same, which differs from the real process in the brain. Generally, in the human emotional brain, the emotional value is a priori tendency information [23, 24]. So, in this paper, based on BPSOHS and the SNP-SNP interaction detection, we use this kind of a priori information in susceptibility analysis. As shown in Table 1, a specific phenotype occurs with different probability in the case and control classes, so these SNP-SNP interactions can influence the cancer risk of the samples. Assume that we use p SNP-SNP interactions as the emotional value; then, the tendency vector is $S_e = (s_{e1}, s_{e2}, \ldots, s_{eq}, \ldots, s_{ep})$. According to (7), we can get the final sequence, $S_{all} = (S, S_e) = (s_1, s_2, \ldots, s_n, s_{e1}, \ldots, s_{eq}, \ldots, s_{ep})$, where $s_{eq}$ is the odds ratio of the qth interaction, which measures its importance. Inspired by the ENN architecture for image recognition [25], we design the output of the thalamus as $S_e = (s_{e1}, \ldots, s_{eq}, \ldots, s_{ep})$, and the final output of the neural network is calculated by (8), in which v denotes the weights for the amygdala: $v_i$, $i \in [1, n + p]$, are the weights of $S_{all}$, and $v_i$, $i \in [n + p + 1, n + 2p]$, are the weights of the emotional value $S_e$; w denotes the weights for the OFC. In this architecture, the parameters are updated with learning rates α and β and decay rate γ. $R_1$ is a binary reinforcement input, and $R_0$ is the reward value calculated according to the BELPIC model [26]. In the training process, $R_1 = 1$ is used when the goal is presented, and $R_1 = 0$ is used otherwise. In the test process, the final output E is calculated according to (8). In this way, we treat the SNP interactions as the emotional value and use these tendency features to influence the final output of the neural network.
With this architecture, we obtain a score that measures the risk of breast cancer; the details are shown in the next section.

3. Results and Discussion

3.1. Evaluation Function

To measure the performance of the SNP-SNP interactions and the novel neural network, we need an effective evaluation function. Following other work on classification, in this paper we use Sn (sensitivity), Sp (specificity), Acc (accuracy), RR (risk ratio), and OR (odds ratio); they are defined as follows:
$$Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad Acc = \frac{TP + TN}{TP + TN + FP + FN},$$
$$RR = \frac{TP/(TP + FN)}{FP/(FP + TN)}, \qquad OR = \frac{TP \cdot TN}{FP \cdot FN},$$
where TP (True Positive) is the number of positive samples that are predicted as positive, TN (True Negative) is the number of negative samples that are predicted as negative, FP (False Positive) is the number of negative samples that are predicted as positive, and FN (False Negative) is the number of positive samples that are predicted as negative.
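The five measures can be computed directly from a confusion matrix, as in the Python sketch below. The Sn, Sp, and Acc definitions are standard; the RR and OR forms used here, RR = Sn/(1 − Sp) and OR = (TP·TN)/(FP·FN), are assumptions chosen because they reproduce most of the values reported in Table 2. The function name and the confusion-matrix counts in the example are hypothetical, scaled so that Sn and Sp match the BPSOHS_ENN row.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, risk ratio, and odds ratio.

    RR and OR follow the forms assumed in the lead-in; no guard is included
    for zero denominators (e.g. fp == 0 or fn == 0).
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    rr = sn / (1 - sp)                 # positive rate in cases / positive rate in controls
    odds_ratio = (tp * tn) / (fp * fn)
    return {"Sn": sn, "Sp": sp, "Acc": acc, "RR": rr, "OR": odds_ratio}


# Hypothetical counts per 10000 cases and 10000 controls with Sn = 0.7268, Sp = 0.7106.
m = classification_metrics(tp=7268, tn=7106, fp=2894, fn=2732)
```

With these counts the function gives RR and OR values that agree with the 2.5114 and 6.5322 reported for BPSOHS_ENN in Table 2 to about three decimal places.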

3.2. Experimental Results and Comparison

To show that the SNP-SNP interaction features are useful for susceptibility analysis, we use 5-fold cross-validation to evaluate the performance of our method. We compare the performance of several popular basic neural networks (BP (Back Propagation), RBF (Radial Basis Function), and PNN (Pattern Recognition Neural networks)) with and without the novel information. The details of the experiment are shown in Table 2.
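For readers unfamiliar with the evaluation protocol, 5-fold cross-validation can be sketched as follows. This is a generic illustration using only the Python standard library; the shuffling, the seed, and the function name `k_fold_indices` are assumptions, since the text does not describe how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed, then dealt into k folds;
    each fold serves as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# With 10000 samples, each fold holds out 2000 samples and trains on 8000.
splits = list(k_fold_indices(10000, k=5))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the five scores averaged.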
Table 2

The performance of ENN compared with basic neural networks.

Method | Sn | Sp | Acc | RR | OR
BP | 0.5486 | 0.4864 | 0.5175 | 1.0681 | 1.1505
RBF | 0.5394 | 0.5152 | 0.5273 | 1.1126 | 1.2445
PNN | 0.5448 | 0.5066 | 0.5257 | 1.1508 | 1.2808
BPSOHS_BP | 0.6462 | 0.5606 | 0.6034 | 1.4706 | 2.3302
BPSOHS_RBF | 0.6290 | 0.5826 | 0.6058 | 1.5069 | 2.3664
BPSOHS_PNN | 0.6406 | 0.5888 | 0.6147 | 1.5579 | 2.5523
BPSOHS_ENN | 0.7268 | 0.7106 | 0.7187 | 2.5114 | 6.5322
Figure 3 shows that the methods that use the SNP-SNP interaction features achieve better performance than those that do not use this small-scale information. By considering this tendency information in its specific architecture, our BPSOHS_ENN achieves the best performance among these methods.
Figure 3

Bar graph of the performance of different methods.

To show that our method is suitable for this problem, we also compare it with some recently published methods. Most of these studies only report useful SNP interactions, and only part of the samples carry these interactions. So, to keep the comparison fair, we filter out the results based on interactions that appear in less than 10% of the samples, and in Table 3 we compare our result with the best of the remaining ones.
Table 3

The performance of ENN compared with published methods.

Method | Sn | Sp | Acc | RR | OR
IGA [9] | - | - | 0.5351 | 1.08 | 1.17
IPSO [16] | - | - | 0.49 | 0.82 | 0.79
IBBA [27] | 0.132 | 0.940 | 0.619 | - | 2.384
BPSOHS_ENN | 0.7268 | 0.7106 | 0.7187 | 2.5114 | 6.5322
In Table 3, the IBBA method uses a dataset of 7 SNPs from CXCL12-related genes, while IGA and IPSO use the same dataset as this paper. The comparison with results on a different dataset also supports the effectiveness of our method. Using the small-scale features and the other features in the same neural network architecture makes the susceptibility analysis much easier. The tendency of an SNP combination gives the classifier a more reasonable initial value, so considering this a priori information in a fast and direct way helps improve the performance of the neural network. The experimental results show that our method can consider these useful SNP-SNP interactions together, which works much better than using single barcodes. Moreover, Table 3 shows that our method is competitive with the other published methods. Compared with them, our method has significant advantages in distinguishing cases from controls, which is useful for predicting and preventing the cancer.

4. Conclusions

Susceptibility analysis for a disease is normally based on a group of SNPs, but the small-scale relationships within the group are rarely considered. In this paper, we use BPSOHS to pick out the neglected small-scale SNP-SNP interactions. To take these interactions into account, we propose a neural network with a specific partially connected architecture. By simulating how the human brain handles emotional values, the network lets the related SNPs work together with the other normal features to calculate the probability that a sample suffers from breast cancer. From the cancer-related SNP interactions and the output of the novel neural network, we can measure the cancer risk of the samples, which is useful for preventing possible cancer.
  18 in total

1.  A modified backpropagation learning algorithm with added emotional coefficients.

Authors:  Adnan Khashman
Journal:  IEEE Trans Neural Netw       Date:  2008-11

2.  Particle swarm optimization algorithm for analyzing SNP-SNP interaction of renin-angiotensin system genes against hypertension.

Authors:  Shyh-Jong Wu; Li-Yeh Chuang; Yu-Da Lin; Wen-Hsien Ho; Fu-Tien Chiang; Cheng-Hong Yang; Hsueh-Wei Chang
Journal:  Mol Biol Rep       Date:  2013-05-22       Impact factor: 2.316

3.  Identification of a combination of SNPs associated with Graves' disease using swarm intelligence.

Authors:  Bin Wei; QinKe Peng; QuanWei Zhang; ChenYao Li
Journal:  Sci China Life Sci       Date:  2011-02-14       Impact factor: 6.038

4.  Gene expression microarray classification using PCA-BEL.

Authors:  Ehsan Lotfi; Azita Keshavarz
Journal:  Comput Biol Med       Date:  2014-09-26       Impact factor: 4.589

5.  Exploring SNP-SNP interactions and colon cancer risk using polymorphism interaction analysis.

Authors:  Julie E Goodman; Leah E Mechanic; Brian T Luke; Stefan Ambs; Stephen Chanock; Curtis C Harris
Journal:  Int J Cancer       Date:  2006-04-01       Impact factor: 7.396

Review 6.  SNP array technology: an array of hope in breast cancer research.

Authors:  C C Ho; K S Mun; R Naidu
Journal:  Malays J Pathol       Date:  2013-06       Impact factor: 0.656

Review 7.  Practical emotional neural networks.

Authors:  Ehsan Lotfi; M-R Akbarzadeh-T
Journal:  Neural Netw       Date:  2014-07-08

8.  Amplification of distant estrogen response elements deregulates target genes associated with tamoxifen resistance in breast cancer.

Authors:  Pei-Yin Hsu; Hang-Kai Hsu; Xun Lan; Liran Juan; Pearlly S Yan; Jadwiga Labanowska; Nyla Heerema; Tzu-Hung Hsiao; Yu-Chiao Chiu; Yidong Chen; Yunlong Liu; Lang Li; Rong Li; Ian M Thompson; Kenneth P Nephew; Zelton D Sharp; Nameer B Kirma; Victor X Jin; Tim H-M Huang
Journal:  Cancer Cell       Date:  2013-08-12       Impact factor: 31.743

9.  An improved PSO algorithm for generating protective SNP barcodes in breast cancer.

Authors:  Li-Yeh Chuang; Yu-Da Lin; Hsueh-Wei Chang; Cheng-Hong Yang
Journal:  PLoS One       Date:  2012-05-18       Impact factor: 3.240

10.  The SNP rs6500843 in 16p13.3 is associated with survival specifically among chemotherapy-treated breast cancer patients.

Authors:  Rainer Fagerholm; Marjanka K Schmidt; Sofia Khan; Sajjad Rafiq; William Tapper; Kristiina Aittomäki; Dario Greco; Tuomas Heikkinen; Taru A Muranen; Peter A Fasching; Wolfgang Janni; Richard Weinshilboum; Christian R Loehberg; John L Hopper; Melissa C Southey; Renske Keeman; Annika Lindblom; Sara Margolin; Arto Mannermaa; Vesa Kataja; Georgia Chenevix-Trench; Diether Lambrechts; Hans Wildiers; Jenny Chang-Claude; Petra Seibold; Fergus J Couch; Janet E Olson; Irene L Andrulis; Julia A Knight; Montserrat García-Closas; Jonine Figueroa; Maartje J Hooning; Agnes Jager; Mitul Shah; Barbara J Perkins; Robert Luben; Ute Hamann; Maria Kabisch; Kamila Czene; Per Hall; Douglas F Easton; Paul D P Pharoah; Jianjun Liu; Diana Eccles; Carl Blomqvist; Heli Nevanlinna
Journal:  Oncotarget       Date:  2015-04-10