Yasuyuki Tomita, Shuta Tomida, Yuko Hasegawa, Yoichi Suzuki, Taro Shirakawa, Takeshi Kobayashi, Hiroyuki Honda.
Abstract
BACKGROUND: Screening of various gene markers, such as single nucleotide polymorphisms (SNPs), and the correlation between these markers and the development of multifactorial diseases have been studied previously. Here, we propose a susceptibility marker-selectable artificial neural network (ANN) for predicting the development of allergic disease.
Year: 2004 PMID: 15339344 PMCID: PMC518959 DOI: 10.1186/1471-2105-5-120
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Structure of the ANN. For the analysis of 25 SNPs, 50 input layer units were used. The number of hidden layer units was varied from 6 (the usual setting) to 10 to optimize the ANN for the highest possible prediction accuracy. The output layer had a single unit. Because the number of connection weight parameters depends on the number of units in the connected layers, the analysis of 25 SNPs with 6 hidden layer units requires 306 connection weight parameters (50 × 6 + 6 × 1).
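The count of 306 follows from a fully connected 50-6-1 network without bias terms: 50 × 6 input-to-hidden weights plus 6 × 1 hidden-to-output weights. The Python sketch below reproduces that count for 6 to 10 hidden units and runs one forward pass. It is a sketch under stated assumptions: the two-unit genotype coding (`GENOTYPE_CODE`), the sigmoid activations and the random initial weights are illustrative choices, not the authors' documented settings, and training is omitted.

```python
import math
import random

# Hypothetical two-unit coding per SNP (an assumption, not necessarily the
# authors' scheme): each unit flags the presence of one of the two alleles.
GENOTYPE_CODE = {"11": (1.0, 0.0), "12": (1.0, 1.0), "22": (0.0, 1.0)}


def count_weights(n_input, n_hidden, n_output=1):
    """Inter-layer connection weights of a fully connected
    input-hidden-output network (bias terms are not counted)."""
    return n_input * n_hidden + n_hidden * n_output


def encode(genotypes):
    """Map per-SNP genotype labels to two input units per SNP."""
    return [unit for g in genotypes for unit in GENOTYPE_CODE[g]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_ih, w_ho):
    """One forward pass: sigmoid hidden layer, then a single sigmoid output unit."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_ih]
    return sigmoid(sum(h * w for h, w in zip(hidden, w_ho)))


n_snps = 25
n_input = 2 * n_snps  # 50 input layer units
for n_hidden in range(6, 11):
    print(n_hidden, count_weights(n_input, n_hidden))  # 6 -> 306, ..., 10 -> 510

# Untrained random weights for a 50-6-1 network, so the output is not meaningful.
rng = random.Random(0)
w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_input)] for _ in range(6)]
w_ho = [rng.uniform(-0.5, 0.5) for _ in range(6)]
x = encode([rng.choice(list(GENOTYPE_CODE)) for _ in range(n_snps)])
score = forward(x, w_ih, w_ho)  # a value in (0, 1)
```

Counting only inter-layer weights matches the caption's figure; adding one bias per hidden and output unit would raise the 50-6-1 count to 313.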
Table 1. Polymorphisms used in the present study.
| Polymorphism | D.F.a | P-valueb |
|---|---|---|
| -509 C/T | 2 | 0.6299 |
| -571 C/A | 2 | 0.1074 |
| -590 C/T | 2 | 0.9085 |
| 148 G/A (Val50Ile) | 2 | 0.0155 |
| 924 T/C (synonymous) | 1 | 0.5603 |
| 265 A/G (Arg16Gly) | 2 | 0.0541 |
| 2964 G/A | 2 | 0.7881 |
| 811 A/G (Glu237Gly) | 1 | 0.5382 |
| 2860 G/A | 2 | 0.9623 |
| -580 T/C | 2 | 0.7868 |
| 108 C/A | 1 | 0.0036 |
| 2534 A/G | 2 | 0.5664 |
| 1146 C/A | 2 | 0.1154 |
| 2446 A/C (Ile816Leu) | 2 | 0.7988 |
| 3214 T/A (Cys1072Ser) | 2 | 0.2997 |
| 3473 C/T (Pro1158Leu) | 2 | 0.2779 |
| 1155 A/G | 2 | 0.1056 |
| 4266 G/A | 2 | 0.0581 |
| 912 G/A | 2 | 0.5020 |
| 1692 G/A | 2 | 0.6993 |
| 4896 C/T | 2 | 0.7205 |
| 1289 C/A | 2 | 0.1211 |
| 1337 C/T | 1 | 0.2398 |
| 1526 G/A | 1 | 0.7366 |
| 329 G/A (Arg110Gln) | 2 | 0.7924 |
a Degrees of freedom (see text). b Association between each SNP and childhood allergic asthma (CAA) was evaluated as a P-value calculated with the χ2 test.
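For a cases-versus-controls genotype table with k genotype groups, the χ2 test has k - 1 degrees of freedom, so the D.F. values of 2 and 1 in the polymorphism table presumably correspond to three genotype groups and to two (pooled) groups. The Python sketch below computes Pearson's χ2 statistic for such a 2 × k table and its P-value, using the closed-form upper tails for 1 and 2 degrees of freedom. The genotype counts in the usage lines are made up, and the grouping rule behind a D.F. of 1 is an assumption.

```python
import math


def chi2_statistic(case_counts, control_counts):
    """Pearson chi-square statistic for a 2 x k table of genotype-group counts
    (row 1 = cases, row 2 = controls). Assumes every group is observed."""
    n_case, n_ctrl = sum(case_counts), sum(control_counts)
    total = n_case + n_ctrl
    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        column = a + b
        for observed, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_pvalue(stat, df):
    """Upper-tail P-value of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def genotype_test(case_counts, control_counts):
    """Return (df, P-value); 3 genotype groups give df = 2, 2 pooled groups df = 1."""
    df = len(case_counts) - 1
    return df, chi2_pvalue(chi2_statistic(case_counts, control_counts), df)


# Made-up counts: three genotype groups, then the two rarer groups pooled.
print(genotype_test([30, 50, 20], [40, 45, 15]))
print(genotype_test([30, 70], [40, 60]))
```

For other degrees of freedom, a general chi-square survival function (for example from SciPy) would replace the two closed forms.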
Figure 2. Diagnostic prediction with 25 SNPs by ANN (a) and logistic regression (LR; b), and with the 10 selected SNPs by ANN (c) and LR (d). Prediction results for the evaluation data are shown. Gray and white bars represent the frequencies of case and control subjects, respectively.
Table 2. Accuracy, sensitivity and specificity of ANN and LR.
(a) With 25 SNPs.
| Model | Data | Accuracy [%] | Sensitivity [%] | Specificity [%] |
|---|---|---|---|---|
| ANN | learning | 98.8 | 99.1 | 98.4 |
| ANN | evaluation | 73.3 | 74.4 | 72.1 |
| LR | learning | 68.8 | 69.2 | 68.3 |
| LR | evaluation | 48.3 | 45.3 | 51.2 |
(b) With 10 SNPs selected by the parameter decreasing method (PDM).
| Model | Data | Accuracy [%] | Sensitivity [%] | Specificity [%] |
|---|---|---|---|---|
| ANN | learning | 97.7 | 98.0 | 97.5 |
| ANN | evaluation | 74.4 | 77.9 | 70.9 |
| LR | learning | 59.4 | 57.8 | 60.9 |
| LR | evaluation | 47.7 | 48.3 | 47.1 |
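Accuracy, sensitivity and specificity are the usual confusion-matrix ratios with case subjects as the positive class: accuracy is the fraction of all subjects classified correctly, sensitivity the fraction of cases predicted as cases, and specificity the fraction of controls predicted as controls. The sketch below assumes the single ANN output is turned into a class with a 0.5 cut-off; that threshold, the function names and the example scores are hypothetical.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count (TP, FN, TN, FP); cases are labelled 1 and controls 0.
    A subject is predicted to be a case when its score is >= threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_case = score >= threshold
        if label == 1 and predicted_case:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity, all in percent."""
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)  # cases predicted as cases
    specificity = 100.0 * tn / (tn + fp)  # controls predicted as controls
    return accuracy, sensitivity, specificity


# Hypothetical ANN outputs for 4 cases (label 1) and 4 controls (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.91, 0.72, 0.55, 0.30, 0.12, 0.40, 0.65, 0.70]
counts = confusion_counts(labels, scores)
print(counts, classification_metrics(*counts))
```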
Figure 3. Effect of the number of input variables on ANN model accuracy during the PDM procedure. Closed triangles represent the average accuracy for the learning and evaluation data. Closed squares represent the rate of case subjects N'/N, where N' is the number of cases whose genotype pattern matches that of at least one control (N = 172 subjects).
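The ratio N'/N can be read as the fraction of cases that cannot be told apart from every control using the current input SNPs alone, because an identical genotype pattern also occurs among the controls. A minimal sketch, assuming each subject is stored as a tuple of genotype strings and `snp_idx` lists the SNPs still used as inputs (both hypothetical representations):

```python
def matched_case_rate(cases, controls, snp_idx):
    """N'/N: fraction of cases whose genotype pattern over the SNPs in
    snp_idx is identical to that of at least one control."""
    control_patterns = {tuple(subject[i] for i in snp_idx) for subject in controls}
    n_matched = sum(
        1 for subject in cases if tuple(subject[i] for i in snp_idx) in control_patterns
    )
    return n_matched / len(cases)


# Toy genotypes for three SNPs (made up for illustration).
cases = [("CC", "AG", "TT"), ("CT", "GG", "TT"), ("TT", "AA", "CT")]
controls = [("CC", "AG", "CT"), ("CT", "GG", "TT")]

print(matched_case_rate(cases, controls, [0, 1, 2]))  # 1 of 3 cases matched
print(matched_case_rate(cases, controls, [0, 1]))     # 2 of 3 after dropping SNP 3
```

Because two patterns that agree on a set of SNPs also agree on any subset of it, N'/N can only stay the same or grow as input variables are removed.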
Table 3. Ranking of SNPs selected by PDM.
| SNP (P-valuea) | n (/5)b | Point (/50)c |
|---|---|---|
| 0.0155 | 5 | 47 |
| 0.5664 | 5 | 38 |
| 0.1074 | 5 | 27 |
| 0.7205 | 4 | 27 |
| 0.6993 | 2 | 18 |
| 0.7988 | 5 | 17 |
| 0.7924 | 4 | 16 |
| 0.5020 | 3 | 16 |
| 0.7881 | 2 | 14 |
| 0.9085 | 2 | 13 |
| .......... | .......... | .......... |
| 0.2779 | 2 | 12 |
| 0.0541 | 4 | 8 |
| 0.2997 | 2 | 8 |
| 0.1211 | 1 | 6 |
| 0.0581 | 1 | 5 |
| 0.0036 | 2 | 2 |
| 0.7366 | 1 | 1 |
The 10 SNPs above the dotted line were used in the subsequent experiments.
a P-value calculated with the χ2 test.
b Number of trials (out of 5) in which the SNP remained among the last 10 input variables during the PDM procedure.
c Points for SNPs remaining among the last 10 input variables during the PDM procedure. In each PDM trial, 1 to 10 points were assigned according to the order of significance, and the points were totaled over the 5 trials.
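The n and point columns of the ranking above can be tallied as follows. This is a sketch that assumes each PDM trial is summarized as a list of SNP identifiers ordered from most to least significant (the last remaining input first), and that the most significant of the last 10 receives 10 points and the tenth receives 1; the footnote gives the 1 to 10 range but not its direction, so that direction is an assumption. Sorting by points and breaking ties by n appears consistent with the order of the tied rows in the table.

```python
from collections import defaultdict


def pdm_ranking(trial_rankings, top_k=10):
    """Tally n (trials in which a SNP is among the last top_k inputs) and
    points (top_k for the most significant down to 1), summed over trials."""
    n = defaultdict(int)
    points = defaultdict(int)
    for ranking in trial_rankings:  # one list per PDM trial, most significant first
        for position, snp in enumerate(ranking[:top_k]):
            n[snp] += 1
            points[snp] += top_k - position
    # Higher total points first; ties broken by larger n.
    order = sorted(points, key=lambda snp: (points[snp], n[snp]), reverse=True)
    return [(snp, n[snp], points[snp]) for snp in order]


# Two toy trials with top_k = 3 (identifiers are hypothetical).
trials = [["A", "B", "C", "D"], ["B", "A", "D", "C"]]
for row in pdm_ranking(trials, top_k=3):
    print(row)
```

With 5 trials and top_k = 10, the maximum score is 5 × 10 = 50, matching the "/50" in the column header.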
Figure 4. Reconstruction of the ANN model using the SNPs listed in Table 3. Closed triangles represent the average accuracy over all learning and evaluation data.
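Table 4. Selection of effective combinations among the 10 selected SNPs, evaluated with two criteria: P-value and effective combination value (ECV).
| Combination | 2-SNP | 3-SNP |
|---|---|---|
| All combinations | 90 | 360 |
| P-value < 0.05a | 13 | 72 |
| ECV < 1b / also P-value < 0.05c | 47/11 | 43/43 |
| ECV < 0.5b / also P-value < 0.05c | 27/10 | 25/25 |
| ECV < 0.1b / also P-value < 0.05c | 6/5 | 3/3 |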
a Number of combinations satisfying P-value < 0.05.
b Number of combinations satisfying ECV < 1, 0.5 and 0.1, respectively.
c Number of combinations satisfying both the respective ECV condition (ECV < 1, 0.5 or 0.1) and P-value < 0.05.
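The second number in each ECV row counts combinations that pass both criteria. As a minimal check, the sketch below applies the same thresholds to the (P-value, ECV2) pairs copied from the two-SNP interaction table further down, which lists only combinations with P-value < 0.05 and ECV2 < 0.5. It yields 10 for ECV < 0.5 and 5 for ECV < 0.1, matching the 2-SNP entries 27/10 and 6/5; the first numbers (27 and 6) would need all 90 combinations, which are not listed. The function name is illustrative.

```python
# (P-value of the combination, ECV2) for the ten 2-SNP combinations in the
# two-SNP interaction table (P-value < 0.05 and ECV2 < 0.5).
listed = [
    (0.01461, 0.0290), (0.00030, 0.0344), (0.02858, 0.0567), (0.02345, 0.0648),
    (0.03073, 0.0850), (0.04917, 0.1241), (0.01752, 0.2070), (0.00271, 0.2426),
    (0.00345, 0.4442), (0.00689, 0.4896),
]


def count_effective(combos, ecv_max, p_max=0.05):
    """Return (number with ECV < ecv_max, number with ECV < ecv_max and P < p_max)."""
    ecv_ok = [(p, ecv) for p, ecv in combos if ecv < ecv_max]
    both = [(p, ecv) for p, ecv in ecv_ok if p < p_max]
    return len(ecv_ok), len(both)


for cut in (0.5, 0.1):
    print(cut, count_effective(listed, cut)[1])  # 0.5 -> 10, 0.1 -> 5
```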
Table 5. Number of effective combinations (N) and their concentration ratio.
2-SNP combination
| 10 SNPa : 15 SNPb | 2 : 0 | 1 : 1 | 0 : 2 |
|---|---|---|---|
| Number of combinations | 90 | 300 | 210 |
| Nc | 10 | 23 | 3 |
| Ratio of N | 0.28 | 0.64 | 0.08 |
| Ratio of combinations | 0.15 | 0.5 | 0.35 |
| Concentration ratio | 1.85 | 1.28 | 0.24 |
3-SNP combination
| 10 SNPa : 15 SNPb | 3 : 0 | 2 : 1 | 1 : 2 | 0 : 3 |
|---|---|---|---|---|
| Number of combinations | 360 | 2025 | 3150 | 1365 |
| Nc | 25 | 82 | 64 | 39 |
| Ratio of N | 0.12 | 0.39 | 0.30 | 0.19 |
| Ratio of combinations | 0.05 | 0.29 | 0.46 | 0.20 |
| Concentration ratio | 2.28 | 1.33 | 0.67 | 0.94 |
a Selected by PDM.
b Excluding the 10 SNPs selected by PDM.
c Number of 2-SNP combinations satisfying P-value < 0.05 and ECV2 < 0.5, or of 3-SNP combinations satisfying P-value < 0.05 and ECV3 < 0.5.
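The totals above (90/300/210 and 360/2025/3150/1365) can be reproduced if a k-SNP combination is taken as an unordered set of k - 1 conditioning SNPs plus one tested SNP, as in the two-SNP interaction table, where SNP 2 is tested within a genotype of SNP 1. That convention is inferred from the counts, not stated in the text. The concentration ratio is then the share of effective combinations N in a given split divided by that split's share of all combinations, which reproduces the last row from the N values. Function names are illustrative.

```python
from math import comb


def n_combinations(k, m_sel, n_sel=10, n_rest=15):
    """k-SNP combinations with m_sel SNPs from the PDM-selected set and
    k - m_sel from the remaining SNPs, counted as an unordered set of
    k - 1 conditioning SNPs plus one tested SNP (an inferred convention)."""
    m_rest = k - m_sel
    total = 0
    if m_sel >= 1:  # tested SNP drawn from the selected set
        total += n_sel * comb(n_sel - 1, m_sel - 1) * comb(n_rest, m_rest)
    if m_rest >= 1:  # tested SNP drawn from the remaining set
        total += n_rest * comb(n_sel, m_sel) * comb(n_rest - 1, m_rest - 1)
    return total


def concentration_ratios(effective, totals):
    """(N_i / sum N) / (total_i / sum total) for each selected:remaining split."""
    n_sum, t_sum = sum(effective), sum(totals)
    return [(n / n_sum) / (t / t_sum) for n, t in zip(effective, totals)]


totals2 = [n_combinations(2, m) for m in (2, 1, 0)]     # splits 2:0, 1:1, 0:2
totals3 = [n_combinations(3, m) for m in (3, 2, 1, 0)]  # splits 3:0, 2:1, 1:2, 0:3
print(totals2, totals3)
print([round(r, 2) for r in concentration_ratios([10, 23, 3], totals2)])
print([round(r, 2) for r in concentration_ratios([25, 82, 64, 39], totals3)])
```

A ratio above 1 means that effective combinations are over-represented in that split relative to chance, as for the combinations drawn only from the 10 PDM-selected SNPs.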
Table 6. Two-SNP interactions among the 10 selected SNPs (P-value < 0.05 and ECV2 < 0.5).
| SNP 1 genotype | P-valuea | P-value of SNP 1b | P-value of SNP 2c | Pb × Pc | ECV2d |
|---|---|---|---|---|---|
| TT | 0.01461 | 0.7205 | 0.6993 | 0.5038 | 0.0290 |
| AG+GG | 0.00030 | 0.5664 | 0.0155 | 0.0088 | 0.0344 |
| AA | 0.02858 | 0.6993 | 0.7205 | 0.5038 | 0.0567 |
| GG | 0.02345 | 0.5020 | 0.7205 | 0.3617 | 0.0648 |
| TT | 0.03073 | 0.7205 | 0.5020 | 0.3617 | 0.0850 |
| GA | 0.04917 | 0.6993 | 0.5664 | 0.3961 | 0.1241 |
| GG+GA | 0.01752 | 0.7881 | 0.1074 | 0.0846 | 0.2070 |
| TT | 0.00271 | 0.7205 | 0.0155 | 0.0112 | 0.2426 |
| GG | 0.00345 | 0.5020 | 0.0155 | 0.0078 | 0.4442 |
| CT+TT | 0.00689 | 0.9085 | 0.0155 | 0.0141 | 0.4896 |
a P-value of SNP 2 among subjects carrying the given SNP 1 genotype. The D.F. of SNP 2 is the same as in Table 1.
b P-value of SNP 1 calculated alone.
c P-value of SNP 2 calculated alone.
d Effective combination value, ECV2 = Pa / (Pb × Pc), where Pa, Pb and Pc are the P-values defined in footnotes a, b and c.
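ECV2 compares the P-value of the combination with the product of the two single-SNP P-values; a value well below 1 can be read as the genotype combination being more strongly associated with disease than the two SNPs taken separately would suggest. The sketch below recomputes ECV2 for two rows of the table and applies the P-value < 0.05 and ECV2 < 0.5 criterion from the table title; the function names are illustrative. Because the tabulated P-values are rounded, recomputed values can differ from the table in the last digit (about 0.0342 versus 0.0344 for the AG+GG row).

```python
def ecv2(p_combination, p_snp1, p_snp2):
    """Effective combination value: Pa / (Pb * Pc)."""
    return p_combination / (p_snp1 * p_snp2)


def is_effective(p_combination, ecv, p_max=0.05, ecv_max=0.5):
    """Selection criterion from the table title: P < 0.05 and ECV2 < 0.5."""
    return p_combination < p_max and ecv < ecv_max


# First "TT" row and the "AG+GG" row of the table: (Pa, Pb, Pc).
rows = {"TT (first row)": (0.01461, 0.7205, 0.6993), "AG+GG": (0.00030, 0.5664, 0.0155)}
for genotype, (pa, pb, pc) in rows.items():
    value = ecv2(pa, pb, pc)
    print(genotype, round(value, 4), is_effective(pa, value))
```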
Figure 5. (a) Distribution of IL-4Rα (148 G/A) genotypes among subjects with the CysLT2 (2534 A/G) AG or GG genotype (P = 0.00030). (b) Distribution of C3 (1692 G/A) genotypes among subjects with the IL-10 (-571 C/A) CA genotype and the IL-4 (-590 C/T) CT genotype (P = 0.00426). Gray and white bars represent the frequencies of case and control subjects, respectively.