Qian Xie, Luke D Ratnasinghe, Huixiao Hong, Roger Perkins, Ze-Zhong Tang, Nan Hu, Philip R Taylor, Weida Tong.
Abstract
BACKGROUND: Systematic evaluation and study of single nucleotide polymorphisms (SNPs), made possible by high-throughput genotyping technologies and bioinformatics, promises to provide breakthroughs in the understanding of complex diseases. Understanding how the millions of SNPs in the human genome are involved in conferring susceptibility or resistance to disease, or in rendering a drug efficacious or toxic in the individual, is a major goal of the relatively new field of pharmacogenomics. Esophageal squamous cell carcinoma is a high-mortality cancer with complex etiology and progression involving both genetic and environmental factors. We examined the association between esophageal cancer risk and patterns of 61 SNPs in a case-control study of a population from Shanxi Province in North Central China, which has among the highest rates of esophageal squamous cell carcinoma in the world.
Year: 2005 PMID: 16026601 PMCID: PMC1637030 DOI: 10.1186/1471-2105-6-S2-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the Decision Forest for SNPs (DF-SNPs) method. DF-SNPs has five components: (1) data pre-processing; (2) tree development; (3) multiple tree development; (4) tree combination; and (5) statistical testing. DF-SNPs not only produces a classifier but also identifies the significant SNPs, SNP types, and SNP patterns.
Figure 2. Overview of a decision tree. A binary tree starts from the root node. At each node, the best SNP genotype is selected to separate the cases from the controls using an "IF-THEN" rule. The SNP types used along the path of an "IF-THEN" rule form a SNP pattern.
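The relationship between tree paths and SNP patterns can be illustrated with a minimal Python sketch. The `Node` class, the `paths` helper, the tree shape, and the leaf labels are all hypothetical and chosen only for illustration; they are not taken from the DF-SNPs implementation. Each root-to-leaf path is read off as a list of genotype conditions, which corresponds to one "IF-THEN" rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal node: tests whether the sample's genotype at `snp` equals `genotype`.
    snp: Optional[str] = None
    genotype: Optional[str] = None
    yes: Optional["Node"] = None   # branch taken when the test is true
    no: Optional["Node"] = None    # branch taken when the test is false
    label: Optional[str] = None    # set only on leaves: "case" or "control"


def paths(node, prefix=()):
    """Yield (conditions, label) for every root-to-leaf path.

    Each condition is (snp, genotype, must_match).
    """
    if node.label is not None:
        yield list(prefix), node.label
        return
    yield from paths(node.yes, prefix + ((node.snp, node.genotype, True),))
    yield from paths(node.no, prefix + ((node.snp, node.genotype, False),))


# Hypothetical two-level tree; the structure and leaf labels are illustrative only.
tree = Node(
    "GADD45B E1122", "homozygous common",
    yes=Node("COMT rs4818", "homozygous common",
             yes=Node(label="case"), no=Node(label="control")),
    no=Node(label="control"),
)

rules = list(paths(tree))
for conditions, label in rules:
    print("IF", conditions, "THEN", label)
```

In this toy tree, the first path combines two SNP types, so it would count as a 2-SNP pattern.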
Figure 3. Misclassifications versus the number of combined trees in a DF-SNPs model. The number of misclassifications decreases as more trees are combined. The 10-tree model produces significantly fewer misclassifications than the first tree alone.
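Why combining trees can reduce misclassifications is easiest to see on a toy example. The sketch below assumes a simple majority vote, which may differ from the combination rule DF-SNPs actually uses; the predictions and true labels are invented for illustration (1 = case, 0 = control).

```python
def majority_vote(predictions):
    """Combine per-tree 0/1 predictions (one list per tree) by majority vote."""
    n_trees = len(predictions)
    return [1 if 2 * sum(votes) > n_trees else 0 for votes in zip(*predictions)]


def misclassifications(predicted, truth):
    """Count samples whose predicted label differs from the true label."""
    return sum(p != t for p, t in zip(predicted, truth))


# Invented data: three hypothetical trees, five samples.
truth = [1, 1, 0, 0, 1]
tree_predictions = [
    [1, 0, 0, 0, 1],  # wrong on sample 2
    [1, 1, 1, 0, 1],  # wrong on sample 3
    [0, 1, 0, 0, 1],  # wrong on sample 1
]

combined = majority_vote(tree_predictions)
single_errors = [misclassifications(p, truth) for p in tree_predictions]
combined_errors = misclassifications(combined, truth)
```

Each hypothetical tree errs on a different sample, so the vote outperforms every individual tree. The benefit depends on the trees making different mistakes.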
Statistically significant SNPs (P < 0.05) relevant to esophageal squamous cell carcinoma. In 2000 runs of 10-fold cross-validation, nine SNPs were found to be relevant to esophageal squamous cell carcinoma. Their frequencies exceeded the critical value of 0.028 at the 5% level of significance (P < 0.05).
| No | Gene | RS number | Frequency | Gene function |
| 1 | GADD45B | E1122 | 0.186 | DNA repair |
| 2 | NQO1 | rs1800566 | 0.069 | Phase II |
| 3 | adh1b_55 | | 0.064 | Alcohol-related |
| 4 | ERCC5 | rs17655 | 0.064 | DNA repair |
| 5 | COMT | rs4818 | 0.052 | Phase II |
| 6 | GADD45B | rs14384 | 0.049 | DNA repair |
| 7 | ercc5_55 | | 0.048 | DNA repair |
| 8 | CYP1A1 | rs1048943 | 0.035 | Phase I |
| 9 | GPX1 | rs1800668 | 0.030 | Phase II |
RS – Reference SNP
Statistically significant SNP types relevant to esophageal squamous cell carcinoma. In 2000 runs of 10-fold cross-validation, 14 SNP types were found to be relevant to esophageal squamous cell carcinoma. Their frequencies exceeded the critical value of 0.0095 at the 5% level of significance (P < 0.05). When these SNP types were applied to the SNP data set, the first five had odds ratios (ORs) whose 95% confidence intervals (CIs) do not cross OR = 1.
| No | OR | 95% CI | Gene | RS number | Genotype |
| 1 | 1.66 | 1.16 – 2.37 | GADD45B | E1122 | Homozygous common |
| 2 | 1.50 | 1.05 – 2.15 | ercc5_55 | | Homozygous common |
| 3 | 1.44 | 1.00 – 2.06 | COMT | rs4818 | Homozygous common |
| 4 | 0.55 | 0.38 – 0.79 | GADD45B | E1122 | Heterozygous |
| 5 | 0.45 | 0.23 – 0.87 | NQO1 | rs1800566 | Homozygous variant |
| 6 | 1.64 | 0.70 – 3.87 | ERCC5 | rs17655 | Homozygous variant |
| 7 | 1.57 | 0.76 – 3.26 | GPX1 | rs1800668 | Homozygous variant |
| 8 | 1.43 | 0.98 – 2.09 | NQO1 | rs1800566 | Heterozygous |
| 9 | 1.05 | 0.73 – 1.52 | ERCC2 | rs1052559 | Heterozygous |
| 10 | 1.00 | 0.51 – 1.97 | GADD45B | rs14384 | Homozygous variant |
| 11 | 0.99 | 0.51 – 1.91 | CYP1A1 | rs1048943 | Homozygous variant |
| 12 | 0.87 | 0.60 – 1.26 | cyp1a2_5 | | Heterozygous |
| 13 | 0.83 | 0.57 – 1.20 | CYP1A2 | rs2472304 | Heterozygous |
| 14 | 0.59 | 0.27 – 1.32 | adh1b_55 | | Heterozygous |
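An odds ratio and its 95% CI, as reported in the table above, can be computed from a 2x2 table of case and control counts. The sketch below uses the standard log-based (Woolf) interval; the source does not state which interval method was used, so this choice is an assumption. The function names and the example counts are also hypothetical, not data from the study.

```python
import math


def odds_ratio_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Return (OR, lower, upper) using the log-based (Woolf) interval.

    `*_with` counts subjects carrying the SNP type; `*_without` counts the rest.
    All four counts must be non-zero.
    """
    odds_ratio = (case_with * control_without) / (case_without * control_with)
    # Standard error of ln(OR).
    se = math.sqrt(1 / case_with + 1 / case_without
                   + 1 / control_with + 1 / control_without)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below OR = 1."""
    return lower > 1 or upper < 1


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(60, 40, 40, 60)
differentiating = excludes_one(lower, upper)
```

An OR above 1 with an interval that excludes 1 suggests the SNP type is more common among cases. An OR below 1 with such an interval suggests it is more common among controls.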
n-SNP patterns identified by DF-SNPs (n = 1, 2, ..., 20).
| Pattern Length | Total number of patterns | Number of patterns above the critical value | Number of patterns with differentiating ability | Critical Value |
| 1 | 72 | 14 | 5 | 0.0095223 |
| 2 | 578 | 52 | 15 | 0.0002364 |
| 3 | 5558 | 3560 | 379 | 0.0000375 |
| 4 | 28562 | 5015 | 204 | 0.0000180 |
| 5 | 88124 | 10898 | 169 | 0.0000119 |
| 6 | 175401 | 15529 | 59 | 0.0000095 |
| 7 | 255460 | 16639 | 8 | 0.0000088 |
| 8 | 291469 | 14217 | 0 | 0.0000094 |
| 9 | 265793 | 9738 | 0 | 0.0000117 |
| 10 | 205505 | 4512 | 0 | 0.0000173 |
| 11 | 138306 | 1853 | 0 | 0.0000291 |
| 12 | 81279 | 560 | 0 | 0.0000536 |
| 13 | 41785 | 241 | 0 | 0.0001036 |
| 14 | 18557 | 70 | 0 | 0.0002274 |
| 15 | 7047 | 40 | 0 | 0.0005389 |
| 16 | 2436 | 16 | 0 | 0.0013277 |
| 17 | 735 | 15 | 0 | 0.0031564 |
| 18 | 180 | 17 | 0 | 0.0092945 |
| 19 | 38 | 11 | 0 | 0.0300700 |
| 20 | 9 | 3 | 0 | 0.0997076 |
Statistically significant 2-SNP patterns relevant to esophageal squamous cell carcinoma. In 2000 runs of 10-fold cross-validation, 52 2-SNP patterns were found to be relevant to esophageal cancer. Their frequencies exceeded the critical value of 0.00024 at the 5% level of significance (P < 0.05). When these 2-SNP patterns were applied to the SNP data set, 15 of them had odds ratios (ORs) whose 95% confidence intervals (CIs) do not cross OR = 1.
| No | OR | 95% CI | Site 1 gene | Site 1 RS number | Site 1 genotype | Site 2 gene | Site 2 RS number | Site 2 genotype |
| 1 | 2.33 | 1.29 – 4.19 | NQO1 | rs1800566 | Heterozygous | GADD45B | E1122 | Homozygous common |
| 2 | 2.09 | 1.41 – 3.09 | GADD45B | E1122 | Homozygous common | COMT | rs4818 | Homozygous common |
| 3 | 2.03 | 1.37 – 3.00 | ercc5_55 | | Homozygous common | GADD45B | E1122 | Homozygous common |
| 4 | 0.61 | 0.38 – 0.98 | cyp1a2_5 | | Heterozygous | GADD45B | E1122 | Heterozygous |
| 5 | 0.60 | 0.39 – 0.94 | GADD45B | E1122 | Heterozygous | ERCC2 | rs1799787 | Heterozygous |
| 6 | 0.60 | 0.38 – 0.94 | GADD45B | E1122 | Heterozygous | COMT | rs4818 | Heterozygous |
| 7 | 0.58 | 0.37 – 0.92 | GADD45B | E1122 | Heterozygous | CYP1A2 | rs2472304 | Heterozygous |
| 8 | 0.58 | 0.37 – 0.89 | GSTM3 | rs1537234 | Heterozygous | GADD45B | E1122 | Heterozygous |
| 9 | 0.57 | 0.36 – 0.90 | GADD45B | E1122 | Heterozygous | COMT | rs4818 | Homozygous common |
| 10 | 0.57 | 0.37 – 0.87 | LIG1 | rs20579 | Heterozygous | GADD45B | E1122 | Heterozygous |
| 11 | 0.54 | 0.35 – 0.84 | GADD45B | E1122 | Heterozygous | ERCC2 | rs1052559 | Homozygous common |
| 12 | 0.45 | 0.28 – 0.72 | GADD45B | rs14384 | Homozygous common | GADD45B | E1122 | Heterozygous |
| 13 | 0.40 | 0.26 – 0.63 | GADD45B | E1122 | Heterozygous | ERCC5 | rs17655 | Homozygous common |
| 14 | 0.22 | 0.06 – 0.73 | adh1b_55 | | Homozygous variant | GADD45B | E1122 | Heterozygous |
| 15 | 0.16 | 0.04 – 0.62 | NQO1 | rs1800566 | Homozygous variant | GADD45B | E1122 | Homozygous common |
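Applying a 2-SNP pattern to a genotype data set amounts to counting how many cases and controls carry both genotypes at once. The sketch below shows one way to build those counts, which can then feed an odds-ratio calculation. The pattern is taken from row 1 of the table above, but the four samples, the data layout, and the function names are hypothetical.

```python
def matches(genotypes, pattern):
    """True when the sample carries every (snp, genotype) pair in the pattern."""
    return all(genotypes.get(snp) == genotype for snp, genotype in pattern)


def contingency(samples, pattern):
    """Return (case_with, case_without, control_with, control_without)."""
    case_with = case_without = control_with = control_without = 0
    for genotypes, is_case in samples:
        carries = matches(genotypes, pattern)
        if is_case and carries:
            case_with += 1
        elif is_case:
            case_without += 1
        elif carries:
            control_with += 1
        else:
            control_without += 1
    return case_with, case_without, control_with, control_without


# Pattern from row 1 of the table above.
pattern = [("NQO1 rs1800566", "heterozygous"),
           ("GADD45B E1122", "homozygous common")]

# Hypothetical samples: (genotypes by SNP, is_case). Not data from the study.
samples = [
    ({"NQO1 rs1800566": "heterozygous", "GADD45B E1122": "homozygous common"}, True),
    ({"NQO1 rs1800566": "heterozygous", "GADD45B E1122": "heterozygous"}, True),
    ({"NQO1 rs1800566": "homozygous common", "GADD45B E1122": "homozygous common"}, False),
    ({"NQO1 rs1800566": "heterozygous", "GADD45B E1122": "homozygous common"}, False),
]

counts = contingency(samples, pattern)
```

The same helper works for patterns of any length, because `matches` simply requires every listed genotype to be present.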