Yuan Jiang, Jennifer S Brennan, Rose Calixte, Yunxiao He, Epiphanie Nyirabahizi, Heping Zhang.
Abstract
Existing methods for analyzing rare variant data focus on collapsing a group of rare variants into a single common variant; collapsing is based on an intuitive function of the rare variant genotype information, such as an indicator function or a weighted sum. It is more natural, however, to take into account the single-nucleotide polymorphism (SNP) interactions informed directly by the data. We propose a novel tree-based method that automatically detects SNP interactions and generates candidate markers from the original pool of rare variants. In addition, we take advantage of the 200 phenotype replications in the Genetic Analysis Workshop 17 data to assess the candidate markers by means of repeated logistic regressions. This new approach shows potential in rare variant analysis. We correctly identify the association between gene FLT1 and phenotype Affect, although our results also contain false positives. Our analyses are performed without knowledge of the underlying simulating model.
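The two collapsing functions mentioned in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as minor-allele counts (0, 1, or 2) and that the weights are supplied by the analyst; the function names and the coding are illustrative assumptions, not taken from the paper.

```python
def collapse_indicator(genotypes):
    """Indicator collapsing: 1 if any rare variant in the group carries
    at least one minor allele, otherwise 0.
    Assumes genotypes are minor-allele counts (0, 1, or 2)."""
    return 1 if any(g > 0 for g in genotypes) else 0


def collapse_weighted_sum(genotypes, weights):
    """Weighted-sum collapsing: sum of minor-allele counts, each multiplied
    by an analyst-chosen weight (the weights here are hypothetical)."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(w * g for w, g in zip(weights, genotypes))
```

Both functions reduce a group of rare variants to a single value per individual and treat each SNP separately, which is the limitation the tree-based approach is meant to address.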
Year: 2011 PMID: 22373418 PMCID: PMC3287825 DOI: 10.1186/1753-6561-5-S9-S102
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. Example tree. An example tree fitted using phenotype Y and a group of SNP variables X1, X2, X3, and X4. X1 and X2 are used as the partitioning variables, yielding three leaf nodes. X1, X2, X3, and X4 are the searched SNPs, and X1 and X2 are the final SNPs used in the tree. The branches of this tree are grouped into two categories depending on the prediction value (case or control) of each leaf node. A bilevel marker is then constructed according to this categorization of branches.
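The construction of a bilevel marker from a fitted tree might be sketched as follows. The tree is hard-coded and hypothetical: the split rules (carrier versus non-carrier) and the leaf predictions are assumptions chosen only to give three leaves on X1 and X2, since the caption does not state the actual splits.

```python
# Hypothetical tree with three leaves, splitting on X1 and then X2.
# The split rules and leaf labels are assumptions for illustration only.
EXAMPLE_TREE = {
    "snp": "X1",
    "right": {"leaf": "case"},           # carriers of X1
    "left": {                            # non-carriers of X1
        "snp": "X2",
        "right": {"leaf": "case"},       # carriers of X2
        "left": {"leaf": "control"},     # non-carriers of X2
    },
}


def leaf_label(tree, genotype):
    """Follow the branches for one individual and return the leaf prediction.
    genotype maps SNP names to minor-allele counts (0, 1, or 2)."""
    node = tree
    while "leaf" not in node:
        node = node["right"] if genotype[node["snp"]] > 0 else node["left"]
    return node["leaf"]


def bilevel_marker(tree, genotype):
    """Two-level marker: 1 if the individual falls in a branch whose leaf
    predicts case, 0 if the leaf predicts control."""
    return 1 if leaf_label(tree, genotype) == "case" else 0
```

In this sketch X3 and X4 can be present in the genotype (they are searched SNPs) but do not affect the marker, because only X1 and X2 are used to split the tree.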
Populations and corresponding clusters
| Population | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| CEPH-1 | 45 | 0 | 0 |
| CEPH-2 | 44 | 1 | 0 |
| Tuscan | 62 | 0 | 0 |
| Tuscan, additional | 4 | 0 | 0 |
| Denver Chinese | 0 | 87 | 0 |
| Denver Chinese, additional | 0 | 20 | 0 |
| Han Chinese 1 | 0 | 25 | 0 |
| Han Chinese 2 | 0 | 36 | 0 |
| Han Chinese, additional | 0 | 48 | 0 |
| Japanese 1 | 0 | 31 | 0 |
| Japanese 2 | 0 | 41 | 0 |
| Japanese, additional | 0 | 33 | 0 |
| Luhya | 0 | 0 | 90 |
| Luhya, additional | 0 | 0 | 18 |
| Yoruba 1 | 0 | 0 | 40 |
| Yoruba 2 | 0 | 0 | 47 |
| Yoruba, additional | 0 | 0 | 25 |
Figure 2. Numbers of SNPs that were searched (left panel) and finally used (right panel) in the 570 markers. In the left-hand panel, there were relatively few cases with more than 100 searched SNPs; we omit these from the plot for a clearer comparison. The right-hand panel shows the numbers of SNPs that were used to split the tree. Note that most of the 570 markers consist of fewer than 30 SNPs, which simplifies the interpretation of the generated markers.
Ten most frequently significant markers (Bonferroni correction)
| Marker | Chromosome | Number of SNPs | Frequency (%) | Included genes |
|---|---|---|---|---|
| 405 | 13 | 18 | 10 | |
| 529 | 19 | 39 | 8 | |
| 528 | 19 | 30 | 5 | |
| 207 | 7 | 10 | 5 | |
| 439 | 16 | 21 | 4 | |
| 296 | 10 | 53 | 4 | |
| 533 | 19 | 13 | 3 | |
| 363 | 11 | 65 | 3 | |
| 361 | 11 | 79 | 3 | |
| 83 | 2 | 76 | 3 | |
Ten most frequently significant markers (FDR control)
| Marker | Chromosome | Number of SNPs | Frequency (%) | Included genes |
|---|---|---|---|---|
| 529 | 19 | 39 | 18 | |
| 405 | 13 | 18 | 17 | |
| 528 | 19 | 30 | 11 | |
| 439 | 16 | 21 | 9 | |
| 471 | 17 | 13 | 8 | |
| 361 | 11 | 79 | 8 | |
| 207 | 7 | 10 | 8 | |
| 44 | 1 | 38 | 8 | |
| 525 | 19 | 20 | 7 | |
| 461 | 17 | 17 | 7 | |
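The "Frequency (%)" columns in the two tables above can be read as the share of phenotype replicates in which a marker was declared significant. A minimal sketch of that tally follows, assuming one p-value per marker per replicate is already available from the logistic regressions. The significance level of 0.05 and the use of the Benjamini-Hochberg procedure for FDR control are assumptions; the text here states only "Bonferroni correction" and "FDR control".

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Bonferroni: reject when p <= alpha / m, with m the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method):
    find the largest rank k with p_(k) <= k * q / m and reject the
    k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


def significance_frequency(pvalues_by_replicate, reject_fn):
    """Percentage of replicates in which each marker is declared significant.
    pvalues_by_replicate: one list of per-marker p-values per replicate."""
    n_replicates = len(pvalues_by_replicate)
    n_markers = len(pvalues_by_replicate[0])
    counts = [0] * n_markers
    for pvalues in pvalues_by_replicate:
        for j, rejected in enumerate(reject_fn(pvalues)):
            counts[j] += int(rejected)
    return [100.0 * c / n_replicates for c in counts]
```

With 570 markers and 200 replicates, `pvalues_by_replicate` would hold 200 lists of 570 p-values each. Because the Benjamini-Hochberg thresholds are never stricter than the Bonferroni threshold, a marker's frequency under FDR control is at least as high as under Bonferroni, which matches the pattern in the two tables.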