Kuang-Yu Liu, Jennifer Lin, Xiaobo Zhou, Stephen T C Wong.
Abstract
We applied the alternating decision tree (ADTree) method to the last three replicates of the Aipotu, Danacca, Karangar, and NYC populations in the Genetic Analysis Workshop 14 (GAW14) Problem 2 simulated dataset. Using the 12 binary phenotypes and sex as inputs and Kofendrerd Personality Disorder disease status as the outcome of ADTree-based classifiers, we derived a new quantitative trait from average prediction scores and used it for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller, easier-to-interpret classification rules. In this application, we compared four modeling strategies formed by crossing two boosting loss functions (log or exponential) with two tree-generation types (a full alternating decision tree or a classic boosting decision tree). These four strategies were applied to the founders in each population to construct four classifiers, which were then applied to every study participant. To compute an average prediction score for each subject with a given trait profile, this process was repeated for 10 runs of 10-fold cross-validation, and the standardized prediction scores from the 10 runs were averaged. The averaged scores were then used in expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. QTL analyses based on the four models (a full alternating decision tree or a classic boosting decision tree, each paired with a log or exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9.
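The score-averaging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `train_mean_difference` is a hypothetical placeholder learner standing in for an ADTree classifier, folds are assigned by a shuffled round-robin split, and "standardized" is assumed to mean a z-score computed within each run over all subjects before the runs are averaged.

```python
import random
import statistics
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[float]], float]


def train_mean_difference(X: Sequence[Sequence[float]], y: Sequence[int]) -> Scorer:
    """Hypothetical stand-in for an ADTree learner (NOT the paper's method):
    weights each feature by the case-minus-control difference in its mean."""
    cases = [x for x, label in zip(X, y) if label == 1]
    controls = [x for x, label in zip(X, y) if label == 0]

    def col_mean(rows, j):
        return statistics.fmean(r[j] for r in rows) if rows else 0.0

    w = [col_mean(cases, j) - col_mean(controls, j) for j in range(len(X[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def standardize(scores: List[float]) -> List[float]:
    """Z-score a list of prediction scores (assumed meaning of 'standardized')."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]


def averaged_cv_scores(X, y, train, n_runs: int = 10, n_folds: int = 10, seed: int = 0) -> List[float]:
    """Repeat k-fold cross-validation n_runs times; each subject is scored by a
    model that did not see it, scores are standardized per run, then averaged."""
    n = len(X)
    totals = [0.0] * n
    rng = random.Random(seed)
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        run_scores = [0.0] * n
        for k in range(n_folds):
            held_out = idx[k::n_folds]
            held_set = set(held_out)
            train_idx = [i for i in idx if i not in held_set]
            model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in held_out:
                run_scores[i] = model(X[i])
        for i, z in enumerate(standardize(run_scores)):
            totals[i] += z
    return [t / n_runs for t in totals]
```

Replacing the placeholder with a real ADTree learner only requires a `train` function that returns a scoring callable; the repetition and averaging logic stays the same.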
Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as measures of relevance, we identified all relevant phenotypes in all four populations except phenotype b in the Karangar population, and the results suggested a subgroup structure consistent with the latent traits used in the model. In conclusion, our findings suggest that the ADTree method may offer a more accurate representation of disease status, allowing better detection of linkage evidence.
Year: 2005 PMID: 16451591 PMCID: PMC1866804 DOI: 10.1186/1471-2156-6-S1-S132
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Figure 1. The AdaBoost algorithm.
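Because the figure itself is not included, the sketch below shows standard discrete AdaBoost with one-feature decision stumps on binary inputs and labels coded as -1/+1. It is illustrative only: the stump form, the `n_rounds` parameter, and the clipping of the weighted error are assumptions rather than details from the paper. AdaBoost can be viewed as stagewise minimization of the exponential loss; the log-loss alternative mentioned in the abstract corresponds to logistic-loss boosting variants (such as LogitBoost), which are not shown here.

```python
import math
from typing import List, Sequence, Tuple

# A stump (j, v) votes +1 when feature j equals v, else -1 (assumed stump form).
Stump = Tuple[int, int]


def stump_predict(stump: Stump, x: Sequence[int]) -> int:
    j, v = stump
    return 1 if x[j] == v else -1


def adaboost(X: Sequence[Sequence[int]], y: Sequence[int], n_rounds: int) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost on binary features; y must be coded -1/+1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(n_rounds):
        # Pick the stump with the smallest weighted training error.
        best, best_err = (0, 1), float("inf")
        for j in range(len(X[0])):
            for v in (0, 1):
                err = sum(wi for wi, xi, yi in zip(w, X, y) if stump_predict((j, v), xi) != yi)
                if err < best_err:
                    best, best_err = (j, v), err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # clip to avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        # Up-weight misclassified examples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble: List[Tuple[float, Stump]], x: Sequence[int]) -> float:
    """Weighted vote; its sign is the predicted class."""
    return sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
```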
Figure 2. A full alternating decision tree (full-ADT). Rectangles represent decision nodes; ellipses represent prediction nodes. The number placed before each feature introduced in a rectangle is the boosting iteration step at which that feature was added.
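To make the figure's reading rule concrete, the sketch below scores a subject with a small alternating decision tree. The score is the sum of the root prediction node's value and the values of every prediction node reached by evaluating, beneath each reached prediction node, all of its decision nodes; the sign of the score gives the predicted class. The node classes, the binary `== 1` split condition, and the example tree values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredictionNode:
    """Ellipse in the figure: contributes a real-valued score."""
    value: float
    splitters: List["DecisionNode"] = field(default_factory=list)


@dataclass
class DecisionNode:
    """Rectangle in the figure: tests one binary feature (assumed test: == 1)."""
    step: int  # boosting iteration at which this split was added
    feature: str
    yes: PredictionNode
    no: PredictionNode


def adt_score(node: PredictionNode, x: Dict[str, int]) -> float:
    """Sum prediction values along every path whose conditions x satisfies."""
    total = node.value
    for d in node.splitters:  # all decision nodes under a reached node are evaluated
        child = d.yes if x[d.feature] == 1 else d.no
        total += adt_score(child, x)
    return total


# Hypothetical tree: splits on phenotype f (step 1), sex (step 2), then h under f (step 3).
example_tree = PredictionNode(-0.2, [
    DecisionNode(1, "f",
                 yes=PredictionNode(0.6, [DecisionNode(3, "h", PredictionNode(0.4), PredictionNode(-0.1))]),
                 no=PredictionNode(-0.5)),
    DecisionNode(2, "sex", yes=PredictionNode(0.1), no=PredictionNode(-0.1)),
])
```

For example, a subject with f = 1, h = 1, and sex = 0 collects -0.2 + 0.6 + 0.4 - 0.1 = 0.7, a positive (affected) prediction.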
Average number of boosting iteration steps in the four classification models (a full alternating decision tree (full-ADT) or a classic boosting decision tree (BDT), each paired with an exponential (Exp) or log (Log) loss function) in the four populations.
| Model | Aipotu | Danacca | Karangar | NYC |
|---|---|---|---|---|
| full-ADT & exponential | 10.9 | 9.1 | 11.8 | 12.3 |
| full-ADT & log | 10.4 | 7.7 | 12.7 | 11.6 |
| BDT & exponential | 13.2 | 12.9 | 10.6 | 14.8 |
| BDT & log | 14.2 | 13.9 | 14 | 13.9 |
Relevant phenotypes identified by the two full-ADT classification models in the four populations, ordered by iteration score and by abundance score.
| Model | Aipotu | Danacca | Karangar | NYC |
|---|---|---|---|---|
| Iteration scores | | | | |
| full-ADT & exponential | f, h, e, b, g, c, d | b, e, f, h | g, f, d, e, h, c | h, f, e, b, c, d, g |
| full-ADT & log | f, h, e, b, g, c, d | b, e, f, h | g, f, d, e, h, c | h, f, e, b, c, d, g |
| Abundance scores | | | | |
| full-ADT & exponential | e, f, h, b, c, d, g | h, b, e, f | d, c, g, e, h, f | e, d, f, c, b, h, g |
| full-ADT & log | e, g, h, c, f, d, b | h, e, b, f | d, g, c, h, e, f | d, c, e, b, f, g, h |
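The two orderings in the table rank phenotypes by how early and how often they enter the fitted trees. The exact score definitions are not stated in this record, so the sketch below assumes hypothetical ones: the iteration score is the mean boosting step at which a feature first appears (averaged only over the models in which it appears; lower means more relevant), and the abundance score is the mean number of decision nodes that use the feature per model (higher means more relevant).

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Tuple

# Each fitted model is summarized as a list of (boosting step, feature) splits.
Model = List[Tuple[int, str]]


def relevance_scores(models: List[Model]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Hypothetical iteration and abundance scores (definitions are assumptions)."""
    first_steps: Dict[str, List[int]] = defaultdict(list)
    counts: Dict[str, List[int]] = defaultdict(list)
    features = {f for m in models for _, f in m}
    for m in models:
        first: Dict[str, int] = {}
        n_uses: Dict[str, int] = defaultdict(int)
        for step, f in m:
            first[f] = min(step, first.get(f, step))  # earliest step using f
            n_uses[f] += 1
        for f in features:
            counts[f].append(n_uses[f])  # zero when f is absent from this model
            if f in first:
                first_steps[f].append(first[f])
    iteration = {f: fmean(s) for f, s in first_steps.items()}
    abundance = {f: fmean(c) for f, c in counts.items()}
    return iteration, abundance


def rank(iteration: Dict[str, float], abundance: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Order features: earliest mean entry first; most abundant first (ties by name)."""
    by_iteration = sorted(iteration, key=lambda f: (iteration[f], f))
    by_abundance = sorted(abundance, key=lambda f: (-abundance[f], f))
    return by_iteration, by_abundance
```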
Haseman-Elston Z-scores at the markers closest to disease-related loci for the two classification models in the four populations.
| Population | Model | | Chr 1 | Chr 3 | Chr 5 | Chr 9 | Chr 2 | Chr 10 |
|---|---|---|---|---|---|---|---|---|
| Aipotu | full-ADT & Exp | Marker | C01R0052 | C03R0280 | C05R0379 | | | |
| | | Position | 169.60 | 295.89 | 3.07 | | | |
| | | Z-score | 3.13 | 3.41 | 2.96 | | | |
| | full-ADT & Log | Marker | C01R0052 | C03R0280 | C05R0379 | | | |
| | | Position | 169.60 | 295.89 | 3.07 | | | |
| | | Z-score | 3.19 | 3.57 | 2.78 | | | |
| Danacca | full-ADT & Exp | Marker | C01R0052 | C03R0281 | --(b) | -- | -- | -- |
| | | Position | 169.60 | 298.31 | -- | -- | -- | -- |
| | | Z-score | 5.62 | 3.16 | -- | -- | -- | -- |
| | full-ADT & Log | Marker | C01R0052 | C03R0281 | -- | -- | -- | -- |
| | | Position | 169.60 | 298.31 | -- | -- | -- | -- |
| | | Z-score | 5.66 | 3.55 | -- | -- | -- | -- |
| Karangar | full-ADT & Exp | Marker | | C03R0281 | C05R0380 | C09R0765 | | |
| | | Position | | 298.31 | 5.74 | 5.83 | | |
| | | Z-score | | 4.18 | 3.65 | 4.28 | | |
| | full-ADT & Log | Marker | | C03R0281 | C05R0380 | C09R0765 | | |
| | | Position | | 298.31 | 5.74 | 5.83 | | |
| | | Z-score | | 4.14 | 3.76 | 4.27 | | |
| NYC | full-ADT & Exp | Marker | C01R0052 | C03R0278 | C05R0380 | | | |
| | | Position | 169.60 | 290.37 | 5.74 | | | |
| | | Z-score | 2.20 | 4.56 | 2.21 | - | - | |
| | full-ADT & Log | Marker | C01R0052 | C03R0278 | C05R0380 | | | |
| | | Position | 169.60 | 290.37 | 5.74 | | | |
| | | Z-score | 2.32 | 4.72 | 2.19 | - | - | |
(a) SNP markers with false-negative evidence are in italics.
(b) --, no disease-related loci on the chromosome according to the disease model given by GAW14 Problem 2.
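For reference, the classic Haseman-Elston regression regresses the squared trait difference of each sib pair on the pair's estimated proportion of alleles shared identical by descent (IBD) at a marker; a negative slope indicates linkage. The sketch below implements that original least-squares form with a one-sided Z = -slope/SE, which can be compared against the 1.96 threshold used in the abstract. It is not GENEHUNTER's expectation-maximization implementation, and the input format `(trait_sib1, trait_sib2, pi_hat)` is an assumption.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def haseman_elston(pairs: Sequence[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Classic H-E regression. pairs: (trait_sib1, trait_sib2, pi_hat).
    Returns (slope, z) with z = -slope / SE(slope); large positive z suggests linkage."""
    ys = [(a - b) ** 2 for a, b, _ in pairs]  # squared sib-pair trait difference
    xs = [p for _, _, p in pairs]             # estimated IBD proportion at the marker
    n = len(pairs)
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the OLS slope
    return slope, -slope / se
```

With the averaged ADTree prediction score as the trait, pairs that share more alleles IBD near a trait locus should show smaller squared differences, giving a negative slope and a positive Z.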