Lang Li, Silvana Borges, Robarge D Jason, Changyu Shen, Zeruesenay Desta, David Flockhart.
Abstract
A normal mixture model has been developed to partition genotypes for predicting quantitative phenotypes. Estimation and inference are performed through an EM algorithm, so the approach conducts genotype clustering and hypothesis testing simultaneously. It is a valuable method for predicting the distribution of quantitative phenotypes among multi-locus genotypes across genes or within a gene. The model's performance is evaluated in data analyses for two pharmacogenetics studies. In one example, thirty-five CYP2D6 genotypes were partitioned into three groups to predict the pharmacokinetics of a breast cancer drug, Tamoxifen, a CYP2D6 substrate (p-value = 0.04). In a second example, seventeen CYP2B6 genotypes were categorized into three clusters to predict CYP2B6 protein expression (p-value = 0.002). The biological validity of both partitions is examined using the established function of CYP2D6 and CYP2B6 alleles. In both examples, genotypes clustered in the same group showed high functional similarity. The power and the recovery rate of the true partition for the mixture model approach are investigated in statistical simulation studies, where it outperforms another published method.
Keywords: genotype/phenotype association; mixture model; pharmacogenetics
Year: 2010 PMID: 20467479 PMCID: PMC2867634 DOI: 10.4137/cin.s3493
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
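The abstract describes an EM-fitted normal mixture in which whole genotypes, rather than individual observations, are assigned to clusters. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that all observations sharing a genotype label come from the same latent component, and it returns a matrix `s` of per-genotype membership probabilities. The function name, initialisation, fixed iteration count, and synthetic labels `g1` to `g6` are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_genotype_mixture(y, genotype, k=3, n_iter=200):
    """EM for a k-component normal mixture where every observation with the
    same genotype label is assumed to share one latent component."""
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(np.asarray(genotype), return_inverse=True)
    n_geno = labels.size

    # Initialise component means at spread-out quantiles of the genotype means.
    geno_mean = np.bincount(idx, weights=y, minlength=n_geno) / np.bincount(idx, minlength=n_geno)
    mu = np.quantile(geno_mean, (np.arange(k) + 0.5) / k)
    var = np.full(k, y.var() + 1e-6)
    prop = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: per-genotype log-likelihood under each component,
        # summed over that genotype's observations.
        logpdf = -0.5 * (np.log(2 * np.pi * var) + (y[:, None] - mu) ** 2 / var)
        cell_ll = np.zeros((n_geno, k))
        np.add.at(cell_ll, idx, logpdf)
        logw = np.log(prop) + cell_ll
        logw -= logw.max(axis=1, keepdims=True)  # stabilise before exponentiating
        s = np.exp(logw)
        s /= s.sum(axis=1, keepdims=True)

        # M-step: weighted updates, each observation inheriting its genotype's weights.
        w = s[idx]
        nk = w.sum(axis=0) + 1e-12
        mu = (w * y[:, None]).sum(axis=0) / nk
        var = (w * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        prop = (s.sum(axis=0) + 1e-9) / (n_geno + k * 1e-9)

    return labels, s, mu, var, prop

# Synthetic demonstration: six made-up genotypes, three per true cluster.
rng = np.random.default_rng(0)
genotype = np.repeat(["g1", "g2", "g3", "g4", "g5", "g6"], 10)
true_mean = np.repeat([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], 10)
y = rng.normal(true_mean, 0.5)
labels, s, mu, var, prop = fit_genotype_mixture(y, genotype, k=2)
print(dict(zip(labels, s.argmax(axis=1))))
```

Each row of `s` sums to one and plays the role of the genotype cell probability assignments shown in panel C of the figures. A hard partition follows from taking the largest entry per row.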
Figure 1. Genotype/phenotype association analysis for the Tamoxifen study. A) is a raw data description. The x-axis is the NDM/Endoxifen ratio in log-scale, where both NDM and Endoxifen are Tamoxifen metabolites. The y-axis denotes the 35 CYP2D6 genotypes. B) Thirty-five genotypes are clustered into three groups by a mixture model, which are characterized by three normal distributions. The x-axis is the NDM/Endoxifen ratio in log-scale, and the y-axis is the probability density. C) shows genotype cell probability assignments (s) to each of the three predicted normal mixture components, where colored bar lengths (scaled on (0,1)) indicate the value of s for each mixture component. In A), B), and C), green, blue, and red colors represent the memberships of the three clusters.
Mixture model based data analyses.
| Study (phenotype) | Cluster | Fitted normal component | Genotypes |
|---|---|---|---|
| Tamoxifen study: Log (NDM/Endoxifen) | 1 | N(−3.76, 0.15; 0.12) | *3/*41, *17/*41, *4/*4, *41/*41 |
| | 2 | N(−2.82, 0.40; 0.50) | *4/*41, *10/*4, *10/*4xn, *35/*41, *1/*10, *10/*2, *35/*5, *10/*41, *2/*4, *1/*3, *2/*41xn, *2/*35, *1/*4, *5/*9, *1/*41, *1/*29, *1/*35, *35/*4, *1/*5, *2/*41, *41/*9 |
| | 3 | N(−2.28, 0.42; 0.38) | *1/*2, *2/*2, *1/*1, *2/*9, *10/*35, *1/*1xn, *2xn/*4, *1xn/*2, *41/*41xn, *1/*2xn |
| Efavirenz study: Protein expression (pmol/mg) | 1 | N(2.81, 1.64; 0.31) | *6/*13, *5/*5, *5/*6, *1/*15, *5/*15, *1/*4 |
| | 2 | N(11.6, 58.1; 0.52) | *6/*14, *2/*4, *1/*5, *6/*6, *1/*6, *5/*22, *4/*6, *2/*22, *1/*2 |
| | 3 | N(28.1, 259.7; 0.17) | *1/*22, *1/*1 |
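To show how the fitted components above combine into a single density of the kind drawn in panel B of Figure 1, here is a small sketch using the three Tamoxifen-study components. It reads each entry as N(mean, variance; mixing proportion). The last entry is taken as a proportion because the three values sum to one. Treating the second entry as a variance rather than a standard deviation is an assumption, and the helper names are hypothetical.

```python
import math

# (mean, variance, mixing proportion) for the three Tamoxifen-study clusters.
# ASSUMPTION: the second entry is a variance, not a standard deviation.
components = [(-3.76, 0.15, 0.12), (-2.82, 0.40, 0.50), (-2.28, 0.42, 0.38)]

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, comps):
    """Mixture density: proportion-weighted sum of the component densities."""
    return sum(p * normal_pdf(x, m, v) for m, v, p in comps)

def posterior(x, comps):
    """Probability that a single observation x arose from each component."""
    joint = [p * normal_pdf(x, m, v) for m, v, p in comps]
    total = sum(joint)
    return [j / total for j in joint]

print(mixture_pdf(-2.8, components))
print(posterior(-2.0, components))
```

Note that `posterior` scores one observation at a time. The cell probabilities in the figures are assigned per genotype, which would pool the evidence from all observations carrying that genotype.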
RPM based data analyses.
| | Tamoxifen study | Efavirenz study |
|---|---|---|
| P-value | 0.036 | 0.007 |
| Grouping | 35 groups for 35 genotypes | 17 groups for 17 genotypes |
Figure 2. Genotype/phenotype association analysis for the CYP2B6 study. A) is a raw data description. The x-axis is the CYP2B6 protein expression (pmol/mg). The y-axis denotes the 17 CYP2B6 genotypes. B) Seventeen genotypes are clustered into three groups by a mixture model, which are characterized by three normal distributions. The x-axis is the protein expression level, and the y-axis is the probability density. C) shows genotype cell probability assignments (s) to each of the three predicted normal mixture components, where colored bar lengths (scaled on (0,1)) indicate the value of s for each mixture component. In A), B), and C), green, blue, and red colors represent the memberships of the three clusters.
Figure 3. Bi-allelic epistatic models. A) The checkerboard model was simulated with 2 distributions among the 9 cells, with equal or unequal variances. One group consists of the 4 genotype cells containing exactly one heterozygote (shaded cells), with a phenotypic mean of 0. The other five genotype cells have a higher phenotypic mean. B) The diagonal model was simulated with 3 distributions among the 9 cells, with equal or unequal variances. All the cells off the main diagonal have a phenotypic mean of 0. The diagonal cells (dark shaded cells) have higher phenotypic means, with the double heterozygote (light shaded cell) having a phenotypic mean half that of the other two cells, but with equal variance.
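The two epistatic layouts in Figure 3 can be written down as 3×3 grids of cell means. The sketch below builds both grids and draws normally distributed phenotypes for each cell. It follows only the caption: the elevated mean `delta`, the per-cell sample size, the common standard deviation, and the function names are hypothetical, since the caption does not state them.

```python
import numpy as np

def cell_means(model, delta):
    """3x3 grid of phenotypic means indexed by the genotypes at two bi-allelic
    loci (index 0 and 2 = homozygotes, index 1 = heterozygote)."""
    is_het = np.array([0, 1, 0])
    if model == "checkerboard":
        # Cells with exactly one heterozygous locus have mean 0; the other five get delta.
        one_het = (is_het[:, None] + is_het[None, :]) == 1
        return np.where(one_het, 0.0, delta)
    if model == "diagonal":
        # Off-diagonal cells have mean 0; the double heterozygote gets half the
        # mean of the two homozygous diagonal cells.
        means = np.zeros((3, 3))
        means[0, 0] = means[2, 2] = delta
        means[1, 1] = delta / 2
        return means
    raise ValueError(f"unknown model: {model}")

def simulate(model, delta, n_per_cell=10, sd=1.0, seed=0):
    """Draw n_per_cell normal phenotypes per genotype cell (equal-variance case)."""
    rng = np.random.default_rng(seed)
    means = cell_means(model, delta)
    return rng.normal(means[..., None], sd, size=(3, 3, n_per_cell))

print(cell_means("checkerboard", 1.0))
print(cell_means("diagonal", 1.0))
```

The unequal-variance and skewed situations would need a per-cell `sd` array or a Gamma draw in place of `rng.normal`. Those variants are left out here because the source does not give their parameters.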
Simulation studies.
| Situation 1: Equal variance | ||||
| Checkerboard model | ||||
| 0.25 | 8% | 9.7% | 8.6% | 49.8% |
| 0.50 | 87% | 51.4% | 88.5% | 83.2% |
| 1.00 | 100% | 79.3% | 100% | 99.8% |
| Diagonal model | ||||
| 0.25 | 40% | 1.3% | 44.3% | 33.2% |
| 0.50 | 100% | 44.3% | 100% | 61.1% |
| 1.00 | 100% | 86.5% | 100% | 97.9% |
| Situation 2: Unequal variance | ||||
| Checkerboard model | ||||
| 0.25 | 0.4% | 5.8% | 13.5% | 77.8% |
| 0.50 | 16.4% | 22.4% | 92.9% | 92.3% |
| 1.00 | 99.8% | 0.3% | 100% | 100% |
| Diagonal model | ||||
| 0.25 | 0.6% | 0.2% | 54.4% | 49.4% |
| 0.50 | 15.2% | 0.2% | 100% | 82.3% |
| 1.00 | 95.4% | 0.2% | 100% | 100% |
| Situation 3: Skewness (Gamma distribution) | ||||
| Checkerboard model | ||||
| 0.25 | 7.5% | 5.5% | 7.3% | 49.3% |
| 0.50 | 82% | 46.3% | 85.5% | 84.2% |
| 1.00 | 99% | 74.3% | 98.3% | 93.8% |
| Diagonal model | ||||
| 0.25 | 38% | 2.3% | 39.3% | 36.7% |
| 0.50 | 100% | 43.4% | 100% | 63.2% |
| 1.00 | 100% | 87.4% | 100% | 98.9% |