MOTIVATION: With complex traits and diseases having potential genetic contributions of thousands of genetic factors, and with current genotyping arrays consisting of millions of single nucleotide polymorphisms (SNPs), powerful high-dimensional statistical techniques are needed to comprehensively model the genetic variance. Machine learning techniques have many advantages, including lack of parametric assumptions and high power and flexibility.
RESULTS: We have applied three machine learning approaches, Random Forest Regression (RFR), Boosted Regression Tree (BRT) and Support Vector Regression (SVR), to the prediction of warfarin maintenance dose in a cohort of African Americans. We have developed a multi-step approach that selects SNPs, builds prediction models with different subsets of selected SNPs along with known associated genetic and environmental variables, and tests the discovered models in a cross-validation framework. Preliminary results indicate that our modeling approach gives much higher accuracy than previous models for warfarin dose prediction. A model size of 200 SNPs (in addition to the known genetic and environmental variables) gives the best accuracy. The R² between the predicted and actual square root of warfarin dose in this model was on average 66.4% for RFR, 57.8% for SVR and 56.9% for BRT. Thus RFR had the best accuracy, but all three techniques achieved better performance than the current published R² of 43% in a sample of mixed ethnicity, and 27% in an African American sample. In summary, machine learning approaches for high-dimensional pharmacogenetic prediction, and for prediction of clinical continuous traits of interest, hold great promise and warrant further research.
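The multi-step approach described above (select SNPs, fit a model on the selected subset, score it under cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It rests on several assumptions: the correlation-based filter `select_snps`, the stand-in learner `fit_placeholder` (an average of univariate least-squares fits, used here in place of RFR, BRT or SVR to keep the sketch dependency-free), and the simulated data from `simulate` are all hypothetical. It also assumes that R² is the squared Pearson correlation between predicted and actual square-root dose. The one structural point it is meant to show is that SNP selection is repeated inside each training fold, so held-out samples never influence which SNPs are chosen.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def select_snps(X, y, k):
    """Hypothetical filter: indices of the k columns most correlated with y."""
    scores = [(abs(pearson([row[j] for row in X], y)), j) for j in range(len(X[0]))]
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_placeholder(X, y, cols):
    """Stand-in learner (NOT RFR/BRT/SVR): mean of univariate least-squares fits."""
    my = sum(y) / len(y)
    fits = []
    for j in cols:
        xj = [row[j] for row in X]
        mx = sum(xj) / len(xj)
        sxx = sum((a - mx) ** 2 for a in xj)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xj, y))
        fits.append((j, mx, 0.0 if sxx == 0 else sxy / sxx))
    return lambda row: my + sum(s * (row[j] - m) for j, m, s in fits) / max(len(fits), 1)


def cross_validated_r2(X, dose, k_snps, n_folds=5, seed=0):
    """Squared correlation between out-of-fold predictions and sqrt(dose)."""
    y = [math.sqrt(d) for d in dose]  # model the square root of dose
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        cols = select_snps(X_tr, y_tr, k_snps)  # selection uses training data only
        model = fit_placeholder(X_tr, y_tr, cols)
        for i in test:
            preds[i] = model(X[i])
    return pearson(preds, y) ** 2


def simulate(n=120, p=50, seed=1):
    """Toy genotypes coded 0/1/2; only the first three SNPs affect the dose."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    dose = [(2.0 + 0.5 * r[0] - 0.4 * r[1] + 0.3 * r[2] + rng.gauss(0, 0.2)) ** 2 for r in X]
    return X, dose


if __name__ == "__main__":
    X, dose = simulate()
    for k in (1, 3, 5, 20):  # compare model sizes, as the abstract does with SNP subsets
        print(k, round(cross_validated_r2(X, dose, k_snps=k), 3))
```

Swapping `fit_placeholder` for a real regressor would leave the rest of the loop unchanged. Known genetic and environmental covariates, which the abstract includes alongside the selected SNPs, are omitted here for brevity.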