Hui An (1), Chang-Shuai Wei (2), Oliver Wang (3), Da-Hui Wang (1), Liang-Wen Xu (4), Qing Lu (5), Cheng-Yin Ye (1).
1. Department of Health Management, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China.
2. Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX 76107, USA.
3. HBI Solutions Inc, Palo Alto, CA 94301, USA.
4. Department of Preventive Medicine, School of Medicine, Hangzhou Normal University, Hangzhou 310036, China.
5. Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, East Lansing, MI 48824, USA.
Abstract
OBJECTIVE: Family-based design is one of the most popular designs in genetic research and is well recognized for advantages such as robustness against population stratification and admixture. With vast amounts of genetic data collected from family-based studies, there is great interest in studying the role of genetic markers in risk prediction. This study aims to develop a new statistical approach for family-based risk prediction analysis with improved prediction accuracy compared with existing methods based on family history.
METHODS: We propose an ensemble-based likelihood ratio (ELR) approach, Fam-ELR, for family-based genomic risk prediction. Fam-ELR incorporates a clustered receiver operating characteristic (ROC) curve method to account for correlations among family samples, and uses a computationally efficient tree-assembling procedure for variable selection and model building.
RESULTS: In simulations, Fam-ELR is robust across various underlying disease models and pedigree structures, and performs better than two existing family-based risk prediction methods. In a real-data application to a family-based genome-wide dataset of conduct disorder, Fam-ELR demonstrates its ability to integrate potential risk predictors and interactions into the model for improved accuracy, especially at the genome-wide level.
CONCLUSIONS: Compared with existing approaches, such as the genetic risk-score approach, Fam-ELR can incorporate genetic variants with small or moderate marginal effects, and their interactions, into an improved risk prediction model. It is therefore a robust and useful approach for high-dimensional family-based risk prediction, especially for complex diseases with unknown or poorly understood etiology.
Keywords:
Family-based study; Genetic risk prediction; High-dimensional data
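The abstract notes that a clustered ROC method is used so that correlations among family members are taken into account. The abstract does not give the algorithm, so the following is only a generic sketch of the underlying idea, not the Fam-ELR procedure: it computes the empirical area under the ROC curve (AUC) and estimates its standard error by resampling whole families rather than individuals. The function names `auc` and `family_bootstrap_se`, the toy data, and the choice of a bootstrap are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Empirical AUC: P(case score > control score), ties counted as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def family_bootstrap_se(scores, labels, family_ids, n_boot=500, seed=0):
    """Standard error of the AUC from resampling families with replacement.

    Resampling whole families keeps relatives together, so the
    within-family correlation is reflected in the spread of the estimates.
    (Illustrative assumption; not the method described in the paper.)
    """
    rng = random.Random(seed)
    families = {}
    for s, y, f in zip(scores, labels, family_ids):
        families.setdefault(f, []).append((s, y))
    keys = list(families)

    estimates = []
    for _ in range(n_boot):
        drawn = rng.choices(keys, k=len(keys))
        sample = [pair for f in drawn for pair in families[f]]
        ys = [y for _, y in sample]
        # Skip resamples that contain only cases or only controls.
        if 0 < sum(ys) < len(ys):
            estimates.append(auc([s for s, _ in sample], ys))

    if len(estimates) < 2:
        raise ValueError("too few valid bootstrap resamples")
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return var ** 0.5


# Hypothetical toy data: three families, one case and one control each.
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
toy_labels = [1, 0, 1, 0, 1, 0]
toy_families = ["A", "A", "B", "B", "C", "C"]

toy_auc = auc(toy_scores, toy_labels)
toy_se = family_bootstrap_se(toy_scores, toy_labels, toy_families)
```

Treating relatives as independent observations would typically understate the uncertainty of the AUC, which is the motivation for working at the family level here.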