BACKGROUND: During the past 5 years, high-throughput technologies have been successfully used in epidemiological studies, but almost all have focused on sequence variation through genome-wide association studies (GWAS). Today, the study of other genomic events is becoming more common in large-scale epidemiological studies. Many of these, unlike the single-nucleotide polymorphisms studied in GWAS, are continuous measures. In this context, the exercise of searching for regions of interest for disease is akin to the problems described in the statistical 'bump hunting' literature. METHODS: New statistical challenges arise when the measurements are continuous rather than categorical, when they are measured with uncertainty, and when both biological signal and measurement error are characterized by spatial correlation along the genome. Perhaps the most challenging complication is that continuous genomic data from large studies are measured over long periods, making them susceptible to 'batch effects'. An example that combines all three characteristics is genome-wide DNA methylation measurements. Here, we present a data analysis pipeline that effectively models measurement error, removes batch effects, detects regions of interest and attaches statistical uncertainty to identified regions. RESULTS: We illustrate the usefulness of our approach by detecting genomic regions of DNA methylation associated with a continuous trait in a well-characterized population of newborns. Additionally, we show that addressing unexplained heterogeneity like batch effects reduces the number of false-positive regions. CONCLUSIONS: Our framework offers a comprehensive yet flexible approach for identifying genomic regions of biological interest in large epidemiological studies using quantitative high-throughput methods.
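The four pipeline steps named in METHODS (model the measurements, adjust for batch, detect regions, attach uncertainty) can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: probes are assumed to be sorted and evenly spaced on one chromosome, batch is assumed to be a known covariate, a moving average stands in for whatever smoother the pipeline uses, and all function names (`probe_slopes`, `smooth`, `find_bumps`, `bump_pvalues`) are hypothetical.

```python
import numpy as np


def probe_slopes(meth, trait, covariates=None):
    """Per-probe least-squares slope of methylation on the trait.

    meth: (n_samples, n_probes) array; trait: (n_samples,) array.
    covariates: optional (n_samples,) or (n_samples, k) array, e.g. batch
    indicators, included in the design matrix as adjustment terms.
    """
    n = meth.shape[0]
    cols = [np.ones(n), np.asarray(trait, dtype=float)]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, meth, rcond=None)
    return coef[1]  # row 1 holds the trait coefficient for every probe


def smooth(values, window=5):
    """Moving average along the genome (assumes evenly spaced probes)."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="same")


def find_bumps(smoothed, cutoff):
    """Runs of consecutive probes beyond +/-cutoff with a constant sign.

    Returns a list of (start_index, end_index, area), where area is the
    sum of absolute smoothed values inside the run.
    """
    state = np.where(smoothed > cutoff, 1, np.where(smoothed < -cutoff, -1, 0))
    bumps, start = [], None
    for i in range(len(state) + 1):
        cur = state[i] if i < len(state) else 0
        if start is not None and cur != state[start]:
            area = float(np.abs(smoothed[start:i]).sum())
            bumps.append((start, i - 1, area))
            start = None
        if start is None and cur != 0:
            start = i
    return bumps


def bump_pvalues(meth, trait, cutoff, covariates=None, window=5,
                 n_perm=200, seed=0):
    """Permutation p-value per bump, using the largest null bump area.

    The trait is shuffled across samples; covariates stay fixed.
    Returns a list of (start_index, end_index, area, p_value).
    """
    rng = np.random.default_rng(seed)
    observed = find_bumps(
        smooth(probe_slopes(meth, trait, covariates), window), cutoff)
    null_max = np.zeros(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(trait)
        null = find_bumps(
            smooth(probe_slopes(meth, shuffled, covariates), window), cutoff)
        null_max[b] = max((area for _, _, area in null), default=0.0)
    return [
        (s, e, a, float((1 + np.sum(null_max >= a)) / (1 + n_perm)))
        for s, e, a in observed
    ]
```

Calling `bump_pvalues(meth, trait, cutoff=0.25, covariates=batch)` returns one tuple per candidate region. Comparing each observed area with the largest area from every permutation is a conservative choice made here for brevity; other null summaries are possible.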