Sujin Seo1, Kyungtaek Park2, Jang Jae Lee3, Kyu Yeong Choi3,4, Kun Ho Lee5,6, Sungho Won7,8,9.
1. Department of Public Health Science, Graduate School of Public Health, Seoul National University, 1 Kwanak-ro Kwanak-gu, Seoul, 151-742, Republic of Korea.
2. Interdisciplinary Program of Bioinformatics, College of Natural Sciences, Seoul National University, Seoul, 151-742, Republic of Korea.
3. National Research Center for Dementia, Chosun University, Gwangju, 61452, Republic of Korea.
4. Premedical Science, School of Medicine, Chosun University, Gwangju, 61452, Republic of Korea.
5. National Research Center for Dementia, Chosun University, Gwangju, 61452, Republic of Korea. leekho@chosun.ac.kr.
6. Department of Biomedical Science, College of Natural Sciences, Chosun University, Gwangju, 61452, Republic of Korea. leekho@chosun.ac.kr.
7. Department of Public Health Science, Graduate School of Public Health, Seoul National University, 1 Kwanak-ro Kwanak-gu, Seoul, 151-742, Republic of Korea. sunghow@gmail.com.
8. Interdisciplinary Program of Bioinformatics, College of Natural Sciences, Seoul National University, Seoul, 151-742, Republic of Korea. sunghow@gmail.com.
9. Institute of Health and Environment, Seoul National University, Seoul, 151-742, Republic of Korea. sunghow@gmail.com.
Abstract
BACKGROUND: In genetic analyses, the term 'batch effect' refers to systematic differences caused by heterogeneity among batches. Controlling this unintended effect is the most important step in the quality control (QC) processes that precede analysis. Currently, batch effects are not adequately controlled by existing statistical methods, and new approaches are required.
METHODS: In this report, we propose a new method to detect heterogeneity of probe intensities among batches, together with a procedure for genotype calling and QC in the presence of a batch effect. First, we conduct a multivariate analysis of variance (MANOVA) to test for differences in probe intensities among batches. If heterogeneity is detected, subjects are clustered with a K-medoid algorithm using the average probe intensity measurements of each batch, and the genotypes of subjects in different clusters are called separately.
RESULTS: The proposed method was applied to genotyping data from 3619 subjects, including 1074 patients with Alzheimer's disease, 296 with mild cognitive impairment (MCI), and 1153 controls. The method improves the accuracy of called genotypes without requiring large numbers of subjects and SNPs to be filtered out, and is therefore a reasonable approach for controlling batch effects.
CONCLUSIONS: We propose a new strategy that detects batch effects using probe intensity measurements and calls genotypes in the presence of batch effects. Its application to real data shows that it offers a balanced approach, improving genotype accuracy while retaining most subjects and SNPs. Furthermore, the method can be extended to various scenarios with simple modifications.
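The two computational steps in METHODS (a MANOVA test across batches, then K-medoid clustering of per-batch averages to decide which subjects are called together) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: each subject is assumed to be summarized by a small feature vector (hypothetical, e.g. mean A- and B-allele probe intensity); the MANOVA uses Wilks' lambda with Bartlett's chi-square approximation and a Wilson-Hilferty p-value, which are our choices since the abstract does not name a test statistic; the K-medoid step is a simple alternating (Voronoi-iteration) variant rather than full PAM; and genotype calling itself is left to whatever calling software is used. The function names `wilks_manova`, `k_medoids`, `split_for_calling` and the `alpha` threshold are hypothetical.

```python
import math
import numpy as np


def wilks_manova(X, groups):
    """One-way MANOVA via Wilks' lambda (assumed test statistic).

    X: (n_subjects, p) per-subject intensity summaries (hypothetical features).
    groups: batch label for each subject.
    Returns (wilks_lambda, chi2_stat, df, approx_p_value).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape
    labels = np.unique(groups)
    g = len(labels)
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-batch SSCP matrix
    B = np.zeros((p, p))  # between-batch SSCP matrix
    for lab in labels:
        Xk = X[groups == lab]
        mk = Xk.mean(axis=0)
        D = Xk - mk
        W += D.T @ D
        d = (mk - grand_mean)[:, None]
        B += len(Xk) * (d @ d.T)
    lam = np.linalg.det(W) / np.linalg.det(W + B)
    # Bartlett's approximation: -(n - 1 - (p + g)/2) ln(lambda) ~ chi2 with p(g-1) df.
    chi2 = -(n - 1 - (p + g) / 2) * math.log(lam)
    df = p * (g - 1)
    # Wilson-Hilferty approximation to the chi-square upper tail (avoids SciPy).
    z = ((chi2 / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return lam, chi2, df, p_value


def k_medoids(points, k, max_iter=100):
    """Alternating (Voronoi-iteration) k-medoids on Euclidean distances."""
    P = np.asarray(points, dtype=float)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    # Deterministic init: the most central point, then greedy farthest points.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:  # only possible with duplicate points
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def split_for_calling(X, batches, k, alpha=0.05):
    """Return one array of subject indices per group to be called separately."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    _, _, _, p_value = wilks_manova(X, batches)
    if p_value >= alpha:
        return [np.arange(len(batches))]  # no batch heterogeneity detected
    labels = np.unique(batches)
    batch_means = np.array([X[batches == b].mean(axis=0) for b in labels])
    _, batch_cluster = k_medoids(batch_means, k)
    # Each subject inherits the cluster of its batch.
    subject_cluster = batch_cluster[np.searchsorted(labels, batches)]
    return [np.where(subject_cluster == c)[0] for c in range(k)]
```

Usage: with `X` holding one intensity summary per subject and `batches` holding batch labels, `split_for_calling(X, batches, k=2)` returns one index array per cluster, and each subset would then be passed to the genotype caller on its own. Clustering batch averages rather than individual subjects keeps every batch intact within a single calling group; the number of clusters K is an input here, since the abstract does not describe how it is chosen.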
Keywords:
Batch effect; Calling; Genome-wide association analysis; K-medoid clustering; Quality control