David C Qian (1), Jonathan A Busam (2), Xiangjun Xiao (1), Tracy A O'Mara (3), Rosalind A Eeles (4), Frederick R Schumacher (5), Catherine M Phelan (6), Christopher I Amos (1).
1. Department of Biomedical Data Science, Dartmouth Geisel School of Medicine, Lebanon, NH 03756, USA.
2. Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA.
3. Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia.
4. Division of Genetics and Epidemiology, Institute of Cancer Research, London SW7 3RP, UK.
5. Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA.
6. Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL 33612, USA.
Abstract
Motivation: Checking concordance between reported sex and genotype-inferred sex is a crucial quality control measure in genome-wide association studies (GWAS). However, little is known about the true accuracy of software that infers sex from genotype array data. Results: We present seXY, a logistic regression model trained on both X chromosome heterozygosity and Y chromosome missingness, which consistently demonstrated >99.5% sex inference accuracy in cross-validation for 889 males and 5,361 females enrolled in prostate cancer and ovarian cancer GWAS. Compared to PLINK, one of the most popular tools for sex inference in GWAS, which assesses only X chromosome heterozygosity, seXY achieved marginally better male classification and 3% more accurate female classification. Availability and Implementation: https://github.com/Christopher-Amos-Lab/seXY. Contact: Christopher.I.Amos@dartmouth.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
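To make the modelling idea concrete, the following is a minimal sketch of a two-feature logistic regression for sex inference. It is not the seXY implementation. The function names, the label coding (1 = female, 0 = male), the gradient-descent settings, and the synthetic feature distributions are all assumptions made for illustration. The sketch relies only on the general expectation that males show low X chromosome heterozygosity and females show high Y chromosome missingness.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def clamp(v):
    # Keep simulated rates inside the valid [0, 1] range.
    return min(1.0, max(0.0, v))


def make_synthetic(n_male, n_female, seed=0):
    # Hypothetical data: (X heterozygosity, Y missingness) per sample.
    # The means and spreads are illustrative assumptions, not study values.
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_male):
        features.append((clamp(rng.gauss(0.01, 0.01)), clamp(rng.gauss(0.03, 0.02))))
        labels.append(0)  # assumed coding: 0 = male
    for _ in range(n_female):
        features.append((clamp(rng.gauss(0.30, 0.03)), clamp(rng.gauss(0.95, 0.03))))
        labels.append(1)  # assumed coding: 1 = female
    return features, labels


def fit_logistic(features, labels, lr=0.5, epochs=1000):
    # Batch gradient descent on the mean log-loss.
    # Weights are ordered as [bias, w_x_het, w_y_miss].
    w = [0.0, 0.0, 0.0]
    n = len(features)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x_het, y_miss), label in zip(features, labels):
            err = sigmoid(w[0] + w[1] * x_het + w[2] * y_miss) - label
            grad[0] += err
            grad[1] += err * x_het
            grad[2] += err * y_miss
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def predict_female_prob(w, x_het, y_miss):
    # Modelled probability that a sample is female.
    return sigmoid(w[0] + w[1] * x_het + w[2] * y_miss)


features, labels = make_synthetic(100, 100)
weights = fit_logistic(features, labels)
print(predict_female_prob(weights, 0.01, 0.02))  # male-like sample
print(predict_female_prob(weights, 0.31, 0.97))  # female-like sample
```

In this sketch a sample would be called female when the predicted probability is at least 0.5, and a mismatch with the reported sex would flag the sample for review. Using both features lets the model separate samples that X chromosome heterozygosity alone might leave ambiguous.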