Yucheng Wang1, Eilis Hannon2, Olivia A Grant3, Tyler J Gorrie-Stone4, Meena Kumari5, Jonathan Mill2, Xiaojun Zhai6, Klaus D McDonald-Maier1, Leonard C Schalkwyk3. 1. School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, UK. 2. Medical School, University of Exeter, Barrack Road, Exeter, UK. 3. School of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, UK. 4. Diamond Light Source Ltd., Harwell Science & Innovation Campus, Oxfordshire, UK. 5. Institute for Social and Economic Research, University of Essex, Wivenhoe Park, Colchester, UK. 6. School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, UK. xzhai@essex.ac.uk.
Abstract
BACKGROUND: Sex is an important covariate in epigenome-wide association studies because it strongly influences DNA methylation patterns at numerous genomic positions. Nevertheless, many samples on the Gene Expression Omnibus (GEO) lack a sex annotation or are incorrectly labelled. Given this influence of sex on DNA methylation patterns, methods for filtering poor samples and checking sex assignment need to be accurate and widely applicable. RESULTS: Here we present a novel method to predict sex using only DNA methylation beta values, which can be readily applied to almost all DNA methylation datasets uploaded to GEO, whatever their format (raw IDATs or text files with only signal intensities). We identified 4345 significantly (p < 0.01) sex-associated CpG sites present on both the 450K and EPIC arrays, and constructed a sex classifier based on the first two principal components of the DNA methylation data of the sex-associated probes that map to the sex chromosomes. The proposed method was constructed using whole blood samples and performs well across a wide range of tissues. We further demonstrate that our method can be used to identify samples with sex chromosome aneuploidy; this function was validated with five Turner syndrome cases and one Klinefelter syndrome case. CONCLUSIONS: The proposed sex classifier can be used not only for sex prediction but also to identify samples with sex chromosome aneuploidy. It is freely and easily accessible by calling the 'estimateSex' function from the latest version of the wateRmelon Bioconductor package ( https://github.com/schalkwyk/wateRmelon ).
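The general idea of classifying sex from the first two principal components of sex-chromosome probe beta values can be illustrated with a minimal sketch. This is not the wateRmelon 'estimateSex' implementation: the function names (`simulate_betas`, `fit_sex_pca`, `predict_female`) are hypothetical, the beta values are synthetic toy data with arbitrarily chosen group means, and a nearest-centroid rule in PC space is assumed as a stand-in for whatever decision rule the published classifier uses.

```python
import numpy as np


def simulate_betas(n_female, n_male, n_x=40, n_y=10, seed=0):
    """Generate toy beta values for hypothetical sex-chromosome probes.

    The group means below are illustrative assumptions, not measured values.
    """
    rng = np.random.default_rng(seed)
    n = n_female + n_male
    is_female = np.arange(n) < n_female
    # Toy X-linked probes: intermediate methylation in females, low in males.
    x_mean = np.where(is_female, 0.5, 0.1)[:, None]
    # Toy Y-linked probes: different average signal between the two groups.
    y_mean = np.where(is_female, 0.3, 0.8)[:, None]
    x = rng.normal(x_mean, 0.05, size=(n, n_x))
    y = rng.normal(y_mean, 0.05, size=(n, n_y))
    betas = np.clip(np.hstack([x, y]), 0.0, 1.0)
    return betas, is_female


def fit_sex_pca(betas, is_female):
    """Fit a two-component PCA and store one centroid per sex in PC space."""
    center = betas.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(betas - center, full_matrices=False)
    components = vt[:2]
    scores = (betas - center) @ components.T
    return {
        "center": center,
        "components": components,
        "female_centroid": scores[is_female].mean(axis=0),
        "male_centroid": scores[~is_female].mean(axis=0),
    }


def predict_female(model, betas):
    """Project samples onto the two PCs and assign the nearest centroid."""
    scores = (betas - model["center"]) @ model["components"].T
    d_female = np.linalg.norm(scores - model["female_centroid"], axis=1)
    d_male = np.linalg.norm(scores - model["male_centroid"], axis=1)
    return d_female < d_male


if __name__ == "__main__":
    train_betas, train_is_female = simulate_betas(30, 30, seed=0)
    model = fit_sex_pca(train_betas, train_is_female)
    test_betas, test_is_female = simulate_betas(10, 10, seed=1)
    predicted = predict_female(model, test_betas)
    print("accuracy:", (predicted == test_is_female).mean())
```

In a sketch like this, a sample whose PC scores fall far from both centroids would be a natural candidate for closer inspection, which is loosely analogous to how the abstract describes flagging possible sex chromosome aneuploidy. The actual criteria used by the published method are not reproduced here.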
Keywords: Aneuploidy; DNA methylation; Sex prediction