Sun Ah Kim (1), Chang-Sung Cho (2), Suh-Ryung Kim (2), Shelley B Bull (3,4), Yun Joo Yoo (2,5)
1. Department of Statistics.
2. Department of Mathematics Education, Seoul National University, Seoul 08826, South Korea.
3. Prosserman Centre for Health Research, The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto M5T 3L9, Canada.
4. Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto M5T 3M7, Canada.
5. Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea.
Abstract
Motivation: Linkage disequilibrium (LD) block construction is required for research in population genetics and genetic epidemiology, including specification of sets of single nucleotide polymorphisms (SNPs) for multi-SNP association analysis and identification of haplotype blocks in high-density sequencing data. Existing methods based on a narrow-sense definition do not allow intermediate regions of low LD between strongly associated SNP pairs, and they tend to split high-density SNP data into small blocks with high between-block correlation.
Results: We present Big-LD, a block partition method based on interval graph modeling of LD bins, which are clusters of SNPs in strong pairwise LD that are not necessarily physically consecutive. Big-LD uses an agglomerative approach that starts by identifying small communities of SNPs, i.e. the SNPs in each LD bin region, and proceeds by merging these communities. We determine the number of blocks using a method that finds a maximum-weight independent set. Big-LD produces larger LD blocks than existing methods such as MATILDE, Haploview, MIG++ and S-MIG++, and its LD blocks agree better with recombination hotspot locations determined by sperm-typing experiments. The observed average runtime of Big-LD for 13 288 240 non-monomorphic SNPs from 1000 Genomes Project autosome data (286 East Asians) is about 5.83 h, which is a significant improvement over the existing methods.
Availability and implementation: Source code and documentation are available for download at http://github.com/sunnyeesl/BigLD.
Contact: yyoo@snu.ac.kr.
Supplementary information: Supplementary data are available at Bioinformatics online.
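The maximum-weight independent set step mentioned in the Results can be illustrated with a small sketch. This is not the Big-LD implementation; it is the textbook dynamic program for weighted interval selection, under the assumption that each candidate block is a closed interval of SNP indices with a numeric weight. The function name `max_weight_independent_intervals` and the `(start, end, weight)` tuple layout are hypothetical choices made for this example.

```python
from bisect import bisect_left


def max_weight_independent_intervals(intervals):
    """Pick non-overlapping closed intervals with maximum total weight.

    `intervals` is a list of (start, end, weight) tuples (hypothetical layout).
    Returns (total_weight, chosen_intervals_in_left_to_right_order).
    """
    # Sort by right endpoint so every compatible predecessor comes earlier.
    ordered = sorted(intervals, key=lambda iv: iv[1])
    ends = [iv[1] for iv in ordered]
    n = len(ordered)

    best = [0] * (n + 1)      # best[i]: optimum over the first i intervals
    take = [False] * (n + 1)  # take[i]: whether interval i is used in best[i]
    prev = [0] * (n + 1)      # prev[i]: count of earlier intervals ending before start

    for i, (start, _end, weight) in enumerate(ordered, start=1):
        # Earlier intervals whose end is strictly less than `start`
        # do not overlap the current closed interval.
        p = bisect_left(ends, start, 0, i - 1)
        prev[i] = p
        with_current = best[p] + weight
        if with_current > best[i - 1]:
            best[i] = with_current
            take[i] = True
        else:
            best[i] = best[i - 1]

    # Walk back through the table to recover the selected intervals.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(ordered[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Toy candidate blocks over SNP indices 0..9 (made-up weights).
candidates = [(0, 3, 4.0), (2, 5, 3.0), (4, 7, 4.0), (6, 9, 2.0)]
total, blocks = max_weight_independent_intervals(candidates)
```

In an interval graph an independent set is exactly a set of pairwise non-overlapping intervals, which is why sorting by right endpoint and a one-dimensional table suffice here. How Big-LD defines the candidate regions and their weights is not specified in the abstract, so those inputs are left to the caller.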