Kevin Sharp1, Warren Kretzschmar2, Olivier Delaneau3, Jonathan Marchini4.
1. Department of Statistics, University of Oxford, Oxford, UK.
2. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
3. Département De Génétique Et Développement (GEDEV), University of Geneva, Geneva, Switzerland.
4. Department of Statistics, University of Oxford, Oxford, UK, and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Abstract
MOTIVATION: There is growing recognition that estimating haplotypes from high-coverage sequencing of single samples in clinical settings is an important problem. At the same time, very large datasets consisting of tens to hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these large human genetic variation resources, and of rare variant sharing patterns, to estimate haplotypes for single sequenced samples. A rare variant shared by two individuals is more likely to have been inherited from a recent common ancestor and, hence, is also more likely to indicate that the two individuals share similar haplotypes over a substantial flanking region of sequence.
RESULTS: Our method exploits this idea to select a small set of highly informative copying states within a hidden Markov model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative Markov chain Monte Carlo (MCMC) methods to infer haplotypes. Compared with approaches that do not explicitly use rare variants, we obtain significant gains in phasing accuracy, less variation between phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we reduce switch error rates by up to 50% when phasing samples sequenced at high coverage. In addition, a single-step rephasing of the UK10K panel using rare variant information has a downstream impact on phasing performance. These results are a proof of concept that rare variant sharing patterns can be used to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset.
AVAILABILITY AND IMPLEMENTATION: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/
CONTACT: marchini@stats.ox.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
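The abstract's central idea, using shared rare variants to pick a small set of informative copying states for the HMM, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' published procedure: the function name `select_copying_states`, the 0/1/2 genotype encoding, the rare-allele frequency threshold `max_alt_freq`, the number of states `k`, and the scoring rule (count how many of the target's rare alternate alleles each reference haplotype carries) are all hypothetical. The sketch also assumes the alternate allele is the minor allele at every site and ignores the flanking-region structure a real implementation would exploit.

```python
from typing import Dict, List, Sequence


def select_copying_states(
    target_genotypes: Sequence[int],
    panel: Sequence[Sequence[int]],
    max_alt_freq: float = 0.01,
    k: int = 4,
) -> List[int]:
    """Hypothetical helper: return indices of up to k panel haplotypes that
    share the most rare alternate alleles with the target sample.

    target_genotypes: alternate-allele count (0, 1 or 2) at each site.
    panel: reference haplotypes, one 0/1 list per haplotype, same sites.
    max_alt_freq: panel frequency at or below which an alternate allele is
        treated as "rare" (an assumed threshold, not taken from the paper).
    """
    n_haps = len(panel)
    scores: Dict[int, int] = {}
    for site, dosage in enumerate(target_genotypes):
        if dosage == 0:
            continue  # target carries no alternate allele at this site
        carriers = [h for h in range(n_haps) if panel[h][site] == 1]
        freq = len(carriers) / n_haps
        if 0 < freq <= max_alt_freq:
            # every haplotype carrying a rare allele the target also has gains a point
            for h in carriers:
                scores[h] = scores.get(h, 0) + 1
    # highest score first; ties broken by panel index so the result is deterministic
    ranked = sorted(scores, key=lambda h: (-scores[h], h))
    return ranked[:k]


# Toy panel of 10 haplotypes at 5 sites (made-up data for illustration only).
panel = [
    [1, 1, 0, 1, 0],  # h0
    [1, 1, 0, 0, 0],  # h1
    [0, 0, 0, 0, 0],  # h2
    [0, 0, 1, 0, 0],  # h3
    [0, 0, 0, 0, 1],  # h4
    [1, 0, 0, 0, 0],  # h5
    [0, 0, 0, 0, 0],  # h6
    [0, 0, 0, 1, 0],  # h7
    [0, 0, 0, 0, 0],  # h8
    [0, 0, 0, 0, 0],  # h9
]
target = [1, 1, 0, 1, 0]

print(select_copying_states(target, panel, max_alt_freq=0.2, k=3))  # [0, 1, 7]
```

In this toy example site 0 is too common (3 of 10 haplotypes) to count, while sites 1 and 3 are rare and carried by the target, so haplotype 0, which carries both, ranks first. In a full phasing method the selected haplotypes would then serve as the restricted state space of the HMM; how the published method weights sites or defines local windows is not described in this abstract.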