Heng Li. Medical Population Genetics Program, Broad Institute, 7 Cambridge Center, MA 02142, USA. hengli@broadinstitute.org
Abstract
MOTIVATION: In his string graph paper, Eugene Myers suggested that in a string graph, or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs.

RESULTS: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information in the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we showed that, in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly-based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. On the methodological side, we propose the FMD-index for forward-backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches, and one-pass construction of unitigs from an FMD-index.

AVAILABILITY: http://github.com/lh3/fermi
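The idea that SNPs and INDELs can be called by mapping unitigs against a reference can be illustrated with a minimal sketch. This is not fermi's implementation: the function name `call_variants`, the output tuple layout, and the toy sequences are assumptions made for illustration, and the sketch assumes that an aligner has already produced a CIGAR string containing only M, =, X, I and D operations.

```python
import re


def call_variants(ref, ref_start, unitig, cigar):
    """Walk one unitig-to-reference alignment and report SNPs and INDELs.

    ref       : reference sequence
    ref_start : 0-based reference position where the alignment begins
    unitig    : aligned unitig sequence
    cigar     : CIGAR string (M, =, X, I, D only; a simplifying assumption)

    Returns a list of (type, ref_pos, ref_allele, alt_allele) tuples.
    """
    variants = []
    r, q = ref_start, 0  # current offsets in the reference and the unitig
    for length, op in re.findall(r"(\d+)([MIDX=])", cigar):
        n = int(length)
        if op in "M=X":
            # Aligned columns: any mismatching base is reported as a SNP.
            for k in range(n):
                if ref[r + k] != unitig[q + k]:
                    variants.append(("SNP", r + k, ref[r + k], unitig[q + k]))
            r += n
            q += n
        elif op == "I":
            # Bases present in the unitig but absent from the reference.
            variants.append(("INS", r, "", unitig[q:q + n]))
            q += n
        elif op == "D":
            # Bases present in the reference but absent from the unitig.
            variants.append(("DEL", r, ref[r:r + n], ""))
            r += n
    return variants


# Toy example (hypothetical data): one SNP, one insertion, one deletion.
reference = "ACGTACGTAC"
unitig = "CGTTGCGAC"
calls = call_variants(reference, 1, unitig, "4M1I2M1D2M")
for call in calls:
    print(call)
```

Because a unitig is much longer than a read, a single alignment of this kind can span an INDEL with ample flanking sequence on both sides, which is one plausible reason assembly-based calling helps with INDELs. A real pipeline would additionally need to handle clipping, reverse-strand alignments, mapping quality, and heterozygous sites.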