Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly.

Heng Li

Abstract

MOTIVATION: Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs.
RESULTS: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information in the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we showed that, in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly-based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. On the methodological side, we propose the FMD-index for forward-backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches, and one-pass construction of unitigs from an FMD-index.
AVAILABILITY: http://github.com/lh3/fermi
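
The idea that SNPs and INDELs can be read off a unitig once it is mapped to the reference can be illustrated with a minimal sketch. It assumes the unitig has already been aligned and that the alignment is available as a SAM-style CIGAR string. The function name `call_variants`, its arguments, and the output tuple layout are hypothetical and chosen for illustration only; this is not fermi's interface, and it omits the filtering a real caller would need.

```python
import re

def call_variants(ref, unitig, cigar, ref_start=0):
    """Walk a CIGAR string and list differences between unitig and reference.

    Hypothetical helper for illustration. Returns tuples of
    (0-based reference position, type, reference allele, unitig allele).
    """
    variants = []
    r, q = ref_start, 0  # current offsets in the reference and the unitig
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":
            # Aligned columns: any mismatching base is reported as a SNP.
            for k in range(n):
                if ref[r + k] != unitig[q + k]:
                    variants.append((r + k, "SNP", ref[r + k], unitig[q + k]))
            r += n
            q += n
        elif op == "I":
            # Bases present in the unitig but absent from the reference.
            variants.append((r, "INS", "", unitig[q:q + n]))
            q += n
        elif op == "D":
            # Bases present in the reference but absent from the unitig.
            variants.append((r, "DEL", ref[r:r + n], ""))
            r += n
        elif op == "S":
            # Soft-clipped unitig bases are skipped.
            q += n
        else:
            raise ValueError(f"CIGAR operation {op!r} is not handled in this sketch")
    return variants

ref = "ACGTACGTACGT"
unitig = "ACTTGTAGGCGT"  # one SNP, a 2-bp deletion and a 2-bp insertion
for variant in call_variants(ref, unitig, "4M2D3M2I3M"):
    print(variant)
```

Because a unitig spans many reads, an INDEL appears as a single gap in one long alignment rather than as a pattern to be inferred from many short read alignments.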

Year:  2012        PMID: 22569178      PMCID: PMC3389770          DOI: 10.1093/bioinformatics/bts280

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  31 in total

1.  The fragment assembly string graph.

Authors:  Eugene W Myers
Journal:  Bioinformatics       Date:  2005-09-01       Impact factor: 6.937

2.  Sequencing of natural strains of Arabidopsis thaliana with short reads.

Authors:  Stephan Ossowski; Korbinian Schneeberger; Richard M Clark; Christa Lanz; Norman Warthmann; Detlef Weigel
Journal:  Genome Res       Date:  2008-09-25       Impact factor: 9.043

3.  A whole-genome assembly of Drosophila.

Authors:  E W Myers; G G Sutton; A L Delcher; I M Dew; D P Fasulo; M J Flanigan; S A Kravitz; C M Mobarry; K H Reinert; K A Remington; E L Anson; R A Bolanos; H H Chou; C M Jordan; A L Halpern; S Lonardi; E M Beasley; R C Brandon; L Chen; P J Dunn; Z Lai; Y Liang; D R Nusskern; M Zhan; Q Zhang; X Zheng; G M Rubin; M D Adams; J C Venter
Journal:  Science       Date:  2000-03-24       Impact factor: 47.728

4.  A strategy of DNA sequencing employing computer programs.

Authors:  R Staden
Journal:  Nucleic Acids Res       Date:  1979-06-11       Impact factor: 16.971

5.  A new algorithm for DNA sequence assembly.

Authors:  R M Idury; M S Waterman
Journal:  J Comput Biol       Date:  1995       Impact factor: 1.479

6.  Toward simplifying and accurately formulating fragment assembly.

Authors:  E W Myers
Journal:  J Comput Biol       Date:  1995       Impact factor: 1.479

7.  SEQAID: a DNA sequence assembling program based on a mathematical model.

Authors:  H Peltola; H Söderlund; E Ukkonen
Journal:  Nucleic Acids Res       Date:  1984-01-11       Impact factor: 16.971

8.  Computer programs for the assembly of DNA sequences.

Authors:  T R Gingeras; J P Milazzo; D Sciaky; R J Roberts
Journal:  Nucleic Acids Res       Date:  1979-09-25       Impact factor: 16.971

9.  Natural genetic variation caused by small insertions and deletions in the human genome.

Authors:  Ryan E Mills; W Stephen Pittard; Julienne M Mullaney; Umar Farooq; Todd H Creasy; Anup A Mahurkar; David M Kemeza; Daniel S Strassler; Chris P Ponting; Caleb Webber; Scott E Devine
Journal:  Genome Res       Date:  2011-04-01       Impact factor: 9.043

10.  The diploid genome sequence of an individual human.

Authors:  Samuel Levy; Granger Sutton; Pauline C Ng; Lars Feuk; Aaron L Halpern; Brian P Walenz; Nelson Axelrod; Jiaqi Huang; Ewen F Kirkness; Gennady Denisov; Yuan Lin; Jeffrey R MacDonald; Andy Wing Chun Pang; Mary Shago; Timothy B Stockwell; Alexia Tsiamouri; Vineet Bafna; Vikas Bansal; Saul A Kravitz; Dana A Busam; Karen Y Beeson; Tina C McIntosh; Karin A Remington; Josep F Abril; John Gill; Jon Borman; Yu-Hui Rogers; Marvin E Frazier; Stephen W Scherer; Robert L Strausberg; J Craig Venter
Journal:  PLoS Biol       Date:  2007-09-04       Impact factor: 8.029
