Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation.

Giulio Formenti, Arang Rhie, Brian P Walenz, Françoise Thibaud-Nissen, Kishwar Shafin, Sergey Koren, Eugene W Myers, Erich D Jarvis, Adam M Phillippy.

Abstract

Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
© 2022. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.
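
The k-mer validation idea described in the abstract can be illustrated with a minimal sketch. This is not Merfin's implementation: the function names (`count_read_kmers`, `missing_kmers`, `keep_variant`) are hypothetical, only the forward strand is counted, and the decision rule is simplified to "keep a variant if it reduces the number of k-mers absent from the reads", whereas the abstract describes scoring against the expected k-mer multiplicity.

```python
from collections import Counter


def kmers(seq, k):
    """Return every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads, k):
    """Count k-mer occurrences across all reads (forward strand only, for brevity)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def missing_kmers(seq, start, end, read_counts, k):
    """Count k-mers overlapping seq[start:end] that never occur in the reads."""
    lo = max(0, start - k + 1)
    hi = min(len(seq), end + k - 1)
    return sum(1 for km in kmers(seq[lo:hi], k) if read_counts[km] == 0)


def keep_variant(assembly, pos, ref, alt, read_counts, k):
    """Simplified rule (an assumption): keep the variant only if applying it
    lowers the number of k-mers around the site that are unsupported by reads."""
    assert assembly[pos:pos + len(ref)] == ref, "ref allele must match the assembly"
    edited = assembly[:pos] + alt + assembly[pos + len(ref):]
    before = missing_kmers(assembly, pos, pos + len(ref), read_counts, k)
    after = missing_kmers(edited, pos, pos + len(alt), read_counts, k)
    return after < before


# Toy example: the reads support the true sequence; the assembly has one base error.
truth = "ACGTTGCAAGGCTA"
read_counts = count_read_kmers([truth, truth], k=4)
assembly = "ACGTTGTAAGGCTA"  # C>T error at 0-based position 6

fixes_error = keep_variant(assembly, 6, "T", "C", read_counts, k=4)
spurious = keep_variant(truth, 2, "G", "A", read_counts, k=4)
```

In this toy case the correcting variant is kept and the spurious one is rejected, because the decision depends only on k-mer support in the reads and not on alignment quality or the caller's internal score.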

Year:  2022        PMID: 35361932     DOI: 10.1038/s41592-022-01445-y

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   47.990


  2 in total

1.  Genome sequence of the small brown planthopper, Laodelphax striatellus.

Authors:  Junjie Zhu; Feng Jiang; Xianhui Wang; Pengcheng Yang; Yanyuan Bao; Wan Zhao; Wei Wang; Hong Lu; Qianshuo Wang; Na Cui; Jing Li; Xiaofang Chen; Lan Luo; Jinting Yu; Le Kang; Feng Cui
Journal:  Gigascience       Date:  2017-12-01       Impact factor: 6.524

2.  KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies.

Authors:  Daniel Mapleson; Gonzalo Garcia Accinelli; George Kettleborough; Jonathan Wright; Bernardo J Clavijo
Journal:  Bioinformatics       Date:  2017-02-15       Impact factor: 6.937

  3 in total

1.  Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies.

Authors:  Alexander S Leonard; Danang Crysnanto; Zih-Hua Fang; Michael P Heaton; Brian L Vander Ley; Carolina Herrera; Heinrich Bollwein; Derek M Bickhart; Kristen L Kuhn; Timothy P L Smith; Benjamin D Rosen; Hubert Pausch
Journal:  Nat Commun       Date:  2022-05-31       Impact factor: 17.694

2.  Semi-automated assembly of high-quality diploid human reference genomes.

Authors:  Erich D Jarvis; Giulio Formenti; Arang Rhie; Andrea Guarracino; Chentao Yang; Jonathan Wood; Alan Tracey; Francoise Thibaud-Nissen; Mitchell R Vollger; David Porubsky; Haoyu Cheng; Mobin Asri; Glennis A Logsdon; Paolo Carnevali; Mark J P Chaisson; Chen-Shan Chin; Sarah Cody; Joanna Collins; Peter Ebert; Merly Escalona; Olivier Fedrigo; Robert S Fulton; Lucinda L Fulton; Shilpa Garg; Jennifer L Gerton; Jay Ghurye; Anastasiya Granat; Richard E Green; William Harvey; Patrick Hasenfeld; Alex Hastie; Marina Haukness; Erich B Jaeger; Miten Jain; Melanie Kirsche; Mikhail Kolmogorov; Jan O Korbel; Sergey Koren; Jonas Korlach; Joyce Lee; Daofeng Li; Tina Lindsay; Julian Lucas; Feng Luo; Tobias Marschall; Matthew W Mitchell; Jennifer McDaniel; Fan Nie; Hugh E Olsen; Nathan D Olson; Trevor Pesout; Tamara Potapova; Daniela Puiu; Allison Regier; Jue Ruan; Steven L Salzberg; Ashley D Sanders; Michael C Schatz; Anthony Schmitt; Valerie A Schneider; Siddarth Selvaraj; Kishwar Shafin; Alaina Shumate; Nathan O Stitziel; Catherine Stober; James Torrance; Justin Wagner; Jianxin Wang; Aaron Wenger; Chuanle Xiao; Aleksey V Zimin; Guojie Zhang; Ting Wang; Heng Li; Erik Garrison; David Haussler; Ira Hall; Justin M Zook; Evan E Eichler; Adam M Phillippy; Benedict Paten; Kerstin Howe; Karen H Miga
Journal:  Nature       Date:  2022-10-19       Impact factor: 69.504

3.  The complete sequence of a human genome.

Authors:  Sergey Nurk; Sergey Koren; Arang Rhie; Mikko Rautiainen; Andrey V Bzikadze; Alla Mikheenko; Mitchell R Vollger; Nicolas Altemose; Lev Uralsky; Ariel Gershman; Sergey Aganezov; Savannah J Hoyt; Mark Diekhans; Glennis A Logsdon; Michael Alonge; Stylianos E Antonarakis; Matthew Borchers; Gerard G Bouffard; Shelise Y Brooks; Gina V Caldas; Nae-Chyun Chen; Haoyu Cheng; Chen-Shan Chin; William Chow; Leonardo G de Lima; Philip C Dishuck; Richard Durbin; Tatiana Dvorkina; Ian T Fiddes; Giulio Formenti; Robert S Fulton; Arkarachai Fungtammasan; Erik Garrison; Patrick G S Grady; Tina A Graves-Lindsay; Ira M Hall; Nancy F Hansen; Gabrielle A Hartley; Marina Haukness; Kerstin Howe; Michael W Hunkapiller; Chirag Jain; Miten Jain; Erich D Jarvis; Peter Kerpedjiev; Melanie Kirsche; Mikhail Kolmogorov; Jonas Korlach; Milinn Kremitzki; Heng Li; Valerie V Maduro; Tobias Marschall; Ann M McCartney; Jennifer McDaniel; Danny E Miller; James C Mullikin; Eugene W Myers; Nathan D Olson; Benedict Paten; Paul Peluso; Pavel A Pevzner; David Porubsky; Tamara Potapova; Evgeny I Rogaev; Jeffrey A Rosenfeld; Steven L Salzberg; Valerie A Schneider; Fritz J Sedlazeck; Kishwar Shafin; Colin J Shew; Alaina Shumate; Ying Sims; Arian F A Smit; Daniela C Soto; Ivan Sović; Jessica M Storer; Aaron Streets; Beth A Sullivan; Françoise Thibaud-Nissen; James Torrance; Justin Wagner; Brian P Walenz; Aaron Wenger; Jonathan M D Wood; Chunlin Xiao; Stephanie M Yan; Alice C Young; Samantha Zarate; Urvashi Surti; Rajiv C McCoy; Megan Y Dennis; Ivan A Alexandrov; Jennifer L Gerton; Rachel J O'Neill; Winston Timp; Justin M Zook; Michael C Schatz; Evan E Eichler; Karen H Miga; Adam M Phillippy
Journal:  Science       Date:  2022-03-31       Impact factor: 63.714
