Guillaume Holley, Doruk Beyter, Helga Ingimundardottir, Peter L Møller, Snædis Kristmundsdottir, Hannes P Eggertsson, Bjarni V Halldorsson.
Abstract
A major challenge with long-read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on five human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than an assembly of PacBio HiFi reads.
Year: 2021 PMID: 33419473 PMCID: PMC7792008 DOI: 10.1186/s13059-020-02244-4
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583