BACKGROUND: Next-generation sequencing applications that aim to resolve the genetic makeup of heterogeneous cellular populations depend critically on knowing how many errors arise in the sequencing instrument (sequencer) itself, yet there is currently no method to measure these errors precisely.
RESULTS: We propose a novel computational method, SequencErr, that addresses this challenge by measuring the base correspondence between the overlapping regions of forward and reverse reads. An analysis of 3777 public datasets from 75 research institutions in 18 countries revealed a sequencer error rate of ~10 per million (pm), with 1.4% of sequencers and 2.7% of flow cells showing error rates > 100 pm. At the flow cell level, error rates are elevated on the bottom surfaces, and > 90% of HiSeq and NovaSeq flow cells have at least one outlier error-prone tile. By sequencing a common DNA library on different sequencers, we demonstrate that sequencers with high error rates have reduced overall sequencing accuracy and that removing outlier error-prone tiles improves sequencing accuracy. We also demonstrate that SequencErr can reveal insights beyond those provided by the popular quality control method FastQC and achieves a 10-fold lower error rate than popular error correction methods, including Lighter and Musket.
CONCLUSIONS: Our study reveals novel insights into the nature of DNA sequencing errors incurred on sequencers. Our method can be used to assess, calibrate, and monitor sequencer accuracy, and to computationally suppress sequencer errors in existing datasets.
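The abstract does not spell out SequencErr's algorithm, so the following is only a minimal sketch of the general idea it names: comparing bases where the forward read and the reverse-complemented reverse read cover the same fragment positions, then aggregating discordance per tile. It assumes the fragment (insert) length is known, that the reads start at opposite ends of the fragment, that there are no indels, and that bases are uppercase. The functions `overlap_mismatches`, `tile_error_rates` and `outlier_tiles`, and the median + k * MAD outlier rule with k = 5, are hypothetical illustrations, not the published method.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median
from typing import Iterable

# Complement table for uppercase DNA bases; N (no-call) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_mismatches(fwd: str, rev: str, insert_size: int) -> tuple[int, int]:
    """Count discordant bases where a read pair covers the same fragment positions.

    Hypothetical assumptions: insert_size is the known fragment length, fwd starts
    at fragment position 0 and rev at the opposite end, and there are no indels.
    Positions where either read has an N are skipped.
    Returns (mismatches, compared_positions).
    """
    rc = reverse_complement(rev)      # rc[j] covers fragment position offset + j
    offset = insert_size - len(rev)
    start = max(0, offset)            # first position covered by both reads
    end = min(len(fwd), insert_size)  # ignore fwd bases read past the fragment end
    mismatches = compared = 0
    for pos in range(start, end):
        a, b = fwd[pos], rc[pos - offset]
        if a == "N" or b == "N":
            continue
        compared += 1
        if a != b:  # at least one of the two reads is wrong at this position
            mismatches += 1
    return mismatches, compared


def tile_error_rates(pairs: Iterable[tuple[str, str, str, int]]) -> dict[str, float]:
    """Aggregate overlap discordance per tile as mismatches per million compared positions.

    Each element of pairs is (tile_id, fwd, rev, insert_size); tile_id is a hypothetical label.
    """
    mismatches: dict[str, int] = defaultdict(int)
    compared: dict[str, int] = defaultdict(int)
    for tile, fwd, rev, insert_size in pairs:
        m, c = overlap_mismatches(fwd, rev, insert_size)
        mismatches[tile] += m
        compared[tile] += c
    # Tiles with no overlapping positions have no estimate and are dropped.
    return {t: mismatches[t] / compared[t] * 1e6 for t in compared if compared[t] > 0}


def outlier_tiles(rates: dict[str, float], k: float = 5.0) -> list[str]:
    """Flag tiles whose rate exceeds median + k * MAD (a hypothetical outlier rule)."""
    if not rates:
        return []
    values = list(rates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    return sorted(t for t, r in rates.items() if r > med + k * mad)
```

In real data the fragment length would come from the alignment (for example the SAM `TLEN` field) rather than being assumed, and a robust rule such as median + k * MAD is used here only because a single error-prone tile should not inflate the threshold it is judged against.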
Keywords: DNA sequencing; Error suppression; Sequencer/instrument error