Christopher M Watson [1,2,3], Nick Camm [4], Laura A Crinnion [4,5], Samuel Clokie [6], Rachel L Robinson [4], Julian Adlard [4], Ruth Charlton [4], Alexander F Markham [5,7], Ian M Carr [5,7], David T Bonthron [4,5,7].
1. Yorkshire Regional Genetics Service, St. James's University Hospital, 6.2 Clinical Sciences Building, Leeds, LS9 7TF, United Kingdom. c.m.watson@leeds.ac.uk.
2. MRC Single Cell Functional Genomics Centre, St. James's University Hospital, University of Leeds, Leeds, LS9 7TF, United Kingdom. c.m.watson@leeds.ac.uk.
3. MRC Medical Bioinformatics Centre, Leeds Institute for Data Analytics, University of Leeds, Leeds, LS2 9JT, United Kingdom. c.m.watson@leeds.ac.uk.
4. Yorkshire Regional Genetics Service, St. James's University Hospital, 6.2 Clinical Sciences Building, Leeds, LS9 7TF, United Kingdom.
5. MRC Single Cell Functional Genomics Centre, St. James's University Hospital, University of Leeds, Leeds, LS9 7TF, United Kingdom.
6. West Midlands Regional Genetics Laboratory, Birmingham Women's NHS Foundation Trust, Birmingham, B15 2TG, United Kingdom.
7. MRC Medical Bioinformatics Centre, Leeds Institute for Data Analytics, University of Leeds, Leeds, LS2 9JT, United Kingdom.
Abstract
BACKGROUND: Diagnostic genetic testing programmes based on next-generation DNA sequencing have resulted in the accrual of large datasets of targeted raw sequence data. Most diagnostic laboratories process these data through an automated variant-calling pipeline. Validation of the chosen analytical methods typically depends on confirming the detection of known sequence variants. Despite improvements in short-read alignment methods, current pipelines are known to be comparatively poor at detecting large insertion/deletion mutations.
METHODS: We performed clinical validation of a local reassembly tool, ABRA (assembly-based realigner), through retrospective reanalysis of a cohort of more than 2000 hereditary cancer cases.
RESULTS: ABRA enabled detection of a 96-bp deletion, 4-bp insertion mutation in PMS2 that had been initially identified using a comparative read-depth approach. We applied an updated pipeline incorporating ABRA to the entire cohort of 2000 cases and identified one previously undetected pathogenic variant, a 23-bp duplication in PTEN. We demonstrate the effect of read length on the ability to detect insertion/deletion variants by comparing HiSeq2500 (2 × 101-bp) and NextSeq500 (2 × 151-bp) sequence data for a range of variants and thereby show that the limitations of shorter read lengths can be mitigated using appropriate informatics tools.
CONCLUSIONS: This work highlights the need for ongoing development of diagnostic pipelines to maximize test sensitivity. We also draw attention to the large differences in computational infrastructure required to perform day-to-day versus large-scale reprocessing tasks.
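The read-length effect described in the RESULTS can be illustrated with a toy calculation. The sketch below is not taken from the study and does not reproduce how ABRA or any aligner works. It only counts, for a single read, how many placements contain an entire inserted sequence plus a minimum stretch of flanking reference sequence on each side. The function name `spanning_read_starts` and the 20-base anchor threshold are hypothetical, chosen purely for illustration.

```python
def spanning_read_starts(read_len: int, insertion_len: int, min_anchor: int) -> int:
    """Count the placements at which one read contains the whole insertion
    plus at least `min_anchor` reference bases on each side.

    Toy model (assumption): the left anchor can range from `min_anchor` to
    `read_len - insertion_len - min_anchor` bases, giving
    read_len - insertion_len - 2 * min_anchor + 1 placements, or 0 if the
    read is too short to hold the insertion and both anchors.
    """
    return max(0, read_len - insertion_len - 2 * min_anchor + 1)


if __name__ == "__main__":
    # Read lengths from the abstract; the 20-base anchor is an illustrative assumption.
    for read_len in (101, 151):
        for insertion_len in (4, 23):
            n = spanning_read_starts(read_len, insertion_len, min_anchor=20)
            print(f"{read_len}-bp read, {insertion_len}-bp insertion: {n} spanning placements")
```

Under these assumptions, a longer read leaves more placements in which an inserted or duplicated segment is fully anchored, which is one intuitive reason shorter reads may need help from local reassembly.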