Ponraj Prabakaran¹, Emily Streaker, Weizao Chen, Dimiter S Dimitrov. ¹ Protein Interactions Group, Center for Cancer Research Nanobiology Program, National Cancer Institute (NCI)-Frederick, National Institutes of Health (NIH), Frederick, MD 21702-1201, USA. dimiter.dimitrov@nih.gov.
Abstract
BACKGROUND: 454 sequencing is currently the method of choice for sequencing antibody repertoires and libraries, which contain large numbers (10^6 to 10^12) of different molecules with similar frameworks and variable regions; this similarity poses significant challenges for identifying sequencing errors. Identifying and correcting sequencing errors in such mixtures is especially important for exploring complex maturation pathways and for identifying putative germline predecessors of highly somatically mutated antibodies. To quantify and correct errors incorporated during 454 antibody sequencing, we sequenced six antibodies at different known concentrations twice and compared the reads with the corresponding known sequences determined by standard Sanger sequencing. RESULTS: We found that 454 antibody sequencing could yield approximately 20% incorrect reads, mostly due to insertions at short homopolymer regions 2-3 nucleotides long and, to a lesser extent, to insertions, deletions and other variants at random sites. Error correction might reduce the fraction of erroneous reads to 5-10%. However, a certain number of errors, accounting for 4-8% of the total reads, could not be corrected unless sequencing is repeated several times, which may not be possible for large, diverse libraries and repertoires, including complete sets of antibodies (antibodyomes). CONCLUSIONS: Our experimental assessment of 454 antibody sequencing errors reveals a high rate (up to 20%) of incorrect reads; the errors can be reduced to 5-10% but not lower, which calls for caution to avoid false discovery of antibody variants and diversity.
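The comparison described in the abstract, checking each 454 read against its known Sanger sequence and attributing differences either to homopolymer length changes or to other variants, can be illustrated with a short Python sketch. This is a hypothetical illustration, not the authors' pipeline: the function names, the three-way "correct"/"homopolymer"/"other" labelling and the example sequences are assumptions made for illustration. It also assumes each read is already trimmed to exactly the same region as its reference, so no alignment step is needed.

```python
from collections import Counter
from itertools import groupby


def run_length_encode(seq):
    """Collapse a sequence into (base, run_length) pairs, e.g. 'AAC' -> [('A', 2), ('C', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def homopolymer_length_errors(read, reference):
    """Compare homopolymer run lengths of a read and its reference.

    Returns a list of (reference_run_length, read_run_length) for every run whose
    length differs, or None if the two sequences do not share the same order of
    bases (i.e. the difference is not explained by run-length changes alone).
    """
    read_rle = run_length_encode(read)
    ref_rle = run_length_encode(reference)
    if [b for b, _ in read_rle] != [b for b, _ in ref_rle]:
        return None
    return [(ref_n, read_n) for (_, ref_n), (_, read_n) in zip(ref_rle, read_rle)
            if ref_n != read_n]


def classify_read(read, reference):
    """Label a read relative to its known reference sequence (hypothetical scheme).

    'correct'     : identical to the reference
    'homopolymer' : only homopolymer run lengths differ (insertion or deletion within a run)
    'other'       : any other difference (substitution, indel outside a run, ...)
    """
    if read == reference:
        return "correct"
    if homopolymer_length_errors(read, reference) is not None:
        return "homopolymer"
    return "other"


def error_profile(reads, reference):
    """Return the percentage of reads in each class."""
    counts = Counter(classify_read(r, reference) for r in reads)
    total = len(reads)
    return {label: 100.0 * counts[label] / total
            for label in ("correct", "homopolymer", "other")}


if __name__ == "__main__":
    reference = "CAGGTGCAGCTG"  # hypothetical 12-nt reference, not a real antibody
    reads = [reference, reference, reference,
             "CAGGGTGCAGCTG",   # extra G inserted in the GG run
             "CAGGTGCTGCTG"]    # A->T substitution
    print(error_profile(reads, reference))
    # -> {'correct': 60.0, 'homopolymer': 20.0, 'other': 20.0}
```

Run-length encoding is used because a 454 homopolymer error changes only how many times a base is repeated, not the order of bases, so comparing the base order of the encoded read and reference separates those errors from substitutions and other indels. A read carrying both kinds of error falls into "other"; a real analysis would also need to align variable-length reads before comparison.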