Sarah Sheppard¹, Nathan D Lawson, Lihua Julie Zhu.
¹ Program in Gene Function and Expression and Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 364 Plantation St, Worcester, MA 01605, USA.
Abstract
MOTIVATION: 3' end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3' ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can identify artifactual polyadenylation sites because of internal priming in homopolymeric stretches of adenines. Although heuristic filters have been applied in these cases, they typically produce a high proportion of both false-positive and false-negative classifications. Therefore, improved algorithms are needed to better identify mis-priming events in oligo-dT-primed sequences.
RESULTS: By analyzing sequence features flanking 3' ends derived from oligo-dT-based sequencing, we developed a naïve Bayes classifier to classify them as true or false (internally primed). The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites.
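To make the classification idea concrete, the following is a minimal sketch of a Bernoulli naïve Bayes classifier applied to candidate 3' ends. It is not the authors' implementation. The two binary features (a polyadenylation signal hexamer upstream, and a run of six adenines downstream), the hexamer list, the function names and the toy training sequences are all assumptions chosen for illustration; the abstract does not specify which flanking features the published classifier uses.

```python
import math

# Illustrative assumption: two common polyadenylation signal hexamers.
POLY_A_SIGNALS = ("AATAAA", "ATTAAA")


def extract_features(upstream, downstream):
    """Return two binary features for a candidate 3' end (hypothetical feature set)."""
    has_signal = int(any(s in upstream.upper() for s in POLY_A_SIGNALS))
    # A genomic run of six A's downstream hints at internal priming.
    a_rich = int("AAAAAA" in downstream.upper())
    return (has_signal, a_rich)


def train(examples):
    """Fit a Bernoulli naive Bayes model with add-one (Laplace) smoothing.

    examples: list of (features, label), where label is True for a genuine site.
    """
    model = {}
    total = len(examples)
    n_features = len(examples[0][0])
    for label in (True, False):
        rows = [f for f, y in examples if y == label]
        prior = (len(rows) + 1) / (total + 2)
        probs = [
            (sum(r[i] for r in rows) + 1) / (len(rows) + 2)
            for i in range(n_features)
        ]
        model[label] = (prior, probs)
    return model


def log_posterior(model, features, label):
    """Unnormalized log posterior of `label` given the binary features."""
    prior, probs = model[label]
    score = math.log(prior)
    for x, p in zip(features, probs):
        score += math.log(p if x else 1.0 - p)
    return score


def classify(model, features):
    """Return True if the site is more likely genuine than internally primed."""
    return log_posterior(model, features, True) > log_posterior(model, features, False)


# Toy, made-up training sequences (upstream, downstream); not real data.
TOY_TRAINING = [
    (extract_features("GCTTAATAAAGTCTGACTGA", "GTCTGTGTTTCC"), True),
    (extract_features("TTGCATTAAATGCATGCAGT", "CTGTTTGTGTCA"), True),
    (extract_features("GCGCGGATCCGTGACCTGAG", "AAAAAAAAGGCA"), False),
    (extract_features("CCTGAGGTCAGCTGGACCTG", "GAAAAAAAAAGC"), False),
]
model = train(TOY_TRAINING)
```

Unlike a hard heuristic filter that rejects any site followed by an A-rich stretch, a probabilistic model of this kind weighs all features together, so a site with a strong upstream signal is not automatically discarded. A real classifier would need a far richer feature set and a large labeled training set.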